Introduction to Data Pipeline at Kakao
Table of Contents
1. What is a Data Pipeline?
2. Architecture
3. Future Work
What is a Data Pipeline?
+ A??? ?????
IT?
Ryan? ??? 7
?? ??(A)? ???.
+A??? ?? ???? ?
?? 7, ??, ????.
+A??? ???? ???? ?:
? ??? 6:4?.
Definition
???(OLAP, Data Warehouse)???(OLTP, Data Hub)
Ryan? ?? ? ????
A
Ryan?
??? ?? ? ???? ??? ??????
IT
Ryan? ??? ?? ? ????? ?? ??
?? ?????? ??? 7, ??, ??, ??,
V20
A? ? ????? ??? ????
30? 90%
IT ????? ??? ? ?? ?? ?? ????
A??.
Ryan? ?? ? ??? ??? ??
?? B
Ryan? ?? ??? ? ?? ?????
?? ?? ???? C
Ryan? ???? ?? ?? ??? D
A? ? ???? ?? ?? ?? ???
B, C
??? 7? ?? ?? ?????? N?, Ryan.
A??? ? ??
? ?? ????? ??? ?????? N?,
Ryan.
Usecase ???
- ???? ??? event??
- ?? ?? 40~50? ?? ??? event??.
- 10? ?? Data pipeline? output? ??.
- ??? ??? ?? ??? ?? ???(meta)? ????? ??, ?? ??, ??? event? join.
- Event?? join??? ?? meta?? 2~3?(Avg).
- ?? ????? ??? ?? ?? ??? ??? ?? ??? 70??.
- ?? data source? ??? ? ?? ??? ??? ??, scalable? ??? ??.
- ?? ????? ???? event? ?? ??? ????? ??, 50? x 30 = 1500??.
- ?? ?? ???? ?? ?? ?? ????? ??? 500??.
- 6???? ????? ???
- ???? ????? ???? ??? data? ??? storage? ????? ??? storage?
???? ??? ???.
- ????? ??? data? ?? ???? unified, scalable? ??? API? ???? ?.
- ?? ?? ??? ??? ??? ??? ??? ?? ??.
Technical Challenges
- Open source? ??? ???? ? ??? ?? ????.
- ?????? ??? ??? ?? ???, ? ???? ????? ??? ????? ???
????? ??.
- ??? ???? ????, ?? ???? ???? iteration? ??? ?? ?. (?? << ???).
- ?? ???? ?? ????.
- ??? ?? ? ?? ??
- ??? ?? Endpoint?? ? ???, ??.
- Data? ??, ??, API? ?? ??? ??.
- ?? ???? ????? ???, ?? ??? ??? ??? ??? ? ??? ??? ???
??? ???.
- ??? ??? ???? ??? ??? ??.
- ???? ???? ????? ???
- ??? ????? ?? ?? ?? -> ??? ?? -> AB??? -> ???? -> ?? ?? ???
???? Iteration.
- Iteration ???? ??? ???? ?????? ?? ??? ??? ?
- Data >> Opinion
Non-Technical Challenges
- ? ??? ???? ??
- ???? ???? ???? ????
- ??? platform ??? ??? ? ??? ??(?? ?????? ??? ??, ??? ??? ?)
- ?? ??? ???
- ????? ??? ???? ?????
- ????? ? ???? ??? ? ?? Input/Outpu format? ?????
- Event?? ?? data?? join???? ?? view? ?? ? ? ??? ?? ?? ??.
- ??? ?? ??? ???!
- ???? ??.
- ?? ???? ?? ???? ???? ??? ??. ?? ??? ?? ??????
- ??? ? ?? Data Product/API? ???? ??? ??.
- ???? ? ????
- ??? ???? ??? ? ??????
- ???? ?? ?????
- ??? raw data? ?????.
- ????? operation? ?? ?? ??? ??? ???? ?? ???, ??? ?? ????
??.
- ??? ?? kafka????, spark ????, ? ??? ????? ??.
Personal Note.
[236] ????????????????????????? ????????
Architecture
?? ??
?? ?
API??
1. ??? ??
a. ?????? ??? ??? ???? ??.
b. ??? format? ????, ???? ??? endpoint? ??? ?? ??.
c. ??? ??? ?? ??? ??? ??.
2. ??
a. ??? ???, ????, ??? ?????? ?? Fancy? ??.
b. ??? ??? ????, ??? ??.
3. ?? ? API??
a. ??? ???? ????? ??? ??? ? ?? product??? ????, API? ??.
b. ???? ??, adhoc?? ?? ?? ??? ??.
Physical ?? ??: ?? + ?? + ?? + API??
1. ?? Layer ?? ??
[Figure: Client (JavaScript, iOS app, Android app, Web app); REST endpoint; Kafka; S2Graph; calls log_event_sync() and log_event_async()]
- ??? service?? Data Pipeline? REST API? ?? data? ingest.
- ? ??? input? ??? ??.
- Synchronous: ingest??? ??? ??? Storage? persist? ??? ?????.
- ?? ??? ?? ???? ???? ??
- Asynchronous: ingest??? ??? ?? ??, asynchronous?? event? persist?.
- ?? activity ???? ???? ??
- Apache Kafka? ??? ??.
- Kafka ?? != Data hub. Kafka ?? != data pipeline.
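The two input modes above can be sketched roughly as follows. This is a minimal illustration, not the actual endpoint: `IngestSketch`, `Event`, `drain`, and the in-memory buffer and queue are hypothetical stand-ins for the storage and Kafka, and only the names `log_event_sync` / `log_event_async` (written here as `logEventSync` / `logEventAsync`) come from the slides.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.collection.mutable

// Hypothetical in-memory stand-ins; the real endpoint persists to a storage and to Kafka.
object IngestSketch {
  final case class Event(service: String, payload: String)

  // Stand-in for the durable storage.
  private val storage = mutable.ArrayBuffer.empty[Event]
  // Stand-in for the asynchronous channel.
  private val queue = new ConcurrentLinkedQueue[Event]()

  // Synchronous ingest: returns only after the event has been persisted.
  def logEventSync(e: Event): Boolean = storage.synchronized {
    storage += e
    true
  }

  // Asynchronous ingest: acknowledges as soon as the event is queued.
  def logEventAsync(e: Event): Boolean = queue.offer(e)

  // A background consumer would call this to persist queued events later.
  // Returns the number of events moved from the queue into storage.
  def drain(): Int = {
    var n = 0
    var next = queue.poll()
    while (next != null) {
      logEventSync(next)
      n += 1
      next = queue.poll()
    }
    n
  }

  def persistedCount: Int = storage.synchronized(storage.size)
}
```

The point of the split is the acknowledgement: a caller of the synchronous path can rely on the event being stored when the call returns, while the asynchronous path trades that guarantee for a faster response.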
1. ?? Layer Solution: S2Event
S2Event: REST Endpoint
- Kafka broker? down??? ?? event?? local file? buffering ??? Fallback??.
- Multiple Kafka cluster? publish?? Fanout ??.
- Admin? ?? rule? ???? data? ?? kafka cluster?? publish?? ?? ??.
- Event ????? Metrics ??.
- ?? ???? ?? ???(click??, ??? ??)?? ?? in/out???.
- Synchronous Input
- S2Graph? CRUD
- Asynchronous Input
- Kafka? Publish
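The fallback and fanout behaviour listed above could look roughly like the sketch below. Everything here is an assumption made for illustration: `Publisher`, `MemoryPublisher`, and `FanoutEndpoint` are hypothetical names, the publishers are in-memory stand-ins for Kafka clusters, and an in-memory map stands in for the local file buffer.

```scala
import scala.collection.mutable
import scala.util.Try

object FanoutSketch {
  // Hypothetical publisher abstraction; a real one would wrap a Kafka producer.
  trait Publisher {
    def name: String
    def publish(msg: String): Unit
  }

  // In-memory publisher that can be switched "down" to simulate a broker outage.
  final class MemoryPublisher(val name: String) extends Publisher {
    var up = true
    val sent = mutable.ArrayBuffer.empty[String]
    def publish(msg: String): Unit = {
      if (!up) throw new IllegalStateException(s"$name is down")
      sent += msg
      ()
    }
  }

  // Fans one message out to every cluster and buffers it for clusters that fail.
  final class FanoutEndpoint(publishers: Seq[Publisher]) {
    // Stand-in for the local file buffer: pending messages per cluster name.
    val fallback = mutable.Map.empty[String, Vector[String]].withDefaultValue(Vector.empty)

    def publish(msg: String): Unit = publishers.foreach { p =>
      if (Try(p.publish(msg)).isFailure) fallback(p.name) = fallback(p.name) :+ msg
    }

    // Replays buffered messages; anything that still fails stays buffered.
    def replay(): Unit = publishers.foreach { p =>
      val pending = fallback(p.name)
      fallback(p.name) = Vector.empty
      pending.foreach { m =>
        if (Try(p.publish(m)).isFailure) fallback(p.name) = fallback(p.name) :+ m
      }
    }
  }
}
```

Buffering per cluster keeps a healthy cluster unaffected when another one is down, and replay preserves the order in which events were buffered.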
1. ?? Layer Solution: S2Event
- ???? ?? Edge(??, ???, ?? ??? ??, extra data) format.
timestamp operation from to label props
now insert Ryan ?? A ??? ??? {"Tags": ["???7", "??"]}
Edge Format: https://steamshon.gitbooks.io/s2graph-book/content/manage_edges.html
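As a rough illustration of the column layout above, an edge line could be parsed as in the sketch below. It assumes tab-separated columns in exactly the order shown in the header (timestamp, operation, from, to, label, props) and keeps `props` as a raw JSON string; `EdgeLineSketch` and its `Edge` case class are hypothetical names, and the linked edge format documentation remains the authoritative reference.

```scala
object EdgeLineSketch {
  // Field names follow the column header shown above.
  final case class Edge(
    timestamp: String,
    operation: String,
    from: String,
    to: String,
    label: String,
    props: String
  )

  // Assumes one edge per line with six tab-separated columns.
  // The limit of -1 keeps trailing empty columns instead of dropping them.
  def parse(line: String): Either[String, Edge] =
    line.split("\t", -1) match {
      case Array(ts, op, from, to, label, props) =>
        Right(Edge(ts, op, from, to, label, props))
      case other =>
        Left(s"expected 6 tab-separated columns, got ${other.length}")
    }
}
```

Returning `Either` rather than throwing lets a pipeline count or route malformed lines instead of failing the whole batch.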
2. ?? Layer ?? ??
- Event?? ??? operation?? ???? ??? Event? ??.
- ?? ??? ?? Real-time, Batch ? ??? ??.
- Batch
- ?? ?? ??? ??? ???? ????? ?? ????.
- ??? 70? ??, ?? 500? ?? ???? ????
- ?? ???? ?? similarity matrix?? ???? ??? ??.
- ???? ?? ??? ????? ???? API? ??? ??? ???, ??? ?? ???,
???? ???? ???? ??, ?? Read throughput, latency? ??? ??? ?.
- Real-time
- ??? stream? ??? ETL? ??.
- Input event? ?? ???? ? ???? ?? 2~3? ??.
- event? ????? topology?? 10??? ?? ??.
[Figure: Computing Engine: Spark, Streaming, MLlib, S2Lambda, Bulk Loader, Batch]
2. ?? Layer Solution: S2Lambda(Real-time) ?? ??
S2Lambda: ETL environment.
- AWS Lambda? ??? ??? spark streaming?? ??.
- Event?? ??? ? event handler(scala function)? ????.
- Source -> Parser -> Flow -> Sink? topology? ??? event handler?? ??? ??.
- Spark Streaming Job? ?? ??? topology?? event?? ??.
- Marathon? ?? Streaming Job HA??.
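The Source -> Parser -> Flow -> Sink topology can be pictured as plain function composition. The sketch below is only an assumption about the shape of such a topology: the type aliases and `run` are hypothetical, and the real S2Lambda interfaces are not shown in the slides.

```scala
object TopologySketch {
  // Hypothetical stage types, one per step of the topology.
  type Source[A] = () => Iterator[A]
  type Parser[A, B] = A => Option[B] // None drops an unparseable record
  type Flow[B, C] = B => Seq[C]      // one input may yield zero or more outputs
  type Sink[C] = C => Unit

  // Wires Source -> Parser -> Flow -> Sink together and
  // returns the number of records handed to the sink.
  def run[A, B, C](
    source: Source[A],
    parser: Parser[A, B],
    flow: Flow[B, C],
    sink: Sink[C]
  ): Int = {
    var count = 0
    for {
      raw <- source()
      parsed <- parser(raw).iterator
      out <- flow(parsed).iterator
    } {
      sink(out)
      count += 1
    }
    count
  }
}
```

Because each stage is an independent function, a handler can be swapped or tested with a sample input without touching the job that runs it.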
2. ?? Layer Solution: S2Lambda, Why?
- ???? ?? Spark Streaming job? ????.
- Event handle logic? runtime? tightly coupling.
- Not Composable!
- ????? resource??.
- Streaming Job?? executor? ??? ??.
- Fragmentation!
- ETL process? test case?? ? validation?? ???.
- ???? ??? ?.
- Data? ingest, transform, join, sink? ??? flow? ?? ??? ???.
- IntelliJ?? Spark Streaming Job ?? ?? ???? ???.
- Event??? ??? function??, ?? ??? ?? engine? ??.
- Event processing framework? ???? ?? ??? ?? ?? ?? ?.
- ?? ??? Join.
- ?? ??? Join?? ?? ? ?? cache?? ???(Local ?? >> Remote??).
- ??? JVM??? local cache(RocksDB) instance?? ??.
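The local-cache join mentioned above can be sketched as a cache placed in front of a remote lookup. This is an illustration under stated assumptions: `CachedLookup` and `join` are hypothetical names, and a `mutable.Map` stands in for the embedded local store (RocksDB in the slides).

```scala
import scala.collection.mutable

object CachedJoinSketch {
  // Local cache in front of a remote metadata lookup.
  final class CachedLookup[K, V](remote: K => Option[V]) {
    // Stand-in for the embedded local store.
    private val local = mutable.Map.empty[K, V]
    // Counts remote round trips so the effect of the cache is visible.
    var remoteCalls = 0

    def get(key: K): Option[V] = local.get(key).orElse {
      remoteCalls += 1
      val fetched = remote(key)
      fetched.foreach(v => local.update(key, v))
      fetched
    }
  }

  // Enriches each event with its metadata; events without metadata keep None.
  def join[E, K, V](
    events: Seq[E],
    keyOf: E => K,
    lookup: CachedLookup[K, V]
  ): Seq[(E, Option[V])] =
    events.map(e => (e, lookup.get(keyOf(e))))
}
```

Repeated keys are served locally after the first fetch, which is where the saving over a remote call per event comes from. Missing keys are not cached in this sketch, so they are looked up remotely every time.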
2. ?? Layer Solution: S2Lambda ?? ??
// Flow example: read the external API route from the options, fetch data
// over HTTP GET, then merge the response with the input event.
// Json is assumed to be Play JSON (play.api.libs.json.Json).
new EtlFlow {
  var baseUrl = ""

  def init(options: String) = {
    baseUrl = (Json.parse(options) \ "external.api.route").as[String]
  }

  override def apply: Apply = {
    case (event @ StringInput(s), ctx) =>
      // The input is assumed to be tab-separated, with the article id in the fourth column.
      val tokens = s.split("\t")
      val articleId = tokens(3)
      val uri = baseUrl + articleId
      ctx.httpClient.url(uri).get().map { response =>
        Seq(InputWithResult(event, response.json, response.status))
      }
  }
}
2. ?? Layer Solution: S2Lambda ?? ??
// Sink example: read the Kafka topics from the options and publish each input to them.
// Json is assumed to be Play JSON (play.api.libs.json.Json).
new EtlFlow {
  var topics = Array.empty[String]

  def init(options: String) = {
    // "kafka.topics" holds a comma-separated list of topic names.
    topics = (Json.parse(options) \ "kafka.topics").as[String].split(",").map(_.trim)
  }

  def apply: Apply = {
    case (input, ctx) =>
      ctx.kafkaProducerOpt.foreach { producer =>
        topics.foreach { topic =>
          // Publish with a null key.
          producer.send(new ProducerRecord[String, String](topic, null, input.toString))
        }
      }
      // A sink emits nothing downstream.
      Future.successful(Nil)
  }
}
S2Lambda Admin Console
- Write event handler as Scala code.
- Check if the code compiles.
- Check if the code runs correctly with an example input.
2. ?? Layer Solution: S2Lambda, ?? ?? ??
- ??? ?? ?? ?? ? ??.
- ?? ?? ?? Event Handler???, ?? ? ?? Parameter? ????? ??/??? ?? ??.
- Event Handler??? Composable!
- Event Handler? Validation?? ??? + ???? DB? ??.
- ?? ???? ??? ?? ???? ??? sample input?? ??? validation?? ? ??.
- ?? ???? test case? ??? ??.
- ?? ???? ??? DB??? ???.
= ??/?? ?? ?? + ??? ??(JVM, classpath, compiler)
2. ?? Layer Solution: S2Loader ?? ??
- Production HBase????? ??, ??? ??
?????? Spark Job? ?? ??? HFile???
??? ??.
- HFileOutputFormat2
- ??? HFile? Production HBase ????? Distcp
?, HBase? completebulkload ?? ????
HBase? ???.
Put? ?? ???? ??? ?? Region??? ???
??.
- Memstore??
- WAL??
- ?? compaction
- Region?? GC
Bulk Load? read throughput, latency? ?? ?? ??.
http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/
3. ?? ? API?? ?? ??
- OLTP
- ?? ???? ????, ?? ???? ??? local graph? traverse?? ???.
- ??? ??? ??. Data Hub? ??.
- ex) Ryan? ? ??? ??? ??.
- ex) A??? ??? ???.
- ex) Ryan? ???? ?? ? ??.
- OLAP
- ???? ???? ?? ?? graph? ?? scan, aggregation? ?? ???.
- ??? ??? ??. Data Warehouse? ??.
- ex) A??? ? ????? ?? ? ??.
- ex) ??? 7??? ?? ????? ???? ???? ???? ?? ???.
[Figure: Data Hub (S2Graph); Data Warehouse (Druid, Hive)]
3. ?? ? API?? Solution: S2Graph(Data Hub)
- Data Hub: Apache S2Graph(incubating), Scalable distributed OLTP Graph Database on HBase.
- OLTP Query? ?? ?? ???. ??, ?? ???? ???? ??? ??? ??.
- Asynchronous Input type??? ??? Synchronous? ???? ???.
- Vertex/Edge? ?? CRUD? ????, ??? Edge?? ?? BFS search traverse? ???.
- ????? ??? cache layer?? ?? ?? ??? ?? ??? ??.
- API? ?? Asynchronous.
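The breadth-first traversal over edges mentioned above can be illustrated on a tiny in-memory graph. This is a sketch only: `GraphSketch`, its `Edge` type, and the signatures of `getEdges` and `bfs` are assumptions made for the example and are not the S2Graph API.

```scala
import scala.collection.mutable

object GraphSketch {
  final case class Edge(from: String, to: String, label: String)

  // Minimal in-memory edge store, standing in for the graph database.
  final class Graph(edges: Seq[Edge]) {
    private val out: Map[String, Seq[Edge]] = edges.groupBy(_.from)

    // Outgoing edges of one vertex, restricted to a label.
    def getEdges(from: String, label: String): Seq[Edge] =
      out.getOrElse(from, Nil).filter(_.label == label)

    // Breadth-first traversal of at most maxDepth hops.
    // Returns reached vertices in visit order, excluding the start vertex.
    def bfs(start: String, label: String, maxDepth: Int): List[String] = {
      val seen = mutable.LinkedHashSet(start)
      var frontier = List(start)
      var depth = 0
      while (frontier.nonEmpty && depth < maxDepth) {
        // seen.add returns true only for vertices not visited before.
        frontier = frontier.flatMap(v => getEdges(v, label).map(_.to)).filter(seen.add)
        depth += 1
      }
      seen.toList.tail
    }
  }
}
```

Bounding the depth keeps the query local to the neighbourhood of the start vertex, which matches the OLTP style of query described for the Data Hub.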
3. ?? ? API?? Solution: Druid(Warehouse)
- Data Warehouse: Druid, Hive
- Realtime Slice & Dice with Druid.
- ??? fix?? ?? ????? interactive?? ??? ???? ??? ??(explore)
- Fix? ??? Hive + Jenkins? ??.
- Dashboard, Report?? ??.
- Long term ???? ?? ???? ??? ??? ????? ??.
- ?? ?? ????? ?? S2Graph(Data Hub)? Bulk Load?? ?????? ???
???? ??.
3. ?? ? API??: Summary
- OLTP
- S2Graph Query REST API? ?? Service??? ??? ???? Query.
- ??? Service server?? ?? ??.
- ~70K QPS, ~100ms latency.
- Druid? ??? ????? ???? Druid to S2Graph ETL? ?? S2Graph? ??.
- ???? ?? ?? ?? ?? ??? TopK.
- ??? ?? ?? ?? ?? TopK.
- ?? ??? ??? ?? ?? ??.
- OLAP
- Druid
data? visualize??? Pivot??? UI?? ???? ??? ? ?????? interactive?? slice &
dice?? ? ?? ?? ??.
- ??? ???? ??? ?? ?? ????? ???? admin?? druid REST Query
API? ?? ??.
- 1~10 QPS, ~ 10s latency.
- ?? ???? ???? abstract layer(Property Graph Model)? ????, ???? API(Graph Query)
? ??? ??, service?? ?? API??? ?? ? ??? ??.
- ?? ??? getEdges, getVertices ??? API??? ?? ???? ????? ??.
- ??? Vertex, Edge??, ??? getEdges, getVertices.
- ?? ?? ?? ??, Kafka cluster, spark, HBase?? ?? ?? ??, ?? cost? ?? ?.
- ???? ?? ??? scalable? architecture? ????, ? ?? ???? ??? multi tenancy
? ??? ???? ??? ?? ? ? ?? ??.
- ?? ?? ?? ?? ?? ??? ??? ???? ???? ???? ? ?? ??(??)? ? ? ??
??? ???.
- A/B??? ? Data? ??? ?? ???? ??.
= ?? ??? + ?? ??? ?? + ???? ?? + ?? ?? ?? ?? + ??? ?? ????
Data Pipeline? ?? ?? ?? ???.
Future Work
- Apache S2Graph(incubating)
- ?? ASF incubating??, ?? ? ??? open source? ?? source? ??? ???.
- ? ?? ??? v0.1.0 release vote?? pass.
- ?? ???
- Apache Tinkerpop initial??.
- OLAP layer? ??.
- ?? Storage ??(Redis, RocksDB, PostgreSQL, MySQL, Cassandra, …)
- S2Lambda open source(Apache License V2)
- ??, ???, ??.
- ???? ?? ?? ???? ?? ? ?? ????…(?? ??)??? ???.
- ??? stream ?? framework? ?? ? ? ?? ??.
- ?? ??? ?? ?? ??? ??? ????, ???? ??? ?? Pipeline? ?? ?? ????
??? ????, ?? ???? ?? ?? ???? ?? ??? ?????? ?? ??.
? ??.

[231] Clova ????
NAVER D2
?

[236] ????????????????????????? ????????

  • 1. Introduction to Data Pipeline at Kakao
  • 2. Table of Contents 1. What is a Data Pipeline? 2. Architecture 3. Future Work
  • 3. What is a Data Pipeline?
  • 4. + A??? ????? IT? Ryan? ??? 7 ?? ??(A)? ???. +A??? ?? ???? ? ?? 7, ??, ????. +A??? ???? ???? ?: ? ??? 6:4?. Definition
  • 5. ???(OLAP, Data Warehouse)???(OLTP, Data Hub) Ryan? ?? ? ???? A Ryan? ??? ?? ? ???? ??? ?????? IT Ryan? ??? ?? ? ????? ?? ?? ?? ?????? ??? 7, ??, ??, ??, V20 A? ? ????? ??? ???? 30? 90% IT ????? ??? ? ?? ?? ?? ???? A??. Ryan? ?? ? ??? ??? ?? ?? B Ryan? ?? ??? ? ?? ????? ?? ?? ???? C Ryan? ???? ?? ?? ??? D A? ? ???? ?? ?? ?? ??? B, C ??? 7? ?? ?? ?????? N?, Ryan. A??? ? ?? ? ?? ????? ??? ?????? N?, Ryan. Usecase ???
  • 6. - ???? ??? event?? - ?? ?? 40~50? ?? ??? event??. - 10? ?? Data pipeline? output? ??. - ??? ??? ?? ??? ?? ???(meta)? ????? ??, ?? ??, ??? event? join. - Event?? join??? ?? meta?? 2~3?(Avg). - ?? ????? ??? ?? ?? ??? ??? ?? ??? 70??. - ?? data source? ??? ? ?? ??? ??? ??, scalable? ??? ??. - ?? ????? ???? event? ?? ??? ????? ??, 50? x 30 = 1500??. - ?? ?? ???? ?? ?? ?? ????? ??? 500??. - 6???? ????? ??? - ???? ????? ???? ??? data? ??? storage? ????? ??? storage? ???? ??? ???. - ????? ??? data? ?? ???? unified, scalable? ??? API? ???? ?. - ?? ?? ??? ??? ??? ??? ??? ?? ??. Technical Challenges
  • 7. - Open source? ??? ???? ? ??? ?? ????. - ?????? ??? ??? ?? ???, ? ???? ????? ??? ????? ??? ????? ??. - ??? ???? ????, ?? ???? ???? iteration? ??? ?? ?. (?? << ???). - ?? ???? ?? ????. - ??? ?? ? ?? ?? - ??? ?? Endpoint?? ? ???, ??. - Data? ??, ??, API? ?? ??? ??. - ?? ???? ????? ???, ?? ??? ??? ??? ??? ? ??? ??? ??? ??? ???. - ??? ??? ???? ??? ??? ??. - ???? ???? ????? ??? - ??? ????? ?? ?? ?? -> ??? ?? -> AB??? -> ???? -> ?? ?? ??? ???? Iteration. - Iteration ???? ??? ???? ?????? ?? ??? ??? ? - Data >> Opinion Non-Technical Challenges
  • 8. - ? ??? ???? ?? - ???? ???? ???? ???? - ??? platform ??? ??? ? ??? ??(?? ?????? ??? ??, ??? ??? ?) - ?? ??? ??? - ????? ??? ???? ????? - ????? ? ???? ??? ? ?? Input/Output format? ????? - Event?? ?? data?? join???? ?? view? ?? ? ? ??? ?? ?? ??. - ??? ?? ??? ???! - ???? ??. - ?? ???? ?? ???? ???? ??? ??. ?? ??? ?? ?????? - ??? ? ?? Data Product/API? ???? ??? ??. - ???? ? ???? - ??? ???? ??? ? ?????? - ???? ?? ????? - ??? raw data? ?????. - ????? operation? ?? ?? ??? ??? ???? ?? ???, ??? ?? ???? ??. - ??? ?? kafka????, spark ????, ? ??? ????? ??. Personal Note.
  • 11. ?? ?? ?? ? API?? 1. ??? ?? a. ?????? ??? ??? ???? ??. b. ??? format? ????, ???? ??? endpoint? ??? ?? ??. c. ??? ??? ?? ??? ??? ??. 2. ?? a. ??? ???, ????, ??? ?????? ?? Fancy? ??. b. ??? ??? ????, ??? ??. 3. ?? ? API?? a. ??? ???? ????? ??? ??? ? ?? product??? ????, API? ??. b. ???? ??, adhoc?? ?? ?? ??? ??. Physical ?? ??: ?? + ?? + ?? + API??
  • 12. ?? ?? ?? ? API?? 1. ?? Layer ?? ?? Client JavaScript iOS app Android app Web app REST endpoint Kafka log_event_sync() log_event_async() S2Graph log_event_sync() log_event_async() - ??? service?? Data Pipeline? REST API? ?? data? ingest. - ? ??? input? ??? ??. - Synchronous: ingest??? ??? ??? Storage? persist? ??? ?????. - ?? ??? ?? ???? ???? ?? - Asynchronous: ingest??? ??? ?? ??, asynchronous?? event? persist?. - ?? activity ???? ???? ?? - Apache Kafka? ??? ??. - Kafka ?? != Data hub. Kafka ?? != data pipeline.
  • 13. ?? ?? ?? ? API?? 1. ?? Layer Solution: S2Event Client JavaScript iOS app Android app Web app REST Endpoint Kafka log_event_sync() log_event_async() S2Graph log_event_sync() log_event_async() S2Event: REST Endpoint - Kafka broker? down??? ?? event?? local file? buffering ??? Fallback??. - Multiple Kafka cluster? publish?? Fanout ??. - Admin? ?? rule? ???? data? ?? kafka cluster?? publish?? ?? ??. - Event ????? Metrics ??. - ?? ???? ?? ???(click??, ??? ??)?? ?? in/out???. - Synchronous Input - S2Graph? CRUD - Asynchronous Input - Kafka? Publish
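Slide 13 names two S2Event behaviors: a fallback that buffers events locally when a Kafka broker is down, and a fanout that publishes to multiple Kafka clusters. The following is a minimal sketch of how those two ideas can combine, not S2Event's actual code. `ClusterPublisher`, `FallbackBuffer`, and `FanoutEndpoint` are hypothetical names, and the buffer is held in memory here rather than in a local file.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for a producer bound to one Kafka cluster.
trait ClusterPublisher {
  def name: String
  def publish(event: String): Try[Unit]
}

// Hypothetical fallback buffer. The slide says events go to a local file;
// an in-memory buffer keeps this sketch self-contained.
final class FallbackBuffer {
  private val pending = ArrayBuffer.empty[(String, String)]
  def append(cluster: String, event: String): Unit = pending += ((cluster, event))
  def size: Int = pending.size
  // Hands back everything buffered so far and clears the buffer.
  def drain(): List[(String, String)] = {
    val out = pending.toList
    pending.clear()
    out
  }
}

final class FanoutEndpoint(clusters: Seq[ClusterPublisher], buffer: FallbackBuffer) {
  // Fanout: offer the event to every cluster.
  // Fallback: buffer the event for each cluster whose publish failed.
  // Returns the names of the clusters that accepted the event.
  def ingest(event: String): Seq[String] = {
    val results = clusters.map(c => c.name -> c.publish(event))
    results.collect { case (name, Failure(_)) => name }
      .foreach(name => buffer.append(name, event))
    results.collect { case (name, Success(_)) => name }
  }
}
```

A replay loop could call `drain()` once the failed cluster is reachable again and publish the buffered pairs a second time.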
  • 14. ?? ?? ?? ? API?? 1. ?? Layer Solution: S2Event Client JavaScript iOS app Android app Web app REST Endpoint Kafka log_event_sync() log_event_async() S2Graph log_event_sync() log_event_async() - ???? ?? Edge(??, ???, ?? ??? ??, extra data) format. timestamp operation from to label props now insert Ryan ?? A ??? ??? {"Tags": ["???7", "??"]} Edge Format: https://steamshon.gitbooks.io/s2graph-book/content/manage_edges.html
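The Edge row on slide 14 lists six columns: timestamp, operation, from, to, label, and props. As a rough illustration only, the sketch below models that row as a Scala case class with a tab-separated text form. The tab delimiter, the `Edge` class, and the sample values are assumptions made for this example; the linked Edge Format page is the authoritative description.

```scala
import scala.util.Try

// Hypothetical model of the six columns shown on the slide.
final case class Edge(
    timestamp: Long,
    operation: String,
    from: String,
    to: String,
    label: String,
    props: String // extra data, kept as raw JSON text
) {
  // Renders the edge as one line; the tab delimiter is an assumption.
  def toLine: String =
    Seq(timestamp.toString, operation, from, to, label, props).mkString("\t")
}

object Edge {
  // Parses one line; returns None when the column count or the timestamp is invalid.
  def parse(line: String): Option[Edge] =
    line.split("\t", -1) match {
      case Array(ts, op, from, to, label, props) =>
        Try(ts.toLong).toOption.map(t => Edge(t, op, from, to, label, props))
      case _ => None
    }
}
```

For example, `Edge(1L, "insert", "Ryan", "articleA", "read", "{}")` round-trips through `toLine` and `Edge.parse`; the label `read` and the vertex names are placeholders, not values from the deck.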
  • 15. 2. ?? Layer ?? ?? - Event?? ??? operation?? ???? ??? Event? ??. - ?? ??? ?? Real-time, Batch ? ??? ??. - Batch - ?? ?? ??? ??? ???? ????? ?? ????. - ??? 70? ??, ?? 500? ?? ???? ???? - ?? ???? ?? similarity matrix?? ???? ??? ??. - ???? ?? ??? ????? ???? API? ??? ??? ???, ??? ?? ???, ???? ???? ???? ??, ?? Read throughput, latency? ??? ??? ?. - Real-time - ??? stream? ??? ETL? ??. - Input event? ?? ???? ? ???? ?? 2~3? ??. - event? ????? topology?? 10??? ?? ??. ?? ?? ?? ? API?? Computing Engine Spark Streaming MLlib S2Lambda Bulk Loader Batch
  • 16. 2. ?? Layer Solution: S2Lambda(Real-time) ?? ?? ?? ? API?? Computing Engine Spark Streaming MLlib S2Lambda Bulk Loader Batch S2Lambda: ETL environment. - AWS Lambda? ??? ??? spark streaming?? ??. - Event?? ??? ? event handler(scala function)? ????. - Source -> Parser -> Flow -> Sink? topology? ??? event handler?? ??? ??. - Spark Streaming Job? ?? ??? topology?? event?? ??. - Marathon? ?? Streaming Job HA??.
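Slide 16 describes a topology of the form Source -> Parser -> Flow -> Sink, with each stage supplied as an event handler written as a Scala function. The sketch below shows one way such stages could be composed as plain functions. It is an assumption-laden illustration, not S2Lambda's API: the `Topology` object, its type aliases, and `run` are hypothetical, and the stages run synchronously instead of inside a Spark Streaming job.

```scala
object Topology {
  // Hypothetical stage types: each stage is an ordinary Scala function.
  type Parser[A] = String => Option[A] // None drops a malformed record
  type Flow[A, B] = A => Seq[B]        // one input may yield zero or more outputs
  type Sink[B] = B => Unit

  // Pushes every raw record through parser, flow, and sink in order.
  // Returns the number of outputs handed to the sink.
  def run[A, B](
      source: Iterator[String],
      parser: Parser[A],
      flow: Flow[A, B],
      sink: Sink[B]
  ): Int = {
    var count = 0
    for {
      raw <- source
      parsed <- parser(raw)
      out <- flow(parsed)
    } {
      sink(out)
      count += 1
    }
    count
  }
}
```

Because each stage is a value, a different parser or sink can be swapped in without touching the others, which is the composability the slides argue for.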
  • 17. 2. ?? Layer Solution: S2Lambda, Why? - ???? ?? Spark Streaming job? ????. - Event handle logic? runtime? tightly coupling. - Not Composable! - ????? resource??. - Streaming Job?? executor? ??? ??. - Fragmentation! - ETL process? test case?? ? validation?? ???. - ???? ??? ?. - Data? ingest, transform, join, sink? ??? flow? ?? ??? ???. - IntelliJ?? Spark Streaming Job ?? ?? ???? ???. - Event??? ??? function??, ?? ??? ?? engine? ??. - Event processing framework? ???? ?? ??? ?? ?? ?? ?. - ?? ??? Join. - ?? ??? Join?? ?? ? ?? cache?? ???(Local ?? >> Remote??). - ??? JVM??? local cache(RocksDB) instance?? ??. ?? ?? ?? ? API?? Computing Engine Spark Streaming MLlib S2Lambda Bulk Loader Batch
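Slide 17 mentions keeping a local cache (RocksDB) inside the JVM so that joins prefer local lookups over remote ones. A minimal read-through cache sketch of that idea follows. It substitutes an in-memory map for RocksDB, and `ReadThroughCache`, `fetchRemote`, and `remoteCalls` are hypothetical names introduced only for this example.

```scala
import scala.collection.mutable

// Read-through cache: answer from the local store when possible,
// otherwise call the remote lookup once and remember the result.
// A mutable.Map stands in for the RocksDB instance named on the slide.
final class ReadThroughCache[K, V](fetchRemote: K => Option[V]) {
  private val local = mutable.Map.empty[K, V]
  private var remote = 0

  // Number of times the remote lookup was invoked so far.
  def remoteCalls: Int = remote

  def get(key: K): Option[V] =
    local.get(key).orElse {
      remote += 1
      val fetched = fetchRemote(key)
      fetched.foreach(v => local.update(key, v))
      fetched
    }
}
```

Misses are not cached in this sketch, so a key that is absent remotely triggers a remote call on every lookup; whether that is acceptable depends on the workload.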
  • 18. 2. ?? Layer Solution: S2Lambda ?? ?? ?? ? API?? Computing Engine Spark Streaming MLlib S2Lambda Bulk Loader Batch
      // Flow example: read the external API route from the options,
      // fetch data through HTTP GET, then merge it with the input event.
      new EtlFlow {
        var baseUrl = ""
        def init(options: String) = {
          baseUrl = (Json.parse(options) \ "external.api.route").as[String]
        }
        override def apply: Apply = {
          case (event @ StringInput(s), ctx) =>
            val tokens = s.split("\t")
            val articleId = tokens(3) // fourth tab-separated field
            val uri = baseUrl + articleId
            ctx.httpClient.url(uri).get().map { response =>
              Seq(InputWithResult(event, response.json, response.status))
            }
        }
      }
  • 19. 2. ?? Layer Solution: S2Lambda ?? ?? ?? ? API?? Computing Engine Spark Streaming MLlib S2Lambda Bulk Loader Batch
      // Sink example: read the Kafka topics from the option parameter
      // and publish each input to every topic.
      new EtlFlow {
        var topics = Array.empty[String]
        def init(options: String) = {
          topics = (Json.parse(options) \ "kafka.topics").as[String].split(",").map(_.trim)
        }
        def apply: Apply = {
          case (input, ctx) =>
            ctx.kafkaProducerOpt.foreach { producer =>
              topics.foreach { topic =>
                producer.send(new ProducerRecord[String, String](topic, null, input.toString))
              }
            }
            Future.successful(Nil)
        }
      }
  • 20. S2Lambda Admin Console - Write event handler as scala code. - Check if the code compiles. - Check if the code runs correctly with an example input.
  • 21. 2. ?? Layer Solution: S2Lambda, ?? ?? ?? ?? ? API?? Computing Engine Spark Streaming MLlib S2Lambda Bulk Loader Batch - ??? ?? ?? ?? ? ??. - ?? ?? ?? Event Handler???, ?? ? ?? Parameter? ????? ??/??? ?? ??. - Event Handler??? Composable! - Event Handler? Validation?? ??? + ???? DB? ??. - ?? ???? ??? ?? ???? ??? sample input?? ??? validation?? ? ??. - ?? ???? test case? ??? ??. - ?? ???? ??? DB??? ???. = ??/?? ?? ?? + ??? ??(JVM, classpath, compiler)
  • 22. 2. ?? Layer Solution: S2Loader ?? ?? ?? ? API?? Computing Engine Spark Streaming MLlib S2Lambda Bulk Loader Batch - Production HBase????? ??, ??? ?? ?????? Spark Job? ?? ??? HFile??? ??? ??. - HFileOutputFormat2 - ??? HFile? Production HBase ????? DistCp ?, HBase? completebulkload ?? ???? HBase? ???. Put? ?? ???? ??? ?? Region??? ??? ??. - Memstore?? - WAL?? - ?? compaction - Region?? GC Bulk Load? read throughput, latency? ?? ?? ??. http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/
  • 23. 3. ?? ? API?? ?? ?? ?? ?? ?? ? API?? - OLTP - ?? ???? ????, ?? ???? ??? local graph? traverse?? ???. - ??? ??? ??. Data Hub? ??. - ex) Ryan? ? ??? ??? ??. - ex) A??? ??? ???. - ex) Ryan? ???? ?? ? ??. - OLAP - ???? ???? ?? ?? graph? ?? scan, aggregation? ?? ???. - ??? ??? ??. Data Warehouse? ??. - ex) A??? ? ????? ?? ? ??. - ex) ??? 7??? ?? ????? ???? ???? ???? ?? ???. Data Hub (S2Graph) Data Warehouse (Druid, Hive)
  • 24. 3. ?? ? API?? Solution: S2Graph(Data Hub) ?? ?? ?? ? API?? Data Hub (S2Graph) Data Warehouse (Druid, Hive) - Data Hub: Apache S2Graph(incubating), Scalable distributed OLTP Graph Database on HBase. - OLTP Query? ?? ?? ???. ??, ?? ???? ???? ??? ??? ??. - Asynchronous Input type??? ??? Synchronous? ???? ???. - Vertex/Edge? ?? CRUD? ????, ??? Edge?? ?? BFS search traverse? ???. - ????? ??? cache layer?? ?? ?? ??? ?? ??? ??. - API? ?? Asynchronous.
  • 25. 3. ?? ? API?? Solution: S2Graph(Data Hub) ?? ?? ?? ? API?? Data Hub (S2Graph)
  • 26. 3. ?? ? API?? Solution: Druid(Warehouse) ?? ?? ?? ? API?? Data Warehouse (Druid, Hive) - Data Warehouse: Druid, Hive - Realtime Slice & Dice with Druid. - ??? fix?? ?? ????? interactive?? ??? ???? ??? ??(explore) - Fix? ??? Hive + Jenkins? ??. - Dashboard, Report?? ??. - Long term ???? ?? ???? ??? ??? ????? ??. - ?? ?? ????? ?? S2Graph(Data Hub)? Bulk Load?? ?????? ??? ???? ??.
  • 27. 3. ?? ? API?? Solution: Druid(Warehouse) ?? ?? ?? ? API?? Data Warehouse (Druid, Hive)
  • 28. 3. ?? ? API??: Summary ?? ?? ?? ? API?? Data Hub (S2Graph) Data Warehouse (Druid, Hive) - OLTP - S2Graph Query REST API? ?? Service??? ??? ???? Query. - ??? Service server?? ?? ??. - ~70K QPS, ~100ms latency. - Druid? ??? ????? ???? Druid to S2Graph ETL? ?? S2Graph? ??. - ???? ?? ?? ?? ?? ??? TopK. - ??? ?? ?? ?? ?? TopK. - ?? ??? ??? ?? ?? ??. - OLAP - Druid data? visualize??? Pivot??? UI?? ???? ??? ? ?????? interactive?? slice & dice?? ? ?? ?? ??. - ??? ???? ??? ?? ?? ????? ???? admin?? druid REST Query API? ?? ??. - 1~10 QPS, ~ 10s latency.
  • 29. - ?? ???? ???? abstract layer(Property Graph Model)? ????, ???? API(Graph Query) ? ??? ??, service?? ?? API??? ?? ? ??? ??. - ?? ??? getEdges, getVertices ??? API??? ?? ???? ????? ??. - ??? Vertex, Edge??, ??? getEdges, getVertices. - ?? ?? ?? ??, Kafka cluster, spark, HBase?? ?? ?? ??, ?? cost? ?? ?. - ???? ?? ??? scalable? architecture? ????, ? ?? ???? ??? multi tenancy ? ??? ???? ??? ?? ? ? ?? ??. - ?? ?? ?? ?? ?? ??? ??? ???? ???? ???? ? ?? ??(??)? ? ? ?? ??? ???. - A/B??? ? Data? ??? ?? ???? ??. = ?? ??? + ?? ??? ?? + ???? ?? + ?? ?? ?? ?? + ??? ?? ???? Data Pipeline? ?? ?? ?? ???.
  • 31. - Apache S2Graph(incubating) - ?? ASF incubating??, ?? ? ??? open source? ?? source? ??? ???. - ? ?? ??? v0.1.0 release vote?? pass. - ?? ??? - Apache TinkerPop initial??. - OLAP layer? ??. - ?? Storage ??(Redis, RocksDB, PostgreSQL, MySQL, Cassandra, …) - S2Lambda open source(Apache License V2) - ??, ???, ??. - ???? ?? ?? ???? ?? ? ?? ????…(?? ??)??? ???. - ??? stream ?? framework? ?? ? ? ?? ??. - ?? ??? ?? ?? ??? ??? ????, ???? ??? ?? Pipeline? ?? ?? ???? ??? ????, ?? ???? ?? ?? ???? ?? ??? ?????? ?? ??. ? ??.