??? ???? ????:

??? ????? 

????? ?? ?????
???

NUMBERWORKS ??
???
? ?
??? `????¨??
data science firm? ?????
?? ??? ?????
1. ????? ?? ?? ????

2. ?? ???? ???? ??

3. ???? ? ??? ?????.

4. ??
?????? ?
?? ?? ???.
Recommendation systems come in two types:
Content-based vs Collaborative Filtering
Content-based??????.
-??????????????
-????????????????
-?????????????????.
Collaborative Filtering
??? ??? ??????
? ???.
[261] ???????? ??????????? ???????????? ?????????
Collaborative Filtering
?? 2??? ??? ???.
Memory-based Model-based
??? ?? ??(hybrid)?? ???.
Memory-based?
???
??? ?????
?? ? ??? ?????.
memory-based
? ???? ??
YOU
Someone LIKE YOU!
recommend
?? ?? ??? Jaccard similarity
???? ??? ??? ???? ??
A? ??? ??
B? ??? ??
but size matters
TOO MANY COMBINATIONS
?? ??? ???
Hadoop? ??? ????
img from : http://thepage.time.com/2009/04/18/why-is-this-elephant-crying/
?? Hadoop???
?? ? ?? ???
?? ??? ??? ??
pre-clustering? ?????
??? naive ?
clustering ????
O(N^2)
???? ??
????? ????
????? ???? ???
? ??? ?????
??? ??!
pre-clustering? ???
??? ??? ??? ????
Use MinHash as LSH (min-wise independent permutations locality-sensitive hashing)
? ??? ?? ?????
?? ? ?????
????.
Locality Sensitive Hashing
?? ?????? hash? ?
???? ?? ?? ??? ?????
?? ?? ??? ??? ?
??? ???? ???? ????
clustering?? ???? ??.
Make hash functions
Hash function #1 Hash function #2
HOW DOES LSH WORK? (locality-sensitive hashing)
Hash Func #1
Hash Func #2
?????? ?? O(n^2) ???
hash snapshot?? O(n) ??
?????
TIME
SIZE
2) ??? ??? O(n^2)
??? ???? ??
n?? ????.
1) ??? ???
??? ??? ??
????? ???
?? hash function?
??? ??? ??? ?? ????? ?? hash??.
??? ??? ??? ???
??? ????, ????? ??? ??
??? hash function? ????.
- cosine similarity
- hamming distance
- euclidean distance
- jaccard similarity
??? ??? ??.
MinHash?
Jaccard similarity?
???? ???
LSH
?? ?? ????!
     A  B  C
r1   1  0  1
r2   0  1  1
r3   0  0  1
r4   1  1  0
r5   1  0  0
r6   0  1  1

random permutation
hash1: r3 r5 r1 r2 r4 r6
hash2: r6 r1 r3 r2 r4 r5
hash3: r5 r4 r6 r2 r1 r3

     A  B  C
h1   2  4  1
h2   2  1  1
h3   1  2  3
After each random permutation, take the index of the first row that contains a 1.
jaccard(B,C) = 2/5 = 0.4
jaccard(A,C) = 1/6 ≈ 0.17
sim(B,C) = 1/3 ≈ 0.33
sim(A,C) = 0/3 = 0
jaccard = intersect / union
sim = intersect / length (matching signature rows / signature length)
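The worked example above can be reproduced with a short sketch. The set contents and the three row orderings are taken from the tables; the function names are illustrative only and not from the talk.

```python
# Each column of the 0/1 matrix, written as the set of rows that hold a 1.
sets = {
    "A": {"r1", "r4", "r5"},
    "B": {"r2", "r4", "r6"},
    "C": {"r1", "r2", "r3", "r6"},
}

# The three row orderings (random permutations) shown in the example.
permutations = [
    ["r3", "r5", "r1", "r2", "r4", "r6"],  # hash1
    ["r6", "r1", "r3", "r2", "r4", "r5"],  # hash2
    ["r5", "r4", "r6", "r2", "r1", "r3"],  # hash3
]


def minhash_signature(members, perms):
    """For each permutation, the 1-based position of the first row in the set."""
    return [
        next(i for i, row in enumerate(perm, start=1) if row in members)
        for perm in perms
    ]


def jaccard(a, b):
    """Exact Jaccard similarity: |intersection| / |union|."""
    return len(a & b) / len(a | b)


def signature_similarity(sig_a, sig_b):
    """Fraction of signature positions where both signatures agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


signatures = {name: minhash_signature(rows, permutations) for name, rows in sets.items()}
print(signatures)
print(jaccard(sets["B"], sets["C"]), signature_similarity(signatures["B"], signatures["C"]))
```

With only three hash functions the estimate is coarse; the signature similarity approaches the true Jaccard value as more permutations are added.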
???? random permutation? ??? ??
???? ?? ???? ???
universal hash? ?? random generation
?? random permutation? ???(proxy)?
???? ??.
?? ?? ??? ????
??? ??? ????
minhash? ?? ? ????.
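A minimal sketch of using randomly generated universal hash functions as a proxy for random permutations. The hash family `(a*x + b) mod p`, the helper names, and the use of 433,494,437 as the modulus are assumptions made for illustration; the slides do not show the actual implementation.

```python
import random

PRIME = 433_494_437  # assumed modulus; item ids are assumed to be integers below it


def make_hash_funcs(n, seed=0):
    """Draw n random (a, b) pairs, each defining h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def minhash(item_ids, hash_funcs):
    """Signature = the minimum hash value over the set, one entry per hash function."""
    return [min((a * x + b) % PRIME for x in item_ids) for a, b in hash_funcs]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions on which the two signatures agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


hash_funcs = make_hash_funcs(100, seed=1)
sig_a = minhash({1, 2, 3, 4, 5, 6}, hash_funcs)
sig_b = minhash({4, 5, 6, 7, 8, 9}, hash_funcs)
print(estimate_jaccard(sig_a, sig_b))  # an estimate of the true Jaccard value 3/9
```

No row ordering is ever materialized: each hash function induces an ordering implicitly, and only the minimum is kept.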
?? ??? dimension reduction
     A  B  C
r1   1  0  1
r2   0  1  1
r3   0  0  1
r4   1  1  0
r5   1  0  0
r6   0  1  1

     A  B  C
h1   2  4  1
h2   2  1  1
h3   1  2  3
6??
3??
     Q  P  R  S
s1   1  2  2  1
s2   3  1  3  1
s3   1  2  1  2
s4   4  1  1  4
hash?? ?? ?? signature? ??
???? ??? concate??? cluster id ???.
's1-s2', 's2-s3', 's3-s4', 's4-s1'

cluster id   members
13           Q
31           Q, R
14           Q
41           Q, S
21           P
12           P, R
12           P, S
23           R
11           R, S
24           S
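The cluster-id construction above can be sketched as follows. The signatures come from the Q/P/R/S table; prefixing each key with the band name (for example `s2-s3:31`) is an assumption of this sketch, so that equal values from different bands do not collide, which means its groupings can differ from a listing keyed on the concatenated value alone.

```python
from collections import defaultdict

# Signatures s1..s4 for each column of the table.
signatures = {
    "Q": [1, 3, 1, 4],
    "P": [2, 1, 2, 1],
    "R": [2, 3, 1, 1],
    "S": [1, 1, 2, 4],
}

# Pairs of signature positions that are concatenated: s1-s2, s2-s3, s3-s4, s4-s1.
bands = [(0, 1), (1, 2), (2, 3), (3, 0)]


def cluster_ids(sig):
    """One cluster id per band: band name plus the two concatenated values."""
    return [f"s{i + 1}-s{j + 1}:{sig[i]}{sig[j]}" for i, j in bands]


clusters = defaultdict(set)
for name, sig in signatures.items():
    for cid in cluster_ids(sig):
        clusters[cid].add(name)

# Only clusters with more than one member yield candidate pairs.
candidates = {cid: members for cid, members in clusters.items() if len(members) > 1}
print(candidates)
```

Pairwise similarity then only needs to be computed inside each candidate cluster rather than across all items.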
??? typical?
Batch Implementation?
??? ????.
(item to item ???? ??)
minhash? ???? ???
for ?? in ?? ???:
? ??? ?? click stream ????
??? ??? minhash signature ????
signature?? concate?? ??? cluster id
? ??? ?? cluster id?? ????
for cluster in ???? ?????:
if length(????) > threshold:
????? ??? ?? ??
???? ??? ????
?? item? ???
? ?? item? minhash?? ???
minhash?? concate?? ???? cluster ? ? ??
for cluster in ??? ??? ?????:
????? ??? ?? ?? item? ??
? item?? click stream? ??
click stream??? ??? ?? pair ??? ??
? ????? ?? ??? ?? ??? top N?? ??
It is a typical implementation,
but not attractive :(
? ???
??? ??!
?? ??? 2?? ??? ??.
??
??
Heavy I/O ? ?? : ???? I/O
??? ?? ??? ??
??!
minhash? ?????
for ?? in ?? ???:
? ??? ?? click stream ????
??? ??? minhash signature ????
signature?? concate?? ??? cluster id
? ??? ?? cluster id?? ????
for cluster in ???? ?????:
if length(????) > threshold:
????? ??? ?? ??
???? ??? ????
?? item? ???
? ?? item? minhash?? ???
minhash?? concate?? cluster id ?????
?? ?? ??? ??? cluster?? ? ??
for cluster in ??? ??? ?????:
????? ??? ?? item? ??
? item?? click stream? ??
click stream??? ??? ?? pair ??? ??
? ????? ?? ??? ?? ??? top N?? ??
?? ??? ???? ??
- speed gain
??? ????.
??? ????.
- quality loss
?? :
?? :
????? ??
??? ??
???? ?? ???
??? ????
??? ???? ? ????
? I/O ??? ?? ????
??? ??? ??? ??? ??????.
???? ???? ?? ?? : user? view?
??? ??
?? ??
???? ???? ?? ?? : item? view?
??? ???? item
?? item
??? ? ?? item
??? ???
????
?????
?? ???? ??? ??? ??
= ?????? ?? ?? (??????)
= ???? ? click stream
= ????? ?? ???
= ????? ?? ??? ?? ????.
= ?? page out? ????.
=????? ????.
?? ???
?? ???
?? ???? ??
?? ???
?? ???
? ?? ???? ???
???? ??
??? ???
???? ?? ?? ??
?? ? ??.
?? ??
= ?? ??? ?? ??, ?? ???
= use good dimension reducer
= ????? ??? ?? ???? ???.
= ? ??? ?? ?? ????
= minhash
?? ???
S1 S2 … Sn
S1 S2 … Sn
?? ???
hash function? n?? signature??
??? ?? ?
?? ??
S1 S2 S3 S4
signature? ?? ??? sig? ??? ? jaccard
S5
S1 S2 S3 S4 S5
??? jaccard
coefficient
2/5
= 0.4
??? ??? ???, ? ????
?? jaccard?? ??? ????
? ?? hash func? ??
signature? ??? ?? ???
??? ????!
??? ???? ????
???????
100?? signature?
root mean square error
??? ?? RMSE 0.03 ??
??? ????!
???? ????
?? ??? ??? ????
?? ?? ??? ??
signature? ?? ??
??? ????
?? clustering ??
?? ??? ?
click stream? ???
signature? ?? ??
??? ???
?? ???
click stream??
signature overlap? ???
? ??? ??
??? ??? in-memory? fit??
????? ??? ?? ??? ??
+??? ??? in-memory? fit??
+??? ??? ???? ?? ??
???? ?????? ??????
??? ??? ?? ?? ???.
???? ?? disk io?? ??
item??? 4byte?? ???? 100?
100 x 4 + 200 = 600 byte
?? ?? ?? 200 byte ? ?? ??
??? 1G? ??? 1024^3 / 600 = 1,789,569
1??? 180? ?? item? click stream?
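The sizing estimate can be checked with a few lines of arithmetic. The constants are the ones quoted on the slide (100 signatures of 4 bytes each plus 200 bytes of overhead); the variable names are illustrative.

```python
SIG_COUNT = 100        # signatures kept per item
BYTES_PER_SIG = 4      # each signature value stored as a 4-byte integer
OVERHEAD_BYTES = 200   # allowance for everything else stored with the item

bytes_per_item = SIG_COUNT * BYTES_PER_SIG + OVERHEAD_BYTES
items_per_gib = 1024**3 // bytes_per_item

print(bytes_per_item)  # 600
print(items_per_gib)   # 1789569, roughly 1.8 million items per 1G of memory
```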
???? ?? ?? ? ??! (?? ?? ?? ??)
+??? ??? ??? ? ?? ?? Job
^?? ? ??? ????? ̄
+?? ?? ?? ??? ??????? ??.
^??? ??? ???
?? ???? ?? ?? ???
?? ???? ̄, ^??? ???? ̄
??? click? ????
??? ??? ?? signature???
? ?? ?? ?? ???
No!
minhash? `min ??¨? `chain¨??.
??? ?? ??? ??? ??.
min(A,B,C) = min(min(A,B),C)
Associative property ????
? ???? ???? ?? ???? ??
Idempotence ????
???? ???? ?? ???? ??.
???? ???? ??? ???
= ? ??? ?? ???? ??? ?
?? ??? ?????, ???? ??? ???!
click? ?????? ?? ????!
? ??? ???? high TPS? ??
save minhash(110) [1 DB write] =
  load minhash(100) [1 DB read] x compute minhash(101~110) [input buffer]
old minhash new new new
local minhash
memorystorage
updated minhash
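Because `min` is associative and idempotent, a stored signature can be brought up to date by merging it with the signature of only the newly buffered clicks. The sketch below illustrates that property; the hash family, the click ids 1..110, and the function names are hypothetical stand-ins, not the talk's code.

```python
import random

PRIME = 433_494_437  # assumed modulus for the illustrative hash family


def make_hash_funcs(n, seed=0):
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def minhash(clicks, hash_funcs):
    """Signature of a click stream: per hash function, the minimum hash value."""
    return [min((a * x + b) % PRIME for x in clicks) for a, b in hash_funcs]


def merge(sig_a, sig_b):
    """Combine two signatures position by position with min."""
    return [min(x, y) for x, y in zip(sig_a, sig_b)]


hash_funcs = make_hash_funcs(4, seed=42)
clicks = list(range(1, 111))                   # hypothetical click ids 1..110

stored = minhash(clicks[:100], hash_funcs)     # minhash(100): one DB read
buffered = minhash(clicks[100:], hash_funcs)   # minhash(101~110): from the input buffer
updated = merge(stored, buffered)              # minhash(110): one DB write

# Associativity: merging partial signatures equals hashing the whole stream.
print(updated == minhash(clicks, hash_funcs))
# Idempotence: replaying the same buffer again does not change the result.
print(merge(updated, buffered) == updated)
```

The old click stream never has to be reloaded, which is what keeps the per-update cost at one read and one write.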
??? ????
??? ?????
?????.
micro batch
?? ???
????? ???
?? ??.
(????)
?~ ??? ?
?? ?? ???? ? ????.
??? ??? ??
??? ??? ????
S1 S2 … Sn
??? ? ???.
S1 S2 … Sn
S1 S2 … Sn
S1 S2 … Sn
S1 S2 … Sn
??? n^2? ???
item A
item B
item C
item D
item E
?? ?? ?? ??
Secondary Index? ???.
        Sig1  Sig2  Sig3  Sig4
item A   923  1032    58    74
item B    87  1032   123    80
item C   923   872    58    80
??? signature? ?? ? ? Secondary-Index KV Storage
Sig??? ? pair? Key? ??
?? ??? item???? Value
Key         Value
Sig1-923    A-C
Sig2-872    C
Sig3-58     A-C
Sig4-80     B-C

A : 2/4 sigs = 0.5
B : 1/4 sigs = 0.25
Value? ??? item?? ?????
? sig? ??? ???? = jaccard? ?? =
Secondary Index lookup???
Jaccard? ??? ? ??.
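A minimal in-memory sketch of the secondary-index lookup, using the three item signatures from the table. A plain dictionary stands in for the key-value store, and the function names are illustrative assumptions.

```python
from collections import Counter, defaultdict

signatures = {
    "A": [923, 1032, 58, 74],
    "B": [87, 1032, 123, 80],
    "C": [923, 872, 58, 80],
}


def build_index(signatures):
    """Map each 'sig<position>-<value>' key to the set of items that carry it."""
    index = defaultdict(set)
    for item, sig in signatures.items():
        for pos, value in enumerate(sig, start=1):
            index[f"sig{pos}-{value}"].add(item)
    return index


def similar_items(item, signatures, index):
    """Estimate Jaccard per neighbour: shared signature positions / signature count."""
    sig = signatures[item]
    hits = Counter()
    for pos, value in enumerate(sig, start=1):
        for other in index[f"sig{pos}-{value}"]:
            if other != item:
                hits[other] += 1
    return {other: count / len(sig) for other, count in hits.items()}


index = build_index(signatures)
print(similar_items("C", signatures, index))
```

Each query touches only as many keys as there are signatures, so no pairwise comparison over all items is needed.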
???~!
requirement:
minhash??? ??????
2nd idx? ?? ?? ?? ????
= ??? ????? ???? ???!
Large Secondary-Index?
?? ????? ?????
Key Value
Sig1-923 A-C
Sig2-872 C
sig3-58 A-C
sig4-80 B-C
??? membership
??? ???? ??.
REDIS? ?? ?????? ? ?? ???
Redis Data structure candidates
Strings - plain K/V
Sets - support add, remove, union, intersection
?????!
VS
?? ??? ??? string?? ??? ???? ? ???.
sig45 Tom Jerry Robert Jack
string?? ??? ??
"[Tom, Jerry, Robert, Jack]"
??? ? ?? ???
[write] json.dumps(data)
[read] json.loads(data_str)
"[Tom, Jerry, Robert, Jack]"
??? ??? String??. ??!
Order! O(1) vs O(N)
In [24]: %time gs.load_benchmark('user','key')
CPU times: user 0.32 s, sys: 0.03 s, total: 0.34 s
Wall time: 0.42 s
In [25]: %time gs.load_benchmark('user','set')
CPU times: user 32.34 s, sys: 0.13 s, total: 32.47 s
Wall time: 33.88 s
1)
? ????? String? Set?? ?? ??.
string set
'redis string' ? mget ??? ????.
(multiple get at once)
N call round trip -> 1 call
In [9]: %timeit [redis.get(s) for s in sigs]
100 loops, best of 3: 9.99 ms per loop
In [10]: %timeit redis.mget(sigs)
1000 loops, best of 3: 759 us per loop
2)
3)
string ? ??? ?????? ????.
= ???? ? ?? ?? ?? ? ?? ???
[Chart: compress speed (μs), snappy vs zlib; bar values 134.0 and 17.3]
[Chart: size (%), snappy vs zlib; bar values 40% and 70%]
snappy??
????? ??
????? ??
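A rough sketch of storing a member list as one compressed string value. The charts compare snappy and zlib; this example uses zlib only because it ships with Python's standard library, and the member names are made up for illustration.

```python
import json
import zlib


def pack(members):
    """Serialize a member list to JSON, then compress it into one byte string."""
    return zlib.compress(json.dumps(members).encode("utf-8"))


def unpack(blob):
    """Reverse of pack: decompress, then parse the JSON list."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


members = [f"item-{i}" for i in range(1000)]  # hypothetical member ids
blob = pack(members)

print(len(json.dumps(members)), len(blob))  # raw size vs compressed size in bytes
print(unpack(blob) == members)
```

A faster codec trades some compression ratio for lower CPU cost per read and write, which matters when every lookup has to decompress its values.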
redis ? pipe??? ??? transaction ? ???.
??? set? ???? ?? ??? ????
4)
        sig1  sig2  sig3  sig4
item A   923  1032    58    74
item B    87  1032   123    80
item C   923   973    58    80

Secondary-Index (key: pos-value, value: member)
sig2-1032   A-B
sig2-973    C

A: sig2 1032 -> 973

sig2-1032   B
sig2-973    A-C
case) sig2-1032? A? ???? sig2-973? ?? - atomic??
minhash signature ????
?? ? ?? ???
2nd index key space? ??????
?????? hash max val * sig cnt? key? ??
hash max? 433,494,437? Fibonacci primes? ??
100?? signature? ??? ??
????? 43,349,443,700 ?? key?? ??. ???.
minhash ? min? chain?? ???, ?
??? ??? ?? ??? ??
[Chart: number of keys by sigcnt (100); labels 15,082,312 / 4,334,944,370 / 43,349,443,700]
0?? member? ?? key? ??? ??
example case)?
3,727,344 products, 100 sig => 15,082,312 keys
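The gap between the theoretical key space and the observed key count can be checked directly. All figures are the ones quoted on the slides; the variable names are illustrative.

```python
HASH_MAX = 433_494_437     # largest hash value (a Fibonacci prime)
SIG_COUNT = 100            # signatures per item
PRODUCTS = 3_727_344       # products in the example case
OBSERVED_KEYS = 15_082_312 # keys actually present in the secondary index

theoretical_keys = HASH_MAX * SIG_COUNT
upper_bound = PRODUCTS * SIG_COUNT  # if every product had its own key in every position

print(theoretical_keys)                  # 43349443700
print(OBSERVED_KEYS / theoretical_keys)  # fraction of the key space in use
print(OBSERVED_KEYS / PRODUCTS)          # average distinct keys per product
```

Since each signature value is a minimum, values concentrate in a small part of the range and many products share keys, so the observed count stays far below both bounds.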
???? ?, KV??
??? 2G ?? ??
REDIS? ????
transaction? ?????
?? update? ??
Secondary Index?
??????.
??
minhash? min?? ?????.
item Z? user click? ???? ? ???
Z 183 1032 942 80??? signature ??
click ? minhash?? 87 2043 123 300click
??? ?? ?? ?? ???
signature ? 2nd index??
new 87 1032 123 80
click ??? ??? ?? ??
?? item Z? ?? ?? ??? ????
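The update for item Z can be sketched as an element-wise `min` followed by moving Z between index keys only where a signature value changed. The dictionary-backed index and the function name are assumptions for illustration; a real store would additionally need the remove-and-add pair to be applied atomically.

```python
from collections import defaultdict


def apply_clicks(item, old_sig, click_sig, index):
    """Merge new-click minhash values into the signature and fix up the index."""
    new_sig = [min(o, c) for o, c in zip(old_sig, click_sig)]
    for pos, (old, new) in enumerate(zip(old_sig, new_sig), start=1):
        if old != new:
            old_key = f"sig{pos}-{old}"
            index[old_key].discard(item)
            if not index[old_key]:
                del index[old_key]  # drop keys left with no members
            index[f"sig{pos}-{new}"].add(item)
    return new_sig


index = defaultdict(set)
old_sig = [183, 1032, 942, 80]  # Z's stored signature
for pos, value in enumerate(old_sig, start=1):
    index[f"sig{pos}-{value}"].add("Z")

new_sig = apply_clicks("Z", old_sig, [87, 2043, 123, 300], index)
print(new_sig)  # [87, 1032, 123, 80]
```

Only positions 1 and 3 change here, so only two of the four index entries are rewritten.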
Secondary Index ? ?? ? ?? flow
item Z: 87 1032 123 80
??? signature ??
Sig1 - 87     A-B-D-F-Z
Sig2 - 1032   C-G-Z
Sig3 - 123    A-B-D-Z
Sig4 - 80     B-D-F-Z
redis.mget?? ??? ???
??? ?????, ???
A: 2/4 = 0.5    B: 3/4 = 0.75
C: 1/4 = 0.25   D: 3/4 = 0.75
???? ????
??????
??? Mem Size
minhash: 1G
2nd index: 2G
??: 1G
??? CPU Power
REDIS: 1 core
minhash ??: 1 core
????: 1 core
???? ?????.
?? ??? ?????
?? ??? Memory based ???
??? ?? ???? ?
(ALS, NMF, Markov Chain??)
??? ??????? ? ??? ???.
?? ??? ????
No!
99.9% ???? ?? ? ??? ??? ?
logo.bmp 1169KB
logo.jpg 51KB
1/22
?? ??? ??, ???? ??? ?? ?
???
??? ??
Amortized
?? ?? ??? ?
??? ????
?? ?? ???
?? ????? ?????
??? ?? ??
?? ??????.
??? ????? ????
?? ??? ?? ????
??? ??? ????
??? ??

[261] ???????? ??????????? ???????????? ?????????

  • 1. ??? ???? ????: ??? ????? ????? ?? ????? ??? NUMBERWORKS ??
  • 2. ??? ? ? ??? `????¨?? data science firm? ?????
  • 3. ?? ??? ????? 1. ????? ?? ?? ???? 2. ?? ???? ???? ?? 3. ???? ? ??? ?????. 4. ??
  • 10. 10 Collaborative Filtering ?? 2??? ??? ???. Memory-based Model-based ??? ?? ??(hybrid)?? ???.
  • 13. YOU
  • 15. ?? ?? ??? Jaccard similarity ???? ??? ??? ???? ?? A? ??? ?? B? ??? ??
  • 19. ?? ??? ??? Hadoop? ??? ????
  • 20. img from : http://thepage.time.com/2009/04/18/why-is-this-elephant-crying/ ?? Hadoop??? ?? ? ?? ???
  • 21. ?? ??? ??? ?? pre-clustering? ?????
  • 22. ??? naive ? clustering ???? O(N^2) ???? ?? ????? ???? ????? ???? ??? ? ??? ????? ??? ??!
  • 25. Use MinHASH as LSH(min-wiseindependentpermutationslocalitysensitivehashing) ? ??? ?? ?????
  • 26. ?? ? ????? ????. Locality Sensitive Hashing
  • 27. ?? ?????? hash? ? ???? ?? ?? ??? ?????
  • 28. ?? ?? ??? ??? ? ??? ???? ???? ???? clustering?? ???? ??.
  • 29. Make hash functions Hash function #1 Hash function #2 HOW DOES LSH WORK? (localitysensitivehashing)
  • 31. ?????? ?? O(n2) ??? hash snapshot?? O(n) ?? ?????
  • 32. TIME SIZE 2) ??? ??? O(n2) ??? ???? ?? n?? ????. 1) ??? ??? ??? ??? ?? ????? ???
  • 33. ?? hash function? ??? ??? ??? ?? ????? ?? hash??. ??? ??? ??? ??? ??? ????, ????? ??? ?? ??? hash function? ????.
  • 34. - cosine similarity - hamming distance - euclidean distance - jaccard similarity ??? ??? ??.
  • 36. A B C r1 1 0 1 r2 0 1 1 r3 0 0 1 r4 1 1 0 r5 1 0 0 r6 0 1 1 A B C h1 2 4 1 h2 2 1 1 h3 1 2 3 r3 r5 r1 r2 r4 r6 r6 r1 r3 r2 r4 r5 hash1 hash2 random permutation r5 r4 r6 r2 r1 r3 hash3 random?? permutation? ??? ???? ? 1? index? ???. jaccard(B,C) = 2/5 = 0.4 jaccard(A,C) = 1/6 = 0.16 sim(B,C) = 1/3 = 0.3 sim(A,C) = 0/3 = 0 jaccard = intersect/union sim = intersect/length
  • 37. ???? random permutation? ??? ?? ???? ?? ???? ??? universal hash? ?? random generation ?? random permutation? ???(proxy)? ???? ??. ?? ?? ??? ???? ??? ??? ???? minhash? ?? ? ????.
  • 38. ?? ??? dimension reduction A B C r1 1 0 1 r2 0 1 1 r3 0 0 1 r4 1 1 0 r5 1 0 0 r6 0 1 1 A B C h1 2 2 1 h2 2 1 1 h3 1 2 3 6?? 3??
  • 39.
         Q P R S
      s1 1 2 2 1
      s2 3 1 3 1
      s3 1 2 1 2
      s4 4 1 1 4

      hash?? ?? ?? signature? ?? ???? ??? concate??? cluster id ???.
      ‘s1-s2’, ‘s2-s3’, ‘s3-s4’, ‘s4-s1’
      Q 13 | Q,R 31 | Q 14 | Q,S 41 | P 21 | P,R 12 | P,S 12 | R 23 | R,S 11 | S 24
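The cluster ids on this slide come from concatenating neighbouring signature values (s1-s2, s2-s3, s3-s4, s4-s1). A minimal sketch follows, using the four signatures from the slide. It assumes ids are the bare concatenation and are not tagged with the pair they came from, so P, R and S all land in cluster 12. The function name is hypothetical.

```python
from collections import defaultdict

# Signatures read column-wise from the slide's table (s1..s4 for each item).
signatures = {
    "Q": [1, 3, 1, 4],
    "P": [2, 1, 2, 1],
    "R": [2, 3, 1, 1],
    "S": [1, 1, 2, 4],
}


def cluster_ids(signature):
    """Concatenate each value with its right neighbour, wrapping at the end."""
    n = len(signature)
    return [f"{signature[i]}{signature[(i + 1) % n]}" for i in range(n)]


# Group items by cluster id; items sharing an id become candidate neighbours.
clusters = defaultdict(set)
for item, sig in signatures.items():
    for cid in cluster_ids(sig):
        clusters[cid].add(item)

for cid in sorted(clusters):
    print(cid, sorted(clusters[cid]))
```

Only items that share at least one cluster id are compared later, which is what removes the all-pairs comparison.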
  • 40. ??? typical? Batch Implementation? ??? ????. (item to item ???? ??)
  • 41. minhash? ???? ??? for ?? in ?? ???: ? ??? ?? click stream ???? ??? ??? minhash signature ???? signature?? concate?? ??? cluster id ? ??? ?? cluster id?? ???? for cluster in ???? ?????: if length(????) > threshold: ????? ??? ?? ??
  • 42. ???? ??? ???? ?? item? ??? ? ?? item? minhash?? ??? minhash?? concate?? ???? cluster ? ? ?? for cluster in ??? ??? ?????: ????? ??? ?? ?? item? ?? ? item?? click stream? ?? click stream??? ??? ?? pair ??? ?? ? ????? ?? ??? ?? ??? top N?? ??
  • 45. ?? ??? 2?? ??? ??. ?? ??
  • 46. Heavy I/O ? ?? : ???? I/O ??? ?? ??? ?? ??! minhash? ????? for ?? in ?? ???: ? ??? ?? click stream ???? ??? ??? minhash signature ???? signature?? concate?? ??? cluster id ? ??? ?? cluster id?? ???? for cluster in ???? ?????: if length(????) > threshold: ????? ??? ?? ?? ???? ??? ???? ?? item? ??? ? ?? item? minhash?? ??? minhash?? concate?? cluster id ????? ?? ?? ??? ??? cluster?? ? ?? for cluster in ??? ??? ?????: ????? ??? ?? item? ?? ? item?? click stream? ?? click stream??? ??? ?? pair ??? ?? ? ????? ?? ??? ?? ??? top N?? ?? ?? ??? ???? ??
  • 47. - speed gain ??? ????. ??? ????. - quality loss ?? : ?? : ????? ??
  • 48. ??? ?? ???? ?? ??? ??? ???? ??? ???? ? ????
  • 49. ? I/O ??? ?? ???? ??? ??? ??? ??? ??????.
  • 50. ???? ???? ?? ?? : user? view? ??? ?? ?? ??
  • 51. ???? ???? ?? ?? : item? view? ??? ???? item ?? item ??? ? ?? item
  • 53. ?? ???? ??? ??? ?? = ?????? ?? ?? (??????) = ???? ? click stream = ????? ?? ??? = ????? ?? ??? ?? ????. = ?? page out? ????. =????? ????.
  • 54. ?? ??? ?? ??? ?? ???? ?? ?? ??? ?? ??? ? ?? ???? ??? ???? ?? ??? ??? ???? ?? ?? ?? ?? ? ??.
  • 55. ?? ?? = ?? ??? ?? ??, ?? ??? = use good dimension reducer = ????? ??? ?? ???? ???. = ? ??? ?? ?? ???? = minhash
  • 56. ?? ??? S1 S2 … Sn S1 S2 … Sn ?? ??? hash function? n?? signature?? ??? ?? ? ?? ??
  • 57. S1 S2 S3 S4 signature? ?? ??? sig? ??? ? jaccard S5 S1 S2 S3 S4 S5 ??? jaccard coefficient 2/5 = 0.4 ??? ??? ???, ? ???? ?? jaccard?? ??? ????
  • 58. ? ?? hash func? ?? signature? ??? ?? ??? ??? ????!
  • 59. ??? ???? ???? ??????? 100?? signature? root mean square error ??? ?? RMSE 0.03 ?? ??? ????! ???? ????
  • 60. ?? ??? ??? ???? ?? ?? ??? ?? signature? ?? ?? ??? ???? ?? clustering ?? ?? ??? ? click stream? ??? signature? ?? ?? ??? ??? ?? ??? click stream?? signature overlap? ???
  • 61. ? ??? ?? ??? ??? in-memory? fit?? ????? ??? ?? ??? ??
  • 62. +??? ??? in-memory? fit?? +??? ??? ???? ?? ?? ???? ?????? ?????? ??? ??? ?? ?? ???. ???? ?? disk io?? ??
  • 63. item??? 4byte?? ???? 100? 100 x 4 + 200 = 600 byte ?? ?? ?? 200 byte ? ?? ?? ??? 1G? ??? 1024^3 / 600 = 1,789,569 1??? 180? ?? item? click stream? ???? ?? ?? ? ??! (?? ?? ?? ??)
  • 64. +??? ??? ??? ? ?? ?? Job “?? ? ??? ?????” +?? ?? ?? ??? ??????? ??. “??? ??? ??? ?? ???? ?? ?? ??? ?? ????”, “??? ????”
  • 65. ??? click? ???? ??? ??? ?? signature??? ? ?? ?? ?? ??? No! minhash? ‘min ??’? ‘chain’??. ??? ?? ??? ??? ??.
  • 67. Associative property ???? ? ???? ???? ?? ???? ?? Idempotence ???? ???? ???? ?? ???? ??.
  • 68. ???? ???? ??? ??? = ? ??? ?? ???? ??? ? ?? ??? ?????, ???? ??? ???! click? ?????? ?? ????!
  • 69. ? ??? ???? high TPS? ?? save minhash(110)[1 DB write] = load minhash(100)[1 DB read] x compute minhash(101~110)[input buffer] old minhash new new new local minhash memorystorage updated minhash ??? ???? ??? ????? ?????. micro batch ?? ??? ????? ??? ?? ??. (????)
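Because min is associative and idempotent, a stored signature can be merged element-wise with the signature of a small batch of new clicks. The result equals the signature computed over all clicks at once. The sketch below mirrors the slide's "load minhash(100), compute minhash(101~110), save minhash(110)" flow. The hash coefficients and the click ids are hypothetical.

```python
PRIME = 433_494_437

# Hypothetical (a, b) coefficients for four hash functions h(x) = (a*x + b) % PRIME.
COEFFICIENTS = [(3, 7), (11, 13), (101, 5), (999_983, 17)]


def minhash_signature(ids):
    """Minimum hash value per hash function over the given ids."""
    return [min((a * i + b) % PRIME for i in ids) for a, b in COEFFICIENTS]


def merge(sig_x, sig_y):
    """Element-wise min: the signature of the union of the two click sets."""
    return [min(x, y) for x, y in zip(sig_x, sig_y)]


clicks = list(range(5000, 5110))            # 110 hypothetical click ids

stored = minhash_signature(clicks[:100])    # what was saved after click 100
buffered = minhash_signature(clicks[100:])  # clicks 101..110 from the input buffer
updated = merge(stored, buffered)           # one read, one write

print(updated == minhash_signature(clicks))  # same as recomputing from scratch
```

Replaying the same batch twice is harmless (`merge(updated, buffered)` leaves `updated` unchanged), so the update does not need exactly-once delivery.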
  • 70. ?~ ??? ? ?? ?? ???? ? ????. ??? ??? ?? ??? ??? ????
  • 71. S1 S2 … Sn ??? ? ???. S1 S2 … Sn S1 S2 … Sn S1 S2 … Sn S1 S2 … Sn ??? n^2? ??? item A item B item C item D item E
  • 72. ?? ?? ?? ?? Secondary Index? ???.
  • 73. item A item B item C 923 1032 58 74 87 1032 123 80 923 872 58 80 Sig1 Sig2 Sig3 Sig4 ??? signature? ?? ? ? Secondary-Index KV Storage Sig??? ? pair? Key? ?? ?? ??? item???? Value Key Value Sig1-923 A-C Sig2-872 C sig3-58 A-C sig4-80 B-C A : 2?/4sigs B : 1?/4sigs A : 0.5 B : 0.25 Value? ??? item?? ????? ? sig? ??? ???? = jaccard? ?? =
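The secondary index on this slide can be sketched with a plain dictionary standing in for the key-value store. Keys follow the slide's position-value pattern (written here as `sig1-923`), values are the sets of items holding that value at that position, and the share of matching keys approximates Jaccard. The signatures are the ones on the slide. The function names are hypothetical.

```python
from collections import Counter, defaultdict

# Four-value signatures for items A, B and C, as listed on the slide.
signatures = {
    "A": [923, 1032, 58, 74],
    "B": [87, 1032, 123, 80],
    "C": [923, 872, 58, 80],
}


def index_keys(signature):
    """One key per signature position, e.g. 'sig1-923'."""
    return [f"sig{pos}-{value}" for pos, value in enumerate(signature, start=1)]


def build_index(all_signatures):
    """Map every position-value key to the set of items that carry it."""
    index = defaultdict(set)
    for item, sig in all_signatures.items():
        for key in index_keys(sig):
            index[key].add(item)
    return index


def similar_items(item, all_signatures, index):
    """Score every other item by the fraction of signature keys it shares."""
    sig = all_signatures[item]
    hits = Counter()
    for key in index_keys(sig):
        for other in index.get(key, ()):
            if other != item:
                hits[other] += 1
    return {other: count / len(sig) for other, count in hits.items()}


index = build_index(signatures)
print(similar_items("C", signatures, index))
```

A lookup touches one key per signature position, so its cost depends on the signature length and the size of the matching buckets rather than on the total number of items.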
  • 75. requirement: minhash??? ?????? 2nd idx? ?? ?? ?? ???? = ??? ????? ???? ???!
  • 77. Key Value Sig1-923 A-C Sig2-872 C sig3-58 A-C sig4-80 B-C ??? membership ??? ???? ??. REDIS? ?? ?????? ? ?? ???
  • 78. Redis Data structure candidates Strings Sets - support add, remove, union, intersection - plain K/V ?????! VS
  • 79. ?? ??? ??? string?? ??? ???? ? ???. sig45 Tom Jerry Robert Jack string?? ??? ?? “[Tom, Jerry, Robert, Jack]” ??? ? ?? ??? [write] json.dumps(data) [read] json.loads(data_str) “[Tom, Jerry, Robert, Jack]”
  • 80. ??? ??? String??. ??! Order! O(1) vs O(N)
      In [24]: %time gs.load_benchmark('user','key')
      CPU times: user 0.32 s, sys: 0.03 s, total: 0.34 s
      Wall time: 0.42 s
      In [25]: %time gs.load_benchmark('user','set')
      CPU times: user 32.34 s, sys: 0.13 s, total: 32.47 s
      Wall time: 33.88 s
      1) ? ????? String? Set?? ?? ??. string set
  • 81. 2) ‘redis string’ ? mget ??? ????. (multiple get at once) N call round trip -> 1 call
      In [9]: %timeit [redis.get(s) for s in sigs]
      100 loops, best of 3: 9.99 ms per loop
      In [10]: %timeit redis.mget(sigs)
      1000 loops, best of 3: 759 us per loop
  • 82. 3) string ? ??? ?????? ????. = ???? ? ?? ?? ?? ? ?? ??? [chart: compress speed (μs), snappy / zlib: 134.0, 17.3] [chart: size (%), snappy / zlib: 40%, 70%] snappy?? ????? ?? ????? ??
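Storing a member list as one string value means the writer serialises it and the reader parses it, and a string value can also be compressed before it is written. The sketch below combines the `json.dumps` / `json.loads` round trip from the earlier slide with compression. It uses `zlib` from the standard library only so that the example runs without extra packages. The slides compare snappy and zlib, and the helper names here are hypothetical.

```python
import json
import zlib


def encode_members(members):
    """Serialise a member collection to JSON text, then compress the bytes."""
    text = json.dumps(sorted(members))
    return zlib.compress(text.encode("utf-8"))


def decode_members(blob):
    """Reverse of encode_members: decompress, parse, return a set."""
    text = zlib.decompress(blob).decode("utf-8")
    return set(json.loads(text))


members = {"Tom", "Jerry", "Robert", "Jack"}  # the example bucket from the slide
blob = encode_members(members)                # bytes ready to store as a string value
print(decode_members(blob) == members)
```

Very short lists may not shrink at all, so whether compression pays off depends on how large the buckets actually get.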
  • 83. redis ? pipe??? ??? transaction ? ???. ??? set? ???? ?? ??? ???? 4) item A 923 1032 58 74 item B 87 1032 123 80 item C 923 973 58 80 sig2-1032 sig2-973 A-B C Secondary-Index 973 sig2-1032 sig2-973 B A-C sig1 sig2 sig3 sig4 key: pos-value value: member case) sig2-1032? A? ???? sig2-973? ?? - atomic?? minhash signature ????
  • 84. ?? ? ?? ??? 2nd index key space? ?????? ?????? hash max val * sig cnt? key? ?? hash max? 433,494,437? Fibonacci primes? ?? 100?? signature? ??? ?? ????? 43,349,443,700 ?? key?? ??. ???.
  • 85. minhash ? min? chain?? ???, ? ??? ??? ?? ??? ?? 100 15,082,312 4334944370 sigcnt 43,349,443,700 0?? member? ?? key? ??? ?? example case)? 3,727,344 products, ? 100 sig => 15082312 keys ???? ?, KV?? ??? 2G ?? ??
  • 86. REDIS? ???? transaction? ????? ?? update? ?? Secondary Index? ??????.
  • 87. ??
  • 88. minhash? min?? ?????. item Z? user click? ???? ? ??? Z 183 1032 942 80??? signature ?? click ? minhash?? 87 2043 123 300click ??? ?? ?? ?? ??? signature ? 2nd index?? new 87 1032 123 80 click ??? ??? ?? ??
  • 89. ?? item Z? ?? ?? ??? ???? Secondary Index ? ?? ? ?? flow item Z 87 1032 123 80 ??? signature ?? Sig1 - 87 Sig2 - 1032 Sig3 - 123 Sig4 - 80 A-B-D-F-Z C-G-Z A-B-D-Z B-D-F-Zredis.mget?? ??? ??? ??? ?????, ??? A:2/4 =0.5 B:3/4 =0.75 C:1/4 =0.25 D:3/4 =0.75
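The update-then-lookup flow for item Z on these two slides can be traced with an in-memory dictionary in place of Redis. The signature values and bucket members are taken from the slides. The two old buckets `sig1-183` and `sig3-942`, the decision to delete a key once it is empty, and all function names are assumptions made for this sketch. The transaction and pipelining used in the talk are not modelled.

```python
# Secondary index before the click: key "sig<position>-<value>" -> member items.
index = {
    "sig1-87": {"A", "B", "D", "F"},
    "sig1-183": {"Z"},
    "sig2-1032": {"C", "G", "Z"},
    "sig3-123": {"A", "B", "D"},
    "sig3-942": {"Z"},
    "sig4-80": {"B", "D", "F", "Z"},
}

old_signature = [183, 1032, 942, 80]  # item Z before the click
click_minhash = [87, 2043, 123, 300]  # minhash of the new click


def key(position, value):
    return f"sig{position}-{value}"


def apply_click(item, old_sig, click_sig, index):
    """Take the element-wise min and move the item between changed buckets."""
    new_sig = [min(o, c) for o, c in zip(old_sig, click_sig)]
    for position, (old, new) in enumerate(zip(old_sig, new_sig), start=1):
        if old == new:
            continue  # this position kept its bucket
        old_key = key(position, old)
        members = index.get(old_key, set())
        members.discard(item)
        if not members:
            index.pop(old_key, None)  # assumption: drop keys with no members
        index.setdefault(key(position, new), set()).add(item)
    return new_sig


def score_neighbours(item, signature, index):
    """Fraction of signature positions each other item shares with `item`."""
    hits = {}
    for position, value in enumerate(signature, start=1):
        for other in index.get(key(position, value), ()):
            if other != item:
                hits[other] = hits.get(other, 0) + 1
    return {other: count / len(signature) for other, count in hits.items()}


new_signature = apply_click("Z", old_signature, click_minhash, index)
scores = score_neighbours("Z", new_signature, index)
print(new_signature)
print(sorted(scores.items()))
```

Only the positions whose minimum actually changed (here the first and third) touch the index, so most clicks cause few or no index writes.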
  • 90. ???? ???? ?????? minhash?1G 2ndindex?2G ??? Mem Size ??1G ??? CPU Power REDIS1core minhash??1core ????1core
  • 92. ?? ??? ????? ?? ??? Memory based ??? ??? ?? ???? ? (ALS, NMF, Markov Chain??) ??? ??????? ? ??? ???.
  • 94. 99.9% ???? ?? ? ??? ??? ? 1169KB 51KB logo.bmp logo.jpg 1/
  • 95. ?22
  • 96. ?? ??? ??, ???? ??? ?? ? ??? ??? ?? Amortized ?? ?? ??? ? ??? ????
  • 97. ?? ?? ??? ?? ????? ????? ??? ?? ?? ?? ??????.
  • 98. ??? ????? ???? ?? ??? ?? ???? ??? ??? ???? ??? ??