David Gevorkyan graduated from AUA in 2008. He discusses big data, data science, and how eHarmony uses these fields to create successful relationships. eHarmony collects vast amounts of user data through detailed questionnaires and analyzes this data using algorithms to make compatibility match recommendations to users.
AUA Data Science Meetup
2. DAVID GEVORKYAN
@davidgev
davidgevorkyan
6. 80% OF DATA EXISTING IN ANY ENTERPRISE IS UNSTRUCTURED DATA
STRUCTURED DATA
SEMI-STRUCTURED
UNSTRUCTURED DATA
RDBMS Data Warehousing
7. 90% OF THE DATA IN THE WORLD TODAY HAS BEEN CREATED IN THE LAST TWO YEARS ALONE
Source: http://www.intel.com/content/www/us/en/communications/internet-minute-infographic.html
8. 4 V’S OF BIG DATA
VOLUME (large amount of data)
VARIETY (sensors, video, audio, email, social)
VELOCITY (speed of data generation)
VERACITY (authenticity and/or accuracy)
9. SOLUTIONS REQUIRED
Big data forces you to change the way you:
• COLLECT
• TRANSPORT
• STORE
• MANAGE
• ANALYZE
• VISUALIZE
12. DATA SCIENCE != STATISTICAL ANALYSIS
IT IS THE SCIENCE AND “ART” OF…
• EXPLORING THE UNKNOWN ABOUT DATA
“make discoveries while swimming in the data”
• REFINING THE RESULTS FOR ACCURACY
• DERIVING ACTIONABLE INSIGHT
• CREATING DATA-DRIVEN PRODUCTS
16. • Scala, Java, Python, R… (bonus: Clojure, Haskell, Erlang)
• Hadoop, HDFS, MapReduce… (bonus: Spark, Storm, Tez)
• Scalding, HBase, Pig, Hive… (bonus: Shark, Titan, Giraph)
• Flume, Sqoop, ETL, Web scrapers… (bonus: Hume)
• SQL, RDBMS, DW, OLAP… (bonus: Solr, Elasticsearch)
• KNIME, Weka, RapidMiner… (bonus: SciPy, NumPy, Pandas)
• D3.js, Kibana, ggplot2, Tableau… (bonus: Shiny, Flare, Datameer)
• SPSS, Matlab, SAS… (the enterprise man)
• NoSQL, MongoDB, Cassandra, CouchDB
• And yes!… MS Excel: the most used, most underrated DS tool
19. • Revenue, revenue, revenue
• Improve the customer experience
• Increase operational efficiency
• GE: Optimize maintenance intervals for industrial products
• Google: Refine search and ad-serving algorithms
• Zynga: Optimize the game experience for both long-term engagement and revenue
• Netflix: Movie recommendations
• Kaplan: Uncover effective learning strategies
• eHarmony: Create happy relationships
21. TRADITIONAL METHODS DO NOT WORK ANYMORE…
22. EHARMONY CREATES THE HAPPIEST, MOST PASSIONATE AND MOST FULFILLING RELATIONSHIPS*
* ACCORDING TO A RECENT STUDY
25. THE DIFFERENCE?
Compatibility Matching System®
COMPATIBILITY MATCHING
AFFINITY MATCHING
MATCH DISTRIBUTION
27. UNIDIRECTIONAL USER-DEFINED CRITERIA
Nicolette
28. UNIDIRECTIONAL USER-DEFINED CRITERIA
BIDIRECTIONAL
Leo
Ian
Steve
Nicolette
29. UNIDIRECTIONAL USER-DEFINED CRITERIA
Leo
Ian
Steve
Nicolette
BIDIRECTIONAL
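The difference between the two modes can be sketched in a few lines. Unidirectional matching checks only the seeker's criteria against each candidate; bidirectional matching also requires each candidate's criteria to accept the seeker. This is a minimal illustration, assuming a hypothetical profile with a single age-range preference; the `Profile` fields and the ages below are invented, and only the names come from the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Profile:
    # Hypothetical profile: these attributes are illustrative, not eHarmony's schema.
    name: str
    age: int
    min_age: int  # youngest partner this user will accept
    max_age: int  # oldest partner this user will accept


def accepts(seeker: Profile, candidate: Profile) -> bool:
    """True if the candidate satisfies the seeker's own criteria."""
    return seeker.min_age <= candidate.age <= seeker.max_age


def unidirectional(seeker: Profile, pool: List[Profile]) -> List[str]:
    # Only the seeker's criteria are checked.
    return [c.name for c in pool if accepts(seeker, c)]


def bidirectional(seeker: Profile, pool: List[Profile]) -> List[str]:
    # Both sides must accept each other.
    return [c.name for c in pool if accepts(seeker, c) and accepts(c, seeker)]


nicolette = Profile("Nicolette", 30, 28, 40)
pool = [
    Profile("Leo", 35, 25, 32),
    Profile("Ian", 29, 32, 45),    # Ian fits Nicolette's range, but she is below his
    Profile("Steve", 45, 20, 50),  # outside Nicolette's range
]

print(unidirectional(nicolette, pool))  # ['Leo', 'Ian']
print(bidirectional(nicolette, pool))   # ['Leo']
```

Ian passes the one-way filter but drops out once his own preferences are applied, which is the case the bidirectional slides illustrate.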
35. COMPATIBILITY MATCHING
USER-DEFINED CRITERIA
COMPATIBILITY MODELS
MONGODB
VOLDEMORT
36. MONGODB
DATA STORE NEEDS
POWERFUL INDEXING
MODELS
FAST MULTI-ATTRIBUTE SEARCHES
EASY TO MAINTAIN
60M+ QUERIES per day
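A multi-attribute, bidirectional candidate search maps naturally onto a single MongoDB filter document backed by a compound index. The sketch below only builds the filter and the index specification, so it runs without a database. All field names (`age`, `min_age`, `max_age`, `regions`) are hypothetical. With pymongo you would pass the results to `collection.create_index(CANDIDATE_INDEX)` and `collection.find(query)`.

```python
def build_candidate_query(seeker: dict) -> dict:
    """Build a MongoDB filter for candidates who satisfy the seeker's criteria
    AND whose own criteria accept the seeker (bidirectional).
    All field names are hypothetical."""
    return {
        "_id": {"$ne": seeker["_id"]},  # exclude the seeker
        # Forward direction: the candidate fits the seeker's preferences.
        "age": {"$gte": seeker["min_age"], "$lte": seeker["max_age"]},
        "region": {"$in": seeker["regions"]},
        # Reverse direction: the candidate's age range must contain the seeker's age.
        "min_age": {"$lte": seeker["age"]},
        "max_age": {"$gte": seeker["age"]},
    }


# Compound index over the searched attributes, written as the list of
# (field, direction) pairs that pymongo's Collection.create_index accepts
# (1 means ascending).
CANDIDATE_INDEX = [("region", 1), ("age", 1), ("min_age", 1), ("max_age", 1)]

seeker = {"_id": 1, "age": 30, "min_age": 28, "max_age": 40, "regions": ["LA", "SF"]}
query = build_candidate_query(seeker)
print(query)
```

The field order in the index is a judgment call. Putting the `$in` field first and the range fields after it is a common heuristic, but the right order depends on the real query mix and data distribution.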
37. MONGODB
WINS
AUTO SCALING
BUILT-IN SHARDING
AUTO BALANCING
MMS
38. VOLDEMORT? THAT NAME SOUNDS FAMILIAR
39. VOLDEMORT
DATA STORE NEEDS
CRUD OPERATIONS
VARIED TRANSACTION SIZES
BILLION+ POTENTIAL MATCHES per day
40. VOLDEMORT
WINS
AUTO REPLICATION
AUTO PARTITIONING
PLUGGABLE SERIALIZATION
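Project Voldemort is a Dynamo-style key-value store that assigns keys to nodes with consistent hashing and stores each value on several nodes. The following generic consistent-hash ring illustrates the "auto partitioning" and "auto replication" bullets. It is not Voldemort's code or client API. The node names, the number of partitions per node, and the replication factor are assumptions chosen for illustration.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(value: str) -> int:
    # Map a string to a point on the ring; MD5 is stable across processes.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring in which each node owns several partitions."""

    def __init__(self, nodes: List[str], partitions_per_node: int = 8):
        self._points: List[int] = []      # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> owning node
        for node in nodes:
            for p in range(partitions_per_node):
                point = _hash(f"{node}#{p}")
                self._owner[point] = node
                bisect.insort(self._points, point)

    def preference_list(self, key: str, replicas: int = 2) -> List[str]:
        """Walk clockwise from the key's position, collecting distinct nodes.
        The first node is the primary; the rest hold replicas."""
        replicas = min(replicas, len(set(self._owner.values())))
        i = bisect.bisect(self._points, _hash(key))
        nodes: List[str] = []
        while len(nodes) < replicas:
            node = self._owner[self._points[i % len(self._points)]]
            if node not in nodes:
                nodes.append(node)
            i += 1
        return nodes


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.preference_list("user:42:matches", replicas=2))
```

Adding or removing a node only moves the keys whose clockwise walk crosses that node's partitions. This is what lets a cluster rebalance without rehashing every key.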
41. AFFINITY MATCHING
Compatibility Matching System®
COMPATIBILITY MATCHING
AFFINITY MATCHING
MATCH DISTRIBUTION
49. DATA NEEDS FOR AFFINITY
50M+ REGISTERED USERS
10³ ATTRIBUTES
10⁷ DAILY MATCHES
250M+ PHOTOS
4B+ QUESTIONNAIRES ANSWERED
50. COMMUNICATION AGGREGATES
EVENT LISTENER SERVICE
USER ACTIVITY SERVICE
~5 ms RESPONSE TIMES
10K EVENTS PER SECOND
USER SERVICE
HOURLY, DAILY TOTALS
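The communication-aggregates slide describes events flowing from a listener into hourly and daily totals per user. Here is a minimal in-memory sketch of that roll-up. It assumes each event carries a user ID and a Unix timestamp (hypothetical fields) and that buckets are computed in UTC. A production service handling 10K events per second would keep these counters in a shared store rather than in process memory.

```python
from collections import Counter
from datetime import datetime, timezone


class ActivityAggregator:
    """Roll communication events up into per-user hourly and daily totals.
    Event fields (user_id, Unix timestamp) are hypothetical."""

    def __init__(self) -> None:
        self.hourly: Counter = Counter()  # (user_id, "YYYY-MM-DDTHH") -> count
        self.daily: Counter = Counter()   # (user_id, "YYYY-MM-DD") -> count

    def on_event(self, user_id: str, ts: float) -> None:
        # Bucket in UTC so totals do not depend on the server's time zone.
        t = datetime.fromtimestamp(ts, tz=timezone.utc)
        self.hourly[(user_id, t.strftime("%Y-%m-%dT%H"))] += 1
        self.daily[(user_id, t.strftime("%Y-%m-%d"))] += 1

    def hourly_total(self, user_id: str, hour: str) -> int:
        return self.hourly[(user_id, hour)]

    def daily_total(self, user_id: str, day: str) -> int:
        return self.daily[(user_id, day)]


agg = ActivityAggregator()
for ts in (0, 1800, 3600):  # two events in the first hour, one in the second
    agg.on_event("u1", ts)

print(agg.hourly_total("u1", "1970-01-01T00"))  # 2
print(agg.daily_total("u1", "1970-01-01"))      # 3
```

Incrementing both granularities at write time keeps reads to a single lookup. This matters when a user-facing service has to answer in a few milliseconds.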
51. OFFLINE BATCH JOBS
USER SERVICE
MAP-SIDE JOINS (TB)
SCORING
1+ GB Compressed Protocol Buffers
PAIRINGS SERVICE
750M Compressed Protocol Buffers
BILLION+ POTENTIAL MATCHES
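One common form of map-side join is the replicated (broadcast) join. Each mapper loads the smaller dataset into memory and streams the larger one past it, so no shuffle is needed before joining. The talk does not say which map-side join variant eHarmony used. The plain-Python sketch below only mimics the replicated pattern. The user table, the pair stream, and the answer-agreement `score` are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple


def map_side_join(
    pairs: Iterable[Tuple[str, str]],
    users: Dict[str, dict],
) -> Iterator[Tuple[str, str, dict, dict]]:
    """Stream (user_a, user_b) pairs and enrich both sides from an in-memory
    user table, as a mapper would with a replicated small dataset."""
    for a, b in pairs:
        ua, ub = users.get(a), users.get(b)
        if ua is None or ub is None:
            continue  # inner-join semantics: drop pairs with a missing profile
        yield a, b, ua, ub


def score(ua: dict, ub: dict) -> float:
    # Hypothetical score: share of shared questionnaire items answered identically.
    keys = ua["answers"].keys() & ub["answers"].keys()
    if not keys:
        return 0.0
    same = sum(ua["answers"][k] == ub["answers"][k] for k in keys)
    return same / len(keys)


users = {
    "u1": {"answers": {"q1": 1, "q2": 3}},
    "u2": {"answers": {"q1": 1, "q2": 2}},
    "u3": {"answers": {"q1": 4}},
}
pairs = [("u1", "u2"), ("u1", "u3"), ("u1", "u9")]  # u9 has no profile

results = [(a, b, score(ua, ub)) for a, b, ua, ub in map_side_join(pairs, users)]
print(results)  # [('u1', 'u2', 0.5), ('u1', 'u3', 0.0)]
```

The pattern only works while the replicated side fits in a mapper's memory. When both inputs are terabyte-scale, a map-side join instead relies on both inputs being sorted and partitioned identically.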
52. AMAZON EMR
AWS DIRECT CONNECT
256 NODES
50 TB STORAGE
IN-HOUSE SEAMICRO
DATA RETRIEVAL LATENCY
LOW OPERATIONAL COST
LOW POWER CONSUMPTION
PREDICTABLE COMPLETION TIMES
53. MODEL RETRAINING
distcp
Protocol Buffers from Offline Jobs
54. MATCH DISTRIBUTION
Compatibility Matching System®
COMPATIBILITY MATCHING
AFFINITY MATCHING
MATCH DISTRIBUTION
55. Delivering the right matches at the right time to as many people as possible across the entire network
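A simple way to spread matches across as many people as possible is to allocate them greedily by score while capping how many matches any one user receives. Without a cap, the most popular profiles absorb most of the matches. The slides do not describe eHarmony's distribution algorithm, so this greedy, capped allocation is a swapped-in illustration. The user names, scores, and `cap` parameter are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple


def distribute(
    scored_pairs: Iterable[Tuple[str, str, float]],
    cap: int,
) -> List[Tuple[str, str]]:
    """Greedy allocation: consider the best-scoring pairs first and deliver a
    pair only if neither user has already received `cap` matches."""
    load: DefaultDict[str, int] = defaultdict(int)  # matches delivered per user
    delivered: List[Tuple[str, str]] = []
    for a, b, _ in sorted(scored_pairs, key=lambda p: p[2], reverse=True):
        if load[a] < cap and load[b] < cap:
            delivered.append((a, b))
            load[a] += 1
            load[b] += 1
    return delivered


pairs = [
    ("ann", "bob", 0.9),
    ("ann", "carl", 0.8),
    ("dana", "bob", 0.7),
    ("dana", "carl", 0.6),
]
print(distribute(pairs, cap=1))  # [('ann', 'bob'), ('dana', 'carl')]
```

With `cap=1`, Dana still receives a match even though her scores are the lowest, because Ann and Bob stop competing for partners once each has one. A greedy pass like this is fast but not optimal. Maximizing total score under per-user caps is a matching problem that exact algorithms can solve at a higher cost.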