AI
Signal Normalization → Feature Extraction / Dimension Reduction → Clustering / Outlier Detection → Classification / Prediction → Decision Making
Intermediate data: Raw Time Series → Normalized Signals → Temporal Features → Clusters
Techniques: signal processing techniques + ICA (Independent Component Analysis), K-means, Random Forest
Other inputs: Facts/Truth; Conditions, Unknowns; Adaptation Feedback
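The pipeline above normalizes raw signals, extracts temporal features, and screens for outliers before classification. A minimal stdlib-only sketch of those first stages follows. It assumes three simplifications:

- z-score normalization;
- non-overlapping windows summarized by mean and standard deviation as the "temporal features";
- a plain z-score threshold as the outlier test.

ICA, K-means, and Random Forest are not reproduced, and all function names are hypothetical.

```python
import statistics


def normalize(signal):
    """Z-score normalization: zero mean, unit (population) standard deviation."""
    mu = statistics.fmean(signal)
    sigma = statistics.pstdev(signal)
    if sigma == 0:
        return [0.0 for _ in signal]  # constant signal: nothing to scale
    return [(x - mu) / sigma for x in signal]


def window_features(signal, width):
    """Summarize non-overlapping windows by (mean, std) as simple temporal features."""
    feats = []
    for start in range(0, len(signal) - width + 1, width):
        w = signal[start:start + width]
        feats.append((statistics.fmean(w), statistics.pstdev(w)))
    return feats


def flag_outliers(values, threshold=2.0):
    """Return indices whose z-score magnitude exceeds the threshold."""
    z = normalize(values)
    return [i for i, v in enumerate(z) if abs(v) > threshold]


if __name__ == "__main__":
    raw = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50, 11, 10]
    print(window_features(normalize(raw), width=4))
    print(flag_outliers(raw))
```

In a fuller pipeline, the window features would feed a clusterer and then a classifier. The threshold of 2.0 is an arbitrary illustrative choice.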
AI (GPU)
http://deview.kr/2013/detail.nhn?topicSeq=39
AI Platform
- GPU: ‘P4, P40’
- IBM/HP…
Google TensorFlow: https://github.com/carpedm20/DCGAN-tensorflow
Mahout: http://mahout.apache.org/users/basics/algorithms.html
• Classification: Naïve Bayes, Hidden Markov Models, Logistic Regression, Random Forest
• Clustering: k-means, Canopy, Fuzzy k-means, Streaming k-means, Spectral Clustering
Spark: https://spark.apache.org/docs/1.1.0/mllib-guide.html
• Classification and regression: linear models (SVM, logistic regression, linear regression), decision tree, Naïve Bayes
• Clustering: k-means
• Collaborative filtering: alternating least squares (ALS)
Microsoft Azure ML: http://azure.microsoft.com/ko-kr/documentation/articles/machine-learning-algorithm-choice/
• Clustering: k-means
• Classification: Decision Tree, SVM (Support Vector Machines), Naïve Bayes
• Regression: Bayesian linear regression, boosted decision tree regression, decision forest regression, linear regression, neural network regression, ordinal regression, Poisson regression
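k-means appears in all three platform lists above. The sketch below is plain-Python Lloyd's algorithm, illustrating what those library implementations compute rather than reproducing any one library's code. The loop is:

1. Assign each point to its nearest centroid.
2. Move each centroid to the mean of its points.
3. Repeat until nothing changes.

The first k points serve as initial centroids for determinism. Real libraries typically use k-means++ or random restarts.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm for k-means on a list of coordinate tuples."""
    centroids = [tuple(p) for p in points[:k]]  # deterministic init: first k points
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its centroid
        if new_labels == labels and new_centroids == centroids:
            break  # converged: assignments and centroids are stable
        labels, centroids = new_labels, new_centroids
    return labels, centroids
```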
Bigger Data. Better Insights.
CONFIDENTIAL
Launched: 2012. Alexander Gray, Ph.D., Associate Professor, Georgia Tech (Berkeley, Carnegie Mellon, NASA Jet Propulsion Lab)
Product: Software platform that provides enterprise-class machine learning for Big Data, letting data scientists and BI analysts create more accurate predictive models in less time
Business Model: Freemium download, software subscription, node-based pricing model. On-prem or in-cloud deployment.
Investors:
Technical Advisory Board (National Academy members):
• Prof. Michael Jordan, UC Berkeley: machine learning ‘godfather’
• Prof. David Patterson, UC Berkeley: systems (inventor of RISC, RAID)
• Prof. Pat Hanrahan, Stanford: data visualization (Tableau, Pixar)
• Prof. James Demmel, UC Berkeley: high-performance computing (LAPACK)
Academic Network: Extended scientific advisory council of top ML professors from 20 or so universities (CMU, Princeton, Caltech, Purdue, Cambridge, etc.)
Financial Services
Manufacturing Healthcare
Technology Services Other
Information Providers
• Classification: predict categories and classes
• Regression: predict values and numbers
• Clustering: grouping and segmentation
• Density Estimation: detection and characterization
• Dimension Reduction: visualization and reduction
• Multidimensional Querying: find similar items
Example Skytree algorithms: Random Decision Forests, Gradient Boosting Machines, Nearest Neighbor, Kernel Density Estimation, K-means, Linear Regression, Support Vector Machine, 2-point Correlation, Decision Tree, Singular Value Decomposition, Range Search, Logistic Regression
Recommendations • Predictions • Outlier Detection
Speed and Scale (Run More Experiments)
C++, Open Source Scripting
More Data (See More Indicators)
Automation (Ease of Use and Interpretability)
Examples of High Value Analytics Use Cases
Customer
• Segmentation
• Recommendation
• Churn
• Lead Scoring
• Pricing
• Credit Scoring
Risk & Security
• Fraud Analysis
• Risk Analysis
• Anomaly Detection
• Cyber Security
• Situational Profiling
• Pattern of Life
Operational
• Prescriptive Maintenance
• Default/Fault Detection
• Supply Chain
• Cost Forecasting
• Operational Analysis
• Failure Analysis
Disparate Tools, Manual Processes
• Data Prep
• Validation
• Deployment
• Method Selection
• Parameter Selection
• Sampling
• Prediction / Results on New Data
Timeline (Months/Quarters): t0 to t4
Skill level: PhDs throughout
Automate & Sustain
Skill level: Data Analysts, freeing PhDs to focus on high-leverage challenges
Better Results, Much Faster & Easier
• Unified Skytree Environment
• New Data → Automated Project-Oriented Workspace
• Single Click AutoModel™
Timeline (Months/Quarters): t0 to t4
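AutoModel is presented as automating the manual Parameter Selection and Validation steps, but the slides do not describe how. As an assumption, a common baseline for automated parameter selection is grid search scored by k-fold cross-validation. The sketch below applies it in plain Python to a one-feature ridge regression through the origin. The model, the grid, and all function names are illustrative, not Skytree's.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous, nearly equal folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_ridge_1d(xs, ys, lam):
    """Ridge regression through the origin: w = sum(x*y) / (sum(x*x) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_mse(xs, ys, lam, k):
    """Average held-out mean squared error across k folds."""
    errors = []
    for fold in k_fold_indices(len(xs), k):
        held = set(fold)
        train = [i for i in range(len(xs)) if i not in held]
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        errors.append(sum((ys[i] - w * xs[i]) ** 2 for i in fold) / len(fold))
    return sum(errors) / len(errors)


def grid_search(xs, ys, grid, k=5):
    """Return the grid value with the lowest cross-validated MSE."""
    return min(grid, key=lambda lam: cv_mse(xs, ys, lam, k))
```

Real data should be shuffled before building contiguous folds. Larger grids or random search trade compute for coverage.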
Customer 360° View
• Data sources: Data Warehouse, E-Mail, CRM, Internal Data, External Data, Big Data Environment
• Single Customer View, with improved decision-making capabilities based on customer data
• Big Data: enabling innovative products & services, customer satisfaction
• Analytics: churn propensity and prevention, product sentiment, recommendations and more
• Hadoop distributions: Cloudera, Hortonworks, MapR, Amazon EMR
• Model export: PMML, JAR
• Spark, YARN, GUI
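PMML, which this slide mentions alongside JAR (apparently as model export formats), is an XML standard maintained by the Data Mining Group for exchanging trained models. The sketch below uses Python's standard library to emit a minimal PMML 4.4 `RegressionModel` for a fitted linear model. It is a rough illustration, not Skytree's exporter. The field names and coefficients in the test are placeholders, and a real exporter would usually emit more metadata.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_4"
ET.register_namespace("", PMML_NS)  # serialize PMML as the default namespace


def _q(tag):
    """Qualify a tag name with the PMML namespace."""
    return f"{{{PMML_NS}}}{tag}"


def linear_model_to_pmml(intercept, coefficients, target="y"):
    """Build a minimal PMML document for y = intercept + sum(coef * x)."""
    root = ET.Element(_q("PMML"), version="4.4")
    ET.SubElement(root, _q("Header"))
    fields = list(coefficients) + [target]
    dd = ET.SubElement(root, _q("DataDictionary"), numberOfFields=str(len(fields)))
    for name in fields:
        ET.SubElement(dd, _q("DataField"), name=name, optype="continuous", dataType="double")
    model = ET.SubElement(root, _q("RegressionModel"), functionName="regression")
    schema = ET.SubElement(model, _q("MiningSchema"))
    for name in coefficients:
        ET.SubElement(schema, _q("MiningField"), name=name)
    ET.SubElement(schema, _q("MiningField"), name=target, usageType="target")
    table = ET.SubElement(model, _q("RegressionTable"), intercept=repr(float(intercept)))
    for name, coef in coefficients.items():
        ET.SubElement(table, _q("NumericPredictor"), name=name, coefficient=repr(float(coef)))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(linear_model_to_pmml(1.5, {"x1": 2.0, "x2": -0.5}))
```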
• AutoModel
• Interfaces: GUI, Java, Python, Skytree Command Line Interface
Complexity of state-of-the-art machine learning methods:
1. Querying: all-nearest-neighbors O(N²)
2. Density estimation: kernel density estimation O(N²), kernel conditional density estimation O(N³)
3. Classification: logistic regression, decision tree, neural nets, nearest-neighbor classifier O(N²), kernel discriminant O(N²), support vector machine O(N³)
4. Regression: linear regression, LASSO, kernel regression O(N²), regression tree, Gaussian process regression O(N³)
5. Dimension reduction: PCA, non-negative matrix factorization, kernel PCA O(N³), maximum variance unfolding O(N³); Gaussian graphical models, discrete graphical models
6. Clustering: k-means, mean-shift O(N²), hierarchical clustering O(N³)
7. Testing and matching: MST O(N³), bipartite cross-matching O(N³), n-point correlation 2-sample testing O(Nⁿ), n = 2, 3, 4, …
• Unfortunately, O(N²) and O(N³) are computationally prohibitive for big data. Skytree has invented a way to reduce the complexity of the above methods from O(N²) and O(N³) to O(N) or O(N log N).
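This sketch makes the O(N²) versus O(N log N) contrast concrete for item 1 (all-nearest-neighbors). The slides do not disclose Skytree's method, so it uses a textbook special case instead. For one-dimensional data, each point's nearest neighbor is adjacent to it in sorted order, so one O(N log N) sort replaces the O(N²) all-pairs scan. Higher-dimensional data needs space-partitioning structures such as kd-trees or ball trees to get similar savings.

```python
import random


def all_nn_brute(xs):
    """O(N^2): compare every point with every other point (needs >= 2 points)."""
    result = []
    for i, x in enumerate(xs):
        best = min((j for j in range(len(xs)) if j != i), key=lambda j: abs(xs[j] - x))
        result.append(best)
    return result


def all_nn_sorted(xs):
    """O(N log N) for 1-D data: sort once, then check only adjacent points."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [None] * len(xs)
    for pos, i in enumerate(order):
        candidates = []
        if pos > 0:
            candidates.append(order[pos - 1])  # left neighbor in sorted order
        if pos < len(order) - 1:
            candidates.append(order[pos + 1])  # right neighbor in sorted order
        result[i] = min(candidates, key=lambda j: abs(xs[j] - xs[i]))
    return result


if __name__ == "__main__":
    random.seed(0)
    data = [random.random() for _ in range(1000)]
    print(all_nn_sorted(data)[:5])
```

Doubling N roughly quadruples the brute-force work but only slightly more than doubles the sorted version's.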
Up to 10,000x speedups (on one CPU)
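Asymptotics alone make claims such as "up to 10,000x" plausible, although constant factors, memory, and I/O decide real speedups. Ignoring constants, the work ratio of an O(n²) method to an O(n log n) one is n / log₂ n, which the sketch below evaluates. It crosses 10,000 between n = 100,000 and n = 200,000.

```python
import math


def quadratic_to_nlogn_ratio(n):
    """Operation-count ratio n^2 / (n log2 n) = n / log2 n, ignoring constant factors."""
    return n / math.log2(n)


if __name__ == "__main__":
    for n in (10_000, 100_000, 200_000, 1_000_000):
        print(f"n={n:>9,}: {quadratic_to_nlogn_ratio(n):>10,.0f}x")
```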
• Deep knowledge of algorithms: drawing on the latest research from academia
• Smart programming: efficient ways to compute O(N²) and O(N³) problems
• Distributed systems: taking advantage of parallel computing speed
Skytree Customer: Recommendation Engine (like Netflix)
Major Pain Points: Speed & Accuracy
Current Solution: Hadoop/Mahout, Homegrown
Runtime: LEGACY 97 min (5,820 sec) vs. SKYTREE 4 sec
Precision: LEGACY 35% vs. SKYTREE 42%
Skytree Impact:
• ~1500x execution speedup
• 20% improvement in recommendation relevance
“We are literally speechless.” - Skytree Customer
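The slide reports recommendation relevance as precision. Assuming the usual precision@k definition (the share of the top-k recommended items the user actually engaged with), the sketch below computes it. The item IDs in the test are made up.

Reading the slide's 35% and 42% as legacy and Skytree precision, the relative gain is 42 / 35 - 1 = 20%, which matches the "20% improvement" claim. Similarly, 5,820 s / 4 s ≈ 1,455x underlies "~1500x".

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items found in the relevant set.

    Divides by the number of items actually returned (at most k).
    """
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    return sum(1 for item in top_k if item in relevant) / len(top_k)


def relative_improvement(before, after):
    """Relative change between two metric values, e.g. 0.35 -> 0.42 is about 0.2."""
    return after / before - 1.0


if __name__ == "__main__":
    recs = ["a", "b", "c", "d", "e"]
    clicked = {"b", "e", "z"}
    print(precision_at_k(recs, clicked, k=5))
    print(round(relative_improvement(0.35, 0.42), 3))
    print(round(5820 / 4))  # legacy seconds divided by Skytree seconds
```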
Skytree Customer: Micro-Targeting Application
Major Pain Points: Speed & Accuracy
Current Solution: SAS, Hadoop, Homegrown
“I want our analysts to create models with Skytree rather than writing software.” - Skytree Customer
Legacy Environment: 100-node Hadoop cluster (1,200 cores); runtime: 100 minutes; accuracy (Gini): 57%
SKYTREE Server: single server (12 cores); runtime: 8 minutes; accuracy (Gini): 60%
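The slide reports accuracy as a Gini coefficient. In credit and marketing scoring, Gini conventionally means the normalized ROC AUC, Gini = 2 × AUC - 1. The slide does not define it, so this is an assumption. The sketch computes AUC by counting correctly ordered positive/negative pairs, with ties counting one half.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    O(P*N) pair counting for clarity; tied scores contribute 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(scores, labels):
    """Gini coefficient (accuracy ratio) derived from AUC."""
    return 2.0 * roc_auc(scores, labels) - 1.0
```

Under this reading, Gini values of 57% and 60% correspond to AUCs of 0.785 and 0.80. Pair counting is O(P × N); a rank-based formula after sorting brings it to O(N log N).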
Skytree Impact:
• Time: 12.5x improvement on 1 node, ~1200x expected improvement on 100 nodes
• 5% improvement in accuracy
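The impact figures follow from the runtimes above by simple arithmetic, sketched below.

- The ~1200x figure is a projection that assumes close to linear scaling from 1 to 100 nodes; the slide calls it "expected", not measured.
- "5% improvement in accuracy" matches the relative change from a Gini of 57% to 60% (3 points absolute).

The `efficiency` parameter is a hypothetical knob for imperfect scaling.

```python
def speedup(old, new):
    """How many times faster the new runtime is than the old one."""
    return old / new


def projected_speedup(single_node_speedup, nodes, efficiency=1.0):
    """Scale-out projection; efficiency < 1 models imperfect (sub-linear) scaling."""
    return single_node_speedup * nodes * efficiency


legacy_min, skytree_min = 100, 8
single = speedup(legacy_min, skytree_min)        # 12.5x on one 12-core server
projected = projected_speedup(single, nodes=100)  # 1250x if scaling were perfectly linear
core_ratio = 1_200 / 12                           # legacy cluster used 100x more cores
gini_rel = (60 - 57) / 57                         # about a 5.3% relative improvement

if __name__ == "__main__":
    print(single, projected, core_ratio, round(gini_rel, 3))
```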
FDS (Fraud Detection System)
Customer Pain: $500M+/year in fraud
Before Skytree:
• Fraud model updated annually
• Internally developed algorithms
• Model accuracy maxed out
• Model developed on Linux servers
With Skytree:
• Models updated weekly
• Model accuracy improved ~10%
Client Win:
• Business Impact: chargebacks greatly reduced
• Operational Impact: 300x shorter threat response time
• Financial Impact: $50M+ savings annually
“Skytree opened our eyes to what was possible with ML. It changed everyone at Risk Management by providing a deeper, personalized understanding of our customers. We have been working on fraud detection since 1990 and now have a 10% lift using Skytree. That's a big deal.”
Skytree Customer: Insurance
Application: profit optimization through
• Loss Prediction
• Binding
• Retention
• Price Elasticity
Major Pain Points: Speed & Scale
Current Solution: R, Hadoop, Homegrown
Skytree Impact (R vs. Skytree):
• Speed-up: Test Suite 1, 20-88x execution speed-up on the same data sets
• Scale: Test 2, >50,000x increase in data size, and Skytree ran to completion
Using up to 450 million rows and 450 attributes
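The speaker notes are consistent with the ">50,000x increase in data size": the pilot grew from "a couple of hundred thousand rows and 20 attributes" to 450 million rows and 450 attributes. Taking "a couple of hundred thousand" as 200,000 (an assumption), the cell counts and a rough dense float64 memory estimate work out as follows.

```python
def cells(rows, cols):
    """Number of values in a rows x cols table."""
    return rows * cols


before = cells(200_000, 20)       # assumed pilot starting size
after = cells(450_000_000, 450)   # size quoted on the slide
growth = after / before           # 50,625x more cells
dense_bytes = after * 8           # 8 bytes per float64 value, no overhead
dense_tb = dense_bytes / 1e12     # about 1.6 TB if held as one dense matrix

if __name__ == "__main__":
    print(f"growth: {growth:,.0f}x, dense size: {dense_tb:.2f} TB")
```

The 1.6 TB figure ignores compression, sparsity, and categorical encodings. It only indicates that the full dataset, held as a dense matrix, would exceed the memory of a typical single server.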
Datacenter Optimization
Customer Pain:
• High data center equipment costs
• Outages hurt user satisfaction
• Huge & rapidly growing machine data volume from thousands of feeds
Before Skytree:
• Overprovision to cover anticipated peaks
• CapEx waste
• Outages went unnoticed until customers complained
• Reactive
Enabled With Skytree:
• Provision only what's really required
• Monitor thousands of systems and social media feeds at >25 TB/hour
• Take action before merchants complain
Client Win:
• Business Impact: higher user and merchant satisfaction
• Operational Impact: next-gen architecture enabled
• Financial Impact: estimated $20-30M savings/year
“The goal is to detect patterns and anomalies and take action before the user experience is negatively impacted. Using Skytree, we basically take all the data, put it together, and correlate events across those streams.”
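The datacenter case is about catching anomalies in high-volume streams before users notice. At >25 TB/hour, that is roughly 6.9 GB/s (25 × 10¹² bytes / 3,600 s). The slides do not say which method Skytree used. As an assumed, deliberately simple baseline, a per-metric detector can keep a running mean and variance with Welford's online algorithm and flag values far from the running mean. This uses O(1) memory per stream.

```python
import math


class StreamingZScore:
    """Online anomaly flagging using Welford's running mean and variance."""

    def __init__(self, threshold=3.0, warmup=10):
        if warmup < 2:
            raise ValueError("warmup must be at least 2 to estimate a variance")
        self.threshold = threshold  # standard deviations that count as anomalous
        self.warmup = warmup        # observations to collect before flagging
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations from the mean

    def update(self, x):
        """Score x against past data, then fold it into the statistics.

        Returns True if x is anomalous relative to the observations seen so far.
        """
        anomalous = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))  # sample standard deviation
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                anomalous = True
        # Welford update: numerically stable single-pass mean/variance.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        return anomalous
```

Each value is scored before it updates the statistics, so a spike cannot mask itself. The threshold and warm-up length are illustrative.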
Architected for Speed and Accuracy
• Machine learning algorithms, deeply optimized (n, n log(n) calculations versus n² and n³)
• In-memory execution on each CPU
• Parallelized across nodes (Hadoop scaling with TrueScale™)
• Skytree Fast Internode Communication
Benchmark: Speed & Efficiency (figure). Panels: GBTR, single node, 13 million rows (time in 1000s of seconds; Scikit-learn, R, MLlib, Skytree; annotated 26x, 128x, 153x); GBTR, multi-node, 10M-100M rows (time in 1000s of seconds; single node vs. 8 nodes; MLlib did not complete, MLlib 71x slower); Skytree deep optimizations, time vs. n for O(n²), O(n³) vs. O(n), O(n log(n)).
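The benchmark workload is GBTR (gradient boosted tree regression). The plain-Python sketch below shows what that workload computes, not how Skytree optimizes it. It performs gradient boosting for squared loss on a single feature. Each round fits a depth-1 regression tree (a stump) to the current residuals, which are the negative gradient of squared loss, and adds it scaled by a learning rate. The exhaustive split search is O(n²) per round; production libraries pre-sort or histogram features instead.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split of 1-D xs minimizing squared error of residuals."""
    best = None
    for t in sorted(set(xs))[:-1]:  # thresholds at distinct values, excluding the max
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    if best is None:  # all xs identical: predict the mean residual everywhere
        m = sum(residuals) / len(residuals)
        return xs[0], m, m
    return best[1], best[2], best[3]


def gbtr_fit(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss with decision stumps."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient of 0.5*(y-p)^2
        t, lval, rval = fit_stump(xs, residuals)
        stumps.append((t, lval, rval))
        preds = [p + learning_rate * (lval if x <= t else rval) for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def gbtr_predict(model, x):
    """Sum the base value and every scaled stump output for input x."""
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (l if x <= t else r) for t, l, r in stumps)
```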
• AutoModel & SmartSearch: one step
• GUI, CLI, Python & Java SDKs, REST APIs
• GUI: model comprehension, variable importance, tree visualization
High-Level Architecture
Flexible Delivery: On Premises, Cloud
Production
Skytree: Machine Learning Built for the Enterprise
kosena21@naver.com 010-9338-6400

Editor's Notes

  1. Data Analysts, freeing PhDs to focus on high-leverage challenges.
  2. We are literally speechless.
  3. I want our analysts to create models with Skytree rather than writing software.
  4. Skytree opened our eyes to what was possible with ML. It changed everyone at Risk Management by providing a deeper, personalized understanding of our customers. We have been working on fraud detection since 1990 and now have a 10% lift using Skytree. That's a big deal. Business Impact: chargebacks greatly reduced. Operational Impact: 300x shorter threat response time. Financial Impact: $50M+ savings annually.
  5. Another customer, a Fortune 100 insurance company, was looking to optimize profitability by combining prediction of losses (frequency and severity), binding (customer acquisition), and retention (price sensitivity in customers). They ran a series of tests: first to understand the speed improvement Skytree could offer relative to R, and second to understand how large a dataset (records and attributes) they could throw at us. They found an across-the-board speed improvement of anywhere between 22x and 80x for the various methods they were using. During the limited pilot, they increased the dataset from a couple of hundred thousand rows and 20 attributes to more than 450 million rows and 450 attributes. Skytree ran this dataset in less than 90 minutes; R, after running for over 24 hours, did not complete.
  6. The goal is to detect patterns and anomalies and take action before the user experience is negatively impacted. Using Skytree, we basically take all the data, put it together, and correlate events across those streams.