3.
Pipeline: Raw Time Series → Signal Normalization → Normalized Signals → Feature Extraction / Dimension Reduction → Temporal Features → Clustering / Outlier Detection → Clusters → Classification / Prediction → Decision Making
Techniques: signal processing techniques + ICA (Independent Component Analysis); K-means; Random Forest
Other inputs: Facts/Truth; Conditions, Unknowns; Adaptation Feedback
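The first stages of this pipeline can be sketched in plain Python. This is a minimal sketch under stated assumptions, not Skytree's implementation: z-scoring stands in for signal normalization, per-window mean and standard deviation stand in for the signal-processing/ICA feature step, and windows that land in very small K-means clusters are treated as outliers. The Random Forest classification stage is not shown, and all function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def normalize(series):
    """Signal normalization (assumed here to be a z-score of the raw series)."""
    mu, sigma = mean(series), pstdev(series)
    return [(x - mu) / sigma if sigma else 0.0 for x in series]


def window_features(signal, width):
    """Temporal features: (mean, std) of each non-overlapping window.
    A stand-in for the signal-processing + ICA step, not ICA itself."""
    starts = range(0, len(signal) - width + 1, width)
    return [(mean(signal[i:i + width]), pstdev(signal[i:i + width])) for i in starts]


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means; the first k points serve as initial centroids."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(mean(c) for c in zip(*members))
    return centroids, labels


def small_cluster_outliers(labels, min_size):
    """Outlier heuristic (an assumption): points in clusters smaller than min_size."""
    sizes = Counter(labels)
    return [i for i, lab in enumerate(labels) if sizes[lab] < min_size]


# Ten ordinary windows (quiet and noisy) followed by one spike window.
series = [1, 1, 1, 1, 0, 2, 0, 2] * 5 + [0, 0, 0, 20]
features = window_features(normalize(series), width=4)
centroids, labels = kmeans(features, k=2)
print(small_cluster_outliers(labels, min_size=2))  # [10]
```

The small-cluster rule is used because a single extreme point often becomes its own K-means centroid, so a plain "distance to nearest centroid" test would miss it.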
10.
Launched: 2012
Alexander Gray, Ph.D., Associate Professor, Georgia Tech
Berkeley, Carnegie Mellon, NASA Jet Propulsion Lab
Product: Software platform that provides enterprise-class machine learning for Big Data, letting data scientists and BI analysts create more accurate predictive models in less time.
Business Model: Freemium download, software subscription, node-based pricing model. On-premises or in-cloud deployment.
Technical Advisory Board (National Academy members):
• Prof. Michael Jordan, UC Berkeley: machine learning ‘godfather’
• Prof. David Patterson, UC Berkeley: systems (inventor of RISC, RAID)
• Prof. Pat Hanrahan, Stanford: data visualization (Tableau, Pixar)
• Prof. James Demmel, UC Berkeley: high-performance computing (LAPACK)
Academic Network: An extended scientific advisory council of top ML professors from about 20 universities (CMU, Princeton, Caltech, Purdue, Cambridge, etc.)
CONFIDENTIAL
13.
• Classification: predict categories and classes
• Regression: predict values and numbers
• Clustering: grouping and segmentation
• Density Estimation: detection and characterization
• Dimension Reduction: visualization and reduction
• Multidimensional Querying: find similar items
Example Skytree algorithms: Random Decision Forests, Gradient Boosting Machines, Nearest Neighbor, Kernel Density Estimation, K-means, Linear Regression, Support Vector Machine, 2-point Correlation, Decision Tree, Singular Value Decomposition, Range Search, Logistic Regression
Applications: Recommendations, Predictions, Outlier Detection
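Density estimation supports "detection and characterization" by scoring how typical each point is. The sketch below shows a naive 1-D Gaussian kernel density estimate and flags the lowest-density samples as outlier candidates. The bandwidth, the function names, and the lowest-density rule are illustrative assumptions, not Skytree's Kernel Density Estimation API. Scoring every sample this way costs O(n) per query, O(n²) overall, which is the kind of cost the later slides say Skytree's optimizations target.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a 1-D Gaussian kernel density estimate."""
    scale = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Naive evaluation: one kernel term per sample, so O(n) per query.
        return scale * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def lowest_density(samples, bandwidth, count=1):
    """Return the `count` samples with the lowest estimated density."""
    f = gaussian_kde(samples, bandwidth)
    return sorted(samples, key=f)[:count]


data = [1.0, 1.1, 0.9, 1.2, 1.0, 5.0]
print(lowest_density(data, bandwidth=0.5))  # [5.0]
```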
21.
Customer 360° View
Data sources: Internal Data (data warehouse, e-mail, CRM), External Data, Big Data environment
Single Customer View: improved decision-making capabilities based on customer data
Big Data: enabling innovative products and services, and customer satisfaction
Analytics: churn propensity and prevention, product sentiment, recommendations, and more
31.
• Deep knowledge of algorithms: drawing on the latest research from academia
• Smart programming: efficient ways to compute problems that are naively of order N² and N³
• Distributed systems: taking advantage of parallel computing speed
32.
Recommendation Engine (like Netflix)
Major Pain Points: Speed & Accuracy
Current Solution: Hadoop/Mahout, homegrown
Skytree Impact (Legacy vs. Skytree):
• Runtime: Legacy 97 min (5,820 sec); Skytree 4 sec
• Precision: Legacy 35%; Skytree 42%
Skytree Results:
• ~1500x execution speedup
• 20% improvement in recommendation relevance
“We are literally speechless” - Skytree Customer
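The headline figures follow from the raw numbers on this slide: 5,820 s / 4 s ≈ 1,455x (rounded to "~1500x"), and a precision gain from 35% to 42% is 7 points, or 20% relative to the legacy value. A minimal sketch of that arithmetic (the helper names are hypothetical):

```python
def speedup(legacy_seconds: float, new_seconds: float) -> float:
    """How many times faster the new run is than the legacy run."""
    return legacy_seconds / new_seconds


def relative_improvement(old: float, new: float) -> float:
    """Gain expressed as a fraction of the old value (0.2 means 20%)."""
    return (new - old) / old


print(f"runtime speedup: {speedup(97 * 60, 4):.0f}x")        # runtime speedup: 1455x
print(f"precision gain: {relative_improvement(35, 42):.0%}")  # precision gain: 20%
```

Note that "20% improvement" is relative; in absolute terms precision rose by 7 percentage points.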
33.
Micro-Targeting Application
Major Pain Points: Speed & Accuracy
Current Solution: SAS, Hadoop, homegrown
“I want our analysts to create models with Skytree rather than writing software” - Skytree Customer
Skytree Impact (Legacy vs. Skytree):
• Legacy environment: 100-node Hadoop cluster (1,200 cores); runtime: 100 minutes; accuracy (Gini): 57%
• Skytree Server: single server (12 cores); runtime: 8 minutes; accuracy (Gini): 60%
• 12.5x improvement on 1 node; ~1200x expected improvement on 100 nodes
• 5% improvement in accuracy
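The slide reports accuracy as a Gini value without defining it. A common convention in targeting and credit models is the normalized Gini coefficient, Gini = 2·AUC - 1, where AUC is the area under the ROC curve; reading the slide this way is an assumption. Under any definition, 57% to 60% is a 3-point gain, about 5.3% relative, consistent with the "5% improvement" claim, and 100 min / 8 min gives the 12.5x runtime figure. A minimal sketch, with hypothetical function names:

```python
def auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a random
    negative (ties count one half). Pairwise, so O(P * N); a sort-based
    rank computation brings this down to O(n log n)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(labels, scores):
    """Normalized Gini coefficient (assumed definition): 2 * AUC - 1."""
    return 2 * auc(labels, scores) - 1


print(gini([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.5
```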
35.
Skytree Impact
Application: Profit optimization through
• Loss Prediction
• Binding
• Retention
• Price Elasticity
Major Pain Points: Speed & Scale
Current Solution: R, Hadoop, homegrown
R vs. Skytree:
• Speed-up: Test Suite 1: 20-88x execution speed-up on the same data sets
• Scale: Test 2: >50,000x increase in data size, run to completion
Using up to 450 million rows and 450 attributes
39. Architected for Speed and Accuracy
Diagram: Machine Learning Algorithms, Deeply Optimized → In-Memory Execution on each node's CPUs → PARALLELIZE across nodes, linked by Skytree Fast Internode Communication
• n, n log(n) calculations versus n² and n³
• Hadoop scaling w/ TrueScale™
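Parallelizing across nodes works best for computations that split into independent per-shard work plus a cheap combine step, so that little data has to cross the interconnect. A minimal sketch of that shard-and-combine pattern for a mean and variance, assuming round-robin sharding and a hypothetical `parallel_mean_var` helper (this is not Skytree's internode protocol). Python threads here only illustrate the pattern, since the GIL prevents true CPU parallelism for pure-Python work; on a cluster each shard would stay in memory on its own node and only three numbers per shard would be sent.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(shard):
    """Per-shard work: count, sum, and sum of squares."""
    return len(shard), sum(shard), sum(x * x for x in shard)


def parallel_mean_var(data, n_workers=4):
    """Population mean and variance from combined per-shard partial sums."""
    shards = [data[i::n_workers] for i in range(n_workers)]  # round-robin split
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(partial_stats, shards))
    # Combine step: only the small (count, sum, sumsq) tuples are merged.
    n = sum(p[0] for p in parts)
    s = sum(p[1] for p in parts)
    ss = sum(p[2] for p in parts)
    mu = s / n
    return mu, ss / n - mu * mu


print(parallel_mean_var(list(range(10))))  # (4.5, 8.25)
```

The sum-of-squares formula is simple but can lose precision on large or offset data; a numerically stable pairwise combine (Chan et al.'s update) is the usual alternative.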
40. Speed & Efficiency
Chart: GBTR, single node, 13 million rows (time in 1000s of seconds): Skytree compared with Scikit-learn (26x), R (128x), and MLlib (153x)
Chart: GBTR, multi-node, 10M-100M rows (time in 1000s of seconds), single node vs. 8 nodes: MLlib 71x slower than Skytree; MLlib did not complete
Chart: Skytree deep optimizations, O(n²), O(n³) vs. O(n), O(n log(n)): time vs. n (n up to 50,000)
43. Bigger Data. Better Insights.
Skytree: Machine Learning Built for the Enterprise
Contact: kosena21@naver.com, 010-9338-6400
Editor's Notes
Data Analysts, freeing PhDs to focus on high-leverage challenges
We are literally speechless
I want our analysts to create models with Skytree rather than writing software
Skytree opened our eyes to what was possible with ML. It changed everyone at Risk Management by providing deeper, personalized understanding of our customers. We have been working on fraud detection since 1990 and now have a 10% lift using Skytree. That's a big deal.
Business Impact: Chargebacks greatly reduced
Operational Impact: 300x shorter threat response time
Financial Impact: $50M+ savings annually
Another customer, a Fortune 100 insurance company, was looking to optimize profitability by combining prediction of losses (frequency and severity), binding (customer acquisition), and retention (customers' price sensitivity).
They ran a series of tests: first, to measure the speed improvement Skytree could offer relative to R; second, to see how large a dataset (records and attributes) they could run. Across the various methods they used, they found speed improvements of 22x to 80x. During the limited pilot, they grew the dataset from a couple of hundred thousand rows and 20 attributes to more than 450 million rows and 450 attributes. Skytree processed this dataset in less than 90 minutes; R, after running for over 24 hours, did not complete.
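The ">50,000x increase in data size" on slide 35 can be checked against the numbers in this note, measuring size as rows × attributes and taking "a couple of hundred thousand rows" as 200,000 (an assumption; a larger starting row count would shrink the ratio):

```latex
\frac{(450 \times 10^{6}\ \text{rows}) \times (450\ \text{attributes})}
     {(2 \times 10^{5}\ \text{rows}) \times (20\ \text{attributes})}
= \frac{2.025 \times 10^{11}}{4 \times 10^{6}}
\approx 5.06 \times 10^{4}
```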
The goal is to detect patterns and anomalies and take action before the user experience is negatively impacted. Using Skytree, we basically take all the data, put it together, and correlate events across those streams.
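One simple way to "correlate events across streams" is to bucket each stream's event timestamps into fixed time windows and compute the Pearson correlation of the per-window counts. This is a minimal sketch under that assumption; the stream contents, the 60-second window, and the helper names are hypothetical, and the note does not say which correlation method Skytree's customer used.

```python
import math
from collections import Counter


def bucket_counts(timestamps, width, n_buckets):
    """Count events per fixed-width time window (timestamps in seconds)."""
    counts = Counter(int(t // width) for t in timestamps)
    return [counts.get(b, 0) for b in range(n_buckets)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Two hypothetical event streams that both spike in the second minute.
errors = bucket_counts([5, 61, 62, 63, 130], width=60, n_buckets=3)  # [1, 3, 1]
latency = bucket_counts([10, 70, 80, 150], width=60, n_buckets=3)    # [1, 2, 1]
print(round(pearson(errors, latency), 3))  # 1.0
```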