Parameter Server Approach for
Online Learning @ Twitter
Joe Xie, Yong Wang and Yue Lu
ML Infra Group, Ads Prediction Team
Oct 10, 2017
Outline
• Background
  - Online learning
  - Challenges
• Parameter Server Approaches
  - v1.0: Decouple the training and prediction
  - v2.0: Scale the training
  - v3.0: Scale the model
• Future Directions
Background
Twitter is Realtime
• Twitter is all about real time: news, events, trends, hashtags.
  - Users' interests and intent change in real time.
  - Context changes in real time.
  - New advertisers and new campaigns are added in real time.
• ML is increasingly at the core of everything we build at Twitter.
  - ML models dynamically adapt to changes that span as little as a few hours, or even minutes.
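Figure: Real time: online learning vs. offline learning. Each panel shows a model on a timeline between a data stream and a prediction stream. Online learning has a single learning phase in which the model is read and written continuously. Offline learning separates a training phase, which writes the model, from a serving phase, which reads it.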
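The following minimal Python sketch illustrates the online-learning side of this comparison. Each event is scored first and then used to update the model, so the same model is read and written as the stream flows. It assumes sparse logistic regression trained with plain SGD on log loss. The deck does not name the optimizer, and the names `OnlineLogisticRegression` and `run_stream` are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticRegression:
    """Sparse logistic regression updated one example at a time with plain SGD (assumed)."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.weights = {}  # feature id -> weight; unseen features count as 0
        self.bias = 0.0

    def predict(self, features):
        """Return P(label = 1) for a sparse {feature_id: value} example (read)."""
        z = self.bias + sum(self.weights.get(f, 0.0) * v for f, v in features.items())
        return sigmoid(z)

    def update(self, features, label):
        """One SGD step on log loss for a 0/1 label (write)."""
        error = self.predict(features) - label  # d(log loss)/dz
        self.bias -= self.learning_rate * error
        for f, v in features.items():
            self.weights[f] = self.weights.get(f, 0.0) - self.learning_rate * error * v


def run_stream(model, stream):
    """Online learning: score each event first (prediction stream), then learn from its label."""
    predictions = []
    for features, label in stream:
        predictions.append(model.predict(features))
        model.update(features, label)
    return predictions
```

Scoring before updating mirrors the prediction stream. Every prediction uses the model as it stood before that event's label arrived.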
Real Time: Online Learning Architecture
Simple and efficient for the Ads Prediction and Moments Relevance production services
Challenges
• Network fan-out
  - The same traffic stream is sent over and over to every prediction instance, wasting network bandwidth.
• Limit on training traffic size
  - Online training throughput is currently limited by the capacity (CPU / network bandwidth) of a single Mesos worker.
• Limit on model size
  - All models are hosted in the memory of each instance.
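The following is a back-of-the-envelope sketch of the fan-out problem that counts messages only. It assumes two designs:

- In the original design, every prediction instance receives every training event.
- In a decoupled design (as in v1.0 below), events reach the training side once, and each prediction instance receives only batched model updates.

The batching and the function names are hypothetical. This toy model is not how the reported savings were measured.

```python
import math


def messages_shared_stream(events, prediction_instances):
    """Original design (assumed): the full training stream is copied to every prediction instance."""
    return events * prediction_instances


def messages_decoupled(events, prediction_instances, events_per_update):
    """Decoupled design (assumed): events are consumed once by the training side, and
    each prediction instance receives one model-update message per batch of events."""
    updates = math.ceil(events / events_per_update)
    return events + updates * prediction_instances
```

In the shared-stream design, the cost grows with events times instances. In the decoupled design, the per-instance cost grows with the number of update batches instead.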
Parameter Server Approaches
Model Architecture
Figure: raw features pass through several components (raw features as-is, feature crosses, decision trees such as XGBoost, neural networks such as Torch or TensorFlow, ...). Their outputs feed a distributed large-scale online logistic regression hosted on the parameter server.
• Fully explores feature interactions without training-latency constraints.
• Historically, feature interactions don't change frequently.
• Flexible architecture that accommodates new model structures and external machine learning frameworks.
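The following minimal Python sketch shows one way the layers of this diagram might connect. It turns raw features, pairwise feature crosses, and decision-tree leaf indices into hashed sparse feature ids, then scores them with a logistic regression. Several details are assumptions for illustration, because the deck does not describe the encoding:

- representing tree outputs as leaf ids,
- hashing with SHA-256 into a fixed number of buckets,
- the function names `bucket`, `build_sparse_features` and `score`.

```python
import hashlib
import itertools
import math


def bucket(key, num_buckets):
    """Deterministically hash a feature string into one of num_buckets ids."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def build_sparse_features(raw, tree_leaves, num_buckets=2**20):
    """Map raw features, their pairwise crosses, and tree leaf ids to {feature_id: value}."""
    keys = [f"{name}={value}" for name, value in sorted(raw.items())]
    crosses = [f"{a}&{b}" for a, b in itertools.combinations(keys, 2)]
    leaves = [f"tree{t}_leaf{leaf}" for t, leaf in enumerate(tree_leaves)]
    features = {}
    for key in keys + crosses + leaves:
        fid = bucket(key, num_buckets)
        features[fid] = features.get(fid, 0.0) + 1.0  # colliding keys share one id
    return features


def score(weights, bias, features):
    """Logistic regression over sparse features; unknown ids have weight 0."""
    z = bias + sum(weights.get(fid, 0.0) * value for fid, value in features.items())
    z = max(-35.0, min(35.0, z))  # clamp to keep math.exp in range
    return 1.0 / (1.0 + math.exp(-z))
```

One reading of the slide's reasoning is as follows. Crosses and trees can be built without a training-latency budget because feature interactions change slowly. Only the final logistic regression then has to learn online, and that is the part the parameter server hosts.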
• 20X training data: Parameter Server v2.0 scales the training traffic.
• 10X features and algorithm complexity: Parameter Server v3.0 scales the model size.
• 10X prediction QPS: Parameter Server v1.0 decouples the training and prediction requests.
Parameter Server v1.0
Figure: Parameter Server v1.0 architecture. Downsampled training traffic feeds a training worker, which produces model updates. Observation workers (observation service) and prediction workers pull the model. Each prediction service instance holds a model copy and serves prediction requests.
• Through a new architecture that decouples the training and prediction services into separate clusters:
  - 10X prediction capacity
  - Higher serving efficiency
Parameter Server v1.0
• Separate training service
  - Takes training traffic and generates incremental model updates.
• New observation service
  - Consumes incremental model updates.
  - Evaluates training traffic for model quality assurance.
• Separate prediction service
  - Consumes incremental model updates.
  - Serves prediction requests.
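The following minimal Python sketch illustrates the v1.0 split:

- A training service learns from traffic and publishes versioned incremental updates.
- Observation or prediction replicas only apply those updates and never train.

It assumes SGD on log loss, no bias term (for brevity), and updates that carry the latest absolute value of each weight touched since the last publish. The class names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ModelUpdate:
    version: int
    weights: dict  # feature id -> latest absolute weight, only for features touched


def sigmoid(z):
    z = max(-35.0, min(35.0, z))  # clamp to keep math.exp in range
    return 1.0 / (1.0 + math.exp(-z))


class TrainingService:
    """Learns from training traffic and emits incremental model updates."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.weights = {}
        self.dirty = set()  # features changed since the last published update
        self.version = 0

    def train(self, features, label):
        """One SGD step on log loss (assumed) for a sparse {feature_id: value} example."""
        z = sum(self.weights.get(f, 0.0) * v for f, v in features.items())
        error = sigmoid(z) - label
        for f, v in features.items():
            self.weights[f] = self.weights.get(f, 0.0) - self.learning_rate * error * v
            self.dirty.add(f)

    def publish(self):
        """Package only the touched weights as the next incremental update."""
        self.version += 1
        update = ModelUpdate(self.version, {f: self.weights[f] for f in self.dirty})
        self.dirty.clear()
        return update


class ModelReplica:
    """Observation or prediction replica: applies updates in order, never trains."""

    def __init__(self):
        self.weights = {}
        self.version = 0

    def apply(self, update):
        if update.version != self.version + 1:
            raise ValueError(f"expected version {self.version + 1}, got {update.version}")
        self.weights.update(update.weights)
        self.version = update.version

    def predict(self, features):
        return sigmoid(sum(self.weights.get(f, 0.0) * v for f, v in features.items()))
```

Shipping the latest absolute value of each touched weight keeps an update meaningful on its own. The version check is one simple way to catch dropped or reordered updates.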
Parameter Server v1.0
• Launched for ads engagement prediction models.
  - Mesos efficiency: 40% reduction in required CPU cores.
  - Network efficiency: 60% reduction in required fan-out messages.
Parameter Server v2.0
Figure: Parameter Server v2.0 architecture. Un-sampled training traffic (no downsampling) goes to dispatch workers, which spread it across multiple training workers. Training workers push updates to the parameter server and pull the model from it. Observation workers and prediction workers pull the model, and each prediction service instance holds a model copy and serves prediction requests.
• Through a new architecture that distributes the training:
  - 20X training data
  - Higher model quality
Parameter Server v2.0
• New dispatch service
  - Takes un-sampled training traffic and dispatches it to the training service.
• Updated training service
  - Takes training traffic and produces updates for the parameter service.
  - Receives model updates from the parameter service.
• New parameter service
  - Aggregates the updates from the training services.
  - Sends model updates to the training, observation, and prediction services.
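The following minimal Python sketch illustrates the dispatch step. It routes each un-sampled training event to one training worker by a stable hash of a routing key, so the same key always lands on the same worker. The choice of key (for example, a user or request id), hash partitioning instead of round-robin, and the `Dispatcher` name are all assumptions. The deck does not say how traffic is split.

```python
import hashlib


class Dispatcher:
    """Spreads un-sampled training events across training workers by a stable key hash."""

    def __init__(self, num_workers):
        self.queues = [[] for _ in range(num_workers)]  # stand-ins for per-worker channels

    def worker_for(self, key):
        """Stable across processes, unlike Python's salted built-in hash() for str."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.queues)

    def dispatch(self, key, event):
        worker = self.worker_for(key)
        self.queues[worker].append(event)
        return worker
```

Hashing keeps all events for a key on one worker, which can help if a worker keeps per-key state. Round-robin would balance load more evenly when no such affinity is needed.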
Parameter Server v2.0
• Launched for ads engagement prediction models.
• The first version uses simple model-average aggregation.
  - 20x training capacity
  - xx% model quality gain
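The deck says only that the first v2.0 release aggregates with simple model averaging. The following minimal Python sketch shows one synchronous round under these assumptions:

1. Every training worker pulls the same global model.
2. Each worker runs SGD on its own shard of the dispatched traffic.
3. Each worker pushes the weights it changed.
4. The parameter service averages the pushed weights, and the result is what every service would pull next.

Synchronous rounds, SGD on log loss, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    z = max(-35.0, min(35.0, z))  # clamp to keep math.exp in range
    return 1.0 / (1.0 + math.exp(-z))


def local_train(global_weights, shard, learning_rate=0.1):
    """A training worker: start from the pulled global model, run SGD over its shard,
    and return only the weights it touched (the push)."""
    weights = dict(global_weights)
    touched = set()
    for features, label in shard:
        z = sum(weights.get(f, 0.0) * v for f, v in features.items())
        error = sigmoid(z) - label
        for f, v in features.items():
            weights[f] = weights.get(f, 0.0) - learning_rate * error * v
            touched.add(f)
    return {f: weights[f] for f in touched}


def average_models(global_weights, pushes):
    """Parameter service: simple model averaging. A worker that did not touch a
    feature still holds the global value for it, so that value enters the mean."""
    new_weights = dict(global_weights)
    for f in set().union(*pushes):
        base = global_weights.get(f, 0.0)
        new_weights[f] = sum(p.get(f, base) for p in pushes) / len(pushes)
    return new_weights


def training_round(global_weights, shards, learning_rate=0.1):
    """One synchronous round: all workers train from the same model, then average."""
    pushes = [local_train(global_weights, shard, learning_rate) for shard in shards]
    return average_models(global_weights, pushes)
```

Pushing only touched weights keeps messages sparse. Using the global value as the default for untouched features makes the result equal to averaging full model copies.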
Parameter Server v3.0
Figure: Parameter Server v3.0 architecture. The data flow matches v2.0 (dispatch workers, training workers, observation workers, prediction workers), but the single parameter server is replaced by multiple parameter server instances.
• Through a new architecture for model / feature sharding:
  - More complex models
  - Higher model quality
Parameter Server v3.0
• Updated parameter service (in progress)
  - Model sharding: each parameter instance hosts a single model instead of multiple models.
    - xx% model quality gain in experimentation.
  - Feature sharding: each parameter instance hosts part of a single model.
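The following minimal Python sketch covers both sharding modes:

- Model sharding places each model on exactly one parameter instance.
- Feature sharding splits one model's weights across instances by a stable hash of the feature id. Pushes and pulls are grouped by owning shard.

Several details are assumptions: the model names in the test, in-process dicts standing in for parameter instances, and additive deltas on push (v2.0 used model averaging, and a sharded service could apply either). The names `assign_models` and `FeatureShardedModel` are hypothetical.

```python
import hashlib


def stable_hash(key):
    """Process-independent hash (Python's built-in hash() is salted per process for str)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def assign_models(model_names, num_instances):
    """Model sharding: each model lives on exactly one parameter instance."""
    return {name: stable_hash(name) % num_instances for name in model_names}


class FeatureShardedModel:
    """Feature sharding: one model's weights split across parameter instances."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]  # stand-ins for parameter instances

    def shard_of(self, feature):
        return stable_hash(str(feature)) % len(self.shards)

    def push(self, deltas):
        """Group additive updates by owning shard, then apply each group."""
        grouped = {}
        for feature, delta in deltas.items():
            grouped.setdefault(self.shard_of(feature), {})[feature] = delta
        for shard_id, part in grouped.items():
            shard = self.shards[shard_id]
            for feature, delta in part.items():
                shard[feature] = shard.get(feature, 0.0) + delta

    def pull(self, features):
        """Fetch current weights for the requested features from their shards."""
        return {f: self.shards[self.shard_of(f)].get(f, 0.0) for f in features}
```

Plain modulo placement moves most keys when the number of instances changes. Consistent hashing is a common alternative if shards are added or removed often.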
Future Directions
Future Work