際際滷

際際滷Share a Scribd company logo
SEARCH, NETWORK AND ANALYTICS
Using Set Cover to Optimize a Large-
Scale Low Latency Distributed Graph
LinkedIn Presentation Template
息2013 LinkedIn Corporation. All Rights Reserved. 1
Rui Wang
Staff Software Engineer
SEARCH, NETWORK AND ANALYTICS
Outline
 LinkedIns Distributed Graph Infrastructure
 The Scaling Problem
 Set Cover by Example
 Evaluation
 Related Work
 Conclusions and Future Work
息2013 LinkedIn Corporation. All Rights Reserved. 2
SEARCH, NETWORK AND ANALYTICS
Outline
 LinkedIns Distributed Graph Infrastructure
 The Scaling Problem
 Set Cover by Example
 Evaluation
 Related Work
 Conclusions and Future Work
息2013 LinkedIn Corporation. All Rights Reserved. 3
SEARCH, NETWORK AND ANALYTICS
LinkedIn Products Backed By Social Graph
息2013 LinkedIn Corporation. All Rights Reserved. 4
SEARCH, NETWORK AND ANALYTICS
LinkedIns Distributed Graph APIs
 Graph APIs
 Get Connections
 Who does member X know?
 Get Shared Connections
 Who do I know in common with member Y?
 Get Graph Distances
 Whats the graph distances for each member returned from a search
query?
 Is member Z within my second degree network so that I can view her
profile?
 Over 50% queries is to get graph distances
息2013 LinkedIn Corporation. All Rights Reserved. 5
SEARCH, NETWORK AND ANALYTICS
Distributed Graph Architecture Diagram
息2013 LinkedIn Corporation. All Rights Reserved. 6
SEARCH, NETWORK AND ANALYTICS
LinkedIns Distributed Graph Infrastructure
 Graph Database (Graph DB)
 Partitioned and replicated graph database
 Distributed adjacency list
 Distributed Network Cache (NCS)
 LRU cache stores second degree network for active members
 Graph traversals are converted to set intersections
 Graph APIs
 Get Connections
 Get Shared Connections
 Get Graph Distances
息2013 LinkedIn Corporation. All Rights Reserved. 7
SEARCH, NETWORK AND ANALYTICS
Outline
 LinkedIns Distributed Graph Infrastructure
 The Scaling Problem
 Set Cover by Example
 Evaluation
 Related Work
 Conclusions and Future Work
息2013 LinkedIn Corporation. All Rights Reserved. 8
SEARCH, NETWORK AND ANALYTICS
Graph Distance Queries and Second Degree Creation
息2013 LinkedIn Corporation. All Rights Reserved. 9
get graph distances
cache lookup
[exists=false] retrieve
1st degree connections
[exists=true] set
intersections
Partition IDs
partial merges
K-Way
merges
return
return
SEARCH, NETWORK AND ANALYTICS
The Scaling Problem Illustrated
息2013 LinkedIn Corporation. All Rights Reserved. 10
Graph DBNCS
 Increased Query Distribution
 Merging and De-duping shift to single NCS node
SEARCH, NETWORK AND ANALYTICS
Outline
 LinkedIns Distributed Graph Infrastructure
 The Scaling Problem
 Set Cover and Example
 Evaluation
 Related Work
 Conclusions and Future Work
息2013 LinkedIn Corporation. All Rights Reserved. 11
SEARCH, NETWORK AND ANALYTICS
Set Cover Problem
 Definition
 Given a set of elements U = {1, 2, , m } (called the universe) and a
family S of n sets whose union equals the universe, the set cover
problem is to identify the smallest subset of S the union of which
contains all elements in the universe.
 Greedy Algorithm
 Rule to select sets
 At each stage, choose the set that contains the largest number of
uncovered elements.
 Upper bound
 O(log s), where s is the size of the largest set from S
息2013 LinkedIn Corporation. All Rights Reserved. 12
SEARCH, NETWORK AND ANALYTICS
Modified Set Cover Algorithm Example Cont.
息2013 LinkedIn Corporation. All Rights Reserved. 13
 Build a map of partition ID -> Graph DB nodes for the input graph
SEARCH, NETWORK AND ANALYTICS
Modified Set Cover Algorithm Example Cont.
息2013 LinkedIn Corporation. All Rights Reserved. 14
 Randomly pick an element
from U = { 1,2,5,6 }
 e = 5
 Retrieve from map
 nodes = { R21, R13 }
 Intersect
 R21 = {1,5 } with U,
 R13 = { 5,6 } with U
 Select R21
 U = {2,6}, C = { R21}
SEARCH, NETWORK AND ANALYTICS
Modified Set Cover Algorithm Example Cont.
息2013 LinkedIn Corporation. All Rights Reserved. 15
 Randomly pick an element
from U = { 2,6 }
 e = 2
 Retrieve from map
 nodes = { R11, R22 }
 Intersect
 R11 = {1,2 } with U,
 R22 = { 2,4 } with U
 Select R22
 U = {6}, C = { R21, R22 }
SEARCH, NETWORK AND ANALYTICS
Modified Set Cover Algorithm Example Cont.
息2013 LinkedIn Corporation. All Rights Reserved. 16
 Pick the final element from
U = { 6 }
 e = 6
 Retrieve from map
 nodes = { R23, R13 }
 Intersect
 R23 = { 3,6 } with U,
 R13 = { 5,6 } with U
 Select R23
 U = {}, C = {R21, R22, R23 }
SEARCH, NETWORK AND ANALYTICS
Modified Set Cover Algorithm Example Solution
 Solution Compared to Optimal Result for U , where U = {1,2,5,6}
息2013 LinkedIn Corporation. All Rights Reserved. 17
SEARCH, NETWORK AND ANALYTICS
Outline
 LinkedIns Distributed Graph Infrastructure
 The Scaling Problem
 Set Cover by Example
 Evaluation
 Related Work
 Conclusions and Future Work
息2013 LinkedIn Corporation. All Rights Reserved. 18
SEARCH, NETWORK AND ANALYTICS
Evaluation
 Production Results
 Second degree cache creation drops 38% on 99th percentile
 Graph distance calculation drops 25% on 99th percentile
 Outbound traffic drops 40%; Inbound traffic drops 10%
息2013 LinkedIn Corporation. All Rights Reserved. 19
95th quantile 99th quantile
0
1
2
3
control setcover control setcover
latency(normalized)
network cache building
95th quantile 99th quantile
0
1
2
3
control setcover control setcover
latency(normalized)
getdistances
SEARCH, NETWORK AND ANALYTICS
Outline
 LinkedIns Distributed Graph Infrastructure
 The Scaling Problem
 Set Cover by Example
 Evaluation
 Related Work
 Conclusions and Future Work
息2013 LinkedIn Corporation. All Rights Reserved. 20
SEARCH, NETWORK AND ANALYTICS
Related Work
 Scaling through Replications
 Collocating neighbors [Pujol2010]
 Replication based on read/write frequencies [Mondal2012]
 Replication based on locality [Carrasco2011]
 Multi-Core Implementations
 Parallel graph exploration [Hong2011]
 Offline Graph Systems
 Googles Pregel [Malewicz2010]
 Distributed GraphLab [Low2012]
息2013 LinkedIn Corporation. All Rights Reserved. 21
SEARCH, NETWORK AND ANALYTICS
Outline
 LinkedIns Distributed Graph Infrastructure
 The Scaling Problem
 Set Cover by Example
 Evaluation
 Related Work
 Conclusions and Future Work
息2013 LinkedIn Corporation. All Rights Reserved. 22
SEARCH, NETWORK AND ANALYTICS
Conclusions and Future Work
 Future Work
 Incremental cache updates
 Replication on GraphDB nodes through LRU caching
 New graph traversal algorithms
 Conclusions
 Key challenges tackled
 Work distribution balancing
 Communication Bandwidth
 Set cover optimized latency for distributed query by
 Identifying a much smaller set of GraphDB nodes serving queries
 Shifting workload to GraphDB to utilize parallel powers
 Alleviating the K-way merging costs on NCS by reducing K
 Available at
https://github.com/linkedin-
sna/norbert/blob/master/network/src/main/scala/com/linkedin/norbert/network/partitioned
/loadbalancer/DefaultPartitionedLoadBalancerFactory.scala
息2013 LinkedIn Corporation. All Rights Reserved. 23
SEARCH, NETWORK AND ANALYTICS息2013 LinkedIn Corporation. All Rights Reserved. 24
Ad

Recommended

PDF
Neo4j Graph Data Science Training - June 9 & 10 - 際際滷s #6 Graph Algorithms
Neo4j
PPTX
From a sea of projects to collaboration opportunities within seconds
Michel Drescher
PPT
Artificial Intelligence
Institute of Technology Telkom
PDF
Poster
Kevin Razavet
PDF
50120130406017
IAEME Publication
PPTX
FDS_dept_ppt.pptx
SatyajitPatil42
PDF
Improve ml predictions using graph algorithms (webinar july 23_19).pptx
Neo4j
PDF
Government GraphSummit: Optimizing the Supply Chain
Neo4j
PPTX
Optimizing Your Supply Chain with Neo4j
Neo4j
PPTX
Unit 3- Software Design.pptx
LSURYAPRAKASHREDDY
PPTX
Introduction to Mahout given at Twin Cities HUG
MapR Technologies
PPTX
Introduction to Mahout
Ted Dunning
PPTX
K anonymity for crowdsourcing database
LeMeniz Infotech
PDF
Using Linkurious in your Enterprise Architecture projects
Linkurious
PPTX
Microtask Crowdsourcing Applications for Linked Data
EUCLID project
PDF
GraphTour 2020 - Graphs & AI: A Path for Data Science
Neo4j
PDF
IRJET- A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...
IRJET Journal
PPTX
Using Connected Data and Graph Technology to Enhance Machine Learning and Art...
Neo4j
PDF
The Neo4j Data Platform for Today & Tomorrow.pdf
Neo4j
PDF
El camino hacia el 辿xito con las bases de datos de grafos, la ciencia de dato...
Neo4j
PDF
A simulation-based approach for straggler tasks detection in Hadoop MapReduce
IRJET Journal
PDF
Workshop: Introduction to Cytoscape at UT-KBRIN Bioinformatics Summit 2014 (4...
Keiichiro Ono
PDF
Metric Recovery from Unweighted k-NN Graphs
joisino
PDF
A Literature Survey on Image Linguistic Visual Question Answering
IRJET Journal
PDF
Neo4j GraphTalk Helsinki - Next-Gerneation Telecommunication Solutions with N...
Neo4j
PDF
Partial Object Detection in Inclined Weather Conditions
IRJET Journal
PDF
Optimizing Your Supply Chain with the Neo4j Graph
Neo4j
PDF
High-Performance Graph Analysis and Modeling
Nesreen K. Ahmed
PDF
Redefining Work in the Age of AI - What to expect? How to prepare? Why it mat...
Malinda Kapuruge
PPTX
CapCut Pro Crack For PC Latest Version {Fully Unlocked} 2025
pcprocore

More Related Content

Similar to Using Set Cover to Optimize a Large-Scale Low Latency Distributed Graph (20)

PPTX
Optimizing Your Supply Chain with Neo4j
Neo4j
PPTX
Unit 3- Software Design.pptx
LSURYAPRAKASHREDDY
PPTX
Introduction to Mahout given at Twin Cities HUG
MapR Technologies
PPTX
Introduction to Mahout
Ted Dunning
PPTX
K anonymity for crowdsourcing database
LeMeniz Infotech
PDF
Using Linkurious in your Enterprise Architecture projects
Linkurious
PPTX
Microtask Crowdsourcing Applications for Linked Data
EUCLID project
PDF
GraphTour 2020 - Graphs & AI: A Path for Data Science
Neo4j
PDF
IRJET- A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...
IRJET Journal
PPTX
Using Connected Data and Graph Technology to Enhance Machine Learning and Art...
Neo4j
PDF
The Neo4j Data Platform for Today & Tomorrow.pdf
Neo4j
PDF
El camino hacia el 辿xito con las bases de datos de grafos, la ciencia de dato...
Neo4j
PDF
A simulation-based approach for straggler tasks detection in Hadoop MapReduce
IRJET Journal
PDF
Workshop: Introduction to Cytoscape at UT-KBRIN Bioinformatics Summit 2014 (4...
Keiichiro Ono
PDF
Metric Recovery from Unweighted k-NN Graphs
joisino
PDF
A Literature Survey on Image Linguistic Visual Question Answering
IRJET Journal
PDF
Neo4j GraphTalk Helsinki - Next-Gerneation Telecommunication Solutions with N...
Neo4j
PDF
Partial Object Detection in Inclined Weather Conditions
IRJET Journal
PDF
Optimizing Your Supply Chain with the Neo4j Graph
Neo4j
PDF
High-Performance Graph Analysis and Modeling
Nesreen K. Ahmed
Optimizing Your Supply Chain with Neo4j
Neo4j
Unit 3- Software Design.pptx
LSURYAPRAKASHREDDY
Introduction to Mahout given at Twin Cities HUG
MapR Technologies
Introduction to Mahout
Ted Dunning
K anonymity for crowdsourcing database
LeMeniz Infotech
Using Linkurious in your Enterprise Architecture projects
Linkurious
Microtask Crowdsourcing Applications for Linked Data
EUCLID project
GraphTour 2020 - Graphs & AI: A Path for Data Science
Neo4j
IRJET- A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...
IRJET Journal
Using Connected Data and Graph Technology to Enhance Machine Learning and Art...
Neo4j
The Neo4j Data Platform for Today & Tomorrow.pdf
Neo4j
El camino hacia el 辿xito con las bases de datos de grafos, la ciencia de dato...
Neo4j
A simulation-based approach for straggler tasks detection in Hadoop MapReduce
IRJET Journal
Workshop: Introduction to Cytoscape at UT-KBRIN Bioinformatics Summit 2014 (4...
Keiichiro Ono
Metric Recovery from Unweighted k-NN Graphs
joisino
A Literature Survey on Image Linguistic Visual Question Answering
IRJET Journal
Neo4j GraphTalk Helsinki - Next-Gerneation Telecommunication Solutions with N...
Neo4j
Partial Object Detection in Inclined Weather Conditions
IRJET Journal
Optimizing Your Supply Chain with the Neo4j Graph
Neo4j
High-Performance Graph Analysis and Modeling
Nesreen K. Ahmed

Recently uploaded (20)

PDF
Redefining Work in the Age of AI - What to expect? How to prepare? Why it mat...
Malinda Kapuruge
PPTX
CapCut Pro Crack For PC Latest Version {Fully Unlocked} 2025
pcprocore
PDF
Quantum AI Discoveries: Fractal Patterns Consciousness and Cyclical Universes
Saikat Basu
PPTX
01_Approach Cyber- DORA Incident Management.pptx
FinTech Belgium
PDF
How to Visualize the Spatio-Temporal Data Using CesiumJS
SANGHEE SHIN
PDF
Darley - FIRST Copenhagen Lightning Talk (2025-06-26) Epochalypse 2038 - Time...
treyka
PDF
From Chatbot to Destroyer of Endpoints - Can ChatGPT Automate EDR Bypasses (1...
Priyanka Aash
PDF
Enhancing Environmental Monitoring with Real-Time Data Integration: Leveragin...
Safe Software
PDF
Why aren't you using FME Flow's CPU Time?
Safe Software
PDF
2025_06_18 - OpenMetadata Community Meeting.pdf
OpenMetadata
PPTX
Paycifi - Programmable Trust_Breakfast_PPTXT
FinTech Belgium
PDF
Python Conference Singapore - 19 Jun 2025
ninefyi
PPTX
MARTSIA: A Tool for Confidential Data Exchange via Public Blockchain - Pitch ...
Michele Kryston
PDF
Plugging AI into everything: Model Context Protocol Simplified.pdf
Abati Adewale
PPSX
Usergroup - OutSystems Architecture.ppsx
Kurt Vandevelde
PDF
EIS-Webinar-Engineering-Retail-Infrastructure-06-16-2025.pdf
Earley Information Science
PDF
Database Benchmarking for Performance Masterclass: Session 2 - Data Modeling ...
ScyllaDB
PDF
Unlocking FME Flows Potential: Architecture Design for Modern Enterprises
Safe Software
PPTX
Practical Applications of AI in Local Government
OnBoard
PDF
Automating the Geo-Referencing of Historic Aerial Photography in Flanders
Safe Software
Redefining Work in the Age of AI - What to expect? How to prepare? Why it mat...
Malinda Kapuruge
CapCut Pro Crack For PC Latest Version {Fully Unlocked} 2025
pcprocore
Quantum AI Discoveries: Fractal Patterns Consciousness and Cyclical Universes
Saikat Basu
01_Approach Cyber- DORA Incident Management.pptx
FinTech Belgium
How to Visualize the Spatio-Temporal Data Using CesiumJS
SANGHEE SHIN
Darley - FIRST Copenhagen Lightning Talk (2025-06-26) Epochalypse 2038 - Time...
treyka
From Chatbot to Destroyer of Endpoints - Can ChatGPT Automate EDR Bypasses (1...
Priyanka Aash
Enhancing Environmental Monitoring with Real-Time Data Integration: Leveragin...
Safe Software
Why aren't you using FME Flow's CPU Time?
Safe Software
2025_06_18 - OpenMetadata Community Meeting.pdf
OpenMetadata
Paycifi - Programmable Trust_Breakfast_PPTXT
FinTech Belgium
Python Conference Singapore - 19 Jun 2025
ninefyi
MARTSIA: A Tool for Confidential Data Exchange via Public Blockchain - Pitch ...
Michele Kryston
Plugging AI into everything: Model Context Protocol Simplified.pdf
Abati Adewale
Usergroup - OutSystems Architecture.ppsx
Kurt Vandevelde
EIS-Webinar-Engineering-Retail-Infrastructure-06-16-2025.pdf
Earley Information Science
Database Benchmarking for Performance Masterclass: Session 2 - Data Modeling ...
ScyllaDB
Unlocking FME Flows Potential: Architecture Design for Modern Enterprises
Safe Software
Practical Applications of AI in Local Government
OnBoard
Automating the Geo-Referencing of Historic Aerial Photography in Flanders
Safe Software
Ad

Using Set Cover to Optimize a Large-Scale Low Latency Distributed Graph

  • 1. SEARCH, NETWORK AND ANALYTICS Using Set Cover to Optimize a Large- Scale Low Latency Distributed Graph LinkedIn Presentation Template 息2013 LinkedIn Corporation. All Rights Reserved. 1 Rui Wang Staff Software Engineer
  • 2. SEARCH, NETWORK AND ANALYTICS Outline LinkedIns Distributed Graph Infrastructure The Scaling Problem Set Cover by Example Evaluation Related Work Conclusions and Future Work 息2013 LinkedIn Corporation. All Rights Reserved. 2
  • 3. SEARCH, NETWORK AND ANALYTICS Outline LinkedIns Distributed Graph Infrastructure The Scaling Problem Set Cover by Example Evaluation Related Work Conclusions and Future Work 息2013 LinkedIn Corporation. All Rights Reserved. 3
  • 4. SEARCH, NETWORK AND ANALYTICS LinkedIn Products Backed By Social Graph 息2013 LinkedIn Corporation. All Rights Reserved. 4
  • 5. SEARCH, NETWORK AND ANALYTICS LinkedIns Distributed Graph APIs Graph APIs Get Connections Who does member X know? Get Shared Connections Who do I know in common with member Y? Get Graph Distances Whats the graph distances for each member returned from a search query? Is member Z within my second degree network so that I can view her profile? Over 50% queries is to get graph distances 息2013 LinkedIn Corporation. All Rights Reserved. 5
  • 6. SEARCH, NETWORK AND ANALYTICS Distributed Graph Architecture Diagram 息2013 LinkedIn Corporation. All Rights Reserved. 6
  • 7. SEARCH, NETWORK AND ANALYTICS LinkedIns Distributed Graph Infrastructure Graph Database (Graph DB) Partitioned and replicated graph database Distributed adjacency list Distributed Network Cache (NCS) LRU cache stores second degree network for active members Graph traversals are converted to set intersections Graph APIs Get Connections Get Shared Connections Get Graph Distances 息2013 LinkedIn Corporation. All Rights Reserved. 7
  • 8. SEARCH, NETWORK AND ANALYTICS Outline LinkedIns Distributed Graph Infrastructure The Scaling Problem Set Cover by Example Evaluation Related Work Conclusions and Future Work 息2013 LinkedIn Corporation. All Rights Reserved. 8
  • 9. SEARCH, NETWORK AND ANALYTICS Graph Distance Queries and Second Degree Creation 息2013 LinkedIn Corporation. All Rights Reserved. 9 get graph distances cache lookup [exists=false] retrieve 1st degree connections [exists=true] set intersections Partition IDs partial merges K-Way merges return return
  • 10. SEARCH, NETWORK AND ANALYTICS The Scaling Problem Illustrated 息2013 LinkedIn Corporation. All Rights Reserved. 10 Graph DBNCS Increased Query Distribution Merging and De-duping shift to single NCS node
  • 11. SEARCH, NETWORK AND ANALYTICS Outline LinkedIns Distributed Graph Infrastructure The Scaling Problem Set Cover and Example Evaluation Related Work Conclusions and Future Work 息2013 LinkedIn Corporation. All Rights Reserved. 11
  • 12. SEARCH, NETWORK AND ANALYTICS Set Cover Problem Definition Given a set of elements U = {1, 2, , m } (called the universe) and a family S of n sets whose union equals the universe, the set cover problem is to identify the smallest subset of S the union of which contains all elements in the universe. Greedy Algorithm Rule to select sets At each stage, choose the set that contains the largest number of uncovered elements. Upper bound O(log s), where s is the size of the largest set from S 息2013 LinkedIn Corporation. All Rights Reserved. 12
  • 13. SEARCH, NETWORK AND ANALYTICS Modified Set Cover Algorithm Example Cont. 息2013 LinkedIn Corporation. All Rights Reserved. 13 Build a map of partition ID -> Graph DB nodes for the input graph
  • 14. SEARCH, NETWORK AND ANALYTICS Modified Set Cover Algorithm Example Cont. 息2013 LinkedIn Corporation. All Rights Reserved. 14 Randomly pick an element from U = { 1,2,5,6 } e = 5 Retrieve from map nodes = { R21, R13 } Intersect R21 = {1,5 } with U, R13 = { 5,6 } with U Select R21 U = {2,6}, C = { R21}
  • 15. SEARCH, NETWORK AND ANALYTICS Modified Set Cover Algorithm Example Cont. 息2013 LinkedIn Corporation. All Rights Reserved. 15 Randomly pick an element from U = { 2,6 } e = 2 Retrieve from map nodes = { R11, R22 } Intersect R11 = {1,2 } with U, R22 = { 2,4 } with U Select R22 U = {6}, C = { R21, R22 }
  • 16. SEARCH, NETWORK AND ANALYTICS Modified Set Cover Algorithm Example Cont. 息2013 LinkedIn Corporation. All Rights Reserved. 16 Pick the final element from U = { 6 } e = 6 Retrieve from map nodes = { R23, R13 } Intersect R23 = { 3,6 } with U, R13 = { 5,6 } with U Select R23 U = {}, C = {R21, R22, R23 }
  • 17. SEARCH, NETWORK AND ANALYTICS Modified Set Cover Algorithm Example Solution Solution Compared to Optimal Result for U , where U = {1,2,5,6} 息2013 LinkedIn Corporation. All Rights Reserved. 17
  • 18. SEARCH, NETWORK AND ANALYTICS Outline LinkedIns Distributed Graph Infrastructure The Scaling Problem Set Cover by Example Evaluation Related Work Conclusions and Future Work 息2013 LinkedIn Corporation. All Rights Reserved. 18
  • 19. SEARCH, NETWORK AND ANALYTICS Evaluation Production Results Second degree cache creation drops 38% on 99th percentile Graph distance calculation drops 25% on 99th percentile Outbound traffic drops 40%; Inbound traffic drops 10% 息2013 LinkedIn Corporation. All Rights Reserved. 19 95th quantile 99th quantile 0 1 2 3 control setcover control setcover latency(normalized) network cache building 95th quantile 99th quantile 0 1 2 3 control setcover control setcover latency(normalized) getdistances
  • 20. SEARCH, NETWORK AND ANALYTICS Outline LinkedIns Distributed Graph Infrastructure The Scaling Problem Set Cover by Example Evaluation Related Work Conclusions and Future Work 息2013 LinkedIn Corporation. All Rights Reserved. 20
  • 21. SEARCH, NETWORK AND ANALYTICS Related Work Scaling through Replications Collocating neighbors [Pujol2010] Replication based on read/write frequencies [Mondal2012] Replication based on locality [Carrasco2011] Multi-Core Implementations Parallel graph exploration [Hong2011] Offline Graph Systems Googles Pregel [Malewicz2010] Distributed GraphLab [Low2012] 息2013 LinkedIn Corporation. All Rights Reserved. 21
  • 22. SEARCH, NETWORK AND ANALYTICS Outline LinkedIns Distributed Graph Infrastructure The Scaling Problem Set Cover by Example Evaluation Related Work Conclusions and Future Work 息2013 LinkedIn Corporation. All Rights Reserved. 22
  • 23. SEARCH, NETWORK AND ANALYTICS Conclusions and Future Work Future Work Incremental cache updates Replication on GraphDB nodes through LRU caching New graph traversal algorithms Conclusions Key challenges tackled Work distribution balancing Communication Bandwidth Set cover optimized latency for distributed query by Identifying a much smaller set of GraphDB nodes serving queries Shifting workload to GraphDB to utilize parallel powers Alleviating the K-way merging costs on NCS by reducing K Available at https://github.com/linkedin- sna/norbert/blob/master/network/src/main/scala/com/linkedin/norbert/network/partitioned /loadbalancer/DefaultPartitionedLoadBalancerFactory.scala 息2013 LinkedIn Corporation. All Rights Reserved. 23
  • 24. SEARCH, NETWORK AND ANALYTICS息2013 LinkedIn Corporation. All Rights Reserved. 24