際際滷

際際滷Share a Scribd company logo
Benchmarking graph databases on the problem 
of community detection 
Sotirios Beis, Symeon Papadopoulos, Yiannis Kompatsiaris 
Centre for Research and Technology Hellas, Information Technologies Institute (CERTH-ITI) 
ADBIS 2014 Conference 
Ohrid, Republic of Macedonia, September 7-10, 2014
Important Note 
 This is an update of the actual presentation given in 
September 2014. After the presentation, we received 
comments about the implementation of our benchmark, 
which led us to update the implementation and rerun the 
experiments. We would like to thank @lgarulli and 
@OrientDB for their contribution. 
 The updated paper is now available on: 
http://mklab.iti.gr/files/beis_adbis2014_corrected.pdf 
while the original is still available on the Springer site: 
http://link.springer.com/chapter/10.1007/978-3-319-10518-5_1 
#2
#3 
Overview 
 Problem formulation 
 Related work 
 Workload description 
 Evaluation 
 Conclusions
Motivation 
 The rapid growth of Online Social Networks contributes 
to the creation of high-volume and velocity graph data 
 Twitter had 145M users in 2010 and 300M in 2011 
http://dstevenwhite.com/2011/12/29/social-media-growth-2006-2011/ 
 The need for massive graph data management and 
processing systems is constantly increasing 
 Many methods and technologies available (RDBMS, OODBMS 
and graph databases) 
 Benchmarks to evaluate candidate solutions are 
absolutely essential! 
 We had to choose one in the context of the SocialSensor 
research project 
#4
Problem Formulation 
 Given a set of database management systems assess 
the performance of each system with respect to a 
common mining task: community detection 
 Many DBMS available to use 
 Graph databases selected, as they are designed to store 
and manage efficiently big graph data 
 Benchmarked systems: Titan, OrientDB and Neo4j 
 Many operations used in databases to benchmark 
 Workloads that simulate common operations in OSNs 
employed 
#5
Related Work (1) 
 Evaluation between different DBMS 
 Giatsoglou et al. focused on the Social Tagging System use 
case scenario 
 Angles et al. and Armstrong et al. proposed a synthetic 
graph generator with OSN characteristics and use this data 
for evaluation 
 Grossniklaus et al. used a workloads that cover a wide 
variety of graph data use case 
 Vicknair et al. utilized query workload that simulates 
typical operations performed in provenance system 
#6
Related Work (2) 
 Evaluation between graph DBMS 
 Bader et al. proposed a benchmark with four operations: 
insert, h-hops node and edge selection, while Dominguez 
et al. reports the implementation. 
 A similar workload is proposed by Jouili et al. emphasizing 
the effects of increasing multiple concurrent users. 
 Ciglan et al. proposed a benchmark focusing on graph 
traversal operations. 
 Dayarathna et al. used similar workload, focusing on graph 
databases server and cloud environments. 
#7
Workloads Description (1) 
Our Contribution 
 Clustering Workload (CW) 
 Very important due to its numerous applications, such as 
topic detection, photo clustering and event detection. 
 Until now most community detection algorithms used 
main memory to store the graph and perform the 
required operations  fast execution time, unable to 
manage big graphs. 
 We propose an implementation of Louvain Method on top 
of graph databases employing cache techniques  take 
advantage of both graph database capabilities and in-memory 
execution speed. 
#8
Workload Description (2) 
Supplementary Workloads 
 Massive Insertion Workload (MIW) 
 Populates a graph with bulk load operations. 
 Simulates batch creation of a graph. 
 Single Insertion Workload (SIW) 
 Every object insertion is committed directly. 
 Simulates the growth of a OSN. 
 Query Workload (QW) 
 FindNeighbors query (FN)  Finds the neighbors of all nodes 
 simulates the friends retrieval a Facebook user 
 FindAdjacentNodes query (FA)  Finds the adjacent nodes of all edges 
 used to find whether two user joined the same Facebook group 
 FindShortestPath (FS)  Finds the shortest paths between 100 nodes 
 used to find the level two LinkedIn users are connected 
#9
Experimental Study 
 Datasets 
 Synthetic for CW, generated with LFR generator. 
 Real for MIW, SIW & QW, from SNAP dataset collection. 
 Evaluation Measure: execution time (second) 
#10
Results: Comparison wrt CW 
#11
Results: Cache size impact 
#12 
Titan 
Neo4j 
OrientDB
Results: comparison wrt MIW and QW 
#13
Results: Comparison wrt SIW 
YouTube LiveJournal 
#14
Conclusions 
SUMMARY 
 OrientDB is the most efficient solution for CW, while Titan is 
#15 
the winner in SIW 
 Neo4j is the fastest graph database for MIW and QW 
 Titan and OrientDB have competitive performance in MIW 
and QW respectively  don't scale as good as Neo4j 
FUTURE WORK 
 Test with bigger graphs. 
 Run the benchmark employing the distributed version of Titan 
and OrientDB and other graph databases (Sparksee already 
added). 
 Improve performance of the implemented community 
detection method.
References (1) 
Evaluation between different DBMS 
Giatsoglou, M., Papadopoulos, S., Vakali, A.: Massive graph management for the web 
and web 2.0. In Vakali, A., Jain, L., eds.: New Directions in Web Data 
Management 1. Volume 331 of Studies in Computational Intelligence. Springer 
Berlin Heidelberg (2011) 19-58 
Angles, R., Prat-Perez, A., Dominguez-Sal, D., Larriba-Pey, J.L.: Benchmarking database 
systems for social network applications. In: First International Workshop on 
Graph Data Management Experiences and Systems. GRADES '13, New York, NY, 
USA, ACM (2013) 15:1-15:7 
Armstrong, T.G., Ponnekanti, V., Borthakur, D., Callaghan, M.: Linkbench: a database 
benchmark based on the facebook social graph. (2013) 
Grossniklaus, M., Leone, S., Zaschke, T.: Towards a benchmark for graph data 
management and processing. (2013) 
Vicknair, C., Macias, M., Zhao, Z., Nan, X., Chen, Y., Wilkins, D.: A comparison of a 
graph database and a relational database: A data provenance perspective. In: 
Proceedings of the 48th Annual Southeast Regional Conference. ACM SE '10, 
New York, NY, USA, ACM (2010) 42:1-42:6 
#16
References (2) 
Evaluation between graph DBMS 
Bader, D.A., Feo, J., Gilbert, J., Kepner, J., Koester, D., Loh, E., Madduri, K., Mann, 
B., Meuse, T., Robinson, E.: Hpc scalable graph analysis benchmark (2009) 
Dominguez-Sal, D., Urbn-Bayes, P., Gimnez-Va, A., Gmez-Villamor, S., Martnez- 
Bazn, N., Larriba-Pey, J.: Survey of graph database performance on the hpc 
scalable graph analysis benchmark. In Shen, H., Pei, J., zsu, M., Zou, L., Lu, J., 
Ling, T.W., Yu, G., Zhuang, Y., Shao, J., eds.:Web-Age Information Management. 
Volume 6185 of Lecture Notes in Computer Science. Springer Berlin Heidelberg 
(2010) 37-48 
Ciglan, M., Averbuch, A., Hluchy, L.: Benchmarking traversal operations over graph 
databases. In: Data Engineering Workshops (ICDEW), 2012 IEEE 28th 
International Conference on. (April 2012) 186-189 
Jouili, S., Vansteenberghe, V.: An empirical comparison of graph databases. In: Social 
Computing (SocialCom), 2013 International Conference on. (Sept 2013) 708- 
715 
Dayarathna, M., Suzumura, T.: Xgdbench: A benchmarking platform for graph stores 
in exascale clouds. In: Cloud Computing Technology and Science (Cloud-Com), 
2012 IEEE 4th International Conference on. (Dec 2012) 363-370 
#17
Source: https://github.com/socialsensor/graphdb-benchmarks 
#18 
Questions 
Further contact: 
sotbeis@iti.gr / papadop@iti.gr 
Acknowledgement: 
www.socialsensor.eu

More Related Content

Similar to Benchmarking graph databases on the problem of community detection (20)

PDF
IRJET- Recommendation System based on Graph Database Techniques
IRJET Journal
PDF
Graph-TA 2013 - Josep Llu鱈s Larriba Pey
LDBC council
PDF
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
Ioan Toma
PPTX
Designing real-time recommendations engine using graph databases.pptx
Gopi Krishna
ODP
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
Ioan Toma
PDF
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
Jason Riedy
PDF
A Survey on Graph Database Management Techniques for Huge Unstructured Data
IJECEIAES
PPTX
EDBT 2015: Summer School Overview
dgarijo
PDF
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
Jason Riedy
PDF
Query Optimization Techniques in Graph Databases
IJDMS
PDF
An early look at the LDBC Social Network Benchmark's Business Intelligence wo...
G叩bor Sz叩rnyas
PDF
Graph Analysis Beyond Linear Algebra
Jason Riedy
PDF
Academic research on graph processing: connecting recent findings to industri...
openCypher
PDF
Ashwin_Thesis
Ashwin Ramesh
PDF
OUTCOME ANALYSIS IN ACADEMIC INSTITUTIONS USING NEO4J
ijcsity
PPT
Making sense of the Graph Revolution
InfiniteGraph
PDF
Dgraph: Graph database for production environment
openCypher
ODP
FOSDEM 2014: Social Network Benchmark (SNB) Graph Generator
LDBC council
PPTX
LOD2: State of Play WP2 - Storing and Querying Very Large Knowledge Bases
LOD2 Creating Knowledge out of Interlinked Data
PDF
Introducing Stig: A New Open Source, Non-relational, Distributed Graph Databa...
DATAVERSITY
IRJET- Recommendation System based on Graph Database Techniques
IRJET Journal
Graph-TA 2013 - Josep Llu鱈s Larriba Pey
LDBC council
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
Ioan Toma
Designing real-time recommendations engine using graph databases.pptx
Gopi Krishna
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
Ioan Toma
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
Jason Riedy
A Survey on Graph Database Management Techniques for Huge Unstructured Data
IJECEIAES
EDBT 2015: Summer School Overview
dgarijo
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
Jason Riedy
Query Optimization Techniques in Graph Databases
IJDMS
An early look at the LDBC Social Network Benchmark's Business Intelligence wo...
G叩bor Sz叩rnyas
Graph Analysis Beyond Linear Algebra
Jason Riedy
Academic research on graph processing: connecting recent findings to industri...
openCypher
Ashwin_Thesis
Ashwin Ramesh
OUTCOME ANALYSIS IN ACADEMIC INSTITUTIONS USING NEO4J
ijcsity
Making sense of the Graph Revolution
InfiniteGraph
Dgraph: Graph database for production environment
openCypher
FOSDEM 2014: Social Network Benchmark (SNB) Graph Generator
LDBC council
LOD2: State of Play WP2 - Storing and Querying Very Large Knowledge Bases
LOD2 Creating Knowledge out of Interlinked Data
Introducing Stig: A New Open Source, Non-relational, Distributed Graph Databa...
DATAVERSITY

Recently uploaded (20)

PPTX
Simplifica la seguridad en la nube y la detecci坦n de amenazas con FortiCNAPP
Cristian Garcia G.
PPTX
CapCut Pro Crack For PC Latest Version {Fully Unlocked} 2025
pcprocore
PPTX
Smarter Governance with AI: What Every Board Needs to Know
OnBoard
PPTX
UserCon Belgium: Honey, VMware increased my bill
stijn40
PPTX
MARTSIA: A Tool for Confidential Data Exchange via Public Blockchain - Pitch ...
Michele Kryston
PDF
FME as an Orchestration Tool with Principles From Data Gravity
Safe Software
PDF
Java 25 and Beyond - A Roadmap of Innovations
Ana-Maria Mihalceanu
PDF
Redefining Work in the Age of AI - What to expect? How to prepare? Why it mat...
Malinda Kapuruge
PDF
Cracking the Code - Unveiling Synergies Between Open Source Security and AI.pdf
Priyanka Aash
PDF
EIS-Webinar-Engineering-Retail-Infrastructure-06-16-2025.pdf
Earley Information Science
PPTX
Curietech AI in action - Accelerate MuleSoft development
shyamraj55
PDF
Why aren't you using FME Flow's CPU Time?
Safe Software
PDF
MPU+: A Transformative Solution for Next-Gen AI at the Edge, a Presentation...
Edge AI and Vision Alliance
PDF
The Growing Value and Application of FME & GenAI
Safe Software
DOCX
Daily Lesson Log MATATAG ICT TEchnology 8
LOIDAALMAZAN3
PDF
Open Source Milvus Vector Database v 2.6
Zilliz
PDF
From Chatbot to Destroyer of Endpoints - Can ChatGPT Automate EDR Bypasses (1...
Priyanka Aash
PDF
Enhancing Environmental Monitoring with Real-Time Data Integration: Leveragin...
Safe Software
PDF
Plugging AI into everything: Model Context Protocol Simplified.pdf
Abati Adewale
PDF
UiPath Agentic AI ile Ak脹ll脹 Otomasyonun Yeni a脹
UiPathCommunity
Simplifica la seguridad en la nube y la detecci坦n de amenazas con FortiCNAPP
Cristian Garcia G.
CapCut Pro Crack For PC Latest Version {Fully Unlocked} 2025
pcprocore
Smarter Governance with AI: What Every Board Needs to Know
OnBoard
UserCon Belgium: Honey, VMware increased my bill
stijn40
MARTSIA: A Tool for Confidential Data Exchange via Public Blockchain - Pitch ...
Michele Kryston
FME as an Orchestration Tool with Principles From Data Gravity
Safe Software
Java 25 and Beyond - A Roadmap of Innovations
Ana-Maria Mihalceanu
Redefining Work in the Age of AI - What to expect? How to prepare? Why it mat...
Malinda Kapuruge
Cracking the Code - Unveiling Synergies Between Open Source Security and AI.pdf
Priyanka Aash
EIS-Webinar-Engineering-Retail-Infrastructure-06-16-2025.pdf
Earley Information Science
Curietech AI in action - Accelerate MuleSoft development
shyamraj55
Why aren't you using FME Flow's CPU Time?
Safe Software
MPU+: A Transformative Solution for Next-Gen AI at the Edge, a Presentation...
Edge AI and Vision Alliance
The Growing Value and Application of FME & GenAI
Safe Software
Daily Lesson Log MATATAG ICT TEchnology 8
LOIDAALMAZAN3
Open Source Milvus Vector Database v 2.6
Zilliz
From Chatbot to Destroyer of Endpoints - Can ChatGPT Automate EDR Bypasses (1...
Priyanka Aash
Enhancing Environmental Monitoring with Real-Time Data Integration: Leveragin...
Safe Software
Plugging AI into everything: Model Context Protocol Simplified.pdf
Abati Adewale
UiPath Agentic AI ile Ak脹ll脹 Otomasyonun Yeni a脹
UiPathCommunity
Ad

Benchmarking graph databases on the problem of community detection

  • 1. Benchmarking graph databases on the problem of community detection Sotirios Beis, Symeon Papadopoulos, Yiannis Kompatsiaris Centre for Research and Technology Hellas, Information Technologies Institute (CERTH-ITI) ADBIS 2014 Conference Ohrid, Republic of Macedonia, September 7-10, 2014
  • 2. Important Note This is an update of the actual presentation given in September 2014. After the presentation, we received comments about the implementation of our benchmark, which led us to update the implementation and rerun the experiments. We would like to thank @lgarulli and @OrientDB for their contribution. The updated paper is now available on: http://mklab.iti.gr/files/beis_adbis2014_corrected.pdf while the original is still available on the Springer site: http://link.springer.com/chapter/10.1007/978-3-319-10518-5_1 #2
  • 3. #3 Overview Problem formulation Related work Workload description Evaluation Conclusions
  • 4. Motivation The rapid growth of Online Social Networks contributes to the creation of high-volume and velocity graph data Twitter had 145M users in 2010 and 300M in 2011 http://dstevenwhite.com/2011/12/29/social-media-growth-2006-2011/ The need for massive graph data management and processing systems is constantly increasing Many methods and technologies available (RDBMS, OODBMS and graph databases) Benchmarks to evaluate candidate solutions are absolutely essential! We had to choose one in the context of the SocialSensor research project #4
  • 5. Problem Formulation Given a set of database management systems assess the performance of each system with respect to a common mining task: community detection Many DBMS available to use Graph databases selected, as they are designed to store and manage efficiently big graph data Benchmarked systems: Titan, OrientDB and Neo4j Many operations used in databases to benchmark Workloads that simulate common operations in OSNs employed #5
  • 6. Related Work (1) Evaluation between different DBMS Giatsoglou et al. focused on the Social Tagging System use case scenario Angles et al. and Armstrong et al. proposed a synthetic graph generator with OSN characteristics and use this data for evaluation Grossniklaus et al. used a workloads that cover a wide variety of graph data use case Vicknair et al. utilized query workload that simulates typical operations performed in provenance system #6
  • 7. Related Work (2) Evaluation between graph DBMS Bader et al. proposed a benchmark with four operations: insert, h-hops node and edge selection, while Dominguez et al. reports the implementation. A similar workload is proposed by Jouili et al. emphasizing the effects of increasing multiple concurrent users. Ciglan et al. proposed a benchmark focusing on graph traversal operations. Dayarathna et al. used similar workload, focusing on graph databases server and cloud environments. #7
  • 8. Workloads Description (1) Our Contribution Clustering Workload (CW) Very important due to its numerous applications, such as topic detection, photo clustering and event detection. Until now most community detection algorithms used main memory to store the graph and perform the required operations fast execution time, unable to manage big graphs. We propose an implementation of Louvain Method on top of graph databases employing cache techniques take advantage of both graph database capabilities and in-memory execution speed. #8
  • 9. Workload Description (2) Supplementary Workloads Massive Insertion Workload (MIW) Populates a graph with bulk load operations. Simulates batch creation of a graph. Single Insertion Workload (SIW) Every object insertion is committed directly. Simulates the growth of a OSN. Query Workload (QW) FindNeighbors query (FN) Finds the neighbors of all nodes simulates the friends retrieval a Facebook user FindAdjacentNodes query (FA) Finds the adjacent nodes of all edges used to find whether two user joined the same Facebook group FindShortestPath (FS) Finds the shortest paths between 100 nodes used to find the level two LinkedIn users are connected #9
  • 10. Experimental Study Datasets Synthetic for CW, generated with LFR generator. Real for MIW, SIW & QW, from SNAP dataset collection. Evaluation Measure: execution time (second) #10
  • 12. Results: Cache size impact #12 Titan Neo4j OrientDB
  • 13. Results: comparison wrt MIW and QW #13
  • 14. Results: Comparison wrt SIW YouTube LiveJournal #14
  • 15. Conclusions SUMMARY OrientDB is the most efficient solution for CW, while Titan is #15 the winner in SIW Neo4j is the fastest graph database for MIW and QW Titan and OrientDB have competitive performance in MIW and QW respectively don't scale as good as Neo4j FUTURE WORK Test with bigger graphs. Run the benchmark employing the distributed version of Titan and OrientDB and other graph databases (Sparksee already added). Improve performance of the implemented community detection method.
  • 16. References (1) Evaluation between different DBMS Giatsoglou, M., Papadopoulos, S., Vakali, A.: Massive graph management for the web and web 2.0. In Vakali, A., Jain, L., eds.: New Directions in Web Data Management 1. Volume 331 of Studies in Computational Intelligence. Springer Berlin Heidelberg (2011) 19-58 Angles, R., Prat-Perez, A., Dominguez-Sal, D., Larriba-Pey, J.L.: Benchmarking database systems for social network applications. In: First International Workshop on Graph Data Management Experiences and Systems. GRADES '13, New York, NY, USA, ACM (2013) 15:1-15:7 Armstrong, T.G., Ponnekanti, V., Borthakur, D., Callaghan, M.: Linkbench: a database benchmark based on the facebook social graph. (2013) Grossniklaus, M., Leone, S., Zaschke, T.: Towards a benchmark for graph data management and processing. (2013) Vicknair, C., Macias, M., Zhao, Z., Nan, X., Chen, Y., Wilkins, D.: A comparison of a graph database and a relational database: A data provenance perspective. In: Proceedings of the 48th Annual Southeast Regional Conference. ACM SE '10, New York, NY, USA, ACM (2010) 42:1-42:6 #16
  • 17. References (2) Evaluation between graph DBMS Bader, D.A., Feo, J., Gilbert, J., Kepner, J., Koester, D., Loh, E., Madduri, K., Mann, B., Meuse, T., Robinson, E.: Hpc scalable graph analysis benchmark (2009) Dominguez-Sal, D., Urbn-Bayes, P., Gimnez-Va, A., Gmez-Villamor, S., Martnez- Bazn, N., Larriba-Pey, J.: Survey of graph database performance on the hpc scalable graph analysis benchmark. In Shen, H., Pei, J., zsu, M., Zou, L., Lu, J., Ling, T.W., Yu, G., Zhuang, Y., Shao, J., eds.:Web-Age Information Management. Volume 6185 of Lecture Notes in Computer Science. Springer Berlin Heidelberg (2010) 37-48 Ciglan, M., Averbuch, A., Hluchy, L.: Benchmarking traversal operations over graph databases. In: Data Engineering Workshops (ICDEW), 2012 IEEE 28th International Conference on. (April 2012) 186-189 Jouili, S., Vansteenberghe, V.: An empirical comparison of graph databases. In: Social Computing (SocialCom), 2013 International Conference on. (Sept 2013) 708- 715 Dayarathna, M., Suzumura, T.: Xgdbench: A benchmarking platform for graph stores in exascale clouds. In: Cloud Computing Technology and Science (Cloud-Com), 2012 IEEE 4th International Conference on. (Dec 2012) 363-370 #17
  • 18. Source: https://github.com/socialsensor/graphdb-benchmarks #18 Questions Further contact: sotbeis@iti.gr / papadop@iti.gr Acknowledgement: www.socialsensor.eu