The document discusses graph databases and their advantages over traditional relational databases. It covers the NoSQL movement, graph databases, and use cases such as social networks and semantic web applications. It gives an overview of graph database technologies such as Neo4j and DEX, with examples of modeling and querying data in a graph database using Neo4j.rb.
Bcn On Rails May2010 On Graph Databases
1. On Graph Databases
Pere Urbón Bayes
purbon@purbon.com
May 2010
BcnOnRails May - 2010 - On Graph Databases 1
2. On Graph Databases
● NoSQL movement.
● Graph databases.
● Pros and cons.
● Use cases.
● Technology overview.
● Example.
3. NoSQL Movement
● Next Generation of Databases.
● Innovative. (?)
● Open Source. (?)
● Non-Relational.
● Schema-less.
● Distributed.
● Scalable.
4. NoSQL Movement
● Stores.
– Document.
– Key/Value.
– Object oriented.
– Column.
– Graph database.
● More Stores.
– Grid database.
– XML Database.
– RDF.
– .....
5. NoSQL Movement
● NoSQL is not the holy grail, never forget it.
● Precursors and roots date back to the early 1970s.
– Network databases, Charles Bachman 1969.
案ずるより産むが易し。
– Giving birth to a baby is easier than worrying about it.
6. Graph Databases
● Strongly related data.
– Social networks.
– GIS Systems.
– Transportation.
– Bibliographic.
– File systems.
– ........
GitHub Ruby community by country
7. Graph Databases
● The Property Graph.
– Labeled.
– Directed.
– Attributed.
– Multigraph.
● Talk about.
– Nodes with types.
– Edges with types.
– Attributes.
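The property graph described above can be sketched in a few lines of plain Ruby. This is only an illustration of the data model (typed nodes, typed directed edges, attributes on both, parallel edges allowed); the class and method names are hypothetical and are not part of Neo4j.rb or any other library.

```ruby
# Minimal in-memory property graph (illustrative only, not the Neo4j.rb API).
class PropertyGraph
  Node = Struct.new(:id, :type, :attributes)
  Edge = Struct.new(:id, :type, :from, :to, :attributes)

  attr_reader :nodes, :edges

  def initialize
    @nodes = {}
    @edges = []
  end

  # Nodes carry a type (label) and arbitrary key/value attributes.
  def add_node(id, type, attributes = {})
    @nodes[id] = Node.new(id, type, attributes)
  end

  # Edges are directed (from -> to) and typed. Because edges live in a list,
  # several edges may connect the same pair of nodes (multigraph).
  def add_edge(type, from, to, attributes = {})
    edge = Edge.new(@edges.size, type, from, to, attributes)
    @edges << edge
    edge
  end

  # Ids of nodes reached by outgoing edges, optionally filtered by edge type.
  def outgoing(id, type = nil)
    @edges.select { |e| e.from == id && (type.nil? || e.type == type) }
          .map(&:to)
  end
end

graph = PropertyGraph.new
graph.add_node(:ann, :person, name: 'Ann')
graph.add_node(:bob, :person, name: 'Bob')
graph.add_edge(:friends, :ann, :bob, since: 1982)
graph.add_edge(:works_with, :ann, :bob) # second edge between the same pair
```

Keeping edges as first-class objects, rather than as plain adjacency lists, is what lets each relationship carry its own type and attributes.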
9. Graph Databases
                 RDBMS       OIM        DEX
data             27.36 GB    54 GB      9.69 GB
ratio overhead   10,9        21,51      3,86
load time        52891 s     17543 s    95579 s

Query            MySQL       OIM        DEX
Q1:count         20,38       17,35      0
Q2:scan          32,76       174,64     3,14
Q3:select        7,34        5,43       0,84
Q4:projection    17,34       43,7       33,19
Q5:combine       0,74        2,61       0,01
Q6:explode       0,07        202,07     0,01
Q7:values        12,28       20,77      0,01
Q8:hub           >3hours     >3hours    624,68
10. Graph Databases
11. Use cases
● Network analysis.
● Link analysis.
● Graph mining.
● Neural networks.
● Bibliographic search.
● Semantic web.
12. Use cases
● Algorithmic recruitment with GitHub.
– Centrality: the importance of a vertex within a graph.
● Betweenness: vertices that occur on many shortest paths have higher centrality.
– O(v^3) without any optimization.
● Other possible choices:
– Closeness: vertices with a short geodesic distance to the others have a high closeness.
● Usually preferred in network analysis.
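The two measures above can be sketched in plain Ruby as follows. The sketch assumes an unweighted, undirected graph stored as a hash of adjacency lists in which every vertex appears as a key. For betweenness it uses Brandes' accumulation over breadth-first searches, a common optimization, rather than the unoptimized O(v^3) approach mentioned on the slide. The function names are hypothetical.

```ruby
# Betweenness centrality for an unweighted, undirected graph given as
# { vertex => [neighbours] }. Uses Brandes' algorithm (one BFS per source).
def betweenness(graph)
  bc = graph.keys.to_h { |v| [v, 0.0] }
  graph.each_key do |s|
    stack = []
    preds = Hash.new { |h, k| h[k] = [] }
    sigma = Hash.new(0) # number of shortest paths from s
    sigma[s] = 1
    dist = { s => 0 }
    queue = [s]
    until queue.empty?
      v = queue.shift
      stack << v
      graph[v].each do |w|
        unless dist.key?(w)
          dist[w] = dist[v] + 1
          queue << w
        end
        next unless dist[w] == dist[v] + 1
        sigma[w] += sigma[v]
        preds[w] << v
      end
    end
    delta = Hash.new(0.0) # dependency of s on each vertex
    while (w = stack.pop)
      preds[w].each do |v|
        delta[v] += (sigma[v].to_f / sigma[w]) * (1 + delta[w])
      end
      bc[w] += delta[w] unless w == s
    end
  end
  # Each unordered pair was counted from both endpoints, so halve the totals.
  bc.transform_values { |score| score / 2.0 }
end

# Closeness of one vertex: (reachable vertices) / (sum of BFS distances).
def closeness(graph, source)
  dist = { source => 0 }
  queue = [source]
  until queue.empty?
    v = queue.shift
    graph[v].each do |w|
      next if dist.key?(w)
      dist[w] = dist[v] + 1
      queue << w
    end
  end
  total = dist.values.sum
  total.zero? ? 0.0 : (dist.size - 1).to_f / total
end

# Hypothetical collaboration graph between developers.
developers = { ann: [:bob], bob: [:ann, :cat], cat: [:bob] }
ranking = betweenness(developers).sort_by { |_, score| -score }
```

In the recruitment use case, the top of `ranking` would be the developers who sit on the most shortest paths between others.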
14. Pros and cons
● Data facts.
– Grows exponentially.
– Huge interdependency and complexity.
– Relationships are important.
– Structure changes over time.
● Relational model facts.
– E.F. Codd model.
– Normalization.
– Object-relational impedance mismatch.
– Joins don't scale.
– Big tables.
– Denormalization.
15. Technology overview
● Neo4j: open source NoSQL graph database.
● DEX: the high performance graph database.
● HyperGraphDB: an AI and semantic web graph database.
● InfoGrid: the Internet graph database.
● Sones: SaaS .NET graph database.
● VertexDB: high performance database server.
16. Benchmarking
Graph Database Performance on the HPC Scalable Graph Analysis Benchmark

Scale 15
Kernel               DEX       Neo4j     Jena      HypergraphDB
K1 Load (s)          7,44      697       141       +24h
K2 Scan edges (s)    0,0010    2,71      0,689
K3 2-hops (s)        0,0120    0,0260    0,443
K4 BC (s)            14,8      8,24      138
Db size (MB)         30        17        207

Scale 20
Kernel               DEX       Neo4j     Jena      HypergraphDB
K1 Load (s)          317       32.094    4.560     +24h
K2 Scan edges (s)    0,005     751       18,6
K3 2-hops (s)        0,033     0,0230    0,4580
K4 BC (s)            617       7.027     59.512
Db size (MB)         893       539       6.656
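To give a feel for what a kernel such as "2-hops" exercises, here is a rough plain-Ruby sketch. It assumes the kernel amounts to collecting every vertex within two edges of a start vertex; the benchmark's own definition may differ, and the function name and sample data are hypothetical.

```ruby
require 'set'

# Vertices reachable from +start+ within +hops+ edges (start itself excluded).
# +graph+ is a hash of adjacency lists: { vertex => [neighbours] }.
def within_hops(graph, start, hops = 2)
  seen = Set[start]
  frontier = [start]
  hops.times do
    # Expand the current frontier by one edge, skipping visited vertices.
    frontier = frontier.flat_map { |v| graph.fetch(v, []) }
                       .reject { |w| seen.include?(w) }
                       .uniq
    seen.merge(frontier)
  end
  seen.delete(start).to_a
end

# Hypothetical directed graph: a -> b -> c -> d
chain = { a: [:b], b: [:c], c: [:d], d: [] }
neighbourhood = within_hops(chain, :a)
```

A graph database answers this kind of query by following stored relationships directly, whereas a relational schema would typically need one self-join per hop.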
18. Technology overview
● Neo4j.rb (JRuby target)
– Active record integration.
– Dynamic and schema free.
– Fast traversal of relationships.
– Transactions with rollback support.
– Indexing and querying of ruby objects.
– Massive loaders.
http://wiki.neo4j.org/content/Ruby
– Ruby on Rails integration.
– Accessible through REST.
19. Technology overview
Creating nodes
require "rubygems"
require 'neo4j'

Neo4j::Transaction.run do
  node = Neo4j::Node.new
end

Properties
node = Neo4j::Node.new
node[:name] = 'foo'
node[:age] = 123
node[:hungry] = false
node[4] = 3.14
node[:age] # => 123

Transactions over blocks
Neo4j::Transaction.run do
  # neo4j operations go here
end

Creating relationships
node1 = Neo4j::Node.new
node2 = Neo4j::Node.new
Neo4j::Relationship.new(:friends, node1, node2)
# which is the same as
node1.rels.outgoing(:friends) << node2
20. Technology overview
Accessing relationships
node1.rels.empty? # => false
# The rels method returns an enumeration of relationship objects.
# The nodes method on the relationships returns the nodes instead.
node1.rels.nodes.include?(node2) # => true
node1.rels.first # => the first relationship this node1 has.
node1.rels.nodes.first # => node2 first node of any relationship type
node2.rels.incoming(:friends).nodes.first # => node1 first node of relationship type 'friends'
node2.rels.incoming(:friends).first # => a relationship object between node1 and node2
Properties on Relationships
rel = node1.rels.outgoing(:friends).first
rel[:since] = 1982
node1.rels.first[:since] # => 1982
21. Example
For the fun of it, let's play a little with a graph database.
22. On Graph Databases
Thank you!
Pere Urbón Bayes
purbon@purbon.com