This document provides an overview of two graph data models: RDF and Property Graphs. It describes the key components of each model, including triples for RDF and nodes/edges/properties for Property Graphs. It also discusses Apache projects that work with each model like Apache Jena for RDF and Apache TinkerPop, Spark, Giraph and Flink for Property Graphs. Finally, it notes that while the models have different focuses, they could potentially share technologies like storage and query capabilities.
Two graph data models : RDF and Property Graphs
1. Two graph data models
RDF and Property Graphs
Andy Seaborne
Paolo Castagna
andy@a.o, castagna@a.o
2. Introduction
This talk is about two graph data models
(RDF and Property Graphs), examples of a
couple of Apache projects using such data
models, and a few lessons learned along the
way.
4. RDF
IRIs (=URIs), literals (strings, numbers, ...),
blank nodes
Triple => subject-predicate-object
Predicate (or property) is the link name : an IRI
Graph => set of triples
5.
prefix : <http://example/myData/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix foaf: <http://xmlns.com/foaf/0.1/>
# foaf:name is a short form of <http://xmlns.com/foaf/0.1/name>
:alice rdf:type foaf:Person ;
foaf:name "Alice Smith" ; # ; means same subject
foaf:knows :bob .
[Diagram: :alice --rdf:type--> foaf:Person ; :alice --foaf:name--> "Alice Smith" ; :alice --foaf:knows--> :bob]
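The RDF model and the Turtle example above can be sketched in plain Scala. This is only an illustration under stated assumptions: the names `Term`, `Iri`, `Literal`, `BlankNode`, `Triple`, `RdfSketch` and `objectsOf` are hypothetical and are not the Apache Jena API. The sketch shows that a graph is a set of triples, so a repeated triple is stored once.

```scala
// Hypothetical names throughout; this is not the Apache Jena API.
sealed trait Term
final case class Iri(value: String) extends Term
final case class Literal(lexical: String, datatype: Iri) extends Term
final case class BlankNode(label: String) extends Term

// A triple is subject-predicate-object; the predicate is always an IRI.
// (RDF restricts subjects to IRIs and blank nodes; this sketch does not enforce that.)
final case class Triple(subject: Term, predicate: Iri, obj: Term)

object RdfSketch {
  type Graph = Set[Triple]

  val ex   = "http://example/myData/"
  val foaf = "http://xmlns.com/foaf/0.1/"
  val rdfType   = Iri("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
  val xsdString = Iri("http://www.w3.org/2001/XMLSchema#string")

  val alice = Iri(ex + "alice")
  val bob   = Iri(ex + "bob")

  // The three triples from the Turtle example, plus one deliberate repeat.
  val graph: Graph = Set(
    Triple(alice, rdfType, Iri(foaf + "Person")),
    Triple(alice, Iri(foaf + "name"), Literal("Alice Smith", xsdString)),
    Triple(alice, Iri(foaf + "knows"), bob),
    Triple(alice, Iri(foaf + "knows"), bob) // duplicate: a set keeps one copy
  )

  // All objects reachable from subject s via predicate p.
  def objectsOf(g: Graph, s: Term, p: Iri): Set[Term] =
    g.filter(t => t.subject == s && t.predicate == p).map(_.obj)
}
```

For example, `RdfSketch.objectsOf(RdfSketch.graph, RdfSketch.alice, Iri(RdfSketch.foaf + "knows"))` yields the set containing only the IRI for `:bob`.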
12. Apache Jena
TLP: April 2012
Involvement in standards
RDF 1.1, SPARQL 1.1
RDF database
SPARQL server
Other RDF@ASF:
Any23, Marmotta, Clerezza, Stanbol, Rya
13. Property Graph Data Model
A property graph is a set of vertexes and edges with
respective properties (i.e. key / values):
each vertex or edge has a unique identifier
each vertex has a set of outgoing edges and a set of incoming edges
edges are directed: each edge has a start vertex and an end vertex
each edge has a label which denotes the type of relationship
vertexes and edges can have properties (i.e. key / value pairs)
Directed multigraph with properties
attached to vertexes and edges
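The definition above can be sketched as a small data structure in plain Scala. The names `Vertex`, `Edge`, `PropertyGraph` and `PropertyGraphSketch` are hypothetical (this is neither the TinkerPop nor the GraphX API), and the second "knows" edge with id 4 is an invented addition, included only to show that a multigraph allows parallel edges. The other values come from the Alice/Bob example in this deck.

```scala
// Hypothetical names; not the TinkerPop or GraphX API.
final case class Vertex(id: Long, properties: Map[String, Any])

// Directed edge: start vertex `from`, end vertex `to`, and a label for the relationship type.
final case class Edge(id: Long, label: String, from: Long, to: Long,
                      properties: Map[String, Any])

final case class PropertyGraph(vertexes: Map[Long, Vertex], edges: Map[Long, Edge]) {
  // Outgoing and incoming edge sets are derived from the edge endpoints.
  def outgoing(vertexId: Long): Set[Edge] = edges.values.filter(_.from == vertexId).toSet
  def incoming(vertexId: Long): Set[Edge] = edges.values.filter(_.to == vertexId).toSet
}

object PropertyGraphSketch {
  val alice = Vertex(1L, Map("name" -> "Alice", "surname" -> "Smith", "age" -> 32))
  val bob   = Vertex(2L, Map("name" -> "Bob", "surname" -> "Brown", "age" -> 45))

  val knows = Edge(id = 3L, label = "knows", from = 1L, to = 2L,
                   properties = Map("since" -> "01/01/1970"))
  // Invented parallel edge: same label and endpoints, different identifier.
  val knowsAgain = Edge(id = 4L, label = "knows", from = 1L, to = 2L,
                        properties = Map.empty)

  val graph = PropertyGraph(
    vertexes = Seq(alice, bob).map(v => v.id -> v).toMap,
    edges    = Seq(knows, knowsAgain).map(e => e.id -> e).toMap
  )
}
```

Because edges are keyed by their own identifier rather than by their endpoints, `PropertyGraphSketch.graph.outgoing(1L)` contains both "knows" edges.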
14. Property Graph: Example
Vertex (id = 1):
name = Alice
surname = Smith
age = 32
email = alice@example.com
...
Vertex (id = 2):
name = Bob
surname = Brown
age = 45
email = bob@example.com
...
Edge (id = 3), label = knows, from vertex 1 to vertex 2:
since = 01/01/1970
...
15. Apache Spark: GraphX*
// Creating a Graph
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
// sc is an existing SparkContext
val vertexes: RDD[(VertexId, (String, String))] =
  sc.parallelize(Array((1L, ("Alice", "alice@example.com")), (2L, ("Bob", "bob@example.com"))))
val edges: RDD[Edge[String]] =
  sc.parallelize(Array(Edge(1L, 2L, "knows")))
val graph = Graph(vertexes, edges)
...
Example of parallel graph algorithms available:
// Find the triangle count for each vertex
val triCounts = graph.triangleCount().vertices
// Find the connected components
val cc = graph.connectedComponents().vertices
// Run PageRank
val ranks = graph.pageRank(0.0001).vertices
* GraphX is in the alpha stage
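To make the connected components result more concrete, here is a minimal single-machine sketch in plain Scala, with no Spark involved. It assumes the convention of labelling each vertex with the lowest vertex id in its component and treats edges as undirected. The object `ConnectedComponentsSketch` and its method are hypothetical names, and the repeated-propagation loop is a simplification, not the GraphX implementation.

```scala
object ConnectedComponentsSketch {
  // Returns vertex id -> component label, where the label is the smallest
  // vertex id in the same component. Edges are treated as undirected.
  def connectedComponents(vertexIds: Set[Long], edges: Seq[(Long, Long)]): Map[Long, Long] = {
    // Include edge endpoints so every neighbour has a label.
    val allVertexes: Set[Long] = vertexIds ++ edges.flatMap { case (a, b) => Seq(a, b) }

    val neighbours: Map[Long, Set[Long]] =
      edges.flatMap { case (a, b) => Seq(a -> b, b -> a) }
        .groupBy(_._1)
        .map { case (v, pairs) => v -> pairs.map(_._2).toSet }

    // Each round, every vertex adopts the smallest label among itself and its
    // neighbours; stop when a round changes nothing.
    @annotation.tailrec
    def propagate(labels: Map[Long, Long]): Map[Long, Long] = {
      val next = labels.map { case (v, label) =>
        val candidates = neighbours.getOrElse(v, Set.empty[Long]).map(labels) + label
        v -> candidates.min
      }
      if (next == labels) labels else propagate(next)
    }

    propagate(allVertexes.map(v => v -> v).toMap)
  }
}
```

With vertexes 1 to 4 and edges (1, 2) and (2, 3), vertexes 1, 2 and 3 all receive label 1 and the isolated vertex 4 keeps label 4.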
17. Use Cases for Graphs
Analytics
Social networks and recommendation engines
Data center infrastructure management
Knowledge Graphs
Happenings: people, places, events
Customer databases / products catalogues
18. Some Conclusions
Data Graphs are (still) new to many people
RDF emphasizes information modelling
Knowledge graphs
SQL-like query
Property Graph emphasizes data processing
Data capture
Graph analytic algorithms
Naive layering of data models leads to dissatisfaction
Toolsets can only be mixed by knowing that the data is layered
Could share technology
Storage, data access, query algebra
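One possible reading of the layering point, sketched in plain Scala: assume "layering" means encoding a property graph on top of triples, with each edge turned into its own node so that edge properties have somewhere to attach. The `http://example/pg#` vocabulary, the edge node naming scheme and `LayeringSketch` are all invented for illustration; the talk does not prescribe this encoding.

```scala
object LayeringSketch {
  type Triple = (String, String, String)

  // Invented vocabulary for the encoding.
  private val pg = "http://example/pg#"

  // Encode one property graph edge as triples about a node standing for the edge.
  def edgeToTriples(edgeId: Long, label: String, from: String, to: String,
                    props: Map[String, String]): Set[Triple] = {
    val edgeNode = s"http://example/edge/$edgeId"
    Set[Triple](
      (edgeNode, pg + "from", from),
      (edgeNode, pg + "to", to),
      (edgeNode, pg + "label", label)
    ) ++ props.map { case (k, v) => (edgeNode, pg + k, v) }
  }
}
```

Under this encoding, the "knows" edge from Alice to Bob becomes four triples about the edge node, and none of them is the direct triple (alice, knows, bob). A tool that queries for the direct triple finds nothing unless it knows how the data was layered.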