狠狠撸

狠狠撸Share a Scribd company logo
Modelling Event Data As A Graph
Modelling event
data as a graph
Combining event graphs with event grammar
? People tend to intuitively visualise concepts as graphs which do not translate nicely to a
tabular structure.
? Graph databases are often designed for low-latency performance, which can make them
a better choice for certain applications, such as recommendation engines, especially at
scale.
? There are some questions (for example path analysis) that are difficult to answer when
using a relational database, but that are easy to answer with a graph.
Graph DBs have some key
advantages over relational DBs
? In a relational event data model, each event is a record in a table or index.
? The table has as many columns or properties as there are facets to that event, eg user,
timestamp, URL, etc.
? There isn’t much scope for deviation from this basic model.
We know how to model events in a
table…
? We can model events as nodes with properties, related through [:NEXT] edges:
(PageView1)-[:NEXT]->(PageView2).
? We can model events as relationships, eg (User)-[:VIEWS]->(Page).
? We can mix and match different methods.
? It’s not obvious which model is ‘the right one’: if there even is such a thing.
… but modelling events as a graph
is relatively unexplored
? In a relational database, you’d always use the same query to get specific properties of an
event:
SELECT user_id, page_url FROM events;
? However, in a graph database your query syntax depends on whether those dimensions have
been modelled as nodes, relationships, or properties of nodes or relationships:
MATCH (u:User)-[:VIEWS]->(p:Page)
RETURN u.id, p.url;
MATCH (e:Event)
RETURN e.user_id, e.page_url;
Choosing the model dictates what
queries we can run
Taking an event-grammar approach
In the event-grammar model, an event is a
snapshot of a set of entities in time.
This model is already a graph, with nodes
representing the various entities and
relationships between the nodes.
However, when mapping this model to a
tabular structure, the roles of each entity
and the relationships between them are
lost to users without knowledge of the
model and the domain.
To make the roles and relationships
explicit, we have to interpret the
event
MATCH (u:User)-[r:VIEWS]->(p:Page)
WHERE u.email = 'alice@mail.com'
RETURN COUNT(r)
But what if we have more than just page views, eg also link clicks, downloads, form
submits, etc?
This model makes it hard to find all
events by the same user
The event graph approach
A popular option for modelling events in a
graph is to make each event a node that is
related to the event that happened immediately
before it and after it through
a?NEXT?/?PREVIOUS?relationship.
The event node then has
outgoing?HAS?relationships to all of its entities,
such as user nodes, context nodes, etc.
This is an easy model to do path analysis.
MATCH (u:User)<-[:HAS]-(e:Event)-[:HAS]—>(p:Page)
WHERE u.email = 'alice@mail.com'
RETURN COUNT(DISTINCT p)
We can still find all pages visited by the user, but we always have to add a ‘hop’ in the
query, because entities are related to each other only through the event node they belong
to.
The event-graph model makes other
queries harder
The ‘denormalised’ graph approach
There is also the option to “denormalise” the
data, ie represent the same data in different
ways.
An example would be a model where each
event is a node in a time series, with outgoing
relationships to all its entities, but there
are?also?relationships between the entities.
This adds complexity and redundancy to the
model but makes queries easier.
MATCH (u:User)-[:VIEWS]->(p:Page)
WHERE u.email = 'alice@mail.com'
RETURN COUNT(DISTINCT p)
MATCH p = (u:User)<-[:HAS]-(Event)-[:NEXT*1..5]—>(Event)
WHERE u.email = 'alice@mail.com'
RETURN p
Now we can easily write a variety of
queries
One key advantage is that any insight you glean from analysing the relationships of entities
in your events, can be readily attached to your existing data set.
How is modelling event-level data as
a graph valuable?
? Users log in with different
accounts
? On multiple devices
? Across multiple networks
To illustrate, here’s a popular use
case: an Identity Resolution Graph
You need to write extra code to:
? Check the Identity Graph for all aliases of a specific user / device / network;
? Fill in those aliases in SQL queries against your relational database;
? Union the results of those queries.
An identity graph is powerful but it
remains locked away from the rest
of your relational data
You will be able to easily:
? Find all events for a specific user / device / network;
? Build relationships that link all known aliases for this user / device / network to the same
events;
? Quickly discover all of the user / device / network history, regardless of which alias they
are using at the moment.
Contrast that with building the ID
graph on top of your existing event
graph
e. dilyan@snowplowanalytics.com
t. @dilyan_damyanov
If you would like to explore how Snowplow can enable you to
take control of your data, and what that can make possible,
visit our product page, request a demo or get in touch.
Sign up to our mailing list and stay up-to-date with our new
releases and other news.
snowplowanalytics.com
? 2018 Snowplow Technology and services provider in digital analytics. All Rights Reserved.

More Related Content

Modelling Event Data As A Graph

  • 2. Modelling event data as a graph Combining event graphs with event grammar
  • 3. ? People tend to intuitively visualise concepts as graphs which do not translate nicely to a tabular structure. ? Graph databases are often designed for low-latency performance, which can make them a better choice for certain applications, such as recommendation engines, especially at scale. ? There are some questions (for example path analysis) that are difficult to answer when using a relational database, but that are easy to answer with a graph. Graph DBs have some key advantages over relational DBs
  • 4. ? In a relational event data model, each event is a record in a table or index. ? The table has as many columns or properties as there are facets to that event, eg user, timestamp, URL, etc. ? There isn’t much scope for deviation from this basic model. We know how to model events in a table…
  • 5. ? We can model events as nodes with properties, related through [:NEXT] edges: (PageView1)-[:NEXT]->(PageView2). ? We can model events as relationships, eg (User)-[:VIEWS]->(Page). ? We can mix and match different methods. ? It’s not obvious which model is ‘the right one’: if there even is such a thing. … but modelling events as a graph is relatively unexplored
  • 6. ? In a relational database, you’d always use the same query to get specific properties of an event: SELECT user_id, page_url FROM events; ? However, in a graph database your query syntax depends on whether those dimensions have been modelled as nodes, relationships, or properties of nodes or relationships: MATCH (u:User)-[:VIEWS]->(p:Page) RETURN u.id, p.url; MATCH (e:Event) RETURN e.user_id, e.page_url; Choosing the model dictates what queries we can run
  • 7. Taking an event-grammar approach In the event-grammar model, an event is a snapshot of a set of entities in time. This model is already a graph, with nodes representing the various entities and relationships between the nodes. However, when mapping this model to a tabular structure, the roles of each entity and the relationships between them are lost to users without knowledge of the model and the domain.
  • 8. To make the roles and relationships explicit, we have to interpret the event
  • 9. MATCH (u:User)-[r:VIEWS]->(p:Page) WHERE u.email = 'alice@mail.com' RETURN COUNT(r) But what if we have more than just page views, eg also link clicks, downloads, form submits, etc? This model makes it hard to find all events by the same user
  • 10. The event graph approach A popular option for modelling events in a graph is to make each event a node that is related to the event that happened immediately before it and after it through a?NEXT?/?PREVIOUS?relationship. The event node then has outgoing?HAS?relationships to all of its entities, such as user nodes, context nodes, etc. This is an easy model to do path analysis.
  • 11. MATCH (u:User)<-[:HAS]-(e:Event)-[:HAS]—>(p:Page) WHERE u.email = 'alice@mail.com' RETURN COUNT(DISTINCT p) We can still find all pages visited by the user, but we always have to add a ‘hop’ in the query, because entities are related to each other only through the event node they belong to. The event-graph model makes other queries harder
  • 12. The ‘denormalised’ graph approach There is also the option to “denormalise” the data, ie represent the same data in different ways. An example would be a model where each event is a node in a time series, with outgoing relationships to all its entities, but there are?also?relationships between the entities. This adds complexity and redundancy to the model but makes queries easier.
  • 13. MATCH (u:User)-[:VIEWS]->(p:Page) WHERE u.email = 'alice@mail.com' RETURN COUNT(DISTINCT p) MATCH p = (u:User)<-[:HAS]-(Event)-[:NEXT*1..5]—>(Event) WHERE u.email = 'alice@mail.com' RETURN p Now we can easily write a variety of queries
  • 14. One key advantage is that any insight you glean from analysing the relationships of entities in your events, can be readily attached to your existing data set. How is modelling event-level data as a graph valuable?
  • 15. ? Users log in with different accounts ? On multiple devices ? Across multiple networks To illustrate, here’s a popular use case: an Identity Resolution Graph
  • 16. You need to write extra code to: ? Check the Identity Graph for all aliases of a specific user / device / network; ? Fill in those aliases in SQL queries against your relational database; ? Union the results of those queries. An identity graph is powerful but it remains locked away from the rest of your relational data
  • 17. You will be able to easily: ? Find all events for a specific user / device / network; ? Build relationships that link all known aliases for this user / device / network to the same events; ? Quickly discover all of the user / device / network history, regardless of which alias they are using at the moment. Contrast that with building the ID graph on top of your existing event graph
  • 19. If you would like to explore how Snowplow can enable you to take control of your data, and what that can make possible, visit our product page, request a demo or get in touch. Sign up to our mailing list and stay up-to-date with our new releases and other news.
  • 20. snowplowanalytics.com ? 2018 Snowplow Technology and services provider in digital analytics. All Rights Reserved.