Shutl was experiencing performance issues with their MySQL database as the amount of data and relationships grew. They migrated to Neo4j, a graph database, to better handle the complex relationships in their data. Some benefits of Neo4j included more efficient querying of relationships, easier modeling of new data, and more consistent query performance as the database size increased. However, testing and migrations were more challenging with the schema-less Neo4j database. Shutl developed their own tools to help with testing and importing data.
Tuesday, 28 May 13
14. SaaS platform
we provide an API for carriers and merchants
shutl.it C2C platform
customers can choose between delivery either:
within 90 minutes of purchase
or in a 1 hour window of their choice
(same day or any day)
fastest delivery to date: 15:00 min
SOA with services built using JRuby, Sinatra, MongoDB and Neo4j
22. problems with our previous attempt (v1):
exponential growth of joins in MySQL with added features
code base too complex and unmaintainable
API response time growing too large as more data was added
our fastest delivery was quicker than our slowest query!
29. The case for graph databases:
relationships are explicitly stored (an RDBMS only models them indirectly, via foreign keys and joins)
domain modelling is simplified because adding new subgraphs
doesn't affect the existing structure and queries (additive model)
whiteboard friendly
schema-less
db performance remains relatively constant because each query is
localized to its portion of the graph: roughly O(1) for the same query as the data grows
traversals of relationships are easy and very fast
30. What is a graph anyway?
a collection of vertices (nodes)
connected by edges (relationships)
[diagram: Node 1, Node 2, Node 3, Node 4]
32. directed graph
each relationship has a direction, i.e.
one start node and one end node
[diagram: Node 1, Node 2, Node 3, Node 4]
33. property graph
nodes contain properties (key, value)
relationships have a type and are always directed
relationships can contain properties too
[diagram labels: nodes name: Volker, name: Sam, name: Megan, name: Paul; relationships :friends (x2), :knows (x2), :works_for; relationship property since: 2005]
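To make the three property graph rules concrete, here is a minimal in-memory sketch in plain Ruby. The class and method names (PropertyGraph, add_node, relate, outgoing) are made up for illustration and are not the neo4j gem API; which two people the :knows relationship connects is also an assumption, since the diagram did not survive.

```ruby
# Minimal in-memory property graph (illustrative names only, not the neo4j gem API).
Node = Struct.new(:id, :properties)
Relationship = Struct.new(:type, :start_node, :end_node, :properties)

class PropertyGraph
  attr_reader :nodes, :relationships

  def initialize
    @nodes = []
    @relationships = []
  end

  # Nodes carry arbitrary key/value properties.
  def add_node(properties = {})
    node = Node.new(@nodes.size, properties)
    @nodes << node
    node
  end

  # Relationships always have a type and a direction (start -> end),
  # and may carry properties of their own.
  def relate(start_node, type, end_node, properties = {})
    rel = Relationship.new(type, start_node, end_node, properties)
    @relationships << rel
    rel
  end

  # Follow outgoing relationships of a given type from a node.
  def outgoing(node, type)
    @relationships
      .select { |r| r.start_node.equal?(node) && r.type == type }
      .map(&:end_node)
  end
end

graph  = PropertyGraph.new
volker = graph.add_node({ name: "Volker" })
sam    = graph.add_node({ name: "Sam" })
# Assumption: which nodes :knows connects is not recoverable from the slide.
knows  = graph.relate(volker, :knows, sam, { since: 2005 })

puts graph.outgoing(volker, :knows).map { |n| n.properties[:name] }.inspect
```

Because relationships are directed, asking for Sam's outgoing :knows relationships returns nothing here, even though Volker knows Sam.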
34. a graph is its own index (constant query performance)
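One way to read "a graph is its own index": each node holds direct references to its neighbours, so finding them costs as much as the node has relationships, regardless of how big the rest of the graph is. The sketch below contrasts that with a deliberately naive scan of a join table. It is a simplification: a real RDBMS would use an index on the join column, which is cheaper than a full scan but still depends on table size. All names are illustrative.

```ruby
# Index-free adjacency sketch: each node keeps direct references to its
# neighbours, so a lookup touches only that node's own relationships.
class GraphNode
  attr_reader :name, :neighbours

  def initialize(name)
    @name = name
    @neighbours = []
  end

  def knows(other)
    @neighbours << other
    self
  end
end

# Join-table style lookup: scans every row, so work grows with table size.
def friends_via_table(rows, name)
  examined = 0
  friends = rows.each_with_object([]) do |(from, to), acc|
    examined += 1
    acc << to if from == name
  end
  [friends, examined]
end

# Adjacency lookup: work is proportional to the node's own degree.
def friends_via_graph(node)
  [node.neighbours.map(&:name), node.neighbours.size]
end

me    = GraphNode.new("me")
steve = GraphNode.new("Steve")
me.knows(steve)

# Pad the data set with unrelated people; only the table scan gets slower.
rows = [["me", "Steve"]]
1000.times { |i| rows << ["person#{i}", "person#{i + 1}"] }

table_friends, table_cost = friends_via_table(rows, "me")
graph_friends, graph_cost = friends_via_graph(me)
puts "table: #{table_friends.inspect} after #{table_cost} rows"
puts "graph: #{graph_friends.inspect} after #{graph_cost} relationship(s)"
```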
40. the case for Neo4j
we can run it embedded in the same JVM
we can use JRuby, as we already know Ruby very well
lots of good Ruby libraries are available; we chose the neo4j gem
by Andreas Ronge (https://github.com/andreasronge/neo4j)
it speaks Cypher
the guys from Neo Technology are awesome
41. some graph dbs available:
neo4j (jvm)
flockdb (jvm)
DEX (c++)
OrientDB (jvm)
Sones GraphDB (c#)
42. embedded vs. standalone
embedded:
pros:
better performance
transaction support
neo4j gem is available
we can use cypher and traversal
cons:
only the code running the db has access to the db
standalone:
pros:
access via rest api and cypher
language independent and code doesn't need to run on the JVM
cons:
not as performant
only works with cypher
transaction is on a per query basis
need to write model wrappers for ourselves
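For the standalone option, "access via rest api and cypher" roughly means posting one Cypher query per HTTP request, which is also why each query is its own transaction. The sketch below only builds such a request and does not send it. It assumes a Neo4j server of that era on the default port 7474 with its Cypher endpoint at /db/data/cypher; the helper name cypher_request is made up for illustration.

```ruby
require "json"
require "net/http"
require "uri"

# Assumption: a standalone Neo4j server of that era on the default port,
# exposing its Cypher endpoint at /db/data/cypher.
CYPHER_URI = URI("http://localhost:7474/db/data/cypher")

# Build (but do not send) the HTTP request for one Cypher query.
def cypher_request(query, params = {})
  request = Net::HTTP::Post.new(CYPHER_URI.path)
  request["Content-Type"] = "application/json"
  request["Accept"] = "application/json"
  request.body = JSON.generate({ "query" => query, "params" => params })
  request
end

request = cypher_request("START n=node({id}) RETURN n", { "id" => 0 })
puts request.body

# Sending it would look like this (requires a running server):
# response = Net::HTTP.start(CYPHER_URI.host, CYPHER_URI.port) { |http| http.request(request) }
# JSON.parse(response.body)
```

Any language with an HTTP client can do the same, which is the "language independent" advantage; the price is that model wrappers around the JSON responses have to be written by hand.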
49. gotchas and other stuff to consider:
testing proved to be difficult and we had to write our own tools
migrations of schemaless dbs are more difficult to stay on top of and require
special solutions in the case of graph dbs
seeding an embedded database is hard
graph db partitioning is almost impossible and the whole graph needs to be in
memory
encoding Dates and Times that are stored in UTC and work across timezones is
non-trivial
nested data structures (hashes and arrays) can't be stored and need to be
converted to JSON
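The last two gotchas can be handled with a small conversion step before writing properties. This is a sketch of one possible approach, not what Shutl actually did: nested hashes and arrays are stored as JSON strings, and times are stored as epoch seconds, which are timezone independent. The helper names and the booking fields are hypothetical.

```ruby
require "json"
require "time"

# Flatten a Ruby hash into values a property store can hold:
# nested hashes/arrays become JSON strings, times become epoch seconds.
def to_properties(attrs)
  attrs.each_with_object({}) do |(key, value), props|
    props[key] =
      case value
      when Hash, Array then JSON.generate(value)
      when Time        then value.to_i # seconds since the epoch, same in every zone
      else value
      end
  end
end

# Reverse step for a property known to hold JSON.
def read_json(props, key)
  JSON.parse(props[key])
end

# Hypothetical booking data, including a local (non-UTC) timestamp.
booking = {
  "reference" => "ABC123",
  "window"    => { "from" => "14:00", "to" => "15:00" },
  "tags"      => ["fragile", "signature"],
  "booked_at" => Time.parse("2013-05-28 15:30:00 +0100")
}

props = to_properties(booking)
puts props.inspect
puts Time.at(props["booked_at"]).utc.iso8601
```

Reading a time back means converting the stored integer to a Time and then to whatever zone the caller needs; the stored value itself never changes with the zone.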
56. Querying the graph: Cypher
declarative query language specific to neo4j
easy to learn and intuitive
enables the user to specify patterns to query for (something that looks
like this)
inspired partly by SQL (WHERE and ORDER BY) and SPARQL (pattern matching)
focuses on what to query for and not how to query for it
the switch from a MySQL world is made easier by using cypher instead of having
to learn a traversal framework straight away
57. cypher clauses
START: Starting points in the graph, obtained via index lookups or by element IDs.
MATCH: The graph pattern to match, bound to the starting points in START.
WHERE: Filtering criteria.
RETURN: What to return.
CREATE: Creates nodes and relationships.
DELETE: Removes nodes, relationships and properties.
SET: Sets values on properties.
FOREACH: Performs updating actions once per element in a list.
WITH: Divides a query into multiple, distinct parts.
58. an example graph
Node 1: me
Node 2: Steve
Node 3: Sam
Node 4: David
Node 5: Megan
me - [:knows] -> Steve - [:knows] -> David
me - [:knows] -> Sam - [:knows] -> Megan
Megan - [:knows] -> David
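The example graph is small enough to traverse by hand. The sketch below stores it as adjacency lists in plain Ruby and answers two typical questions: who is exactly two :knows hops away from me, and who is reachable at all. In the Cypher of that era the first question might be written roughly as START me=node(1) MATCH me-[:knows]->()-[:knows]->fof RETURN fof; that query is an illustration and is not taken from the slides.

```ruby
# The example graph as directed :knows relationships (adjacency lists).
KNOWS = {
  "me"    => ["Steve", "Sam"],
  "Steve" => ["David"],
  "Sam"   => ["Megan"],
  "Megan" => ["David"],
  "David" => []
}

# People two :knows hops away, excluding direct friends and the start node.
def friends_of_friends(graph, start)
  direct = graph.fetch(start)
  direct.flat_map { |friend| graph.fetch(friend) }.uniq - direct - [start]
end

# Everyone reachable by following :knows in its direction (breadth-first).
def reachable(graph, start)
  seen = []
  queue = graph.fetch(start).dup
  until queue.empty?
    person = queue.shift
    next if seen.include?(person) || person == start
    seen << person
    queue.concat(graph.fetch(person))
  end
  seen
end

puts friends_of_friends(KNOWS, "me").inspect
puts reachable(KNOWS, "me").inspect
```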
65. find all events for a given range
// relationship types that start with a digit are quoted with backticks
START root=node(0)
MATCH root-[:`2013`]-()-[:`05`]-()-[:`24`]-start,
      root-[:`2013`]-()-[:`05`]-()-[:`26`]-end,
      start-[:next*0..]-middle-[:next*0..]-end,
      middle-[:happens]-event
RETURN event
67. does an event happen on a certain date?
// relationship types that start with a digit are quoted with backticks
START event=node(20)
MATCH event-[:`24`]-()-[:`05`]-()-[:`2013`]-()
RETURN event
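Both date queries rely on a date tree: a root node fans out to year, month and day nodes, consecutive days are chained with :next, and events hang off the day they happen on. The plain Ruby sketch below mimics that structure with nested hashes and a linked list so the two lookups can be followed step by step. The class and method names are hypothetical, and the exact shape of Shutl's tree is an assumption based on the two queries.

```ruby
require "date"

# Day nodes are chained by :next; each one lists the events that happen on it.
DayNode = Struct.new(:date, :events, :next_day)

class TimeTree
  def initialize
    @root = {} # year => { month => { day => DayNode } }
    @last_day = nil
  end

  # Days must be added in chronological order so the :next chain stays sorted.
  def add_day(date)
    node = DayNode.new(date, [], nil)
    @last_day.next_day = node if @last_day
    @last_day = node
    ((@root[date.year] ||= {})[date.month] ||= {})[date.day] = node
  end

  # Walk root -> year -> month -> day, like the first MATCH lines.
  def day(date)
    @root.dig(date.year, date.month, date.day)
  end

  # Assumes the day node already exists.
  def add_event(date, event)
    day(date).events << event
  end

  # Follow :next from the start day to the end day, collecting events.
  def events_between(from, to)
    node = day(from)
    events = []
    while node && node.date <= to
      events.concat(node.events)
      node = node.next_day
    end
    events
  end

  # Does the event hang off the given day?
  def happens_on?(event, date)
    node = day(date)
    !node.nil? && node.events.include?(event)
  end
end

tree = TimeTree.new
(Date.new(2013, 5, 23)..Date.new(2013, 5, 27)).each { |d| tree.add_day(d) }
tree.add_event(Date.new(2013, 5, 23), "too early")
tree.add_event(Date.new(2013, 5, 24), "delivery A")
tree.add_event(Date.new(2013, 5, 26), "delivery B")
tree.add_event(Date.new(2013, 5, 27), "too late")

puts tree.events_between(Date.new(2013, 5, 24), Date.new(2013, 5, 26)).inspect
puts tree.happens_on?("delivery A", Date.new(2013, 5, 24))
```

The range lookup never inspects days outside the requested window, which is the point of chaining days with :next instead of scanning all events.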
68. testing and importing:
we use rspec for all tests on the api and practice tdd/bdd
setting up scenarios for an integration test was difficult and slow with existing tools
we decided to build our own dsl, based on the geoff notation developed by Nigel Small, to
allow for the setting up of scenarios and for the import of data from mysql
69. geoff:
developed by Nigel Small (@technige, http://geoff.nigelsmall.net/)
allows modelling of graphs in a human readable form
(A) {"name": "Alice"}
(B) {"name": "Bob"}
(A)-[:KNOWS]->(B)
and provides a java interface to insert them into an existing graph
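Because geoff is line-oriented text, generating it from existing data (for example rows exported from mysql) is mostly string formatting. The sketch below emits lines in the shape shown on the slide. The helper names are made up for illustration, and only the node and relationship forms shown above are covered.

```ruby
require "json"

# Render a properties hash as the JSON object used in the geoff node lines.
def geoff_properties(props)
  "{" + props.map { |key, value| "#{key.to_s.to_json}: #{value.to_json}" }.join(", ") + "}"
end

# Node line, e.g. (A) {"name": "Alice"}
def geoff_node(name, props)
  "(#{name}) #{geoff_properties(props)}"
end

# Relationship line, e.g. (A)-[:KNOWS]->(B)
def geoff_relationship(from, type, to)
  "(#{from})-[:#{type}]->(#{to})"
end

lines = [
  geoff_node("A", { name: "Alice" }),
  geoff_node("B", { name: "Bob" }),
  geoff_relationship("A", "KNOWS", "B")
]
puts lines
```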
70. geoff-importer gem (https://github.com/shutl/geoff-importer)
imports any geoff file into a neo4j db
it is open source
71. geoff gem (https://github.com/shutl/geoff)
provides a dsl for creating a graph and inserting it into the db
it is open source
it works together with FactoryGirl (https://github.com/thoughtbot/factory_girl)
it supports only the graph structure of the neo4j gem at the moment
we haven't solved all the issues with event listeners yet
72. geoff gem (https://github.com/shutl/geoff)
Geoff(Company, Person) do
  company 'Acme' do
    address "13 Something Road"
    outgoing :employees do
      person 'Geoff'
      person 'Nigel' do
        name 'Nigel Small'
      end
    end
  end
  company 'Github' do
    outgoing :customers do
      person 'Tom'
      person 'Dick'
      person 'Harry'
    end
  end
  person 'Harry' do
    incoming :customers do
      company 'NeoTech'
    end
  end
end
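For readers curious how a block-based DSL like this can work, here is a toy builder in plain Ruby. It is not the geoff gem and makes no claim about how the gem is implemented: ToyGraphBuilder and everything inside it are hypothetical, and it only records nodes, properties and relationships in memory instead of writing to neo4j.

```ruby
# Toy builder, not the geoff gem: records nodes and relationships from nested blocks.
class ToyGraphBuilder
  attr_reader :nodes, :relationships

  def initialize
    @nodes = {}        # [kind, name] => properties
    @relationships = [] # [from_key, type, to_key]
    @stack = []        # nodes whose block we are currently inside
    @direction = nil   # [:outgoing or :incoming, type] while inside such a block
  end

  def company(name, &block)
    node(:company, name, &block)
  end

  def person(name, &block)
    node(:person, name, &block)
  end

  def outgoing(type, &block)
    with_direction([:outgoing, type], &block)
  end

  def incoming(type, &block)
    with_direction([:incoming, type], &block)
  end

  private

  def node(kind, name, &block)
    key = [kind, name]
    @nodes[key] ||= {}
    link(key) if @direction
    return unless block

    @stack.push(key)
    saved, @direction = @direction, nil
    instance_eval(&block)
    @direction = saved
    @stack.pop
  end

  # Connect the enclosing node and the new node in the requested direction.
  def link(key)
    kind, type = @direction
    parent = @stack.last
    from, to = kind == :outgoing ? [parent, key] : [key, parent]
    @relationships << [from, type, to]
  end

  def with_direction(direction, &block)
    saved, @direction = @direction, direction
    instance_eval(&block)
    @direction = saved
  end

  # Any other call inside a node block is treated as a property setter.
  def method_missing(name, value)
    @nodes[@stack.last][name] = value
  end
end

builder = ToyGraphBuilder.new
builder.instance_eval do
  company 'Acme' do
    address "13 Something Road"
    outgoing :employees do
      person 'Geoff'
    end
  end
  person 'Harry' do
    incoming :customers do
      company 'NeoTech'
    end
  end
end

puts builder.nodes.inspect
puts builder.relationships.inspect
```

The nesting of blocks is what carries the structure: a node declared inside an outgoing or incoming block is linked to the node whose block encloses it.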
73. [diagram labels: root node; :company and :person nodes; companies acme (13 something road), NeoTech, GitHub; people Geoff, Nigel Small, Tom, Dick, Harry; relationships :all, :employees, :customers]