E-Commerce and Graph-driven Applications:
Experiences and Optimizations while
moving to Linked Data by Andreas Both (UNISTER)
1. Dr. Andreas Both, Head of R & D, Unister LDBC, Barcelona, 2015-03-20 Slide 1
E-Commerce and Graph-driven Applications:
Experiences and Optimizations while
moving to Linked Data
Andreas Both, Head of Research and Development
UNISTER GmbH, Germany
Slide 2
Unister Group
e-commerce company
founded 2002
major B2C web portals in Germany (and Europe)
verticals: travel, flights, travel packages, retail, ...
integrated business model
10 million unique users per month (Germany, AGOF)
increased number of employees
2003: 1
2015: 1600
Slide 3
Use Case
Agenda for e-commerce companies:
take advantage of linked data
unchain datastores from schema
Requirements:
fast
robust
scalable
Users: I want it all. I want it now.
Slide 4
Typical Data Structures and Queries
hierarchical (directed) region graph
hotels and regions might have many features
typical queries: select several features of hotels
Slide 5
Example Query
PREFIX uo: <http://ontology.unister.de/ontology#>
PREFIX uor: <http://ontology.unister.de/resource/>
PREFIX uorf: <http://ontology.unister.de/hotel/facility/>
PREFIX uos: <http://ontology.unister.de/skos/>
SELECT DISTINCT ?s {
  ?s a uo:Hotel ;
     uo:hasFeature uorf:56 ,
                   uorf:18 ,
                   uorf:21 ,
                   uorf:210 ,
                   uorf:5 ,
                   uorf:211 ,
                   uorf:34 ,
                   uorf:17 ;
     uo:locatedIn uor:Europe ;
     uo:suitableFor uos:Family
} LIMIT 10;
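Queries of this shape differ mainly in the list of requested features, so they lend themselves to being generated. The following is a minimal sketch, not taken from the talk, of how such a star-shaped query could be assembled; the helper name build_hotel_query and its parameters are assumptions made for illustration, and the input values are not validated or escaped.

```python
# Hypothetical helper (not from the talk): assembles a star-shaped hotel query.
# The prefix IRIs are the ones shown in the example query.
PREFIXES = {
    "uo": "http://ontology.unister.de/ontology#",
    "uor": "http://ontology.unister.de/resource/",
    "uorf": "http://ontology.unister.de/hotel/facility/",
    "uos": "http://ontology.unister.de/skos/",
}


def build_hotel_query(feature_ids, region, audience, limit=10):
    """Return a SPARQL query selecting hotels that have every listed feature."""
    if not feature_ids:
        raise ValueError("at least one feature id is required")
    prefix_lines = [f"PREFIX {name}: <{iri}>" for name, iri in PREFIXES.items()]
    # All features hang off the same subject, separated by commas.
    features = " ,\n                   ".join(f"uorf:{fid}" for fid in feature_ids)
    body = (
        "SELECT DISTINCT ?s {\n"
        "  ?s a uo:Hotel ;\n"
        f"     uo:hasFeature {features} ;\n"
        f"     uo:locatedIn uor:{region} ;\n"
        f"     uo:suitableFor uos:{audience}\n"
        f"}} LIMIT {int(limit)}"
    )
    return "\n".join(prefix_lines + [body])


query = build_hotel_query([56, 18, 21, 210, 5, 211, 34, 17], "Europe", "Family")
print(query)
```

The sketch leaves out the trailing semicolon of the slide's query, since that terminator depends on the client used to submit it.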
Slide 6
Experiences: standard search process
A search for attributes
  A1 very selective
  A2 less selective
B pick a region
C sort the results
D limit the selection
Setting:
Dataset: 71,600 hotels, resources: 278,277, literals: 3,022,583
Virtuoso: version 7.1 (fast track [1]), 824 MB, buffer size: 70,000
Experiments: 20 runs, charts show median
[1] https://github.com/v7fasttrack/virtuoso-opensource/tree/feature/emergent
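The measurement protocol (20 runs, median reported) can be sketched as follows. This is an illustration under assumptions, not the harness used for the talk: the function name median_runtime is hypothetical, and fake_query is a stand-in for sending the SPARQL query to the store.

```python
import statistics
import time


def median_runtime(run_query, runs=20):
    """Call run_query `runs` times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Stand-in workload (assumption); a real measurement would query the endpoint here.
def fake_query():
    sum(range(10_000))


result = median_runtime(fake_query)
print(f"median over 20 runs: {result:.6f} s")
```

Reporting the median rather than the mean keeps a single slow run, for example the first one before caches are warm, from dominating the chart.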
Slide 7
Requirements for Industrial Applicability (in e-commerce)
requirements for replacing traditional databases:
fast: short response time
search query refinement => shorter response time
robust: similar answer times
easy to scale up
system resource efficient
requirements not fulfilled
Slide 8
Example Query
PREFIX uo: <http://ontology.unister.de/ontology#>
PREFIX uor: <http://ontology.unister.de/resource/>
PREFIX uorf: <http://ontology.unister.de/hotel/facility/>
PREFIX uos: <http://ontology.unister.de/skos/>
SELECT DISTINCT ?s {
  ?s a uo:Hotel ;
     uo:hasFeature uorf:56 ,
                   uorf:18 ,
                   uorf:21 ,
                   uorf:210 ,
                   uorf:5 ,
                   uorf:211 ,
                   uorf:34 ,
                   uorf:17 ;
     uo:locatedIn uor:Europe ;
     uo:suitableFor uos:Family
} LIMIT 10;
Slide 9
Data Preparation
hotel entity   p1   p2   p3   ...   pn
hotel1         0    0    1    ...   0
hotel2         1    0    1    ...   1
hotel3         1    1    1    ...   0
hotel4         1    0    1    ...   1
...            ...  ...  ...  ...   ...
hotelm         0    0    1    ...   0
BitSet representation of (hotel) properties:
p = 0010...0
Advantages:
no index
very small
operations in-memory
easy update
easy insert
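The encoding can be illustrated with a small sketch. It assumes one bitset per hotel row, as the table suggests, and uses plain Python integers as bitsets; the column set is truncated to p1, p2, p3 and pn, and the helper names encode and has_all are hypothetical.

```python
# Rows copied from the table above, truncated to the columns p1, p2, p3 and pn.
PROPERTIES = ["p1", "p2", "p3", "pn"]
ROWS = {
    "hotel1": [0, 0, 1, 0],
    "hotel2": [1, 0, 1, 1],
    "hotel3": [1, 1, 1, 0],
    "hotel4": [1, 0, 1, 1],
}


def encode(flags):
    """Pack a list of 0/1 flags into one int; flag i is stored in bit i."""
    value = 0
    for i, flag in enumerate(flags):
        if flag:
            value |= 1 << i
    return value


def has_all(hotel_bits, required_bits):
    """True if every required property bit is also set for the hotel."""
    return (hotel_bits & required_bits) == required_bits


hotel_bitsets = {name: encode(flags) for name, flags in ROWS.items()}

# Select hotels that have p1, p3 and pn.
required = encode([1, 0, 1, 1])
matches = [name for name, bits in hotel_bitsets.items() if has_all(bits, required)]
print(matches)

# "Easy update": setting p1 for hotel1 is a single bit operation.
hotel_bitsets["hotel1"] |= 1 << PROPERTIES.index("p1")
```

With the rows above, only hotel2 and hotel4 carry all three requested properties. No index is consulted: the check is one AND and one comparison per hotel.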
Slide 10
Data Preparation
BitSet setting, Virtuoso adaptations:
BitSets: 16,507 stored properties, 63,109,198 B RAM used
Virtuoso: from 824 MB to 706 MB
Virtuoso set-up update: buffer size = 60,000
Slide 11
Implemented Process: Virtuoso plugin
(with kind help of the OpenLink team, GeoKnow Project [2])
1 interpret bif:contains (workaround!)
2 request bitsets from memcache via JNI (workaround!)
3 compute hotels using bit operations on addressed bitsets
4 map hotel IDs to Virtuoso literal IDs (workaround!)
  query IDs from Virtuoso via literal selection
  requires special predicate for each hotel resource
5 return cursor on result set
[2] This work has been supported by grants from the European Union's 7th Framework Programme provided for the project GeoKnow (GA no. 318159), cf. http://geoknow.eu
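Steps 2 to 5 can be sketched in a few lines. This is an illustration under stated assumptions, not the plugin itself: a plain dict stands in for memcache, each cached bitset is assumed to hold one bit per hotel for a single property, the keys and IDs are made up, and a Python generator plays the role of the result cursor.

```python
# Stand-in for memcache (assumption): property key -> bitset, bit i = hotel at position i.
CACHE = {
    "uorf:56": 0b1110,
    "uorf:18": 0b0111,
    "uorf:21": 0b1010,
}
# Stand-in for the ID mapping of step 4 (made-up IDs): bit position -> store-side ID.
POSITION_TO_ID = {0: "id-100", 1: "id-101", 2: "id-102", 3: "id-103"}


def fetch_bitsets(keys, cache):
    """Step 2: look up one bitset per requested property; unknown keys match nothing."""
    return [cache.get(key, 0) for key in keys]


def intersect(bitsets):
    """Step 3: AND all bitsets; in this sketch an empty request matches nothing."""
    if not bitsets:
        return 0
    result = bitsets[0]
    for bits in bitsets[1:]:
        result &= bits
    return result


def positions(bits):
    """Yield the indexes of all set bits, lowest first."""
    index = 0
    while bits:
        if bits & 1:
            yield index
        bits >>= 1
        index += 1


def answer(keys, cache=CACHE, mapping=POSITION_TO_ID):
    """Steps 2 to 5: return an iterator (the 'cursor') over matching store-side IDs."""
    hits = intersect(fetch_bitsets(keys, cache))
    return (mapping[pos] for pos in positions(hits))


print(list(answer(["uorf:56", "uorf:18"])))
```

Adding a further property to the request only adds one more AND, which is one way to read the observation that response times stay stable as queries get more complex.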
Slide 12
Preliminary Results of A: properties in BitSets
Observations:
more complex
less response time
stable response times
warmup required
Slide 13
Preliminary Results of B: non-selective property in Virtuoso
Observations:
less selective feature answered within Virtuoso has largest impact on computation time
Slide 14
Preliminary Results of C: order by
Observations:
sorting is not done in BitSet, but might be possible to implement in the future
Slide 15
Preliminary Results of D: limit 10
Observations:
limit is not done in BitSet, but might be possible to implement in the future
Slide 16
Discussion
Summary:
proven good performance
query time is robust
very resource efficient
no schema required
if a star pattern is recognizable, then use bitset optimization
ToDos (not production ready):
overcome workarounds
tighten the integration
generalize interface
extend to (ElasticSearch, Virtuoso with full-text index, cluster)
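The talk does not say how a star pattern is recognized. As one simple reading, assumed here for illustration only, a query qualifies when all of its triple patterns share a single subject variable and every predicate and object is a constant. The function name is_star_pattern and the tuple representation of triple patterns are hypothetical.

```python
def is_star_pattern(triples):
    """True if all patterns share one subject variable and nothing else is a variable."""
    if not triples:
        return False
    subjects = {s for s, _, _ in triples}
    if len(subjects) != 1:
        return False
    subject = next(iter(subjects))
    if not subject.startswith("?"):
        return False
    # Predicates and objects must be constants so that each maps to one bitset lookup.
    return all(not p.startswith("?") and not o.startswith("?") for _, p, o in triples)


# Triple patterns as (subject, predicate, object) strings; variables start with "?".
star = [
    ("?s", "rdf:type", "uo:Hotel"),
    ("?s", "uo:hasFeature", "uorf:56"),
    ("?s", "uo:locatedIn", "uor:Europe"),
]
chain = [
    ("?s", "uo:locatedIn", "?region"),
    ("?region", "uo:locatedIn", "uor:Europe"),
]
print(is_star_pattern(star), is_star_pattern(chain))
```

Under this definition the example hotel query is a star, while a traversal of the region hierarchy is not and would stay with the triple store.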
Slide 17
Take Away Messages
e-commerce use case requires short and robust request times
BitSet-driven extension has proven its value
basic requirements of e-commerce scenario fulfilled
still flexible (schemaless), but performant
taking advantage of external data structures is possible (in Virtuoso)
Dr. Andreas Both
Head of Research
and Development
Unister GmbH,
Leipzig, Germany
andreas.both@unister.de
+49 341 65050 24496
http://www.unister.de