Abadi, Marcus, Madden, Hollenbach
VLDB 2007

Presented by: Guillermo Cabrera
The University of Texas at Austin

 Problem
 Storage Goal
 RDBMS use
 RDF Physical Organization
 Column store vs. Row Store
 Materialized Path Expressions
 Experiment & Results
 Discussion

 Performance: Self-joins
 Many triples

 Achieve scalability & performance in triple storage
 Survey approaches in RDBMS
 Benefits of vertical partition and column store

 1 table with 3 indexed columns?
 Multi layer architecture
◦ Translate -> Optimize -> Execute
 Mapping tables for long URI and literals
 Jena, Oracle, Sesame, 3store (Hyunjun), Hexastore (Donghyuk)
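
The mapping-table idea can be sketched in a few lines of Python. This is a minimal illustration, not how Jena, Oracle, or Sesame implement it; the `MappingTable` class and `encode_triples` helper are hypothetical names introduced here.

```python
class MappingTable:
    """Hypothetical dictionary encoder: each distinct URI or literal
    gets a small integer id, so the triples table stores short keys."""

    def __init__(self):
        self.id_of = {}      # string -> integer id
        self.string_of = []  # integer id -> string

    def encode(self, value):
        # Assign the next free id the first time a string is seen.
        if value not in self.id_of:
            self.id_of[value] = len(self.string_of)
            self.string_of.append(value)
        return self.id_of[value]

    def decode(self, key):
        return self.string_of[key]


def encode_triples(triples, mapping):
    """Replace every component of every triple with its integer id."""
    return [tuple(mapping.encode(part) for part in triple) for triple in triples]


mapping = MappingTable()
triples = [
    ("BookID1", "Author", "http://preamble/FoxJoe"),
    ("http://preamble/FoxJoe", "wasBorn", "1860"),
]
encoded = encode_triples(triples, mapping)
```

The long URI is stored once in the mapping table; both triples that mention it carry the same integer key.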

 Property tables
◦ Clustered property table
 Denormalize RDF (wider tables)
 Clustering algorithm
 NULL values

Review: Scalable Semantic Web Data Management Using Vertical Partitioning

 Property tables
◦ Property-Class Tables
 Exploit the type property
 Properties may exist in multiple tables

 Advantage:
◦ Fewer joins
 Disadvantage:
◦ NULL values
◦ Multivalued attributes are complicated

 Vertical Partition
◦ n two-column tables, n = # of unique properties
◦ Table sorted by subject
 Merge join

• Advantage
 Multi valued attributes supported
 No clustering algorithm (Property tables)
 Only accessed properties are read
• Disadvantage
 Queries over multiple properties need table joins
 Inserts are expensive (one subject may touch many tables)

 Triple Store
 Property Table
 Vertical Partition (Row Store)
 Vertical Partition Store (Column Store)

 Why?
 Projection is free
 Tuple headers (per-row metadata) stored separately
◦ Effective tuple width: about 35 bytes in Postgres vs. about 8 bytes in C-Store
 Column oriented compression
◦ Run-length encoding (e.g. 1,1,1,2,2 -> 1x3, 2x2)
 Optimized merge join
◦ Prefetching

<BookID1, Author, http://preamble/FoxJoe>
<http://preamble/FoxJoe, wasBorn, “1860”>

Find all books whose authors were born in 1860
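
Over vertically partitioned tables this query is a subject-object join: the object of Author must match the subject of wasBorn. A minimal Python sketch follows, using a hash lookup in place of a real join operator. The function name, the second book, and the second author are made up for illustration.

```python
def books_by_authors_born_in(author_table, was_born_table, year):
    """Subject-object join: Author.object = wasBorn.subject,
    filtered on wasBorn.object = year."""
    born_that_year = {person for person, born in was_born_table if born == year}
    return [book for book, author in author_table if author in born_that_year]


# Two-column (subject, object) tables, one per property.
author_table = [
    ("BookID1", "http://preamble/FoxJoe"),
    ("BookID2", "http://preamble/OrrTim"),   # hypothetical extra row
]
was_born_table = [
    ("http://preamble/FoxJoe", "1860"),
    ("http://preamble/OrrTim", "1862"),      # hypothetical extra row
]
result = books_by_authors_born_in(author_table, was_born_table, "1860")
```

The Author table is sorted by subject (the book), not by object (the author), so this join cannot use the cheap subject-subject merge join.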

 Barton Libraries Dataset
 Longwell Queries
◦ Calculating counts
◦ Filtering
◦ Inference

 8.3 GB – Triple Store (Postgres)
 14 GB – Property Table (Postgres)
 5.2 GB – Vertically Partitioned (Postgres)
 2.7 GB – Vertically Partitioned (C-store)
 Including indices and mapping table

 Replace
◦ subject-object joins -> subject-subject joins
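
A sketch of the idea, assuming the two-hop path Author then wasBorn is precomputed into its own two-column table keyed by the book. The `materialize_path` helper and the sample rows beyond the FoxJoe example are hypothetical.

```python
def materialize_path(first_table, second_table):
    """Precompute the path first -> second as a (subject, end_value) table
    keyed by the first table's subject and sorted by it."""
    second_by_subject = {}
    for subject, obj in second_table:
        second_by_subject.setdefault(subject, []).append(obj)
    path = [
        (subject, end)
        for subject, middle in first_table
        for end in second_by_subject.get(middle, [])
    ]
    return sorted(path)


author_table = [
    ("BookID1", "http://preamble/FoxJoe"),
    ("BookID2", "http://preamble/OrrTim"),   # hypothetical extra row
]
was_born_table = [
    ("http://preamble/FoxJoe", "1860"),
    ("http://preamble/OrrTim", "1862"),      # hypothetical extra row
]

# Built once at load time; plays the role of a new "Author:wasBorn" property.
author_was_born = materialize_path(author_table, was_born_table)

# At query time the subject-object join is gone: filter one table by value.
books_1860 = [book for book, year in author_was_born if year == "1860"]
```

The materialized table shares its subject (the book) with the other book properties, so any further joins are subject-subject merge joins. The cost is extra storage and upkeep on writes.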

 Add 60 integer valued columns
 7 GB increase in size

 Great for reads, writes not considered
 What about load times?
 Using another benchmark (ex. LUBM)?
 Native XML databases for RDF/XML?
 Test triple store in Sesame
