Abadi, Marcus, Madden, Hollenbach
                       VLDB 2007




 Presented by: {Gui}llermo Cabrera
         The University of Texas at Austin
   Problem
   Storage Goal
   RDBMS use
   RDF Physical Organization
   Column store vs. Row Store
   Materialized Path Expressions
   Experiment & Results
   Discussion
   Performance: Self-joins
   Many triples
   Achieve scalability & performance in triple
    storage
   Survey approaches in RDBMS
   Benefits of vertical partition and column
    store
   1 table with 3 indexed columns?
   Multi layer architecture
     Translate -> Optimize -> Execute
   Mapping tables for long URI and literals
   Jena, Oracle, Sesame, 3store (Hyunjun),
    Hexastore (Donghyuk)
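To make the baseline concrete, here is a minimal Python sketch of a single three-column triples table with a mapping table for long URIs and literals. The class `TripleStore`, the helper `books_by`, and the toy data are hypothetical and are not taken from Jena, Oracle, Sesame, or 3store. The point is that a query touching two properties becomes two selections on the same table joined on subject, which is a self-join.

```python
class TripleStore:
    """One (subject, property, object) table plus a mapping table for strings."""

    def __init__(self):
        self.ids = {}        # mapping table: URI or literal -> integer id
        self.strings = []    # reverse mapping: integer id -> URI or literal
        self.triples = []    # rows of (subject_id, property_id, object_id)

    def encode(self, value):
        # Each long URI/literal is stored once; triples hold only small integers.
        if value not in self.ids:
            self.ids[value] = len(self.strings)
            self.strings.append(value)
        return self.ids[value]

    def add(self, s, p, o):
        self.triples.append((self.encode(s), self.encode(p), self.encode(o)))

    def subjects_with(self, p, o):
        # Selection on the triples table: property = p AND object = o.
        pid, oid = self.ids.get(p), self.ids.get(o)
        return {s for s, pp, oo in self.triples if pp == pid and oo == oid}


def books_by(store, author):
    # Two selections over the SAME table, joined on subject: a self-join.
    books = store.subjects_with("type", "Book")
    written = store.subjects_with("author", author)
    return sorted(store.strings[s] for s in books & written)


store = TripleStore()
store.add("BookID1", "type", "Book")
store.add("BookID1", "author", "http://preamble/FoxJoe")
store.add("BookID2", "type", "Book")
store.add("BookID2", "author", "http://preamble/DoeJane")
print(books_by(store, "http://preamble/FoxJoe"))  # ['BookID1']
```

With billions of triples, each extra property in a query adds another pass and another self-join over the same large table.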
   Property tables
     Clustered property table
      Denormalize RDF (wider tables)
      Clustering algorithm
      NULL values
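A rough sketch of how a clustered property table denormalizes triples, assuming the clustering step has already chosen `type`, `title`, and `copyright` (the example given in the editor's notes). The function name and data are hypothetical. Subjects lacking a clustered property get a NULL (`None`), and properties outside the cluster land in a left-over triples list. Treating a second value as left-over is an assumption made here to keep every column single-valued.

```python
def build_property_table(triples, clustered):
    """Denormalize triples into one wide row per subject.

    `clustered` is assumed to come from a clustering algorithm (not shown);
    all other triples are kept as left-over triples.
    """
    rows = {}
    leftover = []
    for s, p, o in triples:
        if p not in clustered:
            leftover.append((s, p, o))
            continue
        # A new subject starts as a row of NULLs (None) for every clustered column.
        row = rows.setdefault(s, dict.fromkeys(clustered))
        if row[p] is None:
            row[p] = o
        else:
            # Assumption: columns are single-valued, so a second value
            # (a multi-valued attribute) is pushed to the left-over triples.
            leftover.append((s, p, o))
    return rows, leftover


triples = [
    ("BookID1", "type", "Book"),
    ("BookID1", "title", "XYZ"),
    ("BookID1", "copyright", "2001"),
    ("BookID2", "type", "Book"),
    ("BookID2", "title", "ABC"),
    ("BookID2", "language", "French"),
]
rows, leftover = build_property_table(triples, ("type", "title", "copyright"))
nulls = sum(v is None for row in rows.values() for v in row.values())
print(rows["BookID2"])  # {'type': 'Book', 'title': 'ABC', 'copyright': None}
print(leftover, nulls)  # [('BookID2', 'language', 'French')] 1
```

Widening the cluster removes more self-joins but adds more NULLs, which is the tradeoff the notes describe.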
Review: Scalable Semantic Web Data Management Using Vertical Partitioning
   Property tables
     Property-Class Tables
      Exploit the type property
      Properties may exist in multiple tables
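A property-class table can be sketched the same way, assuming each subject has at most one `type` value: the type property decides which class table a subject's row goes into. Names and data are hypothetical. Here `title` ends up in both the `Book` and `CD` tables, which illustrates how a property may exist in multiple tables.

```python
from collections import defaultdict


def build_property_class_tables(triples, type_prop="type"):
    # Step 1: exploit the type property to find each subject's class.
    cls = {s: o for s, p, o in triples if p == type_prop}
    tables = defaultdict(dict)   # class -> {subject: {property: object}}
    for s, c in cls.items():
        tables[c].setdefault(s, {})
    leftover = []                # triples whose subject has no type
    for s, p, o in triples:
        if p == type_prop:
            continue             # the class is implied by the table itself
        if s not in cls:
            leftover.append((s, p, o))
        else:
            # Assumption: single-valued properties (a repeat overwrites).
            tables[cls[s]][s][p] = o
    return tables, leftover


triples = [
    ("BookID1", "type", "Book"),
    ("BookID1", "title", "XYZ"),
    ("CDID1", "type", "CD"),
    ("CDID1", "title", "Hits"),
    ("CDID1", "artist", "Band"),
    ("X", "title", "Untyped"),
]
tables, leftover = build_property_class_tables(triples)
shared = [c for c, rows in tables.items() if any("title" in r for r in rows.values())]
print(sorted(shared))  # ['Book', 'CD']
```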
   Advantage:
     Fewer joins
   Disadvantage:
     NULL values
     Multivalued attributes are complicated
   Vertical Partition
     n two-column tables, n = # of unique properties
      Each table sorted by subject
      Merge join
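A minimal sketch of vertical partitioning: one two-column (subject, object) table per unique property, each sorted by subject, so that combining two properties of the same subject is a single linear merge join. The helper names and data are hypothetical, and a real system would store integer ids from the mapping table rather than strings.

```python
from collections import defaultdict


def vertically_partition(triples):
    """One (subject, object) table per unique property, sorted by subject."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return {p: sorted(rows) for p, rows in tables.items()}


def merge_join(left, right):
    """Join two subject-sorted tables on subject in one linear pass."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        ls, rs = left[i][0], right[j][0]
        if ls < rs:
            i += 1
        elif ls > rs:
            j += 1
        else:
            # Find the run of rows for this subject on each side; this is how
            # multi-valued attributes (several rows per subject) are handled.
            i_end = i
            while i_end < len(left) and left[i_end][0] == ls:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == ls:
                j_end += 1
            for _, l_obj in left[i:i_end]:
                for _, r_obj in right[j:j_end]:
                    out.append((ls, l_obj, r_obj))
            i, j = i_end, j_end
    return out


triples = [
    ("B2", "title", "ABC"),
    ("B1", "title", "XYZ"),
    ("B1", "author", "Fox, Joe"),
    ("B1", "author", "Smith, Ann"),
    ("B3", "author", "Doe, Jane"),
]
vp = vertically_partition(triples)  # n = 2 unique properties -> 2 tables
print(merge_join(vp["title"], vp["author"]))
# [('B1', 'XYZ', 'Fox, Joe'), ('B1', 'XYZ', 'Smith, Ann')]
```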
 Advantage
   Multi-valued attributes supported
   No clustering algorithm needed (unlike property tables)
   Only accessed properties are read
 Disadvantage
   Queries over multiple properties need table joins
   Inserts expensive
   Triple Store
   Property Table
   Vertical Partition (Row Store)
   Vertical Partition (Column Store)
   Why?
   Projection is free
   Tuple headers (per-row metadata)
     35 bytes in Postgres vs. 8 bytes in C-Store
   Column oriented compression
     Run-length encoding (ex. 1,1,1,2,2 -> 1x3, 2x2)
   Optimized merge join
     Prefetching
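Two of these reasons can be checked with a few lines of Python. `rle_encode` reproduces the run-length example above, and `table_bytes` redoes the tuple-header arithmetic from the editor's notes, assuming each row of a two-column table holds two 4-byte integer ids (8 bytes) and ignoring page overhead and indices. Both helpers are hypothetical illustrations, not C-Store or Postgres code.

```python
from itertools import groupby


def rle_encode(column):
    """Run-length encode a column into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, n in pairs for _ in range(n)]


def table_bytes(rows, header_bytes, data_bytes=8):
    """Rough size of a two-column table, ignoring page overhead and indices."""
    return rows * (header_bytes + data_bytes)


print(rle_encode([1, 1, 1, 2, 2]))              # [(1, 3), (2, 2)]
print(table_bytes(1_000_000, header_bytes=27))  # 35000000 (27-byte tuple header)
print(table_bytes(1_000_000, header_bytes=0))   # 8000000 (no per-tuple header)
```

Run-length encoding pays off most on sorted columns, such as the subject column of a vertically partitioned table.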
<BookID1, Author, http://preamble/FoxJoe>
<http://preamble/FoxJoe, wasBorn, 1860>

Find all books whose authors were born in
  1860
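Without a materialized path, this query needs a subject-object join: the object of `Author` must match the subject of `wasBorn`. Because vertically partitioned tables are sorted by subject, this sketch uses a hash join rather than a merge join. The function name and the second author (`DoeJane`, born 1901) are hypothetical.

```python
def books_born_in(author_table, was_born_table, year):
    """Subject-object join: author.object = wasBorn.subject."""
    # Both tables are sorted by *subject*, but on the author side the join key
    # is the object, so a plain merge join does not apply; hash wasBorn instead.
    born = {}
    for person, born_year in was_born_table:
        born.setdefault(person, set()).add(born_year)
    return sorted({book for book, person in author_table
                   if year in born.get(person, ())})


author = [("BookID1", "http://preamble/FoxJoe"),
          ("BookID2", "http://preamble/DoeJane")]
was_born = [("http://preamble/DoeJane", "1901"),
            ("http://preamble/FoxJoe", "1860")]
print(books_born_in(author, was_born, "1860"))  # ['BookID1']
```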
   Barton Libraries Dataset
   Longwell Queries
     Calculating counts
     Filtering
     Inference
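A rough sketch of the counting and filtering flavor of these queries, assuming a vertically partitioned `type` table of (subject, object) rows. The `count_by_type` helper and all data are hypothetical, and this is not one of the actual Longwell queries. Counting over `type` needs only one table, and a filter on another property becomes a set of subjects to keep.

```python
from collections import Counter


def count_by_type(type_table, keep=None):
    """Count subjects per type value, optionally only for subjects in `keep`."""
    return Counter(obj for subj, obj in type_table
                   if keep is None or subj in keep)


type_table = [("B1", "Text"), ("B2", "Text"), ("B3", "Date"), ("B4", "Text")]
language = [("B1", "French"), ("B4", "English")]
french = {s for s, o in language if o == "French"}

print(count_by_type(type_table))          # Counter({'Text': 3, 'Date': 1})
print(count_by_type(type_table, french))  # Counter({'Text': 1})
```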
   8.3 GB: Triple Store (Postgres)
   14 GB: Property Table (Postgres)
   5.2 GB: Vertically Partitioned (Postgres)
   2.7 GB: Vertically Partitioned (C-Store)
   Sizes include indices and the mapping table
   Replace
     subject-object joins -> subject-subject joins
   Add 60 integer-valued columns
   7 GB increase in size
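A sketch of the materialization step, assuming hypothetical `author` and `wasBorn` tables like those in the 1860 example: the path is computed once into a new subject-sorted two-column table, so the query no longer needs a subject-object join. The function name and data are hypothetical; the cost is extra storage of the kind reported above.

```python
def materialize_path(first, second):
    """Precompute the path first.object = second.subject as a new
    (subject, object) table sorted by subject."""
    targets = {}
    for s, o in second:
        targets.setdefault(s, []).append(o)
    return sorted((s, o2) for s, o in first for o2 in targets.get(o, []))


author = [("BookID1", "http://preamble/FoxJoe"),
          ("BookID2", "http://preamble/DoeJane")]
was_born = [("http://preamble/FoxJoe", "1860"),
            ("http://preamble/DoeJane", "1901")]

author_was_born = materialize_path(author, was_born)  # extra storage, paid once
print(author_was_born)  # [('BookID1', '1860'), ('BookID2', '1901')]
# The query is now a scan of one table; joining it with any other property
# table of the book is a subject-subject (merge) join.
print([b for b, y in author_was_born if y == "1860"])  # ['BookID1']
```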
   Great for reads, writes not considered
   What about load times?
   Using another benchmark (ex. LUBM)?
   Native XML databases for RDF/XML?
   Test triple store in Sesame

Editor's Notes

  1. RDF as a series of SPO (subject, predicate, object) triples. Performance: self-joins; low speed when # triples > memory. Need to manage a large number of triples. Billion Triple Challenge (semanticweb.org).
  2. Self-joins become PROBLEMATIC the LESS selective the predicates are. Mapping table: 1 clustered index (on identifiers) and 1 unclustered index.
  3. Jena2 was the first to propose property tables. The basic idea is to cluster properties that tend to be DEFINED together (type, title and copyright date). Also, a table of LEFT-OVER triples. Why fewer joins? Self-joins on the subject column can be eliminated. Tradeoff: narrow tables = less sparse, but more tables used; wide table = more space, but fewer joins.
  4. A property may exist in MULTIPLE property-class tables. Good for reified statements.
  5. Exploit the type property. Reified statements.
  6. Object-relational bag structure
  7. The tuple header dominates the size of the actual data in the resulting table.
  8. Subjects with multi-valued properties appear as multiple rows. No clustering algorithm.
  9. Postgres has a 27-byte tuple header: compare 8 bytes to 35 bytes. Merge join uses prefetching to avoid seeks between columns.
  10. Why? A row store adds too much overhead to vertical partitioning.
  11. For VP, subject-object joins are not merge joins. PRECALCULATE these expressions as a 2-column table. Good: inference queries (of the form x part of y, y part of z, therefore x part of z). Bad: many tables.
  12. Converted from RDF/XML to triples using Redland. 50 million triples, 221 unique properties, multi-valued.
  13. Average of 3 runs of the queries. VP and PT are a factor of 2-3 faster than the triple store. C-Store is 32 times faster than the triple store. Q1: PT and VP identical because of the use of idealized property tables. Q2: avoids subject-subject joins. Q3: multiple sequential scans. Q4: high selectivity. Q5:
  14. Involves all triples of property TYPE and a count of object values. No join for the triple store. PT and VP have the same schema: {Type: subject, object}.
  15. 1 million to 50 million triples, running only query 6. Scales linearly except for the triple store. All joins for this query are linear for vertical partitioning. The triple store sorts the intermediate results after performing the three selections and before performing the merge join.
  16. For PT, add a new column with the MPE (materialized path expression). For VP, add a table containing a subject column and a Records:Type object column.
  17. What is the purpose of the test?
  18. LUBM: universities, departments, students, etc. 15 MILLION triples.
  19. Display a list of PROPERTIES defined for resources of "Type -> Text". Multiple sequential scans.