The document discusses physical database design for data warehousing. It notes that physical design focuses on arranging data on disk for performance, while balancing manageability and load performance. To improve query performance, the design precomputes as much data as possible and builds data structures like partitions, materialized views, and indexes, but these require additional disk space and load time. Common estimates are that indexes are 100-200% the size of base data, while aggregates and materialized views are 100% the size of base data, with extra structures totaling 2-3 times the size of base data.
3. Logical database design
What are the facts and dimensions?
Goals: Simplicity, Expressiveness
Make the database easy to understand
Make queries easy to ask
4. Physical database design
How should the data be arranged on disk?
Goal: Performance
▪ Manageability is an important secondary concern
Make queries run fast
5. Trade-off between query performance and
load performance
6. To make queries run fast:
Precompute as much as possible
Build lots of data structures
▪ Partitions
▪ Materialized views
▪ Indexes
7. But…
Data structures require disk space to store
Building/updating data structures takes
time
More data structures → longer load time
8. Base data
Fact tables and dimension tables
Fact table space >> Dimension table space
Indexes
100%-200% of base data
Aggregates / Materialized Views
100% of base data
Extra data structures 2-3 times size of base data
9. The Data Warehouse Toolkit.Second Edition.The
Complete Guide to Dimensional Modeling.Ralph
Kimball.Margy Ross