ºÝºÝߣ

ºÝºÝߣShare a Scribd company logo
DATA WAREHOUSING
Multi Dimensional
Data Modeling.
Facts and Dimensions
2
ï‚¡ While an entity-relationship
     modeling approach from relational
     database design could be used, the
     dimensional modeling approach to
     logical design is more often used
     for a data warehouse.



3
ï‚¡ End users cannot understand,
     remember, navigate an E/R model
     (not even with a GUI)
      One reason is that an enterprise-
       level ERM would be too complex to
       understand.



4
ï‚¡ Software cannot usefully query an
     E/R model




5
 Use of E/R modeling doesn’t meet
     the DW purpose: intuitive and high
     performance querying




6
Employee_Dim
                         EmployeeKey
                         EmployeeID
                         .
                         .
                         .

      Dimension Table
    Time_Dim                 Fact Table            Product_Dim
TimeKey                  Sales_Fact                ProductKey
TheDate                  TimeKey                   ProductID
.                                                  .
.                        EmployeeKey               .
.                        ProductKey                .
                         CustomerKey
                         ShipperKey
                         $
                         .
                         .
                         .
           Shipper_Dim                    Customer_Dim
           ShipperKey                     CustomerKey
           ShipperID                      CustomerID
           .                              .
           .                              .
           .                              .
7
Several distinct dimensions, combined with
                        facts, enable you to answer business
     Dimension          questions.
       Tables
    Geographic   Dimension
                                                         Measures
                                     Fact Table
                       Geographic   Product   Time   Units   $
    Product                                                          Facts




    Time




8
Dimensions
ï‚¡ They are normally textual and
    descriptive descriptions of the
    business.




9
Dimensions
ï‚¡ dimension tables contain relatively
     small amounts of relatively static data




10
Dimensions
ï‚¡ dimension table: usually not-
     normalized




11
Dimensions
ï‚¡    Independent of each other, not hierarchically
     related




12
ï‚¡ Dimensional attributes (attributes
     no key) help to describe the
     dimensional value.

                          Dimensional attributes




13
Facts
ï‚¡ Fact are (usually numerical) measures
     of business.




14
Facts
ï‚¡ Fact table is the largest table in the
     star schema and is composed of large
     volumes of data




15
Facts
ï‚¡ Fact table is (often) normalized




16
Facts
ï‚¡ fact table has a composite primary key
     made up of foreign keys

                               PK = FKi




17
Facts
ï‚¡ fact table usually contains one or more
     numerical facts that occur for the
     combination of keys that define each
     record



                           measures



18
Facts
ï‚¡ A fact table contains either detail-level
     facts or facts that have been
     aggregated (summary tables)


                  Σ




19
Facts
ï‚¡ Facts are:
      additive
      semi-additive
      non-additive




20
Facts
ï‚¡ Non-additive facts cannot be added at all.
   An example of this is averages.
ï‚¡ Semi-additive facts can be aggregated along some of
  the dimensions and not along others:
   current_Balance is a semi-additive fact as it makes
    sense to add them up for all accounts (what's the
    total current balance for all accounts in the bank?)
    but it does not make sense to add them up through
    time (adding up all current balances for a given
    account for each day of the month does not give us
    any useful information
ï‚¡ The most useful measures are: Numeric, Additive
21
ï‚¡ Atomic level of data of the business
  process
ï‚¡ A definition of the highest level of
  detail that is supported in a data
  warehouse



22
ï‚¡ A fact table usually contains facts
  with the same level of aggregation
ï‚¡ a proper dimensional design allows
  only facts of a uniform grain (the same
  dimensionality) to coexist in a single
  fact table


23
ï‚¡   Some perfectly good fact tables represent
     measurements that have no facts! This kind of
     measurements is often called an event. The
     classic example of such a factless fact table is a
     record representing a student attending a class
     on a specific day. The dimensions are Day,
     Student, Professor, Course, and Location, but
     there are no obvious numeric facts. The tuition
     paid and grade received are good facts but not
     at the grain of the daily attendance.

24
ï‚¡   Dimensions without attributes. (Such as a
     transaction number or order number.)
 ï‚¡   Put the attribute value into the fact table
     even though it is not an additive fact.




25
26
Employee_Dim
                                   EmployeeKey
                                   EmployeeID
Fact table provides statistics     .
                                   .
for sales broken down by           .
product, time, employee, shipper
and customer, dimensions

  Time_Dim                                                      Product_Dim
  TimeKey                          Sales_Fact                   ProductKey
  TheDate                          TimeKey
                                   TimeKey                      ProductID
  .                                                             .
  .                                EmployeeKey                  .
  .             Dimensional Keys   ProductKey                   .
                                                      Multipart Key
                                   CustomerKey
                                   ShipperKey
                                   $
                                   .                   Measures
                                   .
                                   .
            Shipper_Dim                           Customer_Dim
            ShipperKey                            CustomerKey
            ShipperID                             CustomerID
            .                                     .
            .                                     .
            .                                     .
 27
28
1.    Choosing the data mart for the
       small group of end users we deal
       with.
       Choose a business process to
        model, e.g., orders, invoices, etc.


29
2.   Fact table granularity (the smallest
      defined level of data in the table) is
      determined.




30
3.    Fact table dimensions are selected.
       Choose the dimensions that will
        apply to each fact table record
       Add dimensions for "everything
        you know" about this grain.


31
4.    Determine the facts for the table. In
       most cases, the granularity is at the
       transaction level, so the fact is the
       amount.
       Choose the measure that will
         populate each fact table record
       Add numeric measured facts true to
         the grain
32
ï‚¡   The Data Warehouse Toolkit.Second
    Edition.The Complete Guide to Dimensional
    Modeling.Ralph Kimball.Margy Ross

More Related Content

Dw design 1_dim_facts

  • 1. DATA WAREHOUSING Multi Dimensional Data Modeling. Facts and Dimensions
  • 2. 2
  • 3. ï‚¡ While an entity-relationship modeling approach from relational database design could be used, the dimensional modeling approach to logical design is more often used for a data warehouse. 3
  • 4. ï‚¡ End users cannot understand, remember, navigate an E/R model (not even with a GUI)  One reason is that an enterprise- level ERM would be too complex to understand. 4
  • 5. ï‚¡ Software cannot usefully query an E/R model 5
  • 6. ï‚¡ Use of E/R modeling doesn’t meet the DW purpose: intuitive and high performance querying 6
  • 7. Employee_Dim EmployeeKey EmployeeID . . . Dimension Table Time_Dim Fact Table Product_Dim TimeKey Sales_Fact ProductKey TheDate TimeKey ProductID . . . EmployeeKey . . ProductKey . CustomerKey ShipperKey $ . . . Shipper_Dim Customer_Dim ShipperKey CustomerKey ShipperID CustomerID . . . . . . 7
  • 8. Several distinct dimensions, combined with facts, enable you to answer business Dimension questions. Tables Geographic Dimension Measures Fact Table Geographic Product Time Units $ Product Facts Time 8
  • 9. Dimensions ï‚¡ They are normally textual and descriptive descriptions of the business. 9
  • 10. Dimensions ï‚¡ dimension tables contain relatively small amounts of relatively static data 10
  • 11. Dimensions ï‚¡ dimension table: usually not- normalized 11
  • 12. Dimensions ï‚¡ Independent of each other, not hierarchically related 12
  • 13. ï‚¡ Dimensional attributes (attributes no key) help to describe the dimensional value. Dimensional attributes 13
  • 14. Facts ï‚¡ Fact are (usually numerical) measures of business. 14
  • 15. Facts ï‚¡ Fact table is the largest table in the star schema and is composed of large volumes of data 15
  • 16. Facts ï‚¡ Fact table is (often) normalized 16
  • 17. Facts ï‚¡ fact table has a composite primary key made up of foreign keys PK = FKi 17
  • 18. Facts ï‚¡ fact table usually contains one or more numerical facts that occur for the combination of keys that define each record measures 18
  • 19. Facts ï‚¡ A fact table contains either detail-level facts or facts that have been aggregated (summary tables) Σ 19
  • 20. Facts ï‚¡ Facts are:  additive  semi-additive  non-additive 20
  • 21. Facts ï‚¡ Non-additive facts cannot be added at all.  An example of this is averages. ï‚¡ Semi-additive facts can be aggregated along some of the dimensions and not along others:  current_Balance is a semi-additive fact as it makes sense to add them up for all accounts (what's the total current balance for all accounts in the bank?) but it does not make sense to add them up through time (adding up all current balances for a given account for each day of the month does not give us any useful information ï‚¡ The most useful measures are: Numeric, Additive 21
  • 22. ï‚¡ Atomic level of data of the business process ï‚¡ A definition of the highest level of detail that is supported in a data warehouse 22
  • 23. ï‚¡ A fact table usually contains facts with the same level of aggregation ï‚¡ a proper dimensional design allows only facts of a uniform grain (the same dimensionality) to coexist in a single fact table 23
  • 24. ï‚¡ Some perfectly good fact tables represent measurements that have no facts! This kind of measurements is often called an event. The classic example of such a factless fact table is a record representing a student attending a class on a specific day. The dimensions are Day, Student, Professor, Course, and Location, but there are no obvious numeric facts. The tuition paid and grade received are good facts but not at the grain of the daily attendance. 24
  • 25. ï‚¡ Dimensions without attributes. (Such as a transaction number or order number.) ï‚¡ Put the attribute value into the fact table even though it is not an additive fact. 25
  • 26. 26
  • 27. Employee_Dim EmployeeKey EmployeeID Fact table provides statistics . . for sales broken down by . product, time, employee, shipper and customer, dimensions Time_Dim Product_Dim TimeKey Sales_Fact ProductKey TheDate TimeKey TimeKey ProductID . . . EmployeeKey . . Dimensional Keys ProductKey . Multipart Key CustomerKey ShipperKey $ . Measures . . Shipper_Dim Customer_Dim ShipperKey CustomerKey ShipperID CustomerID . . . . . . 27
  • 28. 28
  • 29. 1. Choosing the data mart for the small group of end users we deal with.  Choose a business process to model, e.g., orders, invoices, etc. 29
  • 30. 2. Fact table granularity (the smallest defined level of data in the table) is determined. 30
  • 31. 3. Fact table dimensions are selected.  Choose the dimensions that will apply to each fact table record  Add dimensions for "everything you know" about this grain. 31
  • 32. 4. Determine the facts for the table. In most cases, the granularity is at the transaction level, so the fact is the amount.  Choose the measure that will populate each fact table record  Add numeric measured facts true to the grain 32
  • 33. ï‚¡ The Data Warehouse Toolkit.Second Edition.The Complete Guide to Dimensional Modeling.Ralph Kimball.Margy Ross