This document provides guidance on defining and designing fact and dimension tables for a data warehouse. It discusses determining the grain and timing of business processes and data, different types of fact grains including transaction, periodic snapshot and accumulating snapshot grains. It also covers additive and semi-additive measure types, grouping dimensions by affinity, and using behavioral dimensions. Going forward, the team needs to analyze the bankruptcy business process, identify fact and dimension attributes, and populate a template to design the data warehouse tables.
1 of 4
More Related Content
First Steps to Define Grain
1. Define the Grain and Timing (pg. 42, 62, 67)
Grain and Timing in terms of the Business Unit / Process.
o to be defined on the legend worksheet
Grain and Timing in terms of Dimensionality
o to be defined on the data point worksheet
Grain and Timing in terms of the Atomic, Unique Data Source
o in terms of atomic data, what makes the data source(s) unique?
Different Grains and Timing Require Separate Fact Tables
o Measures at Different Grains Cause Issues when Focusing on 1 Measure from 1 Individual Grain
(pg. 70)
o Different grains such as Orders and Shipments should have separate fact tables to ensure we
can study each individual process and that it has browsability on its own separate from other
processes. (pg. 118)
o We can use Drill-Across to Combine Measures at a like grain from separate fact tables by using
common dimension attributes to navigate between different fact tables. (pg. 73)
Fact Grain Types (pg. 259)
Transaction Grain
o Tracks the individual activities (business artifacts such as order_line, transactionID) that define
a business process and supports several facts that describe these activities. It can provide rich
analytic possibilities, and often serves as a granular repository of atomic data.
o Tends to be sparse.
o Mostly contain additive facts.
o Can track a series of steps, or status in terms quantification, but notoriously difficult to use
when studying the elapsed spent at one, or more stages.
o Dont store the balance with each transaction.
Periodic Snapshot Grain (pg. 265)
o Periodically samples status measurements such as balances, or levels. These measurements
may be equivalent to the cumulative effect of a series of transactions but are not easy to study
in that format. Some status measurements, such as temperature, cannot be modeled as
transactions at all.
o Almost always defined by dimensionality. 2 Parts to Define Grain: Snapshot Period and 1 or
More Dimensions to be sampled at the end of the period.
o Tends to be very dense.
o Tend to be semi-additive since aggregating (sum) over the snapshot periods does not make
sense. May be non-additive like temperature, or gauge readings.
o Can be embellished with period begin and end balances as well as related fully-additive facts
that summarize the period. May also include a sum that can be used to compute the average
balance, or level.
2. Accumulating Snapshot Grain (pg. 278)
o Used to track the progress of a readily available, individual item through a series of processing
steps. It enables the study of elapsed time between major process milestones, or events. This
kind of fact table correlates several different activities in a single row.
o 1 row for each instance of the item undergoing the business process. This row is repeatedly
updated as the process unfolds.
o Includes multiple day_keys, each representing the achievement of an important milestone.
Facts record the elapsed time associated with each stage. Additional facts may be present that
track significant values at the time of each milestone.
Measure Types, Additivity (pg. 39, 268)
Additive
o May be summed up across any and all of the dimensions in the schema, producing a useful
result.
Semi-Additive
o Cannot be summed across the date / time dimensions.
o Can be aggregated across date / time using average (requires care), maximum, and minimum.
o As long as the fact is not summed across multiple snapshot periods, it remains additive
Non-Additive
o Rates
o Percentages (Margin Rate)
o Gauge Readings, Temperatures
o Can be Broken Down into Underlying Components that are Additive
Group Dimensions by Affinity (pg. 36, 117)
Major Categories of Reference Information
Strong Business Significance
Bear Various Relationships to One Another
Junk Dimensions have no Affinity
Explicit Dimension Relationships Describe Context (pg. 36, 115)
Implicit Dimension Relationships Describe Affinities (pg. 36, 116)
Attributes that can be Related in One Way, or One Context (Products and Brands)
Browsability Test (pg. 118)
Behavioral Dimensions (pg. 35, 141)
A Dimension that is Computed Based on a Fact
Groups or Filters Facts Based on the Past Behavior on Members of a Dimension
May Capture A Past Association with another Dimension, fact, or categorization of a fact
Less Frequent Updates of these Behavioral Groupings are ideal
o More frequent update requirements may justify computing at report execution
Uses Behavior Patterns to Analyze Facts
Are Customer who generate over 1 Million in Sales receiving better discounts than those who generate
500,00 or less?
Orders are Tracked by a Fact, so how do we filter, group on a measure instead of a dimension?
We must group Customers based on their Past Order Activity.
These measure Groupings are then used as a Dimension to study discounts.
3. Going Forward | Data Point Worksheet
Team,
Going forward, after we have completed listing all of the data points from our assigned SSRS reports in the Template,
Samir and I will continue by completing the research necessary to fill in the columns that have light blue headers. If you
refer to the Legend worksheet, you will see a light blue color bar labeled Source Information. This source information
should be an atomic data source, meaning at the lowest level of sources.
The Source Information Columns include the following columns:
Source_System
Source_Database
Source_Schema
Table (DataMart)
Key (Y/N)
Null (Yes/No)
DataType
While (at the same time) we are mapping the Source Information Columns (above) that we will be entering into the
Excel Template, we all need to be considering and doing a mental analysis on the following topics:
Define the Grain and Timing of the Bankruptcy Business Unit / Process
o In Business Terms
o In Dimensionality Terms
o In Atomic, Unique Data Source Identifiers
Understand the Required / Desired Fact Grain Types
o Transaction Grain
o Periodic Snapshot Grain
o Accumulating Snapshot Grain
Consider the Additivity of Fact Tables and Measures
o Additive
o Semi-Additive
o Non-Additive
Separate Data Points into Facts by Measurability, Grain, and Timing (use Data Points worksheet Fact Information
columns in Light Green and in Column B, indicate either Fact, Dim, or Calc)
Separate Data Points into Dimensions by Context (use Data Points worksheet Fact Information columns in Light
Orange and in Column B, indicate either Fact, Dim, or Calc)
o Group Data Point Attributes in Dimensions Together by Major Categories, Strong Business Significance,
Affinity, and Browsability Test
Separate Data Points into Calculations by Formulas (use Data Points worksheet Fact Information columns in
Light Purple and in Column B, indicate either Fact, Dim, or Calc)
Investigate any Needs for Behavioral Dimensions
As soon as we finish mapping the Source Information, and we do the mental analysis of described above, we can
populate the Template Columns Light Green (facts), Orange (Dimensions), and Purple (calculations) columns.
4. Once those sections are complete along with the General Data Point Information Columns (Light Yellow), we will use the
First Column, Grouping to group the data points weather they are fact / calculation, or dimensions into the appropriate
fact and dimension tables.
Please let me know if you have any questions, or need any help getting started. I am available to have a phone call, or
share a desktop.
Regards,
Ryan