際際滷

際際滷Share a Scribd company logo
Define the Grain and Timing (pg. 42, 62, 67)
 Grain and Timing in terms of the Business Unit / Process.
o to be defined on the legend worksheet
 Grain and Timing in terms of Dimensionality
o to be defined on the data point worksheet
 Grain and Timing in terms of the Atomic, Unique Data Source
o in terms of atomic data, what makes the data source(s) unique?
 Different Grains and Timing Require Separate Fact Tables
o Measures at Different Grains Cause Issues when Focusing on 1 Measure from 1 Individual Grain
(pg. 70)
o Different grains such as Orders and Shipments should have separate fact tables to ensure we
can study each individual process and that it has browsability on its own separate from other
processes. (pg. 118)
o We can use Drill-Across to Combine Measures at a like grain from separate fact tables by using
common dimension attributes to navigate between different fact tables. (pg. 73)
Fact Grain Types (pg. 259)
 Transaction Grain
o Tracks the individual activities (business artifacts such as order_line, transactionID) that define
a business process and supports several facts that describe these activities. It can provide rich
analytic possibilities, and often serves as a granular repository of atomic data.
o Tends to be sparse.
o Mostly contain additive facts.
o Can track a series of steps, or status in terms quantification, but notoriously difficult to use
when studying the elapsed spent at one, or more stages.
o Dont store the balance with each transaction.
 Periodic Snapshot Grain (pg. 265)
o Periodically samples status measurements such as balances, or levels. These measurements
may be equivalent to the cumulative effect of a series of transactions but are not easy to study
in that format. Some status measurements, such as temperature, cannot be modeled as
transactions at all.
o Almost always defined by dimensionality. 2 Parts to Define Grain: Snapshot Period and 1 or
More Dimensions to be sampled at the end of the period.
o Tends to be very dense.
o Tend to be semi-additive since aggregating (sum) over the snapshot periods does not make
sense. May be non-additive like temperature, or gauge readings.
o Can be embellished with period begin and end balances as well as related fully-additive facts
that summarize the period. May also include a sum that can be used to compute the average
balance, or level.
 Accumulating Snapshot Grain (pg. 278)
o Used to track the progress of a readily available, individual item through a series of processing
steps. It enables the study of elapsed time between major process milestones, or events. This
kind of fact table correlates several different activities in a single row.
o 1 row for each instance of the item undergoing the business process. This row is repeatedly
updated as the process unfolds.
o Includes multiple day_keys, each representing the achievement of an important milestone.
Facts record the elapsed time associated with each stage. Additional facts may be present that
track significant values at the time of each milestone.
Measure Types, Additivity (pg. 39, 268)
 Additive
o May be summed up across any and all of the dimensions in the schema, producing a useful
result.
 Semi-Additive
o Cannot be summed across the date / time dimensions.
o Can be aggregated across date / time using average (requires care), maximum, and minimum.
o As long as the fact is not summed across multiple snapshot periods, it remains additive
 Non-Additive
o Rates
o Percentages (Margin Rate)
o Gauge Readings, Temperatures
o Can be Broken Down into Underlying Components that are Additive
Group Dimensions by Affinity (pg. 36, 117)
 Major Categories of Reference Information
 Strong Business Significance
 Bear Various Relationships to One Another
 Junk Dimensions have no Affinity
 Explicit Dimension Relationships Describe Context (pg. 36, 115)
 Implicit Dimension Relationships Describe Affinities (pg. 36, 116)
 Attributes that can be Related in One Way, or One Context (Products and Brands)
 Browsability Test (pg. 118)
Behavioral Dimensions (pg. 35, 141)
 A Dimension that is Computed Based on a Fact
 Groups or Filters Facts Based on the Past Behavior on Members of a Dimension
 May Capture A Past Association with another Dimension, fact, or categorization of a fact
 Less Frequent Updates of these Behavioral Groupings are ideal
o More frequent update requirements may justify computing at report execution
 Uses Behavior Patterns to Analyze Facts
 Are Customer who generate over 1 Million in Sales receiving better discounts than those who generate
500,00 or less?
 Orders are Tracked by a Fact, so how do we filter, group on a measure instead of a dimension?
 We must group Customers based on their Past Order Activity.
 These measure Groupings are then used as a Dimension to study discounts.
Going Forward | Data Point Worksheet
Team,
Going forward, after we have completed listing all of the data points from our assigned SSRS reports in the Template,
Samir and I will continue by completing the research necessary to fill in the columns that have light blue headers. If you
refer to the Legend worksheet, you will see a light blue color bar labeled Source Information. This source information
should be an atomic data source, meaning at the lowest level of sources.
The Source Information Columns include the following columns:
 Source_System
 Source_Database
 Source_Schema
 Table (DataMart)
 Key (Y/N)
 Null (Yes/No)
 DataType
While (at the same time) we are mapping the Source Information Columns (above) that we will be entering into the
Excel Template, we all need to be considering and doing a mental analysis on the following topics:
 Define the Grain and Timing of the Bankruptcy Business Unit / Process
o In Business Terms
o In Dimensionality Terms
o In Atomic, Unique Data Source Identifiers
 Understand the Required / Desired Fact Grain Types
o Transaction Grain
o Periodic Snapshot Grain
o Accumulating Snapshot Grain
 Consider the Additivity of Fact Tables and Measures
o Additive
o Semi-Additive
o Non-Additive
 Separate Data Points into Facts by Measurability, Grain, and Timing (use Data Points worksheet Fact Information
columns in Light Green and in Column B, indicate either Fact, Dim, or Calc)
 Separate Data Points into Dimensions by Context (use Data Points worksheet Fact Information columns in Light
Orange and in Column B, indicate either Fact, Dim, or Calc)
o Group Data Point Attributes in Dimensions Together by Major Categories, Strong Business Significance,
Affinity, and Browsability Test
 Separate Data Points into Calculations by Formulas (use Data Points worksheet Fact Information columns in
Light Purple and in Column B, indicate either Fact, Dim, or Calc)
 Investigate any Needs for Behavioral Dimensions
As soon as we finish mapping the Source Information, and we do the mental analysis of described above, we can
populate the Template Columns Light Green (facts), Orange (Dimensions), and Purple (calculations) columns.
Once those sections are complete along with the General Data Point Information Columns (Light Yellow), we will use the
First Column, Grouping to group the data points weather they are fact / calculation, or dimensions into the appropriate
fact and dimension tables.
Please let me know if you have any questions, or need any help getting started. I am available to have a phone call, or
share a desktop.
Regards,
Ryan

More Related Content

First Steps to Define Grain

  • 1. Define the Grain and Timing (pg. 42, 62, 67) Grain and Timing in terms of the Business Unit / Process. o to be defined on the legend worksheet Grain and Timing in terms of Dimensionality o to be defined on the data point worksheet Grain and Timing in terms of the Atomic, Unique Data Source o in terms of atomic data, what makes the data source(s) unique? Different Grains and Timing Require Separate Fact Tables o Measures at Different Grains Cause Issues when Focusing on 1 Measure from 1 Individual Grain (pg. 70) o Different grains such as Orders and Shipments should have separate fact tables to ensure we can study each individual process and that it has browsability on its own separate from other processes. (pg. 118) o We can use Drill-Across to Combine Measures at a like grain from separate fact tables by using common dimension attributes to navigate between different fact tables. (pg. 73) Fact Grain Types (pg. 259) Transaction Grain o Tracks the individual activities (business artifacts such as order_line, transactionID) that define a business process and supports several facts that describe these activities. It can provide rich analytic possibilities, and often serves as a granular repository of atomic data. o Tends to be sparse. o Mostly contain additive facts. o Can track a series of steps, or status in terms quantification, but notoriously difficult to use when studying the elapsed spent at one, or more stages. o Dont store the balance with each transaction. Periodic Snapshot Grain (pg. 265) o Periodically samples status measurements such as balances, or levels. These measurements may be equivalent to the cumulative effect of a series of transactions but are not easy to study in that format. Some status measurements, such as temperature, cannot be modeled as transactions at all. o Almost always defined by dimensionality. 2 Parts to Define Grain: Snapshot Period and 1 or More Dimensions to be sampled at the end of the period. o Tends to be very dense. o Tend to be semi-additive since aggregating (sum) over the snapshot periods does not make sense. May be non-additive like temperature, or gauge readings. o Can be embellished with period begin and end balances as well as related fully-additive facts that summarize the period. May also include a sum that can be used to compute the average balance, or level.
  • 2. Accumulating Snapshot Grain (pg. 278) o Used to track the progress of a readily available, individual item through a series of processing steps. It enables the study of elapsed time between major process milestones, or events. This kind of fact table correlates several different activities in a single row. o 1 row for each instance of the item undergoing the business process. This row is repeatedly updated as the process unfolds. o Includes multiple day_keys, each representing the achievement of an important milestone. Facts record the elapsed time associated with each stage. Additional facts may be present that track significant values at the time of each milestone. Measure Types, Additivity (pg. 39, 268) Additive o May be summed up across any and all of the dimensions in the schema, producing a useful result. Semi-Additive o Cannot be summed across the date / time dimensions. o Can be aggregated across date / time using average (requires care), maximum, and minimum. o As long as the fact is not summed across multiple snapshot periods, it remains additive Non-Additive o Rates o Percentages (Margin Rate) o Gauge Readings, Temperatures o Can be Broken Down into Underlying Components that are Additive Group Dimensions by Affinity (pg. 36, 117) Major Categories of Reference Information Strong Business Significance Bear Various Relationships to One Another Junk Dimensions have no Affinity Explicit Dimension Relationships Describe Context (pg. 36, 115) Implicit Dimension Relationships Describe Affinities (pg. 36, 116) Attributes that can be Related in One Way, or One Context (Products and Brands) Browsability Test (pg. 118) Behavioral Dimensions (pg. 35, 141) A Dimension that is Computed Based on a Fact Groups or Filters Facts Based on the Past Behavior on Members of a Dimension May Capture A Past Association with another Dimension, fact, or categorization of a fact Less Frequent Updates of these Behavioral Groupings are ideal o More frequent update requirements may justify computing at report execution Uses Behavior Patterns to Analyze Facts Are Customer who generate over 1 Million in Sales receiving better discounts than those who generate 500,00 or less? Orders are Tracked by a Fact, so how do we filter, group on a measure instead of a dimension? We must group Customers based on their Past Order Activity. These measure Groupings are then used as a Dimension to study discounts.
  • 3. Going Forward | Data Point Worksheet Team, Going forward, after we have completed listing all of the data points from our assigned SSRS reports in the Template, Samir and I will continue by completing the research necessary to fill in the columns that have light blue headers. If you refer to the Legend worksheet, you will see a light blue color bar labeled Source Information. This source information should be an atomic data source, meaning at the lowest level of sources. The Source Information Columns include the following columns: Source_System Source_Database Source_Schema Table (DataMart) Key (Y/N) Null (Yes/No) DataType While (at the same time) we are mapping the Source Information Columns (above) that we will be entering into the Excel Template, we all need to be considering and doing a mental analysis on the following topics: Define the Grain and Timing of the Bankruptcy Business Unit / Process o In Business Terms o In Dimensionality Terms o In Atomic, Unique Data Source Identifiers Understand the Required / Desired Fact Grain Types o Transaction Grain o Periodic Snapshot Grain o Accumulating Snapshot Grain Consider the Additivity of Fact Tables and Measures o Additive o Semi-Additive o Non-Additive Separate Data Points into Facts by Measurability, Grain, and Timing (use Data Points worksheet Fact Information columns in Light Green and in Column B, indicate either Fact, Dim, or Calc) Separate Data Points into Dimensions by Context (use Data Points worksheet Fact Information columns in Light Orange and in Column B, indicate either Fact, Dim, or Calc) o Group Data Point Attributes in Dimensions Together by Major Categories, Strong Business Significance, Affinity, and Browsability Test Separate Data Points into Calculations by Formulas (use Data Points worksheet Fact Information columns in Light Purple and in Column B, indicate either Fact, Dim, or Calc) Investigate any Needs for Behavioral Dimensions As soon as we finish mapping the Source Information, and we do the mental analysis of described above, we can populate the Template Columns Light Green (facts), Orange (Dimensions), and Purple (calculations) columns.
  • 4. Once those sections are complete along with the General Data Point Information Columns (Light Yellow), we will use the First Column, Grouping to group the data points weather they are fact / calculation, or dimensions into the appropriate fact and dimension tables. Please let me know if you have any questions, or need any help getting started. I am available to have a phone call, or share a desktop. Regards, Ryan