This document discusses data quality and provides facts about the high costs of poor data quality to businesses and the US economy. It defines data quality as ensuring data is "fit for purpose" by measuring it against its intended uses and dimensions of quality. The document outlines best practices for measuring data quality including profiling data to understand metadata and trends, using statistical process control, master data management to create standardized "gold records", and implementing a data governance program to centrally manage data quality.
2. Data Quality Facts
Cost of poor data quality in US - $600 Billion
Poor Data/Lack of visibility cited as #1 reason for project cost overruns
Poor data quality costs the US Economy $3.1 Trillion a year
Implementing data quality best practices boosts revenue by 66%
Median Fortune 1000 company could increase revenue by $2.01 Billion if they improved usability of data by 10%
Source: http://www.webmastat.com/blog/2012/09/07/7-facts-about-data-quality/
3. What is Data Quality?
Measuring data to determine if it is fit for purpose
4. Fit For Purpose?
Bad data is a myth!
Two Questions
What is the data used for?
What can be measured to make sure it meets the need?
Application use vs. Reporting/Analysis
5. Data Quality Dimensions
Consistency Accuracy
Correctness Objectivity
Timeliness Conciseness
Precision Usefulness
Unambiguous Usability
Completeness Relevance
Reliability Amount of data
Source: Data Quality Fundamentals, The Data Warehousing Institute
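As a rough illustration of turning one dimension into a number, the sketch below measures completeness as the share of rows where a field is filled in, then compares it to a threshold chosen for the data's intended use. The function names, the sample rows, and the 0.9 threshold are all assumptions made for this example, not part of the source material.

```python
def completeness(rows, field):
    """Fraction of rows where `field` is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)


def fit_for_purpose(rows, field, threshold):
    """True when completeness meets the threshold set for this use of the data."""
    return completeness(rows, field) >= threshold


# Hypothetical customer rows; two of the four have no usable email.
customers = [
    {"email": "a@example.com"},
    {"email": ""},
    {"email": None},
    {"email": "b@example.com"},
]

print(completeness(customers, "email"))          # 0.5
print(fit_for_purpose(customers, "email", 0.9))  # False
```

The same data could pass for one use and fail for another simply by changing the threshold, which is the point of "fit for purpose".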
6. Measuring Data Quality
Profiling: understanding metadata
Point in time shows what data looks like now
Automating shows trends
Alert to new/potential issues as they happen
Potentially fix issues in near real time
Six Sigma Principles
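One way automated profiling could raise alerts is a simple control-chart check in the spirit of statistical process control: compute control limits from past values of a metric and flag a new value that falls outside them. This is a minimal sketch, assuming three-sigma limits and a made-up series of daily null rates; the function names are hypothetical.

```python
from statistics import mean, stdev


def control_limits(history, sigmas=3.0):
    """Return (lower, upper) control limits computed from past metric values."""
    center = mean(history)
    spread = stdev(history)
    return center - sigmas * spread, center + sigmas * spread


def out_of_control(history, latest, sigmas=3.0):
    """True when the latest value falls outside the control limits."""
    lower, upper = control_limits(history, sigmas)
    return latest < lower or latest > upper


# Hypothetical daily null rates (percent) for one column.
null_rates = [1.9, 2.1, 2.0, 2.2, 1.8, 2.0, 2.1]

print(out_of_control(null_rates, 2.1))  # False: within normal variation
print(out_of_control(null_rates, 6.5))  # True: worth an alert
```

Running a check like this on each new profiling result is what turns a point-in-time snapshot into a trend that can trigger alerts.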
8. Data Profiling Analysis
Duplication
Pattern matching
Boolean/String/Number
Date Gap
Date/time
Day of Week
Character Set
Reference Data Matching
Value Distribution
Inter-Data Set Comparisons
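A few of these analyses can be sketched with the Python standard library alone. The example below covers duplication, pattern matching (reducing each value to a "shape" and counting the shapes), and date gaps. The helper names and sample values are assumptions for illustration, not a reference to any particular profiling tool.

```python
from collections import Counter
from datetime import date, timedelta


def find_duplicates(values):
    """Return the values that occur more than once, sorted."""
    counts = Counter(values)
    return sorted(v for v, n in counts.items() if n > 1)


def shape(value):
    """Reduce a string to its pattern: letters become A, digits become 9."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def pattern_distribution(values):
    """Count how many values share each pattern."""
    return Counter(shape(v) for v in values)


def date_gaps(dates):
    """Return the calendar days missing between the earliest and latest date."""
    present = set(dates)
    day, end = min(present), max(present)
    missing = []
    while day <= end:
        if day not in present:
            missing.append(day)
        day += timedelta(days=1)
    return missing


# Hypothetical sample data.
phones = ["555-0100", "555-0101", "5550102", "555-0100"]
load_days = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 4)]

print(find_duplicates(phones))       # ['555-0100']
print(pattern_distribution(phones))  # three values shaped 999-9999, one shaped 9999999
print(date_gaps(load_days))          # [datetime.date(2024, 1, 3)]
```

A minority pattern, such as the single phone number without a dash, is often the first visible sign of an inconsistent source.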
9. Master Data Management
Create a gold standard for data
Distribute data so that all sources are uniform
Names
Addresses
Phone Numbers
Products
Can hook into third party sources
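The gold-record idea can be sketched as two steps: standardize each field, then merge records for the same entity using a survivorship rule. The rule assumed here is "first non-empty value wins, with sources listed from most to least trusted"; real MDM tools offer many other rules. All names, normalizers, and sample records below are hypothetical.

```python
import re


def normalize_phone(raw):
    """Keep digits only; if there are 10 or more, keep the last 10."""
    digits = re.sub(r"\D", "", raw)
    return digits[-10:] if len(digits) >= 10 else digits


def normalize_name(raw):
    """Collapse whitespace and apply title case."""
    return " ".join(raw.split()).title()


# Field-specific standardization; other fields are only stripped.
NORMALIZERS = {"name": normalize_name, "phone": normalize_phone}


def build_gold_record(records):
    """Merge records for one entity; earlier (more trusted) sources win."""
    gold = {}
    for record in records:
        for field, raw in record.items():
            clean = NORMALIZERS.get(field, str.strip)(raw)
            if clean and field not in gold:
                gold[field] = clean
    return gold


# Hypothetical records for the same customer from two systems.
crm = {"name": "  jane   DOE ", "phone": "(555) 010-0199", "address": ""}
billing = {"name": "Jane Doe", "phone": "555.010.0199", "address": "12 Main St"}

print(build_gold_record([crm, billing]))
# {'name': 'Jane Doe', 'phone': '5550100199', 'address': '12 Main St'}
```

Distributing the merged record back to both systems is what makes the sources uniform.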
10. Data Governance Program
Central authority for data quality control
Applies information collected from data profiling, MDM, etc. uniformly across the business
Communication channels between business and IT groups