The document summarizes a presentation about data quality testing. It discusses examples of data quality issues that caused significant losses and errors. It then outlines the types of checks used to validate data quality, such as row counts, consistency, referential integrity, completeness and accuracy. The presentation emphasizes that data quality testing is essential for making accurate decisions and improving the health of data.
Test2008 Resurrecting The Prodigal Son Data Quality (http://www.geektester.blogspot.com)
2. Speakers: Bhoomika Goyal, working @ Microsoft for over a year; engineer from Mumbai; loves playing chess, solving puzzles and reading. Raj, working @ Microsoft Business Intelligence COE; 5.5+ years of testing experience; loves watching movies, reading suspense thrillers and playing cricket; passion: testing (http://www.itest.co.nr). www.Test2008.in
3. Horror Story: NASA's Mars Climate Orbiter spacecraft was LOST. Loss: $125 million. Reason: a discrepancy between two systems' units of measure (thruster data produced in pound-force seconds was read as newton-seconds).
4. Bad, Bad, Bad Data Quality: erroneous mailings cost US businesses $611 billion in 2002.
5. DQ is not my problem? Think again!
6. DQ Hot Candidates: data movement (migrations, backups and restores, imports and exports), data warehousing, business intelligence, OLTP, OLAP, CRM and ERP.
7. DQ Ishikawa Diagram. Effect: bad decisions (loss of money and customers). Causes: DQ requirements not documented; lack of white-box testing; data is dynamic; CRM and ERP implementations; mergers and takeovers.
10. Data Quality Testing involves validating, monitoring and reporting on various attributes of data, such as accuracy, validity and timeliness.
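A minimal Python sketch of what validating and reporting two of these attributes might look like. The record layout, the five-digit ZIP rule for validity and the 30-day freshness window for timeliness are all assumptions chosen for illustration; accuracy is left out because it needs a trusted reference source to compare against.

```python
import re
from datetime import date, timedelta

# Hypothetical record layout: (id, zip_code, last_updated)
ZIP_RE = re.compile(r"\d{5}")  # assumed validity rule: a five-digit US ZIP code


def check_validity(records):
    """Return ids whose zip_code is missing or fails the assumed format rule."""
    return [rid for rid, zip_code, _ in records if not ZIP_RE.fullmatch(zip_code or "")]


def check_timeliness(records, as_of, max_age_days=30):
    """Return ids not updated within max_age_days before as_of (assumed window)."""
    cutoff = as_of - timedelta(days=max_age_days)
    return [rid for rid, _, updated in records if updated < cutoff]


def report(records, as_of):
    """Collect the failing ids per attribute, as a monitoring job might."""
    return {
        "validity": check_validity(records),
        "timeliness": check_timeliness(records, as_of),
    }


records = [
    (1, "11787", date(2008, 3, 1)),
    (2, "1178", date(2008, 3, 5)),    # invalid: only four digits
    (3, "11787", date(2007, 12, 1)),  # stale: older than 30 days
]
print(report(records, as_of=date(2008, 3, 10)))
# {'validity': [2], 'timeliness': [3]}
```

Running the same report on every load and keeping the results over time is what turns one-off validation into monitoring.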
19. CD Mail Fraud: a man received 22,260 CDs at a discounted price by making each address just different enough that every order looked like a new customer. For example, "David Loshin, 123 Main Street, Any town, NY 11787" vs. "David Loshin, 123 Main Street, Near Wal-Mart, Any town, NY 11787".
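The trick works because an exact-match duplicate check treats the two addresses as different customers. Below is a minimal sketch of normalizing addresses before matching, assuming each address is stored as one comma-separated string. The `normalize_address` rules (dropping "near ..." landmark phrases, lowercasing, collapsing punctuation) are hypothetical and much simpler than real address standardization.

```python
import re


def normalize_address(addr: str) -> str:
    """Canonicalize an address for duplicate matching (hypothetical rules)."""
    s = addr.lower()
    # Assumed rule: drop landmark phrases such as ", near wal-mart".
    s = re.sub(r",?\s*\bnear\b[^,]*", "", s)
    # Collapse punctuation and whitespace so trivial variations compare equal.
    s = re.sub(r"[^a-z0-9]+", " ", s)
    return s.strip()


def find_duplicates(addresses):
    """Group input indexes by normalized address; return groups of size > 1."""
    groups = {}
    for i, addr in enumerate(addresses):
        groups.setdefault(normalize_address(addr), []).append(i)
    return [idxs for idxs in groups.values() if len(idxs) > 1]


orders = [
    "David Loshin, 123 Main Street, Any town, NY 11787",
    "David Loshin, 123 Main Street, Near Wal-Mart, Any town, NY 11787",
]
print(len(set(orders)))         # exact match sees 2 distinct addresses
print(find_duplicates(orders))  # normalized match groups them: [[0, 1]]
```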
24. How do we test DQ?
DQ Rule Engine: the engine reads each rule from Metadata, runs the matching logic (e.g. Row Count, Duplicate) and writes the outcome to Results. RC is the row count check and RI the referential integrity check.

Row Count Logic:
    Create Procedure RowCount (SrcTbl, TgtTbl)
    Begin
        Declare SRC, TGT Integer
        Select SRC = Count(*) from SrcTbl
        Select TGT = Count(*) from TgtTbl
        If SRC = TGT Then Return PASS
        Else Return SRC, TGT
        End If
    End

Duplicate Logic:
    Create Procedure Duplicate (Tbl)
    Begin
        Declare Dup Integer
        -- number of key groups that appear more than once
        Select Dup = Count(*) from (Select <<ColumnList>> from Tbl Group By <<ColumnList>> Having Count(*) > 1)
        If Dup = 0 Then Return PASS
        Else Return Dup
        End If
    End

Metadata:
    Rule | Tbl1 | Tbl2
    RC   | Emp  | Emp
    RI   | Emp  | Dept
    DC   | HR   | HR

Results:
    Rule | Result | Comment
    RC   | Pass   | -
    RI   | Fail   | 10
    DC   | Pass   | -
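The rule engine above can be sketched in Python with the standard `sqlite3` module. Everything concrete here is an assumption for illustration: the table names (`emp_src`, `emp_tgt`, `dept`, `hr`), the `emp_id`/`dept_id` columns, reading DC as the duplicate check, and passing the duplicate check's column list in the second metadata slot instead of a second table. Table and column names are interpolated into SQL, so they must come from trusted metadata, never from user input.

```python
import sqlite3

# NOTE: identifiers are interpolated into SQL; they must come from trusted
# metadata, never from user input.


def row_count(conn, src, tgt):
    """RC: Pass if source and target hold the same number of rows."""
    s = conn.execute(f"SELECT COUNT(*) FROM {src}").fetchone()[0]
    t = conn.execute(f"SELECT COUNT(*) FROM {tgt}").fetchone()[0]
    return ("Pass", "-") if s == t else ("Fail", f"{s} vs {t}")


def ref_integrity(conn, child, parent):
    """RI: count child rows whose dept_id has no matching parent row."""
    n = conn.execute(
        f"SELECT COUNT(*) FROM {child} c LEFT JOIN {parent} p "
        "ON c.dept_id = p.dept_id WHERE p.dept_id IS NULL"
    ).fetchone()[0]
    return ("Pass", "-") if n == 0 else ("Fail", str(n))


def duplicates(conn, tbl, key_cols):
    """DC (assumed: duplicate check): count key groups occurring more than once."""
    n = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT {key_cols} FROM {tbl} "
        f"GROUP BY {key_cols} HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    return ("Pass", "-") if n == 0 else ("Fail", str(n))


RULES = {"RC": row_count, "RI": ref_integrity, "DC": duplicates}


def run_rules(conn, metadata):
    """Run each (rule, arg1, arg2) metadata row; return (rule, result, comment)."""
    return [(rule, *RULES[rule](conn, a, b)) for rule, a, b in metadata]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp_src (emp_id INTEGER, dept_id INTEGER);
    CREATE TABLE emp_tgt (emp_id INTEGER, dept_id INTEGER);
    CREATE TABLE dept (dept_id INTEGER);
    CREATE TABLE hr (emp_id INTEGER, name TEXT);
    INSERT INTO emp_src VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO emp_tgt VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO dept VALUES (10), (20);
    INSERT INTO hr VALUES (1, 'A'), (2, 'B');
""")

metadata = [("RC", "emp_src", "emp_tgt"), ("RI", "emp_tgt", "dept"), ("DC", "hr", "emp_id")]
for row in run_rules(conn, metadata):
    print(row)
```

With this sample data, RC and DC pass and RI fails with one orphan row (dept 30). Keeping the rules as metadata rows means a new check is one more row rather than new code, which appears to be the idea behind the slide's Metadata table.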
25. You can't improve what you can't measure. [Chart: Data Quality tracked over Time against thresholds marked at 5%, 10% and 100%. Red: bad DQ; Yellow: watch it; Green: good DQ.]
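One way to turn each measurement into the Red/Yellow/Green status is to compare a run's failure rate against fixed bands. This sketch assumes the 5% and 10% marks are failure-rate limits (up to 5% is Green, up to 10% is Yellow, anything above is Red); the slide does not say exactly how its thresholds are applied, and `dq_status` is a hypothetical helper.

```python
def dq_status(failed_rows: int, total_rows: int,
              green_max: float = 0.05, yellow_max: float = 0.10) -> str:
    """Map a failure rate to a traffic-light status (band limits are assumptions)."""
    if total_rows == 0:
        raise ValueError("no rows measured")
    rate = failed_rows / total_rows
    if rate <= green_max:
        return "Green"   # good DQ
    if rate <= yellow_max:
        return "Yellow"  # watch it
    return "Red"         # bad DQ


# One (total_rows, failed_rows) measurement per run, so the trend shows over time.
history = [(1000, 20), (1000, 70), (1000, 150)]
print([dq_status(failed, total) for total, failed in history])
# ['Green', 'Yellow', 'Red']
```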
26. DQ Testing is your friend! High data (test) coverage; automation (reduced manual effort); high confidence in your data; accurate decisions.