
Resurrecting the Prodigal Son - Data Quality
Rise from Ashes: Battle of Data Quality Testing

Speakers
  • Bhoomika Goyal: working at Microsoft for over a year; engineer from Mumbai; loves playing chess, solving puzzles and reading.
  • Raj: working at Microsoft, Business Intelligence COE; 5.5+ years of testing experience; loves watching movies, reading suspense thrillers and playing cricket; passion: testing (http://www.itest.co.nr).
www.Test2008.in

Horror Story: NASA Mars Climate Orbiter spacecraft LOST. Loss: $125 million. Reason: a discrepancy between two units of measure (thruster data supplied in pound-force units where newton-based units were expected).
Bad, Bad, Bad Data Quality: erroneous mailings cost US businesses $611 billion in 2002.
DQ is not my problem? Think again!

DQ Hot Candidates: Data Movement, Migrations, Backups, Restore, Import, Export, Data Warehousing, Business Intelligence, OLTP, OLAP, CRM, ERP

DQ Ishikawa Diagram
  • Effect: bad decisions (loss of $ and customers)
  • Cause: DQ requirements not documented
  • Cause: lack of white-box testing
  • Cause: data is dynamic
  • Cause: CRM and ERP implementations
  • Cause: mergers / takeovers

Data Quality: DQ is an indicator that tells about the health of the data.
GOOD Data Quality: DQ is good if the data is fit to use for decision making.
Data Quality Testing: involves validating, monitoring and reporting various attributes of data, such as accuracy, validity and timeliness.

DQ Checks
  • Row Counts
  • Consistency
  • Referential Integrity
  • Redundancy
  • Usability
  • Completeness
  • Domain Integrity
  • Timeliness
  • Accuracy
  • Validity

Row Count Check
Completeness Check
Among Voters seen Dead People. US General Election: 4,755 deceased people voted.
Consistency Check
A One-House, $400 Million Bubble Goes Pop. A house worth $121,000 was overvalued at $400 million; the government expected $8 million in tax revenue from it.
Accuracy Check
Validity Check
CD Mail Fraud: a man received 22,260 CDs at a discounted price by making each address different enough. For example:
  • David Loshin, 123 Main Street, Any town, NY 11787
  • David Loshin, 123 Main Street, Near Wal-Mart, Any town, NY 11787
Redundancy Check
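
The CD mail fraud story suggests that an exact-match duplicate check would miss addresses that were made "different enough". One way a redundancy check might catch them is fuzzy matching. The sketch below is only an illustration and is not from the slides: the function names, the normalization rules and the 0.8 threshold are all assumptions, and it uses Python's standard `difflib` rather than any tool the speakers describe.

```python
import re
from difflib import SequenceMatcher


def normalize(address: str) -> str:
    """Lower-case the text and replace runs of punctuation/whitespace with one space."""
    return re.sub(r"[^a-z0-9]+", " ", address.lower()).strip()


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two normalized addresses."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def near_duplicates(addresses, threshold=0.8):
    """Return index pairs whose similarity meets the threshold (0.8 is an assumed value)."""
    pairs = []
    for i in range(len(addresses)):
        for j in range(i + 1, len(addresses)):
            if similarity(addresses[i], addresses[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```

Comparing every pair is quadratic in the number of addresses, so a real data set would probably need blocking (for example, comparing only addresses that share a ZIP code) before a check like this is practical.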
Referential Integrity Check
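
A referential integrity check is commonly expressed as a count of "orphan" child rows whose foreign key has no matching parent row. The sketch below assumes an Emp table that refers to a Dept table; the column name `dept_id`, the sample rows and the helper names are hypothetical, and SQLite is used only so that the example is self-contained.

```python
import sqlite3


def orphan_count(conn, child, fk, parent, pk):
    """Count child rows whose non-null foreign key has no matching parent row.

    Table and column names are interpolated into the SQL, so they must come
    from trusted metadata, never from user input.
    """
    sql = (
        f'SELECT COUNT(*) FROM "{child}" AS c '
        f'LEFT JOIN "{parent}" AS p ON c."{fk}" = p."{pk}" '
        f'WHERE c."{fk}" IS NOT NULL AND p."{pk}" IS NULL'
    )
    return conn.execute(sql).fetchone()[0]


def build_demo():
    """Create an in-memory database with one orphaned employee (dept_id 9)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Dept (dept_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Emp (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
        INSERT INTO Dept VALUES (1, 'HR'), (2, 'Finance');
        INSERT INTO Emp VALUES (1, 'A', 1), (2, 'B', 2), (3, 'C', 9), (4, 'D', NULL);
        """
    )
    return conn
```

Rows with a NULL foreign key are deliberately left out of the orphan count here; whether a missing key is acceptable is arguably a completeness question rather than a referential integrity one.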
Domain Integrity Check
Timeliness

How do we test DQ?

DQ Rule Engine: Metadata in, Results out

Row Count Logic

    Create Procedure RowCount (SrcTbl, TgtTbl)
    Begin
        Declare SRC, TGT Integer
        Select SRC = Count(*) from SrcTbl
        Select TGT = Count(*) from TgtTbl
        If SRC = TGT Then
            Return PASS
        Else
            Return SRC - TGT
        End If
    End

Duplicate Logic

    Create Procedure Duplicate (Tbl)
    Begin
        Declare Dup Integer
        Select Dup = Count of (Select <<ColumnList>> from Tbl
                               Group By <<ColumnList>> Having Count(*) > 1)
        If Dup = 0 Then
            Return PASS
        Else
            Return Dup
        End If
    End

Metadata

    Rule  Tbl1  Tbl2
    RC    Emp   Emp
    RI    Emp   Dept
    DC    HR    HR

Results

    Rule  Result  Comment
    RC    Pass    -
    RI    Fail    10
    DC    Pass    -

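
The rule-engine idea (a Metadata table that lists which rule to run against which tables, and a Results table that records the outcome) can be sketched in a few lines of Python over SQLite. This is a rough illustration under stated assumptions and not the speakers' implementation: the function names, the `key_columns` mapping that stands in for `<<ColumnList>>`, and the demo tables are all hypothetical, and only the RC and DC rules are implemented.

```python
import sqlite3


def row_count_check(conn, src, tgt):
    """RC: pass when both tables have the same row count, else report SRC - TGT."""
    s = conn.execute(f'SELECT COUNT(*) FROM "{src}"').fetchone()[0]
    t = conn.execute(f'SELECT COUNT(*) FROM "{tgt}"').fetchone()[0]
    return ("Pass", None) if s == t else ("Fail", s - t)


def duplicate_check(conn, tbl, columns):
    """DC: pass when no combination of `columns` occurs more than once."""
    cols = ", ".join(f'"{c}"' for c in columns)
    sql = (
        f'SELECT COUNT(*) FROM (SELECT {cols} FROM "{tbl}" '
        f"GROUP BY {cols} HAVING COUNT(*) > 1)"
    )
    dup = conn.execute(sql).fetchone()[0]
    return ("Pass", None) if dup == 0 else ("Fail", dup)


def run_rules(conn, metadata, key_columns):
    """Run each (rule, tbl1, tbl2) metadata row and collect (rule, result, comment)."""
    results = []
    for rule, tbl1, tbl2 in metadata:
        if rule == "RC":
            result, comment = row_count_check(conn, tbl1, tbl2)
        elif rule == "DC":
            result, comment = duplicate_check(conn, tbl1, key_columns[tbl1])
        else:
            result, comment = "Skipped", "rule not implemented in this sketch"
        results.append((rule, result, comment))
    return results


def build_demo():
    """In-memory demo: the target is missing one row, and HR holds one duplicated key."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE EmpSrc (id INTEGER, name TEXT);
        CREATE TABLE EmpTgt (id INTEGER, name TEXT);
        CREATE TABLE HR (id INTEGER, name TEXT);
        INSERT INTO EmpSrc VALUES (1, 'A'), (2, 'B'), (3, 'C');
        INSERT INTO EmpTgt VALUES (1, 'A'), (2, 'B');
        INSERT INTO HR VALUES (1, 'A'), (1, 'A'), (2, 'B');
        """
    )
    return conn
```

Table and column names are interpolated straight into the SQL text, which is acceptable only because they come from a metadata table the test team controls. Keeping the rules in data rather than in code means a new table pair can be covered by adding a metadata row, which fits the automation benefit the deck argues for.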
You can't improve what you can't measure
[Chart: axes Time and Data Quality (up to 100 %); thresholds marked at 5 % and 10 %. Red: bad DQ. Yellow: watch it. Green: good DQ.]
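
The red / yellow / green bands can be turned into a simple status function. The slide shows only the 5 % and 10 % thresholds and the three colours, so the mapping below (failure rate up to 5 % is green, up to 10 % is yellow, anything higher is red) is an assumption, as are the function and parameter names.

```python
def dq_status(failed_rows, total_rows, green_max=5.0, yellow_max=10.0):
    """Classify a check by the percentage of rows that failed it.

    The band boundaries are assumed values read off the slide's chart.
    """
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    failure_pct = 100.0 * failed_rows / total_rows
    if failure_pct <= green_max:
        return "Green"
    if failure_pct <= yellow_max:
        return "Yellow"
    return "Red"
```

Recording this status for each run would give the kind of trend over time that the chart plots.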

DQ Testing is your friend!
  • High data (test) coverage
  • Automation (manual effort reduction)
  • High confidence in your data
  • Accurate decisions

Thank you.

Test2008 Resurrecting The Prodigal Son Data Quality (http://www.geektester.blogspot.com)
