This document discusses the importance of data quality and offers tips for ensuring high-quality data. It notes that data is only valuable if it is clean and structured. For extracting large amounts of data, it recommends developing extractors, combining extractors, and automating the extraction process. For scaling operations, it is crucial to have processes that clean, validate, and maintain data quality. The document offers suggestions for writing effective XPaths and regular expressions to extract the right data. It also stresses the importance of measuring data quality through completeness and coverage, and of detecting anomalies both during and after the extraction process.
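As a rough illustration of the completeness and coverage measures mentioned above, here is a minimal Python sketch. The definitions are assumptions made for this example, not taken from the slides: completeness is treated as the share of records with a non-empty value for a field, and coverage as the share of expected items that were actually extracted. The function names and sample records are hypothetical.

```python
def completeness(records, fields):
    """Per-field share of records holding a non-empty value (assumed definition)."""
    total = len(records)
    scores = {}
    for field in fields:
        # Treat a missing key, None, or an empty string as "not filled".
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        scores[field] = filled / total if total else 0.0
    return scores


def coverage(extracted_count, expected_count):
    """Share of expected items that were extracted (assumed definition)."""
    return extracted_count / expected_count if expected_count else 0.0


# Hypothetical extraction output: two of the four records have a gap.
records = [
    {"name": "Widget", "price": "9.99"},
    {"name": "Gadget", "price": ""},
    {"name": None, "price": "4.50"},
    {"name": "Doohickey", "price": "12.00"},
]

scores = completeness(records, ["name", "price"])
print(scores)
print(coverage(len(records), 5))
```

Tracking these numbers for each extraction run gives a simple baseline, so a sudden drop in either one can be flagged as a possible anomaly.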
2015 - Extract SF - Data Quality
1. It's Time to Start Caring About Data Quality
Data Quality at Scale