This document discusses deduplication and data fusion software. It introduces the benefits of identifying duplicated records across databases and merging data from different sources. The process section outlines how the software allows configuring input formats, similarity computations, filters, and validation steps. The successful stories describe how a health service and a beer manufacturer used the software (DAURUM) to clean their databases and identify incorrect deliveries. A demo is available to see the software in action.
Benefits
- Identification of suspected duplicate records inside a database
- Merging of data from several databases with different formats, detecting duplicate records
- Validation tools for the detected similarities
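Process: Configurations
Each column is assigned a percentage reflecting its importance in the similarity computation. In the example, the three columns are weighted 30%, 35% and 35%, which add up to 100%.
Workflow: CSV input → Configurations → Execution → Validation → Exportation (Excel, PDF, XML, CSV)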
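The weighting on the Configurations slide can be illustrated with a minimal Python sketch. It assumes hypothetical column names (`name`, `birth_date`, `address`) and uses the standard library's `difflib.SequenceMatcher` ratio as a stand-in per-column similarity; the slides do not say which measure DAURUM actually uses.

```python
from difflib import SequenceMatcher

# Hypothetical column weights mirroring the slide example (30% + 35% + 35% = 100%).
WEIGHTS = {"name": 0.30, "birth_date": 0.35, "address": 0.35}


def column_similarity(a: str, b: str) -> float:
    """Similarity of two field values in [0, 1]; difflib's ratio is an assumed stand-in."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def record_similarity(r1: dict, r2: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of per-column similarities, returned as a percentage (0-100)."""
    total = sum(weights.values())
    # The slide shows the weights adding up to 100%; reject configurations that do not.
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"column weights must sum to 100%, got {total:.0%}")
    score = sum(w * column_similarity(r1[col], r2[col]) for col, w in weights.items())
    return 100.0 * score
```

With these weights, two records that agree only on `name` and share nothing in the other two columns score 30%, so the weights directly control how much each column can contribute to the final percentage.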
Process: Execution
Execution lists the detected similarities whose percentage is greater than the threshold (50% in the example). The results then go to validation and can be exported as Excel, PDF, XML or CSV.
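A minimal sketch of the execution step, assuming records are Python dicts and the similarity function returns a percentage: every pair of records is compared and only pairs scoring strictly above the threshold are kept. The function names and the toy `exact_field_match` similarity are hypothetical illustrations, not DAURUM's interface.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


def detect_similarities(
    records: List[Record],
    similarity: Callable[[Record, Record], float],
    threshold: float = 50.0,
) -> List[Tuple[int, int, float]]:
    """Compare every pair of records and keep those scoring strictly above threshold (%)."""
    matches = []
    for i, j in combinations(range(len(records)), 2):
        score = similarity(records[i], records[j])
        if score > threshold:
            matches.append((i, j, score))
    # Highest-scoring candidate pairs first, convenient for manual validation.
    matches.sort(key=lambda m: m[2], reverse=True)
    return matches


def exact_field_match(r1: Record, r2: Record) -> float:
    """Toy similarity for illustration: percentage of shared columns with identical values."""
    cols = r1.keys() & r2.keys()
    if not cols:
        return 0.0
    return 100.0 * sum(r1[c] == r2[c] for c in cols) / len(cols)
```

For example, `detect_similarities([{"id_card": "A1", "name": "Ana"}, {"id_card": "A1", "name": "Ana"}, {"id_card": "B2", "name": "Ana"}], exact_field_match)` returns `[(0, 1, 100.0)]`: the pairs that match on only one of two columns score exactly 50% and are not above the threshold. Comparing all pairs grows quadratically with the number of records; large databases usually need blocking (only comparing records that share some key), which this sketch omits.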
Successful stories: Health Service
Who? Health service
Objective: Detect repeated health ID cards
Solution: Detect repeated records in the database and delete them (deduplication with DAURUM)
Result: Health ID card database cleaned of repetitions
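The "detect and delete" step can be sketched as follows, assuming the validated duplicate pairs are given as index pairs. The sketch groups them into clusters with a union-find structure and keeps the first record of each cluster; the helper names are hypothetical, and the slides do not describe how DAURUM chooses which record to keep.

```python
from typing import List, Sequence, Tuple


def find(parent: List[int], i: int) -> int:
    """Return the representative of i's cluster, halving the path as it walks up."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def deduplicate(records: Sequence[dict], pairs: List[Tuple[int, int]]) -> List[dict]:
    """Merge validated duplicate pairs into clusters and keep one record per cluster."""
    parent = list(range(len(records)))
    for i, j in pairs:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            # Attach the larger root to the smaller one, so each cluster's
            # representative is its lowest index (its first occurrence).
            parent[max(ri, rj)] = min(ri, rj)
    return [r for idx, r in enumerate(records) if find(parent, idx) == idx]
```

Union-find makes matches transitive: if A matches B and B matches C, all three end up in one cluster even when A and C alone would fall below the threshold, which is one reason to validate detected similarities before deleting anything.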
Successful stories: Beer Manufacturer
Who? Beer manufacturer
Objective: Detect dealers that deliver to centers not previously assigned to them
Solution: Identify and delete duplicates in each dealer's delivery database (deduplication with DAURUM), and detect deliveries to centers shared between different dealers (fusion with DAURUM)
Result: Master database clean of repetitions, and detection of dealers with wrong deliveries
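The fusion step in this case can be approximated with a short sketch, assuming each dealer's deduplicated delivery database reduces to a list of center codes and that an assignment of centers to dealers is available. Matching centers by exact, normalized code is a simplification; across databases with different formats, centers would more likely be matched with a similarity computation like the one configured earlier. All names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def normalize(center: str) -> str:
    """Assumed normalization: trim whitespace and ignore case in center codes."""
    return center.strip().upper()


def shared_centers(deliveries: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Fuse per-dealer delivery lists and return centers served by more than one dealer."""
    dealers_by_center: Dict[str, Set[str]] = defaultdict(set)
    for dealer, centers in deliveries.items():
        for center in centers:
            dealers_by_center[normalize(center)].add(dealer)
    return {c: ds for c, ds in dealers_by_center.items() if len(ds) > 1}


def unassigned_deliveries(
    deliveries: Dict[str, Iterable[str]], assigned: Dict[str, Iterable[str]]
) -> Dict[str, Set[str]]:
    """For each dealer, list delivered centers that were not assigned to that dealer."""
    wrong: Dict[str, Set[str]] = {}
    for dealer, centers in deliveries.items():
        allowed = {normalize(c) for c in assigned.get(dealer, ())}
        extra = {normalize(c) for c in centers} - allowed
        if extra:
            wrong[dealer] = extra
    return wrong
```

With `deliveries = {"D1": ["C1", "C2"], "D2": ["c2 ", "C3"]}`, center `C2` is reported as shared by `D1` and `D2`; if `C2` is assigned only to `D2`, then `D1` is flagged as delivering to a center outside its assignment.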