DATA WAREHOUSING

PURIFICATION OF DATA
NADAR MISPA PAULRAJ
DATA IN THE DATA WAREHOUSE

A data warehouse is a collection of data marts, as shown in the figure.

Data in the data warehouse come from different sources.

They are integrated together.
TYPES OF DATA IN THE DATA WAREHOUSE
[Figure: primary data, secondary data, records, images, charts]
Purification of data in the data warehouse after the ETL process
OPERATIONS ON DATA
The available data are processed in the staging area, i.e. by the ETL process.

This increases data consistency and widens the scope of the data for strategic information.
DATA AFTER ETL PROCESS
Even though the data are processed in the staging area and made available to the end user, data purity cannot be calculated exactly or guaranteed to be 100%.

A high level of data quality is rare.

Thus, the data purification process is important.
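Although true purity cannot be calculated, a rough proxy can still help decide where purification is needed. The sketch below is only an illustration: it assumes purity is approximated as the share of records that pass a set of validity rules, and the rules, field names (`name`, `email`), and sample records are all hypothetical.

```python
from typing import Callable

Record = dict[str, str]

# Hypothetical validity rules; real rules depend on the warehouse schema.
RULES: list[Callable[[Record], bool]] = [
    lambda r: bool(r.get("name", "").strip()),  # name is present
    lambda r: "@" in r.get("email", ""),        # email looks plausible
]


def estimate_purity(records: list[Record]) -> float:
    """Return the fraction of records passing every rule (0.0 if empty)."""
    if not records:
        return 0.0
    clean = sum(1 for r in records if all(rule(r) for rule in RULES))
    return clean / len(records)


# Made-up sample records for illustration only.
sample = [
    {"name": "A. Kumar", "email": "a.kumar@example.com"},
    {"name": "", "email": "b@example.com"},
    {"name": "C. Rao", "email": "not-an-email"},
    {"name": "D. Singh", "email": "d.singh@example.com"},
]
purity = estimate_purity(sample)
print(f"estimated purity: {purity:.0%}")
```

A score like this only reflects the rules that were chosen, so it is an estimate against those rules and not a guarantee of clean data.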
PURIFICATION PROCESS
The purification process is unpredictable, i.e. we cannot know in advance how to purify particular data or when to stop the purification process on it, since the data in the data warehouse are large in number.
WAY TO PURIFY HUGE DATA
STEP 1

The data are divided into different categories according to their priority.

[Figure: HUGE DATA is split by PRIORITY into HIGH, MEDIUM, and LOW]
[Figure: HUGE DATA becomes DIVIDED DATA]
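Step 1 can be sketched as a simple grouping. This is a minimal illustration that assumes each dataset has already been assigned a priority by someone who knows the business; the dataset names and the `PRIORITY_OF` mapping are hypothetical, and unknown datasets default to low priority as an assumption.

```python
from collections import defaultdict

# Hypothetical mapping from dataset name to business priority.
PRIORITY_OF = {
    "customers": "high",
    "orders": "high",
    "surveys": "medium",
    "web_logs": "low",
}


def divide_by_priority(datasets, default="low"):
    """Group dataset names into priority categories, preserving input order."""
    groups = defaultdict(list)
    for name in datasets:
        # Datasets without an assigned priority fall back to the default.
        groups[PRIORITY_OF.get(name, default)].append(name)
    return dict(groups)


groups = divide_by_priority(
    ["customers", "web_logs", "surveys", "orders", "archive"]
)
for level in ("high", "medium", "low"):
    print(level, groups.get(level, []))
```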
STEP 2


Process each set of data according to its priority, for example:

Data in the high priority category should be purified 100%.

Data in the medium priority category should be purified 50%.

Data in the low priority category can be left as is.
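One way to read these percentages is as the share of records in each category that get cleaned. The sketch below assumes that interpretation, which the slides do not spell out; the `clean` function is a placeholder that only normalizes whitespace, and the `COVERAGE` table simply encodes 100%, 50%, and 0%.

```python
import math

# Share of records to purify per priority, taken from the step above.
COVERAGE = {"high": 1.0, "medium": 0.5, "low": 0.0}


def clean(record: str) -> str:
    """Placeholder cleaning step: trim and collapse whitespace."""
    return " ".join(record.split())


def purify(records: list[str], priority: str) -> list[str]:
    """Clean the first COVERAGE[priority] share of records, leave the rest."""
    n = math.ceil(len(records) * COVERAGE[priority])
    return [clean(r) for r in records[:n]] + records[n:]


rows = ["  a  b ", "c   d", " e", "f  "]
high = purify(rows, "high")      # every record cleaned
medium = purify(rows, "medium")  # first half cleaned
low = purify(rows, "low")        # left as is
print(high, medium, low, sep="\n")
```

Cleaning the first half of the records is an arbitrary choice made for simplicity; in practice the subset might be chosen by sampling or by which fields matter most.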
STEP 3

    ELIMINATION OF REDUNDANT DATA


The main cause of data corruption, i.e. impurity of data, is duplication of data.

Example: a record of one person stored under multiple names or in different formats.
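The example above can be sketched as key-based deduplication. This is a minimal illustration, not a full record-matching method: it assumes two records refer to the same person when their normalized name and date of birth match, and the field names (`name`, `dob`) and sample records are hypothetical.

```python
import re


def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, and sort the name parts so that
    'RAO, Asha' and 'Asha Rao' produce the same key."""
    tokens = re.findall(r"[a-z]+", name.lower())
    return " ".join(sorted(tokens))


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each (normalized name, dob) key."""
    seen: dict[tuple[str, str], dict] = {}
    for rec in records:
        key = (normalize_name(rec["name"]), rec["dob"])
        seen.setdefault(key, rec)
    return list(seen.values())


# Made-up records: the first three describe the same person.
people = [
    {"name": "Asha Rao", "dob": "1990-01-05"},
    {"name": "RAO, Asha", "dob": "1990-01-05"},
    {"name": "asha  rao.", "dob": "1990-01-05"},
    {"name": "Vikram Shah", "dob": "1988-07-21"},
]
unique = deduplicate(people)
print([p["name"] for p in unique])
```

Exact matching on a normalized key misses misspellings and nicknames, so real duplicate detection usually adds fuzzy matching and a manual review step.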
Necessary things during purification of data:

Knowledge to differentiate data.

Select tools for data purification.

Review each data item after purification.

Priority should be maintained.

The schedule, i.e. the time period of purification, should be confirmed.

The data are then ready to use with high scope.
Data is ready to use
THANK YOU!!!