Common Data Quality Issues During Data Mapping and How to Detect/Avoid Them
1. Common Data Quality Issues During Data Mapping and How to Detect/Avoid Them
Remzi Celebi
remzi.celebi@maastrichtuniversity.nl
Maastricht University
Technical Coordinator, AIDAVA project
20 March 2025
2. Data Quality Issues during Mapping
Data quality issues can occur during mapping to a target data model. These errors can be classified into:
- Syntactic errors, such as incorrect date formatting or mismatched data types.
- Semantic errors, such as mapping a data element to the wrong category.
3. Errors During Data Mapping
| Variable   | Code   | Value | Unit   | Date       |
|------------|--------|-------|--------|------------|
| Glucose    | 2339-0 | 130   | mmol/l | 12.01.2001 |
| Creatinine | 2160-0 | 100.0 | mg/dL  | 07.02.2003 |
| ...        | ...    | ...   | ...    | ...        |

Target model: Observation { code, quantity { value, unit }, date, text }
{
"resourceType": "Observation",
"id": "measurement-1",
"status": "final",
"code": {
"coding": [
{
"system": "http://loinc.org",
"code": "2339-0",
"display": "Glucose [Mass/volume] in Blood"
}
],
"text": "Glucose"
},
"effectiveDateTime": "2001-01-12T00:00:00Z",
"valueQuantity": {
"value": 130.0,
"unit": "mmol/L",
"system": "http://unitsofmeasure.org",
"code": "mmol/L"
}
}
4. Errors During Data Mapping
The date 12.01.2001 could be in the US format (MM.DD.YYYY). It needs to be converted to a standard format (e.g., ISO 8601: YYYY-MM-DDThh:mm:ss+zz:zz) to ensure interoperability.
| Variable   | Code   | Value | Unit   | Date       |
|------------|--------|-------|--------|------------|
| Glucose    | 2339-0 | 130   | mmol/l | 12.01.2001 |
| Creatinine | 2160-0 | 100.0 | mg/dL  | 07.02.2003 |
| ...        | ...    | ...   | ...    | ...        |

Target model: Observation { code, quantity { value, unit }, date, text }
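A date normalization step can make the ambiguity explicit instead of silently guessing. The sketch below (not part of the AIDAVA pipeline; the `day_first` convention must be confirmed with the data provider) converts dotted dates to ISO 8601 and flags dates that parse differently under the two conventions:

```python
from datetime import datetime

def normalize_date(raw: str, day_first: bool = True) -> str:
    """Convert a dotted date such as '12.01.2001' to ISO 8601.

    day_first must be confirmed with the data provider: '12.01.2001'
    is 12 January under DD.MM.YYYY, but 1 December under US MM.DD.YYYY.
    """
    fmt = "%d.%m.%Y" if day_first else "%m.%d.%Y"
    return datetime.strptime(raw, fmt).strftime("%Y-%m-%dT00:00:00+00:00")

def is_ambiguous(raw: str) -> bool:
    """True when both leading fields are <= 12 and differ, so the two
    conventions yield different calendar dates."""
    a, b, _ = raw.split(".")
    return int(a) <= 12 and int(b) <= 12 and a != b

print(normalize_date("12.01.2001"))  # ISO form under the DD.MM.YYYY assumption
print(is_ambiguous("12.01.2001"))    # True: needs provider confirmation
```

Ambiguous dates would then be routed to a curator rather than converted automatically.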
5. Errors During Data Mapping
Either the unit for Creatinine is incorrect, or the value itself is: serum creatinine is typically around 0.6-1.3 mg/dL, so 100.0 mg/dL is far outside any physiological range (it would, however, be plausible in µmol/L). The value should be checked for plausibility.
| Variable   | Code   | Value | Unit   | Date       |
|------------|--------|-------|--------|------------|
| Glucose    | 2339-0 | 130   | mmol/l | 12.01.2001 |
| Creatinine | 2160-0 | 100.0 | mg/dL  | 07.02.2003 |
| ...        | ...    | ...   | ...    | ...        |

Target model: Observation { code, quantity { value, unit }, date, text }
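A plausibility check can be as simple as a per-(analyte, unit) range table. The numeric bounds below are rough illustrations, not clinical reference ranges, and would need validation before use:

```python
# Illustrative plausibility bounds per (analyte, unit) pair; the
# numbers are rough examples and would need clinical validation.
PLAUSIBLE_RANGES = {
    ("Creatinine", "mg/dL"): (0.1, 25.0),
    ("Creatinine", "umol/L"): (10.0, 2000.0),
    ("Glucose", "mg/dL"): (10.0, 1000.0),
    ("Glucose", "mmol/L"): (0.5, 60.0),
}

def check_plausibility(variable: str, value: float, unit: str) -> str:
    bounds = PLAUSIBLE_RANGES.get((variable, unit))
    if bounds is None:
        return "unknown analyte/unit pair"
    low, high = bounds
    if low <= value <= high:
        return "plausible"
    return f"implausible: {value} {unit} outside [{low}, {high}]"

# 100.0 mg/dL creatinine is far above any physiological range,
# suggesting either a wrong value or a wrong unit (e.g. umol/L).
print(check_plausibility("Creatinine", 100.0, "mg/dL"))
```

Flagged values would be reviewed by a curator; the check cannot decide by itself whether the value or the unit is at fault.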
6. Errors During Data Mapping
The identifiers in the Code column can be mapped to SNOMED CT using the Variable column, but these are identifiers for the substances being measured, not for the tests actually performed. Mapping them to observation codes would therefore be incorrect.
| Variable   | Code     | Value | Unit   | Date       |
|------------|----------|-------|--------|------------|
| Glucose    | 67079006 | 130   | mmol/l | 12.01.2001 |
| Creatinine | 15373003 | 2.0   | mg/dL  | 07.02.2003 |
| ...        | ...      | ...   | ...    | ...        |

Target model: Observation { code, quantity { value, unit }, date, text }
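This kind of semantic error can be caught by validating each code against the value set the target property expects. The sketch below uses a tiny in-memory stand-in for a terminology-server lookup; the substance codes come from the slide's table, while the "correct" observable codes are placeholders, not real SNOMED CT identifiers:

```python
# In-memory stand-in for a terminology-server value-set lookup.
SUBSTANCE_CODES = {"67079006", "15373003"}  # Glucose, Creatinine (substance concepts, from the slide)
OBSERVABLE_CODES = {"glucose-measurement-code", "creatinine-measurement-code"}  # placeholders

def validate_observation_code(code: str) -> str:
    if code in SUBSTANCE_CODES:
        return "error: substance concept used where a test/observable code is expected"
    if code in OBSERVABLE_CODES:
        return "ok"
    return "unknown code: check against the target value set"

print(validate_observation_code("67079006"))
```

In practice the membership test would query a terminology service for the concept's semantic tag (substance vs. observable entity / procedure) rather than a hard-coded set.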
7. RDFCraft -- tool to guide the mapping
RDFCraft can be used to semi-automate the mapping process:
- Nodes are created corresponding to schema classes.
- Links are created corresponding to relations.
- This reduces possible errors during mapping.
https://github.com/MaastrichtU-IDS/RDFCraft
9. Standard Mapping Definitions
- Click Save & Map.
- Share standard mappings.
- Multiple formats supported: YARRRML, RML, RDF/TTL.
- Use a mapping engine to execute the mapping.
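As a rough illustration of what a shared mapping definition looks like, here is a minimal YARRRML sketch for turning the measurement table into observation resources. The CSV file name, column names, and the example.org namespace are assumptions for illustration, not AIDAVA's actual mapping:

```yaml
prefixes:
  ex: "http://example.org/"
mappings:
  observation:
    sources:
      - ["measurements.csv~csv"]
    s: ex:observation_$(Code)
    po:
      - [ex:code, $(Code)]
      - [ex:value, $(Value)]
      - [ex:unit, $(Unit)]
      - [ex:date, $(Date)]
```

A YARRRML processor translates this into RML, which a mapping engine then executes against the source data to produce RDF.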
10. Data quality check services in AIDAVA
Data model based checks:
- Missing essential variable: birth date is missing.
- More values than expected: a measurement must have only one quantity.
- Wrong data type for a property: hasValue for a Quantity should take only a string or a double.
- Wrong object type for a property: hasUnit for a Quantity must take only a Unit.
- Invalid code for a type: the diagnosis does not use a valid code.
Medical and common-sense checks:
- Conditional completeness: flag if the patient is prescribed an allergy drug but no allergy is in their record.
- Incompatible information (date and time): flag if the date of birth is later than the date of admission.
- Incompatible information (gender and diagnosis): flag if gender is male and the diagnosis "Benign neoplasm of ovary" is present.
- Incompatible information (age and procedure): flag if age is over 12 and the procedure "TONSILLECTOMY AND ADENOIDECTOMY; UNDER AGE 12" is present.
- Incompatible information (lab measurement and unit): flag if the lab "LDL Cholesterol" is present and the unit is not equal to mg/dL.
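Two of these rules can be sketched as plain record-level checks. The dict-based record layout and key names below are assumptions for illustration, not AIDAVA's actual data model:

```python
from datetime import date

def check_record(record: dict) -> list[str]:
    """Minimal sketch of two quality rules over a patient record."""
    issues = []
    # Missing essential variable: birth date must be present.
    if record.get("birth_date") is None:
        issues.append("missing essential variable: birth date")
    # Incompatible dates: date of birth after date of admission.
    admission = record.get("admission_date")
    if record.get("birth_date") and admission and record["birth_date"] > admission:
        issues.append("incompatible dates: birth date after admission date")
    # Incompatible gender and diagnosis.
    if record.get("gender") == "male" and "Benign neoplasm of ovary" in record.get("diagnoses", []):
        issues.append("incompatible gender and diagnosis")
    return issues

rec = {"birth_date": date(2010, 5, 1), "admission_date": date(2001, 3, 2),
       "gender": "male", "diagnoses": ["Benign neoplasm of ovary"]}
for issue in check_record(rec):
    print(issue)
```

In a real pipeline such rules would be expressed declaratively (e.g., as SHACL shapes over the converted RDF) rather than hand-coded per record.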
11. Key takeaways
Different errors can occur:
- Syntactic errors: mostly caused by incorrect mapping or uncleaned data that does not conform to the target format.
- Semantic errors: occur due to misinterpretation of the source data or the target model.
To minimize errors, use appropriate tools during the mapping process and implement data quality checks after conversion.