How to
Standardize Your Data:
An ML Recipe
DAMIAN MINGLE
CHIEF DATA SCIENTIST, WPC Healthcare
@DamianMingle
GET THE FULL STORY
bit.ly/UseSciKitNow
What’s Standardization Anyway?
• Often described as “functions and transformers that change raw
feature vectors into a representation that is more suitable for the
downstream estimator”
• Shifting and rescaling the distribution of each attribute so that it
has a mean of 0 and a standard deviation of 1.
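The definition above can be checked by hand. The sketch below uses a small made-up array (X_toy is illustrative, not from the slides), applies z = (x - mean) / std to each column with NumPy, and compares the result with preprocessing.scale.

```python
import numpy as np
from sklearn import preprocessing

# Toy data, made up for illustration: two features on very different scales.
X_toy = np.array([[1.0, 200.0],
                  [2.0, 400.0],
                  [3.0, 600.0]])

# z = (x - mean) / std, computed column by column.
# np.std defaults to the population standard deviation (ddof=0),
# which is also what preprocessing.scale uses.
manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

print(manual)
print(np.allclose(manual, preprocessing.scale(X_toy)))
```

After the transform both columns hold the same values, even though the raw columns differed by a factor of 200.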
Why Standardization Matters
• It’s a common requirement of many machine learning estimators
• Models may behave badly when features sit on very different scales
• It’s useful for models that rely on the distribution of attributes,
such as Gaussian processes.
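One way to see the effect is to score the same model with and without standardization. The sketch below is an illustration, not part of the original recipe: the wine dataset and the k-nearest-neighbors classifier are assumptions chosen because the wine features differ widely in scale and the classifier is distance based.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: wine is used here only because its features span very different ranges.
X, y = load_wine(return_X_y=True)

# Same classifier, once on raw features and once behind a StandardScaler.
raw_model = KNeighborsClassifier()
scaled_model = make_pipeline(StandardScaler(), KNeighborsClassifier())

# 5-fold cross-validated accuracy for each variant.
raw_acc = cross_val_score(raw_model, X, y, cv=5).mean()
scaled_acc = cross_val_score(scaled_model, X, y, cv=5).mean()

print(f"raw features:          {raw_acc:.3f}")
print(f"standardized features: {scaled_acc:.3f}")
```

Putting the scaler inside the pipeline means it is refit on each training fold, so the held-out fold never influences the scaling statistics.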
Power in SciKit Learn
• Preprocessing
• Clustering
• Regression
• Classification
• Dimensionality Reduction
• Model Selection
Let’s Look at an ML Recipe:
Standardization
The Imports
from sklearn.datasets import load_iris
from sklearn import preprocessing
Separate Features from Target
iris = load_iris()
print(iris.data.shape)
X = iris.data
y = iris.target
Standardize the Features
standardized_X = preprocessing.scale(X)
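A quick sanity check on the step above, as a sketch: after scaling, every Iris column should have a mean of about 0 and a standard deviation of about 1. The variable name standardized_X is chosen here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn import preprocessing

X = load_iris().data
standardized_X = preprocessing.scale(X)

# Per-column statistics before and after standardization.
print("raw mean:", X.mean(axis=0).round(2))
print("raw std: ", X.std(axis=0).round(2))
print("new mean:", standardized_X.mean(axis=0).round(2))
print("new std: ", standardized_X.std(axis=0).round(2))
```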
Standardization Recipe
# Standardize the data attributes for the Iris dataset.
from sklearn.datasets import load_iris
from sklearn import preprocessing

# load the iris dataset
iris = load_iris()
print(iris.data.shape)  # (150, 4)

# separate the data from the target attributes
X = iris.data
y = iris.target

# standardize the data attributes (zero mean, unit variance per column)
standardized_X = preprocessing.scale(X)
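The recipe scales the whole dataset in one call. When a model is later evaluated on held-out data, a common variation is to learn the mean and standard deviation from the training rows only and reuse them on the test rows. The sketch below shows that variation with StandardScaler; the 75/25 split and random_state=0 are assumptions for illustration, not part of the slides.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Assumed split for illustration: 75% train, 25% test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Learn the per-column mean and standard deviation from the training rows only.
scaler = StandardScaler().fit(X_train)

# Apply the same statistics to both splits.
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

print("learned means:", scaler.mean_.round(2))
print("learned stds: ", scaler.scale_.round(2))
```

The test split will not have exactly zero mean and unit variance, which is expected: it is scaled with the training statistics.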
Resources
• Society of Data Scientists
• SciKit Learn
• Also:
  • Scaling features to a range (MinMaxScaler or MaxAbsScaler)
  • Scaling sparse data (MaxAbsScaler, or StandardScaler with with_mean=False)
  • Scaling data with outliers (RobustScaler)
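The scalers listed above share the same fit/transform interface. The sketch below runs each one on small made-up arrays (X_toy and X_sparse are illustrative only) to show what each does to the data.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import (
    MaxAbsScaler,
    MinMaxScaler,
    RobustScaler,
    StandardScaler,
)

# Toy data, made up for illustration; the first column contains an outlier (100).
X_toy = np.array([[1.0, -10.0],
                  [2.0, 0.0],
                  [3.0, 10.0],
                  [100.0, 20.0]])

# Scale each column into the range [0, 1].
X_minmax = MinMaxScaler().fit_transform(X_toy)

# Divide each column by its largest absolute value, giving values in [-1, 1].
X_maxabs = MaxAbsScaler().fit_transform(X_toy)

# Center on the median and scale by the interquartile range,
# so the outlier has less influence on the scaling statistics.
X_robust = RobustScaler().fit_transform(X_toy)

# Sparse input: skip centering (with_mean=False) so zero entries stay zero.
X_sparse = sparse.csr_matrix([[0.0, 2.0], [0.0, 0.0], [4.0, 0.0]])
X_sparse_std = StandardScaler(with_mean=False).fit_transform(X_sparse)

print(X_minmax)
print(X_maxabs)
print(X_robust)
print(X_sparse_std.toarray())
```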
SciKit Learn: How to Standardize Your Data