CROSS-VALIDATION AND
MODEL SELECTION
Many slides are from: Dr. Thomas Jensen (Expedia.com) and
Prof. Olga Veksler (CS9840 - Learning and Computer Vision)
How to check if a model fit is good?
• The R² statistic has become almost universally the standard measure
of model fit in linear models.
• What is R²?
• It is the proportion of the total variance in the dependent variable
that is explained by the model: R² = 1 − SSE/SST.
• Hence the lower the error, the higher the R² value.
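
To make the definition concrete, here is a minimal R sketch (not from the
slides; the data are synthetic) that computes R² by hand and checks it
against the value reported by lm():

# Minimal sketch: computing R^2 by hand for a linear model (synthetic data)
set.seed(1)
x <- runif(50)
y <- 2 + 3 * x + rnorm(50, sd = 0.5)
fit <- lm(y ~ x)

sse <- sum(residuals(fit)^2)   # error (residual) sum of squares
sst <- sum((y - mean(y))^2)    # total sum of squares around the mean
1 - sse / sst                  # R^2: proportion of variance explained
summary(fit)$r.squared         # matches the built-in value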
How to check if a model fit is good?
OVERFITTING
• Modeling techniques tend to overfit the data.
• Multiple regression:
Every time you add a variable to the regression, the model's R² goes up.
Naïve interpretation: every additional predictive variable helps to
explain yet more of the target's variance. But that can't be true!
Left to its own devices, multiple regression will fit too many patterns.
A reason why modeling requires subject-matter expertise.
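
The claim that R² always rises can be checked directly. The sketch below
(synthetic data, not from the slides) adds a predictor that is pure noise
and still sees R² increase:

# Sketch: R^2 never decreases when a predictor is added, even pure noise
set.seed(2)
n <- 40
x <- runif(n)
y <- 1 + 2 * x + rnorm(n)
noise <- rnorm(n)                        # unrelated to y by construction

summary(lm(y ~ x))$r.squared             # baseline R^2
summary(lm(y ~ x + noise))$r.squared     # slightly higher despite no signal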
OVERFITTING
• Error on the dataset used to fit the model can be misleading:
it doesn't predict future performance.
• Too much complexity can diminish a model's accuracy on future data.
• This is sometimes called the bias-variance tradeoff.
OVERFITTING
• What are the consequences of overfitting?
Overfitted models will have high R² values, but will perform poorly in
predicting out-of-sample cases.
WHY DO WE NEED CROSS-VALIDATION?
• R², also known as the coefficient of determination, is a popular measure
of quality of fit in regression. However, it does not offer any
significant insight into how well our regression model can predict
future values.
• When a multiple linear regression (MLR) equation is to be used for
prediction purposes, it is useful to obtain empirical evidence of its
generalizability, i.e. its capacity to make accurate predictions for new
samples of data. This process is sometimes referred to as validating the
regression equation.
• One way to address this issue is to literally obtain a new sample of
observations. That is, after the MLR equation is developed from the
original sample, the investigator conducts a new study, replicating the
original one as closely as possible, and uses the new data to assess
the predictive validity of the MLR equation.
• This procedure is usually viewed as impractical because of the
requirement to conduct a new study to obtain validation data, as well
as the difficulty of truly replicating the original study.
• An alternative, more practical procedure is cross-validation.
CROSS-VALIDATION
• In cross-validation the original sample is split into two parts. One part
is called the training (or derivation) sample, and the other part is
called the validation (or validation + testing) sample.
1) What portion of the sample should be in each part?
If the sample size is very large, it is often best to split the sample in
half. For smaller samples, it is more conventional to split the sample so
that 2/3 of the observations are in the derivation sample and 1/3 are in
the validation sample.
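
A minimal sketch of such a 2/3 - 1/3 random split in R, using the built-in
mtcars data purely as a stand-in (variable names here are illustrative):

# Sketch: random 2/3 derivation / 1/3 validation split (mtcars as stand-in)
set.seed(3)
n   <- nrow(mtcars)
idx <- sample(n, size = round(2 * n / 3))
derivation <- mtcars[idx, ]    # used to fit the model
validation <- mtcars[-idx, ]   # used to assess predictive accuracy

fit  <- lm(mpg ~ wt, data = derivation)
pred <- predict(fit, newdata = validation)
sqrt(mean((validation$mpg - pred)^2))   # validation RMSE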
CROSS-VALIDATION
2) How should the sample be split?
The most common approach is to divide the sample randomly, thus
theoretically eliminating any systematic differences. One alternative is to
define matched pairs of subjects in the original sample and to assign one
member of each pair to the derivation sample and the other to the
validation sample.
Modeling of the data uses one part only. The model selected for this part is
then used to predict the values in the other part of the data. A valid model
should show good predictive accuracy.
One thing that R-squared offers no protection against is overfitting. On the
other hand, cross validation, by allowing us to have cases in our testing set
that are different from the cases in our training set, inherently offers
protection against overfitting.
CROSS-VALIDATION - THE IDEAL PROCEDURE
1. Divide the data into three sets: training, validation, and test sets
2. Find the optimal model on the training set, using the validation set
to choose among candidate models
3. See how well the chosen model can predict the test set
4. This test error gives an unbiased estimate of the predictive power of
the model
TRAINING/TEST DATA SPLIT
We talked about splitting data into training/test sets:
• training data is used to fit parameters
• test data is used to assess how the classifier generalizes to new data
What if the classifier has non-tunable parameters?
• a parameter is non-tunable if tuning (or training) it on the training
data leads to overfitting
TRAINING/TEST DATA SPLIT
What about the test error? It seems appropriate:
• degree 2 is the best model according to the test error
Except, what do we report as the test error now?
• the test error should be computed on data that was not used for
training at all
• here we used the test data for training, i.e. for choosing the model
VALIDATION DATA
The same question arises when choosing among several classifiers:
• our polynomial degree example can be seen as choosing among
3 classifiers (degree 1, 2, or 3)
• Solution: split the labeled data into three parts
TRAINING/VALIDATION/TEST DATA
[Figure: the training data is used to fit each candidate model, the
validation data selects d = 2, and a test error of 1.3 is then computed
for d = 2 on the test data.]
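
A sketch of this three-way procedure in R, on synthetic data whose true
relationship is quadratic, so degree d = 2 should win on the validation
set (all names here are illustrative, not from the slides):

# Sketch: train/validation/test selection among polynomial degrees 1-3
set.seed(4)
n <- 150
dat <- data.frame(x = runif(n, -2, 2))
dat$y <- 1 + dat$x - 2 * dat$x^2 + rnorm(n, sd = 0.4)
grp <- sample(rep(c("train", "valid", "test"), each = n / 3))

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

val_err <- sapply(1:3, function(d) {
  fit <- lm(y ~ poly(x, d), data = dat[grp == "train", ])
  rmse(dat$y[grp == "valid"], predict(fit, dat[grp == "valid", ]))
})
best_d <- which.min(val_err)   # the validation data chooses the degree

fit_best <- lm(y ~ poly(x, best_d), data = dat[grp == "train", ])
rmse(dat$y[grp == "test"], predict(fit_best, dat[grp == "test", ]))  # test error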
LOOCV (Leave-one-out Cross Validation)
For k = 1 to R:
1. Let (x_k, y_k) be the k-th example
2. Temporarily remove (x_k, y_k) from the dataset
3. Train the model on the remaining R − 1 examples
4. Note the error made in predicting (x_k, y_k)
When all R points have been held out once, report the mean error as the
LOOCV estimate.
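
A direct R translation of this loop (synthetic data; squared error as the
per-point metric):

# Sketch: leave-one-out CV for a simple linear regression (synthetic data)
set.seed(5)
R <- 30
dat <- data.frame(x = runif(R))
dat$y <- 2 + 3 * dat$x + rnorm(R)

errs <- sapply(1:R, function(k) {
  fit <- lm(y ~ x, data = dat[-k, ])               # train on remaining R - 1 points
  dat$y[k] - predict(fit, dat[k, , drop = FALSE])  # error on the held-out point
})
mean(errs^2)   # LOOCV mean squared error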
LOOCV for Quadratic Regression
LOOCV for Join The Dots
Which kind of Cross Validation?
K-FOLD CROSS VALIDATION
Since data are often scarce, there might not be enough to set aside for a
validation sample.
To work around this issue, k-fold CV works as follows (a sketch in R
follows this list):
1. Split the sample into k subsets of equal size
2. For each fold, estimate a model on all the subsets except one
3. Use the left-out subset to test the model, calculating a CV metric of
choice
4. Average the CV metric across subsets to get the CV error
This has the advantage of using all the data for estimating the model;
however, finding a good value for k can be tricky.
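
The promised sketch: manual k-fold CV with k = 5 on synthetic data, using
mean squared error as the CV metric (all names are illustrative):

# Sketch: manual k-fold cross-validation, k = 5 (synthetic data)
set.seed(6)
n <- 100
dat <- data.frame(x = runif(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)

k    <- 5
fold <- sample(rep(1:k, length.out = n))      # 1. split into k subsets

cv_mse <- sapply(1:k, function(i) {
  fit  <- lm(y ~ x, data = dat[fold != i, ])  # 2. fit on all subsets but one
  pred <- predict(fit, dat[fold == i, ])      # 3. test on the left-out subset
  mean((dat$y[fold == i] - pred)^2)           #    CV metric of choice (MSE)
})
mean(cv_mse)                                  # 4. average across folds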
K-fold Cross Validation Example
1. Split the data into 5 samples
2. Fit a model to the training samples and use the test sample to
calculate a CV metric
3. Repeat the process for the next sample, until every sample has been
used once as the test set
Which kind of Cross Validation?
Improve cross-validation
• Even better: repeated cross-validation
Example:
10-fold cross-validation is repeated 10 times and the results are
averaged (this reduces the variance of the CV estimate)
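
A sketch of repeated 10-fold CV on the same kind of synthetic data; each
repetition re-randomizes the fold assignment, and the resulting CV errors
are averaged:

# Sketch: 10-fold CV repeated 10 times, results averaged (synthetic data)
set.seed(7)
n <- 100
dat <- data.frame(x = runif(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)

one_cv <- function() {
  fold <- sample(rep(1:10, length.out = n))   # fresh random folds each time
  mean(sapply(1:10, function(i) {
    fit <- lm(y ~ x, data = dat[fold != i, ])
    mean((dat$y[fold == i] - predict(fit, dat[fold == i, ]))^2)
  }))
}
mean(replicate(10, one_cv()))   # averaging reduces the variance of the estimate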
Cross Validation - Metrics
• How do we determine if one model is predicting better than another
model?
Best Practice for Reporting Model Fit
1. Use cross-validation to find the best model
2. Report the RMSE and MAPE statistics from the cross-validation
procedure
3. Report the R² from the model as you normally would
The added cross-validation information allows one to evaluate not only
how much variance can be explained by the model, but also the
predictive accuracy of the model. Good models should have high
predictive AND explanatory power!
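
For reference, RMSE and MAPE can be computed with two small helper
functions; this is an illustrative sketch (the observed/predicted vectors
below reuse the fold-1 numbers from the house-price example that follows):

# Sketch: the two CV metrics recommended above
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
mape <- function(obs, pred) 100 * mean(abs((obs - pred) / obs))  # in percent

obs  <- c(215, 255, 260, 293, 375)   # sale.price, fold 1 of the example below
pred <- c(204, 188, 199, 235, 262)   # cvpred, fold 1
rmse(obs, pred)
mape(obs, pred)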
EXAMPLE
• The following table gives the size of the floor area (ha) and the price ($000) for
15 houses sold in the Canberra (Australia) suburb of Aranda in 1999.
• For simplicity, we will use 3-fold cross-validation.
> library(DAAG)
Loading required package: lattice
> data(houseprices)
> summary(houseprices)
area bedrooms sale.price
Min. : 694.0 Min. :4.000 Min. :112.7
1st Qu.: 743.5 1st Qu.:4.000 1st Qu.:213.5
Median : 821.0 Median :4.000 Median :221.5
Mean : 889.3 Mean :4.333 Mean :237.7
3rd Qu.: 984.5 3rd Qu.:4.500 3rd Qu.:267.0
Max. :1366.0 Max. :6.000 Max. :375.0
> houseprices$bedrooms=as.factor(houseprices[,2])
> summary(houseprices)
area bedrooms sale.price
Min. : 694.0 4:11 Min. :112.7
1st Qu.: 743.5 5: 3 1st Qu.:213.5
Median : 821.0 6: 1 Median :221.5
Mean : 889.3 Mean :237.7
3rd Qu.: 984.5 3rd Qu.:267.0
Max. :1366.0 Max. :375.0
plot(sale.price ~ area, data = houseprices, log = "y",pch = 16, xlab = "Floor Area",
ylab = "Sale Price", main = "log(sale.price) vs area")
hist(log(houseprices$sale.price), xlab="Sale Price (logarithmic
scale)", main="Histogram of log(sale.price)")
> #Split row numbers randomly into 3 groups
> rand<- sample(1:15)%%3 + 1
> # a%%3 is a remainder of a modulo 3
> # Subtract from a the largest multiple of 3 that is <= a; take remainder
> (1:15)[rand == 1] # Observation numbers from the first group
[1] 2 3 5 7 12
> (1:15)[rand == 2] # Observation numbers from the second group
[1] 4 8 9 11 14
> (1:15)[rand == 3] # Observation numbers from the third group
[1] 1 6 10 13 15
> houseprice.lm<- lm(sale.price ~ area, data= houseprices)
> CVlm(houseprices, houseprice.lm, plotit=TRUE)
Analysis of Variance Table
Response: sale.price
Df Sum Sq Mean Sq F value Pr(>F)
area 1 18566 18566 8 0.014 *
Residuals 13 30179 2321
fold 1
Observations in test set: 5
11 20 21 22 23
area 802 696 771.0 1006.0 1191
cvpred 204 188 199.3 234.7 262
sale.price 215 255 260.0 293.0 375
CV residual 11 67 60.7 58.3 113
Sum of squares = 24351 Mean square = 4870 n = 5
fold 2
Observations in test set: 5
10 13 14 17 18
area 905 716 963.0 1018.00 887.00
cvpred 255 224 264.4 273.38 252.06
sale.price 215 113 185.0 276.00 260.00
CV residual -40 -112 -79.4 2.62 7.94
Sum of squares = 20416 Mean square = 4083 n = 5
fold 3
Observations in test set: 5
9 12 15 16 19
area 694.0 1366 821.00 714.0 790.00
cvpred 183.2 388 221.94 189.3 212.49
sale.price 192.0 274 212.00 220.0 221.50
CV residual 8.8 -114 -9.94 30.7 9.01
Sum of squares = 14241 Mean square = 2848 n = 5
Overall (Sum over all 3 folds)
ms
3934
houseprice.lm2<- lm(sale.price ~ area + bedrooms, data= houseprices)
CVlm(houseprices, houseprice.lm2, plotit=TRUE)
Analysis of Variance Table
Response: sale.price
Df Sum Sq Mean Sq F value Pr(>F)
area 1 18566 18566 17.0 0.0014 **
bedrooms 1 17065 17065 15.6 0.0019 **
Residuals 12 13114 1093
fold 1
Observations in test set: 5
11 20 21 22 23
Predicted 206 249 259.8 293.3 378
cvpred 204 188 199.3 234.7 262
sale.price 215 255 260.0 293.0 375
CV residual 11 67 60.7 58.3 113
Sum of squares = 24351 Mean square = 4870 n = 5
fold 2
Observations in test set: 5
10 13 14 17 18
Predicted 220.5 193.6 228.8 236.6 218.0
cvpred 226.1 204.9 232.6 238.8 224.1
sale.price 215.0 112.7 185.0 276.0 260.0
CV residual -11.1 -92.2 -47.6 37.2 35.9
Sum of squares = 13563 Mean square = 2713 n = 5
fold 3
Observations in test set: 5
9 12 15 16 19
Predicted 190.5 286.3 208.6 193.3 204
cvpred 174.8 312.5 200.8 178.9 194
sale.price 192.0 274.0 212.0 220.0 222
CV residual 17.2 -38.5 11.2 41.1 27
Sum of squares = 4323 Mean square = 865 n = 5
Overall (Sum over all 3 folds)
ms
2816
MEASURING THE MODEL ACCURACY