This document summarizes an analysis of wine quality data. It describes the steps taken: data exploration, cleaning, examination of relationships between variables, modeling, and prediction. After exploration, the data was cleaned by imputing missing values, removing outliers, and scaling the quantitative variables. Correlations among the variables, and between each variable and the output variable quality, were examined. The data was divided into train and test sets, and a regression model was fitted on the train set to determine the important predictors of quality. The model was then used to predict quality on the test set.
IDS 570 project presentation
1. Analysis of wine quality
Presented By:
Aadhish Chopra
Abhilekh Das
Gopal Bhutada
Parichay Jain
8. Data Cleaning
The data is cleaned in three steps:
Imputation of missing values
Removal of outliers
Scaling of all the Quantitative variables
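The three cleaning steps above can be sketched roughly as follows. The slides do not say which imputation, outlier, or scaling method was used, so the column median, the 1.5 × IQR rule, and z-score scaling are assumptions chosen for illustration, and the function name `clean` is hypothetical.

```python
import numpy as np

def clean(X):
    """Impute missing values, drop outlier rows, and standardize columns."""
    X = np.asarray(X, dtype=float).copy()

    # 1. Imputation: replace NaN with the column median (assumed strategy).
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]

    # 2. Outlier removal: drop any row with a value outside
    #    [Q1 - 1.5*IQR, Q3 + 1.5*IQR] in some column (assumed rule).
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    keep = np.all((X >= q1 - 1.5 * iqr) & (X <= q3 + 1.5 * iqr), axis=1)
    X = X[keep]

    # 3. Scaling: z-score each column (assumed method).
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (X - X.mean(axis=0)) / std

# Synthetic stand-in for the quantitative wine attributes.
rng = np.random.default_rng(0)
raw = rng.normal(size=(200, 2))
raw[3, 1] = np.nan                    # one missing value
raw = np.vstack([raw, [50.0, 0.0]])   # one obvious outlier row
cleaned = clean(raw)
```

The order matters in this sketch: imputing first keeps the percentile computation free of NaN, and scaling last means the mean and standard deviation are not distorted by the rows that were dropped.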
11. Examining Relationship
Correlation between the variables
We look for relationships among the various attributes, and between each attribute and our output variable, quality
The correlation coefficient lies between -1 and +1
The accompanying chart shows the degree of correlation between the various attributes
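A correlation matrix of this kind could be computed as in the sketch below. It assumes the Pearson coefficient, which the slides do not name explicitly, and uses synthetic data; the attribute names `alcohol` and `acidity` and their relationship to `quality` are purely illustrative.

```python
import numpy as np

# Synthetic stand-ins for two attributes and the output variable.
rng = np.random.default_rng(0)
alcohol = rng.normal(size=200)
acidity = rng.normal(size=200)
quality = 0.8 * alcohol - 0.3 * acidity + rng.normal(scale=0.5, size=200)

# Columns are variables, rows are observations.
data = np.column_stack([alcohol, acidity, quality])

# Pearson correlation matrix; entry [i, j] lies in [-1, +1].
corr = np.corrcoef(data, rowvar=False)

# Correlation of each attribute with quality (the last column).
corr_with_quality = corr[:-1, -1]
```

Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate little linear relationship, which is what a correlation chart of the attributes visualizes.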
12. Regression
Divide the data into train and test sets
Fit a regression model on the train set
From the output of the regression analysis, we identify the parameters that have a statistically significant effect on the quality of wine, i.e. an effect unlikely to be due to random chance
The model analysis compares various combinations of predictors and selects the one with the minimum residual standard error (RSE) and the better adjusted R-squared value and F-statistic
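The workflow above might look like the following sketch, which assumes an ordinary least squares linear model and a 70/30 random split; neither detail is stated in the slides. The helper `fit_ols` and the synthetic data are hypothetical. It reports the RSE, adjusted R-squared, and F-statistic used to compare candidate models, plus per-coefficient t-statistics as a rough indicator of significance.

```python
import numpy as np

def fit_ols(X, y):
    """Fit y ~ intercept + X by least squares and return summary statistics."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])          # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    rss = float(resid @ resid)                    # residual sum of squares
    tss = float(((y - y.mean()) ** 2).sum())      # total sum of squares
    dof = n - p - 1                               # residual degrees of freedom
    rse = np.sqrt(rss / dof)                      # residual standard error
    r2 = 1 - rss / tss
    adj_r2 = 1 - (1 - r2) * (n - 1) / dof
    f_stat = ((tss - rss) / p) / (rss / dof)
    se = rse * np.sqrt(np.diag(np.linalg.inv(A.T @ A)))
    return {"beta": beta, "rse": rse, "adj_r2": adj_r2,
            "f_stat": f_stat, "t": beta / se}

# Synthetic data: only the first two predictors truly affect the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 5.0 + 0.8 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.5, size=300)

# Random 70/30 train/test split (assumed ratio).
idx = rng.permutation(len(y))
cut = int(0.7 * len(y))
train, test = idx[:cut], idx[cut:]

model = fit_ols(X[train], y[train])

# Predict on the held-out test set and measure the error.
pred = np.column_stack([np.ones(len(test)), X[test]]) @ model["beta"]
test_rmse = np.sqrt(np.mean((y[test] - pred) ** 2))
```

Refitting `fit_ols` on different subsets of columns and comparing `rse`, `adj_r2`, and `f_stat` is one simple way to carry out the model comparison described above; predictors with a large absolute t-statistic are the ones least likely to appear important by chance.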