JanataHack: Mobility Analytics
Quick Surge Price Prediction for a Cab Aggregator
Introduction
With the rise of cab aggregators and the growing demand for mobility solutions, the
past decade has seen immense growth in data collected from commercial
vehicles, with major contributors such as Uber, Lyft and Ola, to name a few.
Many innovative data science and machine learning solutions are
being built on such data, and they have created substantial business
value for these organizations.
This is a hackathon on the mobility business conducted by Analytics
Vidhya. This presentation describes my attempt at tackling the challenge.
Problem Statement
Welcome to Sigma Cab Private Limited, a cab aggregator service. Its customers
can download the app on their smartphones and book a cab from anywhere in the cities
it operates in. Sigma Cabs, in turn, searches for cabs from various service providers and
offers the customer the best of the available options. It has been in
operation for a little less than a year. During this period, it has captured
surge_pricing_type from the service providers.
You have been hired by Sigma Cabs as a Data Scientist and asked to build
a predictive model that could help them predict the surge_pricing_type
proactively. This would in turn help them match the right cabs with the right
customers quickly and efficiently.
Preliminary Understanding
• Train and Test show similar patterns in their means and quartile distributions. This is great: we can assume the test data resembles the train data, so a model fitted on Train should carry over to Test.
• Train and Test have no empty values in Train_ID or Train_Distance.
• Type_of_Cab has a few NaN values in both Train and Test. Let's create a new category, F, for all the NaN values.
• Customer_Since_Months has a few NaN values; we replace them with 0, treating these customers as newcomers to the cab service.
• Life_Style_Index and Confidence_Life_Style_Index are proprietary values from the cab company, and we have no idea how they are derived. We could omit the NaN rows, since replacing them with 0 might mean something different, or perform EDA and decide later.
• Destination_Type, Customer_Rating and Cancellation_Last_1Month have no missing values.
• Var1 is masked by the company and is very sparse. We definitely can't remove all records with NaN values, nor can we assume them to be 0. We can take a call on this after EDA.
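The two simplest imputations listed above can be sketched with pandas as follows. This is a minimal illustration under stated assumptions: the helper name `fill_basic_gaps` and the toy frame are hypothetical, and only the column names and fill values come from the slides.

```python
import numpy as np
import pandas as pd


def fill_basic_gaps(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the two simplest imputations applied."""
    out = df.copy()
    # Missing cab type becomes its own category "F".
    out["Type_of_Cab"] = out["Type_of_Cab"].fillna("F")
    # Missing tenure is treated as a brand-new customer (0 months).
    out["Customer_Since_Months"] = out["Customer_Since_Months"].fillna(0)
    return out


# Toy frame standing in for Train or Test (values are made up).
toy = pd.DataFrame({
    "Type_of_Cab": ["A", np.nan, "C"],
    "Customer_Since_Months": [10.0, np.nan, 3.0],
})
filled = fill_basic_gaps(toy)
print(filled.isna().sum())
```

Applying the same function to both Train and Test keeps the two frames consistent, which matters because the model sees the F category only if it exists in both.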
Correlation check
Var2 and Var3 are correlated in both Train and Test, so Var2 has been removed.
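One way to run such a check is to list feature pairs whose absolute Pearson correlation exceeds a cutoff and then drop one column of each pair. The sketch below uses synthetic data; the 0.6 threshold and the helper name `highly_correlated_pairs` are assumptions, since the slides do not state how the correlation was judged.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.6):
    """List (col_a, col_b, |r|) for numeric column pairs with |r| >= threshold."""
    corr = df.select_dtypes("number").corr().abs()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = float(corr.iloc[i, j])
            if r >= threshold:
                pairs.append((cols[i], cols[j], r))
    return pairs


# Synthetic stand-in: Var2 is built from Var3 plus a little noise.
rng = np.random.default_rng(0)
var3 = rng.normal(size=200)
df = pd.DataFrame({
    "Var1": rng.normal(size=200),
    "Var2": 0.8 * var3 + 0.2 * rng.normal(size=200),
    "Var3": var3,
})

pairs = highly_correlated_pairs(df)
print(pairs)

# Keep Var3 and drop Var2, as on the slide.
reduced = df.drop(columns=["Var2"])
```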
EDA to understand Life_Style_Index
The three scatter plots show that most values of Life_Style_Index lie between 2 and 3.5.
For simplicity, we fill the NaN values with the mode (2.7) in both Train and Test.
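A mode fill of this kind might look like the sketch below. Two assumptions are mine rather than the author's: the mode of a continuous column is taken on values rounded to one decimal, and it is computed on Train only and then reused for Test. The helper name and toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def fill_with_train_mode(train, test, column, decimals=1):
    """Fill NaNs in `column` of both frames with the rounded mode of Train."""
    # A continuous column rarely repeats exactly, so round before taking the mode.
    mode_value = train[column].round(decimals).mode().iloc[0]
    train_filled = train[column].fillna(mode_value)
    test_filled = test[column].fillna(mode_value)
    return train_filled, test_filled, mode_value


# Toy data (made up) with one gap in each frame.
train = pd.DataFrame({"Life_Style_Index": [2.71, 2.68, 2.74, 3.10, np.nan]})
test = pd.DataFrame({"Life_Style_Index": [2.90, np.nan]})

train_filled, test_filled, mode_value = fill_with_train_mode(
    train, test, "Life_Style_Index"
)
print(mode_value)
```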
EDA to understand Confidence_Life_Style_Index
Confidence_Life_Style_Index looks as if it is randomly assigned with an equal distribution. For
simplicity, let's assign A, B and C in equal proportions to the NaNs in this field.
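An equal assignment like this can be sketched by repeating the labels across the missing slots and shuffling them. The helper name `fill_equal_random`, the seed and the toy series are hypothetical; the slides do not say how the assignment was actually done.

```python
import numpy as np
import pandas as pd


def fill_equal_random(series, labels=("A", "B", "C"), seed=42):
    """Fill NaNs with the labels in (near-)equal counts, in random order."""
    out = series.copy()
    missing = out.isna()
    n_missing = int(missing.sum())
    # Cycle through the labels to cover every missing slot as evenly as possible.
    fill = np.resize(np.array(labels, dtype=object), n_missing)
    # Shuffle so that the labels are not tied to row order.
    np.random.default_rng(seed).shuffle(fill)
    out.loc[missing] = fill
    return out


# Toy column (made up) with six gaps, so each label is used twice.
confidence = pd.Series(["A", np.nan, "B", np.nan, np.nan, "C", np.nan, np.nan, np.nan])
filled_confidence = fill_equal_random(confidence)
print(filled_confidence[confidence.isna()].value_counts())
```

Fixing the seed makes the fill reproducible between runs, which helps when comparing models trained on the imputed data.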
EDA on Surge_Pricing_Type
The target classes do not differ much in size, so resampling isn't required.
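A quick way to check class balance is to look at the share of each target value and the ratio between the largest and smallest class. The sketch below uses a made-up target; the class labels and helper names are hypothetical.

```python
import pandas as pd


def class_shares(target: pd.Series) -> pd.Series:
    """Fraction of rows in each class, ordered by class label."""
    return target.value_counts(normalize=True).sort_index()


def imbalance_ratio(target: pd.Series) -> float:
    """Size of the largest class divided by the size of the smallest."""
    counts = target.value_counts()
    return float(counts.max() / counts.min())


# Toy target (made up): two rows of class 1, four each of classes 2 and 3.
y = pd.Series([1, 2, 2, 3, 3, 2, 1, 3, 2, 3], name="Surge_Pricing_Type")
shares = class_shares(y)
ratio = imbalance_ratio(y)
print(shares)
print(ratio)
```

A ratio close to 1 supports skipping resampling; a much larger ratio would suggest revisiting that decision.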
Modeling: RandomForestClassifier
1. Simple RandomForest gives Accuracy 0.685
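A baseline of this kind can be sketched with scikit-learn as below. The data here is synthetic (`make_classification`), and the split size and hyperparameters are assumptions, so the printed accuracy has no relation to the 0.685 reported on the slide.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic three-class problem standing in for the prepared Train data.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, n_classes=3, random_state=0
)

# Hold out a stratified validation split so accuracy is measured on unseen rows.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Untuned random forest with default-like settings.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

valid_accuracy = accuracy_score(y_valid, model.predict(X_valid))
print(f"validation accuracy: {valid_accuracy:.3f}")
```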
Modeling: XGBoost
1. Simple XGBoost gives Accuracy 0.6835
Modeling: XGBoost with GridSearchCV
1. XGBoost with GridSearchCV tuning gives Accuracy 0.696616 on Train data and 0.7015
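The grid-search pattern can be sketched as follows. To keep the example dependent on scikit-learn alone, it swaps in `GradientBoostingClassifier` as a stand-in for XGBoost; `xgboost.XGBClassifier` follows the scikit-learn estimator interface and can be passed to `GridSearchCV` in the same way. The data is synthetic and the parameter grid is an assumption, not the grid the author searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic three-class problem standing in for the prepared Train data.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, n_classes=3, random_state=0
)

# Small hypothetical grid: 2 x 2 x 1 = 4 candidate settings.
param_grid = {
    "n_estimators": [30, 60],
    "max_depth": [2, 3],
    "learning_rate": [0.1],
}

# Stand-in estimator; an XGBClassifier instance could be used here instead.
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    scoring="accuracy",
    cv=3,
)
search.fit(X, y)

print(search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.3f}")
```

`best_score_` is the mean cross-validated accuracy of the best setting, which is a fairer estimate than accuracy measured on the same rows the model was fitted on.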


Mobility Hackathon
