Final Case Study
Predictive Modelling for Equestrian Sports
N RAMACHANDRAN
Average by Stake Indicator
[Chart: Handle by Stake Indicator. Categories: All, AP, CRC, FG; value axis 0 to 500,000; series: stake indicator Y, N.]
Average Handle by Day of Week
[Chart: Handle vs Day of week. Categories: Sun to Sat; value axis 0 to 350,000; series: All, AP, CRC, FG.]
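The averages plotted in these charts can be reproduced with a simple group-and-mean step. The following is a minimal sketch, assuming each race is available as a (track, day of week, handle) record; the record layout, the function name `average_handle`, and all numbers are illustrative assumptions, not the deck's data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical race records: (track_id, day_of_week, handle). Values are made up.
races = [
    ("AP", "Sat", 300000.0),
    ("AP", "Sat", 340000.0),
    ("AP", "Wed", 200000.0),
    ("CRC", "Sat", 150000.0),
    ("FG", "Thu", 250000.0),
]


def average_handle(records, include_all=True):
    """Return {(track, day): mean handle}; the 'All' track pools every track."""
    groups = defaultdict(list)
    for track, day, handle in records:
        groups[(track, day)].append(handle)
        if include_all:
            groups[("All", day)].append(handle)
    return {key: mean(values) for key, values in groups.items()}


averages = average_handle(races)
print(averages[("AP", "Sat")])   # mean of the two AP Saturday races
print(averages[("All", "Sat")])  # mean over every track's Saturday races
```

The same grouping works for hour of day, number of runners, race number, or month by swapping the second field of each record.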
Average Handle by Hour of day
[Chart: Handle vs Hour of day. Categories: hour_of_day 1 to 9; value axis 0 to 400,000; series: All, AP, CRC, FG.]
Average Handle by No of runners
[Chart: Handle vs No of runners. Categories: 3 to 14 runners; value axis 0 to 800,000; series: All, AP, CRC, FG.]
Average Handle vs Race Number
[Chart: Handle by Race Number. Categories: race number 1 to 14; value axis 0 to 1,200,000; series: All, AP, CRC, FG.]
Average Handle by Month
[Chart: Average Handle by Month. Categories: Jan to Dec; value axis 0 to 350,000; series: All, AP, CRC, FG.]
Variables and their influence on the handle
Variables influencing Handle
Variable            All    AP     CRC    FG
Purse_USA           +ve    +ve    +ve    +ve
Number of runners   +ve    +ve    +ve    +ve
Holiday             +ve    +ve    +ve    -ve
Weekend             +ve    NA     +ve    +ve
Race Type           -ve    +ve    -ve    +ve
Age Restriction     -ve    -ve    NA     +ve
Sex Restriction     -ve    -ve    -ve    -ve
Race Number         +ve    -ve    -ve    +ve
Hour of day         +ve    -ve    +ve    +ve
Track_Condition     -ve    -ve    NA     NA
Wager Type          +ve    +ve    +ve    +ve
Linear Regression
• The analytic model used to predict the handle values is linear regression. Because the handle is a continuous variable, linear regression is a suitable method for understanding and predicting its values.
• The following charts show the predicted values and the error with respect to the original handle values.
• (The details of the variables used in the regression are in the Excel files.)
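To make the regression step concrete, here is a minimal sketch of ordinary least squares fitted through the normal equations, using only the Python standard library. The deck's actual variables and data are in Excel files not shown here, so the three predictors (purse in thousands of USD, number of runners, weekend flag), the helper names `solve`, `fit_ols` and `predict`, and every number below are assumptions chosen for illustration. The toy handle values are constructed to be exactly linear in the predictors, so the fit should recover the generating coefficients up to rounding.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Fit y ≈ b0 + b1*x1 + ... + bk*xk by solving (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, row):
    """Predicted handle for one race given fitted coefficients."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Made-up races: (purse in thousands of USD, number of runners, weekend flag).
rows = [(20, 6, 0), (30, 8, 1), (50, 10, 0), (40, 7, 1), (100, 12, 1), (25, 9, 0)]
# Toy handle generated as 50000 + 2000*purse + 10000*runners + 30000*weekend.
handle = [150000.0, 220000.0, 250000.0, 230000.0, 400000.0, 190000.0]

coefs = fit_ols(rows, handle)
print([round(c, 2) for c in coefs])
print(round(predict(coefs, (60, 11, 1)), 2))
```

With real data, categorical variables such as race type or track condition would first need to be encoded as indicator columns, and a numerical library would usually be preferable to hand-written elimination.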
Predicted Handle vs Handle with All Track Ids
Original Handle vs Errors for all Track Ids
Predicted Handle vs Original Handle for track AP
[Chart: Predicted Handle vs Original Handle, track AP. Axis ranges: 0 to 1,400,000 and 0 to 3,400,000; series: predicted_handle.]
Original Handle value vs Error for Track AP
[Chart: Original Handle value vs Error, track AP. Axis ranges: -600,000 to 2,800,000 and 0 to 3,400,000; series: difference.]
Predicted Handle vs Original Handle for track CRC
[Chart: Predicted Handle vs Original Handle, track CRC. Axis ranges: 0 to 800,000 and 0 to 1,300,000; series: predicted_handle.]
Original Handle value vs Error for Track CRC
[Chart: Original Handle value vs Error, track CRC. Axis ranges: -400,000 to 1,200,000 and 0 to 1,400,000; series: difference.]
Predicted Handle vs Original Handle for track FG
[Chart: Predicted Handle vs Original Handle, track FG. Axis ranges: 0 to 600,000 and 0 to 1,800,000; series: predicted_handle.]
Original Handle value vs Error for Track FG
[Chart: Original Handle value vs Error, track FG. Axis ranges: -400,000 to 1,400,000 and 0 to 1,800,000; series: difference.]
Important Points
• Handle values up to about 700,000 are predicted with good accuracy.
• The model does not predict higher handle values well.
• The handle vs. error graphs show most points placed symmetrically about the x-axis; the errors appear random, which suggests no systematic bias in the fit over that range. (Random errors do not by themselves rule out collinearity among the predictors; that would need a separate check.)
• Adjusted R-squared is in the range 0.60 to 0.75 across the different analyses.
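For reference, the error and adjusted R-squared figures discussed above can be computed from actual and predicted handle values as sketched below. The sign convention (error = actual minus predicted), the function names, the predictor count and all numbers are assumptions for illustration; the toy data deliberately under-predicts one large handle value, and it is not meant to reproduce the 0.60 to 0.75 range reported in the deck.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(actual, predicted, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(actual)
    r2 = r_squared(actual, predicted)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


# Made-up values: the last race has a large handle that the model under-predicts.
actual = [100000.0, 200000.0, 300000.0, 400000.0, 900000.0]
predicted = [110000.0, 190000.0, 310000.0, 390000.0, 600000.0]

# Assumed convention: error = actual - predicted.
errors = [a - p for a, p in zip(actual, predicted)]
r2 = r_squared(actual, predicted)
adj_r2 = adjusted_r_squared(actual, predicted, n_predictors=2)

print(errors)
print(round(r2, 4), round(adj_r2, 4))
```

Adjusted R-squared penalises each additional predictor, so it is never larger than plain R-squared for the same fit.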
Ideal Variable Values to Maximize Handle
Ideal Values for the maximization of Handle
Variable            All              AP               CRC              FG
Number of runners   14               14               13               13
Holiday             1                1                1                0
Weekend             1                0                1                1
Race Type           STK              STK              STK              STK
Age Restriction     4U               34               35               3
Sex Restriction     No Restriction   No Restriction   No Restriction   No Restriction
Race Number         3                9                6                2
Hour of day         7                1                2                2
Track_Condition     FT               GD               FT               FT
Wager Type          E                E                E                E
Month               Jan              Aug              Jan              Jan
Day of Week         Wed              Wed              Mon              Thu
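One way such a table of ideal values could be derived from a fitted model is sketched below. The deck does not state how its values were obtained, so this is an assumption: for a purely additive linear model with no interaction terms, predicted handle is maximised by choosing, for each variable independently, the level with the largest estimated effect. The `effects` dictionary, the level name "OTHER", the function name `ideal_values` and all effect sizes are hypothetical.

```python
# Hypothetical per-level effects on predicted handle (made-up numbers).
# Each inner dict maps a level of the variable to its estimated contribution.
effects = {
    "race_type": {"STK": 90000.0, "OTHER": 0.0},
    "weekend": {0: 0.0, 1: 25000.0},
    "wager_type": {"E": 40000.0, "OTHER": 0.0},
}


def ideal_values(effects):
    """For each variable, pick the level with the largest estimated effect."""
    return {var: max(levels, key=levels.get) for var, levels in effects.items()}


best = ideal_values(effects)
print(best)
```

If the model included interaction terms, the variables could no longer be optimised one at a time and the combinations would have to be searched jointly.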
