Final presentation
"In short, the falling crime rate we've enjoyed may come at a cost: police indifference when you report your stereo was stolen."
From NPR.org, March 30th, 2015
Testable Hypotheses

Hypothesis                                 Potential Attributes
Type of crime                              Crime Type (NIBRS raw class, NIBRS category/against)
Location of crime                          Lat / Long, Distance to high-risk locations (homeless shelter, etc.)
Victim Profile                             Age, race, ethnicity, gender
Crime waves                                Normalized rolling count of crimes in the last 7 or 30 days
Information Provided (Clues)               Witness Present Flag, Witness Demographics (age, gender)
Time of Crime                              Hour of the day
Day/Week of Crime                          Day of the week, Week of the year
Extreme Weather                            Days with Snow (e.g. Feb 2014 Snowstorm), Days with Severe Weather
Amount of Damage (Property Crimes only)    Property Damage Amount, Property Type

Non-Testable Hypotheses

Hypothesis                                 Potential Attributes
Police/Department strategy                 Not included in the dataset
Police Response                            Not included in the dataset
Police Bias                                Not included in the dataset
Officer / Department Training              Not included in the dataset
Demographics of Officer                    Not included in the dataset
Association of Crimes (Hidden Network)     Not included in the dataset
Institutional Factors (DA Office, etc.)    Not included in the dataset
Other External Factors                     Difficult to measure and out of scope. Would need to
(e.g. media coverage of a crime)           append data (e.g. # of media articles per crime)
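The "crime waves" attribute (a normalized rolling count over the last 7 or 30 days) can be sketched as follows. This is only an illustration under stated assumptions: the slides do not say how the count was normalized, so dividing by the window length is a guess, and the function name and incident dates are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def rolling_daily_average(crime_dates, as_of, window_days):
    """Average crimes per day over the `window_days` days before `as_of`.

    The window excludes `as_of` itself, so a crime does not count
    toward its own crime-wave feature (an assumption, not from the slides).
    """
    counts = Counter(crime_dates)
    total = sum(
        counts[as_of - timedelta(days=k)] for k in range(1, window_days + 1)
    )
    return total / window_days


# Hypothetical incident dates, for illustration only.
incidents = [
    date(2014, 1, 20),
    date(2014, 2, 10),
    date(2014, 2, 11),
    date(2014, 2, 11),
    date(2014, 2, 13),
]

wave_7 = rolling_daily_average(incidents, date(2014, 2, 14), 7)
wave_30 = rolling_daily_average(incidents, date(2014, 2, 14), 30)
```

With these toy dates, four incidents fall in the 7-day window and five in the 30-day window, so the two features capture short-term and longer-term activity separately.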
Exclusions

Step in Preparing Model Dataset             Change     Records
Starting Population: Original Dataset                  261,254
Remove Non-Crimes                           -25,992    235,262
Remove Unfounded and Misc. Clear Status     -30,593    204,669
Remove Non-CLT Crimes (e.g. Matthews)        -1,367    203,302
Final Model Dataset                                    203,302
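The exclusion counts above can be checked with a few lines of arithmetic. The sketch below uses only the record counts from the slide; the helper name `apply_exclusions` is made up for illustration.

```python
def apply_exclusions(start, steps):
    """Return (label, removed, remaining) rows for a sequence of exclusions."""
    rows = []
    remaining = start
    for label, removed in steps:
        remaining -= removed
        rows.append((label, removed, remaining))
    return rows


# Counts taken from the exclusions slide.
steps = [
    ("Remove non-crimes", 25_992),
    ("Remove unfounded and misc. clear status", 30_593),
    ("Remove non-CLT crimes", 1_367),
]

waterfall = apply_exclusions(261_254, steps)
final_count = waterfall[-1][2]
share_kept = final_count / 261_254  # fraction of original records modeled

for label, removed, remaining in waterfall:
    print(f"{label:<42} -{removed:>7,} {remaining:>9,}")
```

The running totals match the slide, and roughly 78% of the original records remain in the final model dataset.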
Variables by Category

Variable Category                    # Fields
Crime Type                                  3
Location                                    9
Date / Time                                 4
Crime Wave                                  2
Neighborhood Demographics (QofL)           10
Police Response                             1
Property                                    1
Severe Weather Flag                         2
Victim                                      6
Business Victim                             6
Victim/Reporting Flag                       3
Victim-Suspect Relationship                 3
Grand Total                                50
Rank  Variable                                  Chi Square
 1    Crime Type I (NIBRS Hi Class)             0.6247
 2    Crime Type II (Category)                  0.5550
 3    Crime Wave: Rolling 7 Day Avg             0.4914
 4    Crime against Public                      0.4682
 5    Crime Type III (Against)                  0.4637
 6    Crime against NC State                    0.4443
 7    Victim Age (Binned)                       0.3577
 8    Property Value (Decile)                   0.3041
 9    Place2 (e.g. 30+ location types)          0.2687
10    Witness Flag: Provided Address Info       0.2679
11    Latitude of Crime                         0.1955
12    Longitude of Crime                        0.1904
13    Place1 (e.g. 6 location types)            0.1889
14    Victim is White                           0.1687
15    Crime against Wal-Mart                    0.1622
16    Victim Knew Suspect Outside of Family     0.1544
17    Crime Wave: Rolling 30 Day Avg            0.1408
18    Hour of Day of Crime                      0.1370
19    Victim Knew Suspect Inside of Family      0.1345
20    Crime Reported by Officer Flag            0.1247

Clearance Rates after exclusions (non-crime, etc.) applied
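The ranking above scores each variable by its association with the clearance flag. The slide does not say how the "Chi Square" column was scaled; since every value lies between 0 and 1, it may be a normalized measure. As a sketch under that assumption, the code below computes the Pearson chi-square statistic for a contingency table and one common normalization, Cramér's V. The counts are hypothetical and this is not necessarily the calculation used in the project.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(table):
    """Cramér's V: chi-square rescaled to the range 0..1."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi_square(table) / (n * k))


# Hypothetical counts: rows are two crime types, columns are (cleared, open).
toy = [[30, 10], [10, 30]]

print(chi_square(toy), cramers_v(toy))
```

A normalized score of this kind makes variables with different numbers of levels comparable, which would be one reason to rank on it rather than on the raw statistic.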
*Used H2O (via the RStudio interface) for the model
H2O's website: http://h2o.ai/
Metrics used for Model Evaluation:
1) Accuracy
2) Area Under the Curve (AUC)
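For readers unfamiliar with the two metrics, here is a minimal pure-Python sketch of how each can be computed from true labels and predicted scores. It is not the H2O implementation; the 0.5 threshold and the toy labels and scores are assumptions for illustration.

```python
def accuracy(labels, scores, threshold=0.5):
    """Share of cases where the thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is quadratic in the
    number of cases, which is fine for a small illustration.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = cleared) and model scores.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.2, 0.3]

print(accuracy(labels, scores), auc(labels, scores))
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the two are reported side by side.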
"Simple" CART    Train     Valid     Test
Accuracy         0.8033    0.8021    0.7988
AUC              0.8283    0.8290    0.8257
Accuracy             Train     Valid     Test
"Simple" CART        0.8033    0.8021    0.7988
CART                 0.8327    0.8300    0.8276
Naïve Bayes          0.7495    0.7507    0.7455
GLM (Regularized)    0.8257    0.8149    0.7832
GBM                  0.8808    0.8463    0.8479
Deep Learning        0.8573    0.8404    0.8390
Random Forests       0.8541    0.8402    0.8389

AUC                  Train     Valid     Test
"Simple" CART        0.8283    0.8290    0.8257
CART                 0.8524    0.8516    0.8480
Naïve Bayes          0.7951    0.7949    0.7915
GLM (Regularized)    0.9157    0.9069    0.8781
GBM                  0.9528    0.9243    0.9241
Deep Learning        0.9346    0.9202    0.9171
Random Forests       0.9263    0.9154    0.9128
Appendix includes Model Tuning Parameters
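One way to read the model comparison is to rank the models on the held-out test set and to look at the train-to-test drop as a rough overfitting signal. The sketch below uses the accuracy and AUC figures reported on the slide; the helper names are made up for illustration, and the size of the gap is only a heuristic.

```python
# (train, valid, test) values copied from the comparison slide.
accuracy_table = {
    '"Simple" CART': (0.8033, 0.8021, 0.7988),
    "CART": (0.8327, 0.8300, 0.8276),
    "Naïve Bayes": (0.7495, 0.7507, 0.7455),
    "GLM (Regularized)": (0.8257, 0.8149, 0.7832),
    "GBM": (0.8808, 0.8463, 0.8479),
    "Deep Learning": (0.8573, 0.8404, 0.8390),
    "Random Forests": (0.8541, 0.8402, 0.8389),
}
auc_table = {
    '"Simple" CART': (0.8283, 0.8290, 0.8257),
    "CART": (0.8524, 0.8516, 0.8480),
    "Naïve Bayes": (0.7951, 0.7949, 0.7915),
    "GLM (Regularized)": (0.9157, 0.9069, 0.8781),
    "GBM": (0.9528, 0.9243, 0.9241),
    "Deep Learning": (0.9346, 0.9202, 0.9171),
    "Random Forests": (0.9263, 0.9154, 0.9128),
}


def rank_by_test(table):
    """Model names sorted from best to worst test-set value."""
    return sorted(table, key=lambda name: table[name][2], reverse=True)


def train_test_gap(table):
    """Train minus test for each model; larger gaps hint at overfitting."""
    return {name: train - test for name, (train, _, test) in table.items()}


best_accuracy = rank_by_test(accuracy_table)[0]
best_auc = rank_by_test(auc_table)[0]
accuracy_gaps = train_test_gap(accuracy_table)
widest_gap = max(accuracy_gaps, key=accuracy_gaps.get)

print(best_accuracy, best_auc, widest_gap)
```

On these figures GBM leads on both test metrics, while the regularized GLM shows the largest drop from training to test accuracy.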
R Code is available on GitHub: https://github.com/wesslen/MachineLearningProject
1. Crime Occurs
2. Crime Reported
3. Police Collect Info
4. Police Prioritize Crime
5. Solve or not solve
Weatherburn, Donald James, and Bronwyn Lind. Delinquent-prone Communities. Cambridge, UK: Cambridge UP, 2001. Print.
"Each increase in the prevalence of involvement in crime expands the scope for further contact between delinquents and susceptibles, thereby fueling further increases in the level of participation in crime"
                                                                         Reported Year
Clearance Status                                                         2012      2013      2014
Exceptionally Cleared - By Death of Offender                               16        23        19
Exceptionally Cleared - Cleared by Other Means                            962     1,383     1,311
Exceptionally Cleared - Extradition Declined                                2         2         1
Exceptionally Cleared - Located (Missing Persons and Runaways only)        14        13        15
Exceptionally Cleared - Prosecution Declined by DA                        173       209       174
Exceptionally Cleared - Victim Chose not to Prosecute                   6,322     5,781     5,594
Normal Clearance - Cleared by Arrest                                   21,334    19,089    20,506
Normal Clearance - Cleared by Arrest by Another Agency                    228       386       330
Open                                                                   46,798    45,937    47,349
Open - Cleared, Pending Arrest Validation                                  65       557       389
Unfounded                                                               3,816     3,316     3,148
Total                                                                  79,730    76,696    78,836
Total Excluding Rare Clearances (Blue)                                 69,094    66,409    69,166
Clearance Rate (Normal Clearance / Total Excluding Rare)                32.3%     30.8%     31.5%

Blue = Excluded from model
Yellow = Event in the Dependent Variable Flag (i.e. equal to 1)
Green = Non-event in the Dependent Variable Flag (i.e. equal to 0)
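The color coding did not survive in this transcript, so the table alone does not show which rows were kept. The sketch below tests one reading: that the rows kept were "Cleared by Arrest", "Cleared by Other Means" and "Open", with the first two treated as cleared. This assignment is inferred from the arithmetic and is not stated in the slides, but it reproduces the reported totals and clearance rates.

```python
years = (2012, 2013, 2014)

# Counts copied from the clearance status table.
counts = {
    "Cleared by Arrest": (21_334, 19_089, 20_506),
    "Cleared by Other Means": (962, 1_383, 1_311),
    "Open": (46_798, 45_937, 47_349),
}


def clearance_summary(counts, years):
    """Return {year: (total_kept, clearance_rate_percent)}."""
    summary = {}
    for i, year in enumerate(years):
        cleared = (
            counts["Cleared by Arrest"][i] + counts["Cleared by Other Means"][i]
        )
        total = cleared + counts["Open"][i]
        summary[year] = (total, round(100 * cleared / total, 1))
    return summary


summary = clearance_summary(counts, years)
for year, (total, rate) in summary.items():
    print(year, f"{total:,}", f"{rate}%")
```

Under this reading, roughly one in three modeled incidents is a positive case, so the dependent variable is moderately imbalanced.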
Model Tuning Parameters

CART (Simple and Normal)    Complexity = 0.001, Minimum Split = 1000, Minimum Bucket Size = 1000, Maximum Depth = 5
Naïve Bayes                 Laplace Smoother = 3
GLM with Regularization     Alpha = 1 (Lasso)
GBM                         Number of Trees = 200, Maximum Depth = 5, Interaction Depth = 2, Learning Rate = 0.2
Deep Learning               3 Hidden Layers, each with 200 nodes
Random Forests              Number of Trees = 50, Maximum Depth = 10, Minimum Rows = 5, Number of Bins = 20


