Final presentation
"In short, the falling crime rate we've enjoyed may come at a cost: police indifference when you report your stereo was stolen."
From NPR.org, March 30, 2015
Testable Hypotheses
Hypothesis | Potential Attributes
Type of crime | Crime Type (NIBRS raw class, NIBRS category/against)
Location of crime | Lat/Long, distance to high-risk locations (homeless shelter, etc.)
Victim Profile | Age, race, ethnicity, gender
Crime waves | Normalized rolling count of crimes in the last 7 or 30 days (see the sketch following these tables)
Information Provided (Clues) | Witness Present Flag, witness demographics (age, gender)
Time of Crime | Hour of the day
Day/Week of Crime | Day of the week, week of the year
Extreme Weather | Days with snow (e.g. the Feb 2014 snowstorm), days with severe weather
Amount of Damage (Property Crimes only) | Property damage amount, property type

Non-Testable Hypotheses
Hypothesis | Potential Attributes
Police/Department strategy | Not included in the dataset
Police Response | Not included in the dataset
Police Bias | Not included in the dataset
Officer / Department Training | Not included in the dataset
Demographics of Officer | Not included in the dataset
Association of Crimes (Hidden Network) | Not included in the dataset
Institutional Factors (DA Office, etc.) | Not included in the dataset
Other External Factors (e.g. media coverage of a crime) | Difficult to measure and out of scope; would need to append data (e.g. number of media articles per crime)
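To make the crime-wave attribute concrete, here is a minimal sketch of how a normalized rolling 7- and 30-day crime count could be built in R. This is illustrative only; the data frame `crimes` and the column `incident_date` are assumed names, not the project's actual code.

```r
# Minimal sketch of the "crime wave" attribute: normalized rolling counts of
# citywide crimes over the prior 7 and 30 days. The data frame (crimes) and
# column (incident_date) are assumptions, not the project's actual code.
library(dplyr)
library(zoo)

daily_counts <- crimes %>%
  count(incident_date, name = "n_crimes") %>%          # crimes per calendar day
  arrange(incident_date) %>%
  mutate(
    roll_7  = rollsumr(n_crimes, k = 7,  fill = NA),   # trailing 7-day total
    roll_30 = rollsumr(n_crimes, k = 30, fill = NA),   # trailing 30-day total
    # normalize so the two windows are on a comparable scale
    roll_7_norm  = as.numeric(scale(roll_7)),
    roll_30_norm = as.numeric(scale(roll_30))
  )

# Join back to the incident level so each crime record carries the
# crime-wave features for the day it occurred.
crimes_model <- crimes %>%
  left_join(daily_counts, by = "incident_date")
```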
Exclusions
Step in Preparing Model Dataset | Change | Records
Starting Population: Original Dataset | | 261,254
Remove Non-Crimes | -25,992 | 235,262
Remove Unfounded and Misc. Clear Status | -30,593 | 204,669
Remove Non-CLT Crimes (e.g. Matthews) | -1,367 | 203,302
Final Model Dataset | | 203,302
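As an illustration of the exclusion steps above (a sketch only; the column names and status values are assumptions, since the actual filtering lives in the project's R code on GitHub):

```r
# Illustrative sketch of the dataset exclusions; column names (offense_group,
# clearance_status, jurisdiction) and status values are assumed, not taken
# from the actual CMPD extract.
library(dplyr)

model_data <- raw_incidents %>%                      # 261,254 records
  filter(offense_group != "Non-Crime") %>%           # remove non-crimes
  filter(!clearance_status %in%
           c("Unfounded",
             "Open - Cleared, Pending Arrest Validation")) %>%  # misc. clear statuses
  filter(jurisdiction == "Charlotte")                # drop Matthews and other non-CLT agencies

nrow(model_data)                                     # should land near 203,302
```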
Variables by Category
Variable Category | # Fields
Crime Type | 3
Location | 9
Date / Time | 4
Crime Wave | 2
Neighborhood Demographics (QofL) | 10
Police Response | 1
Property | 1
Severe Weather Flag | 2
Victim | 6
Business Victim | 6
Victim/Reporting Flag | 3
Victim-Suspect Relationship | 3
Grand Total | 50
Rank | Variable | Chi-Square
1 | Crime Type I (NIBRS Hi Class) | 0.6247
2 | Crime Type II (Category) | 0.5550
3 | Crime Wave: Rolling 7 Day Avg | 0.4914
4 | Crime against Public | 0.4682
5 | Crime Type III (Against) | 0.4637
6 | Crime against NC State | 0.4443
7 | Victim Age (Binned) | 0.3577
8 | Property Value (Decile) | 0.3041
9 | Place2 (e.g. 30+ location types) | 0.2687
10 | Witness Flag: Provided Address Info | 0.2679
11 | Latitude of Crime | 0.1955
12 | Longitude of Crime | 0.1904
13 | Place1 (e.g. 6 location types) | 0.1889
14 | Victim is White | 0.1687
15 | Crime against Wal-Mart | 0.1622
16 | Victim Knew Suspect Outside of Family | 0.1544
17 | Crime Wave: Rolling 30 Day Avg | 0.1408
18 | Hour of Day of Crime | 0.1370
19 | Victim Knew Suspect Inside of Family | 0.1345
20 | Crime Reported by Officer Flag | 0.1247
[Chart: Clearance rates after exclusions (non-crimes, etc.) applied]
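The speaker notes describe the ranking above as filter-based Chi-Square screening. A minimal sketch of that idea in R is below; `model_data` and `cleared_flag` are assumed names, and the project may have scaled the statistic differently (the values above look normalized).

```r
# Sketch of filter-based Chi-Square screening of each predictor against the
# binary cleared flag. Names are assumed; the project may have normalized the
# statistic before ranking.
predictors <- setdiff(names(model_data), "cleared_flag")

chi_scores <- sapply(predictors, function(v) {
  tab <- table(model_data[[v]], model_data$cleared_flag)
  suppressWarnings(chisq.test(tab)$statistic)        # raw X-squared statistic
})

head(sort(chi_scores, decreasing = TRUE), 20)        # top 20 variables
```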
*Used H2O (via the RStudio interface) for the models.
H2O's website: http://h2o.ai/
Metrics used for model evaluation:
1) Accuracy
2) Area under the Curve (AUC)
"Simple" CART Train Valid Test
Accuracy 0.8033 0.8021 0.7988
AUC 0.8283 0.8290 0.8257
Accuracy | Train | Valid | Test
"Simple" CART | 0.8033 | 0.8021 | 0.7988
CART | 0.8327 | 0.8300 | 0.8276
Naïve Bayes | 0.7495 | 0.7507 | 0.7455
GLM (Regularized) | 0.8257 | 0.8149 | 0.7832
GBM | 0.8808 | 0.8463 | 0.8479
Deep Learning | 0.8573 | 0.8404 | 0.8390
Random Forests | 0.8541 | 0.8402 | 0.8389

AUC | Train | Valid | Test
"Simple" CART | 0.8283 | 0.8290 | 0.8257
CART | 0.8524 | 0.8516 | 0.8480
Naïve Bayes | 0.7951 | 0.7949 | 0.7915
GLM (Regularized) | 0.9157 | 0.9069 | 0.8781
GBM | 0.9528 | 0.9243 | 0.9241
Deep Learning | 0.9346 | 0.9202 | 0.9171
Random Forests | 0.9263 | 0.9154 | 0.9128
Appendix includes Model Tuning Parameters
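For orientation, a hedged sketch of the H2O workflow from R is shown below. Frame and column names (`model_data`, `cleared`) are assumptions; the project's exact calls are in the GitHub repository linked on the next slide.

```r
# Sketch of the H2O train/validate/test workflow from R. Names are assumed;
# see the project's GitHub repo for the actual code.
library(h2o)
h2o.init()

crimes_hex <- as.h2o(model_data)
crimes_hex$cleared <- as.factor(crimes_hex$cleared)       # binary target

splits <- h2o.splitFrame(crimes_hex, ratios = c(0.6, 0.2), seed = 42)
train <- splits[[1]]; valid <- splits[[2]]; test <- splits[[3]]

x <- setdiff(names(crimes_hex), "cleared")

# One of the candidate models (GBM), using the appendix tuning values
gbm_fit <- h2o.gbm(x = x, y = "cleared",
                   training_frame = train, validation_frame = valid,
                   ntrees = 200, max_depth = 5, learn_rate = 0.2)

perf <- h2o.performance(gbm_fit, newdata = test)
h2o.auc(perf)                                             # out-of-sample AUC
h2o.accuracy(perf)                                        # accuracy by threshold
```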
R Code is available on GitHub:
https://github.com/wesslen/MachineLearningProject
1. Crime Occurs
2. Crime Reported
3. Police Collect Info
4. Police Prioritize Crime
5. Solve or Not Solve
Weatherburn, Donald James, and Bronwyn Lind. Delinquent-prone Communities. Cambridge, UK: Cambridge UP, 2001. Print.
"Each increase in the prevalence of involvement in crime expands the scope for further contact between delinquents and susceptibles, thereby fueling further increases in the level of participation in crime."
Red-handed Crimes
Clearance Status | 2012 | 2013 | 2014
Exceptionally Cleared - By Death of Offender | 16 | 23 | 19
Exceptionally Cleared - Cleared by Other Means | 962 | 1,383 | 1,311
Exceptionally Cleared - Extradition Declined | 2 | 2 | 1
Exceptionally Cleared - Located (Missing Persons and Runaways only) | 14 | 13 | 15
Exceptionally Cleared - Prosecution Declined by DA | 173 | 209 | 174
Exceptionally Cleared - Victim Chose not to Prosecute | 6,322 | 5,781 | 5,594
Normal Clearance - Cleared by Arrest | 21,334 | 19,089 | 20,506
Normal Clearance - Cleared by Arrest by Another Agency | 228 | 386 | 330
Open | 46,798 | 45,937 | 47,349
Open - Cleared, Pending Arrest Validation | 65 | 557 | 389
Unfounded | 3,816 | 3,316 | 3,148
Total | 79,730 | 76,696 | 78,836
Total Excluding Rare Clearances (Blue) | 69,094 | 66,409 | 69,166
Clearance Rate (Normal Clearance / Total Excluding Rare) | 32.3% | 30.8% | 31.5%
(Columns show counts by reported year.)
Blue = Excluded from model
Yellow = Event in the Dependent Variable Flag (i.e. equal to 1)
Green = Non-event in the Dependent Variable Flag (i.e. equal to 0)
(Colors refer to row highlighting on the original slide.)
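A minimal sketch of how the dependent variable flag described above could be derived in R, assuming the event is defined by the Normal Clearance statuses (consistent with the clearance-rate row in the table); the project's exact mapping may differ.

```r
# Sketch of the dependent variable: 1 for Normal Clearance statuses, 0 for the
# remaining (non-excluded) records. Data frame and column names are assumed.
library(dplyr)

model_data <- model_data %>%
  mutate(cleared = as.integer(grepl("^Normal Clearance", clearance_status)))

mean(model_data$cleared)   # sanity check: roughly the 31-32% clearance rate above
```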
Model Tuning Parameters
CART (Simple and Normal): Complexity = 0.001, Minimum Split = 1000, Minimum Bucket Size = 1000, Maximum Depth = 5
Naïve Bayes: Laplace Smoother = 3
GLM with Regularization: Alpha = 1 (Lasso)
GBM: Number of Trees = 200, Maximum Depth = 5, Interaction Depth = 2, Learning Rate = 0.2
Deep Learning: 3 Hidden Layers, each with 200 nodes
Random Forests: Number of Trees = 50, Maximum Depth = 10, Minimum Rows = 5, Number of Bins = 20
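For reference, a sketch of how these settings map onto R calls. The CART parameters correspond to `rpart`'s `cp`/`minsplit`/`minbucket`/`maxdepth` arguments, and the rest map onto `h2o` functions; the frames (`train`, `train_df`), predictor list `x`, and target `cleared` are assumed names rather than the project's actual objects.

```r
# Sketch mapping the tuning table to R calls; frame and column names assumed.
library(rpart)
library(h2o)

# CART (simple and normal): complexity, minimum split/bucket, maximum depth
cart_fit <- rpart(cleared ~ ., data = train_df, method = "class",
                  control = rpart.control(cp = 0.001, minsplit = 1000,
                                          minbucket = 1000, maxdepth = 5))

# Naive Bayes with a Laplace smoother of 3
nb_fit <- h2o.naiveBayes(x = x, y = "cleared", training_frame = train, laplace = 3)

# GLM with Lasso regularization (alpha = 1)
glm_fit <- h2o.glm(x = x, y = "cleared", training_frame = train,
                   family = "binomial", alpha = 1)

# Deep learning: three hidden layers of 200 nodes each
dl_fit <- h2o.deeplearning(x = x, y = "cleared", training_frame = train,
                           hidden = c(200, 200, 200))

# Random forest: 50 trees, depth 10, minimum 5 rows per leaf, 20 bins
rf_fit <- h2o.randomForest(x = x, y = "cleared", training_frame = train,
                           ntrees = 50, max_depth = 10, min_rows = 5, nbins = 20)
```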


Editor's Notes

  • #3: The CART and simple CART models performed well too. The CART model performed better than the simpler model, showing that simplicity and interpretability can be traded for increased predictive power. Even better, both models showed little sign of overfitting, as their performance was nearly identical on the training, validation, and test datasets. GLM showed signs of overfitting: its training accuracy was 82.6% while its test accuracy was 78.3%, lower than the simple CART model. More rigorous feature transformation for non-linearities, and perhaps other feature selection techniques (e.g. forward or backward stepwise selection), would likely reduce this overfitting. In conclusion, from a predictive accuracy point of view, GBM was the best model and predicted clearance with nearly 85% (out-of-sample) accuracy. Nevertheless, it remains largely a black-box model whose components are difficult to interpret. Therefore, for practical use, we recommend CART models, which perform quite well and produce interpretable results that practitioners may find more usable than black-box algorithms like GBM and Deep Learning.
  • #4: Why are clearance rates important? They are official metrics tracked by local police departments and the FBI. They measure how effective police are at solving crime and thus, under crime feedback theory, also at preventing it. Lower crime rates don't tell the whole story; use an example of that trade-off.
  • #6: Our approach was to use the software and tools that would work best for the various parts of our project. For the data prep phase we used SQL, OpenRefine, and OpenGIS for the data wrangling. We then used ArcGIS and Tableau to explore the data and look for any high-level patterns. External datasets were found and run through SQL for standardization, then merged with our data using the SAS EG software. Once we had our aggregated dataset we loaded it into RStudio for object building. Lastly, H2O was used for in-memory predictive analytics and fast data mining.
  • #8: Before running our models, we evaluated the importance of each predictor on a filter basis using a statistical approach (Chi-Square). We chose Chi-Square given that nearly all of the variables were categorical and that all variables had originally been screened to ensure they aligned with one of our hypotheses. However, as we explain later, most of our methods (like GBM and GLM with regularization) have their own wrapper-based feature selection algorithms that further refine the list of variables. Notice the crime types are consistent year after year.
  • #9: For classification, we surveyed a range of models, from simple and intuitive (CART) to more complex, black-box models like Gradient Boosting Models and Deep Learning. For the more advanced models, we used the H2O R wrapper to run H2O. H2O is an open-source machine and deep learning suite used to increase scalability for a broad range of algorithms; it uses in-memory compression to run millions of rows of data on a small cluster. We started with a small decision tree built from the features with the largest predictive power. We called this our simple CART, as it was small and easily interpretable. We then gave all of our features to a second decision tree to see if more variables would provide better predictive power. Using the H2O engine, we then ran Naïve Bayes on a limited number of variables with a Laplace smoother (lambda = 3). Fourth, we ran a regularized (Lasso) generalized linear regression. We applied regularization in order to reduce unnecessary and redundant features in the dataset; we selected regularization instead of stepwise selection because only regularization was available in the H2O package, and we chose Lasso (alpha = 1) over Ridge (alpha = 0) because Lasso performed better on the validation dataset. In addition to the traditional methods (GLM, CART, Naïve Bayes), we ran three more advanced, black-box methods: GBM, Deep Learning, and Random Forests. All three have several tuning parameters (e.g. the number of trees and the maximum tree depth for GBM, or the number of hidden neurons for Deep Learning).
  • #10: And here is what our simple decision tree looks like. This simple CART restricted the tree to only the top variables (from filter selection) in order to gain intuition about our dataset. In particular, we restricted the crime type variable to "Against" rather than the more detailed NIBRS_Hi_Class or Category, because this variable had far fewer classes (only four versus 30+), which made interpretation much easier.
  • #14: A number of theories and crime models have been proposed over the years to explain the existence of a positive feedback loop between the level of crime in a neighborhood at one point in time and the level of crime in the same neighborhood at a later point in time. Dr. Weatherburn writes the following in the book Delinquent-prone Communities (Cambridge University Press): "In the epidemic model of crime the positive feedback loop is created by the fact that each increase in the prevalence of involvement in crime expands the scope for further contact between delinquents and susceptibles, thereby fueling further increases in the level of participation in crime."