Summary: while falling crime rates are good news, they may come at the cost of police indifference toward reported minor crimes such as petty theft. The motivating example is a stolen stereo that was reported but that police did not seem to care to investigate, a possible consequence of focusing resources on major crimes as overall crime falls.
3. In short, the falling crime rate we've enjoyed may come at a cost: police indifference when you report your stereo was stolen.
From NPR.org, March 30, 2015
4. Hypotheses and Potential Attributes

Testable Hypotheses (Hypothesis: Potential Attributes)
- Type of crime: Crime Type (NIBRS raw class, NIBRS category/against)
- Location of crime: Lat/Long, distance to high-risk locations (homeless shelter, etc.)
- Victim Profile: Age, race, ethnicity, gender
- Crime waves: Normalized rolling count of crimes in the last 7 or 30 days (see the sketch after these tables)
- Information Provided (Clues): Witness Present Flag, witness demographics (age, gender)
- Time of Crime: Hour of the day
- Day/Week of Crime: Day of the week, week of the year
- Extreme Weather: Days with snow (e.g., Feb 2014 snowstorm), days with severe weather
- Amount of Damage (property crimes only): Property damage amount, property type

Non-Testable Hypotheses (Hypothesis: Why not testable)
- Police/Department Strategy: Not included in the dataset.
- Police Response: Not included in the dataset.
- Police Bias: Not included in the dataset.
- Officer/Department Training: Not included in the dataset.
- Demographics of Officer: Not included in the dataset.
- Association of Crimes (Hidden Network): Not included in the dataset.
- Institutional Factors (DA Office, etc.): Not included in the dataset.
- Other External Factors (e.g., media coverage of a crime): Difficult to measure and out of scope; would need appended data (e.g., # of media articles per crime).
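As one plausible reading of the "normalized rolling count" attribute above, here is a minimal sketch in R. The data frame `crimes` and its Date column `date` are hypothetical names, and normalizing by the full-sample mean daily count is an assumption about what "normalized" means here.

```r
# Sketch: rolling 7- and 30-day crime-wave features (hypothetical column names)
library(dplyr)
library(zoo)

daily <- crimes %>%
  count(date, name = "n_crimes") %>%   # incidents per day
  arrange(date) %>%
  mutate(
    roll7  = rollapplyr(n_crimes, width = 7,  FUN = mean, fill = NA),
    roll30 = rollapplyr(n_crimes, width = 30, FUN = mean, fill = NA),
    # Normalize against the full-sample mean so the feature is comparable over time
    crime_wave_7  = roll7  / mean(n_crimes),
    crime_wave_30 = roll30 / mean(n_crimes)
  )

# Join the features back onto each incident by date
crimes <- left_join(crimes, daily, by = "date")
```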
6. Exclusions: Steps in Preparing the Model Dataset (Step: Change: Records)
- Starting Population: Original Dataset: 261,254
- Remove Non-Crimes: -25,992: 235,262
- Remove Unfounded and Misc. Clear Status: -30,593: 204,669
- Remove Non-CLT Crimes (e.g., Matthews): -1,367: 203,302
- Final Model Dataset: 203,302

Variables by Category (Category: # Fields)
- Crime Type: 3
- Location: 9
- Date/Time: 4
- Crime Wave: 2
- Neighborhood Demographics (QofL): 10
- Police Response: 1
- Property: 1
- Severe Weather Flag: 2
- Victim: 6
- Business Victim: 6
- Victim/Reporting Flag: 3
- Victim-Suspect Relationship: 3
- Grand Total: 50
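The exclusion waterfall above translates naturally into a few filter steps. A minimal dplyr sketch, assuming hypothetical column names (`is_crime`, `clear_status`, `city`) on a `raw_data` frame; the actual schema is not shown in the deck.

```r
# Sketch of the slide-6 exclusion steps (column names and status labels are assumptions)
library(dplyr)

model_data <- raw_data %>%                            # 261,254 records
  filter(is_crime) %>%                                # remove non-crimes
  filter(!clear_status %in%
           c("Unfounded", "Misc. Clearance")) %>%     # remove unfounded / misc. clear status
  filter(city == "Charlotte")                         # remove non-CLT crimes (e.g., Matthews)

nrow(model_data)  # expected final count: 203,302
```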
7. Filter-Based Variable Ranking (Rank. Variable: Chi-Square)
1. Crime Type I (NIBRS Hi Class): 0.6247
2. Crime Type II (Category): 0.5550
3. Crime Wave (Rolling 7-Day Avg): 0.4914
4. Crime against Public: 0.4682
5. Crime Type III (Against): 0.4637
6. Crime against NC State: 0.4443
7. Victim Age (Binned): 0.3577
8. Property Value (Decile): 0.3041
9. Place2 (e.g., 30+ location types): 0.2687
10. Witness Flag (Provided Address Info): 0.2679
11. Latitude of Crime: 0.1955
12. Longitude of Crime: 0.1904
13. Place1 (e.g., 6 location types): 0.1889
14. Victim is White: 0.1687
15. Crime against Wal-Mart: 0.1622
16. Victim Knew Suspect Outside of Family: 0.1544
17. Crime Wave (Rolling 30-Day Avg): 0.1408
18. Hour of Day of Crime: 0.1370
19. Victim Knew Suspect Inside of Family: 0.1345
20. Crime Reported by Officer Flag: 0.1247
Computed on clearance rates after exclusions (non-crimes, etc.) were applied.
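The deck does not show the scoring code, and the scores above fall in [0, 1], which suggests a normalized statistic such as Cramér's V rather than a raw chi-square value. A hedged sketch of a filter ranking of this kind, assuming the `model_data` frame from the earlier sketch and a binary target `cleared_flag` (its construction is sketched under slide 15 below):

```r
# Sketch: filter-based ranking of categorical predictors against the clearance flag.
# Continuous variables (e.g., latitude) would be binned before using table().
cramers_v <- function(x, y) {
  tab <- table(x, y)
  chi <- suppressWarnings(chisq.test(tab)$statistic)
  as.numeric(sqrt(chi / (sum(tab) * (min(dim(tab)) - 1))))
}

predictors <- setdiff(names(model_data), "cleared_flag")
scores <- sapply(predictors, function(v)
  cramers_v(model_data[[v]], model_data$cleared_flag))

# Rank descending, as in the slide-7 table
head(sort(scores, decreasing = TRUE), 20)
```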
8. We used H2O (via the RStudio interface) for the models.
H2O's website: http://h2o.ai/
Metrics used for model evaluation:
1) Accuracy
2) Area-under-the-Curve (AUC)
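A minimal sketch of that workflow using the h2o R package; the frame and column names (`model_data`, `cleared_flag`) carry over from the earlier sketches, and the 60/20/20 split is an assumption since the deck does not state the split ratios.

```r
# Sketch: start H2O, split the data, and compute the two slide-8 metrics
library(h2o)
h2o.init()  # starts a local H2O cluster

hex <- as.h2o(model_data)
hex$cleared_flag <- as.factor(hex$cleared_flag)  # classification target

splits <- h2o.splitFrame(hex, ratios = c(0.6, 0.2), seed = 1234)
train <- splits[[1]]; valid <- splits[[2]]; test <- splits[[3]]

y <- "cleared_flag"
x <- setdiff(names(hex), y)

fit  <- h2o.gbm(x, y, training_frame = train, validation_frame = valid)
perf <- h2o.performance(fit, newdata = test)

h2o.auc(perf)       # Area-under-the-Curve (AUC)
h2o.accuracy(perf)  # accuracy at each classification threshold
```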
10. Model Performance

Accuracy (Train / Valid / Test)
- "Simple" CART: 0.8033 / 0.8021 / 0.7988
- CART: 0.8327 / 0.8300 / 0.8276
- Naïve Bayes: 0.7495 / 0.7507 / 0.7455
- GLM (Regularized): 0.8257 / 0.8149 / 0.7832
- GBM: 0.8808 / 0.8463 / 0.8479
- Deep Learning: 0.8573 / 0.8404 / 0.8390
- Random Forests: 0.8541 / 0.8402 / 0.8389

AUC (Train / Valid / Test)
- "Simple" CART: 0.8283 / 0.8290 / 0.8257
- CART: 0.8524 / 0.8516 / 0.8480
- Naïve Bayes: 0.7951 / 0.7949 / 0.7915
- GLM (Regularized): 0.9157 / 0.9069 / 0.8781
- GBM: 0.9528 / 0.9243 / 0.9241
- Deep Learning: 0.9346 / 0.9202 / 0.9171
- Random Forests: 0.9263 / 0.9154 / 0.9128
Appendix includes Model Tuning Parameters
12. R Code is available on GitHub:
https://github.com/wesslen/MachineLearningProject
13. 1. Crime Occurs → 2. Crime Reported → 3. Police Collect Info → 4. Police Prioritize Crime → 5. Solve or Not Solve

"Each increase in the prevalence of involvement in crime expands the scope for further contact between delinquents and susceptibles, thereby fueling further increases in the level of participation in crime."

Weatherburn, Donald James, and Bronwyn Lind. Delinquent-prone Communities. Cambridge, UK: Cambridge UP, 2001. Print.
15. Clearance Status by Reported Year (2012 / 2013 / 2014)
- Exceptionally Cleared - By Death of Offender: 16 / 23 / 19
- Exceptionally Cleared - Cleared by Other Means: 962 / 1,383 / 1,311
- Exceptionally Cleared - Extradition Declined: 2 / 2 / 1
- Exceptionally Cleared - Located (Missing Persons and Runaways only): 14 / 13 / 15
- Exceptionally Cleared - Prosecution Declined by DA: 173 / 209 / 174
- Exceptionally Cleared - Victim Chose not to Prosecute: 6,322 / 5,781 / 5,594
- Normal Clearance - Cleared by Arrest: 21,334 / 19,089 / 20,506
- Normal Clearance - Cleared by Arrest by Another Agency: 228 / 386 / 330
- Open: 46,798 / 45,937 / 47,349
- Open - Cleared, Pending Arrest Validation: 65 / 557 / 389
- Unfounded: 3,816 / 3,316 / 3,148
- Total: 79,730 / 76,696 / 78,836
- Total Excluding Rare Clearances (Blue): 69,094 / 66,409 / 69,166
- Clearance Rate (Normal Clearance / Total Excluding Rare): 32.3% / 30.8% / 31.5%

Color key (from the slide's highlighting):
- Blue = excluded from the model
- Yellow = event in the dependent variable flag (i.e., equal to 1)
- Green = non-event in the dependent variable flag (i.e., equal to 0)
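From the color key, the dependent variable appears to be built by dropping the blue rows and flagging normal clearances as the event. A sketch of that construction; the exact status-to-color assignment is not fully recoverable from the text, so the status sets below are illustrative assumptions.

```r
# Sketch: binary target from Clearance Status (status sets are assumptions)
excluded_statuses <- c("Unfounded",
                       "Exceptionally Cleared - Extradition Declined")  # rare, blue-coded rows
event_statuses    <- c("Normal Clearance - Cleared by Arrest",
                       "Normal Clearance - Cleared by Arrest by Another Agency")

model_data <- subset(model_data, !clear_status %in% excluded_statuses)
model_data$cleared_flag <- as.integer(model_data$clear_status %in% event_statuses)
```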
16. Model Tuning Parameters
- CART (Simple and Normal): Complexity = 0.001, Minimum Split = 1000, Minimum Bucket Size = 1000, Maximum Depth = 5
- Naïve Bayes: Laplace Smoother = 3
- GLM with Regularization: Alpha = 1 (Lasso)
- GBM: Number of Trees = 200, Maximum Depth = 5, Interaction Depth = 2, Learning Rate = 0.2
- Deep Learning: 3 Hidden Layers, each with 200 nodes
- Random Forests: Number of Trees = 50, Maximum Depth = 10, Minimum Rows = 5, Number of Bins = 20
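These settings map naturally onto rpart (for CART) and h2o's model functions. A hedged sketch reusing the `x`, `y`, `train`, and `valid` objects from the slide-8 sketch; note that "Interaction Depth = 2" has no direct h2o.gbm argument (it reads like the gbm package's interaction.depth) and is left out here.

```r
library(rpart)

# CART: Complexity / Minimum Split / Minimum Bucket / Maximum Depth map onto
# rpart's cp, minsplit, minbucket, and maxdepth
cart_fit <- rpart(cleared_flag ~ ., data = as.data.frame(train), method = "class",
                  control = rpart.control(cp = 0.001, minsplit = 1000,
                                          minbucket = 1000, maxdepth = 5))

gbm_fit <- h2o.gbm(x, y, training_frame = train, validation_frame = valid,
                   ntrees = 200, max_depth = 5, learn_rate = 0.2)

rf_fit <- h2o.randomForest(x, y, training_frame = train,
                           ntrees = 50, max_depth = 10,
                           min_rows = 5, nbins = 20)

dl_fit <- h2o.deeplearning(x, y, training_frame = train,
                           hidden = c(200, 200, 200))  # 3 layers of 200 nodes
```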
Editor's Notes
#3: The CART and Simple CART models performed well too. CART outperformed the simpler model, showing that simplicity and interpretability can be traded for increased predictive power. Better still, both models showed little sign of overfitting, as their performance was nearly identical on the training, validation, and test datasets.
GLM showed signs of overfitting: its training accuracy was 82.6% while its test accuracy was 78.3%, lower than the Simple CART model. More rigorous feature transformation for non-linearities, and perhaps other feature selection techniques (e.g., forward or backward stepwise), would likely reduce the overfitting.
In conclusion, from a predictive-accuracy point of view, GBM was the best model and predicted clearance status with nearly 85% (out-of-sample) accuracy. Nevertheless, it remains largely a black-box model whose components are difficult to interpret. For practical use, we therefore recommend CART: it performs quite well and yields interpretable results that practitioners may find more usable than black-box algorithms like GBM and Deep Learning.
#4: Why clearance rates are important:
- They are official metrics tracked by local police departments and the FBI.
- They measure how effective police are at solving crime and thus, under crime feedback theory, at preventing it.
- Lower crime rates don't tell the whole story (see the stolen-stereo example for the trade-off).
#6: Our approach was to use the software and tools that worked best for each part of the project. For the data-prep phase we used SQL, OpenRefine, and OpenGIS for data wrangling. We then used ArcGIS and Tableau to explore the data and look for high-level patterns. External datasets were found and run through SQL for standardization, then merged with our data using SAS EG. Once we had our aggregated dataset, we loaded it into RStudio for object building. Lastly, H2O was used for in-memory predictive analytics and fast data mining.
#8: Before running our models, we evaluated the importance of each predictor on a filter basis using a statistical approach (Chi-Square). We chose Chi-Square given that nearly all of the variables were categorical and that all variables had already been screened to ensure they aligned with one of our hypotheses. However, as we explain later, most of our methods (like GBM and GLM with regularization) have their own wrapper-based feature selection that further refines the list of variables.
Notice the crime types are consistent year after year.
#9: For classification, we surveyed a range of models, from simple and intuitive (CART) to more complex, black-box models like Gradient Boosting Machines and Deep Learning. For the more advanced models, we used the H2O R wrapper. H2O is an open-source machine- and deep-learning suite designed to scale a broad range of algorithms; it uses in-memory compression to run millions of rows of data on a small cluster.
We started with a small decision tree built on the features with the largest predictive power. We called this our Simple CART, as it was small and easily interpretable. We then gave all of our features to a second decision tree to see whether more variables would provide better predictive power.
Using the H2O engine, we then ran Naïve Bayes on a limited number of variables with a Laplace smoother (lambda = 3). Fourth, we ran a regularized (Lasso) generalized linear model. We applied regularization to remove unnecessary and redundant features from the dataset; we chose regularization over stepwise selection because only regularization was available in the H2O package, and Lasso (Alpha = 1) over Ridge (Alpha = 0) because Lasso performed better on the validation dataset.
In addition to the traditional methods (GLM, CART, Naïve Bayes), we ran three more advanced, black-box methods: GBM, Deep Learning, and Random Forests. Each has several tuning parameters (e.g., the number of trees and the maximum tree depth for GBM, or the number of hidden neurons for Deep Learning).
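A minimal sketch of the Naïve Bayes and Lasso GLM calls with the stated settings, again reusing `x`, `y`, and `train` from the slide-8 sketch; `lambda_search` is an added assumption about how the penalty strength was chosen.

```r
nb_fit <- h2o.naiveBayes(x, y, training_frame = train,
                         laplace = 3)           # Laplace smoother from the notes

glm_fit <- h2o.glm(x, y, training_frame = train,
                   family = "binomial",         # binary clearance outcome
                   alpha = 1,                   # Lasso penalty (alpha = 0 would be Ridge)
                   lambda_search = TRUE)        # assumption: let H2O search the penalty strength
```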
#10: Here is what our simple decision tree looks like. The Simple CART restricted the tree to only the top variables (from filter selection) in order to gain intuition about the dataset. In particular, we used the Against crime-type variable rather than the more detailed NIBRS_Hi_Class or Category because it has far fewer classes (four versus 30+), which makes the tree much easier to interpret.
#14: A number of theories and crime models have been proposed over the years to explain the positive feedback loop between the level of crime in a neighborhood at one point in time and the level of crime in the same neighborhood later. In Delinquent-prone Communities, Weatherburn and Lind write: "In the epidemic model of crime the positive feedback loop is created by the fact that each increase in the prevalence of involvement in crime expands the scope for further contact between delinquents and susceptibles, thereby fueling further increases in the level of participation in crime."