A Cautionary Tale About Time of Day Analysis Using Los Angeles Crime Data

The Story

Imagine that you’ve been tasked with strategically allocating scarce police resources across the City of Los Angeles. Understanding the value of data, you decide to analyze the historical crime data that the City of L.A. has kindly made available via the LA Open Data Portal. With this data in hand, you feel much more confident that you can predict when and where crimes might occur and use crime-fighting police forces more efficiently. Let’s see the data!

Introducing the Data

We have a collection of crime data from Los Angeles covering 2012 through the first quarter of 2016, inclusive. The data was downloaded as five separate CSV files from data.lacity.org in the summer of 2016. Some minimal processing was done to combine the raw data and generate a new dataset with a few additional features for convenience. The data is available as a zipped file.

Altogether, the dataset contains 1,018,653 crimes and includes the following features (some derived):

variable name       type        description
year_id             character   original dataset id
date_rptd           date        date crime was reported
dr_no               character   ?
date_occ            date        date crime occurred
area_name           character   geographical location
rd                  character   nearby road identifier
crm_cd_desc         character   crime type
status_desc         character   status outcome of crime
location            character   nearby address location
cross_st            character   nearby cross street
lat                 numeric     latitude
long                numeric     longitude
year                numeric     year crime occurred
month               numeric     month crime occurred
day_of_month        numeric     day of month crime occurred
hour_of_day         numeric     hour of day crime occurred
day_of_week         character   day of week crime occurred
weekday             character   weekday/weekend classification
simple_crime_bin    character   subjective binning of crimes
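
The derived calendar columns in the list above can be computed from a single occurrence timestamp. The following is only a minimal sketch of that idea, not the processing actually used for this dataset: the function name derive_time_features, the full English day names, and the "weekday"/"weekend" labels are all assumptions made for illustration.

```python
from datetime import datetime

# Assumed labels: the real dataset may encode days differently.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def derive_time_features(occurred: datetime) -> dict:
    """Derive the calendar columns from one occurrence timestamp."""
    dow = occurred.weekday()  # Monday == 0, Sunday == 6
    return {
        "year": occurred.year,
        "month": occurred.month,
        "day_of_month": occurred.day,
        "hour_of_day": occurred.hour,
        "day_of_week": DAY_NAMES[dow],
        # Hypothetical two-level classification for the `weekday` column.
        "weekday": "weekend" if dow >= 5 else "weekday",
    }


# A made-up timestamp, used only to exercise the function.
example = derive_time_features(datetime(2015, 7, 4, 12, 0))
```

Note that a record whose time of occurrence is unknown still has to receive some hour_of_day value in a scheme like this, which is worth keeping in mind for what follows.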

Total Crime Aggregations

A natural first question to ask of the data is “at what hours of the day do most crimes occur?” Let’s keep things fairly simple to start with. By aggregating all the data into an average crime rate for each hour of the day, we see an interesting peak at 12:00 PM for each year in the dataset.
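
As a rough sketch of this kind of aggregation, the snippet below tallies crimes per hour within each year and reports the busiest hour. It uses raw counts rather than whatever rate normalization the plot uses, and the records are made-up toy values, not rows from the LA dataset. The function names are hypothetical.

```python
from collections import Counter

# Toy (year, hour_of_day) pairs, invented purely for illustration.
records = [
    (2012, 12), (2012, 12), (2012, 18),
    (2013, 12), (2013, 9), (2013, 12), (2013, 20),
]


def hourly_counts_by_year(rows):
    """Return {year: Counter({hour: number_of_crimes})}."""
    counts = {}
    for year, hour in rows:
        counts.setdefault(year, Counter())[hour] += 1
    return counts


def peak_hour(hour_counts):
    """Hour with the highest count; ties go to the earliest hour."""
    return min(hour_counts, key=lambda h: (-hour_counts[h], h))


counts = hourly_counts_by_year(records)
peaks = {year: peak_hour(c) for year, c in counts.items()}
```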

[Figure: crime rate by hour of day, for each year in the dataset]

Well, here’s some unexpected insight! If the hour between noon and 1 PM really is the busiest time for criminals, then we should surely act on this information! Inform the police of this interesting finding; transfer crime-fighting resources from evening hours to the lunchtime hours. But wait. Is this a real insight gleaned from the data, or is it simply an artifact of how the data was collected? Let’s dig deeper and check which types of crimes contributed to this peak.

Which Crimes Cause These Lunchtime Crime Peaks?

Let’s take a closer look and split by different types of crimes. To start, we’ll classify all 104 unique crime types into 13 simple_crime_bins (this was a highly subjective process!).

[Figure: crimes by hour of day for each of the 13 simple_crime_bin categories]

Obviously, this noon peak isn’t shared among all crimes. In fact, it appears that only the fraud, other, sexual, and theft crime types have this lunchtime peak. Could these peaks arise because the victims of these crimes are less likely to know when the crimes occurred?
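
The same check can be done without a plot by asking, for each bin, whether noon is its busiest hour. This is a minimal sketch on invented toy rows. THEFT and FRAUD are bin names from this post, while HYPOTHETICAL_BIN is a placeholder standing in for any bin without a noon peak.

```python
from collections import Counter, defaultdict

# Toy (simple_crime_bin, hour_of_day) pairs, invented for illustration.
rows = [
    ("THEFT", 12), ("THEFT", 12), ("THEFT", 15),
    ("FRAUD", 12), ("FRAUD", 12), ("FRAUD", 8),
    ("HYPOTHETICAL_BIN", 22), ("HYPOTHETICAL_BIN", 22),
    ("HYPOTHETICAL_BIN", 12),
]


def bins_peaking_at(rows, hour=12):
    """Return the sorted bins whose highest hourly count falls on `hour`."""
    per_bin = defaultdict(Counter)
    for crime_bin, h in rows:
        per_bin[crime_bin][h] += 1
    return sorted(
        crime_bin
        for crime_bin, hour_counts in per_bin.items()
        if hour_counts[hour] == max(hour_counts.values())
    )


noon_bins = bins_peaking_at(rows)
```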

Investigating These Crimes Further

Filtering to the FRAUD, OTHER, SEXUAL, and THEFT crimes and holding the y scales constant shows that THEFT is the main culprit for the noon peak in the number of crimes.

[Figure: crimes by hour of day for FRAUD, OTHER, SEXUAL, and THEFT, with a shared y scale]

It’s interesting and suspicious that the sudden, narrow peak sits far above the gradual rise in the THEFT line. This suggests that the crime peak might not be real. A likely explanation is that 12:00 PM is the default time of day recorded when the time of a crime’s occurrence is unknown, especially given that thefts usually occur when the victim isn’t present. But to make sure, let’s check geographically.
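
One simple way to put a number on how far the noon value sits above the surrounding trend is to compare it with the average of its two neighboring hours. This is only an illustrative heuristic, not a method used in this analysis. The function name noon_spike_ratio and the toy hourly series are assumptions.

```python
def noon_spike_ratio(hourly, hour=12):
    """Count at `hour` divided by the mean of the two adjacent hours.

    `hourly` is a list of 24 counts indexed by hour of day. A smooth curve
    gives a ratio near 1, while an isolated spike gives a much larger one.
    """
    neighbours = (hourly[(hour - 1) % 24] + hourly[(hour + 1) % 24]) / 2
    if neighbours == 0:
        return float("inf") if hourly[hour] > 0 else 1.0
    return hourly[hour] / neighbours


# Toy series: a gradual rise through the day, with and without a spike.
smooth = [10 + h for h in range(24)]
spiky = list(smooth)
spiky[12] *= 3  # inject an artificial noon spike

smooth_ratio = noon_spike_ratio(smooth)
spiky_ratio = noon_spike_ratio(spiky)
```

A ratio well above 1 at exactly one hour, on an otherwise gradual curve, is the pattern that would point to a recording default rather than real behavior.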

Is Theft Centered in Only One Area?

[Figure: THEFT crimes by hour of day, split by area]

Since this peak is fairly widespread geographically rather than confined to one area, it seems more likely that the peak is due to how the data is collected and recorded and not to a true peak in criminal activity during lunchtime.
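
The geographic check can be sketched the same way: compute a noon spike ratio per area and see what share of areas exceed some threshold. Everything here is assumed for illustration: the area labels are placeholders, the counts are invented, and the threshold of 1.5 is arbitrary.

```python
def spike_ratio(hourly, hour=12):
    """Count at `hour` divided by the mean of the two adjacent hours."""
    neighbours = (hourly[hour - 1] + hourly[hour + 1]) / 2
    if neighbours == 0:
        return float("inf") if hourly[hour] > 0 else 1.0
    return hourly[hour] / neighbours


def share_of_areas_with_spike(area_hourly, threshold=1.5):
    """Return (fraction of areas flagged, sorted list of flagged areas)."""
    flagged = sorted(
        area for area, hourly in area_hourly.items()
        if spike_ratio(hourly) > threshold
    )
    return len(flagged) / len(area_hourly), flagged


def flat_day_with_noon(noon_count, base=20):
    """Toy helper: a flat 24-hour series with a chosen noon value."""
    hourly = [base] * 24
    hourly[12] = noon_count
    return hourly


# Placeholder area names and invented THEFT counts per hour.
area_hourly = {
    "AREA_A": flat_day_with_noon(60),
    "AREA_B": flat_day_with_noon(50),
    "AREA_C": flat_day_with_noon(22),
    "AREA_D": flat_day_with_noon(45),
}

share, flagged = share_of_areas_with_spike(area_hourly)
```

If most areas are flagged, as in this toy example, the spike behaves like a property of the recording process rather than of any one neighborhood.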

Why is This Important?

In my experience, some of the biggest challenges in working with real datasets involve data that isn’t there and data that can’t necessarily be trusted as is. In this case, the LA crimes dataset presents the second challenge, and this shouldn’t be taken lightly. Relying only on crime data to decide when and how to staff thin police resources probably wouldn’t be wise. This also highlights the importance of intuition and domain expertise in any data analysis.

 
