In this post, I will describe creating a case file using code in R. The goal is to create a sum of homicides by month, year (just 2013 for this example), and police beat/post. We won't worry about any other specifics (i.e. degree) or related types of crimes, i.e. shootings.
To ready yourself for data preparation, read Richard Block's tutorial or the more extensive SatScan manual.
I use crime data from Chicago's Open Data Portal. The same code can be applied to other types of data, health data, etc. A few key points: 1) the data contains victim-based data--which we want to convert into incidents. 2) not every post has a homicide, and 3) the reference post list contains 275 post. So, we will end up with a data set with 3300 rows (275 x 12 months) or simply a row for each post-month.
If you want to skip ahead and just look at the code, go to: http://goo.gl/pmOi1u.
At the top: What you start with. Bottom: After processing in R |
Overview of Steps: See the code for further details
Step #1: Two files are imported: 1) a victim-based file of all crimes, which is narrowed down to just homicides (you could also add in shootings) and 2) a 'reference' file or simply a list of the police beats/posts in Chicago.
Step #2: The data are summed up so that each row contains the total number of victims, then grouped again into incidents by using two different count variables.
Step #3: The list of police beats get column variables for each month in the year and expanded by reshaping data from wide to long. This serves as a 'reference list' for matching purposes.
Step #4: The two data sets are matched the 'unmatched' records are also kept. These are post-months that don't have a homicide, so each count value is replaced with a zero.
Step #5: To ensure the code has worked, I check the total number of rows (3300) and spot check various posts to make sure the data has been grouped in to incidents and posts correctly.
Whether in R or using for-fee software (i.e. SAS, STATA), preparing data for SaTScan is relatively straightforward but there are a number of steps.
Update #1 (2/18/15)
Scan statistics can also be implemented in R's Spatial Epi Package and rsatscan .
You will be pleased to know that you can now run SaTScan directly in R:
ReplyDeletehttp://cran.r-project.org/web/packages/rsatscan/vignettes/rsatscan.html
Thanks for that catch! I updated the post with a link to the Spatial Epi package in R.
ReplyDelete-Jon