Sunday, March 8, 2015

Spatial Analysis with QGIS - Part I: Point Data

QGIS 2.8 'Wien' has been released, so it is a good time to review QGIS's basic spatial analysis capabilities for vector data, starting with point data. We will also take a look at a few plugins and at the SAGA and R processing toolboxes. Most of this functionality in QGIS comes from fTools, formerly a plugin and now part of core QGIS. The MMQGIS plugin also offers tools for working with vector data.

In addition, I will suggest a few features worth adding, or point you to other free or open-source programs that can be used alongside QGIS, either directly or simply by importing and exporting data.

Nearest Neighbor Index
QGIS can calculate the nearest neighbor index to assess point clustering. The simple trick is to remember that large negative z-scores mean the points are clustered, while large positive z-scores mean the points are more dispersed than expected under randomness.
No p-values are given, but comparing the z-score with the usual critical values (e.g. +/-1.65, +/-1.96) is the easiest way to know whether clustering is statistically significant.
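To show where that z-score comes from, here is a minimal Python sketch of the Clark and Evans nearest neighbor index. It assumes projected (x, y) coordinates and a study-area size `area` that you supply yourself, for example the area of the layer's extent. The function name `nearest_neighbour_index` is hypothetical, and although 0.5 and 0.26136 are the standard Clark and Evans constants, I have not verified that QGIS computes the index in exactly this way.

```python
import math

def nearest_neighbour_index(points, area):
    """Clark-Evans nearest neighbour index for a list of (x, y) points.

    `area` is the study-area size in squared map units; here it is an
    assumption supplied by the caller (e.g. the layer's extent).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Observed mean distance from each point to its nearest neighbour
    # (brute force, O(n^2), fine for small layers).
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    observed = total / n
    # Expected mean distance under complete spatial randomness.
    expected = 0.5 / math.sqrt(n / area)
    # Standard error of the expected mean distance.
    se = 0.26136 / math.sqrt(n * n / area)
    return {
        "observed": observed,
        "expected": expected,
        "nni": observed / expected,
        "z": (observed - expected) / se,
    }
```

For example, four points packed into one corner of a 10 x 10 area, `nearest_neighbour_index([(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1)], 100)`, gives an index well below 1 and a z-score below -1.96, i.e. significant clustering.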
Mean Center and Standard Distance
The mean center, the average of the x- and y-coordinates, is an easy way to summarize the center of a point pattern and to examine spatial-temporal trends. The map below shows the mean of all starting points of US tornadoes, by year, 2000-2013. The points are grouped by a unique ID field, in this case a year variable. It would be great to also be able to calculate a median center. Data source: NOAA Storm Prediction Center.
  • In some years, the average was pulled slightly west or east. Interestingly, the mean was pulled east in 2011, when there was a large 'outbreak' of tornadoes across the southeastern US.

The mean of all 'starting' points for US tornadoes, by year, 2000-2013.
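The grouped mean center is simple enough to sketch directly. This minimal Python version assumes hypothetical `(group, x, y)` records, where `group` plays the role of the unique ID field (a year here). For data that span the whole US, averaging projected coordinates is safer than averaging raw longitude and latitude.

```python
from collections import defaultdict

def mean_centers(records):
    """Mean centre per group for an iterable of (group, x, y) records."""
    # Running sums per group: [sum of x, sum of y, count].
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for group, x, y in records:
        s = sums[group]
        s[0] += x
        s[1] += y
        s[2] += 1
    # Divide each group's sums by its count.
    return {g: (sx / n, sy / n) for g, (sx, sy, n) in sums.items()}
```

With made-up records `[(2011, -90.0, 34.0), (2011, -86.0, 32.0), (2012, -100.0, 40.0)]`, this returns `{2011: (-88.0, 33.0), 2012: (-100.0, 40.0)}`.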
Moreover, the SAGA Processing Toolbox has several point pattern analysis tools, including the standard distance, a measure of dispersion. In particular, the "Geostatistics" toolbox contains many useful functions, and the output can be saved and displayed in QGIS. The NOAA dataset already contains the length of each tornado track from start to end, but you could also calculate this by creating a distance matrix in QGIS.
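Standard distance is the root mean squared distance of the points from their mean center. Here is a minimal Python sketch, assuming projected (x, y) coordinates. It uses the population form (dividing by n). Some software applies a small-sample correction or weights, so results may differ slightly from SAGA's output.

```python
import math

def standard_distance(points):
    """Root mean squared distance of (x, y) points from their mean centre."""
    n = len(points)
    # Mean centre.
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sum of squared distances from the mean centre.
    ss = sum((x - mx) ** 2 + (y - my) ** 2 for x, y in points)
    return math.sqrt(ss / n)
```

For the corners of a 2 x 2 square, every point is sqrt(2) from the center, so the standard distance is sqrt(2).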

The SAGA Geostatistics Toolbox in QGIS
Ripley's K
Ripley's K helps determine whether points are clustered at different distances. It can be computed through the R processing toolbox in QGIS, using R's spatstat package, or in CrimeStat.
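To show what the K function measures, here is a naive Python sketch. It counts, for each distance d, the ordered pairs of points no more than d apart, and scales by intensity using the common normalization K(d) = A * count / n². It also returns the variance-stabilized L(d) = sqrt(K(d)/pi), which is roughly d under complete spatial randomness. No edge correction is applied, unlike spatstat's `Kest`, which offers several corrections, so values are biased low near the study-area boundary. The function name `ripley_k` is hypothetical.

```python
import math

def ripley_k(points, area, distances):
    """Naive Ripley's K and L (no edge correction) for (x, y) points."""
    n = len(points)
    lam = n / area  # intensity: points per unit area
    result = []
    for d in distances:
        # Count ordered pairs (i, j), i != j, that are within distance d.
        count = 0
        for i, (xi, yi) in enumerate(points):
            for j, (xj, yj) in enumerate(points):
                if i != j and math.hypot(xi - xj, yi - yj) <= d:
                    count += 1
        k_val = count / (lam * n)        # equals area * count / n**2
        l_val = math.sqrt(k_val / math.pi)  # L(d) is about d under randomness
        result.append((d, k_val, l_val))
    return result
```

Plotting L(d) - d against d is a common way to read the result: values above zero suggest clustering at that distance, values below zero suggest dispersion.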

Heat Maps
You can download the Heatmap plugin or use the built-in live (dynamic) heat map renderer when you style a layer. For the latter, make sure to move the rendering quality slider to 'Best' for a smoother heat map. Here is an example using the dynamic heat map to look at homicides in Philadelphia. Data source: OpenDataPhilly. In future posts, we will also look at alternatives to heat maps, like gridding/quadrat analysis.

QGIS has a lot of neat options for styling vector data, including a dynamic heat map that changes as you zoom in and out.
(Note: In ArcGIS, the Kernel Density tool, not to be confused with Point Density, is not part of the base software; it requires the Spatial Analyst extension, which has to be purchased separately.)
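A heat map is essentially a kernel density surface. The following minimal Python sketch shows the idea. It assumes a quartic (biweight) kernel, a common choice for point heat maps, and does not claim to reproduce QGIS's renderer exactly. Each point within `radius` of a cell center adds 3/(pi r²) * (1 - (d/r)²)², so every point's contribution integrates to 1. The function name `quartic_kde` and the `cell_centres` list are hypothetical.

```python
import math

def quartic_kde(points, radius, cell_centres):
    """Quartic-kernel density at each (x, y) cell centre."""
    norm = 3.0 / (math.pi * radius ** 2)  # makes each kernel integrate to 1
    r2 = radius ** 2
    densities = []
    for cx, cy in cell_centres:
        total = 0.0
        for px, py in points:
            d2 = (cx - px) ** 2 + (cy - py) ** 2
            if d2 < r2:  # points beyond the radius contribute nothing
                u2 = d2 / r2
                total += norm * (1.0 - u2) ** 2
        densities.append(total)
    return densities
```

A larger radius produces a smoother surface. This is the main trade-off to consider, whichever tool you use.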

Grouping Analysis
Lastly, grouping analysis can be carried out in PostGIS, which supports a wide variety of spatial queries in SQL, or in CrimeStat.
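Grouping can be done in many ways. One simple version, shown here as a Python sketch, puts any two points within a chosen distance into the same group (single linkage, tracked with a union-find structure). This is similar in spirit to the `ST_ClusterWithin` function in more recent PostGIS releases. It is only one illustrative technique, not necessarily what PostGIS or CrimeStat does internally, and the name `group_within` is hypothetical.

```python
import math

def group_within(points, max_dist):
    """Group labels such that points within max_dist share a group."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the groups of every pair closer than the threshold.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= max_dist:
                parent[find(i)] = find(j)

    # Renumber roots as 0, 1, 2, ... in order of first appearance.
    labels, ids = [], {}
    for i in range(len(points)):
        labels.append(ids.setdefault(find(i), len(ids)))
    return labels
```

For example, `group_within([(0, 0), (1, 0), (2, 0), (10, 10), (10.5, 10)], 1.0)` returns `[0, 0, 0, 1, 1]`. The first three points form a chain, so they end up in one group even though the outer two are 2 units apart.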

Near future...
We will look at spatial analysis of line and polygon data, as well as joining points for analysis.

GME and ArcGIS
When using ArcGIS, be sure to check out the free Windows-based program Geospatial Modelling Environment (GME), formerly 'Hawth's Tools'. GME has a long list of helpful commands.
