Book Review: The ESRI Guide to GIS Analysis Vol. 2: Spatial Measurements & Statistics

Title: The ESRI Guide to GIS Analysis Vol. 2: Spatial Measurements & Statistics
Author: Andy Mitchell
Publisher: ESRI Press
Year: 2005
Aimed at: GIS/Analysts/Map Designers – intermediate
Purchased from: www.wordery.com

ESRI GA V2

This textbook acts as a companion text for GIS Tutorial 2: Spatial Analysis Workbook (for ArcGIS 10.3.x), where you can match up the chapters in each book. Although not a necessity, I would recommend using both texts in tandem to apply the theory and methods discussed with practical tutorials and walkthroughs using ArcGIS. This is the second book of the series and follows on from The ESRI Guide to GIS Analysis Volume 1: Geographic Patterns & Relationships.

The first chapter is, inevitably, an introduction to spatial measurements and statistics. You perform analysis to answer questions, and to answer these questions you not only need data, you also need to understand the data. Are you using nominal, ordinal, interval, or ratio values, or a combination of these? The type of value(s) will shape the analysis techniques and methods used to calculate the statistics. You will need to interpret the statistics, test their significance, and question the results. These elements are briefly visited with the promise of more depth as the book progresses. The chapter ends with a section on ‘Understanding data distributions’, which is essentially a brief introduction to exploratory data techniques such as describing frequency distributions and spatial distributions, and recognising outliers and how they can affect analysis.

Chapter 2 discusses measuring geographic distributions, with the bulk of the chapter focused on finding the center (mean, median, central feature) and measuring the compactness (standard distance), orientation, and direction of distributions (spatial trends). These are discussed for point, line, and areal features, and also using weights based on attribute values. They are useful for adding statistical confidence to patterns derived from a map. Formulas and equations begin to surface here. Although there is no need to learn them by heart, since the GIS does all the heavy lifting for you, they give insight into what goes on under the hood, and knowing the underlying theory and formulas can often aid in troubleshooting and producing accurate analysis. The last section of this chapter is fundamental to the rest of the text: testing statistical significance. This allows you to attach a confidence level to your analysis using the null hypothesis, the p-value, and the z-score. This can be a difficult topic to comprehend and may require further reading.
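
To make the first of those measures concrete, here is a minimal Python sketch of the mean centre and standard distance calculations the chapter describes; the coordinates are hypothetical and would normally come from a projected GIS layer.

```python
import math

# Hypothetical projected coordinates (e.g. metres); in practice these
# would be read from the feature layer being analysed.
points = [(3.0, 4.0), (5.0, 11.0), (12.0, 8.0), (9.0, 5.0), (4.0, 7.0)]
n = len(points)

# Mean centre: the average x and average y of all features.
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n

# Standard distance: a single measure of how dispersed the features are
# around the mean centre (the spatial analogue of standard deviation).
std_dist = math.sqrt(
    sum((x - mean_x) ** 2 for x, _ in points) / n +
    sum((y - mean_y) ** 2 for _, y in points) / n
)

print(f"Mean centre: ({mean_x:.2f}, {mean_y:.2f})")
print(f"Standard distance: {std_dist:.2f}")
```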

The third chapter, a lengthy one, is based around using statistical analysis to identify patterns: to back up the visual analysis of the map with confidence, or to find patterns that may not have been immediately obvious. The human eye will often see patterns that do not really exist; alternatively, statistical analysis might indicate that what you thought was a strong pattern is actually quite weak. The statistical analysis methods begin to heat up here. We are introduced to the Kolmogorov-Smirnov test and the chi-square test for quadrat analysis, identifying patterns in areas of equal size; the nearest neighbour index, for calculating the average distance between features and identifying clustering or dispersion; and the K-function as an alternative to the nearest neighbour index, each used to measure the pattern of feature locations. These are followed by methods for measuring the spatial pattern of feature values: the join count statistic for areas with categories; Geary’s c and Moran’s I for measuring the similarity of nearby features; and the General G statistic for measuring the concentration of high and low values for features with continuous values. The formulas for each are presented along with how to test the significance of, and interpret, the results. The final section of this chapter discusses defining spatial neighbourhoods and weights when analysing patterns. There are a few things to consider, such as local or regional influences, thresholds of influence, interaction between adjacent features, and the regional rate of decline of influence.
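
For a flavour of the formulas involved: the nearest neighbour index mentioned above is, in the form used by most GIS implementations, the ratio of the observed mean nearest-neighbour distance to the expected mean distance for a random pattern, where n is the number of features and A is the area of the study region:

$$R = \frac{\bar{D}_O}{\bar{D}_E}, \qquad \bar{D}_O = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad \bar{D}_E = \frac{1}{2\sqrt{n/A}}$$

An R below 1 suggests clustering and an R above 1 suggests dispersion, with significance judged via a z-score.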

Chapter 4 is titled ‘Identifying Clusters’ with a main focus on hotspot analysis. First, we are introduced to nearest neighbour hierarchical clustering, which is heavily used in crime analysis. While Chapter 3 discussed global methods for identifying patterns, each returning a single statistic, this chapter focuses on local statistics that show where those patterns exist within the global setting. Geary’s c and Moran’s I both have local versions, and their definition, implementation, and the factors influencing their results are discussed and critiqued, along with Art Getis’ and Keith Ord’s Gi* method for identifying hot and cold spots. While the methods in Chapter 3 establish whether there are patterns in the data, the methods in Chapter 4 highlight where those clustered patterns are. The last section of Chapter 4 discusses using statistics with geographic data: how the very nature of geographic data affects your analysis, how the way geographic data is represented in a GIS affects your analysis, the influence of the study area boundary, and GIS data and errors.

“To the extent you’re confident in the quality of your GIS data, you can be confident in the quality of your analysis results.”

The last chapter ventures away from identifying patterns and clusters and focuses on analysing geographic relationships, and on using statistics to do so. Geographic relationships and processes are used to predict where something is likely to occur and to examine why things occur where they do. Chapter 5 looks at statistical methods for identifying geographic relationships, with Pearson’s correlation coefficient and Spearman’s correlation coefficient discussed and assessed. Linear regression (ordinary least squares) and geographically weighted regression are presented as methods for analysing geographic processes. These methods warrant a full text in their own right, and there is a list of further reading at the end of the chapter.

Overall Verdict: I feel that I will be referring back to this text a lot. Having recently completed an MSc in Geocomputation, I wish it had crossed my path during my studies, and I would highly recommend this book to anyone venturing into spatial analysis where statistics can aid and back up the analysis. Although formulas are littered throughout the chapters, you really do not need to get bogged down in them; the most important point is that you understand what the methods are doing, their limitations, and how to assess the results, and this book is a fantastic reference for doing just that. Knowing the theory is a huge step towards applying the analysis techniques confidently and reporting your data accurately.

Book Review: The ESRI Guide to GIS Analysis Vol. 1: Geographic Patterns & Relationships

Title: The ESRI Guide to GIS Analysis Vol. 1: Geographic Patterns & Relationships
Author: Andy Mitchell
Publisher: ESRI Press
Year: 1999
Aimed at: GIS/Analysts/Map Designers – beginner
Purchased from: www.wordery.com

GIS Analysis Vol 1

This textbook is a companion text for GIS Tutorial 2: Spatial Analysis Workbook (for ArcGIS 10.3.x) where you can match up the chapters in each book. Although not a necessity, I would recommend using both texts in tandem to apply the theory and methods discussed with practical tutorials and walkthroughs using ArcGIS.

The title of this book might lead you to believe that ArcGIS will feature heavily throughout the text, but Michael F. Goodchild sets this straight in the Preface by stating that he applauds ESRI for backing this book even though it isn’t ArcGIS-centric. The author, Andy Mitchell, presents the material generically, such that most GIS software packages should be able to apply the techniques discussed.

Chapter 1 is a short introduction to what GIS analysis is, understanding the representation of geographic features in a GIS, and the common attributes associated with geographic features that allow for analysis. The wording is simplistic in nature and easy to follow, and acts as a good entry point to the rest of the book.

The second chapter begins to delve into the realm of visual analysis, using your brain to discern patterns for a better understanding of the data and the area that you are mapping. Several real-life mapped examples are displayed to show how ‘mapping where things are’ aids in more focused decision making. The chapter steps through deciding what to map, preparing your data, and making your map, with comparison figures to show why you might perform such tasks.

Why map the most and least? Because mapping features based on quantities adds an additional level of information beyond simply mapping the locations of features, a notion made clear by the real-life examples in Chapter 3. The author then takes us down a path to understanding quantities and the importance of knowing the type of quantities you are mapping. This naturally leads on to the next topic, classification: why use classes, and how to choose an appropriate classification scheme for the purpose of your data. It is important to understand how classification methods such as Natural Breaks (Jenks), Quantile, Equal Interval, and Standard Deviation classify your data, and to have a general guideline for choosing the appropriate method.

A great recurring aspect of this book is that every chapter begins with a question; Chapter 4’s is ‘Why Map Density?’, and the chapter proceeds to answer it and present the methods available for mapping density in a GIS. This chapter discusses density for defined areas, dot density mapping, and density surfaces: what the GIS does to create them and the results of the output.

The fifth chapter takes a look at mapping what’s inside an area: why you would want to map inside an area, and some analysis and results that can be derived from doing so. Do you need to map a single area to find what’s happening inside it, or multiple areas to analyse and compare what’s happening inside each? Methods are explained along with how the GIS performs them for analysis. You might want to find out whether a certain feature is within an area, get a list of all features inside an area and a count of each, or sum a designated land type’s area within a boundary, for example. Summaries and statistics can also be generated from what is found inside an area boundary.

Having assessed some simple techniques for mapping what’s inside an area, the next chapter casts its attention towards finding what’s nearby. People often think of nearness in straight lines or along transport networks, but GIS is also useful for travel-cost analysis, giving weight to different land use or soil types, for example, when considering the path for a pipeline. Nearness by straight-line distance, distance or cost over a network, and cost over a geographic surface are discussed in detail. At this point we are venturing into some of the concepts behind network analysis.

The last chapter looks at mapping change, with regard to change over time for time-pattern analysis. Three ways of mapping change are presented: creating a time series, creating a tracking map, and measuring change, along with the considerations required when creating each type for change in discrete features, events, summarized areas, and continuous categories and values.

Following the last chapter there are some recommendations for further reading.

Overall Verdict: The perfect companion for a GIS student embarking on their geospatial educational quest. The theory behind GIS is essential for accurate analysis and troubleshooting. This book is an easy read, with a plethora of figures and maps from real-life situations in each chapter to aid the experience. Although getting close to two decades old, this text stands the test of time and acts as a solid foundation for simple analysis using a GIS to find patterns and relationships.

The only shortcoming of a text of this nature is that you cannot see how the methods and techniques discussed are performed in a GIS. This is where the companion text GIS Tutorial 2: Spatial Analysis Workbook (for ArcGIS 10.3.x) comes in, providing walkthroughs to further enhance your understanding of the underlying theory.

Next: see The ESRI Guide to GIS Analysis Volume 2: Spatial Measurements & Statistics

The Web Mercator Visual and Data Analysis Fallacy

How many of you have looked at a web map with a Google Maps or OpenStreetMap basemap? You know the one, where Greenland looks the size of South America. Recently, I saw one of these maps with buffer zones spread across the United States. Each buffer was drawn the same size, indicating that each buffer zone represented a similarly sized area of the Earth’s surface; as you’d expect, a 1000km radius buffer zone is a 1000km radius buffer zone! However, if Greenland looks a similar size to South America, then more than likely the map is displayed in the Web Mercator projection (EPSG: 3857, or 900913), and the further you move away from the equator, the more inaccurate those same-sized 1000km buffer zones become.

Web Mercator

Web Mercator map with a 1000km buffer zone around selected cities.

Ok, let’s take a slight step back here for a moment and look at what a projection is. A projection is the mathematical transformation of the Earth’s surface to a flat plane. The surface of the Earth is curved and maps are flat, so a projected coordinate system begins by projecting an ellipsoidal model of the Earth onto a flat plane. Once we have a flat map, we can define locations using Cartesian coordinates with x-axis and y-axis values.
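
As an aside, the maths behind the spherical Web Mercator transformation is compact enough to show in full; a minimal Python sketch (the sphere radius of 6,378,137 m is the one used by EPSG:3857):

```python
import math

R = 6378137.0  # radius (metres) of the sphere used by Web Mercator

def to_web_mercator(lon_deg, lat_deg):
    """Project a longitude/latitude pair (degrees) to Web Mercator x/y (metres)."""
    x = R * math.radians(lon_deg)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

# y grows without bound as latitude approaches +/-90 degrees, which is
# why Web Mercator maps are clipped at roughly +/-85 degrees.
print(to_web_mercator(-74.0, 40.7))   # around New York City
print(to_web_mercator(-45.0, 72.0))   # central Greenland
```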

Projection, however, causes distortions in the resulting planar map. These distortions fall into four categories: shape, area, direction, and distance.

Projections that minimize distortions in…
…shape are called conformal projections.
…area are called equal-area projections.
…direction are called true-direction projections.
…distance are called equidistant projections.

The choice of projected coordinate system really boils down to two aspects. The projection should minimise distortions for your area of interest; more importantly, if your map requires a particular spatial property (shape, area, direction, or distance) to hold true, then the projection you choose must preserve that property. It is possible to retain at least one of these properties, but not all of them at once.

I recently read a book titled “Designing Better Maps” by Cynthia A. Brewer (you wouldn’t know it from the maps in this post, though) and the following line stood out to me…

“If you see a map of the United States that looks like a rectangular slab, with a straight-line US-Canada border across the west, be suspicious of the mapmaker’s knowledge of map projection and of interpretations of the mapped data.”

This got me thinking about all those maps of the United States on a Web Mercator basemap that thematically map data for census tracts or counties, or, as previously mentioned, show buffer zones/distances for visual and/or data analysis purposes. A Mercator is a conformal projection and as such preserves angles (and local shape, as seen by the circles in the figure below), but it distorts size and area as you move away from the equator. If focusing on a geographic region as large as the U.S., surely Web Mercator should be avoided at all costs unless the map’s sole purpose is navigation? A conformal projection should be used for large-scale mapping (1:100,000 and larger) centred on the area of interest, because at large scales (when using a conformal projection) the errors in area and distance are insignificant.

Tissot's Indicatrix WM

Tissot’s Indicatrix used to display distortions on a Web Mercator

The figure above uses something called the Tissot indicatrix. On this Web Mercator map, the circles at the equator cover a similar area on the globe to those further north and south of the equator. Hold on, what? Surely those bigger circles towards the poles cover a much larger area on the Earth than the smaller ones at the equator! They do not; each circle covers a similar area on the actual globe, and the apparent difference in size is pure distortion. Why? Because Web Mercator is a cylindrical projection, and we will get to this momentarily.
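
To put rough numbers on the distortion: in a Mercator-style projection the local linear scale factor is approximately sec(latitude), so mapped areas are inflated by roughly sec²(latitude). A quick sketch (latitudes are approximate):

```python
import math

def area_inflation(lat_deg):
    """Approximate factor by which Mercator inflates areas at a latitude."""
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2

for place, lat in [("Equator", 0), ("Miami", 26), ("Seattle", 47.6),
                   ("Oslo", 60), ("central Greenland", 72)]:
    print(f"{place} (~{lat} deg): areas inflated ~{area_inflation(lat):.1f}x")
```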

To fit the contiguous United States onto an A0 poster you need a scale of around 1:6,500,000, and around 1:27,500,000 on an A4 page, far from large-scale mapping, yet we persist in using the Web Mercator for visualising data for the U.S. on small screens.

UPDATE: the Web Mercator is NON-conformal; please read Roel Nicolai’s comment below and also visit GeoGarage for more information. The point of this post stands: using the correct projection is paramount for data analysis.

More on Conformal Projections

Conformal projections preserve local shape (and angles), i.e. shape for small areas. Note that no map projection can preserve shapes for large regions; as such, conformal projections are usually employed for large-scale mapping applications (1:100,000 and larger) and rarely used for continental or world maps. Local angles on the sphere are mapped to the same angles in the projection, so graticule lines intersect at 90-degree angles. Point to remember: conformality is strictly a local property.

Use a conformal projection when the main purpose of the (large-scale) map involves:
• measuring angles
• measuring local directions accurately
• representing the shapes of features
• representing contour lines

Cylindrical Projection: The Cause for Distortion in a Web Mercator

Cylindrical Projection

A cylindrical projection (above) is like projecting the Earth’s surface onto the inside of a tube and then rolling the tube out flat, leaving a rectangle. In a cylindrical projection, world maps are always rectangular in shape. Scale is constant along each parallel (line of latitude), and meridians (lines of longitude) are equally spaced. The rectangular nature results in all parallels having the same length, and likewise all meridians. But since the real Earth curves in toward the poles, in order to get those straight lines you have to stretch and distort the surface more and more as you approach the north and south poles. In fact, it is impossible to show the poles, because as you approach them the distance between latitude lines stretches out toward infinity.

Ruining Life for Web Mercator Buffers

Let’s take a look at an example comparing data on a Web Mercator to a better suited projection for the contiguous U.S.

The figure below shows a selection of locations along the east coast of the United States in the Web Mercator projection. A buffer with a radius of 200km has been generated in the Web Mercator projection and applied to each point. We know from the Tissot indicatrix that circles become enlarged as we move away from the equator, yet the drawn size of the buffers remains constant as we move from south to north.

Web Mercator Buffers

If we convert the entire map to an equidistant projection, such as the USA Contiguous Equidistant Conic projection (EPSG: 102005), we will see the buffer zones alter, enlarging as we move from north to south.

Web Mercator Buffers Reprojected

So this tells us that the 200km buffer generated in the Web Mercator projection around Bar Harbor (the most northerly location on the map) covers far less area than the same buffer zone generated for Miami Beach (the most southerly location). This makes sense because of the stretching distortion of the land as we move north from the equator in the Web Mercator projection. The buffer zones generated in the Web Mercator projection have not allowed for these distortions.
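
A rough back-of-the-envelope check makes the same point: a circle drawn with a 200km radius in Web Mercator map units only spans about 200 × cos(latitude) km on the ground. A sketch with approximate city latitudes:

```python
import math

def ground_radius_km(mercator_radius_km, lat_deg):
    """Approximate ground radius of a buffer drawn in Web Mercator units."""
    return mercator_radius_km * math.cos(math.radians(lat_deg))

# Latitudes are approximate and for illustration only.
print(f"Miami Beach (~25.8N): {ground_radius_km(200, 25.8):.0f} km on the ground")
print(f"Bar Harbor (~44.4N):  {ground_radius_km(200, 44.4):.0f} km on the ground")
```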

Now let’s generate the 200km buffer zones in the USA Contiguous Equidistant Conic projection, a projection that attempts to preserve distance.

Equidistant Buffers

Similar to the buffer zones created in the Web Mercator projection, each circular zone is drawn with the same diameter of 400km. We know that this projection (EPSG: 102005) is designed to preserve distance, so what do you think will happen when we reproject these buffer zones to Web Mercator? Think back to the Tissot indicatrix figure. That’s right! As we move away from the equator, these buffer zones become enlarged, as shown in the figure below.

Equidistant Buffers Reprojected

The Equidistant Conic buffer zones in the Web Mercator map above more accurately define a 200km buffer zone around each location than those generated using the Web Mercator projection.

More on Equidistant Projections

Equidistant map projections make the distance from the centre of the projection to any other place on the map uniform in all directions. Note that no map projection provides true-to-scale distances for every possible measurement.

Use an equidistant projection when the main purpose of the map involves something like showing distances from the epicentre of an earthquake or another point location, or mapping flight routes from one city’s airport to all destination cities.

How Data Analysis Can Go Wrong

I won’t perform any in-depth analysis but will highlight how performing spatial data analysis using the Web Mercator projection can yield inaccurate results. It is good practice to convert all your data to a common projection when performing geoprocessing and spatial analysis tasks.
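
For example, with the open-source pyproj library you could reproject your coordinates into the equidistant conic before any buffering or measurement. A minimal sketch, assuming your PROJ installation includes the ESRI projection database (102005 is an ESRI code rather than a true EPSG code in pyproj):

```python
from pyproj import Transformer

# WGS84 longitude/latitude -> USA Contiguous Equidistant Conic.
# always_xy=True fixes the axis order to (longitude, latitude).
transformer = Transformer.from_crs("EPSG:4326", "ESRI:102005", always_xy=True)

lon, lat = -80.13, 25.79  # Miami Beach, approximately
x, y = transformer.transform(lon, lat)
print(f"Equidistant conic coordinates: ({x:.0f} m, {y:.0f} m)")
# Buffers generated on these coordinates will honour real-world distance
# far better than buffers generated on Web Mercator coordinates.
```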

Census Tract Counts

The figure above shows a count of the census tracts that intersect the 200km buffer zones in each of the two projections, Web Mercator and USA Contiguous Equidistant Conic. It is easy to see that if you are analysing demographic data based on location around a certain point, the two projections will yield contrasting results; in fact, majorly contrasting results for most locations. Big decisions often rely on spatial analysis, and analysing your data in an unsuited projection can steer those decisions completely off course. Future plans may be scrapped based on the Mercator results when the Equidistant Conic results would have shown that the project should have proceeded.

Similarly, if you need to preserve the area of features, such as land parcels, for analysis and visual display, you might consider an equal-area projection like the USA Contiguous Albers Equal Area Conic projection. Equal-area projections are also essential for dot density mapping and other density mapping, such as population density. Equal-area maps can be used to compare the landmasses of the world and finally put to bed the matter: Greenland is a lot smaller than South America.

According to Kenneth Field (a.k.a. the Cartonerd)…

“If you’re going to be comparing areas either for city comparison or for thematics you really do need an equal area projection unless all of your cities sit on the same degree of latitude. If not, you’re literally pulling the wool over the eyes of your map readers and they leave with a totally distorted impression of the themes mapped.”

Check out vis4.net for an example of the Albers Equal Area Conic projection. If area is important to the underlying data being visualised for the United States, this is one of the projections you should be using to display your data.

Conclusion

“Projections in a web browser are terrible and you should be ashamed of yourself.” – Calvin Metcalf

If you are using a web portal to perform data analysis through spatial or visual analysis techniques, even if the final visualisation is in Web Mercator, at the very least make sure that the underlying algorithms churning away in the background to produce your output use an appropriate projection for better accuracy. If you are paying a vendor for their services, make sure that their applications provide you with accurate data analysis for better decision making. You will often hear the saying that ‘GIS analysis is only as good as the data used for the analysis’, and while this strongly holds true, the best of data can produce misleading results because of a poor projection choice.

With the ability to produce your own map tiles, JavaScript libraries such as D3.js for overlaying vector data in the correct map projection, projection support in OpenLayers, a Proj4 plugin for Leaflet, and CartoDB, there is little excuse to allow the dictatorship of the Web Mercator to continue.

But Web Mercator isn’t all that bad. Projections are not important when people are only interested in the relative location of features on a map. So if you are simply dropping location markers on a map without any need to analyse the data, go ahead, use the Web Mercator. But if analysis of the data is being performed, it is a sin to use the Web Mercator.

P.S. I am still a Mercator sinner when it comes to display. I’m working on my penance.

Sources & Data

ESRI – Tissot Indicatrix Data
ESRI – Distances and Web Mercator
Tiger Geodatabases
Natural Earth Data
Cartonerd
Geo-Hunter
GISC – Slippy Maps
Geography 7
vis4.net – no more mercator
Map Time Boston – Mapping with D3
Calvin Metcalf – FOSS4G
CartoDB – Free Your Maps from Web Mercator

[An Introduction to] Hotspot Analysis Using ArcGIS

Make sure to read the What is Hotspot Analysis? post before proceeding with this tutorial. This tutorial will serve as an introduction to hotspot analysis with ArcGIS Desktop. You will find links at the bottom of the post that will provide information for further research.

Get the Data

It is often difficult to find real data for use with tutorials, so first of all a hat tip to Eric Pimpler, the author of ArcGIS Blueprints, for pointing me towards crime data for Seattle. To follow this tutorial you will need the Seattle neighborhoods Shapefile, which you can download from here, and the burglary data for 2015, which I have provided a link to here. Use the Project tool from Data Management Tools > Projections and Transformations to project the data into a projected coordinate system; for this tutorial I have used UTM Zone 10N. Open and view the data in ArcMap, styling it if you wish.

HSA Vector Data

Spatial Autocorrelation: Is there clustering?

The presence of spatial clustering in the data is a prerequisite for hotspot analysis. Moran’s I is a measure of spatial autocorrelation that returns a value ranging from -1 to +1: perfect dispersion at -1, a completely random arrangement at 0, and perfect clustering of similar values at +1 (illustrated by the divided pattern in the figure below).

Moran's I Visual

For statistical hypothesis testing, the Moran’s I value can be transformed to a z-score, in which values greater than 1.96 or smaller than -1.96 indicate spatial autocorrelation that is significant at the 5% level.
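
For reference, the global Moran’s I statistic takes the standard form below, where the x_i are the feature values, x̄ is their mean, and w_ij is the spatial weight between features i and j:

$$I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$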

We first need to prepare the data. At the moment each point represents one incident; we need to aggregate the data in some way so that each feature has an attribute with a value in a range. Open the Copy Features tool from Data Management Tools > Features and create a copy of the burglary point layer. Run the tool and add the new layer to the map.

Copy Features Tool

Open the Integrate tool from Data Management Tools > Feature Class. Select the copy of the burglary layer as the Input Features and set an XY Tolerance of 90 or 100 meters. Run the tool. This will relocate points that fall within the XY Tolerance (90m, 100m, or whatever you set) of each other and stack them on top of one another.

Integrate Tool

At this moment the points sit on top of one another. We need to merge coincident points and record a count of how many were merged at each location. Open the Collect Events tool from Spatial Statistics Tools > Utilities. Set the copy of the burglary layer as the Input Incident Features, and set a filepath and name for the Output Weighted Point Feature Class. Run the tool.

Collect Events Tool

The data will be added to the map with graduated symbols; however, we are interested in running further analysis using Moran’s I. If you open the attribute table for the layer you will see that a field has been added called ICOUNT. This field holds the count of coincident points from the Integrate step. Open the Spatial Autocorrelation (Moran’s I) tool from Spatial Statistics Tools > Analyzing Patterns. Set the aggregated burglary layer as the Input Feature Class and ICOUNT as the Input Field. I have left the default settings for the other parameters (see below).

Spatial Autocorrelation Tool

Run the tool by clicking on OK. A summary will display with statistical findings.

Moran's I Values

We get a value close to 0.2 and a high z-score. This indicates statistically significant clustering of high values in the data. We can now be confident that clustering exists within the dataset and continue with the hotspot analysis.
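
If you prefer scripting, the same preparation and test can be run with ArcPy. A sketch, with hypothetical layer names and workspace, using the same settings as above:

```python
import arcpy

arcpy.env.workspace = r"C:\data\seattle.gdb"  # hypothetical workspace

# 1. Copy the burglary points so the originals survive the Integrate step.
arcpy.CopyFeatures_management("burglary_2015", "burglary_copy")

# 2. Integrate: snap points within 90 metres of each other so they stack.
arcpy.Integrate_management("burglary_copy", "90 Meters")

# 3. Collect Events: merge coincident points into weighted points
#    with an ICOUNT field holding the number merged at each location.
arcpy.CollectEvents_stats("burglary_copy", "burglary_weighted")

# 4. Global Moran's I on the weighted counts; the index, z-score and
#    p-value are reported in the geoprocessing messages.
result = arcpy.SpatialAutocorrelation_stats(
    "burglary_weighted", "ICOUNT", "NO_REPORT",
    "INVERSE_DISTANCE", "EUCLIDEAN_DISTANCE", "NONE")
print(result.getMessages())
```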

Optimized Hotspot Analysis

Remove all layers from the map except the two original layers with the burglary data and the neighborhoods. From the Toolbox navigate to Spatial Statistics Tools > Mapping Clusters and open the Optimized Hotspot Analysis tool. This tool allows for quick hotspot analysis using minimal input parameters, calculating sensible defaults for the parameters you do not set. For more control over the statistical elements you can use the Hot Spot Analysis (Getis-Ord Gi*) tool. For now we will use the optimized approach.

Set the burglary points as the Input Features, name your Output Features (here I have named them ohsa_burg_plygns), select COUNT_INCIDENTS_WITHIN_AGGREGATION_POLYGONS for the Incident Data Aggregation Method and choose the neighborhoods features for the Polygons For Aggregating Incidents Into Counts.

Optimized HSA - Polygons

OHSA: Aggregating Point Data to Polygon Features

Click OK to run the tool. The ohsa_burg_plygns layer will automatically be added to the map; if not, add it and turn off all other layers. So what has happened here? The tool has aggregated the point data into the neighborhood polygons. If you open the attribute table for the newly created layer you will see a Join_Count field, a count of burglaries per neighborhood. A z-score and a p-value are calculated for each feature, which enables the detection of hot and cold spots in the data. Remember, a high z-score and a low p-value for a feature indicates a significant hotspot. A low negative z-score and a small p-value indicates a significant cold spot. The higher (or lower) the z-score, the more intense the clustering. A z-score near 0 means no spatial clustering.
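
If you’d rather script this step, the equivalent ArcPy call is short; a sketch using the tool’s documented positional parameters and the layer names from this tutorial:

```python
import arcpy

# Input features, output features, analysis field (blank: we are counting
# incidents), aggregation method, bounding polygons (unused), and the
# polygons to aggregate the incidents into.
arcpy.OptimizedHotSpotAnalysis_stats(
    "burglary_2015", "ohsa_burg_plygns", "",
    "COUNT_INCIDENTS_WITHIN_AGGREGATION_POLYGONS",
    "", "neighborhoods")
```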

HSA Attribute Table

The Gi_Bin field classifies the data into a range from -3 (Cold Spot – 99% Confidence) to 3 (Hot Spot – 99% Confidence), with 0 being non-significant, just take a look at your Table of Contents.

Optimized - Confidence Levels

The map should look similar to the one below. There are several neighborhoods that are statistically significant hotspots. It is important to note that you may need to factor in other data or normalise your data to refine the results. Some of the neighborhoods might be densely populated with suburban housing, while in others housing may be sparse, bordering on rural. This may affect findings and you may need to create ratios before analysing. We won’t delve into this here, as this tutorial is introductory level (and because I don’t have the data to do so).

OHSA Polygon Map

OHSA: Aggregating Point Data to Fishnet Features

Close any attribute tables and turn off all layers in your map. Re-open the Optimized Hotspot Analysis tool and set the inputs as seen below. This time we will create a fishnet/grid to aggregate the point data to.

Optimized - Fishnet

Click OK to run the tool. The tool removes any locational outliers, calculates a cell size, and aggregates the point data to the cells in the grid. Similar to aggregating to polygons, the fishnet attribute table will have a join count, z-score, p-value, and bin value with the same confidence levels.

OHSA Fishnet Map

Should attention be entirely focused on the red areas? Copy the fishnet layer and paste it into the data frame. Rename the copy as fishnet_count. Open the properties and navigate to the Symbology tab. Change the Value field to Join_Count, reduce the Classes to 5, and set the classification to Equal Count. Click OK.

Fishnet Symbology

There will be one red cell and one light red cell in the northern half of the map. Use the zoom tool to zoom in closer to both features. Turn on labels for the Join_Count attribute. Notice that the light red cell has a count of 19, but in the hotspot analysis this was a non-significant area. With the second highest burglary count for a 300m x 300m area, surely this area requires some attention. Perhaps all areas outside of significant hotspots with values greater than 15 are a priority? I am not an expert in crime analysis, so I’ll leave that to the sleuths.

OHSA Fishnet Labels

This just serves as a reminder to make sure that you use all the analysis techniques at your disposal, from simple to more advanced, from visual inspection and labels to statistics.

OHSA: Create Weighted Points by Snapping Nearby Incidents

Zoom out to the full extent of the neighborhoods layer and turn off all layers in the map. Re-open the Optimized Hotspot Analysis tool and set the inputs as seen below. Notice that this time we will also create a Density Surface.

Optimized - Points

Click OK to run the tool. The tool calculates a distance value and converges points that fall within that distance of each other. It then runs the hotspot analysis similar to the previous two examples, producing an attribute table with an ICOUNT field, z-score, p-value, and bin value for confidence level. The ICOUNT field denotes how many incidents each point represents.

OHSA Points Map

Let’s clip the density raster to the neighborhoods layer. Open the Clip tool from Data Management Tools > Raster > Raster Processing. Set the Input Raster to the density raster, use the neighborhoods layer as the Output Extent, make sure Use Input Features for Clipping Geometry is checked, and set and name the Output Raster Dataset.

Density Raster Clip

Click OK to run the tool. Add the newly created raster to the map if it hasn’t been added automatically, and make it the only visible layer. Open the properties for the layer and go to the Symbology tab. Select Classified and generate a histogram if asked to. Change the Classes to 7 and the colour ramp to match the previous colour schemes. You might need to flip the colour ramp to achieve this.

Density Clip Symbology

Open the Display tab and select Bilinear Interpolation from the Resample during display dropdown menu. This will smooth the contoured look of the raster. Click OK to view the density surface. Turn on the neighborhoods layer and make its fill transparent with a black outline.

Density Raster

Alternatives

The Optimized Hotspot Analysis tool is a great place to start, but it limits the analysis to default parameters set or calculated by the tool. For more advanced control you can use the Hot Spot Analysis (Getis-Ord Gi*) tool. You will need to use other tools such as Spatial Join to aggregate your data to polygons and create a Join_Count field, or the Create Fishnet tool to define a grid and then use Spatial Join. Remember to delete any grid cells that have a value of zero prior to running the hotspot analysis; a scripted sketch of this workflow follows below.
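
Scripted, that manual workflow might look something like the sketch below (fishnet origin coordinates, cell size, and layer names are hypothetical):

```python
import arcpy

# 1. Build a 300m fishnet over the study area, using the neighborhoods
#    layer as the template extent (origin/axis coordinates hypothetical).
arcpy.CreateFishnet_management(
    "fishnet", "547000 5257000", "547000 5267000",
    "300", "300", "", "", "", "NO_LABELS", "neighborhoods", "POLYGON")

# 2. Aggregate incidents to the grid cells; Spatial Join adds Join_Count.
arcpy.SpatialJoin_analysis("fishnet", "burglary_2015", "fishnet_counts",
                           "JOIN_ONE_TO_ONE", "KEEP_ALL")

# 3. Delete cells where Join_Count = 0 (e.g. via Select By Attributes),
#    then run Getis-Ord Gi* on the counts.
arcpy.HotSpots_stats("fishnet_counts", "Join_Count", "fishnet_hotspots",
                     "FIXED_DISTANCE_BAND", "EUCLIDEAN_DISTANCE", "NONE")
```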

Getis-Ord Tool

See the resources below for more information on using Getis-Ord Gi* and what the parameters do especially in relation to the Conceptualization of Spatial Relationships parameter.

Hotspot Analysis with ArcGIS Resources

ArcGIS Optimized Hotspot Analysis
ArcGIS Mapping Cluster Toolset: Hot Spot Analysis

ArcGIS How Hot Spot Analysis Works
ArcGIS – Selecting a Conceptualization of Spatial Relationships: Best Practices

Crime Data for Seattle

Crime data was accessed using the ArcGIS REST API and the Socrata Open Data API from the https://data.seattle.gov website. I highly recommend getting your hands on Eric Pimpler’s ArcGIS Blueprints eBook for a look at exciting workflows with ArcPy and the ArcGIS REST API.

What is Hotspot Analysis?

Hotspot analysis uses vector data to identify the locations of statistically significant hot spots and cold spots in your data, by aggregating points of occurrence into polygons or by converging points that are in proximity to one another based on a calculated distance. The analysis groups features when similar high (hot) or low (cold) values are found in a cluster. The polygons usually represent administrative boundaries or a custom grid structure.

HSA Polygons

Before performing hotspot analysis you need to test for the presence of clustering in the data with a prior analysis technique involving spatial autocorrelation, which will identify whether any clustering occurs within the entire dataset. Two available methods are Moran’s I (global) and Getis-Ord General G (global). Hotspot analysis requires the presence of clustering within the data. The two methods mentioned return values, including a z-score, that when analysed together indicate whether clustering is found in the data or not. Data will need to be aggregated to polygons or points of incident convergence before performing the spatial autocorrelation analysis; see [An Introduction to] Hotspot Analysis Using ArcGIS for an example using Moran’s I.

Hotspot analysis is also known as Getis-Ord Gi* (G-i-star), which works by looking at each feature in the dataset within the context of neighbouring features in the same dataset. A feature may have a high value, but it may not be a statistically significant hotspot. To be a statistically significant hotspot, a feature with a high value must also be surrounded by other features with high values. Here’s some statistical talk from GISC:

“The local sum for a feature and its neighbors is compared proportionally to the sum of all features; when the local sum is very different from the expected local sum, and that difference is too large to be the result of random choice, a statistically significant z-score results.”
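
In symbols, following Esri’s documentation of the statistic, for feature i with neighbouring values x_j and spatial weights w_ij:

$$G_i^* = \frac{\sum_{j=1}^{n} w_{ij} x_j - \bar{X} \sum_{j=1}^{n} w_{ij}}{S \sqrt{\dfrac{n \sum_{j=1}^{n} w_{ij}^2 - \left( \sum_{j=1}^{n} w_{ij} \right)^2}{n - 1}}}, \qquad \bar{X} = \frac{1}{n} \sum_{j=1}^{n} x_j, \qquad S = \sqrt{\frac{1}{n} \sum_{j=1}^{n} x_j^2 - \bar{X}^2}$$

The Gi* value returned is itself a z-score, so it can be read directly against the usual significance thresholds.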

A z-score and a p-value are returned for each feature in the dataset. What is a z-score and a p-value? Click here.

HSA Attribute Table

A high z-score and a low p-value for a feature indicates a significant hotspot. A low negative z-score and a small p-value indicates a significant cold spot. The higher (or lower) the z-score, the more intense the clustering. A z-score near 0 means no spatial clustering.

HSA Z & P Scores

The output of the analysis tells you where features with either high or low values cluster spatially. Scale is important: you might notice regional differences between the admin boundaries and the grid (below), and the method you choose will depend on your data and the scale of analysis. According to Esri, a minimum of 30 features should be analysed, as results are unreliable below that. When using the grid approach you should remove grid cells with values of zero before performing the hotspot analysis.

HSA Polygons

When performing hotspot analysis make sure that your data is projected. In fact, when performing any statistical analysis that involves distances, it is important that your data is in a projected coordinate system with a linear unit of measurement. A popular choice is the UTM zone that your data falls into, which uses metres (m) as the unit of measurement.

Hot spot analysis is being utilized to help police identify areas with high crime rates, the types of crime being committed, and the best way to respond to these crimes. I like this quote from the Mapping Crime: Understanding Hotspots report issued by the National Institute of Justice (U.S.):

“a hot spot is an area that has a greater than average number of criminal or disorder events, or an area where people have a higher than average risk of victimization.”

An area can be considered a hotspot if a higher-than-average occurrence of the event being analysed is found in a cluster, and a cooler or cold spot where occurrences are lower than average. The higher above the average, with similar surrounding areas, the ‘hotter’ the hotspot.

Other areas where hotspot analysis is being used include epidemiology (where is the disease outbreak concentrated?), retail analysis, demographic and voting pattern analysis, and controlling invasive species.

Note: hotspot analysis differs from a heat map. A heat map uses a raster in which point data are interpolated to a surface showing the density or intensity of occurrence. A colour gradient is applied, where cells coloured at the lower end of the gradient represent low density and those at the higher end represent high density. The colour gradient usually flows from cool to warm colours, such as blues to yellow, orange, and red. See Creating a Density Heat Map with Leaflet.

Tutorials

Hotspot Analysis using ArcGIS

Sources

Children’s Environmental Health Initiative
U.S. National Institute of Justice
Leaflet Essentials
GIS Lounge
National Criminal Justice Reference System
ArcGIS – How Hot Spot Analysis Works
GISC