OSGP#3.1: Measuring Geographic Distributions – Median Center

(Open Source Geospatial Python)

The ‘What is it?’

Also known as the Center of Minimum Distance, the Median Center is a location that is the shortest total distance to all features in the study area (not to be confused with the Central Feature, which is the feature that is the shortest distance to all others). It can be used to find a suitable location for something that needs to be centrally located. The Median Center will gravitate towards an area with the most features.

The Median Center is good for finding the most accessible location.

The Formula!

The is no single formula or equation for calculating an exact Median Center, according to Andy Mitchell it is an iterative process involving calculating the Mean Center, summing the distances from it to each feature, offsetting the center slightly and summing the distances again until it eventually zones in on the optimum location that has the lowest sum.

The code below implements the Yehuda Vardi and Cun-Hui Zhang algorithm or the Weiszfeld algorithm.

The Code…

import math, sys
import numpy as np
from osgeo import ogr
from scipy.spatial.distance import cdist

## "W" for Weiszfield
## "YC" for Yehuda Vardi and Cun-Hui Zhang
algorithm = "YC"

## Weiszfield
## https://gist.github.com/endolith/2837160
def numersum(test_median,dataPoint):
    ## Provides the denominator of the weiszfeld algorithm depending on whether
    ## you are adjusting the candidate x or y
    return 1/math.sqrt((test_median[0]-dataPoint[0])**2 + (test_median[1]-dataPoint[1])**2)

def denomsum(test_median, xy_arr):
    ## Provides the denominator of the weiszfeld algorithm
    temp = 0.0
    for i in range(0,len(xy_arr)):
        temp += 1/math.sqrt((test_median[0] - xy_arr[i][0])**2 + (test_median[1] - xy_arr[i][1])**2)
    return temp

## Yehuda Vardi and Cun-Hui Zhang
## http://stackoverflow.com/questions/30299267/geometric-median-of-multidimensional-points
## user: orlp
def geometric_median(X, eps=1e-5):
    y = np.mean(X, 0)

    while True:
        D = cdist(X, [y])
        nonzeros = (D != 0)[:, 0]
        Dinv = 1 / D[nonzeros]
        Dinvs = np.sum(Dinv)
        W = Dinv / Dinvs
        T = np.sum(W * X[nonzeros], 0)
        num_zeros = len(X) - np.sum(nonzeros)
        if num_zeros == 0:
            y1 = T
        elif num_zeros == len(X):
            return y
        else:
            R = (T - y) * Dinvs
            r = np.linalg.norm(R)
            rinv = 0 if r == 0 else num_zeros/r
            y1 = max(0, 1-rinv)*T + min(1, rinv)*y
        if np.linalg.norm(y - y1) < eps:
            return y1
        y = y1

## set the driver for the data
driver = ogr.GetDriverByName("FileGDB")

## path to the FileGDB
gdb = r"C:\Users\Glen B\Documents\my_geodata.gdb"

## ope the GDB in write mode (1)
ds = driver.Open(gdb, 1)

## input feature class
input_lyr_name = "Birmingham_Secondary_Schools"

## name of output feature class
output_fc = "{0}_median_center".format(input_lyr_name)

## reference the layer using the layers name
if input_lyr_name in [ds.GetLayerByIndex(lyr_name).GetName() for lyr_name in range(ds.GetLayerCount())]:
    lyr = ds.GetLayerByName(input_lyr_name)
    print "{0} found in {1}".format(input_lyr_name, gdb)

## if the output layer already exists then delete it
if output_fc in [ds.GetLayerByIndex(lyr_name).GetName() for lyr_name in range(ds.GetLayerCount())]:
    ds.DeleteLayer(output_fc)
    print "Deleting: {0}".format(output_fc)

## create an array with coordinates of each feature
try:
    first_feat = lyr.GetFeature(1)
    ## centroid for points and polygons
    if first_feat.geometry().GetGeometryName() in ["POINT", "MULTIPOINT", "POLYGON", "MULTIPOLYGON"]:
        xy_arr = np.ndarray((len(lyr), 2), dtype=np.float)
        for i, pt in enumerate(lyr):
            ft_geom = pt.geometry()
            xy_arr[i] = (ft_geom.Centroid().GetX(), ft_geom.Centroid().GetY())

    ## for lines we get the midpoint of a line
    elif first_feat.geometry().GetGeometryName() in ["LINESTRING", "MULTILINESTRING"]:
        xy_arr = np.ndarray((len(lyr), 2), dtype=np.float)
        for i, ln in enumerate(lyr):
            line_geom = ln.geometry().ExportToWkt()
            shapely_line = MultiLineString(wkt.loads(line_geom))
            midpoint = shapely_line.interpolate(shapely_line.length/2)
            xy_arr[i] = (midpoint.x, midpoint.y)

except Exception:
    print "Unknown geometry for {}".format(input_lyr_name)
    sys.exit()

## if using Weiszfield
if algorithm == "W":
    ## https://gist.github.com/endolith/2837160
    avg_x, avg_y = np.mean(xy_arr, axis=0)
    test_median = [avg_x, avg_y]
    numIter = 50

## minimise the objective function
for x in range(0,numIter):
    denom = denomsum(test_median,xy_arr)
    nextx = 0.0
    nexty = 0.0

    for y in range(0,len(xy_arr)):
        nextx += (xy_arr[y][0] * numersum(test_median,xy_arr[y]))/denom
        nexty += (xy_arr[y][1] * numersum(test_median,xy_arr[y]))/denom

    test_median = [nextx,nexty]

## if using Yehuda Vardi and Cun-Hui Zhang
elif algorithm == "YC":
    test_median = geometric_median(xy_arr)

print "Median Center: {0}, {1}".format(test_median[0], test_median[1])

## create a new point layer with the same spatial ref as lyr
out_lyr = ds.CreateLayer(output_fc, lyr.GetSpatialRef(), ogr.wkbPoint)

## define and create new fields
x_fld = ogr.FieldDefn("X", ogr.OFTReal)
y_fld = ogr.FieldDefn("Y", ogr.OFTReal)
out_lyr.CreateField(x_fld)
out_lyr.CreateField(y_fld)

## create a new point for the mean center
pnt = ogr.Geometry(ogr.wkbPoint)
pnt.AddPoint(test_median[0], test_median[1])

## add the mean center to the new layer
feat_dfn = out_lyr.GetLayerDefn()
feat = ogr.Feature(feat_dfn)
feat.SetGeometry(pnt)
feat.SetField("X", test_median[0])
feat.SetField("Y", test_median[1])
out_lyr.CreateFeature(feat)

print "Created {0}".format(output_fc)

## free up resources
del ds, lyr, first_feat, feat, out_lyr

I’d like to give credit to…
Logan Byers from GIS StackExchange who aided in speeding up the computational time using NumPy and for forcing me to begin learning the wonders of NumPy.
orlp from Stack Overflow for their implementation of Yehuda Vardi and Cun-Hui Zhang’s algorithm for the geometric median.
Daniel J Lewis (I think) for the implementation of the Weiszfeld algorithm.

The Example:

I downloaded vector data that contains polygons for schools (and other features) from OS Open Map – Local that covered the West Midlands. I also downloaded OS Boundary Line data. Using Python and GDAL/OGR I extracted secondary schools from the data for Birmingham only. Everything was now in place to find the Median Center of all Secondary Schools for Birmingham. (see The Other Scripts section at the bottom of this post for processing the data)

birmingham_secondary_schools

Running the script from The Code section above calculates the coordinates of the Median Center for Secondary Schools in Birmingham and creates a point feature class in the File GDB.

birmingham_secondary_schools_median_center

OSGP Median Center (W):        407658.278755, 286696.905759
OSGP Median Center (YC):      407658.278752, 286696.905769
ArcGIS Median Center:             407658.009375, 286697.53996

What’s Next?

Weighted Mean Center (link will be updated once post is ready)

Also See…

Mean Center
Central Feature

The Resources:

ESRI Guide to GIS Volume 2: Chapter 2
see book review here.

Geoprocessing with Python

Python GDAL/OGR Cookbook

Setting up GDAL/OGR with FileGDB Driver for Python on Windows

< The Other Scripts >

Birmingham Secondary Schools

from osgeo import ogr
import os

## necessary drivers
shp_driver = ogr.GetDriverByName("ESRI Shapefile")
gdb_driver = ogr.GetDriverByName("FileGDB")

## input boundary shapefile and file reference file gdb
shapefile = r"C:\Users\Glen B\Documents\Schools\Data\GB\district_borough_unitary_region.shp"
gdb = r"C:\Users\Glen B\Documents\my_geodata.gdb"

shp_ds = shp_driver.Open(shapefile, 0)
gdb_ds = gdb_driver.Open(gdb, 1)

## filter boundary to just Birmingham
shp_layer = shp_ds.GetLayer(0)
shp_layer.SetAttributeFilter("NAME = 'Birmingham District (B)'")

## name the output
output_fc = "Birmingham_Secondary_Schools"

## if the output feature class already exists then delete it
if output_fc in [gdb_ds.GetLayerByIndex(lyr_name).GetName() for lyr_name in range(gdb_ds.GetLayerCount())]:
    gdb_ds.DeleteLayer(output_fc)
    print "Deleting: {0}".format(output_fc)

## create the output feature class
out_lyr = gdb_ds.CreateLayer(output_fc, shp_layer.GetSpatialRef(), ogr.wkbPolygon)

## the folder that contains the data to extract Secondary Schools from
root_folder = r"C:\Users\Glen B\Documents\Schools\Vector\data"

## traverse through the folders and find ImportantBuildings files
## copy only those that intersect the Birmingham region
## transfer across attributes
count = 1
for root,dirs,files in os.walk(root_folder):
    for filename in files:
        if filename.endswith("ImportantBuilding.shp") and filename[0:2] in ["SP", "SO", "SJ", "SK"]:
            shp_path = "{0}\\{1}".format(root, filename)
            schools_ds = shp_driver.Open(shp_path, 0)
            schools_lyr = schools_ds.GetLayer(0)
            schools_lyr.SetAttributeFilter("CLASSIFICA = 'Secondary Education'")
            lyr_def = schools_lyr.GetLayerDefn()
            if count == 1:
                for i in range(lyr_def.GetFieldCount()):
                    out_lyr.CreateField(lyr_def.GetFieldDefn(i))
                count += 1
            shp_layer.ResetReading()
            for shp_feat in shp_layer:
                birm_geom = shp_feat.GetGeometryRef()

                for school_feat in schools_lyr:
                    school_geom = school_feat.GetGeometryRef()

                    if school_geom.Intersects(birm_geom):
                        feat_dfn = out_lyr.GetLayerDefn()
                        feat = ogr.Feature(feat_dfn)
                        feat.SetGeometry(school_geom)
                        for i in range(lyr_def.GetFieldCount()):
                            feat.SetField(lyr_def.GetFieldDefn(i).GetNameRef(), school_feat.GetField(i))

                        out_lyr.CreateFeature(feat)
                        feat.Destroy()

del shp_ds, shp_layer, gdb_ds

The Usual 🙂

As always please feel free to comment to help make the code more efficient, highlight errors, or let me know if this was of any use to you.

OSGP#2.1: Measuring Geographic Distributions – Central Feature

(Open Source Geospatial Python)

The ‘What is it?’

The Central Feature is the point that is the shortest distance to all other points in the dataset and thus identifies the most centrally located feature. The Central Feature can be used to find the most accessible feature, for example, the most accessible school to hold a training day for teachers from schools in a given area.

The Formula!

For each feature calculate the total distance to all other features. The feature that has the shortest total distance is the Central Feature.

For Point features the X and Y coordinates of each feature is used, for Polygons the centroid of each feature represents the X and Y coordinate to use, and for Linear features the mid-point of each line is used for the X and Y coordinate

The Code…

from osgeo import ogr
from shapely.geometry import MultiLineString
from shapely import wkt
import numpy as np

## set the driver for the data
driver = ogr.GetDriverByName("FileGDB")
## path to the FileGDB
gdb = r"C:\Users\Glen B\Documents\my_geodata.gdb"
## open the GDB in write mode (1)
ds = driver.Open(gdb, 1)

## input layer
input_lyr_name = "Birmingham_Secondary_Schools"

## output layer
output_fc = "{0}_central_feature".format(input_lyr_name)

## reference the layer using the layers name
if input_lyr_name in [ds.GetLayerByIndex(lyr_name).GetName() for lyr_name in range(ds.GetLayerCount())]:
    lyr = ds.GetLayerByName(input_lyr_name)
    print "{0} found in {1}".format(input_lyr_name, gdb)

## delete the output layer if it already exists
if output_fc in [ds.GetLayerByIndex(lyr_name).GetName() for lyr_name in range(ds.GetLayerCount())]:
    ds.DeleteLayer(output_fc)
    print "Deleting: {0}".format(output_fc)

## for each point or polygon in the layer
## get the x and y value of the centroid
## and add them into a numpy array
try:
    first_feat = lyr.GetFeature(1)
    if first_feat.geometry().GetGeometryName() in ["POINT", "MULTIPOINT", "POLYGON", "MULTIPOLYGON"]:
        xy_arr = np.ndarray((len(lyr), 2), dtype=np.float)
        for i, pt in enumerate(lyr):
            ft_geom = pt.geometry()
            xy_arr[i] = (ft_geom.Centroid().GetX(), ft_geom.Centroid().GetY())

    ## for linear features we get the midpoint of a line
    elif first_feat.geometry().GetGeometryName() in ["LINESTRING", "MULTILINESTRING"]:
        xy_arr = np.ndarray((len(lyr), 2), dtype=np.float)
        for i, ln in enumerate(lyr):
            line_geom = ln.geometry().ExportToWkt()
            shapely_line = MultiLineString(wkt.loads(line_geom))
            midpoint = shapely_line.interpolate(shapely_line.length/2)
            xy_arr[i] = (midpoint.x, midpoint.y)

## exit gracefully if unknown geometry or input contains no geometry
except Exception:
    print "Unknown Geometry for {0}".format(input_lyr_name)

## construct NxN array, this will be the distance matrix
pt_dist_arr = np.ndarray((len(xy_arr), len(xy_arr)), dtype=np.float)

## fill the distance array
for i, a in enumerate(xy_arr):
    for j, b in enumerate(xy_arr):
        pt_dist_arr[i,j] = np.linalg.norm(a-b)

## sum distances for each point
summed_distances = np.sum(pt_dist_arr, axis=0)

## index of point with minimum summed distances
index_central_feat = np.argmin(summed_distances)

## position of the point with min distance
central_x, central_y = xy_arr[index_central_feat]

print "Central Feature Coords: {0}, {1}".format(central_x, central_y)

## create a new point layer with the same spatial ref as lyr
out_lyr = ds.CreateLayer(output_fc, lyr.GetSpatialRef(), ogr.wkbPoint)

## define and create new fields
x_fld = ogr.FieldDefn("X", ogr.OFTReal)
y_fld = ogr.FieldDefn("Y", ogr.OFTReal)
out_lyr.CreateField(x_fld)
out_lyr.CreateField(y_fld)

## create a new point for the mean center
pnt = ogr.Geometry(ogr.wkbPoint)
pnt.AddPoint(central_x, central_y)

## add the mean center to the new layer
feat_dfn = out_lyr.GetLayerDefn()
feat = ogr.Feature(feat_dfn)
feat.SetGeometry(pnt)
feat.SetField("X", central_x)
feat.SetField("Y", central_y)
out_lyr.CreateFeature(feat)

print "Created: {0}".format(output_fc)

## free up resources
del ds, lyr, first_feat, feat, out_lyr

I’d like to give credit to Logan Byers from GIS StackExchange who aided in speeding up the computational time using NumPy and for forcing me to begin learning the wonders of NumPy.

At the moment this is significantly slower that performing the same process with ArcGIS for 20,000+ features, but more rapid for a lower amount. 1,000 features processed in 3 seconds.

The Example:

I downloaded vector data that contains polygons for schools from OS Open Map – Local that covered the West Midlands. I also downloaded OS Boundary Line data. Using Python and GDAL/OGR I extracted secondary schools from the data for Birmingham. Everything was now in place to find the Central Feature of all Secondary Schools for Birmingham. (see The Other Scripts section at the bottom of this post for processing the data)

birmingham_secondary_schools

Running the script from The Code section above calculates the coordinates of the Central Feature for all Secondary Schools and creates a point feature class in the File GDB.

birmingham_secondary_schools_central_feature

OSGP Central Feature:      407726.185, 287215.1
ArcGIS Central Feature:    407726.185, 287215.1

What’s Next?

Median Center (link will be updated once post is complete)

Also see…

Mean Center

The Resources:

ESRI Guide to GIS Volume 2: Chapter 2 (I highly recommend this book)
see book review here.

Geoprocessing with Python

Python GDAL/OGR Cookbook

Setting up GDAL/OGR with FileGDB Driver for Python on Windows

< The Other Scripts >

Birmingham Secondary Schools

from osgeo import ogr
import os

## necessary drivers
shp_driver = ogr.GetDriverByName("ESRI Shapefile")
gdb_driver = ogr.GetDriverByName("FileGDB")

## input boundary shapefile and file reference file gdb
shapefile = r"C:\Users\Glen B\Documents\Schools\Data\GB\district_borough_unitary_region.shp"
gdb = r"C:\Users\Glen B\Documents\my_geodata.gdb"

shp_ds = shp_driver.Open(shapefile, 0)
gdb_ds = gdb_driver.Open(gdb, 1)

## filter boundary to just Birmingham
shp_layer = shp_ds.GetLayer(0)
shp_layer.SetAttributeFilter("NAME = 'Birmingham District (B)'")

## name the output
output_fc = "Birmingham_Secondary_Schools"

## if the output feature class already exists then delete it
if output_fc in [gdb_ds.GetLayerByIndex(lyr_name).GetName() for lyr_name in range(gdb_ds.GetLayerCount())]:
    gdb_ds.DeleteLayer(output_fc)
    print "Deleting: {0}".format(output_fc)

## create the output feature class
out_lyr = gdb_ds.CreateLayer(output_fc, shp_layer.GetSpatialRef(), ogr.wkbPolygon)

## the folder that contains the data to extract Secondary Schools from
root_folder = r"C:\Users\Glen B\Documents\Schools\Vector\data"

## traverse through the folders and find ImportantBuildings files
## copy only those that intersect the Birmingham region
## transfer across attributes
count = 1
for root,dirs,files in os.walk(root_folder):
    for filename in files:
        if filename.endswith("ImportantBuilding.shp") and filename[0:2] in ["SP", "SO", "SJ", "SK"]:
            shp_path = "{0}\\{1}".format(root, filename)
            schools_ds = shp_driver.Open(shp_path, 0)
            schools_lyr = schools_ds.GetLayer(0)
            schools_lyr.SetAttributeFilter("CLASSIFICA = 'Secondary Education'")
            lyr_def = schools_lyr.GetLayerDefn()
            if count == 1:
                for i in range(lyr_def.GetFieldCount()):
                    out_lyr.CreateField(lyr_def.GetFieldDefn(i))
                count += 1
            shp_layer.ResetReading()
            for shp_feat in shp_layer:
                birm_geom = shp_feat.GetGeometryRef()

                for school_feat in schools_lyr:
                    school_geom = school_feat.GetGeometryRef()

                    if school_geom.Intersects(birm_geom):
                        feat_dfn = out_lyr.GetLayerDefn()
                        feat = ogr.Feature(feat_dfn)
                        feat.SetGeometry(school_geom)
                        for i in range(lyr_def.GetFieldCount()):
                            feat.SetField(lyr_def.GetFieldDefn(i).GetNameRef(), school_feat.GetField(i))

                        out_lyr.CreateFeature(feat)
                        feat.Destroy()

del shp_ds, shp_layer, gdb_ds

The Usual 🙂

As always please feel free to comment to help make the code more efficient, highlight errors, or let me know if this was of any use to you.

OSGP#1.1: Measuring Geographic Distributions – Mean Center

(Open Source Geospatial Python)

The ‘What is it?’

The Mean Center is the average X coordinate and Y coordinate for all features in a study area and is the simplest descriptor of a geographic distribution. The Mean Center is generally used to track the changes in a features distribution over time and can also be used to compare the distribution of multiple features.

The Mean Center is also known as the Geographic Center or Center of Concentration for a set of features.

You would calculate the Mean Center for features where there is no travel interaction between the Center and the features of the study. Basically, use it for a study where each event that happens is a recorded location, for example a burglary for crime analysis, or the sighting of wombat for wildlife studies.

The Formula!

Mean Center Formula

For Point features the X and Y coordinates of each feature is used, for Polygons the centroid of each feature represents the X and Y coordinate to use, and for Linear features the mid-point of each line is used for the X and Y coordinate.

The Code…

from osgeo import ogr
from shapely.geometry import MultiLineString
from shapely import wkt
import numpy as np
import sys

## set the driver for the data
driver = ogr.GetDriverByName("FileGDB")
## path to the FileGDB
gdb = r"C:\Users\Glen B\Documents\my_geodata.gdb"
## ope the GDB in write mode (1)
ds = driver.Open(gdb, 1)

## input layer
input_lyr_name = "Birmingham_Burglaries_2016"

## the output layer
output_fc = "{0}_mean_center".format(input_lyr_name)

## reference the layer using the layers name
if input_lyr_name in [ds.GetLayerByIndex(lyr_name).GetName() for lyr_name in range(ds.GetLayerCount())]:
    lyr = ds.GetLayerByName(input_lyr_name)
    print "{0} found in {1}".format(input_lyr_name, gdb)

## delete the output feature class if it already exists
if output_fc in [ds.GetLayerByIndex(lyr_name).GetName() for lyr_name in range(ds.GetLayerCount())]:
    ds.DeleteLayer(output_fc)
    print "Deleting: {0}".format(output_fc)

try:
    ## assess the geometry of the input feature class
    first_feat = lyr.GetFeature(1)
    ## for each point or polygon in the layer 
    ## get the x and y value of the centroid 
    ## store in a numpy array
    if first_feat.geometry().GetGeometryName() in ["POINT", "MULTIPOINT", "POLYGON", "MULTIPOLYGON"]:
        xy_arr = np.ndarray((len(lyr), 2), dtype=np.float)
        for i, pt in enumerate(lyr):
            ft_geom = pt.geometry()
            xy_arr[i] = (ft_geom.Centroid().GetX(), ft_geom.Centroid().GetY())

    ## for lineear we get the midpoint of a line
    elif first_feat.geometry().GetGeometryName() in ["LINESTRING", "MULTILINESTRING"]:
        xy_arr = np.ndarray((len(lyr), 2), dtype=np.float)
        for i, ln in enumerate(lyr):
            line_geom = ln.geometry().ExportToWkt()
            shapely_line = MultiLineString(wkt.loads(line_geom))
            midpoint = shapely_line.interpolate(shapely_line.length/2)
            xy_arr[i] = (midpoint.x, midpoint.y)

## exit gracefully if unknown geometry or input contains no geometry
except Exception:
    print "Unknown geometry for {0}".format(input_lyr_name)
    sys.exit()

avg_x, avg_y = np.mean(xy_arr, axis=0)

print "Mean Center: {0}, {1}".format(avg_x, avg_y)

## create a new point layer with the same spatial ref as input layer
out_lyr = ds.CreateLayer(output_fc, lyr.GetSpatialRef(), ogr.wkbPoint)

## define and create new fields to hold the mean center coordinates
x_fld = ogr.FieldDefn("X", ogr.OFTReal)
y_fld = ogr.FieldDefn("Y", ogr.OFTReal)
out_lyr.CreateField(x_fld)
out_lyr.CreateField(y_fld)

## create a new point geom for the mean center
pnt = ogr.Geometry(ogr.wkbPoint)
pnt.AddPoint(avg_x, avg_y)

## add the mean center point to the new layer with attributes
feat_dfn = out_lyr.GetLayerDefn()
feat = ogr.Feature(feat_dfn)
feat.SetGeometry(pnt)
feat.SetField("X", avg_x)
feat.SetField("Y", avg_y)
out_lyr.CreateFeature(feat)

print "Created: {0}".format(output_fc)

## free up resources
del ds, lyr, first_feat, feat, out_lyr

I’d like to give credit to Logan Byers from GIS StackExchange who aided in speeding up the computational time using NumPy and for forcing me to begin learning the wonders of NumPy.

The Example:

I downloaded crime data from DATA.POLICE.UK for the West Midlands Police from January 2016 to December 2016. I used some Python to extract just the Burglary data and made this into a feature class in the File GDB. Next, I downloaded OS Boundary Line data and clipped the Burglary data to just Birmingham. Everything was now in place to find the Mean Center of all burglaries for Birmingham in 2016. (see The Other Scripts section at the bottom of this post for processing the data)

birmingham_burgalries_2016

Running the script from The Code section above calculates the Mean Center of all burglaries for 2016 and created a point feature class in the File GDB.

birmingham_buglaries_2016_mean_center

OSGP Mean Center:     407926.695396, 286615.428507
ArcGIS Mean Center:    407926.695396, 286615.428507

What’s Next?…

Central Feature

The Resources:

ESRI Guide to GIS Volume 2: Chapter 2 (I highly recommend this book)
see book review here.

Geoprocessing with Python

Python GDAL/OGR Cookbook

Setting up GDAL/OGR with FileGDB Driver for Python on Windows

< The Other Scripts >

1. Extract Burglary Data for West Midlands

import csv, os
from osgeo import ogr, osr

## set the driver for the data
driver = ogr.GetDriverByName("FileGDB")

## path to the FileGDB
gdb = r"C:\Users\Glen B\Documents\my_geodata.gdb"

## ope the GDB in write mode (1)
ds = driver.Open(gdb, 1)

## the coordinates in the csv files are lat/long
source = osr.SpatialReference()
source.ImportFromEPSG(4326)

## we need the data in British National Grid
target = osr.SpatialReference()
target.ImportFromEPSG(27700)

transform = osr.CoordinateTransformation(source, target)

## set the output fc name
output_fc = "WM_Burglaries_2016"

## if the output fc already exists delete it
if output_fc in [ds.GetLayerByIndex(lyr_name).GetName() for lyr_name in range(ds.GetLayerCount())]:
    ds.DeleteLayer(output_fc)
    print "Deleting: {0}".format(output_fc)

out_lyr = ds.CreateLayer(output_fc, target, ogr.wkbPoint)

## define and create new fields
mnth_fld = ogr.FieldDefn("Month", ogr.OFTString)
rep_by_fld = ogr.FieldDefn("Reported_by", ogr.OFTString)
fls_wthn_fld = ogr.FieldDefn("Falls_within", ogr.OFTString)
loc_fld = ogr.FieldDefn("Location", ogr.OFTString)
lsoa_c_fld = ogr.FieldDefn("LSOA_code", ogr.OFTString)
lsoa_n_fld = ogr.FieldDefn("LSOA_name", ogr.OFTString)
crime_fld = ogr.FieldDefn("Crime_type", ogr.OFTString)
outcome_fld = ogr.FieldDefn("Last_outcome", ogr.OFTString)

out_lyr.CreateField(mnth_fld)
out_lyr.CreateField(rep_by_fld)
out_lyr.CreateField(fls_wthn_fld)
out_lyr.CreateField(loc_fld)
out_lyr.CreateField(lsoa_c_fld)
out_lyr.CreateField(lsoa_n_fld)
out_lyr.CreateField(crime_fld)
out_lyr.CreateField(outcome_fld)

## where the downloaded csv files reside
root_folder = r"C:\Users\Glen B\Documents\Crime"

## for each csv
for root,dirs,files in os.walk(root_folder):
    for filename in files:
        if filename.endswith(".csv"):
            csv_path = "{0}\\{1}".format(root, filename)
            with open(csv_path, "rb") as csvfile:
                reader = csv.reader(csvfile, delimiter=",")
                next(reader,None)
                ## create a point with attributes for each burglary
                for row in reader:
                    if row[9] == "Burglary":
                        pnt = ogr.Geometry(ogr.wkbPoint)
                        pnt.AddPoint(float(row[4]), float(row[5]))
                        pnt.Transform(transform)
                        feat_dfn = out_lyr.GetLayerDefn()
                        feat = ogr.Feature(feat_dfn)
                        feat.SetGeometry(pnt)
                        feat.SetField("Month", row[1])
                        feat.SetField("Reported_by", row[2])
                        feat.SetField("Falls_within", row[3])
                        feat.SetField("Location", row[6])
                        feat.SetField("LSOA_code", row[7])
                        feat.SetField("LSOA_name", row[8])
                        feat.SetField("Crime_type", row[9])
                        feat.SetField("Last_outcome", row[10])
                        out_lyr.CreateFeature(feat)

del ds, feat, out_lyr

2. Birmingham Burglaries Only

from osgeo import ogr

## required drivers
shp_driver = ogr.GetDriverByName("ESRI Shapefile")
gdb_driver = ogr.GetDriverByName("FileGDB")

## input boundary shapefile and file gdb
shapefile = r"C:\Users\Glen B\Documents\Crime\Data\GB\district_borough_unitary_region.shp"
gdb = r"C:\Users\Glen B\Documents\my_geodata.gdb"

## open the shapefile in read mode and gdb in write mode
shp_ds = shp_driver.Open(shapefile, 0)
gdb_ds = gdb_driver.Open(gdb, 1)

## reference the necessary layers
shp_layer = shp_ds.GetLayer(0)
gdb_layer = gdb_ds.GetLayerByName("WM_Burglaries_2016")

## filter the shapefile
shp_layer.SetAttributeFilter("NAME = 'Birmingham District (B)'")

## set the name for the output feature class
output_fc = "Birmingham_Burglaries_2016"

## if the output already exists then delete it
if output_fc in [gdb_ds.GetLayerByIndex(lyr_name).GetName() for lyr_name in range(gdb_ds.GetLayerCount())]:
    gdb_ds.DeleteLayer(output_fc)
    print "Deleting: {0}".format(output_fc)

## create an output layer
out_lyr = gdb_ds.CreateLayer(output_fc, shp_layer.GetSpatialRef(), ogr.wkbPoint)

## copy the schema from the West Midlands burglaries
## and use it for the Birmingham burglaries
lyr_def = gdb_layer.GetLayerDefn()
for i in range(lyr_def.GetFieldCount()):
    out_lyr.CreateField (lyr_def.GetFieldDefn(i))

## only get burglaries that intersect the Birmingham region
for shp_feat in shp_layer:
    print shp_feat.GetField("NAME")
    birm_geom = shp_feat.GetGeometryRef()
    for gdb_feat in gdb_layer:
        burglary_geom = gdb_feat.GetGeometryRef()
        if burglary_geom.Intersects(birm_geom):
            feat_dfn = out_lyr.GetLayerDefn()
            feat = ogr.Feature(feat_dfn)
            feat.SetGeometry(burglary_geom)

            ## populate the attribute table
            for i in range(lyr_def.GetFieldCount()):
                feat.SetField(lyr_def.GetFieldDefn(i).GetNameRef(), gdb_feat.GetField(i))
            ## create the feature
            out_lyr.CreateFeature(feat)
            feat.Destroy()

del shp_ds, shp_layer, gdb_ds, gdb_layer

The Usual 🙂

As always please feel free to comment to help make the code more efficient, highlight errors, or let me know if this was of any use to you.

Setting up GDAL/OGR with FileGDB Driver for Python on Windows

I have decided to venture into the world of GDAL/OGR with Python with my main motivation to mimic some tools from ArcGIS for Desktop. I am hoping that this will help me to improve on a few fronts; my Python coding, increased knowledge regarding open source geospatial libraries, and to better understand the algorithms that churn away behind the scenes when you click a button in a GUI based GIS and perform some sort of geoprocessing or data analysis.

I mainly work with ESRI File Geodatabases and while I know this is not open source ESRI have an API in place to read and write to a gdb via GDAL/OGR. The first step is to setup what I need to start my journey for learning GDAL/OGR with Python for Windows. I will also install a few libraries that will help speed up some computations for more efficient geoprocessing.

I am using…
Python 2.7.13 32bit on Windows 7 Professional

1. Download and Install Microsoft Visual C++ 2008 Service Pack

Click here to download and the install.

microsoft_visual c++

2. Go to Christoph Gohlke’s website and download the GDAL wheel.

Grab the GDAL whl file. I downloaded GDAL‑2.1.3‑cp27‑cp27m‑win32.whl
Open the command prompt, change directory to where the whl was downloaded and use pip to install.

pip install "GDAL‑2.1.3‑cp27‑cp27m‑win32.whl"

gdal_whl installation

3. Get the File Geodatabase API from ESRI (you will need an ESRI account)

Go to ESRI Dowloads and download File Geodatabase API 1.3 version for Windows (Visual Studio 2008). This will be a zip folder. Open the contents of the API zipped folder and extract FileGDBAPI.dll from the bin folder to

C:\Python27\Lib\site-packages\osgeo

or wherever your site-package folder resides. Just make sure to extract it to osgeo.

4. Create a New Variable in Environmental Variables

In Advanced System Settings create a new Environmental Variable called GDAL_DRIVER_PATH and set the path to the osgeo folder in Step 5.

5. Open __init__.py from osgeo…

… and uncomment line 10.

gdal_uncomment_line

Save the file.

6. Test the setup

Open a Python interpreter and test using…

test gdal setup

If you do not get an errors like the screenshot above then setup has been successful.

*************************************************************************************************
OPTIONAL: these will be used in some capacity for scripting geoprocessing,

7. Download numpy + mkl wheel from the brilliant website of Christoph Gohlke

Click here and download the necessary whl file. For my setup I have downloaded numpy‑1.11.3+mkl‑cp27‑cp27m‑win32.whl 
Open up the command prompt and change directory to where the downloaded file resides. Use pip to install.

pip install "numpy‑1.11.3+mkl‑cp27‑cp27m‑win32.whl"

numpy_mkl_whl

8. Install SciPy

Back we go to Gohlke repository and to the SciPy Wheels. Here, I have downloaded scipy‑0.19.0‑cp27‑cp27m‑win32.whl
Open up the command prompt if you have closed it after Step 1 and change directory to where the downloaded file can be found.
Use pip to install.

pip install "scipy‑0.19.0‑cp27‑cp27m‑win32.whl"

scipy_whl

9. Install Shapely

You got it, go back to Gohlke and download the Shapely whl file. I grabbed Shapely‑1.5.17‑cp27‑cp27m‑win32.whl. Use pip to install similar to Steps 7 and 8.

 

Now to immerse myself in learning mode and put GDAL/OGR to some use. Check out OSGP#1.1: Measuring Geographic Distributions – Mean Center for the first attempt.

PDF to JPG Conversion with Python (for Windows)

I recently had a torrid time trying to research and implement a Python script that could batch convert from PDF to JPG. There are numerous entries online that aim to help (and did so in parts) but I struggled to find one with a concise workflow from start to finish that satisfied my criteria and involved setting up what’s required to implement such. The below could be slated for not being the most ‘Pythonic’ way to get it done but it certainly made my life easier. I was struggling with Wand and ImageMagick as per most posts until I luckily stumbled across an entry on StackOverflow where floqqi, my new hero, answered my prayers. I felt that if I struggled with this that there must be others out there with the same predicament and I hope that the title of this post will help it come to the forefront of searches and aid fellow Python snippet researchers in finding some salvation.
Note: I am using Python 2.7 32-bit on Windows 7 Professional

1. Install ImageMagick
floqqi recommends downloading the latest version, which at the time of writing this is 7.0.4-3. I had already installed an earlier version while trying to get the Wand module to work. My version is 6.9.7-3. If you hover over the links you should be able to see the full link name http://www.imagemagick.org/download/binaries/ImageMagick-6.9.7-3-Q8-x86-dll.exe, or just click that link to download the same version I did.
Run the installer, accept the license agreement, and click Next on the Information window. In the Select Additional Tasks make sure that Install development headers and libraries for C and C++ is selected.

imagemagick

Click Next and then Install.

2. Install GhostScript
I
nstall the 32-bit Ghostscript AGPL Release

3. Set Environment Variables
Create a new System Variable (Advanced System Settings > Environment Variables) called MAGICK_HOME and insert the Image Magick installation path as the value. This will be similar to C:\Program Files (x86)\ImageMagick-6.9.7-Q8

MAGICK_HOME

Click OK and and make sure that the same value (C:\Program Files (x86)\ImageMagick-6.9.7-Q8) is at the start of the Path variable. After this entry in the Path variable insert the entry for GhostScript which will be similar to C:\Program Files (x86)\gs\gs9.20\bin
Note: make sure that the entries are separated by a semi-colon (;)

4. Check if steps 1-3 have been correctly configured
Open the Command Prompt and enter…

convert file1.pdf file2.jpg

where file.pdf and file2.jpg are fully qualified paths for an input PDF and and output JPG (or the current directory contains the file).

convert cmd

If no errors are presented and the JPG has been created you can move on to the next step. Otherwise step into some troubleshooting.

5. Install PythonMagick
I downloaded the Python 2.7 32-bit whl file PythonMagick‑0.9.10‑cp27‑none‑win32.whl and then used pip to install from the command prompt.

pip install C:\Users\glen.bambrick\Downloads\pip install PythonMagick‑0.9.10‑cp27‑none‑win32.whl

Open up a Python IDE and test to see if you can import PythonMagick

import PythonMagick

We now have everything set up and can begin to write a script that will convert multiple (single page) PDFs to JPGs. 

Import the necessary modules.

import os, PythonMagick
from PythonMagick import Image
from datetime import datetime

Ok so datetime isn’t necessary but I like to time my scripts and see if it can be improved upon. Set the start time for the script

start_time = datetime.now()

A couple of global variables, one for the directory that holds the PDFs, and another to hold a hexidecimal value for the background colour ‘white’. After trial and error I noticed that some JPGs were being exported with a black background instead of white and this will be used to force a white background. I found a useful link on StackOverflow to help overcome this.

pdf_dir = r"C:\MyPDFs"
bg_colour = "#ffffff"

We loop through each PDF in the folder

for pdf in [pdf_file for pdf_file in os.listdir(pdf_dir) if pdf_file.endswith(".pdf")]:

Set and read in each PDF. density is the resolution.

    input_pdf = pdf_dir + "\\" + pdf
    img = Image()
    img.density('300')
    img.read(input_pdf)

Get the dimensions of the image.

    size = "%sx%s" % (img.columns(), img.rows())

Build the JPG for output. This part must be the Magick in PythonMagic because for a small portion of it I am mystified. See that last link to StackOverflow for the origin of the code here. The PythonMagick documentation is tough to digest and in various threads read the laments about how poor it is.

    output_img = Image(size, bg_colour)
    output_img.type = img.type
    output_img.composite(img, 0, 0, PythonMagick.CompositeOperator.SrcOverCompositeOp)
    output_img.resize(str(img.rows()))
    output_img.magick('JPG')
    output_img.quality(75)

And lastly we write out our JPG

    output_jpg = input_pdf.replace(".pdf", ".jpg")
    output_img.write(output_jpg)

And see how long it took the script to run.

print datetime.now() - start_time

This places the output JPGs in the same folder as the PDFs. Based on the resolution (density) and quality settings the process can be a bit lengthy. Using the settings above it took 9 minutes to do 20 PDF to JPG Conversions. You will need to figure out the optimum resolution and quality for your purpose. Low res took 46 seconds for all 20.

As always I feel a sense of achievement when I get a Python script to work and hope that this post will spur on some comments to make the above process more efficient. Feel free to post links to any resources, maybe comment to help myself and other readers, or if this helped you in anyway let me know and I’ll pass the thanks on to floqqi and the rest of the crew. This script is the limit of my knowledge with PythonMagick and this is thanks to those that have endeavoured before me and referenced in the links throughout this post. Thanks guys.

Complete script…

import os, PythonMagick
from PythonMagick import Image
from datetime import datetime

start_time = datetime.now()

pdf_dir = r"C:\MyPDFs"
bg_colour = "#ffffff"

for pdf in [pdf_file for pdf_file in os.listdir(pdf_dir) if pdf_file.endswith(".pdf")]:

input_pdf = pdf_dir + "\\" + pdf
 img = Image()
 img.density('300')
 img.read(input_pdf)

size = "%sx%s" % (img.columns(), img.rows())

output_img = Image(size, bg_colour)
 output_img.type = img.type
 output_img.composite(img, 0, 0, PythonMagick.CompositeOperator.SrcOverCompositeOp)
 output_img.resize(str(img.rows()))
 output_img.magick('JPG')
 output_img.quality(75)


 output_jpg = input_pdf.replace(".pdf", ".jpg")
 output_img.write(output_jpg)

print datetime.now() - start_time

Table or Feature Class Attributes to CSV with ArcPy (Python)

Here’s a little function for exporting an attribute table from ArcGIS to a CSV file. The function takes two arguments, these are a file-path to the input feature class or table and a file-path for the output CSV file (see example down further).

First import the necessary modules.

import arcpy, csv

Inside the function we use ArcPy to get a list of the field names.

def tableToCSV(input_tbl, csv_filepath):
    fld_list = arcpy.ListFields(input_tbl)
    fld_names = [fld.name for fld in fld_list]

We then open a CSV file to write the data to.

    with open(csv_filepath, 'wb') as csv_file:
        writer = csv.writer(csv_file)

The first row of the output CSV file contains the header which is the list of field names.

        writer.writerow(fld_names)

We then use the ArcPy SearchCursor to access the attributes in the table for each row and write each row to the output CSV file.

        with arcpy.da.SearchCursor(input_tbl, fld_names) as cursor:
            for row in cursor:
                writer.writerow(row)

And close the CSV file.

    csv_file.close()

Full script example…

import arcpy, csv

def tableToCSV(input_tbl, csv_filepath):
    fld_list = arcpy.ListFields(input_tbl)
    fld_names = [fld.name for fld in fld_list]
    with open(csv_filepath, 'wb') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(fld_names)
        with arcpy.da.SearchCursor(input_tbl, fld_names) as cursor:
            for row in cursor:
                writer.writerow(row)
        print csv_filepath + " CREATED"
    csv_file.close()

fc = r"C:\Users\******\Documents\ArcGIS\Default.gdb\my_fc"
out_csv = r"C:\Users\******\Documents\output_file.csv"

tableToCSV(fc, out_csv)

Feel free to ask questions, comment, or help build upon this example.

My First Encounter with arcpy.da.UpdateCursor

I have been using arcpy intermittently over the past year and a half mainly for automating and chaining batch processing to save myself countless hours of repetition. This week, however, I had to implement a facet of arcpy that I had not yet had the opportunity to utilise – the data access module.

Data Cursor

The Scenario
A file geodatabase with 75 feature classes each containing hundreds to thousands of features. These feature classes were the product of a CAD (Bentley Microstation) to GIS conversions via FME with data coming from 50+ CAD files. As a result of the conversion each feature class could contain features with various attributes from one or multiple CAD files but each feature class consisted of the same schema which was helpful.

cad2gis

The main issue was that the version number for a chunk of the CAD files had not been corrected. Two things needed to be fixed: i) the ‘REV_NUM’ attribute for all feature classes needed to be ‘Ver2’, there would be a mix of ‘Ver1’ and ‘Ver2’,  and ii) in the ‘MODEL_SUMMARY’ if ‘Ver1’ was found anywhere in the text it needed to be replaced with ‘Ver2’. There was one other issue and this stemmed from creating new features and not attributing them, this would have left a ‘NULL’ value in the ‘MODEL’ field (and the other fields). All features had to have standardised attributes. The script would not fix these but merely highlight the feature classes.

OK so a quick recap…
1. Set the ‘REV_NUM’ for every feature to ‘Ver2’
2. Find and replace ‘Ver1’ with ‘Ver2’ in the text string of ‘MODEL_SUMMARY’ for all features.
3. Find all feature classes that have ‘NULL’ in the ‘MODEL’ field.

The Script
Let’s take a look at the thirteen lines of code required to complete the mission.

import arcpy

arcpy.env.workspace = r"C:\Users\*****\Documents\CleanedUp\Feature_Classes.gdb"
fc_list = arcpy.ListFeatureClasses()
fields = ["MODEL", "MODEL_SUMMARY", "REV_NUM"]

for fc in fc_list:
 with arcpy.da.UpdateCursor(fc, fields) as cursor:
  for row in cursor:
   if row[0] == None or row[0] == "":
    print fc + ": Null value found for MODEL"
    break
   if row[1] != None:
    row[1] = row[1].replace("Ver1", "Ver2")
   row[2] = "Ver2"
   cursor.updateRow(row)

The Breakdown
Import the arcpy library (you need ArcGIS installed and a valid license to use)

import arcpy

Set the workspace path to the relevant file geodatabase

arcpy.env.workspace = r"C:\Users\*****\Documents\CleanedUp\Feature_Classes.gdb"

Create a list of all the feature classes within the file geodatabase.

fc_list = arcpy.ListFeatureClasses()

We know the names of the fields we wish to access so we add these to a list.

fields = ["MODEL", "MODEL_SUMMARY", "REV_NUM"]

For each feature class in the geodatabase we want to access the attributes of each feature for the relevant fields.

for fc in fc_list:
 with arcpy.da.UpdateCursor(fc, fields) as cursor:
  for row in cursor:

If the ‘MODEL’ attribute has a None (NULL) or empty string value then print the feature class name to the screen. Once one is found we can break out and move onto the next feature class.

   if row[0] == None or row[0] == "":
    print fc + ": Null value found for MODEL"
    break

We know have a list of feature classes that we can fix the attributes manually.

Next we find any instance of ‘Ver1’ in ‘MODEL_SUMMARY’ text strings and replace it with ‘Ver2’….

   if row[1] != None:
    row[1] = row[1].replace("Ver1", "Ver2")

…and update all ‘REV_NUM’ attributes to ‘Ver2’ regardless of what is already attributed. This is like using the Field Calculator to update.

   row[2] = "Ver2"

Perform and commit the above updates for each feature.

   cursor.updateRow(row)

Very handy to update the data you need and this script can certainly be extended to handle more complex operations using the arcpy.da.UpdateCursor module.

Check out the documentation for arcpy.da.UpdateCursor