OSGP: Measuring Geographic Distributions – Weighted Mean Center

(Open Source Geospatial Python)

The ‘What is it?’

See Mean Center.

The unweighted center is mainly used for events that occur at a place and time such as burglaries. The weighted center, however, is predominantly used for stationary features such as retail outlets or delineated areas for example (such as Census tracts). The Weighted Mean Center does not take into account distance between features in the dataset.

The weight needs to be a numerical attribute. The greater the value, the higher the weight for that feature.

The Formula!

The Weighted Mean Center is calculated by multiplying the x and y coordinate by the weight for that feature and summing all for both x and y individually, and then dividing this by the sum of all the weights.

Weighted Mean Center FormulaFor Point features the X and Y coordinates of each feature is used, for Polygons the centroid of each feature represents the X and Y coordinate to use, and for Linear features the mid-point of each line is used for the X and Y coordinate.

The Code…

from osgeo import ogr
from shapely.geometry import MultiLineString
from shapely import wkt
import numpy as np
import sys

## set the driver for the data
driver = ogr.GetDriverByName("ESRI Shapefile")
## folder where the shapefile resides
folder = r"C:\Users\glen.bambrick\Documents\GDAL\shp\\"
## name of the shapefile concatenated with folder
shp = "{0}Census2011_Small_Areas_generalised20m.shp".format(folder)
## open the shapefile
ds = driver.Open(shp, 0)
## reference the only layer in the shapefile
lyr = ds.GetLayer(0)

## create an output data source
out_ds = driver.CreateDataSource("{0}{1}_wgt_mean_center.shp".format(folder,lyr.GetName()))

## output mean center weighted filename
output_fc = "{0}{1}_wgt_mean_center".format(folder,lyr.GetName())

## field that has numerical weight
weight_fld = "TOTAL2011"

try:
    first_feat = lyr.GetFeature(1)
    xy_arr = np.ndarray((len(lyr), 2), dtype=np.float)
    wgt_arr = np.ndarray((len(lyr), 1), dtype=np.float)
    ## use the centroid for points and polygons
    if first_feat.geometry().GetGeometryName() in ["POINT", "MULTIPOINT", "POLYGON", "MULTIPOLYGON"]:
        for i, pt in enumerate(lyr):
            ft_geom = pt.geometry()
            weight = pt.GetField(weight_fld)
            xy_arr[i] = (ft_geom.Centroid().GetX() * weight, ft_geom.Centroid().GetY() * weight)
            wgt_arr[i] = weight
    ## midpoint of lines
    elif first_feat.geometry().GetGeometryName() in ["LINESTRING", "MULTILINESTRING"]:
        for i, ln in enumerate(lyr):
            line_geom = ln.geometry().ExportToWkt()
            weight = ln.GetField(weight_fld)
            shapely_line = MultiLineString(wkt.loads(line_geom))
            midpoint = shapely_line.interpolate(shapely_line.length/2)
            xy_arr[i] = (midpoint.x * weight, midpoint.y * weight)
            wgt_arr[i] = weight

except Exception:
    print "Unknown geometry or Incorrect field name for {}".format(input_lyr_name)
    sys.exit()

## do the maths
sum_x, sum_y = np.sum(xy_arr, axis=0)
sum_wgt = np.sum(wgt_arr)
weighted_x, weighted_y = sum_x/sum_wgt, sum_y/sum_wgt

print "Weighted Mean Center: {0}, {1}".format(weighted_x, weighted_y)

## create a new point layer with the same spatial ref as lyr
out_lyr = out_ds.CreateLayer(output_fc, lyr.GetSpatialRef(), ogr.wkbPoint)

## define and create new fields
x_fld = ogr.FieldDefn("X", ogr.OFTReal)
y_fld = ogr.FieldDefn("Y", ogr.OFTReal)
out_lyr.CreateField(x_fld)
out_lyr.CreateField(y_fld)

## create a new point for the mean center weighted
pnt = ogr.Geometry(ogr.wkbPoint)
pnt.AddPoint(weighted_x, weighted_y)

## add the mean center weighted to the new layer
feat_dfn = out_lyr.GetLayerDefn()
feat = ogr.Feature(feat_dfn)
feat.SetGeometry(pnt)
feat.SetField("X", weighted_x)
feat.SetField("Y", weighted_y)
out_lyr.CreateFeature(feat)

print "Created: {0}.shp".format(output_fc)

## free up resources
del ds, out_ds, lyr, first_feat, feat, out_lyr

I’d like to give credit to Logan Byers from GIS StackExchange who aided in speeding up the computational time using NumPy and for forcing me to begin learning the wonders of NumPy (which is still a work in progress)

The Example:

I downloaded the Small Areas of Ireland from the CSO. You will have to acknowledge a disclaimer. The data contains population information for the 2011 Census. Once downloaded unzip Census2011_Small_Areas_generalised20m.zip to working folder.

Small Areas

Running the script from The Code section above calculates the Weighted Mean Center of all Small Areas based on the population count for each for 2011 and creates a point Shapefile as the output.

Small Areas Weighted Mean Center

OSGP Weighted Mean Center:      238557.427484, 208347.116116
ArcGIS Weighted Mean Center:    238557.427484, 208347.116116

Also See…

Mean Center
Central Feature
Median Center
Initial Data Assessment

The Resources:

ESRI Guide to GIS Volume 2: Chapter 2 (I highly recommend this book)
see book review here.

Geoprocessing with Python

Python GDAL/OGR Cookbook

The Usual 🙂

As always please feel free to comment to help make the code more efficient, highlight errors, or let me know if this was of any use to you.