In this post I will look at extracting point data from a CSV file and creating a Shapefile with the pyshp library. The data consists of the location of trees with various attributes generated by the Fingal County Council in Ireland. The data can be downloaded as a CSV file from dublinked.ie.
pyshp is a pure Python library designed to provide read and write support for the ESRI Shapefile (.shp) format and only utilizes Python’s standard library to achieve this. The library can be downloaded from https://code.google.com/p/pyshp/ and placed in the site-packages folder of your Python installation. Alternatively you can use easy-install…
easy_install pyshp
…or pip.
pip install pyshp
NOTE: You should make yourself familiar with the pyshp library by visiting Joel Lawhead’s examples and documents here.
The full code is at the bottom of the post, the following is a walkthrough. When ready to go open your favourite editor and import the modules required for the task at hand.
import shapefile, csv
We will use the getWKT_PRJ function discussed in a previous post.
def getWKT_PRJ (epsg_code): import urllib wkt = urllib.urlopen("http://spatialreference.org/ref/epsg/{0}/prettywkt/".format(epsg_code)) remove_spaces = wkt.read().replace(" ","") output = remove_spaces.replace("\n", "") return output
Create an instance of the Shapefile Writer( ) class and declare the POINT geometry type.
trees_shp = shapefile.Writer(shapefile.POINT)
Set the autoBalance to 1. This enforces that for every record there must be a corresponding geometry.
trees_shp.autoBalance = 1
Create the field names and data types for each.
trees_shp.field("TREE_ID", "C") trees_shp.field("ADDRESS", "C") trees_shp.field("TOWN", "C") trees_shp.field("TREE_SPEC", "C") trees_shp.field("SPEC_DESC", "C") trees_shp.field("COMMONNAME", "C") trees_shp.field("AGE_DESC", "C") trees_shp.field("HEIGHT", "C") trees_shp.field("SPREAD", "C") trees_shp.field("TRUNK", "C") trees_shp.field("TRUNK_ACTL", "C") trees_shp.field("CONDITION", "C")
Create a counter variable to keep track of the number of feature written to the Shapefile.
counter = 1
Open the CSV file in read mode.
with open('C:/csv_to_shp/Trees.csv', 'rb') as csvfile: reader = csv.reader(csvfile, delimiter=',')
Skip the header.
next(reader, None)
Loop through each row and assign each attribute in the row to a variable.
for row in reader: tree_id = row[0] address = row[1] town = row[2] tree_species = row[3] species_desc = row[4] common_name = row[5] age_desc = row[6] height = row[7] spread = row[8] trunk = row[9] trunk_actual = row[10] condition = row[11] latitude = row[12] longitude = row[13]
Set the geometry for each record based on the longitude and latitude vales.
trees_shp.point(float(longitude),float(latitude))
Create a matching record for the geometry using the attributes.
trees_shp.record(tree_id, address, town, tree_species, species_desc, common_name, age_desc,height, spread, trunk, trunk_actual, condition)
Print to screen the current feature number and increase the counter.
print "Feature " + str(counter) + " added to Shapefile." counter = counter + 1
Save the Shapefile to a location and name the file.
trees_shp.save("C:/csv_to_shp/Fingal_Trees")
Create a projection file (.prj)
prj = open("C:/csv_to_shp/Fingal_Trees.prj", "w") epsg = getWKT_PRJ("4326") prj.write(epsg) prj.close()
Save and run the script. The number of features should be printed to the console.
If you open the original CSV file you can see that there are also 33670 records. Navigate to the file location where you saved the Shapefile output. You should see four files shown below.
And just to make sure that the data is correct, here I have opened it up in QGIS.
And the attribute table…
And here’s the full code…
# import libraries import shapefile, csv # funtion to generate a .prj file def getWKT_PRJ (epsg_code): import urllib wkt = urllib.urlopen("http://spatialreference.org/ref/epsg/{0}/prettywkt/".format(epsg_code)) remove_spaces = wkt.read().replace(" ","") output = remove_spaces.replace("\n", "") return output # create a point shapefile trees_shp = shapefile.Writer(shapefile.POINT) # for every record there must be a corresponding geometry. trees_shp.autoBalance = 1 # create the field names and data type for each. trees_shp.field("TREE_ID", "C") trees_shp.field("ADDRESS", "C") trees_shp.field("TOWN", "C") trees_shp.field("TREE_SPEC", "C") trees_shp.field("SPEC_DESC", "C") trees_shp.field("COMMONNAME", "C") trees_shp.field("AGE_DESC", "C") trees_shp.field("HEIGHT", "C") trees_shp.field("SPREAD", "C") trees_shp.field("TRUNK", "C") trees_shp.field("TRUNK_ACTL", "C") trees_shp.field("CONDITION", "C") # count the features counter = 1 # access the CSV file with open('C:/csv_to_shp/Trees.csv', 'rb') as csvfile: reader = csv.reader(csvfile, delimiter=',') # skip the header next(reader, None) #loop through each of the rows and assign the attributes to variables for row in reader: tree_id = row[0] address = row[1] town = row[2] tree_species = row[3] species_desc = row[4] common_name = row[5] age_desc = row[6] height = row[7] spread = row[8] trunk = row[9] trunk_actual = row[10] condition = row[11] latitude = row[12] longitude = row[13] # create the point geometry trees_shp.point(float(longitude),float(latitude)) # add attribute data trees_shp.record(tree_id, address, town, tree_species, species_desc, common_name, age_desc,height, spread, trunk, trunk_actual, condition) print "Feature " + str(counter) + " added to Shapefile." counter = counter + 1 # save the Shapefile trees_shp.save("C:/csv_to_shp/Fingal_Trees") # create a projection file prj = open("C:/csv_to_shp/Fingal_Trees.prj", "w") epsg = getWKT_PRJ("4326") prj.write(epsg) prj.close()
Any problems let me know.
Pingback: Generate a Projection File (.prj) using Python | Geospatiality
Pingback: Reproject a Polygon Shapefile using PyShp and PyProj | Geospatiality
Hi, thank you for your great tutorial. I’ve tried to follow your script, but why it only added 1 feature into the shapefile? look forward to hear your explanations. Thanks
LikeLike
Just solved it, my indentation caused it only read 1 feature..thank you
LikeLike
Excellent, it’s usually something small. Thanks for visiting the page, I hope it was helpful.
LikeLike
Where exactly was the indentation incorrect?
LikeLike
Where exactly was the indentation incorrect?
LikeLike
Under for row in reader: down to increasing the counter should have been indented in the full listing of the script and is fixed in the post now.
LikeLike
Thank you! And like Reymonza, I have found this tutorial and example very insightful. Thank you again!
LikeLiked by 1 person
I’m converting into opposite direction from shp to csv. My problem is that when I read .shp file with shapefile Reader I get coordinates in meters. Is there way to get points in degrees?
With pprint I get e.g:
{‘m’: (-1.7976931348623157e+308,),
‘points’: [[339532.813818718, 6797730.64206192]],
‘shapeType’: 11,
‘z’: (0.0,)}
LikeLike
Have you got the capability to reproject the Shapefile into WGS84? You can easily do this with GIS software or if you want to try through Python modules similar to https://glenbambrick.com/2016/01/24/reproject-shapefile/. Your X, Y points in the Shapefile should then be in Long/Lat as decimal degrees. Let me know how it goes.
LikeLike
For the remove_spaces = wkt.read().replace(” “,””) line I get the error “TypeError: a bytes-like object is required, not ‘str'”
do you know why I am getting this error?
LikeLiked by 1 person
Can you please send me the epsg code you are using and I’ll recreate and see if I get a similar error. Cheers
LikeLike
I got the same error,
Got to working by modifying line 3 of function getWKT_PRJ to
remove_spaces = wkt.read().decode(‘utf-8’).replace(” “,””)
the wkt.read returns a byte array. Whereas need to read the response as string of utf-8 charecters
I am new to python so cant say how it worked in past.
also for python 3.5 I modified the function as below
def getWKT_PRJ (epsg_code):
from urllib.request import urlopen
# access projection information
with urlopen(“http://spatialreference.org/ref/epsg/{0}/prettywkt/”.format(epsg_code)) as wkt:
# remove spaces between charachters
remove_spaces = wkt.read().decode(‘utf-8’).replace(” “,””)
# place all the text on one line
output = remove_spaces.replace(“\n”, “”)
return output
LikeLike
Excellent. I haven’t used this function for quite some time. Thanks for taking the time to post this solution.
LikeLike
Thanks for the helpful and insightful example. I’m trying to use this and when I successfully create a shapefile and try to add it into a map document in ArcMap, I get an error saying “The number of shapes does not match the number of table records”. Do you know how to rectify this issue? Many of the shapefile repair tools I’ve found are outdated….
Thanks for any and all help
LikeLike
Have you made sure that you have set autoBalance = 1 for the shapefile? See Geometry and Record Balancing in the documentation https://pypi.python.org/pypi/pyshp
In the code example above it is trees_shp.autoBalance = 1
LikeLike
Hi there,
When I run a script written similar to yours and get to this section:
trees_shp.point(float(easting),float(northing))
I get the error: exceptions.ValueError: could not convert string to float: Line 1.00
I’ve tried adding: easting = float(row[1]) and northing = float(row[2]) to get the string to be a float, but that does not seem to work.
The CSV Easting value has 6 digits and 3 decimals, while the Northing value has 7 digits and 3 decimals. I’ve added those attributes in the respective .field(“Easting”, “F”, 6, 3) Thanks in advance!
LikeLike
It could possibly be reading an empty line. float(“”) will return the error ValueError: could not convert string to float:
Make sure all rows have an Easting and Northing
Or you could have non-digit values in your string such as “123O0” or “123,456” but this usually returns ValueError: invalid literal for float():
You will need to catch the error in any case.
See Stack Exchange for various solutions
https://stackoverflow.com/search?q=ValueError%3A+could+not+convert+string+to+float%3A
LikeLike
Hi, I have used your code before and it has worked perfectly. This time when I run the code I don’t get any errors pop up but when I open the attribute table of the shapefile it is blank. It will open in QGIS but not in ArcGIS. Do you know what may be causing this?
LikeLike
Unfortunately I have never encountered such so cannot really provide a solution I’m afraid. It’s been a long time since I used this but the main problems I used to have were making sure that every geometry (point) had a matching row in the attribute table. If you find a solution please post here. Cheers
LikeLike
Hey, i also have the error with the conversion of the lat long values.
output_shp.point(float(lon),float(lat)
output_shp.point(float(lon),float(lat))
TypeError: float() argument must be a string or a number, not ‘list’
LikeLike
You are trying to put a list for lat and/or lon, you need to iterate through a list and use each value.
LikeLike
Traceback (most recent call last):
File “D:\anzhuang\PyCharm Community Edition 2017.1.3\helpers\pydev\pydevd.py”, line 1585, in
globals = debugger.run(setup[‘file’], None, None, is_module)
File “D:\anzhuang\PyCharm Community Edition 2017.1.3\helpers\pydev\pydevd.py”, line 1015, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File “D:\anzhuang\PyCharm Community Edition 2017.1.3\helpers\pydev\_pydev_imps\_pydev_execfile.py”, line 18, in execfile
exec(compile(contents+”\n”, file, ‘exec’), glob, loc)
File “F:/论文数据处理/上海/点成线.py”, line 34, in
output_shp.record(id,speed,status,time)
File “D:\anzhuang\anconda\lib\site-packages\shapefile.py”, line 1071, in record
record = [recordList[i] for i in range(fieldCount)]
File “D:\anzhuang\anconda\lib\site-packages\shapefile.py”, line 1071, in
record = [recordList[i] for i in range(fieldCount)]
IndexError: tuple index out of range
thank you for your script very much.
when I follow your script I got this error,I don’t know what happen? look forward your answer. thank you so much!!
LikeLike
Hi and thank you, :)))
I have the error with your code and it is below,
Traceback (most recent call last):
File “C:\…\csv2shp.py”, line 13, in
trees_shp = shapefile.Writer(shapefile.POINT)
File “C:\…\Python\Python37\lib\site-packages\shapefile.py”, line 1057, in __init__
self.shp = self.__getFileObj(os.path.splitext(target)[0] + ‘.shp’)
File “C:\…\Python\Python37\lib\ntpath.py”, line 202, in splitext
p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not int
I couldn’t handle it.
LikeLike
Traceback (most recent call last):
File “C:\…\csv2shp.py”, line 67, in
trees_shp.save(“C:/…/Fingal_Trees1”)
AttributeError: ‘Writer’ object has no attribute ‘save’
LikeLike
Above I sent an error but I solve it in below
I change ‘rb’ with ‘rt’
with open(‘C:/…/fingaltrees.csv’, ‘rt’) as csvfile:
and also
trees_shp = shapefile.Writer(str(shapefile.POINT))
I turn to ‘str’ inside the Writer.
Now, this is my error in my code. I couldn’t over it.
LikeLike
Pyshp has changed, and now shapefile.Writer() takes as input a string that specifies the file path to save to. There’s no .save attribute either, it saves automatically. See here: https://pypi.org/project/pyshp/#writing-shapefiles
LikeLike
I get the error, AttributeError: ‘int’ object has no attribute ‘rfind’
LikeLike