Local governments have made a wide variety of data available at the ZIP code level. A great way of visualizing this information is to use a choropleth map, in which each ZIP code is colored by the intensity of its value for a particular statistic.
In this article I’ll demonstrate how to produce a ZIP-level choropleth map using freely available tools and data. As an example, we’ll plot the total number of births in 2007 for each ZIP code in California. Here’s an example of the final output:
All the data and script files used in this example are available here. If you would like further information, this page has links to a large selection of ZIP-level data, including population, political and environmental statistics. Also, the website FlowingData has an excellent article on creating maps at the county-level.
OK, let’s get started.
We’ll use Python and matplotlib to create our the map. Make sure you have both installed before proceeding.
The US Census site has lots of great geographic information, including ZIP code boundary files in various formats. We’ll use the ASCII format since it’s easy to read in and manipulate in Python. The 5-digit ZIP code boundary files for each state are available here. This example will use the California file zt06_d00_ascii.zip.
You may already have some data that you wish to plot, but for the purposes of this example we will use birth rate data from the California Department of Public Health. The data in this example was extracted from this Excel file on their site and converted into CSV format (CA_2007_births_by_ZIP.txt is included in here) for processing with Python.
Each ASCII file from the Census website contains boundaries for all the ZIP codes in a state. There are actually two files for each state: a metadata file that ends in ‘a.dat’ and the actual boundary data which ends in just ‘.dat’.
The boundary for each ZIP code has a single main polygon and may also contain several ‘exclusions’ such as islands, lakes, etc. This Python function will read in the data for a given state.
def read_ascii_boundary(filestem):
'''
Reads polygon data from an ASCII boundary file.
Returns a dictionary with polygon IDs for keys. The value for each
key is another dictionary with three keys:
'name' - the name of the polygon
'polygon' - list of (longitude, latitude) pairs defining the main
polygon boundary
'exclusions' - list of lists of (lon, lat) pairs for any exclusions in
the main polygon
'''
= filestem + 'a.dat'
metadata_file = filestem + '.dat'
data_file # Read metadata
= [line.strip().strip('"') for line in open(metadata_file)]
lines = lines[::6]
polygon_ids = lines[2::6]
polygon_names = {}
polygon_data for polygon_id, polygon_name in zip(polygon_ids, polygon_names):
# Initialize entry with name of polygon.
# In this case the polygon_name will be the 5-digit ZIP code.
= {'name': polygon_name}
polygon_data[polygon_id] del polygon_data['0']
# Read lon and lat.
= open(data_file)
f for line in f:
= line.split()
fields if len(fields) == 3:
# Initialize new polygon
= fields[0]
polygon_id 'polygon'] = []
polygon_data[polygon_id]['exclusions'] = []
polygon_data[polygon_id][elif len(fields) == 1:
# -99999 denotes the start of a new sub-polygon
if fields[0] == '-99999':
'exclusions'].append([])
polygon_data[polygon_id][else:
# Add lon/lat pair to main polygon or exclusion
= float(fields[0])
lon = float(fields[1])
lat if polygon_data[polygon_id]['exclusions']:
'exclusions'][-1].append((lon, lat))
polygon_data[polygon_id][else:
'polygon'].append((lon, lat))
polygon_data[polygon_id][return polygon_data
And finally here’s the full script that creates the final map:
import csv
from pylab import *
# Read in ZIP code boundaries for California
= read_ascii_boundary('../data/zip5/zt06_d00')
d
# Read in data for number of births by ZIP code in California
= csv.reader(open('../data/CA_2007_births_by_ZIP.txt', 'rb'))
f = {}
births # Skip header line
next()
f.# Add data for each ZIP code
for row in f:
= row
zipcode, totalbirths = float(totalbirths)
births[zipcode] = max(births.values())
max_births
# Create figure and two axes: one to hold the map and one to hold
# the colorbar
=(5, 5), dpi=30)
figure(figsize= axes([0.0, 0.0, 0.8, 0.9])
map_axis = axes([0.83, 0.1, 0.03, 0.6])
cb_axis
# Define colormap to color the ZIP codes.
# You can try changing this to cm.Blues or any other colormap
# to get a different effect
= cm.PuRd
cmap
# Create the map axis
axes(map_axis)-125, -114, 32, 42.5])
axis([
gca().set_axis_off()
# Loop over the ZIP codes in the boundary file
for polygon_id in d:
= array(d[polygon_id]['polygon'])
polygon_data = d[polygon_id]['name']
zipcode = births[zipcode] if zipcode in births else 0.
num_births # Define the color for the ZIP code
= cmap(num_births / max_births)
fc # Draw the ZIP code
= Polygon(array(polygon_data), facecolor=fc,
patch =(.3, .3, .3, 1), linewidth=.2)
edgecolor
gca().add_patch(patch)'Births per ZIP Code in California (2007)')
title(
# Draw colorbar
= mpl.colorbar.ColorbarBase(cb_axis, cmap=cmap,
cb = mpl.colors.Normalize(vmin=0, vmax=max_births))
norm 'Number of births')
cb.set_label(
# Change all fonts to Arial
for o in gcf().findobj(matplotlib.text.Text):
'Arial')
o.set_fontname(
# Export figure to bitmap
'../images/ca_births.png') savefig(