This tutorial demonstrates how you can take a raw series of coordinates and end up with a binned hexagonal map rendered in the browser using d3js and topojson.
Most Americans prefer to huddle together around urban areas, which raises all sorts of problems for map-based visualizations. Coloring regions according to a data value, known as a choropleth map, leaves the map maker beholden to arbitrary political boundaries and, at the county level, pixel-wide polygons in parts of the Northeast. Many publications prefer to place dots proportional in area to the data values over the center of each county, which inevitably produces overlapping circles in these same congested regions. Here's a particularly atrocious example of that strategy I once made at Slate:
Two weeks ago, Kevin Schaul released an exciting new command-line tool called binify that offers a brilliant alternative. Schaul's tool takes a series of points and clusters them (or "bins" them) into hexagonal tiles. Check out the introductory blog post on his site.
Binify operates on .shp files, which can be a bit difficult to work with for those of us who aren't GIS pros. I put together this tutorial to demonstrate how you can take a raw series of coordinates and end up with a binned hexagonal map rendered in the browser using d3js and topojson, both courtesy of the beautiful mind of Mike Bostock. All the source files we'll need are on Github.
I downloaded about 2,000 addresses from a Craigslist-like website and converted them to coordinates with geopy.
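For readers who want to reproduce the geocoding step, here is a minimal sketch of how it might look. It is not the script I used: the `geocode_addresses` and `make_nominatim_geocoder` helpers are hypothetical names, the choice of the Nominatim service is an assumption (geopy supports several geocoders), and the `user_agent` string is a placeholder.

```python
import time


def geocode_addresses(addresses, geocode, delay=0.0):
    """Return (lat, lng) pairs for every address the geocoder can resolve.

    `geocode` is any callable that returns an object with .latitude and
    .longitude attributes, or None when the address is not found. This
    matches the shape of what geopy geocoders return.
    """
    coords = []
    for address in addresses:
        location = geocode(address)
        if location is not None:
            coords.append((location.latitude, location.longitude))
        if delay:
            time.sleep(delay)  # be polite to rate-limited public services
    return coords


def make_nominatim_geocoder():
    """Build a geopy geocode callable (assumes geopy is installed)."""
    # Imported lazily so geocode_addresses can be used without geopy.
    from geopy.geocoders import Nominatim
    return Nominatim(user_agent="hexbin-tutorial-demo").geocode
```

With geopy installed, `geocode_addresses(my_addresses, make_nominatim_geocoder(), delay=1.0)` would return the coordinate pairs, skipping any address that fails to resolve.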
We're going to use one small Python script to create our .shp file. It's recommended you first create and activate a virtualenv with:
Whether or not you use virtualenv:
pip install -r requirements.txt
You also need to install ogr2ogr and topojson for working with the shapefiles.
CSV -> SHP
Binify takes as input a .shp file, a format developed by ESRI for geospatial data. Specifically, it needs a "point shapefile" that contains a layer of individual coordinates. (Most .shp files you're likely to encounter consist of a lot of polygons marking territorial boundaries and so forth.) We can make a .shp file from a list of raw coordinates with the pyshp library. The shpify.py script in the Github repo for this demo will take care of this:
If you look at the source, you'll see this is a very simple process of loading the coordinates from coordinates.csv and writing them to a shapefile, much as you might do when creating a new .csv file in Python.
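The general shape of that process can be sketched as follows. This is an illustration rather than the contents of shpify.py: the `lat` and `lng` column names, the `ID` field, and both function names are assumptions, and the writer calls assume pyshp 2.x (older releases used `save()` instead of `close()`).

```python
import csv
import io


def read_points(csv_file):
    """Yield (lon, lat) float pairs from a CSV file object.

    Assumes the header row names the columns 'lat' and 'lng'.
    """
    for row in csv.DictReader(csv_file):
        yield float(row["lng"]), float(row["lat"])


def write_point_shapefile(points, target):
    """Write (lon, lat) pairs to a point shapefile (assumes pyshp 2.x)."""
    # Imported lazily so read_points works without pyshp installed.
    import shapefile
    w = shapefile.Writer(target, shapeType=shapefile.POINT)
    w.field("ID", "N")  # a shapefile needs at least one attribute field
    for i, (lon, lat) in enumerate(points):
        w.point(lon, lat)  # shapefiles store x (longitude) first
        w.record(i)
    w.close()


# Tiny in-memory example in place of coordinates.csv
sample = io.StringIO("lat,lng\n40.7,-74.0\n34.05,-118.25\n")
points = list(read_points(sample))
```

Calling `write_point_shapefile(points, "shapefiles/output")` would then write the .shp file and its companions.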
The shpify.py script should place a file called output.shp in the shapefiles directory. Pyshp also creates companion files such as output.shx. We also need a projection file, output.prj, so the script manually creates one.
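Creating a projection file by hand amounts to writing one line of well-known text (WKT) next to the .shp file. Here is a minimal sketch, assuming the coordinates are plain WGS84 latitude and longitude, which is what geocoders typically return. The `write_prj` helper is a hypothetical name, not necessarily what shpify.py does.

```python
from pathlib import Path

# ESRI-style WKT for the WGS84 geographic coordinate system (EPSG:4326)
WGS84_WKT = (
    'GEOGCS["GCS_WGS_1984",'
    'DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137,298.257223563]],'
    'PRIMEM["Greenwich",0],'
    'UNIT["Degree",0.017453292519943295]]'
)


def write_prj(shp_path):
    """Write a .prj file alongside the given .shp path and return its path."""
    prj_path = Path(shp_path).with_suffix(".prj")
    prj_path.write_text(WGS84_WKT)
    return prj_path
```

For example, `write_prj("shapefiles/output.shp")` would create `shapefiles/output.prj`, provided the directory exists.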
Load these files into a GIS program such as Quantum GIS (QGIS) and you'll see a nice collection of points:
SHP -> Binned SHP
Here is where Binify comes in. Per the documentation, we simply feed it our point shapefile with a few arguments.
First, we want to give it enough hexagons to achieve the granularity we want. 120 hexagons across sounds like a good starting target.
Because these sample coordinates span the United States, we will expect many of the hexagons to encompass 0 points. We can greatly reduce the filesize by including the -e argument, which prevents binify from writing empty polygons.
binify -n=120 -e shapefiles/original.shp shapefiles/binned.shp
This may take a few minutes to run. When finished, you'll have a new set of files named binned.shp and so forth.
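For some intuition about what hexagonal binning involves, here is a small sketch of the general technique. It is not binify's implementation, which I have not reproduced here: it uses the common axial-coordinate approach with cube rounding on a pointy-top hexagon grid, and the function names and the `size` parameter (the hexagon's center-to-corner distance) are assumptions for illustration.

```python
import math
from collections import Counter


def hex_round(q, r):
    """Round fractional axial coordinates to the nearest hexagon."""
    x, z = q, r
    y = -x - z
    rx, ry, rz = round(x), round(y), round(z)
    dx, dy, dz = abs(rx - x), abs(ry - y), abs(rz - z)
    # Recompute whichever component had the largest rounding error
    # so the cube-coordinate constraint x + y + z == 0 still holds.
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz


def point_to_hex(x, y, size):
    """Map a planar point to the axial (q, r) index of its hexagon."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return hex_round(q, r)


def bin_points(points, size):
    """Count the points that fall in each hexagon.

    Only hexagons containing at least one point appear in the result,
    which is similar in spirit to what the -e flag achieves.
    """
    return Counter(point_to_hex(x, y, size) for x, y in points)


# Two points near the origin share a hexagon; the third sits in the
# center of the neighboring hexagon to the east.
counts = bin_points([(0, 0), (0.1, 0.1), (math.sqrt(3), 0)], size=1.0)
```

A smaller `size` produces more, finer hexagons, which is the same trade-off the -n argument controls from the other direction.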
Load those files into QGIS and, like magic, we've got hexagons:
Binned SHP -> GeoJSON -> topoJSON
The mechanics of how to build GeoJSON and topoJSON files are well-documented (see this Stack Overflow question of mine and the generous answer from Bostock, for example), so we'll skip to the CLI commands:
ogr2ogr -f GeoJSON binned.json shapefiles/binned.shp
Make sure to use the -p flag with the next line to preserve the feature properties:
topojson -s 7e-9 -p -o coordinates.json -- binned.json
This reduces the 1.9MB .shp file to an 88KB .json file.
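Before moving on to the map, it can be worth confirming that the properties survived the conversion. The sketch below walks a topoJSON document and collects the COUNT values. The `count_values` helper is a hypothetical name, and the sample dictionary is a hand-made stand-in for the real file, with an assumed object name of `binned`.

```python
def count_values(topology, prop="COUNT"):
    """Collect one property value from every geometry in a topoJSON dict."""
    values = []
    for obj in topology["objects"].values():
        for geom in obj.get("geometries", []):
            props = geom.get("properties") or {}
            if prop in props:
                values.append(props[prop])
    return values


# Hand-made stand-in for the parsed contents of coordinates.json
sample = {
    "type": "Topology",
    "objects": {
        "binned": {
            "type": "GeometryCollection",
            "geometries": [
                {"type": "Polygon", "arcs": [[0]], "properties": {"COUNT": 3}},
                {"type": "Polygon", "arcs": [[1]], "properties": {"COUNT": 12}},
            ],
        }
    },
    "arcs": [],
}

values = count_values(sample)
```

On the real file you would parse it first with `json.load`. An empty list back from `count_values` suggests the properties were dropped during conversion.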
We can reuse 90 percent of the code in the d3 choropleth map example, which serves as a nice introduction to topoJSON mapping.
As Schaul notes in his introductory blog post, how you divide your data into color bins is critically important to how viewers interpret the information. In this case, I was lazy and simply colored all the hexagons red and then dimmed them according to the COUNT value (specifically, the square root of the ratio of the value to the maximum value on the map).
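The dimming rule is easy to state as a formula: opacity = sqrt(COUNT / max COUNT). The map itself does this in the browser with d3, so the Python below is only a sketch of the arithmetic, with `opacities` as a hypothetical helper name.

```python
import math


def opacities(counts):
    """Map each count to an opacity in (0, 1] via sqrt(count / max)."""
    max_count = max(counts)
    return [math.sqrt(count / max_count) for count in counts]


levels = opacities([1, 4, 16])  # [0.25, 0.5, 1.0]
```

Taking the square root compresses the range, so a hexagon with one-sixteenth of the maximum count still renders at 25 percent opacity rather than about 6 percent.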
And there you have it. If the hexagons look a little too big, just rerun the binify command with a larger value for n. The following map has been rendered live in your browser:
You can see the map with the code on bl.ocks.org.