How to Map Political Data Free-of-Charge

Feb. 15, 2013, 5:17 p.m.

Development practitioners have long had reasons to put numbers on aspects of their work, not least the growing emphasis on quantitative program evaluation. These data often vary in spatially meaningful ways. Consider a program in which towns are randomly assigned to receive some treatment: mapping the levels of an outcome before and after the intervention is a quick way to represent the project's effect.

Contrary to conventional wisdom, you do not need special skills or expensive software to make maps like these. Here I lay out one method for doing so free-of-charge. Our end result will be a shaded map of voter turnout by district in Sri Lanka's 2010 presidential election, saved as a PDF. For the cartographically inclined, this type of map is a choropleth.

An attached ZIP archive includes the example election data and an annotated R script file containing all the code used here. To make this map, you will need R, a free statistical computing environment; download the version for your operating system from CRAN, the Comprehensive R Archive Network. A spreadsheet program helps too. If you do not have Microsoft Excel or an equivalent, consider OpenOffice.

You also need to install some packages. Once R is installed on your computer, open it and find the command prompt, which looks like this: >

Type the following and hit enter or return: chooseCRANmirror().

R will then ask you to choose a server (a CRAN "mirror") from which to download packages. After you do, enter the following line of code. It downloads the packages we need to make our map, along with the other packages they rely on:

install.packages(c("foreign","sp","RColorBrewer"), dependencies=TRUE)

When you're done installing the packages, tell R to load them into memory:

# Load packages
library(sp)
library(RColorBrewer)
library(foreign)

Now you need data to plot and a map to plot them on. Think in terms of a spreadsheet: one column holds the values to plot, and an adjacent column holds the names of the geographic units to which each value corresponds. I went to the Sri Lankan Department of Elections and drilled down to the district-level summaries. Then I made a spreadsheet and saved it as comma-separated values (CSV). Put that file in the folder where you plan to do your work.
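To make that layout concrete, here is a minimal sketch that writes such a file from R. The district names and numbers are placeholders, not real results. The column names district and to.prop.total are the ones used later in this post; the file name example-layout.csv is hypothetical.

```r
# Placeholder rows illustrating the layout only; these are NOT real election results.
example <- data.frame(
  district      = c("District A", "District B", "District C"),  # must match map names
  to.prop.total = c(0.71, 0.64, 0.58),  # turnout as a proportion between 0 and 1
  stringsAsFactors = FALSE
)

# Save as CSV without R's row numbers, the way a spreadsheet program would.
write.csv(example, "example-layout.csv", row.names = FALSE)
```

Storing turnout as a proportion rather than a percentage keeps it consistent with the 0 to 1 interval breaks used further on.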

Tell R to work from that folder by setting the "working directory" to it. In Windows, this command is "Change dir" under the File menu. On a Mac, it's "Change Working Directory" under the Misc menu. If you use Linux, I probably don't need to tell you about this step.
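The same step can be done from the command prompt with setwd(), which also works on Linux and inside scripts. In this sketch the folder is a temporary one, chosen only so the code runs anywhere; substitute the path to your own folder.

```r
# Replace this with the folder holding your CSV, e.g. "C:/Users/you/sri-lanka"
# or "~/sri-lanka" (R accepts forward slashes on Windows too).
work_dir <- file.path(tempdir(), "sri-lanka-maps")
dir.create(work_dir, showWarnings = FALSE, recursive = TRUE)

setwd(work_dir)      # same effect as the menu commands
print(getwd())       # confirm where R is now working
print(list.files())  # your CSV should be listed here once copied in
```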

Next, get your map, that is, the boundary file (what GIS users often call a shapefile; here it comes packaged as an R data file). Head over to the Global Administrative Areas (GADM) country download page. Pick your country from the drop-down and choose the file format for R. For Sri Lanka, you'll notice three levels available. Level 1 contains the district information (spatial polygons, in the language of GIS). Copy the corresponding file URL and load it into R as follows. You will end up with an R object called "gadm."

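# Open a connection to the GADM file, load it, then close the connection
connection <- url("http://www.filefactory.com/file/fa7ipjv04gv/n/LKA_adm1_RData")
print(load(connection))  # load() returns the names of the objects it created
close(connection)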
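Reading straight from a url() connection downloads the file every session. A minimal alternative, assuming you are happy to keep a local copy, is to download once with download.file() and then load() the saved file. The function name load_cached_rdata and the local file name below are hypothetical.

```r
# Hypothetical helper: download an .RData file once, then reuse the local copy.
load_cached_rdata <- function(url, destfile, envir = globalenv()) {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the binary file intact on Windows
    download.file(url, destfile, mode = "wb")
  }
  # load() restores the saved objects into envir and returns their names
  load(destfile, envir = envir)
}
```

For example, print(load_cached_rdata("http://www.filefactory.com/file/fa7ipjv04gv/n/LKA_adm1_RData", "LKA_adm1.RData")) should report the same object name as the code above, and later runs read the local file instead of downloading again.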

It's important that the names of the geographic units in your CSV file exactly match the names of the units in the map, character for character. In this case, the map names are stored in the variable "NAME_1." Type gadm$NAME_1 to display and examine these names, then open your CSV file and make sure every name matches.

Load your election data. One way is skl <- read.csv("srilanka2012.csv"). The name of my spreadsheet is in the quotation marks, and the assignment stores its contents in an R object (called a "data frame") named "skl." Note that read.csv() is built into R, so this step does not actually need the "foreign" package; that package comes in handy when your data arrive as Stata or SPSS files instead.
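With both gadm and skl in memory, setdiff() turns the name check into a quick test. The helper check_names is hypothetical; it lists names found on one side but not the other, so two empty results mean every district should find its partner in the merge. It deliberately does not trim stray spaces, since merge() will not either.

```r
# Hypothetical helper: compare map names with spreadsheet names.
check_names <- function(map_names, data_names) {
  map_names  <- as.character(map_names)   # factors become their labels
  data_names <- as.character(data_names)
  list(
    in_map_not_data = setdiff(map_names, data_names),
    in_data_not_map = setdiff(data_names, map_names)
  )
}

# With the objects from this post (not run here):
# check_names(gadm$NAME_1, skl$district)
```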

Merge the election data with the map data. You're matching the observations by district name, which, in my spreadsheet, is a column called "district." Note that our new workhorse data frame is "skl.map."

skl.map <- merge(gadm, skl, by.x="NAME_1", by.y="district", sort=TRUE)
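A caution about row order. If merge() hands back a plain data frame (which, depending on your version of sp, it may do), sort=TRUE orders its rows by district name, and that order need not match the order of the polygons in gadm. The toy example below uses small stand-in data frames with placeholder values to show how match() recovers the alignment by name.

```r
# Toy stand-ins with placeholder values: 'polys' plays gadm's attribute table
# (polygon order); 'votes' plays the spreadsheet.
polys <- data.frame(NAME_1 = c("Kandy", "Ampara", "Colombo"),
                    stringsAsFactors = FALSE)
votes <- data.frame(district = c("Colombo", "Kandy", "Ampara"),
                    to.prop.total = c(0.5, 0.6, 0.7),
                    stringsAsFactors = FALSE)

merged <- merge(polys, votes, by.x = "NAME_1", by.y = "district", sort = TRUE)
print(merged$NAME_1)  # alphabetical order, not polygon order

# For each polygon, match() finds the row of 'merged' with the same name.
idx <- match(polys$NAME_1, merged$NAME_1)
polys$to.prop.total <- merged$to.prop.total[idx]
print(polys)
```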

The next two lines sort our data points into left-closed intervals (each includes its lower bound but not its upper), name those intervals, and thereby lay the groundwork for colors and a map legend. All this information will live in the variable called "col_no.10ryg."

# Bin turnout into ten left-closed intervals: [0,0.1), [0.1,0.2), ..., [0.9,1]
col_no.10ryg <- cut(as.numeric(skl.map$to.prop.total),
                    c(0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1),
                    include.lowest=TRUE, right=FALSE, ordered_result=TRUE)
# Give the intervals readable names for the legend
levels(col_no.10ryg) <- c("0-10%","10-20%","20-30%","30-40%","40-50%","50-60%","60-70%","70-80%","80-90%","90-100%")
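To see what right=FALSE and include.lowest=TRUE do at the edges, here is cut() with the same breaks applied to a few placeholder values. A value equal to a cutpoint falls into the interval that starts there, and include.lowest=TRUE (which, with right=FALSE, closes the top end) keeps a value of exactly 1 in the last bin rather than returning NA.

```r
# Same breaks and options as above, applied to placeholder boundary values.
breaks <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1)
x <- c(0, 0.05, 0.1, 0.95, 1)
bins <- cut(x, breaks, include.lowest = TRUE, right = FALSE, ordered_result = TRUE)

# Show each value next to the interval it landed in.
print(data.frame(x = x, bin = bins))
```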

At this point, the mathematically inclined might quibble that the level names (e.g., "10-20%") don't show which end of each interval is closed. I'm not concerned here because none of our voter turnout rates lands exactly on a cutpoint.
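Rather than eyeballing the data, you can test for values that sit on a cutpoint. The helper on_cutpoint is hypothetical; it compares with a small tolerance because proportions computed by division rarely hit a break exactly in floating point, and it returns the row positions of any near-ties.

```r
# Hypothetical helper: positions of values within 'tol' of any break.
on_cutpoint <- function(x, breaks, tol = 1e-9) {
  x <- as.numeric(x)
  hits <- vapply(x, function(v) any(abs(v - breaks) < tol), logical(1))
  which(hits)  # NA values are dropped by which()
}

# With the data from this post (not run here); an empty result supports
# the claim that no turnout rate lands on a cutpoint:
# on_cutpoint(skl.map$to.prop.total, seq(0, 1, by = 0.1))
```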

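Now copy the intervals into the "gadm" map object. Matching on district name ensures each interval lands on the right polygon, even if the merged rows come back in a different order from the polygons.

# Align each polygon with its row in skl.map by district name
gadm$col_no.10ryg <- col_no.10ryg[match(gadm$NAME_1, skl.map$NAME_1)]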

Then generate a palette of ten colors, which will live in the variable called "myPalette."

myPalette <- brewer.pal(10, "RdYlGn")
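brewer.pal() can only supply so many colors per palette (the ColorBrewer scheme RdYlGn stops at 11, Reds at 9), and the number of colors should equal the number of intervals. A small hypothetical helper, palette_for, checks both against RColorBrewer's brewer.pal.info table before building the palette.

```r
library(RColorBrewer)

# Hypothetical helper: one color per factor level, with clear errors when the
# palette is unknown or too short.
palette_for <- function(f, name) {
  if (!name %in% rownames(brewer.pal.info)) stop("Unknown palette: ", name)
  n <- nlevels(f)
  max_n <- as.integer(brewer.pal.info[name, "maxcolors"])
  if (n > max_n) {
    stop(sprintf("Palette %s has at most %d colors; %d requested.", name, max_n, n))
  }
  pal <- brewer.pal(max(n, 3), name)  # ColorBrewer palettes start at 3 colors
  pal[seq_len(n)]
}

# With the objects from this post (not run here):
# myPalette <- palette_for(col_no.10ryg, "RdYlGn")
```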

At last, here comes the fun part. The first line tells R to make a PDF file in your working directory. The second line does a few things: it takes our map, calls up the variable of interest, outlines the regions in dark gray (grey() runs from 0 for black to 1 for white, so a larger number gives a lighter shade), fills the regions with our palette, and puts a nice title on everything. Because spplot() produces a lattice plot, wrapping it in print() makes sure it is drawn even when the script is run with source(). The third line closes the PDF device, which writes the file to disk.

pdf(file="my-pretty-map.pdf")
print(spplot(gadm, "col_no.10ryg", col=grey(.3), col.regions=myPalette,
             main="Sri Lanka: 2010 Presidential Turnout by District"))
dev.off()
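If you make many maps, the open, draw, close pattern can be wrapped in a function. This is a minimal sketch; the name save_plot is hypothetical. It uses on.exit() so the device is closed even if drawing fails, and print() because lattice-based plots such as spplot() are drawn only when printed.

```r
# Hypothetical helper: open a graphics device, draw a lattice plot, always close.
save_plot <- function(plot_obj, file, device = pdf, ...) {
  device(file, ...)   # pdf(), png() and jpeg() all take the file path first
  on.exit(dev.off())  # runs even if print() below throws an error
  print(plot_obj)     # lattice/trellis objects draw only when printed
  invisible(file)
}
```

For example, save_plot(spplot(gadm, "col_no.10ryg", col.regions=myPalette), "my-pretty-map.png", device=png, width=800, height=800) would write a PNG instead; png() measures width and height in pixels by default.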

If you've downloaded the CSV file that I included with this post, you'll notice it includes a variable called "rej.prop.total." This column includes the proportion of ballots that were invalid in each district. Below I modify the code above to make a map of these invalid rates. I specify nine intervals, and I tell RColorBrewer to make a palette of reds.

# Bin invalid-ballot rates into nine half-percentage-point intervals
col_no.error <- cut(as.numeric(skl.map$rej.prop.total),
                    c(0,0.005,0.01,0.015,0.02,0.025,0.03,0.035,0.04,0.045),
                    include.lowest=TRUE, right=FALSE, ordered_result=TRUE)
levels(col_no.error) <- c("0.0-0.5%","0.5-1.0%","1.0-1.5%","1.5-2.0%","2.0-2.5%","2.5-3.0%","3.0-3.5%","3.5-4.0%","4.0-4.5%")
# Align each polygon with its row in skl.map by district name
gadm$col_no.error <- col_no.error[match(gadm$NAME_1, skl.map$NAME_1)]
# Nine shades of red, one per interval
myPalette <- brewer.pal(9, "Reds")
pdf(file="my-other-pretty-map.pdf")
print(spplot(gadm, "col_no.error", col=grey(.8), col.regions=myPalette,
             main="Sri Lanka: 2010 Presidential Invalid Rates by District"))
dev.off()
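One thing to check with narrow intervals like these: cut() returns NA for any value outside the breaks, so a district with an invalid rate above 4.5% would typically end up with no color on the map. This sketch applies the same breaks to placeholder values and prints a message if anything falls outside them.

```r
# Same nine-interval breaks as above, applied to placeholder values.
breaks <- c(0, 0.005, 0.01, 0.015, 0.02, 0.025, 0.03, 0.035, 0.04, 0.045)
x <- c(0.004, 0.021, 0.05)  # the last value lies above the top break
bins <- cut(x, breaks, include.lowest = TRUE, right = FALSE, ordered_result = TRUE)
print(bins)

# Flag values that fell outside every interval before drawing the map.
if (anyNA(bins)) {
  message(sum(is.na(bins)), " value(s) fall outside the breaks; consider widening them.")
}
```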

Perhaps you do not want PDF files. Simply replace the pdf() call with png() or jpeg() and give the file a matching extension, e.g. png(filename="my-pretty-map.png").

You now have a free-of-charge method for creating choropleth maps (subject, of course, to the availability of shapefiles and whatever data you want to represent). The code above can be customized to your needs, especially as you learn more about R.

Jack Santucci is a Ph.D. student working on parties and elections in Georgetown University's Government Department.
