Continuing the theme of mapping choices, this post examines choices in a different kind of map data visualization. In the graphic from the Washington Post article “Reservoirs are drying up as consequences of the Western drought worsen,” the geographic map provides context and each “data mark” is itself a miniature graph.
The blue filled triangles are striking and immediately evocative of partially filled reservoirs. The fills are sized by area, which seems correct, but they are still very easy to misread. As Christopher Ingraham put it, “the proportions are basically like an optical illusion”.
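To make the misreading concrete, here is a minimal sketch in Python (not JSL, and not code from the Post). It assumes each mark is a point-down triangle filled from the tip upward, which is my reading of the graphic; the function name `fill_height_fraction` is made up for illustration. Under that assumption the filled region is a smaller similar triangle, so its area grows with the square of the water-line height.

```python
import math


def fill_height_fraction(fraction_full: float) -> float:
    """Height of the water line, as a fraction of the full triangle height,
    for a point-down triangle whose filled AREA equals fraction_full.

    Assumption: the triangle is filled from its bottom tip upward.
    """
    if not 0.0 <= fraction_full <= 1.0:
        raise ValueError("fraction_full must be between 0 and 1")
    # The filled part is a similar triangle, so area scales with height squared.
    return math.sqrt(fraction_full)


if __name__ == "__main__":
    for pct in (10, 25, 50, 75):
        h = fill_height_fraction(pct / 100)
        print(f"{pct}% full by area -> water line at {h:.0%} of the height")
```

Under this assumption, a reservoir at 25% of capacity has its water line halfway up the triangle, so a reader who judges by height will overestimate how full it is.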
I was able to get the California reservoir data easily from the Daily Reservoir Storage Summary at the California Department of Water Resources, cited by the Post story. That summary table doesn’t have the location data, but each reservoir station ID in the table links to a station metadata page with longitude and latitude. Since the metadata page URLs follow a consistent pattern using the station ID, I was able to write a little JMP script to loop through all the station IDs and extract the location fields.
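    dt = Open( "california reservoirs.jmp" );
    // Add empty columns to hold each station's coordinates.
    dt << New Column( "lon", Format( "Longitude DDD", "PUNDIR", 14, 4 ) );
    dt << New Column( "lat", Format( "Latitude DDD", "PUNDIR", 14, 4 ) );
    For Each Row( dt,
        // Import the first HTML table on the station's metadata page.
        dt meta = Open(
            "https://cdec.water.ca.gov/dynamicapp/staMeta?station_id=" || :StaID,
            HTML Table( 1, Column Names( 1 ), Data Starts( 2 ) )
        );
        // Row 3 of that table holds the longitude (column 4) and latitude (column 2).
        dt:lon = Num( dt meta[3, 4] );
        dt:lat = Num( dt meta[3, 2] );
        Close( dt meta, "NoSave" );
        // Pause briefly between requests.
        Wait( 0.1 );
    );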
I also added Lake Mead manually but didn’t add the Oregon reservoirs. A simplistic view of the data with dots on a map, sized by capacity and colored by percent full, underscores the two main mapping choices:
- How to visually encode the data. Circle size is OK for the capacity, but color is not working well for percent full. Maybe a better color scheme would help, but it still seems problematic.
- How to handle the overlapping marks. Some of the reservoir locations are close enough that any proportionally sized marks will overlap each other.
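The overlap problem is easy to check for. Below is a rough Python sketch (not JSL, and not what I used for the charts here) that flags pairs of circle marks that would overlap when circle area is proportional to capacity. The function names, the `scale` parameter, and the sample marks A, B and C are all hypothetical, and the coordinates are assumed to already be in projected plot units rather than longitude and latitude.

```python
import math
from itertools import combinations


def radius_for(capacity: float, scale: float = 1.0) -> float:
    """Radius of a circle whose area is scale * capacity."""
    return math.sqrt(scale * capacity / math.pi)


def overlapping_pairs(marks, scale: float = 1.0):
    """Return the names of every pair of marks whose circles overlap.

    marks: iterable of (name, x, y, capacity) in plot units (an assumption).
    """
    pairs = []
    for (n1, x1, y1, c1), (n2, x2, y2, c2) in combinations(marks, 2):
        distance = math.hypot(x2 - x1, y2 - y1)
        # Two circles overlap when their centers are closer than the sum of radii.
        if distance < radius_for(c1, scale) + radius_for(c2, scale):
            pairs.append((n1, n2))
    return pairs


if __name__ == "__main__":
    # Made-up marks, each with radius 1 at scale 1.0.
    sample = [("A", 0.0, 0.0, math.pi), ("B", 1.5, 0.0, math.pi), ("C", 5.0, 0.0, math.pi)]
    print(overlapping_pairs(sample))
```

A list like this only identifies which marks collide; deciding where to move them, or whether to use transparency instead, is still a design choice.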
I’m mainly interested in the data encoding choice, but I’ll briefly comment on the overlapping choice. The WP version appears to use custom relocation of the overlapping reservoir marks (aka dodging). Seems reasonable, but it’s a little disappointing that there’s no indication of it. That is, I wouldn’t have realized the locations were off except by remaking the chart myself. Another option is to use transparency as in my bubble chart above. A third option is to pull out the mini-charts with pointers to the actual locations, as done in this reservoir supply map on the Department of Water Resources site.
Back to the data encoding choice, I also (before seeing the above chart) thought of using rectangles instead of triangles for better proportion encoding. I tried a few rectangle shapes and I think I like the 2:3 ratio best. Here is my version with partial overlap dodging and using background tiles from Stamen Design.
I do think the proportions are better represented. I haven’t included the annotations like in the original only due to my laziness. They definitely add value. I added a horizontal line for the historical average volume of each reservoir. Not sure how relevant it is, but it’s prominent in the Dept. of Water Resources tables and charts, so I’m following their lead and including it.
Here are two other rectangle aspect ratios I tried. In the left image, all the rectangles have equal heights. In the right image, all are squares. The background imagery in these is from Natural Earth Data.
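For anyone who wants to reproduce the rectangle sizing, here is a minimal Python sketch (again not JSL, and not my marker-drawing script). It assumes the rectangle's area is proportional to capacity and that the aspect ratio means width divided by height; the function names and the `scale` parameter are made up for illustration. Because a rectangle has constant width, the fill height is simply the fraction full times the total height, which is why the proportions read more directly than in a triangle.

```python
import math


def rect_mark(capacity: float, fraction_full: float,
              aspect: float = 2 / 3, scale: float = 1.0):
    """Return (width, height, fill_height) for a fixed-aspect rectangle.

    Area is scale * capacity; aspect is width / height (an assumption).
    Use aspect=1 for squares.
    """
    area = scale * capacity
    height = math.sqrt(area / aspect)
    width = aspect * height
    # Constant width means filled area and fill height share the same fraction.
    return width, height, fraction_full * height


def rect_mark_equal_height(capacity: float, fraction_full: float,
                           height: float = 1.0, scale: float = 1.0):
    """Variant where every rectangle has the same height and width carries capacity."""
    width = scale * capacity / height
    return width, height, fraction_full * height


if __name__ == "__main__":
    print(rect_mark(6.0, 0.5))               # 2:3 rectangle
    print(rect_mark(4.0, 0.25, aspect=1.0))  # square
    print(rect_mark_equal_height(3.0, 0.5))  # equal-height variant
```

The same fraction-times-height calculation would place a reference line such as the historical average.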
As an implementation note, part of my interest here is in being able to exercise JMP’s new support for custom marker drawing. I supplied a script to draw each data mark as a pair of filled rectangles, and it worked pretty well. Another technique would be to make an image for each of the mini-graphs and then use the images as markers, sized by the capacity.
Finally, a note on data quality. One insight from remaking charts is an appreciation of data issues. The California data was easy to read; Lake Mead took a little more searching, and I didn’t see the Oregon reservoirs neatly available. I did notice one discrepancy in the reservoir locations. The WP version has an unlabeled reservoir in the northeast, and it’s missing the Indian Valley reservoir that I have south of Oroville. I wonder if it got misplaced during the dodging work or if I’m missing something.
This was a fun chart to remake. I’m still intrigued by the triangle area encoding. It seems so right since it resembles a lake profile and area itself is not a terrible data encoding dimension. Yet, it’s hard for my brain to look past the vertical proportion.