The Reuters Graphics article “India is running out of water” mixes text with detailed yet beautiful graphics to explore India’s water usage. The article shows how some regions are using more groundwater than is being replenished. Most of the charts are straightforward, but this summary chart comparing India with other countries puzzled me.
It uses lengths to show water withdrawal amounts by source, groundwater (from wells) and surface water (from lakes and rivers), and bar breadths to show population. I’m avoiding the traditional width and height terms since the chart is rotated. What about area? It’s a dominant feature of the graph, but it doesn’t encode anything meaningful, only total water × population, and that was the source of my confusion. By area, for instance, the USA is about 10-15% of India, but its water usage is more like 60%.
My rule is that every encoding channel in use should have some meaningful interpretation. Otherwise, our visual perception system and conscious brain have to spend effort negotiating what to ignore.
This chart form is sometimes called a “bar-mekko,” a hybrid between a (stacked) bar chart and a marimekko or mosaic chart. The bars have variable breadth according to some other variable. It’s hard to pull off in general, and it works best when one of the variables is a proportion so that the product is the count.
To better understand the chart form, I set out to make a bar-mekko chart with this data where breadth, length and area all had meaningful interpretations.
Getting the data
The article cites several data sources, and I found most of the country-level data from the UN’s FAO AQUASTAT database. For some reason India and USA were not there, so I read those values off the original chart. In the process of understanding the data, I found two errors.
It’s always good to make a diagnostic chart or two when looking at new data, so I charted the top groundwater users to see whether I was missing any major ones. I was surprised that the Republic of Moldova was the top withdrawer of groundwater.
For a moment, I thought Moldova might have massive wells that supply all of Eastern Europe, but looking closer at the data, I saw that Moldova’s previous groundwater usage figure was about 1,000 times smaller, 0.129 versus 126. So it was likely a misplaced decimal point. I reported the issue to the AQUASTAT contact, who responded quickly, confirmed the issue and corrected the online database. Yay!
I discovered the other data error while reading the India values from the original chart. Its X axis is in millions of liters. 600 million liters per year doesn’t seem like much for a country with over a billion people. Even as daily usage that wouldn’t be much. I now think it should have been trillions of liters instead. The AQUASTAT data is in billions of cubic meters, and I suspect the author divided by 1000 instead of multiplying by 1000 to convert cubic meters to liters. After I pointed that out, the original image was corrected. Double yay!
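A check like this is easy to automate. The sketch below is only an illustration, not the method I actually used: the function name `flag_scale_jumps` and the 100x threshold are my own assumptions, and the only data in it are the two Moldova figures quoted above.

```python
def flag_scale_jumps(values, factor=100.0):
    """Return (index, ratio) pairs where a value differs from the
    previous one by at least `factor` times, in either direction.

    `factor` is an arbitrary illustrative threshold, not a standard.
    """
    flagged = []
    for i in range(1, len(values)):
        prev, cur = values[i - 1], values[i]
        if prev <= 0 or cur <= 0:
            continue  # ratios are meaningless for zero or negative values
        ratio = max(cur / prev, prev / cur)
        if ratio >= factor:
            flagged.append((i, ratio))
    return flagged


# The two Moldova groundwater figures mentioned in the text.
moldova = [0.129, 126]
flags = flag_scale_jumps(moldova)
print(flags)  # one flagged jump, with a ratio a little under 1000
```

A ratio that close to a power of ten is a strong hint of a decimal point or unit slip rather than a real change.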
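The arithmetic behind that suspicion can be sketched in a few lines. This assumes, as I suspect but cannot confirm, that the 600 read off the axis started out as roughly 600 billion cubic meters; the function names are made up for the illustration.

```python
LITERS_PER_CUBIC_METER = 1_000
BILLION = 10**9


def billion_m3_to_liters(value_billion_m3):
    """Correct conversion: billions of cubic meters to liters."""
    return value_billion_m3 * BILLION * LITERS_PER_CUBIC_METER


def suspected_wrong_conversion(value_billion_m3):
    """The suspected slip: dividing by 1000 instead of multiplying."""
    return value_billion_m3 * BILLION // LITERS_PER_CUBIC_METER


reading = 600  # assumed to be in billions of cubic meters
correct = billion_m3_to_liters(reading)      # 600 trillion liters
wrong = suspected_wrong_conversion(reading)  # 600 million
print(correct, wrong, correct // wrong)
```

Dividing instead of multiplying leaves the result off by a factor of a million, which is exactly the gap between millions and trillions.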
Remake as bar-mekko
To follow the meaningful area rule, I sought to keep population as the rectangle breadth and use per-capita water use as the length, which would make area correspond to total usage.
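Here is a minimal sketch of that geometry, leaving out the drawing itself. The function name and the two rows are placeholders I made up for illustration; they are not AQUASTAT values, and the units are arbitrary.

```python
def bar_mekko_rects(rows):
    """Compute bar-mekko rectangles from
    (name, population, groundwater, surface_water) tuples.

    Breadth is population, length is per-capita use (stacked by source),
    so breadth * length recovers the country's total use.
    """
    rects = []
    offset = 0.0  # running position along the breadth axis
    for name, population, groundwater, surface in rows:
        length = (groundwater + surface) / population
        rects.append({
            "name": name,
            "offset": offset,
            "breadth": population,
            "groundwater_length": groundwater / population,
            "surface_length": surface / population,
            "length": length,
            "area": population * length,
        })
        offset += population
    return rects


# Hypothetical placeholder rows, NOT real data.
rows = [
    ("Country A", 100.0, 30.0, 20.0),
    ("Country B", 20.0, 10.0, 20.0),
]
rects = bar_mekko_rects(rows)
for r in rects:
    print(r["name"], r["breadth"], r["length"], r["area"])
```

Country A has the larger area (total use) but the shorter bar (per-capita use), which is the same tension that shows up in the remake.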
It works, but the main message of the graph, showing how much groundwater India uses, is harder to make out. It’s still there, both in the area sizes and in the rank order of the Y axis, but the length of the bar along the X axis is most noticeable and easiest to compare.
Other features of note: I’m not sure why this subset of countries was chosen for the original article, but I added Pakistan since it also has a lot of water usage and is near India. Oddly, Pakistan’s per-capita usage is more like that of the USA. Also, the unlabeled bar is Spain. Labeling is a challenge with variable-breadth bars; I could have used a tiny font like the original did, or squeezed in a special label, but I was too lazy.
Remake as mosaic
A traditional mosaic chart is more about proportions and makes it easier to see that India is more reliant on groundwater than most other large countries.
The breadths correspond to water usage for the entire country, the lengths correspond to the relative breakdown within country, and the areas correspond to the water usage for that country and source. Now it’s the proportion of each water source that’s easiest to compare.
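The same mapping can be written as a layout calculation. As before, this is a sketch under assumptions: `mosaic_tiles` is a made-up name, the rows are placeholders rather than real figures, and breadths are normalized to shares of the grand total.

```python
def mosaic_tiles(rows):
    """Compute mosaic tiles from (name, groundwater, surface_water) tuples.

    Breadth is the country's share of all water use, length is the
    source's share within the country, and area is their product.
    """
    grand_total = sum(ground + surface for _, ground, surface in rows)
    tiles = []
    offset = 0.0  # running position along the breadth axis
    for name, ground, surface in rows:
        country_total = ground + surface
        breadth = country_total / grand_total
        for source, amount in (("groundwater", ground), ("surface", surface)):
            share = amount / country_total
            tiles.append({
                "name": name,
                "source": source,
                "offset": offset,
                "breadth": breadth,
                "length": share,
                "area": breadth * share,
            })
        offset += breadth
    return tiles


# Hypothetical placeholder rows, NOT real data.
rows = [("Country A", 30.0, 20.0), ("Country B", 10.0, 20.0)]
tiles = mosaic_tiles(rows)
for t in tiles:
    print(t["name"], t["source"], round(t["length"], 3), round(t["area"], 3))
```

Every bar has the same total length of 1, so the only thing left to compare along that axis is the groundwater share.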
Remake as bars
Regarding the general value of bar-mekko charts, Dan Zvinca noted on Twitter:
I believe that it is always easier (and uniform) to compare components and the results of any math calculation (that includes basic arithmetic operations as well) than encoding the components and the results in one view.
That’s the thinking behind my bar chart version.
Now it’s easy to compare both the total water usage and the groundwater usage among countries without the area distraction or the tight labeling challenges.