A recent Reuters Graphics article, Bats and the Origin of Outbreaks, won instant praise for its beautiful illustrations and charts. The whole piece has a nice scientist’s notebook vibe, with the illustrations, charts, fonts and colors all working together to support the theme. However, one of the charts looks suspicious. It looks like it’s showing something important, such as a sharp increase in bat species endangerment, but now I’m thinking the time component isn’t adding any real value.

Stacked area chart from the original article showing counts on the Y axis and year of assessment on the X axis.

The data comes from the IUCN Red List of Threatened Species and, at best, shows the growth of the database over time rather than anything about bats themselves. The downloadable database used for the chart only has the most recent assessment for each species, so earlier assessments are not shown at all, making the first part of the chart even less useful. That said, multiple assessments don’t appear to be that common. While I couldn’t find a single dataset with historical assessments, you can see a few by looking at the page for an individual species. For instance, Rousettus madagascariensis shows two previous assessments. That information appears to be embedded in JSON blocks within the web page, so one could assemble a full historical dataset with some effort by visiting each species page.
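To see why a latest-assessment-only download turns the time axis into a record of database activity, here is a minimal Python sketch using a small made-up assessment history (the species names, years and categories are hypothetical, not IUCN data). Keeping only each species’ most recent assessment and then counting cumulatively by year erases the earlier assessments entirely, so the chart starts late and rises as species get (re)assessed.

```python
from collections import Counter

# Hypothetical assessment history: (species, year, category).
# These rows are made up for illustration; they are not IUCN data.
history = [
    ("Species A", 1996, "LC"),
    ("Species A", 2008, "LC"),
    ("Species A", 2019, "VU"),
    ("Species B", 2008, "LC"),
    ("Species B", 2016, "NT"),
    ("Species C", 2020, "EN"),
]

def latest_only(rows):
    """Keep just the most recent assessment per species, mimicking a
    download that drops earlier assessments."""
    latest = {}
    for species, year, category in rows:
        if species not in latest or year > latest[species][0]:
            latest[species] = (year, category)
    return latest

def cumulative_by_year(latest):
    """Cumulative count of species by the year of their latest assessment."""
    per_year = Counter(year for year, _ in latest.values())
    total = 0
    result = {}
    for year in sorted(per_year):
        total += per_year[year]
        result[year] = total
    return result

print(cumulative_by_year(latest_only(history)))
# {2016: 1, 2019: 2, 2020: 3}
```

Species A was first assessed in 1996, yet nothing appears before 2016: the curve tracks when the latest assessments happened, not how bats are faring.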

Screen capture of a bit of data in JSON format.

However, I don’t think there’s too much to be gained, since re-assessments seem uncommon in my spot checks. Without a reliable assessment history, I think a more useful chart would simply show the current breakdown of assessments. For that, we have the usual part-to-whole choices. A straight 100% bar chart would leave the small, most-endangered blocks too small to see and label, so here I’ve broken it up into three columns.
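A quick way to check whether the small blocks in a single 100% bar would be readable is to compute each segment’s width at a given bar size. This sketch uses made-up counts (not the real bat numbers) and assumes a 600-pixel bar and a 40-pixel minimum width for a label; both numbers are arbitrary choices for illustration.

```python
# Hypothetical counts per IUCN category; NOT the real bat figures.
counts = {"LC": 700, "DD": 200, "NT": 90, "VU": 110, "EN": 60, "CR": 25}

BAR_WIDTH_PX = 600  # assumed width of a single 100% bar
MIN_LABEL_PX = 40   # assumed minimum segment width for a readable label

def segment_widths(counts, bar_width):
    """Width of each segment when the counts fill one bar of bar_width."""
    total = sum(counts.values())
    return {cat: bar_width * n / total for cat, n in counts.items()}

widths = segment_widths(counts, BAR_WIDTH_PX)
too_small = [cat for cat, w in widths.items() if w < MIN_LABEL_PX]
print(too_small)  # ['EN', 'CR']
```

With these made-up numbers the Critically Endangered segment is under 13 pixels wide, which is the kind of problem that splitting the bar into separate columns avoids.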

Stacked bar chart of bat species endangerment levels.

For all my graphs, I’ve re-ordered the assessment levels so that Data Deficient is between Least Concern and the threatened groups. That seems fairer, since those species could go either way.
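The re-ordering can be expressed as an explicit rank list used as a sort key. This is a sketch under one reading of the order described: IUCN abbreviations, with Data Deficient placed immediately after Least Concern and before Near Threatened and the threatened categories (placing it after Near Threatened would be an equally valid reading). The counts are hypothetical.

```python
# Assumed display order: Data Deficient (DD) right after Least Concern (LC),
# then Near Threatened and the threatened and extinct categories.
CATEGORY_ORDER = ["LC", "DD", "NT", "VU", "EN", "CR", "EW", "EX"]
RANK = {cat: i for i, cat in enumerate(CATEGORY_ORDER)}

def ordered(counts):
    """Return (category, count) pairs sorted by the custom display order."""
    return sorted(counts.items(), key=lambda item: RANK[item[0]])

# Hypothetical counts, deliberately given in a scrambled order.
counts = {"CR": 25, "LC": 700, "EN": 60, "DD": 200, "VU": 110, "NT": 90}
print([cat for cat, _ in ordered(counts)])
# ['LC', 'DD', 'NT', 'VU', 'EN', 'CR']
```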

Another technique for dealing with mixing large and small quantities is to use area instead of length for the data encoding. For that, here’s a treemap view.
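The reason area helps is that a shape’s side length grows with the square root of its value, which compresses the range between large and small categories. Here is a minimal sketch comparing a bar whose length is proportional to the value with a square whose area is proportional to it. The 1-to-100 ratio and the 100-unit maximum are arbitrary; treemap cells are rectangles rather than squares, but the same area principle applies.

```python
import math

def bar_length(value, max_value, max_len=100.0):
    """Length encoding: bar length is proportional to the value."""
    return max_len * value / max_value

def square_side(value, max_value, max_side=100.0):
    """Area encoding: a square with area proportional to the value has a
    side proportional to the square root of the value."""
    return max_side * math.sqrt(value) / math.sqrt(max_value)

# A category 1/100 the size of the largest one.
print(bar_length(1, 100))   # 1.0
print(square_side(1, 100))  # 10.0
```

A category that shrinks to a 1-unit sliver as a bar still gets a 10-unit-wide square, which leaves room to see it and perhaps label it.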

Treemap chart of bat species endangerment levels.

Even a pie chart might have been better than the original.

Pie chart of bat species endangerment levels.

So maybe that time series area chart is fluff, but how about the scatterplot showing longevity versus size?

Scatterplot from the original article showing lifespan versus mass for many animal species.

Bat species appear to clearly stand apart from the others, and the article says this combination makes them more susceptible to carrying chronic virus infections. It doesn’t really explain why that is, but the graph is beautiful enough to make me want to try to reproduce it. I got the data from their source, the AnAge Database of Animal Ageing and Longevity, and my first attempt didn’t quite match.

Scatterplot showing lifespan versus adult weight for many animal species.

I had even switched to log scales and filtered out extreme values from sponges and other organisms living thousands of years, but bats still don’t stand out like in the original. It turns out the original view shows only mammals. I don’t see that mentioned in the article, so here is yet another reason to try to reproduce published charts: to understand what’s really being displayed.
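The filtering steps described above (log scales, dropping extreme lifespans, restricting to certain classes) can be sketched with only the Python standard library. This assumes the AnAge download is a tab-separated file with columns named `Class`, `Order`, `Adult weight (g)` and `Maximum longevity (yrs)`; check those names against the actual file before relying on them. Bats are flagged by the order Chiroptera. The sample rows are made up for illustration and are not AnAge values.

```python
import csv
import io
import math

# Assumed AnAge column names; verify them against the downloaded file.
CLASS_COL = "Class"
ORDER_COL = "Order"
WEIGHT_COL = "Adult weight (g)"
LONGEVITY_COL = "Maximum longevity (yrs)"

def load_points(text, classes=("Mammalia",), max_longevity=None):
    """Return (log10 weight, log10 longevity, is_bat) tuples.

    classes=None keeps every class; max_longevity drops extreme ages."""
    points = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        if classes is not None and row[CLASS_COL] not in classes:
            continue
        try:
            weight = float(row[WEIGHT_COL])
            longevity = float(row[LONGEVITY_COL])
        except ValueError:
            continue  # blank or non-numeric cell
        if weight <= 0 or longevity <= 0:
            continue  # log scales need positive values
        if max_longevity is not None and longevity > max_longevity:
            continue
        points.append((math.log10(weight), math.log10(longevity),
                       row[ORDER_COL] == "Chiroptera"))
    return points

# Made-up rows for illustration only; these are NOT real AnAge values.
SAMPLE = "\n".join([
    "Class\tOrder\tAdult weight (g)\tMaximum longevity (yrs)",
    "Mammalia\tChiroptera\t10\t30",
    "Mammalia\tRodentia\t10\t3",
    "Aves\tPasseriformes\t10\t10",
    "Demospongiae\tUnknown\t1000\t10000",
    "Mammalia\tRodentia\t\t4",
])

mammals = load_points(SAMPLE)
mammals_and_birds = load_points(SAMPLE, classes=("Mammalia", "Aves"))
everything_trimmed = load_points(SAMPLE, classes=None, max_longevity=1000)
print(len(mammals), len(mammals_and_birds), len(everything_trimmed))  # 2 3 3
```

The default `classes=("Mammalia",)` corresponds to the mammals-only view of the original chart, `("Mammalia", "Aves")` gives the mammals-and-birds view, and `classes=None` with a `max_longevity` cutoff mimics a first attempt over all species with extreme ages removed.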

Maybe it’s reasonable to show only mammals, but birds seem relevant as a virus source for humans given past bird flu outbreaks, so here are just mammals and birds.

Scatterplot showing lifespan versus adult weight for mammal and bird species.

While the graph is more crowded, we can see that the difference between bats and birds is only apparent at the low end of the size scale.
