Elisabeth Bik tweeted this image regarding a paper studying soil treatments to reduce a watermelon fungus. I wasn’t sure what to make of it since she often tweets about questions of scientific integrity. However, I think this tweet is just sharing the paper without highlighting anything questionable.

Nonetheless, the bottom plots stood out to me as what I sometimes call “Anscombe’s quartet in the wild”. Anscombe’s Quartet is the famous example of four small datasets having the same linear fit yet very different visual characteristics. Here’s a version I made as part of my All Graphs Are Wrong but Some are Useful talk in 2015.
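For readers who want to check the "same linear fit" claim themselves, here is a minimal sketch using the standard published Anscombe values and an ordinary least-squares line from NumPy. The variable names (`quartet`, `fits`) are my own, and this is not the code behind the chart from the talk.

```python
import numpy as np

# Anscombe's quartet: sets I-III share the same x values; set IV has its own.
x123 = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
x4 = np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float)

quartet = {
    "I": (x123, np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                          7.24, 4.26, 10.84, 4.82, 5.68])),
    "II": (x123, np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                           6.13, 3.10, 9.13, 7.26, 4.74])),
    "III": (x123, np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                            6.08, 5.39, 8.15, 6.42, 5.73])),
    "IV": (x4, np.array([6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
                         5.25, 12.50, 5.56, 7.91, 6.89])),
}

# Fit a degree-1 polynomial (a straight line) to each set.
fits = {}
for name, (x, y) in quartet.items():
    slope, intercept = np.polyfit(x, y, 1)
    fits[name] = (slope, intercept)
    print(f"{name:>3}: y = {intercept:.2f} + {slope:.3f} x")
```

All four lines come out essentially the same, which is exactly why the fit summary alone can't tell the four datasets apart.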

The quartet was supposed to show the importance of visualizing data to make sure your statistical method makes sense. However, this and other examples show that it’s not enough to visualize your data; you also have to look at the visualization!

Just to be clear, the main issue I’m taking with the original plots (c) and (d) is the use of a linear fit on data that is clearly not well described by a line-plus-noise model. There’s also a secondary issue of the fit going into the nonsensical region of negative fungus amounts. Nonetheless, I do acknowledge that, just as in the Quartet examples, the fitted result has *some* value: it tells us that the sloped line is a better fit than a flat line and serves as a first-order approximation of the effect.

Trying to investigate further, I used WebPlotDigitizer to capture an approximation of the data in panel (c) and applied my favorite graphical analysis tool, a smooth trend line.

That looks like a truer representation of the relationship. I imagine such non-parametric models aren’t popular in scientific journals, but I’d consider this a good first step that might suggest a different, more acceptable model, such as a logistic curve or even a step curve. However, the smoother still has both flaws of the linear fit, just to a lesser extent: the imbalanced residuals and an unawareness of the non-negative response domain.
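As a rough illustration of the idea, here is a minimal sketch of one kind of smoother: a Gaussian-kernel local linear regression written with NumPy. This is a swapped-in technique, not necessarily what JMP's smoother does, and the data are synthetic stand-ins (a noisy sigmoid-shaped decline), not the digitized points. The function name `local_linear_smooth` and the bandwidth value are my own assumptions.

```python
import numpy as np

def local_linear_smooth(x, y, x_eval, bandwidth):
    """Kernel-weighted local linear fit evaluated at each point of x_eval."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fitted = np.empty(len(x_eval))
    for i, x0 in enumerate(x_eval):
        # Gaussian kernel weights centered on the evaluation point.
        w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
        # np.polyfit squares (w * residual), so pass sqrt(w) to weight by w.
        slope, intercept = np.polyfit(x - x0, y, 1, w=np.sqrt(w))
        fitted[i] = intercept  # value of the local line at x0
    return fitted

# Synthetic, hypothetical data: a sigmoid-shaped decline plus noise.
rng = np.random.default_rng(1)
x = np.linspace(4.0, 8.0, 40)
y = 4.0 / (1.0 + np.exp(3.0 * (x - 6.0))) + rng.normal(0.0, 0.2, x.size)

line = np.polyval(np.polyfit(x, y, 1), x)
smooth = local_linear_smooth(x, y, x, bandwidth=0.3)

rms_line = np.sqrt(np.mean((y - line) ** 2))
rms_smooth = np.sqrt(np.mean((y - smooth) ** 2))
print(f"RMS residual, straight line: {rms_line:.3f}")
print(f"RMS residual, smoother:      {rms_smooth:.3f}")
```

Note that nothing in the smoother knows the response can't be negative, so it shares that blind spot with the straight line.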

Here’s what a 3-parameter logistic fit looks like for the data. My software, JMP, doesn’t give me confidence intervals, probably for a good reason, but it does provide variable standard error values, and the CI shown is my own construction: the fitted curve plus or minus 1.96 times those standard error values.
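For anyone without JMP, a similar fit can be sketched with SciPy's `curve_fit`. Everything here is an assumption on my part: the parameterization in `logistic3`, the synthetic stand-in data, and especially the band, which uses a delta-method standard error of the fitted curve as one plausible way to get pointwise standard errors. It is not a reproduction of JMP's calculation.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic3(x, asymptote, rate, midpoint):
    """Three-parameter logistic; a negative rate gives a declining curve."""
    return asymptote / (1.0 + np.exp(-rate * (x - midpoint)))

# Synthetic, hypothetical data standing in for the digitized points.
rng = np.random.default_rng(0)
x = np.linspace(4.0, 8.0, 30)
y = logistic3(x, 5.0, -3.0, 6.0) + rng.normal(0.0, 0.3, x.size)

# Rough starting values: top of the data, a gentle decline, middle of x.
p0 = [y.max(), -1.0, np.median(x)]
params, cov = curve_fit(logistic3, x, y, p0=p0)

def prediction_se(x_eval, params, cov, eps=1e-6):
    """Delta-method standard error of the fitted curve at each x."""
    jac = np.empty((x_eval.size, len(params)))
    for j in range(len(params)):
        step = np.zeros(len(params))
        step[j] = eps
        # Central finite difference of the curve w.r.t. parameter j.
        jac[:, j] = (logistic3(x_eval, *(params + step))
                     - logistic3(x_eval, *(params - step))) / (2 * eps)
    # Diagonal of jac @ cov @ jac.T without forming the full matrix.
    return np.sqrt(np.einsum("ij,jk,ik->i", jac, cov, jac))

grid = np.linspace(4.0, 8.0, 101)
curve = logistic3(grid, *params)
se = prediction_se(grid, params, cov)
lower, upper = curve - 1.96 * se, curve + 1.96 * se
print("asymptote, rate, midpoint:", np.round(params, 2))
```

With a positive asymptote the fitted curve itself never goes below zero, although a symmetric band built this way still can.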

The logistic fit isn’t bad, but by now I’m starting to read parts of the paper and wondering how the treatment groups are represented in the data. The study consists of two treatment groups (FBOF and FOF) and a control group (OF). The idea is that the treatments will increase helpful bacteria, such as *bacillus*, which will then reduce the harmful fungus, *fusarium*. The top plots in the original panel show the bacteria levels versus the treatment group. Since the *bacillus* values are fairly spread out, we can use that common variable to join the two paired data sets to create a three-variable data set.
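Because digitized values never match exactly, the join has to be a nearest-value match rather than an equality match. Here is a minimal sketch of that idea; the numbers are made up for illustration (not the paper's data), and the function name `nearest_join` and the tolerance are my own assumptions.

```python
import numpy as np

# Hypothetical digitized values, NOT the paper's data.
# Top panel: (treatment group, bacillus level).
group_panel = [("OF", 5.02), ("OF", 5.31), ("FOF", 6.10),
               ("FBOF", 7.24), ("FBOF", 6.85)]
# Bottom panel: columns are (bacillus level, fusarium level).
fit_panel = np.array([[7.25, 0.4], [5.00, 4.1], [6.12, 2.0],
                      [6.83, 0.9], [5.33, 3.6]])

def nearest_join(group_panel, fit_panel, tol=0.05):
    """Match each group-panel point to the closest bacillus value below."""
    rows, used = [], set()
    for group, bacillus in group_panel:
        gaps = np.abs(fit_panel[:, 0] - bacillus)
        i = int(np.argmin(gaps))
        # Refuse ambiguous matches: too far away, or already claimed.
        if gaps[i] > tol or i in used:
            raise ValueError(f"no clean match for {group} at {bacillus}")
        used.add(i)
        rows.append((group, bacillus, float(fit_panel[i, 1])))
    return rows

joined = nearest_join(group_panel, fit_panel)
for group, bacillus, fusarium in joined:
    print(f"{group:<5} bacillus={bacillus:.2f} fusarium={fusarium:.1f}")
```

This only works when the shared values are well separated; a cluster of near-identical *bacillus* values leaves the group assignment ambiguous.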

Here’s the original plot with a separate linear fit for each treatment group, using the original group coloring.

The high green point is part of a cluster of red and green values with similar *bacillus* values, so possibly it should be red. I used green since another graph in the paper shows the green group having a higher average *fusarium* response than the red group.

Given the wide confidence intervals, and the corresponding large p-values, you might conclude that *fusarium* levels are not related to the *bacillus* levels. However, since the blue region has little overlap with the others, it would be more accurate to say there’s no *local* correlation. I didn’t try to integrate the *trichoderma* data, which might be another confounder, in addition to the treatment itself.
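The difference between a pooled correlation and a local one is easy to demonstrate with toy numbers. In this sketch the data are entirely made up (deliberately constructed so each group has no trend of its own) and only the group labels are borrowed from the paper; `scipy.stats.linregress` supplies the slopes and p-values.

```python
import numpy as np
from scipy.stats import linregress

# Made-up illustration data, NOT the paper's values: each group is flat
# on its own, but the groups sit at different levels.
data = {
    "OF":   ([5.0, 5.2, 5.4, 5.6], [4.0, 4.2, 3.9, 4.1]),
    "FOF":  ([6.4, 6.6, 6.8, 7.0], [1.0, 1.2, 0.9, 1.1]),
    "FBOF": ([6.6, 6.8, 7.0, 7.2], [0.8, 1.0, 0.7, 0.9]),
}

# One straight-line fit per group.
by_group = {name: linregress(x, y) for name, (x, y) in data.items()}

# One straight-line fit to everything at once.
all_x = np.concatenate([x for x, _ in data.values()])
all_y = np.concatenate([y for _, y in data.values()])
pooled = linregress(all_x, all_y)

print(f"pooled: slope={pooled.slope:.2f}, p={pooled.pvalue:.4f}")
for name, fit in by_group.items():
    print(f"{name:<5}: slope={fit.slope:.2f}, p={fit.pvalue:.2f}")
```

The pooled slope is strongly negative while every within-group slope is flat, so the overall trend in this toy example comes entirely from the differences between groups.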

Of course, it would be interesting to investigate the three outlying points: the high green dot, the high red dot, and the low-x green dot. As far as I can tell, the raw data for the paper isn’t available, and I’ve gone about as far as I can with the extracted data.