A few months ago I helped demographer Ilya Kashnitsky investigate a suspicious chart about suicides in Maryland during the Covid-19 lockdown there. The original research letter in JAMA Psychiatry, Racial Differences in Statewide Suicide Mortality Trends in Maryland During the Coronavirus Disease 2019 (COVID-19) Pandemic, compared cumulative suicide counts by race for the first half of 2020.
While the data behind the chart was not supplied, I wanted to explore a few issues with it:
- Cumulative sum is essentially a smoothing technique and is often useful for rare events, but it does make it harder to compare intervals since the starting position of each interval depends on the previous intervals. Plus, the data is by definition autocorrelated, making the regression fits harder to interpret.
- Using raw counts seems problematic since Maryland has almost twice as many White residents as Black residents, and suicide rates are generally much lower among Black people. So while the directions are valid, the magnitude of the slopes seems exaggerated by about a factor of four.
- Fixing the line breaks at the lockdown intervals might be reasonable, but it would be nice to see an unconstrained model.
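The autocorrelation point above is easy to demonstrate. In this minimal sketch (with made-up Poisson counts standing in for the chart's data, since the original data wasn't supplied), independent counts show little lag-1 autocorrelation, but their cumulative sum is almost perfectly autocorrelated:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weekly counts: independent Poisson draws, a stand-in
# for the kind of rare-event data behind the chart.
counts = rng.poisson(lam=5, size=26).astype(float)
cumulative = np.cumsum(counts)

def lag1_autocorr(x):
    """Lag-1 autocorrelation of a series."""
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

print(f"raw counts lag-1 autocorrelation:  {lag1_autocorr(counts):+.2f}")
print(f"cumulative lag-1 autocorrelation:  {lag1_autocorr(cumulative):+.2f}")
```

Because a cumulative series only ever rises, neighboring values are necessarily close, which is exactly what makes naive regression fits on it misleading.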
So I manually digitized the data in the original chart, de-accumulated the values, adjusted for population, and plotted it with a smoother to get the figure below. The units are excess deaths per million residents over the 2017–2019 average. It still doesn't adjust for the different base suicide rates by race.
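The de-accumulation and population adjustment amount to a couple of lines of arithmetic. A sketch with illustrative numbers (not the actual Maryland values, which weren't published):

```python
import numpy as np

# Hypothetical digitized cumulative counts (illustrative only) and an
# approximate Maryland population figure.
cumulative = np.array([10, 22, 31, 45, 60, 78], dtype=float)
population = 6_000_000

# De-accumulate: first differences recover the per-interval counts.
counts = np.diff(cumulative, prepend=0.0)

# Adjust for population: deaths per million residents per interval.
rate_per_million = counts / population * 1e6

print(counts)
print(rate_per_million)
```

Subtracting the 2017–2019 average of the same quantity then gives the excess-rate units used in the figure.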
More recently I saw a related Lancet paper with a much broader scope, Suicide trends in the early months of the COVID-19 pandemic: an interrupted time-series analysis of preliminary data from 21 countries. Though the subtitle mentions time-series analysis, the visuals only show the final outcomes of the models. Here is a truncated version of the results, which find overall no increased risk of suicide during the first few months of the pandemic.
All those little qualifying asterisks and daggers next to the country names indicate which kind of analysis was performed for each country: linear, linear + seasonal, or non-linear + seasonal. It seems strange to use different models, but the discussion of the analysis sounds thorough, even adding sensitivity analyses, and likely the authors and reviewers know better than I do. Still, it would be nice to understand more about the modeling choices. Fortunately, most of the raw data and Stata code is provided in an appendix.
I say “most” of the data because low values are censored, as can be seen for Carinthia, Austria, in this excerpt. Seems a bit overly cautious, but I’m guessing it’s standard practice, and I wouldn’t gain much insight from such rare events anyway. Thanks to JMP’s PDF Import, it wasn’t too much trouble to use this data for my own analysis.
Here’s a raw view of the normalized monthly counts along with a smoother for the 20 areas with a decent amount of uncensored data. By “normalized counts” I mean I’ve adjusted them for differing month lengths. The red dots are the pandemic values.
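The month-length adjustment is simple: scale each count to a standard-length month so that February isn't artificially low. A minimal sketch (the function name and the ~30.44-day reference length are my own choices, not from the paper):

```python
import calendar

def normalized_count(count, year, month, reference_days=365.25 / 12):
    """Scale a monthly count to a standard month length (~30.44 days),
    so short months like February aren't artificially low."""
    days = calendar.monthrange(year, month)[1]
    return count * reference_days / days

# February 2020 (29 days) is scaled up; March 2020 (31 days) scaled down.
print(normalized_count(100, 2020, 2))
print(normalized_count(100, 2020, 3))
```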
Seeing all the raw data in context, we can start to understand both the data and the modeling choices. Some of the areas, such as New South Wales and Mexico City, have only one year of historical data. That must be why they didn’t get seasonal models applied. Too bad, since NSW appears to have a prominent seasonal pattern.
We can also get a better sense of the seasonality. The conventional wisdom is that suicides peak during the end-of-year holidays when loneliness is amplified, but the seasonal trends here are quite varied. Chile’s trend has a strong seasonal component peaking before year end; Poland’s peaks in the spring; and the Netherlands has little seasonal variation.
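A "linear + seasonal" model of the kind the paper applies can be sketched as an ordinary least-squares fit of a trend plus one annual sine/cosine pair. This is a simplified stand-in for the paper's Stata models, with synthetic data, not their actual specification:

```python
import numpy as np

def fit_linear_seasonal(y):
    """Least-squares fit of trend + annual cycle to monthly data:
    y ~ b0 + b1*t + b2*sin(2*pi*t/12) + b3*cos(2*pi*t/12)."""
    t = np.arange(len(y))
    X = np.column_stack([
        np.ones_like(t, dtype=float),
        t,
        np.sin(2 * np.pi * t / 12),
        np.cos(2 * np.pi * t / 12),
    ])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X @ beta

# Synthetic monthly series with a mild trend and an annual cycle.
t = np.arange(36)
y = 50 + 0.1 * t + 8 * np.sin(2 * np.pi * (t - 1) / 12)
beta, fitted = fit_linear_seasonal(y)
print(beta)
```

The phase of the peak falls out of the relative weights on the sine and cosine terms, which is how a single model form can accommodate Chile's late-year peak and Poland's spring peak alike.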
We can see some areas have uneven histories. Those, apparently, are the ones that got treated with the non-linear fractional polynomial regression. I can see Peru has an odd history, but I’m surprised New Jersey qualifies for that treatment. Sure, it’s slightly quadratic, but that’s starting to feel like overfitting.
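For readers unfamiliar with fractional polynomials: the idea is to try a small conventional set of power transforms of time and keep the best-fitting one. Here is a rough first-degree sketch on synthetic data; it is my own simplification of the selection that Stata's fractional polynomial commands perform, not the paper's actual procedure:

```python
import numpy as np

POWERS = [-2, -1, -0.5, 0, 0.5, 1, 2, 3]  # 0 denotes log, by convention

def fp1_fit(t, y):
    """First-degree fractional polynomial: try each power in the
    conventional set, keep the least-squares best."""
    best = None
    for p in POWERS:
        x = np.log(t) if p == 0 else t.astype(float) ** p
        X = np.column_stack([np.ones_like(x), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        sse = np.sum((y - X @ beta) ** 2)
        if best is None or sse < best[0]:
            best = (sse, p, beta)
    return best

t = np.arange(1, 37)            # months, starting at 1 to allow log
y = 3.0 + 0.05 * t**2           # synthetic "slightly quadratic" history
sse, p, beta = fp1_fit(t, y)
print(p)
```

With so few powers to choose from and short histories, it's easy to see how a mild curvature like New Jersey's could tip the selection toward a non-linear form.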
Though my smoothers are applied to all the data, the last data points might be incomplete, and they were ignored in the paper’s analysis. For instance, notice the low final month counts for New Jersey and California.
Even though the seasonal component looks weaker than expected, it may be easier to see the pandemic values in context by overlaying them on previous years. Here is such a view, using something I call a moving box plot for the pre-pandemic data.
A moving box plot is my attempt at what John Tukey called a “wandering schematic plot” as a way of applying his “schematic plot” (commonly known now as a box plot) to continuous predictor variables. Like a moving average, a moving box plot computes each median and quartile based on a moving window of the data and connects them to form the line and shaded regions. I think it provides reasonable context, but I have one reservation about it: the moving window doesn’t wrap around to capture the potential cyclical nature of the data.
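The computation behind the shaded bands can be sketched as rolling quantiles over a centered window of points sorted by the predictor. This is a minimal illustration with made-up data, not the JMP implementation; note the window is truncated at the ends rather than wrapping around, matching the reservation above:

```python
import numpy as np

def moving_quartiles(x, y, window=5):
    """Rolling median and quartiles over a centered window of points
    sorted by x -- the backbone of a 'moving box plot'."""
    order = np.argsort(x)
    xs, ys = np.asarray(x)[order], np.asarray(y)[order]
    half = window // 2
    out = []
    for i in range(len(ys)):
        lo, hi = max(0, i - half), min(len(ys), i + half + 1)
        q1, med, q3 = np.percentile(ys[lo:hi], [25, 50, 75])
        out.append((xs[i], q1, med, q3))
    return out

x = np.arange(12)  # e.g. month index
y = np.array([5., 7, 6, 9, 8, 10, 9, 12, 11, 13, 12, 14])
for xi, q1, med, q3 in moving_quartiles(x, y):
    print(f"x={xi}: Q1={q1:.1f} median={med:.1f} Q3={q3:.1f}")
```

Connecting the medians gives the center line, and filling between the Q1 and Q3 series gives the shaded region; a cyclical variant would index the window modulo 12.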
Of the panels here, Japan looks the most worrying. Even though the study finds a reduced suicide count for Japan, because it only considers the early months of the pandemic, the later months show a steady rise. Fortunately, the numbers later improved after the appointment of a “minister of loneliness,” as reported in a Graphic Detail treatment by The Economist. The article also has a nice restyling of the paper’s results chart, for those with a subscription.
Sparse in, fuzzy out
Overall, both analyses are difficult since they’re looking at relatively rare events over short time periods. As always, it would be nice to have more data, but I can understand the value of getting papers like this out just to confirm that there hasn’t been a drastic change in suicides.