Statistical education software tool usage

A recent paper, “The Teaching of Introductory Statistics: Results of a National Survey” by Chelsey Legacy et al. in the Journal of Statistics and Data Science Education, summarizes 228 responses to a 2019 survey of statistics teachers in colleges and universities. The following figure caught my attention since I work on a statistical software product, JMP. The chart supposedly shows the percentage of respondents using software in each of five categories.

It’s shocking if Excel is really so dominant at the college level, and I wondered how the individual products in each category fare. Fortunately, the data is available to download. The tool usage data comes from the question, “Do students use the following desktop- or web-based applications/software to analyze data in your class?”, which was asked once for each of 18 applications. Here’s an excerpt of the raw survey data, showing the three possible responses: NA (no answer), No, and Yes.

t_Excel	t_Fathom	t_JMP	t_Minitab	t_Python	t_R_GUI	t_R_Studio
Yes	No	No	No	No	No	No
Yes	No	No	No	No	No	Yes
Yes	No	No	No	No	No	No
Yes	No	No	No	No	Yes	Yes
Yes	No	Yes	Yes	No	No	No
No	No	No	No	No	No	No
NA	NA	NA	NA	NA	NA	Yes
Yes	No	No	No	No	No	No

Here’s a graphical summary of all the responses.

Excel indeed has the most Yes responses, but some others aren’t that far behind, and I would expect them to be on par with Excel once grouped into categories. Since each respondent can answer Yes to multiple software choices, aggregating the responses isn’t straightforward. Still, we can see that R Studio alone has 50+ Yes responses, or about 25% of the total, which is well above the 10% claimed in the paper’s figure.

(Note: Tableau was grouped into “Other” instead of “GUI-Based” because of what appears to be a coding error in the paper, where it was not assigned a specific category.)

Missing responses

There are a fair number of NA responses, and it’s unclear what they mean in this context. Does an NA really represent a missing response, or is it the same as No? I can imagine the latter if a respondent answered Yes for some software and skipped the rest, so that the skipped items were recorded as NA.

Here’s one way of looking at all the responses, grouped by the Yes-No-NA response pattern.

The top gray bar represents the 25 respondents who didn’t answer any of the software questions. I checked, and most of those respondents answered all the other questions in the survey, which suggests they may have left the software questions blank intentionally as a blanket No. Either way, the choice doesn’t affect the relative results.

Very few respondents used both NA and No, so it seems reasonable to treat at least the upper bars, those with only Yes and NA responses, as Yes and No responses, with NA standing in for No. After that, there is a group of No+NA responses; oddly, they come from the first two respondents in the data set, who also left many other questions unanswered, making me wonder if there was some issue that got worked out in the survey administration. Fortunately, it’s only a few responses, not enough to have a noticeable impact on the summary statistics.
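
As a rough illustration of that pattern grouping, here is a minimal base R sketch. The `responses` data frame is hypothetical toy data (not the survey), and `pattern_of` is a helper name made up for this example. It labels each respondent by which kinds of answers appear in their row, then recodes NA as No only for the Yes+NA respondents.

```r
# Hypothetical toy responses (NOT the survey data): one row per respondent.
responses <- data.frame(
  t_Excel    = c("Yes", NA, "No", NA,   "Yes"),
  t_JMP      = c(NA,    NA, "No", "No", "No"),
  t_R_Studio = c("Yes", NA, "No", NA,   "No"),
  stringsAsFactors = FALSE
)

# Label a respondent by which kinds of answers (Yes, No, NA) appear in the row.
pattern_of <- function(row) {
  kinds <- c(
    Yes  = any(row == "Yes", na.rm = TRUE),
    No   = any(row == "No", na.rm = TRUE),
    "NA" = any(is.na(row))
  )
  paste(names(kinds)[kinds], collapse = "+")
}

patterns <- apply(responses, 1, pattern_of)
print(table(patterns))

# Assumption from the text: for respondents who only used Yes and NA,
# read NA as No. Other patterns are left untouched.
m <- as.matrix(responses)
is_yes_na <- patterns == "Yes+NA"
# is_yes_na has one entry per row; it recycles down each column,
# so the mask lines up with respondents.
fill <- is.na(m) & is_yes_na
m[fill] <- "No"
print(m)
```

Only the first toy respondent is recoded here; the all-NA and No+NA rows keep their missing values, which matches the cautious reading above.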

Summarizing by software type

When multiple responses are allowed per respondent, there are a few ways to aggregate them. Since the caption of the original chart says “Percentage (and 95% CI) of STI respondents”, I aggregated them such that if one respondent is using three different GUI-Based tools, that counts once, not three times. The paper also introduces the chart with “For instructors who have students use software …”, which I interpret as ignoring the 27 respondents who had zero Yes responses.
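
A minimal base R sketch of that per-respondent aggregation, run on the eight-row, seven-column excerpt shown earlier rather than the full survey, so the percentages are illustrative only. The `type_of` lookup and the variable names are assumptions for this sketch; the category assignments follow the paper's R code, and NA is simply treated as "not Yes".

```r
# Excerpt of the raw survey data shown earlier (8 respondents, 7 of the 18 tools).
excerpt <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
t_Excel t_Fathom t_JMP t_Minitab t_Python t_R_GUI t_R_Studio
Yes No No No No No No
Yes No No No No No Yes
Yes No No No No No No
Yes No No No No Yes Yes
Yes No Yes Yes No No No
No No No No No No No
NA NA NA NA NA NA Yes
Yes No No No No No No
")

# Category of each tool in the excerpt (assignments as in the paper's code).
type_of <- c(
  Excel = "Excel", Fathom = "Pedagogical",
  JMP = "GUI-Based", Minitab = "GUI-Based",
  Python = "Syntax-Driven", R_GUI = "Syntax-Driven", R_Studio = "Syntax-Driven"
)

# TRUE where the answer is Yes; NA and No both count as FALSE.
is_yes <- !is.na(excerpt) & excerpt == "Yes"
types <- unname(type_of[sub("^t_", "", colnames(is_yes))])

# One column per type: does the respondent use ANY tool of that type?
uses_type <- sapply(unique(types), function(ty) {
  rowSums(is_yes[, types == ty, drop = FALSE]) > 0
})

# Keep only respondents with at least one Yes, then take the share per type.
active <- rowSums(is_yes) > 0
pct <- 100 * colMeans(uses_type[active, , drop = FALSE])
print(round(pct, 1))
```

In the excerpt, the fifth respondent uses both JMP and Minitab but counts once toward GUI-Based, and the all-No respondent drops out of the denominator.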

Recreating the original chart (except as a bar chart and without confidence intervals) reveals quite different results.

My findings align with my original stacked bar chart of the response breakdowns, indicating a potential error in the paper. I contacted the author in case there was a different intention for the percentages, but she only directed me to the R code that was included in the paper’s materials.

Diagnosing the difference

I don’t know R very well, but the code was written well enough that I could mostly follow along. Here are the critical sections. This first block also shows why Tableau gets coded as Other: it was left out of the “Software %in%” tests, so it falls through to the final catch-all case.

software = sti_2019 |>
  select(t_CODAP,t_Excel, t_Fathom, t_JMP, t_Minitab, t_Python,
         t_R_GUI, t_R_Studio, t_R_Studio_Cloud, t_SAS, t_SAS_U,t_SPSS, t_Stata,
         t_StatCrunch, t_Statkey, t_Tableau, t_TinkerPlots, t_Other) |>
  gather(
    key = Software, 
    value = Response
  ) |>
  mutate(
    Software = stringr::str_remove(Software, "t_"),
    Type = case_when(
      Software == "Excel" ~ "Excel",
      Software %in% c("CODAP", "Fathom", "TinkerPlots", "Statkey") ~ "Pedagogical",
      Software %in% c("JMP", "Minitab", "SPSS", "Stata", "StatCrunch") ~ "GUI-Based",
      Software %in% c("Python", "R_GUI", "R_Studio", "R_Studio_Cloud", "SAS", "SAS_U") ~ "Syntax-Driven",
      TRUE ~ "Other"
    )
  )

The gather operation (known to me as “stack” in JMP) restructures the 18 columns into 18 rows per respondent with two columns, Software and Response, and the mutate step adds Type. For instance, here are the 18 rows for one respondent.

Software	Type	Response
CODAP	Pedagogical	No
Excel	Excel	No
Fathom	Pedagogical	No
JMP	GUI-Based	No
Minitab	GUI-Based	No
Python	Syntax-Driven	NA
R_GUI	Syntax-Driven	No
R_Studio	Syntax-Driven	Yes
R_Studio_Cloud	Syntax-Driven	Yes
SAS	Syntax-Driven	No
SAS_U	Syntax-Driven	No
SPSS	GUI-Based	No
Stata	GUI-Based	No
StatCrunch	GUI-Based	No
Statkey	Pedagogical	No
Tableau	Other	No
TinkerPlots	Pedagogical	No
Other	Other	No

The next block of code summarizes that data without regard to the respondents. That is, instead of about 200 respondents, it’s looking at about 200 × 18 responses. So this respondent, who answered Yes for two of the six questions in the Syntax-Driven category, will contribute 2/6 in that group (2/5 once the NA for Python is dropped), 0/5 in the GUI-Based group and 0/1 in the Excel group, when they should contribute 1/1, 0/1 and 0/1 respectively.

software_yes_tbl = software |>
  group_by(Type, Response) |>
  summarize(
    n = n()
  ) |>
  tidyr::drop_na() |>
  mutate(
    p = n / sum(n),
    N = sum(n)
  ) |>
  ungroup() |>
  filter(Response == "Yes") |>
  select(version = Type, n, p, N) |>
  mutate(question = "Software Type")
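
To make the contrast concrete, here is a small base R sketch using the 18 rows of the single respondent listed above. It is not the paper's code; the variable names are made up for the example. It compares what this respondent adds to the pooled counts with the one-vote-per-category view. Note that the NA for Python is excluded from the pooled denominator, as `drop_na()` does.

```r
# The one respondent shown above: category and answer for each of the 18 tools,
# in the same order as the table (CODAP, Excel, Fathom, ..., TinkerPlots, Other).
Type <- c(
  "Pedagogical", "Excel", "Pedagogical", "GUI-Based", "GUI-Based",
  rep("Syntax-Driven", 6), rep("GUI-Based", 3),
  "Pedagogical", "Other", "Pedagogical", "Other"
)
Response <- c(rep("No", 5), NA, "No", "Yes", "Yes", rep("No", 9))

answered <- !is.na(Response)           # rows that survive dropping NA
yes <- answered & Response == "Yes"    # NA counts as not-Yes

contrib <- data.frame(
  pooled_yes      = as.vector(tapply(yes, Type, sum)),       # numerator added
  pooled_answered = as.vector(tapply(answered, Type, sum)),  # denominator added
  respondent_uses = as.vector(tapply(yes, Type, any)),       # one vote per type
  row.names = sort(unique(Type))
)
print(contrib)
```

For Syntax-Driven the pooled approach adds 2 to the numerator and 5 to the denominator, while the per-respondent view records a single "uses this category".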

Effectively, categories containing many products get penalized in the final percentages. Here is my reproduction of the paper’s summarization. Though I don’t completely understand the R code, I think I got it right since the figures in the last column completely agree with the chart from the paper.

Type	Yes	No	NA	Yes / (Yes + No)
Pedagogical	54	681	177	7.3%
GUI-Based	128	801	211	13.8%
Syntax-Driven	101	994	273	9.2%
Excel	93	99	36	48.4%
Other	36	326	94	9.9%
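
The last column can be rechecked from the counts in the table with a few lines of base R. This is only an arithmetic check on the numbers above; the column name `Missing` stands in for the NA column, and the `tools` column is a sanity check added for this sketch.

```r
# Counts copied from the table above.
counts <- data.frame(
  Type    = c("Pedagogical", "GUI-Based", "Syntax-Driven", "Excel", "Other"),
  Yes     = c(54, 128, 101, 93, 36),
  No      = c(681, 801, 994, 99, 326),
  Missing = c(177, 211, 273, 36, 94)
)

# Pooled percentage: Yes answers over all non-missing answers in the category.
counts$pct_yes <- round(100 * counts$Yes / (counts$Yes + counts$No), 1)

# Each row total should be 228 respondents times the number of tools in the type.
counts$tools <- (counts$Yes + counts$No + counts$Missing) / 228

print(counts)
```

The row totals divide evenly by 228, giving 4, 5, 6, 1 and 2 tools per category, which matches the groupings in the paper's `case_when()` and shows how the denominators grow with the number of products in a category.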

What now?

Once I was more confident in my calculations, I contacted the paper’s corresponding author again with the findings. Update: I just heard back, and they’re redoing the calculations and contacting the editor about making a correction.

Even if this is an error in the paper, what happens next? The growing trend where papers also publish their data and code is a giant step in the right direction, but as far as I know, there’s no mechanism for correcting errors in the code or the papers.

Epilogue

Still perplexed by the common use of Excel in college intro stats courses, I dug a little deeper into the data. There is a field for the type of institution, and a quick rework (without as much care for NA values) of the calculations does show a difference there, with two-year colleges being far less likely to use products in the syntax-driven category and more likely to use all the others, including Excel.

And going back to my original breakdown by product, Excel and especially StatCrunch, a web-based application, are more often used in two-year colleges while R is less often used there.

Another reason for the large number of Yes responses for Excel is that Excel is so ubiquitous that it always gets included, even when it isn’t the main software product. While we can’t tell how much each respondent uses each tool, we can at least tell whether they’re using Excel alone or in conjunction with other tools.

This graph shows, for those 93 respondents who answered Yes to using Excel, the number of Yes responses they had across all software tools.

That first bar represents those educators who used Excel and only Excel. That’s not too many (18) out of the survey total (200+), which feels less alarming.
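
The count behind that graph can be sketched in base R as follows. This runs on the eight-row excerpt from the start of the post, which covers only 7 of the 18 tools, so it illustrates the calculation and does not reproduce the 18 Excel-only respondents.

```r
# Excerpt of the raw survey data (8 respondents, 7 of the 18 tools).
excerpt <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
t_Excel t_Fathom t_JMP t_Minitab t_Python t_R_GUI t_R_Studio
Yes No No No No No No
Yes No No No No No Yes
Yes No No No No No No
Yes No No No No Yes Yes
Yes No Yes Yes No No No
No No No No No No No
NA NA NA NA NA NA Yes
Yes No No No No No No
")

# TRUE where the answer is Yes; NA and No both count as FALSE.
is_yes <- !is.na(excerpt) & excerpt == "Yes"

# Among Excel users, count how many tools each said Yes to (Excel included).
excel_users <- is_yes[, "t_Excel"]
n_yes <- rowSums(is_yes[excel_users, , drop = FALSE])

# A count of 1 means Excel and only Excel, within the columns available here.
print(table(n_yes))
```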
