Raw Data Studies

Crossword solving times

I’ve been doing crossword puzzles on my iPad over breakfast (usually oatmeal) for the past few years. I got a subscription to the weekly American Values Club crosswords and would also do some free puzzles. Last year, I went in for a subscription to the New York Times crossword puzzles when they were having a discount, and I’ve been doing those almost daily.

The NYT crossword is bit stricter than my general iPad Crosswords app. For instance, the NYT app doesn’t make it easy to look up clues via Google, which is odd since their solving guide touts “It’s not cheating, it’s learning” and quotes a former editor saying, “It’s your puzzle. Solve it any way you like.” And you can’t check your answers until the next day.

After each puzzle solved, the NYT app will tell you your time and solving streak (only counting same-day solves), which does provide a little extra motivation. I’ve often wanted a download button to get access to my past data to measure the day-of-week difficulty increase and to track progress over time. Only recently I realized that each day’s puzzle has a unique URL and I could visit past puzzles, solved or still in progress. And that made me look for a way to automate those visits to collect my timing data and solved status.

My usual technique of running a JMP script to get the raw HTML for the page didn’t work here, because it didn’t have my authentication for the NYT site to identify me. Perhaps there is a way to pass along that info in the URL or the payload, but I wasn’t hopeful . Instead I looked at automating Chrome to load each page for me. It turns out AppleScript is still around for automating MacOS apps, and it even supports a JavaScript syntax now. Not that I know JavaScript that well, but there’s plenty of help for it online.

I uploaded my resulting AppleScript/JavaScript as crossword_times.js on GitHub in case anyone is curious. Getting Chrome to load a URL was fairly straightforward, but getting the data was a bit trickier. I had to study the HTML page to find the timing text and done status, and then get Chrome to execute an XPath query to retrieve the info I wanted. That meant having my script send another script to Chrome for execution, which required turning on an option to allow that in Chrome settings. I put all that in a loop and dumped the data as JSON for import into JMP.

Looking at the entire 16 months of data doesn’t show much, except that there were more at the beginning that I didn’t even start (those red x marks along the bottom). Zooming in on recent times, you can get a sense of the daily difficulty pattern.

Breaking out the weekdays, you can see both how the difficulty increased throughout the week and maybe that I’ve been improving over time.

The trend lines are only based on the solved puzzles. There exist statistical techniques to incorporate the unsolved times into the trend by treating them as censored values, but I didn’t have much luck with it, apparently because I often gave up early and with irregular amounts of effort in general.

The week-end puzzles are harder and/or bigger and I put them on a separate scale. Looks like I’m not improving too much on these. ?

I shared these images and more on Twitter last month, and my data is available as crossword_times.csv on GitHub and in interactive form on JMP Public, where you can filter by day of the week, for instance.

January 19, 2021

xangregg

Crossword solving times

Leave a ReplyCancel reply

Discover more from Raw Data Studies