Like many others, I’ve had fun playing the daily Wordle puzzles, and like many programmers, I’ve had additional fun writing code to explore optimal guessing strategies. What might be different in my efforts has been an attempt to do a deep traversal of the forking paths of guesses that make up the full decision tree. Most approaches I’ve seen apply greedy or shallow optimization of just the first or second guess. My data files are shared at github/deepwordle.
For anyone still unacquainted with it, Wordle is an online word-guessing game with rules similar to the board game Mastermind. You have six guesses to identify the word of the day based on the color clues on your previous guesses. Here’s a recent game I completed compared to choices from the optimal decision tree.
After luckily getting two letters right with POINT, I tried a stab at the solution with BOAST and then a couple more educated guesses. However, you don’t have to make your second guess compatible with the first guess’s clues. You can use it for general information gathering instead. That’s what the decision tree did by choosing ACRES as the second word, optimizing the information gained given the first clue’s information. Here’s that part of the decision tree.
point ⬜🟩⬜⬜🟨 35 ratty 🟩⬜⬜🟩⬜ 1 route
point ⬜🟩⬜⬜🟨 35 ratty 🟩⬜🟩⬜⬜ 1 rotor
point ⬜🟩⬜⬜🟩 14 acres ⬜⬜⬜⬜⬜ 2 doubt
point ⬜🟩⬜⬜🟩 14 acres ⬜⬜⬜⬜⬜ 2 doubt ⬜🟩🟩⬜🟩 1 moult
point ⬜🟩⬜⬜🟩 14 acres ⬜⬜⬜⬜🟨 2 boost
point ⬜🟩⬜⬜🟩 14 acres ⬜⬜⬜⬜🟨 2 boost ⬜🟩⬜🟩🟩 1 joust
point ⬜🟩⬜⬜🟩 14 acres ⬜⬜🟨⬜⬜ 1 robot
point ⬜🟩⬜⬜🟩 14 acres ⬜⬜🟨⬜🟨 1 roost
point ⬜🟩⬜⬜🟩 14 acres ⬜⬜🟩⬜🟨 1 worst
point ⬜🟩⬜⬜🟩 14 acres ⬜🟨⬜🟩⬜ 2 comet
point ⬜🟩⬜⬜🟩 14 acres ⬜🟨⬜🟩⬜ 2 comet 🟩🟩⬜🟩🟩 1 covet
point ⬜🟩⬜⬜🟩 14 acres ⬜🟨🟨⬜⬜ 1 court
point ⬜🟩⬜⬜🟩 14 acres 🟨⬜⬜⬜🟨 2 boast
point ⬜🟩⬜⬜🟩 14 acres 🟨⬜⬜⬜🟨 2 boast ⬜🟩🟩🟩🟩 1 toast
point ⬜🟩⬜⬜🟩 14 acres 🟨⬜🟨⬜🟨 1 roast
point ⬜🟩⬜⬜🟩 14 acres 🟨🟨⬜⬜🟨 1 coast
point ⬜🟩⬜🟨⬜ 28 maned ⬜⬜🟨⬜⬜ 2 colon
point ⬜🟩⬜🟨⬜ 28 maned ⬜⬜🟨⬜⬜ 2 colon ⬜🟩🟨⬜🟨 1 nobly
After the first clue, ⬜🟩⬜⬜🟩, there are only 14 possible solution words, and the next guess ACRES was chosen to best differentiate those 14 words. It usually doesn’t work quite this well (three guesses) even with an optimal algorithm.
The questions I want to investigate are:
- What is the maximum number of guesses needed for a given word?
- What is the expected number of guesses needed for a given word?
- Is there a best or worst starting word from those measures.
Wordle uses two word lists. One list consists of all the solution words (2315 in total). Another has additional words that can be used as valid guesses (about 11,000). The main distinction seems to be that the solutions words are more common while the additional legal words are more obscure. However, the solution word list is also missing common words that you might call derivative words such as plurals and past tense words.
I wanted my investigations to reflect a person’s play which probably wouldn’t include obscure words but could include those derivative words. so I created my own list of common guess words. It contains 3622 words and is a result merging:
- the Wordle solution list
- the Word Master (a Wordle clone) solution list
- the five-letter words from Google 10,000 most common English words
- the Wordle obscure words that look like plurals and past tense of smaller common words
I then pruned out a few that were proper names and words that seemed obscure to me and ended up with common.txt.
Much of the attention of the “Wordle Analytics” crowd is around finding a “best” first guess. My theory is that the first guess is not so important as long as the second guess is optimized. Before going too far we have to admit that “best” has many definitions. For the shallow analyses, best usually means either the coverage of the most common letters or the elimination of the most words as possible solutions. FiveThirtyEight has a medium-depth analysis where best means the highest chance of finishing in three guesses. Even for a deep analysis, there best-ness choices such as the maximum number and the average number of guesses.
What is a deep analysis? Ideally, it would examine every possible tree of guesses and determine the tree that is quickest to arrive at all the possible solutions.
Wordle allows 6 guesses, so hopefully more are not needed if solving optimally. However, even constrained to 6 guesses and 3600 possible words, that’s 3600^6 or over 10^21 possible trees to examine, so concessions are needed to find the optimal trees . My initial code was thorough but too slow (many hours for a single word’s optimal tree), but a series of improvements made it fast enough to try all the words (1 – 10 minutes each for most words). The improvements mainly came from trimming the search to avoid hopeless paths. Some of the trimming actions:
- focus on minimizing the maximum number of guesses instead of the average number, which allows for early stopping when a goal number is achieved — five in this case
- stop exploring a sequence if it isn’t reducing the uncertainty much
- don’t explore guesses that contain too many “dead” letters (those already disqualified)
- start with a breadth-first shallow assessment to prioritize which next guesses are worth following for deep assessments
All of those adjustments have some fuzzy parameters to them that needs tuning. My meta-adjustment was to apply the deep search algorithm multiple times until a good decision tree was found, with each subsequent attempt using less aggressive pruning. While it’s impossible to say that the whole search space has been covered, the good news is that the algorithm is fast enough to examine all 3622 of my “common” words as starting words in an overnight run.
To my surprise, almost every possible starting word needs only a maximum of five total guesses to find any solution. So the first word really doesn’t matter for this measure of best-ness. In fact, only three words needed six guesses: MUMMY, SHUSH and YUMMY. The average number of guesses was usually between 3.5 and 4, so there is not much differentiation there, either. Those calculations are more approximate since my search stops early when it finds a decision tree with a maximum of five guesses. That is, it’s possible a different depth-five tree had a better average than the one I found for each starting word. Here’s a dot plot of the average number of guesses for all 3622 starting works.
If we zoom in on the low and high ends, we can label more words.
Judging by the estimated average number of guesses, CRATE is the best starting word. TRACE is a close second and was the also best word found by the FiveThirtyEight “SolveInThree” optimization.
I’m still wondering how to visualize the findings. It’s hard to do anything with the whole data set of 3622 decision trees. Here’s a heat map of ten selected starting words (along the x axis) and the optimal second word for each of the most common responses to the first words (160 out of 3^5 = 243 possible responses along the y axis).
Trying to visualize the entire decision tree for a given starting word makes for a huge illegible graphic.
The first guess is in the center, and as the rings go outward, the next word is printed over the clue for the previous word, and small words have some jitter in the radial direction to reduce overlap. Though most details are illegible, there is some discernible information:
- Most of the second words (innermost ring) are legible.
- You can verify that at most five guesses are required.
- From the density of the outer ring you get a sense of how often you need five guesses
We can keep those features and make a simpler version by giving up on showing the small words.
Now it’s getting into the data art realm as a Wordle emblem for a starting word. The unlabeled decoration is not completely meaningless — at least you can tell CRATE from JAZZY which are on opposite ends of the average guesses scale while both needing a maximum of five guesses.
As a check, I tried out the JAZZY decision tree against Absurdle, the adversarial version of Wordle.
It worked! After the KITED response there were 7 possible words, BEECH, BELCH, BENCH, LEECH, WELCH, WENCH, and WHELP. The BELIE guess generates a different response for each of those words, so Absurdle was helpless to extend the game beyond 5 moves.
With that somewhat dramatic finish, I wondered what the most dramatic finish possible is. That is, what’s the largest number of words in play before the fourth guess? I’ve found a handful of situations in the generated trees where there are 10 words in play but no more than that. One starts from JAZZY:
jazzy ⬜⬜⬜⬜⬜ 1111 rouse 🟨⬜⬜⬜🟨 72 fiend ⬜⬜🟨⬜⬜ 10 ?????
The 10 possible solutions are: BERET, BERTH, EGRET, EMBER, ETHER, LEPER, LEVER, METER, PERCH, and THREW. The word that amazingly differentiates among them is … PETER!
… peter ⬜⬜🟨🟩🟨 1 threw
… peter ⬜🟨⬜🟩🟩 1 ember
… peter ⬜🟨🟨🟩🟨 1 egret
… peter ⬜🟨🟨🟩🟩 1 ether
… peter ⬜🟩⬜🟩🟩 1 lever
… peter ⬜🟩🟨⬜🟨 1 berth
… peter ⬜🟩🟨🟩🟨 1 beret
… peter ⬜🟩🟩🟩🟩 1 meter
… peter 🟨🟩⬜🟩🟩 1 leper
… peter 🟩🟩⬜⬜🟨 1 perch
5 responses to “Deep Wordle”
Very interesting. What does the solutiontend to look like when the answer is something like SHARE, SHAPE, SHADE, SHAKE, SHALE, SHAVE or SHAME? Or RUMPS, PUMPS, SUMPS, DUMPS, HUMPS, JUMPS, LUMPS, BUMPS or MUMPS?
I wrote a solver that tried to approximate the best guess at each turn by estimating and minimising the number of possible answers at each turn. This ran much quicker than yours but would sometimes get stumped by the words above and end up cycling through the final letter to find the solution.
I had a quick look at some of the trees in your results folder and I wonder if part of the optimal strategy is to use up those interchangeable letters to avoid ending up in that situation. So the optimal solution is mainly driven by these collection of words with one interchangeable letter.
Yes, some of my initial too-quick searches would get stuck with those word groups at the end. Another group I remember was *OUND. It’s especially risky to guess one of those words in hard mode.
Plurals are generally not in the solution set, so those groups aren’t really a problem.
I call guessing sound, pound, mound, and so on “falling down the canyon”; based on how the picture looks after the fact.
In those scenarios, the hardest part is not jumping on the first answer that pops in your brain, but figuring out the the best entropy-minimization guess, which would have 4 consonants and one vowel like: swamp or frosh.
[…] a follow-up to my Deep Wordle exploration of optimal Wordle play, I investigated a different strategy for variations like […]
I thought you might be interested in a result from my Wordle Helper program (developed just for my own entertainment).
After using starter word STARE and getting S and R in yellow, a “dramatic finish” at level 11 results:
with DICKS as the differentiator.
My program searches for “differentiators” as a priority so that it takes at most 2 further guesses to achieve a solution.