As a follow-up to my Deep Wordle exploration of optimal Wordle play, I investigated a different strategy for variations like Quordle, Octordle and Sedecordle, where you use the same guesses to find multiple solutions in parallel. In those games, I start with a fixed triplet of starting words and then hope to find one solution with each subsequent guess. That would mean solving Quordle in 7 guesses (3 + 4), Octordle in 11 (3 + 8) and Sedecordle in 19 (3 + 16).
I’ve had pretty good success with the triplet “chomp grind salty”, reaching the seven-guess goal more than half the time. It covers 15 distinct common letters and deliberately omits “e” and “u”, in the hope that they can be guessed when needed. How can I quantify just how great a triplet I found?
While it’s not practical to fully examine every possible triplet of guesses against every possible set of 4 or 8 or 16 solutions, I tried a few different approximate scoring methods:
- Average bucket size for each triplet
- Maximum bucket size for each triplet
- Average game length, from simulated games
- Probability of solve-in-seven, from simulated games
I constrained the triplets to my set of about 3600 “common” five-letter words (common.txt on GitHub). I made it by merging the 2300 Wordle solution words with other lists of common words. Many of the additions are plurals of four-letter words.
Selecting three words that together use fifteen distinct letters resulted in 4 million triplets. At one point, I disallowed the rare letters “jkqxz”, which reduced the triplet count to 1 million, but since that was only a 4x difference, I stuck with the full letter set.
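One way to enumerate such triplets is with 26-bit letter masks: two words share no letters exactly when the bitwise AND of their masks is zero. The sketch below is a brute-force illustration, not the original code; the function names and the optional `banned` parameter (for excluding letters such as “jkqxz”) are hypothetical, and words are assumed to be lowercase a-z.

```python
# Hypothetical helpers illustrating triplet enumeration; not from the original post.


def letter_mask(word: str) -> int:
    """Return a 26-bit mask with one bit set for each letter a-z in the word."""
    mask = 0
    for ch in word:
        mask |= 1 << (ord(ch) - ord("a"))
    return mask


def disjoint_triplets(words, banned=""):
    """Yield sorted (w1, w2, w3) triplets whose 15 letters are all distinct.

    Brute force: fine for a sketch, but slow in pure Python for ~3600 words.
    """
    banned_mask = letter_mask(banned)
    # Keep only words with five distinct letters and none of the banned letters.
    cands = []
    for w in sorted(set(words)):
        m = letter_mask(w)
        if bin(m).count("1") == 5 and not (m & banned_mask):
            cands.append((w, m))
    for i, (w1, m1) in enumerate(cands):
        for j in range(i + 1, len(cands)):
            w2, m2 = cands[j]
            if m1 & m2:
                continue  # the first two words already share a letter
            m12 = m1 | m2
            for k in range(j + 1, len(cands)):
                w3, m3 = cands[k]
                if not (m12 & m3):
                    yield (w1, w2, w3)


words = ["chomp", "grind", "salty", "build", "graph", "often", "chimp", "robed", "slant"]
triplets = list(disjoint_triplets(words))
print(("chomp", "grind", "salty") in triplets)
```

Passing `banned="jkqxz"` drops every word containing those letters before pairing, which is the kind of restriction described above.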
By “bucket size” I’m referring to the way each Wordle response partitions the potential solutions into groups I’m calling buckets. There are three responses per letter (gray, yellow and green) and so 3×3×3×3×3 = 243 possible responses per word, creating 243 buckets of potential solutions. For three words, the upper bound for the number of buckets is 243 × 243 × 243 = 14.3 million.
Since the words have distinct letters, many of those clue combinations can’t happen. For instance, at most 5 of the 15 letter responses can be non-gray. It would be an interesting combinatorial counting problem to determine a better upper bound, but fortunately allowing for all 14.3 million possibilities wasn’t a burden to the scoring. For what it’s worth, I saw 84,357 non-empty buckets using legal Wordle words.
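Here is a minimal sketch of the bucket bookkeeping, assuming a response is packed into a base-3 integer so that a triplet’s bucket id fits in [0, 243³). The encoding and function names are illustrative, not from the original. The last function applies only the “at most 5 of 15 non-gray” observation: choose which k letters are non-gray and color each one yellow or green. That alone shrinks the 14.3 million bound to 122,027, above the 84,357 buckets observed, though it still counts impossible patterns such as greens in the same position of two different guesses.

```python
from math import comb

# Illustrative encoding and helper names; not from the original post.


def feedback(guess: str, solution: str) -> int:
    """Encode the Wordle response as a base-3 integer in [0, 243).

    Position i contributes digit i (least significant first):
    0 = gray, 1 = yellow, 2 = green. Repeated letters follow the usual rule:
    greens first, then yellows only while unmatched copies remain.
    """
    colors = [0] * 5
    unmatched = {}
    for i, (g, s) in enumerate(zip(guess, solution)):
        if g == s:
            colors[i] = 2
        else:
            unmatched[s] = unmatched.get(s, 0) + 1
    for i, g in enumerate(guess):
        if colors[i] == 0 and unmatched.get(g, 0) > 0:
            colors[i] = 1
            unmatched[g] -= 1
    code = 0
    for c in reversed(colors):
        code = code * 3 + c
    return code


def bucket_key(triplet, solution) -> int:
    """Combine the three responses into one bucket id in [0, 243**3)."""
    key = 0
    for guess in triplet:
        key = key * 243 + feedback(guess, solution)
    return key


def nongray_bound(positions=15, max_nongray=5):
    """Count gray/yellow/green patterns with at most `max_nongray` non-gray cells."""
    return sum(comb(positions, k) * 2**k for k in range(max_nongray + 1))


print(243**3)           # 14348907: the naive upper bound
print(nongray_bound())  # 122027: at most 5 of 15 letters non-gray
```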
Average bucket size
A good triplet will split the 2300 possible solution words into many buckets with very few words in each bucket, ideally one (or zero). So as an approximate triplet quality score, I computed the average size of the non-empty buckets. Here’s the distribution of that average over all 4 million triplets.
The triplets with the lowest averages were:
- chimp robed slant: 1.418
- boned clasp right: 1.423
- chimp gored slant: 1.424
- bored chimp slant: 1.428
- birch moped slant: 1.428
My “chomp grind salty” came in at rank 107,173. Not so great after all, but out of 4 million triplets, that’s still in the top 2.5%. 😀
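The average-bucket-size score can be sketched by grouping the solution words on their combined response to the three guesses. This is a minimal illustration with hypothetical function names, not the original implementation; responses are kept as tuples of colors for readability.

```python
from collections import Counter

# Hypothetical helper names; not from the original post.


def feedback(guess, solution):
    """Wordle response as a tuple of colors: 0 gray, 1 yellow, 2 green."""
    colors = [0] * 5
    # Solution letters not matched by a green are available for yellows.
    spare = Counter(s for g, s in zip(guess, solution) if g != s)
    for i, (g, s) in enumerate(zip(guess, solution)):
        if g == s:
            colors[i] = 2
        elif spare[g] > 0:
            colors[i] = 1
            spare[g] -= 1
    return tuple(colors)


def buckets(triplet, solutions):
    """Map each combined response to the number of solutions that produce it."""
    return Counter(tuple(feedback(g, s) for g in triplet) for s in solutions)


def average_bucket_size(triplet, solutions):
    """Mean size of the non-empty buckets (a Counter never stores empty ones)."""
    sizes = buckets(triplet, solutions).values()
    return sum(sizes) / len(sizes)
```

Because every solution lands in exactly one bucket, this score equals the number of solutions divided by the number of non-empty buckets, so a lower average simply means more buckets. Ranking is then `sorted(triplets, key=lambda t: average_bucket_size(t, solutions))`.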
Maximum bucket size
Maximum bucket size is probably not so interesting since it likely arises from a very obscure solution set, and the result is an integer, which means there are a lot of ties. Nonetheless, it was nice to see the distribution (clipped to 31 in the histogram), and there was one surprising result. Only a single triplet had a maximum bucket size of 5: charm gifts poled.
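The maximum is a small change from the average-bucket computation, and because it is an integer, tallying the scores exposes the ties directly. In this sketch the function names are hypothetical, and the `clip=31` default only mirrors the histogram clipping mentioned above.

```python
from collections import Counter

# Hypothetical helper names; not from the original post.


def feedback(guess, solution):
    """Wordle response as a tuple of colors: 0 gray, 1 yellow, 2 green."""
    colors = [0] * 5
    # Solution letters not matched by a green are available for yellows.
    spare = Counter(s for g, s in zip(guess, solution) if g != s)
    for i, (g, s) in enumerate(zip(guess, solution)):
        if g == s:
            colors[i] = 2
        elif spare[g] > 0:
            colors[i] = 1
            spare[g] -= 1
    return tuple(colors)


def max_bucket_size(triplet, solutions):
    """Largest number of solutions that share one combined response."""
    counts = Counter(tuple(feedback(g, s) for g in triplet) for s in solutions)
    return max(counts.values())


def clipped_histogram(scores, clip=31):
    """Count triplets per max-bucket score, lumping everything >= clip together."""
    return Counter(min(score, clip) for score in scores)
```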
Simulated game scores
With such small average bucket sizes after three guesses, it should be possible to simulate a few more moves to get a better sense of the playability of each triplet. We can’t play all 2300 × 2300 × 2300 × 2300 = 28 trillion solution sets (even more for Octordle), but a random sampling should be informative. My Mac is able to run 50,000 game simulations for a single guess triplet in under a second. That is still too slow to cover all 4 million triplets (at up to a second each, that would be up to about 46 days of computing), so I ran the game simulation on the top 100,000 triplets by average bucket size.
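A random-sampling simulation might look like the sketch below. The post only says the solver is simplified, so the policy here (after the fixed triplet, guess a remaining candidate on the unsolved board with the fewest candidates) is an assumption, as are the function names. Each game draws distinct random solutions, one per board, and every guess is applied to all unsolved boards.

```python
import random
from collections import Counter

# Hypothetical solver policy and helper names; not from the original post.


def feedback(guess, solution):
    """Wordle response as a tuple of colors: 0 gray, 1 yellow, 2 green."""
    colors = [0] * 5
    spare = Counter(s for g, s in zip(guess, solution) if g != s)
    for i, (g, s) in enumerate(zip(guess, solution)):
        if g == s:
            colors[i] = 2
        elif spare[g] > 0:
            colors[i] = 1
            spare[g] -= 1
    return tuple(colors)


def simulate_game(triplet, solutions, n_boards, rng):
    """Play one game with the assumed simplified policy; return turns used."""
    targets = rng.sample(solutions, n_boards)  # distinct solution per board
    candidates = [list(solutions) for _ in range(n_boards)]
    solved = [False] * n_boards
    turns = 0

    def play(guess):
        nonlocal turns
        turns += 1
        for b, target in enumerate(targets):
            if solved[b]:
                continue
            if guess == target:
                solved[b] = True
                continue
            # Keep only words consistent with this board's response.
            response = feedback(guess, target)
            candidates[b] = [w for w in candidates[b] if feedback(guess, w) == response]

    for guess in triplet:
        play(guess)
    while not all(solved):
        # The target always stays in its board's list, so it is never empty.
        board = min((b for b in range(n_boards) if not solved[b]),
                    key=lambda b: len(candidates[b]))
        play(candidates[board][0])
    return turns


def estimate(triplet, solutions, n_boards=4, games=1000, seed=0):
    """Monte Carlo estimate of (average turns, probability of meeting the goal)."""
    rng = random.Random(seed)
    goal = len(triplet) + n_boards  # e.g. 3 + 4 = 7 for Quordle
    turns = [simulate_game(triplet, solutions, n_boards, rng) for _ in range(games)]
    return sum(turns) / games, sum(t <= goal for t in turns) / games
```

Every wrong guess removes itself from its board’s candidate list, so each game is guaranteed to finish; a smarter policy would plug in at the `board = min(...)` step.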
The average game score (number of turns) and the probability of solving in seven turns were highly correlated, especially at the top. Here are the top five triplets by probability, also showing the average number of turns.
- chimp robed slant: 53.5%, 7.562
- bored chimp slant: 53.3%, 7.570
- chimp gored slant: 53.0%, 7.575
- coped glint marsh: 52.8%, 7.582
- birch moped slant: 52.8%, 7.579
Amazingly, “chimp robed slant” is still at the top by both measures. Only “coped glint marsh” is new to the top five, having ranked 28th by average bucket size. 186 triplets achieved a solve-in-seven success rate of at least 50%, but I feel like those values are a bit low. My daily stats for “chomp grind salty” show a 50+% solve-in-seven success rate, but the simulations give it a 41% chance. I’m guessing that’s a result of the simplified solving behavior used in the simulations. Even with that downward bias, the score should be meaningful for relative quality.
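A quick binomial check helps separate sampling noise from solver bias. Assuming each simulated probability comes from 50,000 independent games (the per-triplet count mentioned earlier; the post does not state it was the sample size for these figures), the 95% interval around 41% is only about ±0.4 percentage points, so the gap to 50+% is not a simulation sampling artifact. The same arithmetic suggests that differences of a few tenths of a point, like those between neighboring entries in the top five, are within noise. The function names are illustrative.

```python
from math import sqrt

# Illustrative helpers; assumes n independent simulated games per triplet.


def standard_error(p, n):
    """Binomial standard error of a proportion p estimated from n games."""
    return sqrt(p * (1 - p) / n)


def ci95(p, n):
    """Approximate 95% confidence interval via the normal approximation."""
    half = 1.96 * standard_error(p, n)
    return p - half, p + half


low, high = ci95(0.41, 50_000)
print(f"{low:.3f} to {high:.3f}")
```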
I manually added “chomp grind salty” just for kicks, and it did move up 60,000 spots in the rankings, which saves my pride a little bit.
Simulation versus bucket size scoring
The two scoring methods, bucket size and game simulation, are reasonably correlated. Here are the two measures against each other for the top 100,000 that I simulated.
I had to crank the opacity way down to get a sense of the core density, but then it’s hard to see the outlying points. Here’s a 2-D HDR plot to show both the Highest Density Regions and the outliers.
That’s “chimp robed slant” in the bottom left. In spite of the randomness, I tend to trust the simulated games as a truer measure of the quality of each triplet since it’s more grounded in game play.
It’s all good
Though I’ve highlighted the top-scoring triplets, the real take-away is that there were about 1 million triplets of very similar quality (see the first histogram). So pick one you enjoy. I think I’ll start using “build graph often”, which has a respectable 7.88 average turns per game.
Bonus chart: letter sets
Each triplet uses 15 distinct letters, and many of them use the same letters. In other words, there are many anagram triplets. The most common letter set among the top 100,000 triplets was “acdehilmnoprstu” with 1753 anagrams! You might think that anagram triplets would have very similar scores, but there is more variation than I expected. Here are 1-D HDR plots for the best scoring letter sets, denoted by their difference from the most common letter set.
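Grouping anagram triplets only needs a canonical key, such as the sorted distinct letters of the triplet. In this sketch the function names are hypothetical, and the “+added -dropped” label is an assumed way of denoting a letter set by its difference from the most common one.

```python
from collections import Counter

# Hypothetical helper names and label format; not from the original post.


def letter_set(triplet):
    """Sorted string of the distinct letters used by a triplet of words."""
    return "".join(sorted(set("".join(triplet))))


def most_common_letter_set(triplets):
    """Return (letter set, number of anagram triplets) for the most frequent set."""
    return Counter(letter_set(t) for t in triplets).most_common(1)[0]


def describe_difference(letters, reference):
    """Label a letter set by the letters it adds to and drops from the reference."""
    added = "".join(sorted(set(letters) - set(reference)))
    dropped = "".join(sorted(set(reference) - set(letters)))
    return f"+{added} -{dropped}" if added or dropped else "same"


print(describe_difference(letter_set(("chimp", "robed", "slant")), "acdehilmnoprstu"))  # +b -u
```

For example, “chimp robed slant” swaps the “u” of the most common letter set for a “b”.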