I also wrote a tool for playing: https://github.com/vple/wordle-solver/blob/main...

jonathankoren · on Jan 14, 2022

This is pretty much what I did, but I mixed in a regexp to hold the location restrictions and a penalty for using the same letter multiple times (e.g. guessing “added” is worse than “aspen” for “a..e.”).
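A minimal sketch of that idea, not the actual solver: the function names, the word list, and the flat per-duplicate penalty of 1.0 are all assumptions made for illustration.

```python
import re

def filter_candidates(words, pattern):
    """Keep only words that fully match a positional pattern such as 'a..e.'."""
    compiled = re.compile(pattern)
    return [w for w in words if compiled.fullmatch(w)]

def duplicate_penalty(word, penalty=1.0):
    """Charge a fixed (hypothetical) penalty for every repeated letter."""
    return penalty * (len(word) - len(set(word)))

# Illustrative word list, not taken from any real dictionary file.
words = ["added", "aspen", "arose", "ashen", "other"]
matches = filter_candidates(words, "a..e.")

# Lower penalty sorts first, so words with no repeated letters are preferred.
ranked = sorted(matches, key=duplicate_penalty)
print(ranked)  # ['aspen', 'ashen', 'added']
```

Here “added” still matches the pattern, but its two extra d's push it below “aspen”, which tests five distinct letters.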

I do wonder whether looking at how a letter splits the space of letters and words would be interesting.

vple · on Jan 14, 2022

Yeah, I wasn't sure how I wanted to deal with duplicates so I mostly ignored them. I track letter positions directly (just a bunch of tuples), but don't actually do anything with this other than restricting candidate words.

I think if I work on this some more I'd try to factor in letter positioning when deciding what to guess. My hunch is that it won't make too much of a difference though.

jonathankoren · on Jan 16, 2022

So I tried an experiment using 15,918 five-letter English words. I used a basic scoring strategy: score a word by summing up the frequency of the candidate letters in the candidate words, as determined by a regexp of included and excluded letters (e.g. for `.aves`, `waves` would score 1 but `saves` would score 0, since `s` is already included).
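One way to read that scoring rule, as a rough sketch: count how many remaining candidates contain each letter, then sum those counts over a guess's distinct letters that are not already known. The helper names and the three-word candidate list are assumptions, and the real experiment may have counted frequencies differently.

```python
from collections import Counter

def letter_frequencies(candidates):
    """Count how many candidate words contain each letter."""
    counts = Counter()
    for word in candidates:
        counts.update(set(word))  # set() so a repeated letter counts once per word
    return counts

def score(word, freqs, known=frozenset()):
    """Sum the frequencies of the word's distinct letters that are not already known."""
    return sum(freqs[ch] for ch in set(word) if ch not in known)

# Toy candidate list matching the pattern `.aves` (illustrative only).
candidates = ["waves", "saves", "caves"]
freqs = letter_frequencies(candidates)
known = set("aves")  # letters already pinned down by the pattern

print(score("waves", freqs, known))  # 1
print(score("saves", freqs, known))  # 0
```

With this toy list the numbers line up with the example above: `w` appears in one candidate, while every letter of `saves` is already known.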

Variations included adding in the frequency of the letter at a particular position, and adding in the frequency of two letter combinations.
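The two variations could be sketched roughly like this. The function names and the toy candidate list are assumptions, and how the real experiment weighted these counts against the plain letter frequencies is not stated in the thread.

```python
from collections import Counter

def positional_frequencies(candidates):
    """Count how often each letter occurs at each position, keyed by (index, letter)."""
    counts = Counter()
    for word in candidates:
        counts.update(enumerate(word))
    return counts

def bigram_frequencies(candidates):
    """Count adjacent two-letter combinations across all candidates."""
    counts = Counter()
    for word in candidates:
        counts.update(a + b for a, b in zip(word, word[1:]))
    return counts

def positional_score(word, pos_freqs):
    """Sum the positional frequency of every letter in the word."""
    return sum(pos_freqs[(i, ch)] for i, ch in enumerate(word))

# Toy candidate list (illustrative only).
candidates = ["crane", "crate", "slate"]
pos_freqs = positional_frequencies(candidates)
bigrams = bigram_frequencies(candidates)

print(positional_score("crate", pos_freqs))  # 12
print(bigrams["at"])  # 2
```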

Interestingly enough, the winning strategy was using single letters and factoring in the position. Second best was using two letters and position.

ngram=1 posfreq=True mean attempts: 4.34 WinPct 91.280%

ngram=2 posfreq=True mean attempts: 4.35 WinPct 91.186%

ngram=2 posfreq=False mean attempts: 4.37 WinPct 90.074%

ngram=1 posfreq=False mean attempts: 4.38 WinPct 90.445%

Since my base dictionary is way bigger than the Wordle one, I also mixed in a smaller 1,382 word dictionary (google-10000-english.txt) and then combined them by either just sorting by the score or normalizing the scores and then sorting. Normalizing the scores was strictly worse.

normalize=False ngram=1 posfreq=True mean attempts: 4.34 WinPct 91.280%

normalize=True ngram=1 posfreq=True mean attempts: 4.43 WinPct 90.281%
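For illustration only, the two ways of combining dictionaries might look like the sketch below. The thread does not say how scores were normalized or how a word present in both dictionaries was handled, so dividing by each dictionary's maximum score and keeping the higher of the two scores are both assumptions, as are the made-up scores.

```python
def normalize(scores):
    """Scale scores into [0, 1] by dividing by the largest score (an assumed scheme)."""
    top = max(scores.values(), default=0)
    if top == 0:
        return {w: 0.0 for w in scores}
    return {w: s / top for w, s in scores.items()}

def combine(big, small, normalized=False):
    """Merge two score tables and return words ranked best first."""
    if normalized:
        big, small = normalize(big), normalize(small)
    merged = dict(big)
    for word, s in small.items():
        # Assumption: a word in both dictionaries keeps its higher score.
        merged[word] = max(s, merged.get(word, s))
    # Sort by descending score, breaking ties alphabetically.
    return sorted(merged, key=lambda w: (-merged[w], w))

# Made-up scores for a large and a small dictionary.
big = {"aeons": 40, "crane": 30}
small = {"slate": 10, "crane": 9}

print(combine(big, small))                   # ['aeons', 'crane', 'slate']
print(combine(big, small, normalized=True))  # ['aeons', 'slate', 'crane']
```

In this toy case normalizing lifts the small dictionary's top word to the same level as the big dictionary's top word, which changes the guess order.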

FWIW, the absolute worst one was:

normalize=True ngram=1 posfreq=False mean attempts: 4.43 WinPct 89.835%

I should write this up.

hddqsb · on Jan 17, 2022

Which solution list did you use to calculate the mean attempts?

Another comment mentioned https://botfights.io/game/wordle. If you evaluate your solver on their word list, you could compare scores.