July 11, 2015
This is a follow-up to an article I wrote a few years ago on Solving Doublets in Mathematica.
Doublets are a type of word puzzle invented by Lewis Carroll (author of “Alice in Wonderland”). The goal is to change one word into another by adding, removing, or changing one letter at a time. The tricky part is that each intermediate step must also be a valid word. For more information see this Wikipedia article on word ladders. Here’s an example from that page:
COLD → CORD → CARD → WARD → WARM
If we think of words as vertices in a graph then two words are connected by an edge if they differ by exactly one character. Once we have a graph of connected words we can use a shortest path algorithm to solve for doublets. We’ll solve this in Python 2.7 using the NetworkX graph library. I highly recommend using the Anaconda Python distribution which includes NetworkX and many other useful packages for data science.
First let’s import the packages we’ll need and define a function differ_by_one
that returns True if two words differ by exactly one character and False otherwise. Note that we could simply check that the Levenshtein distance between the two words equals 1 but that algorithm is overkill for what we need here and runs much more slowly than the custom function below.
import re
import itertools
import urllib2
import networkx
def differ_by_one(word1, word2):
# Make sure word1 is shorter or equal in length to word2
if len(word2) < len(word1):
= word2, word1
word1, word2 if len(word2) - len(word1) > 1:
# Words differ in length by 2 or more characters so return False
return False
elif len(word1) == len(word2):
# Words are same length so check how many characters are different
# and return True if exactly one
= sum(c1 != c2 for c1, c2 in zip(word1, word2))
n_chars_diff return n_chars_diff == 1
else:
# word2 is guaranteed to be one character longer than word1.
# Chop out one character at a time from word2 and compare to word1.
for i in range(len(word2)):
= word2[:i] + word2[i + 1:]
word2_shortened if word1 == word2_shortened:
return True
return False
Now let’s get a list of 5-letter words and compute which ones are connected to each other. If two words are connected we’ll add them to our graph.
# Compile list of all lowercase words with up to 5 letters.
= urllib2.urlopen(
response 'http://s3-us-west-1.amazonaws.com/cpeccei-public/words.txt')
= [word for word in response.read().splitlines()
words if re.search('^[a-z]{,5}$', word)]
= networkx.Graph()
g for word1, word2 in itertools.combinations(words, 2):
if differ_by_one(word1, word2):
g.add_edge(word1, word2)
Now if we want to solve a doublet between two words we just do:
'hello', 'world') networkx.shortest_path(g,
Which in this case returns:
'hello', 'hell', 'held', 'hold', 'cold', 'cord', 'word', 'world'] [
NetworkX makes this so nice and simple. For fun we can compute some doublets that turn words into their opposites (i.e. antonyms).
# From http://www.michigan-proficiency-exams.com/antonym-list.html
= urllib2.urlopen(
response 'http://s3-us-west-1.amazonaws.com/cpeccei-public/antonyms.txt')
= [line.split() for line in response.read().splitlines()]
antonyms
= set()
antonym_paths for word1, word2 in antonyms:
if word1 in g and word2 in g:
try:
= networkx.shortest_path(g, word1, word2)
path tuple(path))
antonym_paths.add(except networkx.exception.NetworkXNoPath:
pass
with open('antonym_paths.html', 'w') as f:
for p in sorted(antonym_paths, key=len, reverse=True):
'<p><b>{}</b> to <b>{}</b><br />{}</p>\n'.format(p[0], p[-1],
f.write(' → '.join(p)))
Which returns:
heavy to light
heavy → heady → beady → bead → beat → begat → begot → bigot → bight → light
noisy to quiet
noisy → nosy → nose → note → nite → site → suite → quite → quit → quiet
right to wrong
right → sight → sighs → signs → sins → sing → ring → wring → wrong
night to day
night → nigh → sigh → sign → sin → pin → pan → pay → day
tight to slack
tight → sight → sighs → signs → sins → sics → sacs → sack → slack
clear to vague
clear → clean → clan → can → cane → cage → age → ague → vague
dark to light
dark → dank → sank → sink → sins → signs → sighs → sight → light
light to dark
light → sight → sighs → signs → sins → sink → sank → dank → dark
fresh to stale
fresh → flesh → flash → flask → flak → flake → slake → stake → stale
below to above
below → blow → bow → boy → body → bode → abode → above
glad to sorry
glad → goad → good → wood → woody → wordy → worry → sorry
sober to drunk
sober → saber → saner → sane → sank → sunk → dunk → drunk
left to right
left → lest → best → besot → begot → bigot → bight → right
best to worst
best → lest → lost → host → hose → horse → worse → worst
blunt to sharp
blunt → bunt → hunt → hurt → hart → harp → sharp
bless to curse
bless → less → mess → muss → cuss → curs → curse
big to small
big → bid → mid → mil → mail → mall → small
alive to dead
alive → live → lie → lid → did → dad → dead
dull to clear
dull → dell → deal → dean → lean → clean → clear
tall to short
tall → tale → tare → hare → share → shore → short
rapid to slow
rapid → raid → said → slid → slit → slot → slow
low to high
low → sow → son → sin → sign → sigh → high
rich to poor
rich → rick → rock → rook → book → boor → poor
cruel to kind
cruel → creel → creed → reed → rend → rind → kind
dusk to dawn
dusk → dunk → dun → don → down → dawn
bold to timid
bold → told → toed → tied → timed → timid
stand to lie
stand → staid → said → slid → lid → lie
early to late
early → earl → ears → eats → lats → late
loss to find
loss → less → lens → lend → fend → find
long to short
long → lone → lore → sore → sort → short
up to down
up → uh → oh → on → own → down
fast to slow
fast → fat → flat → slat → slot → slow
new to old
new → now → nod → god → gold → old
found to lost
found → fount → font → foot → loot → lost
slim to thick
slim → shim → shin → thin → think → thick
open to shut
open → pen → pun → spun → shun → shut
happy to sad
happy → harpy → hardy → hard → had → sad
sour to sweet
sour → soar → sear → swear → sweat → sweet
odd to even
odd → ode → owe → ewe → eve → even
dry to wet
dry → cry → coy → cot → wot → wet
hard to soft
hard → hart → tart → tort → sort → soft
tame to wild
tame → time → tile → wile → wild
sow to reap
sow → sop → sap → rap → reap
out to in
out → but → bun → bin → in
few to many
few → fen → men → man → many
find to lose
find → fine → line → lone → lose
land to sea
land → lad → lead → lea → sea
peace to war
peace → pace → pare → par → war
fat to thin
fat → tat → tan → tin → thin
less to more
less → loss → lose → lore → more
guest to host
guest → gust → lust → lost → host
take to give
take → hake → hike → hive → give
come to go
come → code → cod → god → go
loud to soft
loud → lout → loft → soft
hate to love
hate → have → hove → love
mad to sane
mad → sad → sand → sane
last to first
last → list → fist → first
bad to good
bad → gad → god → good
cold to hot
cold → colt → cot → hot
cheap to dear
cheap → heap → hear → dear
first to last
first → fist → list → last
me to you
me → ye → yo → you
near to far
near → fear → far
thick to thin
thick → think → thin
wax to wane
wax → wan → wane
here to there
here → there