Decrypting cryptic crosswords: Semantically complex wordplay puzzles as a target for NLP
- 1. Stanford University
- 2. The University of Texas at Austin
Description
Cryptic crosswords, the dominant crossword variety in the UK, are a promising target for advancing NLP systems that seek to process semantically complex, highly compositional language. Cryptic clues read like fluent natural language but are adversarially composed of two parts: a definition and a wordplay cipher requiring character-level manipulations. Expert humans use creative intelligence to solve cryptics, flexibly combining linguistic, world, and domain knowledge. In this paper, we make two main contributions. First, we present a dataset of cryptic clues as a challenging new benchmark for NLP systems that seek to process compositional language in more creative, human-like ways. After showing that three non-neural approaches and T5, a state-of-the-art neural language model, do not achieve good performance, we make our second main contribution: a novel curriculum approach, in which the model is first fine-tuned on related tasks such as unscrambling words. We also introduce a challenging data split, examine the meta-linguistic capabilities of subword-tokenized models, and investigate model systematicity by perturbing the wordplay part of clues, showing that T5 exhibits behavior partially consistent with human solving strategies. Although our curricular approach considerably improves on the T5 baseline, our best-performing model still fails to generalize to the extent that humans can. Thus, cryptic crosswords remain an unsolved challenge for NLP systems and a potential source of future innovation.
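The description mentions fine-tuning first on related tasks such as unscrambling words. As a rough illustration only, the sketch below builds (input, target) pairs for such a task. The `unscramble:` prompt prefix, the function name `make_unscramble_example`, and the word list are assumptions made for this example; they are not taken from the dataset or the paper's code.

```python
import random


def make_unscramble_example(word, rng):
    """Return one (scrambled_input, target) pair for a hypothetical pre-task."""
    letters = list(word)
    rng.shuffle(letters)  # character-level permutation of the target word
    scrambled = "".join(letters)
    # The "unscramble:" prefix is an assumed prompt format, not the authors'.
    return f"unscramble: {scrambled}", word


# A seeded generator keeps the illustrative examples reproducible.
rng = random.Random(0)
words = ["listen", "cryptic", "puzzle"]  # hypothetical vocabulary
examples = [make_unscramble_example(w, rng) for w in words]

for source, target in examples:
    print(source, "->", target)
```

Pairs of this shape could be fed to a sequence-to-sequence model before the cryptic clues themselves; the actual curriculum tasks and formats are those defined in the linked repository.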
Files
Name | Size | MD5 |
---|---|---|
decrypt.zip | 53.4 MB | 3fc20e51100846846ff4b1f3f237eb70 |
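After downloading, the archive can be checked against the MD5 checksum listed for decrypt.zip above. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `verify`, and the assumption that the file sits in the current directory, are illustrative.

```python
import hashlib

# Checksum as listed on this record for decrypt.zip.
EXPECTED_MD5 = "3fc20e51100846846ff4b1f3f237eb70"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so the whole archive is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected
```

For example, `verify("decrypt.zip")` should return `True` for an intact download, assuming the archive was saved under that name in the working directory.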
Additional details
Related works
- Cites
- https://arxiv.org/abs/2104.08620 (URL)
- Is derived from
- https://github.com/jsrozner/decrypt (URL)
- Is source of
- 10.5061/dryad.n02v6wwzp (DOI)