Presentation Open Access

How a File Format Led to a Crossword Scandal

Saul Pwanson

In 2016 I designed a plain-text file format for crossword puzzle data, then spent a couple of months building a micro data pipeline, scraping tens of thousands of crosswords from various sources. With all those crosswords in a simple format, I wanted to see whether there were any common grid patterns, and discovered egregious plagiarism by a major crossword editor that had gone on for years. This talk covers the file format, the data pipeline, and the design choices that aided rapid exploration; the evidence for the scandal, from the initial anomalies to the final damning visualization; and what it's like for a data project to get 15 seconds of fame.

Files (8.7 MB)
Name Size
xdtalk.zip 8.7 MB
md5:ab498ed6cd76eacb2b5ff0a6d13d409c
