2836892
doi
10.5281/zenodo.2836892
oai:zenodo.org:2836892
user-csvconfv4
How a File Format led to a Crossword Scandal
Saul Pwanson
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
crossword
plagiarism
csvconf
format
xd
<p>In 2016 I designed a plain-text file format for crossword puzzle data, and then spent a couple of months building a micro data pipeline that scraped tens of thousands of crosswords from various sources. Then, with all those crosswords in a simple format, I wanted to see whether there were any common grid patterns, and I discovered egregious plagiarism by a major crossword editor that had gone on for years. This talk covers the file format, the data pipeline, and the design choices that aided rapid exploration; the evidence for the scandal, from the initial anomalies to the final damning visualization; and what it's like for a data project to get 15 seconds of fame.</p>
Zenodo
2019-05-09
info:eu-repo/semantics/lecture
2836891
user-csvconfv4
1579540625.841754
8748347
md5:ab498ed6cd76eacb2b5ff0a6d13d409c
https://zenodo.org/records/2836892/files/xdtalk.zip
public
10.5281/zenodo.2836891
isVersionOf
doi