Published July 16, 2019 | Version v1
Software Open

Data processing and analysis code for 'Detecting Textual Reuse in News Stories, At Scale' in the International Journal of Communication.

  • 1. University of Oxford


These are the data preparation and analysis Jupyter notebooks accompanying Nicholls (2019), 'Detecting Textual Reuse in News Stories, At Scale', in the International Journal of Communication. The first notebook shows the steps for building the database of news content on which the analysis relies; the second carries out the analyses from the paper.

Although these notebooks set out all the steps required to implement the method, there are two important issues to be aware of:

  • The source data (newspaper articles) are not included, as they are copyright-encumbered
  • There are many things that could be done better a second time around

If you want to reimplement the method, please do be in touch:


This work was supported by a grant from Google UK as part of the Digital News Initiative (CTR00220).


