Published July 16, 2019 | Version v1
Software Open

Data processing and analysis code for 'Detecting Textual Reuse in News Stories, At Scale' in the International Journal of Communication.

  • 1. University of Oxford

Description

These are the data preparation and analysis Jupyter notebooks accompanying Nicholls (2019), 'Detecting Textual Reuse in News Stories, At Scale', in the International Journal of Communication. The first notebook shows the steps for building the database of news content on which the analysis relies; the second carries out the analyses from the paper.

Although this sets out all the steps required to implement the method, there are two important issues to be aware of:

  • The source data (newspaper articles) are not included, as they are copyright-encumbered
  • There are many things that could be done better a second time around

If you want to reimplement the method, please do get in touch: tom.nicholls@politics.ox.ac.uk

Notes

This work was supported by a grant from Google UK as part of the Digital News Initiative (CTR00220).

Files

DetectingTextualReuse-Analysis.ipynb

Files (670.7 kB)

Checksum Size
md5:691770997087ac3be65331f1299f52bf 505.9 kB
md5:8cdf8bb62aeacb176fe0578164146fee 164.8 kB