Data processing and analysis code for 'Detecting Textual Reuse in News Stories, At Scale' in the International Journal of Communication.

Nicholls, Tom

doi:10.5281/zenodo.3338003

Published July 16, 2019 | Version v1

Software Open

Nicholls, Tom¹

1. University of Oxford

These are the data preparation and analysis Jupyter notebooks accompanying Nicholls (2019), 'Detecting Textual Reuse in News Stories, At Scale', in the International Journal of Communication. The first notebook (DetectingTextualReuse-DataPrep.ipynb) shows the steps for building the database of news content that the analysis relies upon; the second (DetectingTextualReuse-Analysis.ipynb) carries out the analyses from the paper.

Although the notebooks set out all the steps required to implement the method, there are two important issues to be aware of:

- The source data (newspaper articles) are not included, as they are copyright-encumbered.
- There are many things that could be done better a second time around.

If you want to reimplement the method, please do get in touch: tom.nicholls@politics.ox.ac.uk

Notes

This work was supported by a grant from Google UK as part of the Digital News Initiative (CTR00220).

Files

Files (670.7 kB)

Name	MD5	Size
DetectingTextualReuse-Analysis.ipynb	691770997087ac3be65331f1299f52bf	505.9 kB
DetectingTextualReuse-DataPrep.ipynb	8cdf8bb62aeacb176fe0578164146fee	164.8 kB
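
The MD5 digests in the file listing can be used to confirm that the notebooks downloaded intact. The following is a minimal sketch, not part of the deposited code: the helper names `md5_of` and `check_downloads` are made up for illustration, and it assumes both notebooks were saved under their original file names in a single directory.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the file listing in this record.
EXPECTED_MD5 = {
    "DetectingTextualReuse-Analysis.ipynb": "691770997087ac3be65331f1299f52bf",
    "DetectingTextualReuse-DataPrep.ipynb": "8cdf8bb62aeacb176fe0578164146fee",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory="."):
    """Map each expected file name to 'ok', 'mismatch' or 'missing'.

    Hypothetical helper: `directory` is wherever the notebooks were saved.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    for name, status in check_downloads().items():
        print(f"{name}: {status}")
```

Run from the download directory, this prints one status line per notebook. A "mismatch" suggests a truncated or altered download, in which case the file should be fetched again.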

