Published October 20, 2023 | Version v1
Resource type: Software | Access: Open

Analysing state-backed propaganda websites: a new dataset and linguistic study (software)

Description

This is the software accompanying the EMNLP 2023 paper "Analysing state-backed propaganda websites: a new dataset and linguistic study".

To use this software to investigate other sites, please see the GitHub repository for this software, which hosts the most up-to-date version.
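This page does not document the archived package's interface. What follows is therefore only a sketch of how posts might be pulled from a WordPress site. It assumes, based on the package name `wordpress-site-extractor`, that target sites run WordPress and expose the standard WordPress REST API. The sketch pages through `/wp-json/wp/v2/posts` using the `per_page` and `page` query parameters and the `X-WP-TotalPages` response header. The function names `iter_wp_posts` and `http_fetch` and the output field names are hypothetical. They are not the tool's actual API.

```python
import json
from typing import Callable, Iterator
from urllib.request import Request, urlopen

# Hypothetical sketch, not the archived tool's implementation.
# A fetcher takes a URL and returns (decoded JSON body, lower-cased headers).
Fetcher = Callable[[str], tuple[list, dict]]


def http_fetch(url: str) -> tuple[list, dict]:
    """Fetch one page of results from the WordPress REST API."""
    req = Request(url, headers={"User-Agent": "wp-extract-sketch/0.1"})
    with urlopen(req, timeout=30) as resp:
        body = json.load(resp)
        headers = {k.lower(): v for k, v in resp.headers.items()}
    return body, headers


def iter_wp_posts(site: str, fetch: Fetcher = http_fetch, per_page: int = 100) -> Iterator[dict]:
    """Yield simplified post records from /wp-json/wp/v2/posts, page by page.

    WordPress caps per_page at 100. If the X-WP-TotalPages header is
    absent, only the first page is read.
    """
    base = site.rstrip("/") + "/wp-json/wp/v2/posts"
    page, total_pages = 1, 1
    while page <= total_pages:
        posts, headers = fetch(f"{base}?per_page={per_page}&page={page}")
        total_pages = int(headers.get("x-wp-totalpages", total_pages))
        if not posts:
            break
        for post in posts:
            # Keep only the fields needed to match posts against a URL list later.
            yield {
                "url": post["link"],
                "date": post.get("date"),
                "title": post["title"]["rendered"],
                "content": post["content"]["rendered"],
            }
        page += 1
```

Because a custom `fetch` function can be passed in, the paging logic can be tested offline. Some sites disable or restrict the REST API, and this approach will not work for them.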

For copyright and liability reasons, we do not publicly distribute the complete dataset. Instead, we provide the software used to create the dataset (archived in this record) and a list containing the URLs of all the posts in the full dataset (DOI: 10.5281/zenodo.10007383).

To reconstruct our dataset, use the software to extract the sites, then filter the extracted posts against the corresponding URL list. Please note that some posts may no longer be available, or may have been modified since the dataset was collected.
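Below is a minimal Python sketch of the filtering step. It assumes that extracted posts are dicts with a `url` key and that the published URL list is a plain-text file with one URL per line. Both formats are assumptions, so check the actual formats in the software and dataset records. The names `normalise_url`, `load_url_list` and `filter_posts` are hypothetical. The URL normalisation rule (lowercase scheme and host, drop the fragment and any trailing slash) is a heuristic choice, not the authors' method.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical sketch; file formats and field names are assumptions.


def normalise_url(url: str) -> str:
    """Heuristic matching key: lowercase scheme/host, drop the fragment and
    any trailing slash on the path. The query string is kept because some
    WordPress permalinks use it (e.g. ?p=123)."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def load_url_list(path: str) -> list[str]:
    """Read the published URL list, assuming one URL per line."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def filter_posts(posts, url_list):
    """Keep extracted posts whose URL is in url_list.

    Returns (kept, missing): kept posts in input order (first match per URL),
    and sorted normalised listed URLs with no extracted post, e.g. posts
    removed since the original collection.
    """
    wanted = {normalise_url(u) for u in url_list}
    kept, seen = [], set()
    for post in posts:
        key = normalise_url(post["url"])
        if key in wanted and key not in seen:
            kept.append(post)
            seen.add(key)
    return kept, sorted(wanted - seen)


if __name__ == "__main__":
    posts = [
        {"url": "https://example.org/2022/03/post-a/", "title": "A"},
        {"url": "https://example.org/2022/03/post-b", "title": "B"},
    ]
    urls = ["https://EXAMPLE.org/2022/03/post-a", "https://example.org/2022/04/post-c/"]
    kept, missing = filter_posts(posts, urls)
    print([p["title"] for p in kept])  # ['A']
    print(missing)  # ['https://example.org/2022/04/post-c']
```

The `missing` list shows how many listed posts could not be recovered. Some gaps are expected, because posts may have been removed from the original sites.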

If you are a researcher, please contact the authors; we may be able to provide you with the original dataset.


Files

wordpress-site-extractor-12542fc43fe465e3e9ea13c97bcf95cad3c1c2ef.zip (85.8 kB)

Additional details

Related works

Compiles the dataset: 10.5281/zenodo.10007383 (DOI)
Is described by the conference paper: 10.18653/v1/2023.emnlp-main.349 (DOI)

Funding

UK Research and Innovation
VIGILANT: Vital IntelliGence to Investigate ILlegAl DisiNformaTion (award 10039039)