Multilingual news article similarity dataset

Chen, Xi; Samory, Mattia; Hale, Scott; Jurgens, David; Grabowicz, Przemyslaw

doi:10.5281/zenodo.10611923

Published February 2, 2024 | Version v1

Dataset | Open

1. University of Massachusetts Amherst
2. Sapienza University of Rome
3. University of Michigan–Ann Arbor

This dataset extends the authors' earlier release (https://zenodo.org/records/6507872). Pairs of news articles drawn from the first half of 2020 are annotated for the seven aspects of similarity used in the original version, plus an additional FRAME aspect:

GEO: How similar is the geographic focus (places, cities, countries, etc.) of the two articles?
ENT: How similar are the named entities (e.g., people, companies, organizations, products, named living beings) appearing in the two articles, excluding previously considered locations?
TIME: Are the two articles relevant to similar time periods or describing similar time periods?
NAR: How similar are the narrative schemas presented in the two articles?
OVERALL: Overall, are the two articles covering the same substantive news story (excluding style, framing, and tone)?
STYLE: Do the articles have similar writing styles?
TONE: Do the articles have similar tones?
FRAME: Do the articles have similar framing and express similar opinions?

Files

Files (13.6 MB)

Name	MD5	Size
Codebook for text similarity annotations - Google Docs.pdf	9007e1014065d65c20690c6ee54270ce	402.6 kB
zenodo_release_data.csv	f013b39fcd3359c20daf4b9c7c9604c2	13.2 MB
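
The MD5 digests listed with the files can be used to confirm that a download arrived intact. The following is a minimal sketch, assuming both files were saved under their original names in a single local directory. The helper names `md5_of_file` and `verify_downloads` are illustrative and not part of the dataset release, and nothing is assumed about the contents or columns of the CSV.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the file listing of this record.
EXPECTED_MD5 = {
    "Codebook for text similarity annotations - Google Docs.pdf":
        "9007e1014065d65c20690c6ee54270ce",
    "zenodo_release_data.csv":
        "f013b39fcd3359c20daf4b9c7c9604c2",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to True (match), False (mismatch),
    or None (file not found in the given directory)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if path.is_file():
            results[name] = md5_of_file(path) == expected
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "missing"}
    for name, status in verify_downloads().items():
        print(f"{labels[status]}: {name}")
```

Reading in chunks keeps memory use flat, which matters little for a 13.2 MB CSV but makes the helper reusable for larger files. A mismatch usually indicates an interrupted or altered download, so fetching the file again is the natural next step.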

Additional details

Is version of: Conference proceeding: https://zenodo.org/records/6507872 (URL)

Collected: 2024-01-15

Chen et al. (2024). Multilingual news article similarity dataset. doi: 10.5281/zenodo.10611923
