Published February 2, 2024
Version v1
Dataset (open access)
Multilingual news article similarity dataset
Description
This dataset is the extended version of the authors' earlier dataset (https://zenodo.org/records/6507872), in which pairs of news articles drawn from the first half of 2020 were annotated for seven aspects of similarity. The extended version covers those seven aspects plus an additional FRAME aspect:
- GEO: How similar is the geographic focus (places, cities, countries, etc.) of the two articles?
- ENT: How similar are the named entities (e.g., people, companies, organizations, products, named living beings) appearing in the two articles, excluding the locations already covered by GEO?
- TIME: Do the two articles relate to, or describe, similar time periods?
- NAR: How similar are the narrative schemas presented in the two articles?
- OVERALL: Overall, do the two articles cover the same substantive news story (excluding style, framing, and tone)?
- STYLE: Do the two articles have similar writing styles?
- TONE: Do the two articles have similar tones?
- FRAME: Do the two articles have similar framing and express similar opinions?
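For programmatic use, the eight aspect codes can be captured in a small enumeration. This is a hypothetical sketch: the `Aspect` enum, its short descriptions, and the idea of an annotation record keyed by aspect code are assumptions for illustration only. The actual column names and score scale are defined by the dataset files and the codebook, not here.

```python
from enum import Enum


class Aspect(Enum):
    """Similarity aspects annotated in this dataset, keyed by the codes used in the description."""
    GEO = "geographic focus"
    ENT = "named entities other than locations"
    TIME = "time period"
    NAR = "narrative schema"
    OVERALL = "same substantive news story"
    STYLE = "writing style"
    TONE = "tone"
    FRAME = "framing and expressed opinions"


# FRAME is new in this extended version; the other seven come from the earlier dataset.
ADDED_ASPECTS = frozenset({Aspect.FRAME})
ORIGINAL_ASPECTS = frozenset(Aspect) - ADDED_ASPECTS


def missing_aspects(annotation: dict[str, float]) -> list[str]:
    """Return the aspect codes missing from a hypothetical record keyed by aspect code."""
    return [aspect.name for aspect in Aspect if aspect.name not in annotation]


# Hypothetical record covering only the original seven aspects (score values are placeholders).
record = {aspect.name: 1.0 for aspect in ORIGINAL_ASPECTS}
print(missing_aspects(record))  # ['FRAME']
```

A check like this makes it easy to tell whether a given annotation row comes from the earlier seven-aspect scheme or the extended eight-aspect one.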
Files (13.6 MB total)
Codebook for text similarity annotations - Google Docs.pdf
| Size | MD5 checksum |
|---|---|
| 402.6 kB | 9007e1014065d65c20690c6ee54270ce |
| 13.2 MB | f013b39fcd3359c20daf4b9c7c9604c2 |
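The MD5 checksums listed for each file can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib`; the `EXPECTED_MD5` dictionary keyed by file size is a hypothetical convenience, since the record as shown does not name the files, and `md5sum`/`verify` are illustrative helper names.

```python
import hashlib

# MD5 checksums as listed in the Files section of this record. The record as
# shown does not name the files, so they are keyed by size here (hypothetical
# convenience); map each entry to the matching downloaded file yourself.
EXPECTED_MD5 = {
    "402.6 kB": "9007e1014065d65c20690c6ee54270ce",
    "13.2 MB": "f013b39fcd3359c20daf4b9c7c9604c2",
}


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str) -> bool:
    """Return True if the file's MD5 digest matches the expected checksum (case-insensitive)."""
    return md5sum(path) == expected.strip().lower()
```

For example, `verify("codebook.pdf", EXPECTED_MD5["402.6 kB"])` would check a locally saved copy (the file name here is hypothetical). MD5 only guards against accidental corruption during download, not deliberate tampering.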
Additional details
Related works
- Is version of: https://zenodo.org/records/6507872 (conference proceeding)
Dates
- Collected: 2024-01-15
References
- Chen et al. (2024). Multilingual news article similarity dataset. doi: 10.5281/zenodo.10611923