SemEval-2022 Task 8: Multilingual news article similarity
Creators
- 1. UMass Amherst
- 2. University of Exeter
- 3. GESIS
- 4. Meedan
- 5. Meedan and University of Oxford
- 6. University of Michigan
Description
This dataset contains pairs of news articles drawn from the first half of 2020 and annotated for seven aspects of similarity:
- GEO: How similar is the geographic focus (places, cities, countries, etc.) of the two articles?
- ENT: How similar are the named entities (e.g., people, companies, organizations, products, named living beings) appearing in the two articles, excluding the locations already considered under GEO?
- TIME: Are the two articles relevant to similar time periods or describing similar time periods?
- NAR: How similar are the narrative schemas presented in the two articles?
- OVERALL: Overall, are the two articles covering the same substantive news story? (excluding style, framing, and tone)
- STYLE: Do the articles have similar writing styles?
- TONE: Do the articles have similar tones?
Further details are provided in:
Chen et al. (2022). SemEval-2022 Task 8: Multilingual news article similarity. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022). https://aclanthology.org/2022.semeval-1.155/
The data in this repository consists of pairs of URLs and their annotations; the article text itself is not included. The text of the webpages is generally available via the Internet Archive in this special collection: https://archive.org/details/2020-multilingual-news-article-similarity . A script to download and process the webpages is available at https://github.com/euagendas/semeval_8_2022_ia_downloader .
Files
Codebook for text similarity annotations.pdf
Total size of all files: 14.3 MB
MD5 checksum | Size |
---|---|
md5:2765dc6580b589319fd7125da726be78 | 274.0 kB |
md5:2faa787b553ffb2bcc2794cd87681ca6 | 357.6 kB |
md5:0f73b405f5a69bf72926133c3e7baa42 | 18.0 kB |
md5:9f00f9192dec6a78915fdbb40bf46767 | 23.9 kB |
md5:b1acdfeafd230d186f7a59b0d9085e67 | 110.8 kB |
md5:49c4dbb8c9db263bbfb7bd502df9349b | 2.5 MB |
md5:f514b277ea50f1d082258a4631274ee5 | 3.0 MB |
md5:2ba095d53f51142a12375e7ccb22fdab | 5.3 MB |
md5:c3f2cf2be0460bb338d6d16f232f5992 | 2.7 MB |
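The MD5 checksums listed for the files can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library. The function names `md5_of_file` and `matches_listed_checksum` are illustrative and not part of the dataset or its downloader script, and because this listing does not preserve which file name goes with which checksum, the path in the usage comment is a hypothetical placeholder.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file's MD5 with a listed value such as 'md5:2765dc65...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Hypothetical usage; replace the path with a file you actually downloaded
# and the checksum with the one listed for that file:
# ok = matches_listed_checksum("path/to/downloaded_file",
#                              "md5:2765dc6580b589319fd7125da726be78")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.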
Additional details
References
- Chen et al. (2022). SemEval-2022 Task 8: Multilingual news article similarity. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)