Published October 17, 2025 | Version v1
Dataset | Open access

SKILL-IR-Discourse

Description

We present a large annotated corpus of scholarly discourse in the domain of International Relations, a subfield of political science. The corpus comprises 190 articles (over 1.5M tokens) annotated at the argumentation, basic rhetorical, and domain levels. Five of these articles (ca. 62K tokens) constitute a gold standard coded by domain experts; the remaining articles were coded by annotators trained on the gold standard and monitored for annotation quality. We describe our corpus creation methodology, the annotation process and quality assurance, and the corpus itself, and we present insights into the data. Most argumentative structures in the data are simple premise-conclusion structures, and fewer than half of the claims have explicit supporting evidence. Counter-arguments to claims are rare. The claim-to-support ratio varies widely between articles, possibly due in part to the topics covered (where there is clear common ground) or to differences between authors' styles. The distribution of theoretical vs. evaluative statements also varies strongly between articles; this can be attributed to factors such as differing methodological approaches and the methodological focus of the publishing journal.
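The description reports two per-article quantities: the share of claims with explicit supporting evidence and the claim-to-support ratio. This record does not describe the corpus file format, so the sketch below works on a hypothetical, simplified in-memory structure (`Article`, with a set of claim IDs and a list of `(premise_id, claim_id)` support relations). The ratio definition used here (claims per support relation) is an assumption for illustration, not necessarily the one used by the corpus authors.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    """Hypothetical, simplified view of one annotated article.

    claims:   IDs of spans annotated as claims
    supports: (premise_id, claim_id) pairs for 'support' relations
    """
    name: str
    claims: set[str]
    supports: list[tuple[str, str]] = field(default_factory=list)


def supported_share(article: Article) -> float:
    """Fraction of claims that have at least one explicit supporting premise."""
    if not article.claims:
        return 0.0
    # Count each claim once, however many premises support it.
    supported = {claim for _, claim in article.supports if claim in article.claims}
    return len(supported) / len(article.claims)


def claim_to_support_ratio(article: Article) -> float:
    """Claims per support relation (assumed definition); inf if no supports."""
    if not article.supports:
        return float("inf")
    return len(article.claims) / len(article.supports)


if __name__ == "__main__":
    # Toy input for illustration only; not taken from the corpus.
    toy = Article(
        name="toy",
        claims={"c1", "c2", "c3", "c4"},
        supports=[("p1", "c1"), ("p2", "c1"), ("p3", "c2")],
    )
    print(supported_share(toy), claim_to_support_ratio(toy))
```

Computing both values per article, rather than pooling the whole corpus, is what makes the between-article variation mentioned above visible.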

Files (8.3 MB)

skill-ir-discourse_25-10-15.zip (8.3 MB)
md5:e7c93d68109bdabc9322b172cca7a16a
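The record publishes an MD5 checksum for the archive. A minimal sketch for checking a downloaded copy against it, using only Python's standard `hashlib`, might look like this. The local filename is an assumption (the archive is expected to sit in the current directory).

```python
import hashlib
from pathlib import Path

# Checksum published on this record for skill-ir-discourse_25-10-15.zip
EXPECTED_MD5 = "e7c93d68109bdabc9322b172cca7a16a"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """True if the file's MD5 matches the expected checksum (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("skill-ir-discourse_25-10-15.zip")  # assumed local path
    if archive.exists():
        print("OK" if verify_download(archive) else "Checksum mismatch")
    else:
        print(f"{archive} not found")
```

Reading in chunks keeps memory use constant regardless of archive size.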

Additional details

Dates

Available: 2025-10-17