{
  "DOI": "10.5281/zenodo.3732944",
  "abstract": "This data collection contains the Latin test data for SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection]:\u00a0\n\n\n\n\t\na Latin text corpus pair (`corpus1/lemma`, `corpus2/lemma`)\n\t\n40 lemmas which have been annotated for their lexical semantic change between the two corpora (`targets.txt`)\n\t\nthe annotated binary change scores of the targets for subtask 1, and their annotated graded change scores for subtask 2 (`truth/`)\n\n\n\nThe corpus data have been automatically lemmatized and part-of-speech tagged, and have been partially corrected by hand. For homonyms, the lemmas are followed by the '\\#' symbol and the number of the homonym according to the Lewis-Short dictionary of Latin when this number is greater than 1. For example, the lemma 'dico' corresponds to the first homonym in the Lewis-Short dictionary and 'dico\\#2' corresponds to the second homonym, cf. Lewis-Short dictionary.\n\n\n__Corpus 1__\n\n\n\n\t\nbased on: LatinISE\u00a0(McGillivray and Kilgarriff 2013), version on Sketch Engine\n\t\nlanguage: Latin\n\t\ntime covered: from the beginning of the second century before Christ (BC) to the end of the first century BC\n\t\nsize: ~1.7 million tokens\n\t\nformat: lemmatized, sentence length >= 2, no punctuation, sentences randomly shuffled\n\t\nencoding: UTF-8\n\n\n\n__Corpus 2__\n\n\n\n\t\nbased on: LatinISE\u00a0(McGillivray and Kilgarriff 2013) , version on Sketch Engine\n\t\nlanguage: Latin\n\t\ntime covered: from the beginning of the first century after Christ (AD) to the end of the twenty-first century AD\n\t\nsize: ~9.4 million tokens\n\t\nformat: lemmatized, sentence length >= 2, no punctuation, sentences randomly shuffled\n\t\nencoding: UTF-8\n\n\n\nFind more information on the data in the papers referenced below.\n\n\nReferences\n\n\nDominik Schlechtweg, Barbara McGillivray, Simon Hengchen, Haim Dubossarsky and Nina Tahmasebi SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection. To appear in SemEval@COLING2020.\n\n\nMcGillivray, B. and Kilgarriff, A. (2013). Tools for historical corpus research, and a corpus of Latin. In Paul Bennett, Martin Durrell, Silke Scheible, Richard J. Whitt (eds.), New Methods in Historical Corpus Linguistics, T\u00fcbingen: Narr.\n\u00a0",
  "author": [
    {
      "family": "McGillivray",
      "given": "Barbara"
    },
    {
      "family": "Schlechtweg",
      "given": "Dominik"
    },
    {
      "family": "Dubossarsky",
      "given": "Haim"
    },
    {
      "family": "Tahmasebi",
      "given": "Nina"
    },
    {
      "family": "Hengchen",
      "given": "Simon"
    }
  ],
  "id": "3732944",
  "issued": {
    "date-parts": [
      [
        "2020",
        "02",
        "18"
      ]
    ]
  },
  "language": "lat",
  "publisher": "Zenodo",
  "title": "LatinISE subcorpora for SemEval 2020 task 1",
  "type": "dataset",
  "version": "2"
}