{
  "access": {
    "embargo": {
      "active": false,
      "reason": null
    },
    "files": "public",
    "record": "public",
    "status": "open"
  },
  "created": "2020-03-27T09:00:49.259203+00:00",
  "custom_fields": {},
  "deletion_status": {
    "is_deleted": false,
    "status": "P"
  },
  "files": {
    "count": 1,
    "enabled": true,
    "entries": {
      "semeval2020_ulscd_swe.zip": {
        "checksum": "md5:47eb5678bbd6483969ffd904cb5e9ca8",
        "ext": "zip",
        "id": "81b4993d-3cb0-430b-afa2-985a8c0a63c4",
        "key": "semeval2020_ulscd_swe.zip",
        "metadata": null,
        "mimetype": "application/zip",
        "size": 1002486930
      }
    },
    "order": [],
    "total_bytes": 1002486930
  },
  "id": "3730550",
  "is_draft": false,
  "is_published": true,
  "links": {
    "access": "https://zenodo.org/api/records/3730550/access",
    "access_grants": "https://zenodo.org/api/records/3730550/access/grants",
    "access_links": "https://zenodo.org/api/records/3730550/access/links",
    "access_request": "https://zenodo.org/api/records/3730550/access/request",
    "access_users": "https://zenodo.org/api/records/3730550/access/users",
    "archive": "https://zenodo.org/api/records/3730550/files-archive",
    "archive_media": "https://zenodo.org/api/records/3730550/media-files-archive",
    "communities": "https://zenodo.org/api/records/3730550/communities",
    "communities-suggestions": "https://zenodo.org/api/records/3730550/communities-suggestions",
    "doi": "https://doi.org/10.5281/zenodo.3730550",
    "draft": "https://zenodo.org/api/records/3730550/draft",
    "files": "https://zenodo.org/api/records/3730550/files",
    "latest": "https://zenodo.org/api/records/3730550/versions/latest",
    "latest_html": "https://zenodo.org/records/3730550/latest",
    "media_files": "https://zenodo.org/api/records/3730550/media-files",
    "parent": "https://zenodo.org/api/records/3672949",
    "parent_doi": "https://zenodo.org/doi/10.5281/zenodo.3672949",
    "parent_html": "https://zenodo.org/records/3672949",
    "requests": "https://zenodo.org/api/records/3730550/requests",
    "reserve_doi": "https://zenodo.org/api/records/3730550/draft/pids/doi",
    "self": "https://zenodo.org/api/records/3730550",
    "self_doi": "https://zenodo.org/doi/10.5281/zenodo.3730550",
    "self_html": "https://zenodo.org/records/3730550",
    "self_iiif_manifest": "https://zenodo.org/api/iiif/record:3730550/manifest",
    "self_iiif_sequence": "https://zenodo.org/api/iiif/record:3730550/sequence/default",
    "versions": "https://zenodo.org/api/records/3730550/versions"
  },
  "media_files": {
    "count": 0,
    "enabled": false,
    "entries": {},
    "order": [],
    "total_bytes": 0
  },
  "metadata": {
    "additional_descriptions": [
      {
        "description": "The creation of the data was supported by the project Towards Computational Lexical Semantic Change Detection, funded by a project grant from the Swedish Research Council (2019\u20132022; dnr 2018-01184).\nIt has also been created as part of the effort to construct and develop a Swedish national research infrastructure in support of research based on language data. This infrastructure, Nationella spr\u00e5kbanken (the Swedish National Language Bank), is jointly funded for the period 2018\u20132024 by the Swedish Research Council (grant number 2017-00626) and its 10 partner institutions.",
        "type": {
          "id": "notes",
          "title": {
            "de": "Anmerkungen",
            "en": "Notes"
          }
        }
      }
    ],
    "creators": [
      {
        "affiliations": [
          {
            "name": "Spr\u00e5kbanken, University of Gothenburg"
          }
        ],
        "person_or_org": {
          "family_name": "Tahmasebi",
          "given_name": "Nina",
          "name": "Tahmasebi, Nina",
          "type": "personal"
        }
      },
      {
        "affiliations": [
          {
            "name": "University of Helsinki"
          }
        ],
        "person_or_org": {
          "family_name": "Hengchen",
          "given_name": "Simon",
          "name": "Hengchen, Simon",
          "type": "personal"
        }
      },
      {
        "affiliations": [
          {
            "name": "IMS, University of Stuttgart"
          }
        ],
        "person_or_org": {
          "family_name": "Schlechtweg",
          "given_name": "Dominik",
          "name": "Schlechtweg, Dominik",
          "type": "personal"
        }
      },
      {
        "affiliations": [
          {
            "name": "The Alan Turing Institute"
          }
        ],
        "person_or_org": {
          "family_name": "McGillivray",
          "given_name": "Barbara",
          "name": "McGillivray, Barbara",
          "type": "personal"
        }
      },
      {
        "affiliations": [
          {
            "name": "University of Cambridge"
          }
        ],
        "person_or_org": {
          "family_name": "Dubossarsky",
          "given_name": "Haim",
          "name": "Dubossarsky, Haim",
          "type": "personal"
        }
      }
    ],
    "description": "<p>This data collection contains the Swedish test data for <a href=\"https://competitions.codalab.org/competitions/20948\">SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection</a>:</p>\n\n<p>- a Swedish text corpus pair (`corpus1/`, `corpus2/`)<br>\n- 31 lemmas which have been annotated for their lexical semantic change between the two corpora (`targets.txt`)<br>\n- the annotated binary change scores of the targets for subtask 1, and their annotated graded change scores for subtask 2 (`truth/`)</p>\n\n<p>We sample from the KubHist2 corpus, digitized by the National Library of Sweden, and available through the Spr&aring;kbanken corpus infrastructure Korp (<a href=\"https://www.researchgate.net/profile/Markus_Forsberg/publication/266352576_Korp_-_the_corpus_infrastructure_of_Sprakbanken/links/55bf1ee008aed621de121ba3/Korp-the-corpus-infrastructure-of-Sprakbanken.pdf\">Borin et al., 2012</a>). The full corpus is available under a CC BY (attribution) license. Each word for which the lemmatizer in the Korp pipeline has found a lemma is replaced with the lemma. In cases where the lemmatizer cannot find a lemma, we leave the word as is (i.e., unlemmatized, no lower-casing). KubHist contains very frequent OCR errors, especially for the older data. More detail about the properties and quality of the KubHist corpus can be found in <a href=\"https://www.diva-portal.org/smash/get/diva2:1358014/FULLTEXT01.pdf#page=28\">Adesam et al., 2019</a>.</p>\n\n<p>Lars Borin, Markus Forsberg, and Johan Roxendal. &quot;Korp-the corpus infrastructure of Spr&aring;kbanken.&quot; <em>LREC</em>. 2012.</p>\n\n<p>Adesam, Yvonne, Dana Dann&eacute;lls, and Nina Tahmasebi. &quot;Exploring the Quality of the Digital Historical Newspaper Archive KubHist.&quot; <em>DHN</em>. 2019.</p>\n\n<p>__Corpus 1__</p>\n\n<p>- based on: <a href=\"https://spraakbanken.gu.se/korp/?mode=kubhist\">Kubhist2</a><br>\n- language: Swedish<br>\n- time covered: 1790-1830<br>\n- size: ~71 million tokens<br>\n- format: lemmatized, sentence length &gt; 9 (before removal of punctuation), no punctuation, sentences randomly shuffled<br>\n- encoding: UTF-8<br>\n- note: contains frequent OCR errors</p>\n\n<p>__Corpus 2__</p>\n\n<p>- based on: <a href=\"https://spraakbanken.gu.se/korp/?mode=kubhist\">Kubhist2</a><br>\n- language: Swedish<br>\n- time covered: 1895-1903<br>\n- size: ~111 million tokens<br>\n- format: lemmatized, sentence length &gt; 9 (before removal of punctuation), no punctuation, sentences randomly shuffled<br>\n- encoding: UTF-8<br>\n- note: contains OCR errors</p>\n\n<p>Besides the official lemma version of the corpora for SemEval-2020 Task 1, we also provide the raw token version (`corpus1/token/`, `corpus2/token/`). It contains the raw sentences in the same order as in the lemma version. Find more information on the data and SemEval-2020 Task 1 in the paper referenced below.</p>\n\n<p>Reference:</p>\n\n<p>Dominik Schlechtweg, Barbara McGillivray, Simon Hengchen, Haim Dubossarsky and Nina Tahmasebi. <a href=\"https://competitions.codalab.org/competitions/20948\">SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection</a>. To appear in SemEval@COLING2020.</p>",
    "languages": [
      {
        "id": "swe",
        "title": {
          "en": "Swedish"
        }
      }
    ],
    "publication_date": "2020-02-19",
    "publisher": "Zenodo",
    "references": [
      {
        "reference": "Dominik Schlechtweg, Barbara McGillivray, Simon Hengchen, Haim Dubossarsky and Nina Tahmasebi. SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection. To appear in SemEval@COLING2020."
      }
    ],
    "resource_type": {
      "id": "dataset",
      "title": {
        "de": "Datensatz",
        "en": "Dataset"
      }
    },
    "rights": [
      {
        "description": {
          "en": ""
        },
        "icon": "cc-by-icon",
        "id": "cc-by-2.0",
        "props": {
          "scheme": "spdx",
          "url": "https://creativecommons.org/licenses/by/2.0/legalcode"
        },
        "title": {
          "en": "Creative Commons Attribution 2.0 Generic"
        }
      }
    ],
    "subjects": [
      {
        "subject": "unsupervised lexical semantic change detection, semantic change, SemEval2020, Kubhist2"
      }
    ],
    "title": "Swedish Test Data for SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection",
    "version": "v2"
  },
  "parent": {
    "access": {
      "owned_by": {
        "user": "68379"
      }
    },
    "communities": {
      "default": "3eec7ae6-7230-439d-b9eb-58b23297fa67",
      "entries": [
        {
          "access": {
            "member_policy": "open",
            "members_visibility": "public",
            "record_policy": "open",
            "review_policy": "open",
            "visibility": "public"
          },
          "children": {
            "allow": false
          },
          "created": "2019-04-12T13:51:58.685710+00:00",
          "custom_fields": {},
          "deletion_status": {
            "is_deleted": false,
            "status": "P"
          },
          "id": "3eec7ae6-7230-439d-b9eb-58b23297fa67",
          "links": {},
          "metadata": {
            "curation_policy": "<p>Manual curation.</p>\r\n",
            "page": "<p>The idea of natural language processing (NLP) is to process texts and to understand the texts&#39; conveyed meaning.</p>\r\n\r\n<p>Natural Language Understanding can be seen as a subtask of Natural Language Processing. However, the two terms are often used synonymously.</p>\r\n\r\n<p>https://en.wikipedia.org/wiki/Natural_language_processing</p>",
            "title": "Natural Language Processing"
          },
          "revision_id": 0,
          "slug": "natural-language-processing",
          "updated": "2020-06-08T21:07:14.336530+00:00"
        }
      ],
      "ids": [
        "3eec7ae6-7230-439d-b9eb-58b23297fa67"
      ]
    },
    "id": "3672949",
    "pids": {
      "doi": {
        "client": "datacite",
        "identifier": "10.5281/zenodo.3672949",
        "provider": "datacite"
      }
    }
  },
  "pids": {
    "doi": {
      "client": "datacite",
      "identifier": "10.5281/zenodo.3730550",
      "provider": "datacite"
    },
    "oai": {
      "identifier": "oai:zenodo.org:3730550",
      "provider": "oai"
    }
  },
  "revision_id": 3,
  "stats": {
    "all_versions": {
      "data_volume": 8436509589075.0,
      "downloads": 8960,
      "unique_downloads": 8405,
      "unique_views": 1908,
      "views": 2068
    },
    "this_version": {
      "data_volume": 7725164282580.0,
      "downloads": 7706,
      "unique_downloads": 7560,
      "unique_views": 793,
      "views": 876
    }
  },
  "status": "published",
  "updated": "2020-06-29T11:54:28.762143+00:00",
  "versions": {
    "index": 2,
    "is_latest": true
  }
}