Dataset Open Access

ProGene - A Large-scale, High-Quality Protein-Gene Annotated Benchmark Corpus

Faessler, Erik; Modersohn, Luise; Lohr, Christina; Hahn, Udo


JSON Export

{
  "files": [
    {
      "links": {
        "self": "https://zenodo.org/api/files/b9961b8d-ddd5-4e6e-a6a8-953542bb39cf/progene.zip"
      }, 
      "checksum": "md5:fa985ca0ef2c8da932db6f235422c9d9", 
      "bucket": "b9961b8d-ddd5-4e6e-a6a8-953542bb39cf", 
      "key": "progene.zip", 
      "type": "zip", 
      "size": 24926113383
    }
  ], 
  "owners": [
    93441
  ], 
  "doi": "10.5281/zenodo.3698568", 
  "stats": {
    "version_unique_downloads": 145.0, 
    "unique_views": 241.0, 
    "views": 290.0, 
    "version_views": 290.0, 
    "unique_downloads": 145.0, 
    "version_unique_views": 241.0, 
    "volume": 5334188263962.0, 
    "version_downloads": 214.0, 
    "downloads": 214.0, 
    "version_volume": 5334188263962.0
  }, 
  "links": {
    "doi": "https://doi.org/10.5281/zenodo.3698568", 
    "conceptdoi": "https://doi.org/10.5281/zenodo.3698567", 
    "bucket": "https://zenodo.org/api/files/b9961b8d-ddd5-4e6e-a6a8-953542bb39cf", 
    "conceptbadge": "https://zenodo.org/badge/doi/10.5281/zenodo.3698567.svg", 
    "html": "https://zenodo.org/record/3698568", 
    "latest_html": "https://zenodo.org/record/3698568", 
    "badge": "https://zenodo.org/badge/doi/10.5281/zenodo.3698568.svg", 
    "latest": "https://zenodo.org/api/records/3698568"
  }, 
  "conceptdoi": "10.5281/zenodo.3698567", 
  "created": "2020-03-16T13:58:17.286116+00:00", 
  "updated": "2020-06-12T11:43:30.699651+00:00", 
  "conceptrecid": "3698567", 
  "revision": 4, 
  "id": 3698568, 
  "metadata": {
    "access_right_category": "success", 
    "doi": "10.5281/zenodo.3698568", 
    "description": "<p>The Pro(tein)/Gene corpus was developed at the JULIE Lab Jena under the supervision of Prof. Udo Hahn.</p>\n\n<p>The goals of the annotation project were:</p>\n\n<ul>\n\t<li>to construct a consistent and (as far as possible) subdomain-independent/-comprehensive protein-annotated corpus</li>\n\t<li>to differentiate between protein families and groups, protein complexes, protein molecules, protein variants (e.g. alleles) and elliptic enumerations of proteins.</li>\n</ul>\n\n<p>The corpus has the following annotation levels / entity types:</p>\n\n<ul>\n\t<li>protein</li>\n\t<li>protein_familiy_or_group</li>\n\t<li>protein_complex</li>\n\t<li>protein_variant</li>\n\t<li>protein_enum</li>\n</ul>\n\n<p>For definitions of the annotation levels, please refer to the Proteins-guidelines-final.doc file in the download package.</p>\n\n<p>To achieve broad coverage of biological subdomains, documents from multiple other protein / gene corpora were reannotated. For further coverage, new document sets were created. All documents are abstracts from PubMed/MEDLINE. The corpus is the union of all documents in the different subcorpora.<br>\nAll documents are delivered as MMAX2 (http://mmax2.net/) annotation projects.</p>", 
    "language": "eng", 
    "title": "ProGene - A Large-scale, High-Quality Protein-Gene Annotated Benchmark Corpus", 
    "license": {
      "id": "CC-BY-4.0"
    }, 
    "relations": {
      "version": [
        {
          "count": 1, 
          "index": 0, 
          "parent": {
            "pid_type": "recid", 
            "pid_value": "3698567"
          }, 
          "is_last": true, 
          "last_child": {
            "pid_type": "recid", 
            "pid_value": "3698568"
          }
        }
      ]
    }, 
    "communities": [
      {
        "id": "julie-lab"
      }
    ], 
    "version": "1.1", 
    "references": [
      "Faessler et al. (2020). PROGENE\u2014A Large-scale, High-Quality Protein-Gene Annotated Benchmark Corpus, LREC 2020"
    ], 
    "keywords": [
      "Genes and proteins", 
      "text corpus", 
      "annotation", 
      "biomedical corpus"
    ], 
    "publication_date": "2020-03-12", 
    "creators": [
      {
        "orcid": "0000-0003-1193-5103", 
        "affiliation": "Jena University Language & Information Engineering (JULIE) Lab, Friedrich-Schiller-Universit\u00e4t Jena, Jena, Germany", 
        "name": "Faessler, Erik"
      }, 
      {
        "affiliation": "Jena University Language & Information Engineering (JULIE) Lab, Friedrich-Schiller-Universit\u00e4t Jena, Jena, Germany", 
        "name": "Modersohn, Luise"
      }, 
      {
        "affiliation": "Jena University Language & Information Engineering (JULIE) Lab, Friedrich-Schiller-Universit\u00e4t Jena, Jena, Germany", 
        "name": "Lohr, Christina"
      }, 
      {
        "affiliation": "Jena University Language & Information Engineering (JULIE) Lab, Friedrich-Schiller-Universit\u00e4t Jena, Jena, Germany", 
        "name": "Hahn, Udo"
      }
    ], 
    "meeting": {
      "acronym": "LREC", 
      "url": "https://lrec2020.lrec-conf.org/", 
      "dates": "11-16 May 2020", 
      "place": "Marseille, France", 
      "title": "12th Language Resources and Evaluation Conference"
    }, 
    "access_right": "open", 
    "resource_type": {
      "type": "dataset", 
      "title": "Dataset"
    }, 
    "related_identifiers": [
      {
        "scheme": "doi", 
        "identifier": "10.5281/zenodo.3698567", 
        "relation": "isVersionOf"
      }
    ]
  }
}
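
The `files` entry in the export above carries a `checksum` field of the form `md5:<hexdigest>` and a `size` of roughly 24.9 GB, so a downloaded copy of `progene.zip` is best verified by hashing it in chunks rather than loading it into memory. The following is a minimal sketch of that check, assuming the export has been parsed into a Python dict; the function names `parse_checksum`, `file_matches`, and `verify_against_record` are hypothetical helpers introduced for illustration and are not part of any Zenodo tooling.

```python
import hashlib


def parse_checksum(field: str) -> tuple[str, str]:
    """Split an 'algo:hexdigest' string (e.g. 'md5:fa98...') into its parts."""
    algo, _, digest = field.partition(":")
    if not algo or not digest:
        raise ValueError(f"unexpected checksum format: {field!r}")
    return algo.lower(), digest.lower()


def file_matches(path: str, checksum_field: str, chunk_size: int = 1 << 20) -> bool:
    """Hash the file at `path` in chunks and compare with the expected digest."""
    algo, expected = parse_checksum(checksum_field)
    hasher = hashlib.new(algo)
    with open(path, "rb") as fh:
        # Read fixed-size chunks so that a multi-gigabyte archive
        # never has to fit into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == expected


def verify_against_record(record: dict, key: str, path: str) -> bool:
    """Look up `key` in record['files'] and verify the local file at `path`."""
    for entry in record["files"]:
        if entry["key"] == key:
            return file_matches(path, entry["checksum"])
    raise KeyError(f"no file entry with key {key!r}")
```

With the export saved locally (the file name `record.json` is an assumption), a call such as `verify_against_record(json.load(open("record.json")), "progene.zip", "progene.zip")` would return `True` only if the local archive matches the published MD5 digest.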
                   All versions   This version
Views              290            290
Downloads          214            214
Data volume        5.3 TB         5.3 TB
Unique views       241            241
Unique downloads   145            145
