Dataset Open Access
Faessler, Erik;
Modersohn, Luise;
Lohr, Christina;
Hahn, Udo
Title: ProGene - A Large-scale, High-Quality Protein-Gene Annotated Benchmark Corpus
Version: 1.1
Publication date: 2020-03-12
DOI: 10.5281/zenodo.3698568 (https://doi.org/10.5281/zenodo.3698568)
Concept DOI: 10.5281/zenodo.3698567 (https://doi.org/10.5281/zenodo.3698567); this record is a version of it
Record: https://zenodo.org/record/3698568
Record created: 2020-03-16T13:58:17.286116+00:00
Record updated: 2020-06-12T11:43:30.699651+00:00 (revision 4)
License: CC-BY-4.0
Access right: open
Resource type: Dataset
Language: eng
Community: julie-lab
Keywords: Genes and proteins, text corpus, annotation, biomedical corpus

Description:

The Pro(tein)/Gene corpus was developed at the JULIE Lab Jena under supervision of Prof. Udo Hahn.

The goals of the annotation project were

- to construct a consistent and (as far as possible) subdomain-independent/-comprehensive protein-annotated corpus
- to differentiate between protein families and groups, protein complexes, protein molecules, protein variants (e.g. alleles) and elliptic enumerations of proteins.

The corpus has the following annotation levels / entity types:

- protein
- protein_familiy_or_group
- protein_complex
- protein_variant
- protein_enum

For definitions of the annotation levels, please refer to the Proteins-guidelines-final.doc file that is found in the download package.

To achieve a large coverage of biological subdomains, documents from multiple other protein / gene corpora were reannotated. For further coverage, new document sets were created. All documents are abstracts from PubMed/MEDLINE. The corpus is made up of the union of all the documents in the different subcorpora.
All documents are delivered as MMAX2 (http://mmax2.net/) annotation projects.

Creators:

- Faessler, Erik (ORCID 0000-0003-1193-5103), Jena University Language & Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Jena, Germany
- Modersohn, Luise, Jena University Language & Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Jena, Germany
- Lohr, Christina, Jena University Language & Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Jena, Germany
- Hahn, Udo, Jena University Language & Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Jena, Germany

Meeting: 12th Language Resources and Evaluation Conference (LREC), Marseille, France, 11-16 May 2020, https://lrec2020.lrec-conf.org/

References:

- Faessler et al. (2020). PROGENE - A Large-scale, High-Quality Protein-Gene Annotated Benchmark Corpus, LREC 2020

Files:

- progene.zip (zip, 24926113383 bytes), checksum md5:fa985ca0ef2c8da932db6f235422c9d9, https://zenodo.org/api/files/b9961b8d-ddd5-4e6e-a6a8-953542bb39cf/progene.zip
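The record lists a size and an MD5 checksum for progene.zip, so a downloaded copy can be checked before unpacking. The following is a minimal sketch of such a check, not part of the corpus distribution: the function names `md5_of_file` and `verify_download` and the local path `progene.zip` are assumptions made for illustration, while the expected size and digest are copied from the record's file entry.

```python
import hashlib
from pathlib import Path

# Values copied from the record's file entry for progene.zip.
EXPECTED_MD5 = "fa985ca0ef2c8da932db6f235422c9d9"
EXPECTED_SIZE = 24926113383  # bytes


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Compare the file size first (cheap), then the MD5 digest (reads the whole file)."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False
    return md5_of_file(path) == expected_md5


if __name__ == "__main__":
    # Hypothetical location of the downloaded archive.
    archive = Path("progene.zip")
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print("progene.zip not found in the current directory")
```

Checking the size before hashing avoids reading a roughly 25 GB file when the download was obviously truncated.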
Metric | All versions | This version |
---|---|---|
Views | 290 | 290 |
Downloads | 214 | 214 |
Data volume | 5.3 TB | 5.3 TB |
Unique views | 241 | 241 |
Unique downloads | 145 | 145 |