BioASQ Sub-Corpus for the Pharmacology of Epilepsy (BioPepsy)
Description
The sub-corpus contains standoff annotations for drug names and for terms from epilepsy ontologies, together with their aggregations, as recognized in the 2021 BioASQ corpus.
The epilepsy terms come from the following ontologies hosted on NCBO BioPortal: EpSO, ESSO, EPILONT, EPISEM and FENICS:
- https://bioportal.bioontology.org/ontologies/EPSO
- https://bioportal.bioontology.org/ontologies/ESSO
- https://bioportal.bioontology.org/ontologies/EPILONT
- https://bioportal.bioontology.org/ontologies/EPISEM
- https://bioportal.bioontology.org/ontologies/FENICS
The dictionary for the identification of drug names is derived from the DrugBank vocabulary available online at https://go.drugbank.com/releases/latest#open-data.
The terms were identified using a custom implementation of a UIMA-based text mining workflow that annotates free text with the UIMA ConceptMapper. Further descriptions of this workflow can be found in the following publications:
- Bernd Müller, Alexandra Hagelstein: Beyond Metadata: Enriching life science publications in Livivo with semantic entities from the linked data cloud. SEMANTiCS (Posters, Demos, SuCCESS) 2016
- Bernd Müller, Alexandra Hagelstein, Thomas Gübitz: Life Science Ontologies in Literature Retrieval: A Comparison of Linked Data Sets for Use in Semantic Search on a Heterogeneous Corpus. EKAW (Satellite Events) 2016: 158-161
- Bernd Müller, Christoph Poley, Jana Pössel, Alexandra Hagelstein, Thomas Gübitz: LIVIVO - the Vertical Search Engine for Life Sciences. Datenbank-Spektrum 17(1): 29-34 (2017)
- Bernd Müller, Dietrich Rebholz-Schuhmann: Selected Approaches Ranking Contextual Term for the BioASQ Multi-label Classification (Task6a and 7a). PKDD/ECML Workshops (2) 2019: 569-580
The file format is JSON. The file content is described as follows:
- bioasqepilepsy2021.json - all standoff annotations for each document in the 2021 BioASQ corpus
- aggepilepsy2021EPSOANDDrugNames.json - aggregation of frequencies for all standoff annotations in documents from the 2021 BioASQ corpus that contain terms from EpSO co-occurring with at least one drug name
- aggepilepsy2021ESSOANDDrugNames.json - aggregation of frequencies for all standoff annotations in documents from the 2021 BioASQ corpus that contain terms from ESSO co-occurring with at least one drug name
- aggepilepsy2021EPILONTANDDrugNames.json - aggregation of frequencies for all standoff annotations in documents from the 2021 BioASQ corpus that contain terms from EPILONT co-occurring with at least one drug name
- aggepilepsy2021EPISEMANDDrugNames.json - aggregation of frequencies for all standoff annotations in documents from the 2021 BioASQ corpus that contain terms from EPISEM co-occurring with at least one drug name
- aggepilepsy2021FENICSANDDrugNames.json - aggregation of frequencies for all standoff annotations in documents from the 2021 BioASQ corpus that contain terms from FENICS co-occurring with at least one drug name
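The co-occurrence criterion behind the aggregation files can be illustrated with a minimal Python sketch. The record layout (a `pmid` key and an `annotations` list whose entries carry `source` and `term` fields), the source label `DrugNames`, and the function name are assumptions made for illustration only; the actual JSON schema of the published files may differ.

```python
from collections import Counter


def aggregate_cooccurrences(documents, ontology, drug_source="DrugNames"):
    """Count annotations over documents in which a term from `ontology`
    co-occurs with at least one drug name.

    All field names used here are hypothetical, not the published schema.
    """
    frequencies = Counter()
    for doc in documents:
        annotations = doc.get("annotations", [])
        sources = {a["source"] for a in annotations}
        # Keep only documents that contain both an ontology term and a drug name.
        if ontology in sources and drug_source in sources:
            frequencies.update((a["source"], a["term"]) for a in annotations)
    return frequencies


# Toy input with made-up PMIDs and terms.
docs = [
    {"pmid": "1", "annotations": [
        {"source": "EPSO", "term": "seizure"},
        {"source": "DrugNames", "term": "valproic acid"},
    ]},
    {"pmid": "2", "annotations": [
        {"source": "EPSO", "term": "seizure"},
    ]},
]
result = aggregate_cooccurrences(docs, "EPSO")
```

In this toy input only the first document qualifies, because the second one contains an ontology term but no drug name.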
All JSON files should be importable into a MongoDB collection. Documents are identified by their PMIDs (PubMed identifiers).
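For readers who want to inspect a file without a database, the following is a minimal sketch of reading documents and indexing them by PMID. It assumes the file holds one JSON document per line (the layout `mongoexport` writes by default) and that the identifier is stored under a key named `pmid`; both are assumptions, so the key is left configurable.

```python
import json


def iter_documents(path):
    """Yield one parsed JSON document per non-empty line of the file.

    Assumes newline-delimited JSON; a file containing a single JSON
    array would need a different reader.
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def index_by_pmid(path, pmid_field="pmid"):
    """Build an in-memory lookup keyed by PMID (the field name is hypothetical).

    Suitable for the small aggregation files only, not the full corpus file.
    """
    return {str(doc[pmid_field]): doc for doc in iter_documents(path)}
```

Because `iter_documents` is a generator, it reads one line at a time and could also process the large annotation file without loading it into memory. For an actual MongoDB import, the `mongoimport` command-line tool (with `--jsonArray` if a file turns out to be a single JSON array) is likely the more direct route.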
Please cite this data as:
Müller, Bernd. BioASQ Sub-Corpus for the Pharmacology of Epilepsy (BioPEpsy) 2021. ZENODO, 10.5281/zenodo.4680086
Files
Total size: 43.3 GB

MD5 checksum | Size |
---|---|
a278a8940406f0d3f0e70017c7e4898d | 1.7 MB |
4e163f9ebcd9a2bfbb158fb29c4adbfb | 2.0 MB |
3b523ca9eda68614826a488b106e34fa | 2.1 MB |
cd0a09208e9695b7ab7bbeb822bda9c3 | 2.2 MB |
d41b3b1605f24e9bf1cfdd5e46001fd2 | 47.3 kB |
60f8ab50500189a79e6d3f81f884cbfe | 43.3 GB |
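The MD5 checksums listed above can be used to verify a download. The sketch below uses Python's standard `hashlib` module and reads the file in chunks, so that even the 43.3 GB file is not loaded into memory at once. The function names and the chunk size are illustrative choices.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Compare a file's MD5 digest with the expected value, ignoring case."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A call such as `verify("downloaded_file.json", "<checksum from the table>")` returns `True` when the file matches. The path is a placeholder, since the listing does not show which checksum belongs to which file name.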
Additional details
Related works
- Compiles
  - Dataset: 10.5281/zenodo.4683353 (DOI)
- Is compiled by
  - Software: 10.5281/zenodo.4680086 (DOI)
- Is part of
  - Software: https://cran.r-project.org/package=epos (URL)
  - Software: 10.5281/zenodo.4682869 (DOI)