Published April 12, 2021 | Version 1
Dataset Open

BioASQ Sub-Corpus for the Pharmacology of Epilepsy (BioPepsy)

  • 1. ZB MED - Information Centre for Life Sciences

Description

The sub-corpus contains standoff annotations for drug names and for terms from epilepsy ontologies, together with their aggregations, as recognized in the 2021 BioASQ corpus.

The epilepsy terms come from the following ontologies hosted on NCBO BioPortal: EpSO, ESSO, EPILONT, EPISEM and FENICS.

The dictionary for the identification of drug names is derived from the DrugBank vocabulary available online at https://go.drugbank.com/releases/latest#open-data.

The terms were identified using a custom implementation of a UIMA-based text mining workflow that annotates free text with the UIMA ConceptMapper. Further descriptions of this workflow can be found in the following publications:

  1. Bernd Müller, Alexandra Hagelstein: Beyond Metadata: Enriching life science publications in Livivo with semantic entities from the linked data cloud. SEMANTiCS (Posters, Demos, SuCCESS) 2016
  2. Bernd Müller, Alexandra Hagelstein, Thomas Gübitz: Life Science Ontologies in Literature Retrieval: A Comparison of Linked Data Sets for Use in Semantic Search on a Heterogeneous Corpus. EKAW (Satellite Events) 2016: 158-161
  3. Bernd Müller, Christoph Poley, Jana Pössel, Alexandra Hagelstein, Thomas Gübitz: LIVIVO - the Vertical Search Engine for Life Sciences. Datenbank-Spektrum 17(1): 29-34 (2017)
  4. Bernd Müller, Dietrich Rebholz-Schuhmann: Selected Approaches Ranking Contextual Term for the BioASQ Multi-label Classification (Task6a and 7a). PKDD/ECML Workshops (2) 2019: 569-580

The file format is JSON. The file content is described as follows:

  • bioasqepilepsy2021.json - All standoff annotations for each document in the 2021 BioASQ corpus
  • aggepilepsy2021EPSOANDDrugNames.json - aggregation of frequencies for all standoff annotations in documents from the 2021 BioASQ corpus that contain terms from EpSO co-occurring with at least one drug name
  • aggepilepsy2021ESSOANDDrugNames.json - aggregation of frequencies for all standoff annotations in documents from the 2021 BioASQ corpus that contain terms from ESSO co-occurring with at least one drug name
  • aggepilepsy2021EPILONTANDDrugNames.json - aggregation of frequencies for all standoff annotations in documents from the 2021 BioASQ corpus that contain terms from EPILONT co-occurring with at least one drug name
  • aggepilepsy2021EPISEMANDDrugNames.json - aggregation of frequencies for all standoff annotations in documents from the 2021 BioASQ corpus that contain terms from EPISEM co-occurring with at least one drug name
  • aggepilepsy2021FENICSANDDrugNames.json - aggregation of frequencies for all standoff annotations in documents from the 2021 BioASQ corpus that contain terms from FENICS co-occurring with at least one drug name
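
One possible reading of the aggregation files is sketched below in Python: keep only documents that contain at least one term from the chosen ontology and at least one drug name, then count every annotation in those documents. The record layout (a mapping from PMID to a list of `(source, term)` pairs), the source label `"DrugBank"` and the function name `aggregate_cooccurrence` are hypothetical and chosen for illustration only; the actual schema of the JSON files is not described here.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical layout: each annotation is (source vocabulary, matched term).
Annotation = Tuple[str, str]


def aggregate_cooccurrence(
    docs: Dict[str, List[Annotation]],
    ontology: str,
    drug_source: str = "DrugBank",  # assumed label for drug-name annotations
) -> Counter:
    """Count annotations in documents where ontology terms and drug names co-occur."""
    counts: Counter = Counter()
    for annotations in docs.values():
        sources = {source for source, _ in annotations}
        # Keep the document only if both vocabularies are present.
        if ontology in sources and drug_source in sources:
            counts.update(annotations)
    return counts


if __name__ == "__main__":
    example = {
        "1": [("EpSO", "seizure"), ("DrugBank", "valproic acid")],
        "2": [("EpSO", "seizure")],  # no drug name, so it is skipped
    }
    print(aggregate_cooccurrence(example, "EpSO"))
```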

All JSON files should be importable into a MongoDB collection. Documents are identified by their PMIDs (PubMed identifiers).
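
Because the largest file is far too big to load into memory at once, a streaming reader may be useful before importing. The Python sketch below assumes the files are newline-delimited JSON (one document per line), which is an assumption and not stated in this description; the helper names `iter_documents` and `batched` and the `pmid` field used in the demo are hypothetical.

```python
import io
import json
from itertools import islice
from typing import Iterable, Iterator, List, TextIO


def iter_documents(handle: TextIO) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line (assumed layout)."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


def batched(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group documents into lists of at most `size` items."""
    iterator = iter(docs)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real file; "pmid" is a hypothetical field name.
    sample = io.StringIO('{"pmid": "1"}\n{"pmid": "2"}\n{"pmid": "3"}\n')
    for batch in batched(iter_documents(sample), 2):
        print(len(batch), "documents")
```

Each batch could then be handed to a bulk insert such as pymongo's `Collection.insert_many`, or the file could be loaded directly with the `mongoimport` command-line tool.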

Please cite this data as:

Müller, Bernd. BioASQ Sub-Corpus for the Pharmacology of Epilepsy (BioPEpsy) 2021. ZENODO, 10.5281/zenodo.4680086

Files


Files (43.3 GB)

Checksum Size
md5:a278a8940406f0d3f0e70017c7e4898d 1.7 MB
md5:4e163f9ebcd9a2bfbb158fb29c4adbfb 2.0 MB
md5:3b523ca9eda68614826a488b106e34fa 2.1 MB
md5:cd0a09208e9695b7ab7bbeb822bda9c3 2.2 MB
md5:d41b3b1605f24e9bf1cfdd5e46001fd2 47.3 kB
md5:60f8ab50500189a79e6d3f81f884cbfe 43.3 GB
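
After downloading, a file can be checked against its listed MD5 checksum. The following Python sketch reads the file in chunks so that it also works for the 43.3 GB file; the function names `md5_of_file` and `matches_checksum` are illustrative, and the local path in the demo is a placeholder.

```python
import hashlib
import sys


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hexadecimal MD5 digest of a file, read chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, listed: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python verify.py <downloaded file> <checksum from the record page>
    print(matches_checksum(sys.argv[1], sys.argv[2]))
```

The listing above does not show which checksum belongs to which file name, so the pairing has to be taken from the record page itself.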

Additional details

Related works

Compiles
Dataset: 10.5281/zenodo.4683353 (DOI)
Is compiled by
Software: 10.5281/zenodo.4680086 (DOI)
Is part of
Software: https://cran.r-project.org/package=epos (URL)
Software: 10.5281/zenodo.4682869 (DOI)
