RegEl Database: text-mined regulatory elements from the literature and their associations to genes and disease

Garda, Samuele; Lenihan-Geels, Freyda; Proft, Sebastian; Hochmuth, Stefanie; Schülke, Markus; Seelow, Dominik; Leser, Ulf

doi:10.5281/zenodo.6418451

Published April 6, 2022 | Version v2

Dataset Open

1. Humboldt-Universität zu Berlin
2. Charité-Universitätsmedizin Berlin
3. Berlin Institute of Health

@article{garda2022regel,
  title={RegEl corpus: identifying DNA regulatory elements in the scientific literature},
  author={Garda, Samuele and Lenihan-Geels, Freyda and Proft, Sebastian and Hochmuth, Stefanie and Sch{\"u}lke, Markus and Seelow, Dominik and Leser, Ulf},
  journal={Database},
  volume={2022},
  year={2022},
  publisher={Oxford Academic}
}

# RegEl PubMed Database

This database contains the annotations generated by running [HunFlair](https://github.com/flairNLP/flair/blob/master/resources/docs/HUNFLAIR.md) models, trained on the [RegEl corpus](https://zenodo.org/record/5776679), over more than 20 million PubMed abstracts.

Pairing these annotations with those provided by PubTator yields a large text-mining database of regulatory elements associated with genes (normalized to NCBI Gene IDs) and diseases (normalized to either MeSH or OMIM).

The tables composing the database are:

* abstracts.db:
  - pmid = PubMed ID of the abstract
  - sid = sentence ID within the abstract (from 0 to the number of sentences)
  - text = text of the sentence

* gene.db and disease.db:
  - pmid = PubMed ID of the abstract
  - sid = sentence ID within the abstract (from 0 to the number of sentences)
  - etype = entity type (enhancer, promoter, TFBS)
  - ann_text = mention of the regulatory element as found in the abstract
  - start = character offset at which the mention begins
  - end = character offset at which the mention ends
  - score = model's confidence
  - cui = gene or disease identifier
  - cui_symbol = official symbol of cui (if available)
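
The columns listed above suggest that an annotation can be linked back to its sentence through the shared `pmid` and `sid` fields. The sketch below illustrates that join on an in-memory toy database. It rests on assumptions not confirmed by this record: that the `.db` files are SQLite databases, and that they hold tables named `abstracts` and `gene` (hypothetical names). All rows are invented placeholders, not real data from RegEl.

```python
import sqlite3


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory database mimicking the documented columns.

    Table names and all rows are assumptions made for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    # "end" is an SQL keyword, so it is quoted when used as a column name.
    conn.executescript(
        """
        CREATE TABLE abstracts (pmid INTEGER, sid INTEGER, text TEXT);
        CREATE TABLE gene (
            pmid INTEGER, sid INTEGER, etype TEXT, ann_text TEXT,
            start INTEGER, "end" INTEGER, score REAL,
            cui TEXT, cui_symbol TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO abstracts VALUES (?, ?, ?)",
        [
            (1, 0, "The GENE_A promoter is methylated in tumour samples."),
            (1, 1, "An upstream enhancer drives expression."),
            (2, 0, "No regulatory element is mentioned here."),
        ],
    )
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        [
            (1, 0, "promoter", "GENE_A promoter", 4, 19, 0.98, "G1", "GENE_A"),
            (1, 1, "enhancer", "upstream enhancer", 3, 20, 0.40, "G1", "GENE_A"),
        ],
    )
    return conn


def elements_for_gene(conn, cui, min_score=0.0):
    """Return annotations for one gene identifier, joined with sentence text."""
    query = """
        SELECT g.pmid, g.sid, g.etype, g.ann_text, g.score, a.text
        FROM gene AS g
        JOIN abstracts AS a ON a.pmid = g.pmid AND a.sid = g.sid
        WHERE g.cui = ? AND g.score >= ?
        ORDER BY g.score DESC
    """
    return conn.execute(query, (cui, min_score)).fetchall()


if __name__ == "__main__":
    toy = build_toy_db()
    for row in elements_for_gene(toy, "G1", min_score=0.5):
        print(row)
```

Filtering on `score` is one plausible way to trade recall for precision. If the real tables live in separate files, SQLite's `ATTACH DATABASE` statement could make them joinable from a single connection, though the actual file layout should be checked after unpacking the archive.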

Files (304.0 MB)

| Name | Size | MD5 |
| --- | --- | --- |
| regel_db.zip | 304.0 MB | 04d86b2c3d11fd9798fd4ef9b553af50 |
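
The MD5 checksum listed for `regel_db.zip` can be used to check that a download completed intact. The following is a minimal sketch using Python's standard `hashlib` module. The function names `md5_of_file` and `verify_download` are hypothetical helpers introduced here, and the local path is an assumption about where the archive was saved.

```python
import hashlib

# Checksum as published on this record for regel_db.zip.
EXPECTED_MD5 = "04d86b2c3d11fd9798fd4ef9b553af50"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large archive is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

A call such as `verify_download("regel_db.zip")` would return `True` only if the local file matches the published checksum.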

| | All versions | This version |
| --- | --- | --- |
| Views | 541 | 347 |
| Downloads | 26 | 25 |
| Data volume | 8.2 GB | 7.9 GB |
