RegEl Database: text-mined regulatory elements from the literature and their associations to genes and disease
Creators
- 1. Humboldt-Universität zu Berlin
- 2. Charité-Universitätsmedizin Berlin
- 3. Berlin Institute of Health
Description
@article{garda2022regel, title={RegEl corpus: identifying DNA regulatory elements in the scientific literature}, author={Garda, Samuele and Lenihan-Geels, Freyda and Proft, Sebastian and Hochmuth, Stefanie and Sch{\"u}lke, Markus and Seelow, Dominik and Leser, Ulf}, journal={Database}, volume={2022}, year={2022}, publisher={Oxford Academic} }
# RegEl PubMed Database
This database contains the annotations generated by running [HunFlair](https://github.com/flairNLP/flair/blob/master/resources/docs/HUNFLAIR.md) models trained on the [RegEl corpus](https://zenodo.org/record/5776679) over >20M PubMed abstracts.
Pairing these annotations with those provided by PubTator yields a large text-mining database of regulatory elements associated with genes (normalized to NCBI Gene IDs) and diseases (normalized to either MeSH or OMIM).
The tables composing the database are:
* abstracts.db:
  - pmid = PubMed ID of the abstract
  - sid = sentence ID within the abstract (from 0 to # of sentences)
  - text = text of the sentence
* gene.db and disease.db:
  - pmid = PubMed ID of the abstract
  - sid = sentence ID within the abstract (from 0 to # of sentences)
  - etype = entity type (enhancer, promoter, TFBS)
  - ann_text = mention of the regulatory element as found in the abstract
  - start = character position at which the mention begins
  - end = character position at which the mention ends
  - score = model's confidence
  - cui = gene or disease identifier
  - cui_symbol = official symbol of cui (if available)
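
The columns above suggest a natural join between the annotation tables and the sentence table on `(pmid, sid)`. The following is a minimal sketch of such a lookup, not a documented interface. It assumes the `.db` files are SQLite databases and that they contain tables named `abstracts` and `gene`; neither is confirmed by the description, so check the actual schema first (for example with `SELECT name FROM sqlite_master`). To stay self-contained, the sketch builds a toy in-memory database with made-up rows; the helper `mentions_for_gene` is hypothetical.

```python
import sqlite3

# Toy in-memory stand-in for the real files (ASSUMPTION: the .db files are
# SQLite and use the table names "abstracts" and "gene").
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE abstracts (pmid INTEGER, sid INTEGER, text TEXT);
CREATE TABLE gene (pmid INTEGER, sid INTEGER, etype TEXT, ann_text TEXT,
                   start INTEGER, "end" INTEGER, score REAL,
                   cui TEXT, cui_symbol TEXT);
""")

# Made-up illustrative rows, NOT taken from the database.
conn.executemany(
    "INSERT INTO abstracts VALUES (?, ?, ?)",
    [
        (1, 0, "The TP53 promoter is methylated in tumors."),
        (1, 1, "A distal enhancer may also regulate TP53."),
        (2, 0, "The BRCA1 promoter was analysed."),
    ],
)
conn.executemany(
    "INSERT INTO gene VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 0, "promoter", "TP53 promoter", 4, 17, 0.98, "7157", "TP53"),
        (1, 1, "enhancer", "distal enhancer", 2, 17, 0.41, "7157", "TP53"),
        (2, 0, "promoter", "BRCA1 promoter", 4, 18, 0.95, "672", "BRCA1"),
    ],
)


def mentions_for_gene(conn, gene_id, min_score=0.0):
    """Return regulatory-element mentions linked to one gene identifier,
    together with the sentence they occur in, highest score first."""
    query = """
        SELECT g.pmid, g.sid, g.etype, g.ann_text, g.score, a.text
        FROM gene AS g
        JOIN abstracts AS a ON a.pmid = g.pmid AND a.sid = g.sid
        WHERE g.cui = ? AND g.score >= ?
        ORDER BY g.score DESC
    """
    return conn.execute(query, (gene_id, min_score)).fetchall()


# Keep only confident mentions for NCBI Gene ID 7157.
for row in mentions_for_gene(conn, "7157", min_score=0.5):
    print(row)
```

Against the real files, `sqlite3.connect(":memory:")` would be replaced by a connection to the unzipped database, with `ATTACH DATABASE` used if the tables live in separate files. Whether `start` and `end` are relative to the sentence or to the whole abstract is not stated, so the sketch does not slice text by those offsets.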
Files
regel_db.zip (304.0 MB)
md5:04d86b2c3d11fd9798fd4ef9b553af50