There is a newer version of the record available.

Published December 29, 2018 | Version v3
Dataset Open

BioWordlists

Creators

  • 1. University of British Columbia

Description

This describes the output files for the BioWordlists project. These files are ancillary data for other text mining projects.

Each file is tab-delimited with one term per line. The first column is a unique ID. The second column is the main name of the term. The third column is a pipe-delimited set of synonyms for the term (including the main name).
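The three-column layout described above can be read with a few lines of Python. This is a minimal sketch rather than the project's own loader: the names Term and read_terms are hypothetical, the sample rows are made up, and it assumes the files have no header row.

```python
import csv
import io
from typing import Iterable, List, NamedTuple


class Term(NamedTuple):
    term_id: str
    name: str
    synonyms: List[str]


def read_terms(handle: Iterable[str]) -> List[Term]:
    """Parse rows of: ID <tab> main name <tab> pipe-delimited synonyms."""
    terms = []
    # QUOTE_NONE keeps any quote characters inside names as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        term_id, name, synonym_field = row[0], row[1], row[2]
        synonyms = [s for s in synonym_field.split("|") if s]
        terms.append(Term(term_id, name, synonyms))
    return terms


# Made-up sample rows standing in for a real file.
sample = io.StringIO("ID1\tmain name\tmain name|alias\nID2\tother\tother\n")
terms = read_terms(sample)
```

For a downloaded file, the same function can be given an open file handle, for example open("terms_cancers.tsv", encoding="utf-8"); the UTF-8 encoding is an assumption.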

terms_genes.tsv: This is a list of all human genes with synonyms. The first column is the HUGO gene ID. The file includes an additional fourth column containing the Entrez gene ID. The gene list is built from the NCBI Gene resource, with synonyms from the UMLS Metathesaurus.
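Because the gene file carries both identifiers, it can serve as a lookup between them. The sketch below maps the fourth column (Entrez gene ID) to the first column (HUGO gene ID); the function name entrez_to_hugo is hypothetical, and the IDs in the sample rows are placeholders, not real identifiers or their real format.

```python
import io
from typing import Dict, Iterable


def entrez_to_hugo(handle: Iterable[str]) -> Dict[str, str]:
    """Map column 4 (Entrez gene ID) to column 1 (HUGO gene ID)."""
    mapping = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 4:
            continue  # skip rows without the fourth column
        hugo_id, entrez_id = fields[0], fields[3]
        mapping[entrez_id] = hugo_id
    return mapping


# Placeholder rows: ID, main name, synonyms, Entrez ID.
sample = io.StringIO("H1\tGENEA\tGENEA|GA\tE100\nH2\tGENEB\tGENEB\tE200\n")
lookup = entrez_to_hugo(sample)
```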

terms_drugs.tsv: This is a list of all drugs from the Wikidata resource. It also includes some more general terms, as well as inhibitor terms for all genes in the gene list.

terms_cancers.tsv: This is a list of specific cancer types from the Disease Ontology. General cancer terms have been removed and synonyms added from the UMLS Metathesaurus.

terms_variants.tsv: This is a list of common mutations, aberrations and other 'omic events that may affect a gene, especially in the cancer setting.

terms_conflicting.tsv: This is a list of several common biomedical terms that are easily confused with other useful concepts. An example is "Cox Regression". The list is used to identify such terms and so reduce ambiguity.
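One way a list like this could be applied is to blank out conflicting phrases before matching other terms, so that "Cox" in "Cox Regression" is not picked up as a gene mention. The record does not say how the project itself uses the list, so the sketch below is only an assumed approach, and mask_conflicting is a hypothetical name.

```python
import re
from typing import Iterable


def mask_conflicting(text: str, conflicting: Iterable[str]) -> str:
    """Replace each conflicting phrase with spaces, preserving text length."""
    # Longest phrases first, so a long phrase is masked before its substrings.
    phrases = sorted(conflicting, key=len, reverse=True)
    for phrase in phrases:
        # Word boundaries assume the phrase starts and ends with a word character.
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        text = pattern.sub(lambda m: " " * len(m.group(0)), text)
    return text


original = "We used Cox regression on COX expression."
masked = mask_conflicting(original, ["Cox Regression"])
```

Masking with spaces rather than deleting keeps character offsets unchanged, which is convenient if later matches need to be mapped back to the original text.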

terms_proteins.tsv: Human protein names from UniProt with synonyms.

Files

Files (18.9 MB)

Checksum                                Size
md5:b17ce62506e6984090ad56f166b9c6de    945.5 kB
md5:d1e166d2c88260e6df904470db4a05d4    826 Bytes
md5:de998412db150be35f2f0e61d1fde5f8    13.5 MB
md5:1cc7c1b33d0cc595039a0b7da5c385d7    2.0 MB
md5:916326f93d60a7a5afb3e5924d6d56ab    2.4 MB
md5:e5cf740cdfba112e785b6276be295643    9.0 kB
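
The md5 values listed above can be used to confirm that a download is intact. A minimal sketch using Python's standard hashlib module follows; the function names md5_of_file and matches_checksum are hypothetical, and the path passed in is whatever local name the file was saved under.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, listed: str) -> bool:
    """Compare a file against a listed value such as 'md5:<hex digest>'."""
    expected = listed[4:] if listed.startswith("md5:") else listed
    return md5_of_file(path) == expected.strip().lower()
```

A call such as matches_checksum("terms_genes.tsv", "md5:<value from the list>") returns True when the file matches the listed checksum.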