Published November 18, 2025 | Version v1
Dataset Open

Can we define "dark genome" using the number of research resources?

Authors/Creators

Contributors

Data collector:

Researcher:

  • 1. SciCrunch Inc
  • 2. University of California, San Diego

Description

The dark genome, the set of genes that are understudied (Oprea, 2019), has been defined to some extent by the Illuminating the Druggable Genome (IDG) project (RRID:SCR_016924). One goal of this project is to encourage the study of genes that are likely to be important for precision medicine but are currently understudied. The KOMP project (RRID:SCR_005571) aims to knock out 8,500 genes in mice and to deliver these mice and their associated phenotype data to the research community, which also helps to illuminate the dark genome. Pharos (pharos.nih.gov, RRID:SCR_016258) provides a list of gene symbols and applies a ‘Tdark’ tag to genes that are not sufficiently understood. We use this convenient definition of darkness as a baseline. The RRID project works with antibody manufacturers, plasmid providers and animal stock centers and contains the largest freely available list of these key resources. We use searches for individual gene names in the RRID.site API as a proxy for the availability of tools.

We do not know how strongly the presence of a knockout animal, the presence of data as defined by the IDG project, or the availability of tools such as antibodies and plasmids affects the attention that researchers pay to specific genes.

Thus, to determine whether the availability of scientific tools (including animals, antibodies and plasmids) could substitute for a more manual definition of the dark genome, we explored correlations between the Pharos dark genome definition and the numbers of resources. The output is a survey of key data about the less studied genes that are likely to be interesting targets for the study of disease.

Methods

First, we wanted to enhance our understanding of the dark genome. We downloaded the list of mouse genes and a list of manuscripts associated with mouse genes from NCBI Gene (RRID:SCR_002473). After compiling the full list of mouse genes, we queried the scicrunch.org Application Programming Interface (API) to determine which antibodies, plasmids and organisms were available to scientists for each gene. Papers were assessed by querying the PubMed database (RRID:SCR_004846). Drugs were assessed using the Drug-Gene Interaction Database, DGIdb (RRID:SCR_006608): for each mouse gene, we queried the DGIdb API and extracted the number of drugs associated with the gene.

We downloaded the Pharos dataset from https://pharos.nih.gov/targets (RRID:SCR_016258) on Tuesday, October 7, 2025. Of the 16,240 genes present in both the Pharos and NCBI mouse gene databases, 2,684 were considered dark according to Pharos.
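The overlap step described above can be sketched in a few lines of Python. This is only an illustration: the gene symbols and tags below are invented, and matching the two lists by case-folded gene symbol is an assumption, since the record does not state how mouse and Pharos entries were paired.

```python
# Toy stand-ins for the two downloads; every symbol and tag below is invented.
ncbi_mouse_symbols = ["GeneA", "GeneB", "GeneC", "GeneD", "GeneE"]
pharos_levels = {  # hypothetical mapping: gene symbol -> Pharos tag
    "GENEA": "Tbio",
    "GENEB": "Tchem",
    "GENEC": "Tdark",
    "GENED": "Tdark",
    "GENEX": "Tdark",  # present in Pharos only, so it drops out of the overlap
}


def normalize(symbol):
    """Case-fold a symbol so both sources use the same spelling (an assumption)."""
    return symbol.strip().upper()


ncbi = {normalize(s) for s in ncbi_mouse_symbols}
pharos = {normalize(s): level for s, level in pharos_levels.items()}

# Genes found in both sources, and the subset tagged 'Tdark' by Pharos.
shared = ncbi & set(pharos)
pharos_dark = {gene for gene in shared if pharos[gene] == "Tdark"}

print(len(shared), "shared genes,", len(pharos_dark), "tagged Tdark")
```

On the real downloads, the two printed numbers would correspond to the 16,240 shared genes and the 2,684 dark genes reported above.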

With these data, we created five candidate lists of dark genes, one for each of the five resource types above (antibodies, plasmids, organisms, papers and drugs). For each type, we established a threshold such that all genes associated with fewer resources of that type than the threshold are considered dark. For example, all genes mentioned in fewer than 17 papers are dark. We compared each of our dark gene lists to the Pharos list using Fisher's exact test and picked the threshold that minimized the resulting p-value. When multiple thresholds returned a p-value of 0.0, we picked the threshold for which the number of genes in our dark list was closest to the number of genes in the Pharos dark list.
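The threshold search can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in the deposited files: the counts are toy values, `fisher_exact_two_sided` is a hand-written helper (a library routine such as `scipy.stats.fisher_exact` could replace it), and the tie-break is applied to any exact tie in p-value, which includes the p = 0.0 case described above.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are at most as likely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def best_threshold(counts, reference_dark, candidates):
    """Return (threshold, p_value, dark_set) for the best-matching threshold."""
    reference = set(reference_dark) & set(counts)
    best = None
    for t in candidates:
        ours = {gene for gene, n in counts.items() if n < t}  # fewer than t -> dark
        a = len(ours & reference)       # dark in both lists
        b = len(ours - reference)       # dark only in our list
        c = len(reference - ours)       # dark only in the reference list
        d = len(counts) - a - b - c     # light in both lists
        p = fisher_exact_two_sided(a, b, c, d)
        # Smallest p-value first; ties go to the list size closest to the reference.
        key = (p, abs(len(ours) - len(reference)))
        if best is None or key < best[0]:
            best = (key, t, ours)
    return best[1], best[0][0], best[2]


# Toy paper counts per gene (invented values).
paper_counts = {"g1": 0, "g2": 1, "g3": 2, "g4": 3,
                "g5": 20, "g6": 25, "g7": 30, "g8": 40}
reference_dark = {"g1", "g2", "g3", "g4"}

threshold, p_value, dark = best_threshold(paper_counts, reference_dark, range(1, 42))
print(threshold, sorted(dark), p_value)
```

In the toy data every threshold from 4 to 20 yields the same dark list, so the sketch keeps the first one it encounters; how such complete ties were resolved in the original analysis is not stated.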

We will present the data in a histogram that highlights gaps in research tools and shows genes with significant tooling as a set that should be amenable to exploration. Areas in which few papers exist but tools are available will also be highlighted.

Results

We explored whether the dark genome can be defined as the set of genes that lack any tools, such as plasmids, drugs, antibodies and transgenic animals. We posited that the existence of at least one tool that can specifically probe a gene or gene product makes it possible to study that gene, which is therefore much more likely to be a light gene.

We used the list defined by the Pharos consortium as our definition of the dark genome. Our comparison yielded, for each resource type, the number of resources that most closely corresponded with the Pharos dark genes. For two types, plasmids and drugs, that number was one. Animals and antibodies required additional resources before a gene could be considered light, which may be because those tools are not quite as robust for the study of a gene.

Interestingly, the analyses yielded a set of genes that are “most likely to become light” because they have a significant number of associated tools, but are still considered part of the dark genome by Pharos.  
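One way such a candidate set could be extracted is sketched below. The selection rule is an assumption, because the record does not state the exact criterion: here a Pharos-dark gene is flagged when it meets or exceeds the threshold for at least `min_types` resource types. The plasmid and drug thresholds of one follow the text above; the antibody threshold, the gene names and all counts are hypothetical.

```python
# Per-type thresholds: a gene with fewer resources than the threshold is dark.
thresholds = {"plasmids": 1, "drugs": 1, "antibodies": 3}  # antibodies value is hypothetical

# Toy resource counts per gene (invented values).
counts_by_gene = {
    "geneA": {"plasmids": 2, "drugs": 0, "antibodies": 5},
    "geneB": {"plasmids": 0, "drugs": 0, "antibodies": 0},
    "geneC": {"plasmids": 1, "drugs": 0, "antibodies": 1},
    "geneD": {"plasmids": 4, "drugs": 2, "antibodies": 9},
}
pharos_dark = {"geneA", "geneB", "geneC"}  # geneD is light according to Pharos


def likely_to_become_light(counts_by_gene, pharos_dark, thresholds, min_types=2):
    """List Pharos-dark genes that look light for at least min_types resource types."""
    result = []
    for gene in sorted(pharos_dark):
        counts = counts_by_gene.get(gene, {})
        light_types = sorted(
            rtype for rtype, limit in thresholds.items()
            if counts.get(rtype, 0) >= limit
        )
        if len(light_types) >= min_types:
            result.append((gene, light_types))
    return result


candidates = likely_to_become_light(counts_by_gene, pharos_dark, thresholds)
for gene, light_types in candidates:
    print(gene, "has enough", ", ".join(light_types))
```

Raising `min_types` narrows the list to genes with broader tooling; setting it to 1 flags every dark gene with at least one resource type at or above its threshold.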

REFERENCE

Oprea, T. I. (2019). Exploring the dark genome: implications for precision medicine. Mammalian Genome, 30(7), 192-200.

Files

Comparison of Dark Gene Counts Venn Diagrams.png

Files (24.5 MB)

Checksum Size
md5:5379574053937da6ee5c0b8c40e41b5e 370.2 kB
md5:9913bcbceba9501b6ac229ad359bb578 1.8 kB
md5:d70f85c6022a2128e4bf83b802b290e5 558 Bytes
md5:3008d0a36e653dad996a0794ed7e4cc5 4.0 kB
md5:37260c3a37f9c4d16fc8c97d3bb0672b 22.3 MB
md5:3c1f08d3ebfba9998c371f9f963cdf51 3.5 kB
md5:0d44055ecc40727f508aa31644fdbcf5 13.2 kB
md5:0216b9c7b516b65154397357526f3ce7 305 Bytes
md5:8b60fd9f75b6429fca9d801d6c26331a 8.3 kB
md5:9222b48354e45a93a43ebcdb42ddeb8a 333 Bytes
md5:e3633e11d9495ac8d80e86beb1976768 1.4 MB
md5:98349352b53504c948b01816b592464b 474.2 kB
md5:8e714810eaa90f9f7d2bff94c6bd9fcd 4.1 kB
md5:c8a90d9a75ce8d482dfb31864fc1e39a 304 Bytes
md5:cbd18d1a531f2b90491a01aa615fb242 502 Bytes
md5:b8ebef8f5e60746889b711ef7f7d2348 509 Bytes
md5:edb982841ab6503e92ae7a9abdbab258 8.2 kB
md5:ab4ec5a21cc35a43514746887856fda9 514 Bytes
md5:48b61e03aacd8848a2763640dad10a4a 2.5 kB

Additional details

Dates

Available
2025-11-18
SfN meeting

Software

Programming language
Python , SQL