Can we define "dark genome" using the number of research resources?
Authors/Creators
Contributors
Data collector:
Researcher:
Supervisor:
Description
The dark genome, defined as the set of genes that are understudied (Oprea, 2019), has been characterized to some extent by the Illuminating the Druggable Genome (IDG) project (RRID:SCR_016924). One goal of this project is to encourage the study of genes that are likely to be important for precision medicine but are currently understudied. The KOMP project (RRID:SCR_005571) aims to knock out 8,500 genes in mice and to deliver these mice and their associated phenotype data to the research community, also with the goal of illuminating the dark genome. Pharos (pharos.nih.gov; RRID:SCR_016258) provides a list of gene symbols and assigns a ‘Tdark’ tag to genes that are not sufficiently understood. We adopt this convenient definition of darkness as a baseline. The RRID project works with antibody manufacturers, plasmid providers and animal stock centers and maintains the largest freely available list of these key resources. We use searches for individual gene names in the RRID.site API as a proxy for the availability of tools.
We do not know how strongly the availability of a knockout animal, the presence of data as defined by the IDG project, or the availability of tools such as antibodies and plasmids affects how much attention researchers pay to specific genes.
Thus, to determine whether the availability of scientific tools (including animals, antibodies and plasmids) could substitute for a more manual definition of the dark genome, we explored correlations between the Pharos dark genome definition and the number of available resources. The output is a survey of key data about less-studied genes that are likely to be interesting targets for the study of disease.
Methods
To better understand the dark genome, we first downloaded the list of mouse genes, and a list of manuscripts associated with mouse genes, from NCBI Gene (RRID:SCR_002473). After compiling the full list of mouse genes, we queried the scicrunch.org Application Programming Interface (API) to determine which antibodies, plasmids and organisms were available to scientists for each gene. Papers were assessed by querying the PubMed database (RRID:SCR_004846). Drugs were assessed using the Drug-Gene Interaction Database (DGIdb; RRID:SCR_006608): for each mouse gene, we queried the DGIdb API and extracted the number of drugs associated with that gene.
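As an illustration of the per-gene paper counting step, the sketch below builds a request to NCBI's public E-utilities `esearch` endpoint that returns only the PubMed hit count, and parses the JSON reply. The search string (gene symbol in title/abstract, restricted to the MeSH term "mice") is an assumption for illustration; the exact query used in this study is not specified, and the scicrunch.org and DGIdb queries are not reproduced here.

```python
import json
from urllib.parse import urlencode

# Public NCBI E-utilities esearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_count_url(gene_symbol: str) -> str:
    """Build an esearch URL that asks PubMed for the hit count only.

    Assumption: the query term below is hypothetical; the study's actual
    PubMed search string is not given in the description.
    """
    term = f"{gene_symbol}[Title/Abstract] AND mice[MeSH Terms]"
    # retmax=0 skips the list of PMIDs; only the total count is needed.
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": 0}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_pubmed_count(payload: str) -> int:
    """Extract the total hit count from an esearch JSON response."""
    data = json.loads(payload)
    # esearch reports the count as a string, e.g. "42".
    return int(data["esearchresult"]["count"])
```

In use, each URL would be fetched (for example with `urllib.request.urlopen`) and the decoded body passed to `parse_pubmed_count`; NCBI asks clients without an API key to stay under three requests per second, so a loop over all mouse genes should pause between calls.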
We downloaded the Pharos dataset from https://pharos.nih.gov/targets (RRID:SCR_016258) on Tuesday, October 7, 2025. Of the 16,240 genes present in both the Pharos and NCBI mouse gene databases, 2,684 were classified as dark (Tdark) by Pharos.
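The overlap count above can be sketched as a set intersection followed by a tally of the Pharos target development level. How mouse genes were matched to Pharos entries is not stated; this sketch assumes (hypothetically) that symbols are compared after upper-casing, since mouse symbols such as "Brca1" differ from human symbols such as "BRCA1" only in case. The gene lists in the test are toy data.

```python
from typing import Dict, Iterable, Tuple


def overlap_and_dark(
    mouse_symbols: Iterable[str], pharos_tdl: Dict[str, str]
) -> Tuple[int, int]:
    """Return (genes present in both sources, how many of those are Tdark).

    Assumption: matching is by upper-cased gene symbol (hypothetical rule).
    """
    pharos = {sym.upper(): tdl for sym, tdl in pharos_tdl.items()}
    shared = {sym.upper() for sym in mouse_symbols} & set(pharos)
    n_dark = sum(1 for sym in shared if pharos[sym] == "Tdark")
    return len(shared), n_dark
```

Case-insensitive matching misses orthologs whose symbols differ outright (for example mouse Trp53 and human TP53); an explicit mouse-human ortholog table would be a more robust join key.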
With these data, we created five candidate lists of dark genes, one for each of the five resource types above (antibodies, plasmids, organisms, papers and drugs). For each type, we chose a threshold and classified as dark every gene associated with fewer resources of that type than the threshold; for example, with the paper threshold, every gene mentioned in fewer than 17 papers is dark. We compared each candidate list with the Pharos dark list using Fisher's exact test and selected the threshold that minimized the resulting p-value. When several thresholds produced a p-value of 0.0 (numerically indistinguishable from zero), we chose the threshold whose dark list was closest in size to the Pharos dark list.
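The threshold search described above can be sketched as follows: for each candidate threshold `t`, genes with fewer than `t` resources form a dark list, a 2x2 table is built against the Pharos dark list, and a two-sided Fisher's exact test is computed from hypergeometric probabilities in log space (which is also why very extreme tables underflow to a p-value of 0.0). This is a minimal self-contained sketch, not the authors' code; `scipy.stats.fisher_exact` computes the same two-sided test in practice. The helper names and the toy data in the test are hypothetical, and ties are broken by dark-list size at any equal p-value, not only at 0.0.

```python
import math
from typing import Dict, Iterable, Optional, Set, Tuple


def _log_comb(n: int, k: int) -> float:
    """Natural log of C(n, k), computed with log-gamma to avoid huge integers."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows: in / not in our dark list. Columns: in / not in the Pharos dark list.
    Sums the probabilities of all tables with the same margins that are no
    more likely than the observed table.
    """
    n = a + b + c + d
    row = a + b  # size of our candidate dark list
    col = a + c  # size of the Pharos dark list
    denom = _log_comb(n, row)
    log_probs = {
        x: _log_comb(col, x) + _log_comb(n - col, row - x) - denom
        for x in range(max(0, row + col - n), min(row, col) + 1)
    }
    cutoff = log_probs[a] + 1e-7  # small tolerance so numerically tied tables count
    # Extremely unlikely tables underflow exp() to 0.0, giving p == 0.0.
    p = sum(math.exp(lp) for lp in log_probs.values() if lp <= cutoff)
    return min(1.0, p)


def best_threshold(
    counts: Dict[str, int], pharos_dark: Set[str], thresholds: Iterable[int]
) -> Tuple[int, float]:
    """Pick the threshold whose 'fewer than t resources = dark' list best matches Pharos.

    Lowest p-value wins; ties go to the threshold whose dark-list size is
    closest to the Pharos dark-list size (first such threshold if still tied).
    """
    genes = set(counts)
    pharos_in_universe = pharos_dark & genes
    target = len(pharos_in_universe)
    best: Optional[Tuple[Tuple[float, int], int]] = None
    for t in thresholds:
        ours = {g for g, k in counts.items() if k < t}
        a = len(ours & pharos_in_universe)  # dark in both lists
        b = len(ours) - a                   # dark only in our list
        c = target - a                      # dark only in Pharos
        d = len(genes) - a - b - c          # light in both
        key = (fisher_exact_two_sided(a, b, c, d), abs(len(ours) - target))
        if best is None or key < best[0]:
            best = (key, t)
    if best is None:
        raise ValueError("no thresholds given")
    return best[1], best[0][0]
```

Working in log space keeps the sketch fast for tables with margins in the thousands, such as 2,684 dark genes out of 16,240.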
We will present the data as a histogram in which gaps in research tools are highlighted, and genes that appear to have substantial tooling are shown as a set of genes that could readily be explored. Areas in which few papers exist but tools are available will also be highlighted.
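A minimal sketch of the data behind such a histogram, assuming (hypothetically) that each gene's tool count is the total across tool types: genes are binned by tool count with zero-tool genes kept in their own "gap" bin, and genes with few papers but at least one tool are listed separately. The bin width of 5 is an arbitrary illustrative choice; the paper cutoff of 17 reuses the example threshold from the Methods. Any plotting library can draw the resulting bins.

```python
from collections import Counter
from typing import Dict, List


def tool_histogram(total_tools: Dict[str, int], bin_width: int = 5) -> Dict[str, int]:
    """Count genes per bin of total tool count; zero-tool genes form a separate gap bin."""
    hist: Counter = Counter()
    for n in total_tools.values():
        if n == 0:
            hist["0 (gap)"] += 1
        else:
            lo = ((n - 1) // bin_width) * bin_width + 1  # bins 1-5, 6-10, ...
            hist[f"{lo}-{lo + bin_width - 1}"] += 1
    return dict(hist)


def few_papers_with_tools(
    papers: Dict[str, int], total_tools: Dict[str, int], paper_threshold: int = 17
) -> List[str]:
    """Genes mentioned in fewer than `paper_threshold` papers that have at least one tool."""
    return sorted(
        g for g, p in papers.items() if p < paper_threshold and total_tools.get(g, 0) > 0
    )
```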
Results
We explored whether the dark genome can be defined as the set of genes that lack any tools, such as plasmids, drugs, antibodies and transgenic animals. We posited that the existence of at least one tool that can specifically probe a gene or gene product makes it possible to study that gene, and therefore makes the gene much more likely to be light.
We used the list defined by the Pharos consortium as the reference definition of the dark genome. For each resource type, our comparison identified the threshold that corresponded most closely to the Pharos dark genes. For two types, plasmids and drugs, that threshold was one: a gene with no plasmids (or no drugs) was classified as dark, and a single reagent was enough to make it light. Animals and antibodies required more than one resource before a gene could be considered light, possibly because these tools are less robust for the study of a gene.
Interestingly, the analyses also yielded a set of genes that are “most likely to become light”: they have a substantial number of associated tools but are still classified as dark by Pharos.
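One way to pull out such candidates is sketched below: for each Pharos-dark gene, count the tool types in which the gene clears its threshold (a gene is light for a type when it has at least the threshold number of resources, so a threshold of one means a single plasmid or drug suffices). The scoring rule, the `min_types` cutoff and the antibody and organism threshold values in the test are hypothetical placeholders, not results from this study.

```python
from typing import Dict, List, Set, Tuple

TOOL_TYPES = ("antibodies", "plasmids", "organisms", "drugs")


def light_by_type(counts: Dict[str, int], thresholds: Dict[str, int]) -> Dict[str, bool]:
    """True for each tool type in which the gene has at least the threshold count."""
    return {t: counts.get(t, 0) >= thresholds[t] for t in TOOL_TYPES}


def likely_to_become_light(
    resources: Dict[str, Dict[str, int]],
    pharos_dark: Set[str],
    thresholds: Dict[str, int],
    min_types: int = 2,  # hypothetical cutoff for "substantial" tooling
) -> List[Tuple[str, int]]:
    """Pharos-dark genes that are light in at least `min_types` tool types, best first."""
    ranked = []
    for gene in pharos_dark:
        if gene not in resources:
            continue
        n_light = sum(light_by_type(resources[gene], thresholds).values())
        if n_light >= min_types:
            ranked.append((gene, n_light))
    # Most tool types first, then alphabetical for a stable order.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```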
REFERENCE
Oprea, T. I. (2019). Exploring the dark genome: implications for precision medicine. Mammalian Genome, 30(7), 192-200.
Files
Comparison of Dark Gene Counts Venn Diagrams.png
Total size: 24.5 MB

| MD5 checksum | Size |
|---|---|
| 5379574053937da6ee5c0b8c40e41b5e | 370.2 kB |
| 9913bcbceba9501b6ac229ad359bb578 | 1.8 kB |
| d70f85c6022a2128e4bf83b802b290e5 | 558 B |
| 3008d0a36e653dad996a0794ed7e4cc5 | 4.0 kB |
| 37260c3a37f9c4d16fc8c97d3bb0672b | 22.3 MB |
| 3c1f08d3ebfba9998c371f9f963cdf51 | 3.5 kB |
| 0d44055ecc40727f508aa31644fdbcf5 | 13.2 kB |
| 0216b9c7b516b65154397357526f3ce7 | 305 B |
| 8b60fd9f75b6429fca9d801d6c26331a | 8.3 kB |
| 9222b48354e45a93a43ebcdb42ddeb8a | 333 B |
| e3633e11d9495ac8d80e86beb1976768 | 1.4 MB |
| 98349352b53504c948b01816b592464b | 474.2 kB |
| 8e714810eaa90f9f7d2bff94c6bd9fcd | 4.1 kB |
| c8a90d9a75ce8d482dfb31864fc1e39a | 304 B |
| cbd18d1a531f2b90491a01aa615fb242 | 502 B |
| b8ebef8f5e60746889b711ef7f7d2348 | 509 B |
| edb982841ab6503e92ae7a9abdbab258 | 8.2 kB |
| ab4ec5a21cc35a43514746887856fda9 | 514 B |
| 48b61e03aacd8848a2763640dad10a4a | 2.5 kB |
Additional details
Dates
- Available: 2025-11-18 (SfN meeting)