Published August 7, 2025 | Version v1
Dataset Open

Benchmarking gene embeddings from sequence, expression, network, and text models for functional prediction tasks

Description

These files accompany our gene embedding benchmarking paper. Each of the 38 benchmarked embedding methods is saved as two files: one containing the Entrez gene IDs that index the rows, and one containing the vectors of values that define the embedding. Two archives provide different gene subsets for each embedding: "full" contains the original embeddings (all genes), and "intersect" restricts each embedding to the common set of genes present across all 38 methods.
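The relationship between the "full" and "intersect" subsets can be illustrated with a minimal sketch. It is not the authors' code, and it does not assume any on-disk file format (the description does not specify one). It works on in-memory pairs of gene IDs and vectors. The function name `intersect_embeddings`, the method names, the toy IDs and values, and the choice to sort the shared genes are all assumptions made for illustration.

```python
def intersect_embeddings(embeddings):
    """Restrict every embedding to the genes shared by all of them.

    `embeddings` maps a method name to a (gene_ids, vectors) pair, where
    gene_ids[i] is the gene ID for row vectors[i].
    """
    # Genes that appear in every method's ID list.
    common = set.intersection(*(set(ids) for ids, _ in embeddings.values()))
    # Assumption: sort the shared genes so rows line up across methods.
    ordered = sorted(common)

    result = {}
    for name, (ids, vectors) in embeddings.items():
        row_of = {gene: i for i, gene in enumerate(ids)}
        result[name] = (ordered, [vectors[row_of[g]] for g in ordered])
    return result


# Toy data with made-up IDs and values, for illustration only.
toy = {
    "method_a": ([1, 2, 3], [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]),
    "method_b": ([3, 1, 9], [[1.0], [2.0], [3.0]]),
}
reduced = intersect_embeddings(toy)
print(reduced["method_a"])  # ([1, 3], [[0.1, 0.2], [0.5, 0.6]])
print(reduced["method_b"])  # ([1, 3], [[2.0], [1.0]])
```

After this step every method covers the same genes in the same row order, while each keeps its own embedding dimension.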

If you use these files, please cite our manuscript rather than the Zenodo DOI.

Associated code is available at github.com/ylaboratory/gene-embedding-benchmarks.

Files

Files (7.2 GB)

Size     Checksum
4.8 GB   md5:4c54c0d1b9716d4c5688c27895d7abdb
2.4 GB   md5:49e52541e792cc939cb9979db993b73a
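
The MD5 checksums listed above can be used to confirm that an archive downloaded intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and the archive path in the usage note is a placeholder because the file names are not shown here.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so multi-gigabyte archives fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with a listed value such as 'md5:4c54...'."""
    # Drop the optional 'md5:' prefix used in the listing.
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum("path/to/downloaded_archive", "md5:4c54c0d1b9716d4c5688c27895d7abdb")` should return `True` for an intact copy of the 4.8 GB archive.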