Published June 18, 2025 | Version v1
Dataset
Open
Preprocessed CMAP dataset for UNAGI in-silico perturbation
Description
These are preprocessed drug/compound information files derived from the CMAP LINCS 2020 database (https://clue.io/data/CMap2020#LINCS2020).
- 'cmap_drug_target.npy' uses the direct drug target genes provided in CMAP LINCS 2020.
- 'cmap_drug_treated_res_cutoff.npy' uses genes that are significantly up- or down-regulated after individual drug treatments in CMAP LINCS 2020. The level 5 MODZ score was used to determine the extent of gene expression changes after treatment. The top 5% most significantly changed drug-gene pairs are kept. NOTE: Running perturbation with this file can take a long time. To reduce the run time, you can use only a subset of this data, e.g. by running 'use_only_drugs.py' to run perturbation with established drugs only.
- 'cmap_direction_df.npy' indicates the direction in which each gene is regulated by a drug after treatment. The level 5 MODZ score was used to indicate the direction of change.
- 'use_only_drugs.py' filters out compounds without a formal drug name, e.g. 'BRD-xxxxx' will be removed. This script can be used to extract established drugs for drug repurposing. NOTE: If you use 'cmap_drug_target.npy', you do not need to run this script.
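The cutoff, direction, and filtering steps described above can be sketched as follows. This is a minimal illustration on toy data, not the code used to build these files. The function names (`top_fraction_mask`, `regulation_direction`, `keep_named_drugs`), the ranking of drug-gene pairs by absolute score, and the idea that a '.npy' file holds a dictionary keyed by compound name are all assumptions made for the example.

```python
import numpy as np


def top_fraction_mask(scores, fraction=0.05):
    """Mark the drug-gene pairs with the largest absolute scores.

    Assumption: the "top 5%" is ranked by |score| over all pairs at once.
    """
    magnitude = np.abs(scores)
    threshold = np.quantile(magnitude, 1.0 - fraction)
    return magnitude >= threshold


def regulation_direction(scores):
    """Return +1 for up-regulation, -1 for down-regulation, 0 for no change."""
    return np.sign(scores).astype(int)


def keep_named_drugs(drug_to_genes, prefix="BRD-"):
    """Drop compounds whose name starts with the given prefix.

    Assumption: compounds without a formal drug name are exactly those
    whose identifier begins with 'BRD-'.
    """
    return {
        drug: genes
        for drug, genes in drug_to_genes.items()
        if not str(drug).startswith(prefix)
    }


# Toy score matrix: 2 hypothetical drugs x 10 hypothetical genes.
toy_scores = np.arange(20).reshape(2, 10) - 5
print(top_fraction_mask(toy_scores).sum(), "pair(s) kept")
print(regulation_direction(toy_scores)[0])

# Toy mapping in the assumed {compound name: gene list} layout.
toy_targets = {"BRD-K00000": ["GENE1"], "aspirin": ["GENE2", "GENE3"]}
print(keep_named_drugs(toy_targets))

# If a file really stores a pickled dictionary (an assumption), it could be
# loaded like this before filtering:
# data = np.load("cmap_drug_treated_res_cutoff.npy", allow_pickle=True).item()
```

Filtering by name prefix keeps the example independent of any particular file layout; only the final commented-out line depends on how the arrays were saved.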
Files
(1.6 GB)
Checksum | Size |
---|---|
md5:81db344d594857d8bf69ffe2c62ac3bf | 849.8 MB |
md5:7a3b1e2b5ca4b50f68c4fa6d1400d8ae | 139.2 kB |
md5:e482e807a83d37d2583499bc694372ca | 711.3 MB |
md5:47612e7e3bdc81e55d599fb465eec96f | 261 Bytes |
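Because two of the files are several hundred megabytes, it can be worth checking a download against the MD5 checksums listed above. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_listed_checksum` are hypothetical, and since the listing above does not pair checksums with file names, the check only tests membership in the set of listed values.

```python
import hashlib

# Checksums copied from the file listing above.
LISTED_MD5 = {
    "81db344d594857d8bf69ffe2c62ac3bf",
    "7a3b1e2b5ca4b50f68c4fa6d1400d8ae",
    "e482e807a83d37d2583499bc694372ca",
    "47612e7e3bdc81e55d599fb465eec96f",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path):
    """Return True if the file's MD5 is one of the listed checksums."""
    return md5_of_file(path) in LISTED_MD5
```

For example, `matches_listed_checksum("cmap_drug_target.npy")` would return `True` only if the local copy hashes to one of the four listed values. Reading in chunks avoids loading an 850 MB file into memory at once.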