Published June 18, 2025 | Version v1
Dataset Open

Preprocessed CMAP dataset for UNAGI in-silico perturbation

Description

These are preprocessed drug/compound information files derived from the CMAP LINCS 2020 database, https://clue.io/data/CMap2020#LINCS2020.

  • 'cmap_drug_target.npy' uses the direct drug target genes provided in CMAP LINCS 2020.
  • 'cmap_drug_treated_res_cutoff.npy' uses genes that are significantly up- or down-regulated after individual drug treatments in CMAP LINCS 2020. The level 5 MODZ score was used to determine the extent of gene expression changes after treatment. Only the top 5% most significantly changed drug-gene pairs are kept. NOTE: Running perturbation with this file can take a long time. To reduce the run time, you can use only a subset of this data, e.g. by running 'use_only_drugs.py' so that perturbation is run with established drugs only.
  • 'cmap_direction_df.npy' indicates the direction in which each gene is regulated by a drug after treatment. The level 5 MODZ score was used to indicate the direction of change.
  • 'use_only_drugs.py' filters out compounds without a formal drug name, e.g. 'BRD-xxxxx' entries are removed. This script can be used to extract established drugs for drug repurposing. NOTE: If you use 'cmap_drug_target.npy', you don't need to run this script.
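
The filtering step described for 'use_only_drugs.py' can be sketched as follows. This is a minimal illustration, not the script itself: it assumes the compound data is available as a dictionary keyed by compound name, and that compounds without a formal drug name are recognisable by a 'BRD-' prefix. The function name `keep_named_drugs` and the toy entries are hypothetical.

```python
import re

# Assumption: compounds without a formal drug name carry an ID that starts
# with 'BRD-'. The exact rule used by 'use_only_drugs.py' is not shown here.
BRD_ID = re.compile(r"^BRD-", re.IGNORECASE)


def keep_named_drugs(drug_to_genes):
    """Return a copy of the mapping without 'BRD-...' compounds."""
    return {
        name: genes
        for name, genes in drug_to_genes.items()
        if not BRD_ID.match(str(name))
    }


# Toy stand-in for the real data (hypothetical entries, not taken from CMAP).
toy = {
    "drug_a": ["GENE1", "GENE2"],
    "BRD-K00000001": ["GENE3"],
}
filtered = keep_named_drugs(toy)
print(sorted(filtered))
```

If a '.npy' file turns out to store a pickled dictionary, it could be read with `numpy.load(path, allow_pickle=True).item()` before filtering. That layout is an assumption and should be checked against the actual file.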

Files

Files (1.6 GB)

Checksum                               Size
md5:81db344d594857d8bf69ffe2c62ac3bf   849.8 MB
md5:7a3b1e2b5ca4b50f68c4fa6d1400d8ae   139.2 kB
md5:e482e807a83d37d2583499bc694372ca   711.3 MB
md5:47612e7e3bdc81e55d599fb465eec96f   261 Bytes
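
Because the two largest files are several hundred megabytes each, it may be worth checking a download against the md5 checksums listed above. The sketch below uses only the Python standard library. The helper names `md5_of_file` and `matches_listed_md5` are illustrative, and the local file path is whatever you saved the download as.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_md5(local_path, "md5:81db344d594857d8bf69ffe2c62ac3bf")` should return True for an intact copy of the 849.8 MB file, where `local_path` is the location of your download.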