Published July 14, 2023 | Version 1
From Pixels to Phenotypes: Integrating Image-Based Profiling with Cell Health Data Improves Interpretability

  • 1. University of Cambridge
  • 2. Uppsala University
  • 3. Broad Institute of MIT and Harvard
  • 4. Spjuth



Cell Painting assays generate morphological profiles that are versatile descriptors of biological systems and have been used to predict in vitro and in vivo drug effects. However, Cell Painting features are based on image statistics and are therefore often not readily biologically interpretable. In this study, we introduce an approach that maps specific Cell Painting features into the BioMorph space using readouts from comprehensive Cell Health assays. We validated that the resulting BioMorph space effectively connected compounds not only with the morphological features associated with their bioactivity but also with deeper insights into the phenotypic characteristics and cellular processes associated with that bioactivity. The BioMorph space revealed the mechanism of action for individual compounds, including dual-acting compounds such as emetine, an inhibitor of both protein synthesis and DNA replication. In summary, the BioMorph space offers a more biologically relevant way to interpret cell morphological features from Cell Painting assays and to generate hypotheses for experimental validation.


The following datasets are released:

Cell_Health_median_357_profiles_70_labels.csv:
The Cell Health dataset for CRISPR perturbations. Contains median consensus signatures for the 357 consensus profiles (119 CRISPR perturbations × 3 cell lines). Ref: Way et al.

The Cell Painting dataset for CRISPR perturbations. Contains 827 morphology features (and metadata annotation) for 357 consensus profiles (119 CRISPR perturbations × 3 cell lines). Ref: Way et al.

The Cell Painting dataset for compound perturbations. Contains 658 structurally unique compounds with 827 Cell Painting features. Ref: Bray et al.

The biological assay activity labels for compound perturbations. Contains 658 structurally unique compounds with 9 biological activity consensus hit calls. Ref: ToxCast/MoleculeNet

The dataset of standardised BioMorph term p-values. Contains 398 BioMorph terms for the 658 compounds in the biological activity dataset. 
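
As a quick sanity check after download, the row count of the Cell Health file can be compared with the 357 consensus profiles stated above (119 CRISPR perturbations × 3 cell lines). The sketch below is illustrative only: the helper names summarize_csv and check_profile_count are hypothetical, and it assumes the file is a comma-separated table with a single header row and one profile per row, which this record does not state explicitly.

```python
import csv
import io

# 119 CRISPR perturbations x 3 cell lines, as stated in the dataset description.
EXPECTED_PROFILES = 119 * 3


def summarize_csv(handle):
    """Return (n_rows, n_columns) for a CSV with a single header row.

    Assumption: one profile per data row; blank lines are ignored.
    """
    reader = csv.reader(handle)
    header = next(reader)
    n_rows = sum(1 for row in reader if row)
    return n_rows, len(header)


def check_profile_count(n_rows, expected=EXPECTED_PROFILES):
    """True when the number of data rows matches the expected profile count."""
    return n_rows == expected


# Tiny in-memory stand-in so the sketch runs without the real file.
demo = io.StringIO("profile_id,label_a,label_b\np1,0.1,0.2\np2,0.3,0.4\n")
demo_rows, demo_cols = summarize_csv(demo)
print(demo_rows, demo_cols, check_profile_count(demo_rows))
```

To run the same check on the released file, open it with open("Cell_Health_median_357_profiles_70_labels.csv", newline="") and pass the handle to summarize_csv. No column names are assumed here, so the column count is reported rather than validated.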

Way et al. Predicting cell health phenotypes using image-based morphology profiling. Mol Biol Cell. 2021;32(9):995-1005.
Bray et al. A dataset of images and morphological profiles of 30 000 small-molecule treatments using the Cell Painting assay. Gigascience. 2017;6(12):1-5. 
MoleculeNet: Wu et al. MoleculeNet: A benchmark for molecular machine learning. Chem Sci. 2018;9(2):513-530. 
ToxCast: Exploring ToxCast Data | US EPA (accessed Jul 9, 2023).