Published August 31, 2017 | Version v1
Journal article Open

A transcriptomics data-driven gene space accurately predicts liver cytopathology and drug-induced liver injury

  • 1. Institute of Environmental Medicine, Karolinska Institutet, Nobelsväg 13, Box 210, SE-17177, Stockholm, Sweden
  • 2. Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Konemiehentie 2, PO Box 15400, 00076 Aalto, Finland
  • 3. Department of Bioinformatics - BiGCaT, Maastricht University, Universiteitssingel 50, P.O. Box 616, UNS 50 Box19, NL-6200 MD, Maastricht, The Netherlands
  • 4. Institute for Molecular Medicine Finland, FIMM, University of Helsinki, Tukholmankatu 8, P.O. Box 20, FI-00014 Helsinki, Finland
  • 5. Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Gustaf Hällströmin katu 2b, P.O. Box 68, FI-00014 Helsinki, Finland



  • 1. Prof., Aalto University, Department of Information and Computer Science, Finland


Predicting unanticipated harmful effects of chemicals and drug molecules is a difficult and costly task. Here we use a “big data compacting and data fusion” concept to capture diverse adverse outcomes at the cellular and organismal levels. From a transcriptomics data set, the approach generates a “predictive toxicogenomics space” (PTGS) tool composed of 1331 genes distributed over 14 overlapping cytotoxicity-related gene space components. Constructed and validated with approximately 2.5 × 10⁸ data points and 1300 compounds, the tool serves to: explain dose-dependent cytotoxicity effects, provide a virtual cytotoxicity probability estimate intrinsic to omics data, predict chemically induced pathological states in liver resulting from repeated dosing of rats, and, furthermore, predict human drug-induced liver injury (DILI) from hepatocyte experiments. In an analysis of 68 DILI-annotated drugs, the PTGS tool outperforms and complements existing tests, leading to a hitherto unseen level of DILI prediction accuracy. This record provides custom R code and methods to calculate the component-based PTGS scores from gene expression data.
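As a rough illustration of what a component-based score over overlapping gene sets could look like, the sketch below scores each component as a weight-normalised sum of per-gene log fold changes. It is not the authors' R code or their scoring method: the function name `component_scores`, the gene and component names, the weights, and the scoring rule itself are all assumptions made for illustration only.

```python
# Hypothetical sketch: NOT the authors' R implementation or scoring rule.
# Assumes each component is a set of genes with illustrative weights,
# and that genes may belong to more than one component (overlap).
from typing import Dict


def component_scores(
    log_fc: Dict[str, float],
    components: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Score each component as the weight-normalised sum of log fold changes."""
    scores: Dict[str, float] = {}
    for name, weights in components.items():
        # Only genes that were actually measured in this sample contribute.
        shared = [gene for gene in weights if gene in log_fc]
        total_weight = sum(abs(weights[gene]) for gene in shared)
        if total_weight == 0:
            # No measured genes for this component: report a neutral score.
            scores[name] = 0.0
            continue
        weighted_sum = sum(weights[gene] * log_fc[gene] for gene in shared)
        scores[name] = weighted_sum / total_weight
    return scores


# Toy input: made-up gene names and values, for illustration only.
log_fc = {"GENE_A": 1.0, "GENE_B": -0.5, "GENE_C": 2.0}
components = {
    "comp1": {"GENE_A": 1.0, "GENE_B": 1.0},
    # GENE_B is shared with comp1; GENE_D is not measured and is skipped.
    "comp2": {"GENE_B": 1.0, "GENE_C": 3.0, "GENE_D": 1.0},
}
scores = component_scores(log_fc, components)
print(scores)
```

Normalising by the total weight of the measured genes keeps scores comparable between components of different sizes, which matters when components overlap and some genes are missing from a given experiment. The R code in this record should be consulted for the actual calculation.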


The authors thank Ida Lindenschmidt and the High Throughput Biomedicine unit at FIMM for technical support with cellular high-throughput screening assays. J.A.P. and S.K. acknowledge support from The Academy of Finland (Finnish Centre of Excellence in Computational Inference Research COIN, 251170; Computational Modeling of the Biological Effects of Chemicals, 140057) and Helsinki Doctoral Programme in Computer Science. P.K. and R.C.G. acknowledge support from FP7-Theme HEALTH-2010-Alternative-Testing, through SEURAT/ToxBank and Cosmetics Europe under Grant Agreement nr: 267042, the Swedish Research Council, Swedish Vinnova/EUROSTARS E!9698 - ToxHQ CRO, Swedish Cancer and Allergy Fund, the Swedish Fund for Research without Animal Experiments, Finnish Foundation's Post-doc research grant award to P.K., and Karolinska Institutet. K.W. acknowledges support from the Jane and Aatos Erkko Foundation.


Files (3.1 MB)


Additional details

caLIBRAte – Performance testing, calibration and implementation of a next generation system-of-systems Risk Governance Framework for nanomaterials (grant no. 686239), European Commission
TOXBANK – ToxBank – Supporting Integrated Data Analysis and Servicing of Alternative Testing Methods in Toxicology (grant no. 267042), European Commission