Published November 10, 2022 | Version 1.0
Software Open

Predicting endocrine disruption using conformal prediction – a prioritisation strategy to identify hazardous chemicals with confidence

  • 1. Chemistry Department, Umeå University, 901 87, Umeå, Sweden
  • 2. Department of Computer and Systems Sciences, Stockholm University, Box 7003, 164 07 Kista, Sweden; MTM Research Centre, School of Science and Technology, Örebro University, 701 82 Örebro, Sweden; Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 75 124, Uppsala, Sweden

Contributors

Project leader:

Researcher:

  • 1. Department of Computer and Systems Sciences, Stockholm University, Box 7003, 164 07 Kista, Sweden; MTM Research Centre, School of Science and Technology, Örebro University, 701 82 Örebro, Sweden; Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 75 124, Uppsala, Sweden
  • 2. Chemistry Department, Umeå University

Description

Receptor-mediated molecular initiating events (MIEs) and their relevance in endocrine activity (EA) have been highlighted in the literature. More than 15 receptors have been associated with neurodevelopmental adversity and metabolic disruption. MIEs describe chemical interactions with defined biological outcomes, a relationship that could be described with quantitative structure–activity relationship (QSAR) models. QSAR uncertainty can be assessed using the Conformal Prediction (CP) framework, which provides similarity (i.e. non-conformity) scores relative to the defined classes for each prediction. CP calibration can indirectly mitigate data imbalance during model development, and the non-conformity scores serve as intrinsic measures for chemical applicability domain assessment during screening. The focus of this work was to propose an in silico predictive strategy for EA. First, 23 QSAR models for MIEs associated with EA were developed using high-throughput data for 14 receptors. To handle data imbalance, five protocols were compared, and CP provided the most balanced class definition. Second, the developed QSAR models were applied to a large dataset (~55,000 chemicals), comprising chemicals representative of potential risk for human exposure. Using CP, it was possible to assess the uncertainty of the screening results and to identify model strengths and out-of-domain chemicals. Last, two clustering methods, t-distributed stochastic neighbour embedding (t-SNE) and Tanimoto similarity, were used to identify compounds with potential EA, using known endocrine disruptors as reference. The cluster overlap between methods brought forward 23 chemicals with suspected or demonstrated EA potential. The presented models could be utilized for first-tier screening and identification of compounds with potential biological activity across the studied MIEs.
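The Tanimoto similarity mentioned in the last step can be illustrated with a minimal sketch. It assumes each chemical is represented by the set of "on" bit positions of a binary structural fingerprint. The fingerprint type used in the study is not stated here, and the names `reference_A`, `reference_B` and all bit values below are hypothetical placeholders, not data from the work.

```python
from typing import Dict, Set, Tuple


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by all distinct on-bits."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen for this sketch when both fingerprints are empty
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def most_similar_reference(
    query: Set[int], references: Dict[str, Set[int]]
) -> Tuple[str, float]:
    """Return the reference compound closest to the query and its similarity."""
    scored = ((name, tanimoto(query, fp)) for name, fp in references.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical fingerprints (sets of on-bit indices), for illustration only.
references = {"reference_A": {1, 4, 7, 9}, "reference_B": {2, 3, 5}}
query = {1, 4, 8, 9, 12}

best_name, best_score = most_similar_reference(query, references)
print(best_name, best_score)
```

In a screening setting, a compound whose highest similarity to any known endocrine disruptor exceeds a chosen threshold could be flagged for closer inspection; the threshold itself would have to be justified separately.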

Manuscript DOI: 10.1021/acs.chemrestox.2c00267

Supplementary Information 2: Python code for Conformal Prediction implementation for all 23 developed models. 
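For orientation only, the following is a minimal sketch of class-conditional (Mondrian) inductive conformal prediction, one common way in which CP calibration can compensate for class imbalance. It is not the code shipped in `cp_code.zip`. The Mondrian variant, the non-conformity measure (here assumed to be one minus the model's predicted probability for the class), the function names, and all numbers are assumptions made for illustration.

```python
from typing import Dict, Sequence, Set


def mondrian_p_values(
    cal_scores: Dict[int, Sequence[float]],
    test_scores: Dict[int, float],
) -> Dict[int, float]:
    """Class-conditional conformal p-values for a single test compound.

    cal_scores[c]:  non-conformity scores of calibration compounds of true class c
    test_scores[c]: non-conformity score of the test compound if it were class c
    """
    p_values = {}
    for label, alpha_test in test_scores.items():
        alphas = cal_scores[label]
        # Count calibration compounds of this class that are at least as "strange".
        n_at_least = sum(1 for a in alphas if a >= alpha_test)
        p_values[label] = (n_at_least + 1) / (len(alphas) + 1)
    return p_values


def prediction_set(p_values: Dict[int, float], significance: float) -> Set[int]:
    """Keep every class whose p-value exceeds the chosen significance level."""
    return {label for label, p in p_values.items() if p > significance}


# Hypothetical, imbalanced calibration data: 0 = inactive (9), 1 = active (4).
cal_scores = {
    0: [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45],
    1: [0.20, 0.50, 0.60, 0.80],
}
# Hypothetical test compound with a predicted probability of 0.7 for "active".
test_scores = {0: 0.7, 1: 0.3}

p_values = mondrian_p_values(cal_scores, test_scores)
print(p_values)                        # p-value per class
print(prediction_set(p_values, 0.2))   # single-label prediction
print(prediction_set(p_values, 0.05))  # both classes remain plausible
```

Because each class is calibrated only against its own calibration compounds, the minority class is not swamped by the majority class. An empty prediction set at the chosen significance level indicates a compound that conforms to neither class, which is one way to flag possible out-of-domain chemicals.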

Contents: CP Python code, readme, licence file and example input file

Notes

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreements No. 825759, ENDpoiNTs, and No. 825489, GOLIATH, both part of the EURION cluster. This work has been partially funded (UN) by the Swedish Foundation for Strategic Environmental Research, MISTRA (grant no. DIA 2018/11), Safe and Efficient Chemistry by Design (SafeChem).

Files

cp_code.zip (26.5 kB)
md5:aeb18c91161efba9f2e2f2c4bcb644b7

Additional details

Funding

European Commission
GOLIATH - Beating Goliath: Generation Of NoveL, Integrated and Internationally Harmonised Approaches for Testing Metabolism Disrupting Compounds 825489
European Commission
ENDpoiNTs - Novel Testing Strategies for Endocrine Disruptors in the Context of Developmental NeuroToxicity 825759