Published December 12, 2022 | Version v1
Presentation Open

An in-silico coverage evaluation of RPLC-MS chemical space via two different retention indices scales

  • 1. University of Amsterdam
  • 2. University of Queensland
  • 3. Imperial College London


Resolving the human and environmental exposome is an extremely challenging task due to the complexity of its chemical space (e.g. the number of potential structures). Reversed-phase liquid chromatography (RPLC), particularly with C18 columns, coupled to high-resolution mass spectrometry (HRMS) is one of the dominant techniques for non-targeted analysis (NTA) and thus for tackling the exposome. NTA approaches, although the most comprehensive available, do not provide any information about the chemical space they cover, so whatever lies outside that space remains unknown. Another limitation of NTA is the difficulty of confident chemical identification, since multiple compounds may be potential candidates, especially given the ever-expanding chemical databases. In this study, a machine-learning-based tool was built to assess the coverage of the RPLC chemical space. For this purpose, two independent quantitative structure–retention relationship (QSRR) models were used to predict a conventional and a novel type of retention index (RI) for around 100,000 compounds from the NORMAN database (i.e. SusDat). To determine whether a chemical is analyzable with an RPLC system, we employed a leverage-based and a probability-density-based approach. By combining the outputs of the models, it can be predicted, with a certain level of confidence, whether a compound can be analyzed using a C18 LC column. This assessment achieved a classification accuracy of 63% and established a concrete workflow suitable for similar studies.
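
As an illustration of the leverage-based check mentioned in the abstract, the following is a minimal sketch rather than the tool built in this study. It assumes the common QSAR convention in which the leverage of a compound is h = x^T (X^T X)^-1 x, computed from the training descriptor matrix X, and a compound is flagged as outside the model's domain when h exceeds a warning value h* = 3(p + 1)/n. The function names, the threshold factor, and the random descriptor data are all hypothetical.

```python
import numpy as np


def leverages(X_train, X_query):
    """Leverage h_i = x_i^T (X^T X)^-1 x_i of each query row,
    measured against the training descriptor matrix."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    # The pseudo-inverse keeps the computation stable if descriptors are collinear.
    xtx_inv = np.linalg.pinv(X_train.T @ X_train)
    return np.einsum("ij,jk,ik->i", X_query, xtx_inv, X_query)


def warning_leverage(X_train, factor=3.0):
    """Assumed threshold h* = factor * (p + 1) / n (a common QSAR convention)."""
    n, p = np.asarray(X_train).shape
    return factor * (p + 1) / n


# Hypothetical, already-scaled descriptors: 200 training compounds, 5 descriptors.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))

# Two hypothetical query compounds: one at the centre of the training data,
# one far away from it in descriptor space.
X_query = np.vstack([np.zeros(5), np.full(5, 10.0)])

h = leverages(X_train, X_query)
h_star = warning_leverage(X_train)
inside_domain = h <= h_star  # True where the compound falls within the domain
```

In this sketch the descriptors are assumed to be centred and scaled beforehand. A probability-density-based check could be applied to the same descriptors in a similar way, by thresholding an estimated density instead of the leverage.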


