Integrating Chemical Language and Physicochemical Features for Enhanced Molecular Property Prediction with Multimodal Language Models
Authors/Creators
- 1. IBM Research
- 2. IBM Research
Description
We present a multimodal language model approach (MultiModal-MoLFormer) for predicting molecular properties. It combines chemical language embeddings derived from the recently introduced MoLFormer chemical language model with physicochemical features. The approach uses a causal multi-stage feature selection method that selects physicochemical features based on their direct causal effect on the target property being predicted. Specifically, we use Mordred descriptors as the physicochemical features and Markov blanket causal graphs as the inference algorithm to identify the most relevant ones. Our results show that the proposed approach outperforms existing state-of-the-art algorithms, including the chemical language-based MoLFormer and graph neural networks, on complex tasks such as predicting the biodegradability of general compounds and estimating PFAS toxicity. Compared with the base MoLFormer approach, MultiModal-MoLFormer improves classification accuracy for EPA categories of PFAS toxicity from 0.75 to 0.84. The proposed approach also achieves an accuracy of 0.94 on the biodegradability estimation task.
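The description above outlines a two-step pipeline: select a subset of physicochemical descriptors, then combine them with the chemical language embedding. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. The function names (`select_descriptors`, `fuse`) and the toy data are hypothetical, the selection step ranks descriptors by absolute Pearson correlation as a simple stand-in for Markov blanket causal discovery, and fusion by plain concatenation is an assumption, since the description only says the two representations are combined.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_descriptors(descriptors, target, k):
    """Return the indices of the k descriptor columns most correlated with the target.

    NOTE: correlation ranking is only a stand-in. The approach described above
    selects features with Markov blanket causal graphs, which this function
    does not implement.
    """
    n_cols = len(descriptors[0])
    scores = []
    for j in range(n_cols):
        column = [row[j] for row in descriptors]
        scores.append((abs(pearson(column, target)), j))
    # Highest score first; ties broken by column index for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return sorted(j for _, j in scores[:k])


def fuse(embedding, descriptor_row, selected):
    """Concatenate an embedding with the selected descriptors (assumed fusion scheme)."""
    return list(embedding) + [descriptor_row[j] for j in selected]


# Toy data (made up for illustration): 4 molecules, 2-dimensional embeddings
# standing in for MoLFormer outputs, and 3 descriptors standing in for Mordred.
embeddings = [[0.1, 0.9], [0.2, 0.8], [0.7, 0.3], [0.9, 0.1]]
descriptors = [[1.0, 5.0, 0.0], [2.0, 5.0, 1.0], [3.0, 5.0, 0.0], [4.0, 5.0, 1.0]]
labels = [0, 0, 1, 1]

selected = select_descriptors(descriptors, labels, k=1)
fused = [fuse(e, d, selected) for e, d in zip(embeddings, descriptors)]
```

The fused vectors would then be passed to a downstream classifier or regressor. In a real setting the embeddings and descriptors would come from the respective tools rather than hand-written lists, and the selection step would be replaced by a proper Markov blanket discovery procedure.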
Files
| Name | Size | Checksum |
|---|---|---|
| BioKDD___Multimodal_MoLFormer.pdf | 261.3 kB | md5:2120b77a402cb4488a65c8ca2a176c77 |
Additional details
Related works
- Cites: https://arxiv.org/pdf/2306.14919.pdf (URL)