Published 2024 | Version v1
Conference proceeding Open

Comparative Study of Random Forest, Gradient Boosted Trees, Feedforward Neural Networks and Convolutional Neural Networks Using Fingerprints and Molecular Descriptors for Adverse Drug Reaction Prediction

  • 1. George Emil Palade University of Medicine, Pharmacy, Science and Technology of Targu Mures

Description

Adverse drug reactions can, in some cases, have devastating effects on the population, so it is important to develop decision support systems based on artificial intelligence that predict such reactions and thereby have a preventive effect with a positive impact on population health. This research studies the influence of features, that is, the different characteristics used to represent chemical drug compounds, on the prediction of adverse reactions. A chemical fingerprint is defined as a distinctive characteristic or pattern that indicates the presence of a certain molecule, and a molecular descriptor is a mathematical representation of a molecule’s properties that describes the chemical information of the molecule. Based on a comprehensive study of the state of the art, the artificial intelligence algorithms Random Forest (RF), Gradient Boosted Trees (GBT), Feedforward Neural Network (FFN) and Convolutional Neural Network (CNN) were chosen. Prediction models were created with these algorithms using different approaches to features: fingerprints alone, molecular descriptors alone, or the combination of fingerprints and molecular descriptors, in order to obtain improved results. These algorithms were selected because they are powerful and can capture complex patterns in the given features.
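As an illustration of the two feature families described above, the following minimal sketch computes the four fingerprint types and the two molecular descriptors for a single molecule. It assumes the RDKit toolkit, a fingerprint length of 2048 bits, a Morgan radius of 2, and aspirin as the example compound; the abstract does not state which software or parameters the authors used, so all of these are assumptions made for illustration only.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import Descriptors, rdFingerprintGenerator

N_BITS = 2048  # assumed fingerprint length; not stated in the abstract


def bitvect_to_array(fp):
    """Convert an RDKit bit vector to a 0/1 NumPy array."""
    arr = np.zeros((fp.GetNumBits(),), dtype=np.uint8)
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr


def fingerprints(mol):
    """Return Pattern, Torsion, Atom Pair and Morgan fingerprints as bit arrays."""
    # Radius 2 is an assumed, commonly used Morgan setting.
    morgan = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=N_BITS)
    atom_pair = rdFingerprintGenerator.GetAtomPairGenerator(fpSize=N_BITS)
    torsion = rdFingerprintGenerator.GetTopologicalTorsionGenerator(fpSize=N_BITS)
    return {
        "pattern": bitvect_to_array(Chem.PatternFingerprint(mol, fpSize=N_BITS)),
        "torsion": bitvect_to_array(torsion.GetFingerprint(mol)),
        "atom_pair": bitvect_to_array(atom_pair.GetFingerprint(mol)),
        "morgan": bitvect_to_array(morgan.GetFingerprint(mol)),
    }


def descriptors(mol):
    """Return the two descriptors named in the abstract as a float vector."""
    return np.array(
        [Descriptors.MolWt(mol), Descriptors.NumValenceElectrons(mol)],
        dtype=float,
    )


# Aspirin, chosen only as a familiar example molecule.
mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
fps = fingerprints(mol)
desc = descriptors(mol)

# Combined representation: fingerprint bits followed by the two descriptors.
combined = np.concatenate([fps["morgan"], desc])
print(combined.shape, desc)
```

Concatenation is only one plausible way to combine fingerprints with descriptors; in practice the descriptors may also need scaling before being fed to a neural network.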
The experimental evaluation proceeded in three stages. First, four effective fingerprints were identified, namely Pattern, Torsion, Atom Pair and Morgan; a machine learning model was created for each fingerprint type using it as the input feature, and the results were observed. Second, two molecular descriptors were chosen, namely molecular weight and number of valence electrons; they were combined as the input feature, and machine learning models were created, trained and evaluated. Third, the molecular descriptors and the fingerprints were combined, and the machine learning models were again created, trained and evaluated. According to the experimental results, RF and GBT performed better than FFN and CNN, with significant differences in accuracy of as much as 20% in certain cases, such as when using molecular descriptors for Hepatobiliary disorders. Another conclusion is that the fingerprints had the greatest impact on the performance increase of the models, and that using fingerprints and molecular descriptors together as features gave the best results among all the performed tests.
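The three-stage protocol described above can be sketched as a loop over feature sets and models. This is a minimal sketch, not the authors' implementation: it assumes scikit-learn, uses randomly generated placeholder data with a hypothetical label rule in place of real drug compounds and adverse reaction labels, and covers only RF and GBT (the FFN and CNN models are omitted for brevity). The accuracies it prints say nothing about the study's results.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples, n_bits = 200, 64

# Synthetic placeholders, NOT the study's data.
fp_bits = rng.integers(0, 2, size=(n_samples, n_bits))   # stand-in fingerprint bits
desc = np.column_stack([
    rng.uniform(100, 600, n_samples),                     # stand-in molecular weight
    rng.integers(20, 200, n_samples),                     # stand-in valence electron count
])
# Hypothetical binary label depending on two bits and the weight column.
labels = ((fp_bits[:, 0] + fp_bits[:, 1] + (desc[:, 0] > 350)) >= 2).astype(int)

# The three feature settings compared in the abstract.
feature_sets = {
    "fingerprints": fp_bits,
    "descriptors": desc,
    "fingerprints+descriptors": np.hstack([fp_bits, desc]),
}
# Factories so that every run starts from a fresh, untrained model.
models = {
    "RF": lambda: RandomForestClassifier(n_estimators=100, random_state=0),
    "GBT": lambda: GradientBoostingClassifier(random_state=0),
}

results = {}
for feat_name, X in feature_sets.items():
    for model_name, make_model in models.items():
        # 5-fold cross-validated accuracy; the fold count is an assumption.
        scores = cross_val_score(make_model(), X, labels, cv=5, scoring="accuracy")
        results[(feat_name, model_name)] = scores.mean()

for key, acc in results.items():
    print(key, round(acc, 3))
```

Keeping the feature sets in a dictionary makes it straightforward to add one entry per fingerprint type, which mirrors the per-fingerprint models described in the first stage.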

Files

1-s2.0-S1877050924027625-main.pdf

Files (762.9 kB)

md5:7db6ee99377446576c5a9d027be3061e
762.9 kB

Additional details