Stereoisomers are not Machine Learning's Best Friends: Stereo2vec Models
Authors/Creators
Description
This study addresses the challenge of accurately identifying stereoisomers in cheminformatics, a problem that arose from our objective of applying machine learning to predict the association constant between a cyclodextrin and a guest molecule. Distinguishing stereoisomers is crucial for such machine learning applications. Current tools offer various molecular descriptors, including the Isomeric SMILES textual representation, which can distinguish stereoisomers. However, this representation is text-based and has no fixed size, so it must be converted before machine learning approaches can use it. Word embedding techniques can solve this problem. Mol2vec, a word embedding approach for molecules, offers such a conversion. Unfortunately, it cannot distinguish between stereoisomers because it does not capture the spatial configuration of molecular structures. This study proposes several approaches that use word embedding techniques to discriminate between molecules, either by using the stereochemical information of molecules or by treating Isomeric SMILES notation as text in the Natural Language Processing sense. Our aim is to generate a distinct vector for each unique molecule, so that stereoisomer information is correctly captured. The proposed approaches are then compared on our original machine learning task: predicting the association constant between a cyclodextrin and a guest molecule.
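To illustrate the idea of treating Isomeric SMILES as text, here is a minimal sketch of a SMILES tokenizer. It is not the method used in this study: the regular expression is a commonly used SMILES tokenization pattern, and the names `tokenize`, `strip_stereo`, `enantiomer_a`, and `enantiomer_b` are hypothetical. The example strings are the two enantiomers of alanine. The point is only that the chirality markers (`@`, `@@`) survive tokenization, so an embedding trained on such tokens sees the two enantiomers as different "sentences", whereas a representation without stereo markers cannot tell them apart.

```python
import re

# Assumed tokenization pattern (not taken from the study): bracket atoms
# such as [C@@H] are kept as single tokens so chirality tags are preserved.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|%\d{2}|\d|[=#$:/\\\-+().@~*])"
)


def tokenize(smiles):
    """Split a SMILES string into tokens usable as 'words' for embedding."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Guard against characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize SMILES: {smiles!r}")
    return tokens


def strip_stereo(smiles):
    """Crudely drop stereo markers (@, /, \\) to mimic a non-isomeric SMILES."""
    return re.sub(r"[@/\\]", "", smiles)


# The two enantiomers of alanine, written as Isomeric SMILES.
enantiomer_a = "C[C@@H](C(=O)O)N"
enantiomer_b = "C[C@H](C(=O)O)N"

# With stereo markers, the token sequences differ.
print(tokenize(enantiomer_a))
print(tokenize(enantiomer_b))

# Without stereo markers, both collapse to the same token sequence.
print(tokenize(strip_stereo(enantiomer_a)) == tokenize(strip_stereo(enantiomer_b)))
```

The token lists could then be fed to any word embedding model as sentences. Which model, vocabulary, and vector size to use is left open here.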
Files
LICENCE.md
(13.5 GB)
| MD5 checksum | Size |
|---|---|
| f9f615ae90190f8347c5ae6a392351b7 | 660.8 MB |
| b6de69f241cda5615c133bcfbc54d692 | 398.6 MB |
| d9f0ce963c83ed5acb74c6eada4bba57 | 80.0 MB |
| 515931a6c01adad626d6f1777706bf8c | 398.6 MB |
| e8e60d5d37466c972ff425b79fbde07c | 78.3 kB |
| 0c35559b2422359df4f76c3ae3ae1795 | 2.4 GB |
| 47dbadbf9711c6ac2b328f2db0554e90 | 78.0 kB |
| 05a6bb68ce2d2faa3fc87a56294fd2d1 | 26.6 kB |
| e96518aedd9b7c50566b02d8a62eb99f | 680.0 MB |
| e35843fb855af0c25e926f144f6db512 | 1.5 kB |
| 212f52cc42a2fa1b059ce102555ee37a | 10.8 MB |
| e70abb61c5f3ccaef3876c192492e24b | 2.4 GB |
| e159751ca7c730242fd8b23078f83fcf | 10.8 MB |
| 8f15a128905a82d58da28bee4744c878 | 13.3 MB |
| ad9d9a1fa614ee4031d0aab8c08c68df | 2.4 GB |
| bb6d28ef089b85228dd1ecd564558dad | 12.2 MB |
| 0620ecf6e7013a743ff5a715508be9bf | 7.8 MB |
| ea5c0c3877560bb09f6ae03869412fd9 | 1.6 GB |
| 9da2787d034ebcbccfad40ecfb022a7f | 11.7 MB |
| 46a9a2046d4ac076d53749d9cd586504 | 2.4 GB |
| 1b58401724409529005fcbfcf5c24e2e | 10.9 MB |
| 2baa7254018775a1718c9a9778de16b4 | 1.3 kB |