Published October 9, 2025
| Version v1
Plot
Open
[Data augmentation in a TTL] - Figure 3 TMAPS // Comparison of USPTO and fictive reactions in terms of chemical space coverage.
Authors/Creators
Description
This record contains the two interactive TMAPs shown in Figure 3a and 3b of our work. Feel free to explore the different reactions and molecules.
- Fig 3a: DRFP TMAP comparing the fictive dataset (~1M reactions) with USPTO140kt. Labels indicate the dataset from which each reaction originates. Each template is represented by 2 randomly picked reactions from each dataset, for a total of 55k reactions.
- Fig 3b: MHFP6 TMAP of starting materials (SM) considering 10,000 SM randomly picked from USPTO14kt and 40,000 SM randomly picked from the 1M fictive reactions.
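The per-template sampling described for Fig 3a can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the function name `sample_per_template`, the `(template_id, reaction_smiles)` record layout, and the toy records are all hypothetical, and the fingerprinting and TMAP layout steps are not shown.

```python
import random
from collections import defaultdict


def sample_per_template(reactions, k=2, seed=0):
    """Pick up to k random reactions for each template.

    `reactions` is an iterable of (template_id, reaction_smiles) pairs.
    This record layout is an assumption made for the sketch.
    """
    rng = random.Random(seed)  # fixed seed so the pick is reproducible
    by_template = defaultdict(list)
    for template_id, rxn in reactions:
        by_template[template_id].append(rxn)

    sampled = {}
    for template_id, rxns in by_template.items():
        # A template with fewer than k reactions contributes all of them.
        sampled[template_id] = rng.sample(rxns, min(k, len(rxns)))
    return sampled


# Toy records (hypothetical, not taken from the dataset).
toy = [
    ("T1", "CCO>>CC=O"),
    ("T1", "CCCO>>CCC=O"),
    ("T1", "CCCCO>>CCCC=O"),
    ("T2", "CC(=O)O.CO>>CC(=O)OC"),
]
picked = sample_per_template(toy, k=2)
```

Running the same function once per dataset and concatenating the results would give two reactions per template per dataset, as in the description above.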
Title of the manuscript:
"Data augmentation in a Triple Transformer Loop retrosynthesis model"
Abstract:
Reactions in the US Patent Office (USPTO) are biased towards a few over-represented reaction types, which potentially limits its usefulness for computer-assisted synthesis planning (CASP). To obtain an equilibrated dataset, we applied retrosynthesis templates to USPTO molecules as products (P) to generate starting materials (SM). We then used transformer T2 from our recently reported triple transformer loop (TTL) retrosynthesis model to predict reagents (R) for the SM→P reaction. Finally, we validated the prediction by requesting a high confidence prediction (>95%) for the prediction of P from SM+R by TTL transformer T3. We generated up to 5,000 reactions per template, resulting in 27.5 million validated fictive reactions covering the chemical space of the original USPTO dataset. To exemplify the use of this dataset, we show that a single-step retrosynthesis transformer model trained with a template-equilibrated subset of 1,097,374 fictive reactions outperforms the corresponding model trained on USPTO reactions only.
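The validation and capping step in the abstract (keep a candidate only if the forward prediction confidence exceeds 95%, and keep at most 5,000 reactions per template) could look roughly like the sketch below. It assumes the confidence scores have already been computed elsewhere; no transformer is called here, and the function name `select_validated`, the tuple layout, and the toy values are hypothetical.

```python
from collections import defaultdict


def select_validated(candidates, threshold=0.95, cap=5000):
    """Keep candidates above the confidence threshold, up to `cap` per template.

    `candidates` is an iterable of (template_id, reaction, confidence) tuples,
    where `confidence` is assumed to be a precomputed score in [0, 1].
    """
    kept = defaultdict(list)
    for template_id, reaction, confidence in candidates:
        # Strict inequality mirrors the ">95%" wording of the abstract.
        if confidence > threshold and len(kept[template_id]) < cap:
            kept[template_id].append(reaction)
    return dict(kept)


# Toy candidates with made-up confidence values.
toy_candidates = [
    ("T1", "r1", 0.99),
    ("T1", "r2", 0.90),  # rejected: below threshold
    ("T1", "r3", 0.97),
    ("T2", "r4", 0.95),  # rejected: not strictly above threshold
]
validated = select_validated(toy_candidates)
capped = select_validated(toy_candidates, cap=1)
```

In this sketch the cap keeps the first reactions encountered; whether the original work selected reactions in order or at random is not stated in the abstract.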
Files
(27.8 MB)
| Name | Size |
|---|---|
| md5:1b2c0d72fc55b3592b3818a4ebd714b8 | 21.7 MB |
| md5:bc6438d1a1d1ff5648e3aa1d9dd47ee9 | 6.1 MB |
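The MD5 checksums listed above can be used to check a downloaded file for corruption. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and the demonstration runs on a temporary file rather than on the actual files of this record.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Demonstration on a throwaway file (not one of the record's files).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

demo_digest = md5_of_file(demo_path)
demo_ok = matches_checksum(demo_path, "md5:" + demo_digest)
demo_bad = matches_checksum(demo_path, "md5:" + "0" * 32)
os.remove(demo_path)
```

For a real download, the second argument would be the checksum string copied from the table, passed together with the local path of the downloaded file.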