RPIS_FragmentationLibraries: Molecular Fragments Useful for Design of Molecular Glues for Protein-RNA complexes
This Database lives at:
- [Zenodo] (Large Data Files)
- [github] (Executables and README)
- [CSBJ] (Original Publication)
Overview
This repository consists of two primary components, an ECFP-derived and a BRICS-derived fragment library. Both are chemical compound libraries in SMILES format, a widely used representation in computational molecular sciences.
The fragment libraries presented here are part of a peer-reviewed scientific study. These fragments were derived through a comprehensive in silico workflow designed to identify promising stabilizer candidates for protein–RNA interactions (RPIs).
The workflow integrates several computational methods, including:
- Binding pocket detection and evaluation
- Molecular docking
- Molecular dynamics simulations
- Binding free energy estimations
This pipeline yielded a set of stabilizer candidates, which were then used to generate fragment libraries through two distinct approaches:
1. Extended Connectivity Fingerprints (ECFP) – to extract the most representative chemical features.
2. Breaking of Retrosynthetically Interesting Chemical Substructures (BRICS) – to decompose molecules into synthetically meaningful fragments.
Both approaches were implemented using Python’s rdkit.Chem module and can be used for a large spectrum of different applications.
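As a rough illustration of the two approaches, the sketch below decomposes one molecule with `BRICS.BRICSDecompose` and lists the atom environments behind its Morgan (ECFP-like) fingerprint bits. The example molecule, the radius of 2, the 2048-bit size, and the helper name `morgan_environments` are assumptions made for illustration; the exact settings of the published workflow are not stated here.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, BRICS

# Arbitrary example molecule (an assumption, not one of the study's ligands).
mol = Chem.MolFromSmiles("CCCOCCC(=O)c1ccccc1")

# BRICS: cut retrosynthetically relevant bonds; dummy atoms ([n*]) mark cut sites.
brics_fragments = sorted(BRICS.BRICSDecompose(mol))


def morgan_environments(mol, radius=2, n_bits=2048):
    """Return SMILES of the atom environments that set Morgan fingerprint bits."""
    bit_info = {}
    AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits, bitInfo=bit_info)
    environments = set()
    for hits in bit_info.values():
        center, env_radius = hits[0]  # first (atom index, radius) pair for this bit
        if env_radius == 0:
            continue  # skip single-atom environments
        bond_ids = list(Chem.FindAtomEnvironmentOfRadiusN(mol, env_radius, center))
        atom_ids = {center}
        for bond_id in bond_ids:
            bond = mol.GetBondWithIdx(bond_id)
            atom_ids.update((bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()))
        environments.add(
            Chem.MolFragmentToSmiles(mol, atomsToUse=sorted(atom_ids), bondsToUse=bond_ids)
        )
    return environments


ecfp_fragments = sorted(morgan_environments(mol))
print("BRICS fragments:", brics_fragments)
print("ECFP environments:", ecfp_fragments)
```

The BRICS output keeps the numbered attachment points needed for later recombination, whereas the ECFP environments are plain substructures suited to pattern matching.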
---
Highlights
- 1,000 most abundant ECFP-derived fragments for compound database filtering
- Executable scripts demonstrating the database filtering workflow
- T38DrugDB filtered database, organized into 9 sub-databases, containing only molecules with at least 2 to 10 fragment matches (provided as .tar.xz archives for portability)
- All 213 BRICS fragments decomposed from the 96 high-ranking stabilizer ligands identified in our study
- Executables for de novo compound generation from the 213 BRICS fragments, with adjustable maxDepth parameters
- Comprehensive 4,000,000-compound database, generated from all possible combinations of the 213 BRICS fragments with maxDepth = 3
Intended Usage
The fragment libraries in this repository are divided into two subsets: ECFP-derived fragments and BRICS-derived fragments. Although both provide databases of SMILES strings, their intended applications differ.
- ECFP fragments can be used to filter existing compound databases for molecular patterns enriched in RPI stabilizers identified through our in silico workflow. These fragments are particularly useful for identifying chemical motifs associated with stabilizing protein–RNA interactions.
- BRICS fragments, on the other hand, also encode combinatorial information. In this method, chemical bonds of RPI stabilizers were broken in a retrosynthetically meaningful way using the rdkit.Chem.BRICS module. The resulting fragments can be recombined following the same retrosynthetic rules, enabling the construction of new compounds enriched with RPI-stabilizing features. These fragments can also be used for database filtering, but since they lack fragment importance scores, filtering must be done naively using all fragments.
While these two applications, database filtering (ECFP) and de novo compound generation (BRICS), represent the primary intended uses, they are not the only possibilities. Generative machine learning techniques, for instance, could utilize these fragment libraries to design new potential binders, particularly leveraging the ECFP dataset.
In general, the scope of applications for these fragment libraries is broad and limited only by the creativity of the user. The included executable scripts demonstrate the core workflows for database filtering (ECFP) and de novo compound assembly (BRICS).
ECFP – Database Filtering
Within the ECFP/ directory of the GitHub repository, you will find the executable script RefilterDatabaseWithECFP.py and the example dataset sampleDB.csv. The fragment data required for execution is provided in the file MMGBSA_ChemicalAnalysisFragments_cutoff10_ranked_datatable_3orMore.csv. This file must be present to run RefilterDatabaseWithECFP.py without errors. After installing all required dependencies (listed at the beginning of RefilterDatabaseWithECFP.py), the script can be executed to produce nine .csv files named in the format ECFP_fragmentFilteredLibrary_#N#FragsOrMore.csv. These files serve as example outputs.
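The thresholded filtering described above can be sketched as follows. This is not the code of RefilterDatabaseWithECFP.py: the fragment patterns, the compound list, and the function names `count_fragment_matches` and `filter_by_fragment_count` are hypothetical, and treating each fragment as a SMARTS pattern is an assumption.

```python
from rdkit import Chem

# Hypothetical fragment patterns and compounds, for illustration only.
fragment_smarts = ["c1ccccc1", "C(=O)N", "[OX2H]", "C(=O)O"]
compounds = ["CC(=O)Nc1ccc(O)cc1", "CCO", "OC(=O)c1ccccc1O", "CCCC"]

patterns = [Chem.MolFromSmarts(smarts) for smarts in fragment_smarts]


def count_fragment_matches(smiles, patterns):
    """Count how many distinct fragment patterns occur in a molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # unparsable SMILES count as zero matches
        return 0
    return sum(mol.HasSubstructMatch(pattern) for pattern in patterns)


def filter_by_fragment_count(smiles_list, patterns, min_matches):
    """Keep only molecules containing at least `min_matches` fragments."""
    return [
        smiles
        for smiles in smiles_list
        if count_fragment_matches(smiles, patterns) >= min_matches
    ]


# One filtered list per threshold, analogous to one output file per threshold.
for threshold in (1, 2, 3):
    print(threshold, filter_by_fragment_count(compounds, patterns, threshold))
```

Raising the threshold can only shrink the result, which is why the filtered sub-databases become smaller as the required number of fragments grows.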
The results of the filtering process are stored in the Zenodo database. That dataset contains the results of applying the filtering procedure to the T38DrugDB, which includes approximately 34 million drug-like compounds published elsewhere. The filtered sub-databases are provided as compressed `.tar.xz` archives for portability. These databases can be used for targeted virtual screening of RPI-stabilizing drug candidates. This example workflow can be executed as follows:
```bash
# clone this repository (creates the RPIS_FragmentationLibraries/ directory)
git clone git@github.com:Foly93/RPIS_FragmentationLibraries.git

# switch to the ECFP directory
cd RPIS_FragmentationLibraries/ECFP

# install the required Python packages with your favourite package manager, e.g. micromamba
micromamba activate your_env_name_goes_here

# execute the Python program
python RefilterDatabaseWithECFP.py

# check that the expected output was generated (assuming Ubuntu, macOS, or PowerShell)
ls -rtal ECFP_fragmentFilteredLibrary_*FragsOrMore.csv
```
BRICS – De Novo Compound Assembly
The BRICS/ directory contains several executables that serve different purposes, as well as the BRICS fragment library RPIS_BRICS_Fragment_DB.smi. This .smi file contains 213 BRICS fragments, which can be combined to generate novel molecules using the rdkit.Chem.BRICS module. This functionality is demonstrated across three executables:
- BRICS_fragment_database_interactive.ipynb
- build_one_example_Mol_from_BRICS_fragments.py
- generate_N_Mols_from_BRICS_fragments.py
The Jupyter notebook BRICS_fragment_database_interactive.ipynb requires a Python environment with jupyter notebook installed. It provides a visual and interactive overview of the fragment set, demonstrating how to:
- Display the BRICS fragments
- Assemble random molecules from the fragment library
- Export the resulting molecules as SMILES strings
The assembly process uses the RDKit function BRICS.BRICSBuild. Its output depends strongly on the chosen parameters, particularly the maxDepth option, which limits how many fragments can be combined into a single molecule. With 213 fragments available, the total combinatorial space is on the order of 4 × 10¹¹ possible assemblies, making exhaustive enumeration computationally infeasible.
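A minimal sketch of such an assembly with `BRICS.BRICSBuild` is shown below. The four fragments are a toy set chosen for illustration (they are not taken from RPIS_BRICS_Fragment_DB.smi), and the cap of five molecules is arbitrary. Because the builder is a generator, taking only the first few results avoids enumerating the full space.

```python
from itertools import islice

from rdkit import Chem
from rdkit.Chem import BRICS

# Toy fragment set (an assumption); [n*] dummy atoms encode the BRICS link types.
toy_fragments = ["[16*]c1ccccc1", "[3*]O[3*]", "[4*]CCC", "[4*]CCC([6*])=O"]
fragment_mols = [Chem.MolFromSmiles(smiles) for smiles in toy_fragments]

# scrambleReagents=False keeps the enumeration order reproducible.
builder = BRICS.BRICSBuild(fragment_mols, maxDepth=3, scrambleReagents=False)

assembled = []
for product in islice(builder, 5):  # stop after at most five molecules
    product.UpdatePropertyCache(strict=False)  # refresh valences before export
    assembled.append(Chem.MolToSmiles(product))

print(assembled)
```

With the default `onlyCompleteMols=True`, only products without open attachment points are returned, so the exported SMILES contain no dummy atoms.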
The script build_one_example_Mol_from_BRICS_fragments.py replicates the notebook’s core functionality in a standalone Python executable. It generates a single example molecule, saving the output to a file identified by its timestamp.
The final executable, generate_N_Mols_from_BRICS_fragments.py, creates a specified number of BRICS-assembled molecules and saves them into a timestamped .smi file. While it can be run without command-line arguments, optional parameters are available and can be displayed using the -h flag. The generated .smi files can be used directly for virtual screening or binding affinity prediction in drug discovery workflows. This showcase can be executed as follows:

```bash
# clone this repository (not necessary if already done)
git clone git@github.com:Foly93/RPIS_FragmentationLibraries.git

# switch to the BRICS directory
cd RPIS_FragmentationLibraries/BRICS

# install the required packages with your favourite package manager, e.g. micromamba
micromamba activate your_env_name_goes_here

# open the Jupyter notebook in your browser and explore the functionality
jupyter notebook BRICS_fragment_database_interactive.ipynb

# execute the one-example Python program
python build_one_example_Mol_from_BRICS_fragments.py

# execute the batch creation Python program
python generate_N_Mols_from_BRICS_fragments.py

# display the options of the batch creation Python program
python generate_N_Mols_from_BRICS_fragments.py -h

# run the Python program with custom flags
python generate_N_Mols_from_BRICS_fragments.py \
    --maxDepth 3 \
    --numMold 10 \
    --scrambleReagents True \
    --outputDirectory ../trash
```
Finally, the Zenodo database also contains BRICS_DB_BuiltMaxDepth_3.txt, a database of all possible combinations for maxDepth set to 3. This file contains 3,878,955 compound SMILES strings, fewer than the theoretically possible 213 × 213 × 213 = 9,663,597. The difference results from incompatibilities between some BRICS fragments and from duplicate entries.
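The arithmetic behind these numbers can be checked in a few lines. The sketch treats an assembly as an ordered choice of k fragments with repetition, which is a simplifying upper bound (an assumption, since BRICS linking rules exclude many combinations); the helper name `upper_bound` is hypothetical.

```python
n_fragments = 213
n_enumerated = 3_878_955  # entries reported for BRICS_DB_BuiltMaxDepth_3.txt


def upper_bound(n, k):
    """Ordered selections of k fragments out of n, with repetition allowed."""
    return n ** k


bound = upper_bound(n_fragments, 3)
fraction = n_enumerated / bound

print(f"upper bound for 3 fragments: {bound:,}")
print(f"fraction present in the database: {fraction:.1%}")
```

Roughly two fifths of the naive upper bound survive as valid, unique molecules.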
File Description
```
RPIS_FragmentationLibraries/
├── README.md                                               # THIS file
├── ECFP_fragmentFilteredLibrary_3FragsOrMore.csv.tar.xz    # T38DrugDB compounds that contain 3 or more ECFP fragments
├── ECFP_fragmentFilteredLibrary_4FragsOrMore.csv.tar.xz    # T38DrugDB compounds that contain 4 or more ECFP fragments
├── ECFP_fragmentFilteredLibrary_5FragsOrMore.csv.tar.xz    # T38DrugDB compounds that contain 5 or more ECFP fragments
├── ECFP_fragmentFilteredLibrary_6FragsOrMore.csv.tar.xz    # T38DrugDB compounds that contain 6 or more ECFP fragments
├── ECFP_fragmentFilteredLibrary_7FragsOrMore.csv.tar.xz    # T38DrugDB compounds that contain 7 or more ECFP fragments
├── ECFP_fragmentFilteredLibrary_8FragsOrMore.csv.tar.xz    # T38DrugDB compounds that contain 8 or more ECFP fragments
├── ECFP_fragmentFilteredLibrary_9FragsOrMore.csv.tar.xz    # T38DrugDB compounds that contain 9 or more ECFP fragments
├── ECFP_fragmentFilteredLibrary_10FragsOrMore.csv.tar.xz   # T38DrugDB compounds that contain 10 or more ECFP fragments
└── BRICS_DB_BuiltMaxDepth_3.txt                            # Database containing SMILES of all available BRICS assemblies with maxDepth set to 3
```
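The `.tar.xz` archives can be read from Python without unpacking them to disk. The sketch below uses only the standard library. So that it runs anywhere, it first builds a tiny stand-in archive in a temporary directory: the file name mirrors the listing above, but the CSV columns `smiles` and `n_fragments` and the two rows are invented for illustration, and the helper name `iter_csv_rows` is hypothetical.

```python
import csv
import io
import tarfile
import tempfile
from pathlib import Path


def iter_csv_rows(archive_path):
    """Yield rows (as dicts) from every .csv member of a .tar.xz archive."""
    with tarfile.open(archive_path, mode="r:xz") as archive:
        for member in archive:
            if not (member.isfile() and member.name.endswith(".csv")):
                continue
            raw = archive.extractfile(member)
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            yield from csv.DictReader(text)


# Build a small stand-in archive (invented content, see the note above).
workdir = Path(tempfile.mkdtemp())
csv_path = workdir / "ECFP_fragmentFilteredLibrary_10FragsOrMore.csv"
csv_path.write_text("smiles,n_fragments\nCCO,10\nc1ccccc1O,12\n", encoding="utf-8")
archive_path = workdir / (csv_path.name + ".tar.xz")
with tarfile.open(archive_path, mode="w:xz") as archive:
    archive.add(csv_path, arcname=csv_path.name)

# Stream the rows back out of the compressed archive.
rows = list(iter_csv_rows(archive_path))
print(rows)
```

Streaming rows through a generator keeps memory use low, which matters for the larger archives; the actual column layout should be checked against the downloaded files.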
Citation
If you use these Fragment libraries, please cite:
```
Luis Vollmers, Shu-Yu Chen, Martin Zacharias. In Silico Analysis of Potential Stabilizer Binding Sites at Protein–RNA Interfaces. Comput Struct Biotechnol J. 2026;35:0016. DOI: 10.34133/csbj.0016
```
License
This work is licensed under a Creative Commons Attribution 4.0 International License. See creativecommons.org/licenses/by/4.0/ for further information.
Contact
For questions, issues, or contributions:
- luis.vollmers@tum.de
- zacharias@tum.de
- Publication Link: https://doi.org/10.34133/csbj.0016
Files
Total size: 978.9 MB
| MD5 checksum | Size |
|---|---|
| md5:4d53232e6e0357f7cfd4b27f00b1da0a | 355.6 MB |
| md5:4eb759045c9e32d3b1104b93c4d288ad | 7.7 kB |
| md5:a3044d73766019cf307875c3912fe1ec | 238.5 MB |
| md5:b89a6ff15e7421e1e9687d52dca578ee | 190.6 MB |
| md5:9b58665803941d34f5c1a6bbbd7dc235 | 120.0 MB |
| md5:ad3593e17f72d411860dc4d1605ec879 | 53.9 MB |
| md5:7cba351a9aa0a1405267e7236a542d47 | 16.5 MB |
| md5:83c4b735fa0e7b1bca02cbc54c52add9 | 3.3 MB |
| md5:abf6cc00d5c72a0fed8ca395f1894af9 | 480.4 kB |
| md5:621ebce1fad3e17bfbbff111fdc3aae3 | 68.4 kB |