Example Workflows and Datasets for molli: Molecular Library Toolkit
Description
About
Molli is a general-purpose package for generating and handling molecule libraries, written in Python 3.10. This repository provides supplementary data to support the claims of the publication. We provide code examples for common workflows to showcase molli's functionality, as well as several example datasets.
Running these code examples requires the following:
- working Python >=3.10 environment
- molli >=1.0.0 installation (see the repository info below)
- common Python packages that can be installed from PyPI or conda-forge:
  - pandas
  - RDKit
  - OpenBabel
- (optional) quantum electronic structure software to reproduce some of the workflows de novo:
  - ORCA
  - CREST, XTB
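The requirements above can be checked before running any workflow. The following is a minimal sketch, not part of the repository: the import names (`molli`, `pandas`, `rdkit`, `openbabel`) and the executable names (`orca`, `crest`, `xtb`) are assumptions about a typical installation, and the helper `check_environment` is hypothetical. It only tests for presence and does not verify the molli >=1.0.0 version requirement.

```python
import importlib.util
import shutil
import sys

# Assumed import names for the required Python packages.
PYTHON_MODULES = ["molli", "pandas", "rdkit", "openbabel"]
# Assumed executable names for the optional external programs.
OPTIONAL_EXECUTABLES = ["orca", "crest", "xtb"]


def check_environment(modules=PYTHON_MODULES, executables=OPTIONAL_EXECUTABLES):
    """Return a dict mapping each requirement to True (found) or False (missing)."""
    report = {"python>=3.10": sys.version_info >= (3, 10)}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    for exe in executables:
        # shutil.which returns None when the executable is not on PATH.
        report[f"{exe} (optional)"] = shutil.which(exe) is not None
    return report


if __name__ == "__main__":
    for requirement, found in check_environment().items():
        print(f"{'ok' if found else 'MISSING':<9}{requirement}")
```

A missing optional executable only affects the workflows that are rerun from scratch with ORCA or CREST/XTB.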
The molli package is available at https://github.com/SEDenmarkLab/molli. Documentation is available at https://molli.readthedocs.io/en/latest/
Description of the contents:
00-libraries/
Contains files in the .mlib and .clib formats,
the binary formats developed for storing molecular and conformer libraries, respectively.
These files are the results of the workflows shown below (or are required to run them).
01-geom-molnet-dataset/
The workflow for reimporting the molecule-net subset of the GEOM dataset as a .clib file;
see https://www.nature.com/articles/s41597-022-01288-4
02-geom-crude-dataset.py/
The workflow for reimporting the crude subset of the GEOM dataset as a .clib file;
see https://www.nature.com/articles/s41597-022-01288-4
03-combinatorial-library/
The workflow for generating demo phosphine libraries. Starting from a simple ChemDraw(TM)
.CDXML file, it generates two libraries of chiral and achiral phosphines.
04-collection-comparison/
Scripts required to reproduce the benchmark comparison of storage efficiency and
read times of the molli chemical collections (.zip files and .clib/.mlib).
05-gbca-benchmark/
Scripts required to reproduce the timings for different levels of optimization
of grid-based conformer-averaged descriptors.
06-gbca-calculation-vis/
Scripts required to reproduce the grid-based descriptor visualization using pyvista.
07-kras-inhibitor-rotation-barrier/
The workflow of creating a combinatorial library of KRAS inhibitors
with the purpose of calculating the rotational barrier energies and geometries.
08-workflow-nmr-prediction/
The workflow of creating a library of cladosporin diastereomers with the purpose
of CREST/XTB conformer generation, DFT geometry optimization and
DFT NMR calculation.
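Because the archive is large, it can be useful to inspect its layout before extracting anything. The sketch below is an illustration only: the helper `top_level_entries` is hypothetical, and it assumes the downloaded file is a standard zip archive named `molli-supplementary.zip` in the current directory. The archive may wrap the directories listed above in an extra parent folder, in which case only that folder is reported.

```python
import zipfile
from collections import Counter
from pathlib import Path


def top_level_entries(archive):
    """Count archive members under each top-level name without extracting them."""
    counts = Counter()
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            # Member names use "/" as separator regardless of platform.
            first = name.split("/", 1)[0]
            if first:
                counts[first] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    archive = Path("molli-supplementary.zip")  # assumed download location
    if archive.exists():
        for name, n_members in top_level_entries(archive).items():
            print(f"{n_members:>8}  {name}")
    else:
        print(f"{archive} not found")
```

Only the zip central directory is read here, so the listing is fast even for a multi-gigabyte file.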
Files
Name | Size | Checksum |
---|---|---|
molli-supplementary.zip | 30.1 GB | md5:e899666f7f25b7fab33bccde1a8de918 |
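The MD5 checksum listed for the archive can be used to confirm that the download completed without corruption. A minimal sketch using only the Python standard library follows; the helper name `md5_of_file` and the local path are assumptions for illustration. The file is read in chunks so that the 30.1 GB archive never has to fit in memory.

```python
import hashlib
from pathlib import Path

# Checksum listed for molli-supplementary.zip in the file table.
EXPECTED_MD5 = "e899666f7f25b7fab33bccde1a8de918"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = Path("molli-supplementary.zip")  # assumed download location
    if archive.exists():
        matches = md5_of_file(archive) == EXPECTED_MD5
        print("checksum ok" if matches else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```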
Additional details
Software
- Repository URL: https://github.com/SEDenmarkLab/molli
- Programming language: Python, Jupyter Notebook
- Development Status: WIP