Published March 2, 2024 | Version v1.0
Workflow Open

Example Workflows and Datasets for molli: Molecular Library Toolkit

Description

About

Molli is a general-purpose package for generating and handling molecule libraries, written in Python 3.10. This repository provides supplementary data supporting the claims of the accompanying publication. It contains code examples for common workflows that showcase molli's functionality, as well as several example datasets.

Running these code examples requires the following:

  1. a working Python >=3.10 environment
  2. a molli >=1.0.0 installation (see the repository info below)
  3. common Python packages that can be installed from PyPI or conda-forge:
    1. pandas
    2. RDKit
    3. OpenBabel
  4. (optional) quantum electronic structure software to reproduce some of the workflows de novo:
    1. ORCA
    2. CREST, XTB

The molli package can be found here: https://github.com/SEDenmarkLab/molli. Documentation is available here: https://molli.readthedocs.io/en/latest/
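
The requirements listed above can be checked before running any workflow. The following is a minimal sketch, not part of the supplementary archive: the function name `check_environment` is hypothetical, and the import names (`rdkit`, `openbabel`) are assumptions, since the list gives package names rather than module names. It does not verify the molli version.

```python
import sys
from importlib.util import find_spec

# Package label -> import name. The import names are assumptions;
# adjust them if your installation exposes different module names.
REQUIRED = {
    "molli": "molli",
    "pandas": "pandas",
    "RDKit": "rdkit",
    "OpenBabel": "openbabel",
}


def check_environment(min_python=(3, 10), required=REQUIRED):
    """Return a list of problems found; an empty list means nothing is missing."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python >= {wanted} is required")
    for label, module in required.items():
        # find_spec returns None when a top-level module cannot be located.
        if find_spec(module) is None:
            problems.append(f"{label} (import name '{module}') was not found")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

An empty result only means that the modules can be located; the optional external programs (ORCA, CREST, XTB) are not checked here.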

Description of the contents:

00-libraries/
    Contains files in the .mlib and .clib formats,
    the binary formats developed for storing molecular and conformer libraries, respectively.
    These files are the results of the workflows shown below (or are required to run them).

01-geom-molnet-dataset/
    The workflow for reimporting the MoleculeNet subset of the GEOM dataset as a .clib file;
    see https://www.nature.com/articles/s41597-022-01288-4

02-geom-crude-dataset.py/
    The workflow for reimporting the crude subset of the GEOM dataset as a .clib file;
    see https://www.nature.com/articles/s41597-022-01288-4

03-combinatorial-library/
    The workflow for generating demo phosphine libraries. Starting from a simple ChemDraw(TM)
    .CDXML file, it generates two libraries of chiral and achiral phosphines.

04-collection-comparison/
    Scripts required to reproduce the benchmark comparison of storage efficiency and
    read times of the molli chemical collections (.zip files and .clib/.mlib).

05-gbca-benchmark/
    Scripts required to reproduce the timings for the different levels of optimization
    of grid-based conformer-averaged descriptors.

06-gbca-calculation-vis/
    Scripts required to reproduce the grid-based descriptor visualization using pyvista.
    
07-kras-inhibitor-rotation-barrier/
    The workflow for creating a combinatorial library of KRAS inhibitors
    in order to calculate their rotational barrier energies and geometries.

08-workflow-nmr-prediction/
    The workflow for creating a library of cladosporin diastereomers, followed by
    CREST/XTB conformer generation, DFT geometry optimization, and
    DFT NMR calculation.
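
After extracting the archive, the layout described above can be confirmed with a short script. This is a sketch under one assumption: that the nine directories sit directly under the extraction root. The names are copied as listed in this description, and the helper `missing_directories` is hypothetical.

```python
from pathlib import Path

# Directory names exactly as listed in the description of the contents.
EXPECTED_DIRS = [
    "00-libraries",
    "01-geom-molnet-dataset",
    "02-geom-crude-dataset.py",
    "03-combinatorial-library",
    "04-collection-comparison",
    "05-gbca-benchmark",
    "06-gbca-calculation-vis",
    "07-kras-inhibitor-rotation-barrier",
    "08-workflow-nmr-prediction",
]


def missing_directories(root, expected=EXPECTED_DIRS):
    """Return the expected directory names that are absent under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]
```

Calling `missing_directories("path/to/extracted/archive")` returns an empty list when every listed directory is present.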

Files

molli-supplementary.zip (30.1 GB)
md5:e899666f7f25b7fab33bccde1a8de918
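
Given the size of the archive, it can be worth verifying the download against the MD5 checksum listed above. The following is a minimal sketch using only the Python standard library. The function names and the local file path are illustrative assumptions. The file is read in chunks so that the 30.1 GB archive never has to fit in memory.

```python
import hashlib

# Checksum as published for molli-supplementary.zip.
EXPECTED_MD5 = "e899666f7f25b7fab33bccde1a8de918"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `archive_is_intact("molli-supplementary.zip")` returns `True` only if the local copy matches the published checksum.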

Additional details

Software

Repository URL: https://github.com/SEDenmarkLab/molli
Programming language: Python, Jupyter Notebook
Development status: WIP