Published May 23, 2026 | Version v1
Working paper Open

MolCore: A Rust-Accelerated AI-Native Cheminformatics Substrate for Scalable Molecular Intelligence

Authors/Creators

Description

Modern cheminformatics systems are increasingly constrained by the architectural mismatch between legacy molecular toolchains and the computational demands of contemporary artificial intelligence. This paper introduces MolCore, a Rust-accelerated, AI-native cheminformatics substrate designed to unify high-performance molecular computation, graph representation learning, and scalable interoperability with modern deep learning frameworks. MolCore integrates immutable graph-based molecular structures, parallelized Extended-Connectivity Fingerprint (ECFP) generation, host-side zero-copy tensor interoperability, and deterministic preprocessing into a cohesive execution engine.

By replacing Python-bound graph conversion loops with native Rust multithreading and memory-contiguous tensor arrays, MolCore reduces data-loading bottlenecks in molecular machine learning workflows. We evaluate MolCore against standard cheminformatics baselines on one million molecules sampled from PCQM4Mv2, using ablation studies that isolate the effects of Rust execution, multiprocessing, memory-copy elimination, and PyO3-backed tensor transfer. In our benchmark configuration, MolCore achieves 14.8× higher throughput than optimized Python multiprocessing for parallelized ECFP4 generation, with a substantially lower peak memory footprint, and a 68× speedup over serial PyTorch Geometric graph extraction.
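
As a rough illustration of the memory-contiguous layout and zero-copy access mentioned above, the sketch below packs a batch of toy molecular graphs into flat typed buffers with per-molecule offsets, then exposes one molecule's atoms as a view that shares memory with the batch buffer. It uses only the Python standard library. The names `pack_batch` and `atom_view` and the toy molecules are hypothetical and are not taken from MolCore's API; the actual engine described in the paper performs this work in Rust.

```python
from array import array

# Toy molecules as (atomic_numbers, bonds); heavy atoms only, illustrative values.
ETHANOL = ([6, 6, 8], [(0, 1), (1, 2)])
WATER = ([8], [])
CYCLOPROPANE = ([6, 6, 6], [(0, 1), (1, 2), (2, 0)])


def pack_batch(molecules):
    """Flatten a list of molecules into contiguous typed buffers.

    Hypothetical helper: returns (atoms, src, dst, node_offsets, edge_offsets).
    Molecule k owns atoms[node_offsets[k]:node_offsets[k + 1]] and the
    directed edges at positions edge_offsets[k]:edge_offsets[k + 1].
    """
    atoms = array("i")               # atomic number per node, all molecules concatenated
    src = array("i")                 # directed edge sources (batch-global node indices)
    dst = array("i")                 # directed edge targets (batch-global node indices)
    node_offsets = array("i", [0])
    edge_offsets = array("i", [0])
    for atomic_numbers, bonds in molecules:
        base = len(atoms)            # shift local atom indices into the batch index space
        atoms.extend(atomic_numbers)
        for a, b in bonds:
            # Store each undirected bond as two directed edges.
            src.extend((base + a, base + b))
            dst.extend((base + b, base + a))
        node_offsets.append(len(atoms))
        edge_offsets.append(len(src))
    return atoms, src, dst, node_offsets, edge_offsets


def atom_view(atoms, node_offsets, k):
    """Return a view of molecule k's atoms that shares memory with `atoms`."""
    return memoryview(atoms)[node_offsets[k]:node_offsets[k + 1]]


if __name__ == "__main__":
    atoms, src, dst, node_offsets, edge_offsets = pack_batch(
        [ETHANOL, WATER, CYCLOPROPANE]
    )
    view = atom_view(atoms, node_offsets, 2)
    print(view.tolist(), view.obj is atoms)
```

The offset arrays play the role that per-molecule Python objects would otherwise play, so a downstream consumer can read any molecule's slice without allocating or copying. A tensor library that accepts the buffer protocol could wrap such buffers directly, although whether a given framework does so without copying depends on dtype and layout.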

 

Files

molcore_chem.pdf (235.2 kB)
md5:ce0e76365a223e83021744e6d93caf5a

Additional details

Software

Repository URL: https://github.com/Anteneh-T-Tessema/molcore
Programming language: Python
Development Status: Active