Published June 3, 2026 | Version v0.1.3
Software | Open

ProteinTensor: AI-Native Biomolecular Tensor Storage for Structural Biology ML

Authors/Creators

Description

ProteinTensor is a Python library and file format (.ptt) that eliminates redundant preprocessing in structural biology machine learning pipelines. By converting mmCIF/PDB structures once into a Zarr-backed, LZ4-compressed, memory-mappable store, ProteinTensor provides zero-parse access to atomic coordinates, backbone geometry, covalent bond graphs, MSA tokens, pairwise distance features, and protein language model embeddings. Benchmarked on proteins from 76 to 3,525 residues, full feature assembly is 34x faster on average than traditional mmCIF-based pipelines.
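The convert-once, read-many idea in the description can be sketched with the Python standard library alone. This is not the ProteinTensor API or the .ptt format: it swaps in a raw float32 file and `mmap` for the Zarr-backed, LZ4-compressed store, and every name in it (`format_atom`, `parse_coords`, `convert_once`, `load_coords`, the `coords.f32` file) is hypothetical. It only illustrates why later reads need no text parsing.

```python
import mmap
import os
import tempfile
from array import array


def format_atom(serial, name, x, y, z):
    """Build one fixed-column PDB ATOM record (toy residue and chain fields)."""
    return f"ATOM  {serial:>5}  {name:<3} MET A   1    {x:8.3f}{y:8.3f}{z:8.3f}"


def parse_coords(pdb_text):
    """Slow path: parse x, y, z from PDB columns 31-54 of every ATOM line."""
    flat = array("f")  # single-precision floats, stored as x0, y0, z0, x1, ...
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            flat.extend(float(line[i:i + 8]) for i in (30, 38, 46))
    return flat


def convert_once(pdb_text, path):
    """Pay the parsing cost a single time and store raw float32 values."""
    with open(path, "wb") as fh:
        parse_coords(pdb_text).tofile(fh)


def load_coords(path):
    """Fast path: memory-map the file and view it as floats without copying."""
    with open(path, "rb") as fh:
        mapped = mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)
    return memoryview(mapped).cast("f")


# Hypothetical three-atom input, generated here so the column layout is exact.
atoms = [
    (1, "N", 27.340, 24.430, 2.614),
    (2, "CA", 26.266, 25.413, 2.842),
    (3, "C", 26.913, 26.639, 3.531),
]
pdb_text = "\n".join(format_atom(*atom) for atom in atoms)

path = os.path.join(tempfile.mkdtemp(), "coords.f32")
convert_once(pdb_text, path)

coords = load_coords(path)  # no text parsing happens on this read
ca_xyz = [round(v, 3) for v in coords[3:6]]
print(len(coords) // 3, "atoms; CA at", ca_xyz)
```

A real store would also need shape and dtype metadata, chunking, and compression, which is the role the description assigns to Zarr and LZ4; the sketch leaves those out to keep the parse-versus-map contrast visible.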

Notes

If you use ProteinTensor in your research, please cite it as shown below.

Files

mooreneural/HelixDB-v0.1.3.zip

Files (70.0 kB)

md5:5d86da98cf11b5189e3118c555488a18
Size: 70.0 kB

Additional details

Related works