Published June 21, 2026 | Version v0.2.0

ProteinTensor: AI-Native Biomolecular Tensor Storage for Structural Biology ML


Description

ProteinTensor is a Python library and file format (.ptt) that eliminates redundant preprocessing in structural biology machine learning pipelines. It converts mmCIF/PDB structures, or raw protein sequences, once into a Zarr-backed, LZ4-compressed, memory-mappable store. That store provides zero-parse access to atomic coordinates, backbone geometry, covalent bond graphs, MSA tokens, pairwise distance features, and protein language model embeddings. Sequence-only entries can be passed directly to AlphaFold- and Boltz-style structure predictors. Round-trip conversion is lossless, and in benchmarks structure loading is 2x to 95x faster than mmCIF parsing for proteins ranging from 74 to 3,525 residues.
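
The core "parse once, then memory-map" idea can be illustrated with a minimal standard-library sketch. This is not ProteinTensor's API or the .ptt layout: it assumes C-alpha coordinates have already been parsed from an mmCIF/PDB file, writes them to a hypothetical flat binary cache (standing in for the Zarr store, with no LZ4 compression), and then memory-maps that cache to derive a pairwise distance matrix without re-reading any text. The function names `write_ca_cache` and `pairwise_distances` and the file layout are illustrative assumptions.

```python
import math
import mmap
import os
import struct
import tempfile

# Hypothetical cache layout (NOT the .ptt format): a little-endian uint64
# residue count, followed by one (x, y, z) float32 triple per residue.
HEADER = struct.Struct("<Q")
COORD = struct.Struct("<3f")


def write_ca_cache(path, ca_coords):
    """Serialize already-parsed C-alpha coordinates once (the one-time conversion step)."""
    with open(path, "wb") as fh:
        fh.write(HEADER.pack(len(ca_coords)))
        for x, y, z in ca_coords:
            fh.write(COORD.pack(x, y, z))


def pairwise_distances(path):
    """Memory-map the cache and build an N x N Euclidean distance matrix.

    No text parsing happens here: coordinates are unpacked straight from
    the mapped bytes at fixed offsets.
    """
    with open(path, "rb") as fh, mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        (n,) = HEADER.unpack_from(mm, 0)
        coords = [COORD.unpack_from(mm, HEADER.size + i * COORD.size) for i in range(n)]
    return [[math.dist(a, b) for b in coords] for a in coords]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ca_cache.bin")
        # Toy coordinates chosen so the distances are exact (3-4-5 and 5-12-13 triangles).
        write_ca_cache(path, [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)])
        d = pairwise_distances(path)
    print(d[0][1], d[1][2], d[0][2])  # 5.0 12.0 13.0
```

The fixed-width record layout is what makes random access cheap: residue i always starts at byte `HEADER.size + i * COORD.size`, so a loader can read any slice without scanning the file. A real store would add chunking and compression, which is the role the description assigns to Zarr and LZ4.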

Notes

  • If you use ProteinTensor in your research, please cite it as below.

Files

mooreneural/HelixDB-v0.2.0.zip (83.8 kB, md5:3314b4860fdb174bcf642782cc4ae12d)
