Published February 17, 2026 | Version v1
Preprint Open

Protein Structural Motif Discovery at AlphaFold Scale

Authors/Creators

Description

Preprint: ET-miner discovers which combinations of protein features (Pfam domains, Gene Ontology terms, and structural properties) co-occur across the AlphaFold protein universe. Using GPU-resident frequent itemset mining, ET-miner exhaustively mines 76.9 million multi-feature proteins in 7.3 minutes on a single NVIDIA H100 and finds 26.8 million co-occurrence patterns, the largest containing K=22 features. That deepest pattern, shared by exactly 8 proteins, describes a neuronal antiviral RNA helicase sentinel. The work demonstrates that exact Apriori computation at a scale approaching 100 million transactions is practical on current-generation datacenter GPUs.
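For readers unfamiliar with frequent itemset mining, the following is a minimal CPU sketch of the level-wise Apriori idea named in the description. It is not ET-miner's GPU implementation. Each protein is treated as a transaction (a set of features), and a pattern is kept only if at least min_support proteins contain all of its features. The function name apriori, the min_support parameter, and the feature labels in the toy data are hypothetical placeholders chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(features): support_count} for every frequent itemset.

    Illustrative sketch only: plain Python, exact counts, no GPU residency.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many proteins carry each single feature.
    counts = Counter(item for t in transactions for item in t)
    frequent = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets,
        # keeping a candidate only if all of its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in frequent for s in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Support counting: one pass over all transactions per level.
        counts = Counter()
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1

    return result


# Toy data: four hypothetical proteins with placeholder feature labels.
proteins = [
    {"domain_A", "domain_B", "go_term_X"},
    {"domain_A", "domain_B", "go_term_X", "go_term_Y"},
    {"domain_A", "domain_B"},
    {"domain_C", "go_term_Z"},
]

patterns = apriori(proteins, min_support=2)
deepest = max(patterns, key=len)
print(sorted(deepest), patterns[deepest])
```

On the toy data, the deepest pattern combines three features and is shared by two proteins, which mirrors in miniature how a K=22 pattern can be shared by a small number of proteins. The subset check during candidate generation is what keeps the search exact while avoiding enumeration of every possible feature combination.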

Files

et_miner_proteome.pdf (386.2 kB)
md5:56f9d67e69acf4a9a2c8d720f08d0462

Additional details

Software

Programming language
Python, CUDA, Rust
Development Status
Active