Protein Structural Motif Discovery at AlphaFold Scale
Authors/Creators
Description
Preprint: ET-miner discovers which combinations of protein features (Pfam domains, Gene Ontology terms, structural properties) co-occur across the AlphaFold protein universe. Using GPU-resident frequent itemset mining, ET-miner exhaustively mines 76.9 million multi-feature proteins in 7.3 minutes on a single NVIDIA H100, revealing 26.8 million co-occurrence patterns containing up to K=22 features. The deepest pattern describes a neuronal antiviral RNA helicase sentinel shared by exactly 8 proteins. This work demonstrates that exact Apriori computation at 100-million-transaction scale is practical on current-generation datacenter GPUs.
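To make the mining task concrete, the following is a minimal CPU sketch of textbook Apriori frequent itemset mining in Python, where each protein is a transaction and each feature is an item. It is not ET-miner's GPU implementation, which this record does not describe in detail. The feature labels, the toy proteins, and the `min_support` value are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset found in at least
    `min_support` transactions, using level-wise Apriori search."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual features.
    counts = Counter(item for t in transactions for item in t)
    frequent = {frozenset([i]): n for i, n in counts.items() if n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: exact support of each surviving candidate.
        counts = Counter()
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical toy data: five proteins, each a set of invented feature labels.
proteins = [
    {"domain_A", "domain_B", "go_X"},
    {"domain_A", "domain_B", "go_X", "struct_S"},
    {"domain_A", "go_X"},
    {"domain_B", "struct_S"},
    {"domain_A", "domain_B", "go_X"},
]

patterns = apriori(proteins, min_support=3)
deepest = max(patterns, key=len)
print(len(patterns), "patterns; deepest:", sorted(deepest), "support", patterns[deepest])
```

In this toy run the deepest pattern has K=3 features and is shared by three proteins. The prune step is what keeps exact enumeration tractable: a candidate is counted only if all of its smaller subsets were already frequent.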
Files
| Name | Size | MD5 |
|---|---|---|
| et_miner_proteome.pdf | 386.2 kB | 56f9d67e69acf4a9a2c8d720f08d0462 |