Fast remote homology detection and structural alignment using deep learning
Description
Exploiting sequence-structure-function relationships in molecular biology and computational modeling relies on detecting proteins with high sequence similarity. However, the most commonly used sequence alignment-based methods, such as BLAST, frequently fail on proteins with low sequence similarity to previously annotated proteins. We developed two deep learning methods to address this gap: TM-Vec and DeepBLAST. TM-Vec allows searching for structure-structure similarities in large sequence databases. It is trained to predict TM-scores, a metric of structural similarity, directly from sequence pairs, without any intermediate computation or solution of structures. For remote homologs (sequence similarity <10%) that are highly structurally similar (TM-score >0.6), the predicted TM-scores are within 0.026 of the values computed by TM-align. Once structurally similar proteins are identified, DeepBLAST can structurally align them using only sequence information by identifying structurally homologous regions between the proteins. DeepBLAST is an end-to-end differentiable alignment algorithm. It outperforms traditional sequence alignment methods and performs similarly to structure-based alignment methods. We show the merits of TM-Vec and DeepBLAST on the CATH, SwissProt, Malidup, Malisam, and BAGEL datasets, showcasing their ability to identify remotely homologous proteins quickly and more accurately than state-of-the-art sequence alignment and structure prediction methods.
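For readers unfamiliar with the TM-score that TM-Vec is trained to predict, the sketch below evaluates the standard TM-score formula for a single fixed superposition. It is an illustration only, not the TM-Vec model or the TM-align program: the function names are hypothetical, the input is assumed to be a list of distances (in angstroms) between already aligned residue pairs, and the 0.5 floor on the distance scale for very short proteins is an assumption of this sketch. The full TM-score additionally takes the maximum over superpositions, which is omitted here.

```python
def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (angstroms) used to normalize the TM-score."""
    if target_length <= 15:
        # The cube-root expression is not meaningful here; this floor is an assumption.
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(aligned_distances, target_length: int) -> float:
    """TM-score for one fixed superposition.

    aligned_distances: distances (angstroms) between aligned residue pairs.
    target_length: length of the protein used for normalization.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in aligned_distances)
    return total / target_length


if __name__ == "__main__":
    # Hypothetical example: 80 of 100 residues aligned, each pair 2 angstroms apart.
    print(tm_score([2.0] * 80, target_length=100))
```

Because every term lies between 0 and 1 and the sum is divided by the target length, the score stays in the range 0 to 1, with unaligned residues contributing nothing.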
Files
(3.2 GB)
Name | Size |
---|---|
md5:7c89ccc668a344b5edf888fc1ccc4152 | 3.2 GB |
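The md5 checksum listed above can be used to confirm that the 3.2 GB download arrived intact. The following is a minimal sketch using Python's standard `hashlib` module. The function names are hypothetical, and the file path is a placeholder because the file name is not given in this listing.

```python
import hashlib

# Checksum copied from the file listing above.
EXPECTED_MD5 = "7c89ccc668a344b5edf888fc1ccc4152"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's md5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("path/to/downloaded_file")` (placeholder path) returns `True` only when the local copy matches the listed checksum. Reading in 1 MiB chunks keeps memory use flat regardless of file size.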