Published October 7, 2024 | Version 1
Dataset | Open

ProtNote: a multimodal method for protein-function annotation

  • 1. Microsoft (United States)
  • 2. University of Washington
  • 3. Microsoft Research

Description

Understanding protein sequence-function relationships is essential for advancing protein biology and engineering. However, fewer than 1% of known protein sequences have human-verified functions, and scientists continually update the set of possible functions. While deep learning methods have demonstrated promise for protein function prediction, current models are limited to predicting only those functions on which they were trained. Here, we introduce ProtNote, a multimodal deep learning model that leverages free-form text to enable both supervised and zero-shot protein function prediction. ProtNote not only maintains near state-of-the-art performance for annotations in its training set, but also generalizes to unseen and novel functions in zero-shot test settings. We envision that ProtNote will enhance protein function discovery by enabling scientists to use free-text inputs, without restriction to predefined labels, a necessary capability for navigating the dynamic landscape of protein biology.

Files

ablation_models.zip

Files (77.7 GB)

Checksum                                Size
md5:87abb3d3b72a59d1520be6aa20dd36eb    41.4 GB
md5:703141e8eb7a89c83190122f0adfde35    17.6 GB
md5:da3cbe3e9d0a8367109a872e90017d06    18.7 GB
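
Because the archives are tens of gigabytes each, it may be worth checking a download against the MD5 value listed for it before unpacking. The following is a minimal sketch using only the Python standard library. The helper names (`md5_of_file`, `matches_listed_checksum`) and the local path in the usage comment are illustrative assumptions, not part of the ProtNote repository, and this listing does not show which checksum belongs to which file name.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks.

    Chunked reading keeps memory use flat even for multi-gigabyte archives.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file's MD5 with a listed value such as 'md5:87ab...'.

    The optional 'md5:' prefix is dropped before the comparison.
    """
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Usage (hypothetical local path; pair each file with its own listed value):
# ok = matches_listed_checksum("downloads/ablation_models.zip", "md5:<value from the listing>")
# print("checksum OK" if ok else "checksum mismatch, download again")
```

A mismatch usually indicates an interrupted or corrupted transfer rather than a problem with the archive itself, so downloading the file again is the usual first step.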

Additional details

Software

Repository URL
https://github.com/microsoft/protnote
Programming language
Python