Published November 14, 2023 | Version v2
Software | Open Access

ConservFold: Conservation to 3D structure generator

Description


This Colab notebook performs automatic conservation analysis, in the absence of updated WebLogo and ConSurf servers. Select "Runtime" -> "Run all", then download weblogo.png and the .txt files to identify the most conserved residues. The notebook also writes the conservation scores into the B-factor column of an AlphaFold-generated model, so that conservation can be viewed in 3D.
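The B-factor mapping described above can be sketched in Python as follows. This is a minimal illustration, not ConservFold's own code. The function name `write_bfactors` is hypothetical. The sketch assumes one score per residue, with residues numbered from 1 in the same order as the query sequence. It relies only on the fixed-column PDB format, in which the residue number occupies columns 23-26 and the B-factor occupies columns 61-66.

```python
def write_bfactors(pdb_text, scores):
    """Return pdb_text with each ATOM record's B-factor set to its residue's score.

    Hypothetical helper: assumes scores[i] belongs to residue number i + 1.
    """
    out_lines = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            line = line.ljust(66)                  # pad short records so the B-factor field exists
            res_num = int(line[22:26])             # residue sequence number, columns 23-26
            score = scores[res_num - 1]
            # B-factor field: columns 61-66, right-justified, two decimals
            line = line[:60] + f"{score:6.2f}" + line[66:]
        out_lines.append(line)                     # TER, END, HETATM, etc. pass through unchanged
    return "\n".join(out_lines) + "\n"
```

AlphaFold models store pLDDT in the B-factor column, so this replaces those values. A viewer can then colour the model by conservation, for example with PyMOL's `spectrum b` command.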

It uses WebLogo (https://github.com/WebLogo/weblogo) for the sequence logos, and the ColabFold MMseqs2 servers to generate the a3m alignment files.
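The step from an a3m alignment to per-position conservation can be sketched as below. This is an illustrative stand-in rather than the notebook's actual calculation, and the function names are hypothetical. The sketch makes several assumptions:

- The first record is the query.
- Lowercase letters and '.' are insertions relative to the query (the usual a3m convention), so they are dropped.
- Lines starting with '#' are metadata and are skipped.

Each query column is scored by its information content: log2(20) minus the Shannon entropy of the amino acids in that column, ignoring gaps. WebLogo applies its own corrections, so these numbers will not necessarily match weblogo.txt.

```python
import math
from collections import Counter

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def read_a3m(text):
    """Parse a3m text into query-aligned rows (insertions removed)."""
    rows, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' metadata lines
        if line.startswith(">"):
            if current is not None:
                rows.append("".join(current))
            current = []
        elif current is not None:
            current.append(line)  # sequences may span several lines
    if current is not None:
        rows.append("".join(current))
    # Lowercase letters and '.' are insertions relative to the query; drop them.
    return ["".join(c for c in r if not (c.islower() or c == ".")) for r in rows]


def conservation_scores(rows):
    """Information content per query column: log2(20) minus Shannon entropy."""
    max_bits = math.log2(len(AMINO_ACIDS))
    scores = []
    for i in range(len(rows[0])):
        column = [r[i] for r in rows if i < len(r) and r[i] in AMINO_ACIDS]
        if not column:
            scores.append(0.0)  # only gaps or unknown residues: no information
            continue
        n = len(column)
        entropy = -sum((k / n) * math.log2(k / n) for k in Counter(column).values())
        scores.append(max_bits - entropy)
    return scores


if __name__ == "__main__":
    rows = read_a3m(">query\nACDE\n>hit1\nAaCDE\n>hit2\nAC-K\n")
    print(rows)  # ['ACDE', 'ACDE', 'AC-K']
    print([round(s, 2) for s in conservation_scores(rows)])  # [4.32, 4.32, 4.32, 3.4]
```

Fully conserved columns score the maximum of about 4.32 bits. Mixed columns score lower, which is what makes the most conserved residues stand out.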

This notebook is based heavily on the ColabFold AlphaFold2 notebook (https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb#scrollTo=G4yBrceuFbf3), which is published in:

Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. ColabFold: Making protein folding accessible to all. Nature Methods, 2022

To cite this notebook, which combines these elements, please use the attached DOI.

Troubleshooting

  • Check that the runtime type is set to GPU at "Runtime" -> "Change runtime type".
  • Try to restart the session "Runtime" -> "Factory reset runtime".
  • Check your input sequence.

Known issues

  • Google Colab assigns different types of GPUs with varying amounts of memory. Some might not have enough memory to predict the structure of a long sequence.
  • Your browser may block the pop-up for downloading the result file. If so, click the folder icon in the left sidebar, navigate to jobname.result.zip, right-click it and select "Download".

Limitations

  • Computing resources: The MMseqs2 API can handle ~20-50k requests per day.
  • MSAs: MMseqs2 is very precise and sensitive but might find fewer hits than HHblits/HMMer searches against BFD or MGnify.
  • Run order: The pipeline must be run in order, because later steps depend on earlier ones.

Description of the plots

  • Number of sequences per position - For best performance, aim for at least 30 sequences per position, ideally 100.
  • Predicted lDDT per position - model confidence (out of 100) at each position. The higher the better.
  • Predicted Alignment Error - For homooligomers, this could be a useful metric to assess how confident the model is about the interface. The lower the better.
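The "Number of sequences per position" check can be approximated by counting, for each query position, how many aligned sequences have a residue there rather than a gap. This is a minimal sketch with hypothetical function names, and it is not necessarily how ColabFold computes its plot. It takes raw a3m rows as input, drops lowercase insertions and '.' first, and uses the thresholds from the list above.

```python
def sequences_per_position(a3m_rows):
    """Count non-gap residues at each query position across a3m rows."""
    # Drop insertion states (lowercase letters and '.') so rows align to the query.
    rows = ["".join(c for c in r if not (c.islower() or c == ".")) for r in a3m_rows]
    length = len(rows[0])  # the first row is assumed to be the query
    return [sum(1 for r in rows if i < len(r) and r[i] != "-") for i in range(length)]


def positions_below(coverage, minimum=30):
    """Return 1-based positions with fewer than `minimum` sequences."""
    return [i + 1 for i, n in enumerate(coverage) if n < minimum]


if __name__ == "__main__":
    coverage = sequences_per_position(["ACDE", "AaC-E", "-CDE"])
    print(coverage)                      # [2, 3, 2, 3]
    print(positions_below(coverage, 3))  # [1, 3]
```

With a real alignment, `positions_below(coverage)` lists positions under 30 sequences. `positions_below(coverage, 100)` flags those short of the ideal 100.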

Bugs

  • If you encounter any bugs, please report the issue to https://raw.githubusercontent.com/sokrypton/ColabFold/main/LICENSE

License

The source code of ColabFold is licensed under MIT. Additionally, this notebook uses the AlphaFold2 source code and its parameters, licensed under Apache 2.0 and CC BY 4.0 respectively. Read more about the AlphaFold license in the AlphaFold GitHub repository.

Read more about WebLogo and its license here: https://github.com/WebLogo/weblogo

Acknowledgments

We thank the AlphaFold team for developing an excellent model and open sourcing the software.

Phillip Stansfeld and Chris LB Graham for integrating WebLogo and adding the mapping of conservation scores to the PDB B-factor column.

KOBIC and the Söding Lab for providing the computational resources for the MMseqs2 MSA server, on which this conservation notebook relies.

Inspired by a Colab notebook by Sergey Ovchinnikov (@sokrypton), Milot Mirdita (@milot_mirdita) and Martin Steinegger (@thesteinegger).

Files

ConservFold.ipynb (2.5 MB)
md5:e08a45f6ebfec9319c4d339fb1f25f92
