Published June 1, 2022 | Version 1.1
Dataset Open

Sequence-structure-function relationships in the microbial protein universe

  • 1. Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY
  • 2. Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
  • 3. Broad Institute, Cambridge, MA, USA
  • 4. Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  • 5. Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA 92093, USA
  • 6. Scientific Computing Core, Flatiron Institute, Simons Foundation, New York, NY, USA

Description

The Microbiome Immunity Project (MIP) dataset contains structure models predicted with both Rosetta and DMPfold (in the `dataset/` folder), along with DeepFRI function predictions for all models.

The `metadata` folder contains additional data that may be useful for searching the MIP database (FASTA files, BLAST databases, and scripts for structure/function search), as well as for retrieving the sequence and structural annotations.

The `intermediate_data` folder contains preprocessed output for reproducing many of the figures in our manuscript, together with the scripts and Jupyter notebooks in our git repository (https://github.com/microbiome-immunity-project/protein_universe).

More information about the dataset and associated metadata is provided in the `README.md` file.
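Before extracting a 45 GB archive, it can help to check how the folders described above are laid out inside it. The following is a minimal sketch using only the Python standard library; the helper name `top_level_entries` and the local archive path are hypothetical, and the archive may wrap everything in a single root folder, in which case only that folder would be reported.

```python
import zipfile
from collections import Counter


def top_level_entries(archive_path):
    """Count archive members under each top-level name, without extracting.

    Hypothetical helper: it only reads the zip's table of contents.
    """
    counts = Counter()
    with zipfile.ZipFile(archive_path) as zf:
        for name in zf.namelist():
            top = name.split("/", 1)[0]
            if top:
                counts[top] += 1
    return counts


if __name__ == "__main__":
    # Assumed local path to the downloaded archive.
    path = "microbiome_immunity_project_dataset.zip"
    for folder, n_members in sorted(top_level_entries(path).items()):
        print(f"{folder}\t{n_members}")
```

Only the archive's table of contents is read here, so the listing does not require unpacking any of the models.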

We also provide workflows for searching the MIP database with a protein sequence, structure, or function of interest (see `SEARCHING.md` for more details).

Files

microbiome_immunity_project_dataset.zip

Files (45.4 GB)

md5:b3e021609ffa052d2ab2333dc998964b (45.4 GB)
md5:2bd4a82bb4b12190ecd2b584ca1b745e (9.3 kB)
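
A download of this size is worth checking against the MD5 checksums listed above. The sketch below hashes a file in chunks so that the whole archive never has to fit in memory. The function names and the local file path are hypothetical, and it assumes that the checksum listed with the 45.4 GB entry belongs to `microbiome_immunity_project_dataset.zip`.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file's MD5 with a listed value such as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


if __name__ == "__main__":
    # Assumptions: local path of the download, and that this checksum
    # (listed with the 45.4 GB entry) is the one for the zip archive.
    ok = matches_listed_md5(
        "microbiome_immunity_project_dataset.zip",
        "md5:b3e021609ffa052d2ab2333dc998964b",
    )
    print("checksum OK" if ok else "checksum mismatch")
```

A mismatch usually points to an incomplete or corrupted download, in which case the file should be fetched again before extraction.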