Ruegeria pomeroyi digital microbe databases

Veseli, Iva; Cooper, Zac

doi:10.5281/zenodo.7888446

Published May 1, 2023 | Version 04

Dataset | Open access

Affiliations:

1. University of Chicago
2. University of Georgia

These databases consolidate a variety of datasets related to the model organism Ruegeria pomeroyi DSS-3. The data were generated primarily by members of the Moran Lab at the University of Georgia and were assembled into this format with anvi'o v7.1-dev through the collaborative efforts of Zac Cooper, Sam Miller, and Iva Veseli (special thanks to Christa Smith and Lidimarie Trujillo Rodriguez for their help with gene annotations). The datapack includes:

- (R_POM_DSS3-contigs.db) the complete genome and megaplasmid sequences of R. pomeroyi, along with highly curated gene annotations established by the Moran Lab and automatically generated annotations from NCBI COGs, KEGG KOfam/BRITE, Pfams, and anvi'o single-copy core gene sets. It also contains annotations for the Moran Lab's TnSeq mutant library.

- (PROFILE-VER_01.db) read-mapping data from multiple transcriptome and metatranscriptome samples, generated by the Moran Lab and mapped to the R. pomeroyi genome. Some of the coverage data are stored in the accompanying AUXILIARY-DATA.db file. These data can be visualized using anvi-interactive. Publicly available samples are labeled with their SRA accession numbers.

- (DEFAULT-EVERYTHING.db) gene-level coverage data from the transcriptome and metatranscriptome samples stored in the profile database, as well as per-gene normalized spectral abundance counts from proteomes matched to a subset of the transcriptomes. These data can also be visualized using anvi-interactive (see instructions below). The proteome data layers are labeled according to their matching transcriptome samples.

- (R_pom_reproducible_workflow.md) a reproducible workflow describing how the databases were generated.
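anvi'o databases are SQLite files, so their contents can be inspected outside anvi'o with Python's standard `sqlite3` module. The sketch below lists the tables in a database and reads its `self` table. It assumes, as in anvi'o v7, that every anvi'o database carries a `self` table of key/value pairs (for example `db_type` and `version`); treat those names as assumptions to check against your copy, and use anvi'o's own programs for anything beyond a quick look.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def _connect_read_only(db_path):
    """Open an SQLite file read-only (URI mode) so the datapack is never modified."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_tables(db_path):
    """Return the sorted names of all tables in an SQLite database."""
    with closing(_connect_read_only(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def read_self_table(db_path):
    """Return the key/value pairs of an anvi'o-style `self` table as a dict.

    Assumption: the table is named `self` and has columns `key` and `value`.
    """
    with closing(_connect_read_only(db_path)) as conn:
        rows = conn.execute('SELECT "key", "value" FROM "self"').fetchall()
    return dict(rows)


if __name__ == "__main__":
    for db in ["R_POM_DSS3-contigs.db", "PROFILE-VER_01.db"]:
        if Path(db).exists():
            meta = read_self_table(db)
            print(db, meta.get("db_type"), meta.get("version"), len(list_tables(db)))
```

Opening in `mode=ro` is a deliberate choice: a stray write from an ad hoc query cannot alter the files whose checksums are published with this record.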
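The description does not state exactly how the spectral counts were normalized. A common choice in proteomics is the normalized spectral abundance factor (NSAF), in which each protein's spectral count SpC_i is divided by its length L_i and then by the sum of those ratios over all proteins in the sample: NSAF_i = (SpC_i / L_i) / Σ_j (SpC_j / L_j). The sketch below computes it for a toy sample. Whether the values in DEFAULT-EVERYTHING.db were produced this way is an assumption to verify against R_pom_reproducible_workflow.md, and the gene IDs and numbers are made up.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein in one sample.

    spectral_counts: dict mapping gene ID -> spectral count (SpC)
    lengths: dict mapping gene ID -> protein length in amino acids (L)
    Returns a dict mapping gene ID -> NSAF; values sum to 1 if any count > 0.
    """
    # Length-normalize first so long proteins are not over-represented.
    saf = {g: spectral_counts[g] / lengths[g] for g in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        # No spectra observed: every protein gets 0 instead of dividing by zero.
        return {g: 0.0 for g in saf}
    return {g: v / total for g, v in saf.items()}


# Toy sample with hypothetical gene IDs, counts, and lengths.
counts = {"geneA": 10, "geneB": 30, "geneC": 0}
lengths = {"geneA": 100, "geneB": 300, "geneC": 250}
print(nsaf(counts, lengths))  # → {'geneA': 0.5, 'geneB': 0.5, 'geneC': 0.0}
```

Note that geneB has three times the spectra of geneA but also three times the length, so the two end up with equal NSAF values.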

Instructions for visualizing the genes database in the anvi'o interactive interface: Anvi'o expects genes databases to be located in a folder called `GENES`, so to use the genes database included in this datapack, first move it to that location by running the following commands in your terminal:

mkdir GENES
mv DEFAULT-EVERYTHING.db GENES/
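If you prefer to script this step, the sketch below does the same thing with Python's `pathlib` and `shutil`. It assumes, based on the `-C DEFAULT -b EVERYTHING` arguments in the `anvi-interactive` command that follows, that anvi'o looks for a genes database named `<collection>-<bin>.db` inside a `GENES` folder next to the profile database. The helper name `stage_genes_db` is hypothetical.

```python
import shutil
from pathlib import Path


def stage_genes_db(profile_db, collection="DEFAULT", bin_name="EVERYTHING"):
    """Move <collection>-<bin>.db into a GENES/ folder next to the profile database.

    Hypothetical helper mirroring `mkdir GENES` and `mv DEFAULT-EVERYTHING.db GENES/`.
    Returns the final path of the genes database.
    """
    workdir = Path(profile_db).resolve().parent
    name = f"{collection}-{bin_name}.db"
    source = workdir / name
    target_dir = workdir / "GENES"
    target = target_dir / name

    target_dir.mkdir(exist_ok=True)  # like `mkdir GENES`, but safe to re-run
    if target.exists():
        return target  # already staged; nothing to do
    if not source.exists():
        raise FileNotFoundError(f"expected {source} next to {profile_db}")
    shutil.move(str(source), str(target))  # like `mv DEFAULT-EVERYTHING.db GENES/`
    return target


if __name__ == "__main__":
    if Path("PROFILE-VER_01.db").exists():
        print(stage_genes_db("PROFILE-VER_01.db"))
```

Unlike the bare `mkdir`/`mv` pair, running the helper a second time is harmless: it returns the already-staged path instead of failing.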

Once that is done, you can use the following command to visualize the gene-level information:

anvi-interactive -c R_POM_DSS3-contigs.db -p PROFILE-VER_01.db -C DEFAULT -b EVERYTHING --gene-mode

To view only the proteomic data and its matched transcriptomes, you can add the flag `--state-autoload proteomes` to the above command.
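For scripted use, the sketch below assembles the same `anvi-interactive` call as an argument list, optionally appending `--state-autoload proteomes`. It uses only the flags shown in the command above; the function name `interactive_command` is hypothetical.

```python
def interactive_command(contigs_db="R_POM_DSS3-contigs.db",
                        profile_db="PROFILE-VER_01.db",
                        collection="DEFAULT",
                        bin_name="EVERYTHING",
                        state=None):
    """Build the anvi-interactive argument list for gene mode (hypothetical helper)."""
    cmd = ["anvi-interactive",
           "-c", contigs_db,
           "-p", profile_db,
           "-C", collection,
           "-b", bin_name,
           "--gene-mode"]
    if state is not None:
        # e.g. state="proteomes" shows only the proteomes and matched transcriptomes
        cmd += ["--state-autoload", state]
    return cmd


print(" ".join(interactive_command(state="proteomes")))
# → anvi-interactive -c R_POM_DSS3-contigs.db -p PROFILE-VER_01.db -C DEFAULT -b EVERYTHING --gene-mode --state-autoload proteomes
```

Passing the list to `subprocess.run(cmd, check=True)` launches the interface, provided anvi'o is installed and on your `PATH`. Keeping the arguments as a list rather than a single string avoids shell quoting issues if file names ever contain spaces.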

Files (1.1 GB)

Name	MD5 checksum	Size
AUXILIARY-DATA.db	7e6f0effe639013b72d978f886144ad7	400.7 MB
DEFAULT-EVERYTHING.db	8ff882936f10205fa3c8b7c3d16cd70d	520.7 MB
PROFILE-VER_01.db	c37aa5160ac55fdaa826c81919ff35ae	135.2 MB
R_POM_DSS3-contigs.db	dffe9dd562e4a4350bf68f6ec79920df	10.5 MB
R_pom_reproducible_workflow.md	641034fe37b01ae543f1c2e2d8873320	36.4 kB
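The MD5 checksums listed above can be used to confirm that each download arrived intact. The sketch below hashes files in chunks with Python's `hashlib`, so the 500 MB databases are never loaded into memory at once, and compares the results against the published values. It assumes the files sit in the current directory.

```python
import hashlib
from pathlib import Path

# Checksums as listed in this record (Version 04).
EXPECTED_MD5 = {
    "AUXILIARY-DATA.db": "7e6f0effe639013b72d978f886144ad7",
    "DEFAULT-EVERYTHING.db": "8ff882936f10205fa3c8b7c3d16cd70d",
    "PROFILE-VER_01.db": "c37aa5160ac55fdaa826c81919ff35ae",
    "R_POM_DSS3-contigs.db": "dffe9dd562e4a4350bf68f6ec79920df",
    "R_pom_reproducible_workflow.md": "641034fe37b01ae543f1c2e2d8873320",
}


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory=".", expected=EXPECTED_MD5):
    """Map each expected file name to True/False (checksum match) or None (missing)."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        results[name] = md5sum(path) == checksum if path.exists() else None
    return results


if __name__ == "__main__":
    for name, ok in verify().items():
        print(f"{name}: {'missing' if ok is None else 'OK' if ok else 'MISMATCH'}")
```

Run the check before moving DEFAULT-EVERYTHING.db into `GENES/`; otherwise that file will be reported as missing.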
