The honey bee gut microbiota genomic database

Ellegaard, Kirsten; Engel, Philipp

doi:10.5281/zenodo.4661061

Published April 3, 2021 | Version 1.0.0

Dataset Open

1. University of Lausanne

This data repository contains the latest version of the "honey bee gut microbiota genomic database", and two example data-sets, which can be used to run the pipelines:

Community_profiling (Github)
Species_validation (Github)

For previous publications using these pipelines, see 10.1038/s41467-019-08303-0, 10.1016/j.cub.2020.04.070

The genomic database contains data from a total of 198 bacterial genomes, as detailed in the database metafile (see file descriptions below). It has been tested on the Western and Eastern honey bee (Apis mellifera, Apis cerana), for which it has been shown to recruit about 90% of all the reads in most metagenomic samples (excluding host-derived reads). The database also contains genomes derived from other bee species, such as bumble bees, but it has not yet been tested with metagenomic data from these species. Most species in the database are represented by multiple genomes, but no two genomes share more than 98.5% gANI (genomic average nucleotide identity). As a result, several published genomes isolated from social bees are not included.

FILE DESCRIPTIONS

genome_db_metafile_210402.txt

Plain text-file with identifiers for genomes in the database.

Tab1 contains locus-tags (derived from the gene-ids of the annotation files), which are used as main identifiers for the genomes in all database files.
Tab2 contains the genome phylotype-affiliation (> 97% 16S rRNA identity).
Tab3 contains the genome SDP-affiliation ("Sequence-discrete populations"), as determined with genomic and metagenomic data (largely corresponding to the currently named species of the honey bee gut microbiota).
Tab4 indicates whether the genome was chosen as reference for plotting core gene-family coverage (Community profiling pipeline).
Tab5 contains accession numbers of genomes in public repositories (Genbank Assembly accession/IMG accession).
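
The five fields described above can be loaded with a few lines of Python. This is a minimal sketch, not part of the published pipelines. It assumes the metafile is tab-delimited with the fields in the order listed, and the field names (locus_tag, phylotype, sdp, is_reference, accession) are hypothetical labels chosen for illustration. Whether the file has a header line, and how the reference flag is encoded, is not stated here, so check the first lines of the file before relying on it.

```python
import csv
from collections import Counter
from typing import NamedTuple


class GenomeRecord(NamedTuple):
    # Hypothetical field names; the order follows Tab1..Tab5 as described above.
    locus_tag: str
    phylotype: str
    sdp: str
    is_reference: str
    accession: str


def read_metafile(path):
    """Read a tab-delimited metafile into a list of GenomeRecord tuples."""
    records = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Skip empty lines and (assumed) comment lines starting with "#".
            if not row or row[0].startswith("#"):
                continue
            # Pad short rows so that missing trailing fields become "".
            fields = (row + [""] * 5)[:5]
            records.append(GenomeRecord(*fields))
    return records


def genomes_per_phylotype(records):
    """Count how many genomes are assigned to each phylotype."""
    return Counter(record.phylotype for record in records)
```

Calling `genomes_per_phylotype(read_metafile("genome_db_metafile_210402.txt"))` would then give a per-phylotype genome count, provided the layout assumptions hold.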

genome_db_210402.tar.gz

Contains the genomic database, with all files required for running the "Community profiling" pipeline:

"genome_db_210402": fasta-file with genome sequences of bacteria included in the database. For draft genomes (the majority), the contigs have been concatenated into a single contig per genome, to facilitate downstream processing with bioinformatic pipelines.
"genome_db_metafile_210402.txt": meta-data for genomes, see detailed description here above
"faa_files": directory containing the amino-acid sequences of genes for all genomes
"ffn_files": directory containing the nucleotide sequences of genes for all genomes
"bed_files": directory containing bed-files, specifying the location of genes on the concatenated contigs.
"gff_files": directory containing gff-files with annotations of genes for all genomes
"Orthofinder": directory containing files with filtered single-copy orthologous gene-families (estimated with "Orthofinder"), for quantifying the abundance of community members based on core gene family coverage.

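Because the bed-files give gene locations on the concatenated contigs, a gene sequence can in principle be recovered by slicing the genome fasta-file at those coordinates. The following is a minimal sketch under stated assumptions: the bed-files follow the standard BED convention (tab-separated, 0-based start, exclusive end), the first BED column matches the fasta-header of the genome, and the fourth column, if present, holds a gene identifier. Strand is ignored here, so genes on the reverse strand would still need to be reverse-complemented. The function names are illustrative only.

```python
def read_fasta(path):
    """Read a fasta-file into a dict mapping the first header word to its sequence."""
    sequences = {}
    name = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    sequences[name] = "".join(chunks)
                name = line[1:].split()[0]
                chunks = []
            elif line:
                chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def read_bed(path):
    """Read BED intervals as (chrom, start, end, name) tuples."""
    intervals = []
    with open(path) as handle:
        for line in handle:
            # Skip blank lines and the optional BED header/comment lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            fields = line.rstrip("\n").split("\t")
            name = fields[3] if len(fields) > 3 else ""
            intervals.append((fields[0], int(fields[1]), int(fields[2]), name))
    return intervals


def extract_genes(sequences, intervals):
    """Slice each interval out of its contig (forward strand only)."""
    genes = {}
    for chrom, start, end, name in intervals:
        key = name or f"{chrom}:{start}-{end}"
        genes[key] = sequences[chrom][start:end]
    return genes
```
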
species_validation.tar.gz

Example data-set for running the "Species_validation" pipeline. Contains nucleotide sequences of ORFs (open reading-frames) predicted on two assembled metagenomes derived from the gut microbiota of Apis mellifera (ORFs denoted "AmAi03" in the fasta-headers) and Apis cerana (denoted "AcCh03" in the fasta-headers). The samples were previously analyzed in https://doi.org/10.1016/j.cub.2020.04.070 as part of a much larger data-set.

Additionally, it contains amino-acid and nucleotide sequences of genomes included in the honey bee gut microbiota genomic database, which are required for generating the core gene alignments used in the validation.
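
Since the two metagenomes are distinguished by the tags "AmAi03" and "AcCh03" in the fasta-headers, a quick sanity check after unpacking is to count the ORFs per sample. The sketch below assumes only that each tag appears somewhere in the header line of the corresponding ORFs. Its exact position in the header is not specified here, so a simple substring match is used, and the function name is hypothetical.

```python
from collections import Counter

# Sample tags as given in the description of the example data-set.
SAMPLE_TAGS = ("AmAi03", "AcCh03")


def count_orfs_by_sample(fasta_path, tags=SAMPLE_TAGS):
    """Count fasta records per sample tag, based on a substring match in the header."""
    counts = Counter()
    with open(fasta_path) as handle:
        for line in handle:
            if not line.startswith(">"):
                continue
            for tag in tags:
                if tag in line:
                    counts[tag] += 1
                    break
            else:
                # Header carries none of the known tags.
                counts["unassigned"] += 1
    return counts
```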

metagenomic_reads.fastq.tar.gz

Example data-set for running the "Community profiling" pipeline. Contains metagenomic reads from two samples derived from the gut microbiota of Apis mellifera, previously analyzed in https://doi.org/10.1016/j.cub.2020.04.070 as part of a much larger data-set. To further reduce the file size, the data were subset to reads mapping to the phylotype Lactobacillus Firm5. The complete data are publicly available on NCBI under BioProject PRJNA598094.

Files (1.7 GB)

Name	MD5	Size
genome_db_210402.tar.gz	9fa43b1bfa981115409bada6f52d58e1	368.9 MB
genome_db_metafile_210402.txt	071ec5e9cf7f444c049de87549a9c960	8.9 kB
metagenomic_reads.fastq.tar.gz	ff640807b464f73c6e9a2a8a6e574f48	1.1 GB
species_validation.tar.gz	368d352604c0539ab31564851fe38970	218.8 MB
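
The MD5 checksums listed for each file can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard hashlib module. The checksums are copied from the file list, while the function names and the assumption that the files sit in the current directory under their original names are illustrative.

```python
import hashlib

# Checksums as listed in the file table of this record.
EXPECTED_MD5 = {
    "genome_db_210402.tar.gz": "9fa43b1bfa981115409bada6f52d58e1",
    "genome_db_metafile_210402.txt": "071ec5e9cf7f444c049de87549a9c960",
    "metagenomic_reads.fastq.tar.gz": "ff640807b464f73c6e9a2a8a6e574f48",
    "species_validation.tar.gz": "368d352604c0539ab31564851fe38970",
}


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5sum(path) == expected_md5.lower()
```

For example, `verify_download("species_validation.tar.gz", EXPECTED_MD5["species_validation.tar.gz"])` should return True for an intact download. Reading in chunks keeps memory use low for the larger archives.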

Additional details

Cites: Journal article: 10.1038/s41467-019-08303-0 (DOI); Journal article: 10.1016/j.cub.2020.04.070 (DOI)

Funding: European Commission, MicroBeeOme - Evolution of the honey bee gut microbiome through bacterial diversification (grant 714804)

