Published January 4, 2024 | Version v3
Dataset Open

Datasets for "Micromonosporaceae Biosynthetic Gene Cluster Diversity Highlights the Need for Broad Spectrum Investigation"

Description

This data collection contains:
Data S1: A folder with the FASTA files for all 42 strains (41 Micromonosporaceae, 1 Streptomycetaceae).
Data S2: A folder with all the .gbk files for the BGC regions predicted by antiSMASH v5.1.1. These files were used as inputs for BiG-SCAPE and BiG-SLiCE.
Data S3: A folder with all the .gbk files for the BGC regions predicted by antiSMASH v6.1.0.
Data S4: A folder containing all the QUAST outputs for the 42 strains.
Data S5: A folder containing all the BUSCO outputs for the 42 strains. Example scripts are provided for scraping relevant information from the individual BUSCO outputs.
Data S6: A folder containing GTDB (Genome Taxonomy Database) classification results, and species-level grouping results using FastANI (95% cutoff).
Data S7: A folder containing an Interactive Tree of Life (iTOL)-compatible bar chart annotation using antiSMASH v5.1.1 BGC region information.
Data S8: A folder containing a Word document that describes the command-line parameters used under Ubuntu WSL (Windows Subsystem for Linux) for the programs antiSMASH v6.1.2, BiG-SCAPE v1.1.2, and BiG-SLiCE v1.1.1. Parameters for MDSC in Python are also included, along with an example script for batch queries of BGCs against BiG-SLiCE v1.1.1’s pre-processed dataset of ~1.2 million BGCs.
Data S9: A folder containing the BiG-SCAPE visualization of the 38 Micromonosporaceae (post-QC filtering, excluding WMMA1363, WMMB482, WMMB486, and WMMC500) in Cytoscape.
Data S10: A folder containing:
    The pre-processed dataset of 1.2 million BGCs from BiG-SLiCE.
    All report folders generated by BiG-SLiCE for the 779 Micromonosporaceae BGCs queried against the 1.2 million BGCs.
    The results data.db and associated folders for the pre-processed dataset of 1.2 million BGCs.
Data S11: A folder containing the scripts necessary to regenerate the figures and perform independent analyses, and the relevant data used for the analyses.
Data S12: A folder containing the results of the nucleotide BLAST of WMMA1947.region12's siderophore contig against WMMD1120.region14's siderophore contig.
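
To illustrate the species-level grouping mentioned for Data S6, the following is a minimal sketch of how pairwise FastANI results could be clustered at the 95% cutoff. It assumes FastANI's standard tab-separated output (query, reference, ANI, then fragment counts) and uses single-linkage grouping via union-find, which is an assumption and not necessarily the procedure used for this dataset. The function names and the genome names in the usage example are hypothetical.

```python
from collections import defaultdict


def parse_fastani(text):
    """Yield (query, reference, ani) from FastANI tab-separated output text."""
    for line in text.splitlines():
        if not line.strip():
            continue
        query, reference, ani = line.split("\t")[:3]
        yield query, reference, float(ani)


def group_by_ani(rows, cutoff=95.0):
    """Group genomes whose pairwise ANI is >= cutoff (single linkage; an assumption)."""
    parent = {}

    def find(name):
        # Register unseen genomes as their own group, then walk to the root.
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for query, reference, ani in rows:
        root_q, root_r = find(query), find(reference)
        if ani >= cutoff and root_q != root_r:
            parent[root_r] = root_q

    groups = defaultdict(list)
    for name in list(parent):
        groups[find(name)].append(name)
    return sorted(sorted(members) for members in groups.values())
```

A possible usage, with made-up genome names and ANI values:

```python
example = (
    "strainA.fasta\tstrainB.fasta\t97.1\t500\t600\n"
    "strainA.fasta\tstrainC.fasta\t82.3\t300\t600\n"
    "strainB.fasta\tstrainC.fasta\t81.9\t290\t610\n"
)
print(group_by_ani(parse_fastani(example)))
```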

Supplementary Information: Supplementary Table S1 and Supplementary Figures S1-S181.

Files

Alas, Bugni et. al. SI doc.pdf

Files (11.0 MB)

md5:a3e448e5848ece9d811238be18f10902
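
The MD5 checksum listed above can be used to confirm that a downloaded copy of the file is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names are hypothetical, and the local file path in the usage note assumes the download keeps the file name shown in this listing.

```python
import hashlib

# Checksum copied from the file listing above.
EXPECTED_MD5 = "a3e448e5848ece9d811238be18f10902"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest equals the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_listing("Alas, Bugni et. al. SI doc.pdf")` would be expected to return `True` for an uncorrupted download saved under that name.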