Datasets for "Micromonosporaceae Biosynthetic Gene Cluster Diversity Highlights the Need for Broad Spectrum Investigation"
Description
This data collection contains:
Data S1: A folder with all the FASTA files, representing the 42 strains (41 Micromonosporaceae, 1 Streptomycetaceae).
Data S2: A folder with all the .gbk files for the BGC regions predicted by antiSMASH v5.1.1. These files were used as inputs for BiG-SCAPE and BiG-SLiCE.
Data S3: A folder with all the .gbk files for the BGC regions predicted by antiSMASH v6.1.2.
Data S4: A folder containing all the QUAST outputs for the 42 strains.
Data S5: A folder containing all the BUSCO outputs for the 42 strains. Example scripts are provided for scraping relevant information from the individual BUSCO outputs.
Data S6: A folder containing GTDB (Genome Taxonomy Database) classification results, and species-level grouping results using FastANI (95% cutoff).
Data S7: A folder containing an Interactive Tree of Life (iTOL)-compatible bar chart annotation using antiSMASH v5.1.1 BGC region information.
Data S8: A folder containing a Word document that describes the command-line parameters used under Ubuntu on WSL (Windows Subsystem for Linux) for antiSMASH v6.1.2, BiG-SCAPE v1.1.2, and BiG-SLiCE v1.1.1, as well as the parameters used for MDSC in Python. An example script is also provided for batch queries of BGCs against BiG-SLiCE v1.1.1’s pre-processed dataset of ~1.2 million BGCs.
Data S9: A folder containing the BiG-SCAPE visualization of the 38 Micromonosporaceae (post-QC filtering, excluding WMMA1363, WMMB482, WMMB486, and WMMC500) in Cytoscape.
Data S10: A folder containing:
The pre-processed dataset of 1.2 million BGCs from BiG-SLiCE.
All report folders generated by BiG-SLiCE for the 779 Micromonosporaceae BGCs queried against the 1.2 million BGCs.
The results data.db and associated folders for the pre-processed dataset of 1.2 million BGCs.
Data S11: A folder containing the scripts necessary to regenerate the figures and perform independent analyses, and the relevant data used for the analyses.
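As an illustration of the species-level grouping described for Data S6, the sketch below groups genomes from FastANI's tab-separated output (query, reference, ANI, followed by fragment counts) using a 95% cutoff. It assumes single-linkage grouping, meaning two genomes share a group if any chain of pairs at or above the cutoff connects them; the record does not state which grouping rule the authors used, so treat this as one plausible reading rather than their method. The function names and the file names in the usage example are hypothetical.

```python
from collections import defaultdict


def parse_fastani(text):
    """Yield (query, reference, ani) from FastANI tab-separated output."""
    for line in text.splitlines():
        if line.strip():
            fields = line.split("\t")
            yield fields[0], fields[1], float(fields[2])


def group_by_ani(rows, cutoff=95.0):
    """Single-linkage grouping (an assumption): join genomes linked by ANI >= cutoff."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, ref, ani in rows:
        root_q, root_r = find(query), find(ref)
        if ani >= cutoff and root_q != root_r:
            parent[root_r] = root_q

    groups = defaultdict(set)
    for genome in list(parent):
        groups[find(genome)].add(genome)
    # Sort members and groups so the result is deterministic.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


if __name__ == "__main__":
    # Hypothetical FastANI output; genome names and values are placeholders.
    demo = (
        "strainA.fasta\tstrainB.fasta\t97.2\t900\t1000\n"
        "strainB.fasta\tstrainC.fasta\t96.0\t800\t1000\n"
        "strainC.fasta\tstrainD.fasta\t85.1\t500\t1000\n"
    )
    for group in group_by_ani(parse_fastani(demo)):
        print(group)
```

Single linkage can merge genomes whose direct pairwise ANI falls below the cutoff when an intermediate genome bridges them, so a stricter rule (for example, requiring every pair in a group to meet the cutoff) may give different groups.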
Files
Total size: 20.5 GB

MD5 checksum | Size |
---|---|
md5:b23d02113eb70ebc5b8698ac4b80f784 | 77.3 MB |
md5:f0f726d7bf69112706d54b394415afa6 | 18.3 GB |
md5:37b6f0b5d4e4e93d45a4d5de1ebb7542 | 41.5 MB |
md5:e8c0a41b73ed51a8c2adbb65cffaacef | 382.7 MB |
md5:9b32ac3e44d5e33b17c0caa8db340c59 | 1.3 GB |
md5:922241a67acca778f94ee1eb421fdd2d | 11.4 MB |
md5:dba07d0bc631895de69bca17560bfcc0 | 328.9 MB |
md5:386eec30703b623ae9c4955480ac0278 | 51.6 kB |
md5:07f56ebc4e1bb0df12f6b6edd5f147f0 | 3.0 kB |
md5:d796dc29e2ba7941835ba7a37a6727d6 | 11.5 kB |
md5:40d400d9793d525076dc446b19fcd70c | 485.2 kB |
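
The md5 values listed above can be used to confirm that a download completed intact, which is worth doing for the larger archives. A minimal sketch using Python's standard `hashlib` follows; the helper names are hypothetical, and the script expects a local file path and one of the listed `md5:` strings as arguments.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


if __name__ == "__main__":
    import sys

    # Usage: python verify_md5.py <downloaded file> <md5:... value from the list>
    file_path, listed_value = sys.argv[1], sys.argv[2]
    print("OK" if matches_checksum(file_path, listed_value) else "MISMATCH")
```

Reading in chunks keeps memory use flat regardless of archive size, which matters for the 18.3 GB file.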