Published August 30, 2027 | Version v1
Dataset Open

CuratedMetagenomicData Query for Enterocloster spp. and ucd Genes

  • 1. Department of Pharmacology and Therapeutics, McGill University

Description

This record contains R scripts (.R), R workspace files (.RData), and .csv files from an analysis of the curatedMetagenomicData datasets (https://waldronlab.io/curatedMetagenomicData/). The purpose of the analysis was to assess the prevalence of Enterocloster spp. capable of metabolizing urolithin C in human stool metagenomic sequencing experiments. The studies available in curatedMetagenomicData are listed at https://waldronlab.io/curatedMetagenomicData/articles/available-studies.html.

The data were generated and analyzed as follows:

1 - 1_ExperimentHub_Local_Cache_Export.R was used to generate .RData files for all 93 studies available in curatedMetagenomicData. The resulting .RData files are named according to the following convention: Studies##-##.RData.

2 - The Studies##-##.RData files were loaded into an R environment on the Narval compute cluster (Digital Research Alliance of Canada), and the 2_cMD_FromRData.R script was used to query the relative abundance (ra) of bacteria of interest (boi) and the gene family (gf) abundance of genes of interest (goi). Metadata (md) were also extracted from each study. The resulting .csv files are named according to the following convention: Study##_datatype.csv (datatype = ra_boi, gf_goi, md).

3 - The Study##_datatype.csv (datatype = ra_boi, gf_goi, md) files were then processed with the 3_cMD_Merge_Outputs_And_Graph.R script, which merges the individual outputs into a single data frame and graphs the prevalence results. The R objects generated while running this script are provided as cMD_Merge_Outputs_And_Graph.RData.
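
As an illustration of step 3, the following is a minimal base R sketch of how per-study files that follow the Study##_datatype.csv convention could be joined and stacked. It is not the code in 3_cMD_Merge_Outputs_And_Graph.R. The toy files, the column names (sample_id, body_site, boi_1, goi_1), and the helper functions write_toy and read_study are hypothetical and exist only for this example.

```r
# Hypothetical sketch: merge per-study CSV outputs into one data frame.
# All file contents and column names below are made up for illustration.
out_dir <- file.path(tempdir(), "cmd_outputs")
dir.create(out_dir, showWarnings = FALSE)

# Write toy files that follow the Study##_datatype.csv naming convention.
write_toy <- function(study, ids, ra, gf) {
  write.csv(data.frame(sample_id = ids, body_site = "stool"),
            file.path(out_dir, paste0(study, "_md.csv")), row.names = FALSE)
  write.csv(data.frame(sample_id = ids, boi_1 = ra),
            file.path(out_dir, paste0(study, "_ra_boi.csv")), row.names = FALSE)
  write.csv(data.frame(sample_id = ids, goi_1 = gf),
            file.path(out_dir, paste0(study, "_gf_goi.csv")), row.names = FALSE)
}
write_toy("Study01", c("s1", "s2"), c(0.5, 0), c(12, 0))
write_toy("Study02", c("s3", "s4", "s5"), c(0, 0.1, 0.2), c(0, 3, 0))

# Read the three tables for one study and join them on the sample identifier.
read_study <- function(study) {
  read_part <- function(datatype) {
    read.csv(file.path(out_dir, paste0(study, "_", datatype, ".csv")))
  }
  merged <- merge(read_part("md"), read_part("ra_boi"), by = "sample_id")
  merged <- merge(merged, read_part("gf_goi"), by = "sample_id")
  data.frame(study = study, merged)
}

# Discover studies from the metadata file names, then stack the per-study tables.
md_files <- list.files(out_dir, pattern = "^Study[0-9]+_md\\.csv$")
study_ids <- sub("_md\\.csv$", "", md_files)
merged_all <- do.call(rbind, lapply(study_ids, read_study))
```

Joining on an explicit sample identifier, rather than binding columns by position, keeps the rows aligned even if the three files for a study list samples in different orders.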

Dependencies for each R script are listed at the start of that script. To reproduce the analysis, the file paths in the scripts must be changed to match the local environment.

The cMD_study_summary.csv file contains metadata related to each study (86 total) included in the analysis.

The cMD_metadata_ra_boi_gf_goi_stool.csv file contains a merged table of sample metadata (n = 21030 samples), relative abundance (ra) for bacteria of interest (boi), and gene family (gf) abundance for genes of interest (goi).
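
A table of this shape lends itself to a simple prevalence calculation. The sketch below uses a toy data frame in place of the real file, and its column names (study_name, boi_1, goi_1) are hypothetical, so they would need to be replaced with the actual column names in the .csv. It also assumes that a feature counts as present when its abundance exceeds a threshold of 0, which may differ from the definition used in the deposited scripts.

```r
# Toy stand-in for the merged table; column names and values are hypothetical.
merged <- data.frame(
  study_name = c("A", "A", "A", "B", "B"),
  boi_1 = c(0.5, 0, 0.2, 0, 0),  # relative abundance of one bacterium of interest
  goi_1 = c(12, 0, 0, 3, 0)      # gene family abundance of one gene of interest
)

# Assumed definition: prevalence is the fraction of samples above a threshold.
prevalence <- function(x, threshold = 0) mean(x > threshold, na.rm = TRUE)

# Prevalence across all samples, one value per feature column.
overall <- sapply(merged[c("boi_1", "goi_1")], prevalence)

# Prevalence within each study.
by_study <- aggregate(cbind(boi_1, goi_1) ~ study_name, data = merged,
                      FUN = prevalence)
```

With the real file, the toy data frame would be replaced by a call such as read.csv("cMD_metadata_ra_boi_gf_goi_stool.csv"), with the path adjusted to the local download location.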

Files

Files (20.0 MB)

Checksum                                Size
md5:7b747719c56842f7de9265d8e5713dd9    13.1 kB
md5:9119fec6a565c4493630469e298b2f39    8.1 kB
md5:a3786945cf82df4ccb2fdfd482f61da8    24.8 kB
md5:bf98cf2d687588663d4c6b458e726d08    6.2 MB
md5:1e18b2c8b5bfb5e8e2bcfdcacad5231c    13.8 MB
md5:94037bf275f2a058588ac84ce62f6e55    8.1 kB

Additional details

Dates

Submitted
2024-02-07

Software

Programming language
R