There is a newer version of the record available.

Published April 6, 2023 | Version chopin2-1.0.6
Dataset Open

Feature selection on microbial profiles of CRC samples with chopin2

  • 1. Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, USA
  • 2. Department of Engineering, Uninettuno University, Rome, Italy

Description

This Zenodo entry contains the results of the feature selection algorithm implemented in chopin2, which follows a backward variable elimination strategy, applied to MetaPhlAn3 microbial profiles of a public dataset of metagenomic stool samples collected from patients affected by colorectal cancer (CRC) and from healthy individuals.

Microbial profiles have been extracted through the curatedMetagenomicData package for R under the IDs ThomasAM_2018a, ThomasAM_2018b, and ThomasAM_2019_a.

The feature selection algorithm is implemented as a backward variable elimination method and makes use of the vector-symbolic architecture described in Cumbo et al. (2020).
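As a rough illustration of the backward variable elimination idea, the following sketch starts from the full feature set and greedily drops one feature at a time for as long as the score does not get worse. It is not chopin2's code: the function `backward_elimination`, the `toy_score` function, and the feature names are all hypothetical, and the hyperdimensional classifier that chopin2 uses for scoring is replaced here by an arbitrary callable.

```python
from typing import Callable, Sequence, Tuple


def backward_elimination(
    features: Sequence[str],
    score: Callable[[Tuple[str, ...]], float],
    min_features: int = 1,
) -> Tuple[Tuple[str, ...], float]:
    """Greedy backward variable elimination (illustrative sketch only)."""
    current = tuple(features)
    best_score = score(current)
    while len(current) > min_features:
        # Score every subset obtained by dropping exactly one feature.
        candidates = [
            (score(current[:i] + current[i + 1:]), current[:i] + current[i + 1:])
            for i in range(len(current))
        ]
        cand_score, cand_subset = max(candidates, key=lambda c: c[0])
        # Stop as soon as every possible removal lowers the score.
        if cand_score < best_score:
            break
        current, best_score = cand_subset, cand_score
    return current, best_score


# Hypothetical scoring function: features "A" and "B" are informative,
# and every retained feature carries a small penalty.
def toy_score(subset: Tuple[str, ...]) -> float:
    informative = {"A", "B"}
    return len(informative & set(subset)) - 0.01 * len(subset)


selected, final_score = backward_elimination(["A", "B", "C", "D"], toy_score)
print(selected, final_score)
```

In a real setting the scoring callable would typically be a cross-validated accuracy of the classifier trained on the candidate subset, which is far more expensive than the toy function used here.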

The deposited data are described below:

  • datasets.tar.gz: contains the datasets used as input to chopin2, obtained by merging the three relative-abundance datasets mentioned above, also stratified by age and sex (prefix RA). The same datasets have also been binarized (prefix BIN);
  • hd-models.tar.gz: contains the output of the feature selection performed with chopin2 on both the relative abundance (RA) and binary (BIN) profiles;
  • ml-models.tar.gz: contains the results of the feature selection produced with classical wrapper-based techniques (i.e., Random Forest, Decision Tree, Support Vector Machine, Logistic Regression, and Extreme Gradient Boosting), together with a Python 3.8 script to reproduce the results.

Please note that the datasets RA__ThomasAM__species.csv and BIN__ThomasAM__species.csv are also included in the datasets.tar.gz archive.
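The relationship between the RA and BIN datasets can be sketched as a presence/absence transformation of the relative abundances. The record does not state the threshold that was used, so the `threshold=0.0` default below is an assumption, and the `binarize` helper, the taxa, and the abundance values are hypothetical examples rather than entries from the deposited files.

```python
from typing import Dict


def binarize(profile: Dict[str, float], threshold: float = 0.0) -> Dict[str, int]:
    """Map relative abundances to 1 (present) or 0 (absent).

    The threshold is an assumption; the record does not specify it.
    """
    return {taxon: 1 if abundance > threshold else 0
            for taxon, abundance in profile.items()}


# Hypothetical relative-abundance profile of a single sample.
ra_profile = {
    "Fusobacterium nucleatum": 0.42,
    "Bacteroides fragilis": 0.0,
    "Escherichia coli": 3.1,
}

bin_profile = binarize(ra_profile)
print(bin_profile)
```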

Files

Files (16.3 MB)

MD5 checksum                           Size
md5:1402c25e968381d789d9a6639f131abc   198.3 kB
md5:e632a3f4782f46daae7cf33b249c826d   325.1 kB
md5:18bc74af3ae5f4458d3e6d5cf7528286   14.9 MB
md5:c0db7ea44c2af19605b37457aa5a0d9a   562.6 kB
md5:4546e53af85c0b76a1e563b547dcbd81   260.7 kB
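The MD5 checksums listed above can be used to check that a downloaded file is intact. The sketch below relies only on Python's standard `hashlib` module; the helper names `md5_of` and `matches` are hypothetical, and the demonstration runs on a temporary file containing the bytes `hello` rather than on one of the deposited files.

```python
import hashlib
import os
import tempfile


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, listed: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of(path) == expected


# Demonstration on a temporary file (not one of the deposited files).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

demo_ok = matches(demo_path, "md5:5d41402abc4b2a76b9719d911017c592")
os.remove(demo_path)
print(demo_ok)
```

To verify a real download, pass the local path of the file and the corresponding `md5:` value from the listing to `matches`.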

Additional details

References

  • Cumbo, Fabio et al. (2020). "A brain-inspired hyperdimensional computing approach for classifying massive DNA methylation data of cancer". MDPI Algorithms 2020