Published May 30, 2024 | Version chopin2-1.0.8
Dataset | Open access

Feature selection on microbial profiles of CRC samples with chopin2 (powered by hdlib)

  • 1. Center for Computational Life Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, USA
  • 2. Department of Engineering, Uninettuno University, Rome, Italy

Description

This Zenodo entry contains the results of feature selection performed with chopin2 (powered by hdlib), which implements a backward variable elimination strategy. The selection was applied to MetaPhlAn3 microbial profiles of public metagenomic stool samples collected from patients with colorectal cancer (CRC) and from healthy individuals.

The microbial profiles were retrieved with the curatedMetagenomicData R package under the study IDs ThomasAM_2018a, ThomasAM_2018b, and ThomasAM_2019_a.

The backward variable elimination makes use of the vector-symbolic architecture (hyperdimensional computing) approach described in Cumbo et al. (2020).
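To make the idea concrete, here is a minimal, pure-Python sketch of greedy backward elimination wrapped around a toy hyperdimensional classifier. It is not chopin2's or hdlib's implementation: the encoding (random bipolar feature IDs bound to correlated level hypervectors), the held-out accuracy criterion, the stopping rule, all function names, and the synthetic data are assumptions chosen for illustration.

```python
import random

def random_hv(dim, rng):
    """Random bipolar hypervector with components in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]

def level_hvs(n_levels, dim, rng):
    """Correlated level hypervectors: each level flips a fresh block of the
    previous one, so adjacent levels are similar and extremes ~orthogonal."""
    levels = [random_hv(dim, rng)]
    order = list(range(dim))
    rng.shuffle(order)
    step = dim // (2 * (n_levels - 1))
    for i in range(1, n_levels):
        hv = levels[-1][:]
        for j in order[(i - 1) * step:i * step]:
            hv[j] = -hv[j]
        levels.append(hv)
    return levels

def encode(x, features, ids, levels, lo, hi):
    """Bundle (sum) the binding (element-wise product) of each feature ID
    hypervector with the level hypervector of its quantized value."""
    n = len(levels)
    acc = [0] * len(ids[0])
    for f in features:
        span = (hi[f] - lo[f]) or 1.0
        q = min(max(round((x[f] - lo[f]) / span * (n - 1)), 0), n - 1)
        for k, (a, b) in enumerate(zip(ids[f], levels[q])):
            acc[k] += a * b
    return acc

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return dot / norm if norm else 0.0

def accuracy(train, test, features, ids, levels, lo, hi):
    """Build one prototype per class by bundling its training samples, then
    classify each test sample by the highest cosine similarity."""
    protos = {}
    for x, y in train:
        hv = encode(x, features, ids, levels, lo, hi)
        proto = protos.setdefault(y, [0] * len(hv))
        for k, v in enumerate(hv):
            proto[k] += v
    hits = 0
    for x, y in test:
        hv = encode(x, features, ids, levels, lo, hi)
        hits += max(protos, key=lambda c: cosine(hv, protos[c])) == y
    return hits / len(test)

def backward_elimination(train, test, n_features, dim=1000, n_levels=10, seed=0):
    """Greedy backward elimination: repeatedly drop the feature whose removal
    gives the best accuracy; stop as soon as every removal lowers it."""
    rng = random.Random(seed)
    ids = [random_hv(dim, rng) for _ in range(n_features)]
    levels = level_hvs(n_levels, dim, rng)
    lo = [min(x[f] for x, _ in train) for f in range(n_features)]
    hi = [max(x[f] for x, _ in train) for f in range(n_features)]
    selected = list(range(n_features))
    best = accuracy(train, test, selected, ids, levels, lo, hi)
    while len(selected) > 1:
        trials = [(accuracy(train, test, [f for f in selected if f != g],
                            ids, levels, lo, hi), g) for g in selected]
        acc, dropped = max(trials)
        if acc < best:
            break
        best = acc
        selected.remove(dropped)
    return selected, best

# Synthetic toy data (not from this record): feature 0 separates the
# classes, features 1 and 2 are uniform noise.
rng = random.Random(42)

def sample(label):
    centre = 0.1 if label == "control" else 0.9
    return [centre + rng.uniform(-0.1, 0.1), rng.random(), rng.random()], label

train = [sample(lbl) for lbl in ["CRC", "control"] * 20]
test = [sample(lbl) for lbl in ["CRC", "control"] * 10]
selected, best_acc = backward_elimination(train, test, n_features=3)
print(selected, best_acc)
```

On this toy data the loop is expected to discard the two noise features and keep feature 0. chopin2 evaluates each elimination step with its own classifier and validation scheme, which this sketch does not reproduce.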

The deposited files are described below:

  • datasets.tar.gz: the input datasets for chopin2, obtained by merging the three relative-abundance datasets listed above, including versions stratified by age and sex (prefix RA). Binarized versions of the same datasets are also included (prefix BIN);
  • hd-models.tar.gz: the output of the feature selection performed with chopin2 (powered by hdlib) on both the relative-abundance (RA) and binary (BIN) datasets;
  • ml-models.tar.gz: the results of feature selection with classical wrapper-based techniques (Random Forest, Decision Tree, Support Vector Machine, Logistic Regression, and Extreme Gradient Boosting), together with a Python 3.8 script to reproduce them.
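The record does not state how the BIN tables were derived from the RA ones. A common choice for microbial profiles is presence/absence (1 if a species' relative abundance is above zero, 0 otherwise); the sketch below applies that rule with Python's standard csv module. The rule, the threshold, the column layout (one row per sample, with hypothetical sample_id and class columns copied unchanged), the function name, and the toy values are all assumptions, not a description of how the deposited files were produced.

```python
import csv
import io

def binarize_profiles(src, dst, keep_columns=("sample_id", "class"), threshold=0.0):
    """Write a presence/absence copy of a relative-abundance table.

    Columns listed in keep_columns are copied as-is (hypothetical names);
    every other column is treated as a species abundance and becomes
    1 if strictly above `threshold`, else 0.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        out = {}
        for col, value in row.items():
            if col in keep_columns:
                out[col] = value
            else:
                out[col] = 1 if float(value) > threshold else 0
        writer.writerow(out)

# Toy relative-abundance table with illustrative species columns.
ra = io.StringIO(
    "sample_id,class,s__Fusobacterium_nucleatum,s__Bacteroides_fragilis\n"
    "S1,CRC,0.52,0.0\n"
    "S2,control,0.0,1.3\n"
)
bin_out = io.StringIO()
binarize_profiles(ra, bin_out)
print(bin_out.getvalue())
```

Passing file handles opened with newline="" instead of StringIO objects applies the same transformation to a CSV on disk.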

Please note that the datasets RA__ThomasAM__species.csv and BIN__ThomasAM__species.csv are also included in the datasets.tar.gz archive.

Files (1.4 MB)

  • 198.3 kB, md5:1402c25e968381d789d9a6639f131abc
  • 326.4 kB, md5:ff0d14c6866c5f1ae24b02397159d0a7
  • 25.2 kB, md5:bdc15ddea4d13970d7c2f0ccec00476c
  • 558.3 kB, md5:2c6c6e9a87da7348b56639b497d117f1
  • 260.7 kB, md5:4546e53af85c0b76a1e563b547dcbd81
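Each listed checksum can be compared against a downloaded copy with Python's standard hashlib module. This is a minimal sketch; the helper names md5sum and verify are hypothetical, and the pairing between a file and its checksum must be taken from the record page.

```python
import hashlib

def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected):
    """Compare a file's MD5 digest with a checksum such as "md5:1402c2..."."""
    # Zenodo prefixes checksums with the algorithm name; keep only the hex part.
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5sum(path) == expected_hex
```

For example, verify(path, "md5:...") returns True only when the local file matches the listed checksum. Reading in chunks keeps memory use constant even for the larger archives.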

Additional details

Related works

References
Publication: 10.3390/a13090233 (DOI)
Publication: 10.21105/joss.05704 (DOI)
Software: https://github.com/cumbof/chopin2 (URL)
Software: https://github.com/cumbof/hdlib (URL)

Software

Repository URL: https://github.com/cumbof/chopin2
Programming language: Python
Development status: Active

References

  • Cumbo, F. et al. (2020). "A brain-inspired hyperdimensional computing approach for classifying massive DNA methylation data of cancer". Algorithms, 13(9), 233. doi:10.3390/a13090233