Feature selection on microbial profiles of CRC samples with chopin2
- 1. Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, USA
- 2. Department of Engineering, Uninettuno University, Rome, Italy
Description
This Zenodo entry contains the results of a feature selection algorithm, implemented in chopin2 as a backward variable elimination strategy and applied to the MetaPhlAn3 microbial profiles of a public dataset of metagenomic stool samples collected from patients with colorectal cancer (CRC) and from healthy individuals.
The microbial profiles were retrieved with the curatedMetagenomicData R package under the dataset IDs ThomasAM_2018a, ThomasAM_2018b, and ThomasAM_2019_a.
The feature selection algorithm follows a backward variable elimination scheme built on the vector-symbolic architecture (also known as hyperdimensional computing) described in Cumbo et al. (2020).
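The general idea of backward elimination driven by a hyperdimensional classifier can be illustrated with the pure-Python sketch below. It is not chopin2's implementation: the bipolar (+1/-1) encoding, the quantisation of values into levels, the hypervector size, the train/test split, and the stopping rule (stop as soon as every single-feature removal lowers accuracy) are all assumptions made for illustration, the function names are hypothetical, and the toy data are synthetic rather than taken from the deposited datasets.

```python
import random


def make_level_vectors(n_levels, dim, rng):
    """Bipolar level hypervectors: adjacent levels are similar, the two extremes ~orthogonal."""
    order = list(range(dim))
    rng.shuffle(order)
    step = dim // (2 * (n_levels - 1))  # positions flipped between adjacent levels
    levels = [[rng.choice((-1, 1)) for _ in range(dim)]]
    for k in range(1, n_levels):
        vec = levels[-1][:]
        for pos in order[(k - 1) * step : k * step]:
            vec[pos] = -vec[pos]
        levels.append(vec)
    return levels


def encode(sample, features, ids, levels, lo, hi):
    """Bundle (sum) the bindings ID_f * LEVEL(value_f) over the selected features."""
    n = len(levels)
    acc = [0] * len(levels[0])
    for f in features:
        span = (hi[f] - lo[f]) or 1.0
        idx = min(n - 1, max(0, int((sample[f] - lo[f]) / span * (n - 1))))
        for j, (i_bit, l_bit) in enumerate(zip(ids[f], levels[idx])):
            acc[j] += i_bit * l_bit
    return acc


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return dot / norm if norm else 0.0


def accuracy(features, train, test, ids, levels, lo, hi):
    """Build one prototype per class from `train`, then score nearest-prototype predictions on `test`."""
    protos = {}
    for x, y in train:
        enc = encode(x, features, ids, levels, lo, hi)
        proto = protos.setdefault(y, [0] * len(enc))
        for j, v in enumerate(enc):
            proto[j] += v
    hits = 0
    for x, y in test:
        enc = encode(x, features, ids, levels, lo, hi)
        hits += max(protos, key=lambda c: cosine(enc, protos[c])) == y
    return hits / len(test)


def backward_elimination(train, test, n_features, dim=1000, n_levels=10, seed=0):
    rng = random.Random(seed)
    ids = [[rng.choice((-1, 1)) for _ in range(dim)] for _ in range(n_features)]
    levels = make_level_vectors(n_levels, dim, rng)
    lo = [min(x[f] for x, _ in train) for f in range(n_features)]
    hi = [max(x[f] for x, _ in train) for f in range(n_features)]
    selected = list(range(n_features))
    best = accuracy(selected, train, test, ids, levels, lo, hi)
    while len(selected) > 1:
        # Score the model after dropping each remaining feature in turn.
        trials = [
            (accuracy([g for g in selected if g != f], train, test, ids, levels, lo, hi), f)
            for f in selected
        ]
        score, drop = max(trials)
        if score < best:  # every possible removal hurts accuracy: stop
            break
        best = score
        selected.remove(drop)
    return selected, best


# Synthetic toy data: feature 0 separates the classes, features 1-3 are uniform noise.
data_rng = random.Random(1)


def toy_sample(label):
    f0 = data_rng.uniform(0.0, 0.3) if label == 0 else data_rng.uniform(0.7, 1.0)
    return [f0] + [data_rng.random() for _ in range(3)], label


train = [toy_sample(i % 2) for i in range(40)]
test = [toy_sample(i % 2) for i in range(20)]
selected, best = backward_elimination(train, test, n_features=4, dim=500)
print(selected, best)
```

On this toy problem the informative feature 0 should survive while the noise features are the candidates for removal; ties in accuracy are resolved in favour of removing a feature, which keeps the selected set small.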
The deposited data are described below:
- datasets.tar.gz: the chopin2 input datasets, obtained by merging the three relative-abundance datasets mentioned above, including versions stratified by age and sex (prefix RA), together with their binarized counterparts (prefix BIN);
- hd-models.tar.gz: the output of the feature selection performed with chopin2 on both the relative-abundance (RA) and the binary (BIN) datasets;
- ml-models.tar.gz: the results of the feature selection performed with classical wrapper-based techniques (i.e., Random Forest, Decision Tree, Support Vector Machine, Logistic Regression, and Extreme Gradient Boosting), together with a Python 3.8 script to reproduce them.
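How a relative-abundance (RA) table can be turned into a binary (BIN) one is sketched below. This is an assumption-laden illustration, not the procedure used for this record: it assumes presence/absence binarization (1 if the abundance exceeds a threshold of 0, else 0) and a hypothetical CSV layout with one sample per row, one species per column, and metadata columns named `sample` and `label`; the species names and values in the toy table are purely illustrative.

```python
import csv
import io


def binarize_profiles(src, dst, passthrough=("sample", "label"), threshold=0.0):
    """Rewrite a relative-abundance table as presence/absence: 1 if value > threshold, else 0.

    Assumed (hypothetical) layout: one sample per row, one species per column,
    plus the metadata columns named in `passthrough`, which are copied unchanged.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        writer.writerow({
            col: value if col in passthrough else int(float(value) > threshold)
            for col, value in row.items()
        })


# Toy RA-style table with two samples and two species (illustrative values only).
ra_table = io.StringIO(
    "sample,label,s__Fusobacterium_nucleatum,s__Bacteroides_fragilis\n"
    "S1,CRC,0.52,0.0\n"
    "S2,control,0.0,1.3\n"
)
bin_table = io.StringIO()
binarize_profiles(ra_table, bin_table)
print(bin_table.getvalue())
```

Streaming row by row with `csv.DictReader`/`csv.DictWriter` keeps memory use flat, and a non-zero `threshold` can be passed if very low abundances should be treated as absent.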
Please note that the datasets RA__ThomasAM__species.csv and BIN__ThomasAM__species.csv are also included in the datasets.tar.gz archive.
Files
Total size: 16.3 MB

MD5 checksum | Size |
---|---|
1402c25e968381d789d9a6639f131abc | 198.3 kB |
e632a3f4782f46daae7cf33b249c826d | 325.1 kB |
18bc74af3ae5f4458d3e6d5cf7528286 | 14.9 MB |
c0db7ea44c2af19605b37457aa5a0d9a | 562.6 kB |
4546e53af85c0b76a1e563b547dcbd81 | 260.7 kB |
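A downloaded file can be checked against the MD5 checksums listed above. The minimal sketch below uses only Python's standard `hashlib` module; the helper names `md5sum` and `matches` are illustrative, and the expected checksum for a given file has to be taken from the Zenodo record.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected):
    """True if the file's MD5 digest equals the expected checksum (case-insensitive)."""
    return md5sum(path) == expected.strip().lower()


# Example: print the digest of a downloaded archive and compare it with the listed checksums.
# print(md5sum("datasets.tar.gz"))
```

Reading in chunks matters for the larger archives (the biggest file is 14.9 MB); on Linux, the GNU coreutils command `md5sum datasets.tar.gz` produces the same digest.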
Additional details
References
- Cumbo, Fabio et al. (2020). "A brain-inspired hyperdimensional computing approach for classifying massive DNA methylation data of cancer". Algorithms (MDPI).