Usage and examples¶
The command line interface is the recommended way to use neuropredict, given its focus on batch processing of multiple comparisons. If the installation was successful, the available options can be obtained by typing one of the following commands:
neuropredict
neuropredict -h
Those options are also shown below (they may occasionally be missing due to problems with the automatic documentation generator). Examples are at the bottom of this page.
Easy, standardized and comprehensive predictive analysis.
usage: neuropredict [-h] [-m META_FILE] [-o OUT_DIR] [-f FS_SUBJECT_DIR]
[-y PYRADIGM_PATHS [PYRADIGM_PATHS ...]]
[-u USER_FEATURE_PATHS [USER_FEATURE_PATHS ...]]
[-d DATA_MATRIX_PATHS [DATA_MATRIX_PATHS ...]]
[-a ARFF_PATHS [ARFF_PATHS ...]] [-p POSITIVE_CLASS]
[-t TRAIN_PERC] [-n NUM_REP_CV]
[-k NUM_FEATURES_TO_SELECT]
[-s [SUB_GROUPS [SUB_GROUPS ...]]]
[-g {none,light,exhaustive}]
[-e {randomforestclassifier,extratreesclassifier}]
[-z MAKE_VIS] [-c NUM_PROCS] [-v]
Named Arguments¶
-m, --meta_file
Absolute path to the file containing metadata for the subjects to be included in the analysis. At a minimum, each row must list a subject id followed by the class it belongs to, e.g.:
sub001,control
sub002,control
sub003,disease
sub004,disease
-o, --out_dir
Output folder to store gathered features and results.
-f, --fs_subject_dir
Absolute path to the Freesurfer SUBJECTS_DIR containing the processed subjects.
Input data and formats¶
Only one of the following types can be specified.
-y, --pyradigm_paths
Path(s) to pyradigm datasets. Each path is a self-contained dataset identifying each sample, its class and its features.
-u, --user_feature_paths
List of absolute paths to the user's own features. Format: each of these folders contains a separate sub-folder for each subject (named after its ID in the metadata file), which in turn contains a file called features.txt with one number per line. All subjects (within a given folder) must have the same number of features (lines in the file). Different parent folders (each describing one feature set) can have different numbers of features per subject, but they must all contain the same set of subjects (sub-folders). The name of each folder is used to annotate the results and visualizations, so name them uniquely and meaningfully, keeping in mind that these figures will be included in your papers. For example:
--user_feature_paths /project/fmri/ /project/dti/ /project/t1_volumes/
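To make the expected on-disk layout concrete, here is a minimal sketch of a reader for this format. The helper name `read_user_features` and its error handling are illustrative only, not part of neuropredict's API:

```python
from pathlib import Path


def read_user_features(feature_dir, subject_ids):
    """Read one features.txt (one number per line) per subject sub-folder.

    Layout assumed from the description above:
        feature_dir/<subject_id>/features.txt
    """
    features = {}
    for sid in subject_ids:
        fpath = Path(feature_dir) / sid / 'features.txt'
        with open(fpath) as fobj:
            features[sid] = [float(line) for line in fobj if line.strip()]
    # all subjects within one folder must have the same number of features
    if len({len(vals) for vals in features.values()}) != 1:
        raise ValueError('unequal feature counts across subjects')
    return features
```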
-d, --data_matrix_paths
List of absolute paths to text files, each containing one matrix of size N x p (num_samples x num_features). Each row in a data matrix file must contain the data for the sample in the corresponding row of the metadata file (the metadata file and the data matrix must be in row-wise correspondence). The name of each file is used to annotate the results and visualizations. E.g.:
--data_matrix_paths /project/fmri.csv /project/dti.csv /project/t1_volumes.csv
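A sketch of loading and sanity-checking such a matrix with NumPy; the helper name `load_data_matrix` is hypothetical, and a comma delimiter is assumed as in the example above:

```python
import numpy as np


def load_data_matrix(matrix_path, num_samples):
    """Load an N x p data matrix and check it matches the metadata row count.

    Assumes a comma-separated file, one sample per row, in the same
    row order as the metadata file (a sketch, not neuropredict internals).
    """
    data = np.loadtxt(matrix_path, delimiter=',', ndmin=2)
    if data.shape[0] != num_samples:
        raise ValueError('matrix has %d rows but metadata lists %d samples'
                         % (data.shape[0], num_samples))
    return data
```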
-a, --arff_paths
List of paths to files saved in Weka's ARFF dataset format.
Cross-validation¶
Parameters related to training and optimization during cross-validation
-p, --positive_class
Name of the positive class (e.g. Alzheimers, MCI) to be used in the calculation of the area under the ROC curve. Applicable only to binary classification experiments. Default: the class appearing last in the order specified in the metadata file.
-t, --train_perc
Percentage of the smallest class to be reserved for training. Must be in the interval [0.01, 0.99]. If the sample size is sufficiently large, we recommend 0.5. If the sample size is small, or the class imbalance is high, choose 0.8.
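The training set size implied by --train_perc can be worked out as follows. This is a sketch of the idea (training samples are drawn per class, sized by the smallest class to keep classes balanced); the function name is illustrative:

```python
import math


def training_size_per_class(class_sizes, train_perc):
    """Number of training samples drawn from each class, assuming the
    training set is balanced by reserving train_perc of the smallest class
    (a sketch of the scheme, not neuropredict's internal code)."""
    if not 0.01 <= train_perc <= 0.99:
        raise ValueError('train_perc must be in [0.01, 0.99]')
    return int(math.floor(train_perc * min(class_sizes)))
```

For the three-class example later on this page (5, 6 and 9 subjects), `training_size_per_class([5, 6, 9], 0.5)` gives 2 training subjects per class.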
-n, --num_rep_cv
Number of repetitions of the repeated-holdout cross-validation. The larger the number, the more stable the estimates will be.
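The repeated-holdout scheme itself is simple: shuffle, split, repeat. A minimal sketch (illustrative helper, not neuropredict's internal code; the seed parameter is an assumption for reproducibility):

```python
import random


def repeated_holdout_splits(sample_ids, train_perc, num_rep, seed=0):
    """Yield (train, test) splits for repeated hold-out cross-validation."""
    rng = random.Random(seed)
    n_train = int(len(sample_ids) * train_perc)
    for _ in range(num_rep):
        shuffled = list(sample_ids)
        rng.shuffle(shuffled)
        # each repetition uses a fresh random partition of the samples
        yield shuffled[:n_train], shuffled[n_train:]
```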
-k, --num_features_to_select
Number of features to select as part of feature selection.
-s, --sub_groups
This option allows the user to study different combinations of classes in a multi-class (N>2) dataset. For example, in a dataset with the 3 classes CN, FTD and AD, two pair-wise combinations can be studied separately with the flag -s CN,FTD CN,AD. Format: different subgroups must be separated by spaces, and each sub-group must be a comma-separated list of class names defined in the metadata file. Hence it is strongly recommended to use class names without spaces, commas, hyphens or special characters, ideally just alphanumeric characters separated by underscores. Any number of subgroups can be specified, but each subgroup must have at least two distinct classes.
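The sub-group format described above (space-separated groups, each a comma-separated class list) can be parsed and validated as in the following sketch; the helper name `parse_sub_groups` is illustrative, not neuropredict's API:

```python
def parse_sub_groups(specs, known_classes):
    """Parse -s/--sub_groups values, e.g. ['CN,FTD', 'CN,AD'].

    Each spec must name at least two distinct classes, all of which
    must be defined in the metadata file.
    """
    groups = []
    for spec in specs:
        classes = [cls.strip() for cls in spec.split(',')]
        if len(set(classes)) < 2:
            raise ValueError('sub-group needs two distinct classes: %r' % spec)
        unknown = set(classes) - set(known_classes)
        if unknown:
            raise ValueError('unknown class names: %s' % ', '.join(sorted(unknown)))
        groups.append(classes)
    return groups
```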
-g, --gs_level
Possible choices: none, light, exhaustive. Flag to specify the level of grid search during hyper-parameter optimization on the training set, in increasing order of the number of parameters and values to be optimized. More parameters and more values demand more resources and a much longer time for optimization.
Predictive Model¶
Parameters related to pipeline comprising the predictive model
-e, --classifier
Possible choices: randomforestclassifier, extratreesclassifier. String specifying one of the implemented classifiers. (The classifiers are carefully chosen to allow for the comprehensive report provided by neuropredict.) Default: 'RandomForestClassifier'. More options will be implemented in due course.
Visualization¶
Parameters related to generating visualizations
-z, --make_vis
Option to make visualizations from existing results in the given path. This is helpful when neuropredict fails to generate the result figures automatically, e.g. on an HPC cluster or in another environment where a DISPLAY is not available.
Computing¶
Parameters related to computations/debugging
-c, --num_procs
Number of CPUs to use to parallelize the CV repetitions. Default: 4. The number of CPUs will be capped at the number available on the machine if a higher number is requested.
-v, --version
show program's version number and exit
A rough example of usage:
neuropredict -m meta_data.csv -f /work/project/features_dir
Example for meta-data
For example, if you have a dataset with the following three classes (5 controls, 6 disease_one and 9 other_disease), all you need to do is produce a metadata file as shown below, specifying a class label for each subject:
3071,controls
3069,controls
3064,controls
3063,controls
3057,controls
5004,disease_one
5074,disease_one
5077,disease_one
5001,disease_one
5002,disease_one
5003,disease_one
5000,other_disease
5006,other_disease
5013,other_disease
5014,other_disease
5016,other_disease
5018,other_disease
5019,other_disease
5021,other_disease
5022,other_disease
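A metadata file like the one above is plain two-column CSV, so it is easy to generate or check programmatically. A small sketch of a reader that also tallies the class sizes (the helper name `read_meta_data` is illustrative, not neuropredict's API):

```python
import csv
from collections import Counter


def read_meta_data(meta_path):
    """Read the two-column metadata file: subject_id,class_label per row."""
    ids, labels = [], []
    with open(meta_path) as fobj:
        for row in csv.reader(fobj):
            if not row:
                continue  # skip blank lines
            ids.append(row[0].strip())
            labels.append(row[1].strip())
    return ids, Counter(labels)
```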
and neuropredict will produce the figures (and the corresponding numbers in CSV files) as shown here:

The higher resolution PDFs are included in the docs folder.
The typical output on the command line would look something like:
neuropredict -y *.MLDataset.pkl -m meta_FourClasses.csv -o ./predictions -t 0.75 -n 250
Requested features for analysis:
get_pyradigm from chebyshev.MLDataset.pkl
get_pyradigm from chebyshev_neg.MLDataset.pkl
get_pyradigm from chi_square.MLDataset.pkl
get_pyradigm from correlate_1.MLDataset.pkl
get_pyradigm from correlate.MLDataset.pkl
get_pyradigm from cosine_1.MLDataset.pkl
get_pyradigm from cosine_2.MLDataset.pkl
get_pyradigm from cosine_alt.MLDataset.pkl
get_pyradigm from cosine.MLDataset.pkl
get_pyradigm from euclidean.MLDataset.pkl
get_pyradigm from fidelity_based.MLDataset.pkl
Different classes in the training set are stratified to match the smallest class!
CV repetition 0
feature 0 weight_chebyshev : balanced accuracy: 0.3018
feature 1 weight_chebyshev_neg : balanced accuracy: 0.2917
feature 2 weight_chi_square : balanced accuracy: 0.2603
feature 3 weight_correlate_1 : balanced accuracy: 0.3271
feature 4 weight_correlate : balanced accuracy: 0.3647
feature 5 weight_cosine_1 : balanced accuracy: 0.3202
feature 6 weight_cosine_2 : balanced accuracy: 0.2869
feature 7 weight_cosine_alt : balanced accuracy: 0.3656
feature 8 weight_cosine : balanced accuracy: 0.3197
feature 9 weight_euclidean : balanced accuracy: 0.2579
feature 10 weight_fidelity_based : balanced accuracy: 0.1190
CV repetition 1
feature 0 weight_chebyshev : balanced accuracy: 0.3416
feature 1 weight_chebyshev_neg : balanced accuracy: 0.3761
feature 2 weight_chi_square : balanced accuracy: 0.3748
feature 3 weight_correlate_1 : balanced accuracy: 0.3397
feature 4 weight_correlate : balanced accuracy: 0.4087
feature 5 weight_cosine_1 : balanced accuracy: 0.3074
feature 6 weight_cosine_2 : balanced accuracy: 0.4059
feature 7 weight_cosine_alt : balanced accuracy: 0.3658
feature 8 weight_cosine : balanced accuracy: 0.3290
feature 9 weight_euclidean : balanced accuracy: 0.2662
feature 10 weight_fidelity_based : balanced accuracy: 0.2090
CV repetition 2
. . . .
. . . .
. . . .
CV repetition n
pyradigm, used above, is a Python class to ease your ML workflow; check it out at pyradigm.readthedocs.io.
I hope this user-friendly tool helps you get started on the predictive analysis you've been wanting to do for a while.