Usage and examples

The command line interface for neuropredict (the preferred interface, given its target use on HPC clusters) is shown below. Check the bottom of this page for examples.

usage: neuropredict [-h] -m META_FILE -o OUT_DIR [-f FS_SUBJECT_DIR]
                    [-y PYRADIGM_PATHS [PYRADIGM_PATHS ...] | -u
                    USER_FEATURE_PATHS [USER_FEATURE_PATHS ...] | -d
                    DATA_MATRIX_PATHS [DATA_MATRIX_PATHS ...]]
                    [-p POSITIVE_CLASS] [-t TRAIN_PERC] [-n NUM_REP_CV]
                    [-k NUM_FEATURES_TO_SELECT] [-a ATLASID]
                    [-s [SUB_GROUPS [SUB_GROUPS ...]]]

Named Arguments

-m, --meta_file
 Absolute path to the file containing metadata for the subjects to be included in the analysis. At the minimum, each row must contain a subject ID followed by the class it belongs to.

E.g.:

sub001,control
sub002,control
sub003,disease
sub004,disease
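
If your subject IDs and class labels already live in Python, one way to generate such a file (the IDs and class names below are hypothetical stand-ins):

    # map each subject ID to its class label (hypothetical values)
    subjects = {'sub001': 'control', 'sub002': 'control',
                'sub003': 'disease', 'sub004': 'disease'}

    with open('meta_data.csv', 'w') as meta_file:
        for subj_id, class_label in subjects.items():
            meta_file.write('{},{}\n'.format(subj_id, class_label))
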
-o, --out_dir
 Output folder to store the gathered features & results.
-f, --fs_subject_dir
 Absolute path to the SUBJECTS_DIR containing the finished runs of FreeSurfer parcellation (each subject named after its ID in the metadata file). E.g. --fs_subject_dir /project/freesurfer_v5.3
-y, --pyradigm_paths
 Path(s) to pyradigm datasets. Each path is a self-contained dataset identifying each sample, its class and its features.
-u, --user_feature_paths

List of absolute paths to the user's own features.

Format: Each of these folders contains a separate folder for each subject (named after its ID in the metadata file), which in turn contains a file called features.txt with one number per line. All the subjects (in a given folder) must have the same number of features (#lines in the file). Different parent folders (each describing one feature set) can have a different number of features per subject, but they must all contain the same number of subjects (folders) within them.

The name of each folder is used to annotate the results in visualizations. Hence name them uniquely and meaningfully, keeping in mind these figures will be included in your papers. For example,

--user_feature_paths /project/fmri/ /project/dti/ /project/t1_volumes/

Only one of the --pyradigm_paths, --user_feature_paths and --data_matrix_paths options can be specified.
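
A minimal sketch of creating this layout with numpy (the paths, IDs and feature count below are hypothetical stand-ins):

    import os
    import numpy as np

    feature_dir = 'fmri'                # one parent folder per feature set
    subject_ids = ['sub001', 'sub002']  # IDs matching the meta data file
    num_features = 100                  # same for every subject in this folder

    for subj_id in subject_ids:
        subj_dir = os.path.join(feature_dir, subj_id)
        os.makedirs(subj_dir, exist_ok=True)
        features = np.random.rand(num_features)  # stand-in for real features
        # one number per line, as expected in features.txt
        np.savetxt(os.path.join(subj_dir, 'features.txt'), features)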

-d, --data_matrix_paths

List of absolute paths to text files each containing one matrix of size N x p (num_samples x num_features). Each row in a data matrix file must represent the data for the sample in the same row of the meta data file (the meta data file and the data matrix must be in row-wise correspondence). The name of each file will be used to annotate the results and visualizations.

E.g. --data_matrix_paths /project/fmri.csv /project/dti.csv /project/t1_volumes.csv

Only one of the --pyradigm_paths, --user_feature_paths and --data_matrix_paths options can be specified. The file format can be

  • a simple comma-separated text file (with extension .csv or .txt), which can easily be read back with numpy.loadtxt(filepath, delimiter=','), or
  • a numpy array saved to disk (with extension .npy or .numpy), which can be read in with numpy.load(filepath).

One could use numpy.savetxt(filepath, data_array, delimiter=',') or numpy.save(filepath, data_array) to save the features. The file format is inferred from the extension.
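
For instance, a round trip in both formats (the filenames and matrix below are placeholders):

    import numpy as np

    # stand-in matrix: N samples x p features, rows in the same order
    # as the meta data file
    data_array = np.random.rand(20, 50)

    # comma-separated text (.csv or .txt)
    np.savetxt('fmri.csv', data_array, delimiter=',')
    data_csv = np.loadtxt('fmri.csv', delimiter=',')

    # numpy binary (.npy)
    np.save('fmri.npy', data_array)
    data_npy = np.load('fmri.npy')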

-p, --positive_class
 Name of the positive class (e.g. Alzheimers, MCI or Parkinsons) to be used in the calculation of the area under the ROC curve. Applicable only to binary classification experiments. Default: the class appearing second in the order specified in the meta data file.
-t, --train_perc
 Percentage of the smallest class to be reserved for training. Must be in the interval [0.01, 0.99]. If the sample size is sufficiently big, we recommend 0.5. If the sample size is small, or the class imbalance is high, choose 0.8.
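
A rough illustration of how train_perc interacts with class sizes (the exact rounding inside neuropredict may differ); the class sizes are from the example meta data file shown later on this page:

    # class sizes from the example meta data file: 5, 6 and 9
    class_sizes = {'controls': 5, 'disease_one': 6, 'other_disease': 9}
    train_perc = 0.8  # recommended for small or imbalanced samples

    # training is stratified to the smallest class, so roughly
    # train_perc * smallest samples per class end up in training
    smallest = min(class_sizes.values())
    train_per_class = int(train_perc * smallest)
    print(train_per_class)  # 4
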
-n, --num_rep_cv
 Number of repetitions of the repeated-holdout cross-validation. The larger the number, the better the estimates will be.
-k, --num_features_to_select

Number of features to select as part of feature selection. Options:

  • ‘tenth’
  • ‘sqrt’
  • ‘log2’
  • ‘all’

Default: 'tenth' of the number of samples in the training set. For example, if your dataset has 90 samples and you choose 50 percent for training (the default), the training set will have 90*0.5=45 samples, leading to 5 features being selected for training. If you choose a fixed integer, ensure all the feature sets under evaluation have at least that many features.
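
A rough illustration of how these options could map to feature counts, assuming each is computed from the training-set size like the default (the exact rules inside neuropredict may differ):

    import numpy as np

    train_size = 45     # e.g. 90 samples with train_perc = 0.5
    num_features = 200  # hypothetical size of one feature set

    counts = {'tenth': int(np.ceil(train_size / 10.0)),    # 5
              'sqrt' : int(np.ceil(np.sqrt(train_size))),  # 7
              'log2' : int(np.ceil(np.log2(train_size))),  # 6
              'all'  : num_features}                       # every feature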

-a, --atlas
 Name of the atlas to use for visualization. Default: fsaverage, if available.
-s, --sub_groups

This option allows the user to study different combinations of classes in a multi-class (N>2) dataset. For example, in a dataset with the 3 classes CN, FTD and AD, two of the pair-wise combinations can be studied separately with the following flag: --sub_groups CN,FTD CN,AD. This allows the user to focus on a few interesting subgroups depending on their dataset/goal.

Format: Different subgroups must be separated by space, and each subgroup must be a comma-separated list of class names defined in the meta data file. Hence it is strongly recommended to use class names without any spaces, commas, hyphens or special characters, ideally just alphanumeric characters separated by underscores. Any number of subgroups can be specified, but each subgroup must have at least two distinct classes.

Default: 'all', leading to the inclusion of all available classes in an all-vs-all multi-class setting.
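
An illustration of the subgroup format in Python (the class names are from the example above):

    # as passed on the command line: --sub_groups CN,FTD CN,AD
    sub_groups = ['CN,FTD', 'CN,AD']

    for spec in sub_groups:
        classes = spec.split(',')  # comma-separated class names
        assert len(set(classes)) >= 2, 'each subgroup needs two distinct classes'
        print('experiment with classes:', classes)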

A rough example of usage can be:

neuropredict -m meta_data.csv -f /work/project/features_dir

Example for meta-data

For example, if you have a dataset with the following three classes: 5 controls, 6 disease_one and 9 other_disease, all you would need to do is produce a meta data file as shown below (specifying a class label for each subject):
3071,controls
3069,controls
3064,controls
3063,controls
3057,controls
5004,disease_one
5074,disease_one
5077,disease_one
5001,disease_one
5002,disease_one
5003,disease_one
5000,other_disease
5006,other_disease
5013,other_disease
5014,other_disease
5016,other_disease
5018,other_disease
5019,other_disease
5021,other_disease
5022,other_disease

and neuropredict will produce the figures (and the numbers in CSV files) as shown here:

[Figure: composite flyer (composite_flyer.001.png) showing the visualizations produced by neuropredict]

The higher resolution PDFs are included in the docs folder.

The typical output on the command line would look something like:

neuropredict -y *.MLDataset.pkl -m meta_FourClasses.csv -o ./predictions -t 0.75 -n 250

Requested features for analysis:
get_pyradigm from chebyshev.MLDataset.pkl
get_pyradigm from chebyshev_neg.MLDataset.pkl
get_pyradigm from chi_square.MLDataset.pkl
get_pyradigm from correlate_1.MLDataset.pkl
get_pyradigm from correlate.MLDataset.pkl
get_pyradigm from cosine_1.MLDataset.pkl
get_pyradigm from cosine_2.MLDataset.pkl
get_pyradigm from cosine_alt.MLDataset.pkl
get_pyradigm from cosine.MLDataset.pkl
get_pyradigm from euclidean.MLDataset.pkl
get_pyradigm from fidelity_based.MLDataset.pkl
Different classes in the training set are stratified to match the smallest class!

 CV repetition   0
     feature   0      weight_chebyshev : balanced accuracy: 0.3018
     feature   1  weight_chebyshev_neg : balanced accuracy: 0.2917
     feature   2     weight_chi_square : balanced accuracy: 0.2603
     feature   3    weight_correlate_1 : balanced accuracy: 0.3271
     feature   4      weight_correlate : balanced accuracy: 0.3647
     feature   5       weight_cosine_1 : balanced accuracy: 0.3202
     feature   6       weight_cosine_2 : balanced accuracy: 0.2869
     feature   7     weight_cosine_alt : balanced accuracy: 0.3656
     feature   8         weight_cosine : balanced accuracy: 0.3197
     feature   9      weight_euclidean : balanced accuracy: 0.2579
     feature  10 weight_fidelity_based : balanced accuracy: 0.1190

 CV repetition   1
     feature   0      weight_chebyshev : balanced accuracy: 0.3416
     feature   1  weight_chebyshev_neg : balanced accuracy: 0.3761
     feature   2     weight_chi_square : balanced accuracy: 0.3748
     feature   3    weight_correlate_1 : balanced accuracy: 0.3397
     feature   4      weight_correlate : balanced accuracy: 0.4087
     feature   5       weight_cosine_1 : balanced accuracy: 0.3074
     feature   6       weight_cosine_2 : balanced accuracy: 0.4059
     feature   7     weight_cosine_alt : balanced accuracy: 0.3658
     feature   8         weight_cosine : balanced accuracy: 0.3290
     feature   9      weight_euclidean : balanced accuracy: 0.2662
     feature  10 weight_fidelity_based : balanced accuracy: 0.2090

 CV repetition   2
 . . . .
 . . . .
 . . . .
 CV repetition   n

pyradigm here is the Python class to ease your ML workflow; check it out at pyradigm.readthedocs.io

I hope this user-friendly tool helps you get started on the predictive analysis you've been wanting to do for a while.