﻿Setting Up Group Analysis
==========================
Overview
^^^^^^^^

C-PAC uses `FSL/FEAT <http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FEAT/UserGuide>`_ to compare findings across groups.  You can construct models using a subject list and a phenotype file, select derivatives to be predicted by the model, and define contrasts between conditions using either the GUI or a custom csv file.  Then FSL/FEAT will run a second-level `General Linear Model (GLM) <http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FEAT/UserGuide#Appendix_A:_Brief_Overview_of_GLM_Analysis>`_ for you.

The following links provide an introduction to how groups are compared using FSL, as well as how to define contrasts:

* http://www.fmrib.ox.ac.uk/fslcourse/lectures/feat1_part2.pdf

The following presentation also gives a good overview of the group analysis user interface:

* https://docs.google.com/presentation/d/1cJVNeNSK8Uy8UTzN6mMG4YR5YLqnpV4HopqsKZbma5k/pub?start=false&loop=false&delayms=10000#slide=id.p

The example files used in the presentation above are also available below for your perusal:

* A Subject List 
* A Phenotype File 

As with subject list and pipeline configuration, there are two ways to set up FSL group analysis for C-PAC:

* Using a text editor (useful for remote servers where using the GUI is not possible or impractical)
* Using the pipeline configuration interface in the GUI

Using a Text Editor
^^^^^^^^^^^^^^^^^^^^

Similar to the pipeline and data configuration YAMLs for the pipeline and subject list specification steps, you can generate a group analysis configuration as a YAML.  An example of such a file can be found here .  A list of possible keys and values for this YAML are listed below.

.. csv-table::
    :header: "Key","Description","Potential Values"
    :widths: 5,30,15
    :file: _static/params/group_config.csv

The possible values for the items in the *derivatives* key are as follows:

*  For z-scored analyses:

 'alff_to_standard_zstd', 'alff_to_standard_smooth_zstd', 'falff_to_standard_zstd', 'falff_to_standard_smooth_zstd', 'reho_to_standard_zstd', 'reho_to_standard_smooth_zstd', 'sca_roi_to_standard_fisher_zstd', 'sca_roi_to_standard_smooth_fisher_zstd', 'sca_seed_to_standard_fisher_zstd', 'sca_seed_to_standard_smooth_fisher_zstd', 'vmhc_fisher_zstd', 'vmhc_fisher_zstd_zstat_map', 'dr_tempreg_maps_zstat_files_to_standard', 'dr_tempreg_maps_zstat_files_to_standard_smooth', 'sca_tempreg_maps_zstat_files', 'sca_tempreg_maps_zstat_files_smooth', 'centrality_outputs_zstd', 'centrality_outputs_smoothed_zstd'
 
Using the GUI
^^^^^^^^^^^^^^
To configure group-level analysis from the main screen of the GUI, select a pipeline for which you have run preprocessing and individual-level analysis, then click *Edit*.  Then, navigate to *Group Analysis Settings* and click on the *+* symbol.  Here you may select a model configuration YAML file to use by clicking on the folder to the right to select a file or by typing in a path.  Clicking *Create or Load FSL Model* will allow you to specify models via the C-PAC interface, or load a YAML configuration file containing model settings that can then be adjusted interactively.

.. figure:: /_images/ga_main.png

#. **Number of Models to Run Simultaneously - [integer]:** This number depends on computing resources. Choose how many models to run at the same time (parallelization).

#. **Models to Run - [checkboxes]:** Use the + to add FSL Models to be run (or to create new models).

Specifying Models to Run
""""""""""""""""""""""""

.. figure:: /_images/ga_model_setup.png

#. **Participant List - [path]:** Full path to a list of subjects to be included in the model. This should be a text file with one subject per line.  A list in this format containing all subjects run through CPAC was generated along with the main CPAC subject list (see the subject list in `Overview`). Another easy way to manually create this file is to copy the subjects column from your Regressor/EV spreadsheet.

#. **Phenotype/EV File -[path]:** Full path to a .csv file containing EV information for each subject. A file in this format (containing a single column listing all subjects run through CPAC) was generated along with the main CPAC subject list (see the phenotype file in `Overview`).  Levels for categorical variables in this file can be expressed as words ('ADHD'/'TD') or numerical values (0/1) depending on your preferences.

#. **Participant Column Name [text]:** Name of the subjects column in your EV file.

#. **Model Setup - [checkboxes]:** A list of EVs from your phenotype file will populate in this window. From here, you can select whether the EVs should be treated as categorical or if they should be demeaned (continuous/non-categorical EVs only). 'MeanFD' and 'Measure Mean' will also appear in this window automatically as options to be used as regressors that can be included in your model design. Note that the MeanFD and mean of measure values are automatically calculated and supplied by C-PAC via individual-level analysis. Measure mean is calculated using the mean signal from raw data rather than z-score data, which is then demeaned.  Also, MeanFD and mean of measure values are automatically demeaned prior to being inserted into the group analysis model.

#. **Design Matrix Formula - [Patsy formula]:** Specify the formula to describe your model design. Essentially, including EVs in this formula inserts them into the model. The most basic format to include each EV you select would be 'EV + EV + EV + ..', etc. You can also select to include MeanFD, Measure_Mean, or an intercept here (by adding ' + Measure_Mean', ' + MeanFD_<Power/Jenkinson>' or ' + Intercept ' respectively). Note that when you add an intercept to your formula categorical variables will automatically be demeaned since this is a requirement for FLAME to run properly. This design formula is pre-generated for the user depending on the EVs in the phenotype file, but can be edited at any time. C-PAC uses the Python library Patsy to generate the design matrices, so more information on how to format your design formula for specific designs can be found here- Patsy formula documentation .  If you have used R in the past, Patsy's formula syntax should be familiar.  

#. **Custom ROI Mean Mask (optional) - [path]:** Use a binarized mask with one or more ROIs to add averages for those ROIs to the model as EVs.  Mask file must be in NifTI format.  These averages will be calculated using the raw data, rather than z-scored data, and will then be demeaned afterwards.

#. **Select Derivatives - [checkboxes]:** Select which derivatives you would like to include when running group analysis. When including Dual Regression, make sure to correct your P-value for the number of maps you are comparing. When including Multiple Regression SCA, you must have more degrees of freedom (subjects) than there were time series.

#. **Mask for Means Calculation - [Group Mask, Individual Mask]:** C-PAC can add the average voxel intensity for a derivative as an EV in the model.  If this average voxel intensity is present in the model, this menu allows you to select either a group-level or individual-level mask.  Otherwise, this menu can be ignored.


#. **Use z-score Standardized Derivatives - [True, False]:** Run model on a z-score standardized version of individual-level outputs or the raw versions.

#. **Z Threshold - [decimal]:** Only voxels with a Z-score higher than this value will be considered significant.

#. **Cluster Significance Threshold - [decimal]:** Significance threshold (P-value) to use when doing cluster correction for multiple comparisons.

#. **Coding Scheme - [Treatment, Sum]:** Select the encoding for your design matrix.  For more details, see Patsy's pages on Treatment  coding.

#. **Model Group Variances Separately - [Off, On]:** Specify whether FSL should model the variance for each group separately. If this option is enabled, you must specify a grouping variable below.

#. **Grouping Variable - [text]:** The name of the EV that should be used to group subjects when modeling variances. If you do not wish to model group variances separately, set this value to None.

#. **Sessions (Repeated Measures Only) - [dialogue: list of session names]:** Enter the session names in your dataset that you wish to include within the same model (this is for repeated measures/  within-subject designs).  These will be the names listed as "unique_id" in  the original individual-level participant list, or the labels in the original data directories you marked as {session} while creating the C-PAC participant list.  Do not adjust your phenotype CSV- C-PAC will re-formulate it internally before passing it to FLAME.

#. **Series/Scans (Repeated Measures Only) - [dialogue: list of scan names]:**  Enter the series names in your dataset that you wish to include within the same model (this is for repeated measure / within-subject designs).  These will be the labels listed under "rest:" in the original individual-level participant list, or the labels in the original data directories you marked as {series} while creating the C-PAC participant list.  Do not adjust your phenotype CSV- C-PAC will re-formulate it internally before passing it to FLAME. 


Upon populating these fields and clicking `Load Phenotype File`, your model builder will look something like this:

.. figure:: /_images/ga_model_setup_populated.png

Upon making your selections and clicking `Next`, you will be able to define contrasts.

Specifying Contrasts
"""""""""""""""""""""

.. figure:: /_images/ga_contrasts.png

#. **Contrasts - [checkboxes]:** Specify your contrasts in this box.   When the model builder builds the design matrix, it will process the categorical variables appropriately and provide the names of the different levels available as contrast labels (printed in Patsy syntax ) , listed under 'Available Contrasts'. Contrasts are specified as formulas (e.g., C(diagnosis)[ADHD] + C(diagnosis)[TD]) = 0 for an ADHD > TD contrast or age = 0 for an age > 0 contrast).  Note that the way in which you define your model will determine which contrasts are capable of being run.  When you are done specifying contrasts check the contrasts you wish to run.

#. **f-Tests - [checkboxes]:** Define an f-test by selecting two or more contrasts to include.  When you are done, select the f-tests that you wish to run.  

#. **Custom Contrasts Matrix - [path]:** Define contrasts using a custom CSV file.  Instructions for constructing a csv may be found below.  Note that if you choose to use a custom CSV, any of the options specified in the 'Contrasts' and 'f-Tests' boxes will be ignored.

#. **Model Name - [text]:** Specify a name for the new model.

#. **Output Directory - [path]:** Full path to the directory where CPAC should place the model files (.mat, .con, .grp, .csv) and the outputs of group analysis.  The CSV file is a human-readable version of the .mat file used by FLAME that you can use to examine the exact model that C-PAC generated.

When you are done, the contrast screen should look like this:

.. figure:: /_images/ga_contrasts_populated.png

Click `Save Settings` and place your model specification within an appropriate directory.  You are now able to reload it for future use.

Creating a Custom CSV File
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. figure:: /_images/ga_contrast_csv.png

A custom contrasts csv can be used to define contrasts manually rather than using the graphical model builder.  When you create a custom contrasts csv, fill the first cell of the first row with the label 'Contrasts', followed by labels for each of the EVs you wish to use.  The first column should be filled with labels for the contrasts that you can define - these do not have to follow any particular convention, and can be whatever works best for your experiment.  The remainder of the cells can be populated with contrast weights according to your needs.

If you would like to add f-tests, add each f-test as a column to the csv and the assign weights to each contrast to be included in the f-test.

.. figure:: /_images/ga_contrast_ftest.png

