White Matter Microstructure and Macrostructure Brain Charts Across the Human Lifespan - Models
Description
Documentation Table of Contents
- Introduction
- Build and Testing Environment
- Expected Runtime and Memory Usage
- How to Run
- Loading Docker
- Obtaining Centile Curves
- Aligning Out-Of-Sample Datasets
- Expected Outputs
- Test/Example Dataset Tutorial
- Obtaining Centile Curves
- Aligning Out-Of-Sample Datasets
- Appendix
- Creating Your Own Normative Trajectories
- Process Data for Global WM and Normalized Measures
- Information for Tutorial Dataset
Introduction
The Docker available on this page contains the fitted models for macrostructural and microstructural brain charts across the human lifespan (0-100 years of age). Researchers can use this Docker to align their out-of-sample (new) datasets to these brain charts or to extract the normative trajectories.
Build and Testing Environment
The container was built and tested on a machine running Ubuntu 22.04, with 62.5GB of memory and an Intel(R) Xeon(R) W-2255 CPU running at 3.70GHz. The Docker was also tested, and runs successfully, on a RedHat 7.7 OS machine (CPU: Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz).
Expected Runtime and Memory Usage
Use of the Docker should require less than 5GB of memory. Expected runtimes are noted alongside the individual commands below (a few minutes for loading the image and performing alignment; roughly an hour for fitting new normative models).
How to Run
Before doing anything, Docker needs to be installed. Official instructions can be found here: https://docs.docker.com/get-started/get-docker/. Docker must be running properly before proceeding.
NOTE: The Docker image was built on a computer with x86_64 (AMD64) architecture, which is the most common architecture for Linux/PC desktops and servers. However, it may not be usable on systems with ARM-based architectures (such as Apple Silicon Macs with M1, M2, M3, etc. chips). If you encounter an error like exec format error, it likely means the Docker image is not compatible with your system's architecture.
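A quick way to check your machine's architecture is with the standard uname utility:
uname -m
The output x86_64 indicates a compatible machine, whereas arm64 or aarch64 indicates an ARM-based machine. On ARM machines, Docker's --platform linux/amd64 option may allow the image to run under emulation, although performance is not guaranteed.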
Loading Docker
Download the Docker image (provided as a .tar file), and save it somewhere it can easily be located. Now, run the following command:
docker load -i </path/to/docker/.tar/file>
This command should finish within a few minutes. Alternatively, the Docker image can be loaded through the Docker Desktop GUI. Confirm that it is properly loaded by running
docker images
where the name of the Docker image (r_lifespan_env) should be present.
Obtaining Centile Curves
For researchers who wish to examine the normative trajectories of features more closely, we also provide a method for obtaining centiles of trajectories in a CSV format. To do so, run the following command:
docker run --rm \
-v </path/to/output/directory>:/OUTPUTS \
r_lifespan_env \
python3 /WMLifespan/scripts/output_centile_curves.py \
<tract> <measure> /OUTPUTS/centiles.csv
where </path/to/output/directory> is the directory (make sure it is an absolute path!) in which you wish to save the centile CSV file, and centiles.csv is the name of the file in which discrete values of the normative trajectory for the given <tract> and <measure> will be saved. Note that <tract> must be one of the TractSeg-defined tract names found on the TractSeg GitHub page here: https://github.com/MIC-DKFZ/TractSeg, whereas <measure> must be one of {fa-mean, md-mean, ad-mean, rd-mean, volume, surface_area, avg_length}.
Aligning Out-Of-Sample Datasets
One of the most important aspects of brain charts is the ability to score new data within the normative trajectories to determine how abnormal quantitative brain metrics are. For any researchers who would like to use these brain charts, we provide the following tutorial:
1.) Preprocess diffusion MRI (dMRI) data to correct for susceptibility-induced and eddy-current-induced artifacts. We recommend using the PreQual pipeline, as it provides a QA document to determine whether or not the data are acceptable to use (https://github.com/MASILab/PreQual, https://zenodo.org/records/14593034). Instructions for running PreQual can be found in both the GitHub repository and the Zenodo page.
2.) Fit diffusion tensors (dwi2tensor) and obtain FA/MD/AD/RD microstructural maps (tensor2metric) using ONLY the dMRI volumes with b-values less than or equal to 1500 s/mm^2. For consistency, we use MRtrix3 software (v3.0.3), which provides both dwi2tensor and tensor2metric.
3.) Resample the preprocessed dMRI data AND the FA/MD/AD/RD maps to 1mm isotropic voxel sizes. For consistency, we use the MRtrix3 mrgrid command (example: mrgrid dwmri.nii.gz regrid dwmri_1mm_iso.nii.gz -voxel 1). A combined sketch of steps 2.) and 3.) is shown below.
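The following is a minimal sketch of steps 2.) and 3.) using only MRtrix3 tools. The file names (dwmri.nii.gz, dwmri.bvec, dwmri.bval, and the *_1mm_iso outputs) are placeholders for your own data, and the shell values passed to -shells are assumptions that must be matched to your acquisition:
# Keep only the volumes with b-values <= 1500 s/mm^2 (shell values here are assumed)
dwiextract dwmri.nii.gz dwmri_lowb.mif -fslgrad dwmri.bvec dwmri.bval -shells 0,750,1500
# Fit the diffusion tensor and derive the FA/MD/AD/RD maps
dwi2tensor dwmri_lowb.mif tensor.mif
tensor2metric tensor.mif -fa FA_map.nii.gz -adc MD_map.nii.gz -ad AD_map.nii.gz -rd RD_map.nii.gz
# Resample the preprocessed dMRI data and each map to 1mm isotropic voxels
mrgrid dwmri.nii.gz regrid dwmri_1mm_iso.nii.gz -voxel 1
for map in FA MD AD RD; do
  mrgrid ${map}_map.nii.gz regrid ${map}_map_1mm_iso.nii.gz -voxel 1
done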
4.) Run TractSeg (https://github.com/MIC-DKFZ/TractSeg) on the resampled data to obtain the 72 TractSeg-defined white matter tracts as .tck files.
5.) Get microstructural and macrostructural measures for each of the 72 white matter tracts. For consistency, we use scilpy (v1.5.0; https://github.com/scilus/scilpy) to obtain these features. For this version, the scilpy scripts are called scil_evaluate_bundles_individual_measures.py for macrostructural and scil_compute_bundle_mean_std.py for microstructural measures, and the commands are:
scil_compute_bundle_mean_std.py <TRACT>.tck FA_map_1mm_iso.nii.gz MD_map_1mm_iso.nii.gz AD_map_1mm_iso.nii.gz RD_map_1mm_iso.nii.gz --density_weighting --reference=dwmri_1mm_iso.nii.gz > <TRACT>-DTI.json
scil_evaluate_bundles_individual_measures.py <TRACT>.tck <TRACT>-SHAPE.json --reference=dwmri_1mm_iso.nii.gz
where <TRACT> is the name of a TractSeg-defined tract (output after running TractSeg), FA_map_1mm_iso.nii.gz, MD_map_1mm_iso.nii.gz, AD_map_1mm_iso.nii.gz, and RD_map_1mm_iso.nii.gz are the DTI maps from step 2.), and dwmri_1mm_iso.nii.gz is the dMRI data (all resampled according to step 3.)). <TRACT>-DTI.json and <TRACT>-SHAPE.json will be the output files containing microstructural and macrostructural information, respectively.
6.) Before alignment, the data from <TRACT>-DTI.json and <TRACT>-SHAPE.json must be properly formatted in a CSV file that can be read by the Docker image. The CSV is required to have columns age, sex, and diagnosis, where age is a numerical value, sex is a binary variable in which "male" is encoded as 0 and "female" is encoded as 1, and diagnosis is a categorical variable. Typically developing/aging (also referred to as "cognitively normal") participants are encoded as "CN" for diagnosis, and to perform alignment there must be rows in the CSV file that contain "CN" as the diagnosis. For better alignment, include as many "CN" participants as possible; a small number of participants may result in poorly aligned data and thus poorly estimated centile scores. There must also be at least one quantitative variable column in the CSV file, where quantitative variables are named as:
<tract>-<measure>
<tract> must be one of the TractSeg-defined tract names, whereas <measure> must be one of {fa-mean, md-mean, ad-mean, rd-mean, volume, surface_area, avg_length}. Thus, the CSV should follow a format such as:
| age  | sex | diagnosis | AF_left-fa-mean | AF_right-md-mean | …   |
|------|-----|-----------|-----------------|------------------|-----|
| 75.1 | 0   | CN        | 0.453           | 0.00110          | …   |
| 45   | 1   | CN        | 0.562           | 0.00140          | …   |
| 62.5 | 1   | AD        | 0.398           | 0.00098          | …   |
| …    | …   | …         | …               | …                | …   |
Note that in cases where rows have empty entries, centile scores will not be calculated for those metrics, and the corresponding entries in the centile score output will be missing. Further, only rows labeled with a "CN" diagnosis and with non-missing values for a particular measure will be used for estimating the random effect terms (for alignment purposes) for that measure. An optional sanity check for the CSV is sketched below.
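The snippet below prints the header of the CSV and counts the available "CN" rows. It assumes the file is named input.csv and that diagnosis is the third column, as in the example table above:
head -n 1 input.csv
awk -F, 'NR > 1 && $3 == "CN"' input.csv | wc -l    # number of "CN" participants available for alignment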
7.) Run the following Docker command:
docker run --rm -v </path/to/OOS.csv>:/INPUTS/input.csv \
-v </path/to/output/directory>:/OUTPUTS \
r_lifespan_env \
python3 /WMLifespan/scripts/perform_OOS_alignment.py \
/INPUTS/input.csv /OUTPUTS/aligned.csv
where aligned.csv is the destination file in which the aligned centile score values will be saved and input.csv is the structured CSV file from step 6.). The aligned.csv file will contain a new column for each of the metric columns the Docker image could find (which should follow the <tract>-<measure> naming).
NOTE: As detailed in the Methods section, these normative curves are cross-sectional in nature. Thus, researchers performing out-of-sample alignment should only include cross-sectional data in the CSV file, i.e., one scan per participant. Should researchers wish to evaluate longitudinal data with the cross-sectional models, a flag (see the alignment script's --help output) can be used to also save the estimated random effect terms for the dataset. We also note that this alignment to the normative models assumes that the data in the CSV file come from the same primary dataset. Calculation of centile scores for multiple datasets must be done in separate Docker commands, each with its own distinct input CSV file.
Expected Outputs
For obtaining centile trajectories, the CSV will contain one column for the sampled ages, with the remaining columns containing values corresponding to specific centiles across the lifespan at each of the sampled ages.
For the alignment process, the output CSV will be the input CSV, but with new columns corresponding to the aligned centile values for each of the data points (with the column heading <tract>-<metric>_centile_score). Centile scores should be between 0 and 1, where values are percentiles represented as decimals.
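As an illustration of how these columns can be used, the hypothetical awk one-liner below prints the rows of an aligned output (assumed here to be named aligned.csv and to contain an AF_left-volume_centile_score column) whose scores fall outside the 2.5th-97.5th percentile range:
awk -F, 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "AF_left-volume_centile_score") c = i } NR > 1 && c && ($c < 0.025 || $c > 0.975)' aligned.csv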
Test/Example Dataset Tutorial
We have provided example output files, OOS_aligned.csv and example_centiles.csv, and an example input file, MS_dataset.csv, for the purposes of a tutorial on out-of-sample alignment/extracting centile curves AND to verify that the Docker image code is running properly with the correct data.
Obtaining Centile Curves
Create a test output directory. For the purposes of this tutorial, let's call it /home/user/testout. Run the following command to create the directory:
mkdir /home/user/testout
Next, run the following command to get the example trajectory:
docker run --rm \
-v /home/user/testout:/OUTPUTS \
r_lifespan_env \
python3 /WMLifespan/scripts/output_centile_curves.py \
AF_left volume /OUTPUTS/centiles.csv
There should now be a file called /home/user/testout/centiles.csv, with a header that looks like:
ages,male_AF_left-volume_0.025_centile,male_AF_left-volume_0.5_centile,male_AF_left-volume_0.975_centile,female_AF_left-volume_0.025_centile,female_AF_left-volume_0.5_centile,female_AF_left-volume_0.975_centile
where ages contains the age in years. The rest of the columns correspond to the sex-specific trajectories of the centile curves. For instance, male_AF_left-volume_0.025_centile corresponds to the 2.5th percentile for the male-specific trajectory of volume of the AF_left.
Then compare the results of /home/user/testout/centiles.csv to the example_centiles.csv file (they should be the same; see the comparison example below). For more options on obtaining centile curves, please run:
docker run --rm r_lifespan_env python3 /WMLifespan/scripts/output_centile_curves.py --help
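Assuming example_centiles.csv has been downloaded into the current working directory, one simple way to perform the comparison above is:
diff /home/user/testout/centiles.csv example_centiles.csv && echo "files match"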
Aligning Out-of-Sample Datasets
Create a test output directory and a test input directory. For the purposes of this tutorial, let's call them /home/user/testout and /home/user/testin. Run the following command to create the directories:
mkdir /home/user/testout /home/user/testin
Next, place the MS_dataset.csv file into the /home/user/testin directory. To perform the centile alignment process, run:
docker run --rm -v /home/user/testin/MS_dataset.csv:/INPUTS/input.csv \
-v /home/user/testout:/OUTPUTS \
r_lifespan_env \
python3 /WMLifespan/scripts/perform_OOS_alignment.py \
/INPUTS/input.csv /OUTPUTS/aligned.csv
This will take about 5 minutes to complete. There should now be a file at /home/user/testout/aligned.csv that contains all of the original columns from the MS_dataset.csv file, in addition to new columns corresponding to centile scores for each of the aligned features. Compare the centile score columns (ending with "_centile_score") in the /home/user/testout/aligned.csv file to those provided in the OOS_aligned.csv file to make sure that the alignment was done correctly.
NOTE: The sample dataset being used has very few participants, and thus alignment to the centile curves may be improper. However, we provide it here for the purposes of reproducibility, to ensure that the code is running properly in the Docker image.
For more options on alignment, please run:
docker run --rm r_lifespan_env python3 /WMLifespan/scripts/perform_OOS_alignment.py --help
Appendix
Creating Your Own Normative Trajectories
Although we have released our normative trajectories in the Docker image above, we also provide the code for creating your own normative trajectories. First, structure your data in the same format as step 6.) under Aligning Out-Of-Sample Datasets, with the addition of a single column called dataset, which should contain a unique string identifying a specific batch/dataset. The structured CSV should have a format such as:
| age  | sex | diagnosis | dataset      | AF_left-fa-mean | …   |
|------|-----|-----------|--------------|-----------------|-----|
| 75.1 | 0   | CN        | my_dataset_1 | 0.453           | …   |
| 45   | 1   | CN        | my_dataset_1 | 0.562           | …   |
| 62.5 | 1   | CN        | my_dataset_2 | 0.398           | …   |
| …    | …   | …         | …            | …               | …   |
Put this CSV into a directory that can be accessed. For simplicity, assume you have placed it at /home/user/inputs/input.csv. Also create an output directory (for simplicity, assume it is called /home/user/outputs/). Run the following command to create additional directories:
mkdir -p /home/user/outputs/fit_models/fit_models
Now run the following command:
docker run --rm -v /home/user/inputs:/INPUTS \
-v /home/user/outputs:/OUTPUTS \
r_lifespan_env \
python3 /WMLifespan/scripts/LifespanExtension/fit_models_parallel.py \
<tract> <metric> \
--datacsv /INPUTS/input.csv --outdir /OUTPUTS
The time to fit models depends on the amount of data, but it will likely take around an hour to run. When it is finished, your normative models will be saved in /home/user/outputs/fit_models/fit_models.
Process Data for Global WM and Normalized Measures
If you would like to obtain measurements for the normalized macrostructural measures and the global WM features, run the following processing in addition to the preprocessing steps (1-5) from Aligning Out-Of-Sample Datasets (NOTE: these steps require a T1-weighted (T1w) image from the same scanning session as the dMRI scan):
1.) Run FreeSurfer (https://surfer.nmr.mgh.harvard.edu/) on the T1w image. For completeness, we use version 7.2.
2.) Extract the FreeSurfer metrics of "Brain_Segmentation_Volume_Without_Ventricles", "Total_cerebral_white_matter_volume", and "Estimated_Total_Intracranial_Volume".
3.) Compute a white matter mask from the FreeSurfer segmentation. We follow the MRtrix3 convention for a FreeSurfer-defined white matter mask, which can be found in the FreeSurfer2ACT.txt file (https://github.com/MRtrix3/mrtrix3/blob/master/share/mrtrix3/_5ttgen/FreeSurfer2ACT.txt).
4.) Coregister the T1w scan to the dMRI scan. To do so, we use epi_reg from FSL (v6.0.6). Apply the transformation matrix to get the white matter mask from step 3.) into the dMRI space.
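A minimal sketch of this step with FSL commands follows, where T1.nii.gz, T1_brain.nii.gz (a skull-stripped T1w), b0.nii.gz (a b=0 volume extracted from the resampled dMRI), and wm_mask.nii.gz (the mask from step 3.)) are assumed file names:
# epi_reg estimates the dMRI -> T1w transform, so invert it to bring the mask into dMRI space
epi_reg --epi=b0.nii.gz --t1=T1.nii.gz --t1brain=T1_brain.nii.gz --out=epi2t1
convert_xfm -omat t12epi.mat -inverse epi2t1.mat
# Apply the inverted transform to the white matter mask with nearest-neighbour interpolation
flirt -in wm_mask.nii.gz -ref b0.nii.gz -applyxfm -init t12epi.mat -interp nearestneighbour -out wm_mask_in_dwi.nii.gz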
5.) Compute the average FA, MD, AD, and RD metrics within the registered white matter mask using the outputs from step 2.) of Aligning Out-Of-Sample Datasets. These, along with the "Total_cerebral_white_matter_volume" measure from FreeSurfer, are the Global WM Metrics.
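A sketch of how these averages could be computed with MRtrix3's mrstats, assuming the file names from the sketches above:
mrstats FA_map_1mm_iso.nii.gz -mask wm_mask_in_dwi.nii.gz -output mean
mrstats MD_map_1mm_iso.nii.gz -mask wm_mask_in_dwi.nii.gz -output mean
mrstats AD_map_1mm_iso.nii.gz -mask wm_mask_in_dwi.nii.gz -output mean
mrstats RD_map_1mm_iso.nii.gz -mask wm_mask_in_dwi.nii.gz -output mean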
6.) Appropriately normalize the tract volume, surface_area, and avg_length measures from the <TRACT>-SHAPE.json files based on their units (using the measures from step 2.)). For volume, simply divide by the measure from step 2.). For surface area and average length, we assume that the brain is a sphere and divide the tract measures by the corresponding derived measures. Thus, we estimate the normalizing radius (for average length) as r = (3V/(4π))^(1/3) and the normalizing surface area as SA = 4πr^2, where V is a measure from step 2.). Doing so results in the Normalized Macrostructural Metrics.
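As a worked sketch of this normalization with hypothetical numbers (V stands in for a volume from step 2.) in mm^3, and vol, sa, and len stand in for a tract's volume, surface area, and average length from <TRACT>-SHAPE.json):
V=1450000    # hypothetical reference volume from step 2.), mm^3
awk -v V="$V" -v vol=9800 -v sa=5200 -v len=110 'BEGIN {
  pi = 3.14159265358979
  r  = (3 * V / (4 * pi)) ^ (1/3)    # normalizing radius
  SA = 4 * pi * r ^ 2                # normalizing surface area
  printf "norm_volume=%g norm_surface_area=%g norm_avg_length=%g\n", vol / V, sa / SA, len / r
}'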
Information for Tutorial Dataset
25 healthy controls (40.7 +/- 13.0 years old, 18 females) and 21 patients with relapsing-remitting multiple sclerosis (pwRRMS) (41.6 +/- 9.3 years old, 18 females) were consented and scanned on a 3T Philips Elition X (Philips Medical Systems, Best, The Netherlands) using a dual-channel transmit body coil and a 16-channel neurovascular coil for signal reception. Diffusion data were collected using a pulsed gradient spin echo sequence with a single-shot EPI readout: TR/TE=4600ms/85ms; in-plane resolution=2.5mm; slice thickness=2.5mm; number of slices=62; scan time=9 minutes. Diffusion sensitization included b-values of 750/1500/2250/3000 s/mm^2 acquired with 10/20/30/40 diffusion directions per shell, respectively, and 10 measurements at b=0 s/mm^2.