White Matter Microstructure and Macrostructure Brain Charts Across the Human Lifespan - Models
Description
Documentation Table of Contents
- Introduction
- Build and Testing Environment
- Expected Runtime and Memory Usage
- How to Run
- Loading Docker
- Obtaining Centile Curves
- Aligning Out-Of-Sample Datasets
- Expected Outputs
- Test/Example Dataset Tutorial
- Obtaining Centile Curves
- Aligning Out-Of-Sample Datasets
- Appendix
- Creating Your Own Normative Trajectories
- Process Data for Global WM and Normalized Measures
- Information for Tutorial Dataset
Introduction
The Docker available on this page contains the fitted models for macrostructural and microstructural brain charts across the human lifespan (0-100 years of age). Researchers can use this Docker to align their out-of-sample (new) datasets to these brain charts or to extract the normative trajectories.
Build and Testing Environment
The container was built and tested on a machine running Ubuntu 22.04, with 62.5GB of memory and an Intel(R) Xeon(R) W-2255 CPU running at 3.70GHz. The Docker was also tested, and runs successfully, on a RedHat 7.7 OS machine (CPU: Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz).
Expected Runtime and Memory Usage
Use of the Docker should require less than 5GB of memory. Expected runtimes are noted alongside the individual commands below (a few minutes for loading the image and performing alignment; roughly an hour for fitting new normative models).
How to Run
Before doing anything, Docker needs to be installed. Official instructions can be found here: https://docs.docker.com/get-started/get-docker/. Docker must be running properly before proceeding.
NOTE: The Docker image was built on a computer with x86_64 (AMD64) architecture, which is the most common architecture for Linux/PC desktops and servers. However, it may not be usable on systems with ARM-based architectures (such as Apple Silicon Macs with M1, M2, M3, etc. chips). If you encounter an error like exec format error, it likely means the Docker image is not compatible with your system's architecture.
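A quick way to check your machine's architecture is with the standard uname utility:
uname -m
The output x86_64 indicates a compatible machine, whereas arm64 or aarch64 indicates an ARM-based machine. On ARM machines, Docker's --platform linux/amd64 option may allow the image to run under emulation, although performance is not guaranteed.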
Loading Docker
Download the Docker image (provided as a .tar file), and save it somewhere it can easily be located. Now, run the following command:
docker load -i </path/to/docker/.tar/file>
This command should finish within a few minutes. Alternatively, the Docker image can be loaded through the Docker Desktop GUI. Confirm that it is properly loaded by running
docker images
where the name of the Docker image (r_lifespan_env) should be present.
Obtaining Centile Curves
For researchers who wish to examine the normative trajectories of features more closely, we also provide a method for obtaining centiles of trajectories in a CSV format. To do so, run the following command:
docker run --rm \
-v </path/to/output/directory>:/OUTPUTS \
r_lifespan_env \
python3 /WMLifespan/scripts/output_centile_curves.py \
<tract> <measure> /OUTPUTS/centiles.csv
where </path/to/output/directory> is the directory (make sure it is an absolute path!) in which you wish to save the centile CSV file, and centiles.csv is the name of the file in which discrete values of the normative trajectory for the given <tract> and <measure> will be saved. Note that <tract> must be one of the TractSeg-defined tract names found on the TractSeg GitHub page here: https://github.com/MIC-DKFZ/TractSeg, whereas <measure> must be one of {fa-mean, md-mean, ad-mean, rd-mean, volume, surface_area, avg_length}.
Aligning Out-Of-Sample Datasets
One of the most important aspects of brain charts is the ability to score new data within the normative trajectories to determine how abnormal quantitative brain metrics are. For any researchers who would like to use these brain charts, we provide the following tutorial:
1.) Preprocess diffusion MRI (dMRI) data to correct for susceptibility-induced and eddy-current-induced artifacts. We recommend using the PreQual pipeline, as it provides a QA document to determine whether or not the data are acceptable to use (https://github.com/MASILab/PreQual, https://zenodo.org/records/14593034). Instructions for running PreQual can be found in both the GitHub repository and the Zenodo page.
2.) Fit diffusion tensors (dwi2tensor) and obtain FA/MD/AD/RD microstructural maps (tensor2metric) using ONLY the dMRI volumes with b-values less than or equal to 1500 s/mm^2. For consistency, we use MRtrix3 software (v3.0.3), which provides both dwi2tensor and tensor2metric.
3.) Resample the preprocessed dMRI data AND the FA/MD/AD/RD maps to 1mm isotropic voxel sizes. For consistency, we use the MRtrix3 mrgrid command (example: mrgrid dwmri.nii.gz regrid dwmri_1mm_iso.nii.gz -voxel 1). A combined sketch of steps 2.) and 3.) is shown below.
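The following is a minimal sketch of steps 2.) and 3.) using only MRtrix3 tools. The file names (dwmri.nii.gz, dwmri.bvec, dwmri.bval, and the *_1mm_iso outputs) are placeholders for your own data, and the shell values passed to -shells are assumptions that must be matched to your acquisition:
# Keep only the volumes with b-values <= 1500 s/mm^2 (shell values here are assumed)
dwiextract dwmri.nii.gz dwmri_lowb.mif -fslgrad dwmri.bvec dwmri.bval -shells 0,750,1500
# Fit the diffusion tensor and derive the FA/MD/AD/RD maps
dwi2tensor dwmri_lowb.mif tensor.mif
tensor2metric tensor.mif -fa FA_map.nii.gz -adc MD_map.nii.gz -ad AD_map.nii.gz -rd RD_map.nii.gz
# Resample the preprocessed dMRI data and each map to 1mm isotropic voxels
mrgrid dwmri.nii.gz regrid dwmri_1mm_iso.nii.gz -voxel 1
for map in FA MD AD RD; do
  mrgrid ${map}_map.nii.gz regrid ${map}_map_1mm_iso.nii.gz -voxel 1
done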
4.) Run TractSeg (https://github.com/MIC-DKFZ/TractSeg) on the resampled data to obtain the 72 TractSeg-defined white matter tracts as .tck files.
5.) Get microstructural and macrostructural measures for each of the 72 white matter tracts. For consistency, we use scilpy (v1.5.0; https://github.com/scilus/scilpy) to obtain these features. For this version, the scilpy scripts are called scil_evaluate_bundles_individual_measures.py for macrostructural and scil_compute_bundle_mean_std.py for microstructural measures, and the commands are:
scil_compute_bundle_mean_std.py <TRACT>.tck FA_map_1mm_iso.nii.gz MD_map_1mm_iso.nii.gz AD_map_1mm_iso.nii.gz RD_map_1mm_iso.nii.gz --density_weighting --reference=dwmri_1mm_iso.nii.gz > <TRACT>-DTI.json
scil_evaluate_bundles_individual_measures.py <TRACT>.tck <TRACT>-SHAPE.json --reference=dwmri_1mm_iso.nii.gz
where <TRACT> is the name of a TractSeg-defined tract (output after running TractSeg), FA_map_1mm_iso.nii.gz, MD_map_1mm_iso.nii.gz, AD_map_1mm_iso.nii.gz, and RD_map_1mm_iso.nii.gz are the DTI maps from step 2.), and dwmri_1mm_iso.nii.gz is the dMRI data (all resampled according to step 3.)). <TRACT>-DTI.json and <TRACT>-SHAPE.json will be the output files containing microstructural and macrostructural information, respectively.
6.) Before alignment, the data from <TRACT>-DTI.json and <TRACT>-SHAPE.json must be properly formatted in a CSV file that can be read by the Docker image. The CSV is required to have columns age, sex, and diagnosis, where age is a numerical value, sex is a binary variable in which "male" is encoded as 0 and "female" is encoded as 1, and diagnosis is a categorical variable. Typically developing/aging (also referred to as "cognitively normal") participants are encoded as "CN" for diagnosis, and to perform alignment there must be rows in the CSV file that contain "CN" as the diagnosis. For better alignment, include as many "CN" participants as possible; a small number of participants may result in poorly aligned data and thus poorly estimated centile scores. There must also be at least one quantitative variable column in the CSV file, where quantitative variables are named as:
<tract>-<measure>
<tract> must be one of the TractSeg-defined tract names, whereas <measure> must be one of {fa-mean, md-mean, ad-mean, rd-mean, volume, surface_area, avg_length}. Thus, the CSV should follow a format such as:
| age  | sex | diagnosis | AF_left-fa-mean | AF_right-md-mean | …   |
|------|-----|-----------|-----------------|------------------|-----|
| 75.1 | 0   | CN        | 0.453           | 0.00110          | …   |
| 45   | 1   | CN        | 0.562           | 0.00140          | …   |
| 62.5 | 1   | AD        | 0.398           | 0.00098          | …   |
| …    | …   | …         | …               | …                | …   |
Note that in cases where rows have empty entries, centile scores will not be calculated for those metrics, and the corresponding entries in the centile score output will be missing. Further, only rows labeled with a "CN" diagnosis and with non-missing values for a particular measure will be used for estimating the random effect terms (for alignment purposes) for that measure. An optional sanity check for the CSV is sketched below.
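The snippet below prints the header of the CSV and counts the available "CN" rows. It assumes the file is named input.csv and that diagnosis is the third column, as in the example table above:
head -n 1 input.csv
awk -F, 'NR > 1 && $3 == "CN"' input.csv | wc -l    # number of "CN" participants available for alignment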
7.) Run the following Docker command:
docker run --rm -v </path/to/OOS.csv>:/INPUTS/input.csv \
-v </path/to/output/directory>:/OUTPUTS \
r_lifespan_env \
python3 /WMLifespan/scripts/perform_OOS_alignment.py \
/INPUTS/input.csv /OUTPUTS/aligned.csv
where aligned.csv is the destination file in which the aligned centile score values will be saved and input.csv is the structured CSV file from step 6.). The aligned.csv file will contain a new column for each of the metric columns the Docker image could find (which should follow the <tract>-<measure> naming).
NOTE: As detailed in the Methods section, these normative curves are cross-sectional in nature. Thus, researchers performing out-of-sample alignment should only include cross-sectional data in the CSV file, i.e., one scan per participant. Should researchers wish to evaluate longitudinal data with the cross-sectional models, a flag (see the alignment script's --help output) can be used to also save the estimated random effect terms for the dataset. We also note that this alignment to the normative models assumes that the data in the CSV file come from the same primary dataset. Calculation of centile scores for multiple datasets must be done in separate Docker commands, each with its own distinct input CSV file.
Expected Outputs
For obtaining centile trajectories, the CSV will contain one column for the sampled ages, with the remaining columns containing values corresponding to specific centiles across the lifespan at each of the sampled ages.
For the alignment process, the output CSV will be the input CSV, but with new columns corresponding to the aligned centile values for each of the data points (with the column heading <tract>-<metric>_centile_score). Centile scores should be between 0 and 1, where values are percentiles represented as decimals.
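As an illustration of how these columns can be used, the hypothetical awk one-liner below prints the rows of an aligned output (assumed here to be named aligned.csv and to contain an AF_left-volume_centile_score column) whose scores fall outside the 2.5th-97.5th percentile range:
awk -F, 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "AF_left-volume_centile_score") c = i } NR > 1 && c && ($c < 0.025 || $c > 0.975)' aligned.csv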
Test/Example Dataset Tutorial
We have provided example output files, OOS_aligned.csv and example_centiles.csv, and an example input file, MS_dataset.csv, for the purposes of a tutorial on out-of-sample alignment/extracting centile curves AND to verify that the Docker image code is running properly with the correct data.
Obtaining Centile Curves
Create a test output directory. For the purposes of this tutorial, let's call it /home/user/testout. Run the following command to create the directory:
mkdir /home/user/testout
Next, run the following command to get the example trajectory:
docker run --rm \
-v /home/user/testout:/OUTPUTS \
r_lifespan_env \
python3 /WMLifespan/scripts/output_centile_curves.py \
AF_left volume /OUTPUTS/centiles.csv
There should now be a file called /home/user/testout/centiles.csv, with a header that looks like:
ages,male_AF_left-volume_0.025_centile,male_AF_left-volume_0.5_centile,male_AF_left-volume_0.975_centile,female_AF_left-volume_0.025_centile,female_AF_left-volume_0.5_centile,female_AF_left-volume_0.975_centile
where ages contains the age in years. The rest of the columns correspond to the sex-specific trajectories of the centile curves. For instance, male_AF_left-volume_0.025_centile corresponds to the 2.5th percentile for the male-specific trajectory of volume of the AF_left.
Then compare the results of /home/user/testout/centiles.csv to the example_centiles.csv file (they should be the same; see the comparison example below). For more options on obtaining centile curves, please run:
docker run --rm r_lifespan_env python3 /WMLifespan/scripts/output_centile_curves.py --help
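Assuming example_centiles.csv has been downloaded into the current working directory, one simple way to perform the comparison above is:
diff /home/user/testout/centiles.csv example_centiles.csv && echo "files match"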
Aligning Out-of-Sample Datasets
Create a test output directory and a test input directory. For the purposes of this tutorial, let's call them /home/user/testout and /home/user/testin. Run the following command to create the directories:
mkdir /home/user/testout /home/user/testin
Next, place the MS_dataset.csv file into the /home/user/testin directory. To perform the centile alignment process, run:
docker run --rm -v /home/user/testin/MS_dataset.csv:/INPUTS/input.csv \
-v /home/user/testout:/OUTPUTS \
r_lifespan_env \
python3 /WMLifespan/scripts/perform_OOS_alignment.py \
/INPUTS/input.csv /OUTPUTS/aligned.csv
This will take about 5 minutes to complete. There should now be a file at /home/user/testout/aligned.csv that contains all of the original columns from the MS_dataset.csv file, in addition to new columns corresponding to centile scores for each of the aligned features. Compare the centile score columns (ending with "_centile_score") in the /home/user/testout/aligned.csv file to those provided in the OOS_aligned.csv file to make sure that the alignment was done correctly.
NOTE: The sample dataset being used has very few participants, and thus alignment to the centile curves may be improper. However, we provide it here for the purposes of reproducibility, to ensure that the code is running properly in the Docker image.
For more options on alignment, please run:
docker run --rm r_lifespan_env python3 /WMLifespan/scripts/perform_OOS_alignment.py --help
Appendix
Creating Your Own Normative Trajectories
Although we have released our normative trajectories in the Docker image above, we also provide the code for creating your own normative trajectories. First, structure your data in the same format as step 6.) under Aligning Out-Of-Sample Datasets, with the addition of a single column called dataset, which should contain a unique string identifying a specific batch/dataset. The structured CSV should have a format such as:
| age  | sex | diagnosis | dataset      | AF_left-fa-mean | …   |
|------|-----|-----------|--------------|-----------------|-----|
| 75.1 | 0   | CN        | my_dataset_1 | 0.453           | …   |
| 45   | 1   | CN        | my_dataset_1 | 0.562           | …   |
| 62.5 | 1   | CN        | my_dataset_2 | 0.398           | …   |
| …    | …   | …         | …            | …               | …   |
Put this CSV into a directory that can be accessed. For simplicity, assume you have placed it at /home/user/inputs/input.csv. Also create an output directory (for simplicity, assume it is called /home/user/outputs/). Run the following command to create additional directories:
mkdir -p /home/user/outputs/fit_models/fit_models
Now run the following command:
docker run --rm -v /home/user/inputs:/INPUTS \
-v /home/user/outputs:/OUTPUTS \
r_lifespan_env \
python3 /WMLifespan/scripts/LifespanExtension/fit_models_parallel.py \
<tract> <metric> \
--datacsv /INPUTS/input.csv --outdir /OUTPUTS
The time to fit models depends on the amount of data, but it will likely take around an hour to run. When it is finished, your normative models will be saved in /home/user/outputs/fit_models/fit_models.
Process Data for Global WM and Normalized Measures
If you would like to obtain measurements for the normalized macrostructural measures and the global WM features, run the following processing in addition to the preprocessing steps (1-5) from Aligning Out-Of-Sample Datasets (NOTE: these steps require a T1-weighted (T1w) image from the same scanning session as the dMRI scan):
1.) Run FreeSurfer (https://surfer.nmr.mgh.harvard.edu/) on the T1w image. For completeness, we use version 7.2.
2.) Extract the FreeSurfer metrics of "Brain_Segmentation_Volume_Without_Ventricles", "Total_cerebral_white_matter_volume", and "Estimated_Total_Intracranial_Volume".
3.) Compute a white matter mask from the FreeSurfer segmentation. We follow the MRtrix3 convention for a FreeSurfer-defined white matter mask, which can be found in the FreeSurfer2ACT.txt file (https://github.com/MRtrix3/mrtrix3/blob/master/share/mrtrix3/_5ttgen/FreeSurfer2ACT.txt).
4.) Coregister the T1w scan to the dMRI scan. To do so, we use epi_reg from FSL (v6.0.6). Apply the transformation matrix to get the white matter mask from step 3.) into the dMRI space.
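A minimal sketch of this step with FSL commands follows, where T1.nii.gz, T1_brain.nii.gz (a skull-stripped T1w), b0.nii.gz (a b=0 volume extracted from the resampled dMRI), and wm_mask.nii.gz (the mask from step 3.)) are assumed file names:
# epi_reg estimates the dMRI -> T1w transform, so invert it to bring the mask into dMRI space
epi_reg --epi=b0.nii.gz --t1=T1.nii.gz --t1brain=T1_brain.nii.gz --out=epi2t1
convert_xfm -omat t12epi.mat -inverse epi2t1.mat
# Apply the inverted transform to the white matter mask with nearest-neighbour interpolation
flirt -in wm_mask.nii.gz -ref b0.nii.gz -applyxfm -init t12epi.mat -interp nearestneighbour -out wm_mask_in_dwi.nii.gz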
5.) Compute the average FA, MD, AD, and RD metrics within the registered white matter mask using the outputs from step 2.) of Aligning Out-Of-Sample Datasets. These, along with the "Total_cerebral_white_matter_volume" measure from FreeSurfer, are the Global WM Metrics.
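A sketch of how these averages could be computed with MRtrix3's mrstats, assuming the file names from the sketches above:
mrstats FA_map_1mm_iso.nii.gz -mask wm_mask_in_dwi.nii.gz -output mean
mrstats MD_map_1mm_iso.nii.gz -mask wm_mask_in_dwi.nii.gz -output mean
mrstats AD_map_1mm_iso.nii.gz -mask wm_mask_in_dwi.nii.gz -output mean
mrstats RD_map_1mm_iso.nii.gz -mask wm_mask_in_dwi.nii.gz -output mean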
6.) Appropriately normalize the tract volume, surface_area, and avg_length measures from the <TRACT>-SHAPE.json files based on their units (using the measures from step 2.)). For volume, simply divide by the measure from step 2.). For surface area and average length, we assume that the brain is a sphere and divide the tract measures by the corresponding derived measures. Thus, we estimate the normalizing radius (for average length) as r = (3V/(4π))^(1/3) and the normalizing surface area as SA = 4πr^2, where V is a measure from step 2.). Doing so results in the Normalized Macrostructural Metrics.
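As a worked sketch of this normalization with hypothetical numbers (V stands in for a volume from step 2.) in mm^3, and vol, sa, and len stand in for a tract's volume, surface area, and average length from <TRACT>-SHAPE.json):
V=1450000    # hypothetical reference volume from step 2.), mm^3
awk -v V="$V" -v vol=9800 -v sa=5200 -v len=110 'BEGIN {
  pi = 3.14159265358979
  r  = (3 * V / (4 * pi)) ^ (1/3)    # normalizing radius
  SA = 4 * pi * r ^ 2                # normalizing surface area
  printf "norm_volume=%g norm_surface_area=%g norm_avg_length=%g\n", vol / V, sa / SA, len / r
}'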
Information for Tutorial Dataset
25 healthy controls (40.7 +/- 13.0 years old, 18 females) and 21 patients with relapsing-remitting multiple sclerosis (pwRRMS) (41.6 +/- 9.3 years old, 18 females) were consented and scanned on a 3T Philips Elition X (Philips Medical Systems, Best, The Netherlands) using a dual-channel transmit body coil and a 16-channel neurovascular coil for signal reception. Diffusion data were collected using a pulsed gradient spin echo sequence with a single-shot EPI readout: TR/TE=4600ms/85ms; in-plane resolution=2.5mm; slice thickness=2.5mm; number of slices=62; scan time=9 minutes. Diffusion sensitization included b-values of 750/1500/2250/3000 s/mm^2 acquired with 10/20/30/40 diffusion directions per shell, respectively, and 10 measurements at b=0 s/mm^2.