fMRIPrep: Building a Robust Preprocessing Pipeline for fMRI image/svg+xml fMRIPrep: Building a Robust Preprocessing Pipeline for fMRI June 7, 2018 Christopher J Markiewicz BIDS Derivatives Derivatives are outputs of (pre-)processing pipelines, capturing data and meta-data sufficient for a researcher to understand and (critically) reuse those outputs in subsequent processing. Standardizing derivatives is motivated by use cases where formalized machine-readable access to processed data enables higher level processing. A derivative dataset is a collection of derivatives, or files that have been generated from the data. Broadly, a derivative can be considered to be preprocessed or processed, such that the data type is unchanged or changed, respectively, from that of the source data file(s). BIDS Derivatives was finalized in version 1.4.0 of the BIDS specification. Tour of a BIDS Derivative As with BIDS datasets, all conformant derivative datasets contain a dataset_description.json . New fields include DatasetType , which distinguishes "derivative" datasets from "raw" ; GeneratedBy , a list of processes that generated the data; SourceDatasets , a list of datasets used to generate the derivative. { "Name" : "FMRIPREP Outputs" , "BIDSVersion" : "1.4.0" , "DatasetType" : "derivative" , "GeneratedBy" : [ { "Name" : "fmriprep" , "Version" : "1.4.1" , "Container" : { "Type" : "docker" , "Tag" : "poldracklab/fmriprep:1.4.1" } }, { "Name" : "Manual" , "Description" : "Re-added RepetitionTime metadata to bold.json files" } ], "SourceDatasets" : [ { "DOI" : "10.18112/openneuro.ds000114.v1.0.1" , "URL" : "https://openneuro.org/datasets/ds000114/versions/1.0.1" , "Version" : "1.0.1" } ] } Preprocessed data Data is considered to be preprocessed if it is fundamentally similar to the source data. Artifact removal, motion correction and resampling to a template space are examples of preprocessing. An example of a subject with simultaneous EEG/fMRI resting state scan, aligned along with a T1w image to the MNI305 template: pipeline1/ sub-01/ anat/ sub-01_space-MNI305_T1w.nii.gz sub-01_space-MNI305_T1w.json eeg/ sub-01_task-rest_desc-filtered_eeg.edf sub-01_task-rest_desc-filtered_eeg.json func/ sub-01_task-rest_space-MNI305_desc-preproc_bold.nii.gz sub-01_task-rest_space-MNI305_desc-preproc_bold.json The space entity indicates that a file is aligned to some reference space. For standard templates, this is sufficient. For custom templates (e.g., individual or study-specific), additional SpatialReference metadata is required in the JSON sidecar files. The desc (description) entity allows for unrestricted alphanumeric labels, in the absence of a more appropriate entity to distinguish one file from another. Derivative data types Data is considered to be processed if it is fundamentally different to the source data. Processed data may differ substantially in BIDS datatypes from the original input data. The initial offering of BIDS Derivatives only specifies anatomical derivatives that are of general use: masks and segmentations. Mask images are binary images with 1 representing the region of interest and all other voxels containing 0. The following example shows a manually constructed lesion mask: manual_masks/ sub-01/ anat/ sub-01_desc-lesion_mask.nii.gz sub-01_desc-lesion_mask.json A mask of the functionally-defined area fusiform face area could be encoded: localizer/ sub-01/ func/ sub-01_task-loc_space-individual_label-FFA_mask.nii.gz sub-01_task-loc_space-individual_label-FFA_mask.json BIDS Derivatives introduces “discrete segmentations” and “probabilisitic segmentations”. A segmentation is a labeling of regions of an image such that each location (for example, a voxel or a surface vertex) is identified with a label or a combination of labels. Labeled regions may include anatomical structures (such as tissue class, Brodmann area or white matter tract), discontiguous, functionally-defined networks, tumors or lesions. A discrete segmentation represents each region with a unique integer label. A probabilistic segmentation represents each region as values between 0 and 1 (inclusive) at each location in the image, and one volume/frame per structure may be concatenated in a single file. A BIDS App that calculates ROIs in BOLD space from the automated anatomical labeling (AAL, doi:10.1006/nimg.2001.0978) could store discrete and probabilistic (or partial volume) segmentations as follows: tissue_segmentation/ desc-AAL_dseg.tsv desc-AAL_probseg.json sub-01/ func/ sub-01_task-rest_desc-AAL_dseg.nii.gz sub-01_task-rest_desc-AAL_probseg.nii.gz The dseg.tsv file is a lookup table for interpreting a discrete segmentation and probseg.json contains a list identifying the labels for each consecutive volume. Background BIDS Neuroimaging experiments result in complicated data that can be arranged in many different ways. The Brain Imaging Data Structure (BIDS, [2]) is a comprehensive and use-case-driven way of organizing neuroimaging and behavioral data. Originally written for MRI studies, BIDS has added descriptions for organizing electrophysiological (EEG [6], MEG [5] and iEEG/ECoG [4]) data. Work is being done to add PET, ASL and NIRS to the standard, among other modalities. BIDS Apps A common specification of neuroimaging datasets affords queries for and adaptation to the available data. [BIDS Apps] are programs that accept BIDS data as inputs, and produce some output. This permits a simple command-line protocol: bids-app /bids-directory /output-directory participant [OPTIONS] There are a growing number of data analysis software packages that can understand data organised according to BIDS. The output of a BIDS App is a derivative of the input dataset. BIDS Derivatives seeks to formalize this notion. Learn More The full BIDS specification is available at bids-specification.readthedocs.io . Self- contained PDFs are archived on Zenodo ( doi:10.5281/zenodo.3686061 ). The BIDS Starter Kit is a more informal, human-friendly introduction to BIDS. Get Involved BIDS is a collaborative effort, and contributions of all kinds are welcome! The NeuroStars forum ( https://neurostars.org ) is a forum to ask, search for and answer questions about any neuroscience topic, and the BIDS community strongly recommends this resource. For BIDS-specific questions, the bids tag makes your question easier to find. The BIDS specification can be extended in a backwards compatible way and will evolve over time. These are accomplished with BIDS Extension Proposals (BEPs), which are community-driven processes. A list of BEPs, as well as instructions on how to propose a new BEP, can be found at https://bids.neuroimaging.io /get_involved.html . If the specification is ambiguous, inconsistent or silent on some point, proposals can be made to the BIDS Specification ( https://github.com/bids-standard/bids- specification/ ) GitHub repository. The BIDS Starter Kit ( https://github.com /bids-standard/bids-starter-kit/ ) repository exists to provide a more user-friendly guide, and accepts proposals for improvement, as well. The Making Of BIDS Derivatives The need to specify BIDS Derivatives was identified during the early stages of BIDS specification, and a BIDS Extension Proposal (BEP) was started in February 2016, prior to the release of BIDS 1.0 in June 2016. Development of the proposal was largely based on the experience of developing BIDS Apps. As multiple applications produced similar or equivalent derivatives, common naming schemes were added to the proposal to facilitate reuse of the derivatives. For example, the Configurable Pipeline for the Analysis of Connectomes (C-PAC) and fMRIPrep took similar inputs and had a broad overlap in their outputs, and so made sense to coordinate. An August 2017 meeting at Stanford led to an agreement to divide the increasingly large BEP into a series of BEPs, most focused on particular modalities or use cases. In July 2018, a survey of the neuroimaging community was taken to establish priorities (essential, desirable or inessential) for structural, functional and diffusion MRI derivatives. The results of the survey[1] were posted in advance of an August 2018 workshop of 31 participants, where sub-proposals were pushed toward completion and common principles were established. In December 2018, Release Candidate 1 was published, including all imaging modalities, for implementation and feedback. In July 2019, a “Common Derivatives” proposal was re-introduced establishing more general principles, to be followed by subsequent modality-specific and non-imaging proposals. Common Derivatives entered final review in May 2020 and were released as part of BIDS 1.4.0 in June 2020. Unspecified data types Derivatives can never be fully specified, as new methods can always be developed, requiring new data representations. BIDS recognizes this and encourages adopting “BIDS-style naming conventions”: Additional files and folders containing raw data MAY be added as needed for special cases. All non-standard file entities SHOULD conform to BIDS-style naming conventions, including alphabetic entities and suffixes and alphanumeric labels/indices. Non-standard suffixes SHOULD reflect the nature of the data, and existing entities SHOULD be used when appropriate. This recommendation remains in force for derivatives datasets. Additionally, BIDS Derivatives acknowledges that it may be desirable to distribute derivatives generated by non-compliant applications, for the sake of reproducibility and non-duplication of effort. Therefore, if a BIDS dataset contains a derivatives/ sub-directory, the contents of that directory may be a heterogeneous mix of BIDS Derivatives datasets and non- compliant derivatives. One example of such a non-compliant derivative dataset would be FreeSurfer reconstructions of subject surfaces: bids-root/ derivatives/ freesurfer/ sub-01/ label/ mri/ ... ... sub-01/ anat/ sub-01_T1w.nii.gz ... Note that subject directory names conform to BIDS conventions, but contents are determined by the generating application, in this case, FreeSurfer. Organizing datasets and their derivatives BIDS Derivatives datasets are intended to be interpretable and distributable with or without the datasets used to generate them. This is necessary for storage and bandwidth constraints, as well as to permit the distribution of derivatives when the source data are restricted. This independence affords flexibility in the relative organization of datasets. The following examples show three ways to organize, relative to each other, a raw BIDS dataset, a preprocessed derivative dataset, and an analysis that uses both as inputs. A collection of derivative datasets may be stored in the derivatives/ subdirectory of a BIDS (or BIDS Derivatives) dataset: my_dataset/ derivatives/ preprocessed/ analysis/ sub-01/ ... A BIDS Derivatives dataset may contain references to its input datasets in the sourcedata/ subdirectory: my_analysis/ sourcedata/ raw/ preprocessed/ sub-01/ ... Note that the sourcedata/ and derivatives/ subdirectories constitute dataset boundaries. Any contents of these directories may be validated independently, but their contents must not affect the interpretation of the nested or containing datasets. Unnested datasets are also possible. For example: my_study/ raw_data/ sub-01/ ... derivatives/ preprocessed/ analysis/ Future Directions The initial offering of BIDS Derivatives is intended to establish a set of ground rules for future elaboration. There are existing BIDS Extension Proposals (BEPs) for the following derivatives: Structural MRI derivatives (BEP011) Functional MRI derivatives (BEP012) Diffusion MRI derivatives (BEP016) Affine transformations and nonlinear warp fields (BEP014) Connectivity data schema (BEP017) Common electrophysiological (EEG/MEG/iEEG) derivatives (BEP021) PET preprocessing derivatives (BEP023) Provenance (BEP028) Statistical and computational modeling derivatives are a logical further effort, and are likely to result in BEPs in the near future. References [1] Feingold, F.W. (2018), ‘BIDS-Processed Data Survey Results’, Stanford Center for Reproducible Neuroscience, http://reproducibility.stanford.edu/bids-processed-data- survey-results/ [2] Gorgolewski, K.J. (2016), ‘The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments’ Scientific Data, 3:160044. doi:10.1038/sdata.2016.44 [3] Gorgolewski, K.J. (2017a), ‘BIDS apps: Improving ease of use, accessibility, and reproducibility of neuroimaging data analysis methods’, PLOS Computational Biology 13(3): e1005209, doi:10.1371/journal.pcbi.1005209 [4] Holdgraf, C. (2019), ‘iEEG-BIDS, extending the Brain Imaging Data Structure specification to human intracranial electrophysiology’ Scientific Data, 6:102. doi:10.1038/s41597-019-0105-7 [5] Niso, G. (2018), ‘MEG-BIDS, the brain imaging data structure extended to magnetoencephalography’ Scientific Data, 5:180110. doi:doi:10.1038/sdata.2018.110 [6] Pernet, C.R. (2019), ‘EEG-BIDS, an extension to the brain imaging data structure for electroencephalography’ Scientific Data, 6:103. doi:10.1038/s41597-019-0104-8 BIDS Derivatives Standardization of Processing Results in Brain Imaging C J Markiewicz1, S Appelhoff1, V Calhoun2, E W Dickie3, E Duff4, E DuPre5, O Esteban1, F Feingold1, S Ghosh6, Y O Halchenko7, M P Harms8, P Herholz5, M Mennes10,M Nørgaard9, R Oostenveld10, C Pernet11, F Pestilli12, R A Poldrack1, A Rokem13, R E Smith14, T Yarkoni15, K J Gorgolewski16 1. Stanford University 2. Georgia State/Georgia Tech/Emory 3. Centre for Addiction and Mental Health, University of Toronto 4. University of Oxford 5. McGill University 6. MIT 7. Dartmouth College 8. Washington University in St Louis 9. Neurobiologisk Forskningsenhed10. Donders Institute for Brain, Cognition and Behaviour, Radboud University 11. The University of Edinburgh 12. Indiana University 13. The University of Washington eScience 14. Florey Institute of Neuroscience and Mental Health 15. University of Texas at Austin 16. Google OHBM 2020Poster #1895