# **CONP Privacy and De-identification Guide [\[1\]](#governance)**
**Unless otherwise noted, this work is licensed under [Attribution 4.0 International](http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1)**
The fundamental goal of protecting participant privacy is to prevent the
identification of an individual’s data by others. In keeping with
relevant guidance from Canadian privacy authorities, data must present a
low or very low likelihood of individual re-identification before open
release, roughly analogous to a maximum threshold of 9% risk of
re-identification. Open access datasets should never contain
individually identifying information such as names, health card numbers,
or social insurance numbers.
## Preparing your data: de-identification techniques
The CONP Portal brings together diverse datasets that contain many
different types of information in several modalities and formats, e.g.,
structural and functional MRI, EEG, and behavioural results. The
guidance below seeks to help researchers prepare neuroscience data for
deposit in Canadian open-access data repositories that accept
de-identified data, including the CONP Portal Community Server.
This guide is not comprehensive and may need to be tailored to your
data. **Researchers bear the** **responsibility to ensure that tools are
applied properly and that information is de-identified before sharing**.
### General tools
### Image headers
| **Tool name** | **Description** | **Link** |
|------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------|
| pydicom’s deid | Removes information from image headers (customizable). | [https://pydicom.github.io/deid/](https://pydicom.github.io/deid/) |
| dicomanon (for MATLAB) | Removes confidential medical information from the DICOM file file_in and creates a new file file_out with the modified values. Image data and other attributes are unmodified. | [https://www.mathworks.com/help/images/ref/dicomanon.html](https://www.mathworks.com/help/images/ref/dicomanon.html) |
### Facial and dental features
*Cf* also:
[https://community.imagingqa.com/docs](https://community.imagingqa.com/docs)
### Synthetic data
Synthetic data are data that have been generated from either “real” data
or models and that possess the same statistical properties as the
original data. While not completely free of re-identification risks,
synthetic data are increasingly popular for machine-learning
applications.
#### Articles
Vaden, Kenneth I., Mulugeta Gebregziabher, Dyslexia Data Consortium, and
Mark A. Eckert. “Fully Synthetic Neuroimaging Data for Replication and
Exploration.” *NeuroImage* 223 (December 1, 2020): 117284.
[https://doi.org/10.1016/j.neuroimage.2020.117284](https://doi.org/10.1016/j.neuroimage.2020.117284).
El Emam, Khaled, Lucy Mosquera, and Jason Bass. “Evaluating Identity
Disclosure Risk in Fully Synthetic Health Data: Model Development and
Validation.” *Journal of Medical Internet Research* 22, no. 11 (November
16, 2020): e23139.
[https://doi.org/10.2196/23139](https://doi.org/10.2196/23139).
### Deciding between open, registered, and controlled access
| **Access model** | **Description** | **Identifiability of data** |
|------------------|---------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------|
| Open | Accessible with minimal restrictions or verifications. | Fully de-identified data (both direct and indirect identifiers removed) and/or aggregate data. |
| Registered | Accessible only to users who have an account that has identified them as a *bona fide* researcher. | De-identified data and/or aggregate data where inferences may be made about indirectly identifying individual records. |
| Controlled | Accessible only upon review of a data access application, which includes prior approval by a research ethics board. | Individual-level data with direct identifiers removed or replaced by a code. |
#### Articles
Dyke, Stephanie O. M., Mikael Linden, Ilkka Lappalainen, Jordi Rambla De
Argila, Knox Carey, David Lloyd, J. Dylan Spalding, et al. “Registered
Access: Authorizing Data Access.” *European Journal of Human Genetics*
26, no. 12 (December 2018): 1721–31.
[https://doi.org/10.1038/s41431-018-0219-y](https://doi.org/10.1038/s41431-018-0219-y).
Dyke, Stephanie O. M., Emily Kirby, Mahsa Shabani, Adrian Thorogood,
Kazuto Kato, and Bartha M. Knoppers. “Registered Access: A ‘Triple-A’
Approach.” *European Journal of Human Genetics* 24, no. 12 (December
2016): 1676–80.
[https://doi.org/10.1038/ejhg.2016.115](https://doi.org/10.1038/ejhg.2016.115).
## Additional resources
### Basic concepts
Chapter 5: Privacy and Confidentiality of the *Tri-Council Policy
Statement: Ethical Conduct for Research Involving Humans*
[https://ethics.gc.ca/eng/tcps2-eptc2_2018_chapter5-chapitre5.html](https://ethics.gc.ca/eng/tcps2-eptc2_2018_chapter5-chapitre5.html).
Sensitive Data Expert Group. “Sensitive Data Toolkit for Researchers
Part 1: Glossary of Terms for Sensitive Data Used for Research
Purposes,” September 30, 2020.
[https://doi.org/10.5281/zenodo.4088946](https://doi.org/10.5281/zenodo.4088946).
Sensitive Data Expert Group. “Sensitive Data Toolkit for Researchers
Part 2: Human Participant Research Data Risk Matrix,” October 1, 2020.
[https://doi.org/10.5281/zenodo.4088954](https://doi.org/10.5281/zenodo.4088954).
Canadian Institutes of Health Research. “CIHR Best Practices for
Protecting Privacy in Health Research,” September 15, 2005.
[https://cihr-irsc.gc.ca/e/documents/et_pbp_nov05_sept2005_e.pdf](https://cihr-irsc.gc.ca/e/documents/et_pbp_nov05_sept2005_e.pdf).
Beauvais, Michael J.S., Bartha Maria Knoppers, and Judy Illes. “A
Marathon, Not a Sprint – Neuroimaging, Open Science and Ethics.”
*NeuroImage* 236 (August 1, 2021): 118041.
[https://doi.org/10.1016/j.neuroimage.2021.118041](https://doi.org/10.1016/j.neuroimage.2021.118041).
### Data management plans
Morissette, Erica, Lina Harper, Isabella Peters, Felicity Tayler, and
Stefanie Haustein. “Data Management Plan Template: Open Science
Workflows,” April 9, 2021.
[https://doi.org/10.5281/zenodo.4701021](https://doi.org/10.5281/zenodo.4701021).
Strauss, Ted. “Data Management Plan Template: Neuroimaging in the
Neurosciences,” April 9, 2021.
[https://doi.org/10.5281/zenodo.4673558](https://doi.org/10.5281/zenodo.4673558).
### Creation of open access datasets
Tremblay-Mercier, Jennifer, Cécile Madjar, Samir Das, Alexa Pichet
Binette, Stephanie O. M. Dyke, Pierre Étienne, Marie-Elyse
Lafaille-Magnan, et al. “Open Science Datasets from PREVENT-AD, a
Longitudinal Cohort of Pre-Symptomatic Alzheimer’s Disease.” BioRxiv,
November 30, 2020, 2020.03.04.976670.
[https://doi.org/10.1101/2020.03.04.976670](https://doi.org/10.1101/2020.03.04.976670).
Tremblay-Mercier, Jennifer, Cécile Madjar, Samir Das, Stephanie O. M.
Dyke, Pierre Étienne, Marie-Elyse Lafaille-Magnan, Pierre Bellec, et al.
“Creation of an Open Science Dataset from PREVENT-AD, a Longitudinal
Cohort Study of Pre-Symptomatic Alzheimer’s Disease.” BioRxiv, March 5,
2020, 2020.03.04.976670.
[https://doi.org/10.1101/2020.03.04.976670](https://doi.org/10.1101/2020.03.04.976670).
### Data governance
Eke, Damian, Amy Bernard, Jan G. Bjaalie, Ricardo Chavarriaga, Takashi
Hanakawa, Anthony Hannan, Sean Hill, et al. “International Data
Governance for Neuroscience.” PsyArXiv, June 1, 2021.
[https://doi.org/10.31234/osf.io/esz9b](https://doi.org/10.31234/osf.io/esz9b).
\[1\] Developed by the
Ethics and Governance Committee of the Canadian Open Neuroscience
Platform. Members: Bartha Maria Knoppers (Chair), Michael Beauvais
(Manager), Ann Cavoukian, John Clarkson, Lindsay Green-Noble, Judy Iles,
Jason Karamchandani, Roland Nadler, Dylan Roskams-Edris, and Walter
Stewart.