TANGO

A Data Management Plan created using DMPonline


Creators: Valesca Retel, Jasmin K. Böhmer, Edwin Cuppen


Affiliation: Other


Template: ZonMW


Grant number: 846001002


Project abstract:

Background: Personalized medicine in advanced melanoma and non-small cell lung cancer (NSCLC) offers important health benefits to patients with specific genetic profiles, but is expensive and may induce severe side effects. Whole Genome Sequencing (WGS) of tumor DNA tests for all relevant genetic aberrations in individual patients, allowing immediate selection of the optimal therapy. This approach is expected to improve survival, avoid adverse effects, and limit health care costs through optimal patient selection. Objectives: Expand molecular profiling of tumors to improve immune- and targeted treatment selection in patients with advanced melanoma or NSCLC, and determine the cost-effectiveness and budget impact of WGS. Methods: The project consists of 6 Work Packages: 1) Diagnostic value of WGS, 2) Treatment decisions based on WGS, 3) Prediction of long-term health benefits and harms by micro-simulation, 4) Tumor-overarching early cost-effectiveness modelling, 5) Nation-wide organization of WGS, 6) Responsible implementation of WGS according to ethical, legal and societal implications (ELSI) principles.

Last modified: 12-11-2019






1. Characteristics of the project and the data collection


1.1 Contact details of the project leader

Valesca Retèl, PhD

Netherlands Cancer Institute

Division PSOE

Plesmanlaan 121

1066 CX Amsterdam

The Netherlands

tel: +31 20 5126197

email: v.retel@nki.nl



1.2 I drew up my DMP in collaboration with a data management expert. State name, position, organisation/department, telephone number, e-mail address.

Jasmin Böhmer - Data Steward Utrecht Bioinformatics Expertise Core

Center for Molecular Medicine, University Medical Center Utrecht

+31 88 75 680 82

j.k.bohmer@umcutrecht.nl

bec@umcutrecht.nl



1.3 In collecting the data for my project, I will proceed as follows:

Existing data:

- Genetic data: WGS data from the CPCT-02 study will be used and are provided through the Hartwig Medical Foundation (HMF). A data request was made via the Data Access Board of the Center for Personalized Cancer Treatment (CPCT)/HMF and has been granted. The contract was concluded between HMF and Erasmus.

- Clinical data: Clinical data of the CPCT-02 patients will be provided by the relevant hospitals: EMC, Meander, NKI-AvL, UMC Utrecht, AmsterdamUMC.

- For clinical data in WP3, use will be made of the Santeon database (confirmed and received).

- Dutch Association of Doctors for Lung Diseases and Tuberculosis (NVALT) (applied for).

- Dutch Melanoma Treatment Registry (DMTR) (confirmed and received).

- IKNL data.

- Clinical data from UMCG and AmsterdamUMC on healthcare use in NSCLC patients.

 

 

 

New data:

- Analysis method for selection for immunotherapy based on Whole Genome Sequencing (WGS).

- Cost analysis with respect to sequencing and results

- Results from models of WP3,4,5

- Results of WP6 (ethical legal), reports on ethical/legal frameworks

- Questionnaires for patients on quality of life and toxicities. TANGO will record these in its own eCRF; the data remain the property of NKI-AvL. At the end of the study we will assess whether these data can be combined with CPCT-02 so that they remain available in the future.

 

Link data files:

- WGS results will be linked with the Pathologisch Anatomisch Landelijk Geautomatiseerd Archief (PALGA).

- WGS data will be linked with the questionnaires.

- WGS data will be linked to the clinical data of the CPCT-02 patients from the hospitals concerned: EMC, Meander, NKI-AvL, UMC Utrecht, AmsterdamUMC.

- IKNL data and data from NKI-AvL, Rijnstate, UMCG and AmsterdamUMC.

Together with the PATH project we strive for cooperation in the use of data from the registrations of NVALT and DMTR, and the link with PALGA. We will also discuss the structure and the system we want to use for the inventory of cost data for the purpose of cost effectiveness analysis.

 



1.4 I will conduct research involving human subjects.

This concerns the questionnaires for patients regarding quality of life and toxicities.

(Human-subject data is collected under the umbrella of CPCT-02. We use the informed consent of the CPCT-02 study for the WGS data.)

For the questionnaires we use the CPCT-02 logistics. TANGO is responsible for the questionnaires: it manages them, ensures their distribution, and handles privacy-sensitive data in accordance with the AVG (GDPR). Patients can indicate in the CPCT-02 informed consent (IC) whether they want to participate in the questionnaires.

 



1.5 I will reuse existing data and I have permission from the data owner(s) to use their data.

- Clinical & Genetic:

Data from the CPCT-02 study will be used; these will be provided via the Hartwig Medical Foundation (HMF). Permission has been granted: the TANGO research question falls within the objective of the CPCT-02 study and the consent given. For this purpose, a one-time data request must be made via the Data Access Board of the HMF (this will be requested in due course).

- For clinical data in WP3, use will be made of the Santeon database (permission has been requested for the TANGO research questions; confirmation exists for other research questions outside TANGO).

- NVALT (application in progress)

- DMTR (permission granted)

 

- Update July 2019:

1) Whole Genome Sequencing (WGS) data: HMF data was approved and provided to EMC and VU/AMC.

2) Clinical data: the contractual situation between the EMC and its partnering hospitals is clear; the contractual situation between VU and its partnering hospitals is still being arranged.

3) Patient data registries: the contracts between VU and DMTR, and between VU and Santeon, are signed and the data transfer approved; the contract between VU and NVALT is still in application status.

4) Costing data: the contracts between UMCU and Rijnstate, and between UMCU and NKI, are in place and the data has been transferred; the contracts between UT and NKI, UMCU, and Rijnstate are in place and the data has been transferred.

5) Quality of Life (QoL): the contractual situation between NKI and UMCU/CPCT is still under review; the contractual situation between NKI and CPCT for clinical data is also still under review.

 

 

 



1.6 I will link existing data and I have made agreements with the data owner(s) about the linkage.

The linking of WGS results will be organised in this project with PALGA. A declaration of intent has been signed by HMF and PALGA for this purpose.

The linking of the clinical data and WGS data of the CPCT-02 patients has been arranged and falls within the CPCT-02 Informed Consent.



1.7 I collaborate with other parties in collecting the data.

- WGS data is supplied by HMF/CPCT-02.

- Collection and generation of research data will take place within a Consortium, consisting of NKI, UMC Utrecht, University of Twente, Erasmus MC, MUMC+ and AmsterdamUMC.

- Because the research takes place within a Consortium, a Consortium Agreement has been concluded between the participating centres.



1.8 I foresee the following end products of the project and will make them available for follow-up research and verification. (explain briefly)

Raw sequencing data cannot be made available because it is the responsibility of HMF. This data has been requested via the HMF Data Access Board with a formal request (more information about the procedure and application forms can be found at www.hartwigmedicalfoundation.nl). HMF's bioinformatic pipeline is based on open source software and publicly available at GitHub (https://github.com/hartwigmedical/pipeline).

Explanation of the end products that we provide and that can be made available:

- Processed data: subset of WGS results

- Documentation: code book on methodology

- Protocols

- Possibly software related to models (www.anylogic.com)



1.9 I can estimate the size of the data collection, namely the number of participants or subjects ("n=") in the data collection and its size in giga-/terabytes.

n = 400 NSCLC patients

The raw data and analysis files of the WGS data are managed by HMF. These can be viewed remotely if necessary, so there is no need to copy many terabytes of data. The gVCF files are used for the actual analysis and must be stored within TANGO (at least until the analysis is completed). The original files are kept safe at HMF in an ISO/NEN7510 accredited private cloud (Schuberg Philis).

 

 

Calculation:

- 20 GB per patient for WGS analysis (gVCF) × 400 patients = 8 TB

- Models: 5-10 MB per model, about 5 GB in total

- Questionnaires in Word, analysis in SPSS, max. 1 GB (?)

Total footprint:

8 TB + 5 GB + 1 GB ≈ 8 TB
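The calculation above can be reproduced with a few lines of Python. This is only a sketch: the figures are the estimates listed above, while the function name and the use of decimal units (1 TB = 1000 GB) are assumptions made for illustration.

```python
# Estimates taken from the calculation above.
# Decimal units (1 TB = 1000 GB) are an assumption.
PATIENTS = 400              # n = 400 NSCLC patients
GVCF_GB_PER_PATIENT = 20    # gVCF size per patient
MODELS_GB = 5               # all models together
QUESTIONNAIRES_GB = 1       # questionnaires and SPSS analysis (upper bound)


def total_footprint_gb(patients=PATIENTS,
                       gvcf_gb=GVCF_GB_PER_PATIENT,
                       models_gb=MODELS_GB,
                       questionnaires_gb=QUESTIONNAIRES_GB):
    """Return the estimated total storage footprint in gigabytes."""
    return patients * gvcf_gb + models_gb + questionnaires_gb


total_gb = total_footprint_gb()
print(f"WGS (gVCF): {PATIENTS * GVCF_GB_PER_PATIENT / 1000:.1f} TB")
print(f"Total:      {total_gb / 1000:.3f} TB")
```

The WGS files dominate the total, so the models and questionnaires do not change the order of magnitude of the estimate.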

 

 

 

 

Institution Acronym | Work package | Overall Type of Data | Overall Estimated Data-Size
AMC | WP1 | Whole Genome Sequencing; Molecular Data | ~1TB – 2TB
UMCU | WP1 | Healthcare Cost and Consumption Data | <10GB
EMC | WP2 | Whole Genome Sequencing | ~1-1.5TB
VUmc | WP2 | Whole Genome Sequencing | ~1-1.5TB
VUmc | WP3 | Patient Registry data | <10GB
MUMC+ | WP4 | Survey Data; Patient Registry Data | <100GB
UT | WP5 | Survey Data; Patient Registry Data; Simulation Data | <100GB
AMC/UMCU | WP6 | Survey Data; Interview data | <100GB

 

 

 

 

WP1 - WGS and NGS Data

Data Type | Data File Format | Required Software | Estimated Data Size | Special Characteristics
WGS | VCF | tbd | ~125GB per patient | Coded and pseudonymised
Patient Records and Clinical Diagnostics data | Proprietary Patient Record File | tbd | <100GB | Not anonymised
Next Generation Sequencing | VCF | tbd | NGS whole exome VCF: 1GB | Not anonymised
Gene Panels | VCF | tbd | NGS gene panel (50 genes): 1MB; NGS gene panel (500 genes): 10MB | Not anonymised
Immunohistochemical Data | TIFF | tbd | <100GB | Not anonymised
Output Data | Graphical Illustrations | Microsoft Office Suite | <100GB | Publication related content

 

 


WP1 - Healthcare Cost and Consumption Data

Data Type | Data File Format | Required Software | Estimated Data Size | Special Characteristics
Tabular Data | CSV; Microsoft Excel | Microsoft Excel | <10GB | Coded
R-Scripts | R | R, R-Studio | <10GB | Coded

 

 

WP2 - WGS and Clinical Data EMC

Data Type | Data File Format | Required Software | Estimated Data Size | Special Characteristics
Clinical Data | TXT; Microsoft Excel | Microsoft Excel | <1GB | CPCT-02/HMF identification number included
WGS raw data | BAM | R Studio | ~1.2TB | CPCT-02/HMF identification number included
WGS variants | VCF | IGV | <5GB | Not anonymised

 

 

WP2 - WGS and Clinical Data VUmc

Data Type | Data File Format | Required Software | Estimated Data Size | Special Characteristics
Clinical Data | TXT; Microsoft Excel | Microsoft Excel | <1GB | CPCT-02/HMF identification number included
WGS raw data | BAM | R Studio | ~1.2TB | CPCT-02/HMF identification number included
WGS variants | VCF | IGV | <5GB | Not anonymised

 

 

 

 

WP3 – Analysis of survival pattern Data

Data Type | Data File Format | Required Software | Estimated Data Size | Special Characteristics
Patient Registry Data | sav: SPSS; csv: Microsoft Excel | SPSS; Microsoft Excel; R | <10GB | Pseudonymised

 

 

WP4 - Cost-effectiveness Analysis Data

Data Type | Data File Format | Required Software | Estimated Data Size | Special Characteristics
Data registries | SPSS; CSV; R-file | SPSS; Microsoft Excel; R statistics | <100GB | Pseudonymised
Survey data | SPSS; CSV; R-file | SPSS; Microsoft Excel; R statistics | <100GB | No personal data included
Simulation data | CSV; R-file | Microsoft Excel; R statistics | <100GB | n.a.

 

 

WP5 – WGS Implementation Requirements Analysis

Data Type | Data File Format | Required Software | Estimated Data Size | Special Characteristics
Survey Data | CSV | Microsoft Excel | <100MB | n.a.
Patient Registries | XLS; CSV | Microsoft Excel | <100MB | Confidential, pseudonymised
Simulation Data | CSV | Microsoft Excel | <1GB | n.a.
Cost Data | XLS; CSV | Microsoft Excel | <100MB | n.a.
Simulation Model | ALP | AnyLogic | <100GB | n.a.

 

 

 

 

 



1.10 During the project I have sufficient storage locations and capacity, and a back-up of the data is available. Give a brief explanation.

- WGS data: raw and analysis data are stored at HMF, securely hosted at Schuberg Philis, where a mirror is also available.

- Each institute is responsible for the secure storage of derived data, analysis protocols and software on its own servers. Each institute must also provide regular back-ups (this will be included in the terms of use).

Utrecht Bio-informatics Expertise Centre (UBEC) will include this as an extra step in the periodic DMP check (carried out every six months). Besides re-evaluating the DMP and updating it to reflect the current state of affairs, we will contact the participants beforehand to check whether and how the terms of use are being met.

- For the HMF, the responsibility for auditing the DMP lies with the Data Access Committee/Board of HMF/CPCT, which has been accredited since the beginning of 2018.

In November 2019 the PhD students and junior researcher received a comprehensive training session on research data management, which covered the ins and outs of a solid back-up routine.
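As an illustration of what a back-up check at an institute could look like, the sketch below compares SHA-256 checksums of an original file and its back-up copy. It is not the procedure prescribed by the consortium; the function names are hypothetical, and only the Python standard library is used.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original, backup):
    """Return True if the back-up copy is byte-identical to the original file."""
    return sha256_of(original) == sha256_of(backup)
```

Storing the checksums alongside the back-up would also allow a later check that the archived copy has not been silently altered.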

 




2. Legislation and regulations (incl. privacy)


2.1 I will conduct research involving human subjects and I declare that I am aware of and comply with the laws and regulations concerning privacy-sensitive data

- We are aware of the WBP (the Dutch Personal Data Protection Act). The processing of personal data does not need to be reported to the "Autoriteit Persoonsgegevens"; instead, it is reported to the Data Protection Officer (FG) of the UMC Utrecht. After reporting, a PIA was drawn up in collaboration with the FG. The PIA will be evaluated annually, following the evaluation of the DMP.

- CPCT-02 is research subject to the WMO. METC approval was obtained at UMCU, and local permission has been applied for and granted at the centres participating in the CPCT research. Access to the data is subject to certain conditions.

- Linking the clinical data to the CPCT-02 data is technically possible using the CPCT number. In practice, however, it is not possible, because data access boards do not give permission to link the data to other sources. This appears to be primarily a legal obstacle, as there is no clarity about the applicable legislation. One solution still being explored is retrieving the data (what data?) per hospital via the PI of the CPCT. At this moment (April 2018) it is being investigated whether the analysis can take place per hospital dataset or whether the data must be combined. One option is that the hospital itself makes the link and then either safely stores or deletes the code. The data can then be transferred to the person responsible within TANGO.

- Data generated from questionnaires will most likely fall under research that is not subject to the WMO.

- The WGBO applies to pulmonologists.

- A consortium agreement is in place. In addition, each UMC/university puts contracts in place with its partnering institutions, complying with its institutional legislation and contractual requirements.

 



2.2 I will conduct research involving human subjects and I have arranged to obtain the research data with (a form of) consent from the participants.

- Via informed consent of the CPCT-02 study.

- Within the informed consent of the CPCT-02, permission is requested for sending questionnaires. The amendment is currently under review by the METC of the UMCU. To link the questionnaires and clinical data, a formal data request will have to be made to CPCT-02.



2.3 I will conduct research involving human subjects and I will anonymise or pseudonymise privacy-sensitive research data.

Released data will be pseudonymised. The study ID (code) is stored separately, with strictly limited access (the key is managed by a data manager at one of the centres).

Patients have a CPCT-02 ID number, which is registered in the electronic patient record (EPD). This CPCT-02 ID number will also be used for the questionnaires.
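A minimal sketch of how a separately stored key can work in practice is given below. It is an illustration only, not the procedure used by CPCT-02 or TANGO: the field names (`cpct_id`, `study_code`), the `TANGO-` code prefix and the function name are hypothetical.

```python
import secrets


def pseudonymise(records, id_field="cpct_id"):
    """Replace the identifying field in each record with a random study code.

    Returns (coded_records, key_table). The key table maps study code to the
    original ID and is meant to be stored separately, with restricted access.
    """
    key_table = {}    # study code -> original ID (the "key")
    code_by_id = {}   # original ID -> study code (keeps codes stable per patient)
    coded_records = []
    for record in records:
        original_id = record[id_field]
        if original_id not in code_by_id:
            code = "TANGO-" + secrets.token_hex(4)
            while code in key_table:  # regenerate on the unlikely event of a collision
                code = "TANGO-" + secrets.token_hex(4)
            code_by_id[original_id] = code
            key_table[code] = original_id
        coded = {k: v for k, v in record.items() if k != id_field}
        coded["study_code"] = code_by_id[original_id]
        coded_records.append(coded)
    return coded_records, key_table


# Made-up example records; the IDs and scores are not real data.
records = [
    {"cpct_id": "EXAMPLE-001", "qol_score": 70},
    {"cpct_id": "EXAMPLE-002", "qol_score": 55},
    {"cpct_id": "EXAMPLE-001", "qol_score": 75},
]
coded_records, key_table = pseudonymise(records)
```

Only `coded_records` would be shared with analysts; `key_table` stays with the data manager who holds the key.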



2.4 I comply with the privacy regulations of the organisation with which I am affiliated.

All participants in the study are working within UMCs with corresponding privacy regulations for all employees.

 




3. Making data findable


3.1 The data collection created in my project is findable for follow-up research. (Note: This is a key item that you must report to ZonMw at the end of your project.)

Zenodo was chosen as the dedicated long-term archive for most of the outputs from this project. A community was created on Zenodo: https://zenodo.org/communities/tango-wgs/

Other websites that refer to the archival collection are: the ZonMW project website [https://www.zonmw.nl/nl/onderzoek-resultaten/geneesmiddelen/programmas/project-detail/personalised-medicine/technology-assessment-of-next-generation-sequencing-in-personalized-oncology-tango/], the CPCT-02 study website [https://www.cpct.nl/cpct-02/], and the project record on NARCIS [https://www.narcis.nl/research/RecordID/OND1361542].

So far, mainly presentations and posters have been deposited in the Zenodo archive.

 

Raw data: The data received from the CPCT study, and any additional data requested from hospitals, remain with the providing institutions. For reuse purposes, the Data Access Committee of each individual institution can be contacted. The raw data therefore remain under restricted access.

Anonymised raw non-genomic data will be made accessible as open as possible, as closed as necessary.

Processed data: the selection of appropriate repositories and certified archives is still open. Restricted access is appropriate because of the sensitive content of the data files. Processed non-genomic data will be made accessible as open as possible, as closed as necessary.

Output data: All non-sensitive data will be made as openly accessible as possible. Open data is the aim.



3.2 I use a metadata schema to describe the (entire) data collection.

The basic metadata standard Dublin Core will be applied, which is compliant with DataCite 4.0. This metadata standard is applied to the dataset as a whole and supports the findability of the research data.
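To show what a dataset-level Dublin Core description might look like, the sketch below builds a small XML record with the Python standard library. The element values are taken from this plan; the wrapper element name `metadata` and the function name are assumptions, and a real deposit would follow the schema required by the chosen archive.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a simple Dublin Core XML record from (element, value) pairs."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for element, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return root


record = dublin_core_record([
    ("title", "TANGO"),
    ("creator", "Valesca Retel"),
    ("creator", "Jasmin K. Böhmer"),
    ("creator", "Edwin Cuppen"),
    ("identifier", "https://zenodo.org/communities/tango-wgs/"),
])
print(ET.tostring(record, encoding="unicode"))
```

Repeatable elements such as `creator` are simply added once per value, which is why the input is a list of pairs rather than a dictionary.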



3.3 I will use a Persistent Identifier to refer to the data in a durable way. (Note: This is a key item that you must report to ZonMw at the end of your project.)

Depending on the selected data archive and the persistent identifiers it provides, DOIs and Handles will be used.

 




4. Making data accessible


4.1 After the end of the project, the data will be accessible for verification and follow-up research.

The data will be made available after an embargo period of 3 months after publication of the results generated on the different datasets.

There are several reasons for this:

- Many different parties

- Multiple publications

- Longer contract period of individual researchers

The data from a number of sub-projects can be expected to be made available within the foreseeable future, for example the cost analysis of WGS versus standard diagnostics.

We aim to make the data available as soon as the relevant sub-component has been completed and published.

The WGS data is not generated in this study itself, but is accessible via the Hartwig Medical Foundation.



4.2 After the end of the project, the data collection will be publicly accessible without additional conditions (open access).

Access to the data collection is subject to conditions for the following reasons:

- Part of the data is personally identifiable and therefore privacy sensitive.

- Part of the data is already available on request under certain conditions (for example, the raw data of HMF can be retrieved via data access board request).

 

If the above restrictions do not apply to a certain data type (e.g. static information at an aggregated level) and it is allowed according to informed consent, this data will be made publicly available.



4.3 I have terms of use available that explain the conditions for access to my data collection. (Note: This is a key item that you must report to ZonMw at the end of your project. Provide a link or Persistent Identifier.)

Depending on the dedicated data archive and its license agreement, a mix of licenses is anticipated. In case the license options of the data archive are not sufficient or not available, license files will be provided with each data-set.



4.4 In the conditions I set for the use of my data (restricted access), I have included at least the items ticked below.

We must adhere to the informed consent of the CPCT-02 study, so we cannot and will not share the data without restrictions.




5. Making data interoperable (exchangeable, linkable)


5.1 I choose a data format so that my data collection is readable by other researchers and their computers.

Known so far:

Excel: in CSV format

Word/R/C++: in plain text, if possible within a version control system such as Git/SVN

SPSS: in CSV format

Data is processed in Word and Excel by the researchers themselves, but distributions will be delivered as plain-text and CSV files, which are interoperable. The linked metadata will make the data partly machine-readable. A pilot project is underway to deliver datasets as FAIR data points (including FAIR distribution of the dataset). If the pilot is completed with good results, this project would be eligible for FAIR conversion.

The sequencing data is stored in FASTQ format at HMF. The analyses are performed on files in VCF format. Both formats are standard in bioinformatics and can be read with a text editor.
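Because VCF is a tab-separated text format, its records can be read without specialised software. The sketch below parses the eight fixed VCF columns with plain Python; the function name and the example variant are made up for illustration, and dedicated libraries would normally be preferred for real analyses.

```python
# The eight fixed columns defined by the VCF specification.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_lines(lines):
    """Yield one dict per variant record, skipping header lines that start with '#'."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        record = dict(zip(VCF_COLUMNS, fields[:8]))
        record["POS"] = int(record["POS"])  # position is a 1-based integer
        yield record


# Made-up example content; not real patient data.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "7\t12345\t.\tA\tT\t50\tPASS\tDP=30",
]
for variant in parse_vcf_lines(example):
    print(variant["CHROM"], variant["POS"], variant["REF"], variant["ALT"])
```

The same function works on an open file handle, since iterating over a text file yields its lines.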



5.2 I choose a metadata standard so that my data collection can be linked to other data collections. (Note: This is a key item that you must report to ZonMw at the end of your project.)

The basic metadata standard Dublin Core will be applied, which is compliant with DataCite 4.0. Where applicable, domain-specific metadata will be captured and provided in addition to Dublin Core. This metadata standard is applied at the data file level to enable the interoperability of all files within the datasets.

 



5.3 I will conduct research involving human subjects and, in protecting privacy, I have taken into account reuse of the data and possible linkage with other datasets.

See previous answers.




6. Making data reusable and storing it sustainably


6.1 I declare that the data are of good quality so that other researchers can interpret and use them.

Research process:

- Protocols and research proposal

 

Quality checks:

- SOPs and settings WGS equipment

- Check presence of informed consents

- SOPs about data cleaning

 

HMF works according to ISO 17025 accreditation. Lab protocols (SOPs) are available to interested parties on request.

For bioinformatic data analysis, all pipelines and tools are publicly available and versions are maintained. These can be found on GitHub (https://github.com/hartwigmedical).

 

- A data privacy impact assessment was performed in July 2017 and will be updated in Spring 2018

- A research data management audit was performed in October 2018

- The data management plan was updated in December 2018

 



6.2 I have selection criteria to determine which part of the data must be preserved.

- VCF data as obtained from HMF does not need to be stored locally after completion of the study, as these data can be requested from HMF.

- The rest of the data is still difficult to estimate.

 

The intention is to create an archived collection that is as open as possible and as closed as necessary, while providing as much of the relevant processed and output data as possible.

 



6.3 After selecting the data, I can estimate the size of the data collection (in GB/TB) that I will store and archive for the long term.

This relates to 1.9. However, since the original raw data is supplied by the HMF and other medical centres, there is no need to archive it again. The overall estimated data volume for archiving therefore ranges between 200GB and 500GB.

 



6.4 I have chosen a (certified) archive or repository for sustainable long-term archiving of my data collection. (Note: This is a key item that you must report to ZonMw at the end of your project.)

See question 3.1. The appropriate solution is still under consideration.

The data from the questionnaires will be included in the CPCT-02 dataset to keep the data accessible in the future.

 



6.5 I will apply the recommended retention period of at least 10 years to my data.

Minimum 10 years. We will not retain the data longer than necessary for the purpose for which it was collected or to comply with legal or regulatory requirements. After this, the data will be deleted.

How is it ensured that all parties comply with these conditions? 
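One small building block for monitoring the retention period could be a date check such as the sketch below. It only computes the earliest date on which archived data may be deleted under the 10-year minimum; the function name and the example archiving date are hypothetical, and it does not answer the organisational question of how compliance is enforced across parties.

```python
from datetime import date

RETENTION_YEARS = 10  # minimum retention period stated above


def earliest_deletion_date(archived_on, years=RETENTION_YEARS):
    """Return the first date on which data archived on `archived_on` may be deleted."""
    try:
        return archived_on.replace(year=archived_on.year + years)
    except ValueError:
        # 29 February with a non-leap target year: fall back to 1 March.
        return date(archived_on.year + years, 3, 1)


# Hypothetical archiving date, used only as an example.
print(earliest_deletion_date(date(2019, 11, 12)))
```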



6.6 The costs of (preparing the data for) archiving are covered.

10% of the budget is reserved for data stewardship.