Published March 31, 2019 | Version D2.7rev2
Project deliverable Open

Patient-related data for colorectal cancer samples

  • 1. BBMRI-ERIC
  • 2. UNIMIB
  • 3. ERIC
  • 4. DFKZ
  • 5. DKFZ
  • 6. Medizinische Universität Graz
  • 7. Masaryk Memorial Cancer Institute
  • 8. CNR
  • 9. Charité

Description

ADOPT BBMRI-ERIC Deliverable D2.7

This deliverable summarizes the achievements and results obtained during the collection of the Colorectal Cancer Cohort (CRC-Cohort), a Europe-wide effort within the ADOPT project to collect a cohort of at least 10 000 colorectal cancer patients. The collected data set is made available to improve the visibility and accessibility of the contributing BBMRI-ERIC biobanks. The total number of collected cases reached 10 480, contributed by 25 biobanks from 12 European countries, giving good geographical coverage of Europe. This deliverable provides basic information on the methods used to collect the cohort and on the IT tools developed for the collection process, as well as aggregate statistical information on the collected cohort.

The effort began in March 2016 with the definition of the data set to be collected. The main data collection period ran from January 2018 to March 2019. The delivery date was extended with Amendment #2 from M36 to M42 due to the delay in collecting the colon cancer cohort.

The process of collecting the CRC-Cohort comprised several steps. First, BBMRI-ERIC biobanks were queried regarding their interest in participation. Second, a working group of medical and IT experts convened to define the common data model (D2.4), striking a balance between clearly defined data structures and enough information for meaningful medical research.

The third step, composing the Data Protection Policy, raised many issues and challenges but was ultimately successful. (D2.1 contains a summary of the result, and the final policy is published as D2.3 Annex III.)

In the fourth step, the policies were distributed to member biobanks so that they could determine their ability to provide samples to the CRC-Cohort.

The subsequent steps were the implementation of the resulting data model in the common Metadata Repository (MDR); the implementation of a central data collection system, the Colorectal Cancer Data Collection system (CCDC); the design and implementation of data harmonization tools to support conversion from common tabular files; the design and implementation of data quality checks in collaboration between expert pathologists and IT experts; and, finally, the data quality improvement cycle.

The deliverable then provides an in-depth discussion of the developed reimbursement model, which turned out to be significantly more complex than originally anticipated. Since almost all biobanks qualified for the semi-automated extraction mode, meaning that a substantial amount of data was already available in structured form, it was decided to use a backup plan already prepared in the ADOPT project proposal, and the reimbursement model was adjusted accordingly. The resulting model is a linear combination of the UNIMIB and BBMRI-ERIC funding sources: each biobank has part of its cases reimbursed as manually delivered and part as delivered using automated processing. Because the two funding sources have different fixed reimbursement rates, the model is relatively complex.
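The general shape of such a linear-combination model can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the per-case rates, the manual share, the function and class names, and the simplification that every case is counted as either manual or automated. The actual rates and the mapping of funding sources to delivery modes are defined in the deliverable itself, not in this summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReimbursementRates:
    """Hypothetical fixed per-case rates (placeholder values, not from the deliverable)."""
    manual_per_case: float
    automated_per_case: float


def reimbursement(n_cases: int, manual_share: float, rates: ReimbursementRates) -> float:
    """Linear combination of two fixed rates.

    Assumption: a fraction `manual_share` of the cases is reimbursed at the
    manual rate and the remaining cases at the automated rate.
    """
    if n_cases < 0:
        raise ValueError("n_cases must be non-negative")
    if not 0.0 <= manual_share <= 1.0:
        raise ValueError("manual_share must lie between 0 and 1")
    manual_cases = n_cases * manual_share
    automated_cases = n_cases - manual_cases
    return (manual_cases * rates.manual_per_case
            + automated_cases * rates.automated_per_case)


# Illustrative call with made-up numbers.
example_rates = ReimbursementRates(manual_per_case=10.0, automated_per_case=4.0)
example_total = reimbursement(n_cases=100, manual_share=0.25, rates=example_rates)
```

With these placeholder values, 25 cases are counted at the manual rate and 75 at the automated rate; the total scales linearly with both the number of cases and the manual share.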

The results of the deliverable are numerous and extend beyond collecting more than the targeted 10 000 colorectal cancer datasets. The CRC-Cohort Data Protection Policy was released in October 2017 and used in the dataset collection process. It supports biobanks by offering guidelines on a range of topics such as data access and quality assurance.

Furthermore, the data model defined by the interdisciplinary expert working group has been implemented in the MDR, where it is available via an API for access by other components of the CRC-Cohort ecosystem of IT tools. The central Colorectal Cancer Data Collection system was implemented on the basis of open-source software, extended as needed to support the CRC-Cohort and to offer both a web-based graphical user interface and an API for programmatic upload of data into the system. The web-based user interface was intended for small biobanks that contribute data only manually. The system has been deployed on the BBMRI-ERIC production IT infrastructure managed within Common Service IT.

In addition, a system of data quality checks was developed as a central service running on the central database, with 70 different data checks covering consistency and suspicious data. Further checks for missing data uploaded by the biobanks were also created.
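The three kinds of checks mentioned above (consistency, suspicious values, missing data) can be sketched as small functions applied to every record. This is an illustrative sketch only: the field names (`case_id`, `diagnosis_date`, `surgery_date`, `age_at_diagnosis`), the thresholds, and the function names are hypothetical and are not taken from the CRC-Cohort data model or from the 70 checks actually implemented.

```python
from datetime import date
from typing import Callable, Optional

Record = dict
Check = Callable[[Record], Optional[str]]

# Hypothetical set of mandatory fields, chosen only for this example.
REQUIRED_FIELDS = ("case_id", "diagnosis_date", "age_at_diagnosis")


def check_required_present(rec: Record) -> Optional[str]:
    """Missing-data check: report mandatory fields that are absent or empty."""
    missing = [f for f in REQUIRED_FIELDS if rec.get(f) is None]
    return f"missing fields: {', '.join(missing)}" if missing else None


def check_dates_consistent(rec: Record) -> Optional[str]:
    """Consistency check: flag a surgery dated before the diagnosis for review."""
    dx, sx = rec.get("diagnosis_date"), rec.get("surgery_date")
    if dx is not None and sx is not None and sx < dx:
        return "surgery_date precedes diagnosis_date"
    return None


def check_age_plausible(rec: Record) -> Optional[str]:
    """Suspicious-value check: flag ages outside an assumed plausible range."""
    age = rec.get("age_at_diagnosis")
    if age is not None and not 0 <= age <= 120:
        return f"implausible age_at_diagnosis: {age}"
    return None


CHECKS: tuple = (check_required_present, check_dates_consistent, check_age_plausible)


def run_checks(records: list, checks=CHECKS) -> dict:
    """Return a report mapping case_id to the list of messages raised for it."""
    report = {}
    for rec in records:
        messages = [m for check in checks if (m := check(rec)) is not None]
        if messages:
            report[rec.get("case_id", "<unknown>")] = messages
    return report


sample_records = [
    {"case_id": "A1", "diagnosis_date": date(2015, 3, 1),
     "surgery_date": date(2015, 4, 10), "age_at_diagnosis": 64},
    {"case_id": "B2", "diagnosis_date": date(2016, 6, 1),
     "surgery_date": date(2016, 1, 5), "age_at_diagnosis": 140},
    {"case_id": "C3", "diagnosis_date": None, "age_at_diagnosis": 58},
]
sample_report = run_checks(sample_records)
```

Keeping each check as an independent function that returns either a message or nothing makes it straightforward to add further checks and to send a per-case report back to the contributing biobank.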

Exceeding the initial target, ADOPT collected 10 480 cases of colorectal cancer from 25 biobanks in 12 European countries, thus covering an unprecedented geographical range of patients, from the UK to Finland to Cyprus. The datasets have been anonymised, aggregated and analysed in terms of survival rates, availability of molecular markers, and therapy events.

The conclusion admits that the demands far exceeded initial expectations for several reasons: significant discrepancies between European countries in the availability of structured in-depth data, in organizational requirements, and in the availability of IT expertise to manipulate the data at source. Technical and organizational measures to ensure data security and protect the privacy of the persons contributing their data to the CRC-Cohort were discussed in depth, and agreement on data transfer was reached in compliance with the General Data Protection Regulation (GDPR) and the IMI Code of Practice on Secondary Use of Medical Data in Scientific Research Projects, taking into consideration the differences in regulatory and ethical issues between European countries.

Lessons learned from the deliverable are chronicled for future reference. These include the greater-than-expected amount of time needed both for initiating the data collection process and for approving contracts. Furthermore, the importance of robust legal support working in harmony with IT cannot be overstated: it is needed to create contracts that allow data to be shared while upholding national legal requirements. The need for data quality checks was apparent and will have to be addressed in future projects in which data remains in source repositories and thus cannot be checked by the federated research infrastructure. With federated querying systems such as the BBMRI-ERIC Locator, the biobanks should be contractually bound to update their own data; still, it will be difficult for a central entity to ensure quality for each participating biobank under this model. Biobanks also need to be warned that they will have to allocate significant resources to continuous data quality improvement, not only to primary data collection.

All lessons learned will be taken into consideration when improving the organization of BBMRI-ERIC biobanks and when designing and developing IT tools to make the biological material and data compliant with FAIR and FAIR-Health principles. The resulting CRC-Cohort of 10 480 datasets is being made available to researchers in compliance with the access modes defined in the CRC-Cohort Data Protection Policy (see ADOPT Deliverable D2.3 Appendix III).

Files

ADOPT-D2-7.pdf

Files (3.7 MB)

md5:cd7bd6d52e467c716adb09e02703bf4a

Additional details

Funding

ADOPT BBMRI-ERIC – implementAtion anD OPeration of the gateway for healTh into BBMRI-ERIC 676550
European Commission