Published June 26, 2023 | Version 1.00
Dataset Restricted

BRAINTEASER ALS and MS Datasets

Description

BRAINTEASER (Bringing Artificial Intelligence home for a better care of amyotrophic lateral sclerosis and multiple sclerosis) is a data science project that seeks to exploit the value of big data, including those related to health, lifestyle habits, and environment, to support patients with Amyotrophic Lateral Sclerosis (ALS) and Multiple Sclerosis (MS) and their clinicians. Taking advantage of cost-efficient sensors and apps, BRAINTEASER will integrate large, clinical datasets that host both patient-generated and environmental data.

As part of its activities, BRAINTEASER organized two open evaluation challenges on Intelligent Disease Progression Prediction (iDPP), iDPP@CLEF 2022 and iDPP@CLEF 2023, co-located with the Conference and Labs of the Evaluation Forum (CLEF).

The goal of iDPP@CLEF is to design and develop an evaluation infrastructure for AI algorithms able to:

  • better describe disease mechanisms;
  • stratify patients according to their phenotype, assessed throughout the disease evolution;
  • predict disease progression in a probabilistic, time-dependent fashion.

The iDPP@CLEF challenges relied on retrospective ALS and MS patient data made available by the clinical partners of the BRAINTEASER consortium. The datasets contain data about 2,204 ALS patients (static variables, ALSFRS-R questionnaires, spirometry tests, environmental/pollution data) and 1,792 MS patients (static variables, EDSS scores, evoked potentials, relapses, MRIs).

In more detail, the BRAINTEASER project retrospective datasets were derived by merging existing datasets obtained by the clinical centers involved in the BRAINTEASER Project.

  • The ALS dataset was obtained by merging and homogenising the Piemonte and Valle d’Aosta Registry for Amyotrophic Lateral Sclerosis (PARALS, Chiò et al., 2017) and the Lisbon ALS clinic dataset (CENTRO ACADÉMICO DE MEDICINA DE LISBOA, Centro Hospitalar Universitário de Lisboa-Norte, Hospital de Santa Maria, Lisbon, Portugal). Both datasets were initiated in 1995 and are currently maintained by researchers of the ALS Regional Expert Centre (CRESLA), University of Turin, and of the CENTRO ACADÉMICO DE MEDICINA DE LISBOA-Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa. They include demographic and clinical data, comprising both static and dynamic variables.
  • The MS dataset was obtained from the Pavia MS clinical dataset, which was started in 1990 and contains demographic and clinical information that is continuously updated by the researchers of the Institute, and from the Turin MS clinic dataset (Department of Neurosciences and Mental Health, Neurology Unit 1, Città della Salute e della Scienza di Torino).
  • Retrospective environmental data are accessible at various scales at the individual subject level. Thus, environmental data have been retrieved at different scales:
    • macroscale air pollution data were gathered from public monitoring stations that cover the whole extent of the involved countries, namely the European Air Quality Portal;
    • data from a network of air quality sensors (PurpleAir - Outdoor Air Quality Monitor / PurpleAir PA-II) installed at different points of the city of Pavia (Italy) were extracted as well.
    In both cases, the environmental data were already publicly available. To merge environmental data with individual subject locations, we relied on postcodes (the postcode of the pollutant monitoring station and the postcode of the subject’s address). Data were merged following an anonymization procedure based on hash keys. Environmental exposure trajectories were pre-processed and aggregated to avoid fine temporal and spatial granularity, so that individual exposure information cannot disclose personal addresses.
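As a rough illustration of the postcode-based linkage described above, the following Python sketch joins station readings to subjects through a hashed postcode and aggregates exposure by month. It is not the procedure used by the project: the salted SHA-256 hash, the monthly aggregation, the field layout, and all values are assumptions made for this example only.

```python
import hashlib
from collections import defaultdict
from statistics import mean

SALT = "hypothetical-secret-salt"  # assumption: a secret held by the data owner


def hash_key(postcode: str, salt: str = SALT) -> str:
    """Return a salted SHA-256 digest used in place of the raw postcode."""
    normalized = postcode.strip().upper()
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()


def monthly_exposure(readings):
    """Average station readings per (hashed postcode, YYYY-MM) pair."""
    buckets = defaultdict(list)
    for postcode, date, value in readings:
        buckets[(hash_key(postcode), date[:7])].append(value)
    return {key: mean(values) for key, values in buckets.items()}


def link_subjects(subjects, exposure):
    """Attach monthly exposure to each subject via the hashed postcode only."""
    linked = {}
    for subject_id, postcode in subjects:
        key = hash_key(postcode)
        linked[subject_id] = {
            month: value for (k, month), value in exposure.items() if k == key
        }
    return linked


# Toy data, invented for illustration only.
readings = [
    ("27100", "2020-01-03", 30.0),
    ("27100", "2020-01-17", 50.0),
    ("27100", "2020-02-01", 20.0),
    ("10126", "2020-01-05", 10.0),
]
subjects = [("S1", "27100"), ("S2", "10126")]

exposure = monthly_exposure(readings)
linked = link_subjects(subjects, exposure)
print(linked["S1"])
```

In this sketch the raw postcode is used only to compute the join key, and the daily readings are collapsed into monthly means, so the linked output carries neither an address nor a fine-grained time series.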

 

The datasets are shared in two formats:

  • RDF (serialized in Turtle) modeled according to the BRAINTEASER Ontology (BTO);
  • CSV, as shared during the iDPP@CLEF 2022 and 2023 challenges, split into training and test.

Each format corresponds to a specific folder in the datasets, where a dedicated README file provides further details on the datasets. Note that the ALS dataset is split into multiple ZIP files due to the size of the environmental data.
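For readers planning to work with the CSV format, the sketch below shows one way to load a training and a test file with the Python standard library and to check whether any identifier appears in both splits. The file names (`train.csv`, `test.csv`) and column names (`patient_id`, `score`) are hypothetical placeholders; the README files in the dataset folders describe the actual layout. The stand-in files are created on the fly so that the example runs without access to the restricted data.

```python
import csv
import tempfile
from pathlib import Path


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def overlapping_ids(train_rows, test_rows, id_column):
    """Return the identifiers that appear in both splits."""
    train_ids = {row[id_column] for row in train_rows}
    test_ids = {row[id_column] for row in test_rows}
    return train_ids & test_ids


# Stand-in files (hypothetical names and columns) replacing the real data.
workdir = Path(tempfile.mkdtemp())
(workdir / "train.csv").write_text(
    "patient_id,score\nP1,40\nP2,35\n", encoding="utf-8"
)
(workdir / "test.csv").write_text("patient_id,score\nP3,42\n", encoding="utf-8")

train = load_rows(workdir / "train.csv")
test = load_rows(workdir / "test.csv")
leaked = overlapping_ids(train, test, "patient_id")
print(len(train), len(test), sorted(leaked))
```

`csv.DictReader` returns every value as a string, so numeric columns would need explicit conversion before analysis.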

 

The BRAINTEASER Data Sharing Policy section below reports the details for requesting access to the datasets.

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Request access

If you would like to request access to these files, please fill out the form below.

You need to satisfy these conditions in order for this request to be accepted:

BRAINTEASER Data Sharing

Sharing research data is a necessary component of research: it encourages connection and collaboration between researchers, which can result in important new findings within the field. To promote broad, transparent and responsible data sharing, the BRAINTEASER Project has developed the BRAINTEASER Data Sharing Policy reported in this guideline, which sets out, within the constraints of funders' and regulatory requirements, when specific conditions of access should be put in place. The data sharing policy was developed in accordance with the General Data Protection Regulation (GDPR, EU Regulation 2016/679), which provides a number of bases for sharing personal information.

 

Request for use of BRAINTEASER datasets

Please inform us about your intended use of BRAINTEASER datasets by sending an email to

data@brainteaser.health

Doing so will help us keep track of ongoing research initiatives and allow us to facilitate collaboration among researchers whenever possible. If you would like additional results, please submit a short, informal research proposal.

 

Citations in publications

When you report results that use publicly available BRAINTEASER project data in any way, it is our policy that you:

  • Acknowledge the BRAINTEASER Project Consortium by:
    • Listing the “Brainteaser Project Consortium” among the co-authors
      OR
    • Including the following statement in the acknowledgements:
      The authors would like to thank the ‘Brainteaser Project Consortium’
  • Cite the relevant publication of the original results.

 

Clinical Data

Summary statistics derived from environmental and clinical data registered with or without sensor recordings

 

A - Studies using retrospective clinical data

The retrospective clinical data derive from an extensive, long-term data collection effort carried out for ALS by Prof. A. Chiò and colleagues in Turin and Prof. M. De Carvalho and colleagues in Lisbon. Retrospective data on MS patients were likewise collected over many years by Dr. P. Cavalla and colleagues in Turin and Prof. R. Bergamaschi and colleagues in Pavia.

To obtain these datasets, the researcher should send a request for access to the data together with a detailed and structured study proposal, which will be evaluated by the BRAINTEASER Project Data Committee in order to understand the purposes of the requesting research group. After the decision and authorisation, the requesting research group will receive all the information and data. The next step, following the analysis and any results, is a review and validation process carried out by the BRAINTEASER Project Data Committee. Requests covering topics currently under analysis by members of this Consortium will be declined due to conflict of interest.

Depending on the aims of the proposed work and the amount of data to be used, the Committee may request that the project and its members be included in the publication, either as main authors or as co-authors under the name “Brainteaser Project Consortium”.

The inclusion request will be communicated by the BRAINTEASER Project Data Committee before the dataset is delivered to external researchers.

 

B - Studies using prospective clinical data from the Project

All the participants in the BRAINTEASER Project will be included in the paper as part of the “Brainteaser Project Consortium”. An updated list of all participants will be provided periodically. Based on their different involvement in data analysis, interpretation and writing, a list of main authors will also be defined for each specific paper in addition to the “Brainteaser Project Consortium”.

 

BRAINTEASER Project Committee members

  • Roberto Bergamaschi, University of Pavia, Italy
  • Maria Fernanda Cabrera-Umpierrez, Technical University of Madrid, Spain
  • Adriano Chiò, University of Turin, Italy
  • Arianna Dagliati, University of Pavia, Italy
  • Mamede De Carvalho, University of Lisbon, Portugal
  • Barbara Di Camillo, University of Padua, Italy
  • Nicola Ferro, University of Padua, Italy
  • Jose Manuel Garcia Dominguez, Gregorio Marañon Hospital in Madrid, Spain
  • Sara C. Madeira, University of Lisbon, Portugal
  • José Luis Muñoz Blanco, Gregorio Marañon Hospital in Madrid, Spain

 

Additional details

Funding

BRAINTEASER – BRinging Artificial INTelligencE home for a better cAre of amyotrophic lateral sclerosis and multiple SclERosis 101017598
European Commission

References

  • Chiò A, Mora G, Moglia C, Manera U, Canosa A, Cammarosano S, Ilardi A, Bertuzzo D, Bersano E, Cugnasco P, Grassano M, Pisano F, Mazzini L, Calvo A (2017). Piemonte and Valle d'Aosta Register for ALS (PARALS). Secular Trends of Amyotrophic Lateral Sclerosis: The Piemonte and Valle d'Aosta Register. JAMA Neurol., 74(9):1097-1104. doi: 10.1001/jamaneurol.2017.1387
  • Bergamaschi R, Monti MC, Trivelli L, Mallucci G, Gerosa L, Pisoni E, Montomoli C. (2021). PM2.5 exposure as a risk factor for multiple sclerosis. An ecological study with a Bayesian mapping approach. Environ Sci Pollut Res Int., 28(3):2804-2809, doi: 10.1007/s11356-020-10595-5
  • Bergamaschi R, Monti MC, Trivelli L, Introcaso VP, Mallucci G, Borrelli P, Gerosa L, Montomoli C. (2020). Increased prevalence of multiple sclerosis and clusters of different disease risk in Northern Italy. Neurol Sci., 41(5):1089-1095, doi: 10.1007/s10072-019-04205-7
  • Guazzo, A., Trescato, I., Longato, E., Hazizaj, E., Dosso, D., Faggioli, G., Di Nunzio, G. M., Silvello, G., Vettoretti, M., Tavazzi, E., Roversi, C., Fariselli, P., Madeira, S. C., de Carvalho, M., Gromicho, M., Chiò, A., Manera, U., Dagliati, A., Birolo, G., Aidos, H., Di Camillo, B., and Ferro, N. (2022). Intelligent Disease Progression Prediction: Overview of iDPP@CLEF 2022. In Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Thirteenth International Conference of the CLEF Association (CLEF 2022), pages 395–422. Lecture Notes in Computer Science (LNCS) 13390, Springer, Heidelberg, Germany. doi: 10.1007/978-3-031-13643-6_25
  • Faggioli, G., Guazzo, A., Marchesin, S., Menotti, L., Trescato, I., Aidos, H., Bergamaschi, R., Birolo, G., Cavalla, P., Chiò, A., Dagliati, A., de Carvalho, M., Di Nunzio, G. M., Fariselli, P., García Dominguez, J. M., Gromicho, M., Longato, E., Madeira, S. C., Manera, U., Silvello, G., Tavazzi, E., Tavazzi, E., Vettoretti, M., Di Camillo, B., and Ferro, N. (2023). Intelligent Disease Progression Prediction: Overview of iDPP@CLEF 2023. In Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fourteenth International Conference of the CLEF Association (CLEF 2023). Lecture Notes in Computer Science (LNCS), Springer, Heidelberg, Germany.