BRAINTEASER ALS and MS Datasets
Creators
- Faggioli, Guglielmo (1)
- Guazzo, Alessandro (1)
- Marchesin, Stefano (1)
- Menotti, Laura (1)
- Trescato, Isotta (1)
- Aidos, Helena (2)
- Bergamaschi, Roberto (3)
- Birolo, Giovanni (4)
- Cavalla, Paola (5)
- Chiò, Adriano (4)
- Dagliati, Arianna (3)
- de Carvalho, Mamede (2)
- Di Nunzio, Giorgio Maria (1)
- Fariselli, Piero (4)
- García Dominguez, Jose Manuel (6)
- Gromicho, Marta (2)
- Longato, Enrico (1)
- Madeira, Sara C. (2)
- Manera, Umberto (4)
- Silvello, Gianmaria (1)
- Tavazzi, Eleonora (1)
- Tavazzi, Erica (1)
- Vettoretti, Marta (1)
- Di Camillo, Barbara (1)
- Ferro, Nicola (1)
- 1. University of Padua, Italy
- 2. University of Lisbon, Portugal
- 3. University of Pavia, Italy
- 4. University of Turin, Italy
- 5. Città della Salute e della Scienza, Turin, Italy
- 6. Gregorio Marañón Hospital, Madrid, Spain
Description
BRAINTEASER (Bringing Artificial Intelligence home for a better care of amyotrophic lateral sclerosis and multiple sclerosis) is a data science project that seeks to exploit the value of big data, including those related to health, lifestyle habits, and environment, to support patients with Amyotrophic Lateral Sclerosis (ALS) and Multiple Sclerosis (MS) and their clinicians. Taking advantage of cost-efficient sensors and apps, BRAINTEASER will integrate large, clinical datasets that host both patient-generated and environmental data.
As part of its activities, BRAINTEASER organized two open evaluation challenges on Intelligent Disease Progression Prediction (iDPP), iDPP@CLEF 2022 and iDPP@CLEF 2023, co-located with the Conference and Labs of the Evaluation Forum (CLEF).
The goal of iDPP@CLEF is to design and develop an evaluation infrastructure for AI algorithms able to:
- better describe disease mechanisms;
- stratify patients according to their phenotype, assessed throughout the disease evolution;
- predict disease progression in a probabilistic, time-dependent fashion.
The iDPP@CLEF challenges relied on retrospective ALS and MS patient data made available by the clinical partners of the BRAINTEASER consortium. The datasets contain data about 2,204 ALS patients (static variables, ALSFRS-R questionnaires, spirometry tests, environmental/pollution data) and 1,792 MS patients (static variables, EDSS scores, evoked potentials, relapses, MRIs).
In more detail, the BRAINTEASER retrospective datasets were derived by merging existing datasets collected by the clinical centers involved in the BRAINTEASER project.
- The ALS dataset was obtained by merging and homogenising the Piemonte and Valle d’Aosta Registry for Amyotrophic Lateral Sclerosis (PARALS, Chiò et al., 2017) and the Lisbon ALS clinic dataset (CENTRO ACADÉMICO DE MEDICINA DE LISBOA, Centro Hospitalar Universitário de Lisboa-Norte, Hospital de Santa Maria, Lisbon, Portugal). Both datasets were initiated in 1995 and are currently maintained by researchers of the ALS Regional Expert Centre (CRESLA), University of Turin, and of the CENTRO ACADÉMICO DE MEDICINA DE LISBOA-Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa. They include demographic and clinical data, comprising both static and dynamic variables.
- The MS dataset was obtained from the Pavia MS clinical dataset, which was started in 1990 and contains demographic and clinical information that is continuously updated by the researchers of the Institute, and from the Turin MS clinic dataset (Department of Neurosciences and Mental Health, Neurology Unit 1, Città della Salute e della Scienza di Torino).
- Retrospective environmental data are accessible at various scales at the individual subject level. Thus, environmental data have been retrieved at different scales:
  - to gather macroscale air pollution data, we used data from public monitoring stations that cover the whole extent of the involved countries, namely the European Air Quality Portal;
  - data from a network of air quality sensors (PurpleAir - Outdoor Air Quality Monitor / PurpleAir PA-II) installed at different points in the city of Pavia (Italy) were extracted as well.

  In both cases, the environmental data were already publicly available. To merge environmental data with individual subject locations, we relied on postcodes (the postcode of the pollutant monitoring station and the postcode of the subject's address). Data were merged following an anonymization procedure based on hash keys. Environmental exposure trajectories were pre-processed and aggregated to avoid fine temporal and spatial granularity, so that individual exposure information cannot disclose personal addresses.
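The postcode-based linkage described above can be illustrated with a minimal Python sketch. It is not the procedure actually used by the project: the salted SHA-256 hash, the yearly averaging, the function names, and all postcodes and values are assumptions made for illustration only.

```python
import hashlib
from collections import defaultdict
from statistics import mean

# Hypothetical secret salt; the real hash-key scheme is not described in the source.
SALT = "example-secret-salt"


def hash_key(postcode: str, salt: str = SALT) -> str:
    """Return a salted SHA-256 digest used as the join key instead of the raw postcode."""
    normalized = postcode.strip().upper()
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()


def yearly_exposure(readings):
    """Aggregate (postcode, ISO date, value) readings into yearly means per hashed postcode.

    Yearly averaging is an assumed aggregation level, chosen here only to show
    how coarse temporal granularity can be enforced.
    """
    buckets = defaultdict(list)
    for postcode, date, value in readings:
        year = int(date[:4])
        buckets[(hash_key(postcode), year)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


def link_subjects(subjects, exposure):
    """Attach yearly exposure to each subject via the hashed postcode.

    The returned records contain no postcode, only subject id, year and value.
    """
    linked = {}
    for subject_id, postcode in subjects:
        key = hash_key(postcode)
        linked[subject_id] = {
            year: value for (k, year), value in exposure.items() if k == key
        }
    return linked


# Made-up station readings: (station postcode, date, pollutant concentration).
readings = [
    ("PC-0001", "2019-01-15", 30.0),
    ("PC-0001", "2019-07-15", 10.0),
    ("PC-0001", "2020-03-01", 18.0),
    ("PC-0002", "2019-02-01", 40.0),
]
# Made-up subjects: (subject id, postcode of the subject's address).
subjects = [("subject-A", "PC-0001"), ("subject-B", "PC-0002")]

exposure = yearly_exposure(readings)
linked = link_subjects(subjects, exposure)
```

In this sketch the raw postcode is only used to compute the join key, and the linked output is keyed by subject and year, which mirrors the idea of avoiding fine spatial and temporal granularity.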
The datasets are shared in two formats:
- RDF (serialized in Turtle) modeled according to the BRAINTEASER Ontology (BTO);
- CSV, as shared during the iDPP@CLEF 2022 and 2023 challenges, split into training and test.
Each format corresponds to a specific folder in the datasets, where a dedicated README file provides further details on the datasets. Note that the ALS dataset is split into multiple ZIP files due to the size of the environmental data.
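As a rough illustration of working with the CSV format when a dataset is split across several ZIP files, the following Python sketch iterates over every CSV member of a list of archives. The folder layout, file names, and column names used here are hypothetical; the README files in the datasets describe the actual structure.

```python
import csv
import io
import zipfile


def iter_csv_rows(zip_sources):
    """Yield (member_name, row_dict) for every CSV file found in the given ZIP archives.

    `zip_sources` may contain file paths or file-like objects.
    """
    for source in zip_sources:
        with zipfile.ZipFile(source) as archive:
            for name in sorted(archive.namelist()):
                if not name.lower().endswith(".csv"):
                    continue  # skip README files and other non-CSV members
                with archive.open(name) as raw:
                    text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                    for row in csv.DictReader(text):
                        yield name, row


def make_demo_zip(files):
    """Build an in-memory ZIP archive so the example runs without real data."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    buffer.seek(0)
    return buffer


# Hypothetical archive parts with made-up file and column names.
part1 = make_demo_zip({
    "csv/train/static.csv": "id,value\np1,1\n",
    "README.txt": "placeholder",
})
part2 = make_demo_zip({"csv/test/static.csv": "id,value\np2,2\n"})

rows = list(iter_csv_rows([part1, part2]))
```

With real data, the in-memory demo archives would be replaced by the paths of the downloaded ZIP files.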
The BRAINTEASER Data Sharing Policy section below reports the details for requesting access to the datasets.
References
- Chiò A, Mora G, Moglia C, Manera U, Canosa A, Cammarosano S, Ilardi A, Bertuzzo D, Bersano E, Cugnasco P, Grassano M, Pisano F, Mazzini L, Calvo A (2017). Piemonte and Valle d'Aosta Register for ALS (PARALS). Secular Trends of Amyotrophic Lateral Sclerosis: The Piemonte and Valle d'Aosta Register. JAMA Neurol., 74(9):1097-1104. doi: 10.1001/jamaneurol.2017.1387
- Bergamaschi R, Monti MC, Trivelli L, Mallucci G, Gerosa L, Pisoni E, Montomoli C. (2021). PM2.5 exposure as a risk factor for multiple sclerosis. An ecological study with a Bayesian mapping approach. Environ Sci Pollut Res Int., 28(3):2804-2809, doi: 10.1007/s11356-020-10595-5
- Bergamaschi R, Monti MC, Trivelli L, Introcaso VP, Mallucci G, Borrelli P, Gerosa L, Montomoli C. (2020). Increased prevalence of multiple sclerosis and clusters of different disease risk in Northern Italy. Neurol Sci., 41(5):1089-1095, doi: 10.1007/s10072-019-04205-7
- Guazzo, A., Trescato, I., Longato, E., Hazizaj, E., Dosso, D., Faggioli, G., Di Nunzio, G. M., Silvello, G., Vettoretti, M., Tavazzi, E., Roversi, C., Fariselli, P., Madeira, S. C., de Carvalho, M., Gromicho, M., Chiò, A., Manera, U., Dagliati, A., Birolo, G., Aidos, H., Di Camillo, B., and Ferro, N. (2022). Intelligent Disease Progression Prediction: Overview of iDPP@CLEF 2022. In Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Thirteenth International Conference of the CLEF Association (CLEF 2022), pages 395–422. Lecture Notes in Computer Science (LNCS) 13390, Springer, Heidelberg, Germany. doi: 10.1007/978-3-031-13643-6_25
- Faggioli, G., Guazzo, A., Marchesin, S., Menotti, L., Trescato, I., Aidos, H., Bergamaschi, R., Birolo, G., Cavalla, P., Chiò, A., Dagliati, A., de Carvalho, M., Di Nunzio, G. M., Fariselli, P., García Dominguez, J. M., Gromicho, M., Longato, E., Madeira, S. C., Manera, U., Silvello, G., Tavazzi, E., Tavazzi, E., Vettoretti, M., Di Camillo, B., and Ferro, N. (2023). Intelligent Disease Progression Prediction: Overview of iDPP@CLEF 2023. In Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fourteenth International Conference of the CLEF Association (CLEF 2023). Lecture Notes in Computer Science (LNCS), Springer, Heidelberg, Germany.