Published December 19, 2023 | Version v1
Preprint | Open Access

Lifecycle for FAIR Machine Learning

  • 1. ZB MED Information Centre for Life Sciences
  • 2. NFDI4DataScience
  • 3. Institute of Applied Biosciences, Centre for Research and Technology Hellas
  • 4. Euro-BioImaging ERIC Bio-Hub, European Molecular Biology Laboratory (EMBL) Heidelberg
  • 5. AI4Life
  • 6. 4TU.ResearchData/TU Delft
  • 7. Department of Medical Informatics, Institute for Community Medicine, University Medicine Greifswald
  • 8. Technical University of Berlin

Description

Abstract

Despite the advances in Machine Learning Operations (MLOps) and the availability of several variations of the Machine Learning lifecycle, none is yet aligned with the Findable, Accessible, Interoperable and Reusable (FAIR) principles. Here we present our proposal for such a lifecycle, including an initial analysis of which FAIR principles apply and how, together with additional information on reporting best practices and existing resources that could support the different phases of the lifecycle.

Keywords

Machine Learning lifecycle, ML lifecycle, FAIR, FAIR4ML

Files

2023.12 Lifecycle for FAIR Machine Learning.pdf (1.4 MB)

Additional details

Funding

NFDI4DS - NFDI for Data Science and Artificial Intelligence (460234259)
Deutsche Forschungsgemeinschaft

References

  • Kreuzberger D, Kühl N, Hirschl S. Machine Learning Operations (MLOps): Overview, Definition, and Architecture. IEEE Access. 2023;11: 31866–31879. doi:10.1109/ACCESS.2023.3262138
  • Castro LJ, Katz DS, Psomopoulos F. Working Towards Understanding the Role of FAIR for Machine Learning. PUBLISSO; 2021. doi:10.4126/FRL01-006429415
  • Wilkinson MD, Dumontier M, Aalbersberg IjJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016;3: 160018. doi:10.1038/sdata.2016.18
  • Chue Hong NP, Katz DS, Barker M, Lamprecht A-L, Martinez C, Psomopoulos FE, et al. FAIR Principles for Research Software (FAIR4RS Principles). 2022. doi:10.15497/RDA00068
  • Barker M, Chue Hong NP, Katz DS, Lamprecht A-L, Martinez-Ortiz C, Psomopoulos F, et al. Introducing the FAIR Principles for research software. Sci Data. 2022;9: 622. doi:10.1038/s41597-022-01710-x
  • Visser C de, Johansson LF, Kulkarni P, Mei H, Neerincx P, Velde KJ van der, et al. Ten quick tips for building FAIR workflows. PLOS Computational Biology. 2023;19: e1011369. doi:10.1371/journal.pcbi.1011369
  • Goble C, Cohen-Boulakia S, Soiland-Reyes S, Garijo D, Gil Y, Crusoe MR, et al. FAIR Computational Workflows. Data Intelligence. 2020;2: 108–121. doi:10.1162/dint_a_00033
  • Duarte J, Li H, Roy A, Zhu R, Huerta EA, Diaz D, et al. FAIR AI Models in High Energy Physics. arXiv; 2022. doi:10.48550/arXiv.2212.05081
  • Ravi N, Chaturvedi P, Huerta EA, Liu Z, Chard R, Scourtas A, et al. FAIR principles for AI models with a practical application for accelerated high energy diffraction microscopy. Sci Data. 2022;9: 657. doi:10.1038/s41597-022-01712-9
  • Katz DS, Pollard T, Psomopoulos F, Huerta E, Erdmann C, Blaiszik B. FAIR principles for Machine Learning models. 2020 [cited 25 Aug 2023]. doi:10.5281/ZENODO.4271996
  • Huerta EA, Blaiszik B, Brinson LC, Bouchard KE, Diaz D, Doglioni C, et al. FAIR for AI: An interdisciplinary and international community building perspective. Sci Data. 2023;10: 487. doi:10.1038/s41597-023-02298-6
  • Walsh I, Fishman D, Garcia-Gasulla D, Titma T, Pollastri G, Capriotti E, et al. DOME: recommendations for supervised machine learning validation in biology. Nature Methods. 2021. doi:10.1038/s41592-021-01205-4
  • Mitchell M, Wu S, Zaldivar A, Barnes P, Vasserman L, Hutchinson B, et al. Model Cards for Model Reporting. Proceedings of the Conference on Fairness, Accountability, and Transparency. 2019. pp. 220–229. doi:10.1145/3287560.3287596
  • Pergl R, Hooft R, Suchánek M, Knaisl V, Slifka J. "Data Stewardship Wizard": A Tool Bringing Together Researchers, Data Stewards, and Data Experts around Data Management Planning. Data Science Journal. 2019;18: 59. doi:10.5334/dsj-2019-059
  • Klar J, Michaelis O, Engelhardt C, Enke H, Frenzel J, Hausen D, et al. Research Data Management Organizer (RDMO). 2023. doi:10.5281/zenodo.596581
  • Jones MB, Boettiger C, Mayes AC, Smith A, Slaughter P, Niemeyer K, et al. CodeMeta: an exchange schema for software metadata. KNB Data Repository; 2016. doi:10.5063/SCHEMA/CODEMETA-1.0
  • Gray AJG, Goble C, Jimenez RC. From Potato Salad to Protein Annotation. ISWC Posters and Demo session. Vienna, Austria; 2017. p. 4. Available: http://ceur-ws.org/Vol-1963/paper579.pdf
  • Gray A, Castro LJ, Juty N, Goble C. Schema.org for Scientific Data. Artificial Intelligence for Science. WORLD SCIENTIFIC; 2022. pp. 495–514. doi:10.1142/9789811265679_0027
  • Giraldo O, Geist L, Quiñones N, Solanki D, Rebholz-Schuhmann D, Castro LJ. machine-actionable Software Management Plan Ontology (maSMP Ontology). Zenodo; 2023. doi:10.5281/zenodo.7806638
  • Giraldo O, Dessi D, Dietze S, Rebholz-Schuhmann D, Castro LJ. Machine-Actionable Metadata for Software and Software Management Plans for NFDI. Proceedings of the Conference on Research Data Infrastructure. 2023. doi:10.52825/cordi.v1i.279
  • Giraldo O, Geist L, Quiñones N, Solanki D, Alves R, Bampalikis D, et al. A metadata schema for machine-actionable Software Management Plans. PUBLISSO-FRL; 2023. doi:10.4126/FRL01-006444988
  • Alves R, Bampalikis D, Castro LJ, González JMF, Harrow J, Kuzak M, et al. ELIXIR Software Management Plan for Life Sciences. BioHackrXiv; 2021. doi:10.37044/osf.io/k8znb
  • Castro LJ, Geist L, Gonzalez E, Gonzalez-Ocanto M, Grossmann YV, Pronk T, et al. Five Minutes to Write a Software Management Plan – A Machine-actionable Approach to Simplify the Creation of SMPs. Zenodo; 2023. doi:10.5281/zenodo.10374839
  • Rehm G, Berger M, Elsholz E, Hegele S, Kintzel F, Marheinecke K, et al. European Language Grid: An Overview. In: Calzolari N, Béchet F, Blache P, Choukri K, Cieri C, Declerck T, et al., editors. Proceedings of the Twelfth Language Resources and Evaluation Conference. Marseille, France: European Language Resources Association; 2020. pp. 3366–3380. Available: https://aclanthology.org/2020.lrec-1.413
  • Vasey B, Clifton DA, Collins GS, Denniston AK, Faes L, Geerts BF, et al. DECIDE-AI: new reporting guidelines to bridge the development-to-implementation gap in clinical artificial intelligence. Nat Med. 2021;27: 186–187. doi:10.1038/s41591-021-01229-5
  • Liu X, Cruz Rivera S, Moher D, Calvert MJ, Denniston AK. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Nat Med. 2020;26: 1364–1374. doi:10.1038/s41591-020-1034-x
  • Cruz Rivera S, Liu X, Chan A-W, Denniston AK, Calvert MJ. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Nat Med. 2020;26: 1351–1363. doi:10.1038/s41591-020-1037-7
  • Vanschoren J, van Rijn JN, Bischl B, Torgo L. OpenML: networked science in machine learning. SIGKDD Explor Newsl. 2014;15: 49–60. doi:10.1145/2641190.2641198