Published March 20, 2024 | Version v1
Presentation | Open Access

Croissant ML standard in the context of Dataverse, EOSC and beyond

  • 1. Data Archiving and Networked Services
  • 2. Harvard Institute for Quantitative Social Science (IQSS)

Description

The Croissant metadata format simplifies how ML models use data. It provides a vocabulary for dataset attributes that streamlines data loading across ML frameworks, and it includes specifications for Responsible AI and reproducibility. Croissant export is being added to the Dataverse data repository and will be available across the whole Dataverse network. In this presentation, we explore the innovation path that supports the rapid evolution of the Croissant standard and demonstrate how to implement changes without modifying the source code, using a novel approach known as FAIR semantic mappings. Croissant overlaps significantly with the DDI and DDI-CDI standards, which are commonly used in the Social Sciences and Humanities and are already supported by Dataverse and other data repositories. We also explore how these standards can help the Machine Learning community mitigate bias in data.

Files (7.8 MB)

Croissant ML standard in Dataverse and beyond.pdf