A Survey on Metadata for Machine Learning Models and Datasets: Standards, Practices, and Harmonization Challenges

Gesese, Genet Asefa

doi:10.5281/zenodo.17693280

Published November 23, 2025 | Version v1

Presentation Open

Gesese, Genet Asefa¹

1. FIZ Karlsruhe – Leibniz Institute for Information Infrastructure

This talk was presented at the Sci-K Workshop, co-located with ISWC 2025 in Nara, Japan.

The growing availability of machine learning (ML) models, datasets, and related artifacts on platforms such as Hugging Face, GitHub, and Zenodo has amplified the need for structured, standardized metadata. Metadata practices nevertheless remain highly heterogeneous: they differ in schema design, vocabulary usage, and semantic expressiveness, which poses significant challenges for tasks such as representation, extraction, alignment, and integration. This fragmentation impedes the development of infrastructures that depend on machine-actionable metadata to support discovery, provenance tracking, or cross-platform interoperability. Although metadata is foundational to applying the FAIR (Findable, Accessible, Interoperable, and Reusable) principles in ML, there is no consolidated understanding of how existing standards support interoperability and alignment across platforms. In this survey, we review and compare a range of general-purpose and ML-specific metadata standards, evaluating their suitability for cross-platform alignment, discoverability, extensibility, and interoperability. We assess these standards against defined criteria and analyze their potential to support unified, FAIR-compliant metadata infrastructures for ML, laying the groundwork for scalable and interoperable tooling in future ML ecosystems.

Files (958.2 kB)

Name	Size
Sci-K 2025 - Metadata-Survey.pdf md5:89a11f5a86f2c96b093fdebdf690127a	958.2 kB

Additional details

Is supplement to: Conference paper: https://ceur-ws.org/Vol-4065/paper05.pdf
