Published February 7, 2025 | Version 1.0
Dataset Open

ITALIC

  • 1. Interuniversity Research Centre for Public Services
  • 2. University of Milano-Bicocca

Description

We present ITALIC, a large-scale benchmark dataset of 10,000 multiple-choice questions designed to evaluate how well language models understand the Italian language and culture. ITALIC spans 12 domains and draws its questions from public tests used to assess domain experts in real-world scenarios. We detail our data collection process, stratification techniques, and selection strategies.

ITALIC provides a comprehensive assessment suite that captures commonsense reasoning and linguistic proficiency in a morphologically rich language. It serves as a benchmark for evaluating existing models and as a roadmap for future research, encouraging the development of more sophisticated and culturally aware natural language systems.
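As a rough illustration of how a multiple-choice benchmark organised by domain is typically scored, the sketch below computes per-domain accuracy. The record keys (`id`, `domain`, `answer`), the domain names, and the toy data are all invented for this example and are not the actual ITALIC schema; check the repository for the real field names.

```python
from collections import defaultdict


def accuracy_by_domain(records, predictions):
    """Return {domain: accuracy} for a set of multiple-choice questions.

    records: iterable of dicts with the hypothetical keys
             'id', 'domain', and 'answer' (the gold option label).
    predictions: dict mapping question id -> predicted option label.
    A question with no prediction counts as incorrect.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        domain = record["domain"]
        total[domain] += 1
        if predictions.get(record["id"]) == record["answer"]:
            correct[domain] += 1
    return {domain: correct[domain] / total[domain] for domain in total}


# Toy data, invented purely for illustration (not taken from ITALIC).
toy_records = [
    {"id": "q1", "domain": "history", "answer": "A"},
    {"id": "q2", "domain": "history", "answer": "C"},
    {"id": "q3", "domain": "grammar", "answer": "B"},
]
toy_predictions = {"q1": "A", "q2": "B", "q3": "B"}

toy_scores = accuracy_by_domain(toy_records, toy_predictions)
print(toy_scores)  # {'history': 0.5, 'grammar': 1.0}
```

Reporting accuracy per domain rather than as a single number makes it easier to see whether a model is weak on cultural knowledge, on linguistic questions, or on both.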

Curated by: CRISP research centre https://crispresearch.it/

Learn more at: https://italicbench.it/

Files

Files (3.3 MB)

md5:fc2dc1d188adafa5521c50ad9eb151c5 | 3.3 MB
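After downloading, the file can be checked against the MD5 checksum listed above. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the local file name is a placeholder because the record does not show the actual name.

```python
import hashlib
from pathlib import Path

# Checksum published on this record.
EXPECTED_MD5 = "fc2dc1d188adafa5521c50ad9eb151c5"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, expected=EXPECTED_MD5):
    """True if the file at `path` has the expected MD5 digest."""
    return md5_of_file(path) == expected.lower()


# Usage (placeholder file name, replace with the downloaded file):
# matches_published_checksum("downloaded_italic_file")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again.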

Additional details

Dates

Accepted: 2025-01-23 (Accepted at NAACL 2025)
Available: 2025-02-07 (Dataset available)

Software

Repository URL: https://github.com/Crisp-Unimib/ITALIC
Programming language: Python
Development Status: Active