ITALIC

Seveso, Andrea; Potertì, Daniele; Mezzanzanica, Mario; Mercorio, Fabio

doi:10.5281/zenodo.14725823

Published February 7, 2025 | Version 1.0

Dataset Open

1. Interuniversity Research Centre for Public Services
2. University of Milano-Bicocca

We present ITALIC, a large-scale benchmark dataset of 10,000 multiple-choice questions designed to evaluate how well natural language systems understand the Italian language and culture. ITALIC spans 12 domains and draws on public tests used to assess domain experts in real-world scenarios. We detail our data collection process, stratification techniques, and selection strategies.

ITALIC provides a comprehensive assessment suite that captures commonsense reasoning and linguistic proficiency in a morphologically rich language. It serves as a benchmark for evaluating existing models and as a roadmap for future research, encouraging the development of more sophisticated and culturally aware natural language systems.

Curated by: CRISP research centre https://crispresearch.it/

Learn more at: https://italicbench.it/

Files

Name	Size	MD5
italic.jsonl	3.3 MB	fc2dc1d188adafa5521c50ad9eb151c5
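After downloading, the file can be checked against the MD5 checksum listed for italic.jsonl before use. The following is a minimal sketch, assuming the file was saved as italic.jsonl in the working directory and follows the usual JSON Lines layout of one JSON object per line. The helper names md5_of_file and load_jsonl are illustrative and not part of the ITALIC repository, and no field names are assumed because this record does not describe the schema.

```python
import hashlib
import json
from pathlib import Path

# MD5 checksum as listed on this record for italic.jsonl.
EXPECTED_MD5 = "fc2dc1d188adafa5521c50ad9eb151c5"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_jsonl(path):
    """Parse a JSON Lines file: one JSON value per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    path = Path("italic.jsonl")  # assumed local download location
    if path.exists():
        if md5_of_file(path) != EXPECTED_MD5:
            raise SystemExit("Checksum mismatch: re-download the file.")
        records = load_jsonl(path)
        print(f"Loaded {len(records)} records")
        if records:
            # Inspect the keys rather than assuming a schema.
            print(sorted(records[0].keys()))
```

Printing the keys of the first record is a cautious way to discover the actual field names before writing any evaluation code.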

Additional details

Accepted: 2025-01-23 (accepted at NAACL 2025)
Available: 2025-02-07 (dataset available)

Repository URL: https://github.com/Crisp-Unimib/ITALIC
Programming language: Python
Development Status: Active

	All versions	This version
Views	277	277
Downloads	69	69
Data volume	235.2 MB	235.2 MB


Resource type: Dataset

Publisher: Zenodo

Conference: ITALIC: An Italian Culture-Aware Natural Language Benchmark (ITALIC), Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, April 29–May 4, 2025

Languages: Italian

License: MIT License

A short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.

Technical metadata

Created: February 7, 2025
Modified: May 1, 2025
