There is a newer version of the record available.

Published April 7, 2025 | Version v1

It's the same but not the same: Do LLMs distinguish Spanish varieties?

  • 1. Universidad Autónoma de Madrid
  • 2. Universidad Politécnica de Madrid
  • 3. New York University
  • 4. Universidad Carlos III de Madrid

Description

Spanish, spoken by over 600 million people, exhibits significant lexical, morphological, and syntactic diversity. Traditional benchmarks often overlook dialectal nuances, which leads to biased assessments. This benchmark addresses that gap by measuring how well LLMs handle variation across Spanish dialects.

The Spanish Dialect Benchmark dataset evaluates the ability of LLMs to distinguish and accurately use various Spanish dialects. It addresses the challenge of dialectal bias by presenting 31 multiple-choice questions reflecting regional linguistic variations.

Examples:

  • ¿Cuál suena más natural?

    • a. «Llegas tarde, vístete y corre». (Peninsular, Chilean Spanish)

    • b. «Llegas tarde, vístete y córrele». (Antillean, Mexican Spanish)

  • ¿Qué verbo usas para describir la acción de ponerse de pie?

    • a. levantarse (Rioplatense, Peninsular Spanish)

    • b. pararse (Antillean, Mexican Spanish)
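One way to picture the structure of these items is the minimal Python sketch below. It is an illustration only: the class names `Option` and `Question`, the helper `tally_varieties`, and the in-memory layout are all hypothetical, and the actual file format inside the archive is not described on this page. The two items are transcribed by hand from the examples above.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """One answer choice (hypothetical layout, not the dataset's schema)."""
    text: str
    varieties: tuple[str, ...]  # varieties the record associates with this choice


@dataclass(frozen=True)
class Question:
    """One multiple-choice item, with options keyed by letter."""
    prompt: str
    options: dict[str, Option]


# The two example items shown in the description, copied by hand.
QUESTIONS = [
    Question(
        prompt="¿Cuál suena más natural?",
        options={
            "a": Option("«Llegas tarde, vístete y corre».", ("Peninsular", "Chilean")),
            "b": Option("«Llegas tarde, vístete y córrele».", ("Antillean", "Mexican")),
        },
    ),
    Question(
        prompt="¿Qué verbo usas para describir la acción de ponerse de pie?",
        options={
            "a": Option("levantarse", ("Rioplatense", "Peninsular")),
            "b": Option("pararse", ("Antillean", "Mexican")),
        },
    ),
]


def tally_varieties(questions: list[Question], answers: list[str]) -> Counter:
    """Count how often the chosen options are associated with each variety."""
    counts: Counter = Counter()
    for question, letter in zip(questions, answers):
        counts.update(question.options[letter].varieties)
    return counts
```

Given a list of option letters chosen by a model, `tally_varieties(QUESTIONS, ["b", "b"])` would count two selections each for Antillean and Mexican Spanish. A skew of this kind across all 31 questions is the sort of dialectal preference the benchmark is meant to surface.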

Files

Results.zip

Files (680.7 kB)

Name Size
md5:e383cfb9aee948920ac22db86d5998d3 361.5 kB
md5:484464661bb196db2e0e42839488407e 319.2 kB
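
The MD5 checksums listed above can be used to confirm that a download is intact. The sketch below uses only Python's standard `hashlib` module. The helper names `md5_of` and `matches_checksum` are hypothetical, and because this listing does not show which checksum belongs to which file name, any local path passed in is a placeholder chosen by the reader.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with a listed value such as 'md5:e383...'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of(path) == expected
```

For example, `matches_checksum("downloaded_file", "md5:e383cfb9aee948920ac22db86d5998d3")` would return `True` only if the local file, whatever it is named, is byte-identical to the 361.5 kB file in the record.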

Additional details

Additional titles

Translated title (Spanish)
Es igual pero no es lo mismo: ¿Distinguen los LLMs las variedades del español?

Funding

Agencia Estatal de Investigación
Fun4Date PID2022-136684OB-C21/C22
European Commission
SMARTY 101140087