Published March 31, 2026 | Version 2.0
Dataset Open

The Manzini & Savoia (2005) Corpus: Morphosyntactic Variation in Italian and Romansh Dialects

Description

Corresponding Author

Mazzaggio, Greta (greta.mazzaggio@unifi.it)

Abstract

This dataset contains linguistic examples of morphosyntactic microvariation in Italian dialects, as documented in the three volumes of Manzini M.R., Savoia L.M. (2005), I dialetti italiani e romanci. Morfosintassi generativa, Alessandria, Edizioni dell’Orso.
The data are also accessible at https://manzinisavoia.changes.unifi.it/

Dataset content

The dataset consists of a corpus of linguistic examples illustrating microvariation in Italian dialects, compiled by Manzini and Savoia (2005). It includes data from 457 Italian dialectal varieties, 9 Corsican varieties, and 19 Swiss varieties, all collected through field research and annotated using the International Phonetic Alphabet (IPA). The corpus contains a total of 64,472 linguistic examples, each consisting of a dialectal sentence transcribed in IPA along with its Italian gloss.

Data are in JSON and CSV format. For further information about data acquisition and digitization, please refer to the publication below. In order to correctly view the examples in IPA, it is essential to use a text editor or software that supports UTF-8 encoding.

The dataset consists of a flat CSV file (approximately 42.1 MB) and a structured JSON file (approximately 88.8 MB).
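As an illustration of the UTF-8 requirement, the following is a minimal Python sketch for opening the two files with an explicit encoding so that the IPA transcriptions are read correctly. The function names `peek_csv` and `load_json` are hypothetical helpers introduced here, and the sketch makes no assumption about column names or JSON structure, since these are not described on this page; it assumes only a comma-delimited CSV with a header row.

```python
import csv
import json
from itertools import islice


def peek_csv(path, limit=5):
    """Return (header, first `limit` data rows) from a UTF-8 encoded CSV file.

    Assumption: the file is comma-delimited and its first row is a header.
    """
    # encoding="utf-8" keeps IPA symbols intact; newline="" is what the
    # csv module expects so that quoted line breaks are handled correctly.
    with open(path, encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        rows = list(islice(reader, limit))
    return header, rows


def load_json(path):
    """Parse a UTF-8 encoded JSON file and return the resulting Python object."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Calling `peek_csv` on the downloaded CSV file and printing the header is a quick way to inspect the available columns before loading the full file. Note that `load_json` reads the whole file into memory at once.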

Terms of use

This work has been supported by funding from the Italian Ministero dell’Università e della Ricerca and from the European Union (PNRR - PE05 CHANGES CUP B53C22004010006).

The dataset is open access for scientific research and non-commercial purposes.
The authors require that users acknowledge their work and, in scientific publications, cite the following works:

  • Manzini, M. R., & Savoia, L. M. (2005). I dialetti italiani e romanci. Morfosintassi generativa. Alessandria, Edizioni dell’Orso.
  • Mazzaggio, G., & Binazzi, N. (2024). Valorizzare il patrimonio immateriale: un’esperienza di digitalizzazione del dialetto. DILEF. Rivista digitale del Dipartimento di Lettere e Filosofia, 3, 224-242. https://doi.org/10.35948/DILEF/2024.4348
  • Mazzaggio, G., Ludovico, L. A., Vena, M. V., Manzini, M. R., & Savoia, L. M. (2023). Morphosyntax of Italian and Romance Varieties: Presentation of the Manzini and Savoia (2005) Corpus and Its Digitalization. Bollettino dell’Atlante Linguistico Italiano, 2023(47), 185-210.

Files

MS_corpus_aligned_tagged_v2.0.json

Files (130.9 MB)

  • JSON file, 88.8 MB, md5:d8ad36361684ea60a16f957f8759efee
  • CSV file, 42.1 MB, md5:cf5aeec289bc97e9e86af0e96778daa4
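
The MD5 checksums listed above can be used to confirm that a download is complete and uncorrupted. The following Python sketch computes a file's MD5 digest in chunks, so that the larger file does not need to be held in memory, and compares it with an expected value. The function names `md5_of_file` and `matches_checksum` are hypothetical, as is any local file path passed to them. The expected digest is the hexadecimal string from the record, without the `md5:` prefix.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    # Read in binary mode and in fixed-size chunks to keep memory use low.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Return True if the file's MD5 digest equals `expected_hex` (case-insensitive)."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_checksum(local_path, "d8ad36361684ea60a16f957f8759efee")` should return `True` for an intact copy of the 88.8 MB file, where `local_path` is wherever the file was saved.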

Additional details

Related works

Is described by
Publication: 10.35948/DILEF/2024.4348 (DOI)