# ReVoc Corpus v1.0 (26/04/2024)

This work was created by Lo Congrès permanent de la lenga occitana (https://locongres.org) as part of its "Còrpus" project (http://abrac.at/corpusproject).

The sentences in this corpus were collected by Lo Congrès using ReVoc (https://contribuir.locongres.com/revoc), its collaborative tool for recording Occitan sentences. Hundreds of people of various ages, genders and Occitan varieties have read sentences, resulting in an aligned audio corpus with more than 40,000 audio files.

ReVoc Corpus content:

- ZIP files, each containing all the recordings for one dialect. The files are named according to the IETF language subtags:
  - aranes: Aranese Gascon Occitan, 2 recordings
  - auvern: Auvergnat Occitan, 40 recordings
  - gascon: Gascon Occitan (except for Aranese), 23,377 recordings
  - lemosin: Lemosin Occitan, 38 recordings
  - lengadoc: Lengadocian Occitan, 15,537 recordings
  - nicard: Niçard (Nissart) Provençal Occitan, 17 recordings
  - provenc: Provençal Occitan (except for Niçard), 179 recordings
  - vivaraup: Vivaro-Alpine Occitan, 8 recordings
- corpus_revoc.csv, the transcriptions file, tab-delimited, with the following columns:
  - file: path to the audio file
  - transcription: written form of the sentence recorded in the file
  - date: recording date
  - variety: Occitan variety of the sentence
  - age: age group of the speaker
  - gender: gender of the speaker

The "ReVoc" project is produced by Lo Congrès permanent de la lenga occitana, in partnership with the Elhuyar Foundation and Rolde de estudios aragoneses, with financial support from the Nouvelle-Aquitaine region, the Occitanie region and the Pyrénées-Atlantiques department.

## License

The ReVoc Corpus is distributed under the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0).
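
## Example: reading corpus_revoc.csv

The transcriptions file described above is tab-delimited with six columns, so it can be read with Python's standard `csv` module. The following is a minimal sketch, not an official loader. It assumes that the file is UTF-8 encoded, that it starts with a header row, and that fields are not quoted; none of this is stated above, so adjust as needed. The function names `read_revoc` and `count_by_variety` are hypothetical helpers introduced for illustration.

```python
import csv
from collections import Counter

# Column names as listed in this README. Whether the file really starts
# with a header row carrying these names is an assumption.
COLUMNS = ["file", "transcription", "date", "variety", "age", "gender"]


def read_revoc(stream, has_header=True):
    """Yield one dict per recording from a tab-delimited text stream."""
    # QUOTE_NONE treats quotation marks inside sentences as ordinary
    # characters (assumption: the file does not use CSV-style quoting).
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # skip the assumed header row
    for row in reader:
        if len(row) != len(COLUMNS):
            continue  # skip blank or malformed lines
        yield dict(zip(COLUMNS, row))


def count_by_variety(rows):
    """Count recordings per Occitan variety."""
    return Counter(row["variety"] for row in rows)
```

To use it, open the file with `open("corpus_revoc.csv", encoding="utf-8", newline="")` and pass the file object to `read_revoc`; feeding the result to `count_by_variety` gives a per-variety tally that can be compared with the recording counts listed above.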