
The transcription used in this dataset is taken from [Flexique version 1.3](http://www.llf.cnrs.fr/fr/flexique-fr.php). The main difference is that phonological sequences are segmented into phonemes by inserting a space between consecutive phonemes (per the [Paralex](http://www.paralex-standard.org) convention). Most of the following text is taken from the documentation of Flexique.
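
As an illustration of why the space-separated segmentation is convenient, here is a minimal Python sketch. The helper names `phonemes` and `unsegmented` are hypothetical and not part of this dataset or of Paralex. A nasal vowel such as ɔ̃ is typically encoded as two Unicode code points (a base vowel plus a combining tilde), so iterating over characters would split it, whereas splitting on spaces keeps each phoneme whole.

```python
def phonemes(transcription: str) -> list[str]:
    """Split a space-segmented transcription into its phonemes."""
    return transcription.split()


def unsegmented(transcription: str) -> str:
    """Remove the separating spaces to recover a Flexique-style string."""
    return "".join(transcription.split())


# The nasal vowel is written here as U+0254 followed by the combining
# tilde U+0303, so that its encoding is explicit.
form = "k \u0254\u0303 t ə ʁ a"

print(phonemes(form))          # one list item per phoneme
print(len(phonemes(form)))     # 6 phonemes
print(len(unsegmented(form)))  # 7 code points, because of the combining tilde
```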

Three symbols that are not part of the IPA are used to transcribe neutralized mid vowels:

- E for the vowel neutral between e and ɛ
- O for the vowel neutral between o and ɔ
- Ø for the vowel neutral between ø and œ

IPA symbols have their standard interpretation. As is standard in studies of French, ə denotes a vowel alternating between ø, œ and no realization.
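
The three non-IPA symbols can be resolved mechanically when a fully specified form is needed. The following Python sketch enumerates the possibilities, assuming the intended pairs are e/ɛ, o/ɔ and ø/œ; the names `NEUTRALIZED` and `expand_mid_vowels` are hypothetical. It only lists combinations and says nothing about which realization a given speaker or variety actually uses.

```python
from itertools import product

# Assumed mapping from each neutralized symbol to the two mid vowels
# it stands between.
NEUTRALIZED = {
    "E": ("e", "ɛ"),
    "O": ("o", "ɔ"),
    "Ø": ("ø", "œ"),
}


def expand_mid_vowels(transcription: str) -> list[str]:
    """Return every form obtained by resolving each neutralized vowel."""
    options = [NEUTRALIZED.get(p, (p,)) for p in transcription.split()]
    return [" ".join(choice) for choice in product(*options)]


print(expand_mid_vowels("p E j"))  # ['p e j', 'p ɛ j']
```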

Flexique originally did not include overabundance (several forms filling the same paradigm cell). As a result, choices had to be made so that each form receives a single transcription that is as close as possible to a likely surface form while still allowing the other possible forms to be inferred. A few of these choices are still inherited by this resource:

- Except in word-final position, schwas have been included wherever they are possible, even where the actual realization of a schwa is very unlikely. For instance, the future 3rd singular of AIMER ‘like’ is transcribed ɛməʁa, which is much less likely than ɛmʁa in normal speech. This decision is motivated by an asymmetry: the range of possible realizations of a form can be predicted from the form containing the maximal number of word-internal schwas, but the positions where schwas are possible cannot be predicted from a form without schwas. For example, the form kɔ̃tʁa is ambiguous between the future 2nd or 3rd singular of COMPTER ‘count’ and the simple past 2nd or 3rd singular of CONTRER ‘counter’, whereas kɔ̃təʁa is unambiguously the former. Including all schwas is therefore the only solution if a single phonological representation is to be given for each word form, but it artificially diminishes the prevalence of homophony; in many applications it is thus advisable to process the Lexique data so as to suppress some schwas. Word-final schwas are not included because their distribution is entirely dependent on the phonological context (see Dell 1995).
- For non-first conjugation verbs, when the stem ends in ʁ, the future and conditional forms have a geminate ʁ in conservative varieties (e.g. MOURIR ‘die’, future 3rd singular: muʁʁa). This is definitely not the only possibility: degemination is very common (muʁa), and regularizations by vowel epenthesis, while frowned upon, are quite frequent (muʁəʁa, muʁiʁa). Flexique records the conservative form.
- French morphophonology leads to frequent alternations between the high vowels i, y and u, the corresponding glides j, ɥ and w, and the vowel-glide sequences ij, yɥ and uw. The transcription convention is to use:
    - A vowel wherever it is the only possible form, e.g. elle relie ‘she links’: ʁəli
    - A glide wherever it is the only possible form, e.g. elle paye ‘she pays’: pɛj
    - A vowel-glide sequence ij wherever it is the only possible form, e.g. elle priait ‘she prayed’: pʁijE
    - A single glide where there is alternation between glide and vowel-glide sequence, e.g. elle reliait ‘she linked’, which can be realized as either ʁəljE or ʁəlijE, is transcribed as ʁəljE.
    - A single glide where morphology would warrant a geminate glide (e.g. nous payions: pEj+j+ɔ̃), since the geminate is realized as a single glide in standard French. Hypercorrect forms such as pEjjɔ̃ are sometimes heard but are ignored in Vlexique: payions is transcribed pEjɔ̃.
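
The schwa convention above means that candidate surface forms can be derived from the recorded transcription by dropping schwas. The Python sketch below is one simple way to do this, not the procedure used by Flexique; the name `schwa_variants` is hypothetical. It overgenerates on purpose: it drops every subset of schwas and does not check whether the resulting consonant clusters are possible in French.

```python
from itertools import product

SCHWA = "ə"


def schwa_variants(transcription: str) -> set[str]:
    """Candidate forms obtained by keeping or dropping each schwa."""
    segments = transcription.split()
    # Each schwa may be kept or dropped (None); other phonemes are kept.
    options = [(p, None) if p == SCHWA else (p,) for p in segments]
    return {
        " ".join(p for p in choice if p is not None)
        for choice in product(*options)
    }


compter_fut = "k ɔ̃ t ə ʁ a"  # COMPTER, future 3rd singular, as recorded
contrer_pst = "k ɔ̃ t ʁ a"    # CONTRER, simple past 3rd singular

print(schwa_variants("ɛ m ə ʁ a"))
# The schwa-less variant of the COMPTER form coincides with the CONTRER form:
print(contrer_pst in schwa_variants(compter_fut))  # True
```

Comparing the variant sets of two lexemes in this way recovers the homophony that the maximal-schwa transcription hides.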

# References

- François Dell (1995), ‘Consonant clusters and phonological syllables in French’, Lingua 95:5–26.
- [Flexique documentation (downloadable archive)](http://www.llf.cnrs.fr/fr/flexique-fr.php)

