Criteria for creating new standard reading passages for the assessment of speech and voice: A Delphi consensus study

ABSTRACT Standard reading passages allow for the study of the integrated functions of speech and voice components in contextual, running speech, with target stimuli in a controlled environment. In both clinical practice and research, these texts provide rapid insight into the characteristics of the patient’s speech, with fewer hesitations than in conversational speech and better predictability by the evaluator. Although a plethora of texts exist in different languages, they present various limitations. A specifically created standardised text in each language allowing for an ecological assessment of speech and voice functions, meeting most required criteria for standard speech and voice assessment and adapted to the target language’s cultural and linguistic specificities, would therefore be an interesting option. However, no guidelines exist for the creation of such a reading passage. This article describes the international Delphi consensus study carried out to identify a minimal set of criteria to take into account when creating standard reading passages for an overall speech and voice assessment in adolescents and adults. This survey was conducted in three consecutive rounds; forty experts participated in the first round, with a total dropout of 17% from round 1 to round 3. It resulted in a minimal set of ten criteria which were selected by a majority of the experts and rated as most important. This set contains five phoneme-level, two word-level, two sentence-level criteria and one global-level criterion. It can be used as a general guideline for the creation of standard reading passages in Indo-European Romance and Germanic languages such as English, French and German. The construction of a new reading passage in French following this guideline is briefly described.


Introduction
Speech can be assessed perceptually or instrumentally on samples of different levels of granularity (e.g. single words vs sentences) which are more or less functional (e.g. reading aloud from a text vs conversational speech sample; Pommée et al., 2021b). A wide variety of assessment tasks and tools exist, both in clinical practice (Gurevich & Scamihorn, 2017; Pommée et al., 2021a) and in scientific research (Pommée et al., 2021).
Each type of unit allows for an assessment that serves a specific purpose, which may differ depending on whether the assessment is perceptual or instrumental. In this article, we will focus more specifically on texts. Indeed, while segmental units represent the main unit of analytical intelligibility assessment (Fredouille et al., 2019; Pommée et al., 2021b; Woisard et al., 2013), the use of a reading passage allows the speech assessment to be integrated into a more natural production context for a more functional assessment. Examples of speech and voice assessment types that can be carried out on reading passages are pausing and breathing patterns, voice quality, pitch and intensity, disfluencies, the use of prosodic patterns, vowel and consonant articulation accuracy and velopharyngeal function. Thus, standard reading passages are of major interest in patient populations with sufficient reading proficiency, because they allow for the study of the integrated functions of speech and voice components in contextual, running speech (closest to spontaneous speech as compared to isolated words or sentences), with target stimuli in a controlled environment (Patel et al., 2013). In both clinical practice and research, these texts provide rapid insight into the characteristics of the patient's speech (Auzou & Rolland-Monnoury, 2006), with fewer hesitations than in conversational speech (Vasilescu et al., 2004) and better predictability by the evaluator.
However, although a plethora of texts exist in different languages, none of them seem to really meet the expectations and needs of clinicians and researchers for routine speech and voice assessment. Some were created for a specific purpose, such as the Zoo Passage in English (Fletcher, 1972) for the assessment of velopharyngeal closure, or the texts by Kuo and Weismer (Kuo & Weismer, 2016) for the assessment of vowel reduction. For most of the others, little information is available about how they were created. Some are simply excerpts from literary works (e.g. Alphonse Daudet's 'La chèvre de Monsieur Seguin' (Daudet, 1869) in French), or translations of texts such as "The North Wind and the Sun" (International Phonetic Association, 1999) without adaptation to the target language (Jesus et al., 2015). A standardised text in each language allowing for an ecological assessment of speech and voice functions, meeting most required criteria for standard speech and voice assessment and adapted to the target language's cultural and linguistic specificities, would therefore be an interesting option. Specific speech stimuli, e.g. for the assessment of velopharyngeal function, could then be used once the main symptoms have been identified. However, we noticed the absence of guidelines for the creation of such a reading passage, which would benefit clinicians and researchers from various linguistic communities. Hence, the aim of this article is to describe the international Delphi consensus study we carried out to identify a minimal set of criteria to take into account when creating standard reading passages for the assessment of speech and voice functions. The reader should be aware that this study targets reading passages for the assessment of articulation (dysarthria, apraxia), prosodic variations and phonatory behaviour (e.g. dysphonia, vocal feminisation) and fluency disorders (stuttering/stammering) in patients whose reading fluency is sufficiently developed.
It is therefore not adapted for developmental speech disorder assessment in children, nor intended for the assessment of reading proficiency or for language assessment (e.g. aphasia).
In this article, we will describe the selection and prioritisation of the construction criteria by the international expert group.

Participants
We targeted professionals (clinicians, researchers, lecturers) engaged in activities in speech sound disorders and/or fluency disorders (stuttering/stammering). 'Activities' were defined as clinical activity, research, academic or industrial activity, or a combination of these, if at least approximately 20% related to speech (self-estimated by the participants).
Recruitment was carried out via:
• national professional associations (speech-language-hearing, phoniatrics, voice, acoustics, computer science/signal processing, linguistics/phonetics); over 200 organisations were contacted worldwide by email;
• social networks (Twitter, Facebook), where targeted professions and associations were also solicited in private groups;
• email to over 50 hand-selected speech experts identified in literature searches on PubMed, who had at least three publications in the field of speech, were authors of a reference book, or participated in research projects linked with pathological speech.
Non-respondents for each round were excluded from subsequent rounds. An a posteriori statistical comparison using McNemar's Chi-square test for paired samples was carried out to verify that the expert panel characteristics did not significantly change with the dropouts. It was also verified that the increasing consensus throughout the Delphi rounds was not biased by the dropouts of predominantly disagreeing experts. To that end, quantitative data in all three rounds with and without the dropouts were compared.
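This paired comparison can be illustrated with a short computation. The sketch below (a minimal illustration, not the study's actual analysis; the discordant-pair counts are hypothetical) implements McNemar's chi-square test for paired binary data in plain Python, using the identity sf(x) = erfc(√(x/2)) for the chi-square survival function with one degree of freedom:

```python
import math

def mcnemar_chi2(b, c, correction=True):
    """McNemar's chi-square test for paired binary data.

    b and c are the discordant cell counts of the 2x2 table, i.e.
    the pairs that changed category in one direction or the other.
    Returns (statistic, p_value); for 1 degree of freedom, the
    chi-square survival function equals erfc(sqrt(x / 2)).
    """
    if b + c == 0:
        raise ValueError("no discordant pairs")
    # Edwards continuity correction is the textbook default.
    numerator = (abs(b - c) - 1) ** 2 if correction else (b - c) ** 2
    stat = numerator / (b + c)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical counts: 4 panel members classified one way only with
# the dropouts included, 6 classified the other way only without them.
stat, p = mcnemar_chi2(4, 6)
```

A non-significant p-value here would support the conclusion that the panel characteristics did not change with the dropouts.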
Detailed data for the participants in each round are available in Appendix A. The trends described hereafter are constant throughout the three rounds despite the dropouts. No statistical difference was found between the expert characteristics with and without the dropouts. A majority of the expert panel were speech and language pathologists (SLPs) working in the fields of speech, fluency and voice disorders. Other major groups of participants were linguists, ENT/phoniatricians and computer scientists. More than half of the experts had at least 10 years of experience working in the speech and voice domains. Their main activity was research for about 40% of the experts, and clinical practice or academic activity for about a third each. The vast majority of experts were engaged in at least two main activities; clinical activity and research were combined in half of the cases. More than half of the participants had a third-cycle diploma (PhD) obtained on average in 2009-2010 (±8 years).
France, the United Kingdom and Germany were the most represented countries, while the most frequent main language spoken at work was English, followed by French and German.
The patient/study populations showed a slight prevalence of elderly people; the most frequently encountered pathologies were acquired and degenerative neurological disorders.

Ethics approval
This research was registered with the data protection officer of the Centre National de la Recherche Scientifique (CNRS) and was also approved by the computer science ethics advisory board (Comité consuLtatif d'Éthique concernant la Recherche en Informatique de Toulouse -CLERIT). An information sheet describing the purpose of the study, the participant's rights and the data privacy policy was provided to all participants prior to the first round of the survey. Participants gave consent for their answers to be used in an anonymised and aggregated manner to derive consensus statements. Only email addresses were known to the moderator, in order to enable round-to-round survey monitoring.

Study design
In general, the Delphi methodology is conducted over several consecutive rounds (usually three; Birko et al., 2015; Linstone & Turoff, 2002). In a 'modified Delphi study', statements to be rated are directly provided in the first round, based on a preliminary literature search (Cunningham et al., 2019; Denman et al., 2019). After each round, the panel's responses are synthesised, and areas of agreement/disagreement are identified. Aggregated controlled feedback (Von der Gracht, 2012) is then provided to the panel in the following round to explain modifications that have been made to facilitate consensus, and participants give their opinion on the new assertions (Denman et al., 2019; Diamond et al., 2014). The iterations are usually carried out until consensus is reached or until stability in the answers is observed (see Chalmers & Armour, 2019; Von der Gracht, 2012 for examples of stability measures).
The data discussed in the present article are part of a broader modified Delphi study which was designed in two parts. The first part, addressing the definition and measures of intelligibility and comprehensibility, is described in another article (Pommée et al., 2021b). The present article addresses the second part, i.e. about the criteria for creating standard reading passages. The expert panel is thus the same for both studies. The stages of this second part of our Delphi study are summarised in Figure 1. The main steps which will be described hereafter are the problem definition, the literature search, the panel selection and the construction of the three consecutive Delphi rounds.

Definition of the problem
As explained in the introduction, most available reading passages are either excerpts from literary works or translations, or were created to target a specific speech measure. To allow for the creation of new standard reading passages for the assessment of speech and voice, guidelines are necessary. Therefore, the aim of this Delphi study was to draw up a consensual minimal set of criteria for the creation of such reading passages.

Identification of criteria
Prior to launching the survey, a targeted non-systematic literature search was carried out to identify commonly used reading passages in English, French and Dutch, as well as the criteria that were used to construct these passages. This search was carried out using PubMed as well as reference lists of known articles on the topic.
A non-extensive list of reading passages in French, English and Dutch, as well as their reported characteristics and shortcomings, can be found in Supplemental material 1.
Four articles describing reading passage creation in detail were retrieved and used as the primary source for developing an initial list of criteria. The criteria identified in these articles were then enriched through an exchange with the authors of the French MonPaGe protocol (Laganaro et al., 2021), who had written a text specifically created for the assessment of dysarthric and apraxic speech and validated on French-speaking talkers from Belgium, France, Quebec and Switzerland (Fougeron et al., 2019; Pernon et al., 2020).
Published in 2021, it is the only French text to date for which the construction criteria have been detailed by the authors. Indeed, they constructed a short text (fewer than 200 words) that includes:
• words also presented in isolation in the protocol (e.g. days of the week also found in an automatic speech task of the protocol);
• glides;
• word repetitions;
• elements that allow for the assessment of vowel-to-vowel coarticulation (/papa/ vs. /papi/);
• the cardinal vowels /a, i, u/ in monosyllabic consonant-vowel (CV), CVC, and bisyllabic CVCV words;
• the same sequence at the beginning and end of the text (to assess fatigability);
• dialogues and melodic variations;
• predominantly nasal segments that allow for the evaluation of nasality;
• complex syllabic sequences.
However, these criteria were not based on an inventory of the needs of clinicians and researchers.
Based on this collaboration, as well as on the literature search, the following list of 25 potential criteria to consider when creating a new reading passage was established. The rationale for these 25 criteria can be found in Supplemental material 2.

Phoneme-level criteria.
(1) Complete phonemic inventory (i.e. the text includes all of the target language's phonemes at least once);
(2) Phonetic balance (i.e. the included phonemes occur at approximately the same frequency at which they occur in ordinary conversation in the target language);
(3) Inclusion of bi- and triconsonantal clusters;
(4) Taking into account the positions of the phonemes in the words (beginning/middle/end), as well as their phonemic context;
(5) Repeated inclusion of cardinal vowels;
(6) Taking into account vowel-to-vowel coarticulation (e.g. word pair /papa/-/papi/);
(7) Inclusion of initial voiceless stops and clusters, as well as unvoiced-voiced phoneme sequences (fluency disorders);
(8) Occurrence of words with glides (lingual articulatory dynamics, dysarthria) and words inducing lingual movements across the vowel triangle (e.g. /kajak/, /kiwi/).
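Criteria (1) and (2) lend themselves to automatic checking during text construction. The following sketch (a minimal illustration, assuming a phonemically transcribed passage and made-up reference frequencies for a toy three-phoneme inventory) flags phonemes that are missing from a candidate passage and phonemes whose relative frequency deviates from conversational usage:

```python
from collections import Counter

def phonetic_balance(transcription, reference_freqs, tolerance=0.02):
    """Check phonemic inventory completeness and phonetic balance.

    `transcription` is the candidate passage as a phoneme sequence
    (one symbol per phoneme); `reference_freqs` maps each phoneme of
    the target language to its relative frequency in conversational
    speech. Returns the missing phonemes and those whose relative
    frequency deviates from the reference by more than `tolerance`.
    """
    counts = Counter(transcription)
    total = sum(counts.values())
    missing = sorted(set(reference_freqs) - set(counts))
    unbalanced = {
        ph: (counts.get(ph, 0) / total, ref)
        for ph, ref in reference_freqs.items()
        if abs(counts.get(ph, 0) / total - ref) > tolerance
    }
    return missing, unbalanced

# Toy three-phoneme 'language' with made-up reference frequencies:
ref = {"a": 0.5, "t": 0.3, "s": 0.2}
missing, unbalanced = phonetic_balance(list("aaaaatttss"), ref)
```

In practice, the transcription would come from a pronunciation dictionary or a grapheme-to-phoneme converter, and the reference frequencies from a conversational corpus of the target language.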

Word-level criteria.
(1) Control of the ratio of function/content words (fluency disorders);
(2) Control of the articulatory/phonetic complexity of words;
(3) Repeated occurrence of some words (error consistency, apraxia vs dysarthria);
(4) Inclusion of word pairs of increasing length (e.g. amuse/amusement);
(5) Control of lexical frequency;
(6) Coupling with items from isolated assessment tasks to compare isolated vs continuous speech performance (e.g. inclusion of words from single-word lists of assessment batteries);
(7) Control of the phonological neighbourhood of words.

Sentence-level criteria.
(1) Presence of various intonation patterns (questions, statements, contrastive emphasis);
(2) Inclusion of predominantly nasal vs. oral sentences (velar insufficiency, cleft palate);
(3) Presence of a 100% voiced sentence;
(4) Repetition of an identical segment at the beginning and at the end of the passage (fatigue effect);
(5) Inclusion of various sentence lengths (breathing support, fatigability);
(6) Control of the mean length of utterance;
(7) Control of the syntactical complexity.

Global-level criteria.
(1) Calculation of a readability index (e.g. Flesch-Kincaid);
(2) Control of the topic (e.g. a 'modern' topic, emotionally neutral content to avoid emotional charging and minimise memory effects);
(3) Control of the overall length of the reading passage.
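The readability index mentioned among the global-level criteria can be computed directly. The sketch below implements the Flesch-Kincaid grade-level formula for English with a rough vowel-group syllable heuristic; note that both the syllable heuristic and the formula weights are language-specific, so adapted weights would be needed for languages other than English:

```python
import re

def count_syllables(word):
    """Very rough English syllable heuristic: count vowel groups."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)
```

For very simple text the formula can yield values below zero; for passage construction, only the relative level matters when comparing candidate drafts.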

Delphi rounds and data analysis
This Delphi consensus survey was conducted in three consecutive rounds, between July and December 2020. The termination criterion, defined prior to data analysis, was either consensus on all items (i.e. agreement by at least 75% of the expert panel; Denman et al., 2019; Diamond et al., 2014) or stability of responses.
Round 1 was available for two and a half months; rounds 2 and 3 were available for one month each. The first questionnaire was open access, with built-in duplicate checking and security procedures. The subsequent questionnaires were restricted to previous participants and required an individual single-use ID provided by the moderator. All questionnaires were piloted by five researchers to estimate the response time and to detect and correct potential execution problems (glitches and logical structure issues).
In each round, participants who did not agree with a statement were encouraged to explain the reason for their disagreement, but comments were not mandatory so as not to bias towards positive answers.
The qualitative and quantitative data were analysed using descriptive statistics (frequencies, means, standard deviations, medians, interquartile ranges) on Stata/MP software (version 14, StataCorp, College Station, TX).
Round 1. The first round was constructed in two parts.
In the first part, participants listed, in open-ended responses, the measures or ratings (at least three) they wished to be able to use on a standardised reading passage. Two raters, blinded to the identity of the participants, carried out conventional content analysis (Denman et al., 2019; Hsieh & Shannon, 2005) of these open-ended responses to identify semantic groupings.
The second part addressed the selection and prioritisation of the most important criteria for the construction of a standard reading passage. Participants were asked to select and rank, among the list of 25 criteria described above (see 'Study design: Identification of criteria'), those they considered essential to control for the creation of a new reading passage. Any criteria that were not selected were considered non-essential; additional criteria could be suggested by the participants, specifying the target language. The responses and comments from this first round were analysed independently and anonymously by the same two raters.
Three measures were computed for each criterion: the mean rank (MR), the frequency with which a criterion was ranked (and thus considered essential by the participant; rating frequency, RF), and the mean rank when rated (MRR, thus excluding non-ratings).
For the MRs, non-ranked criteria were assigned the lowest value of 26 (because there were 25 available criteria). This measure is a composite indicator, taking into account not only the number of times each criterion was rated, but also the rank assigned to it at each rating.
RFs and MRRs, on the other hand, capture complementary nuances of information: while the MRR informs us about the relative importance of each criterion for the group of experts who selected it, the RF informs us about the proportion of experts who judged a criterion as relevant, without, however, taking into account the relative importance assigned to it. The thresholds for a criterion to be considered essential by the panel of experts were set a priori: a criterion had to be ranked by at least half of the participants (i.e. the majority of experts found it important), and its MRR had to be less than 7.5/25 (33%) (i.e. it had to be ranked as important by those who selected it).
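These three measures and the a priori thresholds can be expressed compactly. The sketch below (with hypothetical expert rankings for illustration) computes MR, RF and MRR for one criterion, assigning non-ranked criteria the lowest value of 26:

```python
N_CRITERIA = 25
UNRANKED = N_CRITERIA + 1  # non-ranked criteria get the lowest value, 26

def rank_measures(rankings, criterion):
    """Compute MR, RF and MRR for one criterion.

    `rankings` holds one dict per expert, mapping the criteria that
    expert selected to their assigned rank (1 = most important);
    unselected criteria are simply absent from the dict.
    """
    ranks = [r.get(criterion, UNRANKED) for r in rankings]
    given = [r[criterion] for r in rankings if criterion in r]
    mr = sum(ranks) / len(ranks)                      # mean rank
    rf = len(given) / len(rankings)                   # rating frequency
    mrr = sum(given) / len(given) if given else None  # mean rank when rated
    return mr, rf, mrr

def is_essential(rf, mrr):
    """A priori thresholds: selected by at least half of the panel
    and MRR below 7.5/25."""
    return rf >= 0.5 and mrr is not None and mrr < 7.5

# Hypothetical answers from a four-expert mini-panel:
panel = [{"phonemic inventory": 1, "phonetic balance": 2},
         {"phonemic inventory": 2},
         {"phonemic inventory": 1, "phonetic balance": 5},
         {"phonetic balance": 3}]
mr, rf, mrr = rank_measures(panel, "phonemic inventory")
```

The MR thus penalises non-selection heavily, whereas RF and MRR disentangle how often a criterion was chosen from how highly it was ranked when chosen.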

Round 2.
The responses and comments from round 1 were then synthesised and fed back to the participants for contextualisation in round 2 (i.e. a reminder of the task used in round 1, followed by a summarised overview of the quantitative and qualitative results), together with four new statements based on the previous responses.
In this second round, participants rated a set of the most frequently and best ranked criteria from the first round, as well as three additional statements regarding further criteria (the inclusion of pseudowords, based on participants' comments, as well as two mixed-opinion criteria from round 1). All statements were rated using binary answers (Agree/Disagree), with optional comments.

Round 3.
A third round was necessary to clarify one statement, which was found to be somewhat ambiguous in round 2.

Results
Table 1 summarises the measures and ratings participants wanted to be able to use on a standard reading passage according to their open-ended responses (in decreasing order of mentions). Figure 2 shows the 25 presented criteria and their average ranking by the participants. The eight most frequently and best ranked criteria are listed in Table 2, in decreasing order of the mean rank (MR, out of 25).

Selection and prioritisation of criteria
The two most poorly ranked criteria also had the smallest numbers of selections. Two other criteria were highlighted in the analysis, in light of the discrepancy between their selection frequency and their MRR (hereafter termed 'divergent criteria'):
• The overall length of the reading passage was ranked by 53% of the participants, but with an MRR of 9.14. This criterion thus seems important to the majority of participants, but not a priority.
• Vowel-to-vowel coarticulation was ranked by only 20% of the participants, but with an MRR of 6.5. Although it was not selected by the majority, it therefore seemed to be quite an important criterion for a subgroup of the experts.
Furthermore, 7.5% (3/40) of the experts added in their comments that the text should also contain pseudowords (which were not initially included in the available criteria). This element was added to round 2 for validation by the entire panel.

Round 2
Participants then rated a set of the most frequently and best ranked criteria from the first round, as well as two additional statements regarding the 'divergent' criteria and one regarding pseudowords. The results of this round are presented in Table 3 as percent agreement for each of the statements grouped by target concepts (asterisks indicate that the consensus threshold was not reached). Consensus was reached for three of the four statements (>75% agreement among raters), with high agreement (>90%) for two of them. For the set of eight essential criteria, only one expert did not indicate agreement, stating that criteria 7 and 8 were not important to them because they can be tested in other ways. Only 50% of the experts agreed with the statement about pseudowords. Of these, 41% were strictly against the inclusion of pseudowords in a text (mainly because they are unnatural and could therefore have an impact on prosody), while another 41% specified that they were allowed but not necessary (depending on the purpose of the text).

Round 3
A third round was necessary to clarify this statement about the inclusion of pseudowords. Indeed, the comments revealed some ambiguity in the wording. The statement was therefore rephrased and clarified in the third round: 'The reading passage should contain pseudo-words in order to assess acoustic-phonetic decoding, provided that the integration is done in a way that best respects the natural character of the reading (e.g. pseudowords as proper names or incantations)'. In this final round, with 73% agreement (24/33), respondents still did not reach a consensus. Those who still disagreed mainly stood by their opinion that pseudowords are not natural enough; one recommended using low-predictability sentences instead. Given this lack of consensus, it was concluded that pseudowords should not be systematically included in a text for speech and voice assessment. The focus (at least for perceptual assessment) should be on comprehensibility, in a more functional and ecological way. The assessment of intelligibility, by contrast, requires a specific and distinct task, using unpredictable stimuli (pseudowords, minimal word pairs or unpredictable sentences).

Final outcome
This international Delphi consensus study allowed us to identify ten main criteria among the original 25 to take into account when creating a reading passage for the assessment of speech and voice, according to the expert panel. This set contains five phoneme-level, two word-level, two sentence-level criteria and one global-level criterion, as shown in Figure 3.

Discussion
Reading passages are very commonly used, both in clinical practice and in research, as they provide predictable speech material with physiological demands resembling those of conversational speech (Mendes et al., 2012). The choice of the criteria to take into account when creating reading passages for speech and voice assessment in adolescents and adults may evidently vary depending on specific aims. However, the goal of this Delphi study was to provide a general baseline containing the criteria that the international expert panel judged to be essential for an overall assessment of speech and voice functions.

Expert panel
While a panel of 12 to 15 experts has been reported in numerous studies (McPherson et al., 2018), 40 experts participated in the first round of the present study. Furthermore, a low dropout rate of 17% from the first to the third round was observed, while rates of 20 to 30% are usually expected (Chalmers & Armour, 2019). The sample size is thus satisfactory and, even more importantly, the composition of the expert panel is valuable, as it includes participants from various speech-related fields, backgrounds, and cultural and linguistic contexts. Moreover, the expert panel profiles remained constant throughout the Delphi process (see Appendix A). Hence, the Delphi panel was considered satisfactory in both size and composition to address the main objective of this study.

Criteria for the creation of standard reading passages
The resulting set validated by this panel contains five phoneme-level, two word-level, two sentence-level criteria and one global-level criterion. We will now discuss possible reasons behind this selection.
First, we hypothesise that the preponderance of phoneme-level criteria originates from the focus of the reading passage on speech rather than on language assessment. Speech intelligibility, for example, is indeed mostly assessed with a focus on segmental units, while prosody also plays a part (Pommée et al., 2021b). Hence, prosody was consistently selected by the expert panel in addition to the five phoneme-level criteria. Lexical frequency, syntactic complexity and the passage's readability, on the other hand, are criteria that relate more to linguistic and cognitive rather than to speech production skills. While a speech-related rationale exists (see Supplemental material 2), their relevance might thus be less straightforward, which could explain that they were not considered essential.
A second hypothesis regarding some of the selected criteria is that they are the most well-known and documented in the literature. The phonemic inventory and the phonetic balance, for example, are widely used in various speech production materials. Similarly, taking into account the positions of phonemes and their phonemic context is consistent with the extensive literature regarding the effect of coarticulation on speech production and perception (e.g. Katz et al., 1991; Liberman et al., 1967; Nguyen, 2001; Suomi, 1985; Van Son & Pols, 1999). What is more surprising, however, is that the repetition of cardinal vowels was not considered essential, while vowel space measures are the most commonly investigated in speech research (Pommée et al., 2021). Vowel-to-vowel coarticulation, on the other hand, although not yet widely documented and studied, was considered very important by a subset of experts. No comment or common participant characteristic could be identified to explain the importance of this criterion for this subset of experts.
A third hypothesis refers to the properties of the panel of experts. The two criteria more specifically relating to fluency disorders (ratio of content/function words; initial voiceless stops, clusters and unvoiced-voiced phoneme sequences) were not considered as essential. This can be explained by the fact that only two of the experts were exclusively specialised in fluency disorders, with a total of 33% of the experts being at least partly active in this field and an average percentage of patients with fluency disorders of only about 5%. Similarly, the inclusion of a 100% voiced sentence would be most interesting for experts specialising in the field of voice and phonation. While 58% of the experts state that they are at least partly active in the field of voice, none of them exclusively targets this field, and the average percentage of dysphonic patients (structural, functional and neurogenic voice disorders) only reaches about 15%. Hence, the non-selection of these criteria could be explained by the fact that the expert panel is mostly composed of professionals active in the field of speech disorders. The expert panel properties as well as their patient populations also explain the inclusion of criteria mostly relating to dysarthria and apraxia (e.g. repeated occurrence of some words for error consistency, articulatory complexity of words, inclusion of consonant clusters).
Overall, while a consensus procedure makes it possible to identify the main points on which a panel of experts agree, its use for the creation of new standard reading passages also has a pitfall regarding potentially under-represented subgroups of experts. Indeed, some criteria (e.g. vowel-to-vowel coarticulation, the inclusion of a 100% voiced sentence and the inclusion of pseudowords), while discarded by most experts, nonetheless seem to be of interest for specific purposes. These niche criteria should therefore not be dismissed out of hand, but rather be considered as secondary to the main set. If they can be included without impeding the main criteria or the readability, a new reading passage constructed with both the main and the niche criteria will benefit the largest number of potential users.

Construction of a new reading passage
How should one proceed now that the minimal set of criteria for the creation of standard reading passages has been identified? This set is meant to be used as a general guideline. We applied it to create a new standard reading passage in French for patients aged at least 12 years (taking into account reading proficiency and allowing for non-infantilising reading content for adults). A full description of the construction of this text is beyond the scope of this article. However, to offer an idea of the process, its steps will now be briefly described to provide a baseline, e.g. for application in other languages.
The first step for the creation of our French reading passage was to set up an international and multidisciplinary working group of experts from Belgium, France and Quebec. This working group gathered to discuss existing reading passages and why these did not satisfy researchers' and clinicians' needs.
To get a broader view of these needs in the French-speaking community with regard to the speech and voice assessment, we carried out an online survey. This survey allowed us to identify the measures to be applied on the future text and the target populations, and to define the specific objective and scope of the desired outcome.
Once the aim was set, we carried out a review of existing reading passages in French. This step serves to identify the most suitable existing reading passage, if applicable. Considering the minimal set of criteria described above, it is important at this stage to decide whether to use an existing passage or to start from scratch. If reading passages exist for which the construction criteria have been described, these might provide a good working basis. Otherwise, starting from scratch might prove easier and less time-consuming.
In our case, the recent text from the MonPaGe protocol (Pernon et al., 2020) proved to be a valuable resource and was rather well described. We thus contacted the authors to obtain permission to use it as a working basis. This text was then adapted to meet the minimal set of criteria, as well as additional criteria based on the needs identified by our working group, taking into account the linguistic and cultural specificities of the French-speaking context. Furthermore, we decided to control as many criteria as possible in the first part of the text to allow for a short reading. This way, if a reading of the whole text is not possible because of time constraints or the patient's fatigability, this short reading will nevertheless allow some parameters to be extracted to analyse the patient's speech.

Limits and future perspectives
The minimal set of criteria provided in this study applies to the creation of standardised reading passages for the assessment of speech and voice in patients with sufficient reading proficiency. Reading passages are not ideal for the assessment of speech and voice in young children (e.g. for the early identification of developmental speech sound disorders) or in illiterate adults, as reading difficulties might affect speech production.
The expert panel that participated in this study is slightly weighted towards neurogenic adult speech disorders (dysarthria and apraxia). Criteria that are more specific to the assessment of fluency and voice disorders might therefore have been discounted and could be considered in addition to the main criteria set.
It is also to be noted that the minimal set of criteria provides a guideline that mainly applies to Indo-European Romance and Germanic languages such as English, French and German. Other language groups, such as Indo-European Balto-Slavic or Sino-Tibetan languages (including Chinese), are barely represented in the expert panel, if at all. Separate studies may be necessary to identify the most important criteria that apply to the specificities of these languages. Furthermore, this set is not adapted for specific research aims, for which particular criteria might be needed (see, for example, Kuo & Weismer, 2016; Van Zundert et al., 1998).

Conclusion
Despite their importance for the assessment of speech and voice, available standard reading passages present various limitations. To allow for the creation in various languages of new texts for standard and ecological overall speech and voice assessment, an international Delphi consensus survey was carried out to identify the most important criteria to take into account. This study resulted in a minimal set of ten criteria which were selected by a majority of the experts and rated as most important. It contains five phoneme-level, two word-level, two sentence-level criteria and one global-level criterion. It can be used as a general guideline for the creation of standard reading passages in Indo-European Romance and Germanic languages such as English, French and German.

Appendices
Appendix A: Description of the expert panel

Notes: Combinations were possible (e.g. in countries with more than one official language). Each participant distributed 100% over all given categories; results are average percentages across participants for each category.