Intelligibility and comprehensibility: A Delphi consensus study

Background: Intelligibility and comprehensibility in speech disorders can be assessed both perceptually and instrumentally, but a lack of consensus exists regarding the terminology and the related speech measures, both in the clinical and in the scientific fields. Aims: To draw up a more consensual definition of intelligibility and comprehensibility and to define which assessment methods relate to both concepts, as part of their definition. Methods & Procedures: A three-round modified Delphi consensus study was carried out among clinicians, researchers and lecturers engaged in activities in speech disorders. Outcomes & Results: Forty international experts from different fields (mainly clinicians, linguists and computer scientists) participated in the elaboration of a comprehensive definition of intelligibility and comprehensibility and their assessment. While both concepts are linked and both contribute to functional human communication, they relate to two different reconstruction levels of the transmitted speech material. Intelligibility refers to the acoustic-phonetic decoding of the utterance, while comprehensibility relates to the reconstruction of the meaning of the message. Consequently, the perceptual assessment of intelligibility requires the use of unpredictable speech material (pseudo-words, minimal word pairs, unpredictable sentences), whereas comprehensibility assessment is meaning- and context-related and entails more functional speech stimuli and tasks. Conclusion & Implications: This consensus study provides the scientific and clinical communities with a better understanding of intelligibility and comprehensibility. A comprehensive definition was drafted, including specifications regarding the tasks that best fit their assessment. The outcome has implications both for clinical practice and for scientific research, as the disambiguation improves communication between professionals and thereby


Introduction
The assessment of speech disorders aims to evaluate several dimensions to allow for a comprehensive and individualized overview of each patient's speech. These dimensions include an examination of the orofacial sensitivity and motor functions, a functional assessment of respiration, phonation and resonance, articulation (motor planning, programming and execution), intelligibility (acoustic-phonetic decoding), comprehensibility (understandability), as well as the psychosocial impact of the speech disorder (Dykstra et al. 2007, Rumbach et al. 2019).
Both perceptual and instrumental measures can be used, the first of which still seem to be the most common option in clinical practice (Altaher et al. 2019, Pommée et al. 2021, Rumbach et al. 2019). However, there appears to be a lack of consensus regarding the terminology of the perceptual concepts related to speech, as well as how to assess them. A recent clinician survey in French-speaking countries indeed revealed a lack of standardization of the speech assessment, regarding its overall structure, but also the assessment tasks and stimuli used for each dimension (Pommée et al. 2021). Furthermore, the terms used by the speech-and-language pathologists in this study indicated a lack of clarity regarding their definitions, more specifically regarding intelligibility and comprehensibility. This ambiguity in the use of professional terminology is also observed in existing assessment batteries such as the French Batterie d'Évaluation Clinique de la Dysarthrie (Auzou and Rolland-Monnoury 2006), as well as in the scientific literature (Denman et al. 2019, Pommée et al. 2020, Walsh et al. 2006, Walsh 2005).
Wood already expressed the "terminology problem" in 1971: "Many terms and their meanings are not well crystallized because the subject matter is always changing; concepts themselves are often tentative and fluid …. This growth of speech pathology … has generated hundreds of terms, some of which are interchangeable, some of which have different means to different people" (Wood 1971). A more recent report by the Australian Institute of Health and Welfare confirmed that this issue has not yet been resolved: "Classification and terminology used to describe speech impairments are particularly fraught with inconsistency, in particular the use of different interpretations for the same terminology or different terminologies for the same meaning" (Australian Institute of Health and Welfare 2003). This can lead to communication problems between professionals and impede the efficiency of patient care (e.g. in a multidisciplinary team) as well as the progress of research (e.g. by hampering scientific debates and making it difficult to compare and combine research results) (Denman et al. 2019, Walsh et al. 2006, Walsh 2005). In addition to its impact in the clinical and in the scientific fields, the lack of consensual terminology also impedes the link between these two fields by affecting research translation (Denman et al. 2019, Roulstone 2015).
In light of the important clinical and scientific impact of terminological ambiguity, the main aim of this study is to draw up a more consensual definition of intelligibility and comprehensibility. It also addresses a secondary aim that is closely linked to the first: to define which assessment methods relate to both concepts (as part of their definition).
The three most commonly used methods in health-related research that target consensus are the consensus development conference, the expert panel (or "nominal group") and the Delphi method (Jones and Hunter 1995, McMillan et al. 2016). The consensus development conference was originally developed by the National Institutes of Health (NIH) to validate the safety and effectiveness of health-related technologies and facilitate their transmission into clinical practice (Perry and Kalberer 1980). A panel of experts typically gathers for a few days (including all-night sessions) to draft and modify a consensus statement of recommendations. This method is rather tedious to organize and does not control for issues related to the group setting (e.g. social pressure to comply); also, all-night sessions raise the likelihood that an agreement derives from the panel's tiredness. Furthermore, no formal decision-making criteria or voting processes are used, and only qualitative estimations can be made (Letrilliart and Vanmeerbeek 2011). Conversely, the nominal group and Delphi methods are structured and systematic qualitative approaches that allow for quantitative results. Like the consensus development conference, the nominal group technique is also carried out in a face-to-face meeting of experts (McMillan et al. 2014), but uses highly structured consecutive rounds to rate and discuss a series of questions (see [Jones and Hunter 1995] for a description of the different stages). Finally, the Delphi method is also a multi-stage process, but uses self-completed questionnaires instead of a face-to-face setting and allows for the use of larger panels, as compared to a recommended panel size of seven experts for the nominal group survey (McMillan et al. 2016). Originally developed in a military context, the Delphi method has since the 1980s been applied to various fields of research (von der Gracht 2012) to make forecasts or make decisions about present issues (Chalmers and Armour 2019), generate ideas or determine priorities (McMillan et al. 2016). It is nowadays commonly used in health-related topics, such as guideline development (McMillan et al. 2016), assessment of treatment appropriateness (Beers et al. 1991), disease prevalence forecasts (Chin et al. 1990) and improvement of education and training in health professions (Fasser et al. 1992). One of the many advantages of this method, in addition to its low cost and the absence of geographical limitations (the rounds can be carried out by mail or online), is its quasi-anonymous nature (von der Gracht 2012, Sinha et al. 2011). The participants' identities remain unknown to each other, which allows for freedom of expression without any social or professional pressure from peers.
The quasi-anonymous nature, the use of multiple rounds and the provision of structured feedback between the rounds help to reduce bias in the consensus-aiming process (Chalmers and Armour 2019).
In light of its many advantages, the Delphi method was used in this study to draw up a more consensual definition of intelligibility and comprehensibility and their assessment.

Ethics Approval
This research was registered with the data protection officer of the Centre National de la Recherche Scientifique (CNRS) and was also approved by the computer science ethics advisory board (Comité consuLtatif d'Éthique concernant la Recherche en Informatique de Toulouse - CLERIT). An information sheet describing the purpose of the study, the participant's rights and the data privacy policy was provided to all participants prior to the first round of the survey. Participants gave consent for their answers to be used in an anonymized and aggregated manner to derive consensus statements. Only e-mail addresses were known to the moderator, in order to enable round-to-round survey monitoring.

Study Design
The Delphi methodology is conducted over several consecutive rounds (usually three) (Birko et al. 2015, Linstone and Turoff 2002). In a "modified Delphi study", statements to be rated are directly provided in the first round, based on a preliminary literature search (Cunningham et al. 2019, Denman et al. 2019). After each round, the panel's responses are synthesized, and areas of agreement/disagreement are identified. Aggregated controlled feedback (von der Gracht 2012) is then provided to the panel in the following round to explain modifications that have been made to facilitate consensus, and participants give their opinion on the new assertions (Denman et al. 2019, Diamond et al. 2014). The iterations are usually carried out until consensus is reached or until stability in the answers is observed (see [Chalmers and Armour 2019, von der Gracht 2012] for examples of stability measures).
The stages of the present modified Delphi study are summarized in figure 1. The main steps which will be described hereafter are the problem definition, the panel selection, the literature search and the construction of the three consecutive Delphi rounds.

Definition of problem
As explained in the introduction, ambiguity exists in the speech-related terminology, particularly with regard to intelligibility and comprehensibility. This ambiguity is noted both in the clinical field and in the literature. It can lead to a communication problem between professionals and impede the efficiency of patient care as well as the progress of research. It also leads to a lack of consensus on the speech tasks and measures to be included in a standard speech assessment. Therefore, the aim of this Delphi study is to draw up a more consensual definition of intelligibility and comprehensibility and their assessment.

Selection of Experts
We targeted professionals (clinicians, researchers, lecturers) engaged in activities in speech sound disorders 1 and/or fluency disorders (stuttering/stammering). "Activities" were defined as clinical activity, research, academic or industrial activity, or a combination of these, if at least approximately 20% related to speech (self-estimated by the participants). These professionals were required to be able to read English at an intermediate level.

Recruitment was carried out via:
- national professional associations (speech-language-hearing, phoniatrics, voice, acoustics, computer science/signal processing, linguistics/phonetics); over 200 organizations were contacted worldwide by e-mail
- social networks (Twitter, Facebook), where targeted professions and associations were also solicited in private groups
- e-mail to over 50 hand-selected speech experts identified in literature searches on PubMed, who had at least three publications in the field of speech, were authors of a reference book, or participated in research projects linked with pathological speech

1 See the ASHA's definition of speech sound disorders: https://www.asha.org/practice-portal/clinicaltopics/articulation-and-phonology/

Non-respondents for each round were excluded from subsequent rounds. An a posteriori analysis using descriptive statistics was carried out to verify that the expert panel characteristics did not change. It was also verified that the increasing consensus throughout the Delphi rounds was not biased by the dropouts of predominantly disagreeing experts. To that end, quantitative data in all three rounds with and without the dropouts were compared, also using descriptive statistics.
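The dropout check described above can be sketched as a comparison of agreement rates with and without the experts who later withdrew. This is a minimal illustration only, not the authors' Stata analysis: the function names, the expert identifiers and the vote data are all hypothetical.

```python
def agreement_rate(votes):
    """Share of 'agree' votes (True) among the votes cast."""
    return sum(votes) / len(votes)


def dropout_sensitivity(votes_by_expert, dropouts):
    """Compare agreement with and without the experts who later dropped out.

    votes_by_expert: dict mapping a (hypothetical) expert id to True/False.
    dropouts: set of expert ids who did not complete the later rounds.
    Returns (rate with dropouts, rate without dropouts, difference).
    """
    everyone = list(votes_by_expert.values())
    completers = [v for e, v in votes_by_expert.items() if e not in dropouts]
    with_dropouts = agreement_rate(everyone)
    without_dropouts = agreement_rate(completers)
    return with_dropouts, without_dropouts, without_dropouts - with_dropouts


# Hypothetical data (not the study's): 40 experts, 36 of whom agree,
# and 6 dropouts who all belonged to the agreeing majority.
votes = {f"expert_{i:02d}": i < 36 for i in range(40)}
dropped = {f"expert_{i:02d}" for i in range(6)}

with_d, without_d, shift = dropout_sensitivity(votes, dropped)
print(f"with dropouts: {with_d:.0%}, without: {without_d:.0%}, shift: {shift:+.1%}")
```

A shift close to zero, or a small negative one when the dropouts agreed with the majority, would suggest that the rise in consensus across rounds is not an artefact of disagreeing experts leaving the panel.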

Literature Search
Prior to launching the survey, a targeted non-systematic literature search was carried out by the first author to identify various definitions of intelligibility and comprehensibility. This search was carried out using PubMed as well as reference lists of known articles on the topic. The definitions had to be explicitly mentioned in the research paper, with the terms "intelligibility" and "comprehensibility". Five definitions, which were considered to best reflect the different interpretations of both terms, were retained to be presented in the first round of this Delphi study:
- Ghio et al. (Ghio et al. 2018) (translated from French 2): "The perception of speech is a complex process that integrates both an ascending flow of information from the speech signal and a descending flow based on high-level information held by the listener. The bottom-up flow is mainly an acoustic-phonetic decoding operation that consists in identifying phonemes from the speech signal. Phonemes, which can be considered as the smallest units for opposing meaning, are the basic elements of speech intelligibility. […] Acoustic-phonetic decoding is therefore the fundamental process for perceptually measuring a speaker's intelligibility."
- Hodge et al. (Hodge and Whitehill 2010): "Intelligibility, or how understandable one's speech is to another, is a functional indicator of oral communication competence. It reflects a talker's ability to convert language to a physical signal (speech) and a listener's ability to perceive and decode this signal to recover the meaning of the talker's message."
- Hustad (Hustad 2008): "Intelligibility refers to how well a speaker's acoustic signal can be accurately recovered by a listener."
- Yorkston et al. (Yorkston et al. 1996): "The term intelligibility refers to the degree to which the acoustic signal (the utterance produced by the dysarthric speaker) is understood by a listener. […] The concepts of comprehensibility and intelligibility may be distinguished by the fact that comprehensibility incorporates signal-independent information such as syntax, semantics, and physical context."
- Barefoot et al. (Barefoot et al. 1993): "[…] comprehensibility is defined as the extent to which a listener understands utterances produced by a speaker in a communication context. In our view, comprehensibility pertains to the domains of both speech and language, whereas intelligibility pertains principally to the domain of speech. The primary distinction between comprehensibility and intelligibility is that comprehensibility is intended to account for communication features of utterances that extend beyond the auditory-acoustic domain. Comprehensibility, in our use of the term, explicitly incorporates contextual features such as syntax, semantics, and pragmatics, and involves face-to-face communication activity in which meaningful utterances are produced by talkers and processed by listeners."

The following statement was also added to stimulate the participants' reflection: "Intelligibility and comprehensibility can be used as synonyms."

Delphi Rounds and Data Analysis
This Delphi consensus survey was conducted in three consecutive rounds, between July and December 2020. Round 1 was available for two and a half months; rounds 2 and 3 were each available for one month. The online questionnaires are still available on the LimeSurvey platform 3. The first questionnaire was open access, with built-in duplicate checking and security procedures. The subsequent questionnaires were restricted to previous participants and required a token provided by the moderator. All questionnaires were piloted by five researchers to get an estimate of the response time and to detect and correct potential execution problems (glitches and logical structure issues).
In each round, participants who did not agree with a statement were encouraged to explain the reason for their disagreement, but comments were not mandatory so as not to bias towards positive answers.
The qualitative and quantitative data were analyzed using Stata/MP software (version 14, StataCorp, College Station, TX).

Terminology (13 questions):
- open-ended definitions of intelligibility and comprehensibility 4
- listing of any other terms used when assessing/describing speech disorders
- degree of agreement with six definitions/statements from the literature regarding intelligibility and comprehensibility, on a 6-point scale (1: Disagree Strongly - 6: Agree Strongly) and optional comment
- ranking of the same definitions/statements in decreasing order of preference

Two raters (first and second authors), blinded to the identity of the participants, carried out conventional content analysis (Denman et al. 2019, Hsieh and Shannon 2005) of the open-ended definitions as well as of the definitions from the literature, to identify the main recurring themes and concepts. Frequency analysis was used to identify trends by quantifying the number of experts mentioning each of these concepts in their open definitions of both intelligibility and comprehensibility. The degrees of agreement and preference rankings regarding the definitions from the literature were also analyzed using frequency analysis, taking into account the concepts included in each of the definitions.
Together, all of these results were then used to draft 22 statements for round 2, targeting each of the identified main concepts relating to intelligibility and comprehensibility.
Open answers on other terms used to describe and assess speech disorders were semantically grouped into generic and specific terms by the two raters, and frequency analysis was applied.
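As an illustration of the frequency analysis described above, the sketch below counts how many experts mention each coded concept at least once. It assumes the content analysis has already assigned concept labels to each open-ended definition; the function name and the coded data are hypothetical and do not reproduce the study's data or its Stata workflow.

```python
from collections import Counter


def concept_frequencies(coded_definitions):
    """Count how many experts mention each concept at least once.

    coded_definitions: one list of concept labels per expert.
    Returns {concept: (number of experts, rounded percentage)}.
    """
    counts = Counter()
    for concepts in coded_definitions:
        counts.update(set(concepts))  # set(): one count per expert per concept
    n = len(coded_definitions)
    return {c: (k, round(100 * k / n)) for c, k in counts.most_common()}


# Hypothetical codes for four experts' open-ended definitions
coded = [
    ["message reconstruction", "acoustic-phonetic decoding"],
    ["message reconstruction"],
    ["message reconstruction", "message reconstruction", "synonymy"],
    ["acoustic-phonetic decoding"],
]

freqs = concept_frequencies(coded)
for concept, (k, pct) in freqs.items():
    print(f"{concept}: {k}/{len(coded)} ({pct}%)")
```

Counting each concept once per expert, rather than once per mention, matches the way results are reported in this paper (e.g. "63%, 25/40").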

Assessment methods (five questions):
- perceptual assessment of intelligibility and comprehensibility (multiple choice with "Other" and "None" options) and listing of any other perceptual speech measures
- "objective" assessment of intelligibility and comprehensibility (multiple choice with "Other" and "None" options)

Frequency analysis was used to identify the main trends regarding perceptual and "objective" assessment methods of intelligibility and comprehensibility.

Round 2
Responses and comments from round 1 were synthesized and fed back to the participants for contextualization in round 2, together with the 22 new statements based on the previous responses. This second round was constructed in two parts; all statements were rated using binary answers (Agree/Disagree), with optional comments:
1) Terminology (14 statements): the new statements were grouped into the six concepts identified in the content analysis from round 1 (cf. "Terminology - Intelligibility and Comprehensibility" in the Results section)
2) Speech assessment methods (eight statements): the new statements regarding the perceptual and "objective" speech assessment of intelligibility and comprehensibility were also based on results from round 1

Round 3
A third round was necessary to clarify three statements, which were found to be somewhat ambiguous in round 2. A draft definition paragraph of intelligibility and comprehensibility, integrating all the consensual elements from the previous rounds, was also provided.

Consensus and Stop Criteria
The threshold for consensus was defined before data analysis as the agreement of at least 75% of the expert panel (Denman et al. 2019, Diamond et al. 2014).
The planned maximum number of rounds was four, and the stop criterion was the achievement of consensus on all items from each of the two main investigated parts (terminology and speech measures) or stability in the responses.
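The consensus threshold and stop criterion can be expressed as a short decision rule. The sketch below is illustrative only: the function names are hypothetical, and because no specific stability measure is given here, stability is passed in as a plain flag (an assumption). The counts in the usage lines are agreement figures reported in the Results of this paper.

```python
CONSENSUS_THRESHOLD = 0.75  # agreement of at least 75% of the expert panel
MAX_ROUNDS = 4              # planned maximum number of rounds


def has_consensus(n_agree, n_respondents, threshold=CONSENSUS_THRESHOLD):
    """True if the share of agreeing experts reaches the threshold."""
    return n_agree / n_respondents >= threshold


def should_stop(round_number, items, stable=False):
    """Decide whether the Delphi process ends after the current round.

    items: list of (n_agree, n_respondents) pairs for the current round.
    stable: assumed to come from a separate stability check on the responses.
    """
    all_consensual = all(has_consensus(a, n) for a, n in items)
    return all_consensual or stable or round_number >= MAX_ROUNDS


# Agreement counts reported in the Results: 25/34 in round 2, 26/33 in round 3
print(has_consensus(25, 34))  # 74% agreement, below the threshold
print(has_consensus(26, 33))  # 79% agreement, threshold reached
```

Fixing the threshold as a constant before looking at the data mirrors the a priori definition of consensus described above.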
Detailed data for the participants in each round are available in Appendix A. The trends described hereafter are constant throughout the three rounds despite the dropouts. Percentages in parentheses are ranges across the three rounds.
A majority of the expert panel are speech and language pathologists (SLPs) (70-73%) working in the fields of speech, fluency and voice disorders. Other major groups of participants are linguists (23-24%), ENT/phoniatricians (20-21%) and computer scientists (18-21%). More than half of the experts (58-61%) have at least 10 years of experience working in the speech and voice domains. Their main activity is research for 35-42%, clinical practice for 27-33% and academic activity for 27-28% of the experts (40-46% are associate professors); only two initial participants are engaged in industrial activity. Eighty-five to ninety-four percent of the experts are engaged in at least two main activities; clinical activity and research are combined in 53-55%. More than half of the participants have a third-cycle diploma (PhD, 58-64%) obtained on average in 2009-2010 (±8 years).
The patient/study populations are rather balanced regarding the age groups, with a slight predominance of the elderly population (32-37%). Also, the most frequently encountered pathologies are acquired and degenerative neurological pathologies (38-44%).

Preliminary note on dropouts
The a posteriori verification revealed that removing the seven dropouts from the analysis in all three rounds did not significantly change the conclusions and consensus values regarding the terminology and the measures of intelligibility and comprehensibility.
The six dropouts after round 1 agreed with the majority of the other experts. The percentages of agreement with the proposed definitions in round 1, for example, decreased by 0-3% when the dropouts were excluded. After round 2, only one additional dropout was counted. The participant was one of three participants who agreed to all of the proposed statements in round 2. The impact on the consensus rate was therefore considered to be minimal, if not positive regarding the reliability of the final results.

Terminology - Intelligibility and Comprehensibility
The conventional content analysis on the open definitions of intelligibility and comprehensibility revealed six main concepts, which also featured in the subsequently presented definitions from the literature:
- Synonymy: mentions of intelligibility and comprehensibility being synonyms
Seventy-eight percent (31/40) of the experts disagreed with the statement "Intelligibility and comprehensibility are synonyms" (mean degree of agreement: 2.18/6, mode: 1/6). Seventy-three percent (29/40) ranked this statement last in preference relative to the other five. Only five percent (2/40) ranked it first. Fifteen percent (6/40) of participants highlighted in their comments that intelligibility refers to the speaker rather than to the listener.
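Summary figures of this kind (mean degree of agreement, mode, share of disagreement) can be derived from ratings on the 6-point scale as sketched below. The ratings are hypothetical and not the study's data, and treating ratings of 1 to 3 as disagreement is an assumption made for illustration; the cut-off applied by the authors is not stated here.

```python
from statistics import mean, multimode


def summarize_ratings(ratings, scale_max=6):
    """Summarize agreement ratings given on a 1..scale_max scale."""
    midpoint = scale_max / 2  # assumption: ratings <= 3 count as disagreement
    n_disagree = sum(r <= midpoint for r in ratings)
    return {
        "mean": round(mean(ratings), 2),
        "modes": multimode(ratings),  # list, since several values can tie
        "n_disagree": n_disagree,
        "pct_disagree": round(100 * n_disagree / len(ratings)),
    }


# Hypothetical ratings from ten experts (1: Disagree Strongly, 6: Agree Strongly)
ratings = [1, 1, 1, 2, 2, 3, 1, 5, 6, 2]
summary = summarize_ratings(ratings)
print(summary)
```

Reporting the mode alongside the mean is useful on a short ordinal scale, where a few high ratings can pull the mean away from the most typical answer.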

Message Reconstruction
The majority of participants, in their spontaneous definitions, noted that intelligibility and comprehensibility allow the reconstruction of a message by a listener (intelligibility: 63%, 25/40; comprehensibility: 85%, 34/40). Regarding intelligibility, 90% (36/40) specified that the message is conveyed by the sound signal.
Ninety-eight percent (39/40) agreed with Yorkston et al.'s definition (mean degree of agreement: 5.48/6, mode: 6/6), which states in relation to intelligibility that the information is carried by the acoustic signal.

Functional communication
In

Phonetic-acoustic production
In their open-ended definitions, with regard to intelligibility, some participants indicated that the reconstruction of the message is allowed by the speaker's phonetic-acoustic production ability (10%, 4/40), in order to obtain a message that is clear (clarity: 15%, 6/40) and easily understood by the speaker (ease of understanding: 8%, 3/40).
Ninety percent (36/40) agreed with Hodge et al.'s definition, which is the only one to take into account the concept of phonetic-acoustic production relating to intelligibility (alongside the concepts of acoustic-phonetic decoding, communication and the recovery of the meaning of the message).

Acoustic-phonetic decoding
Spontaneously, participants indicated that in the context of intelligibility the reconstruction of the message is based on the acoustic-phonetic decoding abilities of the listener (35%, 14/40), and that it is linked to the sensory capacities of the listener (5%, 2/40).
Ninety-five percent (38/40) agreed with Ghio et al.'s definition, and 100% with Hustad's, which both exclusively link the concept of acoustic-phonetic decoding to intelligibility. However, some participants (8%, 3/40) raised doubts about the limitation to the phonemic level, as well as the exclusion of higher-level elements.
Ninety-eight percent (39/40) agreed with Yorkston et al.'s definition, which equates intelligibility with acoustic-phonetic decoding and comprehensibility with the reconstruction of the meaning of the message using syntactic, semantic and contextual information. Fifteen percent (6/40), while agreeing with this definition, felt that intelligibility also incorporates signal-independent information. Both participants who agreed and who disagreed with Hodge et al.'s definition highlighted that it describes comprehensibility more than intelligibility.
Together, all of these results were used to draft 14 terminology-related binary assertions (Agree/Disagree) for round 2, targeting each of the six previously described main concepts.
These statements, together with the resulting percentages of agreement in round 2, are shown in Appendix B.1.
The respondents did not reach agreement on one statement in round 2: only 41% (14/34) of them agreed with the statement "The assessment of intelligibility and comprehensibility should not take into account the perceptual abilities of the listener." In their comments, participants who disagreed pointed out that perception is part of communication and should be taken into account (95%, 19/20), although mainly in comprehensibility (30%, 6/20). Furthermore, the respondents' comments indicated that perception could be interpreted at different levels of the communication loop: at the peripheral auditory level (hearing screening), but also at the level of receptive language skills of the listener, as well as with regard to the auditory context (e.g. background noise).
Therefore, in round 3, the assertion was specified as follows: "In the context of perceptual assessments, while listener's speech perception factors have to be controlled beforehand (i.e. listener's hearing, but also receptive language skills and auditory context), intelligibility and comprehensibility are used to assess the talker's speech production". Reformulated as such, 85% (28/33) of the participants agreed with this statement. Three of those who disagreed (60%) again indicated that both concepts, but more specifically comprehensibility, also include the listener's ability to reconstruct the utterance/message.

Perceptual measures
According to the participants' answers in round 1 regarding the perceptual measures which best describe speech intelligibility and comprehensibility (see figure 2):
- Intelligibility is best measured using orthographic transcription scores (e.g. %-correct items):
Ratings on low-level linguistic units (phonemes, pseudo-words, words) are most commonly used for intelligibility (38% for pseudo-words, 48% for phonemes, 50% for words).
Higher-level ratings are preferred for the assessment of comprehensibility (e.g. 60% for semantic content questions). Word-level measures are associated with intelligibility more than with comprehensibility, consistent with the concern raised regarding the reduction of intelligibility to the phoneme level. Three participants (7%) highlighted that word-level scores are more functional and allow coarticulation to be taken into account. Two others (5%) emphasized that the use of word-level ratings remains a challenge because of the memorization by the listener and compensation processes based on their linguistic knowledge. Furthermore, two other experts (5%), while agreeing that low-level units are of major interest to assess speech intelligibility, highlighted that phrase-level symptoms such as respiration in dysarthria are neglected.

"Objective" measures
According to the participants' answers in round 1 regarding the "objective" measures which best describe speech intelligibility and comprehensibility (see figure 3):
Based on these results, six binary statements (Agree/Disagree) regarding the assessment of intelligibility and comprehensibility were constructed for round 2. These statements, as well as the resulting percentages of agreement in round 2, are shown in Appendix B.2. The respondents did not reach consensus on one of these statements from round 2, relating to the granularity or level of analysis for the acoustic assessment of intelligibility: "Intelligibility is best assessed using phoneme-level acoustic measures." Seventy-four percent (25/34) of the experts agreed with it. Those who disagreed specified that these measures are not exclusive, and that a combination of phoneme, word and sentence-level measures is recommended, taking into account various phonetic contexts. However, from the comments of the respondents, it appeared that some had interpreted the statement as referring to measures on isolated phonemes only, thus not taking into account the phonemic context. Therefore, this statement was reformulated in round 3: "Intelligibility is best assessed using consonant, vowel and glide acoustics (incl. inter-phoneme formant transitions), be they on isolated phonemes, or embedded in syllables, in (pseudo-)words or in sentences." The consensus threshold was reached for this more specific assertion, with 79% (26/33) of the experts now agreeing (note that the participant who dropped out after round 2 agreed to the original assertion; his withdrawal thus did not affect the observed increase in consensus). Three of those who disagreed (43%) specified that there is no "best" way, but rather a necessity to consider several concepts and dimensions. The term "best" was consequently avoided in the integrated definition (see "Final outcome" hereafter).
Still pertaining to the assessment of intelligibility, a second statement of round 2 caught the authors' attention: "Intelligibility also includes signal-independent elements." While reaching the consensus cut-off (76% [26/34] agreed), this result was highly inconsistent with other responses (e.g. 97% agreement to the assertion "The intelligibility of a message is specifically carried by the acoustic signal."). There seemed to be uncertainty about the term "signal-independent elements", which was explicitly expressed in some respondents' comments. The intended meaning of "signal-independent elements" was: all information that is not carried by the acoustic signal, including the knowledge of the conversation topic, the general knowledge, the use of the linguistic context and of non-verbal communication… Hence, "signal-independent elements" referred to the top-down cognitive processes that are independent of the acoustic-phonetic decoding (bottom-up) processes. Accordingly, the statement was rephrased in round 3: "Intelligibility, as opposed to comprehensibility, does not include signal-independent elements (i.e. information from cognitive processes: knowledge of the conversation topic, general knowledge, use of the linguistic context and of non-verbal communication…)". This new phrasing indeed yielded 91% (30/33) agreement. Hence, signal-independent elements were related to comprehensibility rather than to intelligibility in the above definition.

Final Outcome
The aim of this study was to draft a more consensual definition of intelligibility and comprehensibility. Throughout the Delphi process, it quickly appeared that SLPs/phoniatricians, computer scientists, linguists, audiologists, etc. have slightly different but complementary views of the concepts at hand. The following comprehensive passage includes all the consensual elements gathered throughout the Delphi process and tries to reconcile the points of view from the different fields of expertise:

Intelligibility and comprehensibility are two terms relating to speech, but they are not synonyms. They both refer to the assessment of the speaker's production abilities and both contribute to communication. Hence, while speech production is targeted, the listener's speech perception factors cannot be dismissed (i.e. listener's hearing loss should be excluded at least).
Intelligibility refers to the reconstruction of an utterance at the acoustic-phonetic level; intelligibility-related information is thus carried by the acoustic signal (i.e. intelligibility focuses on signal-dependent information). This reconstruction is made possible both by the speaker's phonetic-acoustic production ability and by the listener's acoustic-phonetic decoding skills.
Perceptually, intelligibility is best analyzed on low-predictability stimuli: phonemes, syllables, pseudo-words, but also words (in minimal pairs) and unpredictable sentences for a more functional assessment taking coarticulation and phrase-level symptoms into account (e.g. respiration and prosody), as long as top-down cognitive compensation processes of the listener are avoided (i.e. no help from semantic or linguistic context).
Objectively, intelligibility can be assessed using consonant, vowel and glide acoustics (incl. inter-phoneme formant transitions), be they on isolated phonemes, or embedded in syllables, in (pseudo-)words or in sentences. Furthermore, in some cases, voice quality also contributes to intelligibility, as it plays a role in certain phonemic contrasts. Supra-segmental parameters (e.g. objectively assessed by speech rate or stress) also contribute to intelligibility.
Comprehensibility refers to the reconstruction of a message at the semantic-discursive level, subsequent to the acoustic-phonetic reconstruction. Therefore, intelligibility is a component of comprehensibility. In addition to the acoustic-phonetic decoding, comprehensibility also includes signal-independent, contextual elements such as the linguistic or the non-verbal context. However, one can be comprehensible without all low-level units necessarily being accurately decoded; therefore, while intelligibility affects comprehensibility, the latter is not fully dependent on it.
Comprehensibility refers to the more functional dimension of communication and is perceptually best assessed using meaning-related ratings (i.e. taking into account top-down cognitive processes which might compensate for degraded acoustic-phonetic information).
Nowadays, no objective instrumental measure is yet suitable to assess comprehensibility per se (i.e. the transmission of the overall meaning of the message). However, some suprasegmental parameters contribute to comprehensibility and can be objectively assessed (e.g. timing and intonation measures).
[…] and Derwing 1995, Field 2005) or endorsing one from the literature (e.g. Berns 2008). Smith and Nelson (1985), for example, in addition to providing short definitions of intelligibility, comprehensibility and interpretability, also presented examples to illustrate the distinctions. Nelson (2008) provides a historical overview of intelligibility and comprehensibility in the study of world Englishes. Thomson (2018) also provides a literature review on the use of intelligibility and comprehensibility and stresses the need for greater consistency in their definition. However, to our knowledge, no study has used a methodological procedure to reach a consensus on these terms. Our Delphi study resulted in a comprehensive definition of intelligibility and comprehensibility, integrating all the consensual elements identified throughout the process. Figure 4 summarizes this definition and illustrates the relationship between the two concepts.

Figure 4 Intelligibility and comprehensibility in speech production
One way to differentiate intelligibility and comprehensibility in the proposed definition is by the term used to refer to the respective (re-)construction process: "utterance" is used to relate to the acoustic-phonetic speech material (and, thereby, to intelligibility), while "message" is used as a broader term referring to the (re-)construction at the semantic level (i.e. comprehensibility, thus also including elements of intelligibility). Indeed, as two participants suggested in their comments, while the term "message" can be defined as referring to the "underlying theme or idea"6 (thus, with a sense of "meaning"), the term "utterance" detaches itself from the communicated semantic content and rather relates to the transmitted acoustic signal. Also, while intelligibility and comprehensibility are mostly meant to describe an individual's speech production, the listener side, from which both concepts are usually assessed and defined, also plays an important role. Indeed, it proved to be unequivocal in our Delphi study that intelligibility and comprehensibility both contribute to communication, and that functional human communication ("a process by which information is exchanged between individuals"7) by definition requires both a speaker and a listener (Schramm 1954).
Intelligibility thus integrates not only the accuracy of the phonetic-acoustic production by the speaker, but also the acoustic-phonetic decoding by the listener. Comprehensibility additionally involves numerous higher-level factors, which are also both speaker-related (e.g. non-verbal cues and intonation to compensate for low intelligibility) and listener-related (e.g. listener's knowledge of the speaker, of their intentions and emotions).
Figure 5 further illustrates intelligibility and comprehensibility as part of the communication loop. At each of the latter's levels, variations can occur depending on natural or pathological characteristics of the speaker and of the listener (e.g. gender, age, regional accent, motor speech disorder…). The stages of this loop that form the concept of intelligibility are highlighted in gray. The remaining levels, which pertain to the semantic and pragmatic (re-)construction of the intended message, further account for the concept of comprehensibility. The latter, in addition to the linguistic content, also includes paralinguistic (e.g. sigh, grunt…) and extralinguistic cues (e.g. body language and facial expressions), as well as contextual elements (e.g. prior knowledge of the conversation topic, of the communication partner and field of common experience), which can facilitate the reconstruction of the transmitted message.

Speech measures
As discussed above, intelligibility and comprehensibility are most often defined from the listener's perspective, with regard to the reconstruction of the message rather than to its initial construction by the speaker. More specifically, definitions of both concepts usually refer to how they are assessed. Therefore, elements relating to perceptual and acoustic speech measures have also been included in the resulting definition of this Delphi study.
Perceptually, intelligibility can be assessed using low-predictability stimuli (phonemes, syllables, pseudo-words, minimal word pairs or unpredictable sentences)8. As mentioned by the participants, the use of real words and standard sentences, while more functional, can be subject to memorization by the listener if the assessment is carried out by the speech pathologist.
Ideally, the speech should thus be assessed by an unfamiliar listener, to avoid any prior knowledge of the expected speech stimuli. Additionally, words and sentences are subject to compensation processes based on the listener's linguistic knowledge and hence border on comprehensibility assessment, even if carried out by a lay person. Nonetheless, as illustrated in figure 5, in oral communication the acoustically encoded and decoded phoneme sequences are integrated in an utterance with its prosodic (suprasegmental) features. Both segmental and suprasegmental features thus contribute to intelligibility. For these reasons, minimal word pairs and unpredictable sentences were included in our definition, in addition to phonemes, syllables and pseudo-words. Minimal word pairs are more functional than isolated phonemes, syllables or pseudo-words, as they reflect the discrimination of meaningful units while minimizing the influence of linguistic information. Unpredictable sentences additionally allow coarticulatory and phrase-level phenomena to be taken into account. Preferably, the assessment should still be carried out by a lay person, unless the stimuli are randomly drawn from a large database with distractors to prevent prediction.
In order to assess intelligibility with less subjectivity, one might want to cast off the listener dimension. One way to do so is to turn to computer-aided measures. With regard to the acoustic assessment of intelligibility, while phoneme-level acoustic measures are a major instrumental indicator, the phonemic and suprasegmental contexts in which the target phonemes occur have to be taken into account (just as in perceptual measures, as previously discussed). Ideally, intelligibility measures should thus be carried out on running speech to allow for a more functional assessment, reflecting natural speech conditions. The automatic assessment of intelligibility, for example, is usually carried out on continuous speech but excludes any contextual clues from the algorithms to focus on the assessment of the phonetic-acoustic production (Fredouille et al. 2019). The automatic assessment of comprehensibility, however, still remains problematic. The typical modus operandi of automatic speech recognition (ASR) systems, for example, can only partly be linked to the concept of comprehensibility. Some contextual cues are indeed used by the algorithms to reconstruct spoken messages. Comprehensibility is then assessed by examining the accuracy of the outputs of these ASR systems. Additionally, the lexical, syntactical and semantic aspects of the speaker's production can then be further analyzed. However, human communication is far more complex and involves numerous para- and extralinguistic dimensions that today's ASR systems do not yet take into account. Computer-aided measures therefore still only partially account for speech comprehensibility in human communication. Nonetheless, it is to be noted that the inclusion of extra- and paralinguistic and contextual information is being increasingly investigated, particularly in the field of human-computer interaction, and that promising results have been observed (Kennington et al. 2015, Porzel 2011, Schuller et al. 2013, 2019).
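As an illustration of what "examining the accuracy of the outputs" of an ASR system can mean in practice, the sketch below computes a word error rate (WER), a commonly used ASR accuracy metric. The text above does not name a specific metric, so the choice of WER is an assumption, as are the function name `word_error_rate` and the example sentences. In line with the caveats above, such a score reflects only the lexical match between the intended and recognized message, not the para- and extralinguistic dimensions of comprehensibility.

```python
# Minimal sketch (assumption: WER is used as the ASR accuracy measure).
# WER = (substitutions + deletions + insertions) / number of reference words,
# obtained here from a word-level Levenshtein (edit) distance.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate of an ASR hypothesis against the intended reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical intended sentence and ASR output, for illustration only
    intended = "the cat sat on the mat"
    recognized = "the cat sat on mat"
    print(f"WER = {word_error_rate(intended, recognized):.3f}")
```

A lower value indicates that more of the intended words were recovered. Because the score can exceed 1 when the hypothesis contains many inserted words, it is best read as an error rate rather than as a percentage of correct words.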
To conclude, both acoustic and perceptual measures at different levels of granularity (i.e. segmental and suprasegmental) and on various speech materials (isolated phonemes and syllables, words, pseudowords and sentences) should be taken into account. Only their combination allows for a comprehensive assessment of both intelligibility and comprehensibility and thereby provides information on the patient's speech both at the segmental and functional levels.

Limits and Perspectives
The term "objective" was initially used in the online survey. However, the notion of "objectivity" is subject to discussion, some experts arguing that even acoustic measures remain somewhat subjective, as they are carried out by humans, with subjective biases remaining in the recording procedure, analysis settings, choice of the stimuli, of the analysis window… Therefore, this term was used in quotation marks throughout this manuscript. The initially intended meaning was "reproducible, instrumental measures" as compared to perceptual, more subjective methods. In further studies targeting more consensual definitions of speech-related terms, more consideration needs to be given to the choice of vocabulary used throughout the process.
Several ambiguities persist regarding the terminology related to speech assessment, for example regarding the terms "objective", "subjective", "instrumental", "perceptual" and "measure", for which various interpretations can still be found in the literature. As one expert from the panel underlined, "… it seems cross-lingual semantics may be part of the tricky issue rather than the scientific principles which we probably agree upon but the semantics/terms are the most challenging part of this problem." Therefore, further studies are needed to clarify terminology-related issues, generate more unified definitions and facilitate progress in speech and voice research through better communication among experts. The Delphi methodology seems to be an appropriate medium to that end, in light of its features developed in the introduction. As participants pointed out, the Delphi process is "thought-provoking and intellectually challenging", and sometimes calls into question one's long-standing, daily-used terminology. Through the use of multiple iterations, the Delphi process thus stimulates a more insightful problem-solving mindset (Hsu and Sandford 2007). Pending a more consensual speech-related terminology, it is highly recommended that authors clearly define targeted concepts when introducing their research, so as to avoid any ambiguities and allow people from various backgrounds to unequivocally understand their intended meanings.

Conclusion
This Delphi consensus survey has enabled the drafting of a comprehensive definition of intelligibility and comprehensibility, including all the consensual elements gathered throughout the process and thus taking into account the points of view from different fields of expertise.
The result of this process allows clinicians and researchers to get a better understanding of these two commonly used speech-related terms. It enabled us to specify their assessment by describing the tasks that best fit their comprehensive definition. While intelligibility and comprehensibility are linked and both contribute to functional human communication, they relate to the reconstruction of the transmitted speech material at two different levels.
Intelligibility refers to the acoustic-phonetic decoding of the utterance, while comprehensibility relates to the reconstruction of the meaning of the message. Consequently, the perceptual assessment of intelligibility requires the use of unpredictable speech material (pseudo-words, minimal word pairs, unpredictable sentences), whereas comprehensibility assessment is meaning- and context-related and entails more functional speech stimuli and tasks.
The terminological disambiguation helps to improve communication between experts in the field of speech disorders and thereby benefits the progress of research as well as research translation. From a clinical perspective, less ambiguous communication between professionals (e.g. in a multidisciplinary team) can improve the efficiency of patient care. Furthermore, this study allowed us to specify, for clinicians and researchers, the assessment tasks that best fit the definition of both intelligibility and comprehensibility, thereby providing valuable information to improve speech disorder assessment and its standardization.

In round 2, respondents reached a consensus on 20 out of 22 statements (i.e., > 75% agreement between raters). A high agreement (> 90%) was even reached for 14 of these statements. Results for this second round are reported as frequencies of agreement for each of the binary assertions (Agree/Disagree), grouped by target concepts and in decreasing order of agreement. Statements that did not reach the consensus threshold or were inconsistent with other results are shown in bold. These statements were either rephrased in the subsequent round or discarded after round 3.

Figure 1 Flowchart of the Delphi process used in the present study

- Message reconstruction: definitions of intelligibility or comprehensibility as the accuracy of the reconstruction of the message by the listener, either at the level of the acoustic signal or at the semantic level
- Phonetic-acoustic production: with regard to intelligibility, mentions of the contribution of the low-level production abilities of the speaker to the message reconstruction
- Acoustic-phonetic decoding: with regard to intelligibility, mentions of the contribution of the low-level decoding abilities on the listener's side to the message reconstruction
- Functional communication: emphasis of the contribution of comprehensibility to functional communication
- Contextual elements: mentions of linguistic, extra-linguistic and para-linguistic elements contributing to comprehensibility

As a reminder, in round 1, participants first had to provide open definitions of intelligibility and comprehensibility. They were then asked to rate their agreement with six provided statements (cf. "Literature search" in the Materials and methods section) and to rank them in decreasing order of preference. The results are presented for each of the identified main concepts:

1. Synonyms

When providing spontaneous definitions, no participant mentioned that intelligibility and comprehensibility are synonyms.
In the context of comprehensibility, participants emphasized the functional aspect of communication in their open definitions. Ninety-three percent (37/40) of them agreed with Barefoot et al.'s definition, which identifies comprehensibility as an indicator of functional communication, as opposed to intelligibility, which is not a direct indicator of functional communication according to the respondents' comments. Thirteen percent (5/40) of the participants who agreed, however, found it too restrictive to talk about face-to-face communication only. Seventy-eight percent (31/40) ranked this definition in the top three and no participant ranked it last.

Figure 2 Perceptual measures which best describe speech intelligibility (dark gray) and comprehensibility (light gray); for easier visualization, results were ordered by decreasing order for intelligibility measures; p.-w.: pseudowords, unp. sent.: unpredictable sentences

Figure 3 "Objective" measures which best describe speech intelligibility (dark gray) and comprehensibility (light gray); for easier visualization, results were ordered by decreasing order for intelligibility measures

Figure 5 Spoken communication loop. In gray: stages referring to intelligibility