Working paper Open Access
The speech and social network dynamics experiment investigates interpersonal social influences on speech in an ad-hoc network, focusing on a timescale of days to weeks. More generally, we want to understand what can be predicted about change in speech behavior on this timescale. The generic hypotheses of the study are that some temporal variation in speech is caused by interactions between people, and that interpersonal social relations modulate this effect. A detailed presentation of hypotheses, background, and methodology appears in Tilsen (2015) and is not repeated in this paper, which focuses on preliminary analyses of experimental data. These analyses demonstrate that, macroscopically, variation in social relations correlates with variation in speech behavior, specifically with regard to vowel quality, sibilant fricative quality, and syntactic patterns. These findings prompt an analysis of social modulation of behavior on the scale of individual interactions, which provides less conclusive results.
To contextualize the rationale for the experimental design, let’s consider three major problems in analyzing social influences on speech. The first problem is the vast range of contexts in which speech typically occurs and the enormous variability in conversational goals. These characteristics of speech are obvious from a bit of self-reflection: consider, for each utterance you make in a given day, where you were, who was there, and what the direct goal of that specific utterance was. There is very little consistency in the answers to these questions, which is problematic for several reasons: the occurrence rates of most lexical items are relatively low, the lexical/syntactic contexts in which those items occur are quite diverse, and word productions are influenced by hard-to-quantify paralinguistic factors. Hence statistical power is low: if we choose to study the production of some particular word, we will typically have to wait a long time to observe multiple tokens, and even longer if we hope to observe them in relatively similar contexts. Thus it is desirable to minimize both linguistic-contextual variation and goal/task-contextual variation.
A second major obstacle is the complexity of the social networks in which speakers participate. None of our networks are isolated. Most of us participate in numerous social networks, which can be defined on a range of spatial and temporal scales. The structure of these networks and our positions in them vary, many networks overlap or are embedded in or hierarchically related to one another, we occasionally leave old networks and join new ones, and different networks carry different contextual associations and social valences. Even defining the social networks of a given speaker requires many arbitrary analytical decisions. The problem is that if we want to understand the influence of social relations in a given network on linguistic behavior, we have to quantify those relations. To do that, we have to assume that social relations in the networks that we have not sampled have negligible effects. This seems to be a pretty miserable assumption, but we might lessen the effects of unsampled networks by constructing an ad-hoc network, i.e., by selecting speakers who have no prior acquaintance with each other.
A third obstacle is the logistical and methodological difficulty of obtaining information regarding speech behavior and social relations with sufficiently frequent spatial and temporal sampling. It is fairly obvious that a decent spatiotemporal sampling of speech behavior requires recording all of the utterances from all of the speakers in a network over an extended period of time. In contrast, it is far from obvious how to frequently sample social relations: the sampling procedure should provide quantitative measures of social relations between all dyads in the network, and should obtain those measures frequently in time. Yet the sampling procedure should not unduly interfere with the collection of speech data, and should not be so invasive that it drives the social dynamics of the network.
How have these problems (i.e., contextual variation, network complexity, and social sampling) been addressed by previous studies of speech on extended timescales? The network complexity problem has been addressed by using a corpus of data from members of a coherent social network, e.g., the reality television show Big Brother (Bane, Graff, & Sonderegger, 2010) or the U.S. Supreme Court (Yu, Abrego-Collier, Phillips, Pillion, & Chen, 2015). However, data from such contexts are suboptimal because of high variability in interaction contexts, sparse sampling of speech behavior (i.e., “off-camera” and “off-the-record” interactions go unrecorded), and lack of quantitative information regarding social relations.
The contextual variation problem has been addressed by controlling interaction contexts, for example by using a map task in which dyadic interactions are highly goal-oriented (Anderson et al., 1991; Pardo, 2006). Such approaches have demonstrated pairwise convergence in dyadic interactions, but have not investigated those patterns on longer timescales. To my knowledge, the problem of how to sample social relations both frequently and non-invasively has not been addressed.
The current experiment is an attempt to combine some of the desirable features of the above studies. Specifically, network complexity was minimized by the use of a small, ad-hoc network. Task-contextual and linguistic-contextual variation were minimized by allowing speech interactions only during a map game and by enforcing the use of a limited vocabulary when playing the game. Frequent sampling of social relations was achieved by eliciting full-network teammate preference rankings from each player in each round of the game. Below I briefly describe the design of the experiment, report on several types of analyses, and discuss future analysis directions.
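To make the idea of a full-network ranking concrete, the following is a minimal sketch of how one round of teammate preference rankings might be turned into quantitative dyadic measures. It is an illustration only and not necessarily the scoring used in this study: the player labels, the linear rank-to-score mapping, and the function names `rankings_to_scores` and `mutual_preference` are all assumptions introduced here.

```python
from typing import Dict, List

Scores = Dict[str, Dict[str, float]]


def rankings_to_scores(rankings: Dict[str, List[str]]) -> Scores:
    """Map each player's ranked list of the other players (most preferred
    first) to directed scores in [0, 1]: 1.0 for the top-ranked player,
    0.0 for the bottom-ranked one. The linear mapping is an assumption."""
    scores: Scores = {}
    for player, ranked in rankings.items():
        n = len(ranked)
        scores[player] = {
            other: (1.0 if n == 1 else 1.0 - pos / (n - 1))
            for pos, other in enumerate(ranked)
        }
    return scores


def mutual_preference(scores: Scores, a: str, b: str) -> float:
    """Symmetric dyadic measure: the mean of the two directed scores."""
    return (scores[a][b] + scores[b][a]) / 2.0


# Hypothetical rankings for one round in a four-player network.
round_rankings = {
    "P1": ["P2", "P3", "P4"],
    "P2": ["P1", "P4", "P3"],
    "P3": ["P4", "P1", "P2"],
    "P4": ["P3", "P2", "P1"],
}

scores = rankings_to_scores(round_rankings)
print(mutual_preference(scores, "P1", "P2"))
print(mutual_preference(scores, "P1", "P4"))
```

Because every player ranks every other player in every round, a structure of this kind yields one directed value per ordered pair per round, which is the property that makes the social sampling both frequent and complete across dyads.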