The Sincere Apology Corpus (SinA-C)
Creators
- 1. ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Germany
- 2. Department of Music, University of Liverpool, United Kingdom
Description
This repository contains the Sincere Apology Corpus (SinA-C). SinA-C is an English speech corpus of acted apologies in various prosodic styles created with the purpose of investigating the attributes of the human voice which convey sincerity.
Thirty-two speakers were recorded in a sound-proof recording booth in a studio at the Columbia University Computer Music Center. Audio was recorded with an AKG C414 condenser microphone, and the digital audio workstation Logic Pro 9 was used to collect the audio signals. Recordings were captured at 44.1 kHz and 16 bit in AIFF format and later converted to mono WAV files.
Speakers
- Gender: 15 male and 17 female
- Age: 20-60 years old (mean: 29.8 years; std: 9.9 years)
- Background: 27 American-born native English speakers and 5 speakers of other nationalities (all fluent in spoken English). 24 speakers were professional actors and 12 were artists.
Recordings
Speakers were given a description of the study, a set of 6 sentences (apologies; see Table 1), and short definitions of a set of 4 prosodic styles (see Table 2) to adopt when uttering each sentence (the recordings are not spontaneous, but rather acted). The sentences used were the following:
- Sorry.
- I am sorry for everything I have done to you.
- I cannot tell you how sorry I am for everything I did.
- Please allow me to apologise for everything I did to you. I was inappropriate and lacked respect.
- It was never my intention to offend you, for this I am very sorry.
- I am sorry but I am going to have to decline your generous offer. Thank you for considering me.
The prosodic styles intended to be adopted when uttering each of the sentences were:
- monotonic;
- pitch prominence (labelled as 'Stress');
- fast speaking rate;
- slow speaking rate.
Annotations
The SinA-C audio recordings were labelled in terms of the sincerity perceived by listeners ('How sincere was the apology you just heard?') on a 5-point Likert scale ranging from 0 (Not Sincere) to 4 (Very Sincere) by 22 volunteers (13 male and 9 female; age range: 18-22; mean: 19.5 years; std: 1.0 years). All 22 annotators reported normal hearing, and all were English speakers (6 reported being bilingual in at least one other language).
Raw annotations were standardised to zero mean and unit standard deviation on a per-subject basis in order to eliminate potential individual rating biases. We then computed the mean across all subjects for each utterance. This resulted in a set of ratings in the range [-1.51, 1.72] (mean: -0.002; std: 0.60), which are used as the gold standard for regression experiments. We also converted these ratings to binary labels: average ratings larger than 0 were labelled 'Sincere' (S), and those smaller than or equal to 0 were labelled 'Not Sincere' (NS). This resulted in 478 instances labelled S and 438 labelled NS. These labels are the gold standard for classification tasks.
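The label-construction steps above (per-rater standardisation, averaging across raters, binarisation at 0) can be sketched as follows. The rater and utterance IDs below are hypothetical toy data, not corpus values:

```python
from statistics import mean, stdev

# Hypothetical raw ratings on the 0-4 Likert scale: raw[rater][utterance_id].
raw = {
    "rater_1": {"utt_1": 4, "utt_2": 1, "utt_3": 3},
    "rater_2": {"utt_1": 3, "utt_2": 0, "utt_3": 2},
    "rater_3": {"utt_1": 2, "utt_2": 2, "utt_3": 4},
}

# 1) Standardise each rater's scores to zero mean / unit standard deviation.
z_scores = {}
for rater, scores in raw.items():
    mu = mean(scores.values())
    sigma = stdev(scores.values())
    z_scores[rater] = {utt: (s - mu) / sigma for utt, s in scores.items()}

# 2) Average the standardised scores across raters per utterance
#    (gold standard for regression).
utterances = sorted(next(iter(raw.values())))
gold_regression = {
    utt: mean(z_scores[r][utt] for r in z_scores) for utt in utterances
}

# 3) Binarise: > 0 -> 'Sincere' (S), <= 0 -> 'Not Sincere' (NS)
#    (gold standard for classification).
gold_classification = {
    utt: ("S" if v > 0 else "NS") for utt, v in gold_regression.items()
}
```

Note that the per-rater z-scoring removes individual offset and scale biases before the scores are pooled, which is why the averaged ratings are centred near 0 rather than near the Likert midpoint.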
SinA-C Baseline
The baseline for the dataset is described in detail in the INTERSPEECH 2019 publication "Sincerity in Acted Speech: Presenting the Sincere Apology Corpus and Results" [1]. The article presents both classification and regression baseline results. The modelling experiments included a 3-fold Speaker-Independent Nested Cross-Validation (SICV) schema as well as Speaker-Independent Folds (C-SIF) (train, validation, test). The C-SIF partitioning is provided by the original database baseline from the INTERSPEECH 2016 Computational Paralinguistics Challenge (ComParE) [2]. For reproducibility, speaker distributions across the two partitioning strategies are provided with the corpus package.
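The speaker-independent constraint (no speaker's utterances appear in more than one fold) can be sketched as below. The IDs are hypothetical, and the actual partitions shipped with the corpus should be used for comparable results:

```python
import random

def speaker_independent_folds(utterances, n_folds=3, seed=0):
    """Assign utterances to folds so that no speaker appears in more
    than one fold (speaker-independent partitioning).

    `utterances` is a list of (utterance_id, speaker_id) pairs.
    """
    speakers = sorted({spk for _, spk in utterances})
    rng = random.Random(seed)
    rng.shuffle(speakers)
    # Round-robin the speakers over folds to keep fold sizes balanced.
    fold_of_speaker = {spk: i % n_folds for i, spk in enumerate(speakers)}
    folds = [[] for _ in range(n_folds)]
    for utt, spk in utterances:
        folds[fold_of_speaker[spk]].append(utt)
    return folds

# Toy example: 4 hypothetical speakers with 6 utterances each.
data = [(f"spk{s}_utt{u}", f"spk{s}") for s in range(4) for u in range(6)]
folds = speaker_independent_folds(data, n_folds=3)
```

Partitioning by speaker rather than by utterance prevents a model from exploiting speaker identity, so test scores reflect generalisation to unseen voices.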
The audio descriptors include conventional and state-of-the-art features extracted from the audio files. We used Support Vector Machines (SVM) for the classification experiments and linear Support Vector Regression (SVR) for the regression ones. In both cases we used linear kernels, and both SVM and SVR were implemented with the open-source machine learning toolkit Scikit-Learn. During the development phase, we trained various models (using the training set) with different complexity parameters (C ∈ {10⁻⁷, 10⁻⁶, 10⁻⁵, 10⁻⁴, 10⁻³, 10⁻², 10⁻¹, 1}) and evaluated their performance on the validation set. After determining the optimal value of C, we concatenated the training and validation sets, re-trained the model with this enlarged training set, and evaluated the performance on the test set. Further details on the baseline development are given in [1].
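The model-selection protocol above (sweep C on the validation set, then re-train on train + validation) can be sketched as follows. The feature matrices here are synthetic stand-ins for the acoustic descriptors, and plain accuracy is used only for illustration; see [1] for the actual features and evaluation metrics:

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-ins for the acoustic descriptors and binary S/NS labels.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(60, 10)), rng.integers(0, 2, 60)
X_val, y_val = rng.normal(size=(20, 10)), rng.integers(0, 2, 20)
X_test, y_test = rng.normal(size=(20, 10)), rng.integers(0, 2, 20)

# Sweep the complexity parameter C on the validation set.
grid = [10.0 ** k for k in range(-7, 0)] + [1.0]
best_C, best_score = None, -1.0
for C in grid:
    model = SVC(kernel="linear", C=C).fit(X_train, y_train)
    score = model.score(X_val, y_val)
    if score > best_score:
        best_C, best_score = C, score

# Re-train on train + validation with the selected C; evaluate on test.
final = SVC(kernel="linear", C=best_C).fit(
    np.concatenate([X_train, X_val]), np.concatenate([y_train, y_val])
)
test_acc = final.score(X_test, y_test)
```

For the regression task, the same loop applies with `sklearn.svm.LinearSVR` and a regression score in place of the classifier and accuracy.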
Comments
SinA-C was initially gathered between 2015 and 2016 at the Columbia University Computer Music Center (CCMC) in New York City, United States of America. The dataset was also included in the INTERSPEECH 2016 ComParE challenge [2], and, prior to that, in 2015 a subset of the dataset was exhibited as part of a graduate-school art exhibition.
Citing this corpus
When using the data set in your own research and publications, please cite this repository and [1].
Bibliography
[1] Baird, A., Coutinho, E., Hirschberg, J., & Schuller, B. W. (2019). Sincerity in Acted Speech: Presenting the Sincere Apology Corpus and Results. In Interspeech 2019, in press.
[2] Schuller, B. W., Steidl, S., Batliner, A., Hirschberg, J., Burgoon, J. K., Baird, A., Elkins, A. C., Zhang, Y., Coutinho, E., & Evanini, K. (2016). The INTERSPEECH 2016 Computational Paralinguistics Challenge: Deception, Sincerity & Native Language. In Interspeech 2016, 2001-2005.
Files
sincerity-audio.zip