# COVYT

This folder contains the first version of the COVYT dataset.
The dataset has also been converted to [audformat](https://audeering.github.io/audformat/)
and the corresponding files have been stored inside `./dataset`.
Additionally, a set of features compatible with `audformat`
has been extracted using [openSMILE](https://audeering.github.io/opensmile/)
and wav2vec 2.0
and stored under `./features`.
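
The feature files are plain CSV tables.
The following is a minimal sketch of how such a file maps onto a segmented `(file, start, end)` index.
It assumes each row starts with `file`, `start` and `end` columns (as the usage example reads them), followed by one column per feature, and that segment boundaries are timedelta strings that `pd.to_timedelta` can parse.
The file paths and values are toy data invented for illustration, not taken from COVYT.

```python
import io

import pandas as pd

# Toy excerpt of a feature CSV (hypothetical paths and values):
# one row per segment, keyed by file and segment boundaries,
# followed by one column per feature.
csv_text = """file,start,end,F0semitoneFrom27.5Hz_sma3nz_amean,loudness_sma3_amean
audio/a.wav,0 days 00:00:00,0 days 00:00:05,30.1,0.42
audio/a.wav,0 days 00:00:05,0 days 00:00:10,29.8,0.39
audio/b.wav,0 days 00:00:00,0 days 00:00:07,25.3,0.51
"""

features = pd.read_csv(io.StringIO(csv_text))
# Parse segment boundaries as timedeltas so the index matches
# the (file, start, end) segmented index used by audformat tables.
features['start'] = pd.to_timedelta(features['start'])
features['end'] = pd.to_timedelta(features['end'])
features = features.set_index(['file', 'start', 'end'])

# Selecting by the first index level returns all segments of one file.
print(features.loc['audio/a.wav'].shape)  # (2, 2)
```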

## Usage

The following example loads the labels and the eGeMAPS features
and selects the features of specific groups of speakers:

```python
import audformat
import pandas as pd

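# Load the database and get a copy of the COVID labels
# (`get()` returns a copy, so adding columns does not modify the table itself)
db = audformat.Database.load('./dataset')
df = db['covid'].get()
# Add the speaker of each file and the language of each speaker
df['speaker'] = db['files'].get(index=df.index)['speaker']
df['language'] = df['speaker'].apply(lambda x: db.schemes['speaker'].labels[x]['language'])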

# Load the eGeMAPS features and index them by segment, like the labels
features = pd.read_csv('./features/eGeMAPSv02.csv')
features['start'] = pd.to_timedelta(features['start'])
features['end'] = pd.to_timedelta(features['end'])
features = features.set_index(['file', 'start', 'end'])

# get all features of German COVID-positive speakers
features.loc[df.loc[(df['language'] == 'german') & df['covid']].index]
# get all features of Boris Johnson when he was COVID-positive
features.loc[df.loc[(df['speaker'] == 'johnson') & df['covid']].index]
# get all features of Boris Johnson when he was COVID-negative
features.loc[df.loc[(df['speaker'] == 'johnson') & ~df['covid']].index]
```
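
The selections above can be turned into a simple per-speaker comparison.
The sketch below averages each feature over one speaker's COVID-positive and COVID-negative segments.
It assumes a label table with a string `speaker` column and a boolean `covid` column that shares the `(file, start, end)` index with the features; the helper name `mean_by_covid` and the toy data are hypothetical.
Restricting to the index intersection avoids a `KeyError` when a labelled segment has no extracted features.

```python
import pandas as pd


def mean_by_covid(features: pd.DataFrame, labels: pd.DataFrame, speaker: str) -> pd.DataFrame:
    """Hypothetical helper: mean of each feature per COVID status for one speaker."""
    subset = labels[labels['speaker'] == speaker]
    # Keep only segments that have both labels and features.
    index = subset.index.intersection(features.index)
    # Group the feature rows by the (index-aligned) boolean COVID label.
    return features.loc[index].groupby(subset.loc[index, 'covid']).mean()


# Toy data (invented for illustration), indexed like an audformat segmented table.
index = pd.MultiIndex.from_tuples(
    [
        ('a.wav', pd.Timedelta(0), pd.Timedelta(seconds=5)),
        ('b.wav', pd.Timedelta(0), pd.Timedelta(seconds=5)),
        ('c.wav', pd.Timedelta(0), pd.Timedelta(seconds=5)),
    ],
    names=['file', 'start', 'end'],
)
labels = pd.DataFrame(
    {'speaker': ['johnson', 'johnson', 'speaker_b'], 'covid': [True, False, True]},
    index=index,
)
features = pd.DataFrame({'loudness': [0.4, 0.6, 0.5]}, index=index)

result = mean_by_covid(features, labels, 'johnson')
print(result)
```

With the real data, `df` and the eGeMAPS `features` from the example above would take the place of the toy `labels` and `features`.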