# Dataset: Fast and Secure Contact Exchange in Groups (CSCW 2024)

This repository contains the dataset for our paper at CSCW 2024:

> Florentin Putz, Steffen Haesler, and Matthias Hollick. **Sounds Good? Fast and Secure Contact Exchange in Groups**. *Proc. ACM Hum.-Comput. Interact. 8, CSCW2*, 2024. https://doi.org/10.1145/3686964

Our pseudonymous dataset is in the file `data.csv`. It has the following structure:

-   `id`: Participant identifier
-   `group_id`: The study group that the participant was part of
-   `pairsonic_first`: Order of experiments. 1 = PairSonic first, SafeSlinger second
-   `a_sus0`...`a_sus9`: System Usability Scale (Brooke1996) for first system (1 = strongly disagree, 5 = strongly agree)
-   `a_sec`: Perceived security of first system (single Likert item: 1 = strongly disagree, 5 = strongly agree)
-   `b_sus0`...`b_sus9`: System Usability Scale (Brooke1996) for second system (1 = strongly disagree, 5 = strongly agree)
-   `b_sec`: Perceived security of second system (single Likert item: 1 = strongly disagree, 5 = strongly agree)
-   `pref`: Preference. 1 = first system, 2 = second system
-   Collaborative/social tool usage for distinct types of communication groups (0 = public chat groups (e.g., Discord, Telegram, IRC), 1 = private chat groups (e.g., WhatsApp, Signal), 2 = professional chat groups (e.g., Microsoft Teams), 3 = online forums, 4 = groups in social networks (e.g., Facebook groups), 5 = audio/video conferences (e.g., TeamSpeak, Zoom), 6 = mailing lists, 7 = collaboration tools (e.g., Google Docs, Etherpad, Miro), 8 = project planning tools (e.g., Jira, Trello, GitHub))
    - `group0_part`...`group8_part`: 1 = active participation in groups of this type
    - `group0_size`...`group8_size`: average number of participants in groups of this type
    - `group0_sec`...`group8_sec`: 1 = security (i.e., authenticated participants) is important in groups of this type
    - `groups_total`: estimate of the total number of digital communication groups with active participation
-   `gender`: 1 = male, 2 = female, 3 = diverse, 98 = no answer
-   `age`: Age cluster. 15 = 18-19, 20 = 20-24, 25 = 25-29, 30 = 30-34, 35 = 35-39, 40 = 40-44, ...
-   `edu_school`: Highest school education level (1 = still in school, 2 = lower secondary education, 3 = polytechnic high school, 4 = intermediary secondary education, 5 = university entrance qualification, 6 = no general school leaving certificate, 8 = no answer)
-   `edu_professional`: Highest professional education level (1 = vocational training/training in dual system, 2 = technical college diploma, 3 = technical college diploma in the former GDR, 4 = Bachelor's degree, 5 = Master's degree, 6 = Diploma, 7 = PhD, 8 = no professional/vocational degree, 10 = no answer)
-   `student`: 1 = yes, 2 = no
-   `smartphone_exp`: 1 = has been using smartphones for more than two years
-   `ati0`...`ati8`: Affinity for Technology Interaction (Franke2019) (1 = completely disagree, 6 = completely agree)
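
The `a_`/`b_` prefixes above refer to presentation order, not to a specific system, so per-system comparisons need the columns re-keyed using `pairsonic_first`. The following Python sketch illustrates one way to do this. The helper name `by_system` and the output key names are hypothetical, the sample rows are illustrative and not taken from `data.csv`, and the sketch assumes that any `pairsonic_first` value other than 1 means SafeSlinger came first (only the value 1 is documented above).

```python
import csv
import io


def by_system(row):
    """Re-key one participant's first/second-system columns by system name.

    Assumption: any `pairsonic_first` value other than "1" means that
    SafeSlinger was tested first; only the value 1 is documented.
    """
    if row["pairsonic_first"].strip() == "1":
        first, second = "pairsonic", "safeslinger"
    else:
        first, second = "safeslinger", "pairsonic"

    out = {"id": row["id"]}
    for key, value in row.items():
        if key.startswith("a_"):
            out[f"{first}_{key[2:]}"] = value
        elif key.startswith("b_"):
            out[f"{second}_{key[2:]}"] = value

    # `pref` is 1 (first system) or 2 (second system); other values give None.
    out["preferred"] = {"1": first, "2": second}.get(row["pref"].strip())
    return out


# Illustrative rows only (not real data), with a reduced set of columns.
sample = io.StringIO(
    "id,pairsonic_first,a_sec,b_sec,pref\n"
    "1,1,5,3,1\n"
    "2,0,4,2,2\n"
)
rows = [by_system(r) for r in csv.DictReader(sample)]
for r in rows:
    print(r)
```

Values stay as strings, exactly as the `csv` module reads them. To use the sketch on the real file, replace `sample` with an open handle on `data.csv`.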

The value 98 means that the participant explicitly selected the "NA" option for this field.

The value 99 means that the participant selected nothing on the paper questionnaire. Two participants did not fill out the whole questionnaire (participants 6 and 8).
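
The missing-value codes have to be recoded before any scale score is computed. The Python sketch below shows one possible approach and is not the procedure used in our R analysis. It rests on several assumptions: the function names are hypothetical, `sus0`...`sus9` and `ati0`...`ati8` are assumed to follow the item order of the published scales (which determines the reverse-scored items), values are assumed to be stored as integers, and a score is left undefined as soon as one item is missing. Only items with a known valid range are recoded, because 98 or 99 may be genuine values in count-like columns such as `groups_total`.

```python
MISSING_CODES = {98, 99}  # 98 = explicit "NA" option, 99 = nothing selected


def clean(value, lo, hi):
    """Return the response as an int, or None if missing or outside lo..hi."""
    try:
        number = int(value)
    except (TypeError, ValueError):
        return None
    if number in MISSING_CODES or not lo <= number <= hi:
        return None
    return number


def sus_score(row, prefix):
    """Standard SUS score (0-100) for prefix "a" or "b"; None if incomplete.

    Assumption: even indices (sus0, sus2, ...) are the positively worded
    items and odd indices the negatively worded ones, as in the original scale.
    """
    items = [clean(row.get(f"{prefix}_sus{i}"), 1, 5) for i in range(10)]
    if None in items:
        return None
    total = sum((x - 1) if i % 2 == 0 else (5 - x) for i, x in enumerate(items))
    return total * 2.5


# Assumption: ati2, ati5, ati7 correspond to the reverse-coded items 3, 6, 8.
ATI_REVERSED = {2, 5, 7}


def ati_score(row):
    """Mean ATI score (1-6) over ati0..ati8; None if any item is missing."""
    items = [clean(row.get(f"ati{i}"), 1, 6) for i in range(9)]
    if None in items:
        return None
    return sum(7 - x if i in ATI_REVERSED else x for i, x in enumerate(items)) / 9


# Illustrative participant (not real data).
row = {f"a_sus{i}": "5" if i % 2 == 0 else "1" for i in range(10)}
row.update({f"b_sus{i}": "3" for i in range(10)})
row["b_sus4"] = "99"  # one unanswered item for the second system
row.update({f"ati{i}": "4" for i in range(9)})

print(sus_score(row, "a"))  # 100.0
print(sus_score(row, "b"))  # None
print(ati_score(row))
```

Returning `None` for incomplete scales keeps participants with partial questionnaires from silently skewing the means. Other strategies, such as imputing missing items, are equally possible.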


# Statistical Analysis (R Code)

The R Markdown file `contact-exchange-in-groups.Rmd` contains the full reproducible code for our study.

Run the code chunks from top to bottom to generate all statistical figures from our paper. The file also contains code to reproduce the tables and quantitative results from the paper; the headings reference the corresponding parts of the paper.

We used R version 4.2.2, running on macOS. We provide detailed environment information (including exact package versions) in our script.

## Diverging Bar Plot in Figure 6a

We use a modified version of the `likert` R package by Jason Bryer (<https://github.com/jbryer/likert>). The patch file `likert/likert-bar-plot-pairsonic.patch` summarizes our changes, which are based on commit d339f65b49fcec9dcfc9ea968aed9e5809c1e4da. To generate Figure 6, first clone the `likert` repository into the `likert` folder, check out this commit, and then apply our patch.


# References

- Barrett M (2021). _ggokabeito: 'Okabe-Ito' Scales for 'ggplot2' and 'ggraph'_. R package version 0.1.0, <https://CRAN.R-project.org/package=ggokabeito>.
- Bryer J (2022). _likert: Analysis and Visualization Likert Items_. <http://jason.bryer.org/likert>, <http://github.com/jbryer/likert>.
- Chaltiel D (2023). _crosstable: Crosstables for Descriptive Analyses_. R package version 0.7.0, <https://CRAN.R-project.org/package=crosstable>.
- Comtois D (2022). _summarytools: Tools to Quickly and Neatly Summarize Data_. R package version 1.0.1, <https://CRAN.R-project.org/package=summarytools>.
- Grolemund G, Wickham H (2011). “Dates and Times Made Easy with lubridate.” _Journal of Statistical Software_, *40*(3), 1-25. <https://www.jstatsoft.org/v40/i03/>.
- Grosjean P, Ibanez F (2018). _pastecs: Package for Analysis of Space-Time Ecological Series_. R package version 1.3.21, <https://CRAN.R-project.org/package=pastecs>.
- Harrell Jr F (2022). _Hmisc: Harrell Miscellaneous_. R package version 4.7-2, <https://CRAN.R-project.org/package=Hmisc>.
- Müller K, Wickham H (2023). _tibble: Simple Data Frames_. R package version 3.2.1, <https://CRAN.R-project.org/package=tibble>.
- Neuwirth E (2022). _RColorBrewer: ColorBrewer Palettes_. R package version 1.1-3, <https://CRAN.R-project.org/package=RColorBrewer>.
- R Core Team (2022). _R: A Language and Environment for Statistical Computing_. R Foundation for Statistical Computing, Vienna, Austria. <https://www.R-project.org/>.
- Revelle W (2022). _psych: Procedures for Psychological, Psychometric, and Personality Research_. Northwestern University, Evanston, Illinois. R package version 2.2.9, <https://CRAN.R-project.org/package=psych>.
- Rudis B (2020). _hrbrthemes: Additional Themes, Theme Components and Utilities for 'ggplot2'_. R package version 0.8.0, <https://CRAN.R-project.org/package=hrbrthemes>.
- Sarkar D (2008). _Lattice: Multivariate Data Visualization with R_. Springer, New York. ISBN 978-0-387-75968-5, <http://lmdvr.r-forge.r-project.org>.
- Therneau T (2022). _A Package for Survival Analysis in R_. R package version 3.4-0, <https://CRAN.R-project.org/package=survival>. Terry M. Therneau, Patricia M. Grambsch (2000). _Modeling Survival Data: Extending the Cox Model_. Springer, New York. ISBN 0-387-98784-3.
- Wickham H (2007). “Reshaping Data with the reshape Package.” _Journal of Statistical Software_, *21*(12), 1-20. <http://www.jstatsoft.org/v21/i12/>.
- Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” _Journal of Open Source Software_, *4*(43), 1686. doi:10.21105/joss.01686 <https://doi.org/10.21105/joss.01686>.
- Wickham H, Pedersen T, Seidel D (2023). _scales: Scale Functions for Visualization_. R package version 1.3.0, <https://CRAN.R-project.org/package=scales>.
- Wilke C (2020). _cowplot: Streamlined Plot Theme and Plot Annotations for 'ggplot2'_. R package version 1.1.1, <https://CRAN.R-project.org/package=cowplot>.
- Zeileis A, Croissant Y (2010). “Extended Model Formulas in R: Multiple Parts and Multiple Responses.” _Journal of Statistical Software_, *34*(1), 1-13. doi:10.18637/jss.v034.i01 <https://doi.org/10.18637/jss.v034.i01>.
