Published November 3, 2025 | Version v1
Dataset Restricted

Comparison of objective WhatsApp data and subjective self-reports before and after data-driven personalized feedback

  • 1. CITEC Bielefeld
  • 2. Bielefeld University

Contributors

Project member:

  • 1. Bielefeld University

Description

General information

Note that this dataset partially overlaps with another dataset published earlier. Because the data are associated with different analyses and publications, we chose to release two targeted datasets, one for each.

The dataset contains de-identified messaging metadata from 68 WhatsApp data donations. The data were collected from August 2022 to June 2024 in an online study using the data donation platform Dona. Participants first answered questions about their sociodemographic information, current mood and aspects of their texting behavior, such as whether they send more words per month than they receive. After the survey, participants donated their WhatsApp data on Dona and received visualizations of their messaging behavior, such as how much they write in different chats and when they are more active. The goal was to test whether data-driven visualizations shift self-assessments of messaging behavior toward the objective values.
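The comparison described above can be illustrated with a minimal sketch. It is not the analysis used in the associated publication: the participant IDs, the word totals, the yes/no coding of the self-reports and the function names are all hypothetical, and the real survey columns are documented in survey_coding_CHB.xlsx.

```python
# Hypothetical sketch: how often does the self-report "I send more words than
# I receive" agree with the donated data, before and after the feedback?
# All values below are invented for illustration only.

def sends_more_than_receives(sent_words, received_words):
    """Objective answer derived from word counts."""
    return sent_words > received_words


def agreement_rate(objective, self_reports):
    """Share of participants whose self-report matches the objective value.

    Only participants present in both mappings are counted; returns None
    when there is no overlap.
    """
    shared = [pid for pid in objective if pid in self_reports]
    if not shared:
        return None
    matches = sum(objective[pid] == self_reports[pid] for pid in shared)
    return matches / len(shared)


# (sent, received) word totals per participant, invented.
word_totals = {"p1": (120, 80), "p2": (40, 90), "p3": (60, 30), "p4": (10, 50)}
objective = {
    pid: sends_more_than_receives(sent, received)
    for pid, (sent, received) in word_totals.items()
}

# Invented self-reports (True = "I send more than I receive").
pre_reports = {"p1": False, "p2": False, "p3": True, "p4": True}
post_reports = {"p1": True, "p2": False, "p3": True, "p4": True}

pre_rate = agreement_rate(objective, pre_reports)
post_rate = agreement_rate(objective, post_reports)
print(f"agreement before feedback: {pre_rate:.2f}, after: {post_rate:.2f}")
```

A higher agreement rate after the feedback than before would be consistent with self-assessments moving toward the objective values; with real data, a proper statistical test would be needed.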

For more information on Dona, the associated publications and updates, please visit https://mbp-lab.github.io/dona-blog/. 

File description

  1. donation_table_CHB.csv - contains general information about donations including
    • donation_id: donation identifier
    • donor_id: the ID of the donor to distinguish the messages sent by them from those sent by contacts
    • source: the messaging platform from which the data is donated (WhatsApp)
    • external_id: ID used to connect messaging data with the survey data
  2. donation_table_CHB_filtered.csv - same as donation_table_CHB.csv, excluding the three participants who did not provide the required number of chats
  3. messages_table_CHB.csv - contains the donated messages including
    • conversation_id: chat identifier
    • sender_id: sender identifier
    • datetime: time of the message, UNIX time for Facebook and device time for WhatsApp
    • word_count: word count of the message, obtained by splitting the text on whitespace
    • donation_id: donation identifier (also listed in donation_table_CHB.csv)
  4. messages_table_CHB_filtered.csv - same structure as messages_table_CHB.csv, excluding the three donors who did not provide the required number of chats
  5. pre_survey_CHB.xlsx - survey responses before data donation and visual feedback
  6. post_survey_CHB.xlsx - survey responses after data donation and visual feedback
  7. survey_coding_CHB.xlsx - contains the mapping between the column names in the surveys and their meaning, including the original survey questions and response options
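As a rough illustration of how the two tables relate, the sketch below joins messages to donations via donation_id and uses donor_id to separate words sent by the donor from words received from contacts. The column names follow the file description above; the rows, the datetime format and the function names are invented placeholders, not actual data or code from the repository.

```python
import csv
import io
from collections import defaultdict

# Toy rows that only mimic the documented column names; all values,
# including the datetime format, are invented for illustration.
DONATIONS_CSV = """donation_id,donor_id,source,external_id
d1,u1,WhatsApp,e1
d2,u9,WhatsApp,e2
"""

MESSAGES_CSV = """conversation_id,sender_id,datetime,word_count,donation_id
c1,u1,2023-01-05 10:00:00,5,d1
c1,u2,2023-01-05 10:01:00,3,d1
c2,u1,2023-01-06 09:00:00,7,d1
c3,u9,2023-02-01 12:00:00,2,d2
c3,u8,2023-02-01 12:05:00,10,d2
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def words_sent_and_received(donations, messages):
    """Sum word_count per donation, split by whether the donor was the sender."""
    donor_of = {row["donation_id"]: row["donor_id"] for row in donations}
    totals = defaultdict(lambda: {"sent": 0, "received": 0})
    for msg in messages:
        donor_id = donor_of.get(msg["donation_id"])
        if donor_id is None:
            continue  # message belongs to a donation missing from the table
        direction = "sent" if msg["sender_id"] == donor_id else "received"
        totals[msg["donation_id"]][direction] += int(msg["word_count"])
    return dict(totals)


totals = words_sent_and_received(read_rows(DONATIONS_CSV), read_rows(MESSAGES_CSV))
for donation_id, counts in totals.items():
    print(donation_id, counts)
```

With the real files, the in-memory strings would be replaced by the contents of donation_table_CHB_filtered.csv and messages_table_CHB_filtered.csv, and the result could be linked to the survey responses through external_id.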

 

Request access:

If you would like to request access to these files, please contact Dr. Olya Hakobyan at olya.hakobyan@uni-bielefeld.de.

You need to satisfy these conditions in order for this request to be accepted:

Individuals wishing to use the dataset must hold an academic affiliation. In addition, they must download and complete the End User License Agreement (EULA) and submit it to us.

This dataset is intended for research purposes only. 

 

Files

Restricted

The record is publicly accessible, but files are restricted.

Additional details

Funding

Federal Ministry of Education and Research
Empathische Künstliche Intelligenz 01IS20046

Software

Repository URL
https://github.com/mbp-lab/dona-chb
Programming language
Python