Published June 27, 2024 | Version v1
Dataset Open

Data Donation with Dona: De-identified Messaging Data (WhatsApp and Facebook) and Evaluation Responses

  • 1. Bielefeld University
  • 2. Center for Cognitive Interaction Technology (CITEC), Bielefeld University, Germany

Contributors

Data collector:

  • 1. Bielefeld University
  • 2. Center for Cognitive Interaction Technology (CITEC), Bielefeld University, Germany

Description

General information

The dataset contains de-identified messaging metadata from 78 WhatsApp and 7 Facebook data donations. It was collected in an online study using the data donation platform Dona. After donating their messaging data, the study participants viewed visual summaries of that data and evaluated this visual feedback. The responses to the evaluation questions and the participants' sociodemographic data are also included in the dataset.

The data was collected from August 2022 to June 2024. 

For more information on Dona, the associated publications and updates, please visit https://mbp-lab.github.io/dona-blog/. 

File description

  1. donation_table.csv - contains general information about the donations including
    • donation_id: donation identifier
    • donor_id: the donor's ID, used to distinguish messages sent by the donor from those sent by their contacts
    • source: the messaging platform from which the data is donated (WhatsApp or Facebook)
    • external_id: ID used to connect messaging data with the survey data
  2. messages_table.csv - contains the donated messages including
    • conversation_id: chat identifier
    • sender_id: sender identifier
    • datetime: time of the message (UNIX time for Facebook, device time for WhatsApp)
    • word_count: word count of the message, obtained by splitting the text on whitespace
    • donation_id: donation identifier (also listed in donation_table.csv)
  3. messages_filtered_table.csv - same structure as messages_table.csv, except that chats without considerable interaction were removed. These were defined as chats in which the donor's share of the word count was less than 10% or more than 90%.
  4. survey.xlsx - contains the survey responses of the participants.
  5. survey_table_coding.xlsx - contains the mapping between the column names in survey.xlsx and their meaning, including the original survey questions and response options. Each sheet of the Excel file details the survey questions and responses in one of the study languages (English, German, Armenian).
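
The relationship between the two tables and the filtering rule for messages_filtered_table.csv can be illustrated with a minimal sketch. It is not the authors' code. It assumes that a message was sent by the donor when its sender_id equals the donor_id of the same donation, and that the 10% and 90% thresholds are inclusive. The function names and the toy data are hypothetical; only the column names come from the description above.

```python
import pandas as pd


def donor_word_share(messages: pd.DataFrame, donations: pd.DataFrame) -> pd.DataFrame:
    """Return one row per chat with the donor's share of the total word count."""
    # Attach donor_id to every message through the shared donation_id key.
    merged = messages.merge(
        donations[["donation_id", "donor_id"]], on="donation_id", how="left"
    )
    # Assumption: sender_id == donor_id marks a message written by the donor.
    from_donor = merged["sender_id"] == merged["donor_id"]
    merged["donor_words"] = merged["word_count"].where(from_donor, 0)

    per_chat = merged.groupby(["donation_id", "conversation_id"], as_index=False).agg(
        total_words=("word_count", "sum"),
        donor_words=("donor_words", "sum"),
    )
    per_chat["donor_share"] = per_chat["donor_words"] / per_chat["total_words"]
    return per_chat


def keep_interactive_chats(per_chat: pd.DataFrame, low: float = 0.10, high: float = 0.90) -> pd.DataFrame:
    """Drop chats whose donor share is below `low` or above `high`."""
    # Chats with zero total words have an undefined share and are dropped too.
    return per_chat[per_chat["donor_share"].between(low, high)]


# Toy data (hypothetical values, real column names).
donations = pd.DataFrame({"donation_id": ["d1"], "donor_id": ["A"]})
messages = pd.DataFrame(
    {
        "donation_id": ["d1"] * 6,
        "conversation_id": ["c1", "c1", "c2", "c2", "c3", "c3"],
        "sender_id": ["A", "B", "A", "B", "A", "B"],
        "word_count": [50, 50, 95, 5, 1, 99],
    }
)

shares = donor_word_share(messages, donations)
kept = keep_interactive_chats(shares)
print(kept[["conversation_id", "donor_share"]])
```

In the toy data only chat c1 is kept: the donor wrote half of its words, whereas the donor's share is 95% in c2 and 1% in c3.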

 

Notes

The dataset has already been checked for identical donations, which have been removed. In addition, messages with implausible dates in the future have been removed from some WhatsApp donations. These donations and the number of messages removed from each are listed below.

donation_id                             number of removed messages
e8fe298a-4411-41be-98d3-aca1d9457e96    64
760360a8-cdb8-41d6-a7f3-8b539d8eb049    3
269abde1-e7b5-4363-bb96-ee1839164764    5706

Files

Files (1.0 GB)

Checksum                                Size
md5:a53acfe20ff02a1ab192906a185bd81f    7.7 kB
md5:88f32bde7ad12bbc05a55157937be768    502.2 MB
md5:31c48ff9d6ad7ed1d811f18c15d59984    534.4 MB
md5:998ecc88ed94f26270f06cff0a7b739d    12.5 kB
md5:47ad7f7d1105fd538aa507263e291071    15.0 kB
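
A downloaded file can be checked against the MD5 checksums listed above. The following is a minimal sketch using only the Python standard library; the function names are hypothetical. The listing above does not state which checksum belongs to which file, so the pairing has to be taken from the download page.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reading keeps memory use low for the ~500 MB CSV files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file against a checksum given as 'md5:<hex>' or plain hex."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, matches_checksum("donation_table.csv", "md5:...") returns True only if the local file is byte-identical to the published one, where the checksum string is the one shown for that file on the download page.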

Additional details

Funding

Empathische Künstliche Intelligenz 01IS20046
Federal Ministry of Education and Research

Software

Repository URL
https://github.com/mbp-lab/dona-brm
Programming language
Python