Published October 2025 | Version v1
Dataset Restricted

Dataset of Data Exports (Hidden in Plain Bytes)

  • University of Wisconsin–Madison

Description

Overview

This repository contains 12 data exports obtained through "right of access" requests (under laws such as GDPR and CCPA) from 6 major online platforms (Apple, Discord, Facebook, Google, Instagram, and Snapchat). The exports were collected and analyzed for the following paper:
Julia Nonnenkamp, Naman Gupta, Abhimanyu Dev Gupta, and Rahul Chatterjee. 2025. Hidden in Plain Bytes: Investigating Interpersonal Account Compromise with Data Exports. In Proceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security (CCS '25). ACM, Taipei, Taiwan, 1–14 (October 13–17, 2025). DOI: 10.1145/3719027.3765147.

Contributors 

This data was collected by Julia Nonnenkamp and Abhimanyu Dev Gupta (University of Wisconsin–Madison), and cleaned and analyzed with additional help from Naman Gupta and Rahul Chatterjee (University of Wisconsin–Madison). For questions, please contact Julia Nonnenkamp (nonnenkamp@wisc.edu).

Data sources

We simulated benign and malicious activity on researcher-controlled accounts (made under the pseudonym "Sam") on six platforms: Apple, Discord, Facebook, Google, Instagram, and Snapchat. We then requested and downloaded data exports from each platform twice: once in January 2025 and again 30 days later, in February 2025.
 
The file sam_january_cleaned.zip contains one subdirectory for each of the 6 data exports from January, and sam_february_cleaned.zip contains the same for February.
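
As a starting point for working with the two archives, the following minimal sketch lists the top-level directories in an archive and reports files that appear in only one of the two exports. It assumes that each platform's export sits in its own top-level directory and that member paths are directly comparable between the January and February archives; neither assumption is confirmed by this description, and the function names are hypothetical.

```python
import zipfile
from pathlib import PurePosixPath


def top_level_dirs(archive: zipfile.ZipFile) -> set[str]:
    """Return the names of the top-level directories in a zip archive."""
    dirs = set()
    for name in archive.namelist():
        parts = PurePosixPath(name).parts
        # A member with more than one path component lives inside a directory.
        if len(parts) > 1:
            dirs.add(parts[0])
    return dirs


def compare_exports(
    january: zipfile.ZipFile, february: zipfile.ZipFile
) -> dict[str, set[str]]:
    """Report file paths that appear in only one of the two archives."""
    # Names ending in "/" are directory entries, not files, so skip them.
    jan = {n for n in january.namelist() if not n.endswith("/")}
    feb = {n for n in february.namelist() if not n.endswith("/")}
    return {"only_january": jan - feb, "only_february": feb - jan}
```

Both functions take open archives, for example `zipfile.ZipFile("sam_january_cleaned.zip")` and `zipfile.ZipFile("sam_february_cleaned.zip")`. If the archives wrap everything in a single root folder, the first path component would need to be stripped before comparing.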
 
The data provided has been partially pre-processed to remove personally identifying information (PII), bulky media files, and directories irrelevant to our analysis. In terms of the accompanying paper, the data has undergone the transformations described in Section 3.4 under "Pseudonymization" and "Filtering Files." Below, we describe each of these steps in more detail.
 

(A)  Pseudonymization

To protect the privacy of the researchers and any other individuals whose data may be present in the exports, we pseudonymized all personally identifying information (PII) in the data. This includes IP addresses, phone numbers belonging to the researchers (needed to verify accounts), precise location coordinates, and state/city details outside of our lab building. We replaced these with syntactically similar values: for example, IP addresses became 0.0.0.1, 0.0.0.2, etc., and states and cities were masked as State1, City2, etc.
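
The IP address replacement described above can be illustrated with a short sketch. This is not the authors' tooling: the class name, the regular expression, and the choice to map each distinct real address to the same placeholder every time it appears are all assumptions made for illustration.

```python
import ipaddress
import re

# Simple IPv4 matcher; it does not validate octet ranges and ignores IPv6.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


class Pseudonymizer:
    """Map each distinct real IPv4 address to a sequential placeholder."""

    def __init__(self) -> None:
        self._ip_map: dict[str, str] = {}

    def _placeholder_ip(self, real_ip: str) -> str:
        if real_ip not in self._ip_map:
            # IPv4Address(1) is 0.0.0.1, IPv4Address(2) is 0.0.0.2, and so on.
            index = len(self._ip_map) + 1
            self._ip_map[real_ip] = str(ipaddress.IPv4Address(index))
        return self._ip_map[real_ip]

    def scrub(self, text: str) -> str:
        """Replace every IPv4-looking substring with its placeholder."""
        return IPV4_PATTERN.sub(
            lambda match: self._placeholder_ip(match.group(0)), text
        )
```

Keeping one `Pseudonymizer` instance across all files in an export would preserve the ability to tell whether two log entries came from the same address, which is the kind of signal that a stable mapping retains and a random replacement would lose.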
 

(B)  File filtering

We removed files that did not contain machine-readable text (images, videos) and files from platform features we did not use during simulation (IoT integrations, streaming, payments, educational tools). The retained files are primarily in HTML, JSON, and CSV formats, along with some TXT files.
 
See the accompanying paper (Section 3.4) for more detailed reasoning for file filtering. See removed_files.md for the complete list of excluded file paths.
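
The extension-based part of this filtering can be sketched as follows. The sketch covers only the text-versus-media split; it does not model the removal of files from unused platform features, for which removed_files.md remains the authoritative list. The extension set and the function name are assumptions based on the formats named above.

```python
from pathlib import PurePosixPath

# Formats named in the description as retained; treating exactly these
# four extensions as the full set is an assumption.
TEXT_EXTENSIONS = {".html", ".json", ".csv", ".txt"}


def split_by_extension(paths):
    """Split paths into (kept, removed) by file extension, case-insensitively."""
    kept, removed = [], []
    for path in paths:
        suffix = PurePosixPath(path).suffix.lower()
        (kept if suffix in TEXT_EXTENSIONS else removed).append(path)
    return kept, removed
```

For example, given hypothetical paths such as `google/activity.html` and `snapchat/memories/clip.mp4`, the first would land in the kept list and the second in the removed list.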

Files

Restricted

The record is publicly accessible, but the files are restricted. Log in to Zenodo to check whether you have access.

Additional details

Funding

U.S. National Science Foundation
CAREER: Account Security Against Interpersonal Attacks 2339679
University of Wisconsin–Madison
Baldwin Wisconsin Idea Endowment