Published January 27, 2021 | Version baseline_version
Dataset Open

A set of generated Instagram Data Download Packages (DDPs) to investigate their structure and content

  • 1. Utrecht University


Instagram data-download example dataset

In this repository you can find a data-set consisting of 11 personal Instagram archives, or Data-Download Packages (DDPs).


How the data was generated

These Instagram accounts were all new and were created by a group of researchers who wanted to examine in detail
the structure, and the variety in structure, of these Instagram DDPs. The participants used their Instagram accounts extensively for approximately a week. The participants also communicated intensively with each other so that the data can be used as an example of a network.

The data was primarily generated to evaluate the performance of de-identification software. Therefore, the text in the DDPs contains many randomly chosen (Dutch) first names, phone numbers, e-mail addresses and URLs. In addition, the images in the DDPs contain many faces and text. The DDPs also contain faces and text (usernames) of third parties. However, only content from so-called `professional accounts' is shared, such as accounts of famous individuals or institutions who consciously and actively seek publicity, and these sources are easily publicly available. Furthermore, the DDPs do not contain sensitive personal data of these individuals.

Obtaining your Instagram DDP

After using the Instagram accounts intensively for approximately a week, the participants requested their personal Instagram DDPs by using the following steps. You can follow these steps yourself if you are interested in your personal Instagram DDP. 

1. Go to and log in
2. Click on your profile picture, go to *Settings* and *Privacy and Security*
3. Scroll to *Data download* and click *Request download*
4. Enter your email address and click *Next*
5. Enter your password and click *Request download*

Instagram then delivered the data in a compressed zip folder with the format **** (i.e., Instagram handle and date of download) to the participant, and the participants shared these DDPs with us.
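
As a first look at what such a zip folder contains, the files in it can be counted by extension. The snippet below is a minimal sketch using only the Python standard library; the function name `summarize_ddp` and the path passed to it are hypothetical and not part of the dataset.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_ddp(zip_path):
    """Count the files in a zip archive by (lower-cased) file extension."""
    with zipfile.ZipFile(zip_path) as archive:
        # Skip directory entries; keep only real files.
        names = [info.filename for info in archive.infolist() if not info.is_dir()]
    # Files without an extension are grouped under "(none)".
    return Counter(PurePosixPath(name).suffix.lower() or "(none)" for name in names)
```

Calling `summarize_ddp("path/to/package.zip")` on a downloaded package returns a `Counter` that maps each extension to the number of files with that extension.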


Data cleaning

To comply with the Instagram user agreement, participants shared their full name, phone number and e-mail address. In addition, Instagram logged the IP addresses the participants used during their active period on Instagram. After collecting the DDPs, we manually replaced such information with random replacements, so that the DDPs shared here do not contain any personal data of the participants.
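
The replacement described above was done by hand. Purely as an illustration of the idea, a pattern-based replacement of e-mail addresses and IPv4 addresses could be sketched as follows. The regular expressions are deliberately simple, the placeholder values are assumptions, and names and phone numbers are not covered; this is not the procedure used for this data-set.

```python
import re

# Simplified patterns (assumptions): they catch common forms only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def replace_identifiers(text, email_placeholder="user@example.com",
                        ip_placeholder="0.0.0.0"):
    """Replace e-mail addresses and IPv4 addresses in text with placeholders."""
    text = EMAIL_RE.sub(email_placeholder, text)
    return IPV4_RE.sub(ip_placeholder, text)
```

A pattern-based pass like this misses anything that does not match the patterns, so a manual check of the result remains necessary.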


How this data-set can be used

This data-set was generated to evaluate the performance of de-identification software. We invite other researchers to use it, for example to investigate what types of data can be found in Instagram DDPs or to investigate the structure of Instagram DDPs. The packages can also be used for example data analyses, although no substantive research questions can be answered with them, as the data does not reflect how research subjects behave `in the wild'.
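
One way to start investigating the structure of a package is to outline its JSON files. The sketch below assumes that a package contains files ending in `.json`, which should be checked against the actual archives; the function name `json_outline` is hypothetical.

```python
import json
import zipfile


def json_outline(zip_path):
    """Map each .json file in a zip archive to its top-level structure.

    JSON objects are described by their sorted keys; any other top-level
    value is described by its Python type name (for example "list").
    """
    outline = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".json"):
                continue
            with archive.open(name) as handle:
                data = json.load(handle)
            if isinstance(data, dict):
                outline[name] = sorted(data.keys())
            else:
                outline[name] = type(data).__name__
    return outline
```

Comparing the outlines of several packages gives a quick impression of how much the structure varies between accounts.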


The data collection was carried out by Laura Boeschoten, Ruben van den Goorbergh and Daniel Oberski of Utrecht University. For questions, please contact 



The researchers would like to thank everyone who participated in this data-generation project.


Files (147.5 MB)

Size
26.4 MB
5.8 MB
6.4 MB
2.6 MB
22.2 MB
6.2 MB
10.9 MB
14.1 MB
19.3 MB
633.5 kB
24.2 MB
8.7 MB