Published January 27, 2021 | Version baseline_version
Dataset Open

A set of generated Instagram Data Download Packages (DDPs) to investigate their structure and content

  • 1. Utrecht University

Description

Instagram data-download example dataset

In this repository you can find a data-set consisting of 11 personal Instagram archives, or Data-Download Packages (DDPs).

 

How the data was generated

These Instagram accounts were all newly created by a group of researchers who wanted to examine in detail the structure, and the variety in structure, of Instagram DDPs. The participants used their Instagram accounts extensively for approximately a week. They also communicated intensively with each other so that the data can serve as an example of a network.

The data was primarily generated to evaluate the performance of de-identification software. Therefore, the text in the DDPs contains many randomly chosen (Dutch) first names, phone numbers, e-mail addresses and URLs. In addition, the images in the DDPs contain many faces and text. The DDPs also contain faces and text (usernames) of third parties. However, only content from so-called 'professional accounts' is shared, such as accounts of famous individuals or institutions who consciously and actively seek publicity, and these sources are easily publicly available. Furthermore, the DDPs do not contain sensitive personal data of these individuals.


Obtaining your Instagram DDP

After using the Instagram accounts intensively for approximately a week, the participants requested their personal Instagram DDPs by using the following steps. You can follow these steps yourself if you are interested in your personal Instagram DDP. 

1. Go to www.instagram.com and log in
2. Click on your profile picture, go to *Settings* and *Privacy and Security*
3. Scroll to *Data download* and click *Request download*
4. Enter your email address and click *Next*
5. Enter your password and click *Request download*

Instagram then delivered the data in a compressed zip folder with the format **username_YYYYMMDD.zip** (i.e., Instagram handle and date of download) to the participant, and the participants shared these DDPs with us.
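As a small illustration of the naming scheme described above, the following Python sketch splits a DDP file name into the Instagram handle and the download date. The function name `parse_ddp_filename` and the `DdpName` type are hypothetical helpers introduced here; they are not part of the dataset or of any Instagram tooling.

```python
import re
from datetime import date, datetime
from typing import NamedTuple

# Naming scheme described above: <handle>_<YYYYMMDD>.zip
# The handle itself may contain underscores, so the date is anchored at the end.
DDP_NAME = re.compile(r"^(?P<handle>.+)_(?P<stamp>\d{8})\.zip$")


class DdpName(NamedTuple):
    """Hypothetical container for the two parts of a DDP file name."""
    handle: str
    downloaded: date


def parse_ddp_filename(filename: str) -> DdpName:
    """Split 'username_YYYYMMDD.zip' into handle and download date."""
    match = DDP_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a DDP file name: {filename!r}")
    # strptime also rejects impossible dates such as 20201399.
    downloaded = datetime.strptime(match["stamp"], "%Y%m%d").date()
    return DdpName(match["handle"], downloaded)
```

For example, `parse_ddp_filename("100billionfaces_20201021.zip")` yields the handle `100billionfaces` and the date 2020-10-21.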

 

Data cleaning

To comply with the Instagram user agreement, participants shared their full name, phone number and e-mail address. In addition, Instagram logged the IP addresses the participants used during their active period on Instagram. After collecting the DDPs, we manually replaced this information with random replacements, so that the DDPs shared here do not contain any personal data of the participants.
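The replacement described above was done manually. Purely as an illustration of the idea, the sketch below swaps e-mail addresses and IPv4 addresses in a piece of text for random stand-ins. It is not the procedure the authors used: the regular expressions are deliberately simple, the function name `pseudonymize` is hypothetical, and the choice of `example.com` and the documentation range `192.0.2.0/24` as replacement values is an assumption made here.

```python
import random
import re

# Deliberately simple patterns; real de-identification needs far more care.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def pseudonymize(text, rng=None):
    """Replace e-mail and IPv4 addresses with random stand-ins.

    Returns the rewritten text and the original -> replacement mapping,
    so that the same original always gets the same replacement.
    """
    rng = rng or random.Random()
    mapping = {}

    def make_replacer(kind):
        def replace(match):
            original = match.group(0)
            if original not in mapping:
                if kind == "email":
                    # Assumption: reserved example domain as replacement.
                    mapping[original] = f"user{rng.randrange(10**6):06d}@example.com"
                else:
                    # Assumption: addresses from the documentation range.
                    mapping[original] = f"192.0.2.{rng.randrange(1, 255)}"
            return mapping[original]
        return replace

    text = EMAIL.sub(make_replacer("email"), text)
    text = IPV4.sub(make_replacer("ip"), text)
    return text, mapping
```

With so few possible stand-in IP addresses, two different originals can receive the same replacement; a sketch like this is therefore no substitute for the manual check described above.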

 

How this data-set can be used

This data-set was generated with the intention of evaluating the performance of de-identification software. We invite other researchers to use this data-set, for example to investigate what types of data can be found in Instagram DDPs or to investigate the structure of Instagram DDPs. The packages can also be used for example data analyses, although no substantive research questions can be answered with this data, as it does not reflect how research subjects behave 'in the wild'.
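A first look at the structure of a DDP can be taken without unpacking it. The following Python sketch counts the files in a zip archive per top-level folder and file extension, using only the standard library. The function name `summarize_ddp` is a hypothetical helper, and the sketch makes no assumption about which folders or file types an Instagram DDP actually contains.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_ddp(archive) -> Counter:
    """Count files in a DDP zip by (top-level folder, lower-cased extension).

    `archive` may be a path to a zip file or an open binary file object.
    Files in the archive root are reported under the folder ".".
    """
    counts = Counter()
    with zipfile.ZipFile(archive) as ddp:
        for info in ddp.infolist():
            if info.is_dir():
                continue  # skip directory entries, count only files
            path = PurePosixPath(info.filename)
            top = path.parts[0] if len(path.parts) > 1 else "."
            counts[(top, path.suffix.lower())] += 1
    return counts
```

Calling `summarize_ddp` on each downloaded package and comparing the resulting counters is one simple way to see how much the structure varies between DDPs.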


Authors

The data collection was carried out by Laura Boeschoten, Ruben van den Goorbergh and Daniel Oberski of Utrecht University. For questions, please contact l.boeschoten@uu.nl.

 

Acknowledgments

The researchers would like to thank everyone who participated in this data-generation project.

Files

100billionfaces_20201021.zip

Files (147.5 MB)

Checksum                                Size
md5:0dfeb1266b1d87dfc9b7b3e53d2b106e    26.4 MB
md5:f4f5f3daca24e7f29b729ada02d18479    5.8 MB
md5:c7bc0b6ffbed1845088be0ac80889be1    6.4 MB
md5:d4b01cbe1377d0dec23ea3356aea14c0    2.6 MB
md5:2e940cb75320cf24fc260707b767f81f    22.2 MB
md5:d08753a0613c753bfffe3e3d698750db    6.2 MB
md5:09bdc26e2934189086c14c9c1d1da429    10.9 MB
md5:373261c79ca2a53cd3f0302a695376fc    14.1 MB
md5:3fcac1cecc8105bcb2f5614c2c7ecdad    19.3 MB
md5:8d8aa62ac52b828bc39ec5ea7d26e1aa    633.5 kB
md5:47e391cecdecec854772e5702421c569    24.2 MB
md5:90fd68a27764f409ce80fd4f5af86b0d    8.7 MB
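
The MD5 checksums listed above can be used to check that a download arrived intact. A minimal sketch in Python, assuming the file has already been saved locally; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and the listing does not show which checksum belongs to which file name, so the pairing has to be taken from the record itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' or plain hex."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_checksum("ddp.zip", "md5:0dfeb1266b1d87dfc9b7b3e53d2b106e")` returns `True` only if the local file `ddp.zip` (a placeholder path) has exactly that digest.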