A set of generated Instagram Data Download Packages (DDPs) to investigate their structure and content
Description
Instagram data-download example dataset
In this repository you can find a data-set consisting of 11 personal Instagram archives, or Data-Download Packages (DDPs).
How the data was generated
These Instagram accounts were all newly created by a group of researchers who wanted to study in detail the structure, and the variety in structure, of Instagram DDPs. The participants used their Instagram accounts extensively for approximately a week. The participants also communicated intensively with each other, so that the data can be used as an example of a network.
The data was primarily generated to evaluate the performance of de-identification software. Therefore, the text in the DDPs contains many randomly chosen (Dutch) first names, phone numbers, e-mail addresses and URLs. In addition, the images in the DDPs contain many faces and text. The DDPs also contain faces and text (usernames) of third parties. However, only content from so-called 'professional accounts' is shared, such as accounts of famous individuals or institutions who deliberately and actively seek publicity, and these sources are readily publicly available. Furthermore, the DDPs do not contain sensitive personal data of these individuals.
Obtaining your Instagram DDP
After using the Instagram accounts intensively for approximately a week, the participants requested their personal Instagram DDPs by using the following steps. You can follow these steps yourself if you are interested in your personal Instagram DDP.
1. Go to www.instagram.com and log in
2. Click on your profile picture, go to *Settings* and *Privacy and Security*
3. Scroll to *Data download* and click *Request download*
4. Enter your email address and click *Next*
5. Enter your password and click *Request download*
Instagram then delivered the data in a compressed zip folder with the format **username_YYYYMMDD.zip** (i.e., Instagram handle and date of download) to the participant, and the participants shared these DDPs with us.
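As an illustration of the naming format above, the following minimal sketch splits a DDP file name into the Instagram handle and the download date. The function name `parse_ddp_filename` is hypothetical and not part of any tooling shipped with this data-set; the sketch assumes the name always ends in `_YYYYMMDD.zip`.

```python
from datetime import date, datetime
from pathlib import Path


def parse_ddp_filename(filename: str) -> tuple[str, date]:
    """Split 'username_YYYYMMDD.zip' into (username, download date).

    Hypothetical helper: assumes the file name follows the format
    described above.
    """
    stem = Path(filename).stem  # drop the '.zip' extension
    # Handles may themselves contain underscores, so split on the last one only.
    username, _, stamp = stem.rpartition("_")
    if not username:
        raise ValueError(f"unexpected DDP file name: {filename!r}")
    return username, datetime.strptime(stamp, "%Y%m%d").date()


handle, downloaded = parse_ddp_filename("100billionfaces_20201021.zip")
print(handle, downloaded)  # 100billionfaces 2020-10-21
```

Splitting on the last underscore keeps handles such as `some_user` intact; a malformed date part raises a `ValueError` from `strptime`.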
Data cleaning
To comply with the Instagram user agreement, participants shared their full name, phone number and e-mail address. In addition, Instagram logged the IP addresses the participants used during their active period on Instagram. After collecting the DDPs, we manually replaced this information with random replacements, so that the DDPs shared here do not contain any personal data of the participants.
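The replacement described above was done by hand. Purely as an illustration of the idea, and not as the procedure that was actually used, the sketch below shows how e-mail addresses and IPv4 addresses in a piece of text could be swapped for random placeholders. The function name, the regular expressions and the placeholder ranges (`example.com`, `192.0.2.x`) are all assumptions made for this example.

```python
import random
import re

# Deliberately simple patterns; they are assumptions for this sketch and
# will not catch every valid e-mail address or reject every invalid IP.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def replace_identifiers(text: str, seed: int = 0) -> str:
    """Replace e-mail and IPv4 addresses with random placeholders.

    The same original value always maps to the same placeholder within
    one call, so repeated mentions stay recognisable as the same entity.
    """
    rng = random.Random(seed)
    mapping: dict[str, str] = {}

    def fake_email() -> str:
        return f"user{rng.randrange(10**6):06d}@example.com"

    def fake_ip() -> str:
        # 192.0.2.0/24 is reserved for documentation purposes.
        return f"192.0.2.{rng.randrange(1, 255)}"

    def substitute(match: re.Match, make_fake) -> str:
        original = match.group(0)
        if original not in mapping:
            mapping[original] = make_fake()
        return mapping[original]

    text = EMAIL_RE.sub(lambda m: substitute(m, fake_email), text)
    text = IPV4_RE.sub(lambda m: substitute(m, fake_ip), text)
    return text
```

A call such as `replace_identifiers("mail jan@voorbeeld.nl from 145.12.3.4")` returns the sentence with both identifiers replaced. Names and phone numbers are harder to match with patterns like these, which is one reason a manual pass remains useful.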
How this data-set can be used
This data-set was generated to evaluate the performance of de-identification software. We invite other researchers to use this data-set, for example to investigate what type of data can be found in Instagram DDPs or to investigate the structure of Instagram DDPs. The packages can also be used for example data analyses, although no substantive research questions can be answered using this data, as it does not reflect how research subjects behave 'in the wild'.
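A first look at the structure of a DDP could start from a sketch like the one below, which counts the files in one archive by file extension using only the Python standard library. The function name `summarize_ddp` is hypothetical, and nothing is assumed here about which file types a DDP actually contains.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_ddp(zip_path) -> Counter:
    """Count the files in a DDP archive by (lower-cased) file extension.

    `zip_path` may be a path to a .zip file or an open binary file object.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, count files only
            suffix = PurePosixPath(info.filename).suffix.lower()
            counts[suffix or "(no extension)"] += 1
    return counts


# Example usage, assuming the archive has been downloaded to the working directory:
# print(summarize_ddp("100billionfaces_20201021.zip").most_common())
```

Running this over all packages and comparing the resulting counts gives a quick impression of how much the structure varies between DDPs.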
Authors
The data collection was carried out by Laura Boeschoten, Ruben van den Goorbergh and Daniel Oberski of Utrecht University. For questions, please contact l.boeschoten@uu.nl.
Acknowledgments
The researchers would like to thank everyone who participated in this data-generation project.
Files
100billionfaces_20201021.zip
Total size: 147.5 MB

Checksum (MD5) | Size |
---|---|
0dfeb1266b1d87dfc9b7b3e53d2b106e | 26.4 MB |
f4f5f3daca24e7f29b729ada02d18479 | 5.8 MB |
c7bc0b6ffbed1845088be0ac80889be1 | 6.4 MB |
d4b01cbe1377d0dec23ea3356aea14c0 | 2.6 MB |
2e940cb75320cf24fc260707b767f81f | 22.2 MB |
d08753a0613c753bfffe3e3d698750db | 6.2 MB |
09bdc26e2934189086c14c9c1d1da429 | 10.9 MB |
373261c79ca2a53cd3f0302a695376fc | 14.1 MB |
3fcac1cecc8105bcb2f5614c2c7ecdad | 19.3 MB |
8d8aa62ac52b828bc39ec5ea7d26e1aa | 633.5 kB |
47e391cecdecec854772e5702421c569 | 24.2 MB |
90fd68a27764f409ce80fd4f5af86b0d | 8.7 MB |
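The MD5 checksums listed above can be used to check that a file was downloaded intact. The sketch below is one possible way to do this with Python's standard `hashlib` module; the function names are hypothetical, and the caller has to supply which checksum belongs to which file, since that mapping is not shown in the listing.

```python
import hashlib


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5: str) -> bool:
    """Compare a file's MD5 digest with the expected checksum."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

MD5 is adequate here for detecting corrupted or truncated downloads; it is not meant as protection against deliberate tampering.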