Learning Privacy from Visual Entities - Curated data sets and pre-computed visual entities

Xompero, Alessio; Cavallaro, Andrea

doi:10.5281/zenodo.15348506

Published May 7, 2025 | Version v1

Dataset Open

1. Queen Mary University of London
2. Idiap Research Institute
3. École Polytechnique Fédérale de Lausanne

This repository contains the curated image privacy datasets and pre-computed visual entities used in the publication Learning Privacy from Visual Entities by A. Xompero and A. Cavallaro.
[arxiv][code]

Curated image privacy data sets

In the article, we trained and evaluated models on the Image Privacy Dataset (IPD) and the PrivacyAlert dataset. The datasets were originally provided by other sources and have been re-organised and curated for this work.

Our curation organises the datasets in a common structure. We updated the annotations and labelled the data splits in the annotation file. This avoids keeping a separate folder of images for each split (training, validation, testing) and allows flexible handling of new splits, e.g. those created with a stratified K-fold cross-validation procedure. As for the original datasets (PicAlert and PrivacyAlert), we provide the links to the images in bash scripts that download them. Another bash script re-organises the images into sub-folders with at most 1000 images each.
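As an illustration of how new splits could be derived from a single annotation file, the sketch below assigns a fold index to each image while keeping the public/private ratio roughly equal across folds. It is not the authors' procedure: the record layout (pairs of image identifier and label), the function name `assign_stratified_folds`, and the toy data are assumptions made for this example, and the real annotation files may be organised differently.

```python
import random
from collections import defaultdict


def assign_stratified_folds(annotations, n_folds=5, seed=0):
    """Return {image_id: fold_index}, stratified by label.

    `annotations` is a list of (image_id, label) pairs; this layout is
    hypothetical and stands in for the rows of an annotation file.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for image_id, label in annotations:
        by_label[label].append(image_id)

    folds = {}
    for label in sorted(by_label):
        ids = by_label[label]
        rng.shuffle(ids)
        # Deal the images of each class round-robin over the folds so that
        # every fold receives a near-equal share of that class.
        for position, image_id in enumerate(ids):
            folds[image_id] = position % n_folds
    return folds


# Toy, imbalanced example (made-up identifiers): 20 public, 5 private images.
toy = [(f"img{i:03d}", "public") for i in range(20)]
toy += [(f"img{i:03d}", "private") for i in range(20, 25)]
fold_of = assign_stratified_folds(toy, n_folds=5)
```

Because the fold index is stored next to each image rather than encoded in the folder layout, changing the number of folds or the random seed only requires regenerating this mapping, not moving any image.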

Both datasets refer to images publicly available on Flickr. These images cover a large variety of content, including sensitive content, semi-nude people, vehicle plates, documents, and private events. Each image was annotated with a binary label denoting whether its content was deemed public or private. As the images are publicly available, most labels are public, so the datasets are highly imbalanced towards the public class. Note that IPD combines two other existing datasets, PicAlert and part of VISPR, to increase the number of private images, which is limited in PicAlert. Further details are given in our corresponding publication.

List of datasets and their original source:

PicAlert [Images occupy 2.4 GB]
VISPR [Images occupy 49.7 GB]
PrivacyAlert [Images occupy 1 GB]

Notes:

For PicAlert and PrivacyAlert, only the URLs to the original locations on Flickr are available in the Zenodo record
The collectors and authors of the PrivacyAlert dataset selected images from Flickr released under a Public Domain license
Owners of the photos on Flickr may have removed them from the platform, so some images may no longer be available
Running the bash scripts to download the images can trigger the "429 Too Many Requests" status code
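One common way to cope with a "429 Too Many Requests" response is to wait and retry with an increasing delay. The sketch below shows that idea in Python using only the standard library. It is not part of the provided bash scripts: the function name `download_with_backoff` and its parameters are assumptions for illustration, and the `opener` and `sleep` arguments exist only so that the logic can be exercised without network access.

```python
import time
import urllib.error
import urllib.request


def download_with_backoff(url, max_attempts=5, base_delay=1.0,
                          opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch `url`, waiting and retrying when the server answers 429."""
    for attempt in range(max_attempts):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Give up on any other error, or once the attempts are used up.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            # Prefer the server's Retry-After hint when it is a number of
            # seconds; otherwise double the wait at each attempt.
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after is not None and retry_after.isdigit():
                delay = float(retry_after)
            else:
                delay = base_delay * (2 ** attempt)
            sleep(delay)
```

With the default settings, the waits between attempts are 1, 2, 4, and 8 seconds. A `Retry-After` header expressed as a date is ignored here and falls back to the exponential delay.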

Pre-computed visual entities

Some of the models run their pipeline end-to-end with the images as input, whereas other models require different or additional inputs. These inputs include the pre-computed visual entities (scene types and object types) represented in a graph format, e.g. for a Graph Neural Network. Re-using these pre-computed visual entities allows other researchers to build new models on these features without re-computing them from scratch or at each training epoch, which speeds up training.

For each image of each dataset, namely PrivacyAlert, PicAlert, and VISPR, we provide the predicted scene probabilities as a .csv file, the detected objects as a .json file in COCO data format, and the node features (visual entities already organised in graph format with their features) as a .json file. For consistency, all the files are organised in batches that follow the structure of the images in the datasets folder. For each dataset, we also provide the pre-computed adjacency matrix for the graph data.
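To give an idea of how detections in COCO format might be consumed, the sketch below groups detections by image and keeps those above a confidence threshold. It assumes the COCO detection-results layout (a list of records with `image_id`, `category_id`, `bbox`, and `score`); the files in this record may instead use the COCO annotation layout or different keys, so check one file before relying on it. The function name `group_detections`, the threshold, and the inline values are made up for the example.

```python
import json
from collections import defaultdict


def group_detections(coco_results, min_score=0.5):
    """Map image_id -> list of (category_id, score) with score >= min_score."""
    per_image = defaultdict(list)
    for det in coco_results:
        if det["score"] >= min_score:
            per_image[det["image_id"]].append((det["category_id"], det["score"]))
    return dict(per_image)


# Inline stand-in for the content of one objects .json file (made-up values).
example_json = """
[
  {"image_id": 1, "category_id": 1, "bbox": [10, 20, 50, 80], "score": 0.92},
  {"image_id": 1, "category_id": 3, "bbox": [0, 0, 30, 30], "score": 0.31},
  {"image_id": 2, "category_id": 1, "bbox": [5, 5, 40, 60], "score": 0.77}
]
"""
detections = group_detections(json.loads(example_json))
```

In this toy input, the low-confidence detection on image 1 is dropped, and each image keeps one detection of category 1.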

Note: IPD is based on PicAlert and VISPR and therefore IPD refers to the scene probabilities and object detections of the other two datasets. Both PicAlert and VISPR must be downloaded and prepared to use IPD for training and testing.

Further details on downloading and organising data can be found in our GitHub repository: https://github.com/graphnex/privacy-from-visual-entities (see ARTIFACT-EVALUATION.md#pre-computed-visual-entitities-)

Enquiries, questions and comments

If you have any enquiries, questions, or comments, or you would like to file a bug report or a feature request, please use the issue tracker of our GitHub repository.

Files

Files (291.9 MB)

Name	MD5	Size
curated_imageprivacy_datasets.zip	b603dde00eab35080833f0de37bd0043	30.9 MB
graphdata_IPD.zip	fa4ee8cd54b4297663541ce90b0d9996	21.9 kB
graphdata_picalert.zip	f3ddcacfa4aacb8942108145bae10edf	9.3 MB
graphdata_privacyalert.zip	c27ebc63be72f539d08be4bdf2497b9d	2.4 MB
graphdata_VISPR.zip	6b51391c915affef807d826704c077b2	6.9 MB
objects_picalert.zip	e74eb97d794fd93d087718780092263d	9.9 MB
objects_privacyalert.zip	dd771fd26e7ff421d9e7a9cc1c240e9b	2.4 MB
objects_VISPR.zip	5f7700402f3864f7e9727026ac1ca178	7.8 MB
scenes_picalert.zip	d7b6a72578a298951befd17c9f657925	109.9 MB
scenes_privacyalert.zip	91a3bab46e5e2afcff3eda0580bfcb06	26.4 MB
scenes_VISPR.zip	d1308e3bae70d028991ad836017d3e3d	85.9 MB
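The MD5 checksums published with the files can be used to check that a download completed correctly. The following is a minimal sketch of such a check in Python. The helper names `md5_of_file` and `verify_downloads` are hypothetical, and only three of the checksums from the file listing are copied into the example dictionary.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Checksums copied from the file listing of this record (subset).
EXPECTED_MD5 = {
    "curated_imageprivacy_datasets.zip": "b603dde00eab35080833f0de37bd0043",
    "graphdata_IPD.zip": "fa4ee8cd54b4297663541ce90b0d9996",
    "graphdata_privacyalert.zip": "c27ebc63be72f539d08be4bdf2497b9d",
}


def verify_downloads(folder, expected=EXPECTED_MD5):
    """Return {filename: True/False/None}; None means the file is missing."""
    results = {}
    for name, checksum in expected.items():
        path = Path(folder) / name
        results[name] = md5_of_file(path) == checksum if path.is_file() else None
    return results
```

Calling `verify_downloads(".")` in the folder holding the downloaded archives would report, for each listed file, whether its checksum matches, does not match, or the file is absent.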

Additional details

Is derived from: Dataset: https://zenodo.org/record/6406870#.Y2KtsdLP3ow (URL); Dataset: https://tribhuvanesh.github.io/vpa/ (URL)
Is supplement to: Preprint: 10.48550/arXiv.2503.12464 (DOI)

UK Research and Innovation
GraphNEx: Graph Neural Networks for Explainable Artificial Intelligence EP/V062107/1

Repository URL: https://github.com/graphnex/privacy-from-visual-entities
Programming language: Python
Development Status: Active

C. Zhao, J. Mangat, S. Koujalgi, A. Squicciarini, and C. Caragea, "PrivacyAlert: A Dataset for Image Privacy Prediction", in Int. AAAI Conf. Web and Social Media, 2022
B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba, "Places: A 10 million Image Database for Scene Recognition", in IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 6, pp. 1452–1464, 2018
S. Zerr, S. Siersdorfer, and J. Hare, "PicAlert! A System for Privacy-Aware Image Classification and Retrieval", in ACM Int. Conf. Information and Knowledge Management, 2012
G. Yang, J. Cao, Z. Chen, J. Guo, and J. Li, "Graph-based Neural Networks for Explainable Image Privacy Inference", in Pattern Recognit., vol. 105, pp. 1–12, 2020
D. Stoidis and A. Cavallaro, "Content-based Graph Privacy Advisor", in IEEE Int. Conf. Multimedia Big Data, 2022
T. Orekondy, B. Schiele, and M. Fritz, "Towards a Visual Privacy Advisor: Understanding and Predicting Privacy Risks in Images", in Int. Conf. Comput. Vis., 2017
