Published May 7, 2025 | Version v1
Dataset Open

Learning Privacy from Visual Entities - Curated data sets and pre-computed visual entities

  • 1. Queen Mary University of London
  • 2. Idiap Research Institute
  • 3. École Polytechnique Fédérale de Lausanne

Description

This repository contains the curated image privacy datasets and pre-computed visual entities used in the publication Learning Privacy from Visual Entities by A. Xompero and A. Cavallaro.
[arxiv][code]

Curated image privacy data sets

In the article, we trained and evaluated models on the Image Privacy Dataset (IPD) and the PrivacyAlert dataset. The datasets are originally provided by other sources and have been re-organised and curated for this work.

Our curation organises the datasets in a common structure. We updated the annotations and labelled the data splits in the annotation file. This avoids keeping a separate folder of images for each data split (training, validation, testing) and allows flexible handling of new splits, e.g. those created with a stratified K-fold cross-validation procedure. As with the original datasets (PicAlert and PrivacyAlert), we provide the links to the images, in bash scripts that download them. Another bash script re-organises the images into sub-folders with at most 1000 images each.
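A minimal sketch of how split labels stored alongside the annotations can replace per-split folders. The row fields (`image`, `label`, `fold`), the helper `assign_stratified_folds`, and the toy data are hypothetical and are not taken from the released annotation files; the procedure used in the article may differ.

```python
import random
from collections import defaultdict


def assign_stratified_folds(labels, n_folds=5, seed=0):
    """Return one fold index (0..n_folds-1) per item, balancing each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [None] * len(labels)
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled items of this class round-robin across the folds,
        # so every fold receives a near-equal share of each class.
        for position, idx in enumerate(indices):
            folds[idx] = position % n_folds
    return folds


# Hypothetical annotation rows: 10 private (label 1) and 30 public (label 0).
annotations = [{"image": f"img_{i:04d}", "label": int(i % 4 == 0)} for i in range(40)]

fold_ids = assign_stratified_folds([row["label"] for row in annotations], n_folds=5)
for row, fold in zip(annotations, fold_ids):
    row["fold"] = fold  # the split label lives in the annotation row itself

# Selecting a split is then a filter on the annotations, not a folder lookup.
test_rows = [row for row in annotations if row["fold"] == 0]
train_rows = [row for row in annotations if row["fold"] != 0]
```

Because the fold is just another column, a new split only requires rewriting that column; the image folders stay untouched.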

Both datasets refer to images publicly available on Flickr. These images have a large variety of content, including sensitive content, semi-nude people, vehicle plates, documents, and private events. Images were annotated with a binary label denoting whether the content was deemed public or private. As the images are publicly available, most are labelled public. These datasets are therefore highly imbalanced towards the public class. Note that IPD combines two other existing datasets, PicAlert and part of VISPR, to increase the number of private images, which is limited in PicAlert. Further details are in our corresponding publication.

List of datasets and their original source:

Notes:

  • For PicAlert and PrivacyAlert, only the URLs to the original locations on Flickr are available in the Zenodo record
  • The collectors and authors of the PrivacyAlert dataset selected the images from Flickr under a Public Domain license
  • Owners of the photos on Flickr could have removed the photos from the social media platform
  • Running the bash scripts to download the images can trigger the "429 Too Many Requests" status code
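One common way to cope with a "429 Too Many Requests" response is to wait and retry with exponential backoff. The sketch below illustrates the idea in Python; it is not part of the provided bash scripts. The function `fetch_with_backoff`, its parameters, and the example URL are hypothetical.

```python
import email.message
import io
import time
import urllib.error
import urllib.request


def fetch_with_backoff(url, fetch=None, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Fetch a URL, waiting and retrying whenever the server answers 429."""
    if fetch is None:
        def fetch(target):
            with urllib.request.urlopen(target, timeout=30) as response:
                return response.read()

    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            # Only rate limiting is retried; other errors (e.g. 404 for a
            # photo removed by its owner) are raised immediately.
            if err.code != 429 or attempt == max_retries:
                raise
            retry_after = err.headers.get("Retry-After") if err.headers is not None else None
            if retry_after is not None and retry_after.strip().isdigit():
                delay = float(retry_after)  # server-suggested wait in seconds
            else:
                delay = base_delay * 2 ** attempt  # 1 s, 2 s, 4 s, ...
            sleep(delay)


# Offline demonstration with a fake fetcher that is rate-limited twice.
calls = {"count": 0}


def flaky_fetch(target):
    calls["count"] += 1
    if calls["count"] <= 2:
        raise urllib.error.HTTPError(
            target, 429, "Too Many Requests", email.message.Message(), io.BytesIO()
        )
    return b"image bytes"


waits = []  # records the delays instead of actually sleeping
payload = fetch_with_backoff("https://example.org/photo.jpg", fetch=flaky_fetch, sleep=waits.append)
```

In real use, the `fetch` and `sleep` arguments would be left at their defaults so that the function downloads over the network and actually pauses between attempts.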

Pre-computed visual entities

Some of the models run their pipeline end-to-end with the images as input, whereas other models require different or additional inputs. These inputs include the pre-computed visual entities (scene types and object types) represented in a graph format, e.g. for a Graph Neural Network. Re-using these pre-computed visual entities allows other researchers to build new models based on these features without re-computing them on their own or for each epoch during the training of a model (faster training).

For each image of each dataset, namely PrivacyAlert, PicAlert, and VISPR, we provide the predicted scene probabilities as a .csv file, the detected objects as a .json file in COCO data format, and the node features (visual entities already organised in graph format with their features) as a .json file. For consistency, all the files are already organised in batches following the structure of the images in the datasets folder. For each dataset, we also provide the pre-computed adjacency matrix for the graph data.
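As a rough illustration of how such files might be read, the sketch below parses a scene-probability table and a COCO-style object file with the Python standard library. The CSV column names, the image identifiers, and the inline sample data are assumptions made for this example. Only the `categories` and `annotations` keys with `image_id`, `category_id`, and `bbox` follow the general COCO annotation layout, and the released files may be organised differently.

```python
import csv
import io
import json
from collections import Counter, defaultdict

# Hypothetical file contents standing in for one scene .csv and one object .json.
scene_csv = io.StringIO(
    "image_id,beach,bedroom,office\n"
    "101,0.7,0.1,0.2\n"
    "102,0.05,0.9,0.05\n"
)
objects_json = json.dumps({
    "categories": [{"id": 1, "name": "person"}, {"id": 3, "name": "car"}],
    "annotations": [
        {"image_id": 101, "category_id": 1, "bbox": [10, 20, 50, 80]},
        {"image_id": 101, "category_id": 1, "bbox": [70, 25, 40, 90]},
        {"image_id": 102, "category_id": 3, "bbox": [0, 0, 120, 60]},
    ],
})


def top_scene(row):
    """Return the scene column with the highest probability in one CSV row."""
    scores = {name: float(value) for name, value in row.items() if name != "image_id"}
    return max(scores, key=scores.get)


# Most likely scene per image.
top_scenes = {int(row["image_id"]): top_scene(row) for row in csv.DictReader(scene_csv)}

# Number of detected objects of each category per image.
coco = json.loads(objects_json)
names = {category["id"]: category["name"] for category in coco["categories"]}
object_counts = defaultdict(Counter)
for annotation in coco["annotations"]:
    object_counts[annotation["image_id"]][names[annotation["category_id"]]] += 1
```

With real files, the `io.StringIO` and `json.dumps` stand-ins would be replaced by `open(...)` on the downloaded .csv and .json files.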

Note: IPD is based on PicAlert and VISPR and therefore IPD refers to the scene probabilities and object detections of the other two datasets. Both PicAlert and VISPR must be downloaded and prepared to use IPD for training and testing.

Further details on downloading and organising data can be found in our GitHub repository: https://github.com/graphnex/privacy-from-visual-entities  (see ARTIFACT-EVALUATION.md#pre-computed-visual-entitities-)

Enquiries, questions and comments

If you have any enquiries, questions, or comments, or you would like to file a bug report or a feature request, please use the issue tracker of our GitHub repository.

Files

curated_imageprivacy_datasets.zip

Files (291.9 MB)

md5:b603dde00eab35080833f0de37bd0043 (30.9 MB)
md5:fa4ee8cd54b4297663541ce90b0d9996 (21.9 kB)
md5:f3ddcacfa4aacb8942108145bae10edf (9.3 MB)
md5:c27ebc63be72f539d08be4bdf2497b9d (2.4 MB)
md5:6b51391c915affef807d826704c077b2 (6.9 MB)
md5:e74eb97d794fd93d087718780092263d (9.9 MB)
md5:dd771fd26e7ff421d9e7a9cc1c240e9b (2.4 MB)
md5:5f7700402f3864f7e9727026ac1ca178 (7.8 MB)
md5:d7b6a72578a298951befd17c9f657925 (109.9 MB)
md5:91a3bab46e5e2afcff3eda0580bfcb06 (26.4 MB)
md5:d1308e3bae70d028991ad836017d3e3d (85.9 MB)
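
The md5 values listed above can be used to check that a download completed without corruption. The following is a minimal sketch using Python's `hashlib`. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and the demonstration runs on a temporary file rather than on one of the actual archives.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum, with or without the 'md5:' prefix."""
    return md5_of_file(path) == expected.split(":")[-1].strip().lower()


# Demonstration on a temporary file standing in for a downloaded archive.
content = b"example archive contents"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(content)
    demo_path = tmp.name

demo_ok = matches_checksum(demo_path, "md5:" + hashlib.md5(content).hexdigest())
demo_bad = matches_checksum(demo_path, "md5:" + "0" * 32)
os.remove(demo_path)
```

For a real download, `matches_checksum` would be called with the path of the downloaded file and the corresponding md5 string from the listing.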

Additional details

Funding

UK Research and Innovation
GraphNEx: Graph Neural Networks for Explainable Artificial Intelligence EP/V062107/1

Software

Repository URL
https://github.com/graphnex/privacy-from-visual-entities
Programming language
Python
Development Status
Active

References

  • C. Zhao, J. Mangat, S. Koujalgi, A. Squicciarini, and C. Caragea, "PrivacyAlert: A Dataset for Image Privacy Prediction", in Int. AAAI Conf. Web and Social Media, 2022
  • B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba, "Places: A 10 million Image Database for Scene Recognition", in IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 6, pp. 1452–1464, 2018
  • S. Zerr, S. Siersdorfer, and J. Hare, "PicAlert! A System for Privacy-Aware Image Classification and Retrieval", in ACM Int. Conf. Information and Knowledge Management, 2012
  • G. Yang, J. Cao, Z. Chen, J. Guo, and J. Li, "Graph-based Neural Networks for Explainable Image Privacy Inference", in Pattern Recognit., vol. 105, pp. 1–12, 2020
  • D. Stoidis and A. Cavallaro, "Content-based Graph Privacy Advisor", in IEEE Int. Conf. Multimedia Big Data, 2022
  • T. Orekondy, B. Schiele, and M. Fritz, "Towards a Visual Privacy Advisor: Understanding and Predicting Privacy Risks in Images", in Int. Conf. Comput. Vis., 2017