Learning Privacy from Visual Entities - Curated data sets and pre-computed visual entities
Description
This repository contains the curated image privacy datasets and pre-computed visual entities used in the publication Learning Privacy from Visual Entities by A. Xompero and A. Cavallaro.
[arxiv][code]
Curated image privacy data sets
In the article, we trained and evaluated models on the Image Privacy Dataset (IPD) and the PrivacyAlert dataset. The datasets are originally provided by other sources and have been re-organised and curated for this work.
Our curation organises the datasets in a common structure. We updated the annotations and labelled the data splits in the annotation file. This avoids keeping a separate folder of images for each data split (training, validation, testing) and allows flexible handling of new splits, e.g. created with a stratified K-Fold cross-validation procedure. As with the original datasets (PicAlert and PrivacyAlert), we provide the links to the images in bash scripts that download them. Another bash script re-organises the images in sub-folders with a maximum of 1000 images in each folder.
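The sub-folder layout can be pictured with the short Python sketch below. It is not the bash script shipped with the record: the function names `batch_subfolder` and `plan_layout`, the `images` root, and the `batchN` folder naming are all assumptions made for illustration.

```python
from pathlib import Path


def batch_subfolder(image_index: int, max_per_folder: int = 1000) -> str:
    """Return a hypothetical sub-folder name for the image at a zero-based index."""
    if image_index < 0:
        raise ValueError("image_index must be non-negative")
    return f"batch{image_index // max_per_folder}"


def plan_layout(filenames, root="images", max_per_folder=1000):
    """Map each filename to a destination path, at most max_per_folder per folder."""
    ordered = sorted(filenames)  # deterministic order so the layout is reproducible
    return {
        name: Path(root) / batch_subfolder(i, max_per_folder) / name
        for i, name in enumerate(ordered)
    }
```

Calling `plan_layout` on 2500 file names would place them in three folders, the last one only partly filled. The real folder names used by the record may differ.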
Both datasets refer to images publicly available on Flickr. These images have a large variety of content, including sensitive content, semi-nude people, vehicle plates, documents, and private events. Images were annotated with a binary label denoting whether the content was deemed public or private. As the images are publicly available, their label is mostly public, and the datasets are therefore highly imbalanced towards the public class. Note that IPD combines two other existing datasets, PicAlert and part of VISPR, to increase the number of private images, which is limited in PicAlert. Further details are in our corresponding publication.
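Because the public class dominates, a new split would normally be stratified so that every fold keeps roughly the same public/private ratio. The sketch below shows one simple way to assign fold ids in plain Python. The function name `stratified_fold_ids` and the label encoding (0 for public, 1 for private) are assumptions, and this is not the procedure used in the publication.

```python
import random
from collections import defaultdict


def stratified_fold_ids(labels, n_folds=5, seed=0):
    """Assign a fold id to every sample so each fold keeps the class ratio.

    labels: one class label per image (assumed here: 0 = public, 1 = private).
    Returns a list of fold ids aligned with labels.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    folds = [None] * len(labels)
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[idx] = position % n_folds
    return folds
```

The returned fold ids could be stored as an extra column next to the existing split labels in an annotation file, so that no image has to be moved between folders when a new split is created.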
List of datasets and their original source:
- PicAlert [Images occupy 2.4 GB]
- VISPR [Images occupy 49.7 GB]
- PrivacyAlert [Images occupy 1 GB]
Notes:
- For PicAlert and PrivacyAlert, only the URLs to the original locations on Flickr are available in the Zenodo record
- The collectors and authors of the PrivacyAlert dataset selected the images from Flickr under a Public Domain license
- Owners of the photos on Flickr could have removed the photos from the social media platform
- Running the bash scripts to download the images can trigger the "429 Too Many Requests" status code
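A common way to cope with "429 Too Many Requests" is to wait and retry with an increasing delay. The sketch below illustrates that idea in Python using only the standard library. It is not part of the provided bash scripts, and the function name `fetch_with_backoff`, its parameters, and the default delays are hypothetical.

```python
import time
import urllib.error
import urllib.request


def _default_fetch(url):
    """Download the raw bytes behind a URL."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def fetch_with_backoff(url, fetch=_default_fetch, max_retries=5,
                       base_delay=1.0, sleep=time.sleep):
    """Fetch url, retrying with exponential backoff when the server answers 429."""
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not a rate limit, or the final failure.
            if err.code != 429 or attempt == max_retries:
                raise
            # Prefer the server's Retry-After hint (in seconds) when it is given.
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.isdigit():
                delay = float(retry_after)
            else:
                delay = base_delay * 2 ** attempt
            sleep(delay)
```

The `fetch` and `sleep` arguments are injectable so that the retry logic can be exercised without network access. Images that were removed by their owners would still fail, typically with a different status code that this sketch re-raises immediately.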
Pre-computed visual entities
Some of the models run their pipeline end-to-end with the images as input, whereas other models require different or additional inputs. These inputs include the pre-computed visual entities (scene types and object types) represented in a graph format, e.g. for a Graph Neural Network. Re-using these pre-computed visual entities allows other researchers to build new models on these features without re-computing them on their own or at each epoch during the training of a model (faster training).
For each image of each dataset, namely PrivacyAlert, PicAlert, and VISPR, we provide the predicted scene probabilities as a .csv file, the detected objects as a .json file in COCO data format, and the node features (visual entities already organised in graph format with their features) as a .json file. For consistency, all the files are already organised in batches following the structure of the images in the datasets folder. For each dataset, we also provide the pre-computed adjacency matrix for the graph data.
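As an illustration of working with object detections in COCO data format, the sketch below groups detections by image. It assumes the standard COCO results layout, a list of entries with the keys `image_id`, `category_id`, `bbox` and `score`. The exact layout of the files in this record is not confirmed here, and the function name `group_detections` and the inline example data are hypothetical.

```python
import json
from collections import defaultdict


def group_detections(coco_results, min_score=0.0):
    """Group COCO-style detection results by image id.

    coco_results: list of dicts with the standard COCO results keys
    "image_id", "category_id", "bbox" ([x, y, width, height]) and "score".
    Detections scoring below min_score are dropped.
    """
    per_image = defaultdict(list)
    for det in coco_results:
        if det.get("score", 1.0) >= min_score:
            per_image[det["image_id"]].append(det)
    return dict(per_image)


# Made-up inline data standing in for the content of one detections file.
example = json.loads("""
[
  {"image_id": 1, "category_id": 1, "bbox": [10, 20, 50, 80], "score": 0.92},
  {"image_id": 1, "category_id": 3, "bbox": [0, 0, 30, 30], "score": 0.40},
  {"image_id": 2, "category_id": 1, "bbox": [5, 5, 20, 40], "score": 0.75}
]
""")
grouped = group_detections(example, min_score=0.5)
```

With a real file, the inline string would be replaced by `json.load` on the opened .json file. If the file instead follows the full COCO annotation layout (a dictionary with an `annotations` list), that list would be passed to the function.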
Note: IPD is based on PicAlert and VISPR and therefore IPD refers to the scene probabilities and object detections of the other two datasets. Both PicAlert and VISPR must be downloaded and prepared to use IPD for training and testing.
Further details on downloading and organising data can be found in our GitHub repository: https://github.com/graphnex/privacy-from-visual-entities (see ARTIFACT-EVALUATION.md#pre-computed-visual-entitities-)
Enquiries, questions and comments
If you have any enquiries, questions, or comments, or you would like to file a bug report or a feature request, use the issue tracker of our GitHub repository.
Files
curated_imageprivacy_datasets.zip
Total size: 291.9 MB
Checksum | Size |
---|---|
md5:b603dde00eab35080833f0de37bd0043 | 30.9 MB |
md5:fa4ee8cd54b4297663541ce90b0d9996 | 21.9 kB |
md5:f3ddcacfa4aacb8942108145bae10edf | 9.3 MB |
md5:c27ebc63be72f539d08be4bdf2497b9d | 2.4 MB |
md5:6b51391c915affef807d826704c077b2 | 6.9 MB |
md5:e74eb97d794fd93d087718780092263d | 9.9 MB |
md5:dd771fd26e7ff421d9e7a9cc1c240e9b | 2.4 MB |
md5:5f7700402f3864f7e9727026ac1ca178 | 7.8 MB |
md5:d7b6a72578a298951befd17c9f657925 | 109.9 MB |
md5:91a3bab46e5e2afcff3eda0580bfcb06 | 26.4 MB |
md5:d1308e3bae70d028991ad836017d3e3d | 85.9 MB |
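The MD5 checksums in the file list can be used to check that a download completed without corruption. The sketch below shows one way to do this in Python with the standard `hashlib` module. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and the local file path is whatever name the downloaded file was saved under.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

Reading in chunks keeps memory use low for the larger archives. A mismatch usually means the download was interrupted and should be repeated.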
Additional details
Related works
- Is derived from
- Dataset: https://zenodo.org/record/6406870#.Y2KtsdLP3ow (URL)
- Dataset: https://tribhuvanesh.github.io/vpa/ (URL)
- Is supplement to
- Preprint: 10.48550/arXiv.2503.12464 (DOI)
Funding
- UK Research and Innovation
- GraphNEx: Graph Neural Networks for Explainable Artificial Intelligence EP/V062107/1
Software
- Repository URL
- https://github.com/graphnex/privacy-from-visual-entities
- Programming language
- Python
- Development Status
- Active
References
- C. Zhao, J. Mangat, S. Koujalgi, A. Squicciarini, and C. Caragea, "PrivacyAlert: A Dataset for Image Privacy Prediction", in Int. AAAI Conf. Web and Social Media, 2022
- B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba, "Places: A 10 Million Image Database for Scene Recognition", in IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 6, pp. 1452–1464, 2018
- S. Zerr, S. Siersdorfer, and J. Hare, "PicAlert! A System for Privacy-Aware Image Classification and Retrieval", in ACM Int. Conf. Information and Knowledge Management, 2012
- G. Yang, J. Cao, Z. Chen, J. Guo, and J. Li, "Graph-based Neural Networks for Explainable Image Privacy Inference", in Pattern Recognit., vol. 105, pp. 1–12, 2020
- D. Stoidis and A. Cavallaro, "Content-based Graph Privacy Advisor", in IEEE Int. Conf. Multimedia Big Data, 2022
- T. Orekondy, B. Schiele, and M. Fritz, "Towards a Visual Privacy Advisor: Understanding and Predicting Privacy Risks in Images", in Int. Conf. Comput. Vis., 2017