RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection

doi:10.5281/zenodo.11406538

Published June 4, 2024 | Version v2

Dataset Open

1. University of Technology Sydney
2. University of Sydney

This repository contains all the collected and aligned data for the RU-AI dataset. The dataset builds on three large, publicly available datasets (Flickr8K, COCO, and Places205) and adds, in each modality, the corresponding machine-generated pairs from five different generative models.

Files (157.5 GB)

Name	MD5	Size
audio_coco.tar.xz_part_0	41dd2164608af8997005398cc5e978b0	10.7 GB
audio_coco.tar.xz_part_1	df5df1c2e8a262cbf849073d161e624d	10.7 GB
audio_coco.tar.xz_part_2	fc3be1e89e734cdcb8bca109d7968e48	10.7 GB
audio_coco.tar.xz_part_3	d0f09a3ba026a496196cd4953f3aa3f1	6.9 GB
audio_flickr8k.tar.xz	636e374707470baba69d7470924fc519	2.9 GB
audio_place.tar.xz_part_0	16309468bc1fe7b3c38738e5f964565d	10.7 GB
audio_place.tar.xz_part_1	b91b2cb23015644cdd2e3d04f9164b3f	10.7 GB
audio_place.tar.xz_part_2	fbade3e026ab5d3e22b5b4aef6536cce	10.7 GB
audio_place.tar.xz_part_3	f4187a9734e334052ebf01be92515633	10.7 GB
audio_place.tar.xz_part_4	0826399efda3295ed2cf7011a5b7bc00	10.7 GB
audio_place.tar.xz_part_5	bed185fdc15532b1ff0a1a0f3aa74668	10.7 GB
audio_place.tar.xz_part_6	cfc2fcd13037a5ed9bcc76da6059cfb0	10.7 GB
audio_place.tar.xz_part_7	070ef65b1622d8394d1859c6819f3e7d	10.7 GB
audio_place.tar.xz_part_8	8b8350fa8d304d258ba6a4cad8fb64aa	5.5 GB
image_coco.tar.xz	5f8aa397f45233953542712acc6eb4f3	11.1 GB
image_flickr8k.tar.xz	c277624f7a7c035aa09f1a7a331922fd	1.3 GB
image_place.tar.xz	f5ce3152c811d94279a742cd20d5955f	11.5 GB
text_coco.tar.xz	46604fca9147759104bdc698d1df6449	14.3 MB
text_flickr8k.tar.xz	67055252003ff476ceecb9f1e4329b44	1.0 MB
text_place.tar.xz	e49557fe3031f6a8ee4444c47440fec6	27.5 MB
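
Because the larger archives are published as numbered parts with MD5 checksums, it may help to verify each download and rejoin the parts before extracting. The following is a minimal Python sketch, not the authors' tooling. It assumes the `_part_N` files are plain byte-wise splits of one `.tar.xz` archive, so that concatenating them in numeric order restores it; the record does not state how the parts were produced. The helper names (`md5_of`, `sorted_parts`, `join_parts`, `extract`) are hypothetical.

```python
import hashlib
import shutil
import tarfile
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sorted_parts(directory, stem):
    """List `<stem>_part_<n>` files in numeric order (part_10 after part_9)."""
    parts = Path(directory).glob(f"{stem}_part_*")
    return sorted(parts, key=lambda p: int(p.name.rsplit("_", 1)[1]))


def join_parts(part_paths, out_path):
    """Concatenate the parts, in the order given, into a single file.

    Assumption: the parts are a raw byte-wise split of the archive.
    """
    with open(out_path, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(out_path)


def extract(archive_path, dest_dir):
    """Extract a .tar.xz archive into dest_dir."""
    with tarfile.open(archive_path, "r:xz") as tar:
        tar.extractall(dest_dir)
```

For example, after downloading the four `audio_coco.tar.xz_part_*` files into one directory, one could compare `md5_of(...)` for `audio_coco.tar.xz_part_0` against the listed value `41dd2164608af8997005398cc5e978b0`, then call `join_parts(sorted_parts(directory, "audio_coco.tar.xz"), "audio_coco.tar.xz")` and pass the result to `extract`. If the checksums match but extraction fails, the byte-wise split assumption is probably wrong and the repository README should be consulted.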

Additional details

DOI: 10.5281/zenodo.11384492

References:
Dataset: arXiv:1405.0312 (arXiv)
Dataset: 10.1613/jair.3994 (DOI)
Dataset: https://proceedings.neurips.cc/paper_files/paper/2014/file/3fe94a002317b5f9259f82690aeea4cd-Paper.pdf (URL)
Dataset: https://openaccess.thecvf.com/content_ECCV_2018/papers/David_Harwath_Jointly_Discovering_Visual_ECCV_2018_paper.pdf (URL)
Dataset: https://iclr.cc/virtual_2020/poster_B1elCp4KwH.html (URL)

Available: 2024-06-01

Repository URL: https://github.com/ZhihaoZhang97/RU-AI
Programming language: Python
Development Status: Active

Microsoft COCO: Common Objects in Context
Framing image description as a ranking task: data, models and evaluation metrics
Learning Deep Features for Scene Recognition using Places Database
Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input
Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech

