Published June 4, 2024 | Version v2
Dataset | Open access

RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection

  • 1. University of Technology Sydney
  • 2. University of Sydney

Description

This repository contains all the collected and aligned data for the RU-AI dataset. It extends three large, publicly available datasets (Flickr8K, COCO, and Places205) with corresponding machine-generated content produced by five different generative models in each modality.

Files

Files (157.5 GB)

  • md5:41dd2164608af8997005398cc5e978b0 (10.7 GB)
  • md5:df5df1c2e8a262cbf849073d161e624d (10.7 GB)
  • md5:fc3be1e89e734cdcb8bca109d7968e48 (10.7 GB)
  • md5:d0f09a3ba026a496196cd4953f3aa3f1 (6.9 GB)
  • md5:636e374707470baba69d7470924fc519 (2.9 GB)
  • md5:16309468bc1fe7b3c38738e5f964565d (10.7 GB)
  • md5:b91b2cb23015644cdd2e3d04f9164b3f (10.7 GB)
  • md5:fbade3e026ab5d3e22b5b4aef6536cce (10.7 GB)
  • md5:f4187a9734e334052ebf01be92515633 (10.7 GB)
  • md5:0826399efda3295ed2cf7011a5b7bc00 (10.7 GB)
  • md5:bed185fdc15532b1ff0a1a0f3aa74668 (10.7 GB)
  • md5:cfc2fcd13037a5ed9bcc76da6059cfb0 (10.7 GB)
  • md5:070ef65b1622d8394d1859c6819f3e7d (10.7 GB)
  • md5:8b8350fa8d304d258ba6a4cad8fb64aa (5.5 GB)
  • md5:5f8aa397f45233953542712acc6eb4f3 (11.1 GB)
  • md5:c277624f7a7c035aa09f1a7a331922fd (1.3 GB)
  • md5:f5ce3152c811d94279a742cd20d5955f (11.5 GB)
  • md5:46604fca9147759104bdc698d1df6449 (14.3 MB)
  • md5:67055252003ff476ceecb9f1e4329b44 (1.0 MB)
  • md5:e49557fe3031f6a8ee4444c47440fec6 (27.5 MB)

Additional details

Dates

Available: 2024-06-01

Software

Repository URL: https://github.com/ZhihaoZhang97/RU-AI
Programming language: Python
Development status: Active

References

  • Microsoft COCO: Common Objects in Context
  • Framing image description as a ranking task: data, models and evaluation metrics
  • Learning Deep Features for Scene Recognition using Places Database
  • Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input
  • Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech