AiGen-FoodReview: A Multimodal Dataset of Machine-Generated Restaurant Reviews and Images on Social Media

doi:10.5281/zenodo.10511456

Published January 14, 2024 | Version v1

Dataset Open

The dataset is divided into 3 splits: training (60%), validation (20%), and test (20%), named train.csv, val.csv, and test.csv, respectively.

Images are available in images.zip in .jpg format. Each image is named "ID.jpg", with the ID mapping to each review.
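As a minimal sketch of how a split and its images could be paired, the snippet below reads one CSV file and attaches the expected image path to each row. It assumes that the CSV files have a header row using the variable names listed in this description, that they are UTF-8 encoded, that label is stored as a plain 0 or 1, and that images.zip has been extracted into a local folder (the folder name "images" and the helper name load_split are hypothetical).

```python
import csv
from pathlib import Path


def load_split(csv_path, image_dir):
    """Read one split and attach the expected image path to every row.

    Assumes a header row with at least the columns ID and label.
    """
    rows = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # 0 = authentic, 1 = machine-generated
            row["label"] = int(row["label"])
            # Each image is named "ID.jpg"
            row["image_path"] = Path(image_dir) / f"{row['ID']}.jpg"
            rows.append(row)
    return rows


if __name__ == "__main__":
    # Hypothetical local paths; adjust to wherever the files were saved.
    train = load_split("train.csv", "images")
    print(len(train), "training reviews loaded")
```

All other columns are left as strings here; converting the readability and image features to float is left to the caller.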

Variables description:

ID: Unique identifier. It maps to each review, either generated or authentic, and to each image.
text: Review text.
label: Binary label indicating the class (0=authentic, 1=machine-generated).
automated_readability_index: Approximate US grade level needed to comprehend the text.
difficult_words: Number of difficult words according to the Dale-Chall word list.
flesch_reading_ease: Score on a scale from 0 to 100, with higher scores indicating easier readability.
gunning_fog: Years of formal education a person needs to understand a text easily.
words_per_sentence: Average number of words per sentence.
reading_time: Reading time.
ppl: Perplexity score from zero-shot GPT-Neo 125M.
bright: Brightness. Average of V of the HSV image representation.
sat: Saturation. Color intensity and purity of an image. Average of S of the HSV image representation.
clar: Clarity. Well-defined objects in space. % of pixels whose normalized V (of the HSV image representation) exceeds 0.7.
cont: Contrast. Spread of illumination. Standard deviation of V of the HSV image representation.
warm: Warmth. Warm colors: from red to yellow. % of pixels with H < 60 or H > 220 in the HSV image representation.
colorf: Colorfulness. Departure from a grey-scale image.
sd: Size difference. Difference in the number of pixels between the figure and the ground.
cd: Color difference. Difference of Euclidean distance between the figure and ground (RGB vectors).
td: Texture difference. Absolute difference between the foreground and background edge density.
diag_dom: Diagonal dominance. Manhattan distance between salient region and each diagonal.
rot: Rule of thirds. Minimum distance between center of salient region and each of the four intersection points.
hpvb: Horizontal physical visual balance. Split image horizontally. Horizontal physical symmetry (mirroring).
vpvb: Vertical physical visual balance. Split image vertically. Vertical physical symmetry (mirroring).
hcvb: Horizontal color visual balance. Split image horizontally. Horizontal mirrored Euclidean cross-pixels distance.
vcvb: Vertical color visual balance. Split image vertically. Vertical mirrored Euclidean cross-pixels distance.
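
The HSV-based variables (bright, sat, cont, clar, warm) can be illustrated with a short sketch. This is not the dataset authors' code: it assumes H is measured in degrees (0 to 360), that S and V are normalized to [0, 1], that contrast is the population standard deviation, and that clar and warm are returned as fractions rather than percentages. The function name hsv_features is hypothetical.

```python
import colorsys
import statistics


def hsv_features(rgb_pixels):
    """Compute bright, sat, cont, clar and warm from (r, g, b) tuples in 0..255."""
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in rgb_pixels]
    hues = [h * 360 for h, _, _ in hsv]  # assumption: hue in degrees
    sats = [s for _, s, _ in hsv]
    vals = [v for _, _, v in hsv]
    n = len(hsv)
    return {
        "bright": statistics.fmean(vals),                  # average of V
        "sat": statistics.fmean(sats),                     # average of S
        "cont": statistics.pstdev(vals),                   # standard deviation of V
        "clar": sum(v > 0.7 for v in vals) / n,            # share of pixels with V > 0.7
        "warm": sum(h < 60 or h > 220 for h in hues) / n,  # share of warm-hue pixels
    }


if __name__ == "__main__":
    # Two-pixel toy image: one pure red pixel and one black pixel.
    print(hsv_features([(255, 0, 0), (0, 0, 0)]))
```

The pixel list can come from any image library that exposes RGB values. Values computed this way may differ from the columns in the CSV files if the original pipeline used a different hue scale or normalization.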

Files

Files (1.3 GB)

Name	MD5	Size
images.zip	c4e484d8f709d3943b7aff2f48f581c7	1.3 GB
LICENSE	ca41b74290efd78ea8511816bd0f64ba	1.1 kB
metadata.txt	cea553f83369c16374fb580ad03e7901	2.4 kB
test.csv	c41a361eafa9800e3e8be3362a0101f4	4.9 MB
train.csv	a4afa633ef14d4a8bae7fb4781e0d6ec	14.7 MB
val.csv	d6d18216e7cabf9a9b956309d4d8e06f	4.9 MB
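
The MD5 checksums in the file listing can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library; the helper names md5_of_file and verify are hypothetical, and the files are assumed to sit in the current working directory.

```python
import hashlib

# Checksums copied from the file listing.
EXPECTED_MD5 = {
    "images.zip": "c4e484d8f709d3943b7aff2f48f581c7",
    "LICENSE": "ca41b74290efd78ea8511816bd0f64ba",
    "metadata.txt": "cea553f83369c16374fb580ad03e7901",
    "test.csv": "c41a361eafa9800e3e8be3362a0101f4",
    "train.csv": "a4afa633ef14d4a8bae7fb4781e0d6ec",
    "val.csv": "d6d18216e7cabf9a9b956309d4d8e06f",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, name):
    """Compare a local file against the listed checksum for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]


if __name__ == "__main__":
    for name in EXPECTED_MD5:
        try:
            status = "ok" if verify(name, name) else "MISMATCH"
        except FileNotFoundError:
            status = "not found"
        print(f"{name}: {status}")
```

Reading in chunks keeps memory use low for the 1.3 GB images.zip archive.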
