Published January 14, 2024 | Version v1
Dataset Open

AiGen-FoodReview: A Multimodal Dataset of Machine-Generated Restaurant Reviews and Images on Social Media

Description

The dataset is divided into 3 splits: training (60%), validation (20%), and test (20%), named train.csv, val.csv, and test.csv, respectively. 

Images are available in images.zip in .jpg format. Each image is named "ID.jpg", with the ID mapping to each review.
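The mapping between reviews and images can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' loading code. It assumes that images.zip has been extracted into a local directory called "images" (a hypothetical name), that the CSV files are UTF-8 encoded, and that the identifier column header is spelled "ID" as in the variable list.

```python
import csv
from pathlib import Path


def load_split(csv_path):
    """Read one split file (e.g. train.csv) into a list of dicts, one per review."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def image_path(review_id, image_dir="images"):
    """Build the path of the image that belongs to a review ID ("ID.jpg")."""
    # "images" is an assumed extraction directory, not a name given by the dataset.
    return Path(image_dir) / f"{review_id}.jpg"


def attach_images(rows, image_dir="images"):
    """Return copies of the rows with an added 'image_path' entry keyed on 'ID'."""
    return [
        {**row, "image_path": str(image_path(row["ID"], image_dir))}
        for row in rows
    ]
```

Note that csv.DictReader returns every value as a string, so the label and the numeric variables need to be converted before modelling.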

Variables description: 

ID: Unique identifier. It maps to each review, either generated or authentic, and to each image. 
text: Review text.
label: Binary label indicating the class (0=authentic, 1=machine-generated).
automated_readability_index: Approximate US grade level needed to comprehend the text.
difficult_words: Number of difficult words according to the Dale-Chall word list.
flesch_reading_ease: Score on a scale from 0 to 100, with higher scores indicating easier readability.
gunning_fog: Years of formal education a person needs to understand a text easily.
words_per_sentence: Average number of words per sentence. 
reading_time: Reading time.
ppl: Perplexity score from zero-shot GPT-Neo 125M.
bright: Brightness. Average of V of the HSV image representation.
sat: Saturation. Color intensity and purity of an image. Average of S of the HSV image representation.
clar: Clarity. Well-defined objects in space. % of pixels whose normalized V (HSV image representation) exceeds 0.7.
cont: Contrast. Spread of illumination. Standard deviation of V of the HSV image representation.
warm: Warmth. Warm colors: from red to yellow. % of pixels with H < 60 or H > 220 in the HSV image representation.
colorf: Colorfulness. Departure from a grey-scale image.
sd: Size difference. Difference in the number of pixels between the figure and the ground.
cd: Color difference. Euclidean distance between the figure and the ground (RGB vectors).
td: Texture difference. Absolute difference between the foreground and background edge density.
diag_dom: Diagonal dominance. Manhattan distance between salient region and each diagonal.
rot: Rule of thirds. Minimum distance between center of salient region and each of the four intersection points.
hpvb: Horizontal physical visual balance. Split image horizontally. Horizontal physical symmetry (mirroring).
vpvb: Vertical physical visual balance. Split image vertically. Vertical physical symmetry (mirroring).
hcvb: Horizontal color visual balance. Split image horizontally. Horizontal mirrored Euclidean cross-pixels distance.
vcvb: Vertical color visual balance. Split image vertically. Vertical mirrored Euclidean cross-pixels distance.
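The HSV-based variables above (bright, sat, cont, clar, warm) can be illustrated with a short sketch. It is not the authors' extraction code and may not reproduce the released values exactly. It assumes S and V are scaled to the range 0 to 1, H is expressed in degrees from 0 to 360, contrast is the population standard deviation, and clar and warm are returned as fractions rather than percentages. The function name hsv_features is hypothetical.

```python
import colorsys
import statistics


def hsv_features(rgb_pixels):
    """Compute bright, sat, cont, clar and warm from (R, G, B) tuples in 0-255."""
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in rgb_pixels]
    hues = [h * 360 for h, _, _ in hsv]  # colorsys returns H in [0, 1]; assume degrees
    sats = [s for _, s, _ in hsv]
    vals = [v for _, _, v in hsv]
    n = len(hsv)
    return {
        "bright": sum(vals) / n,                 # average of V
        "sat": sum(sats) / n,                    # average of S
        "cont": statistics.pstdev(vals),         # standard deviation of V (assumed population)
        "clar": sum(v > 0.7 for v in vals) / n,  # share of pixels with V above 0.7
        "warm": sum(h < 60 or h > 220 for h in hues) / n,  # share of warm-hue pixels
    }
```

The input can be any iterable of RGB tuples produced by an image library. One consequence of this sketch is that grey pixels, which colorsys reports with a hue of 0, count as warm; whether the dataset treats them the same way is not stated.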

Files

images.zip

Files (1.3 GB)

Size MD5
1.3 GB md5:c4e484d8f709d3943b7aff2f48f581c7
1.1 kB md5:ca41b74290efd78ea8511816bd0f64ba
2.4 kB md5:cea553f83369c16374fb580ad03e7901
4.9 MB md5:c41a361eafa9800e3e8be3362a0101f4
14.7 MB md5:a4afa633ef14d4a8bae7fb4781e0d6ec
4.9 MB md5:d6d18216e7cabf9a9b956309d4d8e06f
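A downloaded file can be checked against its listed MD5 value with a few lines of standard-library Python. This is a minimal sketch; the helper names md5_of_file and matches_checksum are hypothetical, and the caller must pair each local file with the checksum shown for it on the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed value, with or without the 'md5:' prefix."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use flat even for the 1.3 GB archive.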