Datasets and code for ClustMe and ClustML visual quality measures of grouping patterns in monochrome scatterplots

Abbas, Mostafa M; Ullah, Ehsan; Baggag, Abdelkader; Bensmail, Halima; Sedlmair, Michael; Aupetit, Michaël

doi:10.5281/zenodo.10208144

Published November 27, 2023 | Version v1

Dataset Open

1. Geisinger Health System
2. Qatar Computing Research Institute
3. Hamad bin Khalifa University
4. University of Stuttgart

Code and datasets S1 and S2 used in the paper "ClustMe: A Visual Quality Measure for Ranking Monochrome Scatterplots based on Cluster Patterns", Computer Graphics Forum 38(3): 225-236 (2019), and in "ClustML: A Measure of Cluster Pattern Complexity in Scatterplots Learnt from Human-labeled Groupings", to appear in the SAGE Information Visualization Journal.

Contains the main ClustML function (ClustML_Pipeline() in ClustML_VQM.R), which fits a GMM to scatterplot (x,y) data and computes the ClustML score. It uses treebag_up_PP_PCA_BoxCox_SpatialSign.RData, a CARET classification model, to make the pairwise merging decisions. This model is the best one obtained by training on 2-component GMMs that 34 human subjects judged as containing one or more-than-one cluster.

/DATASETS

Contains datasets from study S1 and S2, with ClustML (CARET model) results and human judgments.

Scatterplot stimuli can be plotted using the "plotSP" function from plotDataXY.R (see the example in that code).

./DATA_S1_ORIGINAL_PARAMETER_JUDGEMENT_DATA

1000_2gaussians_param_34judgment_ClustMe_EXP1.csv contains the 34 human judgments of each of 1000 2-component GMM scatterplots and the 8 parameters used to generate a sample from these GMMs.

"XYposCSVfilename": name of the file in ../DATA_S1_ORIGINAL_Scatterplots_IMG_ClustMe

"Nsample": sample size generated from the GMM = number of points in the scatterplot.

"MuA1","MuA2": mean along axes 1 and 2 of component A of the GMM

"SigmaA1","SigmaA2": variance along axes 1 and 2 of component A of the GMM

"ThetaA": angle of the component A of the GMM

"MuB1","MuB2": mean along axes 1 and 2 of component B of the GMM

"SigmaB1","SigmaB2": variance along axes 1 and 2 of component B of the GMM

"ThetaB": angle of the component B of the GMM

"Tau": proportion of component A

"Alpha": rotation from horizontal of the full mixture

"Score_1",...,"Score_34": Human judgment (1 = see one cluster, 2 = see more-than-one cluster)

"probMore","probSingle": proportion of judgments seeing more-than-one/one clusters
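As a sanity check on the judgment columns, the two proportions can be recomputed from the raw scores. The sketch below is in Python and uses only the standard library. It assumes that "probMore" is the fraction of Score_* cells equal to 2 and "probSingle" the fraction equal to 1, as the column descriptions above suggest. The helper name judgment_proportions and the tiny inline table (four score columns instead of 34, made-up values) are illustrative and not part of the dataset.

```python
import csv
import io


def judgment_proportions(row):
    """Return (prob_more, prob_single) for one CSV row given as a dict.

    Assumes every column whose name starts with "Score_" holds 1 (one
    cluster seen) or 2 (more than one cluster seen).
    """
    scores = [int(float(v)) for k, v in row.items() if k.startswith("Score_")]
    if not scores:
        raise ValueError("row has no Score_* columns")
    prob_more = sum(s == 2 for s in scores) / len(scores)
    prob_single = sum(s == 1 for s in scores) / len(scores)
    return prob_more, prob_single


# Made-up rows with 4 score columns; the real file has Score_1..Score_34.
SAMPLE_CSV = """XYposCSVfilename,Score_1,Score_2,Score_3,Score_4
a.csv,1,2,2,2
b.csv,1,1,1,1
"""

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
proportions = {r["XYposCSVfilename"]: judgment_proportions(r) for r in reader}
for name, (p_more, p_single) in proportions.items():
    print(name, p_more, p_single)
```

To run this on the real file, replace io.StringIO(SAMPLE_CSV) with open(path, newline=""), where path points to 1000_2gaussians_param_34judgment_ClustMe_EXP1.csv, and compare the result with the stored "probMore" and "probSingle" columns.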

./DATA_S1_ORIGINAL_Scatterplots_IMG_ClustMe

PNG image files of the stimuli shown to the human subjects, whose filenames are used in ../DATA_S1_ORIGINAL_PARAMETER_JUDGEMENT_DATA/1000_2gaussians_param_34judgment_ClustMe_EXP1.csv

./DATA_S1_ORIGINAL_Scatterplots_XY_ClustMe

Each zzzz.csv file contains the x and y coordinates of the points displayed in the file zzzz.png stored in folder ../DATA_S1_ORIGINAL_Scatterplots_IMG_ClustMe

./DATA_S2

Data used in Study S2

Data_257.RData: contains the list of filenames and the x,y positions of the points of the scatterplot stimuli.

Data257_435pairwiseRanking_CARETmodels.csv /.RData: rankings given by ClustML using various CARET models, trained on S1 data, as merging classifiers.

Data257_435pairwiseRanking_31HumanJudgments.csv /.RData: rankings from the judgments of 31 human subjects.

Each row name is filename1@@@@@filename2, where filename1 and filename2 correspond to names in Data_257.

Each cell contains the filename that the model or subject in the column header judged as showing the more complex cluster patterns, or BOTH if the two were judged to be of similar complexity.
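The row-name and cell conventions above can be turned into a simple per-scatterplot tally. The following Python sketch is one possible way to do it, not the authors' analysis. It assumes the separator is exactly five "@" characters as written above, and that each cell is literally one of the two filenames or the string BOTH. The function names split_pair and tally_wins, the half-point rule for BOTH, and the example judgments are all made up for illustration.

```python
from collections import defaultdict

SEPARATOR = "@@@@@"  # as written in the row-name description above


def split_pair(row_name):
    """Split 'filename1@@@@@filename2' into (filename1, filename2)."""
    first, sep, second = row_name.partition(SEPARATOR)
    if not sep or not first or not second:
        raise ValueError(f"not a pair row name: {row_name!r}")
    return first, second


def tally_wins(judgments):
    """Count, per scatterplot, how often it was judged the more complex one.

    `judgments` maps a row name to the cell value of one model/subject
    column. A 'BOTH' cell gives half a point to each file (a choice made
    for this sketch, not something the dataset prescribes).
    """
    wins = defaultdict(float)
    for row_name, choice in judgments.items():
        first, second = split_pair(row_name)
        # Make sure both files appear in the result even with zero wins.
        wins[first] += 0.0
        wins[second] += 0.0
        if choice == "BOTH":
            wins[first] += 0.5
            wins[second] += 0.5
        elif choice in (first, second):
            wins[choice] += 1.0
        else:
            raise ValueError(f"unexpected cell {choice!r} for {row_name!r}")
    return dict(wins)


# Made-up judgments of a single subject over three pairs.
example = {
    "p1.csv@@@@@p2.csv": "p1.csv",
    "p1.csv@@@@@p3.csv": "BOTH",
    "p2.csv@@@@@p3.csv": "p3.csv",
}
wins = tally_wins(example)
print(wins)
```

Applied to one column of either pairwise ranking CSV, the same tally gives a rough complexity ordering of the scatterplots for that model or subject.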

/DEMO

Run Demo_ClustML_VQM.R to demonstrate how to use the ClustML_Pipeline function to compute the ClustML score of a scatterplot.

Files

Name	Size
ClustML.zip (md5:c0de5f090312a51769460fdcb34b04f3)	3.0 GB
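Since the archive is about 3 GB, it may be worth checking the download against the MD5 value listed above before unpacking. A minimal Python sketch using the standard hashlib module follows; the function names md5_of_file and verify_download are illustrative, and the file is read in chunks so it never has to fit in memory.

```python
import hashlib

# MD5 value as shown in the file listing above.
EXPECTED_MD5 = "c0de5f090312a51769460fdcb34b04f3"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` has the expected MD5 digest."""
    return md5_of_file(path) == expected.lower()
```

For example, verify_download("ClustML.zip") should return True for an intact download, assuming the archive was saved under that name in the current directory.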

Additional details

Is described by: Publication: 10.1111/cgf.13684 (DOI)
Is referenced by: Publication: 10.1109/VISUAL.2019.8933620 (DOI); Preprint: 10.48550/arXiv.2106.00599 (DOI); Publication: 10.1109/TVCG.2023.3327201 (DOI); Preprint: 10.48550/arXiv.2209.10042 (DOI)

Available: 2023-11-27

