Published November 13, 2023 | Version 1.0.0 [with-results]
Software Open

cognitivefactory/interactive-clustering-comparative-study

Authors/Creators

Description

Interactive Clustering: Comparative Studies

Several comparative studies of cognitivefactory-interactive-clustering functionalities on NLP datasets.

  • GitHub repository: https://github.com/cognitivefactory/interactive-clustering-comparative-study/tree/1.0.0

Quick description of Interactive Clustering

Interactive clustering is a method intended to assist in the design of a training data set.

This iterative process begins with an unlabeled dataset and repeats a sequence of two substeps:

  1. the user defines constraints on data sampled by the machine;
  2. the machine performs data partitioning using a constrained clustering algorithm.

Thus, at each step of the process:

  • the user corrects the clustering of the previous steps using constraints, and
  • the machine offers a corrected and more relevant data partitioning for the next step.
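
The loop described above can be sketched in a few lines of Python. This is only an illustrative toy, not the cognitivefactory-interactive-clustering API: the function names, the MUST_LINK / CANNOT_LINK vocabulary, the random pair sampling, and the simulated user (who answers from a known ground truth) are all assumptions. The "constrained clustering" stand-in simply groups items connected by must-link constraints and ignores cannot-link constraints.

```python
import random
from itertools import combinations


def toy_constrained_clustering(n_items, must_link):
    """Toy stand-in for a constrained clustering algorithm (assumption):
    items connected by must-link constraints share a cluster (union-find)."""
    parent = list(range(n_items))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for a, b in must_link:
        parent[find(a)] = find(b)
    return [find(i) for i in range(n_items)]


def run_interactive_loop(truth, batch_size=5, max_iter=50, seed=0):
    """Alternate between simulated annotation and clustering."""
    rng = random.Random(seed)
    n_items = len(truth)
    pending = list(combinations(range(n_items), 2))
    rng.shuffle(pending)  # hypothetical sampling strategy: random pairs
    must_link, cannot_link = [], []
    clusters = toy_constrained_clustering(n_items, must_link)
    for _ in range(max_iter):
        batch, pending = pending[:batch_size], pending[batch_size:]
        if not batch:
            break  # every pair has been annotated
        # Substep 1: the (simulated) user annotates the sampled pairs.
        for a, b in batch:
            if truth[a] == truth[b]:
                must_link.append((a, b))
            else:
                cannot_link.append((a, b))
        # Substep 2: the machine re-partitions the data under the constraints.
        clusters = toy_constrained_clustering(n_items, must_link)
    return clusters, must_link, cannot_link


if __name__ == "__main__":
    clusters, ml, cl = run_interactive_loop([0, 0, 1, 1, 2, 2])
    print(clusters, len(ml), len(cl))
```

In a real setting the user would annotate far fewer pairs than the exhaustive set used here, and the clustering step would exploit both constraint types.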

Description of studies

Several studies are provided here:

  1. efficiency: Aims to confirm the technical efficiency of the method by verifying its convergence to a ground truth and by finding the implementation that converges fastest.
  2. computation time: Aims to estimate the time needed for algorithms to reach their objectives.
  3. annotation time: Aims to estimate the time needed to annotate constraints.
  4. constraints number: Aims to estimate the number of constraints needed to obtain a relevant annotated dataset.
  5. relevance: Aims to confirm the relevance of clustering results.
  6. profitability: Aims to predict whether one more iteration is worth its cost.
  7. inter annotator: Aims to estimate the inter-annotator agreement score during constraint annotation.
  8. annotation errors and conflicts fix: Aims to evaluate the impact of annotation errors and to verify the importance of fixing conflicts during labeling.
  9. annotation subjectivity: Aims to estimate the impact of labeling differences on clustering results.
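
As an illustration of what an inter-annotator score (study 7) can look like, here is a minimal sketch of Cohen's kappa computed on two annotators' constraint labels. This page does not say which agreement measure the study actually uses, so the choice of kappa, the function name, and the MUST_LINK / CANNOT_LINK label values are assumptions.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one single identical label
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    annotator_1 = ["MUST_LINK", "MUST_LINK", "CANNOT_LINK", "CANNOT_LINK"]
    annotator_2 = ["MUST_LINK", "CANNOT_LINK", "CANNOT_LINK", "CANNOT_LINK"]
    print(cohen_kappa(annotator_1, annotator_2))
```

A value of 1 means perfect agreement, and a value near 0 means agreement no better than chance.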

Results

All results are zipped in .tar.gz files and versioned on Zenodo: Schild, E. (2021). cognitivefactory/interactive-clustering-comparative-study. Zenodo. https://doi.org/10.5281/zenodo.5648255.

Warning: these experiments can take up a large amount of disk space and contain hundreds or even thousands of files (one per execution attempt). Check the table below before extracting the files.

| STUDY NAME                        | FOLDER SIZE | .tar.gz FILE SIZE |
|-----------------------------------|------------:|------------------:|
| 1_efficience_study                |      1.4 GB |            0.7 GB |
| 2_computation_time_study          |      1.1 GB |            0.1 GB |
| 3_annotation_time_study           |      0.1 GB |            0.1 GB |
| 4_constraints_number_study        |     12.0 GB |            2.7 GB |
| 5_relevance_study                 |      0.1 GB |            0.1 GB |
| 6_rentability_study               |      1.3 GB |            0.1 GB |
| 7_inter_annotators_score_study    |      0.1 GB |            0.1 GB |
| 8_annotation_error_fix_study      |     28.0 GB |            3.5 GB |
| 9_annotation_subjectivity_study   |     82.0 GB |           11.3 GB |
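
Given the folder sizes above, it can be worth checking free disk space before extracting an archive. The following is a minimal sketch using only the Python standard library; the function name, the 10% safety margin, and the example paths are assumptions, and listing the members of a large .tar.gz requires reading the whole archive once, which can take a while.

```python
import shutil
import tarfile
from pathlib import Path


def extract_if_space(archive_path, dest_dir, margin=1.1):
    """Extract a .tar.gz only if the destination has enough free space.

    Returns the total uncompressed size of regular files, in bytes.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Sum the uncompressed sizes of the regular files in the archive.
        needed = sum(m.size for m in tar.getmembers() if m.isfile())
        free = shutil.disk_usage(dest).free
        if needed * margin > free:
            raise OSError(
                f"Need about {needed} bytes (plus margin), only {free} bytes free."
            )
        # Use the safer 'data' extraction filter when this Python provides it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return needed


if __name__ == "__main__":
    # Hypothetical paths: adapt to the archive you downloaded.
    extract_if_space("1_efficience_study.tar.gz", "results/")
```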

Associated PhD report

Schild, E. (2024, in press). De l'Importance de Valoriser l'Expertise Humaine dans l'Annotation : Application à la Modélisation de Textes en Intentions à l'aide d'un Clustering Interactif. Université de Lorraine.

How to cite

Schild, E. (2021). cognitivefactory/interactive-clustering-comparative-study. Zenodo. https://doi.org/10.5281/zenodo.5648255

Files

cognitivefactory/interactive-clustering-comparative-study-1.0.0.zip

Files (19.0 GB)

| MD5 CHECKSUM                         |      SIZE |
|--------------------------------------|----------:|
| md5:01b4418fb97f2389d712cfd9fc532941 |  764.1 MB |
| md5:49b54648d00f596f51816c5ac7475496 |   42.7 MB |
| md5:31e879e6f7298aa754b017686fe393cd |    5.2 MB |
| md5:0c42a038b545db08fe02a192b6a3d3b5 |    2.7 GB |
| md5:e4edbad2e87d03e4dc5d4527dcc21bd0 |   94.2 MB |
| md5:6f99c0bf2cbb2501b91248f87df87bcc |  172.8 MB |
| md5:9069a4bd38cc78c85a62b427b9d514f1 |    5.0 MB |
| md5:58f55fcd39f1b31bea51a08482fa8ade |    3.5 GB |
| md5:1a531d004dd04ceb0301350305e510d1 |   11.6 GB |
| md5:22029c6b061e3ea959cb576394e50123 |   53.4 MB |
| md5:250484c697ec0d328479f0a0d7ab9e8e | 823 Bytes |
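
The MD5 checksums listed above can be used to verify a download before extracting it. Below is a minimal sketch with Python's standard `hashlib`; the function names are illustrative, and because file names are not shown in this listing, the path in the usage example is hypothetical and must be matched to the right checksum by hand.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with an expected value such as 'md5:01b4...'."""
    # Accept values with or without the 'md5:' prefix used in the listing.
    expected_hex = expected.strip().lower().split(":", 1)[-1]
    return md5_of_file(path) == expected_hex


if __name__ == "__main__":
    # Hypothetical file name: use the archive you actually downloaded.
    ok = matches_checksum(
        "1_efficience_study.tar.gz", "md5:01b4418fb97f2389d712cfd9fc532941"
    )
    print("checksum OK" if ok else "checksum mismatch")
```

Reading in chunks keeps memory usage flat even for the multi-gigabyte archives.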
