Published February 23, 2025 | Version v1
Dataset Open

Artifacts for "Did I Vet You Before? Assessing the Chrome Web Store Vetting Process through Browser Extension Similarity"

Description

This record contains the artifacts for the aforementioned paper.

All artifacts are provided in Apache Parquet format. We recommend using Python's pandas library to open these files.

The following is a summary of the provided artifacts:

  • ground-truth.parquet: Contains pairs of extensions used as ground truth for evaluating the pipeline employed in our research.
  • infringing-extensions.parquet: Metadata of vetted extensions labeled by Google and infringing extensions found by our pipeline. Includes the cluster assigned by HDBSCAN and the UMAP 2D coordinates used for visualizations.
  • embeddings.parquet: Vector embeddings of infringing extensions generated by our pipeline.
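
As a starting point for exploring the files listed above, the following minimal sketch loads each artifact with pandas and reports its shape and column names. It assumes the three files have been downloaded into a single directory and that a Parquet engine such as pyarrow is installed alongside pandas. The helper name summarize_artifacts is hypothetical, and no column names are assumed, since the schemas are not described in this record.

```python
from pathlib import Path

import pandas as pd

# File names as listed in this record's description.
ARTIFACTS = (
    "ground-truth.parquet",
    "infringing-extensions.parquet",
    "embeddings.parquet",
)


def summarize_artifacts(directory="."):
    """Load each artifact found in `directory` and describe its layout.

    Hypothetical helper: files that are not present are skipped, so the
    result only covers what has actually been downloaded.
    """
    summary = {}
    for name in ARTIFACTS:
        path = Path(directory) / name
        if not path.is_file():
            continue
        # Requires a Parquet engine (pyarrow or fastparquet) to be installed.
        df = pd.read_parquet(path)
        summary[name] = {"shape": df.shape, "columns": list(df.columns)}
    return summary


if __name__ == "__main__":
    for name, info in summarize_artifacts().items():
        print(name, info["shape"], info["columns"])
```

The record totals 1.1 GB, so loading every file at once may need a comparable amount of memory. Passing the columns argument to pd.read_parquet restricts loading to the columns of interest once their names are known.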

Files

Files (1.1 GB)

  • md5:52697059fb22af793ab99270d59f5258 (1.1 GB)
  • md5:d37b9041400bec087a1183e9e33190a0 (29.8 kB)
  • md5:bb7ee81958630f15dcfd8465301ca35b (7.2 MB)
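
The MD5 checksums in the file listing can be used to confirm that a download completed without corruption. The sketch below uses only Python's standard hashlib module. The function names md5_of_file and matches_checksum are hypothetical, and because the listing does not show which file name belongs to which checksum, the expected value is passed in by the caller.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with a value such as 'md5:52697059...'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, matches_checksum("embeddings.parquet", "md5:...") returns True only if the local file's digest equals the given value. MD5 is adequate for detecting accidental corruption but should not be treated as protection against deliberate tampering.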