Published January 25, 2025 | Version v1

Artifacts for: LLMmap: Fingerprinting for Large Language Models

Description

# Implementations of LLMmap

We include the weights of the models used to collect the main results in the paper, as well as the code necessary to run them.

In particular, models are stored in the directory `./data/models/`:

- `closed_set_8q`: The closed-set model used in Table 2, Figures 5, 7, 8, and G.1.
- `open_set_8q`: The open-set model used in Tables 2 and D.1, Figures 6 and C.1, and Appendix B.

Model weights are stored in the standard `keras` format. We include the necessary code to run these models in `./LLMmap`. The script `./main_interactive` acts as the main entry point for all the models.
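As an illustration, loading one of the released models might look like the sketch below. The helper name `load_fingerprint_model`, the lazy `keras` import, and the error handling are our own additions, and we assume a working TensorFlow/Keras installation and that the weight directories are in a format `keras.models.load_model` accepts:

```python
from pathlib import Path


def load_fingerprint_model(name, models_dir="./data/models"):
    """Load one of the released models (e.g. 'closed_set_8q' or 'open_set_8q').

    Hypothetical helper: the artifact's own entry point is ./main_interactive.
    """
    path = Path(models_dir) / name
    # Fail early with a clear message if the artifact archive was not extracted.
    if not path.exists():
        raise FileNotFoundError(f"model directory not found: {path}")
    # Imported lazily so the helper can be defined without Keras installed.
    import keras
    return keras.models.load_model(path)
```

For example, `load_fingerprint_model("closed_set_8q")` would load the closed-set model after extracting the archive into the working directory.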

In addition, we provide:
- `unseen_model_random_forest.pickle`: the random forest model used to detect whether a model is unseen, based on the predictions of the open-set model (Appendix E).

This is saved as a standard pickle file and contains a pre-trained `sklearn.ensemble.RandomForestClassifier`. Code to load and use the model is given in `LLMmap.unseen_detector`.

# Datasets

We provide the dataset we generated to train and test the models throughout the paper.

This is stored in `./data/dataset.jsonl` in JSON Lines format (one JSON object per line). We provide the function `read_dataset`, located in `./LLMmap/data_pipeline.py`, to load the dataset and partition it into train and test sets.

# Files

- `artifacts_LLMmap.zip` (129.3 MB, md5: `370d121771a7a9799d5c139e2ed975bd`)