Published January 25, 2025 | Version v1

Artifacts for: LLMmap: Fingerprinting for Large Language Models

Description

# Implementations of LLMmap

We include the weights of the models used to collect the main results in the paper, as well as the code necessary to run them.

In particular, models are stored in the directory `./data/models/`:

- `closed_set_8q`: The closed-set model used in Table 2, Figures 5, 7, 8, and G.1.
- `open_set_8q`: The open-set model used in Tables 2 and D.1, Figures 6 and C.1, and Appendix B.

Model weights are stored in the standard `keras` format. We include the necessary code to run these models in `./LLMmap`. The script `./main_interactive` acts as the main entry point for all the models.
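As an illustration, loading one of the released models might look like the sketch below. The helper name `load_fingerprint_model`, the lazy `keras` import, and the error handling are our own additions, and we assume a working TensorFlow/Keras installation and that the weight directories are in a format `keras.models.load_model` accepts:

```python
from pathlib import Path


def load_fingerprint_model(name, models_dir="./data/models"):
    """Load one of the released models (e.g. 'closed_set_8q' or 'open_set_8q').

    Hypothetical helper: the artifact's own entry point is ./main_interactive.
    """
    path = Path(models_dir) / name
    # Fail early with a clear message if the artifact archive was not extracted.
    if not path.exists():
        raise FileNotFoundError(f"model directory not found: {path}")
    # Imported lazily so the helper can be defined without Keras installed.
    import keras
    return keras.models.load_model(path)
```

For example, `load_fingerprint_model("closed_set_8q")` would load the closed-set model after extracting the archive into the working directory.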

In addition, we provide:
- `unseen_model_random_forest.pickle`: the random forest model used to detect whether a model is unseen, based on the predictions of the open-set model (Appendix E).

This is saved as a standard pickle file and contains a pre-trained `sklearn.ensemble.RandomForestClassifier`. Code to load and use the model is given in `LLMmap.unseen_detector`.

# Datasets

We provide the dataset we generated to train and test the models throughout the paper.

This is stored in `./data/dataset.jsonl` in JSON Lines format (one JSON object per line). We provide the function `read_dataset`, located in `./LLMmap/data_pipeline.py`, to load the dataset and partition it into train and test sets.

# Files

- `artifacts_LLMmap.zip` (129.3 MB, md5: `370d121771a7a9799d5c139e2ed975bd`)