Published February 5, 2026 | Version v1
Model
Open
Reproducibility Artifacts for "Theseus: Navigating the Labyrinth of Evaluation Bias in Provenance-based Intrusion Detection"
Authors/Creators
Description
This repository contains the pre-computed artifacts required to reproduce the experimental results of the Theseus model presented in the paper "Theseus: Navigating the Labyrinth of Evaluation Bias in Provenance-based Intrusion Detection".
These artifacts allow researchers to bypass the computationally intensive steps of graph construction and model training, enabling the direct reproduction of the evaluation metrics of Theseus (Table 2 in the paper) using the exact checkpoints reported.
Contents
- Graph Construction Cache: Pre-processed PyTorch Geometric (PyG) data objects for the DARPA TC E3 datasets (Theia, Cadets, Trace, Fivedirections). These files contain the fully parsed provenance graphs with temporal isolation applied, ready for loading.
- Model Checkpoints: The specific trained model weights (`.pt` files) for Theseus used to generate the final results reported in the paper.
- Word2Vec Embeddings: Domain-specific semantic embeddings trained on the training splits of each dataset, required to embed the node features.
Usage
These artifacts are designed to be used in conjunction with the Theseus source code.
- Download the archive.
- Extract the archive directly into the project root directory. This will create the `cache/` and `checkpoints/` folders with the necessary files.
- Run the evaluation script to verify the results reported in the paper:
./scripts/reproduce_results.sh
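The extraction step can also be scripted so that a misplaced archive is caught before the evaluation script runs. The following is a minimal sketch, not part of the Theseus source code: the helper name `extract_artifacts` is hypothetical, and it assumes only what the steps above state, namely that extracting into the project root produces `cache/` and `checkpoints/` folders.

```python
import zipfile
from pathlib import Path

# Folder names come from the usage steps above; the helper itself is illustrative.
EXPECTED_DIRS = ("cache", "checkpoints")


def extract_artifacts(archive_path, project_root):
    """Extract the archive into project_root.

    Returns the list of expected folders that are still missing afterwards
    (an empty list means the layout matches the usage steps).
    """
    root = Path(project_root)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

Calling `extract_artifacts("theseus_artifacts.zip", ".")` from the project root should return an empty list if the archive unpacks as described; a non-empty list suggests the archive was extracted into the wrong directory.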
Datasets Covered
- DARPA Transparent Computing E3 (Theia, Cadets, Fivedirections, Trace)
Files
| Name | Size |
|---|---|
| theseus_artifacts.zip (md5:c709103b179220da3ae76e79193f1a86) | 20.1 GB |
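Because the archive is about 20 GB, it may be worth checking the download against the MD5 checksum listed for `theseus_artifacts.zip` before extracting it. The sketch below is one way to do that with the Python standard library; the function names are hypothetical and the file is read in chunks so it does not have to fit in memory.

```python
import hashlib

# Checksum as listed for theseus_artifacts.zip on this record.
EXPECTED_MD5 = "c709103b179220da3ae76e79193f1a86"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path):
    """True if the file's MD5 digest equals the checksum listed on this record."""
    return md5_of_file(path) == EXPECTED_MD5
```

If `matches_listed_checksum("theseus_artifacts.zip")` returns `False`, the download is likely incomplete or corrupted and should be repeated before extraction.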