CTU-HONEY-LLM-2: Two Datasets of Shell Interactions for Fine-Tuning LLM-Based SSH Honeypots
Authors/Creators
Contributors
Supervisor (2):
Description
Datasets used to fine-tune open large language models (LLMs) as interactive SSH honeypots that emulate a Linux shell. Each conversation is a multi-turn exchange between a user (attacker input) and an assistant (Linux-terminal output), teaching the model to respond to shell commands as a real terminal would.
Two datasets are released:
- D_orig (112 conversations): the original dataset used to fine-tune the GPT-3.5 model and a QLoRA Llama 3.1 8B. Covers basic commands such as `ls`, `cd`, `cat`, `touch`, `echo`, `who`, `sudo`, `ssh`, `cp`. Mostly 1–3 turns. Limited coverage of stateful interactions, command history, and permission errors.
- D_new (284 conversations): an expanded dataset built from real SSH honeypot logs. Broader command coverage and multi-turn interactions, including file creation/deletion, directory changes, and other stateful behaviors. Targets the specific failures that let attackers identify fake shells.
The D_orig and D_new test sets are distributed as files separate from their respective full files. Train on the full files; evaluate on the test files, which must be held out from training.
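As a safeguard for the hold-out rule above, the following is a minimal sketch of how one might drop any training conversation that also appears verbatim in a test set. It assumes each conversation is a dictionary with a `messages` list, as in the chat schema used by the files. The function names `conversation_key` and `drop_test_overlap` and the toy conversations are hypothetical and are not part of the released datasets or the shelLM code.

```python
import json


def conversation_key(conversation: dict) -> str:
    """Canonical string for a conversation, used for exact-match comparison."""
    return json.dumps(conversation["messages"], sort_keys=True, ensure_ascii=False)


def drop_test_overlap(train: list[dict], test: list[dict]) -> list[dict]:
    """Return the training conversations that do not also appear in the test set."""
    test_keys = {conversation_key(c) for c in test}
    return [c for c in train if conversation_key(c) not in test_keys]


# Toy, made-up conversations (not taken from D_orig or D_new).
train = [
    {"messages": [{"role": "user", "content": "ls"},
                  {"role": "assistant", "content": "notes.txt"}]},
    {"messages": [{"role": "user", "content": "pwd"},
                  {"role": "assistant", "content": "/home/user"}]},
]
test = [
    {"messages": [{"role": "user", "content": "pwd"},
                  {"role": "assistant", "content": "/home/user"}]},
]

filtered = drop_test_overlap(train, test)
print(len(filtered))  # number of training conversations left after filtering
```

The comparison is an exact match on the full message list, so near-duplicates (for example, the same commands with different output) would not be caught by this check.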
The files are formatted as JSON-lines (`.jsonl`) following the OpenAI chat schema with one conversation per line:
```json
{"messages": [
  {"role": "system", "content": "You are a Linux OS terminal. ..."},
  {"role": "user", "content": "ls -la"},
  {"role": "assistant", "content": "total 20\ndrwxr-xr-x ..."}
]}
```
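The format above can be read with the Python standard library alone. The sketch below parses JSON-lines text, checks that every message has one of the three roles shown in the example and a string `content`, and counts user turns per conversation. The helper names `parse_conversations` and `count_user_turns` are hypothetical, and the sample line is a shortened stand-in rather than a real entry from the datasets.

```python
import json
from typing import Iterable, Iterator

ALLOWED_ROLES = {"system", "user", "assistant"}


def parse_conversations(lines: Iterable[str]) -> Iterator[list[dict]]:
    """Yield the message list of each conversation, one JSON object per line."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        messages = json.loads(line)["messages"]
        for message in messages:
            if message["role"] not in ALLOWED_ROLES:
                raise ValueError(f"line {line_number}: unexpected role {message['role']!r}")
            if not isinstance(message["content"], str):
                raise ValueError(f"line {line_number}: content must be a string")
        yield messages


def count_user_turns(messages: list[dict]) -> int:
    """Number of user (attacker input) messages in one conversation."""
    return sum(1 for m in messages if m["role"] == "user")


# Shortened stand-in for one line of a .jsonl file.
sample = [
    '{"messages": ['
    '{"role": "system", "content": "You are a Linux OS terminal."}, '
    '{"role": "user", "content": "ls -la"}, '
    '{"role": "assistant", "content": "total 20"}]}'
]

conversations = list(parse_conversations(sample))
turns = [count_user_turns(m) for m in conversations]
print(len(conversations), turns)
```

To process a downloaded file, pass an open text file handle (opened with `encoding="utf-8"`) to `parse_conversations` in place of `sample`, since iterating over a file yields one line at a time.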
Files
(6.4 MB)

| MD5 checksum | Size |
|---|---|
| md5:2d2921575b1042130222f357917b0bb3 | 5.5 MB |
| md5:d59396caec19bbe03aad3d058be5231f | 641.7 kB |
| md5:ac94fa1906f9e2d578558baa90693530 | 158.3 kB |
| md5:1585ac5634bcf8f9691aa359720cc1f2 | 35.3 kB |
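The MD5 checksums listed with the files can be used to confirm that a download is intact. The following is a minimal sketch using Python's `hashlib`. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and the demonstration uses a temporary stand-in file because the actual file names are not listed here.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Path, expected: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' or plain hex."""
    expected_hex = expected.removeprefix("md5:").lower()  # Python 3.9+
    return md5_of_file(path) == expected_hex


# Demonstration on a temporary stand-in file (not one of the dataset files).
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.jsonl"
    demo.write_bytes(b'{"messages": []}\n')
    expected = "md5:" + hashlib.md5(demo.read_bytes()).hexdigest()
    ok = matches_checksum(demo, expected)           # correct checksum
    bad = matches_checksum(demo, "md5:" + "0" * 32)  # deliberately wrong checksum

print(ok, bad)
```

For a real download, call `matches_checksum` with the path of the downloaded file and the corresponding `md5:` value from the file listing.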
Additional details
Related works
- Continues conference paper: 10.1109/EuroSPW67616.2025.00082 (DOI)
Software
- Repository URL: https://github.com/stratosphereips/shelLM
- Programming language: Python
- Development Status: Active