LogPMDataset
Description
LogPM is a log parser benchmark that emphasizes precise in-message parameter detection rather than template-based message clustering. This dataset combines the smaller datasets used by the benchmark. The logs are collected from LogHub, parsed using handcrafted regexes, and stored in CSV files. Each CSV file has no header row and three columns: the first is the message, the second is the parameter mask, and the third is the index of the matching regex. The LogPM benchmark downloads the dataset parts it needs automatically, so no direct download is required for benchmarking.
The benchmark includes the following datasets:
- Android
- Apache
- Hadoop
- HDFS
- HPC
- Linux
- OpenStack
- Proxifier
- SSH
- ZooKeeper
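The three-column, header-less CSV layout described above can be read with Python's standard `csv` module. The following is a minimal sketch, not the benchmark's own loader: the names `LogRecord` and `read_records` are hypothetical, the parameter mask is kept as an opaque string because its encoding is not specified here, and the regex index is assumed to be an integer.

```python
import csv
from typing import Iterator, NamedTuple, TextIO


class LogRecord(NamedTuple):
    """One row of a dataset CSV file (field names are illustrative)."""
    message: str         # column 1: the raw log message
    parameter_mask: str  # column 2: the parameter mask, kept as an opaque string
    regex_index: int     # column 3: index of the matching regex (assumed integer)


def read_records(handle: TextIO) -> Iterator[LogRecord]:
    """Yield one LogRecord per row of a header-less, three-column CSV stream."""
    for row_number, row in enumerate(csv.reader(handle), start=1):
        if len(row) != 3:
            raise ValueError(
                f"row {row_number}: expected 3 columns, got {len(row)}"
            )
        message, mask, index = row
        yield LogRecord(message, mask, int(index))
```

To read a downloaded file, open it with `open(path, newline="")` and pass the handle to `read_records`. Because the files have no header row, the first row is already data and must not be skipped.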
Files
Total size: 2.1 GB
MD5 checksum | Size |
---|---|
b3af2be0510a98122a42ba6abdb4a1aa | 7.6 MB |
1854600feb4a85afa41285a0dd0822a4 | 1.7 MB |
6c15be7dae072eb470f6ca5c44ded0eb | 11.3 MB |
ce074f5544ef7febb7bc9a6d78f0d5fd | 2.0 GB |
e32fdfe07bd4f704b80151001a9456bf | 1.6 MB |
26d8cf52eacf042e588f0afccc4c176f | 1.2 MB |
8e9e58cf041eb991c55c4957baf9d395 | 31.1 MB |
82eecc81544a962fd61da02cdcc230e4 | 1.8 MB |
a996af00935df3c0950a099a53dc8b8b | 25.0 MB |
cad66efebcbc72a961e1e7578664f7e0 | 3.1 MB |
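The MD5 checksums listed for the files can be used to confirm that a manual download is intact. The sketch below uses only the standard library; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and the file is read in chunks so that the 2.0 GB file does not have to fit in memory.

```python
import hashlib
from pathlib import Path
from typing import Union


def md5_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Union[str, Path], expected_md5: str) -> bool:
    """Compare a file's MD5 digest with a listed value such as 'md5:b3af...'."""
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A call such as `matches_checksum(downloaded_path, listed_checksum)` returns `True` only when the digest of the local file equals the listed value; which checksum belongs to which file has to be taken from the file listing itself.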
Additional details
Funding
- Research Council of Finland: Detecting Technical Debt with Natural Language Processing (grant 328058)