Published November 17, 2024 | Version v1
Dataset
Open
Replication Package for "Cracks in The Stack: Hidden Vulnerabilities and Licensing Risks in LLM Pre-Training Datasets"
Description
This replication package includes datasets and bash scripts.
Files (2.7 GB)
Checksum | Size |
---|---|
md5:fa8d7cace62779e56176c38b61b16c07 | 98.0 MB |
md5:bc3ba41903aab04fdcb38ca607c46aad | 14.9 MB |
md5:d171ce7d98bb09fbb7ecf5ea56f2bb9e | 18.0 MB |
md5:776cc278138a07029db3d22cf5cbfe22 | 640.9 MB |
md5:d4c8282bb3261cba4f77e625aead13b4 | 37.2 MB |
md5:4f28f02cd41b87ed41ab6a0ae5881dc5 | 1.3 GB |
md5:0096784cfc91ba20d3872ad3368c90fb | 119.9 MB |
md5:05f0fc6d7fa90d1c6f95bd1d1fb48a37 | 480.9 MB |
md5:87c608c1a80dff3c1ece8de58fe771ef | 21.8 MB |
md5:fa61f7d8f5585e6d82ac8350dda1a7ae | 4.2 kB |
md5:2726568a2784e84b0547f2db24e44822 | 1.4 kB |
md5:1552ca8bd90d3ec28d795e37b56c3397 | 5.0 kB |
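Each file in the listing carries an MD5 checksum, so a download can be checked for corruption before use. The following is a minimal Python sketch of such a check, not part of the replication package itself. The function names `md5_of_file` and `matches_listed_checksum` are hypothetical helpers introduced here for illustration, and the local file path is an assumption, since the listing does not show file names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte archives are never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a local file against a checksum written either as
    'md5:<hex>' (the form used in the listing) or as bare hex."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_listed_checksum("downloaded_file", "md5:fa8d7cace62779e56176c38b61b16c07")` would return `True` only if the local file (the path here is a placeholder) is byte-for-byte identical to the 98.0 MB file in the listing. MD5 is adequate for detecting accidental corruption during transfer; it is not a defense against deliberate tampering.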