Published November 17, 2024 | Version v1
Dataset Open

Replication Package for "Cracks in The Stack: Hidden Vulnerabilities and Licensing Risks in LLM Pre-Training Datasets"

  • 1. University of Tennessee Knoxville

Description

Replication Package for "Cracks in The Stack: Hidden Vulnerabilities and Licensing Risks in LLM Pre-Training Datasets"

Includes datasets and bash scripts.

Files

README.md

Files (2.7 GB)

Checksum                                Size
md5:fa8d7cace62779e56176c38b61b16c07    98.0 MB
md5:bc3ba41903aab04fdcb38ca607c46aad    14.9 MB
md5:d171ce7d98bb09fbb7ecf5ea56f2bb9e    18.0 MB
md5:776cc278138a07029db3d22cf5cbfe22    640.9 MB
md5:d4c8282bb3261cba4f77e625aead13b4    37.2 MB
md5:4f28f02cd41b87ed41ab6a0ae5881dc5    1.3 GB
md5:0096784cfc91ba20d3872ad3368c90fb    119.9 MB
md5:05f0fc6d7fa90d1c6f95bd1d1fb48a37    480.9 MB
md5:87c608c1a80dff3c1ece8de58fe771ef    21.8 MB
md5:fa61f7d8f5585e6d82ac8350dda1a7ae    4.2 kB
md5:2726568a2784e84b0547f2db24e44822    1.4 kB
md5:1552ca8bd90d3ec28d795e37b56c3397    5.0 kB