Large Language Models are Easily Confused: A Quantitative Metric, Security Implications and Typological Analysis
Description
This repository contains the datasets and results for the paper:
Large Language Models are Easily Confused: A Quantitative Metric, Security Implications and Typological Analysis
GitHub repository for the code:
https://github.com/siebeniris/QuantifyingLanguageConfusion
DATA includes the following datasets:
i) the raw language graphs;
ii) the language similarities calculated from the language graphs;
iii) MTEI: files from the experimental results of multilingual inversion attacks, and the language confusion entropy calculated from them;
iv) LCB: files from the language confusion benchmark, and the language confusion entropy calculated from them.
Results include aggregated results for further analysis:
i) inversion_language_confusion: results from MTEI
ii) prompting_language_confusion: results from LCB
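The exact definition of language confusion entropy is given in the paper and is not reproduced on this page. As a rough illustration only, the sketch below computes plain Shannon entropy over the languages observed in a set of model outputs. The function name `language_entropy` and the list-of-language-codes input format are assumptions made for this example, not the layout of the files in this repository or the authors' implementation.

```python
import math
from collections import Counter


def language_entropy(language_labels):
    """Shannon entropy (in bits) of the empirical distribution of language labels.

    Illustrative only: the paper's metric may weight or normalize differently.
    """
    counts = Counter(language_labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum p * log2(1/p) over every language that actually occurs.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical detected languages for four generated outputs.
labels = ["en", "en", "de", "de"]
print(language_entropy(labels))
```

Under this simplified reading, outputs that all share one language give an entropy of zero, and the value grows as the outputs spread over more languages.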
Files
Total size: 1.1 GB (includes results.zip)
MD5 checksum | Size |
---|---|
6094ca14d103c65f3fa213fa708232f8 | 1.0 GB |
e619cba43f1fc841a05bb052c02eacb8 | 14.5 MB |
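After downloading, the archives can be checked against the MD5 checksums listed above. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `matches_listed_checksum` are introduced here for illustration, and because this page does not state which checksum belongs to which archive, the check accepts either listed value.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Checksums as listed on this record; the file each one belongs to is not stated here.
LISTED_MD5 = {
    "6094ca14d103c65f3fa213fa708232f8",
    "e619cba43f1fc841a05bb052c02eacb8",
}


def matches_listed_checksum(path):
    """True if the file's MD5 digest equals one of the checksums listed on this record."""
    return md5_of_file(path) in LISTED_MD5
```

For example, `matches_listed_checksum("results.zip")` should return `True` for an intact download, assuming the archive was saved under that name in the working directory.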
Additional details
Related works
- Cites
- Dataset: arXiv:2406.20052 (arXiv)
- Dataset: arXiv:2408.11749 (arXiv)
Dates
- Submitted: 2024-10
Software
- Repository URL: https://github.com/siebeniris/QuantifyingLanguageConfusion
- Programming language: Python
- Development Status: Active
References
- language confusion