Published October 17, 2024 | Version v1
Dataset Open

Large Language Models are Easily Confused: A Quantitative Metric, Security Implications and Typological Analysis

  • 1. Aalborg University

Description

This repository contains the datasets and results for the paper:

Large Language Models are Easily Confused: A Quantitative Metric, Security Implications and Typological Analysis

 

GitHub repository for the code:

Quantifying Language Confusion (https://github.com/siebeniris/QuantifyingLanguageConfusion)

 

DATA includes the following datasets:

i) the raw language graphs;

ii) the language similarities calculated from the language graphs;

iii) MTEI: the files from the experimental results of multilingual inversion attacks, and the language confusion entropy calculated from them;

iv) LCB: the files from the language confusion benchmark, and the language confusion entropy calculated from them.
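As a rough illustration of what an entropy over languages measures, the sketch below computes the plain Shannon entropy of the language labels detected in a set of model outputs. This is an assumption for illustration only: the paper defines its own language confusion entropy, which may differ from this formula, and the function name and input format here are hypothetical rather than taken from the repository.

```python
import math
from collections import Counter


def language_entropy(language_labels):
    """Shannon entropy (in bits) of a list of language labels.

    Hypothetical helper: the input is assumed to be one detected
    language code per model output, e.g. ["en", "en", "de"].
    """
    counts = Counter(language_labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum -p * log2(p) over the observed languages.
    return -sum(
        (count / total) * math.log2(count / total)
        for count in counts.values()
    )
```

Under this sketch, outputs that all share one language give an entropy of zero, and the value grows as the outputs spread more evenly across more languages.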

 

Results include aggregated results for further analysis:

i) inversion_language_confusion: results from MTEI

ii) prompting_language_confusion: results from LCB

 

 

Files

results.zip

Files (1.1 GB)

Checksum Size
md5:6094ca14d103c65f3fa213fa708232f8 1.0 GB
md5:e619cba43f1fc841a05bb052c02eacb8 14.5 MB

Additional details

Related works

Cites
Dataset: arXiv:2406.20052 (arXiv)
Dataset: arXiv:2408.11749 (arXiv)

Dates

Submitted
2024-10

Software

Repository URL
https://github.com/siebeniris/QuantifyingLanguageConfusion
Programming language
Python
Development Status
Active

References

  • language confusion