Large Language Models are Easily Confused: A Quantitative Metric, Security Implications and Typological Analysis

CHEN, YIYI

doi:10.5281/zenodo.13946031

Published October 17, 2024 | Version v1

Dataset | Open access

CHEN, YIYI (Data curator)¹

1. Aalborg University

This repository contains datasets and results for the paper:

Large Language Models are Easily Confused: A Quantitative Metric, Security Implications and Typological Analysis

GitHub repository for the code:

Quantifying Language Confusion GitHub repo (https://github.com/siebeniris/QuantifyingLanguageConfusion)

DATA (DATA.zip) includes the following datasets:

i) raw language graphs;

ii) language similarities calculated from the language graphs;

iii) MTEI: files from the experimental results of multilingual inversion attacks, and the language confusion entropy calculated from these data;

iv) LCB: files from the language confusion benchmark, and the language confusion entropy calculated from these data.
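Items iii) and iv) refer to a language confusion entropy computed from the experimental data. As a rough illustration of the general idea only, the sketch below computes plain Shannon entropy (in bits) over the languages detected in a set of model outputs. The function name and the input format (a list of language codes) are hypothetical, and the metric actually used in the paper may be defined differently.

```python
import math
from collections import Counter


def language_entropy(labels):
    """Shannon entropy (bits) of the distribution of language labels.

    Hypothetical illustration: `labels` is a list of language codes,
    one per model output. This is NOT necessarily the paper's metric.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        # No outputs: define entropy as zero.
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


# All outputs in a single language -> no confusion (0 bits).
print(language_entropy(["de", "de", "de", "de"]))  # 0.0
# Outputs split evenly across two languages -> 1 bit.
print(language_entropy(["de", "en", "de", "en"]))  # 1.0
```

Under this simplified reading, higher values mean the outputs are spread over more languages, i.e. more confusion.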

Results (results.zip) contain aggregated results for further analysis:

i) inversion_language_confusion: results from MTEI;

ii) prompting_language_confusion: results from LCB.
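The internal layout of the archives is not documented on this page. A minimal sketch for inspecting it with Python's standard `zipfile` module, assuming the archive has already been downloaded; the helper names are hypothetical:

```python
import zipfile


def list_archive(path):
    """Return the sorted names of all entries in a zip archive."""
    with zipfile.ZipFile(path) as archive:
        return sorted(archive.namelist())


def top_level_entries(path):
    """Return the distinct first path components (folders or files) in a zip archive."""
    # Zip entry names always use "/" as the separator.
    return sorted({name.split("/", 1)[0] for name in list_archive(path)})
```

For example, `top_level_entries("results.zip")` should show where folders such as inversion_language_confusion and prompting_language_confusion sit inside the archive, without extracting anything.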

Files (1.1 GB)

Name	Size	MD5
DATA.zip	1.0 GB	6094ca14d103c65f3fa213fa708232f8
results.zip	14.5 MB	e619cba43f1fc841a05bb052c02eacb8
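Each file is listed with an MD5 checksum. A minimal sketch for verifying a download with Python's standard `hashlib`, reading the file in chunks so the 1.0 GB archive is not loaded into memory at once; the helper names are illustrative, and the file paths assume the archives sit in the working directory:

```python
import hashlib

# Checksums as listed on this record.
EXPECTED_MD5 = {
    "DATA.zip": "6094ca14d103c65f3fa213fa708232f8",
    "results.zip": "e619cba43f1fc841a05bb052c02eacb8",
}


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Return True if the file's MD5 digest matches the expected hex string."""
    return md5sum(path) == expected.lower()
```

For example, `verify("DATA.zip", EXPECTED_MD5["DATA.zip"])` should return True for an intact download.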

Additional details

Cites: arXiv:2406.20052 (dataset); arXiv:2408.11749 (dataset)

Submitted: 2024-10

Repository URL: https://github.com/siebeniris/QuantifyingLanguageConfusion
Programming language: Python
Development Status: Active

Keywords: language confusion

	All versions	This version
Views	47	47
Downloads	23	23
Data volume	11.7 GB	11.7 GB

DOI: 10.5281/zenodo.13946031

Resource type: Dataset

Publisher: Zenodo

Languages: Turkish, German, Finnish, Hungarian, Hindi, Panjabi, Gujarati, Urdu, Sinhala, Amharic, Mandarin Chinese, Japanese, Korean, Indonesian, English, French, Spanish, Italian

License: Apache License 2.0

A permissive license whose main conditions require preservation of copyright and license notices. Contributors provide an express grant of patent rights. Licensed works, modifications, and larger works may be distributed under different terms and without source code.

Technical metadata

Created: October 17, 2024
Modified: October 17, 2024