VoiceWukong: Benchmarking Deepfake Voice Detection (part_aa)

VoiceWukong

doi:10.5281/zenodo.13731918

Published September 8, 2024 | Version v1

Dataset Restricted

VoiceWukong

VoiceWukong is a comprehensive benchmark for deepfake voice detection, designed to evaluate the performance of various detectors in real-world application scenarios.

Dataset Features

Large Scale: Contains 265,200 English and 148,200 Chinese deepfake voice samples
Diverse Sources: Covers voice samples generated by 19 commercial tools and 15 open-source tools
Real-world Scenarios: Constructed 38 data variants covering 6 types of audio manipulations common in practical applications
Bilingual Support: Supports evaluation in both Chinese and English languages

Evaluation Results

Conducted comprehensive evaluations on 12 state-of-the-art deepfake voice detectors
AASIST2 achieved the best performance with an Equal Error Rate (EER) of 13.50%
Other detectors showed EERs exceeding 20%
Results indicate significant challenges for current detectors in practical applications
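
For readers unfamiliar with the metric, the Equal Error Rate is the error rate at the decision threshold where the false positive rate (real voices flagged as fake) equals the false negative rate (fake voices missed). The following is a minimal sketch of how an EER could be approximated from detector scores, not the benchmark's own evaluation code. The function name `equal_error_rate`, the score convention (higher means more likely fake), and the label convention (1 = fake, 0 = real) are assumptions made for illustration.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER of a detector.

    Assumed conventions (not taken from the benchmark):
    scores -- higher means "more likely fake"
    labels -- 1 for fake samples, 0 for real samples
    """
    n_fake = sum(1 for y in labels if y == 1)
    n_real = len(labels) - n_fake
    if n_fake == 0 or n_real == 0:
        raise ValueError("need at least one real and one fake sample")

    best_gap, eer = float("inf"), None
    # Candidate thresholds: every observed score, plus one above the maximum.
    thresholds = sorted(set(scores)) + [max(scores) + 1.0]
    for t in thresholds:
        # A sample is flagged as fake when its score is >= t.
        false_pos = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        false_neg = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        fpr = false_pos / n_real
        fnr = false_neg / n_fake
        gap = abs(fpr - fnr)
        # Keep the threshold where the two error rates are closest.
        if gap < best_gap:
            best_gap, eer = gap, (fpr + fnr) / 2
    return eer


# Toy example with four samples: two real (label 0) and two fake (label 1).
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
print(equal_error_rate(toy_scores, toy_labels))
```

This brute-force sweep is quadratic in the number of samples and is only meant to make the definition concrete. For a dataset of this size, a sorted single-pass or ROC-based computation would be more appropriate.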

Human-Machine Comparison Study

Conducted user studies with over 300 participants
Comparative analysis of detection capabilities among humans, detectors, and multimodal large language models (Qwen2-Audio)
Detectors and humans differed in how well they identified deepfake voices at different deception levels
Multimodal large language models demonstrated no effective detection ability

Dataset

This is the first part of the dataset. Both part_aa and part_ab must be downloaded completely before the archive can be extracted and used. Please ensure that both files are in the same folder. For a detailed introduction to the data, please refer to our paper (to be made available).

The second part of the dataset is available at part_ab.
Extract command: cat VoiceWukong.part_* | tar -xz
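
On systems where `cat` and `tar` are not available, the same extraction could be done from Python. The sketch below assumes, as the command above implies, that the part files are consecutive chunks of a single gzip-compressed tar archive. The names `ChainedReader` and `extract_parts` are hypothetical helpers introduced here, not part of the dataset's tooling.

```python
import glob
import tarfile


class ChainedReader:
    """Read several files back to back as if they were one byte stream."""

    def __init__(self, paths):
        self._paths = list(paths)
        self._index = 0
        self._current = None

    def read(self, size=-1):
        if size is None:
            size = -1
        out = bytearray()
        while self._index < len(self._paths) and (size < 0 or len(out) < size):
            if self._current is None:
                self._current = open(self._paths[self._index], "rb")
            want = -1 if size < 0 else size - len(out)
            data = self._current.read(want)
            if data:
                out += data
            else:
                # Current part is exhausted: move on to the next one.
                self._current.close()
                self._current = None
                self._index += 1
        return bytes(out)

    def close(self):
        if self._current is not None:
            self._current.close()
            self._current = None


def extract_parts(pattern="VoiceWukong.part_*", dest="."):
    """Concatenate the matching parts in sorted order and extract the tar.gz."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    reader = ChainedReader(paths)
    try:
        # "r|gz" reads the archive as a forward-only gzip stream.
        with tarfile.open(fileobj=reader, mode="r|gz") as archive:
            # Use the safer "data" extraction filter where Python provides it.
            kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            archive.extractall(dest, **kwargs)
    finally:
        reader.close()
    return paths
```

Calling `extract_parts()` in the folder that holds both parts would unpack the archive into the current directory. Sorting the file names reproduces the order in which the shell expands `VoiceWukong.part_*`, so part_aa is read before part_ab.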

Leaderboard

Our leaderboard presents comprehensive evaluation results in three main sections:

Overall Performance - General evaluation metrics for each detector across the entire dataset, providing a broad view of detection capabilities.
Manipulation-specific Performance - Detailed results showing how each detector performs under different types of audio manipulations, offering insights into specific strengths and weaknesses.
User Study-based Evaluation - Performance analysis of detectors on deepfake voices categorized by difficulty levels based on our user study results, demonstrating detector effectiveness across varying deception capabilities.

Visit our leaderboard (github.io) for detailed performance metrics and rankings. Additionally, we provide a copy of the leaderboard code here for permanent storage.

Evaluated Detectors' Weighted Models

The model weights of all evaluated detectors can be obtained from huggingface.co. Additionally, we provide a copy of the weight files here for permanent storage.

User Study Results & Original Outputs

This code repository (github) stores our user study results and the original outputs of the evaluated detectors. Additionally, we provide a copy of the code repository here for permanent storage.

Note: VoiceWukong prohibits use for commercial purposes.

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Request access

If you would like to request access to these files, please fill out the form below.

Our dataset is currently available exclusively to the academic research community through an application and approval process. To prevent misuse of the dataset or any potentially illegal activities, applicants must strictly comply with the following conditions before accessing our dataset:

Eligibility: Access to the dataset is limited to academic researchers for the purpose of evaluating detectors.
Redistribution Prohibition: Recipients are not permitted to redistribute the dataset without explicit permission.
Commercial Use Restrictions: The dataset may not be used for any commercial purposes, including but not limited to:
- Product testing
- Development activities
- Commercial deployment
- Model fine-tuning
- Training commercial systems
- Other profit-oriented uses
Legal Compliance: The use of the dataset for any activities prohibited by law is strictly forbidden.
A faculty member, or someone in a permanent position, must agree and commit to these conditions.

To avoid having your request declined, please provide a brief introduction to your research institution and the purpose of your study.

Additional details

Repository URL: https://voicewukong.github.io/

	All versions	This version
Views	1,048	1,048
Downloads	169	169
Data volume	25.0 TB	25.0 TB
