Published September 8, 2024 | Version v1
Dataset Restricted

VoiceWukong: Benchmarking Deepfake Voice Detection (part_aa)

Description

VoiceWukong

VoiceWukong is a comprehensive benchmark for deepfake voice detection, designed to evaluate the performance of various detectors in real-world application scenarios.

Dataset Features

  • Large Scale: Contains 265,200 English and 148,200 Chinese deepfake voice samples
  • Diverse Sources: Covers voice samples generated by 19 commercial tools and 15 open-source tools
  • Real-world Scenarios: Includes 38 data variants covering 6 types of audio manipulation common in practical applications
  • Bilingual Support: Supports evaluation in both Chinese and English languages

Evaluation Results

  • Conducted comprehensive evaluations on 12 state-of-the-art deepfake voice detectors
  • AASIST2 achieved the best performance with an Equal Error Rate (EER) of 13.50%
  • Other detectors showed EERs exceeding 20%
  • Results indicate significant challenges for current detectors in practical applications
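For readers unfamiliar with the metric, the EER is the error rate at the decision threshold where the false acceptance rate (real voices flagged as fake) equals the false rejection rate (fake voices missed). The following is a minimal sketch of how it could be computed, not the benchmark's own evaluation code. It assumes that higher scores mean "more likely deepfake" and that labels use 1 for deepfake and 0 for real; the function name equal_error_rate is hypothetical.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER from detector scores.

    Assumptions (not taken from the benchmark): a higher score means
    "more likely deepfake"; labels are 1 for deepfake and 0 for real.
    """
    fake = [s for s, y in zip(scores, labels) if y == 1]
    real = [s for s, y in zip(scores, labels) if y == 0]
    if not fake or not real:
        raise ValueError("need at least one real and one deepfake sample")

    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "flag nothing" operating point is also considered.
    thresholds = sorted(set(scores))
    thresholds.append(thresholds[-1] + 1.0)

    best_gap = None
    eer = None
    for t in thresholds:
        # False acceptance rate: real samples scored at or above the threshold.
        far = sum(s >= t for s in real) / len(real)
        # False rejection rate: deepfake samples scored below the threshold.
        frr = sum(s < t for s in fake) / len(fake)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Toy scores that separate the two classes perfectly.
print(equal_error_rate([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]))  # 0.0
```

With finitely many samples the two rates rarely cross exactly, so the sketch reports the average of the two at the closest threshold. Other tools may interpolate differently and give slightly different values.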

Human-Machine Comparison Study

  • Conducted user studies with over 300 participants
  • Comparative analysis of detection capabilities among humans, detectors, and multimodal large language models (Qwen2-Audio)
  • Different detectors and humans showed varying identification capabilities for deepfake voices at different deception levels
  • Multimodal large language models demonstrated no effective detection ability

Dataset

This is the first part of the dataset. Both part_aa and part_ab must be downloaded completely before the dataset can be extracted and used. Please ensure that both files are in the same folder. For a detailed introduction to the data, please refer to our paper (to be made available).

The second part is available at part_ab.
Extract command: cat VoiceWukong.part_* | tar -xz
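On systems without cat and tar (for example, a default Windows setup), the same concatenate-then-extract step could be sketched in Python using only the standard library. This is an illustrative sketch, not a tool shipped with the dataset: the function name extract_split_archive and the destination folder are hypothetical, and the glob pattern is taken from the command above.

```python
import glob
import os
import shutil
import tarfile
import tempfile


def extract_split_archive(pattern, dest):
    """Join split archive parts in name order and extract the gzip'd tar.

    Hypothetical helper: `pattern` is a glob such as "VoiceWukong.part_*"
    and `dest` is the folder to extract into.
    """
    # Sorting by name puts part_aa before part_ab, matching the shell glob.
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r}")

    os.makedirs(dest, exist_ok=True)

    # Concatenate the parts into a temporary file, then read it as one archive.
    with tempfile.TemporaryFile() as joined:
        for part in parts:
            with open(part, "rb") as handle:
                shutil.copyfileobj(handle, joined)
        joined.seek(0)
        with tarfile.open(fileobj=joined, mode="r:gz") as archive:
            archive.extractall(dest)
    return parts


# Example call (run in the folder that holds both parts):
# extract_split_archive("VoiceWukong.part_*", "VoiceWukong")
```

Unlike the shell pipeline, this sketch writes the joined archive to a temporary file first, so it needs roughly the combined size of both parts as free disk space in addition to the extracted data.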

Leaderboard

Our leaderboard presents comprehensive evaluation results in three main sections:

  1. Overall Performance - General evaluation metrics for each detector across the entire dataset, providing a broad view of detection capabilities.
  2. Manipulation-specific Performance - Detailed results showing how each detector performs under different types of audio manipulations, offering insights into specific strengths and weaknesses.
  3. User Study-based Evaluation - Performance analysis of detectors on deepfake voices categorized by difficulty levels based on our user study results, demonstrating detector effectiveness across varying deception capabilities.

Visit our leaderboard (github.io) for detailed performance metrics and rankings. Additionally, we provide a copy of the leaderboard code here for permanent storage.

Evaluated Detectors' Model Weights

  • The model weights of all evaluated detectors can be obtained from huggingface.co. Additionally, we provide a copy of the weight files here for permanent storage.

User Study Results & Original Outputs

Note: VoiceWukong prohibits use for commercial purposes.

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Request access

If you would like to request access to these files, please fill out the form below.

You need to satisfy these conditions in order for this request to be accepted:

Our dataset is currently available exclusively to the academic research community through an application and approval process. To prevent misuse of the dataset or any potentially illegal activities, applicants must strictly comply with the following conditions before accessing our dataset:
  1. Eligibility: Access to the dataset is limited to academic researchers for the purpose of evaluating detectors.
  2. Redistribution Prohibition: Recipients are not permitted to redistribute the dataset without explicit permission.
  3. Commercial Use Restrictions: The dataset may not be used for any commercial purposes, including but not limited to:
    • Product testing
    • Development activities
    • Commercial deployment
    • Model fine-tuning
    • Training commercial systems
    • Other profit-oriented uses
  4. Legal Compliance: The use of the dataset for any activities prohibited by law is strictly forbidden.
  5. Endorsement: A faculty member, or someone in a permanent position, must agree and commit to these conditions.

To avoid having your request declined, please provide a brief introduction to your research institution and the purpose of your study.
