Research and Evaluation of Automatic Sound FX Classification in Freesound using the Universal Category System
Description
Automatic sound classification is becoming increasingly important across domains in audio and music, particularly in the use of sound FX libraries for video and
audio post-production. The Universal Category System (UCS) has recently been adopted as the industry standard for organizing and classifying sound FX, yet the
performance of UCS-based classifiers in real-world scenarios remains underexplored. This thesis builds upon prior work on UCS-based sound classification and investigates
how different methodologies can enhance classifier performance, with a focus on real-world user-generated data from Freesound. As part of this work, a custom
Freesound dataset was built to enable evaluation under realistic conditions, alongside a professionally curated dataset. The study compares a range of models and
demonstrates that while strong results are achieved on curated data, performance drops considerably on heterogeneous real-world audio, highlighting domain transfer
challenges such as inconsistent metadata, variable quality, and semantic ambiguity. Fine-tuning on the custom dataset led to some improvements, particularly for
multimodal models, but performance remained far from ideal, showing that adaptation alone is not sufficient to overcome domain gaps. Overall, this work contributes
both a new dataset and an in-depth evaluation of UCS-based classification, while also pointing to future directions in embedding design, multimodal integration, and
domain-aware training for more robust and transferable sound classification systems in practical applications.
Files
Madhav-Jaideep_SMC_2025_Master_Thesis.pdf (4.3 MB)
md5:fbc20049a9c4ef1bbfdb367760952930
Additional details
Dates
- Accepted: 2025-10-09