Automatic Detection of Audio Defects in Personal Music Collections
Creators
Contributors
Supervisor:
Description
Personal digital music collections have been growing over the last decade thanks to audio formats such as MP3, WAVE and FLAC. Their files usually come from diverse sources, some of which are not reliable, and may contain clicks, gaps, bad equalization, clipping, noise or many other kinds of alterations. Vast amounts of important audio material, from historic recordings to relatively recent recordings on analogue or even primitive digital media, have been re-released in the new digital formats. Digital audio restoration is increasingly applied to sound recordings from the Internet; however, there is still a very large number of music collectors who own a lot of music in legacy formats such as vinyl or cassette and who, with the advent of the sharing culture on the Internet, are making available many digitized audio files that may not have gone through proper audio restoration. Quality assessment of audio signals has improved considerably since digital signal processing (DSP) techniques appeared and started to be used for sound restoration. Subjective assessment of audio quality has been used for a long time, but it is time-consuming and subject to external human and technical influences (such as the listener's expertise and sensitivity or the evaluation equipment), which has led to objective approaches such as PEAQ for wideband audio signals and PESQ for speech signals, both defined in ITU standards. Some of these defects have already been addressed and even have commercial implementations; for others there is still room for research. In this work, the current taxonomy of known audio defects is reviewed according to state-of-the-art methods, highlighting the characteristics of each type and the solutions (if any) for its detection and correction. Afterwards, vinyl technology is analyzed because of its error-prone nature.
For that reason, two defects related to digitizing vinyl media are chosen for research here: the lack of RIAA filtering and altered playback speed. The detection mechanisms are then presented. They are based on the psychoacoustic model developed by Zwicker (that is, a Bark-band decomposition of the spectrum) and on state-of-the-art machine learning techniques. Their implementation is defined from preliminary data obtained on a reduced dataset of 200 instances split across 10 different genres. The resulting algorithms are evaluated on extended defect-controlled datasets of 2000 and 800 files, respectively. Two machine learning techniques are used: a decision tree (C4.5) and a support vector machine (SMO). Their accuracy is discussed on the global dataset and, in the case of the lack of RIAA filtering, on per-genre subsets, using 10-fold cross-validation. Finally, their feasibility for the problem at hand is analyzed and further improvements are suggested.
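As an illustration of the Bark-band decomposition mentioned in the description, the following is a minimal sketch and not the implementation used in the thesis. It assumes NumPy, a Hann window and the commonly tabulated edges of Zwicker's 24 critical bands; the function name `bark_band_energies` and the constant `BARK_EDGES_HZ` are hypothetical names chosen for this example.

```python
import numpy as np

# Commonly tabulated edges (Hz) of Zwicker's 24 critical bands.
# Assumption: the thesis may use a different band layout.
BARK_EDGES_HZ = np.array([
    0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720,
    2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000,
    15500,
], dtype=float)


def bark_band_energies(frame, sample_rate):
    """Return the spectral energy of one audio frame in each Bark band.

    Hypothetical helper: windows the frame, takes the power spectrum and
    sums the power of the FFT bins that fall inside each critical band.
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # Index of the band whose [lower, upper) range contains each bin.
    # Bins at or above the last edge get index 24 and are ignored below.
    band_index = np.searchsorted(BARK_EDGES_HZ, freqs, side="right") - 1

    n_bands = len(BARK_EDGES_HZ) - 1
    energies = np.zeros(n_bands)
    for band in range(n_bands):
        energies[band] = power[band_index == band].sum()
    return energies
```

Applied frame by frame, a function like this yields one 24-value vector per frame; such vectors are the kind of feature that could be passed to a classifier, although the exact features used in the thesis are not stated in this description.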
Files
Name | Size |
---|---|
Ignasi-Adell-Master-thesis-2016.pdf (md5:2eb6312ecbcb26ff494fb0a198e95c55) | 3.3 MB |