Robust audio watermarking in the time domain

The audio watermarking method presented below offers copyright protection to an audio signal by modifying its temporal characteristics. The amount of modification embedded is limited by the necessity that the output signal must not be perceptually different from the original one. The watermarking method presented here does not require the original signal for watermark detection. The watermark key is simply a seed known only by the copyright owner. This seed creates the watermark signal to be embedded. Watermark embedding depends also on the audio signal amplitude in a way that minimizes the audibility of the watermark signal. The embedded watermark is robust to MPEG audio coding, filtering, resampling and requantization.


Introduction
The outstanding progress of digital technology has increased the ease with which digital data is reproduced and retransmitted. However, since the advantages of such progress are broadly available, they offer equally increasing potential for both legal and unauthorized data manipulation. Consequently, the necessity arises for copyright protection of digital products against unauthorized recording attempts, known as data piracy.
Current research in image, audio and video copyright protection exploits the fact that human visual and auditory perception cannot detect slight changes in certain temporal or frequency domains of the image and the audio signal, respectively. This property is called masking, according to which a faint but perceptible signal becomes non-perceptible in the presence of another one under certain conditions.
Most research methods consider a watermark signal produced in a unique way by a function of one or more input keys. These keys can be both owner and signal dependent and generate a signal which is embedded in the original one. The embedded signal is known as a watermark or copyright label. Temporal and frequency characteristics of the original signal should be taken into account in the watermark casting process to reduce perceptible distortions in the watermarked signal. Each individual that produces or possesses digital data owns a unique key that identifies its legal possession and is required for the watermark detection. Besides copyright purposes, a watermark could serve authentication purposes as well.
The methods discussed in this paper investigate the watermarking potential in audio signals, taking into account the specifications of human audio perception. An audio watermark is a perceptually inaudible modification of the audio signal, based on one or more keys determined by the copyright owner [1]. Whenever a question about ownership arises, keys are used to determine the rightful owner.
Our watermark is embedded in the time domain. Unlike other methods, it does not require the original signal for its detection. This way the owner of the data does not have to keep copies of both the original and the watermarked products.
A watermark has to be statistically undetectable by others to prevent efforts at its unauthorized removal. This condition is fulfilled if the potential number of keys that produce distinct watermarks is large enough to ensure statistical safety. The detection scheme should be as statistically reliable as possible. False rejection or acceptance of the existence of the watermark should be minimal. Finally, a watermark has to be robust to signal manipulation and impossible to remove without significant alteration of the signal. In other words, a pirate should have to destroy the audio signal before he manages to destroy the watermark. The robustness should extend to common signal processing operations, such as filtering, compression, resampling, requantization, cropping, noise addition and D/A conversion. Our watermark scheme fulfills most of the above conditions to a satisfactory level, and current research is being carried out to improve the robustness of the algorithm to any form of unauthorized manipulation.
In the audio watermarking area there are methods that use the frequency domain [1,2]. Some of them exploit the frequency characteristics of the audio signal in order to embed the watermark, by minimizing audible distortions even for high amplitude watermarks. However, most of the above methods require the use of the original signal in order to detect the watermark.
In images, there are methods that cast a watermark either in the spatial [3,4,5] or in the frequency domain [2,6]. The casting signal is generated in a random way and digital data is often divided in casting subsets in order to increase the robustness of the detection scheme to signal processing. In both cases, the amplitude of each sample of the watermarking signal is either constant or calculated as a function of the amplitude of the original sample. Our audio watermarking method has certain similarities with a method used for image watermarking [3].

Watermark Embedding
The watermark embedding scheme proposed in this paper modifies original digital audio signals, which are represented as 16-bit or 8-bit sample sequences, by changing the least significant bits of each sample. The result is a slight amplitude modification of each sample in a way that does not produce any perceptual difference. Let us assume an audio signal of $N$ samples $x(i)$, $i = 1, \ldots, N$.
In order to embed a watermark we modify each sample using a function $f(x(i), w(i))$, where $w(i)$ is the watermark signal in the range $[-\alpha, \alpha]$ and $\alpha$ is a constant. The watermarked sample $y(i)$ is therefore:

$$y(i) = x(i) + f(x(i), w(i)) \qquad (1)$$

The signal to noise ratio is calculated by

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_n x^2(n)}{\sum_n [x(n) - y(n)]^2}$$

Here, it is important to note that the random generator producing $w$ should provide statistically equal numbers of discrete output values in order for the detection procedure to function more accurately. The robustness of the watermark generally increases with the amplitude of the watermarking signal, but the noise poses a limit to this increase.
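The embedding step and the SNR measure can be illustrated with a minimal Python sketch. Several things here are assumptions made only for illustration and are not taken from the text: the embedding function is taken to be `f(x, w) = |x| * w`, the watermark takes only the two values `+alpha` and `-alpha`, the helper names (`make_watermark`, `embed`, `snr_db`) are hypothetical, Python's `random` module stands in for the seeded generator, and a synthetic zero-mean Gaussian sequence stands in for real audio. Samples are kept as floats, so rounding to 16-bit integers is ignored.

```python
import math
import random


def make_watermark(seed, n, alpha):
    """Return n values, half +alpha and half -alpha, in a seed-dependent order."""
    rng = random.Random(seed)  # the seed plays the role of the owner's key
    w = [alpha] * (n // 2) + [-alpha] * (n - n // 2)
    rng.shuffle(w)  # equal numbers of each output value by construction
    return w


def f(x, w):
    """ASSUMED embedding function (not given in the text): f(x, w) = |x| * w."""
    return abs(x) * w


def embed(x, w):
    """Watermarked samples y(i) = x(i) + f(x(i), w(i))."""
    return [xi + f(xi, wi) for xi, wi in zip(x, w)]


def snr_db(x, y):
    """SNR = 10 log10( sum x^2 / sum (x - y)^2 ), in dB."""
    signal = sum(xi * xi for xi in x)
    noise = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return 10.0 * math.log10(signal / noise)


# Synthetic zero-mean test signal standing in for real audio samples.
rng = random.Random(0)
x = [rng.gauss(0.0, 8000.0) for _ in range(50_000)]

alpha = 0.05
w = make_watermark(seed=444, n=len(x), alpha=alpha)
y = embed(x, w)
print(round(snr_db(x, y), 2))
```

Under this assumed `f`, the embedding noise is `alpha * |x(i)|` in magnitude for every sample, so the SNR reduces to `-20 * log10(alpha)` regardless of the signal. A value of `alpha = 0.05` therefore corresponds to roughly 26 dB, and a larger `alpha` trades SNR for a stronger watermark.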

Watermark Detection
Let us denote by $S$ the following sum:

$$S = \sum_{i=1}^{N} y(i) w(i) \qquad (2)$$

By combining (1), (2) we get:

$$S = \sum_{i=1}^{N} x(i) w(i) + \sum_{i=1}^{N} f(x(i), w(i)) w(i) \qquad (3)$$

The first sum in (3) is zero if the random generator produces equal numbers of discrete output values and the signal mean value $m_x$ is equal to zero. In case some random output values are produced more frequently than others, this difference, denoted as $\Delta w$, must be taken into account.
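Therefore, if the signal is not watermarked, only this residual contributes to $S$ in (3). On the other hand, if the signal is watermarked, the watermark term of (3) contributes as well, which gives (4). However, $x(i)$ belongs to the original signal, which cannot be used in the detection process.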
$x(i)$ can be replaced by $y(i)$ in the last two terms of (4) without significant error. This replacement leads to the replacement of $\sum_{i=1}^{N} x(i) w(i)$ by $\frac{\Delta w}{N} S$. Therefore, by subtracting the amount $\frac{\Delta w}{N} |S|$ from $S$ and dividing the result by $\sum_{i=1}^{N} f(y(i), w(i)) w(i)$, the result $r$ is approximately normalized to 0 or 1. The watermark detector used in this method produces the detection value $r$ given by:

$$r = \frac{S - \frac{\Delta w}{N} |S|}{\sum_{i=1}^{N} f(y(i), w(i)) w(i)}$$

The detection value theoretically lies between 0 and 1, yet the approximation of $x(i)$ by $y(i)$ introduces an inaccuracy which slightly expands the interval $[0, 1]$ to $[0 - \varepsilon, 1 + \varepsilon]$. Experimentally, a detection threshold may be set above 0.5 in order to decide whether a certain watermark exists in the signal. The threshold can be raised if greater certainty about the watermark detection is required. Figure 1 illustrates the empirical pdf of the detection value $r$ in a watermarked and a non-watermarked signal.
The empirical pdf of a non-watermarked signal is represented by the solid curve, whereas the dashed curve shows the empirical pdf of the watermarked signal. Both distributions have been calculated using 1000 different watermarks with SNR = 26 dB.

Figure 2: Detection values in a watermarked signal using various seeds (Key is 444).
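The detection value $r$ described in this section can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the exact detector of this paper: the embedding function is assumed to be `f(x, w) = |x| * w`, the watermark is a balanced sequence of `+alpha` and `-alpha` values so that the correction for unequal generator outputs vanishes (`delta_w = 0` by default), the helper names are hypothetical, and the test signal is synthetic Gaussian noise rather than audio.

```python
import random


def make_watermark(seed, n, alpha):
    """Balanced +alpha / -alpha sequence whose order depends on the seed (key)."""
    rng = random.Random(seed)
    w = [alpha] * (n // 2) + [-alpha] * (n - n // 2)
    rng.shuffle(w)
    return w


def f(x, w):
    """ASSUMED embedding function (not given in the text): f(x, w) = |x| * w."""
    return abs(x) * w


def detection_value(y, w, delta_w=0.0):
    """r = (S - (delta_w / N) * |S|) / sum f(y(i), w(i)) w(i), with S = sum y(i) w(i).

    Only the tested signal y and the watermark w are used; the original
    signal is not needed.
    """
    n = len(y)
    s = sum(yi * wi for yi, wi in zip(y, w))
    denom = sum(f(yi, wi) * wi for yi, wi in zip(y, w))
    return (s - (delta_w / n) * abs(s)) / denom


# Synthetic zero-mean test signal standing in for real audio samples.
rng = random.Random(1)
x = [rng.gauss(0.0, 8000.0) for _ in range(200_000)]

w = make_watermark(seed=444, n=len(x), alpha=0.05)
y = [xi + f(xi, wi) for xi, wi in zip(x, w)]  # watermarked copy of x

r_marked = detection_value(y, w)  # watermark present
r_clean = detection_value(x, w)   # watermark absent
print(r_marked > 0.5, r_clean > 0.5)
```

With these assumptions the marked signal should land near 1 and the clean one near 0, with a spread that shrinks as the number of samples grows. That is why a threshold around 0.5 separates the two cases in this toy setting.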
The embedding of multiple watermarks causes incremental audible distortion. The maximum number of watermarks that can be embedded without audible distortion depends on the amplitude of each watermarking signal. In the worst case, the first watermark reaches the threshold of audible noise, so the second one will be audibly perceptible even if it is of low amplitude. However, multiple watermarks are detected in a signal with equal success. In conclusion, a multiple watermarking scheme is possible as long as the combination of the watermarks is chosen carefully, so that the overall acoustic result is not audibly perceptible.
The detection algorithm gives no false alarm even when it is tested on a watermarked signal using wrong watermark keys. For example, 1000 keys, of which only one was valid, have been used to detect a watermark in a watermarked signal. The results are shown in Figure 2.
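A scaled-down version of this key test can be sketched in Python. It is only an illustration under assumptions that are not taken from the text: the embedding function is assumed to be `f(x, w) = |x| * w`, the watermark is a balanced `+alpha` / `-alpha` sequence generated from an integer seed, the helper names are hypothetical, the signal is synthetic Gaussian noise, and only 20 candidate keys are tried instead of 1000 to keep the run short.

```python
import random


def make_watermark(seed, n, alpha):
    """Balanced +alpha / -alpha sequence whose order depends on the seed (key)."""
    rng = random.Random(seed)
    w = [alpha] * (n // 2) + [-alpha] * (n - n // 2)
    rng.shuffle(w)
    return w


def detection_value(y, w):
    """r = S / sum |y(i)| w(i)^2, assuming f(y, w) = |y| * w and a balanced w."""
    s = sum(yi * wi for yi, wi in zip(y, w))
    denom = sum(abs(yi) * wi * wi for yi, wi in zip(y, w))
    return s / denom


n, alpha, true_key, threshold = 100_000, 0.05, 444, 0.5

# Synthetic zero-mean test signal, watermarked with the true key.
rng = random.Random(2)
x = [rng.gauss(0.0, 8000.0) for _ in range(n)]
w_true = make_watermark(true_key, n, alpha)
y = [xi + abs(xi) * wi for xi, wi in zip(x, w_true)]

# Try a range of candidate keys that includes the true one.
candidate_keys = list(range(435, 455))
accepted = [
    key
    for key in candidate_keys
    if detection_value(y, make_watermark(key, n, alpha)) > threshold
]
print(accepted)
```

In this toy setting only the true key is expected to exceed the threshold, since a watermark generated from any other seed is essentially uncorrelated with the embedded one.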

Audio Watermark Robustness to Signal Manipulation

Robustness to MPEG2 Audio Compression
The robustness of the watermarking technique described above has been tested using Layers II and III of MPEG Audio. Several 16-bit signed stereo 44.1 kHz watermarked signals were encoded at 80 kbps in Layer III and at 48 kbps in Layer II. The lower the bit rate, the higher the compression ratio. The watermark resists the encoding-decoding process, as shown in Figure 3(a). In this figure, 1000 distinct watermarks (SNR = 26 dB) were embedded into an audio signal, producing an equal number of distinct watermarked signals. All were compressed by MPEG2 audio Layer III at a bit rate of 80 kbps. Watermark detection after decompression indicates a slight decrease of the detection values in the watermarked signals. The solid curve shows the pdf of the detection values after MPEG2 in comparison with the dashed curve, which indicates the pdf of the same values before MPEG2. Since the detection value is always above the 0.5 threshold, we have 100% success in watermark detection in this experiment. Layer II at 48 kbps causes an audible distortion to the watermarked signal, yet the watermark is still detected successfully.

Robustness to Filtering
The robustness of the watermark procedure described in this paper was studied under moving average and other types of low-pass filtering. Watermarked audio files were filtered by a moving average filter of length 20, which introduces a noticeable audible distortion, yet the watermark is detected. Figure 3(b) shows the alteration in detection values introduced by the use of the above mean filter. In general, the detection values are increased in the filtered signals, and this is the reason why the pdf of the detection values of the filtered signals in Figure 3(b) is translated to the right in comparison with the respective non-filtered pdf.
Our watermark is also robust to lowpass filtering. 1000 watermarked audio signals sampled at 44100 Hz were filtered by a 25th order Hamming lowpass filter with cut-off frequency 2205 Hz. The solid curve of Figure 3(c) displays the pdf of the detection values after filtering, while the dashed curve indicates the respective pdf before filtering. The mean value of the solid (filtered) curve is increased in comparison with the dashed (non-filtered) one. The deviation of the solid curve is increased as well, therefore slightly reducing the robustness of the detection scheme after lowpass filtering. In both experiments, we have 100% success in watermark detection.

Robustness to Resampling and Requantization
Watermarked audio signals sampled at 44100 Hz have been resampled down to 22050 Hz and 11025 Hz and back again to their initial frequency. Although the above processing caused noticeable distortion in relation to the original signals, the watermarks remained easily detectable. Figure 4 shows how the watermark is retained in 1000 watermarked signals that have been resampled down to 11025 Hz and back to their initial 44100 Hz frequency. In this experiment we have 100% success in watermark detection.
Requantization of the original 16-bit audio signal down to 8-bit samples and back preserves the embedded watermark despite the loss of information during the processing. The watermark resists the requantization process because it is amplitude-adaptive with respect to the original signal. Figure 5 shows how the watermark is retained in 1000 watermarked signals that have been requantized down to 8-bit and back to 16-bit. The deviation of the requantized pdf is increased, as in lowpass filtering, thus reducing the robustness of detection. In this experiment we have 99.8% success in watermark detection. All of the above experiments were carried out using the same watermark parameters (SNR = 26 dB).

Conclusions
The watermarking scheme presented above embeds a watermark in the time domain of a digital audio signal by slightly modifying the amplitude of each audio sample. The characteristics of this modification are determined both by the original signal and the copyright owner. The detection procedure does not use the original signal. Our watermarking scheme is statistically imperceptible and resists MPEG compression plus other forms of signal manipulation, such as filtering, resampling and requantization.