Windowing & Overlap 
Aug 14 2004, 05:41
Post
#1


Group: Members Posts: 22 Joined: 31December 03 From: Warwick, UK Member No.: 10830 
Hello everybody... I am trying to do the following but having trouble. The following is pseudo code.

(This happens once)
Zero-pad a 511-sized array (this will be used for overlap)

(The following lines loop)
Capture 512 samples of an audio signal and store them in the first half of a 1024-size array
Zero-pad the remaining half of the array
Window the first 512 samples with a Hanning window
Convert the entire array to the frequency domain with an FFT
Multiply the contents with the frequency response of an impulse response
Inverse FFT to get the filtered audio signal
Add the overlap to the filtered signal
Store the last 511 samples of the filtered signal to the overlap array
Output the filtered signal
Go back to capturing 512 samples etc...

The result is a filtered signal, with no clicks or pops (thanks to the windowing), but there are rapid amplitude changes. I have read just about every note I could find on the web, gone through PDF books and guides... but have not quite understood: how many windowing operations do I do? Do I do another windowing operation somewhere? There are mentions online in places about how it's done, but some just cover it too briefly, and some go into deep DSP theory which I admit is not one of my strong points. Could somebody who has an understanding of this topic help me out? Pseudo code or C/C++ outlines would be great. Thanks a lot in advance!

PS: This is for a suite of plugins I am writing, which will be freely available at the end of the month. I will let you know more about them and where you can download them. But first I need to make sure they work 


Aug 15 2004, 08:28
Post
#2


MPEG4 AAC developer Group: Developer Posts: 398 Joined: 1June 03 Member No.: 6943 
QUOTE (aristotel @ Aug 13 2004, 08:41 PM) [...]

I think you should try this approach :

Construct an array of N samples with N/2 samples shifted from the previous frame.. DO NOT zero-pad any samples !!
Window the array with a suitable window..
Pass the windowed array into an FFT..
At the reconstruction side, the IFFT reproduces the same windowed array.. Overlap and add with the previous half window..

eg : w(n)S_2(n) + w(N/2 + n)S_1(N/2 + n) = S_Original(n)..
Since S_2(n) = S_1(N/2 + n) = S_Original(n), the windowing function MUST obey this equation for perfect reconstruction :
w(n) + w(N/2 + n) = 1
One possible window solution is the Hanning (or was it Hamming ???) window given by the following equation :
w(n) = 0.5*(1.0 - cos(2*PI * (n+0.5) / N)) /// Sorry for the mistake !!!

This post has been edited by wkwai: Aug 16 2004, 11:23 


Aug 15 2004, 13:26
Post
#3


Group: Developer Posts: 1318 Joined: 20March 04 From: Göttingen (DE) Member No.: 12875 
It seems like you want to do plain filtering by "fast convolution" (via FFT).

There's NO need to apply a window function ! Let 'k' be the length of the impulse response of the filter. Now, if you want an infinite signal to be convolved with this impulse response you can do this blockwise and do some overlap-adding. This is legal because of the properties of the filter (being a time-invariant linear system).

Let 'x' be the signal you want to convolve with the filter's IR. Let 'n' be the smallest power of 2 which is greater than or equal to k-1. You can take blocks of n samples of your signal, convolve them separately with the impulse response and do the overlap-add of the resulting signal blocks, which have at most n+k-1 non-zero samples (due to the convolution). Because of k-1<=n we know n+k-1<=2n. So the length in samples of the convolved signal blocks is 2n at most. This convolution can be done via an FFT of size 2n. Here's an example:

CODE
impulse response (k=5 samples):

    i i i i i

your signal you want to convolve with the IR, divided into blocks of length n=8:

    a a a a a a a a|b b b b b b b b|c c c c c c c c|...

you take each block of n samples and pad it with zeros to 2n samples like this:

    a a a a a a a a 0 0 0 0 0 0 0 0

and convolve it with the zero-padded impulse response:

    i i i i i 0 0 0 0 0 0 0 0 0 0 0

to get

    a'a'a'a'a'a'a'a'a'a'a'a'0 0 0 0

which can be done via fast Fourier-transforming both blocks, multiplication of their spectra (you better do this with the complex-valued FFT and complex multiplication) and inverse-transforming the result. Now the overlap-add part (each block starts n=8 samples after the previous one):

      a'a'a'a'a'a'a'a'a'a'a'a'0 0 0 0
    +                 b'b'b'b'b'b'b'b'b'b'b'b'0 0 0 0
    +                                 c'c'c'c'c'c'c'c'c'c'c'c'0 0 0 0

The zero-padding is actually needed because the convolution you can calculate via FFT is a cyclic one and our source signal is NOT periodic.

HTH,
Sebastian

This post has been edited by SebastianG: Aug 15 2004, 13:33 


Aug 15 2004, 15:35
Post
#4


Group: Developer Posts: 1318 Joined: 20March 04 From: Göttingen (DE) Member No.: 12875 
QUOTE (wkwai @ Aug 14 2004, 11:28 PM) I think you should try this approach : [...] (1) It's called "Hann" window. See Hann and Hamming Window (2) Your approach is flawed and introduces temporal alias artefacts. It does not take into account that FFT convolution is cyclic. (You need zero padding to compensate for this) Sebastian This post has been edited by SebastianG: Aug 15 2004, 15:36 


Aug 16 2004, 00:54
Post
#5


Group: Members Posts: 22 Joined: 31December 03 From: Warwick, UK Member No.: 10830 
Thank you for your replies and suggestions. I got rid of any windowing and took Sebastian's suggestion. I have been advised by others to do just that and apparently it works! I have adjusted my code to the suggested method (which actually is what it was before I got carried away with the windowing theory). However I get clicks/pops at the end of each frame block processed. This results in buzzing. I have added the code of my processing with comments below in case anybody can spot a problem.

Regarding the spectrum multiplication: I am using Apple's vDSP library's function zvmulD(). In the documentation it is described as "Complex Vector Multiply." One of the function arguments is the "conjugate" flag. The documentation says "Assign conjugate flag 'conjugate' a value of 1 for normal multiplication or -1 for multiplication by conjugated values of input 1." I have tried both, and both give different waveforms and both still have the clicks/pops. I am new to FFTs and DSP in general, so is it normal multiplication I need for my scenario or multiplication by conjugated values of input 1? While both give clicks and pops, perhaps this may shed some light on the overall problem with the pops in the signal. Thanks a lot!

CODE
// Store 512 incoming audio samples into array
for(UInt32 j = 0; j < 512; j++) {
    speakerTempBuffer[j] = audioData[j];
}

// Zero-pad the second half of the array
for(int a = 512; a < 1024; a++) {
    speakerTempBuffer[a] = 0.0;
}

// Convert to Split Double Complex
ctozD((DOUBLE_COMPLEX *) speakerTempBuffer, 2, &speakersSplitDoubleComplex, 1, nOver2);

// FFT
fft_zripD(fft1024Setup, &speakersSplitDoubleComplex, stride, log2n, kFFTDirection_Forward);

// Multiplication of the spectrum with the spectrum of an HRTF impulse response
zvmulD(&speakersSplitDoubleComplex, stride, &HRTF_0Degree0ElevationSplitDoubleComplex, stride,
       &speakersSplitDoubleComplex, stride, log2n, 1);

// Inverse FFT the result
fft_zripD(fft1024Setup, &speakersSplitDoubleComplex, stride, log2n, kFFTDirection_Inverse);

// Scale the result
vsmulD(speakersSplitDoubleComplex.realp, 1, &scale, speakersSplitDoubleComplex.realp, 1, nOver2);
vsmulD(speakersSplitDoubleComplex.imagp, 1, &scale, speakersSplitDoubleComplex.imagp, 1, nOver2);

// Convert to real array
ztocD(&speakersSplitDoubleComplex, 1, (DOUBLE_COMPLEX *) speakerTempBuffer, 2, nOver2);

// Add overlap
for(int a = 0; a < overlapSize; a++) {
    speakerTempBuffer[a] += overLap[a];
}

// Update overlap
for(int a = 512; a < 1023; a++) {
    overLap[a - 512] = speakerTempBuffer[a];
}

// Store result to output stream
for(UInt32 j = 0; j < 512; j++) {
    audioData[j] = (float) speakerTempBuffer[j];
}


Aug 16 2004, 05:39
Post
#6


Group: Members Posts: 22 Joined: 31December 03 From: Warwick, UK Member No.: 10830 
YES! It's solved! The problem was the size of the array that the complex vector multiplication was being done for. It was log2n, which in this case equals 10, when it should have been nOver2, which equals 512. I am putting the correct code below. It should be useful to newbies like myself who learn DSP better by reading code than text and math formulas.

Thanks for your help!!!

CODE
// Store 512 incoming audio samples into array
for(UInt32 j = 0; j < 512; j++) {
    speakerTempBuffer[j] = audioData[j];
}

// Zero-pad the second half of the array
for(int a = 512; a < 1024; a++) {
    speakerTempBuffer[a] = 0.0;
}

// Convert to Split Double Complex
ctozD((DOUBLE_COMPLEX *) speakerTempBuffer, 2, &speakersSplitDoubleComplex, 1, nOver2);

// FFT
fft_zripD(fft1024Setup, &speakersSplitDoubleComplex, stride, log2n, kFFTDirection_Forward);

// Multiplication of the spectrum with the spectrum of an HRTF impulse response
zvmulD(&speakersSplitDoubleComplex, stride, &HRTF_0Degree0ElevationSplitDoubleComplex, stride,
       &speakersSplitDoubleComplex, stride, nOver2, 1);

// Inverse FFT the result
fft_zripD(fft1024Setup, &speakersSplitDoubleComplex, stride, log2n, kFFTDirection_Inverse);

// Scale the result
vsmulD(speakersSplitDoubleComplex.realp, 1, &scale, speakersSplitDoubleComplex.realp, 1, nOver2);
vsmulD(speakersSplitDoubleComplex.imagp, 1, &scale, speakersSplitDoubleComplex.imagp, 1, nOver2);

// Convert to real array
ztocD(&speakersSplitDoubleComplex, 1, (DOUBLE_COMPLEX *) speakerTempBuffer, 2, nOver2);

// Add overlap
for(int a = 0; a < overlapSize; a++) {
    speakerTempBuffer[a] += overLap[a];
}

// Update overlap
for(int a = 512; a < 1023; a++) {
    overLap[a - 512] = speakerTempBuffer[a];
}

// Store result to output stream
for(UInt32 j = 0; j < 512; j++) {
    audioData[j] = (float) speakerTempBuffer[j];
}


Aug 16 2004, 07:05
Post
#7


Group: Developer Posts: 1318 Joined: 20March 04 From: Göttingen (DE) Member No.: 12875 
I guess it's correct.

You may want to change the last part to

CODE
[...]

// Convert to real array
ztocD(&speakersSplitDoubleComplex, 1, (DOUBLE_COMPLEX *) speakerTempBuffer, 2, nOver2);

// Store result to output stream via overlap-add & update overLap buffer
for(UInt32 j = 0; j < 512; j++) {
    audioData[j] = (float) (speakerTempBuffer[j] + overLap[j]);
    overLap[j] = speakerTempBuffer[512 + j];
}

which saves some assignments. Make sure that the overLap buffer contains 512 samples.

BTW: With a complex-valued FFT you can convolve a stereo signal (treat it as a quadrature signal) with a mono (real-valued) impulse response at once for both channels, or convolve a mono (real-valued) signal with a stereo (quadrature) impulse response for both ears in case of the HRTF stuff. There's no need to do the processing twice, once for each channel.

bye,
Sebastian

This post has been edited by SebastianG: Aug 16 2004, 07:06 


Aug 16 2004, 10:55
Post
#8


MPEG4 AAC developer Group: Developer Posts: 398 Joined: 1June 03 Member No.: 6943 
QUOTE (SebastianG @ Aug 15 2004, 06:35 AM) [...] (1) It's called "Hann" window. See Hann and Hamming Window (2) Your approach is flawed and introduces temporal alias artefacts. It does not take into account that FFT convolution is cyclic. (You need zero padding to compensate for this)

Really? What are these temporal alias artefacts? But isn't padding the 2nd half of the FFT input samples with zeros equivalent to multiplication with a window of a "brickwall" shape / the rectangular window? As mentioned, the window must obey the equation w(N/2+n) + w(n) = 1, which is also true here.. But this window shape could produce boundary artifacts in the reconstructed audio, as is obvious in cases of the DCT used in image and video compression systems.. Windowing the input samples with an "appropriate window function" would spread the boundary blocking artifacts into 2 adjacent frames.. and keep them well below the masking properties of the actual signal itself..

wkwai

This post has been edited by wkwai: Aug 16 2004, 11:57 


Aug 16 2004, 11:55
Post
#9


Group: Developer Posts: 1318 Joined: 20March 04 From: Göttingen (DE) Member No.: 12875 
QUOTE (wkwai @ Aug 16 2004, 01:55 AM) Really??? I would like to understand more of the mathematical foundation to your approach... Where do these temporal alias artefacts come from??

If you convolve a discrete signal which consists of n consecutive non-zero samples with another signal with k samples, you get a signal with n+k-1 consecutive non-zero samples (or fewer). If you try to compute this via an n-point FFT, the last k-1 samples of the convolved signal, which should contribute to the next block, get wrapped around and create an "alias". That's why I used the term "cyclic convolution". If you want to calculate this via an n-point FFT you need to shorten the blocks you convolve with the impulse response so the convolved blocks are at most n samples long. Your idea only works in case of the impulse response being a scaled unit impulse where k=1.

recap:
b = block size / step
n = FFT size
k = length of the impulse response in samples

Then this must be satisfied: b+k-1 <= n
The signal blocks and the impulse response have to be zero-padded to n samples.

FFT convolution example: In case of n=4, with the two signals to be convolved via the FFT being a=(a_0,a_1,a_2,a_3) and b=(b_0,b_1,b_2,b_3), the "FFT convolution" yields c=(c_0,c_1,c_2,c_3) with

c_0 = a_0 * b_0 + a_3 * b_1 + a_2 * b_2 + a_1 * b_3
c_1 = a_1 * b_0 + a_0 * b_1 + a_3 * b_2 + a_2 * b_3
c_2 = a_2 * b_0 + a_1 * b_1 + a_0 * b_2 + a_3 * b_3
c_3 = a_3 * b_0 + a_2 * b_1 + a_1 * b_2 + a_0 * b_3

Notice that when b_1>0, a_3 contributes to c_0, although it should contribute to a c_4 in case of a "normal" convolution. This is a cyclic convolution.

HTH,
Sebastian 


Aug 16 2004, 13:32
Post
#10


Group: Developer Posts: 1318 Joined: 20March 04 From: Göttingen (DE) Member No.: 12875 
QUOTE (wkwai @ Aug 16 2004, 01:55 AM) But isn't padding the 2nd half of the FFT input samples with zeros equivalent to multiplication with a window of a "brickwall" shape / the rectangular window? [...]

Remember that a convolution is linear and time-invariant. It's therefore legal to split the signal into blocks, convolve them separately and add these convolved blocks to form the new signal. Try to understand what actually happens if you multiply 2 Fourier spectra and why I always used the term cyclic convolution. You may also want to check Chapter 18 of "DSPGUIDE".

Sebastian 


Aug 17 2004, 09:38
Post
#11


MPEG4 AAC developer Group: Developer Posts: 398 Joined: 1June 03 Member No.: 6943 
QUOTE (SebastianG @ Aug 16 2004, 02:55 AM) [...]

Are you suggesting that if I input N samples into an FFT and then feed the output of the FFT back into an IFFT, I don't get back the original N samples ??? assuming that the normalization factors are properly handled and truncation errors kept to a minimum.. I will try to write a simple program to verify this.. Thanks for the "enlightenment"..

wkwai

This post has been edited by wkwai: Aug 17 2004, 09:39 


Aug 17 2004, 11:28
Post
#12


Group: Developer Posts: 1318 Joined: 20March 04 From: Göttingen (DE) Member No.: 12875 
QUOTE (wkwai @ Aug 17 2004, 12:38 AM) Are you suggesting that if I input N samples into an FFT and then feed the output of the FFT back into an IFFT, I don't get back the original N samples ??? [...]

No. Let a,b \in \mathbb{C}^n

CODE
                          | a_1  a_n      a_{n-1} ..  a_2 |   | b_1 |
                          | a_2  a_1      a_n     ..  a_3 |   | b_2 |
iFFT(FFT(a) .* FFT(b)) =  | a_3  a_2      a_1     ..  a_4 | * | b_3 |
                          |  :    :        :      ..   :  |   |  :  |
                          | a_n  a_{n-1}  a_{n-2} ..  a_1 |   | b_n |

where ".*" denotes the componentwise product. To avoid those nasty wrap-arounds, 'a' and 'b' have to be zero-padded. Let 't' be the greatest index with a_t != 0 and 'k' the greatest index so that b_k != 0. If k+t-1 <= n there won't be any wrap-arounds. I don't know how to explain this any clearer. Keep in mind that the Fourier transform gives you a representation of a periodic signal. For further studies I suggest reading some DSPGUIDE chapters.

Sebastian

This post has been edited by SebastianG: Aug 17 2004, 11:29 


Aug 18 2004, 07:28
Post
#13


Group: Developer Posts: 1245 Joined: 16December 02 From: Australia Member No.: 4097 
QUOTE (wkwai @ Aug 17 2004, 06:38 PM) [...]

I believe the simple interpretation of "linear convolution in the time domain = multiplication in the frequency domain" only applies to continuous time/frequency signals, where you don't have sampling occurring and things wrapping around because of it. For the case of the FFT, which is a transform based on discrete time, the equivalent of multiplication in frequency is circular/cyclic convolution in the time domain, rather than your typical linear convolution. i.e. FFT-based convolution is the same as convolving your discrete-time signal wrapped around itself, which gives you a periodic signal. Thus you need to zero-pad it enough to increase the period of the signal to avoid the effects of circular convolution, i.e. the effect of terms that have wrapped around. In subband image coding, we usually don't use zero padding as the discontinuity can cause edge distortions, but opt for something smoother like symmetric extension instead.

Here's a matlab example:

CODE
a=[1 6 2 3 4 9];
b=[7 2 3];
>> conv(a,b)

ans =

     7    44    29    43    40    80    30    27

Now that's linear convolution. Doing a straight multiplication using FFT gives:

CODE
>> c=ifft(fft(a).*fft(b,6))

c =

    37    71    29    43    40    80

which isn't the same. Now we do some zero padding. Since the final output is 6+3-1=8 samples, I zero-pad them out to 8:

CODE
>> d=[a zeros(1,2)];
>> e=[b zeros(1,5)];
>> f=ifft(fft(e).*fft(d))

f =

    7.0000   44.0000   29.0000   43.0000   40.0000   80.0000   30.0000   27.0000

Now we have our linear convolution result

This post has been edited by QuantumKnot: Aug 18 2004, 07:57 


Aug 19 2004, 08:53
Post
#14


MPEG4 AAC developer Group: Developer Posts: 398 Joined: 1June 03 Member No.: 6943 
Yeah.. I think I understand the problem here.. We are dealing with short-time convolution here.. Not continuous-time convolution.. As a result, the extra k-1 samples need to be regenerated from the IMDCT..

I did some studying on short-time convolution.. and verified that the k-1 extra samples belong to the next frame.. which you need to add to the first few samples of the next frame.. In my earlier equations, I assumed that convolution is just a straightforward multiplication in the freq domain.. I failed to take into consideration the "boundary conditions" needed to perfectly reconstruct the audio signal.. My apologies..

But I have another very troubling problem.. Suppose that I had an FFT spectrum of N/2 spectral components.. Then I attempt to zero out the upper coefficients to "band-limit" the spectrum.. According to this insight provided by you folks.. this is WRONG!!! What about the CoolEdit FFT filter option where one could just generate "ANY" filter frequency response shape as he likes?? I wonder how it is implemented? Did they actually reconvert this response drawn by the user into an impulse response.. by an IFFT, and then determine the input sample length to the FFT ?? I wonder too if the actual CoolEdit FFT filter response is just an approximation of the one drawn by the user??

In TwinVQ / VQF, there is also a very troubling problem.. because Mr. Arigatou of NTT, who developed the coding technique, assumed that the MDCT spectral shape is identical to the FFT spectral shape.. So, the estimated LPC spectral shape is used to flatten the MDCT spectral shape, which is not correct.. mathematically.. since the LPC calculation is done in the TIME DOMAIN and the MDCT decomposition is actually a critically downsampled subband filterbank !! Why shouldn't the convolution be done in the time domain instead of trying to transform everything into the freq domain.. and then having to calculate the square root of the LPC estimated spectral shape ?? It is unnecessary algorithmic complexity.. plus the fact that it is mathematically incorrect..

wkwai

This post has been edited by wkwai: Aug 19 2004, 09:31 


Aug 19 2004, 09:56
Post
#15


Group: Developer Posts: 1318 Joined: 20March 04 From: Göttingen (DE) Member No.: 12875 
QUOTE (wkwai @ Aug 18 2004, 11:53 PM) What about the CoolEdit FFT filter option where one could just generate "ANY" filter frequency response shape as he likes?? I wondered how is it implemented? [...]

I don't know for sure, but I guess Cool Edit generates an impulse response of the desired filter's frequency response via iFFT + windowing and uses it for fast convolution via FFT the way I described above.

QUOTE (wkwai @ Aug 18 2004, 11:53 PM) I wondered too if the actual CoolEdit FFT filter response is just an approximation of the one drawn by the user??

Yes, but a close one. It depends on your desired frequency response and FFT size settings. If you want a very close approximation of a brickwall lowpass you need the impulse response to be very large.

QUOTE (wkwai @ Aug 18 2004, 11:53 PM) In Twin VQ / VQF, there is also some very troubling problem.. [...]

Well, the thing is: You don't convert it back to the time domain after frequency-domain filtering, so it's totally reversible. The actual signal itself isn't altered after encoding and decoding. But the introduced noise gets "filtered" through the LPC synthesis filter, which is done in the MDCT domain and introduces a temporal alias. But it does not matter because it's only noise.

Sebastian 


Aug 19 2004, 10:25
Post
#16


MPEG4 AAC developer Group: Developer Posts: 398 Joined: 1June 03 Member No.: 6943 
QUOTE (SebastianG @ Aug 19 2004, 12:56 AM) Well, the thing is: You don't convert it back to the time domain after frequency-domain filtering, so it's totally reversible. [...]

Sure, it is not a problem for the encoder-decoder pair.. but at the encoder itself.. it is assuming that the MDCT spectral shape is the FFT spectral shape !!! Which is mathematically wrong.. Would it not have been better to convolve in the time domain first, before transforming with the MDCT.. ??? That would make it something like GSM speech LPC analysis.. time-domain..

wkwai 


Aug 19 2004, 10:56
Post
#17


Group: Developer Posts: 1318 Joined: 20March 04 From: Göttingen (DE) Member No.: 12875 
QUOTE (wkwai @ Aug 19 2004, 01:25 AM) [...]

The MDCT relates to the LPC analysis/synthesis in the same way as the FFT does. You could have done it in the time domain. But in this case it would have been very expensive because of the high orders of the filters. In fact you need higher orders if you want to do this in the time domain, since you cannot use the logarithmic / Bark frequency scale like it's done in VQF / Vorbis (with floor 0).

Sebastian 


Aug 21 2004, 02:49
Post
#18


Group: Developer Posts: 1245 Joined: 16December 02 From: Australia Member No.: 4097 
QUOTE (wkwai @ Aug 19 2004, 05:53 PM) What about the CoolEdit FFT filter option where one could just generate "ANY" filter frequency response shape as he likes?? I wondered how is it implemented? [...]

I also suspect that is what the CoolEdit FFT filter option does. It can use either the window method or the frequency-sampling method of FIR filter design. 


Aug 21 2004, 06:54
Post
#19


MPEG4 AAC developer Group: Developer Posts: 398 Joined: 1June 03 Member No.: 6943 
QUOTE (SebastianG @ Aug 19 2004, 01:56 AM) [...]

Perhaps.. it is in their (Japanese) engineering mindset.. Engineering is both an art form as well as a science ?? I know that in some engineering cultures.. engineering students are taught to design like an "artist".. which is unfortunately something I am NOT trained in.. which is also why I DON'T do things like GUI designs.. 

