Windowing & Overlap
aristotel
post Aug 14 2004, 05:41
Post #1





Hello everybody... I am trying to do the following but am having trouble. Here is the pseudo-code.

(This happens once)
-Zero-fill a 511-sample array (this will be used for the overlap)

(The following lines loop)
-Capture 512 samples of an audio signal and store them in the first half of a 1024-sample array
-Zero-pad the remaining half of the array
-Window the first 512 samples with a Hanning window
-Convert the entire array to the frequency domain with an FFT
-Multiply the result with the frequency response of an impulse response
-Inverse FFT to get the filtered audio signal
-Add the overlap to the filtered signal
-Store the last 511 samples of the filtered signal in the overlap array
-Output the filtered signal
-Go back to capturing 512 samples etc...

The result is a filtered signal, with no clicks or pops (thanks to the windowing), but there are rapid amplitude changes. I have read just about every note I could find on the web, gone through PDF books and guides... but have not quite understood how many windowing operations I need. Do I do another windowing operation somewhere? There are mentions online in places about how it's done, but some treat it too briefly, and some go into deep DSP theory which I admit is not one of my strong points. Could somebody who has an understanding of this topic help me out? Pseudo-code or C/C++ outlines would be great.

Thanks a lot in advance!

PS: This is for a suite of plug-ins I am writing, which will be freely available at the end of the month. I will let you know more about them and where you can download them. But first I need to make sure they work.
wkwai
post Aug 15 2004, 08:28
Post #2


QUOTE (aristotel @ Aug 13 2004, 08:41 PM)
[...]
*



I think you should try this approach:

Construct an array of N samples, with N/2 samples shifted from the previous frame.. DO NOT zero-pad any samples!!

Window the array with a suitable window..

Pass the windowed array into an FFT


At the reconstruction side, the IFFT would reproduce the same windowed array.. Overlap and add with the previous half-window..

e.g.:
w(n)S_2(n) + w(N/2 + n)S_1(N/2 + n) = S_Original(n)

Since S_2(n) = S_1(N/2 + n) = S_Original(n),

the window function MUST obey this equation for perfect reconstruction:

w(n) + w(N/2 + n) = 1


One possible window solution is the Hanning (or was it Hamming???) window, given by the following equation:

w(n) = 0.5*(1.0 - cos((2*PI * (n+0.5) / N)))  /// Sorry for the mistake!!!

This post has been edited by wkwai: Aug 16 2004, 11:23
SebastianG
post Aug 15 2004, 13:26
Post #3





It seems like you want to do plain filtering by "fast convolution" (via FFT).
There's NO need to apply a window function!

Let 'k' be the length of the impulse response of the filter.
Now, if you want an infinite signal to be convolved with this impulse response, you can do it block-wise with some overlap-adding. This is legal because of the properties of the filter (being a time-invariant linear system).

Let 'x' be the signal you want to convolve with the filter's IR.
Let 'n' be the smallest power of 2 which is greater than or equal to k-1.

You can take blocks of n samples of your signal, convolve them separately with the impulse response, and overlap-add the resulting signal blocks, which have at most n+k-1 non-zero samples (due to the convolution).
Because k-1 <= n, we know n+k-1 <= 2n. So the convolved signal blocks are at most 2n samples long, and this convolution can be done via an FFT of size 2n.

Here's an example:
CODE
impulse response (k=5 samples):
 |i i i i i|

your signal you want to convolve with the IR divided into blocks of length n=8:
 |a a a a a a a a|b b b b b b b b|c c c c c c c c|...

you take each block of n samples pad them with zeros to 2n samples like this:
 |a a a a a a a a 0 0 0 0 0 0 0 0|
and convolve it with the zero-padded impulse response:
 |i i i i i 0 0 0 0 0 0 0 0 0 0 0|
to get
 |a'a'a'a'a'a'a'a'a'a'a'a'0 0 0 0|
which can be done via fast Fourier-transforming both blocks, multiplication of their spectra (you better do this with the complex valued FFT and complex multiplication) and inverse transforming the result.

Now the overlap-add part:
 |a'a'a'a'a'a'a'a'a'a'a'a'0 0 0 0|
+                 |b'b'b'b'b'b'b'b'b'b'b'b'0 0 0 0|
+                                 |c'c'c'c'c'c'c'c'c'c'c'c'0 0 0 0|


The zero-padding is actually needed because the convolution you can calculate via FFT is a cyclic one and our source signal is NOT periodic.

HTH,
Sebastian

This post has been edited by SebastianG: Aug 15 2004, 13:33
SebastianG
post Aug 15 2004, 15:35
Post #4





QUOTE (wkwai @ Aug 14 2004, 11:28 PM)
I think you should try this approach :
[...]


(1) It's called "Hann" window. See Hann and Hamming Window

(2) Your approach is flawed and introduces temporal aliasing artefacts. It does not take into account that FFT convolution is cyclic. (You need zero-padding to compensate for this.)

Sebastian

This post has been edited by SebastianG: Aug 15 2004, 15:36
aristotel
post Aug 16 2004, 00:54
Post #5





Thank you for your replies and suggestions. I got rid of any windowing and took Sebastian's suggestion. I have been advised by others to do just that, and apparently it works! I have adjusted my code to the suggested method (which actually is what it was before I got carried away with the windowing theory). However, I get clicks/pops at the end of each processed frame block. This results in buzzing. I have added the code of my processing with comments below in case anybody can spot a problem.

Regarding the spectrum multiplication: I am using Apple's vDSP library function zvmulD(). In the documentation it is described as "Complex Vector Multiply." One of the function arguments is the "conjugate" flag. The documentation says "Assign conjugate flag 'conjugate' a value of 1 for normal multiplication or -1 for multiplication by conjugated values of input 1." I have tried both; they give different waveforms and both still have the clicks/pops. I am new to FFTs and DSP in general, so is it normal multiplication I need for my scenario, or multiplication by the conjugated values of input 1? While both give clicks and pops, perhaps this may shed some light on the overall problem with the pops in the signal.


Thanks a lot!


CODE
// Store 512 incoming audio samples into array
for(UInt32 j = 0; j < 512; j++)
{
  speakerTempBuffer[j] = audioData[j];
}
// Zero-pad the second half of the array
for(int a = 512; a < 1024; a++)
{
  speakerTempBuffer[a] = 0.0;
}
// Convert to Split Double Complex
ctozD((DOUBLE_COMPLEX *) speakerTempBuffer, 2, &speakersSplitDoubleComplex, 1, nOver2);
// FFT
fft_zripD(fft1024Setup, &speakersSplitDoubleComplex, stride, log2n, kFFTDirection_Forward);
// Multiplication of the spectrum with the spectrum of an HRTF impulse response
zvmulD(&speakersSplitDoubleComplex, stride, &HRTF_0Degree0ElevationSplitDoubleComplex, stride, &speakersSplitDoubleComplex, stride, log2n, 1);
// Inverse FFT the result
fft_zripD(fft1024Setup, &speakersSplitDoubleComplex, stride, log2n, kFFTDirection_Inverse);
// Scale the result
vsmulD(speakersSplitDoubleComplex.realp, 1, &scale, speakersSplitDoubleComplex.realp, 1, nOver2);
vsmulD(speakersSplitDoubleComplex.imagp, 1, &scale, speakersSplitDoubleComplex.imagp, 1, nOver2);
// Convert to real array
ztocD(&speakersSplitDoubleComplex, 1, (DOUBLE_COMPLEX *) speakerTempBuffer, 2, nOver2);
// Add overlap
for(int a = 0; a < overlapSize; a++)
{
  speakerTempBuffer[a] += overLap[a];
}
// Update overlap buffer
for(int a = 512; a < 1023; a++)
{
  overLap[a - 512] = speakerTempBuffer[a];
}
// Store result to output stream
for(UInt32 j = 0; j < 512; j++)
{
  audioData[j] = (float) speakerTempBuffer[j];
}
aristotel
post Aug 16 2004, 05:39
Post #6





YES! It's solved! The problem was the length passed to the complex vector multiplication. It was log2n, which in this case equals 10, when it should have been nOver2, which equals 512. I am putting the correct code below. It should be useful to newbies like myself who learn DSP better by reading code than text and math formulas.

Thanks for your help!!!


CODE
// Store 512 incoming audio samples into array
for(UInt32 j = 0; j < 512; j++)
{
 speakerTempBuffer[j] = audioData[j];
}
// Zero-pad the second half of the array
for(int a = 512; a < 1024; a++)
{
 speakerTempBuffer[a] = 0.0;
}
// Convert to Split Double Complex
ctozD((DOUBLE_COMPLEX *) speakerTempBuffer, 2, &speakersSplitDoubleComplex, 1, nOver2);
// FFT
fft_zripD(fft1024Setup, &speakersSplitDoubleComplex, stride, log2n, kFFTDirection_Forward);
// Multiplication of the spectrum with the spectrum of an HRTF impulse response
zvmulD(&speakersSplitDoubleComplex, stride, &HRTF_0Degree0ElevationSplitDoubleComplex, stride, &speakersSplitDoubleComplex, stride, nOver2, 1);
// Inverse FFT the result
fft_zripD(fft1024Setup, &speakersSplitDoubleComplex, stride, log2n, kFFTDirection_Inverse);
// Scale the result
vsmulD(speakersSplitDoubleComplex.realp, 1, &scale, speakersSplitDoubleComplex.realp, 1, nOver2);
vsmulD(speakersSplitDoubleComplex.imagp, 1, &scale, speakersSplitDoubleComplex.imagp, 1, nOver2);
// Convert to real array
ztocD(&speakersSplitDoubleComplex, 1, (DOUBLE_COMPLEX *) speakerTempBuffer, 2, nOver2);
// Add overlap
for(int a = 0; a < overlapSize; a++)
{
 speakerTempBuffer[a] += overLap[a];
}
// Update overlap buffer
for(int a = 512; a < 1023; a++)
{
 overLap[a - 512] = speakerTempBuffer[a];
}
// Store result to output stream
for(UInt32 j = 0; j < 512; j++)
{
 audioData[j] = (float) speakerTempBuffer[j];
}
SebastianG
post Aug 16 2004, 07:05
Post #7





I guess it's correct.

You may want to change the last part to:
CODE
[...]
// Convert to real array
ztocD(&speakersSplitDoubleComplex, 1, (DOUBLE_COMPLEX *) speakerTempBuffer, 2, nOver2);

// Store result to output stream via overlap-add & update overLap buffer
for(UInt32 j = 0; j < 512; j++)
{
 audioData[j] = (float) (speakerTempBuffer[j] + overLap[j]);
 overLap[j] = speakerTempBuffer[512+j];
}

which saves some assignments.
Make sure that the overLap buffer contains 512 samples.

BTW: With a complex-valued FFT you can convolve a stereo signal (treat it as a quadrature signal) with a mono (real-valued) impulse response at once for both channels, or convolve a mono (real-valued) signal with a stereo (quadrature) impulse response for both ears in the case of the HRTF stuff. There's no need to do the processing twice, once for each channel.

bye,
Sebastian

This post has been edited by SebastianG: Aug 16 2004, 07:06
wkwai
post Aug 16 2004, 10:55
Post #8


QUOTE (SebastianG @ Aug 15 2004, 06:35 AM)
QUOTE (wkwai @ Aug 14 2004, 11:28 PM)
I think you should try this approach :
[...]


(1) It's called "Hann" window. See Hann and Hamming Window

(2) Your approach is flawed and introduces temporal alias artefacts. It does not take into account that FFT convolution is cyclic. (You need zero padding to compensate for this)

Sebastian
*




Really? What are these temporal alias artefacts?

But isn't padding the 2nd half of the FFT input samples with zeros equivalent to multiplication with a window of "brickwall" shape, i.e. the rectangular window?

As mentioned.. the window must obey the equation w(N/2+n) + w(n) = 1, which is also true here.. But this window shape could produce boundary artifacts in the reconstructed audio, as is obvious in the case of the DCT used in image and video compression systems..

Windowing the input samples with an "appropriate window function" would spread the boundary blocking artifacts into the 2 adjacent frames.. and keep them well below the masking properties of the actual signal itself..


wkwai

This post has been edited by wkwai: Aug 16 2004, 11:57
SebastianG
post Aug 16 2004, 11:55
Post #9





QUOTE (wkwai @ Aug 16 2004, 01:55 AM)
Really??? I would like to understand more of the mathematical foundation to your approach...
How does this temporal alias artefacts came from??
*


If you convolve a discrete signal which consists of n consecutive non-zero samples with another signal with k samples you get a signal with n+k-1 consecutive non-zero samples (or less). If you try to compute this via an n-point FFT the last k-1 samples of the convolved signal which should contribute to the next block get wrapped around and create an "alias". That's why I used the term "cyclic convolution". If you want to calculate this via an n-point FFT you need to shorten the blocks you convolve with the impulse response so the convolved blocks are at most n samples long. Your idea only works in case of the impulse response being a scaled unit impulse where k=1.

recap:
b = blocksize / step
n = FFT-Size
k = length of the impulse response in samples
Then this must be satisfied:
b+k-1<=n
The signal blocks and the impulse response have to be zero-padded to n samples.

FFT-Convolution-Example:
In case of n=4 and the two signals to be convolved via the FFT are a=(a_0,a_1,a_2,a_3) and b=(b_0,b_1,b_2,b_3) the "FFT-Convolution" yields c=(c_0,c_1,c_2,c_3) with
c_0 = a_0 * b_0 + a_3 * b_1 + a_2 * b_2 + a_1 * b_3
c_1 = a_1 * b_0 + a_0 * b_1 + a_3 * b_2 + a_2 * b_3
c_2 = a_2 * b_0 + a_1 * b_1 + a_0 * b_2 + a_3 * b_3
c_3 = a_3 * b_0 + a_2 * b_1 + a_1 * b_2 + a_0 * b_3

Notice when b_1>0, a_3 contributes to c_0 which should contribute to c_4 instead in case of a "normal" convolution. This is a cyclic convolution.


HTH,
Sebastian
SebastianG
post Aug 16 2004, 13:32
Post #10





QUOTE (wkwai @ Aug 16 2004, 01:55 AM)
But isn't that padding the 2nd half of the FFT input samples with zero is equivalent to the multiplication with a window of a "brickwall" shape / the rectangular window ?

As mentioned.. the window must obey the equation : w(N/2+n) + w(n) = 1 which is also true here..  But this window shape could produce boundary artifacts for the reconstructed audio as it is obvious in cases of DCT used in image and video compression systems..

Windowing the input samples with an "appropriate window function" would spread the boundary blocking artifacts into 2 adjacent frames.. and kept well below the masking properties of the actual signal itself..
*


Remember that a convolution is linear and time-invariant. It's therefore legal to split the signal into blocks, convolve them separately, and add these convolved blocks to form the new signal.

Try to understand what actually happens if you multiply 2 Fourier spectra and why I always used the term cyclic convolution. You may also want to check Chapter 18 of "DSPGUIDE".

Sebastian
wkwai
post Aug 17 2004, 09:38
Post #11


QUOTE (SebastianG @ Aug 16 2004, 02:55 AM)
[...]
*




Are you suggesting that if I input N samples into an FFT and then feed the output of the FFT back into an IFFT, I don't get back the original N samples??? Assuming that the normalization factors are properly handled and truncation errors are kept to a minimum..

I will try to write a simple program to verify this.. Thanks for the "enlightenment"..

wkwai

This post has been edited by wkwai: Aug 17 2004, 09:39
SebastianG
post Aug 17 2004, 11:28
Post #12





QUOTE (wkwai @ Aug 17 2004, 12:38 AM)
Are you suggesting that if I input an N samples into a FFT and then the output of the FFT back into a IFFT, I don't get back the original N samples ???  assuming that the normalization factors are properly handled and truncation errors kept to minimum..  blink.gif
*


No.

Let a,b \in \mathbb{C}^n
CODE
                         | a_1   a_n   a_{n-1} .. a_2 |  | b_1 |
                         | a_2   a_1     a_n   .. a_3 |  | b_2 |
iFFT(FFT(a) .* FFT(b)) = | a_3   a_2     a_1   .. a_4 |  | b_3 |
                         |  :     :       :    ..  :  |  |  :  |
                         | a_n a_{n-1} a_{n-2} .. a_1 |  | b_n |

where ".*" denotes the component-wise product. To avoid those nasty wrap-arounds, 'a' and 'b' have to be zero-padded. Let 't' be the greatest index with a_t != 0 and 'k' the greatest index with b_k != 0. If t+k-1 <= n there won't be any wrap-arounds.

I don't know how to explain this any clearer. Keep in mind that the Fourier transform gives you a representation of a periodic signal.

For further studies I suggest reading some DSPGUIDE chapters.

Sebastian

This post has been edited by SebastianG: Aug 17 2004, 11:29
QuantumKnot
post Aug 18 2004, 07:28
Post #13





QUOTE (wkwai @ Aug 17 2004, 06:38 PM)
Are you suggesting that if I input an N samples into a FFT and then the output of the FFT back into a IFFT, I don't get back the original N samples ???  assuming that the normalization factors are properly handled and truncation errors kept to minimum..  blink.gif
*


I believe the simple interpretation of "linear convolution in the time domain = multiplication in the frequency domain" only applies to continuous time/frequency signals, where you don't have sampling occurring and things wrapping around because of it. For the FFT, which is a transform based on discrete time, the equivalent of multiplication in frequency is circular/cyclic convolution in the time domain, rather than your typical linear convolution. I.e. FFT-based convolution is the same as convolving your discrete-time signal wrapped around itself, which gives you a periodic signal. Thus you need to zero-pad it enough to increase the period of the signal and avoid the effects of circular convolution, i.e. the effect of terms that have wrapped around.

In subband image coding, we usually don't use zero padding as the discontinuity can cause edge distortions, but opt for something smoother like symmetric extension instead.

Here's a matlab example:

CODE
a=[1 6 2 3 4 9];
b=[7 2 3];

>> conv(a,b)

ans =

    7    44    29    43    40    80    30    27


Now that's linear convolution. Doing a straight multiplication using FFT gives:

CODE
>> c=ifft(fft(a).*fft(b,6))

c =

   37    71    29    43    40    80


which isn't the same.

Now we do some zero padding. Since the final output is 6+3-1=8 samples, I zero pad them out to 8:

CODE
>> d=[a zeros(1,2)];
>> e=[b zeros(1,5)];

f=ifft(fft(e).*fft(d))

f =

7.0000   44.0000   29.0000   43.0000   40.0000   80.0000   30.0000   27.0000


Now we have our linear convolution result.

This post has been edited by QuantumKnot: Aug 18 2004, 07:57
wkwai
post Aug 19 2004, 08:53
Post #14


Yeah.. I think I understand the problem here.. We are dealing with short-time convolution here, not continuous-time convolution.. As a result, the extra k-1 samples need to be regenerated from the IMDCT..

I did some studying on short-time convolution.. and verified that the k-1 extra samples belong to the next frame.. which you need to add to the first few samples of the next frame..

In my earlier equations, I assumed that convolution is just a straightforward multiplication in the freq domain.. I failed to take into consideration the "boundary conditions" needed to perfectly reconstruct the audio signal..

My apologies..

But I have another very troubling problem.. Suppose that I had an FFT spectrum of N/2 spectral components.. Then I attempt to zero out the upper coefficients to "band-limit" the spectrum.. According to the insight provided by you folks.. this is WRONG!!!

What about the CoolEdit FFT filter option where one could just generate "ANY" filter frequency response shape as he likes?? I wonder how it is implemented?
Did they actually convert the response drawn by the user into an impulse response via an IFFT and then determine the input sample length to the FFT??

I wonder too if the actual CoolEdit FFT filter response is just an approximation of the one drawn by the user??

In TwinVQ / VQF, there is also a very troubling problem.. because Mr. Arigatou of NTT, who developed the coding technique, assumed that the MDCT spectral shape is identical to the FFT spectral shape.. So the estimated LPC spectral shape is used to flatten the MDCT spectral shape, which is not correct mathematically.. since the LPC calculation is done in the TIME DOMAIN and the MDCT decomposition is actually a critically downsampled subband filterbank!!

Why shouldn't the convolution be done in the time domain instead of trying to transform everything into the freq domain.. and then having to calculate the square root of the LPC estimated spectral shape?? It is unnecessary algorithmic complexity.. plus the fact that it is mathematically incorrect..



wkwai

This post has been edited by wkwai: Aug 19 2004, 09:31
SebastianG
post Aug 19 2004, 09:56
Post #15





QUOTE (wkwai @ Aug 18 2004, 11:53 PM)
But I have another very troubling problem.. Suppose that I had an FFT spectrum of N/2 spectral components.. Then I attempt to zero out the upper coefficients to "band-limit" the spectrum.. According to the insight provided by you folks.. this is WRONG!!!

What about the CoolEdit FFT filter option where one could just generate "ANY" filter frequency response shape as he likes?? I wonder how it is implemented?
Did they actually convert the response drawn by the user into an impulse response via an IFFT and then determine the input sample length to the FFT??
*

I don't know for sure, but I guess Cool Edit generates an impulse response of the desired filter's frequency response via iFFT + windowing and uses it for fast convolution via FFT the way I described above.

QUOTE (wkwai @ Aug 18 2004, 11:53 PM)
I wonder too if the actual CoolEdit FFT filter response is just an approximation of the one drawn by the user??
*

Yes, but a close one. It depends on your desired frequency response and FFT size settings. If you want a very close approximation of a brickwall lowpass, you need the impulse response to be very large.

QUOTE (wkwai @ Aug 18 2004, 11:53 PM)
In TwinVQ / VQF, there is also a very troubling problem.. because Mr. Arigatou of NTT, who developed the coding technique, assumed that the MDCT spectral shape is identical to the FFT spectral shape.. So the estimated LPC spectral shape is used to flatten the MDCT spectral shape, which is not correct mathematically.. since the LPC calculation is done in the TIME DOMAIN and the MDCT decomposition is actually a critically downsampled subband filterbank!!

Why shouldn't the convolution be done in the time domain instead of trying to transform everything into the freq domain.. and then having to calculate the square root of the LPC estimated spectral shape?? It is unnecessary algorithmic complexity.. plus the fact that it is mathematically incorrect..
*

Well, the thing is: You don't convert it back to the time domain after frequency-domain filtering, so it's totally reversible. The actual signal itself isn't altered after encoding and decoding. But the introduced noise gets "filtered" through the LPC synthesis filter, which is done in the MDCT domain and introduces a temporal alias. But that does not matter because it's only noise.

Sebastian
wkwai
post Aug 19 2004, 10:25
Post #16


QUOTE (SebastianG @ Aug 19 2004, 09:56 AM)
[...]
*


Sure, it is not a problem for the encoder-decoder pair.. but at the encoder itself, it is assuming that the MDCT spectral shape is the FFT spectral shape!!! Which is mathematically wrong.. Wouldn't it have been better to convolve in the time domain first, before transforming with the MDCT..??? That would make it something like GSM speech LPC analysis.. time-domain..

wkwai
SebastianG
post Aug 19 2004, 10:56
Post #17





QUOTE (wkwai @ Aug 19 2004, 01:25 AM)
Sure, it is not a problem for the encoder-decoder pair.. but at the encoder itself, it is assuming that the MDCT spectral shape is the FFT spectral shape!!! Which is mathematically wrong.. Wouldn't it have been better to convolve in the time domain first, before transforming with the MDCT..??? That would make it something like GSM speech LPC analysis.. time-domain..
*


The MDCT is related to the LPC analysis/synthesis in the same way as the FFT.

You could have done it in the time domain. But in this case it would have been very expensive because of the high orders of the filters. In fact, you need higher orders if you want to do this in the time domain, since you cannot use the logarithmic/Bark frequency scale the way it's done in VQF / Vorbis (with floor 0).


Sebastian
QuantumKnot
post Aug 21 2004, 02:49
Post #18





QUOTE (wkwai @ Aug 19 2004, 05:53 PM)
[...]
*


I also suspect that is what the CoolEdit FFT filter option does. It could use either the window method or the frequency-sampling method of FIR filter design.
wkwai
post Aug 21 2004, 06:54
Post #19


QUOTE (SebastianG @ Aug 19 2004, 01:56 AM)
[...]
*



Perhaps.. it is in their (Japanese) engineering mindset.. Engineering is both an art form as well as a science?? I know that in some engineering cultures, engineering students are taught to design like an "artist".. which unfortunately I am NOT trained in.. which is also why I DON'T do things like GUI design..
