Reducing "blocking" effect in FFT processing
2Bdecided
post Mar 20 2003, 11:02
Post #1


ReplayGain developer


Group: Developer
Posts: 5250
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



It's possible to implement a filter in the frequency domain using an FFT. Instead of convolving the impulse response in the time-domain, you multiply the frequency response in the frequency domain. It's relatively basic stuff, and is in most audio textbooks. The advantage is speed: for long impulse responses, this approach can be, say, ten times faster than direct convolution.
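A quick numpy sketch of that idea - overlap-add fast convolution - just to pin it down (illustrative only; the function name and block size are arbitrary):

```python
import numpy as np

def fft_filter(x, h, block=1024):
    """Filter x with impulse response h by overlap-add fast convolution.

    Each input block is zero-padded, multiplied by the FFT of h in the
    frequency domain, and the inverse-FFT tails are overlapped into the
    output - equivalent to np.convolve(x, h).
    """
    m = len(h)
    nfft = 1
    while nfft < block + m - 1:          # FFT size >= linear-convolution length
        nfft *= 2
    H = np.fft.rfft(h, nfft)             # frequency response, computed once
    y = np.zeros(len(x) + m - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        Y = np.fft.rfft(seg, nfft) * H   # multiply instead of convolving
        out = np.fft.irfft(Y, nfft)[:len(seg) + m - 1]
        y[start:start + len(out)] += out # overlap-add the block tails
    return y
```

The spectrum of each block is multiplied by H once, instead of sliding h past every sample; for long impulse responses that's where the speed comes from.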


Things get more interesting when you change the "filter" over time. I put the word "filter" in quotes because a similar technique is used in various processes that are not simple "filtering". I'm referring to anything where you fft, change the coefficients, then inverse fft. This could be an adaptive filter, a lossy audio codec, a noise reduction algorithm etc etc.


An audible problem with this approach is that the FFT blocks become audible. You can window and overlap the blocks, and change their size, but this doesn't always make the problem go away. What you hear depends on what you're doing to the FFT coefficients, but usually, without care, you hear artefacts related to the FFT block size: either the temporal resolution of the signal (or of the artefacts) starts to match the FFT length, or frequency-domain artefacts appear which relate to the FFT frequency bins.


My question is: Are there standard (or at least, "known") techniques for reducing the audibility of the FFT block structure in such a process?


I realise in lossy audio coding, the more accurately the coefficients are stored, the less of a problem this is. When the coefficients are stored inaccurately, you get (generally) pre-echo or warbling. But the coefficients are always stored inaccurately - it's a lossy codec! So, what's done "right" to prevent this problem? Is it just careful choice of coefficient rounding to match psychoacoustics, or are there other techniques?


A more pertinent example is Noise Reduction, because the coefficients are intentionally changed. In Cool Edit Pro, the NR feature often leaves "tinkly" or "bubbly" or "bleepy" results. These results can be changed by adjusting the window size, transition width, amount of noise to be removed etc etc etc - but generally, it's quite easy to get bad results! (You can get good results too - I'm not trying to make a point against CEP). In the Sonic Foundry NR-2 DirectX plug-in, especially using "mode 2", it's impossible to get these kinds of artefacts. (You can still remove too much noise and make it sound bad, but you only hear that you've removed some signal as well as noise, you don't hear artefacts due to the actual working of the algorithm). It's still using an adaptive FFT noise reduction algorithm, but it's doing something dramatically different which hides most of the problems. What?


The specific problem which I've never solved, and which is related (but different!) is when you don't even start with a time-domain signal; you generate a signal in the frequency domain (I was using spectrally shaped noise - specific amplitude, random phase), and IFFT it to get the time domain signal. You have to overlap the results (because you get clicks at block boundaries otherwise - inaudible for white noise, useless for coloured noise!), but during the overlap some of the noise cancels, so the noise envelope is modulated by the FFT block length. When I was trying this six (!) years ago, I didn't even get the noise spectrum I hoped for, but I think that may have been to do with time-domain aliasing due to a wildly varying spectrum and a short FFT length. However, even fixing this, the noise will average during the windowing, giving amplitude modulation at the rate of the FFT length - how could this be avoided?
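For what it's worth, one known workaround (a sketch of my own, not a claim about any particular product): since independent noise blocks add in *power* rather than amplitude, use windows whose *squares* overlap-add to a constant - e.g. the square root of a Hann window at 50% overlap - so the summed noise power, and hence the envelope, stays flat instead of pumping at the FFT block rate. The function and parameter names here are made up for illustration:

```python
import numpy as np

def shaped_noise(target_mag, n_blocks, hop=None):
    """Synthesise spectrally shaped noise block by block.

    target_mag: desired magnitude for each of the N/2+1 rfft bins.
    Each block gets fresh random phase; blocks are overlap-added with a
    sqrt-Hann window.  Because w[n]^2 + w[n + N/2]^2 == 1, the powers of
    the independent blocks sum to a constant across the overlaps.
    (The window assumes hop == N/2.)
    """
    N = 2 * (len(target_mag) - 1)        # time-domain block length
    hop = hop or N // 2
    n = np.arange(N)
    w = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / N))  # sqrt of periodic Hann
    y = np.zeros((n_blocks - 1) * hop + N)
    rng = np.random.default_rng(0)
    for b in range(n_blocks):
        phase = rng.uniform(0, 2 * np.pi, len(target_mag))
        spec = target_mag * np.exp(1j * phase)
        spec[0] = spec[-1] = 0.0         # keep DC and Nyquist real (zero here)
        y[b * hop : b * hop + N] += w * np.fft.irfft(spec, N)
    return y
```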


I'm sure there are other examples, but I know too little about them. From what I've read and guessed, there may be different techniques for reducing the FFT block audibility depending on exactly what you're doing. I raise the last example of a specific task where I've hit the problem myself (but it was a long time ago!) - however, I'm most interested in the previous two examples, and the problem in general.


Does anyone have any techniques or suggestions that they can share, or any papers they can point me to?


Cheers,
David.
Gabriel
post Mar 20 2003, 12:43
Post #2


LAME developer


Group: Developer
Posts: 2950
Joined: 1-October 01
From: Nanterre, France
Member No.: 138



It seems to me that all you need is to apply a shape to your window. Basically, the pure fft has a rectangular window shape.
See:
http://www.daqarta.com/ww00wndo.htm

Window shapes allow you to reduce the block artifacts on processed FFT data. You could also use an overlapping transform like the MDCT.
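To make that concrete, a little numpy check (my own sketch): a periodic Hann window at 50% overlap sums to exactly 1 (the constant-overlap-add property), so an FFT/IFFT pass-through with windowed overlap-add reconstructs the signal, and any remaining block artefacts come only from what you do to the coefficients:

```python
import numpy as np

N = 1024                                   # FFT / window length
hop = N // 2                               # 50% overlap
n = np.arange(N)
w = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)  # periodic Hann window

# the COLA (constant-overlap-add) property: shifted windows sum to 1
assert np.allclose(w[:hop] + w[hop:], 1.0)

# pass-through STFT: window, "process" (here: do nothing), overlap-add
x = np.random.default_rng(0).standard_normal(8 * N)
y = np.zeros_like(x)
for start in range(0, len(x) - N + 1, hop):
    spec = np.fft.rfft(w * x[start:start + N])  # modify coefficients here...
    y[start:start + N] += np.fft.irfft(spec, N)

# interior samples (covered by two windows) are reconstructed exactly
assert np.allclose(y[hop:-hop], x[hop:-hop])
```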

A related question is how to remove block artifacts from FFT-processed audio. For a picture, I immediately have some ideas in mind about how to reduce the blocking artifacts, but for audio it seems to me that the tools used for pictures would not be useful.
2Bdecided
post Mar 20 2003, 13:28
Post #3


ReplayGain developer


Group: Developer
Posts: 5250
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



I have always used a hanning window with 50% overlap (someone must have given me good advice early on - thank you Martin Reed!), but that website is a mine of useful information - thanks for the link! I'll read up on MDCT too.

I think my original question is still unanswered: given correct windowing and overlap, how do some processes make the presence of fft/ifft obviously audible, while others seem to hide it?
NumLOCK
post Mar 20 2003, 14:12
Post #4


Neutrino G-RSA developer


Group: Developer
Posts: 852
Joined: 8-May 02
From: Geneva
Member No.: 2002



2Bdecided, does the NR-2 plugin in "mode 2" really use a plain windowed FFT? Does it take longer to process your signal, when compared with CoolEdit's function (which is known to use FFT with windowing)?

If it takes a whole lot longer, then it probably works in the time domain. Otherwise, it's still possible that they use some kind of MDCT. At least for audio compression, the MDCT is much more useful than the plain DCT - if I remember correctly, with the MDCT the artifacts are spread across the whole signal, not just at block boundaries. [Edit] And if the coefficients are not changed, with a 50% overlap the aliasing errors (due to block boundaries) will cancel each other out perfectly. [/Edit]
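For the curious, that aliasing cancellation can be sketched straight from the MDCT definition (my own toy code, using an O(N^2) basis matrix rather than a fast transform - not how real codecs implement it): with a sine window and 50% overlap, the time-domain aliasing from adjacent blocks cancels exactly on overlap-add:

```python
import numpy as np

def mdct_basis(N):
    """Cosine basis of the MDCT: N coefficients from 2N windowed samples."""
    n = np.arange(2 * N)
    k = np.arange(N)
    return np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))

def mdct_roundtrip(x, N):
    """Analysis/synthesis with a sine (Princen-Bradley) window, hop = N.

    Each block's IMDCT output contains time-domain aliasing, but the
    aliasing of adjacent 50%-overlapped blocks has opposite sign, so the
    windowed overlap-add reconstructs the interior samples exactly.
    """
    w = np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))
    C = mdct_basis(N)
    y = np.zeros(len(x))
    for start in range(0, len(x) - 2 * N + 1, N):   # 50% overlap
        X = C @ (w * x[start:start + 2 * N])        # N MDCT coefficients
        y[start:start + 2 * N] += w * (2.0 / N) * (C.T @ X)
    return y
```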

As Gabriel suggested: have you tried another window shape by any chance ?

[Edit] Have a look there: http://www.stanford.edu/~nouk/mdct/mdct/ [/Edit]

This post has been edited by NumLOCK: Mar 20 2003, 14:24


--------------------
Try Leeloo Chat at http://leeloo.webhop.net
2Bdecided
post Mar 20 2003, 15:05
Post #5


ReplayGain developer


Group: Developer
Posts: 5250
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



QUOTE (NumLOCK @ Mar 20 2003 - 01:12 PM)
2Bdecided, does the NR-2 plugin in "mode 2" really use plain windowed FFT ? does it take longer to process your signal, when compared with CoolEdit's function (which is known to use FFT with windowing) ?

I don't have NR-2 with me - I'll try and check - it'll have to be between stripping some wallpaper and re-wiring our bedroom though! It would be strange if it was working in the time domain somehow, because it still needs a spectral fingerprint, but maybe there are other algorithms which I know nothing of.


QUOTE
As Gabriel suggested: have you tried another window shape by any chance ?


No, I never have. I might dig up my old noise generation project and see what happens with different windows, but I'm more interested in people's experience with audio coding and noise reduction.


Thanks for the excellent link - that's two already in two replies!


btw, google sent me to Mike's tooLame pages which mention

http://mikecheng.d2.net.au/mdct/

QUOTE
The MDCT is a lapped orthogonal transform used in signal processing.

More traditional block transforms for signal processing (such as the
DCT or FFT) suffer from blocking artefacts arising because of the
independent processing of each block [2]. Lapped transforms, however,
have a 50% overlap between successive blocks which results in much
reduced artefacting.


So I'll have to play with these I think. And figure out what the coefficients "mean" - I like intuition.

Cheers,
David.


P.S. unrelated interesting links (if you can follow this thread, you know their contents already!):

a final year project to make an audio coder using MATLAB:
http://is.rice.edu/~welsh/elec431/index.html
(not quite mpc!)

Summary of audio coding:
http://www.tml.hut.fi/Opinnot/Tik-111.590/...02/chapter4.pdf
(I like the final comment)
Gabriel
post Mar 20 2003, 16:21
Post #6


LAME developer


Group: Developer
Posts: 2950
Joined: 1-October 01
From: Nanterre, France
Member No.: 138



If you really want to go deep into overlapped transforms, there is "Signal Processing with Lapped Transforms" by H. Malvar.
squid
post Jun 15 2003, 09:27
Post #7





Group: Members
Posts: 19
Joined: 15-June 03
From: Stockholm
Member No.: 7192



QUOTE (2Bdecided @ Mar 20 2003 - 02:02 AM)
It's possible to implement a filter in the frequency domain using an FFT. [...] Does anyone have any techniques or suggestions that they can share, or any papers they can point me to?

It is possible. You can deduce a formulation of FFT filtering which makes it exactly equivalent to the convolution done in the time domain, eliminating the block borders. Doing an FFT, manipulating the FFT data, and then inverse transforming is NOT equivalent to convolution - it's equivalent to something called "circular convolution", if I'm not wrong. I don't remember the details, but the trick is to insert zeros in the time signal and use some kind of overlapping in the FFT domain. It's described in Malvar's "Signal Processing with Lapped Transforms".
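A small numpy demonstration of the circular-convolution point (my sketch): same-length FFTs wrap the convolution tail back onto the start of the block, and zero-padding to the full linear-convolution length removes the wrap:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(64)
h = rng.standard_normal(16)

# naive: same-size FFTs -> CIRCULAR convolution, the tail wraps onto the start
circ = np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(h, len(x)), len(x))

# zero-padded: FFT length covers the full linear-convolution result
nfft = len(x) + len(h) - 1
lin = np.fft.irfft(np.fft.rfft(x, nfft) * np.fft.rfft(h, nfft), nfft)

assert np.allclose(lin, np.convolve(x, h))                # matches time domain
assert not np.allclose(circ, np.convolve(x, h)[:len(x)])  # wrapped != linear
```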

/Pontus
NullC
post Jul 8 2003, 17:38
Post #8





Group: Developer
Posts: 200
Joined: 8-July 03
Member No.: 7653



Those of you interested in using lapped FFT for convolution should check out the most excellent BruteFIR package.

I believe that Anders' latest code in CVS for AlmusVCU has support for realtime-changeable EQs, but I haven't played with that, so I'm not sure how it works or how it's implemented. He may or may not be taking measures to reduce artifacting, as the fidelity during EQ changes would likely not be considered critical in his applications.

If someone would like to implement artifact-free real-time filter modification in BruteFIR, I'd appreciate it... as I've been doing a little research into dealing with non-linear distortion in loudspeaker systems by switching between filters based on the average power level.

This post has been edited by NullC: Jul 8 2003, 17:40
