Reducing "blocking" effect in FFT processing 
Mar 20 2003, 11:02
Post #1

ReplayGain developer Group: Developer Posts: 5148 Joined: 5 November 01 From: Yorkshire, UK Member No.: 409
It's possible to implement a filter in the frequency domain using an FFT. Instead of convolving the signal with the impulse response in the time domain, you multiply its spectrum by the frequency response in the frequency domain. It's relatively basic stuff, and is in most audio textbooks. The advantage is speed: for long impulse responses, this approach can be, say, ten times faster than direct convolution.
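The speed-up rests on the fact that multiplying two DFTs corresponds to convolution. A minimal pure-Python sketch, assuming no numpy is available (the names `fft`, `ifft`, `direct_convolve` and `fft_convolve` are illustrative, and the toy recursive radix-2 FFT stands in for an optimised FFT library), checks that once both sequences are zero-padded to at least len(x) + len(h) - 1 samples, the frequency-domain route gives the same answer as direct convolution:

```python
import cmath


def fft(x):
    """Toy recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT via the conjugation trick: ifft(X) = conj(fft(conj(X))) / n."""
    n = len(spectrum)
    y = fft([v.conjugate() for v in spectrum])
    return [v.conjugate() / n for v in y]


def direct_convolve(x, h):
    """Textbook linear convolution, O(len(x) * len(h)) multiply-adds."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def fft_convolve(x, h):
    """Linear convolution by multiplying spectra.

    Both inputs are zero-padded to a power of two >= len(x) + len(h) - 1,
    so the circular wrap-around of the DFT never lands on real output.
    """
    out_len = len(x) + len(h) - 1
    n = 1
    while n < out_len:
        n *= 2
    X = fft(list(x) + [0.0] * (n - len(x)))
    H = fft(list(h) + [0.0] * (n - len(h)))
    y = ifft([a * b for a, b in zip(X, H)])
    return [v.real for v in y[:out_len]]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
h = [0.5, 0.25, 0.125]
err = max(abs(a - b) for a, b in zip(direct_convolve(x, h), fft_convolve(x, h)))
print(f"max difference: {err:.2e}")  # rounding error only
```

With a fast FFT the cost grows roughly as N log N instead of N times the filter length, which is where the speed advantage for long impulse responses comes from.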
Things get more interesting when you change the "filter" over time. I put the word "filter" in quotes because a similar technique is used in various processes that are not simple "filtering". I'm referring to anything where you FFT, change the coefficients, then inverse FFT. This could be an adaptive filter, a lossy audio codec, a noise reduction algorithm, etc.
An audible problem with this approach is that the FFT blocks become audible. You can window and overlap the blocks, and change their size, but this doesn't always make the problem go away. What you hear depends on what you're doing to the FFT coefficients, but usually, without care, you hear artefacts related to the FFT block size: either the temporal resolution of the signal (or of the artefacts) becomes tied to the FFT length, or frequency-domain artefacts appear which relate to the FFT frequency bins.
My question is: are there standard (or at least "known") techniques for reducing the audibility of the FFT block structure in such a process?
I realise that in lossy audio coding, the more accurately the coefficients are stored, the less of a problem this is. When the coefficients are stored inaccurately, you (generally) get pre-echo or warbling. But the coefficients are always stored inaccurately; it's a lossy codec! So, what's done "right" to prevent this problem? Is it just careful choice of coefficient rounding to match psychoacoustics, or are there other techniques?
A more pertinent example is noise reduction, because the coefficients are intentionally changed. In Cool Edit Pro, the NR feature often leaves "tinkly", "bubbly" or "bleepy" results. These results can be changed by adjusting the window size, transition width, amount of noise to be removed, etc., but generally it's quite easy to get bad results! (You can get good results too; I'm not trying to make a point against CEP.)
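The generic "FFT, change the coefficients, inverse FFT" process can be sketched as a weighted overlap-add loop. This is an illustration under stated assumptions, not how Cool Edit Pro or any codec actually does it: it uses a sine window for both analysis and synthesis at 50% overlap (so the squared windows sum to one and unity gains reconstruct the input), and `stft_process`, `spectral_gate` and their parameters are hypothetical names.

```python
import cmath
import math


def fft(x):
    """Toy recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT via conj(fft(conj(X))) / n."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spectrum])]


def sine_window(n):
    """Sine window: w[k]**2 + w[k + n//2]**2 == 1, so it can serve as both
    analysis and synthesis window at 50% overlap."""
    return [math.sin(math.pi * (k + 0.5) / n) for k in range(n)]


def stft_process(x, n, gain_fn):
    """Window, FFT, scale each bin by gain_fn(frame_index, spectrum),
    IFFT, window again and overlap-add with a hop of n // 2."""
    hop = n // 2
    w = sine_window(n)
    # Pad so every input sample is covered by two overlapping frames.
    padded = [0.0] * hop + list(x) + [0.0] * n
    out = [0.0] * len(padded)
    for frame, start in enumerate(range(0, len(padded) - n + 1, hop)):
        spec = fft([padded[start + k] * w[k] for k in range(n)])
        gains = gain_fn(frame, spec)
        y = ifft([g * s for g, s in zip(gains, spec)])
        for k in range(n):
            out[start + k] += y[k].real * w[k]
    return out[hop:hop + len(x)]


def spectral_gate(frame, spec, threshold=1.0, floor=0.1):
    """Hypothetical crude noise gate: keep strong bins, attenuate weak ones."""
    return [1.0 if abs(s) > threshold else floor for s in spec]


x = [math.sin(0.3 * i) + 0.5 * math.sin(1.7 * i) for i in range(64)]
same = stft_process(x, 16, lambda frame, spec: [1.0] * len(spec))
gated = stft_process(x, 16, spectral_gate)
print(max(abs(a - b) for a, b in zip(x, same)))  # tiny: unity gains give x back
```

With unity gains the output matches the input. A hard per-bin gate like `spectral_gate`, whose decisions can flip independently from frame to frame, is the kind of processing commonly blamed for "musical noise": bins that survive for a single frame become short tone bursts at the block rate, which is one plausible source of "tinkly" artefacts.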
In the Sonic Foundry NR2 DirectX plugin, especially using "mode 2", it's impossible to get these kinds of artefacts. (You can still remove too much noise and make it sound bad, but you only hear that you've removed some signal as well as noise; you don't hear artefacts due to the actual working of the algorithm.) It's still using an adaptive FFT noise reduction algorithm, but it's doing something dramatically different which hides most of the problems. What?
The specific problem which I've never solved, and which is related (but different!), is when you don't even start with a time-domain signal: you generate a signal in the frequency domain (I was using spectrally shaped noise: specific amplitude, random phase) and IFFT it to get the time-domain signal. You have to overlap the results (otherwise you get clicks at block boundaries; inaudible for white noise, but useless for coloured noise!), but during the overlap some of the noise cancels, so the noise envelope is modulated by the FFT block length. When I was trying this six (!) years ago, I didn't even get the noise spectrum I hoped for, but I think that may have been due to time-domain aliasing caused by a wildly varying spectrum and a short FFT length. However, even fixing this, the noise will average during the windowing, giving amplitude modulation at the rate of the FFT length. How could this be avoided?
I'm sure there are other examples, but I know too little about them. From what I've read and guessed, there may be different techniques for reducing the FFT block audibility depending on exactly what you're doing. I raise the last example as a specific task where I've hit the problem myself (but it was a long time ago!); however, I'm most interested in the previous two examples, and the problem in general. Does anyone have any techniques or suggestions that they can share, or any papers they can point me to?
Cheers, David.
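One way to see where the noise modulation comes from: when independent noise blocks are windowed and overlap-added, their powers (not their amplitudes) add, so the output power follows w[k]² + w[k+hop]² rather than w[k] + w[k+hop]. A Hann window sums to a constant in amplitude but not in power. The sketch below is an illustration under assumptions, not the method used in the post: white Gaussian noise stands in for the random-phase IFFT blocks, and all helper names are hypothetical. It compares the Hann window with a sine window, whose squares do sum to one at 50% overlap:

```python
import math
import random


def hann(n):
    """Periodic Hann window (50%-overlapped copies sum to 1 in amplitude)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def sine_window(n):
    """Sine window (50%-overlapped squares sum to 1, i.e. constant power)."""
    return [math.sin(math.pi * (k + 0.5) / n) for k in range(n)]


def overlap_power_envelope(w):
    """Expected output power at each offset within one hop when independent,
    unit-variance noise blocks are windowed by w and overlap-added at 50%.
    Uncorrelated blocks add in power: w[k]**2 + w[k + hop]**2."""
    hop = len(w) // 2
    return [w[k] ** 2 + w[k + hop] ** 2 for k in range(hop)]


def measured_power(w, trials=4000, seed=1):
    """Monte-Carlo check with white Gaussian noise standing in for the
    random-phase IFFT blocks (an assumption, not the original generator)."""
    rng = random.Random(seed)
    hop = len(w) // 2
    acc = [0.0] * hop
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(len(w))]  # earlier block
        b = [rng.gauss(0.0, 1.0) for _ in range(len(w))]  # next block
        for k in range(hop):
            # Tail of block a overlaps the head of block b.
            s = w[k + hop] * a[k + hop] + w[k] * b[k]
            acc[k] += s * s
    return [v / trials for v in acc]


for name, w in (("hann", hann(16)), ("sine", sine_window(16))):
    print(name, [round(v, 2) for v in overlap_power_envelope(w)])
```

For the Hann window the expected power swings between 1.0 and 0.5 (about 3 dB) at the hop rate; for the sine window it is flat. The flip side is that a power-complementary window no longer sums to a constant in amplitude, which matters when the overlapped blocks are correlated (a steady signal rather than fresh noise).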


Mar 20 2003, 12:43
Post #2

LAME developer Group: Developer Posts: 2950 Joined: 1 October 01 From: Nanterre, France Member No.: 138
It seems to me that all you need is to apply a shape to your window. Basically, the pure FFT has a rectangular window shape.
See: http://www.daqarta.com/ww00wndo.htm
Window shapes allow you to reduce the block artifacts on processed FFT data. You could also use an overlapping transform like the MDCT. A related question is how to remove block artifacts in FFT-processed audio. For a picture, I immediately have some ideas in mind about how to reduce the blocking artifacts, but for audio it seems to me that the tools used for pictures would not be useful.


Mar 20 2003, 13:28
Post #3

ReplayGain developer Group: Developer Posts: 5148 Joined: 5 November 01 From: Yorkshire, UK Member No.: 409
I have always used a Hanning window with 50% overlap (someone must have given me good advice early on; thank you, Martin Reed!), but that website is a mine of useful information. Thanks for the link! I'll read up on the MDCT too.
I think my original question is still unanswered: given correct windowing and overlap, why do some processes make the presence of the FFT/IFFT obviously audible, while others seem to hide it?
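Plain overlap-add with a Hann window at 50% overlap works because shifted copies of a *periodic* Hann window sum to exactly one, so unmodified frames add back up to the input. A small pure-Python check (function names are illustrative) also shows a common pitfall: the symmetric form, with N - 1 in the denominator, does not quite satisfy this:

```python
import math


def hann_periodic(n):
    """Periodic Hann: the form that tiles exactly at hop n // 2."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def hann_symmetric(n):
    """Symmetric Hann (both end points zero), as tabulated in many texts."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def overlap_add_sum(w, hop):
    """Steady-state sum of copies of w shifted by multiples of hop,
    evaluated at each offset 0 .. hop - 1."""
    return [sum(w[i] for i in range(k, len(w), hop)) for k in range(hop)]


for name, w in (("periodic", hann_periodic(16)), ("symmetric", hann_symmetric(16))):
    sums = overlap_add_sum(w, 8)
    print(name, min(sums), max(sums))
```

This only covers the case where the coefficients are left alone; once they are modified, the result also depends on how the changed blocks are windowed on the way back out, which is why correct windowing alone does not guarantee the block structure stays inaudible.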


Mar 20 2003, 14:12
Post #4

Neutrino GRSA developer Group: Developer Posts: 852 Joined: 8 May 02 From: Geneva Member No.: 2002
2Bdecided, does the NR2 plugin in "mode 2" really use a plain windowed FFT? Does it take longer to process your signal than CoolEdit's function (which is known to use an FFT with windowing)?
If it takes a whole lot longer, then it probably works in the time domain. Otherwise, it's still possible that they use some kind of MDCT. At least for audio compression, the MDCT is much more useful than the plain DCT: if I remember correctly, with the MDCT the artifacts are spread over the whole signal, not just at block boundaries. [Edit] And if the coefficients are not changed, with 50% overlap (and a suitable window) the aliasing errors due to block boundaries cancel each other out perfectly. [/Edit] As Gabriel suggested: have you tried another window shape by any chance? [Edit] Have a look there: http://www.stanford.edu/~nouk/mdct/mdct/ [/Edit]
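The "aliasing errors cancel" property of the MDCT (time-domain aliasing cancellation) can be checked numerically. Below is a direct O(N²) MDCT/IMDCT pair in pure Python, written from the standard textbook formulas, with a sine window that satisfies the Princen-Bradley condition w[n]² + w[n+N]² = 1; the 2/N IMDCT scaling is chosen to match that window, and N is `n_coef` in the code. It is a sketch for intuition, not a fast or codec-grade implementation:

```python
import math


def sine_window(length):
    """Sine window of `length` = 2N samples; w[i]**2 + w[i + N]**2 == 1
    (the Princen-Bradley condition)."""
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]


def mdct(frame, w):
    """2N samples -> N coefficients, windowed, direct O(N^2) formula."""
    n_coef = len(frame) // 2
    n0 = 0.5 + n_coef / 2
    return [sum(w[i] * frame[i] * math.cos(math.pi / n_coef * (i + n0) * (k + 0.5))
                for i in range(2 * n_coef))
            for k in range(n_coef)]


def imdct(coeffs, w):
    """N coefficients -> 2N windowed samples; time-aliased until overlap-added.
    The 2/N scale pairs with a Princen-Bradley window on both sides."""
    n_coef = len(coeffs)
    n0 = 0.5 + n_coef / 2
    return [w[i] * (2.0 / n_coef) *
            sum(c * math.cos(math.pi / n_coef * (i + n0) * (k + 0.5))
                for k, c in enumerate(coeffs))
            for i in range(2 * n_coef)]


def mdct_roundtrip(x, n_coef):
    """MDCT/IMDCT with hop N and overlap-add; the aliasing of neighbouring
    frames cancels (TDAC), so the input comes back."""
    w = sine_window(2 * n_coef)
    padded = [0.0] * n_coef + list(x) + [0.0] * (2 * n_coef)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_coef + 1, n_coef):
        y = imdct(mdct(padded[start:start + 2 * n_coef], w), w)
        for i, v in enumerate(y):
            out[start + i] += v
    return out[n_coef:n_coef + len(x)]


x = [math.sin(0.7 * i) + 0.3 * math.cos(2.1 * i) for i in range(40)]
y = mdct_roundtrip(x, 8)
print(max(abs(a - b) for a, b in zip(x, y)))  # rounding error only
```

Each IMDCT output frame on its own is time-aliased; only the overlap-add of neighbouring frames restores the input. If the coefficients are quantised or otherwise changed, the aliasing terms no longer cancel exactly, and the error is spread over the whole 2N-sample window rather than sitting at a block edge.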



Mar 20 2003, 15:05
Post #5

ReplayGain developer Group: Developer Posts: 5148 Joined: 5 November 01 From: Yorkshire, UK Member No.: 409
QUOTE (NumLOCK @ Mar 20 2003, 01:12 PM) 2Bdecided, does the NR2 plugin in "mode 2" really use plain windowed FFT ? does it take longer to process your signal, when compared with CoolEdit's function (which is known to use FFT with windowing) ?
I don't have NR2 with me. I'll try and check, though it'll have to be between stripping some wallpaper and rewiring our bedroom! It would be strange if it were working in the time domain somehow, because it still needs a spectral fingerprint, but maybe there are other algorithms which I know nothing of.
QUOTE As Gabriel suggested: have you tried another window shape by any chance ?
No, I never have. I might dig up my old noise generation project and see what happens with different windows, but I'm more interested in people's experience with audio coding and noise reduction. Thanks for the excellent link; that's two already in two replies!
BTW, Google sent me to Mike's tooLame pages, which mention http://mikecheng.d2.net.au/mdct/
QUOTE The MDCT is a lapped orthogonal transform used in signal processing. More traditional block transforms for signal processing (such as the DCT or FFT) suffer from blocking artefacts arising because of the independent processing of each block [2]. Lapped transforms, however, have a 50% overlap between successive blocks which results in much reduced artefacing.
So I'll have to play with these, I think, and figure out what the coefficients "mean"; I like intuition.
Cheers, David.
P.S. Unrelated interesting links (if you can follow this thread, you know their contents already!): a final-year project to make an audio coder using MATLAB: http://is.rice.edu/~welsh/elec431/index.html (not quite mpc!); a summary of audio coding: http://www.tml.hut.fi/Opinnot/Tik111.590/...02/chapter4.pdf (I like the final comment)


Mar 20 2003, 16:21
Post #6

LAME developer Group: Developer Posts: 2950 Joined: 1 October 01 From: Nanterre, France Member No.: 138
If you really want to go deep into overlapped transforms, there is "Signal Processing with Lapped Transforms" by H. Malvar.



Jun 15 2003, 09:27
Post #7

Group: Members Posts: 19 Joined: 15 June 03 From: Stockholm Member No.: 7192
QUOTE (2Bdecided @ Mar 20 2003, 02:02 AM) It's possible to implement a filter in the frequency domain using an FFT. [...] My question is: Are there standard (or at least, "known") techniques for reducing the audibility of the FFT block structure in such a process? [...] Cheers, David.
It is possible. You can derive a form of FFT filtering that is exactly equivalent to convolution in the time domain, eliminating the block borders. Doing an FFT, manipulating the FFT data and then inverse transforming is NOT equivalent to (linear) convolution; it's equivalent to something called "circular convolution", if I'm not wrong. I don't remember the details, but the trick is to insert zeros into the time signal and use some kind of overlapping of the blocks. It's described in Malvar's "Signal Processing with Lapped Transforms".
/Pontus
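Pontus's point can be illustrated with the standard overlap-add method. A minimal pure-Python sketch (helper names are illustrative; the toy FFT assumes power-of-two lengths): `circular_convolve` does "FFT, multiply, IFFT" with no padding and wraps the tail of the output back onto its start, whereas `overlap_add_convolve` zero-pads each block to at least block + len(h) - 1 samples and adds the overlapping tails, which reproduces direct linear convolution exactly:

```python
import cmath


def fft(x):
    """Toy recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT via conj(fft(conj(X))) / n."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spectrum])]


def direct_convolve(x, h):
    """Reference linear convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def circular_convolve(x, h):
    """'FFT, multiply, IFFT' with no zero-padding: len(x) must be a power of
    two >= len(h). The output tail wraps around onto the start."""
    n = len(x)
    H = fft(list(h) + [0.0] * (n - len(h)))
    y = ifft([a * b for a, b in zip(fft(list(x)), H)])
    return [v.real for v in y]


def overlap_add_convolve(x, h, block):
    """Linear convolution of a long signal, processed `block` samples at a time."""
    m = len(h)
    n = 1
    while n < block + m - 1:  # FFT size large enough for a full block result
        n *= 2
    H = fft(list(h) + [0.0] * (n - m))
    y = [0.0] * (len(x) + m - 1)
    for start in range(0, len(x), block):
        seg = list(x[start:start + block])
        Y = ifft([a * b for a, b in zip(fft(seg + [0.0] * (n - len(seg))), H)])
        # Each block's result is len(seg) + m - 1 long; its tail overlaps the next block.
        for i in range(len(seg) + m - 1):
            y[start + i] += Y[i].real
    return y


x = [float(i) for i in range(1, 9)]
h = [1.0, 1.0, 1.0]
print(direct_convolve(x, h)[:3])                          # [1.0, 3.0, 6.0]
print([round(v, 6) for v in circular_convolve(x, h)[:3]])  # [16.0, 11.0, 6.0]: wrapped tail
```

This exact equivalence holds for a fixed filter. Once the filter changes from block to block, which is the case the original question is really about, each block's tail is convolved with a different response, so the block structure can reappear.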


Jul 8 2003, 17:38
Post #8

Group: Developer Posts: 200 Joined: 8 July 03 Member No.: 7653
Those of you interested in using lapped FFTs for convolution should check out the most excellent BruteFIR package.
I believe that Anders' latest code in CVS for AlmusVCU supports real-time changeable EQs, but I haven't played with that, so I'm not sure how it works or how it's implemented. He may or may not be taking measures to reduce artifacts, as the fidelity during an EQ change would likely not be considered critical in his applications. If someone would like to implement artifact-free real-time filter modification in BruteFIR, I'd appreciate it, as I've been doing a little research into dealing with nonlinear distortion in loudspeaker systems by switching between filters based on the average power level.
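One common, simple way to switch filters without a click is to run the old and new filters in parallel for a short transition and crossfade their outputs. The sketch below is a hypothetical time-domain illustration: `fir` and `switch_filters` are not BruteFIR functions, direct convolution stands in for a partitioned FFT convolver, and for clarity both filters run over the whole signal, whereas a real-time implementation would only need both during the fade.

```python
import math


def fir(x, h):
    """Direct-form FIR filter; output has the same length as x, zero initial state."""
    return [sum(h[j] * x[i - j] for j in range(len(h)) if i - j >= 0)
            for i in range(len(x))]


def switch_filters(x, h_old, h_new, switch_at, fade_len):
    """Hypothetical click-free filter change: both filters see the full input,
    and their outputs are linearly crossfaded over fade_len samples."""
    y_old = fir(x, h_old)
    y_new = fir(x, h_new)
    out = []
    for i in range(len(x)):
        if i < switch_at:
            g = 0.0
        elif i >= switch_at + fade_len:
            g = 1.0
        else:
            g = (i - switch_at) / fade_len
        out.append((1.0 - g) * y_old[i] + g * y_new[i])
    return out


x = [math.sin(0.2 * i) for i in range(40)]
y = switch_filters(x, h_old=[1.0], h_new=[0.5, 0.5], switch_at=10, fade_len=8)
```

Because both filters have the whole input history, neither output has a start-up transient at the switch point; the listener only gets a smooth blend between the two responses over `fade_len` samples.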

