FFT Analysis for Dummies 
Mar 27 2010, 21:41
Post
#1


Group: Members Posts: 248 Joined: 12-May 09 From: New Milford, CT Member No.: 69730
Folks,
I'd like to learn more about FFTs. I'm not a math guy, so I imagine I'll never fully understand all the nuances, but I'd like to try anyway. I understand the general concept: an FFT shows how much energy is present at different frequencies. What I'd like to know is how to set the various parameters such as FFT Size and Overlap, when to use the different types of window smoothing and why, and so forth. Below is a list of settings in the Rightmark FFT analyzer with my associated questions; hopefully this is a good place to start.

FFT Size: I understand that the higher the number, the better the frequency resolution. So why is this

I realize this is a lot to ask! If anyone knows of a good newbie-level tutorial that explains this in plain English with minimal math, I'd love to see it. Everything I've found through Google starts right in with math that's way over my head.

Ethan
I believe in Truth, Justice, and the Scientific Method



Mar 28 2010, 16:53
Post
#2


Winamp Developer Group: Developer Posts: 670 Joined: 17-July 05 From: Brooklyn, NY Member No.: 23375
Answering a bit more than your question asks, for anyone else who might be interested in this thread.
A transform is just a different way of representing the same set of data. If we have three samples, 4, 7 and 12, we could use a polynomial transform and store the transformed coefficients 4, 2 and 1 instead, then recreate the original samples through the equation f(x) = a + b*x + c*x^2 by plugging in the position x of each sample (f(0) = 4, f(1) = 7, f(2) = 12). We haven't stored any less or any more data by using the transform rather than the samples, and we can change between them without loss.

A Fourier transform is the same thing, but the equation to recreate the samples is a + b*sin(x) + c*cos(x) + d*sin(2x) + e*cos(2x) + ... Taking the Fourier transform amounts to solving a giant system of equations to determine the coefficients that recreate the original samples from this equation. The Fourier transform has the nice property that it transforms the sample data into a representation that models the human ear (and other physical systems) very well.

When we take the FT (via FFT or DFT) of a large audio file, we get coefficients representing different frequency bands. Unfortunately, this gives us only the average energy of each frequency band, not where in the song each frequency occurs. If you had a weird high-pitched noise in some small section of the song, you could see it in the spectrograph, but you would have no idea where it occurs. Just as looking at the stream of audio samples gives you "time" data but no frequency information, the frequency coefficients of a Fourier transform give you "frequency" data but no time information.

To work around this issue, we take the FT of subsections of the song called windows. A smaller window gives us fewer frequency bands, but if we see the weird high-pitched noise, at least we've narrowed it down to the current window. In FT lingo, you can trade off "time resolution" against "frequency resolution" by using smaller or larger windows of audio. In Sound Forge, "FFT Size" corresponds to window size.
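To make the "system of equations" idea concrete, here is a minimal sketch in plain Python. It is only an illustration under some assumptions: the `solve` and `rebuild` helpers are hypothetical names made up for this example, the Fourier-style basis uses just the three terms 1, sin and cos evaluated at evenly spaced angles, and a real FFT gets the same kind of coefficients far more efficiently than by elimination.

```python
import math

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    # Build the augmented matrix [matrix | rhs] so rows can be reduced together.
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Use the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def rebuild(basis, coeffs):
    """Recreate the samples by weighting each basis term with its coefficient."""
    return [sum(b * c for b, c in zip(row, coeffs)) for row in basis]

samples = [4.0, 7.0, 12.0]

# Polynomial basis: each row is [1, x, x^2] evaluated at x = 0, 1, 2.
poly_basis = [[1.0, x, x * x] for x in (0.0, 1.0, 2.0)]
poly_coeffs = solve(poly_basis, samples)

# Fourier-style basis (an assumption for illustration): each row is
# [1, sin(t), cos(t)] with t = 2*pi*n/3 for sample index n = 0, 1, 2.
angles = [2 * math.pi * n / 3 for n in range(3)]
fourier_basis = [[1.0, math.sin(t), math.cos(t)] for t in angles]
fourier_coeffs = solve(fourier_basis, samples)

print("polynomial coefficients:", [round(c, 6) for c in poly_coeffs])
print("fourier coefficients:   ", [round(c, 6) for c in fourier_coeffs])
print("rebuilt from fourier:   ", [round(s, 6) for s in rebuild(fourier_basis, fourier_coeffs)])
```

Both sets of three coefficients describe the same three samples; only the building blocks differ (powers of x in one case, a constant plus a sine and a cosine in the other).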
At the extreme small end of window sizes, you have something called the "Short Time Fourier Transform", which is often an FFT with windows of just 4 samples. Wavelets are used when time and frequency resolution need to be adjusted more precisely, but that's a conversation for another day.

When you see a spectrogram (a 2D spectrograph over time), it is a series of windowed FTs. Each vertical "strip" is one window.

Windowing the audio, however, causes an annoying artifact. If you were to play the subsection of audio out of your speakers, you'd likely get a 'click' at the beginning and the end, since the cuts don't occur at zero crossings. The click shows up as noise in the spectrograph just like it shows up as noise on the speakers. And because you can't tell "where" the noise occurs within the window, there's no way to isolate it. You can see this phenomenon in low-bitrate video as "blocking" artifacts, for the same reason.

A windowing function is used to avoid these blocking artifacts. Conceptually, it fades in the start of the window and fades out the end of the window. But as you can imagine, a windowing function also destroys our data. This is where overlap comes in. If we let the windows overlap each other, e.g. the first window is samples 0 through 1023 and the second window is samples 512 through 1535, we can avoid destroying the data. This is because the audio we "faded out" during the current window becomes part of the "fade in" of the next window. The downside of the windowing function is that the data has "smeared" itself across two windows. Different windowing functions change the amount of smear, allowing a tradeoff between blocking artifacts and smear.

Hope that helps.

This post has been edited by benski: Mar 28 2010, 17:04
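The claim that overlapping windows keep the faded-out audio can be checked numerically. The sketch below assumes a Hann window (the post does not name a particular windowing function), uses the 1024-sample window and 512-sample offset from the example above, and runs on a made-up 440 Hz test tone. With this particular window and 50% overlap, the fade-out of one window and the fade-in of the next add up to exactly 1.

```python
import math

WINDOW_SIZE = 1024          # matches the "samples 0 through 1023" example
HOP = WINDOW_SIZE // 2      # 50% overlap: the next window starts at sample 512

# Periodic Hann window (an assumed choice): starts at 0, peaks at 1 in the
# middle, and falls back toward 0 at the end.
hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / WINDOW_SIZE) for n in range(WINDOW_SIZE)]

# Made-up test signal: a 440 Hz sine sampled at 44100 Hz.
signal = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(4 * WINDOW_SIZE)]

# Cut the signal into overlapping windows and apply the fade in/out to each.
starts = range(0, len(signal) - WINDOW_SIZE + 1, HOP)
frames = [[signal[s + n] * hann[n] for n in range(WINDOW_SIZE)] for s in starts]

# Overlap-add: put every faded window back at its original position and sum.
rebuilt = [0.0] * len(signal)
for s, frame in zip(starts, frames):
    for n, value in enumerate(frame):
        rebuilt[s + n] += value

# Away from the first and last half-window (which only one window covers),
# the overlapping fades sum back to the original samples.
interior = range(HOP, len(signal) - HOP)
worst_error = max(abs(rebuilt[i] - signal[i]) for i in interior)
print("number of windows:", len(frames))
print("worst interior error:", worst_error)
```

The very first and last 512 samples are covered by a single window, so they stay faded; everything in between comes back essentially unchanged apart from floating-point rounding. Other windowing functions or overlap amounts do not necessarily sum to a constant like this.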

