Topic: Human hearing beats FFT (Read 48046 times)

Human hearing beats FFT

Reply #25
Good God, I have been reading/getting PMed on other sites with a variety of ridiculous stuff about this article. So far, it apparently disproves the sampling theorem (!) and, by an utterly incredible chain of logic, renders all existing measurement techniques worthless, either because there are ENORMOUS DISTORTIONS HIDING INSIDE THE FOURIERS or because...well...human hearing is nonlinear...erm...therefore all linear measurements are stupid and wrong...therefore tube amps. Or something.


Reply #27
Quote
Good God, I have been reading/getting PMed on other sites with a variety of ridiculous stuff about this article. So far, it apparently disproves the sampling theorem (!) and, by an utterly incredible chain of logic, renders all existing measurement techniques worthless, either because there are ENORMOUS DISTORTIONS HIDING INSIDE THE FOURIERS or because...well...human hearing is nonlinear...erm...therefore all linear measurements are stupid and wrong...therefore tube amps. Or something.



Yeah, me too, intercourse-it.
-----
J. D. (jj) Johnston

Reply #28
It's amazing what people conclude, given that the experiment will have been carried out with digital audio signals (not analogue signal generators, and certainly not vinyl!), and the extreme "10x better than FFT" test clips will happily survive mp3 encoding.

Cheers,
David.

Reply #29
I didn't find any mention of FFT in this article, only "Fourier uncertainty principle" and "uncertainty limit".

Reply #30
The first thing that comes to my mind is: Now how do we design some frequency transform that provides better results than human hearing? I honestly don't know. FFT has been the de-facto frequency transform in my head for far too long. My own attempt to hack around with wavelets never gave me better time/frequency resolution than your typical STFT. What other options do we have?

Reply #31
Quote
I didn't find any mention of FFT in this article, only "Fourier uncertainty principle" and "uncertainty limit".

I tried this already. I guess I'm not the only one wondering how 10x better than "FFT" can survive going through an FFT process.

EDIT: added scare quotes. I don't wonder how a reversible process can satisfy the requirements of a non-linear system.

Reply #32
Quote
Is not some wavelet/filterbank transform more relevant than the FFT for comparing with human hearing?


Yes, there's no reason to limit yourself to FFT. The most advanced psymodels don't use FFTs for exactly that reason; they use QMF filterbanks or similar. (This already implies that what's in that article isn't as shocking as you'd think.)

Reply #33
Quote
The first thing that comes to my mind is: Now how do we design some frequency transform that provides better results than human hearing? I honestly don't know. FFT has been the de-facto frequency transform in my head for far too long. My own attempt to hack around with wavelets never gave me better time/frequency resolution than your typical STFT. What other options do we have?


Parallel bandpass filters (PEAQ Advanced). More accurate, very slow.
Wavelets on the MDCT coefficients (Opus). Fast, can switch the T/F tradeoff depending on the signal.

The ear works more like the parallel filters setup.
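As a rough illustration of the "parallel bandpass filters" approach, here is a minimal Python sketch: several independent second-order bandpass filters all process the same input at the full sample rate, and each reports its output power. It is not the PEAQ Advanced filterbank; the coefficients follow the RBJ "Audio EQ Cookbook" bandpass (0 dB peak gain), and the function names, centre frequencies and Q are arbitrary example choices. Running every band on every sample also hints at why this style of analysis is slow.

```python
import math

def bandpass_coeffs(f0, q, fs):
    """RBJ cookbook bandpass biquad (0 dB peak gain), normalised so a0 == 1."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)  # a1, a2
    return b, a

def biquad(x, b, a):
    """Direct form I filtering of the sample list x."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def filterbank_energies(x, centres, q, fs):
    """Run every band on the same input; return the mean output power per band.

    The first half of each output is discarded so start-up transients don't dominate.
    """
    energies = []
    for f0 in centres:
        b, a = bandpass_coeffs(f0, q, fs)
        tail = biquad(x, b, a)[len(x) // 2:]
        energies.append(sum(v * v for v in tail) / len(tail))
    return energies

fs = 48000
centres = [250, 500, 1000, 2000, 4000]
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs // 5)]  # 200 ms, 1 kHz
energies = filterbank_energies(tone, centres, q=4.0, fs=fs)
for f0, e in zip(centres, energies):
    print(f"{f0:5d} Hz band: {e:.4f}")
```

For the 1 kHz test tone, the 1000 Hz band reports a power of about 0.5 (that of a unit-amplitude sine) and its neighbours much less. A realistic ear model would use many more, overlapping bands.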

Reply #34
Quote
Yes, there's no reason to limit yourself to FFT. The most advanced psymodels don't use FFTs for exactly that reason; they use QMF filterbanks or similar. (This already implies that what's in that article isn't as shocking as you'd think.)


FFTs, MDCTs, QMFs and other filter banks are all fundamentally bound by the uncertainty principle: the product of the frequency resolution and the time resolution cannot be smaller than about 1 (the exact constant depends on how you define resolution). This is the case for any non-parametric model/transform, i.e. when you don't make any particular assumptions about your signal. There are, however, parametric models one can use. The best example is a model where you directly fit sinusoids of arbitrary frequencies (as opposed to Fourier, which uses sinusoids of predetermined frequencies). With such a model, the resolution is only limited by practical concerns like noise, other sinusoids, and modulation effects. As a trivial example, if you give me three samples and promise that they represent only a single sinusoid (no noise or modulation), then I can calculate the exact frequency of that sinusoid. So in theory, sinusoidal modeling solves all the time-frequency issues of the FFT. The only problem is that it's damn hard to use, especially when it comes to getting a good enough analysis. And that's why we don't have any high-quality sinusoidal-based audio codecs.
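Taking the "three samples" example literally: for x[n] = A·cos(ωn + φ), the identity cos(a - ω) + cos(a + ω) = 2·cos(a)·cos(ω) gives x[n-1] + x[n+1] = 2·cos(ω)·x[n], so three consecutive samples pin down ω regardless of amplitude and phase. A minimal Python sketch, assuming no noise, a non-zero middle sample, and a frequency strictly between 0 and fs/2 (the function name is illustrative):

```python
import math

def freq_from_three_samples(x0, x1, x2, fs):
    """Frequency (Hz) of a single, noiseless sinusoid from three consecutive samples.

    For x[n] = A*cos(w*n + phi), x[n-1] + x[n+1] == 2*cos(w)*x[n] for every n,
    so cos(w) = (x0 + x2) / (2*x1); amplitude and phase cancel out.
    Assumes x1 != 0 and 0 < f < fs/2 (acos only returns w in [0, pi]).
    """
    if x1 == 0:
        raise ValueError("middle sample is zero; pick a different triple")
    c = (x0 + x2) / (2.0 * x1)
    c = max(-1.0, min(1.0, c))  # clamp tiny rounding excursions outside [-1, 1]
    w = math.acos(c)  # radians per sample
    return w * fs / (2.0 * math.pi)

fs = 48000
f_true, amp, phase = 1234.5, 0.7, 0.3
x = [amp * math.cos(2 * math.pi * f_true * n / fs + phase) for n in range(3)]
print(freq_from_three_samples(*x, fs))  # ~1234.5, limited only by float rounding
```

Add even a little noise, or a second sinusoid, and the estimate degrades or becomes meaningless, which is exactly the analysis problem described above.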

Human hearing beats FFT

Reply #35
For those in the know...

The Princess and the Pea.

That sums it all up.

Paul

"Reality is merely an illusion, albeit a very persistent one." Albert Einstein

Reply #36
It's still a confused headline. Recognizing one of a set of different sine waves is not limited by the Gabor limit.

Observing that that is not limited by the Gabor limit is like observing that white is not limited by aircraft.
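To make the point concrete: if the task is only to decide which of a few known sinusoids is present, a window far shorter than 1/Δf is enough in the noiseless case. The Python sketch below (the candidate list, the 10 ms window and the least-squares residual test are illustrative choices, not anything from the paper under discussion) fits each candidate frequency with a free amplitude and phase and picks the one with the smallest residual, even though the candidates are only 2 Hz apart, a time-bandwidth product of 0.02. With noise, the window needed depends on the SNR instead.

```python
import math

def fit_residual(x, f, fs):
    """Least-squares fit of a*cos(2*pi*f*n/fs) + b*sin(2*pi*f*n/fs) to x.

    Returns the residual energy, which is ~0 when x is exactly a sinusoid at f.
    """
    c = [math.cos(2 * math.pi * f * n / fs) for n in range(len(x))]
    s = [math.sin(2 * math.pi * f * n / fs) for n in range(len(x))]
    cc = sum(v * v for v in c)
    ss = sum(v * v for v in s)
    cs = sum(u * v for u, v in zip(c, s))
    xc = sum(u * v for u, v in zip(x, c))
    xs = sum(u * v for u, v in zip(x, s))
    det = cc * ss - cs * cs
    a = (xc * ss - xs * cs) / det  # solve the 2x2 normal equations
    b = (xs * cc - xc * cs) / det
    return sum((xn - a * cn - b * sn) ** 2 for xn, cn, sn in zip(x, c, s))

def pick_candidate(x, candidates, fs):
    """Return the candidate frequency whose fitted sinusoid leaves the least residual."""
    return min(candidates, key=lambda f: fit_residual(x, f, fs))

fs = 48000
window = fs // 100  # 10 ms, i.e. 100 Hz "FFT resolution"
candidates = [440.0, 442.0, 444.0]
x = [0.5 * math.cos(2 * math.pi * 442.0 * n / fs + 1.0) for n in range(window)]
print(pick_candidate(x, candidates, fs))  # 442.0
```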
-----
J. D. (jj) Johnston

Reply #37
Quote
It's still a confused headline.... is like observing that white is not limited by aircraft.


I'm willing to rename this thread to "white is not limited by aircraft", but I don't think it'll make things better.

Reply #38
Yeah, we don't want to turn HA into an extension of horse_ebooks.


Reply #40
Never mind the title, I still don't find a satisfactory answer in this thread.

I understand that the human ear uses a wobbling membrane as something like a filter bank, with a number of non-linear processes, and an amazing analysis of the signals coming from it, to deliver the hearing capacities that we can probe in listening tests and experience every day. I understand that this is nothing like an FFT. I understand that the frequency resolution of masked noise is not that critical, so we use FFTs in codecs in a place where their frequency resolution is far over-specified, rather than being an issue.

However, we often describe other things in audio and hearing with an FFT-like model. It crops up in sampling theory. We push all the audio through a comparable filterbank in most lossy codecs. It is true that these transforms are mathematically lossless/reversible - but if we're messing with things in the other domain, this is little comfort.

So, simply, what is the reason that this is OK?

Cheers,
David.


Reply #41
Quote
Never mind the title, I still don't find a satisfactory answer in this thread.

I understand that the human ear uses a wobbling membrane as something like a filter bank, with a number of non-linear processes, and an amazing analysis of the signals coming from it, to deliver the hearing capacities that we can probe in listening tests and experience every day. I understand that this is nothing like an FFT. I understand that the frequency resolution of masked noise is not that critical, so we use FFTs in codecs in a place where their frequency resolution is far over-specified, rather than being an issue.

However, we often describe other things in audio and hearing with an FFT-like model. It crops up in sampling theory. We push all the audio through a comparable filterbank in most lossy codecs. It is true that these transforms are mathematically lossless/reversible - but if we're messing with things in the other domain, this is little comfort.

So, simply, what is the reason that this is OK?

Cheers,
David.

I guess the switching between two different time/frequency resolution transforms in many lossy codecs is a sort of "ad hoc" fix for not properly modelling our hearing apparatus?

Not all audio processing/transmission needs to include an accurate model of our hearing. Perhaps a crude STFT is simply sufficient for some applications.

So what if we devised an insanely complex, irregular, nonlinear filterbank (a Volterra filterbank?). What could it be used for? Better lossy coding? (I think there are other tradeoffs in lossy coding as well, such as signal compaction.) Could we make better "frequency analyzers"? (What engineers would be able to interpret the plots from such a device?)

-k

Reply #42
Quote
I guess the switching between two different time/frequency resolution transforms in many lossy codecs is a sort of "ad hoc" fix for not properly modelling our hearing apparatus?
Not in the sense discussed in this paper. If the lossy codec's filterbank and/or transform determined (or trashed) the performance measured in this paper (it doesn't), then even with an optimal choice of transform lengths and optimal switching between them, the result would still be 10x too coarse.

I think your other two paragraphs are right though. I'd just love to see a robust scholarly explanation, because I think we're going to need it after this paper.

Cheers,
David.

Reply #43
Quote
Never mind the title, I still don't find a satisfactory answer in this thread.

I understand that the human ear uses a wobbling membrane as something like a filter bank, with a number of non-linear processes, and an amazing analysis of the signals coming from it, to deliver the hearing capacities that we can probe in listening tests and experience every day. I understand that this is nothing like an FFT. I understand that the frequency resolution of masked noise is not that critical, so we use FFTs in codecs in a place where their frequency resolution is far over-specified, rather than being an issue.

However, we often describe other things in audio and hearing with an FFT-like model. It crops up in sampling theory. We push all the audio through a comparable filterbank in most lossy codecs. It is true that these transforms are mathematically lossless/reversible - but if we're messing with things in the other domain, this is little comfort.


If you want to think about this in terms of FFTs... consider the case of a 10 ms FFT window. The resolution of that FFT is 100 Hz. Does this mean we can't tell the frequency of a sinusoid with better than 100 Hz accuracy using that FFT? Absolutely not. First, we can use interpolation with the neighbouring bins to get a more precise value. If we have FFTs at other time offsets, we can do even better: we can look at the phase changes of a given bin and compute the exact (within noise limits) frequency of the sinusoid near that bin. So we've again "beaten Heisenberg", but only because we've assumed that we have a single sinusoid around that bin. AFAIK, the human ear is capable of similar phase processing to figure out the frequency. It has to do something like that because its "critical bands" are far wider than the bins of a 10 ms FFT. There are only ~25 critical bands for the entire 20 Hz - 20 kHz spectrum.
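The two tricks described above (interpolating between neighbouring bins, and comparing a bin's phase across frames taken at different time offsets) can be sketched in plain Python. Assumptions: 48 kHz sampling, a periodic Hann window, log-magnitude parabolic interpolation for the first method and a textbook phase-vocoder frequency estimate for the second; a direct single-bin DFT stands in for an FFT so no libraries are needed, and all function names are illustrative. Both only work because the signal is assumed to be one stationary sinusoid near the peak bin.

```python
import cmath
import math

def hann(n_len):
    """Periodic Hann window of length n_len."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_len) for n in range(n_len)]

def dft_bin(seg, k):
    """One DFT bin X[k], computed directly (no FFT library needed)."""
    n_len = len(seg)
    return sum(v * cmath.exp(-2j * math.pi * k * n / n_len) for n, v in enumerate(seg))

def windowed(signal, start, win):
    return [signal[start + n] * w for n, w in enumerate(win)]

def peak_bin(seg):
    """Index of the largest-magnitude bin (excluding DC and the last positive bin)."""
    half = len(seg) // 2
    mags = [abs(dft_bin(seg, k)) for k in range(half)]
    k = max(range(1, half - 1), key=lambda i: mags[i])
    return k, mags

def parabolic_estimate(signal, fs, n_len):
    """Refine the peak by fitting a parabola through the log-magnitudes of 3 bins."""
    k, mags = peak_bin(windowed(signal, 0, hann(n_len)))
    a, b, c = (math.log(m) for m in mags[k - 1:k + 2])
    p = 0.5 * (a - c) / (a - 2 * b + c)  # fractional bin offset
    return (k + p) * fs / n_len

def phase_estimate(signal, fs, n_len, hop):
    """Phase-vocoder estimate from the peak bin's phase change between two frames."""
    win = hann(n_len)
    seg0, seg1 = windowed(signal, 0, win), windowed(signal, hop, win)
    k, _ = peak_bin(seg0)
    dphi = cmath.phase(dft_bin(seg1, k)) - cmath.phase(dft_bin(seg0, k))
    expected = 2 * math.pi * k * hop / n_len  # phase advance of the bin-centre frequency
    dev = (dphi - expected + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
    return (2 * math.pi * k / n_len + dev / hop) * fs / (2 * math.pi)

fs, n_len, hop = 48000, 480, 120  # 10 ms window -> 100 Hz bin spacing
f_true = 1234.5
sig = [math.cos(2 * math.pi * f_true * n / fs + 0.3) for n in range(n_len + hop)]
print(f"parabolic: {parabolic_estimate(sig, fs, n_len):.2f} Hz")
print(f"phase:     {phase_estimate(sig, fs, n_len, hop):.4f} Hz")
```

With 100 Hz bins, the parabolic fit lands within a couple of hertz (it has a small bias for the Hann window), while the phase method gets within a small fraction of a hertz in this noiseless case.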

Reply #44
There are a number of issues confused in this thread.

The first is the claim that the Gabor limit applies. The Gabor limit only applies when what you need to detect is completely unknown.

Hearing the difference between notes is not at all the same problem.

The second is the claim that this 'beats FFT'. It beats the single-bin resolution of an FFT, but once you know you're dealing with a single cycle of a single sine wave, that problem becomes moot, because an FFT is 1:1 and onto, i.e. orthonormal, tight frame, etc., and the information is all retained. So, yes, it is there in the FFT that has wider bands, just not in the usual way one would extract it. The GABOR LIMIT DOES NOT APPLY TO THIS DETECTION ISSUE, and YES, Batman, the FFT can be used in such detection, it's just a dumb way to do it.
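The "1:1 and onto" point can be checked numerically: with 1/sqrt(N) scaling in both directions, the DFT is an orthonormal (unitary) transform, so the inverse returns the original samples and the energy is unchanged (Parseval). A minimal pure-Python sketch, using a naive O(N^2) DFT only to keep the example dependency-free; the off-bin test tone is an arbitrary choice:

```python
import cmath
import math

def dft(x):
    """Unitary DFT: X[k] = (1/sqrt(N)) * sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n_len = len(x)
    scale = 1 / math.sqrt(n_len)
    return [scale * sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_len) for n in range(n_len))
            for k in range(n_len)]

def idft(spec):
    """Inverse of the unitary DFT above."""
    n_len = len(spec)
    scale = 1 / math.sqrt(n_len)
    return [scale * sum(spec[k] * cmath.exp(2j * math.pi * k * n / n_len) for k in range(n_len))
            for n in range(n_len)]

fs, n_len = 48000, 64
x = [math.sin(2 * math.pi * 1234.5 * n / fs) for n in range(n_len)]  # tone between bins
spec = dft(x)
back = idft(spec)
max_err = max(abs(b - a) for a, b in zip(x, back))
energy_time = sum(v * v for v in x)
energy_freq = sum(abs(v) ** 2 for v in spec)
print(max_err, energy_time, energy_freq)
```

Nothing about the off-bin tone is lost in the spectrum; it is merely spread across several bins.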

Third, the ear has bands about 60 Hz wide until you get to the point where 1/4 octave is wider, and then they are 1/4 octave wide, give or take. This has little bearing on the actual frequency detection mechanism, because the phase of firing of neurons is radically different below and above the center frequency of a given hair cell. This alone, up to 500 Hz, can suffice to demonstrate pitch detection ability. And since the filters are wide, they settle fast, and hence again we beat the Gabor limit, because we know we're looking for ONE set of frequencies, not any arbitrary frequency.
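Taking the rule of thumb above literally (a constant 60 Hz bandwidth at low frequencies, then a quarter-octave bandwidth with band edges at f·2^(-1/8) and f·2^(1/8) around a centre f), the changeover frequency is easy to work out. This is only arithmetic on the numbers quoted in the post, not a standard auditory-filter model such as Bark or ERB, and the names are illustrative:

```python
import math

LOW_FREQ_BW_HZ = 60.0                            # "about 60Hz bands" (from the post)
QUARTER_OCTAVE = 2 ** (1 / 8) - 2 ** (-1 / 8)    # ~0.1735: 1/4-octave width as a fraction of f

def rule_of_thumb_bandwidth(f):
    """Approximate band width (Hz) at centre frequency f, per the post's rule of thumb."""
    return max(LOW_FREQ_BW_HZ, QUARTER_OCTAVE * f)

crossover = LOW_FREQ_BW_HZ / QUARTER_OCTAVE  # where 1/4 octave becomes wider than 60 Hz
print(f"crossover ~ {crossover:.0f} Hz")
for f in (100, 250, 500, 1000, 4000):
    print(f, round(rule_of_thumb_bandwidth(f), 1))
```

By this reading the changeover sits near 350 Hz, and a 500 Hz tone falls in a band roughly 87 Hz wide.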

So, the headline is just confused, it's comparing an apple, an orange, and a crate full of bowling balls, and concluding that apples are orange-colored and weigh 12 lbs.

-----
J. D. (jj) Johnston

Reply #45
Thank you JJ.


Reply #47
Well, the quality of the comments looks pretty encouraging, though I imagine the section's entropy will increase, especially after the more informed people get tired of participating.

Reply #48
Quote
Well, the quality of the comments looks pretty encouraging, though I imagine the section's entropy will increase, especially after the more informed people get tired of participating.


I don't belong to that particular site. If somebody would like to convey my feeling, please feel free.

I'm tired of dealing with what I can only describe as 'poo flinging' in most of the audio press.
-----
J. D. (jj) Johnston

Reply #49
Quote
The first is the claim that the Gabor limit applies. The Gabor limit only applies when what you need to detect is completely unknown.

Speaking of Gabor, there's nice prior art from 1946 suggesting that “human hearing beats FFT”:
Quote
Actually, as noted by Dennis Gabor (best known for his invention of holography, but who also worked in audio) back in 1946, the ears actually analyse the frequency content of sounds in time faster than suggested by the uncertainty principle by a factor of about 7. The seeming logical contradiction with the fundamental theoretical limit of time/frequency resolution is avoided by the ear’s use of a-priori or previously assumed knowledge of the nature of typical sounds but at the expense of getting the analysis ‘wrong’ when sounds not of the assumed form occur.

(quote taken from M. Gerzon's paper)
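Gerzon's caveat (a prior assumption buys speed at the cost of wrong answers when the assumption fails) is easy to demonstrate with a three-sample single-sinusoid estimator based on the identity x[n-1] + x[n+1] = 2·cos(ω)·x[n]. The Python sketch below is illustrative only and is not a model of the ear; the function name and test frequencies are arbitrary. Fed one tone, it is exact; fed two tones, it still returns one confident frequency that matches neither.

```python
import math

def single_sinusoid_freq(x0, x1, x2, fs):
    """Frequency estimate that is exact ONLY if the samples come from one pure sinusoid."""
    c = max(-1.0, min(1.0, (x0 + x2) / (2.0 * x1)))
    return math.acos(c) * fs / (2.0 * math.pi)

fs = 48000
one_tone = [math.cos(2 * math.pi * 1000 * n / fs) for n in range(3)]
two_tones = [math.cos(2 * math.pi * 1000 * n / fs) + math.cos(2 * math.pi * 3000 * n / fs)
             for n in range(3)]
print(round(single_sinusoid_freq(*one_tone, fs), 3))   # 1000.0
print(round(single_sinusoid_freq(*two_tones, fs), 1))  # neither 1000 nor 3000
```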