Human hearing beats FFT
Willakan
post Feb 18 2013, 20:03
Post #26





Group: Members
Posts: 34
Joined: 19-May 12
Member No.: 99992



Good God, I have been reading/getting PMed on other sites with a variety of ridiculous stuff about this article. So far, it apparently disproves sampling theorem (!) and, by an utterly incredible chain of logic, renders all existing measurement techniques worthless, either because there are ENORMOUS DISTORTIONS HIDING INSIDE THE FOURIERS or because...well...human hearing is nonlinear...erm...therefore all linear measurements are stupid and wrong...therefore tube amps. Or something.

This post has been edited by Willakan: Feb 18 2013, 20:03
Porcus
post Feb 18 2013, 20:22
Post #27





Group: Members
Posts: 1842
Joined: 30-November 06
Member No.: 38207



QUOTE (Willakan @ Feb 18 2013, 20:03) *
or because...well...human hearing is nonlinear...erm...therefore all linear measurements are stupid and wrong...therefore tube amps. Or something.


That one was cute.


--------------------
One day in the Year of the Fox came a time remembered well
Woodinville
post Feb 19 2013, 00:44
Post #28





Group: Members
Posts: 1402
Joined: 9-January 05
From: JJ's office.
Member No.: 18957



QUOTE (Willakan @ Feb 18 2013, 11:03) *
Good God, I have been reading/getting PMed on other sites with a variety of ridiculous stuff about this article. So far, it apparently disproves sampling theorem (!) and, by an utterly incredible chain of logic, renders all existing measurement techniques worthless, either because there are ENORMOUS DISTORTIONS HIDING INSIDE THE FOURIERS or because...well...human hearing is nonlinear...erm...therefore all linear measurements are stupid and wrong...therefore tube amps. Or something.



Yeah, me too, intercourse-it.


--------------------
-----
J. D. (jj) Johnston
2Bdecided
post Feb 19 2013, 10:53
Post #29


ReplayGain developer


Group: Developer
Posts: 5142
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



It's amazing what people conclude, given that the experiment will have been carried out with digital audio signals (not analogue signal generators, and certainly not vinyl!), and the extreme "10x better than FFT" test clips will happily survive mp3 encoding.

Cheers,
David.

This post has been edited by 2Bdecided: Feb 19 2013, 10:54
lvqcl
post Feb 19 2013, 15:51
Post #30





Group: Developer
Posts: 3387
Joined: 2-December 07
Member No.: 49183



I didn't find any mention of FFT in this article. Only "Fourier uncertainty principle" and "uncertainty limit"

This post has been edited by lvqcl: Feb 19 2013, 15:51
Canar
post Feb 19 2013, 16:46
Post #31





Group: Super Moderator
Posts: 3361
Joined: 26-July 02
From: princegeorge.ca
Member No.: 2796



The first thing that comes to my mind is: Now how do we design some frequency transform that provides better results than human hearing? I honestly don't know. FFT has been the de-facto frequency transform in my head for far too long. My own attempt to hack around with wavelets never gave me better time/frequency resolution than your typical STFT. What other options do we have?

This post has been edited by Canar: Feb 19 2013, 16:46


--------------------
You cannot ABX the rustling of jimmies.
No mouse? No problem.
greynol
post Feb 19 2013, 16:56
Post #32





Group: Super Moderator
Posts: 10000
Joined: 1-April 04
From: San Francisco
Member No.: 13167



QUOTE (lvqcl @ Feb 19 2013, 06:51) *
I didn't find any mention of FFT in this article. Only "Fourier uncertainty principle" and "uncertainty limit"

I tried this already. I guess I'm not the only one wondering how 10x better than "FFT" can survive going through an FFT process.

EDIT: added scary quotes. I don't wonder how a reversible process can satisfy the requirements of a non-linear system.

This post has been edited by greynol: Feb 19 2013, 17:04


--------------------
I should publish a list of forum idiots.
Garf
post Feb 19 2013, 17:27
Post #33


Server Admin


Group: Admin
Posts: 4885
Joined: 24-September 01
Member No.: 13



QUOTE (knutinh @ Feb 12 2013, 09:59) *
Is not some wavelet/filterbank transform more relevant than the FFT for comparing with human hearing?


Yes, there's no reason to limit yourself to FFT. The most advanced psymodels don't use them for exactly that reason; they use QMF filterbanks or similar. (This already implies that what's in that article isn't as shocking as you'd think.)
Garf
post Feb 19 2013, 17:32
Post #34


Server Admin


Group: Admin
Posts: 4885
Joined: 24-September 01
Member No.: 13



QUOTE (Canar @ Feb 19 2013, 16:46) *
The first thing that comes to my mind is: Now how do we design some frequency transform that provides better results than human hearing? I honestly don't know. FFT has been the de-facto frequency transform in my head for far too long. My own attempt to hack around with wavelets never gave me better time/frequency resolution than your typical STFT. What other options do we have?


Parallel bandpass filters (PEAQ Advanced). More accurate, very slow.
Wavelets on the MDCT coefficients (Opus). Fast, can switch the T/F tradeoff depending on the signal.

The ear works more like the parallel filters setup.
jmvalin
post Feb 19 2013, 21:07
Post #35


Xiph.org Speex developer


Group: Developer
Posts: 481
Joined: 21-August 02
Member No.: 3134



QUOTE (Garf @ Feb 19 2013, 11:27) *
Yes, there's no reason to limit yourself to FFT. The most advanced psymodels don't use them for exactly that reason; they use QMF filterbanks or similar. (This already implies that what's in that article isn't as shocking as you'd think.)


FFTs, MDCTs, QMFs and other filter banks are all fundamentally bound by the uncertainty principle: the product of the frequency resolution and time resolution cannot be smaller than about 1 (the exact constant depends on how you define resolution). This is the case for any non-parametric model/transform, i.e. when you don't make any particular assumptions about your signal. There are, however, parametric models one can use. The best example is a model where you directly fit sinusoids of arbitrary frequencies (as opposed to Fourier, which uses sinusoids of predetermined frequencies). With such a model, the resolution is only limited by practical concerns like noise, other sinusoids, and modulation effects. As a trivial example, if you give me three samples and promise that they represent only a single sinusoid (no noise or modulation), then I can calculate the exact frequency of that sinusoid. So in theory, sinusoidal modeling solves all the time-freq issues of the FFT. The only problem is that it's damn hard to use, especially when it comes to having a good enough analysis. And that's why we don't have any high-quality sinusoidal-based audio codecs.
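
The three-sample claim above can be illustrated with a minimal sketch. It is not taken from the post; it assumes a real sinusoid x[n] = A*cos(w*n + phi) with no noise, a known sample rate, a frequency below Nyquist and a non-zero middle sample, and it relies on the identity x[n-1] + x[n+1] = 2*cos(w)*x[n]. The function name and the test values are hypothetical.

```python
import math

def freq_from_three_samples(x0, x1, x2, sample_rate):
    """Frequency (Hz) of a pure sinusoid from three consecutive samples.

    Uses x[n-1] + x[n+1] = 2*cos(w)*x[n], which holds for
    x[n] = A*cos(w*n + phi). Assumes no noise, a frequency below
    Nyquist, and a middle sample that is not zero.
    """
    if x1 == 0:
        raise ValueError("middle sample is zero; frequency is not determined")
    c = (x0 + x2) / (2.0 * x1)
    c = max(-1.0, min(1.0, c))  # guard against rounding just outside [-1, 1]
    w = math.acos(c)            # radians per sample, in [0, pi]
    return w * sample_rate / (2.0 * math.pi)

# Three samples of a sinusoid with arbitrary (hypothetical) amplitude and phase.
fs = 48000.0
f_true = 1234.5
samples = [0.7 * math.cos(2 * math.pi * f_true * n / fs + 0.3) for n in range(3)]
f_est = freq_from_three_samples(samples[0], samples[1], samples[2], fs)
```

Amplitude and phase drop out of the ratio, which is why three samples are enough under these assumptions. Any noise or a second sinusoid breaks the identity, which is the "practical concerns" caveat in the post.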
Paulhoff
post Feb 20 2013, 19:51
Post #36





Group: Members
Posts: 106
Joined: 3-June 05
From: Coconut Creek Fl
Member No.: 22486



For those in the know..........



The Princess and the Pea.


That sums it all up.



Paul



:) :) :)

This post has been edited by Paulhoff: Feb 20 2013, 20:12


--------------------
"Reality is merely an illusion, albeit a very persistent one." Albert Einstein
Woodinville
post Feb 21 2013, 04:55
Post #37





Group: Members
Posts: 1402
Joined: 9-January 05
From: JJ's office.
Member No.: 18957



It's still a confused headline. Recognizing one of a set of different sine waves is not limited by the Gabor limit.

Observing that that is not limited by the Gabor limit is like observing that white is not limited by aircraft.


--------------------
-----
J. D. (jj) Johnston
Garf
post Feb 21 2013, 07:54
Post #38


Server Admin


Group: Admin
Posts: 4885
Joined: 24-September 01
Member No.: 13



QUOTE (Woodinville @ Feb 21 2013, 04:55) *
It's still a confused headline.... is like observing that white is not limited by aircraft.


I'm willing to rename this thread to "white is not limited by aircraft" but I don't think it'll make things better :)
dhromed
post Feb 21 2013, 10:02
Post #39





Group: Members
Posts: 1314
Joined: 16-February 08
From: NL
Member No.: 51347



Yeah, we don't want to turn HA into an extension of horse_ebooks.
Woodinville
post Feb 22 2013, 04:22
Post #40





Group: Members
Posts: 1402
Joined: 9-January 05
From: JJ's office.
Member No.: 18957



QUOTE (Garf @ Feb 20 2013, 22:54) *
QUOTE (Woodinville @ Feb 21 2013, 04:55) *
It's still a confused headline.... is like observing that white is not limited by aircraft.


I'm willing to rename this thread to "white is not limited by aircraft" but I don't think it'll make things better :)


No better. Just as meaningful. :)


--------------------
-----
J. D. (jj) Johnston
2Bdecided
post Feb 25 2013, 13:14
Post #41


ReplayGain developer


Group: Developer
Posts: 5142
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



Never mind the title, I still don't find a satisfactory answer in this thread.

I understand that the human ear uses a wobbling membrane as something like a filter bank, with a number of non-linear processes, and an amazing analysis of the signals coming from it, to deliver the hearing capacities that we can probe in listening tests and experience every day. I understand that this is nothing like an FFT. I understand that the frequency resolution of masked noise is not that critical, so we use FFTs in codecs in a place where their frequency resolution is far over-specified, rather than being an issue.

However, we often describe other things in audio and hearing with an FFT-like model. It crops up in sampling theory. We push all the audio through a comparable filterbank in most lossy codecs. It is true that these transforms are mathematically lossless/reversible - but if we're messing with things in the other domain, this is little comfort.

So, simply, what is the reason that this is OK?

Cheers,
David.
knutinh
post Feb 25 2013, 15:32
Post #42





Group: Members
Posts: 569
Joined: 1-November 06
Member No.: 37047



QUOTE (2Bdecided @ Feb 25 2013, 13:14) *
Never mind the title, I still don't find a satisfactory answer in this thread.

I understand that the human ear uses a wobbling membrane as something like a filter bank, with a number of non-linear processes, and an amazing analysis of the signals coming from it, to deliver the hearing capacities that we can probe in listening tests and experience every day. I understand that this is nothing like an FFT. I understand that the frequency resolution of masked noise is not that critical, so we use FFTs in codecs in a place where their frequency resolution is far over-specified, rather than being an issue.

However, we often describe other things in audio and hearing with an FFT-like model. It crops up in sampling theory. We push all the audio through a comparable filterbank in most lossy codecs. It is true that these transforms are mathematically lossless/reversible - but if we're messing with things in the other domain, this is little comfort.

So, simply, what is the reason that this is OK?

Cheers,
David.

I guess the switching between two different time/frequency resolution transforms in many lossy codecs is a sort of "ad hoc" fix for not doing a proper modelling of our hearing apparatus?

Not all audio processing/transmission may need to include an accurate model of our hearing. Perhaps a crude STFT is simply sufficient for some applications.

So what if we devised an insanely complex, irregular, nonlinear filterbank (Volterra filterbank?). What could it be used for? Better lossy coding? (I think that there are other tradeoffs in lossy coding as well, such as signal compaction.) Could we make better "frequency analyzers"? (What engineers would be able to interpret the plots from such a device?)

-k
2Bdecided
post Feb 25 2013, 17:24
Post #43


ReplayGain developer


Group: Developer
Posts: 5142
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



QUOTE (knutinh @ Feb 25 2013, 14:32) *
I guess the switching between two different time/frequency resolution transforms in many lossy codecs is a sort of "ad hoc" fix for not doing a proper modelling of our hearing apparatus?
Not in the sense discussed in this paper. If that lossy codec filterbank and/or transform defined/trashed the performance that's measured in this paper (it doesn't), then even with optimal choice of transform length options and optimal switching between them, the result would be 10x too bad.

I think your other two paragraphs are right though. I'd just love to see a robust scholarly explanation, because I think we're going to need it after this paper.

Cheers,
David.
jmvalin
post Feb 26 2013, 00:10
Post #44


Xiph.org Speex developer


Group: Developer
Posts: 481
Joined: 21-August 02
Member No.: 3134



QUOTE (2Bdecided @ Feb 25 2013, 07:14) *
Never mind the title, I still don't find a satisfactory answer in this thread.

I understand that the human ear uses a wobbling membrane as something like a filter bank, with a number of non-linear processes, and an amazing analysis of the signals coming from it, to deliver the hearing capacities that we can probe in listening tests and experience every day. I understand that this is nothing like an FFT. I understand that the frequency resolution of masked noise is not that critical, so we use FFTs in codecs in a place where their frequency resolution is far over-specified, rather than being an issue.

However, we often describe other things in audio and hearing with an FFT-like model. It crops up in sampling theory. We push all the audio through a comparable filterbank in most lossy codecs. It is true that these transforms are mathematically lossless/reversible - but if we're messing with things in the other domain, this is little comfort.


If you want to think about this in terms of FFTs... consider the case of a 10 ms FFT window. The resolution of that FFT is 100 Hz. Does this mean we can't tell the frequency of a sinusoid with better than 100 Hz accuracy using that FFT? Absolutely not. First, we can use interpolation with the neighbouring bins to get a more precise value. If we have FFTs at other time offsets, we can do even better. We can look at phase changes for a certain bin and compute the exact (within noise limits) frequency of the sinusoid that's around that bin. So we've again "beaten Heisenberg", but only because we've assumed that we have a single sinusoid around that bin. AFAIK, the human ear is capable of similar phase processing to figure out the frequency. It has to do something like that because its "critical bands" are far wider than the bins of a 10 ms FFT. There are only ~25 critical bands for the entire 20 Hz - 20 kHz spectrum.
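
A rough sketch of the phase-change idea described above, under stated assumptions: a single real sinusoid near one bin, a 10 ms Hann-windowed frame at 48 kHz (so bins are 100 Hz apart), and a second frame offset by a hypothetical hop of 1 ms. The bin is computed by a direct sum so that only the standard library is needed; none of the names or parameter values come from the post.

```python
import cmath
import math

def dft_bin(frame, k):
    """Single DFT bin k of a Hann-windowed frame (direct sum, no FFT library)."""
    n_fft = len(frame)
    total = 0j
    for n, x in enumerate(frame):
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft)
        total += window * x * cmath.exp(-2j * math.pi * k * n / n_fft)
    return total

def refine_frequency(signal, k, n_fft, hop, sample_rate):
    """Estimate the frequency of the single sinusoid near bin k from two frames."""
    first = dft_bin(signal[:n_fft], k)
    second = dft_bin(signal[hop:hop + n_fft], k)
    measured = cmath.phase(second * first.conjugate())  # phase advance over `hop` samples
    expected = 2 * math.pi * k * hop / n_fft            # advance of a bin-centred tone
    # Wrap the difference to [-pi, pi) so it measures the offset from the bin centre.
    deviation = (measured - expected + math.pi) % (2 * math.pi) - math.pi
    cycles_per_sample = k / n_fft + deviation / (2 * math.pi * hop)
    return cycles_per_sample * sample_rate

fs = 48000
n_fft = 480       # 10 ms window, so bins are 100 Hz apart
hop = 48          # 1 ms between the two frames (hypothetical choice)
f_true = 1037.0   # deliberately not on a bin centre
signal = [math.cos(2 * math.pi * f_true * n / fs + 0.4) for n in range(n_fft + hop)]
bin_spacing = fs / n_fft
f_est = refine_frequency(signal, k=10, n_fft=n_fft, hop=hop, sample_rate=fs)
```

The estimate lands far inside the 100 Hz bin spacing only because the signal really is one clean sinusoid; the short hop keeps the phase difference unambiguous for offsets of up to about half the frame rate (here roughly 500 Hz either side of the bin centre).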
Woodinville
post Feb 26 2013, 04:30
Post #45





Group: Members
Posts: 1402
Joined: 9-January 05
From: JJ's office.
Member No.: 18957



There are a number of issues confused in this thread.

The first is that the Gabor limit applies. The Gabor limit only applies when what you need to detect is completely unknown.

Hearing the difference between notes is not at all the same problem.

The second is that this 'beats FFT'. It beats the single-bin resolution of an FFT, but once you know you're dealing with a single cycle of a single sine wave, that problem becomes moot, because an FFT is 1:1 and onto, i.e. orthonormal, tight frame, etc, and the information is all retained. So, yes, it is there in the FFT that has wider bands, just not in the usual way one would extract it. The GABOR LIMIT DOES NOT APPLY TO THIS DETECTION ISSUE, and YES, Batman, the FFT can be used in such detection, it's just a dumb way to do it.

Third, the ear has about 60 Hz bands until you get to the point where 1/4 octave is wider, and then they are 1/4 octave wide, give or take. This has little bearing on the actual frequency detection mechanism, because the phase of firing of neurons is radically different below and above the center frequency of a given hair cell. This, alone, to 500 Hz, can suffice to demonstrate pitch detection ability. And since the filters are wide, they settle fast, and hence again we beat the Gabor limit, because we know we're looking for ONE set of frequencies, not any arbitrary frequency.
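
To put rough numbers on the "60 Hz or 1/4 octave, whichever is wider" rule of thumb, here is a small sketch. It assumes the quarter-octave band is centred geometrically on the centre frequency, which the post does not specify, so the crossover it computes (a few hundred Hz) is only indicative. The function name and defaults are hypothetical.

```python
def approx_auditory_bandwidth(centre_hz, floor_hz=60.0, octave_fraction=0.25):
    """Rule-of-thumb filter bandwidth: a fixed floor, or a fraction of an octave if wider."""
    half = octave_fraction / 2.0
    # Width of a band running from centre * 2**-half up to centre * 2**half.
    fractional = centre_hz * (2.0 ** half - 2.0 ** -half)
    return max(floor_hz, fractional)

# Centre frequency at which a quarter octave becomes wider than the 60 Hz floor.
crossover_hz = 60.0 / (2.0 ** 0.125 - 2.0 ** -0.125)

bandwidth_at_100_hz = approx_auditory_bandwidth(100.0)
bandwidth_at_1_khz = approx_auditory_bandwidth(1000.0)
```

Under this assumption the filters near 1 kHz are wider than the 100 Hz bins of a 10 ms FFT, which is consistent with the point that pitch discrimination has to come from something other than raw band resolution.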

So, the headline is just confused, it's comparing an apple, an orange, and a crate full of bowling balls, and concluding that apples are orange-colored and weigh 12 lbs.



--------------------
-----
J. D. (jj) Johnston
2Bdecided
post Feb 26 2013, 10:57
Post #46


ReplayGain developer


Group: Developer
Posts: 5142
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



Thank you JJ.
krabapple
post Feb 27 2013, 01:51
Post #47





Group: Members
Posts: 2274
Joined: 18-December 03
Member No.: 10538



http://arstechnica.com/science/2013/02/hum...3s-sound-worse/

This post has been edited by krabapple: Feb 27 2013, 01:52
greynol
post Feb 27 2013, 02:44
Post #48





Group: Super Moderator
Posts: 10000
Joined: 1-April 04
From: San Francisco
Member No.: 13167



Well, the quality of the comments looks pretty encouraging, though I imagine the section's entropy will increase, especially after the more informed people get tired of participating.


--------------------
I should publish a list of forum idiots.
Woodinville
post Feb 27 2013, 05:55
Post #49





Group: Members
Posts: 1402
Joined: 9-January 05
From: JJ's office.
Member No.: 18957



QUOTE (greynol @ Feb 26 2013, 17:44) *
Well, the quality of the comments looks pretty encouraging, though I imagine the section's entropy will increase, especially after the more informed people get tired of participating.


I don't belong to that particular site. If somebody would like to convey my feeling, please feel free.

I'm tired of dealing with what I can only describe as 'poo flinging' in most of the audio press.


--------------------
-----
J. D. (jj) Johnston
Alexey Lukin
post Mar 14 2013, 17:39
Post #50





Group: Members
Posts: 191
Joined: 31-July 08
Member No.: 56508



QUOTE (Woodinville @ Feb 25 2013, 23:30) *
The first is that the Gabor limit applies. The Gabor limit only applies when what you need to detect is completely unknown.

Speaking of Gabor, there's some nice prior art from 1946 suggesting that “human hearing beats FFT”:
QUOTE
Actually, as noted by Dennis Gabor (best known for his invention of holography, but who also worked in audio) back in 1946, the ears actually analyse the frequency content of sounds in time faster than suggested by the uncertainty principle by a factor of about 7. The seeming logical contradiction with the fundamental theoretical limit of time/frequency resolution is avoided by the ear’s use of a-priori or previously assumed knowledge of the nature of typical sounds but at the expense of getting the analysis ‘wrong’ when sounds not of the assumed form occur.

(quote taken from M. Gerzon's paper)