Human hearing beats FFT
2Bdecided
post Mar 15 2013, 12:15
Post #51


ReplayGain developer


Group: Developer
Posts: 5058
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



Interesting paper, thank you.
Yaakov Gringeler
post Apr 2 2013, 01:03
Post #52





Group: Members
Posts: 7
Joined: 1-April 13
Member No.: 107483




EST is a new transform that can explain the results of the article.

Fourier-related transforms, like FFT, are just one way to find frequencies, and clearly not the best possible.

EST derives frequencies from samples and is unrelated to Fourier/FFT.
The process of EST is deterministic, does not use non-linear equations, and can handle noise.

In the ideal case of a noiseless signal composed of n sinusoids, the frequencies, amplitudes and phases are precisely recovered from 3n equally spaced real samples.
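
To make the 3n count concrete for n = 1: any equally spaced sinusoid x[t] = a*cos(w*t + p) satisfies x[t-1] + x[t+1] = 2*cos(w)*x[t], so three noiseless samples fix w, and then a and p. The sketch below uses only that textbook identity. It is not taken from the EST paper, and the function name is made up for illustration.

```python
import math

def sinusoid_from_three_samples(x0, x1, x2):
    """Recover (amplitude, frequency, phase) of a*cos(w*t + p) from samples at t = 0, 1, 2.

    Assumes noiseless data, 0 < w < pi (below Nyquist) and x1 != 0.
    Frequency is in radians per sample.
    """
    # Identity for equally spaced samples: x0 + x2 = 2*cos(w)*x1.
    w = math.acos((x0 + x2) / (2.0 * x1))
    # Write the signal as u*cos(w*t) + v*sin(w*t): x0 = u, x1 = u*cos(w) + v*sin(w).
    u = x0
    v = (x1 - u * math.cos(w)) / math.sin(w)
    # a*cos(w*t + p) expands to (a*cos p)*cos(w*t) - (a*sin p)*sin(w*t).
    a = math.hypot(u, v)
    p = math.atan2(-v, u)
    return a, w, p

# Three samples of 0.8*cos(0.3*t + 1.1) are enough to get all three parameters back.
samples = [0.8 * math.cos(0.3 * t + 1.1) for t in range(3)]
a, w, p = sinusoid_from_three_samples(*samples)
```

With noise, the division by x1 and the acos become fragile, which is one way to see why more samples are then needed.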

A noisy signal will require more samples, depending on noise level.

Other than the minimum for the ideal case, accuracy does not depend on the number of samples (time). The additional samples for a noisy signal are needed to handle noise.

EST can also transform samples into increasing/decreasing sinusoids, which is a better way to model audio. In such a case, for a noiseless signal, 4 samples are required per increasing/decreasing sinusoid, and more for a noisy signal.
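
The count of 4 samples can be illustrated in the same way. A single increasing/decreasing sinusoid x[t] = a*r^t*cos(w*t + p) obeys a two-term recurrence x[t+2] = c1*x[t+1] + c2*x[t] with c1 = 2*r*cos(w) and c2 = -r^2, and four samples give two equations for c1 and c2. This is a Prony-style sketch under that assumption, not necessarily the EST procedure; the function name is hypothetical.

```python
import math

def damped_sinusoid_from_four_samples(x0, x1, x2, x3):
    """Recover (a, r, w, p) of a * r**t * cos(w*t + p) from samples at t = 0..3.

    Assumes noiseless data, r > 0 and 0 < w < pi.
    """
    # Solve the 2x2 system  c1*x1 + c2*x0 = x2,  c1*x2 + c2*x1 = x3  by Cramer's rule.
    det = x1 * x1 - x0 * x2
    c1 = (x1 * x2 - x0 * x3) / det
    c2 = (x1 * x3 - x2 * x2) / det
    # The recurrence coefficients encode the growth/decay rate and the frequency.
    r = math.sqrt(-c2)
    w = math.acos(c1 / (2.0 * r))
    # Write the signal as r**t * (u*cos(w*t) + v*sin(w*t)) and read u, v from x0, x1.
    u = x0
    v = (x1 / r - u * math.cos(w)) / math.sin(w)
    return math.hypot(u, v), r, w, math.atan2(-v, u)

# A decaying sinusoid (r < 1); r > 1 would describe an increasing one.
samples = [1.5 * 0.9 ** t * math.cos(0.7 * t - 0.4) for t in range(4)]
a, r, w, p = damped_sinusoid_from_four_samples(*samples)
```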

EST can be evaluated using a demo program that implements it. There is also a paper that details the transform and its mathematical basis.

Those interested in seeing the paper and/or the demo program can email me at gringya atsign gmail dot com.
Woodinville
post Apr 2 2013, 23:47
Post #53





Group: Members
Posts: 1402
Joined: 9-January 05
From: JJ's office.
Member No.: 18957



QUOTE (Yaakov Gringeler @ Apr 1 2013, 17:03) *
Fourier-related transforms, like FFT, are just one way to find frequencies, and clearly not the best possible.

Which, of course, depends entirely on your definition of "Frequency", something that itself is trickier than some seem to realize.
QUOTE
EST derives frequencies from samples and is unrelated to Fourier/FFT.

What does "EST" stand for, in the first place? Does it use a complex exponential or a representation of a complex exponential?

QUOTE
The process of EST is deterministic, does not use non-linear equations, and can handle noise.

Which is true of the Fourier Transform, as well.
QUOTE
In the ideal case of a noiseless signal composed of n sinusoids, the frequencies, amplitudes and phases are precisely recovered from 3n equally spaced real samples.

Sounds pretty good. What's the basis set you're using? Sounds a lot like a * sin (b *t +c) where a,b,c are the 3 samples. Not sure what "equally spaced" means here, unless you're referring to the fact you can characterize a sine wave with 3 non-degenerate points.
QUOTE
A noisy signal will require more samples, depending on noise level.

No surprise.
QUOTE
Other than the minimum for the ideal case, accuracy does not depend on the number of samples (time). The additional samples for a noisy signal are needed to handle noise.

EST can also transform samples into increasing/decreasing sinusoids, which is a better way to model audio. In such a case, for a noiseless signal, 4 samples are required per increasing/decreasing sinusoid, and more for a noisy signal.

So it's Laplace-based instead of Fourier based, then?

Instead of bombarding us with a bunch of not-very-specific qualities, why not just tell us what the basis set is, and how the analysis works?

I am aware of approximately infinite (well, literally infinite but obviously I haven't generated them all!) numbers of basis sets, many of which this could describe.


--------------------
-----
J. D. (jj) Johnston
Alexey Lukin
post Apr 2 2013, 23:59
Post #54





Group: Members
Posts: 191
Joined: 31-July 08
Member No.: 56508



Yaakov, also check out the Reassigned spectrogram mode in iZotope RX. It “beats FFT” in terms of time and frequency resolution: it can precisely localize impulsive events in time and precisely display frequencies of harmonics, assuming that they do not overlap in FFT spectrum.
Yaakov Gringeler
post Apr 3 2013, 01:42
Post #55





Group: Members
Posts: 7
Joined: 1-April 13
Member No.: 107483



EST stands for Exponential Sum Transform and it uses complex exponentials.

The basis is sigma(c*b^t), where b and c are non-zero complex numbers and the b values are distinct. If all b are on the unit circle, then it is simply a spectrum.

When all b are on the unit circle and the samples are real, this becomes sigma(a*cos(b*t+c)), where a, b and c are now real (amplitude, frequency and phase), unlike the complex b and c above.
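
Written out, the step from the complex form to the cosine form looks as follows. This is standard algebra rather than something quoted from the EST paper, and the symbols \omega and c_k are introduced here for illustration. For real samples the non-real b come in complex-conjugate pairs with conjugate coefficients (b = 1 and b = -1 are the unpaired exceptions), and each pair collapses into one real sinusoid:

```latex
f(t) = \sum_{k} c_k \, b_k^{\,t},
\qquad b_k, c_k \in \mathbb{C} \setminus \{0\}, \quad b_k \text{ distinct}

% Unit-circle pair b = e^{i\omega}, \bar{b} = e^{-i\omega} with coefficients c, \bar{c}:
c \, e^{i\omega t} + \bar{c} \, e^{-i\omega t} = 2\,|c| \cos(\omega t + \arg c)
```

So the amplitude is 2|c|, the frequency is arg b and the phase is arg c.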

The samples must be equally spaced, not just non-degenerate.

It clearly looks more like Laplace than Fourier, but a specific relation, if one exists, is not known to me.

As for describing the analysis, I offered to send the detailed paper. Do you prefer an informal description?

Canar
post Apr 3 2013, 05:27
Post #56





Group: Super Moderator
Posts: 3348
Joined: 26-July 02
From: princegeorge.ca
Member No.: 2796



I think a lot of us here would be interested in a formal description, myself included. I think from what you've just said that we'll get it puzzled out though. :)


--------------------
You cannot ABX the rustling of jimmies.
No mouse? No problem.
Yaakov Gringeler
post Apr 3 2013, 18:14
Post #57





Group: Members
Posts: 7
Joined: 1-April 13
Member No.: 107483



QUOTE (Canar @ Apr 3 2013, 05:27) *
I think a lot of us here would be interested in a formal description, myself included. I think from what you've just said that we'll get it puzzled out though. :)

If I understand you correctly, you prefer a formal description of the process, and only that.
db1989
post Apr 3 2013, 18:31
Post #58





Group: Super Moderator
Posts: 5275
Joined: 23-June 06
Member No.: 32180



If I may guess, I think he means that this site has a significant number of users who would appreciate detailed descriptions. However, that is not to stop you from providing less technical information (i.e. ‘layman’s terms’) if you want to; there are probably other users who would like that, too.
Porcus
post Apr 3 2013, 20:34
Post #59





Group: Members
Posts: 1842
Joined: 30-November 06
Member No.: 38207



I think I could very well use a formula or two ... point seven eighteen twentyeight ...

QUOTE (Yaakov Gringeler @ Apr 3 2013, 02:42) *
As for describing the analysis, I offered to send the detailed paper. Do you prefer an informal description?


I think I just got one that was a bit too rough ;) although I do suspect I have guessed the point.

This post has been edited by Porcus: Apr 3 2013, 20:37


--------------------
One day in the Year of the Fox came a time remembered well
Yaakov Gringeler
post Apr 3 2013, 22:10
Post #60





Group: Members
Posts: 7
Joined: 1-April 13
Member No.: 107483



The following link:

http://www.mediafire.com/view/?ce47jurz43wzjce

is to a short document that describes the EST process for real noiseless samples.

Woodinville
post Apr 11 2013, 11:09
Post #61





Group: Members
Posts: 1402
Joined: 9-January 05
From: JJ's office.
Member No.: 18957



Hm. Define "noiseless". Most instruments have a chaotic part of their performance that in fact is noiselike in that it does not repeat, is not entirely stationary, depends on technique, and so on.

So, I'm not quite sure I know what you mean by noiseless.


--------------------
-----
J. D. (jj) Johnston
Yaakov Gringeler
post Apr 11 2013, 19:33
Post #62





Group: Members
Posts: 7
Joined: 1-April 13
Member No.: 107483



The paper described the mathematical basis of EST, which uses the ideal case of perfect increasing/decreasing sinusoids.

For realistic data, EST uses different processes that expect noise.

For audio, the EST process is as follows.
1. Find linear prediction coefficients, preferably using the covariance method and not the auto-correlation method.
2. Create the linear prediction polynomial.
3. Find the roots of the linear prediction polynomial to establish the basis set of an exponential sum function, as described in the paper.
4. Use the samples and the basis set to find the coefficients of the function.
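
The four steps above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `exponential_sum_fit` is hypothetical, the covariance method is implemented as a plain unwindowed least-squares fit, and step 4 is a least-squares solve against the matrix of powers b^t.

```python
import numpy as np

def exponential_sum_fit(x, order):
    """Fit x[t] ~ sum_k c[k] * b[k]**t via linear prediction (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Step 1: covariance-method linear prediction, i.e. least squares on
    # x[t] ~ a[0]*x[t-1] + ... + a[order-1]*x[t-order] over the block, no windowing.
    rows = np.array([x[t - order:t][::-1] for t in range(order, n)])
    a, *_ = np.linalg.lstsq(rows, x[order:], rcond=None)
    # Step 2: prediction polynomial z**order - a[0]*z**(order-1) - ... - a[order-1].
    poly = np.concatenate(([1.0], -a))
    # Step 3: its roots are the bases b[k] of the exponential sum.
    b = np.roots(poly)
    # Step 4: solve for the coefficients c[k] from the samples and the basis.
    t = np.arange(n)
    powers = b[np.newaxis, :] ** t[:, np.newaxis]
    c, *_ = np.linalg.lstsq(powers, x.astype(complex), rcond=None)
    return b, c

# Noiseless test block: one decaying sinusoid plus one steady sinusoid (order 4).
t = np.arange(16)
x = 1.5 * 0.98 ** t * np.cos(0.7 * t - 0.4) + 0.5 * np.cos(2.0 * t + 1.0)
b, c = exponential_sum_fit(x, order=4)
model = ((b[np.newaxis, :] ** t[:, np.newaxis]) @ c).real
```

Here `np.angle(b)` gives the frequencies in radians per sample and `np.abs(b)` the per-sample growth or decay factors. For noisy audio the order and block length would have to be chosen larger than the number of components, which this sketch does not attempt.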

The key point is that linear prediction coefficients and an exponential sum function are equivalent, with the exponential sum function having the distinct advantage of being an analytic function with a useful structure. The mathematical basis proves this equivalence.

Due to the equivalence, an exponential sum function models an audio signal with the same quality as linear prediction.

You may note that the best lossless audio compressors, like OptimFROG, use linear prediction. This is a strong indication of the power of linear prediction to model audio.

Since EST generates an analytic function, it is suitable for lossy audio compression, as well as other audio applications.

Once EST has generated an exponential sum function, you can do the following:
Identify noise elements, using frequency and/or amplitude, and remove them.
Identify inaudible elements, and remove them.
Quantize the coefficients.
Resample the audio signal, both sample rate and sample depth.
And various other things.

Unlike Fourier related methods, which use a predefined basis, EST uses a basis derived from the data.

In short, EST for audio combines the flexibility and usefulness of an analytic function with the modeling power of linear prediction.
Woodinville
post Apr 11 2013, 20:36
Post #63





Group: Members
Posts: 1402
Joined: 9-January 05
From: JJ's office.
Member No.: 18957



QUOTE (Yaakov Gringeler @ Apr 11 2013, 11:33) *
Unlike Fourier related methods, which use a predefined basis, EST uses a basis derived from the data.

In short, EST for audio combines the flexibility and usefulness of an analytic function with the modeling power of linear prediction.


Try applying EST to the first 30 seconds of the track "We Shall Be Happy" by Ry Cooder off the album titled "Jazz". Let me know how big your covariance matrix is, too, ok?


--------------------
-----
J. D. (jj) Johnston
Yaakov Gringeler
post Apr 11 2013, 21:32
Post #64





Group: Members
Posts: 7
Joined: 1-April 13
Member No.: 107483



QUOTE (Woodinville @ Apr 11 2013, 20:36) *
QUOTE (Yaakov Gringeler @ Apr 11 2013, 11:33) *
Unlike Fourier related methods, which use a predefined basis, EST uses a basis derived from the data.

In short, EST for audio combines the flexibility and usefulness of an analytic function with the modeling power of linear prediction.


Try applying EST to the first 30 seconds of the track "We Shall Be Happy" by Ry Cooder off the album titled "Jazz". Let me know how big your covariance matrix is, too, ok?


In a practical implementation the samples will be broken into blocks and there will be a chosen matrix size for that block size.

The size of the matrix and the block size will determine accuracy and an accuracy-speed trade-off.

This is also the way it is done when using linear prediction for lossless audio compression or for speech compression. The difference is that EST returns an analytic function.

30 seconds of audio will therefore be broken into many smaller blocks, and not treated as a single block.
Woodinville
post Jun 4 2013, 01:51
Post #65





Group: Members
Posts: 1402
Joined: 9-January 05
From: JJ's office.
Member No.: 18957



QUOTE (Yaakov Gringeler @ Apr 11 2013, 13:32) *
QUOTE (Woodinville @ Apr 11 2013, 20:36) *
QUOTE (Yaakov Gringeler @ Apr 11 2013, 11:33) *
Unlike Fourier related methods, which use a predefined basis, EST uses a basis derived from the data.

In short, EST for audio combines the flexibility and usefulness of an analytic function with the modeling power of linear prediction.


Try applying EST to the first 30 seconds of the track "We Shall Be Happy" by Ry Cooder off the album titled "Jazz". Let me know how big your covariance matrix is, too, ok?


In a practical implementation the samples will be broken into blocks and there will be a chosen matrix size for that block size.

The size of the matrix and the block size will determine accuracy and an accuracy-speed trade-off.

This is also the way it is done when using linear prediction for lossless audio compression or for speech compression. The difference is that EST returns an analytic function.

30 seconds of audio will therefore be broken into many smaller blocks, and not treated as a single block.


I do know how coders work, so try your EST basis on We Shall Be Happy and get back to me, ok? And tell me how many basis functions you need for that one, too. And how many are orthogonal. And then how many of those you have to code.


--------------------
-----
J. D. (jj) Johnston
Specy
post Aug 17 2013, 11:52
Post #66





Group: Members
Posts: 22
Joined: 27-November 08
Member No.: 63320



Over 10 years ago, for my master's thesis, I wrote an algorithm that determines nearly exact frequency values from an FFT. It can find any frequencies as long as they are far enough away from each other and constant in tone and level.

The method is pretty simple:
1. Create an FFT using a window that's a lot bigger than the block of audio that you use
2. Find the highest peak in the FFT domain. This is an estimation of the loudest frequency present.
3. Write down the found frequency, phase and amplitude
4. Generate an FFT based on the found freq, phase, amp (this can be optimized for speed, since it's only a single tone).
5. Subtract a small percentage of this (I found that 5-10% works well) from the original FFT from step 1.
6. Go back to step 2.

This gives you a whole lot of values. Next, you need to combine all the values that have approximately the same frequency. This can be done as follows:
- If a frequency is new (no data within 0.5 FFT bin size), this is a new frequency that we haven't seen before.
- Otherwise combine this new measurement with the measurement closest to it.

Tones that are 1 bin apart will not be found perfectly (frequency and amplitude might be very slightly wrong), but they still clearly show up as separate signals. Tones that are 2 or more bins apart show up nearly perfectly.
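
The loop and the merging rule described above can be sketched with NumPy as follows. Several details are assumptions, since the post does not specify them: the "bigger window" is taken to mean zero-padding by `pad_factor`, the block is left unwindowed, measurements are merged by summing their complex amplitudes and amplitude-weighting their frequencies, and the function name and defaults are made up for illustration.

```python
import numpy as np

def iterative_peaks(block, pad_factor=16, fraction=0.1, iterations=300):
    """Return (frequency, amplitude, phase) tuples, strongest first.

    Frequencies are in cycles per sample. Peaks at DC or Nyquist are not
    handled specially, so the input is assumed to have no strong DC part.
    """
    n = len(block)
    nfft = n * pad_factor
    t = np.arange(n)
    residual = np.fft.rfft(block, nfft)            # step 1: zero-padded FFT
    found = []                                     # entries: [frequency, complex amplitude]
    for _ in range(iterations):
        k = int(np.argmax(np.abs(residual)))       # step 2: highest peak
        freq = k / nfft                            # step 3: frequency, amplitude, phase
        amp = 2.0 * np.abs(residual[k]) / n
        phase = np.angle(residual[k])
        # Steps 4-5: FFT of a single tone, scaled to a small fraction, subtracted.
        tone = fraction * amp * np.cos(2 * np.pi * freq * t + phase)
        residual -= np.fft.rfft(tone, nfft)
        z = fraction * amp * np.exp(1j * phase)
        # Merge with the closest earlier measurement within 0.5 (unpadded) bin.
        near = [m for m in found if abs(m[0] - freq) < 0.5 / n]
        if near:
            m = min(near, key=lambda m: abs(m[0] - freq))
            w_old, w_new = abs(m[1]), abs(z)
            m[0] = (m[0] * w_old + freq * w_new) / (w_old + w_new)
            m[1] += z
        else:
            found.append([freq, z])                # a frequency not seen before
    peaks = [(f, abs(z), float(np.angle(z))) for f, z in found]
    return sorted(peaks, key=lambda p: -p[1])

# Two test tones that fall between FFT bins (10.3 and 40.6 bins of a 256-point block).
n = 256
t = np.arange(n)
block = (1.0 * np.cos(2 * np.pi * (10.3 / n) * t + 0.5)
         + 0.5 * np.cos(2 * np.pi * (40.6 / n) * t - 1.0))
peaks = iterative_peaks(block)
```

The two strongest entries of `peaks` should sit close to the two test tones; weaker entries further down the list are leftovers of the approximation and would normally be thresholded away.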

Test tones: [image]

Real signal (voice): [image]

Signal and its peak data: [image]

This post has been edited by Specy: Aug 17 2013, 11:59
Yaakov Gringeler
post Nov 4 2013, 20:15
Post #67





Group: Members
Posts: 7
Joined: 1-April 13
Member No.: 107483



Several months ago, in posts in this topic, I provided some information about my transform, EST.

I now have a document with better explanations, actual results, and charts.

The link to the document is:
http://www.mediafire.com/?0bprdaoop81d0cx
Please note that viewing the document online will only display the text, and not the charts. It has to be downloaded to be fully viewed.

As a reminder, this topic followed an article that showed that human hearing performance in finding frequencies exceeds the Fourier uncertainty limit.

EST finds frequencies using a deterministic algorithm unrelated to Fourier transforms and not bound by the Fourier uncertainty principle.

This shows that the results of the article are not surprising.