Human hearing beats FFT 
Mar 15 2013, 12:15
Post
#51


ReplayGain developer Group: Developer Posts: 5453 Joined: 5-November 01 From: Yorkshire, UK Member No.: 409
Interesting paper, thank you.



Apr 2 2013, 01:03
Post
#52


Group: Members Posts: 7 Joined: 1-April 13 Member No.: 107483
EST is a new transform that can explain the results of the article. Fourier-related transforms, like FFT, are just one way to find frequencies, and clearly not the best possible. EST derives frequencies from samples and is unrelated to Fourier/FFT. The process of EST is deterministic, does not use nonlinear equations, and can handle noise.

In the ideal case of a noiseless signal composed of n sinusoids, the frequencies, amplitudes and phases are precisely recovered from 3n equally spaced real samples. A noisy signal will require more samples, depending on noise level. Other than the minimum for the ideal case, accuracy does not depend on the number of samples (time). The additional samples for a noisy signal are needed to handle noise.

EST can also transform samples into increasing/decreasing sinusoids, which is a better way to model audio. In such a case, for a noiseless signal, 4 samples are required per increasing/decreasing sinusoid, and more for a noisy signal.

EST can be evaluated using a demo program that implements it. There is also a paper that details the transform and its mathematical basis. Those interested in seeing the paper and/or the demo program can email me at gringya atsign gmail dot com.
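For the simplest case, n = 1, the "3n samples" claim can be checked with a textbook identity rather than with EST itself (the post does not describe the algorithm). A minimal Python sketch, assuming a noiseless x[t] = A*cos(omega*t + phi) sampled at t = 0, 1, 2; the function name and the test values are made up for illustration:

```python
import math

def sinusoid_from_three_samples(x0, x1, x2):
    """Recover (omega, amplitude, phase) of x[t] = A*cos(omega*t + phi).

    Uses the identity x[t-1] + x[t+1] = 2*cos(omega)*x[t] at t = 1.
    Assumes x1 != 0 and 0 < omega < pi (no aliasing, no noise).
    """
    cos_w = (x0 + x2) / (2.0 * x1)
    omega = math.acos(cos_w)
    # x0 = A*cos(phi)
    # x1 = A*cos(phi)*cos(omega) - A*sin(phi)*sin(omega)
    a_cos = x0
    a_sin = (x0 * cos_w - x1) / math.sin(omega)
    return omega, math.hypot(a_cos, a_sin), math.atan2(a_sin, a_cos)

# Hypothetical test tone: A = 0.8, omega = 0.3 rad/sample, phi = 0.5 rad.
samples = [0.8 * math.cos(0.3 * t + 0.5) for t in range(3)]
omega, amp, phase = sinusoid_from_three_samples(*samples)
print(omega, amp, phase)
```

This only shows that three equally spaced samples are enough for one ideal sinusoid. It says nothing about how EST handles several sinusoids or noise.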


Apr 2 2013, 23:47
Post
#53


Group: Members Posts: 1418 Joined: 9-January 05 From: In the kitchen Member No.: 18957
QUOTE
Fourier-related transforms, like FFT, are just one way to find frequencies, and clearly not the best possible.

Which, of course, depends entirely on your definition of "Frequency", something that itself is trickier than some seem to realize.

QUOTE
EST derives frequencies from samples and is unrelated to Fourier/FFT.

What does "EST" stand for, in the first place? Does it use a complex exponential or a representation of a complex exponential?

QUOTE
The process of EST is deterministic, does not use nonlinear equations, and can handle noise.

Which is true of the Fourier Transform, as well.

QUOTE
In the ideal case of a noiseless signal composed of n sinusoids, the frequencies, amplitudes and phases are precisely recovered from 3n equally spaced real samples.

Sounds pretty good. What's the basis set you're using? Sounds a lot like a * sin(b * t + c) where a, b, c are the 3 samples. Not sure what "equally spaced" means here, unless you're referring to the fact you can characterize a sine wave with 3 nondegenerate points.

QUOTE
A noisy signal will require more samples, depending on noise level.

No surprise.

QUOTE
Other than the minimum for the ideal case, accuracy does not depend on the number of samples (time). The additional samples for a noisy signal are needed to handle noise. EST can also transform samples into increasing/decreasing sinusoids, which is a better way to model audio. In such a case, for a noiseless signal, 4 samples are required per increasing/decreasing sinusoid, and more for a noisy signal.

So it's Laplace-based instead of Fourier-based, then?

Instead of bombarding us with a bunch of not-very-specific qualities, why not just tell us what the basis set is, and how the analysis works? I am aware of approximately infinite (well, literally infinite but obviously I haven't generated them all!) numbers of basis sets, many of which this could describe.
J. D. (jj) Johnston 


Apr 2 2013, 23:59
Post
#54


Group: Members Posts: 208 Joined: 31-July 08 Member No.: 56508
Yaakov, also check out the Reassigned spectrogram mode in iZotope RX. It “beats FFT” in terms of time and frequency resolution: it can precisely localize impulsive events in time and precisely display frequencies of harmonics, assuming that they do not overlap in FFT spectrum.



Apr 3 2013, 01:42
Post
#55


Group: Members Posts: 7 Joined: 1-April 13 Member No.: 107483
EST stands for Exponential Sum Transform and it uses complex exponentials.
The basis is sigma(c*b^t), where b and c are nonzero complex numbers and the values of b are distinct. If all b are on the unit circle, then it is simply a spectrum. When all b are on the unit circle and the samples are real, this becomes sigma(a*cos(b*t+c)).
The samples must be equally spaced, not just nondegenerate.
It clearly looks more like Laplace than Fourier, but a specific relation, if one exists, is not known to me.
As for describing the analysis, I offered to send the detailed paper. Do you prefer an informal description?
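To make the sigma(c*b^t) form concrete, here is a small Python sketch that evaluates such a sum and checks that one conjugate pair of b on the unit circle collapses to a real cosine. The function name and the numbers are illustrative assumptions, and this is not the EST analysis, only an evaluation of the model it is said to produce:

```python
import numpy as np

def exponential_sum(c, b, t):
    """Evaluate f(t) = sum_k c[k] * b[k]**t for complex c, b and integer t."""
    c = np.asarray(c, dtype=complex)
    b = np.asarray(b, dtype=complex)
    t = np.asarray(t)
    return (c[:, None] * b[:, None] ** t[None, :]).sum(axis=0)

# Hypothetical values: one conjugate pair on the unit circle.
amp, omega, phi = 0.8, 0.3, 0.5
b = [np.exp(1j * omega), np.exp(-1j * omega)]
c = [0.5 * amp * np.exp(1j * phi), 0.5 * amp * np.exp(-1j * phi)]
t = np.arange(8)

f = exponential_sum(c, b, t)
cosine = amp * np.cos(omega * t + phi)
print(np.max(np.abs(f - cosine)))  # difference is at rounding-error level
```

Moving a pair of b off the unit circle (|b| < 1 or |b| > 1) gives the decreasing or increasing sinusoids mentioned earlier in the thread.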


Apr 3 2013, 05:27
Post
#56


Group: Super Moderator Posts: 3396 Joined: 26-July 02 From: To: Member No.: 2796
I think a lot of us here would be interested in a formal description, myself included. I think from what you've just said that we'll get it puzzled out though.
 1. Attack the argument, not the arguer.
2. Assume good faith. 


Apr 3 2013, 18:14
Post
#57


Group: Members Posts: 7 Joined: 1-April 13 Member No.: 107483



Apr 3 2013, 18:31
Post
#58


Group: Super Moderator Posts: 5275 Joined: 23-June 06 Member No.: 32180
If I may guess, I think he means that this site has a significant number of users who would appreciate detailed descriptions. However, that is not to stop you from providing less technical information (i.e. ‘layman’s terms’) if you want to; there are probably other users who would like that, too.



Apr 3 2013, 20:34
Post
#59


Group: Members Posts: 2100 Joined: 30-November 06 Member No.: 38207
I think I could very well use a formula or two ... point seven eighteen twenty-eight ...

QUOTE
As for describing the analysis, I offered to send the detailed paper. Do you prefer an informal description?

I think I just got one that was a bit too rough, although I do suspect I have guessed the point.

This post has been edited by Porcus: Apr 3 2013, 20:37


Apr 3 2013, 22:10
Post
#60


Group: Members Posts: 7 Joined: 1-April 13 Member No.: 107483
The following link:
http://www.mediafire.com/view/?ce47jurz43wzjce is to a short document that describes the EST process for real noiseless samples. 


Apr 11 2013, 11:09
Post
#61


Group: Members Posts: 1418 Joined: 9-January 05 From: In the kitchen Member No.: 18957
Hm. Define "noiseless". Most instruments have a chaotic part of their performance that in fact is noiselike in that it does not repeat, is not entirely stationary, depends on technique, and so on.
So, I'm not quite sure I know what you mean by noiseless.  
J. D. (jj) Johnston 


Apr 11 2013, 19:33
Post
#62


Group: Members Posts: 7 Joined: 1-April 13 Member No.: 107483
The paper described the mathematical basis of EST, which uses the ideal case of perfect increasing/decreasing sinusoids.
For realistic data, EST uses different processes that expect noise. For audio, the EST process is as follows.
1. Find linear prediction coefficients, preferably using the covariance method and not the autocorrelation method.
2. Create the linear prediction polynomial.
3. Find the roots of the linear prediction polynomial to establish the basis set of an exponential sum function, as described in the paper.
4. Use the samples and the basis set to find the coefficients of the function.

The key point is that linear prediction coefficients and an exponential sum function are equivalent, with the exponential sum function having the distinct advantage of being an analytic function with a useful structure. The mathematical basis proves this equivalence. Due to the equivalence, an exponential sum function models an audio signal with the same quality as linear prediction.

You may note that the best lossless audio compressors, like OptimFROG, use linear prediction. This is a strong indication of the power of linear prediction to model audio. Since EST generates an analytic function, it is suitable for lossy audio compression, as well as other audio applications.

Once EST has generated an exponential sum function, you can do the following:
- Identify noise elements, using frequency and/or amplitude, and remove them.
- Identify inaudible elements, and remove them.
- Quantize the coefficients.
- Resample the audio signal, both sample rate and sample depth.
- And various other things.

Unlike Fourier related methods, which use a predefined basis, EST uses a basis derived from the data. In short, EST for audio combines the flexibility and usefulness of an analytic function with the modeling power of linear prediction.
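The four steps listed in this post can be sketched in a few lines of NumPy. This is only a reading of the steps as written, not the author's implementation: the function name, the model order, the block length and the test signal are all assumptions, and the least-squares fit over the block stands in for "the covariance method". The same sequence (linear prediction, polynomial roots, least-squares coefficients) is usually known as Prony's method.

```python
import numpy as np

def exponential_sum_fit(x, order):
    """Fit x[t] ~ sum_k c[k] * b[k]**t following the four listed steps."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Step 1: linear prediction by least squares over the block, i.e.
    # x[t] ~ a[0]*x[t-1] + ... + a[order-1]*x[t-order].
    rows = np.array([x[t - order:t][::-1] for t in range(order, n)])
    a, *_ = np.linalg.lstsq(rows, x[order:], rcond=None)
    # Step 2: prediction polynomial z^p - a[0]*z^(p-1) - ... - a[p-1].
    poly = np.concatenate(([1.0], -a))
    # Step 3: its roots are the bases b[k] of the exponential sum.
    b = np.roots(poly).astype(complex)
    # Step 4: least-squares solve for the coefficients c[k].
    vander = b[None, :] ** np.arange(n)[:, None]
    c, *_ = np.linalg.lstsq(vander, x.astype(complex), rcond=None)
    return b, c

# Hypothetical noiseless test block: two real sinusoids, so order = 4.
t = np.arange(32)
x = 1.0 * np.cos(0.3 * t + 0.5) + 0.5 * np.cos(1.1 * t - 0.2)
b, c = exponential_sum_fit(x, order=4)

freqs = np.sort(np.angle(b)[np.angle(b) > 0])   # rad/sample
model = ((b[None, :] ** t[:, None]) @ c).real   # resynthesized block
print(freqs, np.max(np.abs(model - x)))
```

On noisy audio the order would have to be larger than the number of real components, and the extra roots would then have to be sorted into signal and noise, which is the part the post leaves open.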


Apr 11 2013, 20:36
Post
#63


Group: Members Posts: 1418 Joined: 9-January 05 From: In the kitchen Member No.: 18957
QUOTE
Unlike Fourier related methods, which use a predefined basis, EST uses a basis derived from the data. In short, EST for audio combines the flexibility and usefulness of an analytic function with the modeling power of linear prediction.

Try applying EST to the first 30 seconds of the track "We Shall Be Happy" by Ry Cooder off the album titled "Jazz". Let me know how big your covariance matrix is, too, ok?
J. D. (jj) Johnston 


Apr 11 2013, 21:32
Post
#64


Group: Members Posts: 7 Joined: 1-April 13 Member No.: 107483
QUOTE
Unlike Fourier related methods, which use a predefined basis, EST uses a basis derived from the data. In short, EST for audio combines the flexibility and usefulness of an analytic function with the modeling power of linear prediction.

QUOTE
Try applying EST to the first 30 seconds of the track "We Shall Be Happy" by Ry Cooder off the album titled "Jazz". Let me know how big your covariance matrix is, too, ok?

In a practical implementation the samples will be broken into blocks and there will be a chosen matrix size for that block size. The size of the matrix and the block size will determine accuracy and an accuracy-speed tradeoff. This is also the way it is done when using linear prediction for lossless audio compression or for speech compression. The difference is that EST returns an analytic function. 30 seconds of audio will therefore be broken into many smaller blocks, and not treated as a single block.


Jun 4 2013, 01:51
Post
#65


Group: Members Posts: 1418 Joined: 9-January 05 From: In the kitchen Member No.: 18957
QUOTE
Unlike Fourier related methods, which use a predefined basis, EST uses a basis derived from the data. In short, EST for audio combines the flexibility and usefulness of an analytic function with the modeling power of linear prediction.

QUOTE
Try applying EST to the first 30 seconds of the track "We Shall Be Happy" by Ry Cooder off the album titled "Jazz". Let me know how big your covariance matrix is, too, ok?

QUOTE
In a practical implementation the samples will be broken into blocks and there will be a chosen matrix size for that block size. The size of the matrix and the block size will determine accuracy and an accuracy-speed tradeoff. This is also the way it is done when using linear prediction for lossless audio compression or for speech compression. The difference is that EST returns an analytic function. 30 seconds of audio will therefore be broken into many smaller blocks, and not treated as a single block.

I do know how coders work, so try your EST basis on We Shall Be Happy and get back to me, ok? And tell me how many basis functions you need for that one, too. And how many are orthogonal. And then how many of those you have to code.
J. D. (jj) Johnston 


Aug 17 2013, 11:52
Post
#66


Group: Members Posts: 22 Joined: 27-November 08 Member No.: 63320
Over 10 years ago, for my master's thesis, I wrote an algorithm that determines nearly exact frequency values from an FFT. It can find any frequencies as long as they are far enough away from each other and constant in tone and level.
The method is pretty simple:
1. Create an FFT using a window that's a lot bigger than the block of audio that you use.
2. Find the highest peak in the FFT domain. This is an estimate of the loudest frequency present.
3. Write down the found frequency, phase and amplitude.
4. Generate an FFT based on the found freq, phase, amp (this can be optimized for speed, since it's only a single tone).
5. Subtract a small percentage of this (I found that 5-10% works well) from the original FFT from step 1.
6. Go back to step 2.

This gives you a whole lot of values. Next you need to combine all the values that have approximately the same frequency. This can be done as follows:
- If a frequency is new (no data within 0.5 FFT bin size), this is a new frequency that we haven't seen before.
- Otherwise combine this new measurement with the measurement closest to it.

Tones that are 1 bin apart will not be found perfectly (frequency and amplitude might be very slightly wrong), but they still clearly show up as separate signals. Tones that are 2 or more bins apart show up nearly perfectly.

Test tones; real signal (voice); signal and its peak data: [images not preserved]

This post has been edited by Specy: Apr 17 2013, 11:59
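A rough NumPy sketch of the steps in this post, written from the description only and not from the thesis code. Several things are assumptions: "a window that's a lot bigger than the block" is read as zero-padding, the amplitude scaling assumes a rectangular window, "FFT bin size" is taken to be the bin width of the unpadded block, and measurements are combined by adding phasors. All names and parameter values are hypothetical.

```python
import numpy as np

def iterative_peaks(block, pad_factor=8, fraction=0.1, iterations=200):
    """Steps 1-6: repeatedly subtract a fraction of the strongest tone."""
    n = len(block)
    size = n * pad_factor                       # step 1: FFT much longer than the block
    t = np.arange(n)
    residual = np.fft.rfft(block, size)         # rfft zero-pads the block to `size`
    found = []
    for _ in range(iterations):                 # step 6 is this loop
        k = int(np.argmax(np.abs(residual)))    # step 2: highest peak
        omega = 2.0 * np.pi * k / size          # step 3: frequency in rad/sample,
        amp = 2.0 * np.abs(residual[k]) / n     #   amplitude (rectangular window),
        phase = np.angle(residual[k])           #   and phase at t = 0
        tone = amp * np.cos(omega * t + phase)  # step 4: the single tone
        residual = residual - fraction * np.fft.rfft(tone, size)  # step 5
        found.append((omega, fraction * amp, phase))
    return found

def combine_measurements(found, bin_width):
    """Merge measurements that lie within half a bin of an earlier one."""
    clusters = []                               # [frequency, complex phasor] pairs
    for omega, amp, phase in found:
        phasor = amp * np.exp(1j * phase)
        near = [c for c in clusters if abs(c[0] - omega) <= 0.5 * bin_width]
        if not near:
            clusters.append([omega, phasor])    # a frequency not seen before
        else:
            closest = min(near, key=lambda c: abs(c[0] - omega))
            closest[1] += phasor                # assumption: add the phasors
    peaks = [(f, abs(p), float(np.angle(p))) for f, p in clusters]
    return sorted(peaks, key=lambda r: -r[1])   # strongest first

# Hypothetical test block: two steady tones about 20 bins apart.
n = 256
t = np.arange(n)
w1 = 2.0 * np.pi * 82 / (8 * n)
w2 = 2.0 * np.pi * 244 / (8 * n)
block = 1.0 * np.cos(w1 * t + 0.4) + 0.5 * np.cos(w2 * t - 1.0)
peaks = combine_measurements(iterative_peaks(block), bin_width=2.0 * np.pi / n)
print(peaks[:2])
```

The two test tones sit exactly on the zero-padded frequency grid, so this sketch finds them almost exactly. For tones between grid points it can only locate them to within the padded bin spacing unless the combining step also interpolates the frequency, which the post does not spell out.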


Nov 4 2013, 20:15
Post
#67


Group: Members Posts: 7 Joined: 1-April 13 Member No.: 107483
Several months ago, in posts in this topic, I provided some information about my transform, EST.
I now have a document with better explanations, actual results, and charts. The link to the document is: http://www.mediafire.com/?0bprdaoop81d0cx Please note that viewing the document online will only display the text, and not the charts. It has to be downloaded to be fully viewed. As a reminder, this topic followed an article that showed that human hearing performance in finding frequencies exceeds the Fourier uncertainty limit. EST finds frequencies using a deterministic algorithm unrelated to Fourier transforms and not bound by the Fourier uncertainty principle. This shows that the results of the article are not surprising. 

