pspsy source code release, Release of the pspsy, the phase space codec.
mattc
post Sep 25 2007, 09:43
Post #1





Group: Members
Posts: 28
Joined: 4-March 05
Member No.: 20358



Hi all,

I have decided to abandon my dreams of great wealth from the phase space lossy perceptual codec I described in the thread

http://www.hydrogenaudio.org/forums/index....showtopic=56069 ,

and have submitted my C++ source code as a project at Sourceforge. The project page is

http://sourceforge.net/projects/pspsy/ .

I intend this release to allow interested parties to explore the phase space framework for psychoacoustical modeling and coding. (If anyone knows of a university class studying perceptual coding, I'd certainly appreciate your letting them know about this project; this would be a good hands-on intro to the subject.) It does contain a working codec, but because I don't think it yet has its best psymodel, the code allows for a broad range of parameters and relatively straightforward incorporation of new psymodels and lossless compression schemes. Because of this, the code can be easily crashed. Future versions will contain fewer dangerous parameters and perhaps a competitive codec.

I developed this project on a Linux system, using fftw for FFTs. It may or may not compile on a Windows system; even if it does, some system calls will fail, because it uses named pipes for its play capability.

Let me know if you need any help getting it up and running, or if you have successfully built it on your own. The source code tarball contains my contact information, and I'll be checking here frequently.

Matthew Cargo
muaddib
post Sep 25 2007, 12:33
Post #2





Group: Developer
Posts: 398
Joined: 14-October 01
Member No.: 289



You can still have some patents on this and become rich in the future ;)
kwanbis
post Sep 25 2007, 14:49
Post #3





Group: Developer (Donating)
Posts: 2362
Joined: 28-June 02
From: Argentina
Member No.: 2425



QUOTE (muaddib @ Sep 25 2007, 11:33) *
You can still have some patents on this and become rich in the future ;)

Hate patents.

"If people had understood how patents would be granted when most of today’s ideas were invented and had taken out patents, the industry would be at a complete stand-still today." Billy Gates.


--------------------
MAREO: http://www.webearce.com.ar
SebastianG
post Sep 25 2007, 15:55
Post #4





Group: Developer
Posts: 1317
Joined: 20-March 04
From: Göttingen (DE)
Member No.: 12875



The paper didn't catch my attention at the time you mentioned it the first time here. And to be honest, there's a lot in there that looks Chinese to me, too.

Some remarks:

In the introduction you talk about transform coding in general: v' = K Q L v with L=K^-1 a pair of linear mappings, v=original signal, v'=altered signal and a nonlinear operator Q, the quantizer which you modeled by Qs = s + a_q X where the vector X has a constant per-component variance.
However, this does not apply to most transform-based lossy codecs, at least not at first glance. The mapping K/L is usually orthogonal (MDCT), which would render the transform step almost completely unnecessary: Q introduces uncorrelated noise with a constant per-component variance in the transform domain, and that noise is still uncorrelated white noise after the inverse transform. This, of course, is not what we usually want. In the case of orthogonal mappings (including the identity mapping), noise shaping can only be done by introducing correlated noise in the transform domain (like TNS) and/or noise with a specific per-component variance (variable quantizer step size via scalefactors).
Of course, we could move the weighting step into the transform K/L itself and make the whole transform time-varying and non-orthogonal. Maybe you had exactly this in mind. The thing is however that most of the (orthogonal) transforms that are in use (DCT, MDCT) can be calculated very efficiently and that's clearly a big advantage (fixed transform that's fast to calculate).
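
A small numerical check of the point about orthogonal mappings, as a sketch only. It assumes an orthonormal DCT-IV as a stand-in for the orthogonal transform (the names dct4_matrix and output_covariance are made up for this illustration). Noise with covariance diag(variances) in the transform domain has covariance K diag(variances) K^T after the inverse transform K.

```python
import math

def dct4_matrix(N):
    """Orthonormal DCT-IV matrix; it is symmetric and its own inverse."""
    s = math.sqrt(2.0 / N)
    return [[s * math.cos(math.pi / N * (n + 0.5) * (k + 0.5)) for n in range(N)]
            for k in range(N)]

def output_covariance(K, variances):
    """Return K diag(variances) K^T, the noise covariance after applying K."""
    N = len(K)
    return [[sum(K[i][k] * variances[k] * K[j][k] for k in range(N))
             for j in range(N)] for i in range(N)]

K = dct4_matrix(8)
# Constant per-component variance: the result is again a multiple of the identity.
white = output_covariance(K, [0.25] * 8)
# Per-component variances (scalefactor-like): the result has off-diagonal terms,
# i.e. the noise is correlated (shaped) in the time domain.
shaped = output_covariance(K, [1.0] + [0.01] * 7)
```

With equal variances the time-domain noise stays white, so the transform alone shapes nothing; only the unequal variances produce a non-diagonal covariance.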

You mentioned 'time varying filters' here, which fits perfectly into the "v'=KQLv" world, where L and K represent the time-varying "analysis" and "synthesis" filters, respectively. This kind of codec is nothing new. Developers of speech codecs should be familiar with the idea. Minimum-phase filter estimation via LPC analysis, filter interpolation via LAR or LSF representation, and the like are well known. If what you describe in your paper shares the same idea (time-varying filters), then at least your implementation might be new or superior.

I'm not really sure what role the DFT plays in your algorithm. Maybe you just use it to derive the 'current mapping' (filters) -- maybe you use it to code the residual as well, as if it were an "inner" part of K and L. In the latter case it would remind me of an approach by Bernd Edler, Christof Faller, Gerald Schuller: Perceptual Audio Coding Using a Time-Varying Linear Pre- and Post-Filter (AES109 contribution). You might find this paper an interesting read. I certainly did. It combines really neat ideas. They alter the frequency response of pre/post filters (including overall gain) to minimize pre-echoes and to exploit other psychoacoustic effects like masking. The filtered signal is then transform-coded to be able to compactly represent the usually still correlated "residual" (catching highly tonal parts is more efficient that way). Filter design is eased via 'frequency warping' in a way that reflects the varying widths of critical bands....


Cheers!
Sebi
mattc
post Sep 25 2007, 20:28
Post #5





Group: Members
Posts: 28
Joined: 4-March 05
Member No.: 20358



muaddib,
Really? I was under the impression that I was waiving my patent rights by releasing the code as open source. If not, I would have done so much earlier.

Sebi,
I had been hoping you'd weigh in on this. I just looked at the Edler, et al, paper, and I can confirm that ours fall within the same class of codecs and that, yes, the DFT is an "inner" transformation in my implementation. Both schemes decouple the noise shaping and lossless stages of the codec, or as they put it in the paper, the irrelevancy and the redundancy stages.

Perhaps I'm just struggling with a language difference, but I don't see how they can practically or rigorously invert time varying filters without using/knowing about the Moyal symbol calculus. (It would help if there were formulas in their paper; I don't speak flow diagram.) Maybe the critical difference is that my time varying filters are smoothly varying in time, but theirs vary discontinuously? More generally, I don't think there are any other coding schemes where the time-varying filter varies smoothly in time.

I think all lossy codecs fall within the v'=QL; v''=Kv' formulation. For example, in the simplest version, you form L=SF, where F is a matrix containing a series of NxN DFT submatrices and S is a matrix of scalefactors, i.e., a series of N-element diagonal matrices. Such L is difficult to interpret in phase space, because it varies discontinuously from block to block. As far as I can tell, the more sophisticated codecs change S to a more complicated object, but keep the F in place, or change it to a series of MDCTs. Such transformations become increasingly difficult to interpret mathematically (i.e., in phase space), because they leave in place the jagged F operator.
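
To make the simplest L=SF case concrete, here is a toy sketch, not code from pspsy. It assumes non-overlapping N-point DFT blocks, one fixed scalefactor per bin, and a quantizer Q that rounds real and imaginary parts to integers; encode, decode, and scale are hypothetical names, and the signal length is assumed to be a multiple of N.

```python
import cmath

def dft(block):
    N = len(block)
    return [sum(block[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(spec):
    N = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)).real / N
            for n in range(N)]

def encode(v, scale):
    """v' = Q S F v: blockwise DFT (F), scalefactors (S), rounding (Q)."""
    N = len(scale)
    frames = []
    for start in range(0, len(v), N):
        spec = dft(v[start:start + N])
        frames.append([complex(round((s * c).real), round((s * c).imag))
                       for s, c in zip(scale, spec)])
    return frames

def decode(frames, scale):
    """v'' = K v' with K = F^-1 S^-1, applied block by block."""
    out = []
    for q in frames:
        out.extend(idft([c / s for c, s in zip(q, scale)]))
    return out
```

Each bin's rounding error is at most 0.5 per component before S^-1 is applied, so a larger scalefactor in a bin means less noise in that bin. Both S and F restart at every block boundary, which is the block-to-block discontinuity referred to above.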

Off topic, but I've been working on a set of notes on the MDCT, because I was trying to understand what is special about it. (Chanting "TDAC" didn't convince me.) Specifically, I wanted to understand the MDCT as an infinity by infinity mapping, rather than a series of isolated mappings from R^{2N} to R^N. The notes are included in the pspsy release, but if there is interest, I could post them here for comment.

This post has been edited by mattc: Sep 25 2007, 20:39
Leto Atreides II
post Sep 25 2007, 21:16
Post #6





Group: Members
Posts: 163
Joined: 13-January 02
From: Eugene, OR
Member No.: 1009



Just because you release the source of something does not mean it is not patented. As an example, think of reference MPEG code. Anyone can get it, but the algorithms are still patented. Or see LAME -- it is open source, but MP3 is patented. You are perhaps confusing copyright and patent law. They are not the same at all.
mattc
post Sep 25 2007, 22:01
Post #7





Group: Members
Posts: 28
Joined: 4-March 05
Member No.: 20358



I should have been more precise. As I understand it, a patent is a "hunting license", so that, if I followed through on the provisional patent, I could use it to sue anyone profiting from it. So, for example, if a telco started using the idea, I could troll them. What I meant was that, by releasing the code this way, I am waiving any direct profit from the codec software.

This is all moot anyway, because I don't think there's profit anymore in lossy codecs.
SebastianG
post Sep 25 2007, 22:32
Post #8





Group: Developer
Posts: 1317
Joined: 20-March 04
From: Göttingen (DE)
Member No.: 12875



Hi, matt,

QUOTE (mattc @ Sep 25 2007, 21:28) *
Perhaps I'm just struggling with a language difference, but I don't see how they can practically or rigorously invert time varying filters without using/knowing about the Moyal symbol calculus. (It would help if there were formulas in their paper; I don't speak flow diagram.) Maybe the critical difference is that my time varying filters are smoothly varying in time, but theirs vary discontinuously? More generally, I don't think there are any other coding schemes where the time-varying filter varies smoothly in time.

Sure there are. Most common: Speech codecs (like Speex). Although, the filter in use serves a slightly different purpose compared to the Edler et al approach. In both cases you deal with minimum-phase filters that are usually easily inverted. Also, this can be combined with smooth filter interpolation. As long as you apply the same filter interpolation in both tools, encoder and decoder, you'll be fine and there'll be no loss except for the usual round-off errors during limited precision arithmetic. You might wanna read up on the whole linear prediction stuff.

QUOTE (mattc @ Sep 25 2007, 21:28) *
I think all lossy codecs fall within the v'=QL; v''=Kv' formulation.

You're right. I just didn't think about including the scalefactor thing into the transform at first. It depends on how you look at it. Is the scaling part of the transform or part of the quantizer? I always associated the quantizer with it. ;-)

QUOTE (mattc @ Sep 25 2007, 21:28) *
Off topic, but I've been working on a set of notes on the MDCT, because I was trying to understand what is special about it. (Chanting "TDAC" didn't convince me.) Specifically, I wanted to understand the MDCT as an infinity by infinity mapping, rather than a series of isolated mappings from R^{2N} to R^N. The notes are included in the pspsy release, but if there is interest, I could post them here for comment.

What helped me to understand the MDCT (applied on the whole signal in smaller blocks) and what's actually happening there is to separate the whole transform into two isolated steps -- namely some pre/post rotation butterflies and non-overlapping DCTs of type 4. The "butterflies" are responsible for the overlapping part. The post rotation butterflies change the DCT4's basis functions (discontinuities at block edges) to smooth block-overlapping functions in the decoder. On the other hand, the pre/post rotations are responsible for the so-called time-domain aliasing which limits the benefits of TNS as you already mentioned. Unfortunately, I can't recommend any books on that subject. Mr. Woodinville would suggest Malvar's book on lapped transforms, I guess. (I haven't read it)
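
The two-step view can be checked numerically on a single block. The sketch below leaves out the window, so the butterflies reduce to plain sums and differences (a fold). It assumes the common convention X_k = sum_n x_n cos(pi/N (n + 1/2 + N/2)(k + 1/2)) for a block of 2N samples with N even; the function names are made up for this illustration.

```python
import math

def mdct_direct(x):
    """Direct MDCT of one block of 2N samples, giving N coefficients."""
    N = len(x) // 2
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N)) for k in range(N)]

def dct4(u):
    """Unnormalized DCT of type 4 on N samples."""
    N = len(u)
    return [sum(u[n] * math.cos(math.pi / N * (n + 0.5) * (k + 0.5))
                for n in range(N)) for k in range(N)]

def fold(x):
    """Fold 2N samples (a, b, c, d) into N samples (-c_r - d, a - b_r).

    a, b, c, d are the four quarters of the block; _r means reversed.
    """
    N = len(x) // 2
    h = N // 2
    a, b, c, d = x[:h], x[h:N], x[N:N + h], x[N + h:]
    first = [-cr - dd for cr, dd in zip(reversed(c), d)]
    second = [aa - br for aa, br in zip(a, reversed(b))]
    return first + second

block = [0.3, -1.2, 2.5, 0.7, -0.4, 1.1, -2.2, 0.9]
direct = mdct_direct(block)
factored = dct4(fold(block))  # agrees with direct up to round-off
```

The fold is where the overlap and the time-domain aliasing live; the DCT-IV that follows is an ordinary non-overlapping block transform.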

Cheers!
SG

This post has been edited by SebastianG: Sep 25 2007, 23:17
mattc
post Sep 26 2007, 02:27
Post #9





Group: Members
Posts: 28
Joined: 4-March 05
Member No.: 20358



Sebi,

I think my attempt to speak DSP is getting in the way of my precision. I should have said that I don't know of any codecs that use pseudodifferential (slowly-varying) operators. These operators have matrices of limited extent off their diagonal, as well as some other properties. LPC filters, even when interpolated to be smooth in the time domain, are not slowly varying, because they are also causal. Either the LPC or its inverse has a long (backward) tail, and the long one is performed by updating a series of values. (This is how instability arises.)

The upshot is the LPC filters have a different phase space interpretation than the slowly varying operators, and design considerations differ.

----

Regarding the second, whether S is part of the quantizer or the transformation, this is the old active/passive distinction that always leads to confusion. I was long ago recruited into the active camp.

-----
I was trying to understand how it is valid to use the MDCT transform for analysis. I was taught to interpret bases in phase space, and the overlapping here tangles up the interpretation. Also I was worried about the kernel of the individual MDCT's.
mattc
post Sep 26 2007, 07:58
Post #10





Group: Members
Posts: 28
Joined: 4-March 05
Member No.: 20358



Sebi,
One more thing, I've read the documentation on how Speex interpolates LPCs, but because there are no formulas, I don't think I can write down exactly what they are doing. For example, you said, "As long as you apply the same filter interpolation in both tools, encoder and decoder" you are fine, but this can't be literally true, since using the same filter again wouldn't invert the first. So it probably means interpolating the inverse filters, but there are various ways one could do this. If you point me to a formula, or write one down, I could talk more concretely about the differences between operator inversion in these two methods.
SebastianG
post Sep 26 2007, 09:06
Post #11





Group: Developer
Posts: 1317
Joined: 20-March 04
From: Göttingen (DE)
Member No.: 12875



QUOTE (mattc @ Sep 26 2007, 08:58) *
If you point me to a formula, or write one down, I could talk more concretely about the differences between operator inversion in these two methods.

In Speex the analysis filter is a minimum-phase FIR filter. The finite impulse response is given by the vector [1 a b c d e ...] where the variables a,b,c,... are derived and interpolated via the usual LPC related algorithms. It's possible to change'em on a per-sample basis. That's why I'm going to use the index 'n' for those coefficients as well:

Analysis (encoder):
y_n = x_n + a_n x_{n-1} + b_n x_{n-2} + c_n x_{n-3} + d_n x_{n-4} + ...

Synthesis (decoder):
Solve the above for x_n

where x_n refers to the n-th input sample and y_n to the n-th output sample. The index n could be a variable of a forward for-loop.

Some care has to be taken to keep the inverse filter stable: The roots of the polynomial 1+a_n*z^-1+b_n*z^-2+c_n*z^-3 (ie the analysis filter's 'zeros' = the synthesis filter's 'poles') have to be completely inside the unit circle. There are some tricks to ensure that including using the line spectral frequency representation of the analysis filter for interpolation. In the Edler et al paper they use 'frequency-warped' filters which results in slightly more complicated IIR filters for both, the analysis and synthesis stage.
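
A minimal sketch of this analysis/synthesis pair with coefficients that change on every sample. It is not Speex code: the coefficients are interpolated linearly between two arbitrary stable second-order filters just to keep the example short (not in the LSF domain), and analysis, synthesis, and interpolate are hypothetical names.

```python
def interpolate(start, end, length):
    """Per-sample coefficient sets, moving linearly from start to end."""
    out = []
    for n in range(length):
        t = n / (length - 1) if length > 1 else 0.0
        out.append([(1 - t) * s + t * e for s, e in zip(start, end)])
    return out

def analysis(x, coeffs):
    """Encoder: y[n] = x[n] + sum_k coeffs[n][k] * x[n-1-k] (time-varying FIR)."""
    y = []
    for n in range(len(x)):
        acc = x[n]
        for k, c in enumerate(coeffs[n]):
            if n - 1 - k >= 0:
                acc += c * x[n - 1 - k]
        y.append(acc)
    return y

def synthesis(y, coeffs):
    """Decoder: solve the analysis equation for x[n] using already decoded samples."""
    x = []
    for n in range(len(y)):
        acc = y[n]
        for k, c in enumerate(coeffs[n]):
            if n - 1 - k >= 0:
                acc -= c * x[n - 1 - k]
        x.append(acc)
    return x

signal = [float((7 * n) % 11 - 5) for n in range(64)]
coeffs = interpolate([-0.9, 0.2], [-0.5, 0.1], len(signal))
residual = analysis(signal, coeffs)
decoded = synthesis(residual, coeffs)  # matches signal up to round-off
```

The decoder uses the same coefficient sequence as the encoder and runs the recursion forward; nothing has to be inverted explicitly.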

Cheers!
SG

This post has been edited by SebastianG: Sep 26 2007, 09:15
mattc
post Sep 26 2007, 11:28
Post #12





Group: Members
Posts: 28
Joined: 4-March 05
Member No.: 20358



Sebi,
Alright, good. You are saying

Y = MX,

where M is a lower triangular matrix with ones along the diagonal and with at most N non-zero elements to the left of each one. That it is lower triangular means that the filter is causal, and as you say, this property is crucial for doing a running inversion as

X = Y - (M - I)X.

If you were to recurse this equation, you'd end up with a matrix M^{-1} that is causal but has an infinite extent (I think), and stability becomes an issue.
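
A small sketch of that running inversion in matrix form, under the simplifying assumption of a single time-invariant tap a (so M has ones on the diagonal and a on the first subdiagonal). build_M, solve_lower_unit, and inverse_column are illustrative names.

```python
def build_M(n, a):
    """Unit lower-triangular matrix of the one-tap filter y[i] = x[i] + a * x[i-1]."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        M[i][i] = 1.0
        if i > 0:
            M[i][i - 1] = a
    return M

def solve_lower_unit(M, y):
    """Running inversion X = Y - (M - I) X, evaluated row by row."""
    x = []
    for i in range(len(y)):
        x.append(y[i] - sum(M[i][j] * x[j] for j in range(i)))
    return x

def inverse_column(M, j):
    """Column j of M^-1, obtained by feeding a unit impulse through the recursion."""
    e = [1.0 if i == j else 0.0 for i in range(len(M))]
    return solve_lower_unit(M, e)

decaying = inverse_column(build_M(6, 0.5), 0)  # entries (-0.5)**k
growing = inverse_column(build_M(6, 2.0), 0)   # entries (-2.0)**k
```

The column of M^-1 never terminates, whatever the size of M: it decays geometrically when the zero of the analysis filter is inside the unit circle (a = 0.5) and grows without bound when it is outside (a = 2.0).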

I remember proving (modulo a few subtleties), using contour integration, those theorems about where the poles need to be for stability, but they don't apply when the rows of M aren't identical, i.e., for a time-varying LPC. I'm not sure how people generalize the transfer function formalism when working with time-varying filters, but the Weyl correspondence might prove useful. Intuitively, I would guess that you would have a problem with stable inversion when the Weyl symbol has zeros, because the first term in the expansion for the symbol of the inverse operator is just one over the symbol, and 1/0=trouble.

In contrast, the matrix M in my method is not lower triangular, but banded and symmetric about the diagonal. The above trick for inversion doesn't work. You can't invert by brute force because the matrix is too big. The only way (that I know) to do it is to use the formula for the symbol of a function of an operator.

The advantage of using such a matrix is that it has a very clear interpretation in phase space. In fact, in the codec, the K operator is defined in phase space by the desired shaped noise profile. Intuitively, applying this operator to a function will multiply the function's own time-frequency representation by the shaped noise profile. In the encoding stage, the input sample's time frequency representation is divided by a smoothed version of itself. Thus, the codec generalizes the (localized) frequency-space division by scale factors to a division in phase space.

Cheers indeed, in this wine-influenced reply,
Matt
SebastianG
post Sep 27 2007, 13:07
Post #13





Group: Developer
Posts: 1317
Joined: 20-March 04
From: Göttingen (DE)
Member No.: 12875



QUOTE (mattc @ Sep 26 2007, 12:28) *
If you were to recurse this equation, you'd end up with a matrix M^{-1} that is causal but has an infinite extent (I think), and stability becomes an issue.

Infinite extent, yes.

QUOTE (mattc @ Sep 26 2007, 12:28) *
I remember proving (modulo a few subtleties) using contour integrations those theorems about where the poles need to be for stability, but they don't apply to when the rows of M aren't identical, i.e., a time varying LPC. I'm not sure how people generalize the transfer function formalism when working with time varying filters, ...

I guess this is usually ignored and/or handled in the same way one avoids instability due to limited-precision arithmetic (i.e. proper windowing of the autocorrelation coefficients so that poles are not too close to the unit circle).

QUOTE (mattc @ Sep 26 2007, 12:28) *
but the Weyl correspondence might prove useful. Intuitively, I would guess that you would have a problem with stable inversion when the Weyl symbol has zeros, because the first term in the expansion for the symbol of the inverse operator is just one over the symbol, and 1/0=trouble.

I'd have to invest more time on this subject to be able to follow you here.

QUOTE (mattc @ Sep 26 2007, 12:28) *
In contrast, the matrix M in my method is not lower triangular, but banded and symmetric about the diagonal. The above trick for inversion doesn't work.

...which looks like a big disadvantage to me. What can you tell us about the complexity of encoding/decoding? Has your inverse matrix a finite extent (=only a finite number of non-zero coefficients per row near the diagonal)? What's your typical setting for the number of non-zero subdiagonals (how many bands) and how does this number affect time complexity?

QUOTE (mattc @ Sep 26 2007, 12:28) *
The advantage of using such a matrix is that it has a very clear interpretation in phase space. .... Thus, the codec generalizes the (localized) frequency-space division by scale factors to a division in phase space.

I'm not sure whether this is notably superior to what Edler et al described. Isn't the only difference that you use "time-varying linear-phase filters" and have to invert the process somehow differently because the matrix is not lower triangular?

Cheers!
SG

This post has been edited by SebastianG: Sep 27 2007, 15:16
mattc
post Sep 27 2007, 20:47
Post #14





Group: Members
Posts: 28
Joined: 4-March 05
Member No.: 20358



QUOTE (SebastianG @ Sep 27 2007, 05:07) *
...which looks like a big disadvantage to me. What can you tell us about the complexity of encoding/decoding? Has your inverse matrix a finite extent (=only a finite number of non-zero coefficients per row near the diagonal)? What's your typical setting for the number of non-zero subdiagonals (how many bands) and how does this number affect time complexity?

Both the key and the lock are, practically speaking, banded. Strictly, they aren't, but the coefficients fall to negligible values soon enough; for the key, I keep about 128 and, for the lock, about 200. So yes, this is more costly than the usual lossy coding, because the O(log N) per-sample scaling there happens to be smaller than the O(1) here, due to the large constant factor. Importantly, however, the decoding does run faster than real time, even in my clunky code.
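
To illustrate the O(1)-per-sample cost, here is a sketch of applying a truncated (banded) time-varying operator. The storage layout, one short list of coefficients per output sample centred on the diagonal, is an assumption made for the example and not necessarily how pspsy stores the key or the lock; apply_banded is a hypothetical name.

```python
def apply_banded(rows, x):
    """Apply a banded operator: y[i] = sum_j rows[i][j] * x[i - h + j].

    rows[i] holds the kept coefficients of row i, centred on the diagonal,
    and h = len(rows[i]) // 2 is the half bandwidth. Terms that would fall
    outside the signal are skipped.
    """
    n = len(x)
    y = []
    for i in range(n):
        h = len(rows[i]) // 2
        acc = 0.0
        for j, c in enumerate(rows[i]):
            idx = i - h + j
            if 0 <= idx < n:
                acc += c * x[idx]
        y.append(acc)
    return y

# Toy operator with 3 coefficients per row (the post keeps about 128 or 200).
rows = [[0.25, 0.5, 0.25]] * 5
smoothed = apply_banded(rows, [1.0] * 5)
```

The work per output sample equals the number of kept coefficients and does not depend on the signal length, which is the O(1) with a large constant mentioned above.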

QUOTE (SebastianG @ Sep 27 2007, 05:07) *
I'm not sure whether this is notably superior to what Edler et al described. Isn't the only difference that you use "time-varying linear-phase filters" and have to invert the process somehow differently because the matrix is not lower triangular?


I'm not sure I would attach "linear-phase" to my method without knowing what that means in the time-varying case.

Actually, at this point, we are comparing apples and oranges, because the LPC coding effects irrelevancy and redundancy reduction in the same step, whereas my codec is more similar in concept to, and a generalization of, the usual processing by dividing scale factors in blocks. The difference from the latter is that my scheme gives a noise profile that has a continuous shape in phase space, and is therefore, I think, more like the actual masking mechanism in the ear.

I would like to stress that. This coding scheme begins with a setting for masking theory which postulates that the signal-dependent threshold of noise is a slowly varying function on phase space. I am not using pseudodifferential operators just because; they follow directly from that postulate. This setting for masking theory is very broad in the variety of functions it can accept, and I have a strong intuition, backed by several experimental findings, that the superior theory (yet to be fully realized) falls within its scope. If that is true, this scheme can optimize irrelevancy reduction.

That said, I think I could compare LPC with mine conceptually if I understood the phase space interpretation of time-varying LPC coding and its shaped noise. However, I believe these LPC operators fall into a class of operators less well treated by phase space methods. From what I remember, the operators are more similar to unitary operators, which often vary rapidly in phase space. (Such operators are understood in terms of Lagrangian manifolds in a doubled phase space.) Let me take a look at it again.

Matt

This post has been edited by mattc: Sep 27 2007, 20:50
benski
post Sep 27 2007, 21:04
Post #15


Winamp Developer


Group: Developer
Posts: 670
Joined: 17-July 05
From: Brooklyn, NY
Member No.: 23375



As long as you don't release under the newly-made GPL v3 (GPL v2 and older are OK), then you are not waiving any patent rights by open-sourcing your implementation.
Go to the top of the page
+Quote Post
mattc
post Sep 27 2007, 21:59
Post #16





Group: Members
Posts: 28
Joined: 4-March 05
Member No.: 20358



Sebi,
Correction, since you'll jump on me for this. I said that LPC joins the redundancy and irrelevancy steps, and this is usually true. But in the Edler paper, they decouple the two and use LPC coding to shape the quantization noise. Reading their paper, it is clear they are on the same track. They understand that the noise profile lives in phase space. However, since they don't understand or know of the Weyl calculus, they choose the wrong set of operators to implement their intention, leading to artifacts and a series of increasingly complex work-arounds. Sorry to sound cocky here, but this was the primary focus of my PhD research; my research group strongly believes that if you don't know about symbol correspondences, you're fumbling in the dark when it comes to time-frequency analysis.

benski,
Whew. Just checked: version 2.
SebastianG
post Sep 28 2007, 09:02
Post #17





Group: Developer
Posts: 1317
Joined: 20-March 04
From: Göttingen (DE)
Member No.: 12875



matt,

maybe you should try to get attention using other channels as well. I'm certainly not qualified enough to say anything about "phase space", "weyl calculus", "symbol correspondence" etc and the motivation for doing the work needed so I can follow you here is close to zero, to be honest.

The problem is that, to someone like me, the advantages of your approach don't really look like advantages, because you express them only in terms I associate no meaning with.

Cheers!
SG
[JAZ]
post Sep 28 2007, 13:29
Post #18





Group: Members
Posts: 1764
Joined: 24-June 02
From: Catalunya(Spain)
Member No.: 2383



QUOTE (benski @ Sep 27 2007, 22:04) *
As long as you don't release under the newly-made GPL v3 (GPL v2 and older are OK), then you are not waiving any patent rights by open-sourcing your implementation.



This always makes me laugh:

Excerpt from the GPL V2 license:

QUOTE
9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.

Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.


(Emphasis mine).
Garf
post Sep 28 2007, 14:46
Post #19


Server Admin


Group: Admin
Posts: 4883
Joined: 24-September 01
Member No.: 13



QUOTE ([JAZ] @ Sep 28 2007, 14:29) *
QUOTE (benski @ Sep 27 2007, 22:04) *

As long as you don't release under the newly-made GPL v3 (GPL v2 and older are OK), then you are not waiving any patent rights by open-sourcing your implementation.



This always makes me laugh:

Excerpt from the GPL V2 license:

QUOTE
9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.

Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.


(Emphasis mine).


You may choose, as the redistributor. But since the redistributor holds no patent rights, it can never be a waiver. You can't waive what you hold no rights to. The question is if the author/patent owner waives his rights. The GPLv2, which he used, doesn't. The fact that you can redistribute under GPLv3 doesn't change that.

One could say the license does not apply to the original copyright holder. For example, I can distribute my program with a license that says you can't distribute it. Makes perfect sense and is even typical.
mattc
post Sep 28 2007, 21:35
Post #20





Group: Members
Posts: 28
Joined: 4-March 05
Member No.: 20358



Sebi,
We seem to be at an impasse. I will try to think of a way to convince you that the Weyl calculus is a valuable thing to know, but I'm not confident it will work. I know Cohen has spent considerable effort promoting time-frequency distributions (TFD) for DSP, but with evidently little wide-scale success.

I have my own theory about why this is. For years, computers were so slow that the only way to do DSP was to use the fast Fourier transform, so processing algorithms naturally involved various combinations of windows, FFTs, and inverse FFTs. A certain mindset comes with that. You know, when all you have is a hammer . . .

There are other reasons for not wanting to think in phase space (time-frequency here).
1. There's another variable to worry about, and functions of two variables are more complicated and take more space than those of one.
2. To the inexperienced, some things you'd like to do with TFD's just don't work. For example, the Wigner function is in many ways the best phase space representation of a signal, but it is also among the most complicated. Generic transformations of it do not produce another Wigner function, i.e., it won't correspond to a signal anymore. There's a literature on pattern matching, I think it is called, that attempts to form a signal with a given TFD, but the problem is usually ill posed, for the above reason. The upshot is that spectrograms and Wigner functions---which people encounter first in DSP---may be useful for understanding a signal but aren't much help in processing it.

In physics, the problems are such that phase space (position and momentum, here) methods are older and more accepted. They are used to understand the asymptotic form of eigenfunctions of quantum mechanical Hamiltonians, and to compare their features with the dynamics of the underlying classical system. The phase space representation is equivariant under an important group of operators, the inhomogeneous metaplectic group, which gives us confidence that we have a representation-independent way of thinking about waves. (The metaplectic group has as a subgroup the fractional Fourier transforms, which I see mentioned sometimes in signal processing.)

In audio signals, the waves themselves are a complex mixture of resonant harmonics and they have a much more complicated structure in phase space. They aren't solutions to a small eigenvalue problem. Again, this reinforces the idea that semiclassical concepts won't help much for DSP.

The way out is to realize that operators also have a phase space interpretation, and that there is a large and useful class of operators, the pseudodifferential operators, which can be practically understood and manipulated using phase space techniques. Just from looking at the matrix of a pseudodifferential operator, you wouldn't be able to guess its effect, just as (say) you couldn't understand the stability of an LPC filter just from looking at its coefficients. In phase space however, semiclassics gives us a clear, intuitive picture of what it will do to any wavelike function. Thus, the phase space formalism is an attractive setting for the design of time-varying filters.

That's about it for now. I'll stress that the phase space setting is the best way to understand time-frequency uncertainty. I see a lot of naive discussion about "phase" that would benefit from this knowledge. Finally, as an intellectual matter, the mathematics is elegant and connected to a broad range of topics in symplectic geometry, wave problems and quantization.

Garf, [Jaz],
I take it you don't think that part of the GPL would hold up in court. Has the GPL ever been put to a serious legal challenge?

Matt
jmvalin
post Sep 29 2007, 11:15
Post #21


Xiph.org Speex developer


Group: Developer
Posts: 475
Joined: 21-August 02
Member No.: 3134



QUOTE (mattc @ Sep 26 2007, 15:58) *
Sebi,
One more thing, I've read the documentation on how Speex interpolates LPCs, but because there are no formulas, I don't think I can write down exactly what they are doing. For example, you said, "As long as you apply the same filter interpolation in both tools, encoder and decoder" you are fine, but this can't be literally true, since using the same filter again wouldn't invert the first. So it probably means interpolating the inverse filters, but there are various ways one could do this. If you point me to a formula, or write one down, I could talk more concretely about the differences between operator inversion in these two methods.


It's the same filter, it's just applied as either an IIR or an FIR. On analysis, I apply A(z) and on synthesis, I apply 1/A(z). As for interpolation, it's done in the LSP domain. Doing it in the LPC domain is bad, and potentially (I think) unstable.
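
To make the "same filter, two ways of applying it" point concrete, here is a minimal sketch in plain Python. It is not Speex code: the function names, the coefficients and the test signal are all made up for illustration. The same coefficient list is used once as the FIR filter A(z) and once as the all-pole IIR filter 1/A(z), and the second undoes the first.

```python
def analysis_fir(x, a):
    """Apply A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p as an FIR filter."""
    out = []
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, len(a) + 1):
            if n - k >= 0:
                acc += a[k - 1] * x[n - k]  # uses past INPUT samples only
        out.append(acc)
    return out


def synthesis_iir(e, a):
    """Apply 1/A(z) as an all-pole IIR filter with the same coefficients."""
    out = []
    for n in range(len(e)):
        acc = e[n]
        for k in range(1, len(a) + 1):
            if n - k >= 0:
                acc -= a[k - 1] * out[n - k]  # uses past OUTPUT samples
        out.append(acc)
    return out


# Hypothetical second-order coefficients (complex pole pair at radius 0.9).
a = [-1.2, 0.81]
# Arbitrary test signal, also made up.
x = [1.0, 0.5, -0.25, 2.0, 0.0, -1.0, 0.75, 0.1]

residual = analysis_fir(x, a)         # encoder side: A(z)
restored = synthesis_iir(residual, a)  # decoder side: 1/A(z)
max_err = max(abs(u - v) for u, v in zip(x, restored))
```

With fixed coefficients, `restored` matches `x` up to floating-point rounding. When the coefficients change from sample to sample, the inversion still holds only if encoder and decoder use the same coefficients at each sample, which is where the interpolation rule comes in.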
mattc
post Sep 29 2007, 11:38
Post #22





Group: Members
Posts: 28
Joined: 4-March 05
Member No.: 20358



QUOTE (jmvalin @ Sep 29 2007, 03:15) *
It's the same filter, it's just applied as either an IIR or an FIR. On analysis, I apply A(z) and on synthesis, I apply 1/A(z). As for interpolation, it's done in the LSP domain. Doing it in the LPC domain is bad, and potentially (I think) unstable.

I've been thinking of a filter as a type of operator or matrix, but, from your first sentence, it seems that there is a subtle difference in terminology here, like a filter is an object that can be applied various ways to a signal and describes a set of operators rather than one. I'll have to be more careful with that word.

Thanks for clarifying the interpolation point. I'll have to think about why interpolating in the LSP domain fixes the instability: I would have guessed that stability would have required the interpolation to have been rounded off so that it operates in the sample domain by a set of integer coefficients.
SebastianG
post Sep 29 2007, 12:49
Post #23





Group: Developer
Posts: 1317
Joined: 20-March 04
From: Göttingen (DE)
Member No.: 12875



QUOTE (mattc @ Sep 29 2007, 12:38) *
Thanks for clarifying the interpolation point. I'll have to think about why interpolating in the LSP domain fixes the instability

Huh? If done right it guarantees poles to be inside the unit circle which means that in case of a non-varying filter it's stable. I thought you were questioning whether this is still enough in case of varying filters since the stability proof would require the filter to be time-invariant as you said (in your terms: the matrix K being a banded Toeplitz matrix). ... which is a good question, really. I don't know the answer to that but I think we are very successful in avoiding possible instability (also due to limited precision arithmetic) by enforcing some additional constraints in the LSP/LSF domain (the representation's "spectral frequencies" must not be too close to each other).
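
A rough sketch of the frozen-filter part of that claim, in plain Python. Nothing here is taken from Speex: the LSF values are invented, the helper names are hypothetical, and the order is assumed to be even. The idea is that any strictly increasing set of LSFs in (0, π) maps back to a minimum-phase A(z), and a linear blend of two increasing sets is still increasing, so every interpolated filter has its poles inside the unit circle. Stability is checked with the step-down recursion (all reflection coefficients below 1 in magnitude).

```python
import math


def polymul(u, v):
    """Multiply two polynomials given as coefficient lists."""
    out = [0.0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] += ui * vj
    return out


def lsf_to_lpc(lsf):
    """Rebuild [1, a1, ..., ap] from p sorted LSFs in radians (p even)."""
    p_poly = [1.0, 1.0]   # symmetric polynomial, fixed root at z = -1
    q_poly = [1.0, -1.0]  # antisymmetric polynomial, fixed root at z = +1
    for w in lsf[0::2]:
        p_poly = polymul(p_poly, [1.0, -2.0 * math.cos(w), 1.0])
    for w in lsf[1::2]:
        q_poly = polymul(q_poly, [1.0, -2.0 * math.cos(w), 1.0])
    # A(z) = (P(z) + Q(z)) / 2; the z^-(p+1) terms cancel, so drop that entry.
    return [0.5 * (pc + qc) for pc, qc in zip(p_poly, q_poly)][:len(lsf) + 1]


def is_stable(a):
    """Step-down recursion: 1/A(z) is stable iff every |k_m| < 1."""
    a = list(a)
    for m in range(len(a) - 1, 0, -1):
        k = a[m]
        if abs(k) >= 1.0:
            return False
        a = [1.0] + [(a[i] - k * a[m - i]) / (1.0 - k * k) for i in range(1, m)]
    return True


# Hypothetical LSF sets (radians) for two neighbouring frames.
lsf_a = [0.3, 0.7, 1.4, 2.2]
lsf_b = [0.5, 1.1, 1.9, 2.8]

results = []
for step in range(11):
    t = step / 10.0
    lsf_t = [(1.0 - t) * u + t * v for u, v in zip(lsf_a, lsf_b)]
    results.append(is_stable(lsf_to_lpc(lsf_t)))
all_stable = all(results)
```

This only shows that each interpolated filter, taken on its own as a time-invariant filter, is stable. It says nothing about the time-varying case, and it ignores the minimum-spacing constraint and finite-precision effects mentioned above.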

Cheers!
SG

This post has been edited by SebastianG: Sep 29 2007, 20:53
mattc
post Sep 29 2007, 20:52
Post #24





Group: Members
Posts: 28
Joined: 4-March 05
Member No.: 20358



QUOTE (SebastianG @ Sep 29 2007, 04:49) *
Huh? If done right it guarantees poles to be inside the unit circle which means that in case of a non-varying filter it's stable. I've been telling you. I thought you were questioning whether this is still enough in case of varying filters since the stability proof would require the filter to be time-invariant as you said (in your terms: the matrix K being a banded Toeplitz matrix). ... which is a good question, really. I don't know the answer to that but I think we are very successful in avoiding possible instability (also due to limited precision arithmetic) by enforcing some additional constraints in the LSP/LSF domain (the representation's "spectral frequencies" must not be too close to each other).

Cheers!
SG

Right, we have two issues going on here. In my last post, I was just responding to jmvalin's statement that the interpolation for Speex is done between LSP sets rather than LPC sets, because the latter method has potential instabilities.

Am I correct to think of Speex as belonging more to the standard use of LPC (like flac), as compared to using the filters just for irrelevancy reduction, as in Edler?
SebastianG
post Sep 29 2007, 21:05
Post #25





Group: Developer
Posts: 1317
Joined: 20-March 04
From: Göttingen (DE)
Member No.: 12875



QUOTE (mattc @ Sep 29 2007, 21:52) *
Am I correct to think of Speex as belonging more to the standard use of LPC (like flac), as compared to using the filters just for irrelevancy reduction, as in Edler?

Yes. The main purpose is to decorrelate consecutive samples (->near white residual) with the effect that the inverse filter models the signal's 'color'. Advantage: Decorrelated samples are easily coded. Disadvantage: It's tricky to do noise shaping exploiting psychoacoustic effects -- if you quantize the white residual you should usually introduce non-white quantization noise.
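
A toy illustration of the decorrelation idea, in plain Python. It is a sketch under made-up assumptions (a synthetic first-order "colored" signal and a first-order predictor), not how Speex or FLAC actually estimate their coefficients.

```python
import random


def lag1_correlation(x):
    """Normalized autocorrelation of x at lag 1."""
    r0 = sum(v * v for v in x)
    r1 = sum(x[n] * x[n - 1] for n in range(1, len(x)))
    return r1 / r0


random.seed(0)

# Hypothetical colored test signal: x[n] = 0.9 * x[n-1] + white noise.
x = []
prev = 0.0
for _ in range(5000):
    prev = 0.9 * prev + random.gauss(0.0, 1.0)
    x.append(prev)

# Order-1 linear prediction: the least-squares coefficient is
# (approximately) the lag-1 correlation of the signal.
c = lag1_correlation(x)
residual = [x[0]] + [x[n] - c * x[n - 1] for n in range(1, len(x))]

signal_corr = lag1_correlation(x)
residual_corr = lag1_correlation(residual)
```

Here `signal_corr` comes out close to the 0.9 used to build the signal, while `residual_corr` is close to zero: the prediction residual is nearly white, and the 'color' now lives in the predictor coefficient.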

The Edler approach is different in that the filter models the current spectral/temporal threshold of hearing. You'll end up with a still-correlated signal that you should exploit further somehow, but quantization is of course easier this way, since you only need the q-noise to be white; it then gets shaped by the synthesis filter to follow the thresholds of hearing.
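
The shaping can be checked numerically with a small sketch in plain Python. The single coefficient below is a hypothetical stand-in for a real threshold model, and the uniform quantizer and step size are likewise assumptions. The point is only the linear-algebra identity: the reconstruction error equals the quantization noise passed through the synthesis filter.

```python
def fir(x, a):
    """Pre-filter: apply A(z) = 1 + a[0] z^-1 + ... as an FIR filter."""
    out = []
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, len(a) + 1):
            if n - k >= 0:
                acc += a[k - 1] * x[n - k]
        out.append(acc)
    return out


def allpole(e, a):
    """Post-filter: apply 1/A(z) as an all-pole IIR filter."""
    out = []
    for n in range(len(e)):
        acc = e[n]
        for k in range(1, len(a) + 1):
            if n - k >= 0:
                acc -= a[k - 1] * out[n - k]
        out.append(acc)
    return out


a = [-0.9]   # hypothetical stand-in for a masking-threshold model
step = 0.5   # hypothetical quantizer step size
x = [0.3, 1.7, -0.4, 2.2, 0.9, -1.3, 0.05, 0.6, -2.1, 1.1]

pre = fir(x, a)                                    # pre-filtered signal
quantized = [step * round(v / step) for v in pre]  # uniform quantizer
q_noise = [q - v for q, v in zip(quantized, pre)]  # roughly white q-noise

decoded = allpole(quantized, a)                    # post-filtered output
error = [d - s for d, s in zip(decoded, x)]        # total coding error
shaped = allpole(q_noise, a)                       # q-noise through 1/A(z)
mismatch = max(abs(e - s) for e, s in zip(error, shaped))
```

`mismatch` is zero up to rounding, so whatever spectral shape 1/A(z) has is exactly the shape imposed on the noise in the decoded signal.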

Cheers!
SG

This post has been edited by SebastianG: Sep 29 2007, 21:06