Low latency codecs
knutinh
post Apr 27 2011, 07:35
Post #1





Group: Members
Posts: 569
Joined: 1-November 06
Member No.: 37047



What are the choices for low latency codecs, and what is the fundamental information/perceptual trade-off?

I see that large block sizes are needed to do operations on narrow frequency bands, suitable for some masking stuff, but exploiting temporal correlation could be done against a historical reference (not introducing significant delay)?

-k
saratoga
post Apr 27 2011, 07:51
Post #2





Group: Members
Posts: 4859
Joined: 2-September 02
Member No.: 3264



QUOTE (knutinh @ Apr 27 2011, 02:35) *
but exploiting temporal correlation could be done against a historical reference (not introducing significant delay)?


Most codecs don't really do this though, since it's quite difficult in practice.
knutinh
post Apr 27 2011, 08:56
Post #3





Group: Members
Posts: 569
Joined: 1-November 06
Member No.: 37047



QUOTE (saratoga @ Apr 27 2011, 08:51) *
Most codecs don't really do this though, since it's quite difficult in practice.

http://en.wikipedia.org/wiki/DPCM
patented in 1950?
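For illustration, the core of first-order DPCM really is tiny. A lossless sketch (illustrative only, not any deployed codec):

```python
# Minimal first-order DPCM: transmit the difference between each sample
# and the previous one; the decoder adds the differences back up.
# In the lossless case the reconstruction equals the input exactly.

def dpcm_encode(samples):
    residuals = []
    prev = 0
    for x in samples:
        residuals.append(x - prev)  # predict "same as last sample"
        prev = x
    return residuals

def dpcm_decode(residuals):
    samples = []
    prev = 0
    for r in residuals:
        prev += r
        samples.append(prev)
    return samples

signal = [10, 12, 13, 13, 11, 8]
enc = dpcm_encode(signal)          # [10, 2, 1, 0, -2, -3] -- smaller magnitudes
assert dpcm_decode(enc) == signal  # lossless round trip
```

The residuals have smaller magnitude than the raw samples whenever neighbouring samples are correlated, which is where the compression comes from.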

-k
C.R.Helmrich
post Apr 27 2011, 12:06
Post #4





Group: Developer
Posts: 686
Joined: 6-December 08
From: Erlangen Germany
Member No.: 64012



QUOTE (knutinh @ Apr 27 2011, 08:35) *
...exploiting temporal correlation could be done against a historical reference (not introducing significant delay)?

Correct. But low-delay codecs are mostly used in communication scenarios where you might lose a frame during transmission. So if you lose a frame, your history is corrupted, and you're in trouble until the next history reset. Just like in video coding, by the way, where you get nasty blocking artifacts until the next I frame (which might take seconds).
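A toy illustration of that history problem (made-up numbers, not a real codec): the decoder integrates transmitted differences, so a single lost packet offsets every later sample until an absolute value arrives.

```python
# Toy DPCM-style decoder. If one packet (difference) is lost, every later
# sample is offset until the encoder sends an absolute "reset" value --
# the audio analogue of waiting for the next I-frame.

signal = [100, 102, 105, 104, 103, 101, 100]
diffs = [signal[0]] + [b - a for a, b in zip(signal, signal[1:])]

def decode(packets, lost_index=None):
    out, prev = [], 0
    for i, p in enumerate(packets):
        if i != lost_index:      # a lost packet contributes nothing
            prev += p
        out.append(prev)
    return out

print(decode(diffs))                # perfect reconstruction
print(decode(diffs, lost_index=2))  # every sample from index 2 on is offset
```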

Chris


--------------------
If I don't reply to your reply, it means I agree with you.
Garf
post Apr 27 2011, 13:10
Post #5


Server Admin


Group: Admin
Posts: 4883
Joined: 24-September 01
Member No.: 13



In the case of AAC, both the Main and LTP profiles went nowhere, so I guess the gain wasn't very high. HE-AAC uses some differential coding in the HE part. It certainly doesn't seem that easy if you look at how little it is used.

Opus uses a differential coding of band energy, but constructed so packet loss can be recovered from after a few frames.

The main problem of small block sizes is not so much the inability to operate on narrow bands (the critical bands usually group many FFT lines together anyway), but the problem that tonal signals start to leak into several adjacent frequency bands. Because they require a high signal-to-mask ratio (SMR), you suddenly have more big coefficients that you must code accurately. And that hurts.
NullC
post Apr 28 2011, 03:08
Post #6





Group: Developer
Posts: 200
Joined: 8-July 03
Member No.: 7653



QUOTE (knutinh @ Apr 26 2011, 22:35) *
What are the choices for low latency codecs, and what is the fundamental information/perceptual trade-off?
I see that large block sizes are needed to do operations on narrow frequency bands, suitable for some masking stuff, but exploiting temporal correlation could be done against a historical reference (not introducing significant delay)?


Masking is a very fuzzy thing. As Garf said, it's not masking that gets you, it's the fact that you lose coding gain.

You can do very narrow frequency domain operations from the temporal domain. It's all the same thing, after all. E.g. in Opus CELT mode we have a single backwards-looking predictor, but it's only really useful for highly harmonic signals. For those signals it helps a lot; otherwise it's not really useful.
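A minimal sketch of a single backwards-looking (long-term) predictor in that spirit — illustrative only, not CELT's actual search:

```python
import math

# Search the decoder's own past output for the lag that best predicts the
# current frame. For a harmonic signal the best lag is (a multiple of)
# the pitch period, so one scalar lag captures the whole comb of harmonics.

def best_lag(history, frame, min_lag=None, max_lag=256):
    n = len(frame)
    min_lag = min_lag or n
    best, best_err = min_lag, float("inf")
    for lag in range(min_lag, max_lag + 1):
        start = len(history) - lag
        pred = history[start:start + n]
        err = sum((f - p) ** 2 for f, p in zip(frame, pred))
        if err < best_err:
            best, best_err = lag, err
    return best, best_err

# Highly harmonic test signal with a 100-sample pitch period.
period = 100
sig = [math.sin(2 * math.pi * i / period) +
       0.5 * math.sin(4 * math.pi * i / period) for i in range(1024)]

history, frame = sig[:960], sig[960:]
lag, err = best_lag(history, frame)
print(lag)  # a multiple of the 100-sample pitch period
```

For an inharmonic mixture no single lag fits all components, which is exactly the separability problem described next.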

Backwards-looking prediction of multiple components has the problem that they're not easily separable. For a signal with multiple strong inharmonic tones there would need to be multiple offsets. Consider the problem of a single video block which has separate components moving in different directions at once. You could try to separate the block into over-complete components and then predict those, but the separation process is computationally hard and might not be possible to do well with low latency. (Or at least we found that short-time, low-latency sinusoidal coding didn't appear to work very well, though perhaps we didn't try hard enough this time around, since the computational cost seemed out of the question.)

Making more complicated prediction robust against loss is also quite tricky, if not impossible, and virtually all low-latency applications need to deal with loss (if you can retransmit, you probably don't need a low-latency codec!)

There are some neat things that could be done if you don't care much about robustness, e.g. http://ieeexplore.ieee.org/xpl/freeabs_all...rnumber=5413930. But techniques that end up resulting in NxN matrix multiplies, or approximations thereof, work a heck of a lot better (complexity-wise) for separable 8x8 pixel blocks than they do for a 240-sample audio frame.

This post has been edited by NullC: Apr 28 2011, 03:49
saratoga
post Apr 28 2011, 03:21
Post #7





Group: Members
Posts: 4859
Joined: 2-September 02
Member No.: 3264



QUOTE (knutinh @ Apr 27 2011, 03:56) *
QUOTE (saratoga @ Apr 27 2011, 08:51) *
Most codecs don't really do this though, since it's quite difficult in practice.

http://en.wikipedia.org/wiki/DPCM
patented in 1950?


Try encoding real music losslessly with DPCM and see how much compression you get. You'll see why I said it's "difficult".
knutinh
post Apr 28 2011, 09:55
Post #8





Group: Members
Posts: 569
Joined: 1-November 06
Member No.: 37047



QUOTE (saratoga @ Apr 28 2011, 04:21) *
Try encoding real music losslessly with DPCM and see how much compression you get. You'll see why I said it's "difficult".

This confuses me. Is not "all" music lowpass in nature (at least as a long-term statistic)? Is not DPCM practically a high-pass pre-whitening filter/low-pass predictor?

If music in general is somewhat predictable (just like the weather tends to be like the weather the day before), I would have guessed that a simple, primitive predictor would be better than nothing.
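A quick numeric illustration (a toy sketch, not any particular codec) of why "better than nothing" depends on the source: first-order differencing shrinks the variance of a slowly varying signal a lot, but roughly doubles it for white noise.

```python
import math
import random

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def diff(xs):
    """First-order DPCM residual (without the initial absolute sample)."""
    return [b - a for a, b in zip(xs, xs[1:])]

random.seed(0)
n = 4096
# "Lowpass" signal: a slow sine, strong sample-to-sample correlation.
smooth = [math.sin(2 * math.pi * 3 * i / n) for i in range(n)]
# White noise: no correlation to exploit.
noise = [random.gauss(0, 1) for _ in range(n)]

print(variance(diff(smooth)) / variance(smooth))  # tiny: prediction helps
print(variance(diff(noise)) / variance(noise))    # about 2: prediction hurts
```

Real music sits somewhere between the two, and its spectrum changes over time, which is part of why a fixed primitive predictor gives modest gains in practice.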

-k

This post has been edited by knutinh: Apr 28 2011, 10:05
knutinh
post Apr 28 2011, 10:04
Post #9





Group: Members
Posts: 569
Joined: 1-November 06
Member No.: 37047



QUOTE (NullC @ Apr 28 2011, 04:08) *
Masking is a very fuzzy thing. As Garf said, it's not masking that gets you, it's the fact that you lose coding gain.

Is it possible to say something about how much of the compression is related to pure source-coding, and how much is relying on psycho-acoustically guided lossy coding?

FLAC can do 2:1 lossless encoding, while AAC can do 10:1 or whatever perceptually "as good as lossless" encoding. Can one assume that the first 50% of the AAC compression stems from source redundancy, while the remaining factor stems from (hopefully) irrelevancy?
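As a back-of-envelope check of that split, using the round figures from the post (assumptions, not measurements):

```python
# Round figures taken from the post above (assumed, not measured).
lossless_ratio = 2.0   # FLAC-style coding keeps 1/2 of the original size
total_ratio = 10.0     # "transparent" lossy coding keeps 1/10

kept_after_lossless = 1 / lossless_ratio   # 0.5 of the data survives
kept_after_lossy = 1 / total_ratio         # 0.1 survives

redundancy_share = 1 - kept_after_lossless                  # the "first 50%"
irrelevancy_share = kept_after_lossless - kept_after_lossy  # a further 40 points
print(redundancy_share, irrelevancy_share)
```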

-k
Garf
post Apr 28 2011, 11:30
Post #10


Server Admin


Group: Admin
Posts: 4883
Joined: 24-September 01
Member No.: 13



QUOTE (knutinh @ Apr 28 2011, 11:04) *
Can one assume that the first 50% of the AAC compression stems from source redundancy, while the remaining factor stems from (hopefully) irrelevancy?


Yes, that looks correct. Another example: MPEG-4 SLS can act as a lossless AAC coder, and when it does, it achieves ratios comparable to classic lossless codecs.
Garf
post Apr 28 2011, 11:35
Post #11


Server Admin


Group: Admin
Posts: 4883
Joined: 24-September 01
Member No.: 13



QUOTE (knutinh @ Apr 28 2011, 10:55) *
This confuses me. Is not "all" music lowpass in nature (at least as a long-term statistic)? Is not DPCM practically a high-pass pre-whitening filter/low-pass predictor?

If music in general is somewhat predictable (just like the weather tends to be like the weather the day before), I would have guessed that a simple, primitive predictor would be better than nothing.


"Better than nothing" is still a far cry from what the codecs achieve now. The T/F transformations they use also exploit the property you mentioned, and hence temporal correlation (but seen from a frequency perspective). Getting more out of that by exploiting correlation between transformed blocks is difficult.
2Bdecided
post Apr 28 2011, 12:03
Post #12


ReplayGain developer


Group: Developer
Posts: 5060
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



NICAM.

But it's hardly efficient! ;)

Cheers,
David.
_mē_
post Apr 28 2011, 15:00
Post #13





Group: Members
Posts: 231
Joined: 6-April 09
Member No.: 68706



QUOTE (Garf @ Apr 28 2011, 12:30) *
QUOTE (knutinh @ Apr 28 2011, 11:04) *
Can one assume that the first 50% of the AAC compression stems from source redundancy, while the remaining factor stems from (hopefully) irrelevancy?


Yes, that looks correct. Another example: MPEG-4 SLS can act as a lossless AAC coder, and when it does, it achieves ratios comparable to classic lossless codecs.


A bit OT, but is there any public SLS encoder available?
knutinh
post Apr 28 2011, 15:07
Post #14





Group: Members
Posts: 569
Joined: 1-November 06
Member No.: 37047



QUOTE (Garf @ Apr 28 2011, 12:35) *
QUOTE (knutinh @ Apr 28 2011, 10:55) *
This confuses me. Is not "all" music lowpass in nature (at least as a long-term statistic)? Is not DPCM practically a high-pass pre-whitening filter/low-pass predictor?

If music in general is somewhat predictable (just like the weather tends to be like the weather the day before), I would have guessed that a simple, primitive predictor would be better than nothing.


"Better than nothing" is still a far cry from what the codecs achieve now. The T/F transformations they use also exploit the property you mentioned, and hence temporal correlation (but seen from a frequency perspective). Getting more out of that by exploiting correlation between transformed blocks is difficult.

So a more precise answer to my initial questions would perhaps be:
1. Lossless audio compression is usually possible, and will give you 2:1 or so for a substantial delay
2. Lossy audio compression is usually possible and may give you 10:1 or so for a substantial delay
3. Very low latency audio compression is usually possible but will either give very poor compression (lossless) or very poor quality:bitrate (lossy)
Garf
post Apr 28 2011, 16:07
Post #15


Server Admin


Group: Admin
Posts: 4883
Joined: 24-September 01
Member No.: 13



QUOTE (_mē_ @ Apr 28 2011, 16:00) *
A bit OT, but is there any public SLS encoder available?


There is one in the MPEG reference sources. Those aren't free, of course.
googlebot
post Apr 28 2011, 16:10
Post #16





Group: Members
Posts: 698
Joined: 6-March 10
Member No.: 78779



QUOTE (knutinh @ Apr 28 2011, 15:07) *
3. Very low latency audio compression is usually possible but will either give very poor compression (lossless) or very poor quality:bitrate (lossy)


The last HA listening test tells a different story (see Opus).
Garf
post Apr 28 2011, 16:13
Post #17


Server Admin


Group: Admin
Posts: 4883
Joined: 24-September 01
Member No.: 13



QUOTE (knutinh @ Apr 28 2011, 16:07) *
3. Very low latency audio compression is usually possible but will either give very poor compression (lossless) or very poor quality:bitrate (lossy)


I think you can have efficient low-latency lossless compression. I don't see why a coder with a backwards predictor (Monkey's Audio, MPEG-4 ALS with -z mode) wouldn't do well. As explained in earlier posts, the problem is that it cannot recover well from packet loss, which tends to go hand in hand with low-latency operation.

For pure lossy codecs, I don't think there is a good way around the coding gain issues.
Garf
post Apr 28 2011, 16:15
Post #18


Server Admin


Group: Admin
Posts: 4883
Joined: 24-September 01
Member No.: 13



QUOTE (googlebot @ Apr 28 2011, 17:10) *
QUOTE (knutinh @ Apr 28 2011, 15:07) *
3. Very low latency audio compression is usually possible but will either give very poor compression (lossless) or very poor quality:bitrate (lossy)


The last HA listening test tells a different story (see Opus).


That depends on what you define as very low latency. Opus can work at much lower latency (5 ms) than what was used in that test (22 ms), but at a quality cost. The poster's question was what causes this trade-off.
googlebot
post Apr 28 2011, 17:28
Post #19





Group: Members
Posts: 698
Joined: 6-March 10
Member No.: 78779



The OP asked for low-delay codecs. Opus at 22 ms is by all means a low-delay codec compared to all the other contenders (AAC, Vorbis, MP3) in the test. Further, for example, AAC-LD is officially called "low delay" for its 20 ms. That Opus can be used down to 5 ms doesn't change the fact that a low-delay codec showed considerably better performance than all the other, larger-delay codecs in the latest installment.

The theoretical information usually provided about low- vs. high-delay coding isn't necessarily false. I just pointed out that, in practice, a state-of-the-art low-delay codec can beat even fine-tuned large-delay implementations.

This post has been edited by googlebot: Apr 28 2011, 17:30
knutinh
post Apr 30 2011, 18:58
Post #20





Group: Members
Posts: 569
Joined: 1-November 06
Member No.: 37047



QUOTE (NullC @ Apr 28 2011, 04:08) *
Masking is a very fuzzy thing. As Garf said, it's not masking that gets you, it's the fact that you lose coding gain.

"coding gain" is the variance of one block of input divided by the variance of one block of transform output, averaged over some set of input blocks?

So the difference between coding transformed blocks of e.g. 128 samples, vs coding transformed blocks of 8*128 samples is that the latter will (for typical content) be more sparse and easily coded into few bits?
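Close, though the textbook measure is usually stated over the transform coefficients themselves. A rough sketch of one common definition (an assumption on my part, not how any particular codec measures it) — the arithmetic mean of the per-coefficient variances divided by their geometric mean, so a transform that concentrates energy into few coefficients scores higher:

```python
import math
import random

def dct2(block):
    """Orthonormal DCT-II of one block (O(n^2), fine for a demo)."""
    n = len(block)
    return [math.sqrt((1.0 if k == 0 else 2.0) / n) *
            sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, x in enumerate(block))
            for k in range(n)]

def coding_gain(signal, block_len):
    blocks = [signal[i:i + block_len]
              for i in range(0, len(signal) - block_len + 1, block_len)]
    coeffs = [dct2(b) for b in blocks]
    # Per-coefficient variance across blocks (zero-mean source assumed).
    variances = [sum(c[k] ** 2 for c in coeffs) / len(coeffs)
                 for k in range(block_len)]
    arith = sum(variances) / block_len
    geo = math.exp(sum(math.log(v) for v in variances) / block_len)
    return arith / geo

# AR(1) source with strong sample-to-sample correlation, a crude
# stand-in for "lowpass" audio.
random.seed(1)
sig, prev = [], 0.0
for _ in range(8192):
    prev = 0.95 * prev + random.gauss(0.0, 1.0)
    sig.append(prev)

print(coding_gain(sig, 8))   # moderate gain with short blocks
print(coding_gain(sig, 64))  # larger gain with longer blocks
```

For a correlated source like this, the gain grows with block length, which is the sparsity effect the question describes.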

-k

This post has been edited by knutinh: Apr 30 2011, 18:59
Garf
post May 1 2011, 17:09
Post #21


Server Admin


Group: Admin
Posts: 4883
Joined: 24-September 01
Member No.: 13



QUOTE (knutinh @ Apr 30 2011, 19:58) *
"coding gain" is the variance of one block of input divided by the variance of one block of transform output, averaged over some set of input blocks?

So the difference between coding transformed blocks of e.g. 128 samples, vs coding transformed blocks of 8*128 samples is that the latter will (for typical content) be more sparse and easily coded into few bits?


I wouldn't say typical content, but rather: tonal signals. Encoding a tonal signal properly requires a higher SMR for psychoacoustic reasons. So it's especially relevant to get a good coding gain there. This is visible in practice too, in the sense that low-delay codecs are relatively worse on highly tonal signals and suffer less on more noisy input.

The example you give compares encoding 128 samples with encoding 1024 samples, so it doesn't really make sense. If you instead compare 8*128 vs 1*1024 samples, the latter will indeed be more sparse for tonal signals. If you take a single sine wave, then for the 1024-sample case you will have 1 peak with some small sidelobes/leakage and everything else 0, whereas for the 8*128 case you will have 8 peaks, with more leakage, and fewer 0s.
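That peak-counting picture can be checked numerically. A rough sketch with an assumed sparsity measure of my own (how many DFT coefficients are needed to capture 99% of the energy; real codecs use windowed MDCTs, so exact numbers will differ):

```python
import math

def dft_energies(block):
    """Per-bin energy of a naive O(n^2) real DFT, bins 0..n/2."""
    n = len(block)
    out = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(block))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(block))
        out.append(re * re + im * im)
    return out

def coeffs_for_99pct(energies):
    """Smallest number of coefficients holding 99% of the energy."""
    total = sum(energies)
    acc, count = 0.0, 0
    for e in sorted(energies, reverse=True):
        acc += e
        count += 1
        if acc >= 0.99 * total:
            return count

# A tone that is bin-misaligned at both block sizes, so it leaks.
tone = [math.sin(2 * math.pi * 42.4 * i / 1024) for i in range(1024)]

long_block = coeffs_for_99pct(dft_energies(tone))
short_blocks = sum(coeffs_for_99pct(dft_energies(tone[i:i + 128]))
                   for i in range(0, 1024, 128))
print(long_block, short_blocks)  # the 8 short blocks need far more coefficients
```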
