CELT 0.3.2 is out, along with listening test results

2008-05-16 12:47:58

I just released version 0.3.2 of the CELT ultra-low-delay codec. The improvements are mainly quality tuning. As this version is also the basis for a paper I just submitted, I've also conducted some listening tests on it.

I compared CELT to G.722.1C (aka Siren14), AAC-LD and MP3 (LAME), and here are the results of the MUSHRA tests. The results turned out better than I expected. CELT comes out on top of G.722.1C and MP3 at both 48 kbps and 64 kbps. It also beats AAC-LD at 48 kbps and ties at 64 kbps. All results are statistically significant (95%), except the tie of course. Considering that CELT has 8.7 ms delay, which is less than a fourth of the other codecs (AAC-LD can have 20 ms delay, but not the Apple implementation we used), this is fairly good (and unexpected). I've also posted some audio samples encoded with the same encoder, but using different clips than the ones in the test.

Keep in mind that CELT isn't competing with Vorbis and standard AAC on quality. That would be next to impossible with the tiny frame size it uses. That being said, I think there's still room for improvement and I'm pretty sure CELT will get better. For those who wonder why such a low delay is useful, let's just say that one of the applications it allows is playing music remotely over a DSL connection.

Reply #1 – 2008-05-16 16:54:58

Wow an opensource low-delay implementation!!!!

How do I make this work as a DLL for Windows? It would be nice to use it with Oddcast/Edcast!

If it could be competing with AAC-LD, then you're in.

regards,

Arkadas

Reply #2 – 2008-05-16 17:35:41

Congrats, Jean-Marc!

Did the AAC-LD encoder make use of the LTP feature? Otherwise it wouldn't be completely fair, because CELT is pretty much doing the same thing in this regard.

LTP = long term prediction (aka "pitch prediction")

Cheers,
SG

Reply #3 – 2008-05-16 18:22:45

You could use this codec for radio programs from locations

Instead of the pricey ISDN codecs!!!

Cheap setup:

Edcast to broadcast to icecast to winamp (need an icecast plug-in).

Regards,

Arkadas

Reply #4 – 2008-05-16 19:01:20

Glad to hear about this. Keep up the good work, jmvalin!

Reply #5 – 2008-05-16 23:38:35

Quote from: SebastianG on 2008-05-16 17:35:41

Did the AAC-LD encoder make use of the LTP feature? Otherwise it wouldn't be completely fair, because CELT is pretty much doing the same thing in this regard.

The only AAC-LD implementation we found was from Apple in QuickTime Pro. All it allowed us to change was the bit-rate and a "quality" setting, which we set to "best". We figured out from the bit-stream that it was using 512-sample frames with 512-sample lookahead (MDCT overlap) and another 512 samples of delay for the bit reservoir. Also, we checked, and the reference implementation included LTP by default. The Apple one probably does too, which is why AAC-LD didn't collapse at 48 kbps on speech. Also, keep in mind that the comparison isn't fair to CELT in the first place, because AAC-LD and G.722.1C have 4x the delay (and of course larger frame sizes).

Updated twice: Added better LTP info

Reply #6 – 2008-05-17 00:47:00

Nothing on rarewares yet?!

Reply #7 – 2008-05-17 00:52:00

Quote

Also, keep in mind that the comparison isn't fair to CELT in the first place because AAC-LD and G.722.1C have 4x the delay (and of course larger frame sizes).

And part of that is due to AAC-LD's use of a bit reservoir, which should really be expected to help a lot in terms of performance. I expect good things from a future managed VBR mode of CELT for high-speed packet networks.

As JM pointed out, the test was against Apple's AAC-LD. I found an old reference implementation of the MPEG-4 natural audio codecs, including AAC-LD, but I can only describe it as "thoroughly broken"... If I wanted to be unfair to AAC-LD I could have used that! Furthermore, the reference implementation had a zillion knobs which might positively or negatively impact the quality, with little clear guidance about what people might actually use in the real world. By using a widely available commercial AAC-LD codec we can at least be confident that we were comparing against something real and in actual use. Besides, no one here had anything better to suggest.

MP3 in the test was LAME MP3, but it had the default lowpass disabled. Otherwise the difference in bandpass would have made it generally incomparable with the wider bandpasses provided by CELT, AAC-LD, and G.722.1C (notice how much lower they ranked the 7 kHz lowpass).

The test was performed using untrained listeners, who I think were more tolerant of some types of artifacts than codec buffs would be. This is supported by the fact that they didn't totally pan MP3, which sounds really terrible at those bitrates. I expect that more experienced listeners would actually rate CELT even better than AAC-LD, except perhaps on some pre-echo punishment tests, because CELT avoids birdies very well.

The Xiph Wiki has a short page which includes some info on CELT tuning. I'll hopefully add some more to it soon. One thing that might be attractive to people interested in tinkering with codec improvements is that the CELT bitstream is not frozen, so improvements need not be limited by details like compatibility. (Performance, on the other hand, does matter: CELT already includes a complete fixed-point port that runs well on a TI DSP, so keeping computational burden and memory requirements realistic is important.)

Reply #8 – 2008-05-17 23:38:56

Quote from: NullC on 2008-05-17 00:52:00

MP3 in the test was LAME MP3, but it had the default lowpass disabled. Otherwise the difference in bandpass would have made it generally incomparable with the wider bandpasses provided by CELT, AAC-LD, and G.722.1C (notice how much lower they ranked the 7 kHz lowpass).

Actually, I did use a low-pass at 20 kHz to match CELT. I also disabled the bit reservoir, but it didn't seem to have that much effect.

Reply #9 – 2008-05-20 16:31:45

Quote

You could use this codec for radio programs from locations

what would be the practical difference between 20 ms lag and 8 ms lag in this case?

Reply #10 – 2008-05-20 18:40:56

Quote from: Arkadas on 2008-05-17 00:47:00

Nothing on rarewares yet?!

CELT is still a proof-of-concept and it's made available for professionals who can mess with code. Don't expect Windows binaries until a few versions later.

Reply #11 – 2008-05-20 23:44:25

Quote from: Saoshyant on 2008-05-20 18:40:56

CELT is still a proof-of-concept and it's made available for professionals who can mess with code. Don't expect Windows binaries until a few versions later.

CELT is now beyond the proof-of-concept stage and it's actually usable, even though the bit-stream isn't frozen yet. But it is indeed mostly useful as a library and not as a stand-alone encoder. So there's indeed little point in having CELT files, because the only advantage CELT has over Vorbis is the latency.

Reply #12 – 2008-05-21 07:49:32

Quote from: smok3 on 2008-05-20 16:31:45

Quote
You could use this codec for radio programs from locations

what would be the practical difference between 20 ms lag and 8 ms lag in this case?

One point worth raising is that AAC-LD as tested against CELT isn't 20 ms, it's somewhat more. Constrained to 20 ms, AAC-LD would likely do worse in the listening tests (just as CELT does worse when cranked down to 3 ms delay).

As far as lower delays go:

For one-way transmission? One advantage: if the codec delay is high, the encoded audio will arrive noticeably later than any unencoded path carrying the same sound and create an echo (e.g. in the case of simulcast radio, or monitoring at live events).

For two-way applications, my experience has been that acoustic echo cancellation works much better if latency is reduced, and that this continues all the way down to zero delay: it's always a win for the echo canceler to have less delay to deal with.

Even if a one-way delay >20 ms is acceptable for your application, CELT increases the distance at which you can achieve a particular total system delay. Signals travel in fiber at roughly 2/3 c, so CELT will extend the distance at which you can achieve a particular delay by 500-750 miles vs a 20 ms codec (assuming CELT is at 3-9 ms, the delay range CELT is currently useful for), or 1500-1650 miles vs 40 ms codecs (which is closer to what CELT was tested against). These are distance differences which are significant in the real world. Minimizing codec delay in long-distance applications is important since it's one of the only ways to constrain total system delay, as the speed of light is fairly non-negotiable.

In cases where lowish latency is somewhat important but signals are not being sent long distances, CELT still helps by reducing the latency pressure on other parts of the system: by using a lower-latency codec, more of the delay budget can be spent on long error-correction codes, interleaving delay on simplex channels, transmission equipment delays, jitter-buffering delays, and other overheads. Small frames also directly reduce interleaving delay. The difference between CELT and a 20 ms codec could be expected to 'pay' for all other delays in a typical system.

When compared to 40ms (or 34ms) CELT's delay advantages are even greater.

For packetized transmission, it is most likely the case that smaller frame sizes will tend to improve packet-loss robustness, though it's not clear exactly how the pay-off curve for this is shaped. Speech transmitted in CELT with 30% packet loss is fairly intelligible (and 5% sounds not too bad). This is too codec-dependent a factor to attribute directly to small frame sizes (especially considering that 1 lost frame reportedly can result in ~1 second of total loss for the FhG ULD codec), but I do believe that it is something we should expect as a general trend, since smaller packets mean that one packet loss loses less data.
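To make the "one packet loss loses less data" point concrete, here is a sketch assuming one frame per packet, independent random losses and no loss concealment. The 256-sample frame at 44.1 kHz used in the example is an assumption about CELT's configuration, not something stated in this thread, and the function names are hypothetical.

```c
/* Assumes one codec frame per packet and no packet-loss concealment:
   each lost packet leaves a silent gap exactly one frame long. */
static double gap_ms_per_lost_packet(int frame_size, int sample_rate)
{
    return 1000.0 * frame_size / sample_rate;
}

/* Expected number of gaps per second for independent losses with
   probability loss_rate: packets per second times the loss rate. */
static double gaps_per_second(int frame_size, int sample_rate, double loss_rate)
{
    return loss_rate * (double)sample_rate / frame_size;
}
```

With 256-sample frames at 44.1 kHz each loss leaves a gap of about 5.8 ms, versus about 11.6 ms for 512-sample frames. The expected amount of missing audio per second (loss rate times 1000 ms) is the same in both cases, so smaller frames trade fewer long gaps for more short ones.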

Finally... not relevant to a user shopping for a codec, but perhaps relevant to Hydrogenaudio: pushing the bounds of low-latency compressed audio is simply good for advancing the art. Near-transparency has already been achieved at decently low bitrates. How can codecs get better in the future? There is still a lot of room for advancement in the area of near-transparent bitrate, but we can also improve codecs by decreasing their delay, improving their error robustness, and decreasing their computational complexity. As each of these dimensions gets closer to perfection, incremental improvements become harder, but the insights learned can possibly be applied to improvements in the other performance dimensions as well.

(CELT also has other advantages unrelated to latency. I've only discussed the advantages which are arguably delay related here)

Reply #13 – 2008-05-21 17:43:33

Quote from: smok3 on 2008-05-20 16:31:45

Quote
You could use this codec for radio programs from locations

what would be the practical difference between 20 ms lag and 8 ms lag in this case?

Interacting between locations

Telos uses AAC-LD in their ISDN codecs; CELT might be a nice alternative, especially if used like the software codecs from audioTX, hopefully in open-source form.

Reply #14 – 2008-05-22 11:34:43

jmvalin

I'm curious about this release. I don't know much about this CELT codec, so can you please give a simple definition of what it does and what you mean by "low delay" sound? And what are the benefits compared to the Vorbis codec?

Thanks

Reply #15 – 2008-05-22 14:27:24

Quote from: Nicos on 2008-05-22 11:34:43

I'm curious about this release. I don't know much about this CELT codec, so can you please give a simple definition of what it does and what you mean by "low delay" sound? And what are the benefits compared to the Vorbis codec?

Let's say Alice's microphone is capturing sound in real time, encoding it with a codec and transmitting it to Bob, who decodes it and plays it through his speakers. Assuming that the capture, playback and transmission have zero delay (not true), there is still a delay introduced by the codec. If all you have is audio up to time T and you send that to an encoder and then to a decoder, the decoder will only be able to give you the audio up to time T-D, where D is the "algorithmic delay". For Vorbis, that delay is usually more than 100 ms, which means you can't really do VoIP with Vorbis. With CELT, the delay is 8.7 ms, which makes it possible not only to do VoIP, but to do cool things like playing music with a remote musician.
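The T-D relationship can be illustrated with a simplified model of a frame-based transform codec (a sketch, not CELT's actual buffering code): output is released only in whole frames of N samples, and each frame can only be finished once the L samples of lookahead (MDCT overlap) that follow it have been captured. In this model the lag between captured and decodable audio never exceeds N + L - 1 samples. The function names are hypothetical.

```c
/* Simplified model: returns how many samples the decoder can have
   produced once `captured` input samples exist, for a codec with
   frames of frame_size samples and `lookahead` samples of overlap. */
static long samples_decodable(long captured, int frame_size, int lookahead)
{
    if (captured < lookahead)
        return 0;
    /* only whole frames whose lookahead has also arrived are finished */
    long frames = (captured - lookahead) / frame_size;
    return frames * frame_size;
}

/* Lag D (in samples) between captured audio and decodable audio. */
static long current_lag(long captured, int frame_size, int lookahead)
{
    return captured - samples_decodable(captured, frame_size, lookahead);
}
```

Assuming N = 256 and L = 128 at 44.1 kHz (an assumed configuration that reproduces the 8.7 ms figure), N + L = 384 samples, or about 8.7 ms.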

Reply #16 – 2008-05-26 10:03:43

Quote from: jmvalin on 2008-05-16 23:38:35

The only AAC-LD implementation we found was from Apple in QuickTime Pro. All it allowed us to change was the bit-rate and a "quality" setting, which we set to "best". We figured out from the bit-stream that it was using 512-sample frames with 512-sample lookahead (MDCT overlap) and another 512 samples of delay for the bit reservoir.

Interesting. That really doesn't match what I remember having read about AAC-LD. I checked again and it turns out that the low-overlap window thing and a frame size of 480 samples are not mandatory.

Quote from: jmvalin on 2008-05-16 23:38:35

Also, keep in mind that the comparison isn't fair to CELT in the first place because AAC-LD and G.722.1C have 4x the delay (and of course larger frame sizes).

Impressive. Though algorithmic delays of 15-17.4 ms should be possible with AAC-LD depending on the sampling rate (44.1 or 48 kHz) and frame size (480 or 512 samples). With respect to algorithmic delay, AAC-LD is not that different from CELT except that the frame size is twice as big. That said, CELT beating AAC-LD quality-wise is a nice property as well.

Cheers!
SG
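All the delay figures in this thread come down to buffered samples divided by the sampling rate. A small sketch (the helper name is hypothetical) reproduces them under these assumptions: CELT uses 256-sample frames plus 128 samples of overlap at 44.1 kHz (assumed, because it gives 8.7 ms); Apple's AAC-LD buffers 512 + 512 + 512 samples, as measured in reply #5; and the low-overlap AAC-LD window overlaps by half a frame (an assumption that happens to match the 15-17.4 ms range quoted above).

```c
/* Algorithmic delay in milliseconds for a codec that must buffer one
   frame of `frame` samples plus `extra` samples of lookahead/overlap
   (and any bit-reservoir delay) at `rate` samples per second. */
static double delay_ms(int frame, int extra, int rate)
{
    return 1000.0 * (frame + extra) / rate;
}
```

delay_ms(256, 128, 44100) is about 8.7 ms, while delay_ms(512, 1024, 44100) is about 34.8 ms, exactly four times as much, which lines up with the 4x and 34 ms figures earlier in the thread. delay_ms(480, 240, 48000) gives 15 ms and delay_ms(512, 256, 44100) about 17.4 ms.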

Reply #17 – 2008-05-26 16:19:23

Quote from: SebastianG on 2008-05-26 10:03:43

Impressive. Though algorithmic delays of 15-17.4 ms should be possible with AAC-LD depending on the sampling rate (44.1 or 48 kHz) and frame size (480 or 512 samples). With respect to algorithmic delay, AAC-LD is not that different from CELT except that the frame size is twice as big. That said, CELT beating AAC-LD quality-wise is a nice property as well.

Well, keep in mind that CELT's 8.7 ms delay is currently the largest delay it supports. I've done experiments down to 2 ms delay, though of course that requires higher bit-rate. As for AAC-LD, I'd be curious to see how it does with lower delay because in the implementation I was testing against (the only one I could find), it was not only using a longer window, but also a bit reservoir that would certainly have helped it on samples like the castanets (probably others too).

That being said, I'm aware that CELT still needs more work. The two main things I want to focus on now are stereo coupling and adding a psychoacoustic model along with dynamic bit allocation (currently, the bit allocation is fixed for each band).

Updated: formatting

Reply #18 – 2008-07-26 20:33:36

Is there a Windows binary / compile somewhere?

Or how do I compile this for use in Windows ACM?

Reply #19 – 2008-07-27 06:17:11

Quote from: Arkadas on 2008-07-26 20:33:36

Is there a Windows binary / compile somewhere?

Or how do I compile this for use in Windows ACM?

If you have a Cygwin environment set up, you should be able to build the library and the example encoder/decoder app pretty easily using the configure script included with the archive.

Actually including support in an app would involve linking against the library. The API is pretty much just like speex. The celtenc/celtdec give a fair example.
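The thread doesn't spell out CELT's function signatures, so rather than guess at them, this sketch only covers the arithmetic a Speex-style application typically needs: choosing a fixed number of compressed bytes per frame for a target bitrate. It assumes constant-bitrate operation and 256-sample frames at 44.1 kHz, and the helper names are hypothetical.

```c
/* Hypothetical helper: compressed bytes per frame for a constant target
   bitrate, assuming every frame is coded into the same number of bytes.
   Rounds down so the stream never exceeds the target bitrate. */
static int bytes_per_frame(long bitrate_bps, int frame_size, long sample_rate)
{
    long long bits = (long long)bitrate_bps * frame_size; /* bits * rate */
    return (int)(bits / (8LL * sample_rate));
}

/* Bitrate actually produced when every frame uses `bytes` bytes. */
static double effective_bitrate(int bytes, int frame_size, long sample_rate)
{
    return 8.0 * bytes * sample_rate / frame_size;
}
```

At 44.1 kHz with 256-sample frames, 48 kbps gives 34 bytes per frame and 64 kbps gives 46 bytes per frame; because of the rounding down, 46 bytes actually runs at about 63.4 kbps.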

I'm not sure how useful an ACM wrapper would be, since taking advantage of the low-latency nature of CELT usually requires some other considerations in the application.
