Topic: Multiformat listening test @ ~64kbps: Results

Multiformat listening test @ ~64kbps: Results

Reply #75
From the previous tests, the Nero and Apple encoders are proven to be the best AAC encoders, so with the data at hand I strongly believe you are wrong.

Has anyone ever seriously blind-tested e.g. Dolby's and Fraunhofer's HE-AAC encoders in the last few years, especially at these bit rates? I'd be more than happy to test them here, provided I can choose their versions.

Chris
If I don't reply to your reply, it means I agree with you.

Multiformat listening test @ ~64kbps: Results

Reply #76
Quote
Has anyone ever seriously blind-tested e.g. Dolby's and Fraunhofer's HE-AAC encoders in the last few years, especially at these bit rates? I'd be more than happy to test them here, provided I can choose their versions.


The CT/Dolby encoder was last tested here (AFAIK): http://www.mp3-tech.org/tests/aac_48/results.html
The latest version I have is 8.2.0 (found in the latest Winamp version). Igor took a look at this and concluded it didn't seem to have improved much, if at all, since the 7.x series. I don't know if that is an accurate assessment, but it should be easy to verify. I don't know of any newer versions, and 8.2 is from 2009.

Nobody has ever tested Fraunhofer's HE-AAC encoder. Their MP3 encoder is freely available and was tested in the past, but I couldn't find a free AAC encoder, which probably explains the complete lack of interest.

I'm not sure what you mean by "I'd be more than happy to test them here", but if you are able to provide copies of those codecs so they can be tested further, including in future listening tests, that is probably a good idea. And if they do win those, your criticism is well-founded. Until then, I see little point.

Multiformat listening test @ ~64kbps: Results

Reply #77
Sorry, but being an AAC developer I have to stress this: the Nero and Apple encoders have never been proven to be the best encoders around.


I don't know much about all the HE-AAC implementations around, but keep in mind that we're talking about two implementations that have matured over several years compared to the Opus *reference* implementation that is still very immature in terms of tuning. So if anything, I think the comparison would be biased against Opus. Not to mention that Opus makes quite a few quality-affecting compromises to achieve its low delay. To be fair, I'm probably just as surprised as you are with these results.

Multiformat listening test @ ~64kbps: Results

Reply #78
I'm stunned by the CELT/Opus results! I would have assumed that your toolbox is smaller than usual when you are targeting low delay. And now CELT even beats the others by a wide margin.
Thanks for the great work, guys!


We were surprised when we started getting competitive with high-latency codecs too, but we've had a little while to get used to that.  HE-AAC was a bit more surprising, especially since so many things were going against us (the immaturity of our encoder, its inadequacy for this application (VBR), etc.)

Low-latency work implies some serious compromises, but it's not all bad.  Small transform sizes automatically give you better time domain noise shaping, for example.  There have been a lot of people that liked codecs like MPC at high rates for the reduced time-domain-liability they imply.  Simply getting many little details right helps reduce the harm of low-latency.  E.g. we use a trivial time domain pre-whitening before the transform so that the quantization noise from spectral leakage is less exposed.
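
In case it helps to picture it: the sort of pre-whitening described here can be sketched as a first-order pre-emphasis filter before the transform, undone by the matching de-emphasis after the inverse transform at the decoder. This is illustrative Python, not the actual Opus code, and the 0.85 coefficient is just an example value.

Code:
import numpy as np

def preemphasis(x, alpha=0.85):
    # y[n] = x[n] - alpha * x[n-1]: flattens the typically low-frequency-heavy
    # audio spectrum before the transform, so quantization noise from spectral
    # leakage is less exposed.
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - alpha * x[:-1]
    return y

def deemphasis(y, alpha=0.85):
    # Decoder-side inverse: x[n] = y[n] + alpha * x[n-1].
    x = np.empty_like(y)
    prev = 0.0
    for n in range(len(y)):
        prev = y[n] + alpha * prev
        x[n] = prev
    return x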

We also expanded the low-latency toolbox some. For example, we have short and long frames without window switching (and the look-ahead latency that requires). We're also using other things like range coding and high-dimensional vector quantization which might not have been great options a number of years ago. Our decoder is currently quite a bit slower than AAC decoders (though it's not optimized, so it's hard to say how much the ultimate difference will be), but since we're mostly targeting interactive use we were able to "pay" for decoder complexity increases with encoder decreases: we're an order of magnitude faster on the encode side than our high-latency competition (no explicit psy-model!). With some fairly modest compromises you can make an Opus audio mode (the CELT part) encoder which is basically the same speed as the decoder. Though CPU-cycle hungry, Opus uses a very small amount of memory, which eliminates one of the embedded hurdles Vorbis suffered.

I also like to think that the ultra-low-latency support has fostered some beneficial discipline: every single fractional bit of signaling overhead and frame redundancy counts a lot with 2.5 ms frames. While it's not so important with larger frames, waste is waste, and Opus has very little of it. A very high percentage of the bits in Opus frames go directly to improving the audio-band SNR, and very few go to shuffling around coding modes or switching between inconsequentially different alternatives. Signaling bits are sometimes helpful, sometimes _very_ helpful... but bits spent coding signal data are pretty much always helpful. We came up with a number of clever ways of eliminating signaling, and Opus is able to provide a true hard CBR (no bitres!) in audio modes which is super efficient (uses every bit in the frame) and actually sounds really good.

For music at lower rates, I expect that HE-AAC would win— we simply start to fall apart once the number of bytes in the frame gets too small.  Speech is another matter, and Opus should do quite well there down to very low rates, owing to the merged-in SILK technology from Skype.

Opus should also scale well to higher rates— it is not using any highly parametric techniques that don't respond to additional bits— though the lack of a mature encoder will probably still give other codecs the edge in many cases.  This is especially true for exposed multi-tonal samples like the sample 02 in this set, though multi-tonal killers are fairly uncommon and I expect that we can fix them with VBR in the same way large block codecs fix transients...

I also think that aggressive tuning of these HE-AAC encoders could put them back in the lead, or at least strongly tie Opus. From my own listening, I think a lot of the difference between Nero and Apple in the test was due to the lowpass difference, for example.  That said, the HE-AAC format is mature (and by some accounts now stagnant), and we have a lot of low-hanging fruit.

I feel that Vorbis loses for a sad reason at this rate and lower: the Vorbis toolbox doesn't have any good tools to avoid having to low-pass at rates well below its initial design goals; at higher rates it obviously does much better than it did here.  The lack of efficient tools for low-rate HF is a real shortcoming at this rate, but not one which is all that interesting from an engineering/competitive basis.

Cheers,

Multiformat listening test @ ~64kbps: Results

Reply #79
One thing which catches my attention here is that Nero's VBR seems to flex much more than Apple's VBR, but on this sample set it neglected to increase the bitrate, i.e. it somehow failed to recognize these samples were difficult.

What exactly does the "Per-sample distribution" graph show?

Hm,
Nero had moderate bitrate variation in previous public tests as well.

I feel that Vorbis loses for a sad reason at this rate and lower: the Vorbis toolbox doesn't have any good tools to avoid having to low-pass at rates well below its initial design goals; at higher rates it obviously does much better than it did here.  The lack of efficient tools for low-rate HF is a real shortcoming at this rate, but not one which is all that interesting from an engineering/competitive basis.

Yes, but look at how aoTuV 6b juggles the bitrate. It deserves a "bravo".
Some listeners really like Vorbis (including me).
My results are:
Vorbis - 2.94
Nero - 2.75
Apple - 3.00
Opus - 3.58

It's also a matter of taste, as it says here:
I figured ratings would vary between testers depending on which of pre-echo, lowpass, ringing, warble and grittiness is more objectionable. Furthermore, on the Bohemian Rhapsody sample, source warbling had me very confused for a while.


Multiformat listening test @ ~64kbps: Results

Reply #80
We also expanded the low-latency toolbox some. For example, we have short and long frames without window switching (and the look-ahead latency that requires). We're also using other things like range coding and high-dimensional vector quantization which might not have been great options a number of years ago. Our decoder is currently quite a bit slower than AAC decoders (though it's not optimized, so it's hard to say how much the ultimate difference will be), but since we're mostly targeting interactive use we were able to "pay" for decoder complexity increases with encoder decreases: we're an order of magnitude faster on the encode side than our high-latency competition (no explicit psy-model!). With some fairly modest compromises you can make an Opus audio mode (the CELT part) encoder which is basically the same speed as the decoder. Though CPU-cycle hungry, Opus uses a very small amount of memory, which eliminates one of the embedded hurdles Vorbis suffered.


Is the decoder spec finalized yet?  Would be interesting to see about getting a fixed point version running in Rockbox.

Multiformat listening test @ ~64kbps: Results

Reply #81
Is the decoder spec finalized yet?  Would be interesting to see about getting a fixed point version running in Rockbox.


We're in a soft-freeze right now. We're really not planning on changing it, but we're also not yet making any promises not to: if something awful comes up, we will. The whole process is dependent on progress in the IETF now, which appears to have gone super-political and thus slow, for the moment.  The Ogg encapsulation for it is certainly not final at the moment; it's been due for a redo for over a year now, but that was pushed back because we were working on finishing the codec itself.

So for the CELT part the reference implementation (libcelt) is both a fixed point implementation and a floating point implementation, through the magic of unholy C macros. The SILK part has split fixed/float code, and the combination is float only at the moment, but I think this is mostly just a build system issue.  I'd be glad to work with whomever to get it working on whatever.  Feel free to hop into the #celt  channel on irc.freenode.net.


Multiformat listening test @ ~64kbps: Results

Reply #82
I don't know much about all the HE-AAC implementations around, but keep in mind that we're talking about two implementations that have matured over several years compared to the Opus *reference* implementation that is still very immature in terms of tuning. So if anything, I think the comparison would be biased against Opus.


We only test implementations because I don't think we can sensibly test a specification. The best available implementation of Opus was compared to the best available implementations (that we know of) of HE-AAC. Where exactly is there a bias against Opus here?

You are arguing from the belief that Opus implementations can improve faster and more than HE-AAC implementations can. There may be arguments to support that belief, but this only means that a future test might be expected to show an increasing advantage for Opus. It certainly doesn't mean the current test is biased.

Quote
Not to mention that Opus makes quite a few quality-affecting compromises to achieve its low delay. To be fair, I'm probably just as surprised as you are with these results.


I agree (on both).

Multiformat listening test @ ~64kbps: Results

Reply #83
I'm not sure what you mean by "I'd be more than happy to test them here", but if you are able to provide copies of those codecs so they can be tested further, including in future listening tests, that is probably a good idea. And if they do win those, your criticism is well-founded. Until then, I see little point.

Yes, pointing to encoders and settings is what I mean. Sorry for not making it clear. I'll let you know once the encoders of my choice become available somewhere for all to try out.

Maybe some background information: my (and a colleague's) full-time job over the last 3 years has been to improve Fraunhofer's HE-AAC encoder. I'm quite confident that I actually made some progress, at least over Fraunhofer's older encoder versions. I'm not expecting my "latest greatest" work to win over Opus in a test like this one (because the latter sounds really good), but I hope that it would be tied on average.

I'll have to disagree (of course), Jean-Marc. The fact that Apple's HE-AAC implementation is quite new doesn't convince me it's fully mature. From Wikipedia:

Quote
As of September 2009, Apple has added support for HE-AAC (which is fully part of the MP4 standard) but iTunes still lacks support for true VBR encoding.

Chris
If I don't reply to your reply, it means I agree with you.

Multiformat listening test @ ~64kbps: Results

Reply #84
You are arguing from the belief that Opus implementations can improve faster and more than HE-AAC implementations can. There may be arguments to support that belief, but this only means that a future test might be expected to show an increasing advantage for Opus. It certainly doesn't mean the current test is biased.


Sorry, wrong choice of words. I meant to say that Opus (as a spec) was at a sort of disadvantage in this test compared to AAC, which had more mature encoders. I do not in any way suggest that the test itself had a problem or that IgorC or you should have done anything different. Mainly I was responding to Chris' comment about Apple AAC not being the best encoder out there.

Multiformat listening test @ ~64kbps: Results

Reply #85
Low-latency work implies some serious compromises, but it's not all bad.  Small transform sizes automatically give you better time domain noise shaping, for example.


But not enough, given the presence of block-switching in Opus.

I could imagine the more frequent transmission of band energies is very useful. It should also help the folding. IIRC, SBR has issues with not being able to adapt fast enough time-domain-wise in some circumstances.

Quote
I also like to think that the ultra-low-latency support has fostered some beneficial discipline: every single fractional bit of signaling overhead and frame redundancy counts a lot with 2.5 ms frames. While it's not so important with larger frames, waste is waste, and Opus has very little of it. A very high percentage of the bits in Opus frames go directly to improving the audio-band SNR, and very few go to shuffling around coding modes or switching between inconsequentially different alternatives. Signaling bits are sometimes helpful, sometimes _very_ helpful... but bits spent coding signal data are pretty much always helpful. We came up with a number of clever ways of eliminating signaling, and Opus is able to provide a true hard CBR (no bitres!) in audio modes which is super efficient (uses every bit in the frame) and actually sounds really good.


I think you have a good advantage over (LC)AAC here: AAC allows very fine control of the quantization, but at a severe signaling cost. At least in AAC codec design, you can spend a very long time on the complicated question of doing the joint R/D optimization quickly, and end up not using the fine control at all because it just eats too many bits. This gets worse at low bitrates. It doesn't help that AAC only uses Huffman coding, and not arithmetic/range coding. H.264 also has many possible modes, but I presume CABAC helps mitigate the signaling cost (maybe someone who is more familiar with that particular codec can confirm/deny).

Is the almost complete lack of signaling in Opus related to the decision not to use range coding contexts? If I read the spec correctly, almost all of the range coding assumes a uniform distribution.

The 3GPP reference code (which is actually Fraunhofer fastaac, as far as I know) shows that you can make a decent AAC codec even while ignoring most of the psychoacoustics or greatly simplifying it, so I'm not surprised you eliminated the explicit psymodel entirely. It's a surprisingly small part of the codec's efficiency (I'm not saying it's not important - it is - but less than you would think at first). VBR is another matter, though it's also surprisingly hard to make consistently good decisions there.

Quote
For music at lower rates, I expect that HE-AAC would win— we simply start to fall apart once the number of bytes in the frame gets too small.


This is a bit surprising to me because of the above. Are there technical limitations that cause this?

Quote
This is especially true for exposed multi-tonal samples like the sample 02 in this set, though multi-tonal killers are fairly uncommon and I expect that we can fix them with VBR in the same way large block codecs fix transients...


Did the codec used in this test use the tonal pre/postfilter from Broadcom?

Quote
I feel that Vorbis loses for a sad reason at this rate and lower: the Vorbis toolbox doesn't have any good tools to avoid having to low-pass at rates well below its initial design goals; at higher rates it obviously does much better than it did here.  The lack of efficient tools for low-rate HF is a real shortcoming at this rate, but not one which is all that interesting from an engineering/competitive basis.


I understand this as saying that Vorbis would be easily competitive if it had something like SBR or folding. The experience with LC-AAC vs HE-AAC seems to support that.

Multiformat listening test @ ~64kbps: Results

Reply #86
Is the almost complete lack of signaling in Opus related to the decision not to use range coding contexts? If I read the spec correctly, almost all of the range coding assumes a uniform distribution.


The CELT part of Opus uses range coding more for convenience than absolute necessity. I once did some simulations of using simpler coding (e.g. Golomb) instead of range coding, and the loss was about 2-3 bits/frame. Of course, some of the features we later added would have been a pain to implement without range coding, but nothing impossible. The most important symbols we code either have flat probabilities or use a Laplace distribution, which Golomb codes model well.
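
As an illustration of that last point (a sketch, not the Opus bitstream): a signed, Laplace-like residual can be zig-zag mapped to a non-negative integer and then coded with a Golomb-Rice code, which is near-optimal for geometric distributions. The parameter k below is just an example.

Code:
def golomb_rice(v, k=2):
    # Zig-zag map a signed value (0, -1, 1, -2, 2, ...) to (0, 1, 2, 3, 4, ...)
    # so a two-sided Laplace-like distribution becomes roughly geometric.
    u = 2 * v if v >= 0 else -2 * v - 1
    # Unary-coded quotient followed by k fixed remainder bits.
    q, r = u >> k, u & ((1 << k) - 1)
    return "1" * q + "0" + format(r, "0{}b".format(k))

# Small residuals get short codes, larger ones get longer codes.
for v in (-2, -1, 0, 1, 2, 7):
    print(v, golomb_rice(v))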

Quote
Quote
For music at lower rates, I expect that HE-AAC would win— we simply start to fall apart once the number of bytes in the frame gets too small.


This is a bit surprising to me because of the above. Are there technical limitations that cause this?


The reason I'd expect us to eventually lose at some lower bit-rate is simply the fact that we have no SBR and (at even lower rates) no parametric stereo. But I'm fine with that. Opus was never intended to go even as low as 64 kb/s for stereo music, so I'm already pretty happy with our performance.

Quote
Quote
This is especially true for exposed multi-tonal samples like the sample 02 in this set, though multi-tonal killers are fairly uncommon and I expect that we can fix them with VBR in the same way large block codecs fix transients...


Did the codec used in this test use the tonal pre/postfilter from Broadcom?


Yes, it probably helped a bit but it doesn't do miracles. This is one of the sacrifices we make for having low delay (the lower MDCT overlap causes more leakage).


Multiformat listening test @ ~64kbps: Results

Reply #87
Low-latency work implies some serious compromises, but it's not all bad.  Small transform sizes automatically give you better time domain noise shaping, for example.

But not enough, given the presence of block-switching in Opus.
I could imagine the more frequent transmission of band energies is very useful. It should also help the folding. IIRC, SBR has issues with not being able to adapt fast enough time-domain-wise in some circumstances.


If we'd only been comparing ourselves to G.719/G.722.1c we probably wouldn't have done as much as we've done for transients; only through thoroughly unfair comparisons of our CBR behavior to things like Vorbis were we motivated enough to really do something about it here.

Amusingly, we don't currently do the kind of block switching that increases the coarse-energy temporal resolution (the format allows it, we just don't do it).

The format supports frame sizes of 2.5, 5, 10, and 20 ms; all use the same 2.5 ms window overlap.  The format can switch on the fly between any of these sizes, but the current encoder doesn't do this automatically (you can ask it to).  We needed the sizes to cover all the latency use-cases, but they're potentially useful for coding even if you don't care about latency.  There are clearly cases where switching to a higher signaling rate than the 20 ms frames give you is beneficial.
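
For concreteness, here is what those sizes work out to at 48 kHz, assuming the fixed 2.5 ms overlap is simply added to the frame length (consistent with the 22.5 ms window figure mentioned later in the thread). Illustrative Python only:

Code:
FS = 48000        # Hz
OVERLAP_MS = 2.5  # fixed window overlap for all frame sizes

for frame_ms in (2.5, 5, 10, 20):
    frame_samples = int(FS * frame_ms / 1000)
    window_ms = frame_ms + OVERLAP_MS
    window_samples = int(FS * window_ms / 1000)
    print("%4.1f ms frame = %4d samples, window %4.1f ms = %4d samples"
          % (frame_ms, frame_samples, window_ms, window_samples))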

The switching we do have: for any of the {5, 10, 20} ms sizes there is a 'transient frame' bit that switches to the 2.5 ms transform, so e.g. a 20 ms frame would have 8 of them.  They are grouped by band and normalized by band, so the coarse energy resolution doesn't necessarily go up, and the side information rate doesn't go up (much).  During quantization we can apply special T/F processing to boost or lower (for transient frames) the effective time-domain resolution on a band-by-band basis.

At higher rates, when our 32-bit algebraic codebook limitation arises (we artificially limit the VQ symbols to ~32 bits to avoid the need to do 64-bit arithmetic, plus some other limits to tame memory requirements), bands get subdivided in dimension, and for transient blocks (or blocks which have been time-boosted) the subdivision is set up so that it subdivides in time.  When this subdivision occurs, additional energy data is coded (basically the balance of the energy on each half, so that the resulting vectors retain the unit norm required by our spherical VQ), and in that case the energy resolution increases.
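
A rough sketch of where the ~32-bit limit bites (illustrative Python, not libopus): the number of PVQ codewords for a band of dimension N with K unit pulses is V(N, K), given by the usual PVQ recurrence, and once log2 of V(N, K) would exceed about 32 bits the band has to be subdivided as described above.

Code:
from functools import lru_cache
from math import log2

@lru_cache(maxsize=None)
def V(N, K):
    # Number of integer vectors of dimension N with sum(|x_i|) == K
    # (the PVQ codebook size), via the standard recurrence.
    if K == 0:
        return 1
    if N == 0:
        return 0
    return V(N - 1, K) + V(N, K - 1) + V(N - 1, K - 1)

def must_split(N, K, limit_bits=32):
    return log2(V(N, K)) > limit_bits

print(log2(V(16, 10)))     # ~28.5 bits: the codeword index still fits in 32 bits
print(must_split(32, 40))  # True: a wide band at a high rate needs subdividing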

Regardless of the energy resolution, the sparseness preservation code makes a fair bit of effort to produce output which has the same time domain distribution as the original signal.

Quote
I think you have a good advantage over (LC)AAC here: AAC allows very fine control of the quantization, but at a severe signaling cost. At least in AAC codec design, you can spend a very long time on the complicated question of doing the joint R/D optimization quickly, and end up not using the fine control at all because it just eats too many bits. This gets worse at low bitrates. It doesn't help that AAC only uses Huffman coding, and not arithmetic/range coding. H.264 also has many possible modes, but I presume CABAC helps mitigate the signaling cost (maybe someone who is more familiar with that particular codec can confirm/deny).

Is the almost complete lack of signaling in Opus related to the decision not to use range coding contexts? If I read the spec correctly, almost all of the range coding assumes a uniform distribution.


Few of our signaling parameters have a uniform distribution.  We don't use _adaptive_ contexts for most of the signaling because most of the signaling is a single symbol per frame— and we can't adapt across frames due to needing to tolerate loss— but we do have static probabilities, allowing for R/D decisions on the signaling and for making uncommon options very cheap (tiny fractions of a bit when not in use).  In the cases where there really are multiple correlated signaling symbols (the per-band T/F changes and the band bitrate boost symbols come to mind), we do adapt the probability.
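
To make the "tiny fractions of a bit" point concrete (the numbers are illustrative, not the actual Opus probabilities): with a static probability p for a rarely-used flag, an ideal entropy coder charges -log2(1-p) bits when the flag is off and -log2(p) bits when it is on, whereas a Huffman-coded flag costs at least a whole bit either way.

Code:
from math import log2

def cost_bits(p):
    # Ideal entropy-coder cost of an event with probability p.
    return -log2(p)

p_on = 1 / 32.0  # example static probability for a rarely-used option
print("flag off: %.3f bits" % cost_bits(1 - p_on))  # ~0.046 bits per frame
print("flag on : %.3f bits" % cost_bits(p_on))      # 5.000 bits when actually used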

The coarse energy is all entropy coded, with a static PDF that agrees pretty well with most data. Again, loss robustness prevents us from having much useful adaptation, and the autoregressive inter/intra frame prediction at least makes sure the mean of the assumed distribution is right.

The VQ is uniformly coded and it accounts for most of the bits in the frame, as you mention— but after dumping a lot of data I found that the actual symbols themselves were quite uniform.  There might have been something we could have done with the signs if we used an alternative algebraic representation, but having very predictable bitrates from our VQ at lower resolutions turned out to be helpful for the bit-allocation behavior in any case.

Quote
Quote
For music at lower rates, I expect that HE-AAC would win— we simply start to fall apart once the number of bytes in the frame gets too small.

This is a bit surprising to me because of the above. Are there technical limitations that cause this?


Well, a couple.  For one, I understand that HE-AAC is using 40 ms frames (1024 samples at half-rate, no?).  That is a lot of effective signaling reduction that we miss, even if we're more efficient to start with.  Our shorter transforms and tiny window make the transform very leaky.  The reduced compaction means that signals are naturally less sparse, so at the very limits of resolution they fall apart more suddenly.

Because we always preserve the energy to at least 6 dB resolution, the energy rate does not change as the rate decreases; at low enough rates we're spending a lot of bits there.  In particular, sometimes the energy bit rate bursts rather high and uses up almost all of the bits, which is bad for quality even if it only happens fairly rarely.  A smarter encoder than our current one could use dynamic programming to do R/D optimization of the coarse energy, but since this is only applicable to distorting the 6 dB resolution data, it would only be applicable at very low rates.  A smarter encoder could also adjust the end-band position (low-passing) to skip the coding of HF when it will be inaudible, but I think both of these would require a reliable psy-model and a lot of tuning in order to not be a liability.

You'd think that reduced side information would be a benefit at low rates, but it isn't always—  we can't e.g. precisely place a single tone in a band without coding enough resolution for the whole band.  When you just don't have enough bits, using them exactly where you want them is more important than when you have more bits, and we have fairly little control.  (And what control we do have, the encoder doesn't make great use of currently.)  We also don't have parametric stereo other than a kind of band-energy-intensity-only stereo. (We have quite clever stereo coding overall, but it isn't the sort of clever that makes very low rates work well; plus, I think we've never heard a parametric stereo we actually liked.)

Quote
Quote
This is especially true for exposed multi-tonal samples like the sample 02 in this set, though multi-tonal killers are fairly uncommon and I expect that we can fix them with VBR in the same way large block codecs fix transients...

Did the codec used in this test use the tonal pre/postfilter from Broadcom?


Yes, but it's not really that helpful for that kind of sample. The filter does a fairly narrow comb-shaped noise shaping. It can make a dramatic improvement on simple harmonic signals (like speech, or exposed tonal instruments such as a trumpet or clarinet, even with background sound), but on samples where there are many exposed tones which aren't simply harmonically related it doesn't do much.  Those signals also probably throw off the encoder's search, so even if some weak use of the filter could improve things there, it probably isn't using it usefully right now.
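
For readers unfamiliar with this kind of filter, a minimal sketch (not the exact Opus prefilter/postfilter, which is more refined; the period and gain here are placeholders): the encoder subtracts a scaled copy of the signal delayed by the pitch period, and the decoder applies the exact inverse, so quantization noise added in between ends up boosted under the harmonic peaks, where it is masked, and attenuated in the valleys between them.

Code:
import numpy as np

def comb_prefilter(x, period, gain):
    # Encoder side: y[n] = x[n] - gain * x[n - period]
    y = x.copy()
    y[period:] -= gain * x[:-period]
    return y

def comb_postfilter(y, period, gain):
    # Decoder side (exact inverse): x[n] = y[n] + gain * x[n - period].
    # Noise added between the two filters is shaped by 1/(1 - gain*z^-period),
    # a comb that peaks at the harmonics and dips between them.
    x = y.copy()
    for n in range(period, len(y)):
        x[n] = y[n] + gain * x[n - period]
    return x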

Quote
Quote
I feel that Vorbis loses for a sad reason at this rate and lower: the Vorbis toolbox doesn't have any good tools to avoid having to low-pass at rates well below its initial design goals; at higher rates it obviously does much better than it did here.  The lack of efficient tools for low-rate HF is a real shortcoming at this rate, but not one which is all that interesting from an engineering/competitive basis.


I understand this as saying that Vorbis would be easily competitive if it had something like SBR or folding. The experience with LC-AAC vs HE-AAC seems to support that.


Yes, primarily. Or even if it could get away with higher-dimensional VQ with acceptable memory/complexity it would be somewhat better off.  Like MP3, Vorbis's only way to cope with low-rate signals is eventually to throw out hunks of the spectrum.  It's better than MP3 in this regard as it has more control over where it throws things away, but ultimately leaving holes in the spectrum is not a great thing to do.



Multiformat listening test @ ~64kbps: Results

Reply #88
The switching we do have: for any of the {5, 10, 20} ms sizes there is a 'transient frame' bit that switches to the 2.5 ms transform, so e.g. a 20 ms frame would have 8 of them.  They are grouped by band and normalized by band, ...

Sounds similar to AAC, actually. May I ask, what do you mean by grouping and normalizing?

Quote
I understand that HE-AAC is using 40ms frames (1024 samples at half-rate, no?)

Yes, at 48 kHz output sampling rate and when using dual-rate SBR (46 ms in the test configuration). But since there's 50% block overlap, a frame spans up to 80 ms (up to 93 ms in the test). Which is a bit on the high side if you ask me, but that's how it is. Maybe that explains why CELT does so well in this test: with its 20-ms framing it might actually be closer to the optimum than HE-AAC.
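
The arithmetic behind those figures for the test configuration, for anyone who wants to check it (illustrative Python):

Code:
fs_out = 44100.0          # output sampling rate used in the test
fs_core = fs_out / 2      # dual-rate SBR: the AAC core runs at half rate
frame = 1024              # core frame length in samples

frame_ms = 1000 * frame / fs_core      # ~46.4 ms between frames
span_ms = 1000 * 2 * frame / fs_core   # ~92.9 ms per MDCT with 50% overlap
print("frame spacing %.1f ms, transform span %.1f ms" % (frame_ms, span_ms))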

Chris
If I don't reply to your reply, it means I agree with you.

Multiformat listening test @ ~64kbps: Results

Reply #89
The switching we do have: for any of the {5, 10, 20} ms sizes there is a 'transient frame' bit that switches to the 2.5 ms transform, so e.g. a 20 ms frame would have 8 of them.  They are grouped by band and normalized by band, ...

Sounds similar to AAC, actually. May I ask, what do you mean by grouping and normalizing?


It means that if (e.g.) a 20 ms frame is split into 8 short blocks, then there's only *one* energy value encoded per band. That value is the sum of the energies of all the short blocks. Normalizing is what CELT does in each band before applying the PVQ encoding (normalizing happens for all frames, not just transients).
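
A minimal sketch of that grouping and normalization (illustrative Python; the band boundaries and data below are made up, not the actual Opus band layout):

Code:
import numpy as np

def group_and_normalize(short_blocks, band_slices):
    # short_blocks: MDCT coefficients of the short blocks in one frame
    # (e.g. 8 blocks for a 20 ms transient frame).
    # For each band: one grouped energy (summed over all short blocks),
    # and the band's coefficients scaled to unit norm before PVQ.
    energies, unit_shapes = [], []
    for sl in band_slices:
        band = np.concatenate([blk[sl] for blk in short_blocks])
        e = float(np.sum(band ** 2))
        g = np.sqrt(e) if e > 0 else 1.0
        energies.append(e)
        unit_shapes.append(band / g)
    return energies, unit_shapes

blocks = [np.random.randn(120) for _ in range(8)]        # 8 short blocks
bands = [slice(0, 8), slice(8, 16), slice(16, 32), slice(32, 120)]
energies, shapes = group_and_normalize(blocks, bands)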

Quote
Quote
I understand that HE-AAC is using 40ms frames (1024 samples at half-rate, no?)

Yes, at 48 kHz output sampling rate and when using dual-rate SBR (46 ms in the test configuration). But since there's 50% block overlap, a frame spans up to 80 ms (up to 93 ms in the test). Which is a bit on the high side if you ask me, but that's how it is. Maybe that explains why CELT does so well in this test: with its 20-ms framing it might actually be closer to the optimum than HE-AAC.


For a 20 ms frame size, the CELT window is only 22.5 ms, about 4x shorter than HE-AAC's. That makes a big difference. That's probably the single biggest limitation imposed by the low-delay constraint, and that's why I was really surprised by the quality we were able to get in this test. Had we not had this constraint, just increasing the MDCT overlap could have provided a big improvement in quality.

Multiformat listening test @ ~64kbps: Results

Reply #90


The second graph seems to be consistent with "Results from Dr. Christian Hoene for ITU-T Workshop last September". PEAQ is not supposed to work well for HE-AAC, but Vorbis has bad scores as well.


Multiformat listening test @ ~64kbps: Results

Reply #91
The second graph seems to be consistent with "Results from Dr. Christian Hoene for ITU-T Workshop last September". PEAQ is not supposed to work well for HE-AAC, but Vorbis has bad scores as well.


PEAQ is known to be horrible at comparing codecs. At best it can help tuning a codec when the tuning being done is not related to psycho-acoustics. We've known for a long time that it tends to give CELT higher scores than it deserves, so we've never really relied on it for comparing to other codecs.

Multiformat listening test @ ~64kbps: Results

Reply #92
Bitrate verification on my set of albums:


http://www.hydrogenaudio.org/forums/index....st&p=752009

IgorC, in your set of albums, which version of aoTuV did you use to encode to Vorbis? I assume that, as it is your own collection, you may not always re-encode your whole collection when new versions of the encoder come out (I certainly don't). I find that aoTuV b6.02 produces files consistently larger than those produced by aoTuV b5.7, which may explain why the average bitrate in the listening test is so high. I wouldn't be surprised if b5.7 produces files averaging ~64-68 kbps, but I would be less certain about b6.02... Then again, it could just be coincidence that, for the times I checked, b6.02 tends to produce larger files...

EDIT: I just realised I'm being a bit silly, who listens to 64kbps files for pleasure anyway? You must have used the current version.

Multiformat listening test @ ~64kbps: Results

Reply #93
IgorC, in your set of albums, which version of aoTuV did you use to encode to Vorbis? I assume that, as it is your own collection, you may not always re-encode your whole collection when new versions of the encoder come out (I certainly don't). I find that aoTuV b6.02 produces files consistently larger than those produced by aoTuV b5.7, which may explain why the average bitrate in the listening test is so high. I wouldn't be surprised if b5.7 produces files averaging ~64-68 kbps, but I would be less certain about b6.02... Then again, it could just be coincidence that, for the times I checked, b6.02 tends to produce larger files...

EDIT: I just realised I'm being a bit silly, who listens to 64kbps files for pleasure anyway? You must have used the current version.


The bitrates were independently verified here:

http://www.hydrogenaudio.org/forums/index....st&p=751888

Multiformat listening test @ ~64kbps: Results

Reply #94
IgorC, in your set of albums, which version of aoTuV did you use to encode to Vorbis?

It was the latest version, aoTuV b6.02 beta.

Multiformat listening test @ ~64kbps: Results

Reply #95
EDIT: I just realised I'm being a bit silly, who listens to 64kbps files for pleasure anyway? You must have used the current version.

"Listen[ing] for pleasure" was not the intended goal of this test, based on my understanding. I believe it was to measure the usability of the tested codecs for implementations that require a low encoding rate (whatever they may be, but streamed content over low bandwidth -- such as telephony -- immediately springs to mind).

Multiformat listening test @ ~64kbps: Results

Reply #96
EDIT: I just realised I'm being a bit silly, who listens to 64kbps files for pleasure anyway? You must have used the current version.

"Listen[ing] for pleasure" was not the intended goal of this test, based on my understanding. I believe it was to measure the usability of the tested codecs for implementations that require a low encoding rate (whatever they may be, but streamed content over low bandwidth -- such as telephony -- immediately springs to mind).

Oh, I know that. It's just that I know he wouldn't keep 64 kbps encodes of his albums lying around on his computer, so he must have newly encoded the files to 64 kbps with the latest encoder, hence why I felt a bit silly about asking which encoder version he used.

Anyway, I'm glad it's the most recent version he used; I don't have to worry about songs taking up more space on my DAP now that I'm using b6.02.



Multiformat listening test @ ~64kbps: Results

Reply #99
NullC,

h*tp://www.mediafire.com/?s7i9usu2qr27pcg

The bitrate is slightly lower.