MPC vs VORBIS vs MP3 vs AAC at 180 kbps, 2nd checkup with classical music
Aug 21 2005, 19:33
Preliminary notes

Last summer I performed a blind listening comparison between three different audio formats, all set for ~175 kbps encodings. The purpose of the test was to investigate encoding quality with classical music (and only classical) and to see which format would be the most efficient (i.e. the closest to transparency at the lowest possible bitrate) for this kind of music. As a jumping-off point for the bitrate I took the MPC --standard preset, which was indisputably recognized as the best encoding solution at 175...190 kbps on average. And indeed, the test ended with Musepack's superiority. MPC was even superior to Vorbis and MP3 at presets with a higher bitrate (~195 kbps for LAME, ~185 for Vorbis against ~175 for Musepack). Consequently, MPC encodings appeared to sound better and to be smaller at the same time. Amazed by the gap between the contenders, I concluded that test with these words: “I didn’t think that MPC --standard was so in advance”.

My vacation is now almost over. During my free time I performed a big checkup of lossy quality at 80 kbps and 96 kbps (this one still has to be translated into English), and it’s now too late to complete the 128 kbps test I planned to do in the same silly conditions (150+35 tested samples). But I used my small remaining time to redo the listening test at 175 kbps I did last year, with the same 18 samples and the same hardware.

Why do the same test again?

As a result of the constant evolution of most audio encoders, I consider my previous results really outdated. I recall that the Vorbis encodings were done with MEGAMIX I (a hybrid encoder blending aoTuV beta 2, Garf Tuned 2 and Quantum Knot tunings). This encoder didn't last long... and doesn’t exist anymore; it was replaced by MEGAMIX II, then by the official 1.1 with the Impulse Trigger Profile + Impulse Noisetune switches, which was finally followed by aoTuV beta 3 and beta 4. The same goes for LAME: 3.97 alpha 3 was tested, and since then the LAME developers have released eight new versions of this alpha and a few other ones (lame_may, lame_june...)! MPC has also changed: from 1.14 beta to 1.15 alpha, which is now considered safe to use.
As a consequence of this evolution, problems audible last year (a kind of ringing for LAME, or noise and coarseness for Vorbis) may have been corrected or at least reduced. The first purpose of my test is therefore to check the outcome of the recent tunings done for high bitrate settings.

There’s also a second point which motivated me to redo the test, and this point is called AAC. I didn't test AAC last year for technical and moral reasons. Technically, the iTunes encoder couldn’t be set to ~175 kbps; Apple's AAC encoder also wasn't gapless and is therefore unsuitable for my conception of artefact-free encodings. I also felt that including Nero AAC would be dishonest: it had recognized issues with classical music, and a new encoder supposed to solve these problems had been announced as imminent. Some readers suggested that I include faac as a competitor, but I felt it unfair to test an encoder which was probably not the state of the art of the AAC format and to compare it to the most advanced implementations of other formats (MEGAMIX and LAME 3.97).
I never regretted my choice. But this absence of AAC frustrated my curiosity for a long time, because I had strictly no idea how this format performed compared with the other contenders. That’s why I absolutely wanted to include AAC this time. WMAPro will also be tested this time if possible.

The purpose of my test is therefore to obtain a fresh snapshot of the current performance of all modern lossy formats with classical music, using the most advanced implementation of each of them.

I. Choosing the encoders

My purpose being to test the most advanced encoders, the choice of encoder isn't controversial for most formats:

MP3: LAME 3.97 alpha 11. Release date: July 2005. Note: --vbr-new encoding mode.
MPC: mppenc 1.15v. Release date: March 2005.
Vorbis: aoTuV beta 4. Release date: June 2005, updated in July 2005 (merged with SVN 1.1.1).
WMAPro: no choice here: it's 9.1 or nothing. Release date: during 2004.

Choosing the right AAC encoder is much harder:
Apple AAC: There's still no VBR mode in iTunes. Consequently it's currently impossible to use Apple's AAC encoder unless the other contenders output an average bitrate close to either 160 kbps or 192 kbps. It's unlikely...
Nero Digital AAC: the most advanced VBR AAC encoder and therefore the best placed to represent the AAC format. Big problem: should I use the 'high' (default) encoder or rather the 'fast' one, which is really better at lower bitrates with classical music? The first one is still recommended by all of Nero's developers (Garf, JohnV and Ivan Dimkovic), and that's a valid reason to choose it instead of something they don't consider stable enough. But the situation may have changed since their recommendation; I also wouldn't discard too quickly the possibility of using an encoder that works better with the difficulties specific to the musical genre I'll test. The debate could be endless if a trivial but objective argument hadn't closed it: the average bitrate of the VBR mode of both encoders (see below).
faac AAC: testing faac might also be interesting. And just for fun, it would give me the possibility to pit four different open-source implementations of four different formats against each other. But such a friendly comparison has a price: it makes the test more burdensome, and the test is anything but easy at this bitrate...

II. Targeting a bitrate

The purpose of my test is not to see what encoders could do with xxx kbps for each sample; I don't plan to force each encoding to reach a precise bitrate. My purpose is to stay close to the real usage of a vast majority of listeners (if not all...): using for every encoding one fixed setting which should statistically correspond on average to the desired bitrate. That's why it's really fundamental to know precisely the average bitrate corresponding to a given preset. And there's only one way to get it: encoding several tracks or albums.
Last year, I used ~20 classical (+3 non-classical) albums as a reference. This year, I decided to be more methodical. I’m now using 150 different tracks (I mean full tracks) coming from 150 different CDs in order to increase the variety of encoded tracks. It’s important to note that I didn’t choose those tracks randomly. I worked meticulously to get a representative microcosm of my full classical library, balanced between the different kinds of ensemble (vocal, orchestral, chamber, soloist recordings). This collection is nothing more than the 150 full tracks from which I’ve extracted 150 short samples in order to build a “catalogue raisonné” of musical situations occurring in classical music (see this test).

I genuinely expect this methodically constructed library to be a highly representative panel of my classical collection. My assumption can be verified by checking the average bitrate of the collection encoded with WavPack -fx5 (my whole digital library of >1000 CDs is encoded with this preset): 642 kbps for the selection of 150 tracks against 635 kbps for a complete set of more than 15000 tracks. The deviation is about 1%! Statistics are really magical.
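
The representativeness check above is a simple relative deviation between two averages. A minimal sketch in Python, using only the two WavPack figures quoted in the post (the function and variable names are illustrative, not part of any tool used for the test):

```python
# Average bitrates (kbps) quoted in the post for WavPack -fx5 encodings.
SAMPLE_KBPS = 642.0   # selection of 150 tracks
LIBRARY_KBPS = 635.0  # complete set of more than 15000 tracks


def relative_deviation(sample: float, reference: float) -> float:
    """Return the deviation of `sample` from `reference`, in percent."""
    return (sample - reference) / reference * 100.0


deviation = relative_deviation(SAMPLE_KBPS, LIBRARY_KBPS)
print(f"deviation of the 150-track selection: {deviation:+.2f} %")
```

With these inputs the deviation comes out at roughly one percent, which is the order of magnitude the argument relies on.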

III. Observing bitrates

I started with MPC, which must give the reference bitrate. All other competitors have to be set to get a similar value.

MPC: --quality 5 corresponds precisely to 184,54 kbps. This is higher than what I first expected (~175 kbps). The 150 reference tracks are maybe not as representative as supposed. I also tried 1.14 (used last year) with the same preset and --xlevel: 176,28 kbps, much closer to the native average bitrate of the --standard profile, which reassures me about the representativeness of my collection of tracks. The bitrate has therefore inflated by 4.7% from 1.14 to 1.15v with classical.
=> I'll therefore try to get from all other encoders a setting which outputs 184,5 kbps ±2% (180,5...188,1 kbps).

MP3: I first tried -V2 --vbr-new, which corresponds to the former --preset fast standard. The average bitrate is 181,79 kbps. This value is lower than what I estimated last year (and that's why I tested -V3 in addition to -V2)... Indeed, 3.97 alpha 3 -V2 outputs 192,99 kbps. Nice gain (-5.80%). Obviously the LAME developers also worked on efficiency. The gain is large enough that LAME --preset standard can now be fairly compared to MPC --standard. But I recall once again that this only applies to classical (I suppose that the bitrate is higher with other kinds of music suffering from the sb21 issue).

Vorbis: aoTuV beta 4 -q6,00 leads to 181,48 kbps. This is lower than what I expected, and it's also lower than the MPC --standard bitrate. I get 186,99 kbps for the old MEGAMIX I. The bitrate has therefore been lowered with the latest aoTuV.
-q6,00 can therefore be directly compared to MPC --standard and LAME --preset fast standard (for classical music).

WMAPro: VBR75 leads to 150,24 kbps. The next available preset is VBR90 and it leads to 203,96 kbps. Both are very far from the range I fixed and consequently WMAPro can't compete in this test.

Nero Digital AAC: Like LAME and WMAPro, Nero Digital doesn't offer a precise VBR scale, only seven presets. -internet leads to ~142 kbps for both the 'high' and 'fast' encoders. -streaming high corresponds to 176,14 kbps and -streaming fast to 193,33 kbps. Consequently neither of them is inside the fixed range; the closest one is -streaming high, which is therefore the least unacceptable solution (I recall that the 'high' encoder is still the recommended one).

faac AAC: this is the only AAC encoder able to fit into the fixed bitrate range (thanks to its precise VBR scale à la Vorbis & MPC). faac -q 175 leads to 180,92 kbps. This -q setting probably won’t correspond to 180 kbps with other musical genres, and that’s the occasion to recall once again that the whole test is specific to classical music and nothing else.
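
The selection rule used in this section (reference bitrate ±2%, then keep the settings that land inside the window) can be sketched as follows. This is only an illustration in Python built from the averages quoted above; the names are made up for the example. Computed from 184,5 kbps, the bounds come out near 180,8 and 188,2 kbps, slightly different from the rounded figures quoted in the post, but the same settings pass either way.

```python
REFERENCE_KBPS = 184.5  # MPC --quality 5 average, rounded as in the post
TOLERANCE = 0.02        # the ±2% window

# Average bitrates (kbps) on the 150 reference tracks, as quoted in the post.
CANDIDATES = {
    "LAME -V2 --vbr-new": 181.79,
    "aoTuV beta 4 -q6,00": 181.48,
    "WMAPro VBR75": 150.24,
    "WMAPro VBR90": 203.96,
    "Nero -streaming high": 176.14,
    "Nero -streaming fast": 193.33,
    "faac -q 175": 180.92,
}


def target_range(reference: float, tolerance: float) -> tuple[float, float]:
    """Return the (low, high) bounds of the accepted bitrate window."""
    return reference * (1.0 - tolerance), reference * (1.0 + tolerance)


low, high = target_range(REFERENCE_KBPS, TOLERANCE)
accepted = [name for name, kbps in CANDIDATES.items() if low <= kbps <= high]

print(f"window: {low:.2f} ... {high:.2f} kbps")
for name, kbps in CANDIDATES.items():
    verdict = "inside" if name in accepted else "outside"
    print(f"{name:<22} {kbps:7.2f} kbps  {verdict}")
```

Only the LAME, aoTuV and faac settings fall inside the window; both WMAPro presets and both Nero presets fall outside it.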

Summary table

         bitrate_2004   bitrate_2005     evolution in kbps   ...in %

MPC          176,28         184,54            +8,26 kbps      +4,69 %
MP3          192,99         181,79           -11,20 kbps      -5,80 %  
Vorbis       186,99         181,48            -5,51 kbps      -2,95 %
AAC faac   not tested       180,92              --              --
AAC Nero   not tested       176,14              --              --
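
The two "evolution" columns of the table are the absolute and relative change between the 2004 and 2005 averages. A short Python sketch that recomputes them from the quoted bitrates (the encoders not tested in 2004 are left out; the names are illustrative):

```python
# (bitrate_2004, bitrate_2005) in kbps, taken from the table above.
BITRATES = {
    "MPC": (176.28, 184.54),
    "MP3": (192.99, 181.79),
    "Vorbis": (186.99, 181.48),
}


def evolution(old: float, new: float) -> tuple[float, float]:
    """Return the change in kbps and the change in percent of the old value."""
    delta = new - old
    return delta, delta / old * 100.0


for name, (old, new) in BITRATES.items():
    delta, percent = evolution(old, new)
    print(f"{name:<8}{old:>9.2f}{new:>9.2f}{delta:>+9.2f} kbps{percent:>+8.2f} %")
```

The percentages are relative to the 2004 bitrate, which matches the figures in the table.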

=> faac, LAME and aoTuV are very close to each other (the difference is below 0,9 kbps!). MPC has a higher bitrate (+3 kbps) and Nero Digital a lower one (-5 kbps). The gap between the extremes is worrying: approximately 5%, corresponding to 8 kbps. That's not a huge difference, but these eight missing kbps may lead to a significant difference in quality. I could discard Nero Digital for this test but I would consider this choice a mistake. For my own curiosity I'm also very impatient to see how an advanced implementation of AAC performs in comparison to the other formats, even if the bitrates are not fully comparable.

=> As a consequence I decided to test both Nero Digital AAC and faac AAC, and I will consider Nero Digital's presence as a "bonus" interesting to watch rather than a full competitor. That's why my final diagram (plots) will graphically separate the Nero AAC results from the other contenders. I hope this will avoid unnecessary debate about any kind of unfairness based on bitrate disparity.


To be tested:

AAC: faac 1.24.1. Release date: end 2004 (?). Setting: -q175
AAC: Nero Digital aacenc32 v. Release date: June 2005. Setting: -streaming (high/default encoder).
MP3: LAME 3.97 alpha 11. Release date: July 2005. Setting: -V2 --vbr-new
MPC: mppenc 1.15v. Release date: March 2005. Setting: --quality 5
Vorbis: aoTuV beta 4 based on 1.1.1. Release date: July 2005. Setting: -q6,00

IV. Additional information

I performed all my latest listening tests on a Creative Audigy 2 soundcard, which resamples everything to 48 kHz. Some people consider that this internal resampling (transparent in my opinion) treats Musepack unfairly and would bias any listening test. To cut the controversy short, I installed my (better) Terratec DMX6Fire 24/96, which doesn't resample 44.1 kHz files (I'm no longer using it for daily listening because of interference with my VIA chipset).


soundcard: Terratec DMX6Fire 24/96
headphone: BeyerDynamic DT-531
amp: Onkyo MT-5
software player: Java ABC/HR 0.5 beta 5.
software decoder: foobar2000 0.83 (in order to automatically get files free of offset and to solve my incompatibility issues occurring with Vorbis).


ABX phase: To limit listening fatigue and to end the test before I leave my apartment, I restricted the ABX tests to the most transparent encodings (rating > 4.00).
Number of trials: eight trials as a minimum. I recall that schnofler's ABC/HR software doesn't reveal the score until the test is closed by the user (and it also can't be resumed). Therefore the number of trials doesn't have to be fixed in advance: as long as the score is hidden, the p-value isn't ruined. That's why I add more trials when I suspect bad results. I never exceed 16 trials: if something is really transparent, I don't persecute the encoding.
Rating: My rating was very severe last year, using the full dynamic range of the scale (a lot of ratings were below 2.0). That's why I decided to add 10 points to each score (in order to disconnect the ratings from the usual corresponding scale). This year, I tried to respect the ITU scale. When a difference is audible but not really annoying, the rating is at least 4.0, and my hair must stand on end before I allow a rating below 2.0 (from "annoying" to "very annoying"). The rating is still severe (I keep in mind that all encodings were set at 180 kbps) and that's why the results I get here absolutely can't be compared to other listening tests I have done, especially those performed at low bitrate settings. By the way, there are no anchors in this test (a high anchor is of course unnecessary here).
Samples: Same as last year. See this thread.
Gain: I haven't modified the gain of any file. All were played at their original volume.
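
For readers unfamiliar with the p-value mentioned under "Number of trials": the post does not say how ABC/HR computes it, so the sketch below assumes the usual one-sided binomial test, i.e. the probability of getting at least the observed number of correct ABX answers by pure guessing (50% per trial). The function name and the example scores are illustrative only, not results from this test.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of at least `correct` right answers in `trials` by guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must lie between 0 and trials")
    # Count the outcomes with `correct` or more successes among 2**trials.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


# Hypothetical scores at the minimum (8) and maximum (16) trial counts.
for correct, trials in [(7, 8), (8, 8), (12, 16)]:
    print(f"{correct}/{trials}: p = {abx_p_value(correct, trials):.4f}")
```

Under this assumption, 8 correct answers out of 8 give p = 1/256 and 7 out of 8 give p = 9/256.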

Aug 22 2005, 21:26
Time to play devil's advocate. What is the point of this test? All it shows are results that only apply to guruboolez's hearing, and only with classical music. Everyone is different and needs to test for themselves which is better to them, since everyone hears a little differently. I could do the same test, present the results in the same fancy manner, and come up with completely different results. The other option is an average based on a group of blind testers which makes much more sense, doesn't it?

That said, I'm not surprised by the results since they fall almost along the same lines as my own ABX testing. I use Vorbis (aotuvb4) for my portable and think it's fantastic, but if that didn't exist I would use AAC over MP3 simply because I find the artifacts less annoying, which emphasises why people should do their own testing rather than choosing based on someone else's tests.

Hopefully I haven't offended anyone. I found this post very interesting and want to thank guruboolez as well for putting in such a huge effort. Great work!
Aug 22 2005, 22:14
QUOTE (Digisurfer @ Aug 22 2005, 05:26 PM)
I could do the same test, present the results in the same fancy manner, and come up with completely different results.

Could you really? Have you done so? Or is that just speculation?

Aug 22 2005, 23:20
QUOTE (rjamorim @ Aug 22 2005, 03:14 PM)
QUOTE (Digisurfer @ Aug 22 2005, 05:26 PM)
I could do the same test, present the results in the same fancy manner, and come up with completely different results.

Could you really? Have you done so? Or is that just speculation?

Speculation of course, primarily because differing results are certainly within the realm of possibility. Everyone is different after all. I wasn't just talking about 180k tests by the way, my query applies to any bitrate. I'm lazy so I wouldn't really bother of course. I only brought it up because one of the things I see people posting here at HA.org all the time is that you have to ABX test for yourself in order to get any truly meaningful results, and that makes perfect sense of course. Thus, any testing I might do is only relevant to me, and is why I feel it would be a waste of time to post such results even if some folks may find said results interesting, though I honestly doubt anyone would actually care all that much. After all, I'm just a nobody and guruboolez is the one with the golden ears which, oddly enough, seem to have attained a strange sort of celebrity status around here, hehe. More power to him too. Like I said, despite what seems like (admittedly very minor) hypocrisy, I have to admit I find the tests fun and interesting to read just like everyone else. Thanks again for all your hard work guru! Again, I hope I have not offended anyone, since that was never my intent. Just find the whole thing rather amusing is all, given what is normally posted whenever someone new comes along and asks "what is the best codec/bitrate?".

