MPC vs OGG VORBIS vs MP3 at 175 kbps, listening test on non-killer samples
MPC vs OGG VORBIS vs MP3 at 175 kbps, listening test on non-killer samples
Jul 12 2004, 00:50
Group: Members (Donating)
Joined: 7-November 01
From: Strasbourg (France)
Member No.: 420
• My access to internet is now very limited. Therefore, the encoder I’m using for my tests are not necessary the most recent available on the web. Here, tests were done when vorbis 1.1 RC1 was released, but I didn’t have access to this information…
• This test is something like a work-in-progress. I plan to add more results with time.
I. PURPOSE OF THE TEST
Like many people of this board, my principal motivation for audio encoding lie in the possibility to listen and enjoy music in high quality directly from computer, which allows a very fast browsing and the access to an entire record collection. High quality encoding is a requirement, security a need. I used successively lame mp3, musepack audio and now lossless, which offer the security of identical digital-data with CD.
Nevertheless, lossy encoding is still interesting: modern hard disks are not necessary big enough for all collections, and I think that there’s some benefits to feed expensive digital jukebox with “better than just good” quality audio encodings, like AAC/Vorbis 128 – fine but perfectible.
The choice of the best lossy encoder isn’t really problematic. Musepack (mpc) is still winning most approvals, and is considered as fully transparent with --standard preset. Some elements encouraged me to seriously question this leading position of mpc.
• 1/ by testing occasionally the standard preset of mpc, I discovered that small differences are sometimes audible with usual music. Now if mpc isn’t fully transparent at 175 kbps, this format is definitively comparable (it doesn’t mean “equal”) to other lossy solution, which are suffering from the same report.
• 2/ the leading position of mpc was admitted long time ago. It was defined as “best lossy format” when challengers where not very strong: beta of vorbis, lame < 3.90, suboptimal aac encoders. But now, there are powerful vorbis encoders (the recent “megamix” merging looks like a serious challenger), optimized AAC encoders (QuickTime CBR and Nero VBR), and mature MP3 solutions (VBR presets of lame). The leading position must therefore be questioned again, at least by people able to detect differences.
• 3/ This challenge becomes necessary with the growing numbers of device supporting new audio formats like AAC and Vorbis. MPC is still confined to computer, or in best case on PDA – and is maybe doomed to this limited usage.
In consequence, I’ve tried to oppose to mpc --standard other serious encoding solutions, in order to have a better, modern and personal idea of the relative quality of this encoder compared to modern and convenient challengers.
Against musepack --standard, I decided to oppose two formats: MP3 with lame 3.97a3 and OGG VORBIS with the recent combined encoder named “megamix”. Explanations.
• first, no AAC encoder in the arena. I was tempted to use Nero AAC, but the last version I have (22.214.171.124) have some recognized quality problems and is promise to an imminent conceptual death, with the Third version of Ivan Dimkovic encoder. No need to test something outdated… I was also tempted to take QuickTime AAC, though it’s not VBR and not very flexible (nothing between 160 and 192 kbps: annoying for fair comparison with MPC --standard). But this encoder is not really suitable in my opinion for HQ listening, at least when user is found of opera and when most of his CD absolutely need a real gapless playback. AAC will be add later, but for now, it’s absent from this test.
• the choice of lame MP3 version is highly problematic too. Three choices are possible: the last “tested” release (3.90.3), the last gold release (3.96) or the last alpha release (3.97 alpha 3). I’ve decided to not use 3.90.3. I know that for some people this encoder is the best mp3 codec ever released; I also know that for historical reason 3.90.3 is probably the safest choice. But the difference between 3.90.x dead branch and the active 3.9x one is not only related to quality: 3.9x are much faster (not a luxury considering slowness of 3.90.x presets), more complete (full and redesigned VBR preset scale: the nice –V 5 used in Roberto listening test is for example a new feature inaccessible for 3.90.x), and last but not least in perpetual evolution. There’s nobody to correct flaws on 3.90.x, whereas bug audible with 3.9x could be corrected or lowered by Gabriel, Robert, Takehiro and other developers.
I definitively forget the choice of 3.90.x for another important reason: there’s no VBR preset corresponding to the MPC –standard bitrate. –alt-preset standard is clearly too high, --medium too low, -Y switch a hack, and ABR is probably not efficient enough. With 3.9x branch, there’s an existing preset between –standard and –medium: -V 3. And –V 3 average bitrate should be close to the MPC –standard one.
Then: 3.96 “gold” or 3.97 alpha? I’ve decided for the alpha release. I know the risks (for regression but also for progress). But I also know that 3.96 is buggy on –fast mode: it decides me to use a corrected release, even if the test doesn’t concern the fast mode of lame.
• the choice of vorbis version is less problematic. Recent tests were done. CVS/GT3b2 couldn’t resists against aoTuV/GT3/QK32 dream team (aka megamix), at least up to 5,99. And even higher, GT3b2 (previous reference encoder for high bitrate) doesn’t really sound superior (except maybe for one family of problems: micro-attacks). I also recall that I’ve began this test by being unaware of the release of 1.1 RC1. This last encoder nevertheless seems to be inferior to “megamix” (the essential but maybe ‘excessive’ tuning of Garf, used at bitrate > -q 5,00, are apparently missing from this RC1 version). The use of “megamix” is therefore pertinent, and my test is probably not outdated by this enjoying pre-release of oggenc 1.1
• I don’t forget the promising WMApro: I was really pleased and even enthusiastic by the quality reached by this format with classical music at mid-bitrate. Nevertheless, I didn’t include this format in the test. First, I had to limit the number of competitors. Then, I’m not familiar with this encoder and don’t know what setting is the best (which VBR mode? And is WMApro VBR implementation reliable, or isn’t ABR 2-PASS preferable, etc…). Last: still no hardware device for WMApro (though it’s not a reason to exclude an audio format from a test including MPC, it’s a disappointing situation).
Mid/High bitrate tests are, for me at least, especially painful. It doesn't mean that I hate them, quite the opposite. ...
Samples only concern « classical music », with one exception. I deliberately limited my choice on the music I like. It's not by snobbism; and it's not an egocentric attitude: other music is much harder for me to ABX, and my motivation would quickly disappear with music I don't really like. In other word, the impact of these results is VERY LIMITED: they concern my subjectivity (and only mine), and a particular genre of samples (natural instruments, recording according to high-fidelity principles - and not to the marketing “loud” one).
There are solo instruments (organ with Dom Bedos; harpsichord with Fuga; trombones with Orion II), instruments with small accompaniment (cymbals with Krall and Marche Royale, drums with Marche Royale, 2nd part), orchestra (Weihnachts-Oratorium and Platée), chorus (Ich bin der Welt abhanden gekommen) and voice ( “Dover, giustizia, amor” ). Additional information (artist, performer…) are available on file tags.
Comparing VBR encoders/settings is problematic. The ideal thing is to fix a target bitrate, and then to find the corresponding preset for each encoder. I followed the usual (and IMHO the best) methodology: the setting must be related to a wide selection of music, and not to the selected samples.
The targeted bitrate is the average bitrate of MUSEPACK --standard preset. The average bitrate can’t be evaluated precisely: it’s something comprise between 170 and 180 kbps. 175 kbps approximately. I have verified this value with classical music library, and people have reported similar value with completely different music.
The remaining task is now to find the corresponding VBR settings for LAME MP3 and Vorbis “megamix”.
The problems are beginning…
4.1. VORBIS SETTINGS
• The biggest problem lies in the average bitrate’s difference of vorbis, occurring at the same setting, depending on the kind of encoded music. Classical is bitrate friendly compared to most other stereo and modern material. With CVS encoder, I estimated this difference at 10…15 kbps on average for –q 5…6. With “megamix” (or other GT3b2 based encoders), this difference might reach 25…30 kbps for the same setting. I don’t know what to do…
- by testing vorbis with a –q value corresponding to 175 kbps for classical but 200…210 kbps on pop/rock… people may blame me for opposing to musepack an advantaged vorbis challenger.
- by testing vorbis with a –q value corresponding to a 175 kbps for pop/rock but 140 kbps on classical, the test will be pointless for me (the winner between mpc@175 and vorbis@145 isn’t very hard to guess…).
- by testing vorbis with a half-baked –q value, I fear that the test won’t corresponding to neither of both situation.
• The second big problem is related to vorbis rupture in the linearity of the quality scale. Between -5,99 and -6,00, there’s a consequent bitrate difference (~10 kbps), also corresponding to a serious quality difference, at least with vorbis 1.00 – 1.01 (including GT3b2). aoTuV (and therefore “megamix”) is based on the same code, but the tuning tried to correct or to minimize the quality gap between the two settings. I discovered that for classical music, the fair vorbis setting is very close to this 5,99 value. 6,00 is slightly to high, and I could disadvantage mpc by comparing it to vorbis –q 6,00. On the other side, I have the feeling that -q6,00 would show the full potential of vorbis, and that the extra 8…10 kbps could be worth for daily use. Would someone renounce to the correction of a quality bug at low prince (+5% increase in filesize), especially with archiving in mind? Seriously, I don’t think so.
For all these reasons, I’ve decided to put in the arena vorbis megamix at three different settings:
-q 6,00: clearly to “heavy” compared to mpc --standard with non-classical music, but interesting to test against -q 5,99 (to see if the frontier between these two settings still exists with aoTuV/Megamix/1.1)
-q 5,99: the corresponding setting for a matching bitrate with mpc –standard for classical music (still too heavy with other music), but maybe suboptimal quality for vorbis
-q 5,50: more universal setting for acceptable test against mpc --standard. It would be interesting to compare the quality difference between 5,50/5,99 and 5,99/6,00. I suspect (and fear) a much greater jump between the last pair than with the first one.
4.2. LAME SETTINGS
I discovered that bitrate of –V 3 preset (lame 3.97a3) is really close to the average bitrate of mpc --standard. This applies at least for classical music (I don’t have enough material to measure average bitrate on other musical genre). –V 3 will therefore be tested.
I’ve also decided to add –V 2 (--preset standard). The bitrate is higher, but I really want to see if this historical leading preset of lame MP3 is competitive against musepack. It would also be interesting to see how will perform lame –V 2 compared to vorbis megamix, also playable on portable player, but with bad consequences on battery life.
4.3. BITRATE TABLE
Instead of posting of a bitrate table of the short samples used for the test, I prefer posting data about more audio material. Average bitrate for ~20 albums (classical for most), and additionnal datas for track coming from 50 different CDs (+15 other in mono) are available on the following tables:
V. RESULTS AND CONCLUSIONS
First comment: I've add 10 points to each note. I had to find a solution to prevent misinterpretation of notes which could first appear as excessively severe. I didn't use low anchor for this test, and slight flaws sometimes appear as terribly annoying on such tests, lowering very much the notes. By artificially changing all notes, I also had in mind to disconnect the notation I used from the EBU scale (4= "perceptible but not annoying"; 3 = "slightly annoying", etc...).
With 10 results only, I couldn’t make strong conclusions. But some elements of conclusions are now appearing:
• MPC –standard has serious chance to be the best of the three competitors. Eight time on the first place, one time second, and never on the last. Very good performances. We could also note that –standard setting wasn’t sufficient for reaching the “transparency” level (except for the organ sample, with negative ABX tests). Nevertheless, I could seriously expect full transparency with higher setting: none of this sample (except maybe the chorus one) showed severe artifacts, but just slight differences. It’s typically the kind of “problems” that disappear with a higher bitrate. Anyway, I’m impressed, because I didn’t thought that MPC –standard was so in advance...
• LAME MP3 has few chances in my opinion to compete with vorbis and musepack at ~175 kbps. The new –V 3 setting sit on the last place eight times: too much… even with a limited set of samples. It doesn't mean that -V 3 sounds bad, but it's just inferior to modern lossy format at similar bitrate. But with improvements, who knows...
But the –V 2 setting (aka –alt-preset standard) is apparently competitive, and could fight (and sometime win) with vorbis “megamix” –q5.50 and –q5.99. Only problem: bitrate is not the same anymore (195 kbps vs something comprise between 162 and 180 kbps, but with classical music only). But it’s imperative to precise that LAME –V2 and –V3 suffers from huge artifacts (the harpsichord and the organ samples are severely wounded to my ears), whereas vorbis artifacts were never so bad (except, maybe, with Orion II sample – micro-attack problems).
To be short, LAME –V2 (--preset standard) is apparently competitive with VORBIS “megamix” –q 5,99, at least with classical music. It would be interesting to see how will perform both contenders with other kind of music at the same setting, which implies a completely different bitrate range (+10..15% with vorbis, and maybe – x% with lame).
• I expected a lot from the vorbis mixture. The progress between “megamix” and CVS are really impressive compared to CVS encoder, and I really wondered how it’ll perform against other challengers. I’m finally disappointed. For some reasons:
- First, the coarse sounding problem of the format is still audible with “megamix” up to 5,99. No need to suspect any of GT3b2 or QK32 tuning to ruin the benefits of original aoTuV in this area: the noise problem is particularly audible on “tonal” moments, encoded with pure aoTuV code (bit to bit identical samples between aoTuV encoder and megamix one). This additional noise is probably not too disturbing on daily listening, but on direct comparison with other challengers, the contrast is still annoying. The problem not really lies on noise, but on coarse rendering of voice or instruments: lack of subtlety, fat texture… I think that this problem is a legacy of internal change occurring during RC3 development of Vorbis, in spring 2002. I think I’ve established this fact at ~128 kbps some months ago (correct me if I’m wrong), and I suppose that’s still true at ~160…170 kbps, even with aoTuV (based on the same buggy “final” CVS code).
- Second reason to be disappointed: due to this remaining coarse problem occurring up to –q5,99, there’s still a consequent quality gap between this setting and the rounded -q 6,00. It’s my fault: I’ve expected from aoTuV tuning to erase the existing frontier between –q 5,99 and –q 6,00: this encoder only reduced the gap. There are ~10 kbps difference between 5,50 and 5,99 but few quality improvements. There are also 10 kbps difference between 5,99 and 6,00, but huge quality progress are audible. For a daily use of vorbis encoder, there’s no real problem with this difference: the 10 additional kbps of –q6,00 are obviously worth if someone is looking for high quality or archiving, and there’s no need to hesitate. But for my test or any similar one, this difference is much more problematic. On one side, I couldn’t oppose mpc –standard to megamix –q 6,00 on fair bases (average bitrate doesn’t match anymore). And on the other side, it’s pointless to compare mpc –standard to an handicapped vorbis setting (5,99). It’s like using musepack at –quality 4.99, which also suffers from problems (and bitrate gap) that don’t exist anymore at –quality 5.00. Cruel dilemma…
- Third reason to be disappointed: even at –q 6,00 (and 10 exceeding kbps), megamix couldn’t apparently reach the quality of musepack –standard. More samples are of course needed to enforce this beginning of conclusions, but I really fear the solution doesn’t lie on a selection of samples, but rather on further development.
As I said it at the very beginning, I consider this test as a first step. Additionnal results should and will normally complete this first phase. I expect a quick release of Nero AAC encoder in 126.96.36.199 version to add some spice to the test. External test, opposing vorbis megamix to the new 1.1 must also be done, in order to be sure that megamix is the best vorbis encoder at this bitrate.
I'd also like to see this test followed by other people. It would help to compare different HQ encoders on empirical bases. Feel free to post some results, even for one sample, on this topic.
APPENDIX. SAMPLE LOCATION
I've upload all samples on a temporary link. I couldn't keep them on-line too long. So don't wait if you're planning to do personal tests. ABX logs are available in each archive. Samples are in OptimFROG lossless audio format.
This post has been edited by guruboolez: Dec 29 2005, 21:46
Jul 14 2004, 04:07
Joined: 1-July 03
Member No.: 7495
Thanks for the time and effort you put into this test, guru. It provides invaluable info for those of us interested in these codecs, but without enough time to perform the tests ourselves.
I'd like to perform a similar test using a sample set of rock music. Though since my hearing sensitivity isn't NEAR what yours is, I may end up not being able to distinguish any differences at this bitrate. I can at least try, though.
|Lo-Fi Version||Time is now: 10th October 2015 - 17:57|