Vorbis quality – wrong direction?, RC3 against post-final encoder
Feb 5 2004, 09:10
Group: Members (Donating)
Joined: 7-November 01
From: Strasbourg (France)
Member No.: 420
I promised Quantum Knot that I would test his new tuning. I finally found the time and tested QK2 and 1.01, both compared against the RC3 library. Yes, the two-year-old RC3.
The choice of contenders is of course questionable, so I have to explain it.
First, I didn't include the Nyaochi and Aoyumi tunings. I sincerely apologize to both of them. It might appear disrespectful, but in my opinion it isn't fair to compare tunings that are still in alpha stage against each other. I assume alpha tunings are released for feedback, not for competition. A comparison will probably follow later, once all tunings are mature enough.
Second: why old Vorbis encoders?
I've tested Vorbis 1.00 and 1.01 many times with natural instruments, and they always sounded imprecise and disappointing, generally worse than LAME MP3. To my ears there are serious problems: not only hiss or a high-frequency boost (reported many times by other users), but also a coarse, imprecise sound and an aggressive rendering. With classical music, or well-recorded natural instruments, all these problems are clearly audible, probably more so than with loud, (over-)compressed music.
These problems are like a Vorbis “signature” (in a blind test, it's easy to pick out Vorbis among the contenders). I have never heard anything comparable with other formats. But I'm sure these flaws are not inherent to the Vorbis format; they are the consequence not only of a lack of tuning but also of questionable choices made in the past. I'm still convinced that something went wrong between RC3 and the final "1.00" version from July 2002, although I never really tested this difference. My suspicions are based on the following.
Important changes were made between RC3 and the final library. They were particularly noticeable in encoding speed (final is much faster) and in sound quality at low bitrates, even for untrained listeners. Before the "final" encoder, Vorbis wasn't very competitive at low bitrates (especially at the popular and symbolic 64 kbps). The reference two years ago was mp3PRO, and Vorbis was clearly behind. At low bitrates, Vorbis sounded like the other non-SBR formats (WMA, Real, AAC): metallic, heavily distorted, etc. But the final encoder sounded totally different: no longer metallic, less distorted, just noisier (plus some stereo issues). For people used to the traditional flaws at this bitrate, Vorbis was simply amazing. With some practice the noise and stereo problems become more noticeable, but even so Vorbis is a very good solution at 64 kbps (see Roberto's recent test) compared to all current non-SBR formats. Added noise and a narrowed stereo image are generally more acceptable and less ugly than metallic coloration à la WMA or DivX Audio...
But this undeniable victory (for an open-source, patent-free project) had a downside. In my opinion, pre-final encoders were more transparent at mid bitrates (especially 128 kbps, i.e. the -q4 setting), at least on non-killer samples (90% or 99.9% of [my?] music, I can't say) and at least with natural instruments recorded according to hi-fi principles. At this bitrate, encodings now share the same flaws as at low bitrates: hiss, imprecise rendering... On easy samples, where even MP3 sounds flawless, Vorbis has audible problems. I suspect that the secret of Vorbis's good quality at low bitrates is also the cause of what goes wrong above them.
I never seriously tried to confirm or refute my suspicions with a blind test. I was more interested in high-bitrate, transparent encodings, and because of serious pre-echo problems Vorbis wasn't really interesting there (except with Garf's tuning). Now that Vorbis is starting to reach hardware manufacturers, the format looks more interesting to me at lower bitrates (I expect a modern format to deliver good quality at 130-150 kbps). Testing Quantum Knot's tuned encoder seemed a good opportunity to pit post-1.00 encoders against the old RC3 library, and to see whether my suspicions are justified or simply rubbish.
I looked through old CD-Rs and found three RC3 builds, dated February 2002, March 2002 and April 2002. I was tempted to test two different builds rather than a single RC3 encoder. Why? As I recall, a Vorbis developer mentioned changes made during the RC3 period (I can't remember or understand what; maybe something related to the stereo model). I think it happened in spring (April or May). The February and March encoders produce the same output (I did a bit-for-bit comparison, and the files were identical except for tiny differences in the first samples). The April encoder is a different beast: encoding speed improved (twice as fast, if not more!), and the output is different too. If the final library has problems, their cause may lie in this crucial moment of Vorbis history. I may be completely wrong, and RC3 may sound worse than the 1.00 encoder. The best way to be sure is to test the different encoders.
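The bit-for-bit comparison mentioned above can be sketched in Python, assuming both encodings were first decoded to 16-bit PCM WAV files (the post doesn't say which decoder or comparison tool was used). The sketch counts differing samples, reports the frame of the first difference and the largest difference; `read_pcm16`, `write_pcm16` and `compare_pcm` are hypothetical helper names, and only the standard-library `wave` and `struct` modules are used.

```python
import struct
import wave
from typing import List, Optional, Tuple


def read_pcm16(path: str) -> Tuple[int, List[int]]:
    """Return (channel count, interleaved samples) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError(f"{path}: this sketch only handles 16-bit PCM")
        channels = w.getnchannels()
        raw = w.readframes(w.getnframes())
    count = len(raw) // 2
    # WAV sample data is little-endian signed 16-bit
    return channels, list(struct.unpack(f"<{count}h", raw))


def write_pcm16(path: str, channels: int, samples: List[int], rate: int = 44100) -> None:
    """Hypothetical helper, only used to build small test files."""
    with wave.open(path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))


def compare_pcm(path_a: str, path_b: str) -> dict:
    """Compare two decoded files sample by sample."""
    ch_a, a = read_pcm16(path_a)
    ch_b, b = read_pcm16(path_b)
    if ch_a != ch_b:
        raise ValueError("channel counts differ")
    n = min(len(a), len(b))
    diffs = [i for i in range(n) if a[i] != b[i]]
    # Convert the first differing interleaved index into a frame index
    first_frame: Optional[int] = diffs[0] // ch_a if diffs else None
    return {
        "differing_samples": len(diffs),
        "first_diff_frame": first_frame,
        "max_abs_diff": max((abs(a[i] - b[i]) for i in diffs), default=0),
        "length_mismatch": len(a) != len(b),
    }
```

A result with `differing_samples == 0` means the two decodes are bit-identical; differences confined to small `first_diff_frame` values would match the "tiny difference on the first samples" case described above.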
The following test compares four Vorbis encoders:
• RC3 library, March 2002 build [m2k2]
• RC3 library, April 2002 build [a2k2]
• Official 1.01 oggenc – JohnV compile (2003.09.09)
• Quantum Knot's tuning QK2 (January 2004, based on 1.01)
The setting is the same for all encoders: -q4 (VBR, 128 kbps nominal).
To make the 1.01 vs. QK2 comparison (the primary goal of the test) useful, I selected samples with transients. I didn't include well-known killer samples: positive reports for QK2 were based on that kind of sample, and for 128 kbps encodings I assume that ordinary musical samples are difficult enough for all encoders, and more representative of real usage too. All samples come from my own library. I deliberately chose very short ones so that uploading is easier.
• harpsichord.wav: solo harpsichord, perhaps too reverberant but a very sharp and nice recording. Encoders usually suffer from pre-echo and heavy distortion on it.
• erhu10.wav: erhu (Chinese string instrument) with percussive instruments in accompaniment.
• Arche I.wav: orchestral excerpt from Penderecki's First Symphony. A very sharp, loud attack, followed by something like a rattle [not Simon]
• Mandolins.wav: excerpt from a famous Vivaldi concerto. This sample (cut to 5 seconds) is one of my favorites, because many encoders fail to encode it properly at ~130 kbps.
• Transfiguration.wav: part of an orchestral work by Olivier Messiaen. Two different problems may occur: distortion on the cymbals (and, with the Vorbis 1.00 family, a flattened, noisy sound) and pre-echo/blurred brass.
• La Spagna.wav: Renaissance percussion and wind instruments playing together. QK2 is expected to reduce pre-echo, and noise problems may occur with the wind instruments.
• Mars.wav: beginning of the first planet of Gustav Holst's The Planets. Already quiet by nature (violin pizzicatos and threatening winds), this sample is made very quiet by the odd mastering of the CD layer [rip from an SACD]. It is comparable to orchestral lace, and I listened to it without harm at very high volume in order to magnify all possible problems (pre-echo, ATH issues, background changes...).
• Brahms6.wav: the piano is a percussive instrument, potentially affected by pre-echo. This recording of Johannes Brahms's 6th Hungarian Dance is really sharp and well recorded.
In other words, the samples tested are probably not favorable to the pre-final libraries. For measuring Vorbis's noise problems at ~128 kbps, tonal samples are more interesting. But even so, interesting things happened during the test.
• The QK2 modifications have a positive effect on 5 samples. The benefits concern pre-echo and sharpness only. On low-volume transients, QK2 has no effect (Mars.wav). On sharp, detailed micro-attacks (Arche I.wav second part, and Transfiguration.wav), QK2 sounds identical to the official 1.01. But when attacks are clearly defined (Arche I.wav first part and Harpsichord.wav), the difference is appreciable. Benefits are also audible when transients are not excessively strong (piano, percussion in La Spagna.wav), but the difference is then only mildly audible. I'm disappointed by how small the difference is with Mandolins.wav, and surprised by the identical scores I obtained with Erhu10.wav.
The Vorbis library needs tweaking for mid-bitrate encodings, and QK2 is a good answer to the pre-echo problem.
• To my ears, the official post-final "1.01" encoder sounded worse than the RC3 libraries on 7 samples. Harpsichord is the only exception, with less distortion (vibrating effect). But most often the noise issues (louder, unstable, affecting the micro-dynamics of some instruments and the definition/contours of others) were more decisive than differences in pre-echo. With Erhu10.wav, Transfiguration.wav and Arche I.wav (second part), the difference between the RC3 family and 1.01 is really big (whereas QK2's progress over 1.01 had a much more limited impact).
Interesting thing to note: the official 1.01 never reached a rating of 3.0 (= “slightly annoying”). Ratings are of course imprecise, but this ranking isn't totally meaningless. 1.01 produces non-transparent, barely acceptable sound with natural instruments, whereas the RC3 branch, like all the other audio formats, can reach near-transparency on non-difficult samples. To the 8 samples of this test I could add the results of the ~30 other samples I've tested with Vorbis in the past: final Vorbis results were each time around 3/5. In other words, the changes introduced with 1.00 are harmful at -q 4 for a substantial part of recorded music (at least to my ears).
• Another thing to note: there was no clean break between March 2002 [m2k2] and April 2002 [a2k2]. The m2k2 library is not the crystal-clear one with a2k2 the Hoover©-sounding one: m2k2 RC3 was better than a2k2 four times, and a2k2 was better (but by a slight margin) the other four times. Nevertheless, if April 2002 wasn't a historic moment for Vorbis (except for encoding speed), it was surely the beginning of a trend: noise increased with a2k2 and reached an alarming level. The accident occurred a few months later...
Another point: pre-echo issues were reduced in the a2k2 library (compared to the previous one). But comparing the RC3s to the final libraries on pre-echo is not easy. Pre-echo is sometimes lower with RC3 (a2k2 or m2k2): erhu10, Arche I, Mandolins and Brahms; and sometimes higher than with the 1.01 output: Spagna, Holst.
It's hard to conclude anything from only eight samples. Nevertheless, I reached the following conclusion: my suspicions weren't unfounded, and there's obviously something rotten in 1.00 and the libraries that followed it. I haven't tested Vorbis with non-classical/over-compressed/electronic music, so I can't evaluate the possible benefits of the 1.00 modifications there. But with classical music (or, more generally, natural instruments recorded according to fidelity principles), these changes clearly have a negative effect, with some local exceptions (like the harpsichord).
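One way to put a rough number on "only eight samples" (not something done in the original test) is an exact sign test: count each sample as a win for one encoder and ask how likely that split would be if both encoders were equally good. This sketch assumes the samples are independent and ignores how large each rating gap was; `sign_test_p` is a hypothetical name.

```python
from math import comb


def sign_test_p(wins: int, n: int, two_sided: bool = False) -> float:
    """Exact sign test under H0: each encoder is equally likely to win a sample.

    One-sided p = P(X >= wins) for X ~ Binomial(n, 0.5).
    The two-sided version doubles the smaller tail, capped at 1.
    """
    upper = sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n
    if not two_sided:
        return upper
    lower = sum(comb(n, k) for k in range(0, wins + 1)) / 2 ** n
    return min(1.0, 2 * min(upper, lower))


if __name__ == "__main__":
    # RC3 preferred over 1.01 on 7 of 8 samples
    print(sign_test_p(7, 8))                  # 0.03515625 (= 9/256)
    print(sign_test_p(7, 8, two_sided=True))  # 0.0703125
    # m2k2 vs a2k2: 4 wins each
    print(sign_test_p(4, 8, two_sided=True))  # 1.0
```

Under those assumptions, the 7-of-8 result is suggestive but modest evidence, and the 4-4 split between m2k2 and a2k2 says nothing either way.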
I suppose that without these changes Vorbis would sound poor at low bitrates (a quick try with RC3 at 64 kbps convinced me that 1.00 is much better there). But isn't RC3 a better basis for mid-bitrate tuning than 1.00? Or is it possible to decouple Vorbis's mid-range settings from its low ones, so that the characteristics introduced by the “final” version don't contaminate the whole encoding range (at least up to 5.99)?
Note: the eight samples (FLAC, 4.5 MB) are available on the upload forum, or will be very soon.
This post has been edited by guruboolez: Dec 29 2005, 22:13
Feb 12 2004, 01:51
Joined: 16-December 02
Member No.: 4097
QUOTE (nyaochi @ Feb 12 2004, 03:38 AM)
Let me add two samples available here. You can hear a boosted closed hi-hat cymbal in the 8823 sample with 1.0.1 -q4.
Here is my ABX result for the uncoupled-stereo hi-hat cymbal at -q 4, although the differences were so subtle that I doubted myself most of the time.
WinABX v0.23 test report
A file: E:\vsamples\8823.wav
B file: E:\vsamples\uncoupled.wav (155.27 kbps)
10:43:57 1/1 p=50.0%
10:44:11 2/2 p=25.0%
10:44:25 3/3 p=12.5%
10:44:34 3/4 p=31.2%
10:44:47 4/5 p=18.8%
10:45:33 5/6 p=10.9%
10:45:45 5/7 p=22.7%
10:46:06 6/8 p=14.5%
10:46:34 7/9 p= 9.0%
10:46:40 8/10 p= 5.5%
10:46:57 9/11 p= 3.3%
10:47:05 10/12 p= 1.9%
10:47:11 11/13 p= 1.1%
10:47:17 12/14 p= 0.6%
10:47:24 13/15 p= 0.4%
10:47:40 14/16 p= 0.2%
10:47:47 test finished
That is quite a nice sample for HF boost.
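The running p column in the WinABX log above appears to be an exact one-sided binomial probability: the chance of getting at least that many trials right by pure guessing. The sketch below reproduces the column under that assumption (inferred from the numbers, not from WinABX documentation); `abx_p_value` is a hypothetical name.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of scoring at least `correct` out of `trials` by guessing.

    Each ABX trial is modelled as a fair coin flip (p = 0.5), so this is the
    upper tail of Binomial(trials, 0.5).
    """
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials


# Running scores from the WinABX log above: (correct, trials)
log = [(1, 1), (2, 2), (3, 3), (3, 4), (4, 5), (5, 6), (5, 7), (6, 8),
       (7, 9), (8, 10), (9, 11), (10, 12), (11, 13), (12, 14), (13, 15), (14, 16)]

for correct, trials in log:
    print(f"{correct:2d}/{trials:<2d} p={100 * abx_p_value(correct, trials):4.1f}%")
```

Under this model the score first drops below the usual 5% threshold at 9/11 and ends at 14/16 with p ≈ 0.2%.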