
Personal evaluation at ~130..135 kbps, 200 samples, AAC (iTunes, Nero) - MP3 - Vorbis aoTuV
post Nov 15 2005, 09:14
Post #1

Group: Members (Donating)
Posts: 3474
Joined: 7-November 01
From: Strasbourg (France)
Member No.: 420

Preliminary notes

Two years ago I performed and published my first two listening tests. Both covered different formats and encoders at ~130 kbps and involved a dozen samples, classical music only. My purpose was to see which encoder produced the best encoding at a friendly bitrate (friendly for portable players) and for a specific kind of music. iTunes AAC and WMAPro appeared to be the best encoders (for me), and the absolute quality of both at such a bitrate surprised me. Last year (December 2004) I performed two similar tests: the first was dedicated to AAC (Nero, Apple, old and new encoders) and the second was a match between the best AAC encoder (Nero Digital “fast” VBR) and the most advanced Vorbis one (aoTuV beta 3). Quality and enjoyment were even higher!

This year I performed a fresh multiformat listening test at 130 kbps. This new test is very different from its predecessors from a methodological point of view. I progressively improved my approach to listening tests and tried to answer all the criticism addressed in the past to previous tests (not necessarily mine). Consequently, my “personal evaluations”, which began as a friendly exercise feasible in one rainy autumn afternoon, have become a gigantic task that took me approximately 10 days (shared with family, friends, job, and discouragement) to complete. I improved several points of the methodology; to sum them up:

diversity : the following test is not based only on “classical” music; it also includes several (fifty!) samples of “modern” music.
grading : my grading was once described as “temperamental”, so I decided to keep all marks between two anchors, a low and a high one. This decreases the contrast between encoders and at the same time increases the difficulty of the whole exercise, but it should also ensure more accurate grading. The low anchor is vital to prevent excessively harsh grading; the high anchor is essential to temper enthusiasm: a very good encoding at 130 kbps should be marked relative to an excellent, high-bitrate one. The presence of both anchors should guarantee fair grading: not too low, not too high.
complexity : some listening tests have been criticized for focusing only on “critical” or “complex” samples. That can hide problems with some VBR implementations, which sometimes lower the bitrate too much on “non-complex” samples. In my opinion, a listening test should include both types of samples, at least to verify that non-complex/low-bitrate parts are encoded as well as complex/high-bitrate ones. Usually, VBR encoders handle non-complex parts very well. Usually… The complexity range of my gallery of samples is wide enough to represent all situations (from ultra-low to ultra-high bitrates) and to check the strength of VBR implementations.
abundance : a set of 12…15 samples may not be enough to give an accurate idea of the strengths and weaknesses of different encoders. I experienced it myself in the past: my previous tests didn’t reveal some problems I only noticed afterwards in real usage and, more importantly, they were unable to show how often the detected problems recur. Detecting a problem (like rumbling or ringing) is one thing; measuring how frequently it occurs is another. My test is based on 200 samples; this number should be enough to expose all common problems plus several uncommon ones, and is also sufficient to get an idea of their frequency. This is in my opinion the biggest advantage of my personal listening test over collective ones (which must stay friendly to avoid discouragement and attract a lot of testers).
statistical analysis : it might seem trivial to mention this, but statistical analysis of the results and confidence bars are present (they were not used last year or the year before).
“Apples and Oranges” : no need to recall the problem. This test involves only VBR encoders. No debate this time.
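As a rough illustration of what a confidence bar represents, the sketch below computes a mean mark and a 95% interval for one encoder. The marks are invented, `mean_with_ci` is a hypothetical helper, and the normal approximation is an assumption of this sketch: it is not necessarily the analysis behind the plots of this test.

```python
from statistics import NormalDist, mean, stdev


def mean_with_ci(scores, confidence=0.95):
    """Return (mean, half_width) of a normal-approximation confidence interval."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(scores) / n ** 0.5
    return mean(scores), half_width


# Invented marks on the 1.0 to 5.0 scale (NOT data from this test).
example_scores = [4.5, 3.8, 5.0, 4.2, 3.5, 4.8, 4.0, 4.6]
m, hw = mean_with_ci(example_scores)
print(f"mean = {m:.2f}, 95% interval = [{m - hw:.2f}, {m + hw:.2f}]")
```

With 150 or 50 marks per encoder the interval narrows roughly with the square root of the sample count, which is one reason a large sample set helps separate encoders whose averages are close.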


The market of audio encoders is ruled by a Darwinian process: only the strongest survive. Between my first test (October 2003) and this one (November 2005), only a few encoders really progressed. Most others (some of them still in use) are unchanged or changed only once: MPC, WMA (Standard and Pro), faac, all MP3 encoders (except LAME). Another one appeared and disappeared in the meantime (Compaact!).
On the hardware side, the situation is now very different from the one I knew two years ago. Back then, with the exception of one or two devices, AAC and Vorbis support in hardware players was more a dream than a reality. Testing different audio formats was useful for a virtual and open future, rich in dreams and promises. Now, the concrete situation is more interesting than dreams. MP3 and WMA (Std) are still the two well-established formats, but Vorbis now benefits from the growing interest of several manufacturers, and if AAC still looks like an Apple monopoly, the iPod market has at least diversified into several forms (flash memory players, Microdrive™-based jukeboxes). One victim of reality is WMAPro, still not supported; and the growing popularity of WMA labeled as PlaysForSure (based on WMA Std) seems to sentence WMAPro to a long exile.
For all these reasons, I restricted the test to the most usable and interesting encoders: AAC-LC (highly developed by Apple and Nero Digital), MP3 (vigorous as ever, thanks to the LAME devs), Vorbis (saved from inertia by Aoyumi). Besides these four encoders, I added two anchors. More precisely:

Apple AAC: I used iTunes (based on QuickTime 7.03), at 128 kbps and with the recently added VBR mode. This is the first time I have tested Apple AAC in VBR. I sadly discovered that this encoder uses the same trick as the MP3 encoder included in iTunes: frames are never smaller than the targeted bitrate allows (apart maybe from digital silence). In other words, for 128 VBR encodings the bitrate starts at 128 kbps and increases with complexity. Needless to say, if the average bitrate stays close to the target, the variations are necessarily limited. One advantage: this restricted mode prevents the VBR engine from using inadequately low-bitrate frames, and should protect quality from bad surprises compared to a CBR encoding.

Nero Digital AAC: I used the very new encoder released two weeks ago (aac.dll v.3 and aacenc32.dll v. ), in VBR mode too. The –internet profile is the closest to 128 kbps (slightly lower with classical music, but higher with non-classical). I didn’t use the “fast” mode, which is now pretty similar but probably inferior to the “high” one.

LAME MP3: I used the latest alpha of 3.98 (alpha 2) in order to add the --athaa-sensitivity 1 switch to the -V5 --vbr-new mode. For the second group of samples, to slightly lower the bitrate, I simply used -V5 --vbr-new.

Vorbis: I used aoTuV beta 4 (4.5 was released during the testing phase) instead of the official 1.1.1, which corresponds to the 18-month-old aoTuV beta 2. I used -q4,25 for the first group and -q4,00 for the second.

As low anchor, I looked for something really low and also usable in batch mode. I found a very old AAC encoder on ReallyRareWares called mbaacencoder version 0.3: it’s awfully slow, the quality is terrible, and incidentally it is ideal for getting an idea of all the progress made around AAC between 1999 (release date of mbaacencoder) and 2005 (Apple and Nero Digital). I tried to get joint stereo and the LC profile in batch mode, but the encoder apparently stayed in its default mode (Main Profile, 128 kbps and dual stereo).

As high anchor, I didn’t hesitate and used LAME 3.97 beta 1 -V2 --vbr-new (or --preset standard), which is a reference for efficient, high-quality and universal encodings. Furthermore, it would be interesting to evaluate the remaining gap between modern implementations of AAC and Vorbis at ~128 kbps and HQ MP3 at ~192 kbps.


The test hinges on two big groups of samples: 150 for the “classical” music group and 50 for the “non-classical” (or “various”, or “modern”, or “popular”… choose your own) group. I have already used the first group in three different tests in the past (80 kbps, 96 kbps, and LAME -V5). The complete collection is available for download. The second group consists of all (35) non-classical samples used in previous collective listening tests; they’re all available on rarewares. To decrease the gap between the first and the second group I’ve added 15 other samples, all recently submitted for Sebastian Mares’s postponed 64 kbps listening test. Most of these last files may still be available.


The bitrate comparison is more accurate for the first group: it’s based on full tracks (6 min 30 sec per file on average) instead of short samples (10 sec on average), and, last but not least, the complete collection is very representative of my entire library. For the second group of samples I proceeded differently and based the bitrate calculation on the 50 samples (which are longer: 24 sec on average) and on external data (a bitrate table for LAME posted by someone else). This way of evaluating the bitrate is not very precise, but I don’t have enough material to build a more accurate bitrate table. That’s why I tried to minimize the difference in bitrate between all settings, and changed the command line for Vorbis (from -q4,25 to -q4,00) and LAME (--athaa-sensitivity 1 was removed).
To sum up the data (a complete bitrate table will follow in the next few days):

CLASSICAL (full tracks)

low anchor       128.00 kbps  (estimated)
AAC iTunes       133.33 kbps  [+4.16 %]
AAC Nero         125.71 kbps  [-1.79 %]
MP3 LAME         130.81 kbps  [+2.20 %]
Vorbis aoTuV     131.69 kbps  [+2.88 %]
high anchor      181.46 kbps  [+41.77 %]

NON-CLASSICAL (short samples)

low anchor       128.00 kbps  (estimated)
AAC iTunes       137.31 kbps  [+7.27 %]
AAC Nero         134.10 kbps  [+4.76 %]
MP3 LAME¹        137.82 kbps  [+7.67 %]
Vorbis aoTuV²    133.42 kbps  [+4.23 %]
high anchor      196.28 kbps  [+53.34 %]

¹ with --athaa-sensitivity 1 the bitrate reaches 139.38 kbps
² with -q4,25 the bitrate reaches 140.21 kbps
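The bracketed percentages are consistent with a deviation measured against the 128 kbps of the low anchor. A minimal sketch of that arithmetic for the classical group follows; taking 128 kbps as the reference is an assumption inferred from the figures, and `deviation_percent` is a hypothetical helper name.

```python
REFERENCE_KBPS = 128.0  # assumed reference: nominal bitrate of the low anchor


def deviation_percent(bitrate_kbps, reference_kbps=REFERENCE_KBPS):
    """Relative difference to the reference bitrate, in percent."""
    return (bitrate_kbps / reference_kbps - 1.0) * 100.0


# Average bitrates of the classical group, copied from the table above.
classical = {
    "AAC iTunes": 133.33,
    "AAC Nero": 125.71,
    "MP3 LAME": 130.81,
    "Vorbis aoTuV": 131.69,
    "high anchor": 181.46,
}

for name, kbps in classical.items():
    print(f"{name:<14}{kbps:7.2f} kbps  [{deviation_percent(kbps):+.2f} %]")
```

Because the published bitrates are already rounded to two decimals, recomputing a percentage from them can differ from the published value in the last digit.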


The full test consists of pure ABC grading. Double-blind conditions are ensured by schnofler’s ABC/HR 0.5 beta (2005.08.31) software. All samples were decoded by CLI decoders within ABC/HR; offsets were removed each time and minor differences in gain were systematically corrected (the highest difference reached 1.2 dB). A small note on Vorbis: all files were decoded with foobar2000 (I still can’t make ABC/HR decode Vorbis files). There are no ABX comparisons: it’s a luxury I can’t afford with 1200 files awaiting evaluation (200 x 6). If a difference is really unsure, I don’t rank the file. In the end I ranked the reference instead of the encoded file 16 times (6 of these mistakes concern the high anchor). The error rate is below 1.5%. I didn’t discard the errors from the final results (they don’t have a significant impact).
My hardware setup: Beyerdynamic DT-531 headphones; Audigy2 soundcard; Onkyo A-5 amp.
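The file count and error rate quoted above can be checked with a few lines. The sketch only restates numbers given in the post; the per-anchor rate assumes one high-anchor file per sample, which follows from the 200 x 6 layout.

```python
SAMPLES = 200
FILES_PER_SAMPLE = 6        # four encoders plus two anchors
MISRANKED_TOTAL = 16        # times the reference was ranked instead of the encoding
MISRANKED_HIGH_ANCHOR = 6   # of which concern the high anchor

total_files = SAMPLES * FILES_PER_SAMPLE
overall_error = MISRANKED_TOTAL / total_files
# Assumption: exactly one high-anchor file per sample, so 200 such files.
high_anchor_error = MISRANKED_HIGH_ANCHOR / SAMPLES

print(f"files evaluated:   {total_files}")
print(f"overall error:     {overall_error:.2%}")
print(f"high-anchor error: {high_anchor_error:.2%}")
```

The overall rate stays under the 1.5% stated above, while the high anchor alone is misranked more often, as expected for the encoding closest to transparency.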


Last words before posting the results: I planned to write a complete review, including a full synthesis of the most common problems encountered in this test. Different encoders have different problems, and some of them are recurrent. For example, LAME often produces a weird kind of rumbling (noise in low frequencies) and smearing; Vorbis still has issues with what I call “microdetails” (blurred and replaced by noise) and sometimes coarseness; iTunes sometimes suffers from a form of ringing I can’t define; Nero Digital has serious trouble with tonal passages and poor pre-echo performance.
I haven’t compiled this memento yet; it should interest developers more than users. But I’m publishing the results now, because I feel that it’s time for me to close this test (honestly, seeing ABC/HR running somewhere drives me mad or sick).
Results are published as big png files; file size is not an issue (only 111 kb) but the image size may cause problems at small display resolutions (800x600). I apologize for the inconvenience. Short comments follow the graphs. Here again, I planned to write more detailed comments, but I fear that the weekend and maybe the month will be over before I achieve what I planned. I postponed several activities during the last two weeks to perform and present this test, but I can’t continue any longer. If I remember correctly there’s a life outside ABC/HR :) I also suspect that most people don’t read comments or details and are more interested in the final ranking. That’s why the results I’m posting today are in a somewhat “raw” form. I sincerely apologize, and will try to (slowly) add more textual substance in the next few days. Now, the results.

This post has been edited by guruboolez: Nov 15 2005, 09:38
post Nov 15 2005, 09:15
Post #2


A few words to conclude the test…
It’s pretty clear that all encoders tested here deliver good or even very good output quality. There is currently no winner between AAC (iTunes) and Vorbis. It’s funny to see that the results are so close at the finish line when the problems are so different. Encodings are not fully transparent, but quality is in my opinion excellent most of the time (but not always).
LAME gives MP3 the chance to stay competitive against AAC and Vorbis. Not fully competitive, but the efficiency of this format commands respect.
Nero Digital’s implementation of AAC is slightly disappointing, especially with classical music, which is still a weak point of this encoder. But the quality is far from a disaster (that wasn’t the case two years ago), is on average really good, gets even better with “non-classical” music and should satisfy several users.
Last but not least, the difference among all these encoders is really small (don't read too much into the "zoomed" plots :) )

But the average mark is somewhat misleading. LAME’s score is ~0.5 point lower than that of iTunes or Vorbis, but this doesn’t mean, for example, that the quality of encoded albums is 0.5 point lower. This lower ranking expresses higher fragility rather than lower quality. LAME and Nero Digital are more prone to serious distortions than Vorbis or iTunes AAC at the same bitrate. With such encoders, the concept of quality may be replaced by the concept of strength or robustness. To illustrate this I made the following histogram (sorry for the poor quality, I’ll change it later):

Here, Vorbis and iTunes both get a mark between 4.5 and 5.0 for 50% of the tested samples, whereas Nero only achieves this level (near-transparency or full transparency) for 20% of the same samples. In the classical group, 30% of the samples were ranked below 3.0 with Nero, whereas iTunes and Vorbis got such a mark for less than 10% of the samples. The two winners are stronger, and can handle more situations than LAME and Nero Digital AAC.
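The shares quoted from the histogram are simple proportions of marks falling in a range. The sketch below shows the computation on invented marks; `share_in_range` and `share_below` are hypothetical helpers and the numbers are not taken from this test.

```python
def share_in_range(scores, low, high):
    """Fraction of marks s with low <= s <= high."""
    return sum(low <= s <= high for s in scores) / len(scores)


def share_below(scores, threshold):
    """Fraction of marks strictly below the threshold."""
    return sum(s < threshold for s in scores) / len(scores)


# Invented marks for one encoder on ten samples (NOT data from this test).
marks = [5.0, 4.7, 4.5, 4.2, 4.0, 3.8, 3.5, 3.0, 2.8, 2.0]

print(f"near-transparent (4.5 to 5.0): {share_in_range(marks, 4.5, 5.0):.0%}")
print(f"ranked below 3.0:              {share_below(marks, 3.0):.0%}")
```

Two encoders with similar averages can differ a lot on these two shares, which is the robustness point made above.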

This post has been edited by guruboolez: Dec 29 2005, 22:57

