QT AAC, aoTuV (Vorbis), libopus, LAME (MP3) at high quality settings: searching for transparency by ABX
Jplus, posted Feb 8 2013, 20:15

My primary motivation for performing this listening test was to find the lowest QT AAC TVBR setting that is fully transparent for me, because I want to use that for my music collection. My secondary motivation was to find out how the other encoders compare to QT AAC at high quality settings.

This is my first rigorous listening test, and a rather extensive one, so I wanted to share the results with the audio community. I hope others may learn as much from this experiment as I did!

Results in a nutshell (for the impatient)
QT AAC was judged fully transparent at q91 and close to transparent at q82. The sample in which I heard a faint difference at q82 had an average bitrate of only 128kbps at that preset and 159kbps at q91. Taking that into consideration, together with the expected bitrates at q82, I would assume CBR files at 190kbps and up to be reasonably safe for my ears.
AoTuV (Vorbis) was judged very close to transparent at q5 and q6 and fully transparent at q7. If I were to use Vorbis for my music collection I would pick q6 because I think the tradeoff between file size and perceived sound quality is better at that preset than at q7. I would trust CBR files of 200kbps or greater.
Opus was judged fully transparent in VBR mode with a target bitrate of 224kbps, which is considerably higher than I expected based on previous reports. At target 192kbps I judged it untransparent, so there's no grey area like in AAC or Vorbis. Opus VBR seems to be a lot less variable than the other codecs, so in CBR mode I would trust Opus files of 230kbps and up.
LAME (MP3) was judged very close to transparent at V1 and V0 and fully transparent at c320. I would pick V0 if I were to use LAME for my music collection. In CBR mode I would trust files of 260kbps or greater.

Hardware
  • iMac7,1 with default Intel HD sound processor
  • Sennheiser HD 201 ear-enclosing headphones
  • fairly sensitive ears which were recently rinsed


Software
  • Mac OS X 10.6.8
  • X Lossless Decoder 20130127 for transcoding the samples (see encoder details below)
  • ABXTester 0.9, a simple GUI tool that presents the Xs in batches of 5 and uses QuickTime to play the samples
  • opus-tools 0.1.6 in order to decode Opus files to WAV so I could play them with QuickTime in ABXTester
  • Perian 1.2.3 QuickTime component that allows for playback of Vorbis ogg files


Encoder details
QT AAC: my installation of Mac OS X included CoreAudio 3.2.6, QuickTime 7.6.6 and QuickTimeX 10.0. I used TVBR mode and overall encoder quality "max".
AoTuV: XLD included release 1. Apart from the target quality setting no options were shown.
Opus: XLD included libopus 1.0.2. I used VBR mode and framesize 20ms. opus-tools 0.1.6 also uses libopus 1.0.2.
LAME: XLD included version 3.99.5. I used VBR mode with -q2 and the new VBR method.
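For anyone who wants to reproduce the encodes outside XLD, the settings above have rough command-line counterparts in the reference encoders. The Python sketch below only builds the argument lists (it runs nothing), assuming the `lame`, `oggenc` and `opusenc` command-line tools rather than XLD; the file names are hypothetical, and the WavPack samples would have to be decoded to WAV first. QT AAC is left out because the test only used it through XLD/CoreAudio. The LAME "new VBR method" is not passed explicitly, on the assumption that it is already the default in LAME 3.98 and later.

```python
import shlex
from typing import List


def lame_cmd(src: str, dst: str, v: int) -> List[str]:
    # -V selects the VBR quality level (0 = best), -q 2 the algorithm quality.
    return ["lame", "-V", str(v), "-q", "2", src, dst]


def oggenc_cmd(src: str, dst: str, q: int) -> List[str]:
    # -q sets the Vorbis quality level; assumed to behave the same in aoTuV builds of oggenc.
    return ["oggenc", "-q", str(q), "-o", dst, src]


def opusenc_cmd(src: str, dst: str, kbps: int) -> List[str]:
    # VBR with a target bitrate and 20 ms frames, mirroring the XLD settings above.
    return ["opusenc", "--bitrate", str(kbps), "--vbr", "--framesize", "20", src, dst]


if __name__ == "__main__":
    # Hypothetical file names; print shell-quoted command lines instead of running them.
    for cmd in (
        lame_cmd("applaud.wav", "applaud.mp3", 0),
        oggenc_cmd("applaud.wav", "applaud.ogg", 6),
        opusenc_cmd("applaud.wav", "applaud.opus", 224),
    ):
        print(shlex.join(cmd))
```

Building lists rather than strings keeps file names with spaces (such as "central industrial") safe if the lists are later handed to `subprocess.run`.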

Ambient conditions
Test setup was in an apartment with reasonably good sound isolation, in a moderately quiet environment with singing birds and low traffic. During ABX trials I kept the room door and the ventilation window closed. Computer fans were turned down. Under those conditions while wearing the headphones, most of the time the only sound I heard was the low humming of the external hard drive that carried the samples. Usually I became unaware of that sound when actively listening to a sample.

Samples
I selected 15 samples from the LAME Quality and Listening Test Information page. In 8 of those samples I didn't hear a difference in any of the encodings I tested. The remaining 7 samples are numbered below. In addition, I included a 10-second fragment from Central Industrial by The Future Sound of London, which I had previously found to contain obvious artifacts when encoded with QT AAC q63:
  1. applaud.wv
  2. fatboy.wv
  3. goldc.wv
  4. pipes.wv
  5. testsignal2.wv
  6. vbrtest.wv
  7. velvet.wv
  8. central industrial.m4a (ALAC)

Henceforth I'll refer to these samples by their numbers. See the appendix for detailed discussion of each sample.

General test procedure
As a general preparation I transcoded the WavPack samples to ALAC in order to make them playable in ABXTester. I always used the lossless original as sample A and the lossy compressed file as sample B. I took regular breaks in order to prevent fatigue. The measurements were spread over multiple sessions with almost a week between the first and the last session.

For each codec, I would first encode all samples at the middle preset, i.e. q63 for QT AAC, V5 for LAME, q4 for aoTuV and 96kbps for Opus. Then for each sample I would conduct ABX testing and conclude one of the following levels of quality:
  • clear difference if I was very sure I heard obvious artifacts and I scored 5 out of 5 after the first batch, or if I scored near 100% after multiple batches with overall p <= 0.002;
  • marginal difference if I wasn't absolutely sure in each trial but testing showed that I was able to hear the difference, i.e. at least three batches with overall p <= 0.05;
  • no difference if testing didn't disprove that I might be just guessing (p > 0.05) or if I gave up in advance.
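The p-values behind these categories come from a binomial test: if the listener is only guessing, each X is identified correctly with probability 1/2. The Python sketch below implements the exact one-sided test, assuming that is the intended statistic (how ABXTester computes its own figure is not documented here). It also shows why one batch can never reach p <= 0.002 on its own (5/5 gives p = 1/32 ≈ 0.031), which is why the "clear difference" rule pairs 5/5 with being very sure, and that three batches need at least 12 of 15 correct for p <= 0.05. For the 18/25 score reported in the LAME section, this exact test gives about 0.022 rather than 0.014; both stay below 0.05.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """One-sided exact binomial p-value for an ABX score.

    Probability of getting at least `correct` answers right out of
    `trials` when purely guessing (chance 1/2 per trial).
    """
    if not 0 <= correct <= trials:
        raise ValueError("correct must lie between 0 and trials")
    ways = sum(comb(trials, k) for k in range(correct, trials + 1))
    return ways / 2 ** trials


def min_correct(trials: int, alpha: float) -> int:
    """Smallest score out of `trials` whose p-value is <= alpha.

    Returns trials + 1 when even a perfect score is not significant enough.
    """
    for correct in range(trials + 1):
        if abx_p_value(correct, trials) <= alpha:
            return correct
    return trials + 1


if __name__ == "__main__":
    print(abx_p_value(5, 5))               # 0.03125: one perfect batch of 5
    print(min_correct(15, 0.05))           # 12: three batches, "marginal difference"
    print(min_correct(5, 0.002))           # 6: unreachable with a single batch
    print(f"{abx_p_value(18, 25):.4f}")    # 0.0216
```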

By default I set the audio volume to 5 notches out of 16. If I didn't immediately hear a difference, I tended to turn it up to 6 notches, except for sample 1, which already sounded very loud to me. Occasionally I would try a sample with the channels reversed (by turning my headphones around) to test whether something new might come to my attention.
After testing all samples at the middle preset I would proceed to higher presets with the samples in which I heard any difference, until I found the minimal preset at which I heard no difference or until I couldn't go higher. A preset was judged "fully transparent" if I heard no difference in any sample, "very close to transparent" if I heard a marginal difference in at most one sample, and "untransparent" otherwise. I decided to assign QT AAC q82 an intermediate category, "close to transparent", because I heard a clear but very faint difference in one sample. More on that below. The overall search path from preset to preset generally resembled a binary search, or was similarly "jumpy".
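Assuming audibility is monotonic in the preset (if a preset is transparent, every higher one is too), the binary-search-like path can be sketched in Python as follows. `hears_difference` is a hypothetical stand-in for a full ABX session on the relevant samples, and the list of QuickTime TVBR quality steps is an assumption that happens to include every QT preset mentioned in this report. Real listening can violate the monotonicity assumption, which is one reason to re-verify the final preset, as was done for QT AAC.

```python
from typing import Callable, Optional, Sequence, TypeVar

Preset = TypeVar("Preset")

# Assumed QuickTime AAC TVBR quality steps, lowest to highest quality.
QT_TVBR_STEPS = [0, 9, 18, 27, 36, 45, 54, 63, 73, 82, 91, 100, 109, 118, 127]


def lowest_transparent(
    presets: Sequence[Preset],
    hears_difference: Callable[[Preset], bool],
) -> Optional[Preset]:
    """Binary search for the lowest preset the listener cannot distinguish.

    `presets` must be ordered from lowest to highest quality and audibility
    is assumed monotonic. Returns None if every preset is distinguishable.
    """
    lo, hi = 0, len(presets) - 1
    best: Optional[Preset] = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if hears_difference(presets[mid]):
            lo = mid + 1          # artifacts audible: only higher presets remain
        else:
            best = presets[mid]   # transparent: remember it and try lower presets
            hi = mid - 1
    return best


if __name__ == "__main__":
    # Hypothetical listener who hears a difference below q91.
    print(lowest_transparent(QT_TVBR_STEPS, lambda q: q < 91))
```

With that simulated listener the search probes q63, q100, q82 and q91 and returns 91, touching four presets instead of walking through all fifteen.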

I executed the above procedure first for QT AAC, then for LAME, then aoTuV and finally Opus. During the course of the experiment I noticed I had become better at detecting artifacts, so in the end I returned to QT AAC to verify my end results for that encoder.

QT AAC
Observed bitrate range: varies wildly around the official expected value. For example, at q63 (135kbps expected) some samples had an average bitrate of 80kbps while others went over 190kbps.
Observed artifacts: even at medium bitrates (q27) most artifacts were slight changes in timbre or texture rather than very obtrusive stand-alone sounds. The exception is sample 8, which acquired some obvious, very sharp "ticks" after encoding; these were audible up to q82, at 128kbps average file bitrate.
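The average file bitrates quoted in these sections amount to file size divided by duration. A small Python sketch, assuming 1 kbps = 1000 bits per second and that the whole file is counted (container overhead and tags included, which slightly inflates the figure for short samples); `spread_around` is a hypothetical helper for expressing the range relative to the expected bitrate.

```python
from typing import Sequence, Tuple


def average_kbps(file_bytes: int, duration_s: float) -> float:
    """Average bitrate of a whole file in kbit/s (1 kbit = 1000 bits)."""
    return file_bytes * 8 / duration_s / 1000


def spread_around(expected_kbps: float, observed_kbps: Sequence[float]) -> Tuple[float, float]:
    """Distance of the lowest and highest observed bitrates from the expected one."""
    return min(observed_kbps) - expected_kbps, max(observed_kbps) - expected_kbps


if __name__ == "__main__":
    print(average_kbps(1_000_000, 62.5))     # 128.0
    print(spread_around(135, [80, 190]))     # (-55, 55)
```

Plugging in the q63 figures from the QT AAC paragraph gives a spread of at least 55kbps in each direction around the expected 135kbps.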

Stage 1: all samples at q63.
I heard no differences except for a clear difference in sample 8. I decided to ignore that for the moment and to continue my search downwards first.

Stage 2: samples 1-7 at q27.
I heard clear differences in samples 1, 2, 6, 7.

Stage 3: samples 1, 2, 6, 7 at q45.
I heard clear differences in samples 2, 6 but no difference in samples 1, 7.

Stage 4: samples 2, 6 at q54.
I no longer heard any differences and judged q54 fully transparent, disregarding sample 8.

Stage 5: sample 8 at q100.
No difference.

Stage 6: sample 8 at q82.
No difference.

Stage 7: sample 8 at q73.
Clear difference. I chose q82 as my search result for the time being.

Stage 8: samples 1, 2, 6, 7, 8 at q82 (verification after finishing the other encoders).
I did hear a clear difference in sample 8 after all, though I had to listen to A and B a few times before I noticed it. I heard no difference in the other samples.
Note: I have not reviewed stages 1-4. With my trained ears I might actually hear some additional differences at q54 or even q63 but I haven't tested.

Stage 9: sample 8 at q91.
No difference. I settled on q91 as my final search result for QT AAC.

LAME
Observed bitrate range: the spread is somewhat smaller than in QT AAC; generally the highest and lowest average bitrates were within 30kbps of the expected bitrate for the given quality preset.
Observed artifacts: no standalone "objects", but changes in timbre or texture could be very un-subtle.

Stage 1: all samples at V5.
I heard clear differences in samples 1, 4, 6, marginal difference in sample 7 and no difference in samples 2, 3, 5, 8.

Stage 2: samples 1, 4, 6, 7 at V3.
Clear differences in samples 1, 6.

Stage 3: samples 1, 6 at V1.
Marginal difference in sample 1. I took V1 as my search result for the time being.

Stage 4: sample 1 at V2 (checking for consistency with aoTuV after finishing Opus).
Clear difference. I chose V0 as my final search result instead.

Stage 5: sample 1 at V0 (for completeness, shortly before starting this report).
Marginal difference (yes really, I believe I heard a difference and I identified 18 out of 25 Xs correctly: 72%, p=0.014).

Stage 6: sample 1 at c320.
No difference (at first I thought I heard a difference but ABX testing showed I didn't).

AoTuV
Observed bitrate range: average file bitrate is usually greater than the official target bitrate for the given quality preset. For example, the average bitrates at q4 were all greater than 128kbps. Upwards spread from the target bitrate seemed to be similar to QT AAC.
Observed artifacts: few and subtle. The marginal difference in sample 3 that I consistently heard up to q6 was an attenuation effect: the high-frequency components were slightly softened.

Stage 1: all samples at q4.
Clear difference in sample 1, marginal difference in sample 3 and no difference in the other samples.

Stage 2: samples 1, 3 at q6.
Marginal difference in sample 3, no difference in sample 1.

Stage 3: sample 1 at q5.
Marginal difference. I settled on q6 as my search result.

Stage 4: sample 1 at q7 (for completeness, shortly before starting this report).
No difference.

Opus
Observed bitrate range: average bitrates were always very close to the target bitrate, with a spread of less than 10kbps in each direction. I would compare Opus VBR to QT AAC ABR.
Observed artifacts: texture changes, some of them very severe, including "rattling" and "grinding" sounds. Usually the timbre became more "metallic".

Stage 1: all samples at target 96kbps.
Clear differences in samples 1, 2, 4, 5, 6, 7, no difference in samples 3, 8.

Stage 2: samples 1, 2, 4, 5, 6, 7 at target 192kbps.
Clear differences in samples 4, 6, no difference in samples 1, 2, 5, 7.

Stage 3: samples 4, 6 at target 256kbps.
No differences.

Stage 4: samples 4, 6 at target 224kbps.
No differences. I chose 224kbps as my search result.

Conclusions and recommendations
QT AAC and aoTuV are the clear winners in this comparison, with QT AAC achieving full transparency at the best compression ratio. I was a bit surprised to find that the highest quality preset in LAME is not overkill (for my ears). Opus doesn't seem to perform exceptionally well at high bitrates (though better than LAME), although it's known to beat QT HE-AAC (more or less) at 64kbps. This is probably explained in part by the fact that Opus is still very young. Another explanation is that Opus might be intended more for low bitrates, which is somewhat suggested by the way it's described on the Opus home page.

According to the Hydrogenaudio wiki, most people find AAC to be transparent at about 150kbps, Vorbis at about 150-170kbps and LAME at about 160-224kbps. Given the results of this experiment, my ears might be slightly better than average.

If you wish to repeat this experiment, you might be able to save a lot of time by using my results as a hint where to find the most significant differences. The sample details in the appendix may help you to "look" in the right direction. In addition, you can probably start your searches for Opus and LAME at higher presets than I did.

If you just want to use this report as a hint for choosing your ideal encoder setting, I suggest that you perform a miniature version of my experiment using just a single sample in the encoder that you're interested in. If you hear a difference, go up one preset at a time until you don't; otherwise, go down one preset at a time until you do, and keep the last preset at which you heard no difference. Specifically:
For QT AAC, I would recommend listening to sample 8 and starting at q73. If you descend below q54 I recommend listening to samples 2, 6 instead.
For aoTuV, I would recommend listening to sample 3 and starting at q5. If you don't hear any difference switch to sample 1 at q4.
For Opus, you could take sample 4 at target 160kbps.
For LAME, I recommend listening to sample 1 starting at V3.
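The one-preset-at-a-time search recommended above can be sketched in Python like this. `hears_difference` is again a hypothetical stand-in for an ABX session, presets are ordered from lowest to highest quality, and the aoTuV preset list is a simplified assumption (q0 to q10 only).

```python
from typing import Callable, Optional, Sequence, TypeVar

Preset = TypeVar("Preset")

# Hypothetical, simplified list of aoTuV quality presets, lowest to highest.
AOTUV_Q = list(range(0, 11))


def step_search(
    presets: Sequence[Preset],
    start: Preset,
    hears_difference: Callable[[Preset], bool],
) -> Optional[Preset]:
    """Walk one preset at a time from `start` to the lowest transparent one.

    If the start preset is distinguishable, go up until it is not; otherwise
    go down while the next lower preset stays transparent. Returns None if
    even the highest preset is distinguishable.
    """
    i = presets.index(start)
    if hears_difference(presets[i]):
        while i + 1 < len(presets):
            i += 1
            if not hears_difference(presets[i]):
                return presets[i]
        return None
    while i > 0 and not hears_difference(presets[i - 1]):
        i -= 1
    return presets[i]


if __name__ == "__main__":
    # Hypothetical listener who hears sample 3 differ up to q6, as in the aoTuV test.
    print(step_search(AOTUV_Q, 5, lambda q: q <= 6))
    print(step_search(AOTUV_Q, 9, lambda q: q <= 6))
```

Starting at q5 the walk goes up and stops at 7; starting at q9 it goes down and also stops at 7. Compared with the binary search used in the full test, this costs more ABX sessions when the start is far off, but each step only asks one simple question.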

Appendix: sample details
Sample 1
Loud applause, with a "thank you" yelled through a microphone shortly after the start. The "thank you" is loud but sounds a bit muffled because of the microphone and there's a faint echo to it.
In the lossless original the applause sounds "wet"; you could compare it to rain or perhaps to oil spattering in a hot pan. In audibly different encodings it may sound dryer, noisier and coarser, perhaps like sandblasting, or very coarse and metallic (in Opus at 96kbps target bitrate).
The "thank you" should be a separate sound layered on top of the applause, and should sound fairly smooth. In audibly different encodings you may expect it to interact with the applause in several ways:
  • The applause may seem less clear, noisier or softer during the "thank you".
  • Directly after the "thank you" the applause may seem to be slightly louder and much coarser.
  • The echo to the "thank you" may seem to be amplified compared to the original and include some noise.
  • The "thank" syllable may sound slightly less smooth, a bit raspy, as if affected by the sandblasting (this is the primary way in which I made out the difference at maximum quality settings in LAME).


Sample 2
Some sawtooth-like signal with an additional trill effect that seems to contain vowels. I'm not sure whether this is a heavily filtered human voice or just something creative from a synthesizer, but either way it sounds quite interesting.
At medium bitrates in QT AAC and Opus it sounded distorted and metallic.

Sample 3
Symphonic fragment with drums, trumpets, violins, vocals and some high-pitched snare instrument which I think might be a steel guitar. There's also some high tingling in the right channel which I suspect is an artifact in the original file coming from the snare instrument. Sounds like a soundtrack to an epic 1960s movie.
In aoTuV you may find that the snare instrument (the proper sound slightly to the left, not the tingling in the right channel) is arpeggiated less sharply and sounds softer overall; I would call it a bit "timid" compared to the original.

Sample 4
Bagpipe playing a slow high-pitched melody over a constant bass. The sound is smooth overall although you'll find some irregularity especially in the second long-lasting high note. In the background there's the occasional hollow, raspy, low-pitched sound which might be either the bag being inflated by the artist or (a suggestion of) wind.
Focus on the long-lasting high-pitched notes, especially the very last one. In case of an audible difference you'll find that they sound metallic and/or less smooth, or even outright distorted (Opus at 96kbps target bitrate).

Sample 5
Drums (something that sounds similar to a conga or a djembe) playing a samba-like rhythm. At the start an alto voice sings "aaaa", which is a bit of a shame because the voice will not help you to distinguish the encoded sample from the original and it partially masks the drums.
In Opus at 96kbps target bitrate the high-pitched slap beats sound more metallic than in the original.

Sample 6
Western guitar playing a country tune.
At lower bitrates you might recognise the encoded sample directly because it sounds metallic and perhaps even a bit distorted. At high bitrates you might be able to make out the difference if you focus on the initial arpeggio and the final note. The last note of the initial arpeggio (which lasts longer than the previous notes) might sound a bit rougher than in the original. The final note might sound metallic; this difference is probably easier to hear than the former. You probably won't find a difference in the chords.

Sample 7
Monotone (synthetic) drum rhythm with bass, a big tom beating on every second bass beat, an open-closing hi-hat in the right channel alternating with the bass beat, and another closed hi-hat in the left channel beating four times for every bass beat.
You'll only hear a difference at the lesser quality settings, and you are most likely to find it in the closed hi-hat in the left channel.

Sample 8
Synthesizer music of fairly low complexity.
Frankly, the sounds aren't really important, because the main reason to listen to this fragment is the sharp ticks that are introduced by QT AAC. I don't think I need to tell you where they are because you're pretty much guaranteed to hear them at q63 and below.
Since this sample isn't available from the LAME Quality and Listening Test Information page, I made it available for download here: https://dl.dropbox.com/u/3512486/central%20industrial.m4a
Replies
eahm, posted Feb 9 2013, 19:55

LordWarlock, I was able to hear a difference between CBR 320kbps FhG and CBR 320kbps LAME on the first minute of Guns n' Roses - Don't Cry. I tested this for four straight hours, but sorry, I don't have any logs; you have to trust me. Go tell LAME and Fraunhofer there is something wrong with their encoders.

This post has been edited by eahm: Feb 9 2013, 19:56



Posts in this topic
- Jplus   QT AAC, aoTuV (Vorbis), libopus, LAME (MP3) at high quality settings   Feb 8 2013, 20:15
- - eahm   I believe you but I am skeptical, these bitrates a...   Feb 8 2013, 20:26
- - DonP   What versions of these encoders did you use? For ...   Feb 8 2013, 21:43
- - Jplus   @eahm: If by "logs" you mean the kind of...   Feb 8 2013, 21:53
|- - DonP   QUOTE (Jplus @ Feb 8 2013, 15:53) @DonP: ...   Feb 8 2013, 22:35
- - eahm   Jplus, yes I meant these logs. Until I see a prope...   Feb 8 2013, 22:05
- - Jplus   @eahm: I'm sorry for quote-sniping you, but I ...   Feb 8 2013, 23:42
|- - DonP   QUOTE (Jplus @ Feb 8 2013, 17:42) @DonP: ...   Feb 9 2013, 00:58
- - Jplus   Ahh, I guess I've been too defensive. Thanks f...   Feb 9 2013, 02:04
- - eahm   Jplus, don't worry about lower bitrates just f...   Feb 9 2013, 05:26
- - Mach-X   The OP presented clear and concise criteria for te...   Feb 9 2013, 07:03
- - lvqcl   About aotuv version numbers: homepage QUOTE aoTuV...   Feb 9 2013, 09:11
- - Jplus   @lvqcl: Thank you, that clarifies some things. I h...   Feb 9 2013, 11:20
- - Alexxander   Without hard numbers, only you can draw your concl...   Feb 9 2013, 12:20
- - greynol   I have my doubts that the p-values are being calcu...   Feb 9 2013, 13:30
- - Jplus   TLDR version: greynol is right that fb2k does some...   Feb 9 2013, 14:33
- - LordWarlock   Why are you all so hung up on the ABX logs? It...   Feb 9 2013, 14:53
- - eahm   LordWarlock, I was able to hear difference between...   Feb 9 2013, 19:55
- - greynol   Thanks for the clarification, Jplus. Even more tha...   Feb 9 2013, 20:16
- - db1989   Not so thanks to eahm for continuing to troll agai...   Feb 9 2013, 20:27
- - vinnie97   Realizing that Opus was built for mobility, the ap...   Feb 9 2013, 22:58
|- - IgorC   Jplus, welcome on HA forum. . It's great to se...   Feb 10 2013, 02:04
- - halb27   A big thank you from me too, Jplus. I appreciate y...   Feb 9 2013, 22:58
- - LordWarlock   QUOTE (eahm @ Feb 9 2013, 19:55) LordWarl...   Feb 9 2013, 23:36
|- - eahm   QUOTE (LordWarlock @ Feb 9 2013, 15:36) A...   Feb 10 2013, 00:16
- - Jplus   Thanks for the grateful and welcoming reactions, I...   Feb 10 2013, 13:36
|- - DonP   QUOTE (Jplus @ Feb 10 2013, 07:36) To be ...   Feb 10 2013, 13:54
- - Jplus   Writing down the (average) bitrate of each encoded...   Feb 10 2013, 15:43
- - IgorC   Jplus, It might be worth to mention that ABX is a...   Feb 11 2013, 17:37
- - Jplus   As promised: additional measurements for the lates...   Feb 11 2013, 19:04

