*Improved* Multi Codec Listening Test Plans
ezlez
post Mar 10 2010, 09:28
Post #1





Group: Members
Posts: 42
Joined: 24-October 09
From: California, US
Member No.: 74271



Table of Contents
• Summary
• Reasoning
• Issues
• Samples
• Proposed Codec Settings


Summary:

Over the coming weeks, I will conduct a listening test with several individuals, using ABC/HR for Java 0.53a, to compare different lossy codecs in VBR mode. The goal is to evaluate the codecs across an equivalent range of quality settings and find out which codec compresses better than the others at each setting. The range spans from a low setting (around 64 kbps for most codecs) up to the point where each codec reaches transparency, based on what people have recommended in the forums. At the moment the plans are only partly complete, since I’m having issues with certain codecs and samples. These are the codecs:

• MP3: LAME 3.98.2 (Encoder) through Audacity 1.3.11 Beta (Frontend Software)
• Ogg Vorbis: aoTuV beta 5.7 (Encoder) through OggdropXPd 1.9.0 (Frontend Software)
• AAC: Nero 1.5.1 (Encoder) through foobar2000 1.0 (Frontend Software), QuickTime/Apple (Encoder) through iTunes 9.0.3.15
• Windows Media Audio: Standard 9 & Pro 10 Versions through Windows Media Player 11 (Encoder)
• Musepack: Musepack SV8 (Encoder) through foobar2000 1.0 (Frontend Software)

So far there will be about seven individuals taking the listening tests, and each will be assigned their own codec to test.

Reasoning

The aim of these tests is to evaluate different quality settings instead of arbitrarily chosen bit rates. That said, I don’t yet know exactly which quality settings to use for each codec. I chose 5 quality settings per codec (excluding MPC) as a rough draft; a couple of settings more or fewer than 5 is fine, but I want to evaluate the codecs within an equivalent range of settings. I also want to keep the number of samples for each setting at 12 or more to yield meaningful results; multiple sources claim that good results tend to come with 12 – 20 samples. I’m using Roberto Amorim’s (rjamorim’s) listening test document on ABC/HR testing as a guide for the tests that I will conduct (http://www.rarewares.org/rja/ListeningTest.pdf).

I chose music samples because I don’t know much about using plain audio samples, and because I’ve seen many ABC/HR tests done with music samples. The subjects taking the tests have experience with music and acoustics; most of them are teachers and dedicated music enthusiasts. Before any of them begins the actual testing, I’m going to demonstrate what sorts of artifacts there are and how to detect them, using portions of ff123’s Artifact Training Page, which can be found at http://ff123.net/training/training.html

Issues

My main issue with the tests is which quality levels each codec should be set to so that all codecs fall within the same range. I want to test each codec across a range from low quality to transparent quality, but that doesn’t necessarily have to take 5 settings. For instance, Musepack is meant for compressing music at transparent and high quality settings, which means it’s not optimized for low bitrates. For Musepack, encoding usually starts at a quality setting of “Q3” and usually reaches transparency at “Q5”. So for the Musepack portion of the listening tests, I’m planning to test three quality settings (Q3, Q4, Q5), which should still produce significant data for that codec.

Here is a list of issues that I’m currently facing with the codecs:
MP3 (LAME):
1. Which quality settings should be tested: V9 (45 – 85 kbps) to V4 (145 – 185 kbps) [6 settings] or V8 (65 – 105 kbps) to V4 (145 – 185 kbps) [5 settings]?
2. Variable Speed: Fast or Standard?
3. Channel Mode: Joint Stereo or Stereo?
4. Is the new release of LAME 3.98.3 available for most frontends (like Audacity or foobar2000 1.0)?

Ogg Vorbis (aoTuV beta 5.7)
1. Which quality settings should be tested: q 0.0 (~64 kbps) to q 4.0 (~128 kbps) [5 settings] or q 0.0 (~64 kbps) to q 5.0 (~160 kbps) [6 settings]?
2. Any advanced encoder options I should use (On OggdropXPd)?

Nero AAC 1.5.1
1. What should the highest quality setting be? Some people argue that Nero AAC reaches transparency at q 0.45, while others argue that it reaches transparency at q 0.50.
2. What should the lowest quality setting be? At the moment, I’m opting for q 0.25 (~63 kbps).
3. How many quality settings should I test?

Apple/iTunes AAC
The iTunes AAC encoder doesn’t really have quality settings. Instead, you’re given a choice of 17 stereo bit rate settings (16 – 320 kbps), 10 sampling rate settings (8.0 kHz – 48 kHz, and “Auto”), and 3 channel settings (Auto, Mono, and Stereo), plus options to use VBR, High Efficiency, and voice optimization.
1. How many bit rate settings should I choose? Some argue that transparency is usually reached at 128 kbps.
2. What settings should I use for the “Sample Rate” and “Channels” Options?
3. Should I consider using high efficiency?
4. Should I consider using QuickTime Pro instead of iTunes 9? I haven’t bought it yet.

WMA Standard
1. Big issue: Should I test WMA Standard with CBR or with VBR? At the moment, I’m trying to convince someone to test WMA Std with CBR (someone is already scheduled to test WMA Std with VBR).
2. Neither the CBR nor the VBR mode of WMA Std offers many quality settings. How many settings should I test? For that matter, at which setting does WMA Standard (either CBR or VBR) reach transparency?

WMA Pro 10
1. At which settings does WMA Pro reach transparency?
2. Number of settings?
3. What should be considered a low setting?

Musepack
1. Are three quality settings with 18 samples per setting good enough? (Q3 – Q5)

Samples

There will be samples from 6 different genres of music, with 2 samples per genre, each 20 seconds long. Each song will be ripped from CD. These are the genres:
• Classical
• Jazz
• Rap/Hip Hop
• Country
• Rock
• Pop
I will post a complete list of the music samples within a couple of days


Proposed Codec Settings
MP3
  • Encoder: LAME through Audacity 1.3.11
  • VBR; Variable Speed: Standard
  • 12 Samples per Quality Setting (60 Tests)
  • Channel Mode: Stereo

  • V8 (65 – 105 kbps) [Low]
  • V7 (80 – 120 kbps)
  • V6 (95 – 135 kbps)
  • V5 (110 – 150 kbps)
  • V4 (145 – 185 kbps) [High/Transparent]


Ogg Vorbis
  • Encoder: aoTuV b5.7 through OggdropXPd
  • Standard Quality Mode (VBR)
  • 12 Samples per Quality Setting (60 Tests)

  • q 0.0 (~64 kbps) [Low]
  • q 1.0 (~80 kbps)
  • q 2.0 (~96 kbps)
  • q 3.0 (~112 kbps)
  • q 4.0 (~128 kbps) [High/Transparent]


AAC
  • Encoder #1: Nero 1.5.1 through foobar2000
  • VBR
  • 12 Samples per Quality Setting (60 Tests)

  • Q .25 (~63 kbps) [Low]
  • Q .30 (~82 kbps)
  • Q .35 (~100 kbps)
  • Q .40 (~125 kbps)
  • Q .45 (~150 kbps) [High/Transparent]

  • Encoder #2: QuickTime/iTunes (Apple) AAC Encoder through iTunes
  • Sample Rate: Auto
  • Channels: Auto
  • VBR
  • “Use High Efficiency” Omitted
  • “Optimize Voice” Omitted
  • 12 Samples per Quality Setting (60 Tests)

  • ~64 kbps
  • ~80 kbps
  • ~96 kbps
  • ~128 kbps
  • ~160 kbps



Windows Media Audio (Encoder: Windows Media Player 11)

  • WMA Standard w/VBR (Option #1)
  • 12 Samples per “Audio Quality” Setting (60 Tests)

  • 40 – 75 kbps
  • 50 – 95 kbps
  • 85 – 148 kbps
  • 136 – 215 kbps
  • 240 – 355 kbps

  • WMA Standard w/CBR (Option #2)
  • 12 Samples per Quality Setting (60 Tests)

  • 64 kbps
  • 96 kbps
  • 128 kbps
  • 160 kbps
  • 192 kbps

  • Format #2: WMA Professional
  • CBR
  • 12 Samples per “Audio Quality” Setting (60 Tests)

  • 64 kbps
  • 96 kbps
  • 128 kbps
  • 160 kbps
  • 192 kbps


Musepack (SV8)
  • Encoder: Musepack Encoder (MPC) through foobar2000
  • VBR
  • 18 Samples per Quality Setting (54 Tests)

  • Q3 (~90 kbps)
  • Q4 (~128 kbps)
  • Q5 (~170 kbps)


As I stated before, I value any criticism and advice regarding the plans
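
To see the overall size of the proposal, the per-codec trial counts listed above can be tallied with a short Python sketch. The numbers are copied from the list; the names `PLAN`, `trials` and `total_trials` are only illustrative. Note that it counts both WMA Standard options (VBR and CBR), which the plan presents as alternatives.

```python
# Trial counts copied from the "Proposed Codec Settings" list:
# (codec or option, number of settings, samples per setting)
PLAN = [
    ("MP3 (LAME)", 5, 12),
    ("Ogg Vorbis (aoTuV)", 5, 12),
    ("AAC (Nero)", 5, 12),
    ("AAC (Apple/iTunes)", 5, 12),
    ("WMA Standard VBR", 5, 12),   # Option #1
    ("WMA Standard CBR", 5, 12),   # Option #2
    ("WMA Professional CBR", 5, 12),
    ("Musepack SV8", 3, 18),
]


def trials(settings, samples):
    """Number of individual tests for one codec: settings x samples."""
    return settings * samples


def total_trials(plan):
    """Sum of tests over every codec/option in the plan."""
    return sum(trials(settings, samples) for _, settings, samples in plan)


if __name__ == "__main__":
    for name, settings, samples in PLAN:
        print(f"{name:22s} {trials(settings, samples):3d} tests")
    print("Total:", total_trials(PLAN))
```

With both WMA Standard options included this comes to 474 individual tests; dropping one of the two options leaves 414.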

Larson
post Mar 10 2010, 09:32
Post #2





Group: Members
Posts: 131
Joined: 27-March 09
Member No.: 68422



You could use LAME 3.98.3, which was released recently, Nero AAC 1.5.4 (bugfix updates), and Apple AAC true VBR through qtaacenc by the great nao!
ojdo
post Mar 10 2010, 10:33
Post #3





Group: Members
Posts: 894
Joined: 18-June 06
From: Germany
Member No.: 31980



QUOTE (ezlez @ Mar 10 2010, 09:28) *
MP3 (LAME):
3. Channel Mode: Joint Stereo or Stereo?

AFAIK Joint Stereo is more efficient at a given bitrate, as the coding scheme can exploit similarities between the L and R channels. So I would recommend using joint stereo (the default setting) instead of separate channels.


One more general question: You write in the summary that you want to find out
QUOTE
what codec performs better than others at compression within these different quality settings
but then you write that there are
QUOTE
seven individuals taking the listening tests, and each will be assigned their own codec to test.


I'm neither an expert on listening tests nor have I conducted one myself yet, but I fear you won't be able to compare the quality/transparency of different codecs if each of them is judged by a different individual only. To draw such a conclusion, it would be better to let each individual ABC/HR a set of samples encoded with different codecs at (bitrate-wise) similar settings.
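
A minimal Python sketch of that design, assuming nothing about how ABC/HR for Java itself randomizes: every listener gets every sample, and each sample is presented with every codec in a shuffled (blind) order. The function name `build_schedule` and the listener, codec and sample labels are hypothetical.

```python
import random


def build_schedule(listeners, codecs, samples, seed=0):
    """Give every listener all samples, each with all codecs in random order.

    Returns {listener: [(sample, [codec, ...]), ...]}.
    """
    rng = random.Random(seed)  # fixed seed so the schedule is reproducible
    schedule = {}
    for listener in listeners:
        sample_order = list(samples)
        rng.shuffle(sample_order)      # sample order differs per listener
        session = []
        for sample in sample_order:
            codec_order = list(codecs)
            rng.shuffle(codec_order)   # blind slot order for this sample
            session.append((sample, codec_order))
        schedule[listener] = session
    return schedule


if __name__ == "__main__":
    plan = build_schedule(
        listeners=["listener1", "listener2"],          # hypothetical labels
        codecs=["LAME", "aoTuV", "Nero", "Musepack"],
        samples=["classical_1", "jazz_1", "rock_1"],   # hypothetical labels
    )
    for listener, session in plan.items():
        print(listener)
        for sample, codec_order in session:
            print("  ", sample, codec_order)
```

Because every listener rates every codec on the same samples, differences between listeners no longer get mixed up with differences between codecs.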


--------------------
http://freemusi.cc/
db1989
post Mar 10 2010, 11:02
Post #4





Group: Super Moderator
Posts: 5275
Joined: 23-June 06
Member No.: 32180



QUOTE
MP3 (LAME):
2. Variable Speed: Fast or Standard?
3. Channel Mode: Joint Stereo or Stereo?
4. Is the new release of LAME 3.98.3 available for most frontends (like Audacity or foobar2000 1.0)?

That you must ask such basic questions makes me wonder whether or not you're ready to undertake such an apparently ambitious test. Fast is default and recommended, as is joint stereo. The newest version works exactly like previous ones.
googlebot
post Mar 10 2010, 14:03
Post #5





Group: Members
Posts: 698
Joined: 6-March 10
Member No.: 78779



You should test at equal (average) bitrates. Otherwise the results are meaningless and just a function of the initially chosen bitrate proportions. From a user's perspective, it tells you nothing anyway that codec X at 140 kbit/s outperforms Y at 128 kbit/s, and neither does the opposite. What matters is which codec is best at a chosen average rate for specific content.

Your plan is overloaded with complexity. You must reduce it. Just do one proper ABX of Nero q .41 vs. iTunes AAC CVBR 128, and measure the time you need to come to a conclusive result. Multiply that by the hundreds, if not thousands, of individual comparisons your new plan necessarily involves.
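
As a rough illustration of what a "conclusive result" can mean for a single ABX run, here is a small Python sketch of the one-sided binomial test against pure guessing. The 0.05 threshold is a common convention and an assumption here, not something stated in this thread; the function names are illustrative.

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of at least `correct` right answers out of `trials` by guessing."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


def min_correct(trials, alpha=0.05):
    """Smallest number of correct answers whose p-value falls below alpha."""
    for correct in range(trials + 1):
        if abx_p_value(correct, trials) < alpha:
            return correct
    return None  # not reachable for this number of trials


if __name__ == "__main__":
    for n in (8, 12, 16):
        print(n, "trials ->", min_correct(n), "correct needed")
```

Under that assumed threshold, a 12-trial run needs at least 10 correct answers, which gives an idea of how much listening a single comparison takes before multiplying it across a whole plan.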

This post has been edited by googlebot: Mar 10 2010, 14:04
stephanV
post Mar 10 2010, 14:09
Post #6





Group: Members
Posts: 394
Joined: 6-May 04
Member No.: 13932



I'm sorry to see that you haven't done anything with the criticisms you received on your previous posts.


--------------------
"We cannot win against obsession. They care, we don't. They win."
Alexxander
post Mar 10 2010, 15:04
Post #7





Group: Members
Posts: 457
Joined: 15-November 04
Member No.: 18143



QUOTE
So far there will be about seven individuals taking the listening tests, and each will be assigned their own codec to test.

Do you really mean each codec is assigned to only one individual? If so, think hard about what you're doing.
timcupery
post Mar 10 2010, 16:07
Post #8





Group: Members
Posts: 780
Joined: 19-December 01
From: Tar Heel country
Member No.: 683



why don't you use foobar2000 as frontend software for all the codecs? (except WMA of course)
not that it's a big deal, but you can use foobar2000 for mp3 and ogg vorbis as well as for AAC and musepack. streamlines the process.

QUOTE (dv1989 @ Mar 10 2010, 05:02) *
QUOTE
MP3 (LAME):
2. Variable Speed: Fast or Standard?
3. Channel Mode: Joint Stereo or Stereo?
4. Is the new release of LAME 3.98.3 available for most frontends (like Audacity or foobar2000 1.0)?

That you must ask such basic questions makes me wonder whether or not you're ready to undertake such an apparently ambitious test. Fast is default and recommended, as is joint stereo. The newest version works exactly like previous ones.

this is very true. I'll chime in on the general "back to school before the drawing board" sentiment.

but take heart. once you learn what you're doing, such a listening test would be a good thing.


--------------------
God kills a kitten every time you encode with CBR 320
db1989
post Mar 10 2010, 16:15
Post #9





Group: Super Moderator
Posts: 5275
Joined: 23-June 06
Member No.: 32180



I made my point after skim-reading the initial post, so it's rather basic and only superficially related to methodology (which I'm not qualified to comment on), but several criticisms raised in this and previous topics by other users do seem to be fairly glaring.

This post has been edited by dv1989: Mar 10 2010, 16:16
ezlez
post Mar 11 2010, 02:13
Post #10





Group: Members
Posts: 42
Joined: 24-October 09
From: California, US
Member No.: 74271



I've now decided that I should test each subject with the same number and types of codecs. So now I need to overhaul the plan for the testing to produce significant results. At the moment, I've made two different plans for the listening tests:
  • I can keep the methodology of my original plans from this topic if I cut the number of quality settings down to three or even two: a low, medium, and/or high/transparent setting for each codec. The problem with this plan is that each subject will need to assess all codecs, and in order to produce significant results, I need at least 12 samples per adjustable quality setting. To avoid fatiguing the subjects with so many tests, I will most likely need to omit something, perhaps a codec, some genres of music, or part of the sample length (between 10 - 20 seconds). If anything, I will probably test a low and a medium quality setting and stay with 12 samples per setting and 6 genres of music. Most of the subjects are willing to put up to 2 hours into the testing, so there won't necessarily be a tight time limit.
  • I can simply test just one quality setting across all the codecs, even though that differs from my original goal. With one quality setting, I can use 18 samples and still produce significant results, in a way.
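
The two plans can be checked against the 2-hour limit with a quick Python sketch. It assumes 7 codecs per subject (LAME, Vorbis, Nero AAC, Apple AAC, WMA Standard, WMA Pro, Musepack, counted from the first post) and treats every codec/setting/sample combination as one rating; both are assumptions for illustration, and `time_budget` is a made-up name.

```python
def time_budget(session_hours, codecs, settings, samples):
    """Return (number of ratings, seconds available per rating)."""
    ratings = codecs * settings * samples
    return ratings, session_hours * 3600 / ratings


if __name__ == "__main__":
    CODECS = 7  # assumed count, see the note above
    plans = {
        "Plan 1 (2 settings, 12 samples)": (2, 12),
        "Plan 2 (1 setting, 18 samples)": (1, 18),
    }
    for name, (settings, samples) in plans.items():
        ratings, seconds = time_budget(2, CODECS, settings, samples)
        print(f"{name}: {ratings} ratings, {seconds:.0f} s each")
```

Under these assumptions the first plan leaves about 43 seconds per rating and the second about 57 seconds, which is not much time for a 20-second sample that has to be compared against its reference.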
а.п.т.
post Mar 11 2010, 13:46
Post #11





Group: Members
Posts: 36
Joined: 25-January 09
Member No.: 65946



QUOTE (ezlez @ Mar 10 2010, 10:28) *
Musepack (SV8)
  • Encoder: Musepack Encoder (MPC) through foobar2000
  • VBR
  • 18 Samples per Quality Setting (54 Tests)

  • Q3 (~90 kbps)
  • Q4 (~128 kbps)
  • Q5 (~170 kbps)


As I stated before, I value any criticism and advice regarding the plans



I believe that testing Musepack at bitrates below Q4 (it should be --quality 4, btw) to find the transparency point is pointless; it is not tuned at all for such bitrates. For me it becomes transparent at about Q5.5 (although once I needed Q6), so if you want 5 steps for Musepack as well, I would suggest:
  • Q4
  • Q4.5
  • Q5
  • Q5.5
  • Q6
