Codecs and settings for 64kbit/s SE listening test, criticism required
Serge Smirnoff
post Oct 16 2012, 01:17
Post #26





Group: Members
Posts: 370
Joined: 14-December 01
Member No.: 641



QUOTE (yourlord @ Oct 15 2012, 20:38) *
QUOTE (Serge Smirnoff @ Oct 15 2012, 08:34) *
At 64kbit/s there is no need for artifacts amplification for sure. Above 128kbit/s meaningful results of ABX testing become more and more expensive (but still meaningful). SoundExpert proposes methodology that makes those tests less expensive. SE quality ratings of devices with small impairments could be considered as results of specially simplified listening tests. Results are experimental which is clearly stated on the site.


If meaningful results of ABX tests above 128kbps become more and more expensive it's because the codecs are doing their jobs and producing audibly transparent output. At a point where normal ABX results become statistically insignificant then transparency has been reached and we're done.

The problem is that there is no such "point" in practice. A more carefully organized listening test moves the point of transparency to higher bitrates. Codecs at 256 are not the same even if your "normal ABX results" say so; another, "super-normal" ABX test will reveal the differences for sure. In other words, differences between "equally transparent" codecs can be revealed by more thoroughly prepared listening tests. Besides codecs, there is a lot of audio equipment with small impairments that requires evaluation and expensive listening tests. So the purpose of the SE testing methodology is to make such listening tests cheaper but still relevant.


--------------------
keeping audio clear together - soundexpert.org
Serge Smirnoff
post Oct 16 2012, 01:42
Post #27





Group: Members
Posts: 370
Joined: 14-December 01
Member No.: 641



QUOTE (yourlord @ Oct 16 2012, 00:55) *
How about, can we have a test worth performing?

If AAC, Vorbis, Opus, and MP3 are all statistically transparent at a given nominal bit rate then they are all audibly the SAME QUALITY at that bit rate. No one codec offers any audible benefit over the others at that point. There is nothing to gain in claiming to judge codec quality by adding distortion to their output and pretending it somehow matters in the real world.

I'm honestly not sure why this thread hasn't been locked/removed. It smells of snake oil and pixie dust, or at the least is ill-conceived. These "tests" of adulterated codec outputs offer us no relevant results on which to base any kind of rational discussion or decisions, other than how NOT to conduct a codec quality test. It can only serve to spread disinformation and ignorance. IMO it has no place on HA.

Probably I should repeat once again - 32kbps and up to 128kbps SE listening tests are performed without artifacts amplification. The topic is devoted to codecs and settings for 64kbps listening test.


--------------------
keeping audio clear together - soundexpert.org
Serge Smirnoff
post Oct 16 2012, 02:27
Post #28





Group: Members
Posts: 370
Joined: 14-December 01
Member No.: 641



QUOTE (LithosZA @ Oct 15 2012, 23:57) *
QUOTE
All AAC contenders for this 64kbit/s testing are in CVBR mode.

Can we also use Vorbis and Opus at CVBR rates?

Both Opus and Vorbis have a managed bitrate mode. But you are right: the codecs and settings already chosen are storage oriented. I suppose that if the initial goal were testing codecs for streaming, the contenders and their settings should be different.


--------------------
keeping audio clear together - soundexpert.org
yourlord
post Oct 16 2012, 03:21
Post #29





Group: Members
Posts: 219
Joined: 1-March 11
Member No.: 88621



QUOTE (Serge Smirnoff @ Oct 15 2012, 20:42) *
Probably I should repeat once again - 32kbps and up to 128kbps SE listening tests are performed without artifacts amplification. The topic is devoted to codecs and settings for 64kbps listening test.


Which is even worse since you're mixing potentially valid and useful results in with utter trash results that deliver no practical information. The fact that some results in the test may be valid doesn't correct the fact that the overall tests are tainted with intentionally corrupted and useless results. The fact that you've chosen to intentionally distort the resulting waveforms at the higher bitrates brings into question the testing methodology of all of the results.

I'll repeat, the testing you intend to perform has no place being used as a reference for anything other than how NOT to perform a codec listening test. I would personally question the morality of even labelling your results with the names of the codecs since your tested output has NOTHING to do with nominal operation of those codecs.

Serge Smirnoff
post Oct 28 2012, 10:32
Post #30





Group: Members
Posts: 370
Joined: 14-December 01
Member No.: 641



The following codecs were added to 64kbit/s section:

AAC+ VBR@59.2 (Winamp 5.63) - CVBR, HE-AAC
AAC Encoder v1.04 (Fraunhofer IIS) from Winamp 5.63: variable Bitrate, preset: 2

AAC+ VBR@59.6 (QTime 7.7.2) - CVBR, HE-AAC
QuickTime (7.7.2) AAC Encoder via qaac 1.45 (CoreAudioToolbox 7.9.8.1): qaac --he -v56 ref.wav

AAC+ VBR@60.1 (NeroRef 1540) - CVBR, HE-AAC
Nero AAC Encoder 1.5.4.0 (build 2010-02-18): neroAacEnc.exe -q 0.25 -if ref.wav -of out.mp4

Vorbis VBR@60.3 (Xiph 1.3.3)
OggEnc v2.87 (libVorbis 1.3.3): oggenc2 -q0.3 ref.wav

Opus VBR@59.9 (libopus 1.0.1)
opusenc --bitrate 59 ref48.wav (44.1/16 -> 48/24 by Audition CS6)


--------------------
keeping audio clear together - soundexpert.org
C.R.Helmrich
post Oct 28 2012, 17:43
Post #31





Group: Developer
Posts: 688
Joined: 6-December 08
From: Erlangen Germany
Member No.: 64012



Thanks! How does the reliability rating work? How many listeners are needed for, say, 5 percent? And why is Opus graded worse than mp3?

Chris

This post has been edited by C.R.Helmrich: Oct 28 2012, 17:44


--------------------
If I don't reply to your reply, it means I agree with you.
LithosZA
post Oct 28 2012, 19:21
Post #32





Group: Members
Posts: 197
Joined: 26-February 11
Member No.: 88525



That must be one good MP3 encoder :)
Who knew MP3 64Kbit/s CBR at 22050Hz Stereo would sound better than Opus 59.9Kbit/s VBR at 48000Hz Stereo...

This post has been edited by LithosZA: Oct 28 2012, 19:36
Serge Smirnoff
post Oct 28 2012, 19:49
Post #33





Group: Members
Posts: 370
Joined: 14-December 01
Member No.: 641



QUOTE (C.R.Helmrich @ Oct 28 2012, 20:43) *
How does the reliability rating work?

Each time a device under test receives a grade, its rating is recalculated. The sequence of such ratings tends toward some final value. Reliability is ((Max-Min)/Last value)*100%, computed over the last N ratings. Currently N equals the number of test files for a device. For low bitrates, where artifact amplification is not applied, the number of test files is 18 (9 samples*2).
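
A minimal sketch of the reliability formula above, assuming the rating history is kept as a plain list of numbers. The function name `reliability_percent`, the error handling for short histories, and the sample numbers are hypothetical and are not taken from SE's actual code.

```python
def reliability_percent(ratings, n):
    """Spread of the last n ratings relative to the latest one, in percent.

    Implements ((Max - Min) / Last value) * 100% over the last n values.
    """
    if n <= 0 or len(ratings) < n:
        # Assumption: with fewer than n ratings the figure is left undefined.
        raise ValueError("need at least n ratings")
    window = ratings[-n:]
    last = window[-1]
    if last == 0:
        raise ValueError("last rating must be non-zero")
    return (max(window) - min(window)) / last * 100.0


# Hypothetical history: the rating is recalculated after each new grade.
history = [3.0, 3.6, 3.4, 3.5, 3.45, 3.5]
print(reliability_percent(history, n=4))
```

A smaller percentage means the rating has moved less over the last N recalculations. Note that the figure depends strongly on N: a short window can look stable after only a handful of grades.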

QUOTE (C.R.Helmrich @ Oct 28 2012, 20:43) *
How many listeners are needed for, say, 5 percent?

Usually 5-7 grades are necessary for each test file in order to achieve 5% reliability of rating. Due to the above-mentioned nature of the parameter, there is no strict relationship between the accuracy of ratings and the number of returned grades.


QUOTE (C.R.Helmrich @ Oct 28 2012, 20:43) *
And why is Opus graded worse than mp3?

At the moment Opus received 9 grades only and its rating is completely unreliable. I will check now why it shows 4% …


--------------------
keeping audio clear together - soundexpert.org
skamp
post Oct 28 2012, 20:03
Post #34





Group: Developer
Posts: 1444
Joined: 4-May 04
From: France
Member No.: 13875



QUOTE (Serge Smirnoff @ Oct 28 2012, 19:49) *
Usually 5-7 grades are necessary for each test file in order to achieve 5% reliability of rating. […] At the moment Opus received 9 grades only and its rating is completely unreliable. I will check now why it shows 4% …


I don't understand.


--------------------
See my profile for measurements, tools and recommendations.
Serge Smirnoff
post Oct 28 2012, 20:11
Post #35





Group: Members
Posts: 370
Joined: 14-December 01
Member No.: 641



QUOTE (skamp @ Oct 28 2012, 23:03) *
QUOTE (Serge Smirnoff @ Oct 28 2012, 19:49) *
Usually 5-7 grades are necessary for each test file in order to achieve 5% reliability of rating. […] At the moment Opus received 9 grades only and its rating is completely unreliable. I will check now why it shows 4% …


I don't understand.

5-7 grades for each test file, number of test files is 18. Opus received 9 grades in total, i.e. not all test files received even 1 grade.


--------------------
keeping audio clear together - soundexpert.org
Serge Smirnoff
post Oct 29 2012, 02:14
Post #36





Group: Members
Posts: 370
Joined: 14-December 01
Member No.: 641



It turned out that the reliability of ratings in the 64kbit/s section was calculated using the old parameter N=9, which was in use before 2006. The last codec was added to the section in 2008, and since then the section has been a bit neglected, which is probably why the old N went unnoticed. Fixed now. As all other codecs in the section have an outdated reliability parameter, we decided to put all of them on rotation for several days at least. Codecs with confirmed reliability (<5%) will be put back on hold. Testing of devices in other sections has been suspended as well, so now only codecs from the 64kbit/s section are tested. Test files of the newly added codecs will be given out more frequently because the old test files already have more grades.


--------------------
keeping audio clear together - soundexpert.org
Xanikseo
post Oct 31 2012, 00:21
Post #37





Group: Members
Posts: 27
Joined: 14-April 09
Member No.: 68951



One thing I can't comprehend is why each of the files is encoded at a different bit rate. Why on earth are they not all encoded to 64kbps? It is the nature of VBR to produce files which may have bit rates higher or lower than the target.
eahm
post Oct 31 2012, 02:20
Post #38





Group: Members
Posts: 1085
Joined: 11-February 12
Member No.: 97076



QUOTE (Xanikseo @ Oct 30 2012, 16:21) *
One thing I can't comprehend is why each of the files is encoded at a different bit rate. Why on earth are they not all encoded to 64kbps? It is the nature of VBR to produce files which may have bit rates higher or lower than the target.

Agree, I wanted to say this yesterday, he should do 64 on all of them then see what the codec does to optimize the bitrate.


--------------------
/1CcSkg3
jensend
post Oct 31 2012, 05:49
Post #39





Group: Members
Posts: 145
Joined: 21-May 05
Member No.: 22191



QUOTE (Serge Smirnoff @ Oct 14 2012, 14:55) *
Conversion chain for Opus: 44.1/16 -->> 48/24(Audition CS6) -->> opusenc -->> foobar2000(48/24) -->> 44.1/16(Audition CS6)

Why on earth would you do that? You should just use opusenc and opusdec. opusenc will happily accept your 44.1/16 input and running the result through opusdec will by default give you 44.1/16 output.

opusenc and opusdec will handle any resampling internally and transparently, even making sure that opusdec's output has the exact same number of samples as the input to opusenc had. Using a convoluted Audition/foobar/Audition toolchain is more likely to introduce extraneous problems.

Also, I agree with other posters that your bitrate setting selections for these codecs seem rather odd. In concluding that these are the rates that "really" give an average 64kbps output, have you really tried this with a large and diverse collection, or are you jumping to conclusions based on a few choice files? If the latter, you may well be unjustly penalizing codecs that have better VBR rate control.
Serge Smirnoff
post Oct 31 2012, 09:31
Post #40





Group: Members
Posts: 370
Joined: 14-December 01
Member No.: 641



Bitrate issue: the bitrates are calculated on the basis of the nine SE samples concatenated with each other, so all 5 added codecs produce almost the same bitrate for this set of test samples. This approach has been used since the beginning of SE and has its pros and cons. Calculating target bitrates on a large sound collection also has a drawback: for classical, rock, minimal, etc. music collections the bitrates will differ because the styles differ in complexity. If you mix them you'll get some arbitrary averages.
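
As a rough illustration of the bitrate figure discussed above, here is a sketch that assumes the average bitrate is simply total encoded bits divided by total duration. The function name `average_bitrate_kbps`, the file sizes, and the durations are hypothetical; container overhead is ignored, and encoding one concatenated file (as SE does) will not give exactly the same number as summing separately encoded files.

```python
def average_bitrate_kbps(files):
    """Average bitrate of a set of encoded files treated as one stream.

    `files` is a list of (size_in_bytes, duration_in_seconds) pairs.
    """
    total_bits = sum(size * 8 for size, _ in files)
    total_seconds = sum(duration for _, duration in files)
    if total_seconds <= 0:
        raise ValueError("total duration must be positive")
    # 1 kbit/s is taken as 1000 bits per second.
    return total_bits / total_seconds / 1000.0


# Hypothetical encoded sizes and durations, for illustration only.
encoded = [(80_000, 10.0), (70_000, 10.0)]
print(average_bitrate_kbps(encoded))
```

Because a VBR encoder spends more bits on complex material, the same quality setting yields a different average on a different set of files, which is the "arbitrary averages" problem mentioned above.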

Resampling issue: I couldn't find any information about the internal Opus resampler and its quality, except that “The opus-tools package source code contains a small, high quality, high performance, BSD licensed resampler which can be used where resampling is required”. On the other hand, Audition CS6 has one of the best resamplers (http://src.infinitewave.ca/ ). Another reason for using an external resampler is that OpusDec can't produce the 24/32-bit output required by the SE utility that generates test files. To be accurate, the resulting conversion chain was 44.1/16 -->> 48/24(Audition CS6) -->> opusenc -->> foobar2000(48/24) -->> 44.1/32(Audition CS6) -->> test files production.


--------------------
keeping audio clear together - soundexpert.org
Xanikseo
post Oct 31 2012, 15:25
Post #41





Group: Members
Posts: 27
Joined: 14-April 09
Member No.: 68951



I am also generally suspicious of the other results on the website. How is it that files at 320kbps have a higher rating than at 256kbps, when, for certain encoders, 256kbps is already well above the transparency threshold?

This post has been edited by Xanikseo: Oct 31 2012, 15:25
Remedial Sound
post Oct 31 2012, 16:03
Post #42





Group: Members
Posts: 508
Joined: 5-January 06
From: Dublin
Member No.: 26898



It would seem there are 10 people in the world, those who understand perceptual transparency and those who don't. A lossy encoding either achieves it or doesn't. To say that 320 is superior to 256 when both are transparent is utter rubbish.
Serge Smirnoff
post Apr 11 2013, 23:01
Post #43





Group: Members
Posts: 370
Joined: 14-December 01
Member No.: 641



Detailed results of this listening test are available.



This post has been edited by Serge Smirnoff: Apr 11 2013, 23:28


--------------------
keeping audio clear together - soundexpert.org
greynol
post Apr 13 2013, 00:25
Post #44





Group: Super Moderator
Posts: 10000
Joined: 1-April 04
From: San Francisco
Member No.: 13167



QUOTE (greynol @ Oct 15 2012, 14:01) *
I too am skeptical about the relevance of SE tests

Looking over this discussion I noticed:
QUOTE (Serge Smirnoff @ Oct 14 2012, 17:12) *
Below 128kbit/s artifact amplification is not applied. Outputs of codecs are used as is.

With this in mind, I think this is a worthwhile test for our members.

Thank you for your hard work, Serge.


--------------------
I should publish a list of forum idiots.
LithosZA
post Apr 13 2013, 07:10
Post #45





Group: Members
Posts: 197
Joined: 26-February 11
Member No.: 88525



From this I can gather that Vorbis should give about the same quality as Opus at 64Kbps?
Serge Smirnoff
post Apr 13 2013, 13:16
Post #46





Group: Members
Posts: 370
Joined: 14-December 01
Member No.: 641



Thanks, greynol.

I still have some questions concerning stat. analysis of the results. I started new thread to clear them up.


--------------------
keeping audio clear together - soundexpert.org
Serge Smirnoff
post Nov 28 2013, 12:23
Post #47





Group: Members
Posts: 370
Joined: 14-December 01
Member No.: 641



Below are results using slightly different statistical analysis of the same collected grades. Changes:
  1. The resulting mean of each codec (black ones) is the average of its nine sample means (previously it was the average of all grades submitted for the codec). Its bootstrapped confidence intervals are also computed from these nine sample means and therefore show the consistency of codec performance across different types of audio material.
  2. All bootstrapped confidence intervals of means are computed using the basic percentile method, which is simpler and clearer (previously the bias-corrected and accelerated percentile method was used).
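
The two changes above can be sketched as follows. This is only an illustration, assuming that "basic percentile method" means the plain percentile bootstrap interval (the empirical quantiles of the resampled means). The function name `percentile_bootstrap_ci`, the number of resamples, and the nine sample means are hypothetical and are not SE's data or code.

```python
import random
import statistics


def percentile_bootstrap_ci(values, n_resamples=10000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    k = len(values)
    # Resample with replacement, take the mean each time, then sort the means.
    means = sorted(
        statistics.fmean(rng.choices(values, k=k)) for _ in range(n_resamples)
    )
    lo_idx = int(n_resamples * alpha / 2)
    hi_idx = n_resamples - 1 - lo_idx
    return means[lo_idx], means[hi_idx]


# Hypothetical per-sample mean grades for one codec (nine samples).
sample_means = [3.1, 3.4, 2.9, 3.8, 3.3, 3.0, 3.6, 3.2, 3.5]
overall = statistics.fmean(sample_means)
low, high = percentile_bootstrap_ci(sample_means)
print(f"mean={overall:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

With only nine values to resample, the interval reflects how much the codec's score varies from sample to sample rather than how many grades were collected.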



Some reasoning behind the changes in analysis is here - http://www.hydrogenaudio.org/forums/index....st&p=850741

This post has been edited by Serge Smirnoff: Nov 28 2013, 13:14


--------------------
keeping audio clear together - soundexpert.org
Serge Smirnoff
post Mar 29 2014, 21:53
Post #48





Group: Members
Posts: 370
Joined: 14-December 01
Member No.: 641



Following this very painful but insightful discussion it became clear that the above calculation of overall confidence intervals (the wide ones) using sample means is not correct. Any arbitrary set of samples (especially a small one) chosen for a listening test is representative of some unknown/undefined general population of music. Consequently, confidence intervals calculated for this unknown population have little or no meaning. The results of such a test can't be generalized beyond this set of samples. Losing that generalization allows one to drop the per-sample grouping from the analysis of overall means and to consider all grades of all samples as a single, indivisible entity. Confidence intervals of overall means calculated from the grades turn out to be small. This increase in test power is the reward for losing the generalization of the test.

Taking all this into account, SE returns to the initial/standard calculation of overall confidence intervals. The correct version is below.



--------------------
keeping audio clear together - soundexpert.org