SE listening test @96kbit/s, codecs/settings - testing - results
Serge Smirnoff
post Apr 23 2013, 21:37
Post #1

The following codecs are going to be added to the 96 kbit/s section for listening tests:

AAC VBR@89.8 (Winamp 5.63) - CVBR, LC-AAC
AAC Encoder v1.04 (Fraunhofer IIS) from Winamp 5.63: variable Bitrate, preset: 3

AAC VBR@92.1 (QTime 7.7.3) - TVBR, LC-AAC
QuickTime (7.7.3) AAC Encoder via qaac 2.18 (CoreAudioToolbox 7.9.8.2): qaac -V45 ref.wav

AAC VBR@90.0 (NeroRef 1540) - CVBR, LC-AAC
Nero AAC Encoder 1.5.4.0 (build 2010-02-18): neroAacEnc.exe -q 0.34 -if ref.wav -of out.mp4

Vorbis VBR@90.4 (Xiph 1.3.3)
OggEnc v2.87 (libVorbis 1.3.3): oggenc2 -q2.2 ref.wav

Opus VBR@90.7 (libopus 1.0.2)
opusenc --bitrate 90 ref48.wav (44.1/16 -> 48/24 by Audition CS6)

mp3 VBR@89.1 (Lame 3.99.5)
encode: lame -V7 ref.wav
decode: MAD 32kHz/32bit -> 44.1kHz/24bit by Audition CS6

mp3 VBR@90.2 (Lame 3.99.5)
encode: lame -V6.9 ref.wav
decode: MAD 32kHz/32bit -> 44.1kHz/24bit by Audition CS6

Not sure which variant of the LAME settings to choose - the first one uses the more usual "-V7" setting, while the second has a more appropriate target bitrate.
Other suggestions/remarks are welcome as well.
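
For choosing between the two, a rough way to check is to measure the average bitrate each setting produces over a corpus of WAV files. A simplified sketch (assuming Python, lame on PATH and a ./corpus folder of test files; not the exact SE procedure):

CODE
# Rough sketch: estimate the average bitrate of a LAME VBR setting over a
# corpus of WAV files, e.g. to choose between -V7 and -V6.9.
# Assumes lame is on PATH and the corpus lives in ./corpus (both assumptions).
import glob
import os
import subprocess
import wave

def avg_bitrate_kbps(vbr_setting, corpus_glob="corpus/*.wav"):
    total_bits, total_seconds = 0, 0.0
    for path in glob.glob(corpus_glob):
        out = path + ".test.mp3"
        subprocess.run(["lame", vbr_setting, path, out], check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        with wave.open(path) as w:                     # duration of the source file
            total_seconds += w.getnframes() / w.getframerate()
        total_bits += os.path.getsize(out) * 8         # size of the encoded file
        os.remove(out)
    return total_bits / total_seconds / 1000.0

for setting in ("-V7", "-V6.9"):
    print(setting, round(avg_bitrate_kbps(setting), 1), "kbps")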



ManekiNeko
post Apr 23 2013, 23:11
Post #2

QUOTE (Serge Smirnoff @ Apr 23 2013, 21:37) *
Vorbis VBR@90.4 (Xiph 1.3.3)
OggEnc v2.87 (libVorbis 1.3.3): oggenc2 -q2.2 ref.wav


At this bitrate you should be using (or at least including) Oggenc2.87 built with aoTuV b6.03, which is tuned for lower bitrates/presets.

eahm
post Apr 24 2013, 02:05
Post #3

Why don't you use ABR 96 on all of them?



Serge Smirnoff
post Apr 24 2013, 10:06
Post #4

QUOTE (ManekiNeko @ Apr 24 2013, 02:11) *
At this bitrate you should be using (or at least including) Oggenc2.87 built with aoTuV b6.03, which is tuned for lower bitrates/presets.

QUOTE (eahm @ Apr 24 2013, 05:05) *
Why don't you use ABR 96 on all of them?

The idea is to use the encoders with default settings where possible, on the assumption that these are the settings recommended by the developers. It is also good to run this test as a follow-up to the previous one at 64 kbit/s.



eahm
post Apr 24 2013, 16:49
Post #5

What do you mean by "default settings"? You are not using any default setting (e.g. plain "qaac file.wav"); almost every AAC CLI defaults to roughly 128 kbps. The way you are preparing it, this is not a 96 kbps listening test.

Not trying to be rude, but I've never liked your lack of expertise, and you call yourself Sound Expert.

This post has been edited by eahm: Apr 24 2013, 16:50



benski
post Apr 24 2013, 18:11
Post #6

Winamp Developer

QUOTE (eahm @ Apr 24 2013, 10:49) *
What do you mean by "default settings"? You are not using any default setting (e.g. plain "qaac file.wav"); almost every AAC CLI defaults to roughly 128 kbps. The way you are preparing it, this is not a 96 kbps listening test.

Not trying to be rude, but I've never liked your lack of expertise, and you call yourself Sound Expert.


TOS #2?

Anyway, there is plenty of precedent for using VBR settings in listening tests even when those settings do not produce the desired bitrate on the test material. Historically, this is done by testing the settings on a large corpus of music.

There has been plenty of discussion about the usefulness of the results of these kinds of listening tests. His approach to conducting the test is scientifically valid, even though there are doubts over whether the methodology produces effects that correlate with human hearing. And the use of VBR over ABR is recommended for a variety of reasons.

Diow
post Apr 24 2013, 18:28
Post #7

QUOTE (eahm @ Apr 24 2013, 12:49) *
What do you mean by "default settings"? You are not using any default setting (e.g. plain "qaac file.wav"); almost every AAC CLI defaults to roughly 128 kbps. The way you are preparing it, this is not a 96 kbps listening test.

Not trying to be rude, but I've never liked your lack of expertise, and you call yourself Sound Expert.


Default Settings = Recommended by Developers: the settings that they tune every release. :)



eahm
post Apr 24 2013, 18:57
Post #8

QUOTE (Diow @ Apr 24 2013, 10:28) *
Default Settings = Recommended by Developers: the settings that they tune every release. :)

LOL WHAT? Please explain further; post references, links, or discussions where developers "tune" only one setting instead of the encoder's full capabilities.

This post has been edited by eahm: Apr 24 2013, 18:58



benski
post Apr 24 2013, 19:18
Post #9

Winamp Developer

QUOTE (eahm @ Apr 24 2013, 12:57) *
LOL WHAT? Please explain further; post references, links, or discussions where developers "tune" only one setting instead of the encoder's full capabilities.


This is typical practice.

http://lame.cvs.sourceforge.net/viewvc/lam...amp;view=markup

Edit: I should clarify. There are typically a handful of presets and settings that are highly tuned. Anything in between these values is often done by interpolation of parameters between neighboring presets. This means that, in practice, the "preset" values are going to produce the best quality-to-bitrate ratio. Certainly there have been plenty of improvements that affect an encoder's overall quality. But the point remains that anything other than the built-in presets will result in lesser quality (per bit) than the default preset settings.
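
A toy illustration of that interpolation (the preset table and parameter names are invented, not LAME's actual values or code): settings for an in-between quality value are derived by linearly interpolating each parameter between the two neighbouring tuned presets.

CODE
# Toy illustration of interpolating encoder parameters between tuned presets.
# The preset table and parameter names below are invented, not LAME's.
from bisect import bisect_right

PRESETS = {            # quality level -> hand-tuned parameters
    5.0: {"lowpass_hz": 16500, "ath_adjust": 1.0},
    6.0: {"lowpass_hz": 15500, "ath_adjust": 2.0},
    7.0: {"lowpass_hz": 14500, "ath_adjust": 3.5},
}

def interpolate_preset(q):
    keys = sorted(PRESETS)
    if q in PRESETS:                       # exact preset: fully tuned values
        return dict(PRESETS[q])
    if q <= keys[0]:
        return dict(PRESETS[keys[0]])
    if q >= keys[-1]:
        return dict(PRESETS[keys[-1]])
    i = bisect_right(keys, q)              # neighbouring presets around q
    lo, hi = keys[i - 1], keys[i]
    t = (q - lo) / (hi - lo)
    return {k: PRESETS[lo][k] + t * (PRESETS[hi][k] - PRESETS[lo][k])
            for k in PRESETS[lo]}

print(interpolate_preset(6.9))             # in-between setting such as -V6.9

The point is only that such interpolated settings are derived values, while the preset levels themselves are the ones that get tuned directly.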

This post has been edited by benski: Apr 24 2013, 19:31

Gainless
post Apr 24 2013, 21:31
Post #10

QUOTE (benski @ Apr 24 2013, 20:18) *
Edit: I should clarify. There are typically a handful of presets and settings that are highly tuned. Anything in between these values is often done by interpolation of parameters between neighboring presets. This means that, in practice, the "preset" values are going to produce the best quality-to-bitrate ratio. Certainly there have been plenty of improvements that affect an encoder's overall quality. But the point remains that anything other than the built-in presets will result in lesser quality (per bit) than the default preset settings.

I wonder how well the interpolation of e.g. 2 presets over the whole bitrate spectrum would work, one for acceptable quality at a low bitrate, and the other one for transparency with maximum compression. Shouldn't this theoretically give a more or less ideal efficiency, if these presets were perfectly tuned?

This post has been edited by Gainless: Apr 24 2013, 21:34

Serge Smirnoff
post Apr 27 2013, 10:07
Post #11

  • The following codecs were added to the 96 kbit/s section for crowd-testing:
    AAC VBR@89.8 (Winamp 5.63) - CVBR, AAC LC
    AAC VBR@92.1 (QTime 7.7.3) - TVBR, AAC LC
    AAC VBR@90.0 (NeroRef 1540) - CVBR, AAC LC
    Vorbis VBR@90.4 (Xiph 1.3.3)
    Opus VBR@90.7 (libopus 1.0.2)
    mp3 VBR@90.2 (Lame 3.99.5)
  • At the moment all codecs from the 96 kbit/s section are under test, though the probability of getting test files of the newly added ones is higher, as they have fewer grades.
  • Listening tests in the 96 kbit/s section, including this one, are performed without artifact amplification.
  • Besides the usual live ratings on the 96 kbit/s page, there will be a full report similar to the one for the previous 64 kbit/s test. This will also be the usual practice from now on.



Gainless
post Apr 27 2013, 11:52
Post #12

QUOTE (Serge Smirnoff @ Apr 27 2013, 11:07) *
  • The following codecs were added to the 96 kbit/s section for crowd-testing:
    AAC VBR@89.8 (Winamp 5.63) - CVBR, AAC LC

Since when has the Winamp AAC encoder had CVBR? I guess it's CBR, which would be a bit unfair against Apple AAC.

Serge Smirnoff
post Apr 27 2013, 12:24
Post #13

QUOTE (Gainless @ Apr 27 2013, 14:52) *
Since when has the Winamp AAC encoder had CVBR?

Winamp has used the Fraunhofer AAC codec, with VBR encoding support, since v5.62, and AFAIK the only AAC encoder that utilizes TrueVBR is Apple's.



C.R.Helmrich
post Apr 27 2013, 13:08
Post #14

Winamp's AAC VBR is somewhere between a CVBR and a TVBR. At high bit rates it's more like Apple's TVBR, at low bit rates it's more like CVBR. Just call it VBR.

Chris



Serge Smirnoff
post Apr 27 2013, 13:17
Post #15

QUOTE (C.R.Helmrich @ Apr 27 2013, 16:08) *
Winamp's AAC VBR is somewhere between a CVBR and a TVBR. At high bit rates it's more like Apple's TVBR, at low bit rates it's more like CVBR. Just call it VBR.

Thanks. Corrected.



IgorC
post Apr 27 2013, 17:51
Post #16

Sergei,

I visit the Russian audio-related sites. Your tests get no acceptance there, just as anywhere else. Even people who speak your mother tongue strongly disagree with your tests.

Let's see. http://soundexpert.org/encoders-64-kbps
CODE
MP3 22.050 kHz - 2.79
Opus - 2.77
Vorbis - 2.49


So, are you suggesting that MP3 at 22.050 kHz, 64 kbps is better than Opus and Vorbis at 48/44.1 kHz, 64 kbps? That is neither possible nor realistic.


A good design is a necessary condition for avoiding 90-100% of future flaws. It's impossible to correct anything with math or statistics after the test has ended.

You can continue to ignore the people who dislike your tests, or you can start from scratch and work out a good design for future tests.

This post has been edited by IgorC: Apr 27 2013, 17:59

Serge Smirnoff
post Apr 27 2013, 20:15
Post #17

QUOTE (IgorC @ Apr 27 2013, 20:51) *
Let's see. http://soundexpert.org/encoders-64-kbps
CODE
MP3 22.050 kHz - 2.79
Opus - 2.77
Vorbis - 2.49


So, are you suggesting that MP3 at 22.050 kHz, 64 kbps is better than Opus and Vorbis at 48/44.1 kHz, 64 kbps? That is neither possible nor realistic.


I uploaded three streams that were used to produce the test files for the codecs you picked - http://www.hydrogenaudio.org/forums/index....st&p=832552
If you encode the SE test samples with a recent LAME you'll be surprised even more.

Also, I would be more careful about saying "better" when comparing such close codec averages.

In short - yes, those codecs have comparable sound quality.



IgorC
post Apr 27 2013, 20:38
Post #18

Again, this is how your test is designed, not what happens in a real scenario.


Your set of samples is more than 50% strongly tonal and has only one transient sample?

5 of 9 samples are strongly tonal.
1 awkward synthetic sound and only 1 transient sample? And that's all?

It's very unrepresentative and way outside any real scenario.

And that's only a small start. There are far more gross flaws.


Please, listen to what people are trying to tell you. They disagree with your tests. ALL of them. Don't just hear them; actually listen and try to understand them.

This post has been edited by IgorC: Apr 27 2013, 20:54

Serge Smirnoff
post Apr 27 2013, 20:58
Post #19

QUOTE (IgorC @ Apr 27 2013, 22:38) *
It's very unrepresentative and way outside any real scenario.

There are multiple "real scenarios". I doubt you can find any finite representative set of test samples. SE has tested codecs with these 9 sound samples since 2001; you can consider this test the Big Mac Index of audio. The latter is also not representative, but still meaningful.



IgorC
post Apr 27 2013, 21:05
Post #20

Let's talk about what we have on the table instead of jumping into endless philosophical talk.

5 tonal, 1 transient -> not representative. Nada.
Why in the world would you do that? It's 5 vs 1. It's not balanced. Why?

Of course it's impossible to find a 100% ideal representative set of samples. But why not just include different types of samples?

A set with equal amounts of the different sample types (20% tonal, 20% transient, 20% speech, 20% mixed, 20% stereo, etc.) would already be much more representative.
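
As a rough illustration of what such a balanced set could look like (the pool of labeled excerpt names below is invented), one could simply draw an equal number of excerpts from each category:

CODE
# Sketch only: draw an equally balanced test set from a pool of labeled
# excerpts (the pool and category labels below are invented).
import random

pool = {
    "tonal":     ["harpsichord", "pitchpipe", "glockenspiel", "trumpet"],
    "transient": ["castanets", "claps", "drumkit"],
    "speech":    ["male_speech", "female_speech", "speech_over_music"],
    "mixed":     ["pop_mix", "rock_mix", "orchestra"],
    "stereo":    ["applause_stereo", "choir_stereo", "ping_pong_stereo"],
}

def balanced_set(pool, per_category=2, seed=0):
    random.seed(seed)
    return {cat: random.sample(names, per_category) for cat, names in pool.items()}

print(balanced_set(pool))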

This post has been edited by IgorC: Apr 27 2013, 21:16

Serge Smirnoff
post Apr 27 2013, 21:27
Post #21

QUOTE (IgorC @ Apr 28 2013, 00:05) *
Let's talk about what we have on the table instead of jumping into endless philosophical talk.

5 tonal, 1 transient -> not representative. Nada.
Why in the world would you do that? It's 5 vs 1. It's not balanced. Why?

Of course it's impossible to find a 100% ideal representative set of samples. But why not just include different types of samples?

A set with equal amounts of the different sample types (20% tonal, 20% transient, 20% speech, 20% mixed, 20% stereo, etc.) would already be much more representative.


These sound samples were chosen more than 10 years ago. They represent different types of audio material. They will not be changed in the near future at least. Period.



IgorC
post Apr 27 2013, 21:33
Post #22

Wunderbar,

Good luck with your tests.

Serge Smirnoff
post Nov 21 2013, 10:27
Post #23

Detailed results of this listening test are available.



In an attempt to make the graphical representation of the results more informative and easier to interpret for an average user, I tried a slightly different approach to the calculation of the resulting mean scores of the codecs, and to codec comparison in general. The main ideas of the approach are below.

Analysis of the collected grades ends at the point where means and confidence intervals for the individual sound excerpts have been computed. The set of such means for each codec (the colored ones) characterizes its performance on the chosen sound material most completely. Further averaging of the grades into a single parameter - the overall mean - discards too much information, and its use for comparing codecs is very questionable, while comparing sets of means allows more comparison techniques to be elaborated and applied.

The simplest way to compare different sets of means with each other is to compare their averages. This gives the first crude (preliminary) integral estimator of codec performance - the average of the means in the set. The confidence interval for this average (bootstrapping only, as the distribution of means in a set is not normal and varies from set to set) has a clear and useful meaning from the user's perspective: if more sound samples were used in a listening test, their average would fall into that interval with high probability. Comparing such confidence intervals is therefore another meaningful method for comparing sets of means. Different estimators of the variance of the means in a set could also be helpful (the range of means in a set is shown on the SE rating bar graphs).

As the most important question for the end user of codecs is something like "which codec is better?", even a direct, simple comparison of the sets of means can give a clear answer. For example, it could be conventionally defined that codec A is better than codec B only if all means in the set of codec A are higher than the corresponding means in the set of codec B (in other words, all sound samples of codec A must be graded higher in the listening test). The probability of this happening by chance is very low and depends on the number of sound excerpts in a set. So keeping and comparing the full sets of means seems more revealing, has stronger research potential, and in most cases is simpler to perform and interpret than trying to make a helpful inference from over-aggregated overall means of grades.

The figure above helps to make such comparisons visually and shows resulting averages with confidence intervals computed according to the described approach.
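
As a rough illustration of the two ideas above - the bootstrap confidence interval for the average of per-sample means, and the "all means higher" comparison rule - a simplified sketch is below (not the exact script used for the published numbers; the per-sample mean grades are made-up values for 9 hypothetical excerpts):

CODE
# Simplified sketch: bootstrap CI for the average of per-sample mean grades,
# plus the "all means higher" comparison rule. Mean grades are made up.
import random

sample_means = [2.9, 3.4, 2.1, 3.8, 2.7, 3.1, 2.5, 3.6, 2.8]   # one mean per excerpt

def bootstrap_ci(means, n_boot=10000, alpha=0.05, seed=1):
    random.seed(seed)
    boot_averages = []
    for _ in range(n_boot):
        resample = [random.choice(means) for _ in means]        # resample excerpts
        boot_averages.append(sum(resample) / len(resample))
    boot_averages.sort()
    lo = boot_averages[int(alpha / 2 * n_boot)]
    hi = boot_averages[int((1 - alpha / 2) * n_boot) - 1]
    return sum(means) / len(means), (lo, hi)

def strictly_better(means_a, means_b):
    # codec A "better" than codec B only if every excerpt is graded higher
    return all(a > b for a, b in zip(means_a, means_b))

avg, (lo, hi) = bootstrap_ci(sample_means)
print("average of means = %.2f, 95%% bootstrap CI = [%.2f, %.2f]" % (avg, lo, hi))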

Raw grades collected for this listening test are in the article – http://soundexpert.org/news/-/blogs/opus-a...-kbit-s#results
Article about previous @64 listening test was also supplemented with similar graph, so you can compare both old and new – http://soundexpert.org/news/-/blogs/opus-a...n#update11-2013



Serge Smirnoff
post Mar 29 2014, 21:56
Post #24

Following this very painful but insightful discussion, it became clear that the above calculation of overall confidence intervals (the wide ones) using per-sample means is not correct. Any arbitrary set of samples (especially a small one) chosen for a listening test is representative of some unknown/undefined general population of music. Consequently, the confidence intervals calculated for this unknown population have little or no meaning, and the results of such a test can't be generalized beyond this set of samples. Giving up that generalization allows the separate samples to be discarded from the analysis of overall means and all grades of all samples to be treated as a single, indivisible entity. The confidence intervals of the overall means calculated from the grades turn out to be small. This increase in test power is the reward for giving up the generalization of the test.

Taking all this into account, SE returns to the initial/standard calculation of overall confidence intervals. The correct version is below.
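
For comparison with the sketch in the previous post, a simplified sketch of this pooled (standard) calculation, treating all grades of all samples as a single set (made-up grades, plain normal-approximation interval; not the exact script behind the published figures):

CODE
# Simplified sketch: standard confidence interval for the overall mean,
# pooling all grades of all samples into one set. Grades are made up.
import math

grades = [3.1, 2.8, 3.5, 2.9, 3.2, 3.0, 2.6, 3.4, 3.3, 2.7, 3.1, 2.9]

n = len(grades)
mean = sum(grades) / n
sd = math.sqrt(sum((g - mean) ** 2 for g in grades) / (n - 1))  # sample std dev
half_width = 1.96 * sd / math.sqrt(n)                           # 95% CI half-width

print("overall mean = %.2f +/- %.2f" % (mean, half_width))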


