SE listening test @128kbit/s, warning: with artifacts amplification
Serge Smirnoff
post Nov 28 2013, 12:28
Post #1

If anybody is interested in the results of the forthcoming SE listening test @128 kbit/s, despite the questionable artifact-amplification technique that will be used in it, please propose your codec candidates.

Results of the test will be presented in the same detailed form as in the previous @64 and @96 tests.


Serge Smirnoff
post Mar 24 2014, 23:21
Post #2

The following codecs were added to the 128 kbit/s section:

AAC VBR@112.0 (Winamp 5.666) - VBR, AAC LC
AAC VBR@118.4 (iTunes 11.1.3) - TrueVBR, AAC LC
AAC VBR@117.5 (NeroRef 1540) - CVBR, AAC LC
Vorbis VBR@119.4 (Xiph 1.3.3)
Opus VBR@115.7 (libopus 1.1)
mp3 VBR@113.7 (Lame 3.99.5) - MPEG-1 Layer 3, VBR
AAC VBR@110.9 (libfdk 3.4.12) - MPEG-4 AAC LC, VBR
mpc VBR@123.3 (SV8)

All encoders have integer/discrete quality settings - http://soundexpert.org/news/-/blogs/opus-a...c-at-128-kbit-s


C.R.Helmrich
post Mar 25 2014, 10:57
Post #3

Sorry if this has been answered before, but: how much error amplification do you apply at 128 kbps stereo? 1 dB, or more?

And I'm surprised Fraunhofer's AAC encoder averages only 112 kbps on this item set. Do some samples include silence?

Chris

Serge Smirnoff
post Mar 25 2014, 11:47
Post #4

QUOTE (C.R.Helmrich @ Mar 25 2014, 11:57)
Sorry if this has been answered before, but: how much error amplification do you apply at 128 kbps stereo? 1 dB, or more?

Not all test items were amplified, only those with unnoticeable artifacts.
If amplification was applied, at least three amplified versions of a test item were produced, in most cases with +1 dB, +3 dB and +5 dB amplification. It depends on the particular codec/item; in some cases it was even +4 dB, +6 dB and +10 dB. For higher bitrates the amplification is usually higher as well.
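
A minimal sketch of how such amplification can be realized, assuming the common difference-signal approach suggested by the Difference level metric (the exact SE processing chain may differ, e.g. in time alignment or level matching):

CODE
import numpy as np

def amplify_artifacts(original: np.ndarray, decoded: np.ndarray, gain_db: float) -> np.ndarray:
    # Scale the coding error (difference signal) and add it back to the original.
    # gain_db = 0 reproduces the decoded signal unchanged.
    error = decoded - original            # the coding artifacts
    gain = 10.0 ** (gain_db / 20.0)       # dB -> linear amplitude factor
    return original + gain * error

# Producing e.g. the +1/+3/+5 dB versions of one item/codec case:
# versions = [amplify_artifacts(orig, dec, g) for g in (1.0, 3.0, 5.0)]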

QUOTE (C.R.Helmrich @ Mar 25 2014, 11:57)
And I'm surprised Fraunhofer's AAC encoder averages only 112 kbps on this item set. Do some samples include silence?

The SE test set usually results in lower bitrates than pop music; it is closer to classical material. Yes, some items contain silence. The SE test sequence can be downloaded from http://soundexpert.org/sound-samples (bottom of the page).


LithosZA
post Mar 25 2014, 13:09
Post #5

QUOTE
Not all test items were amplified, only those with unnoticeable artifacts.

I assume the same amplification would be applied to all codecs for that item?

Serge Smirnoff
post Mar 25 2014, 13:26
Post #6

QUOTE (LithosZA @ Mar 25 2014, 14:09)
QUOTE
Not all test items were amplified, only those with unnoticeable artifacts.

I assume the same amplification would be applied to all codecs for that item?

No. As each test item is degraded by each codec differently, the amplification is applied differently (if at all) in each item/codec case. If applied, three gradually degraded versions of the item are produced.


C.R.Helmrich
post Mar 25 2014, 13:40
Post #7

QUOTE (Serge Smirnoff @ Mar 25 2014, 13:26)
... each test item is degraded by each codec differently, the amplification is applied differently (if at all) in each item/codec case. If applied, three gradually degraded versions of the item are produced.

But then how can you rank the codecs for such items?

Chris


Serge Smirnoff
post Mar 25 2014, 14:29
Post #8

QUOTE (C.R.Helmrich @ Mar 25 2014, 14:40)
QUOTE (Serge Smirnoff @ Mar 25 2014, 13:26)
... each test item is degraded by each codec differently, the amplification is applied differently (if at all) in each item/codec case. If applied, three gradually degraded versions of the item are produced.

But then how can you rank the codecs for such items?

A two-page document explains the whole ranking procedure - http://soundexpert.org/documents/10179/11017/se_igis.pdf
In short: the three (or more) gradually degraded versions of a test item are graded by testers as usual. Each graded version then has two coordinates: its level of waveform degradation (Difference level, dB) and its subjective score [1-5]. These three points define a second-order curve which shows the relationship between measurable degradation of the waveform and perceived degradation of sound quality. The resulting score of the codec is the point on the curve corresponding to the Difference level of the item without amplification.
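
A minimal numerical sketch of that extrapolation step (all numbers are hypothetical, purely for illustration):

CODE
import numpy as np

# Gradings of three amplified versions of one item/codec case:
diff_level = np.array([-40.0, -36.0, -32.0])  # Difference level, dB
score      = np.array([4.8, 4.2, 3.1])        # subjective scores [1-5]

# Fit the second-order curve score = f(Difference level) through the three points.
coeffs = np.polyfit(diff_level, score, deg=2)

# Evaluate the curve at the Difference level of the unamplified item
# (-45 dB here, hypothetical); the result may exceed 5, which is how
# scores above the transparency point arise.
codec_score = np.polyval(coeffs, -45.0)
print(round(float(codec_score), 2))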


C.R.Helmrich
post Apr 13 2015, 21:26
Post #9

Looking at the current live rankings, I conclude that the following statistical outcome could occur.

Let us assume there are two codecs, A and B, which are tested using two test signals. Now, if

  • codec A has a mean score of 6 (i.e. transparent with a relatively strong margin) on both signals and
  • codec B has a mean score of 3 (clearly non-transparent) for the first and 11 (clearly transparent) for the second item,

then codec A's average over both items is 6 while codec B's is 7, meaning that:

codec A exhibits a lower mean score than codec B even though both signals are transparent for codec A while only one signal is transparent for codec B.

Is this correct? Looking at the current scores for the Vorbis encoders, the example does not seem far off.

Chris


halb27
post Apr 14 2015, 06:47
Post #10

That's why looking at the bad-case scenarios is much more relevant to me than looking at the average outcome. That's true for every listening test.


IgorC
post Apr 14 2015, 14:57
Post #11

When it comes to consistency of quality (and consistency is as important as the level of quality, i.e. the average score), the geometric mean score is more representative.


P.S. There may be other averaging functions that penalize the deviation of a particular score from the average score, and they could be elaborated for particular cases.

C.R.Helmrich
post Apr 14 2015, 19:56
Post #12

QUOTE (IgorC @ Apr 14 2015, 14:57)
When it comes to consistency of quality (and consistency is as important as the level of quality, i.e. the average score), the geometric mean score is more representative.

Yes, I also thought about recommending the geometric mean here. It would give 5.74 instead of 7. An alternative would be to apply some kind of compressor when computing the arithmetic mean, e.g.

outputScore(itemScore) = 5 * sqrt(0.4*itemScore - 1) if itemScore > 5
and
outputScore(itemScore) = itemScore otherwise.

The above formula would give an average of 6.11 for my earlier example.
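
A quick numerical check of the example above (a minimal sketch; compress() is just an illustrative name for the formula):

CODE
import math

def compress(score: float) -> float:
    # Soften per-item scores above the transparency point of 5 before averaging.
    return 5 * math.sqrt(0.4 * score - 1) if score > 5 else score

codec_a = [6.0, 6.0]   # transparent on both items
codec_b = [3.0, 11.0]  # clearly non-transparent on one item

for name, scores in (("A", codec_a), ("B", codec_b)):
    arith = sum(scores) / len(scores)                      # arithmetic mean
    geo = math.prod(scores) ** (1 / len(scores))           # geometric mean
    comp = sum(compress(s) for s in scores) / len(scores)  # compressed mean
    print(name, round(arith, 2), round(geo, 2), round(comp, 2))

# A: arithmetic 6.0, geometric 6.0, compressed 5.92
# B: arithmetic 7.0, geometric 5.74, compressed 6.11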

Chris


IgorC
post Apr 16 2015, 01:28
Post #13

QUOTE (C.R.Helmrich @ Apr 14 2015, 15:56)
outputScore(itemScore) = 5 * sqrt(0.4*itemScore - 1) if itemScore > 5
and
outputScore(itemScore) = itemScore otherwise.

Oh, that formula? It reminds me of a formulation of my rules of thumb :P

Anyway, fair enough. Though I think these differences in numbers arise because different encoders were tested at different times by different sets of people.
And now new codecs were added on top of that set of data, which was collected over several years. I don't know how to parse that.

C.R.Helmrich
post Apr 16 2015, 07:53
Post #14

QUOTE (IgorC @ Apr 16 2015, 01:28)
Oh, that formula? It reminds me of a formulation of my rules of thumb :P

Yes, it's quite rule-of-thumby :) But only in the transparency range above a score of 5, where the infinite-score artifact-amplification approach itself could be considered a rule of thumb. BTW, I suggest applying the compressor only when computing the overall average score, not when computing the per-item mean scores.

The whole point of my bringing this up is that I fear encoders aiming for CONSTANT quality (i.e. small quality variance and NO overcoding to scores > 5, i.e. following the VBR principle) are being punished in this evaluation. This isn't so much the case in the lower-bitrate rankings.

Chris

IgorC
post Apr 17 2015, 14:10
Post #15

QUOTE (C.R.Helmrich @ Apr 16 2015, 03:53)
The whole point of my bringing this up is that I fear encoders aiming for CONSTANT quality (i.e. small quality variance and NO overcoding to scores > 5, i.e. following the VBR principle) are being punished in this evaluation. This isn't so much the case in the lower-bitrate rankings.

Exactly. It can greatly benefit CBR and CBR-ish modes (ABR, constrained VBR) over purely quality-based VBR. If some VBR encoder scores 5.0 on two samples while CBR scores 4.0 and 6.0, then there appears to be no benefit from VBR, yet it's clear that VBR is considerably better in real scenarios.

halb27
post Apr 17 2015, 15:05
Post #16

I'd prefer a formulation like this: if encoder A yields 4.0 and 6.0 on two samples, while encoder B yields 5.0 for both of them, then judging from the arithmetic average both encoders have equal quality, while probably everybody would prefer encoder B.

I can't see evidence correlating encoder-A behavior with CBR (or a method similar to it to some extent), and encoder-B behavior with VBR. In theory, with all other machinery the same, I'd expect VBR at a given average bitrate to have the larger quality variance but a higher average quality than CBR, if VBR works well. With real-world encoders, of course, things can differ in any direction.

