
SoundExpert explained, Methodology issues
Serge Smirnoff
post Nov 24 2010, 13:27
Post #1








I found this thread among SoundExpert referrals and was a bit surprised by the almost complete misunderstanding of SE testing methodology, particularly of how the diff signal is used in SE audio quality metrics. The discussion on the topic from 2006 actually seems more meaningful. So I decided to post some SE basics here for reference. I will use a thought experiment, though one that is close to reality.

Suppose we have two sound signals, a main one and a side one. They could be, for example, a short piano passage and some noise. We can prepare several mixes of them in different proportions:
  • equal levels of main and side signals (0dB RMS)
  • half level of side signal (-6dB RMS)
  • quarter level of side signal (-12dB RMS)
  • 1/8 level of side signal (-18dB RMS)
  • 1/16 level of side signal (-24dB RMS)
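
To illustrate how such mixes could be prepared, here is a minimal Python sketch. It is not SE's actual tooling: the function names, the 0.1 target RMS, and the synthetic sine and noise signals are all assumptions made for the example. Note that -6 dB corresponds to a gain of about 0.501, so "half level" is a rounding of 6.02 dB.

```python
import math
import random

def rms(signal):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def db_to_gain(level_db):
    """Convert a level in dB to a linear amplitude gain."""
    return 10.0 ** (level_db / 20.0)

def mix_at_level(main, side, side_level_db, target_rms=0.1):
    """Add `side` to `main` so that RMS(side) sits `side_level_db` relative
    to RMS(main), then normalize the mix to `target_rms` (hypothetical value)."""
    gain = db_to_gain(side_level_db) * rms(main) / rms(side)
    mix = [m + gain * s for m, s in zip(main, side)]
    scale = target_rms / rms(mix)
    return [scale * x for x in mix]

# Hypothetical stand-ins for the piano passage and the noise.
main = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
rng = random.Random(0)
side = [rng.uniform(-1.0, 1.0) for _ in range(8000)]

# One normalized mix per side-signal level from the list above.
mixes = {db: mix_at_level(main, side, db) for db in (0, -6, -12, -18, -24)}
```

After this step every entry in `mixes` has the same RMS level, so only the proportion of the side signal differs between them.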

After normalization all mixes have equal levels, and we can evaluate the perceptibility of the side signal in each mix. Here at SE we found that this perceptibility is a monotonic function of the side signal level and looks like this:

Figure: Side signal perception

(1) In other words, there is a relationship between the objectively measured level of the side signal and its subjectively estimated perceptibility in the mix. What is more:
(a) this relationship is well described by a second-order curve (assuming levels are in dB)
(b) the relationship holds for any sound signals, whether they are correlated or not; the only differences are the position and curvature of the curve.

(2) These side stimulus perceptibility curves are the core of the SE rating mechanism. Each device under test has its own curve, plotted on the basis of SE online listening tests.
(3) Side signals are the difference signals of the devices being tested. Their levels are expressed in dB of the Difference level parameter, which in our case is exactly equal to the RMS level of the side signal.
(4) Subjective grades of perceptibility are the anchor points of the 5-grade impairment scale.
(5) Audio metrics beyond the threshold of audibility are determined by extrapolating these second-order curves. Virtual grades in the extrapolated area can be considered objective quality parameters that take the peculiarities of human hearing into account.
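
The fitting and extrapolation in points (1a) and (5) can be sketched as follows. This is only an illustration under stated assumptions, not SE's actual code: the four (Difference level, grade) points are made-up numbers lying on an exact parabola, and the helper names are hypothetical. The fit is an ordinary least-squares second-order polynomial, computed on centered levels to keep the arithmetic well conditioned.

```python
def solve_linear(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_second_order(levels_db, grades):
    """Least-squares fit of grade = c0 + c1*u + c2*u^2 with u = level - mean level.
    Returns a function that predicts the grade at any level."""
    mean = sum(levels_db) / len(levels_db)
    u = [x - mean for x in levels_db]
    sums = [sum(ui ** k for ui in u) for k in range(5)]
    normal = [[sums[i + j] for j in range(3)] for i in range(3)]
    rhs = [sum(g * ui ** i for ui, g in zip(u, grades)) for i in range(3)]
    c0, c1, c2 = solve_linear(normal, rhs)

    def predict(level_db):
        d = level_db - mean
        return c0 + c1 * d + c2 * d * d

    return predict

# Hypothetical listening-test results: Difference level (dB) -> mean grade.
levels = [-20.0, -25.0, -30.0, -35.0]
grades = [2.2, 2.75, 3.4, 4.15]

curve = fit_second_order(levels, grades)
virtual_grade = curve(-50.0)  # extrapolated beyond the tested range
```

With these invented points the extrapolated value exceeds 5, which is what a "virtual grade" beyond the threshold of audibility would look like. How far such an extrapolation can be trusted is exactly the part that needs verification.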

So, yes, the difference signal is used in SE testing. We take into account both its level and how the human auditory system perceives it together with the reference signal. Some difference signals with fairly high levels still remain almost imperceptible against the background of the reference signal, and vice versa; the perceptibility curves reflect this.

This is the concept. Many parts of it still need thorough verification in carefully designed listening tests, which are beyond SE's means. All we can do is analyze the grades returned by SE visitors. This will certainly be done, yet it can't replace properly organized listening tests.

SE testing methodology is new and questionable, but all the assumptions look reasonable and SE ratings look promising, at least to me. Time will tell.


--------------------
keeping audio clear together - soundexpert.org
greynol
post Nov 28 2010, 20:14
Post #2








That's a mighty big if.

For years people have requested verification and none has been forthcoming.

This post has been edited by greynol: Nov 28 2010, 20:17


--------------------
Your eyes cannot hear.
2Bdecided
post Nov 29 2010, 11:49
Post #3





QUOTE (greynol @ Nov 28 2010, 19:14) *
That's a mighty big if.

For years people have requested verification and none has been forthcoming.
I think something like it is justified.

I think it's commonly accepted* that signal detection (e.g. artefact detection in these tests) is a psychometric function - an S-curve, generated by integrating a Gaussian distribution...



The x-axis is level, and the y-axis is the chance of detecting the artefact.

If you know the function takes this shape, then it's apparent that you don't need to test at the threshold. You can test at several levels somewhat above threshold, and fit the resulting data to this graph/shape, thus giving you the actual threshold value.

The major problem with this is that, if you are testing only a long way above threshold, then very minor errors in the data will give huge errors in the threshold estimate because the fit to the graph could be wildly wrong.
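
A small Python sketch of that sensitivity, under simplifying assumptions of my own: the S-curve is a cumulative Gaussian whose spread (3 dB here, an arbitrary value) is already known, "threshold" means the 50% point, and the threshold is recovered from a single measured detection rate. The function names are hypothetical.

```python
from statistics import NormalDist

def detection_probability(level_db, threshold_db, spread_db):
    """Cumulative-Gaussian psychometric function: chance of detecting the artefact."""
    return NormalDist(mu=threshold_db, sigma=spread_db).cdf(level_db)

def threshold_from_point(level_db, p_detect, spread_db):
    """Invert the S-curve: which 50% threshold gives p_detect at level_db?"""
    return level_db - spread_db * NormalDist().inv_cdf(p_detect)

def threshold_error(p_detect, measurement_error, spread_db=3.0):
    """Shift in the estimated threshold (dB) caused by a small error in p_detect."""
    exact = threshold_from_point(0.0, p_detect, spread_db)
    noisy = threshold_from_point(0.0, p_detect - measurement_error, spread_db)
    return abs(noisy - exact)

# The same 1% error in the measured detection rate, at two test levels.
near_threshold = threshold_error(0.60, 0.01)   # tested close to threshold
far_above = threshold_error(0.999, 0.01)       # tested a long way above it
```

Here `far_above` comes out far larger than `near_threshold`: on the flat top of the S-curve a tiny change in detection rate corresponds to a large change in level, so the threshold estimate becomes unstable.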


Now, Sound Expert isn't doing this - it's at least one step away from it, testing at levels where people can hear the artefact all the time, and asking them how bad it sounds.

As you say, we have no proof that a graph of these results can be extrapolated back to find the threshold.


An obvious criticism is that two different kinds of artefacts, each 12dB above threshold, might give very different results, i.e. one might be far more annoying than the other. But that's not necessarily a failing: it just means the curve might be steeper for one than the other, which would become apparent with more points on the curve (e.g. 6dB and 18dB) and so could be accounted for by the method.


It would be interesting to try to prove/disprove all this. A good starting point might be to take one of the archived listening tests from HA with known results, and use exactly the same samples on SoundExpert. The results should speak for themselves.

Cheers,
David.
Porcus
post Nov 29 2010, 13:00
Post #4








QUOTE (2Bdecided @ Nov 29 2010, 11:49) *
I think it's commonly accepted* that signal detection (e.g. artefact detection in these tests) is a psychometric function - an S-curve, generated by integrating a Gaussian distribution...


Think you thought of a footnote text corresponding to that asterisk?

Anyway, the "sigmoid" curve need not be the Gaussian cumulative distribution, though that is one common choice; another is the logistic distribution (more: http://en.wikipedia.org/wiki/Link_function#Link_function ). I'd guess a "sigmoid" in this context would mean any positive, smooth, strictly increasing, convex-and-then-concave function symmetric about (0,1/2), i.e., corresponding to a unimodal symmetric distribution, absolutely continuous and of full support.

Each of these choices constitutes a parametric model, meaning that you assume a certain (parametric) family of functions will be a good fit to reality. You then fit the parameters to find the best fit within the family. If a model fits well from level A to level B (where all your observations are), it is common practice to infer that it should perform acceptably at least from somewhere below A to somewhere above B as well. How far you can extrapolate does, of course, depend on circumstances.


Now I think -- though this is outside my field of expertise -- that the choice of link function is more crucial for wider extrapolations. Again, this depends a bit on circumstances; for example, in an ABX listening test, the interesting issue is whether you guess better than 50%, while in diagnosis of rare diseases -- or default of sovereign bonds -- you are already in the tail of the distribution.
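
To make the point about tails concrete, here is a short Python comparison of the Gaussian and logistic sigmoids. The 1.702 scale factor is the conventional constant for matching a logistic curve to the unit normal CDF; the evaluation points 0.5 and -4 are arbitrary choices for this sketch.

```python
import math
from statistics import NormalDist

LOGISTIC_SCALE = 1.702  # conventional constant matching logistic to unit normal

def probit_curve(x):
    """Gaussian cumulative distribution (the probit link's sigmoid)."""
    return NormalDist().cdf(x)

def logistic_curve(x):
    """Logistic sigmoid, rescaled to resemble the Gaussian one near the centre."""
    return 1.0 / (1.0 + math.exp(-LOGISTIC_SCALE * x))

# Near the centre the two curves are almost indistinguishable...
mid_gap = abs(probit_curve(0.5) - logistic_curve(0.5))

# ...but far out in the tail their probabilities differ by a large factor.
tail_ratio = logistic_curve(-4.0) / probit_curve(-4.0)
```

`mid_gap` is under one percentage point, while `tail_ratio` is well above 10. Data collected near the middle cannot tell the two families apart, yet an extrapolation into the tail depends heavily on which one was assumed.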

This post has been edited by Porcus: Nov 29 2010, 13:12


--------------------
One day in the Year of the Fox came a time remembered well
2Bdecided
post Nov 29 2010, 16:27
Post #5





QUOTE (Porcus @ Nov 29 2010, 12:00) *
QUOTE (2Bdecided @ Nov 29 2010, 11:49) *
I think it's commonly accepted* that signal detection (e.g. artefact detection in these tests) is a psychometric function - an S-curve, generated by integrating a Gaussian distribution...


Think you thought of a footnote text corresponding to that asterisk?
Yes! Must have deleted it by mistake...

King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37(12), 1595-1604.

...though I've lost my copy of the article - I cited it a decade ago so I must have thought it made sense back then.

QUOTE
Anyway, the "sigmoid" curve need not be the Gaussian cumulative distribution, though that is one common choice; another is the logistic distribution (more: http://en.wikipedia.org/wiki/Link_function#Link_function ). I'd guess a "sigmoid" in this context would mean any positive, smooth, strictly increasing, convex-and-then-concave function symmetric about (0,1/2), i.e., corresponding to a unimodal symmetric distribution, absolutely continuous and of full support.

Each of these choices constitutes a parametric model, meaning that you assume a certain (parametric) family of functions will be a good fit to reality. You then fit the parameters to find the best fit within the family. If a model fits well from level A to level B (where all your observations are), it is common practice to infer that it should perform acceptably at least from somewhere below A to somewhere above B as well. How far you can extrapolate does, of course, depend on circumstances.

Now I think -- though this is outside my field of expertise -- that the choice of link function is more crucial for wider extrapolations.
Yep, I agree with all of that. I think the "slightly" different curve shapes don't matter as much as you might expect in practice here since the psychometric data is likely to be rather rough anyway. If you get data on the steep part of the curve, you can probably do quite well even if you're not sure of the shape. If you get data on the shallow part of the curve, you're in more trouble if you don't know the exact shape, but you were already way off anyway.


Measured psychoacoustic thresholds are often 70% (because the procedure is often the one that I quoted on the previous page).

The 50% point on the S-curve doesn't correspond to the "getting better than 50% means it's not just chance" in ABX. In ABX, if you can't hear a thing, you'll (on average) score 50%. That's way off to the left on the S-curve. I guess that 50% on the S-curve gives a 75% score on ABX (???), which can give a very low p (depending on the number of trials).
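
The 75% guess can be checked with a simple guessing model, which is an assumption rather than anything established in this thread: on each trial the listener either hears the artefact (and answers correctly) with probability p, or hears nothing and guesses with a 50% chance of being right. A Python sketch, with hypothetical function names:

```python
from math import comb

def expected_abx_score(p_detect):
    """Expected fraction correct: detected trials plus half of the guessed ones."""
    return p_detect + (1.0 - p_detect) * 0.5

def abx_p_value(correct, trials):
    """One-sided binomial p-value: chance of at least `correct` hits by pure guessing."""
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials

score_at_half = expected_abx_score(0.5)  # 50% point of the S-curve
p_16_trials = abx_p_value(12, 16)        # 75% correct over 16 trials
p_8_trials = abx_p_value(6, 8)           # 75% correct over 8 trials
```

Under this model the 50% point does map to a 75% ABX score. The same 75% gives p of about 0.038 over 16 trials but only about 0.145 over 8, which is the dependence on the number of trials mentioned above.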

Cheers,
David.

This post has been edited by 2Bdecided: Nov 29 2010, 16:29
knutinh
post Nov 30 2010, 09:53
Post #6








QUOTE (2Bdecided @ Nov 29 2010, 16:27) *
QUOTE (Porcus @ Nov 29 2010, 12:00) *
QUOTE (2Bdecided @ Nov 29 2010, 11:49) *
I think it's commonly accepted* that signal detection (e.g. artefact detection in these tests) is a psychometric function - an S-curve, generated by integrating a Gaussian distribution...


Think you thought of a footnote text corresponding to that asterisk?
Yes! Must have deleted it by mistake...

King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37(12), 1595-1604.

...though I've lost my copy of the article - I cited it a decade ago so I must have thought it made sense back then.

QUOTE
Anyway, the "sigmoid" curve need not be the Gaussian cumulative distribution, though that is one common choice; another is the logistic distribution (more: http://en.wikipedia.org/wiki/Link_function#Link_function ). I'd guess a "sigmoid" in this context would mean any positive, smooth, strictly increasing, convex-and-then-concave function symmetric about (0,1/2), i.e., corresponding to a unimodal symmetric distribution, absolutely continuous and of full support.

Each of these choices constitutes a parametric model, meaning that you assume a certain (parametric) family of functions will be a good fit to reality. You then fit the parameters to find the best fit within the family. If a model fits well from level A to level B (where all your observations are), it is common practice to infer that it should perform acceptably at least from somewhere below A to somewhere above B as well. How far you can extrapolate does, of course, depend on circumstances.

Now I think -- though this is outside my field of expertise -- that the choice of link function is more crucial for wider extrapolations.
Yep, I agree with all of that. I think the "slightly" different curve shapes don't matter as much as you might expect in practice here since the psychometric data is likely to be rather rough anyway. If you get data on the steep part of the curve, you can probably do quite well even if you're not sure of the shape. If you get data on the shallow part of the curve, you're in more trouble if you don't know the exact shape, but you were already way off anyway.


Measured psychoacoustic thresholds are often 70% (because the procedure is often the one that I quoted on the previous page).

The 50% point on the S-curve doesn't correspond to the "getting better than 50% means it's not just chance" in ABX. In ABX, if you can't hear a thing, you'll (on average) score 50%. That's way off to the left on the S-curve. I guess that 50% on the S-curve gives a 75% score on ABX (???), which can give a very low p (depending on the number of trials).

Cheers,
David.

Why aren't such tests used more with monotonically degrading stuff like lossy encoders? I have made a simple MATLAB script for adaptively "homing in" on the most interesting part of the degradation, but reading a paper about the statistics in such tests reminded me how much I have forgotten from my statistics classes.

-k
Porcus
post Nov 30 2010, 11:28
Post #7








QUOTE (knutinh @ Nov 30 2010, 09:53) *
Why aren't such tests used more with monotonically degrading stuff like lossy encoders? I have made a simple MATLAB script for adaptively "homing in" on the most interesting part of the degradation, but reading a paper about the statistics in such tests reminded me how much I have forgotten from my statistics classes.


Because audiophiles tend to shun science? Ooops, did I even say that?


--------------------
One day in the Year of the Fox came a time remembered well

