Codec overall average, what does it mean?
Serge Smirnoff
Apr 13 2013, 13:07
Post #1

I'm not a specialist in statistical analysis, but I feel that something is fundamentally wrong with the analysis I do here. I stumbled upon a simple and very basic question.

Per-sample means of grades and their bootstrapped confidence intervals have a clear and simple meaning. But the overall average of all grades received by a codec, with its corresponding confidence interval, seems meaningless to me, much like the average temperature across a hospital; at the very least such values are hard to interpret and compare. It seems more reasonable to me to compute the final codec averages from the per-sample means only, not from all grades. Although both methods give almost identical averages, the confidence intervals, the interpretation and the methods of further analysis differ. So my question, in short: which population do we consider when computing the overall codec average, the population of grades or the population of per-sample means?

I would be thankful if somebody could clear this up for me.
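To make the two candidate populations concrete, here is a minimal Python sketch with made-up grades (the sample names and values are hypothetical, not SoundExpert data). The pooled mean weights each sample by its number of grades, while the mean of per-sample means weights every sample equally. The two coincide exactly when every sample received the same number of grades, which may be why the averages come out almost identical in practice.

```python
import statistics

def pooled_mean(grades):
    """Mean over the population of individual grades (each grade counts once)."""
    return statistics.fmean(g for gs in grades.values() for g in gs)

def mean_of_sample_means(grades):
    """Mean over the population of per-sample means (each sample counts once)."""
    return statistics.fmean(statistics.fmean(gs) for gs in grades.values())

# Hypothetical grades for one codec: sample name -> listener grades.
unequal = {
    "sample_a": [4.5, 4.8, 5.0, 4.6],
    "sample_b": [3.0, 3.4, 2.8],
    "sample_c": [4.0, 4.2, 3.9, 4.1, 4.3],
}
# Same samples, trimmed so that every sample has exactly three grades.
equal = {name: gs[:3] for name, gs in unequal.items()}

for label, data in (("unequal counts", unequal), ("equal counts", equal)):
    print(f"{label}: pooled={pooled_mean(data):.3f}, "
          f"mean of means={mean_of_sample_means(data):.3f}")
```

With these numbers the unequal case gives 4.050 (pooled) vs 3.964 (mean of means), while the trimmed case gives 3.956 for both.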


--------------------
keeping audio clear together - soundexpert.org
Woodinville
Apr 14 2013, 22:08
Post #2

QUOTE (Serge Smirnoff @ Apr 13 2013, 05:07)
So my question, in short: which population do we consider when computing the overall codec average, the population of grades or the population of per-sample means?

This would certainly reveal more about codec performance. In fact, the per-sample mean compared to the overall mean often tells a story by itself.

Also, confidence intervals tell a lot. When you find a sample with a wide confidence interval, it usually means that different listeners respond very differently to the distortions in that sample.
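A minimal sketch of the check described above, assuming a percentile bootstrap for the per-sample intervals (the helper name `bootstrap_ci` and the grades are hypothetical): print each sample's mean next to the overall mean, together with the width of its confidence interval, so that samples where listeners disagree stand out.

```python
import random
import statistics

def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for the mean of `values` (resampling with replacement)."""
    rng = random.Random(seed)
    means = sorted(statistics.fmean(rng.choices(values, k=len(values)))
                   for _ in range(n_boot))
    return means[int(n_boot * alpha / 2)], means[int(n_boot * (1 - alpha / 2)) - 1]

# Hypothetical grades: listeners agree on sample_a, disagree strongly on sample_b.
grades = {
    "sample_a": [4.0, 4.1, 3.9, 4.0, 4.2, 3.8],
    "sample_b": [5.0, 5.0, 1.5, 4.8, 1.8, 5.0],
}
overall = statistics.fmean(g for gs in grades.values() for g in gs)

for name, gs in grades.items():
    lo, hi = bootstrap_ci(gs)
    print(f"{name}: mean={statistics.fmean(gs):.2f} "
          f"(overall {overall:.2f}), CI width={hi - lo:.2f}")
```

The two samples have similar means, but the interval for sample_b is several times wider, which is the "listeners respond very differently" signal.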

This post has been edited by Woodinville: Apr 14 2013, 22:09


--------------------
-----
J. D. (jj) Johnston
Serge Smirnoff
Apr 15 2013, 01:02
Post #3

OK. Given the experimental nature of the SE project and its forever-beta state, I'm going to introduce a non-standard audio quality analysis and comparison at SE. Here is a draft:

1. Treat each sound sample as revealing some aspect(s) of codec performance. The mean and confidence interval of the sample's grades are quantitative estimators of that aspect(s). The collection of such means defines the quality profile of the codec. This quality profile is specific to the particular samples used, the listening conditions and the listening subjects. Comparing codecs is, in fact, comparing their quality profiles.
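One way to picture point 1 is a quality profile stored as a mapping from sample to mean grade, with a comparison of two codecs taken sample by sample. This is a sketch under assumptions: the function names and the sample names ("castanets", "harpsichord", "speech") are hypothetical, and confidence intervals are left out here for brevity.

```python
import statistics

def quality_profile(grades):
    """Quality profile of one codec: sample -> mean grade for that sample."""
    return {sample: statistics.fmean(gs) for sample, gs in grades.items()}

def compare_profiles(profile_a, profile_b):
    """Per-sample difference A - B over the samples both profiles share."""
    shared = sorted(profile_a.keys() & profile_b.keys())
    return {s: profile_a[s] - profile_b[s] for s in shared}

# Hypothetical grades for two codecs on the same three samples.
codec_a = {"castanets": [4.6, 4.8, 4.4], "harpsichord": [3.9, 4.1], "speech": [4.9, 5.0, 4.8]}
codec_b = {"castanets": [3.2, 3.5, 3.0], "harpsichord": [4.3, 4.5], "speech": [4.7, 4.9, 4.8]}

for sample, diff in compare_profiles(quality_profile(codec_a), quality_profile(codec_b)).items():
    print(f"{sample:12s} A - B = {diff:+.2f}")
```

Here A is well ahead on castanets (+1.37) but behind on harpsichord (-0.40), which is exactly the kind of per-sample structure a single overall number hides.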

2. The integral parameter of a quality profile is the mean of its collection of means. As no assumptions can be made about the distribution of the means in the collection, only non-parametric estimators are allowed. A bootstrap confidence interval could be sufficient, though.
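A minimal sketch of point 2, assuming a percentile bootstrap (one common non-parametric choice; the function name `integral_parameter` and the profile values are hypothetical). The key detail is that resampling is done over the per-sample means, not over individual grades, so the interval reflects variation from sample to sample.

```python
import random
import statistics

def integral_parameter(profile, n_boot=5000, alpha=0.05, seed=42):
    """Mean of per-sample means plus a percentile bootstrap CI.

    The bootstrap resamples the per-sample means (the profile itself),
    so each resample is a new 'set of samples', not a new set of grades.
    """
    means = list(profile.values())
    rng = random.Random(seed)
    boot = sorted(statistics.fmean(rng.choices(means, k=len(means)))
                  for _ in range(n_boot))
    lo = boot[int(n_boot * alpha / 2)]
    hi = boot[int(n_boot * (1 - alpha / 2)) - 1]
    return statistics.fmean(means), (lo, hi)

# Hypothetical quality profile (sample -> mean grade) with ten samples.
profile = {f"sample_{i:02d}": m for i, m in enumerate(
    [4.6, 4.1, 3.2, 4.8, 4.4, 3.9, 4.7, 4.0, 3.5, 4.5])}

mean, (lo, hi) = integral_parameter(profile)
print(f"integral parameter = {mean:.2f}, 95% bootstrap CI = [{lo:.2f}, {hi:.2f}]")
```

With only a handful of samples the percentile bootstrap is fairly crude, so the interval should probably be read as indicative rather than exact.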

3. To compare different codecs (i.e. their quality profiles), some simple and clear criteria are necessary. For example:
codec A is considered better than codec B if ALL means of profile A are higher than the corresponding means of profile B (Low Criterion)

For uncompromising audio purists and statisticians there could be a more rigorous criterion:
the same as the Low Criterion, but with the additional requirement that the corresponding confidence intervals do not overlap (High Criterion)


Three degrees of “better” follow from this:

  1. if the overall mean of a codec is higher but the Low Criterion is NOT met (some samples were graded higher, some lower), the codec is “conditionally better”; comparing per-sample means could reveal those “conditions” (particular weaknesses of the codec)
  2. if the overall mean of a codec is higher and the Low Criterion is met (all samples were graded higher), the codec is “better”
  3. if the overall mean of a codec is higher and the High Criterion is met (all samples were graded higher, with non-overlapping intervals), the codec is “unconditionally better”

“Worse” can be introduced accordingly if necessary.
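The two criteria and the three degrees can be sketched as follows. Assumptions: each profile is a hypothetical mapping sample -> (mean, ci_low, ci_high) over the same set of samples, "non-overlapping" is read as A's lower bound above B's upper bound, and since the High Criterion implies the Low one, the strongest applicable degree is returned.

```python
import statistics
from typing import Dict, Tuple

# Hypothetical profile format: sample -> (mean, ci_low, ci_high).
Profile = Dict[str, Tuple[float, float, float]]

def overall_mean(p: Profile) -> float:
    """Integral parameter: mean of the per-sample means."""
    return statistics.fmean(m for m, _, _ in p.values())

def low_criterion(a: Profile, b: Profile) -> bool:
    """Every per-sample mean of A is higher than B's (assumes same samples)."""
    return all(a[s][0] > b[s][0] for s in a)

def high_criterion(a: Profile, b: Profile) -> bool:
    """Low Criterion plus non-overlapping CIs (A's lower bound above B's upper)."""
    return low_criterion(a, b) and all(a[s][1] > b[s][2] for s in a)

def degree_of_better(a: Profile, b: Profile) -> str:
    """Classify A against B using the three degrees of "better"."""
    if overall_mean(a) <= overall_mean(b):
        return "not better"
    if high_criterion(a, b):
        return "unconditionally better"
    if low_criterion(a, b):
        return "better"
    return "conditionally better"

codec_a = {"s1": (4.6, 4.4, 4.8), "s2": (4.0, 3.7, 4.3), "s3": (4.9, 4.8, 5.0)}
codec_b = {"s1": (3.2, 2.9, 3.5), "s2": (4.4, 4.1, 4.6), "s3": (4.8, 4.6, 4.9)}
codec_c = {"s1": (3.0, 2.8, 3.3), "s2": (3.5, 3.3, 3.6), "s3": (4.5, 4.3, 4.7)}
codec_d = {"s1": (4.3, 4.1, 4.5), "s2": (3.8, 3.5, 4.0), "s3": (4.7, 4.5, 4.9)}

print(degree_of_better(codec_a, codec_b))  # s2 is lower for A
print(degree_of_better(codec_a, codec_d))  # all means higher, s1 CIs overlap
print(degree_of_better(codec_a, codec_c))  # all means higher, no overlap
```

One consequence worth noting: when both profiles cover the same samples, meeting the Low Criterion already forces the overall mean of per-sample means to be higher, so the overall-mean check only separates "conditionally better" from "not better".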

What are the possible weaknesses/downsides of this approach?

This post has been edited by Serge Smirnoff: Apr 15 2013, 01:13


Woodinville
Apr 15 2013, 10:29
Post #4

There's a problem there. I know it's easy to design a codec where most subjects will give it a pass, i.e. not make any useful distinction, but a few listeners will hate, hate, hate the results.

This will increase the confidence bound, no matter how you look at it.

So it's a bit harder than what you propose.
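The scenario above can be illustrated with a small simulation on hypothetical grades from 20 listeners for a single sample, using a percentile bootstrap for the interval (the helper and the data are assumptions, not measured results). A "consistent" codec gets slightly-below-transparent grades from everyone; a "polarizing" one is transparent for most listeners but rated very low by three of them.

```python
import random
import statistics

def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=7):
    """Percentile bootstrap CI for the mean of `values`."""
    rng = random.Random(seed)
    means = sorted(statistics.fmean(rng.choices(values, k=len(values)))
                   for _ in range(n_boot))
    return means[int(n_boot * alpha / 2)], means[int(n_boot * (1 - alpha / 2)) - 1]

# Hypothetical grades from 20 listeners for one sample.
consistent = [4.2, 4.4, 4.3, 4.1, 4.5] * 4   # everyone hears a small artifact
polarizing = [5.0] * 17 + [1.0, 1.2, 1.5]    # most hear nothing, three hate it

for label, gs in (("consistent", consistent), ("polarizing", polarizing)):
    lo, hi = bootstrap_ci(gs)
    print(f"{label}: mean={statistics.fmean(gs):.2f}, CI=[{lo:.2f}, {hi:.2f}]")
```

In this made-up case the polarizing codec has the higher mean on the sample, so a criterion based on means alone would favour it, while its interval is many times wider and overlaps the consistent codec's interval.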


Serge Smirnoff
Apr 15 2013, 11:45
Post #5

QUOTE (Woodinville @ Apr 15 2013, 12:29)
There's a problem there. I know it's easy to design a codec where most subjects will give it a pass, i.e. not make any useful distinction, but a few listeners will hate, hate, hate the results.

This will increase the confidence bound, no matter how you look at it.

Sorry, I didn't quite understand your example. Confidence intervals are not used directly for inference about overall quality in the proposed metric. Only per-sample means matter, and only taken together. Can you describe your example in more detail?

