listening tests help, for an audio codec I developed
legg
post Jun 18 2005, 19:51
Post #1





Group: Members
Posts: 175
Joined: 5-March 05
From: Morelia, Mexico
Member No.: 20386



How can I rate how good or bad the audio codec will be on average?

I'm thinking of doing at least 21 ABX trials on each file, and if the subject manages to tell a difference and the guess probability is low, I'd also use a 1-5 point scale to measure the perceived quality.

So far I have analyzed one subject. He did pretty well on most tests, except one, where the results were these:
15 out of 29, pval = 0.500

AFAIK, p<0.05 is good enough to be confident that he wasn't just guessing, but what about the rest?
How do I interpret that? Is the codec transparent for him on that test?
What are the ranges of the guess probabilities and how should I interpret them?
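
For reference, the "pval" reported by ABX tools is normally the one-sided binomial probability of scoring at least that many correct answers by pure guessing. A minimal sketch of that calculation follows; the function name is made up for illustration and is not taken from any particular ABX program.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` right out of `trials` by guessing (50/50)."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

print(f"{abx_p_value(15, 29):.3f}")  # 0.500, the result quoted above
print(f"{abx_p_value(21, 29):.3f}")  # 0.012
```

So 15 out of 29 is exactly what a coin flip would be expected to produce, while 21 out of 29 would already fall below 0.05.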

Thanks in advance.

This post has been edited by Luis G: Jun 18 2005, 19:52


--------------------
Home page: http://lc.fie.umich.mx/~legg/indexen.php
guruboolez
post Jun 18 2005, 20:18
Post #2





Group: Members (Donating)
Posts: 3474
Joined: 7-November 01
From: Strasbourg (France)
Member No.: 420



You should read Pio2001's explanation: it's very clear and complete.
http://www.hydrogenaudio.org/forums/index....howtopic=16295&
Digga
post Jun 18 2005, 20:28
Post #3





Group: Members
Posts: 1047
Joined: 28-June 03
From: on the dock of the bay
Member No.: 7423



QUOTE (Luis G @ Jun 18 2005, 07:51 PM)
So far I have analyzed one subject. He did pretty well on most tests, except one, where the results were these:
15 out of 29, pval = 0.500

AFAIK, p<0.05 is good enough to be confident that he wasn't just guessing, but what about the rest?
How do I interpret that? Is the codec transparent for him on that test?
What are the ranges of the guess probabilities and how should I interpret them?

if you say he did pretty well except on this one, that means he could tell the difference, right? for the codec it would be more flattering (i.e. better) if he couldn't tell the difference, which would mean it's transparent for him and he's guessing whether X is A or B.

anyway, either alpha=0.01 or alpha=0.05 is generally chosen.
btw, you are never certain: the p-value is just the probability (very low, in that case) of getting such a score by guessing, so the listener is considered not to be guessing.
a pval of 0.5 would be the reference for guessing.
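
To make the alpha levels concrete, here is a small sketch that finds the lowest score reaching a given alpha for a fixed number of trials. It assumes a one-sided binomial test with a 50% guessing rate; the helper name is hypothetical.

```python
from math import comb

def min_correct(trials: int, alpha: float) -> int:
    """Smallest score whose one-sided guessing probability does not exceed alpha."""
    tail = 0.0
    # Walk down from a perfect score, accumulating P(X >= k) under pure guessing.
    for k in range(trials, -1, -1):
        tail += comb(trials, k) / 2 ** trials
        if tail > alpha:
            return k + 1  # k is already too likely by chance; k + 1 was the last passing score
    return 0

print(min_correct(29, 0.05))  # 20
print(min_correct(29, 0.01))  # 22
print(min_correct(16, 0.05))  # 12
```

A return value of `trials + 1` would mean that even a perfect score cannot reach the chosen alpha with that few trials.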

for further more in depth info, look here.

edit: damn am I slow...

This post has been edited by Digga: Jun 18 2005, 20:29


--------------------
Nothing but a Heartache - Since I found my Baby ;)
legg
post Jun 19 2005, 16:28
Post #4





Group: Members
Posts: 175
Joined: 5-March 05
From: Morelia, Mexico
Member No.: 20386



QUOTE (Digga @ Jun 18 2005, 01:28 PM)
if you say he did pretty well except on this one, that means he could tell the difference, right? for the codec it would be more flattering (i.e. better) if he couldn't tell the difference, which would mean it's transparent for him and he's guessing whether X is A or B.

anyway, either alpha=0.01 or alpha=0.05 is generally chosen.
btw, you are never certain: the p-value is just the probability (very low, in that case) of getting such a score by guessing, so the listener is considered not to be guessing.
a pval of 0.5 would be the reference for guessing.

for further more in depth info, look here.

edit: damn am I slow...



Yes, I have read that thread, but it doesn't mention how to deal with the results when p>0.05, which is the case here. How should an interval like 0.05<p<0.25 be dealt with? It certainly doesn't say much about the codec, and I would hardly classify it as transparency. IMO p>0.5 means transparency, but I wanted to check with you gurus about this.

I thought that anyone could rate the quality, but what if the subject merely rated the correct file by chance? I need more certainty, so I'm using ABX as an indicator of the trustworthiness of the subject's ratings. Is this correct?
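
One way to look at the in-between p-values is to ask how often a listener who really does hear something would still fail the test. The sketch below assumes a hypothetical listener who picks correctly 60% of the time (that figure is invented for illustration) and a pass mark of 20 out of 29, which is the lowest score giving p<0.05 for 29 trials.

```python
from math import comb

def pass_probability(trials: int, threshold: int, hit_rate: float) -> float:
    """Probability of scoring at least `threshold` when each trial is right with `hit_rate`."""
    return sum(
        comb(trials, k) * hit_rate ** k * (1 - hit_rate) ** (trials - k)
        for k in range(threshold, trials + 1)
    )

# Pure guesser versus a listener with a small but real ability (assumed 60%).
print(f"guessing: {pass_probability(29, 20, 0.5):.3f}")
print(f"60% hit rate: {pass_probability(29, 20, 0.6):.3f}")
```

Under these assumptions the 60% listener misses the pass mark most of the time, so a result with p>0.05 on its own does not demonstrate transparency; it may only mean the test was too short to detect a small difference.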

Btw, I'm more interested in rating the subjective quality of the codec.

Thanks again.

This post has been edited by Luis G: Jun 19 2005, 16:46


ff123
post Jun 19 2005, 17:34
Post #5


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



If it were me, I'd just choose a whole bunch of different samples (say 30), and then rate each one against the reference using abc-hr. Then I'd plug the results into a statistical calculator (http://ff123.net/friedman/stats.html) to determine first if you found a significant difference from the reference and second how much that difference is. No ABX'ing is involved, plus you get a better indicator of codec quality by sampling a lot of different music.

BTW, I would also keep the samples where you rate the reference, rather than throw these cases out. So if you make some mistakes, the reference will average something less than 5.0.
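
As a rough illustration of the idea of testing ratings against the reference, here is a sketch of a paired sign-flip permutation test on per-sample ratings. This is not the method used by the calculator linked above, just a simple stand-in that needs only the standard library, and the ratings in the usage lines are invented.

```python
import random
from statistics import mean

def sign_flip_p_value(reference, codec, rounds=5000, seed=0):
    """Two-sided Monte Carlo p-value for the mean paired difference (reference - codec)."""
    diffs = [r - c for r, c in zip(reference, codec)]
    observed = abs(mean(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(rounds):
        # If the codec were indistinguishable, each difference could equally have the opposite sign.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    return (hits + 1) / (rounds + 1)

# Hypothetical ABC/HR ratings for 10 samples; the reference is below 5.0 where it was misidentified.
reference = [5.0, 5.0, 4.5, 5.0, 5.0, 5.0, 4.0, 5.0, 5.0, 5.0]
codec = [4.0, 4.5, 5.0, 3.5, 4.0, 4.5, 5.0, 4.0, 3.0, 4.5]
print(f"mean difference: {mean(r - c for r, c in zip(reference, codec)):.2f}")
print(f"p-value: {sign_flip_p_value(reference, codec):.4f}")
```

The mean difference gives the size of the quality gap and the p-value says whether it is distinguishable from zero, which mirrors the two questions asked above.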

ff123
guruboolez
post Jun 19 2005, 18:55
Post #6





Group: Members (Donating)
Posts: 3474
Joined: 7-November 01
From: Strasbourg (France)
Member No.: 420



QUOTE (ff123 @ Jun 19 2005, 05:34 PM)
If it were me, I'd just choose a whole bunch of different samples (say 30), and then rate each one against the reference using abc-hr.


I'm going off-topic, but I've an important question.
I'm trying to build a complete set of classical music samples to replace the usual suite of 15 samples I've been using for the last 18 months. My goal is to gather 100 samples, covering many instruments: solo, chamber, orchestral, lyrical, noisy or not noisy, quiet and loud, etc. But I'm realizing that making ABX comparisons with so many samples would be a Herculean task.
What would be the best thing in your opinion:
- 100 samples rated in ABC/HR without ABX
- 15...20 samples rated in ABC/HR + ABX confirmation?
ff123
post Jun 19 2005, 19:27
Post #7


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



For an experienced listener like you, I'd definitely dump the ABX and just go with the ratings.
guruboolez
post Jun 19 2005, 19:39
Post #8





Group: Members (Donating)
Posts: 3474
Joined: 7-November 01
From: Strasbourg (France)
Member No.: 420



Good point :)
I have to be more specific. Are such tests valid (I mean statistically) or, more precisely, do both kinds of tests have the same level of validity?

I'm asking because I usually publish the results of my tests, and I always try to avoid criticism. I just fear that a big listening test including 50 or 100 samples without ABX confirmation will be contested. Should I keep this kind of test private and favour ABX for public ones, or would you find the publication of an ABC/HR-only listening test involving many more samples more interesting?
ff123
post Jun 19 2005, 19:45
Post #9


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



Individual results will have increased uncertainty, but overall the results will be more representative of a codec's true worth, even if you make mistakes on a few samples.

And yes, you can still say something statistically about the codecs.

ff123

This post has been edited by ff123: Jun 19 2005, 19:46
guruboolez
post Jun 19 2005, 20:06
Post #10





Group: Members (Donating)
Posts: 3474
Joined: 7-November 01
From: Strasbourg (France)
Member No.: 420



That's encouraging. A listening test involving a lot of samples (and therefore introducing much greater diversity) without the need to listen to each one ~50 times is much less boring for the listener. Less stressful too (I'm often frightened when I have to click on the "view results" button).

Thank you for your valuable help with statistics and listening test methodology :)


Luis G> Could I ask what kind of codec you are developing?
legg
post Jun 20 2005, 01:52
Post #11





Group: Members
Posts: 175
Joined: 5-March 05
From: Morelia, Mexico
Member No.: 20386



ff123, currently I'm using 8 samples: castanets, finger snaps, french horns, timpani, triangle, trumpets 1, trumpets 2, violins 1 and violins 2. I'm also planning to add male and female voice samples, and perhaps a rock/pop/jazz test as well. How many subjects should I use to get a significant result?

guruboolez, it is a transform codec (MDCT) with 25 bands, one for each Bark. The data rate is variable and is continuously adjusted to the demands of the signal to achieve good quality. Expected data rates range from 60 to 340 kbps.
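
For readers unfamiliar with Bark bands, the following sketch shows one common way to group MDCT bins into them, using the Zwicker and Terhardt approximation of the Bark scale. The transform size (1024 bins) and the 44.1 kHz sample rate are assumptions made for the example; the actual band layout of the codec described above is not given in this thread.

```python
from math import atan

def bark(freq_hz: float) -> float:
    """Zwicker and Terhardt approximation of the Bark scale."""
    return 13.0 * atan(0.00076 * freq_hz) + 3.5 * atan((freq_hz / 7500.0) ** 2)

def bark_band_of_bins(num_bins: int = 1024, sample_rate: int = 44100) -> list:
    """Bark band index (0-based) for the centre frequency of each MDCT bin."""
    bin_width = sample_rate / (2 * num_bins)  # Hz covered by one bin
    return [int(bark((k + 0.5) * bin_width)) for k in range(num_bins)]

bands = bark_band_of_bins()
print(len(set(bands)))  # 25 bands up to the Nyquist frequency
print(bands.count(0), bands.count(24))  # bins in the lowest and highest band
```

With these assumed settings the low bands hold only a handful of bins each, while the top band covers several hundred.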


Greetings.

This post has been edited by Luis G: Jun 20 2005, 01:52


ff123
post Jun 20 2005, 02:22
Post #12


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



I would double the number of samples from 8 to 16.

ff123
legg
post Jun 20 2005, 03:49
Post #13





Group: Members
Posts: 175
Joined: 5-March 05
From: Morelia, Mexico
Member No.: 20386



QUOTE (ff123 @ Jun 19 2005, 07:22 PM)
I would double the number of samples from 8 to 16.

ff123


And how many people should take the test?
I was thinking somewhere between 20 and 50.


ff123
post Jun 20 2005, 05:53
Post #14


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



Heheheh. Yes, if you can. Good luck.

ff123
