Minimum number of required ABX trials, Split from topic ID: 92851
sauvage78
post Jan 12 2012, 00:41
Post #1






The minimum number of trials depends on how successful you are.
How quickly you succeed shows how confident you are in what you hear.

The time taken and the number of successful trials are tied together; you should never separate them when judging an ABX log.

With the foobar2000 (F2K) ABX component, 8 successful trials in a row is the minimum for me (and if they are all successful in a row, it usually means they were quick).

As soon as you begin to fail, you can easily increase to 10 or 12 trials to try to "erase" your failures.
In this case, if you fail once or twice you can usually still get a significant result, although it usually means the ABXing was hard and consequently took longer as you began to hesitate.

Usually, if you fail more than 3 times in 12 trials, it has become so hard and you hesitate so much that the ABX takes forever. At this stage I usually give up and declare that I cannot ABX it, as in general it means I am not sure that the passage I am focusing on actually contains any real artifact.
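
To put rough numbers on the scores mentioned above, here is a minimal sketch that computes the one-sided binomial p-value, i.e. the chance of getting at least that many correct answers by pure guessing. The function name `abx_p_value` and the 0.05 threshold used in the note below are assumptions made for illustration, not something the ABX component or this post prescribes.

```python
from math import comb

def abx_p_value(correct, trials):
    """Probability of scoring at least `correct` out of `trials` by coin-flip guessing."""
    # Count the outcomes with `correct` or more hits among 2**trials equally likely ones.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# Scores discussed above: a clean run of 8, then 10 or 12 trials with some failures.
for correct, trials in [(8, 8), (9, 10), (8, 10), (11, 12), (10, 12), (9, 12), (8, 12)]:
    print(f"{correct}/{trials}: p = {abx_p_value(correct, trials):.4f}")
```

Under these assumptions 8/8 gives p of about 0.004, one or two failures in 12 trials give about 0.003 and 0.019, three failures in 12 give about 0.073, and four failures give about 0.19. With a 0.05 threshold, the result therefore stops being significant at three failures in 12.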

This post has been edited by sauvage78: Jan 12 2012, 00:42


--------------------
CDImage+CUE
Secure [Low/C2/AR(2)]
Flac -4
 
apodtele
post Jan 12 2012, 17:09
Post #2






Please read Fallacy of p-value.

I just want to point out that the p-value is only a measure of confidence that one sees a difference, however small it is. The p-value is not a measure of how big the difference is. The p-value is not a measure of quality. Using too many tests will always produce a good p-value. The goal, however, is to accurately estimate the percentage of correct answers. The closer it is to 50%, the harder it gets.

Do we care about 55%, or 60%, or 80% of correct guesses as a hypothesis? This is what should be the subject of this thread. Then we can calculate the number of tests necessary to test this hypothesis.
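
As a rough sketch of the calculation suggested above: for an assumed true rate of correct answers, one can search for the smallest number of trials at which an exact one-sided binomial test against guessing (50%) would detect it with a chosen probability. The function names, the 0.05 significance level, and the 0.8 target power are all assumptions made for illustration; nothing in this thread fixes them.

```python
import math

def log_pmf(n, k, p):
    """Log of the Binomial(n, p) probability of exactly k successes (needs 0 < p < 1)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def critical_score(n, alpha=0.05):
    """Smallest score whose guessing-only tail probability is <= alpha, or None."""
    tail, best = 0.0, None
    for k in range(n, -1, -1):          # accumulate P(X >= k) from the top score down
        tail += math.exp(log_pmf(n, k, 0.5))
        if tail > alpha:
            break
        best = k
    return best

def power(n, p_true, alpha=0.05):
    """Chance of reaching the critical score when the true hit rate is p_true."""
    k_crit = critical_score(n, alpha)
    if k_crit is None:                  # too few trials for any score to be significant
        return 0.0
    return sum(math.exp(log_pmf(n, k, p_true)) for k in range(k_crit, n + 1))

def trials_needed(p_true, alpha=0.05, target_power=0.8, max_n=2000):
    """First n (up to max_n) whose power reaches target_power, with its critical score."""
    # Power is saw-toothed in n for a discrete test, so slightly larger n can dip below.
    for n in range(1, max_n + 1):
        if power(n, p_true, alpha) >= target_power:
            return n, critical_score(n, alpha)
    return None

if __name__ == "__main__":
    for rate in (0.55, 0.60, 0.80):
        print(rate, trials_needed(rate))
```

The closer the hypothesised rate is to 50%, the more trials the search returns, which is the point made above about small differences being hard to establish.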
Porcus
post Jan 12 2012, 22:07
Post #3






QUOTE (apodtele @ Jan 12 2012, 17:09)
Please read Fallacy of p-value.

I just want to point out that the p-value is only a measure of confidence that one sees a difference, however small it is. The p-value is not a measure of how big the difference is. The p-value is not a measure of quality. Using too many tests will always produce a good p-value. The goal, however, is to accurately estimate the percentage of correct answers. The closer it is to 50%, the harder it gets.



While I agree with a great deal of the article's content, I do not buy into your subsequent point. Sure, it would be a good thing to be able to present some «The probability that I can guess X=A or X=B correctly in an ABX trial with these two samples is in [range]», but I guess that we are better off with a slightly lesser ambition for the TOS#8 rule.

The «purpose» of TOS#8 is of course open to interpretation, so here is mine: to get rid of unfounded statements. At least a certain kind of them, namely those which claim some kind of audible difference (usually a «better than» statement). Enter ABX: if you cannot tell the difference in a blind setting, then your statement concerning the audible differences is unfounded. Ideally, we would want to measure whether your statements are true, but we don't do that -- if you pass the «I can tell the difference» test, then you are free to speak all kinds of nonsense about what the difference is.


And then comes the interpretation of an ABX session. Again, we would like to test whether the statements are true, but again, we settle for a lower level of ambition: how often would the coin outperform you at identifying? We set a certain standard for this.


QUOTE
Using too many tests will always produce a good p-value.


If you do guess better than the coin, it does. And this is an issue if we are worried about a case where a journalist blows up a small and unimportant difference into a «Scientists say there is no shadow of a doubt anymore: A is different from B» headline. But that is a luxury problem.
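
A small illustration of that effect, as a sketch only: hold the observed hit rate at 55% and let the number of trials grow, and the one-sided binomial p-value against coin-flip guessing keeps shrinking. The 55% rate, the trial counts, and the function names are assumptions picked for the example.

```python
from math import comb

def abx_p_value(correct, trials):
    """Probability of scoring at least `correct` out of `trials` by coin-flip guessing."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

def p_value_at_fixed_rate(trials, percent_correct=55):
    """p-value for a session whose score is exactly percent_correct % of `trials`."""
    correct = trials * percent_correct // 100   # integer score, e.g. 55 out of 100
    return abx_p_value(correct, trials)

# Same 55% hit rate each time; only the session length changes.
for trials in (20, 100, 400, 1000):
    print(f"{trials} trials at 55% correct: p = {p_value_at_fixed_rate(trials):.4f}")
```

The hit rate, and so the size of the difference, is identical in every row; only the confidence that it is not chance changes.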


--------------------
One day in the Year of the Fox came a time remembered well
