ABX infallible?, What about type I/II errors? 
Apr 11 2006, 04:30
Post
#1


Group: Members Posts: 2526 Joined: 25 July 02 From: South Korea Member No.: 2782
Hi, I have a question.
From http://www.hydrogenaudio.org/forums/index....showtopic=31945 :
You're on the wrong forum, buddy. ABX tests are infallible.... Infallible?? Well, if you say so. I stand corrected. One is entitled to his opinion, but once he decides to share it, it must be backed up by evidence.
I don't have a really firm grasp of statistics, but I remember hearing that there's a tradeoff between the possibility of type I errors and the possibility of type II errors. In other words, I thought there was a tradeoff between false positives and false negatives. If that's true, how does the double-blind ABX method account for it? Or is that unnecessary? Thank you in advance.
http://blacksun.ivyro.net/vorbis/vorbisfaq.htm



Apr 11 2006, 16:43
Post
#2


Moderator Group: Super Moderator Posts: 3936 Joined: 29 September 01 Member No.: 73
You divide by zero!
Seriously, of course ABX tests are not infallible. This is not directly related to type I or type II errors, as their relationship relies on the assumption that listeners always give a given proportion of wrong answers. Someone gave an excellent summary of the drawbacks of ABX testing on a French forum: http://chaud7.forumgratuit.com/viewtopic....r=asc&start=450 However, since the text is almost incomprehensible even for native French speakers, I'll have to summarize it. Most often, it is accepted that a result whose probability of arising by chance alone is smaller than 1/20 is "statistically significant". There is no interpretation involved: this p value is the result of a mathematical calculation relying only on what has been observed. Former results from similar tests, the quality of the test, and other statistical calculations are not taken into account. Yet these factors do have an influence on the probability that the observed difference is real.
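To make the p value concrete: for an ABX session it is usually computed as a one-sided binomial tail, that is, the chance of scoring at least as many correct answers by pure guessing. Here is a minimal Python sketch, assuming every trial is an independent 50/50 guess under the null hypothesis. The function name abx_p_value is made up for illustration and does not come from any ABX tool.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` answers right out of `trials`
    when every answer is a pure 50/50 guess (one-sided binomial tail)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Count the guess sequences with k hits for every k >= correct,
    # then divide by the total number of equally likely sequences.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


# A few hypothetical sessions: (correct answers, total trials).
for correct, trials in [(8, 8), (12, 16), (14, 16)]:
    p = abx_p_value(correct, trials)
    verdict = "below 1/20" if p < 1 / 20 else "not below 1/20"
    print(f"{correct}/{trials}: p = {p:.4f} ({verdict})")
```

Note that the calculation only looks at the score itself. Nothing about earlier tests, the quality of the setup, or how plausible the difference is enters into it.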
The original text is much longer, with some repetitions and other ideas that I didn't translate because they are not directly related to the reliability of ABX tests. I would like, however, to add an important point: the interpretation of the p value. By convention, p<5% is considered an interesting result, and p<1% a very significant one. This does not take into account the tested hypothesis itself.
Suppose we are testing the existence of Superman and get a positive answer, that is, "Superman really exists because the probability of the null hypothesis is less than 5%". Must we accept the existence of Superman? Is it an infallible, scientific proof of his existence? No, it's just chance. Getting an event whose probability is less than 5% is not uncommon. However, when a listening test on MP3 at 96 kbps gives a similarly significant result, we accept the opposite conclusion: that it was not chance. Why? Why should the same scientific result be interpreted in two opposite ways?
This is because we always keep the most probable hypothesis. The conclusion of an ABX test is not the p value alone; it is its comparison with the subjective p value of the tested hypothesis. Testing MP3 at 96 kbps, what do we expect? Anything. We start with the assumption that the odds of success are 1/2. The ABX result then tells us that the odds of failure are less than 1/20. Conclusion: success is the most probable hypothesis. Testing the existence of Superman, what do we expect? That he does not exist. We start with the assumption that the odds of success are less than one in a million. The ABX result then tells us that the odds of failure are less than 1/20. Conclusion: failure is still the most probable hypothesis.
That's why, in addition to all the statistical biases already mentioned above, we should not always take 1/20 or 1/100 as the target final p value. This is correct for tests where we don't expect one result more than another, but for tests where scientific knowledge already gives some information, smaller values can be necessary. Personally, in order to test the existence of Superman, I'd rather target p<1/100,000,000.
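The comparison between the prior expectation and the test result can be sketched with Bayes' rule. This is only an illustration under loud assumptions: the evidence is reduced to "the test came out significant at level alpha", and a real effect is assumed to always produce a significant result (power = 1.0), which flatters the tested hypothesis. The name posterior_real and the power parameter are made up for this example.

```python
def posterior_real(prior: float, alpha: float, power: float = 1.0) -> float:
    """Probability that the effect is real, given a result significant at
    level `alpha`, starting from a prior belief `prior` (Bayes' rule).

    `power` is the assumed chance that a real effect yields a significant
    result; 1.0 is an optimistic assumption, not a measured value.
    """
    true_positive = prior * power            # effect is real and the test shows it
    false_positive = (1.0 - prior) * alpha   # no effect, significant by chance
    return true_positive / (true_positive + false_positive)


# Priors and thresholds taken from the examples in the post.
cases = [
    ("MP3 at 96 kbps, p < 1/20", 0.5, 1 / 20),
    ("Superman, p < 1/20", 1e-6, 1 / 20),
    ("Superman, p < 1/100,000,000", 1e-6, 1e-8),
]
for label, prior, alpha in cases:
    print(f"{label}: posterior = {posterior_real(prior, alpha):.6f}")
```

Under these assumptions the MP3 case ends up at roughly 0.95, the Superman case at p<1/20 stays at about two in a hundred thousand, and only the 1/100,000,000 threshold lifts Superman to about 0.99. The same p value leads to opposite conclusions purely because the starting expectations differ.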


Apr 14 2006, 00:22
Post
#3


Group: Members Posts: 2181 Joined: 18 December 03 Member No.: 10538
Excellent post, Pio. May I suggest that it be added to the "What is a blind ABX test?" pinned thread in General Audio?
(Actually, it seems to me that that pinned thread should be in *this* forum too.)

