How do I actually perform ABX tests?
d_headshot
post Dec 29 2011, 08:19
Post #1
I have the ABX plugin with foobar, and I did some playing around with two files and didn't really know if I was doing the test correctly. Am I supposed to be comparing the difference between A and B or am I supposed to be looking for similarities between X and Y?
musicreo
post Dec 29 2011, 10:42
Post #2
You have to decide if X or Y is A or B.
dhromed
post Dec 29 2011, 11:04
Post #3
A and B are known, X and Y are randomized. After listening intently to all four A, B, X and Y, you should be able to say whether A is X or Y, and same for B.

You must repeat this process a number of times ("trials") in a single ABX session, preferably 10-20. Your score will either hover around 50%, or the reported probability that you were merely guessing will approach zero. In the 50% case, you were clearly guessing, and thus incapable of hearing a difference between A and B. In the second case, there's a good chance you can hear the difference.

Try it once with completely different songs to be sure of what you're doing.

ABX tests prove beyond a shadow of a doubt whether a person can hear a difference or not.
ABX tests are wholly incapable of determining which item has better quality.
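To make the procedure concrete, here is a minimal sketch of the same logic in plain Python (an illustration only, not the foobar2000 plugin; the function and parameter names are made up): each trial secretly assigns X to A or B, the listener answers, and correct answers are counted.

CODE
import random

def run_abx_session(can_hear_difference, trials=16, hit_rate=0.9):
    """Simulate one ABX session and return the number of correct trials.

    If can_hear_difference is False, every answer is a coin flip;
    if True, the listener identifies X correctly with probability hit_rate.
    """
    correct = 0
    for _ in range(trials):
        x_is_a = random.choice([True, False])          # X is secretly A or B
        if can_hear_difference:
            answer = x_is_a if random.random() < hit_rate else not x_is_a
        else:
            answer = random.choice([True, False])      # pure guessing
        correct += (answer == x_is_a)
    return correct

print(run_abx_session(False))  # typically hovers near 8/16
print(run_abx_session(True))   # typically 14-16 out of 16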

This post has been edited by dhromed: Dec 29 2011, 11:14
krabapple
post Dec 30 2011, 07:20
Post #4
QUOTE (dhromed @ Dec 29 2011, 06:04) *
ABX tests prove beyond a shadow of a doubt whether a person can hear a difference or not.


well...no. A 'shadow of a doubt' is exactly what remains in a statistics-based result. We quantify how large that shadow is, via the p value...in the case of a 'no difference' conclusion, it's our willingness to risk a false negative result. For a p=.05 (a typical, though not necessarily appropriate, value for such tests) we accept a 1-in-20 chance that our results were merely a fluke, rather than being informative. That's the shadow hanging over our conclusion.

You can shrink this, and your conclusion can lie far beyond any *reasonable* doubt, but it never actually reaches zero.
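To put a number on that shadow: the probability an ABX tool reports is, in the usual setup, the chance of scoring at least that well by flipping a coin, i.e. a one-sided binomial p-value. A small sketch using only the standard library (the function name is mine, not any tool's):

CODE
from math import comb

def guessing_probability(correct, trials):
    """Chance of getting at least `correct` of `trials` right by pure guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

print(guessing_probability(8, 10))    # ~0.055  -- just misses p = 0.05
print(guessing_probability(9, 10))    # ~0.011
print(guessing_probability(16, 16))   # ~0.000015 -- tiny, but never exactly zero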

This post has been edited by krabapple: Dec 30 2011, 07:21
dhromed
post Dec 30 2011, 09:55
Post #5
pf, nitpickins. ;)

But of course, yes, there are facts, and then there are statistics.
pdq
post Dec 30 2011, 14:36
Post #6
Or, as Mark Twain would say, "Lies, Damned Lies and Statistics".
greynol
post Dec 30 2011, 19:40
Post #7
Also, ABX tests are designed to demonstrate perceived differences. They aren't really intended to determine if things sound the same, let alone prove that things sound the same.


--------------------
Your eyes cannot hear.
MichaelW
post Dec 30 2011, 23:45
Post #8
QUOTE (dhromed @ Dec 30 2011, 21:55) *
pf, nitpickins. ;)

But of course, yes, there are facts, and then there are statistics.


http://xkcd.com/882/
krabapple
post Dec 31 2011, 20:48
Post #9
QUOTE (MichaelW @ Dec 30 2011, 18:45) *
QUOTE (dhromed @ Dec 30 2011, 21:55) *
pf, nitpickins. ;)

But of course, yes, there are facts, and then there are statistics.


http://xkcd.com/882/


ooh I like that!

But I think, nitpickingly speaking, that one would have to perform the *green* test 20 times to make that 1-in-20 point. OR show that a different color gets a 'significant' result in the next 20 rounds.
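For what it's worth, the 1-in-20 point in the comic can be made precise: run 20 independent tests on pure noise at p = 0.05 and the chance that at least one of them comes out "significant" is roughly 64%. A quick back-of-the-envelope sketch (the numbers are just the comic's):

CODE
alpha = 0.05   # significance threshold per test
tests = 20     # one test per jelly bean colour

# chance that at least one null test looks "significant" by luck alone
print(1 - (1 - alpha) ** tests)           # ~0.64

# one common fix (Bonferroni): test each colour at alpha / tests instead
print(1 - (1 - alpha / tests) ** tests)   # ~0.049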



This post has been edited by krabapple: Dec 31 2011, 20:50
mzil
post Jan 6 2012, 05:14
Post #10
QUOTE (MichaelW @ Dec 30 2011, 17:45) *
http://xkcd.com/882/

Ha! That's good.

Replace "acne" with "a rare form of bone cancer" and "green jelly beans" with "fluoridated water" and you have a real story from the news I have read about. The fluoride scaremonger sites reduce it to simply "fluoride causes bone cancer", but when you look into it, it turns out an undergraduate decided to take a much larger, existing study, which found no correlation, and she broke it down into bunches of smaller sub groups. Sure enough, a certain age category of young boys showed a (slight) correlation between fluoridated water intake and this particular cancer, yet girls the same age didn't, nor did any other age group.

I would think there must be a name for this kind of error; does anyone know what it is?

This post has been edited by mzil: Jan 6 2012, 05:31
nesf
post Jan 6 2012, 09:37
Post #11
QUOTE (mzil @ Jan 6 2012, 04:14) *
I would think there must be a name for this kind of error; does anyone know what it is?


I've seen it called clusters (if memory serves). Take any randomly distributed dataset, like cancer incidence over a big enough area. Even if it's perfectly random, there will be clusters within this dataset. So a particular town could have a high incidence of, say, brain cancer and also happen to have overhead power lines running through it. The brain thinks these two have to be related even though it's just statistical noise. There are huge problems with this in statistics for obvious reasons.

Edit: From a quick Google, my memory is failing me; it's not called clusters. It's a sampling-error problem that comes from using a small section of a population (and thus a serious problem in small tests if they are not repeated elsewhere). You can't know a priori whether your small sample happens to contain people/things from one of these clusters. This all assumes that what you're looking at is normally distributed, rather than something with a fat-tailed distribution (i.e. more chance of extreme events than a normal distribution predicts), which complicates things further.
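That point is easy to demonstrate with a quick Monte Carlo sketch (the numbers below are invented purely for illustration): give every town exactly the same underlying risk and some towns will still look alarming by chance alone.

CODE
import random

random.seed(1)
towns, population, risk = 200, 1000, 0.01   # identical risk in every town

# number of cases in each town, all drawn from the same underlying rate
cases = [sum(random.random() < risk for _ in range(population)) for _ in range(towns)]

print("expected cases per town:", population * risk)   # 10
print("unluckiest town:", max(cases))                  # often around double that
print("luckiest town:", min(cases))                    # sometimes near zero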

Sorry, for all the edits, just woke up.

This post has been edited by nesf: Jan 6 2012, 09:49
Porcus
post Jan 6 2012, 12:04
Post #12
Clusters are something else.

Dunno what the universal term for this is, but in insurance and in certain branches of economics it is called «selection». Simply put, you select the dice after you have rolled them. (Google «adverse selection» -- there, you choose whether to act after seeing your roll, while everyone else still treats the dice as random.) In this case: you do N trials and report the best, while the uninformed public thinks it is a random draw.

I guess the prototypical joke is this science demonstration at some public fair:
Scientist equips the audience with dice and tells them to roll N times and record the outcome.
Scientist collects the data, draws the histogram on the overhead projector, explains the theory and opens for questions.
Journalist asks to get a picture to the newspaper story of the guy who was so good at rolling dice.
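The joke is easy to reproduce in a few lines (made-up numbers again): hand fair dice to a crowd and somebody will look like a talented roller.

CODE
import random

random.seed(0)
audience, rolls = 100, 10

# everyone rolls a fair die `rolls` times; record each person's average
averages = [sum(random.randint(1, 6) for _ in range(rolls)) / rolls
            for _ in range(audience)]

print("expected average:", 3.5)
print("the 'star' roller averaged:", max(averages))   # typically well above 4, purely by luck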

This post has been edited by Porcus: Jan 6 2012, 12:09


--------------------
One day in the Year of the Fox came a time remembered well
SoAnIs
post Jan 11 2012, 01:07
Post #13
It's the "look-elsewhere effect". Related to the law of very large numbers: with a large enough sample size, any possible event will eventually occur. If you select sample sets from a larger total some will contain such very rare events.
nesf
post Jan 11 2012, 08:50
Post #14
QUOTE (Porcus @ Jan 6 2012, 11:04) *
I guess the prototypical joke is this science demonstration at some public fair:
Scientist equips the audience with dice and tells them to roll N times and record the outcome.
Scientist collects the data, draws the histogram on the overhead projector, explains the theory and opens for questions.
Journalist asks to get a picture to the newspaper story of the guy who was so good at rolling dice.



QUOTE (SoAnIs @ Jan 11 2012, 00:07) *
It's the "look-elsewhere effect". Related to the law of very large numbers: with a large enough sample size, any possible event will eventually occur. If you select sample sets from a larger total some will contain such very rare events.



Yeah, my memory is crap; I did all this stuff in college and forgot the names for it. The phrase "look-elsewhere effect" doesn't ring any bells for me, though; I think we called it something else. It was always presented to us as illness "clusters", i.e. a town with, say, a high rate of mental retardation that also happened to have fluoridated water, as a lesson in cause, effect and spurious correlation. I think it's slightly different to what you're talking about, SoAnIs: it's more about finding unusual rates of something within a subset of the population that aren't consistent with the population rate than about finding a rare event within a sample. Something along the lines of "let me pick my sample and I can prove anything."

It's like how in a randomly distributed dataset there will be clusters of a particular event happening or not happening. Say heart disease were randomly distributed and we looked at a nation's distribution of it: we would, by chance, find towns and villages with very high or very low rates of heart disease. Some people take these high or low rates and automatically assume there must be some causal factor behind them, when really they can just be a product of chance. Like the journalist in Porcus' joke. :)

This post has been edited by nesf: Jan 11 2012, 09:07