Decent ABX testing on Mac OS X: a solution
Jplus
Posted Feb 12 2013, 01:48
Post #1

There have been several topics on this issue, e.g. here, here, here and here. Since all of these topics have been silent for at least half a year and since the solution I'm offering here hasn't been discussed in any of them, I hope I did the right thing by starting a new topic.

The state of the art: there exist a few test tools that work on Mac OS X, but none of them is a decent ABX tool in the way that foobar2000 is a decent ABX tool for Windows.
  • ABXer: some people are quite content with it, but it seems to have several problems, including being undocumented and unmaintained.
  • ABC/HR for Java: it appears to be a pretty good tool, but it isn't meant for ABX.
  • ABXTester: a responsive, native Mac application that is available from the Mac App Store, but very limited in functionality.

My solution: take ABXTester as a starting point and write my own scripts to provide the missing functionality. I'm sharing my scripts with the community by posting them here together with a complete manual. (This is a long post, but don't worry: it's all very easy. After the installation section you might be able to understand everything by reading only the headers, the italics and the codeboxes.)

What ABXTester already provides:
  • A graphical user interface (using QuickTime).
  • Picking A and B and randomizing 5 Xs; lets you play the tracks as often as you want in any order you want.
  • Displaying the number of correct identifications in a popup window, as a percentage.
  • Preparing a new batch of 5 random Xs, as often as you want.

Note that in principle you can test anything you want with ABXTester, even if the format isn't natively supported by QuickTime. In the latter case you can either use a QuickTime plugin such as Perian or Flip4Mac, or convert the file to WAV, ALAC or AIFF.

What ABXTester doesn't provide yet:
  • Keeping track of what you scored before you generated your current batch of Xs.
  • Calculating the p-value, which is the probability that someone would get your score (or better) by random guessing.
  • Generating a neatly formatted log (a log doesn't make your test any more reliable or statistically valid, but it's still nice to have).

What you need to fill in the missing functionality:
  • My scripts (see codebox at the bottom of this post).
  • R, a free statistical computing environment that will run my scripts.

I assume that you have no prior experience with R.

Installing the scripts
Install R and open the application. Copy the contents of the codebox at the bottom of this post to your clipboard and paste them into the R prompt (the prompt is the angle bracket with the blinking text cursor after it, at the bottom of the white part of the window). Hit enter. If you did this correctly you'll see a message screaming about success. That's it: you only need to do this once, and you can start using the scripts straight away.

Next time you want to use the scripts, open R and enter load('.RData'). You can enter ls() in order to check that everything is where you left it. This includes all test data that you collected the previous times.

Running an ABX test
Keep the ABXTester window next to the R window so you can see both in their entirety. In R, enter recordABX('lameMP3.c128.madonna') if you want to call your test "lameMP3.c128.madonna" (note the quotes around the name). I recommend using names that contain only letters, digits and periods and start with a letter. After pressing enter you should get a special prompt with three question marks. Switch to ABXTester and choose your A and B. You can now start the actual test.
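
If you want to check a name against that recommendation before starting, base R's make.names() can help: it turns any string into a syntactically valid R name, so a name that comes back unchanged can be used with the dollar trick described further down without quotes. This is only a sketch; is.syntactic is a hypothetical helper, not part of the scripts in this post, and make.names() also accepts underscores, which is slightly more permissive than my recommendation.

```r
# Hypothetical helper (not part of the scripts in this post):
# a name is safe to use without quotes if make.names() leaves it unchanged
is.syntactic <- function (name) make.names(name) == name

is.syntactic('lameMP3.c128.madonna')  # TRUE
is.syntactic('lame mp3 128')          # FALSE, contains spaces
is.syntactic('128kbps')               # FALSE, starts with a digit
```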

Every time you complete a batch of Xs in ABXTester, cmd-tab to R and enter your result percentage at the ??? prompt (the result percentage is what ABXTester calls your accuracy rate; it shows up when you press the "Check answer!" button). You get a fractional number and a new ??? prompt in return. The fractional number is the p-value for the results you've entered so far; smaller is better. It has one star behind it if your results so far show a marginal difference and two stars if they show a clear difference (more on that below).
If you want to do more batches, cmd-tab back to ABXTester and repeat the cycle. Otherwise enter x to store your results and return to the normal prompt. Protip: if you're running a series of numbered tests, you can press the up arrow key after completing your current test to recall the line you entered to start your last test. You only need to increment the number in the name and hit enter to start your next test.
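
For a single batch, the conversion works like this: a batch has 5 Xs, so each correct X is worth 20 %, and the p-value is the probability of getting that many or more right out of 5 by guessing. The sketch below copies the one-line pvalue function from the scripts so that it runs on its own.

```r
# Same formula as pvalue() in the scripts at the bottom of this post:
# probability of ncorrect or more right out of ntrials by pure guessing
pvalue <- function (ntrials, ncorrect) .5^ntrials * sum(choose(ntrials, ncorrect:ntrials))

percentage <- 80            # what ABXTester shows after "Check answer!"
score <- percentage / 20    # 5 Xs per batch, so each X is worth 20 %
p <- pvalue(5, score)       # 4 or more out of 5 by guessing: 6/32
print(p)                    # 0.1875
```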

I recommend never doing more than 7 batches (35 Xs). Also, you should take at least one break for every 5 batches. Note that you'll need at least two batches to show a clear difference.
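
Why at least two batches? Even a perfect single batch has a p-value of 1/32, which is below the marginal threshold but well above the default clear threshold of 0.002. The sketch below checks this with the same pvalue formula as the scripts; it also shows that with exactly two batches only a perfect 10/10 is enough.

```r
pvalue <- function (ntrials, ncorrect) .5^ntrials * sum(choose(ntrials, ncorrect:ntrials))
difference.clear <- .002            # default threshold used by the scripts

one.batch.best <- pvalue(5, 5)      # 5/5 correct:   1/32    = 0.03125
two.batch.best <- pvalue(10, 10)    # 10/10 correct: 1/1024  ~ 0.00098
two.batch.nine <- pvalue(10, 9)     # 9/10 correct:  11/1024 ~ 0.0107

one.batch.best <= difference.clear  # FALSE: one batch can never be a clear difference
two.batch.best <= difference.clear  # TRUE
two.batch.nine <= difference.clear  # FALSE: with two batches only a perfect score is clear
```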

Here's an example of what your recordABX session might look like (three batches):
CODE
> recordABX('example')
??? 80
0.1875
??? 100
0.01074219 *
??? 100
0.0004882812 **
??? x
>
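
Each p-value in that session is computed on the running totals, not on the individual batch: after the second batch it's 9 out of 10, after the third 14 out of 15. The sketch below reproduces the three numbers from the session with cumsum; it copies the pvalue formula from the scripts so it runs standalone.

```r
pvalue <- function (ntrials, ncorrect) .5^ntrials * sum(choose(ntrials, ncorrect:ntrials))

scores    <- c(80, 100, 100) / 20     # the three percentages entered above: 4, 5, 5
subtotals <- cumsum(scores)           # 4, 9, 14 correct so far
ntrials   <- 5 * seq_along(scores)    # 5, 10, 15 Xs so far
p <- mapply(pvalue, ntrials, subtotals)
print(p)                              # same values as printed during the session
```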


Retrieving your results
recordABX stores your results in an object called ABXdata. To see the raw data for a test, type ABXdata and the name of your test with a dollar sign in between, then hit enter (note: if the name of your test contains anything other than letters, digits and periods, or doesn't start with a letter, you need to wrap the name in quotes). For example, this is what you get for our example above:
CODE
> ABXdata$example
[1] 4 5 5
>
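
Here is what the quoting rule looks like in practice. The test name with spaces is a hypothetical example, and ABXdata is a stand-in built on the spot rather than your real collection. Double brackets with the name in quotes are an alternative that works for any name.

```r
ABXdata <- list()                        # stand-in for your real collection
ABXdata[['example']] <- c(4, 5, 5)
ABXdata[['lame mp3 128']] <- c(3, 4)     # hypothetical test name with spaces

ABXdata$example                          # plain name: no quotes needed
ABXdata$'lame mp3 128'                   # name with spaces: quotes needed
ABXdata[['lame mp3 128']]                # double brackets work for any name
```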

You can ignore the [1]. As you can see, the results are stored as a series of numbers that tell you how many Xs were identified correctly in each batch (recordABX calculated these from the percentages you entered). You can also produce a log by wrapping the ABXdata$example thing in a call to reportABX, like this:
CODE
> reportABX(ABXdata$example)
batch score subtotal p
1 4/5 4/ 5 0.1875
2 5/5 9/10 0.01074219
3 5/5 14/15 0.0004882812
clear difference

>

As you can see, the log displays the same p-values that you got when you were recording the test. Neat, right? reportABX can create a log for any valid series of batch scores, i.e. whole numbers from 0 to 5. You create a series of numbers by separating them with commas, wrapping them in parentheses and slapping a 'c' in front, for example c(1, 2, 3, 4) (try entering that directly into the R prompt). You'll see that you get the same log as above if you enter reportABX(c(4, 5, 5)) .
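
If you type a series by hand, a quick sanity check can catch mistakes such as entering percentages instead of scores. The helper below is hypothetical and not part of the scripts; it assumes 5 Xs per batch, as everywhere in this post.

```r
# Hypothetical helper, not part of the scripts in this post:
# TRUE if every entry could be the score of one batch of 5 Xs
is.valid.series <- function (scores) {
  is.numeric(scores) && length(scores) > 0 && all(scores %in% 0:5)
}

is.valid.series(c(4, 5, 5))   # TRUE
is.valid.series(c(4, 6))      # FALSE: 6 out of 5 is impossible
is.valid.series(c(80, 100))   # FALSE: these are percentages, divide by 20 first
```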

But it gets better. If you enter report.collection() it will produce the logs for all the tests that you recorded into ABXdata, with the names at the top of each log. You can copy and paste it straight into your HA forum post.
CODE
> report.collection()
example
batch score subtotal p
1 4/5 4/ 5 0.1875
2 5/5 9/10 0.01074219
3 5/5 14/15 0.0004882812
clear difference

example2
batch score subtotal p
1 3/5 3/ 5 0.5
2 2/5 5/10 0.6230469
no difference

example3
batch score subtotal p
1 2/5 2/ 5 0.8125
2 3/5 5/10 0.6230469
3 3/5 8/15 0.5
4 4/5 12/20 0.2517223
5 5/5 17/25 0.05387607
6 4/5 21/30 0.02138697
marginal difference

>


Saving your stuff permanently
When you quit R, the application will ask whether you want to save your current workspace. Answer "yes" and your data are saved. If you're paranoid, enter save.image() to save your workspace immediately (you can do this at any time at the normal prompt). Next time you start R you can fetch the scripts and your data from disk by entering load('.RData') . .RData is a hidden file in R's working directory, which by default is your home directory. For instructions on how to move or rename workspace images, consult R's help, e.g. by entering ?save.image .
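
If you'd like an extra copy of just your test data, separate from the workspace, base R's save() writes only the objects you name, and load() reads them back. A sketch: the file name abx-results.RData is an arbitrary example, and tempdir() is only used here so the snippet doesn't touch your home directory; in real use you'd pick a permanent location.

```r
abx.file <- file.path(tempdir(), 'abx-results.RData')  # example location only
ABXdata <- list(example = c(4, 5, 5))                  # stand-in data

save(ABXdata, file = abx.file)   # write only ABXdata, not the whole workspace
rm(ABXdata)                      # simulate a fresh R session
load(abx.file)                   # brings ABXdata back under the same name
ABXdata$example
```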

Working with collections
You did 10 tests, you posted the logs to the HA forums and now you're done. You probably want to archive your test results so you can start with a fresh, empty ABXdata object next time you do an experiment. Here's how to do it: first copy the contents of ABXdata to a new object with a unique name. To check that the name isn't already taken, you can enter ls() first. The name should only contain letters, digits and periods and must start with a letter. Next, overwrite the ABXdata object with an empty list. You copy and overwrite objects with <- (pronounced 'gets'), where the arrow points from the source object to the receiving object. Here's the full story if your archived collection is called "test123":
CODE
> test123 <- ABXdata
> ABXdata <- list()
>
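
If you'd rather have R check the name for you instead of scanning ls() by eye, exists() does that. This is a sketch; archive.name is a hypothetical variable holding whatever name you picked, and ABXdata is a stand-in collection.

```r
ABXdata <- list(example = c(4, 5, 5))   # stand-in for your real collection
archive.name <- 'test123'               # the name you picked for the archive

if (exists(archive.name)) {
  # exists() also finds built-in objects such as c or pi,
  # so this guards against masking those as well as your own data
  stop('The name ', archive.name, ' is already taken, pick another one.')
} else {
  assign(archive.name, ABXdata)  # same as test123 <- ABXdata
  ABXdata <- list()              # start a fresh, empty collection
}
```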

It's important to get this right, because otherwise you might lose data or overwrite some other object. Don't worry though: if you make a typo, R will probably just complain that the object doesn't exist.

Once you've archived your data to a custom-named object you can still get the full logs by passing the object directly to report.collection, e.g. report.collection(test123) (note: no quotes around the name of the object). You can also still access individual tests with the dollar trick, e.g. test123$example .

Guess what happens when you enter report.collection(c(test123, ABXdata))?
Answer: it will show all the logs for test123 AND ABXdata in one go.
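
This works because c() applied to two lists concatenates them into one list, keeping the names. The sketch below uses small stand-in collections to show the result.

```r
test123 <- list(example = c(4, 5, 5), example2 = c(3, 2))   # stand-in archive
ABXdata <- list(example3 = c(2, 3, 3))                      # stand-in current data

combined <- c(test123, ABXdata)   # c() on lists concatenates them
names(combined)                   # "example" "example2" "example3"
length(combined)                  # 3
```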

You can do a lot with collections, including changing the internal ordering between the tests and moving tests from one collection to another. I recommend learning more about R if you want to do such things, specifically about R lists.
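
As a starting point, here is a sketch of the two operations mentioned above using plain R list indexing, on stand-in collections: moving a test between collections (copy it over, then assign NULL to remove it from the original) and reordering a collection by indexing it with its names in the order you want.

```r
test123 <- list(example = c(4, 5, 5), example2 = c(3, 2))   # stand-in archive
ABXdata <- list(example3 = c(2, 3, 3))                      # stand-in current data

# move example2 from test123 to ABXdata
ABXdata$example2 <- test123$example2
test123$example2 <- NULL     # assigning NULL removes an element from a list

# reorder ABXdata by indexing it with the names in the desired order
ABXdata <- ABXdata[c('example2', 'example3')]
names(ABXdata)               # "example2" "example3"
names(test123)               # "example"
```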

Resuming a test that you started before
You just started a test and you believe you hear a difference, but it's not going well and you're starting to realize you're too tired to continue. You'd like to take a break for now and resume the test tomorrow. Here's how to do it. First, stop recordABX by entering x just like you do when you're done.
CODE
> recordABX('imtootired')
??? 40
0.8125
??? 40
0.828125
??? x
>

You can now quit R and shut down your computer if you want. When you're ready to continue, just start recording again with the same name.
CODE
> recordABX('imtootired')
??? 60
0.6963806
??? 80
0.4119015
??? 100
0.1147615
??? 100
0.02138697 *
??? x
>

When you ask for the log you'll see that recordABX neatly appended the new data to the data you already had.
CODE
> reportABX(ABXdata$imtootired)
batch score subtotal p
1 2/5 2/ 5 0.8125
2 2/5 4/10 0.828125
3 3/5 7/15 0.6963806
4 4/5 11/20 0.4119015
5 5/5 16/25 0.1147615
6 5/5 21/30 0.02138697
marginal difference

>

Sometimes you might want to overwrite pre-existing data instead of appending to it. In that case pass the letter F as the second argument to recordABX, like this: recordABX('imtootired', F) . The F is a shorthand for extend=FALSE, meaning "don't append, overwrite instead".
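
Under the hood, appending is just concatenating the old and new batch scores, after which the p-value is recomputed over the combined totals. The sketch below recomputes the last row of the imtootired log by hand and, for comparison, what would remain if the second session had overwritten the first; it copies the pvalue formula from the scripts.

```r
pvalue <- function (ntrials, ncorrect) .5^ntrials * sum(choose(ntrials, ncorrect:ntrials))

before <- c(2, 2)                  # first session: 40 %, 40 %
after  <- c(3, 4, 5, 5)            # second session: 60 %, 80 %, 100 %, 100 %
appended    <- c(before, after)    # what recordABX('imtootired') keeps
overwritten <- after               # what recordABX('imtootired', F) would keep

pvalue(5 * length(appended), sum(appended))        # 21/30 correct, about 0.0214
pvalue(5 * length(overwritten), sum(overwritten))  # 17/20 correct, about 0.0013
```

Note that dropping the first two batches turns a marginal result into a clear one, which is worth keeping in mind before you overwrite data.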

p-values, marginal differences and clear differences (this section is optional reading)
The p-value that I mentioned before is calculated in the same way as the p-values that you see in the well-known foobar2000 logs, except that foobar2000 displays them as percentages that are rounded to one decimal place. Read this post if you want a full explanation of what this number is about. If you want you can use the scripts to calculate the p-value for any given test score. For example if you want to know the probability that somebody will correctly identify 17 or more out of 19 Xs by random guessing, enter pvalue(19, 17).
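
In formula form, the p-value for K or more correct out of N Xs is 2^-N × (C(N,K) + C(N,K+1) + ... + C(N,N)), which is what the one-line pvalue function in the scripts computes. R's built-in binomial distribution function pbinom gives the same number, which makes a handy cross-check. The last line converts a p-value to a percentage rounded to one decimal place, the way foobar2000 displays it according to the description above.

```r
pvalue <- function (ntrials, ncorrect) .5^ntrials * sum(choose(ntrials, ncorrect:ntrials))

p.scripts <- pvalue(19, 17)                           # 17 or more out of 19
p.pbinom  <- pbinom(16, 19, 0.5, lower.tail = FALSE)  # P(X > 16), i.e. P(X >= 17)
all.equal(p.scripts, p.pbinom)   # TRUE
p.scripts * 2^19                 # 191 = C(19,17) + C(19,18) + C(19,19) = 171 + 19 + 1
round(100 * pvalue(10, 9), 1)    # 9/10 correct as a one-decimal percentage: 1.1
```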

A p-value of 0.05 means that someone who is purely guessing has a 1 in 20 chance of getting your score or better. In the sciences this is often considered unlikely enough to conclude that something else is going on, i.e. you're actually hearing a difference. This is the threshold for a marginal difference.
If you really hear an obvious difference you'll score better than that, meaning you'll get a smaller p-value. The threshold for a clear difference is 0.002, or 1 in 500.
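
To get a feel for what these thresholds demand, the sketch below prints, for 1 to 7 batches, the smallest number of correct Xs that reaches each threshold. min.correct is a hypothetical helper written for this illustration, not part of the scripts; it relies on the p-value shrinking as the number of correct Xs grows, so the first score under the threshold is the minimum.

```r
pvalue <- function (ntrials, ncorrect) .5^ntrials * sum(choose(ntrials, ncorrect:ntrials))

# Hypothetical helper: smallest number of correct Xs out of n whose p-value
# is at or below threshold (NA if even a perfect score isn't enough)
min.correct <- function (n, threshold) {
  p <- sapply(0:n, function (ncorrect) pvalue(n, ncorrect))
  hits <- which(p <= threshold)
  if (length(hits) == 0) NA else hits[1] - 1   # index 1 corresponds to 0 correct
}

for (batches in 1:7) {
  n <- 5 * batches
  cat(n, 'Xs: marginal needs', min.correct(n, .05),
      'correct, clear needs', min.correct(n, .002), '\n')
}
```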

The thresholds are stored in difference.marginal and difference.clear. You can see a value by typing its name and hitting enter. You can also change the thresholds; for example, to set the clear difference threshold to 0.001, enter difference.clear <- .001 . recordABX and reportABX will change their behaviour accordingly. I recommend not changing the thresholds unless you really know what you're doing.

Support
I hope that many people will find my scripts useful. Please post all your questions, comments and feature requests here.

The code
CODE
ABXdata <- list()
difference.clear <- .002
difference.marginal <- .05
recordABX <- function (name, extend=TRUE) {
  results <- NULL
  subtotal <- 0
  ntrials <- 0
  # continue an existing test unless extend=FALSE
  if (extend && name %in% names(ABXdata)) {
    results <- ABXdata[[name]]
    subtotal <- sum(results)
    ntrials <- length(results) * 5
  }
  input <- readline('??? ')
  while (input != 'x') {
    percentage <- suppressWarnings(as.numeric(input))
    # a batch of 5 Xs can only score 0, 20, 40, 60, 80 or 100 percent;
    # reject anything else instead of crashing and losing this session's results
    if (is.na(percentage) || !(percentage %in% seq(0, 100, by=20))) {
      cat('please enter 0, 20, 40, 60, 80, 100 or x\n')
    } else {
      score <- percentage / 20 # converting percentage to no. correct identifications
      subtotal <- subtotal + score
      ntrials <- ntrials + 5
      results <- c(results, score)
      p <- pvalue(ntrials, subtotal)
      if (p <= difference.clear) cat(p, '**\n')
      else if (p <= difference.marginal) cat(p, '*\n')
      else cat(p, '\n', sep='')
    }
    input <- readline('??? ')
  }
  ABXdata[[name]] <<- results
}
report.collection <- function (data=ABXdata) {
  # seq_along() does nothing for an empty collection instead of failing
  for (i in seq_along(data)) {
    cat(names(data)[i], '\n', sep='')
    reportABX(data[[i]])
  }
}
reportABX <- function (scores) {
  subtotals <- cumsum(scores)
  p <- 1 # no batches means no evidence of a difference
  cat('batch score subtotal p\n')
  for (i in seq_along(scores)) {
    ntrials <- i * 5
    p <- pvalue(ntrials, subtotals[i])
    # pad single-digit numbers so the columns line up
    line <- paste(' ', i, ' ', scores[i], '/5 ', sep='')
    if (subtotals[i] < 10) line <- paste(line, ' ', sep='')
    line <- paste(line, subtotals[i], '/', sep='')
    if (ntrials < 10) line <- paste(line, ' ', sep='')
    cat(line, ntrials, ' ', p, '\n', sep='')
  }
  if (p <= difference.clear) cat('clear difference\n\n')
  else if (p <= difference.marginal) cat('marginal difference\n\n')
  else cat('no difference\n\n')
}
# probability of getting ncorrect or more right out of ntrials by pure guessing
pvalue <- function (ntrials, ncorrect) .5^ntrials * sum(choose(ntrials, ncorrect:ntrials))
if (all(c('ABXdata', 'difference.clear', 'difference.marginal', 'recordABX',
          'report.collection', 'reportABX', 'pvalue') %in% ls())) {
  save.image()
  print(ls())
  cat('\nSUCCESS! The scripts are ready to use.\n')
} else cat("\nERROR: It appears that some parts haven't been properly",
           "copy-pasted into the window.\n")


This post has been edited by db1989: Feb 13 2013, 03:13
Reason for edit: updated by request

Jplus
Posted Feb 12 2013, 23:37
Post #2

It was brought to my attention that one should not confuse the p-value with the type 1 error rate. While these measures are related they don't have the same meaning. Foobar2000 and my script both calculate the p-value, not the type 1 error rate. See Wikipedia for a good explanation.

The incorrect name of the function that calculates the p-value has already been fixed in the opening post (it was called "type1error" before I renamed it). If you installed my scripts before February 13, 00:30 UTC and you want the name fixed in your copy as well, paste the patch below into R after loading your workspace image, hit enter and then enter save.image() to make the change permanent. From then on, use pvalue(N, K) instead of type1error(N, K). Applying this patch will not affect any saved data.
CODE
recordABX <- function (name, extend=T) {
  results <- NULL
  subtotal <- 0
  ntrials <- 0
  if (name %in% names(ABXdata) && extend == T) {
    results <- ABXdata[[name]]
    subtotal <- sum(results)
    ntrials <- length(results) * 5
  }
  input <- readline('??? ')
  while (input != 'x') {
    score <- as.integer(input) / 20 # converting percentage to no. correct identifications
    subtotal <- subtotal + score
    ntrials <- ntrials + 5
    results <- c(results, score)
    p <- pvalue(ntrials, subtotal)
    if (p <= difference.clear) cat(p, '**\n')
    else if (p <= difference.marginal) cat(p, '*\n')
    else cat(p, '\n', sep='')
    input <- readline('??? ')
  }
  ABXdata[[name]] <<- results
}
reportABX <- function (scores) {
  subtotals <- cumsum(scores)
  p <- 0
  cat('batch score subtotal p\n')
  for (i in 1:length(scores)) {
    ntrials <- i * 5
    p <- pvalue(ntrials, subtotals[i])
    line <- paste(' ', i, ' ', scores[i], '/5 ', sep='')
    if (subtotals[i] < 10) line <- paste(line, ' ', sep='')
    line <- paste(line, subtotals[i], '/', sep='')
    if (ntrials < 10) line <- paste(line, ' ', sep='')
    cat(line, ntrials, ' ', p, '\n', sep='')
  }
  if (p <= difference.clear) cat('clear difference\n\n')
  else if (p <= difference.marginal) cat('marginal difference\n\n')
  else cat('no difference\n\n')
}
pvalue <- function (ntrials, ncorrect) .5^ntrials * sum(choose(ntrials, ncorrect:ntrials))
rm(type1error)


This post has been edited by db1989: Feb 13 2013, 03:13
Reason for edit: as above