Topic: Statistics For Abx

Statistics For Abx

Reply #75
I still don't see the problem.

The listener can decide to stop at any in-between point, and true, there is a problem that he may be able to optimize his strategy and choose the best in-between point to stop at.  However, the overall type 1 risk can never be greater than 0.05.

Right now, trial 19 is the only stopping point where the overall alpha (0.0495) is greater than at a look point (0.0492).  So if I were a betting man and knew that all my guesses were random, I'd stop at trial 19.  One can eliminate this problem by removing trial 19 as a stopping point, though.  Then the look points become the most advantageous places to stop.

ff123

Statistics For Abx

Reply #76
Quote
The listener can decide to stop at any in-between point, and true, there is a problem that he may be able to optimize his strategy and choose the best in-between point to stop at. However, the overall type 1 risk can never be greater than 0.05.
Sorry, my statistical background is very limited: What is "the overall type 1 risk"?

Quote
Right now, trial 19 is the only stopping point where the overall alpha (0.0495) is greater than at a look point (0.0492). So if I were a betting man and knew that all my guesses were random, I'd stop at trial 19. One can eliminate this problem by removing trial 19 as a stopping point, though. Then the look points become the most advantageous places to stop.
These numbers (0.0492 and 0.0495) are not what I'm talking about. They assume that the listener terminates the test at the corresponding time, but none of them is his best choice. His optimal strategy would be to choose his stop point dynamically, depending on his previous results.

Example: User has scored 13/18. His best choice is to stop at trial 19 (winning probability LookPVal2(Array(1), Array(1))=0.5) instead of completing the whole 28-test (LookPVal2(Array(4, 7), Array(5, 10))=0.255859375).
On the other hand, if his score is 10/18, he is forced to complete the whole test, as this is his only possibility to win.

What we have calculated before only holds when the listener is forced to take the trials up to the next look point.
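
The "complete the whole test" figures in the examples above can be checked with a short recursion. This is only a sketch: it assumes the 28-trial profile passes at 6/6, 10/12, 14/18, 17/23 or 20/28 (thresholds inferred from the scores quoted in this thread), that the listener guesses at random, and that he may stop only at look points. The name pass_probability is made up for illustration and is not the LookPVal2 routine itself.

```python
from functools import lru_cache
from math import comb

# Assumed profile: (look-point trial, minimum number correct to pass there).
LOOK_POINTS = ((6, 6), (12, 10), (18, 14), (23, 17), (28, 20))


@lru_cache(maxsize=None)
def pass_probability(correct: int, trial: int) -> float:
    """Chance that a pure guesser standing at correct/trial passes later on.

    Assumes he has not passed yet and can only stop at look points.
    """
    upcoming = [(n, need) for n, need in LOOK_POINTS if n > trial]
    if not upcoming:
        return 0.0  # trial 28 reached without a passing score
    next_trial, need = upcoming[0]
    steps = next_trial - trial
    total = 0.0
    for hits in range(steps + 1):
        p = comb(steps, hits) * 0.5 ** steps  # chance of exactly `hits` right
        score = correct + hits
        if score >= need:
            total += p  # passes at the next look point
        else:
            total += p * pass_probability(score, next_trial)  # keeps going
    return total


print(pass_probability(13, 18))  # compare with the 0.5 chance of one more hit
print(pass_probability(16, 23))
```

Under these assumptions the two printed values agree with the 0.255859375 and 0.1875 quoted in the thread, and pass_probability(0, 0) gives the overall alpha of the profile when no in-between stops are allowed.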

Statistics For Abx

Reply #77
Overall type 1 risk = overall alpha: the probability that a person could achieve a certain score given that he is choosing X at random, that he stops the test at a look point whenever the number correct is as shown, and that he stops at the in-between point in question.

So the probability that a listener could achieve 14 of 19 given all of the above is 0.0495.

The current optimum strategy for a listener is to:

1. Always stop at look points 6, 12, and 18 if his scores meet the overall alpha.

2. Stop at trial 19 if he has 13 of 18

3. Otherwise, continue and stop at lookpoint 23

4. If he can't stop at lookpoint 23, continue to the end (trial 28).

ff123

Edit:  If the listener is not allowed to stop at trial 19, then the optimum strategy becomes:

1. Always stop at a lookpoint if you can, or when forced to stop at trial 28.

Statistics For Abx

Reply #78
Quote
The current optimum strategy for a listener is to:

1. Always stop at look points 6, 12, and 18 if his scores meet the overall alpha.

2. Stop at trial 19 if he has 13 of 18

3. Otherwise, continue and stop at lookpoint 23

4. If he can't stop at lookpoint 23, continue to the end (trial 28).
I'm afraid it's not that simple.

Another example: 16/23. His best choice is to stop at trial 25 (winning probability LookPVal2(Array(2), Array(2))=0.25) instead of completing the test (LookPVal2(Array(4), Array(5))=0.1875)

There might be more, I don't know.

Statistics For Abx

Reply #79
Looks like

If one has 12/18, it is optimum to stop at trial 22.
If one has 9 of 12, it is optimum to stop at trial 14.
If one has 8 of 12, it is optimum to stop at trial 17.
If one has 5 of 6, it is optimum to stop at trial 8.
If one has 4 of 6, it is optimum to stop at trial 11.

But even if the listener chooses the non-optimum strategy and stops at different in-between points, the overall alpha remains < 0.05.

ff123

Statistics For Abx

Reply #80
Quote
If one has 12/18, it is optimum to stop at trial 22.
If one has 9 of 12, it is optimum to stop at trial 14.
If one has 8 of 12, it is optimum to stop at trial 17.
If one has 5 of 6, it is optimum to stop at trial 8.
If one has 4 of 6, it is optimum to stop at trial 11.
Have you calculated each step? (I mean, have you compared all reasonable strategies at each of those trials?) Then we have already 7 problematic in-between looks.

Quote
But even if the listener chooses the non-optimum strategy and stops at different in-between points, the overall alpha remains < 0.05.
What is your non-optimum strategy? The problem is what happens if he uses an optimal strategy, i.e. stops at all the points listed above. Then his chance of passing by guessing is higher than 0.049155, maybe even higher than 0.05.

Statistics For Abx

Reply #81
Quote
What is your non-optimum strategy? The problem is what happens if he uses an optimal strategy, i.e. stops at all the points listed above. Then his chance of passing by guessing is higher than 0.049155, maybe even higher than 0.05.

Ah, I think I finally see what you're getting at.  The corrected simulation would have a listener with 13 of 18 stop at 14 of 19, for example, rather than waiting until trial 23 to stop.  That's an added wrinkle.

Well, that would take some time for me to code up.

ff123

Statistics For Abx

Reply #82
Probably the best solution is to disallow stop points if the optimum strategy does not lead to stopping at a look point.

So that would mean eliminating:

trials 1-4, 8, 11, 14, 17, and 22 as stop points.

I need to verify, though.

ff123

Edit:  Oh no, that's probably not enough.  I need to eliminate all suboptimal stop points as well, which are still better than stopping at a look point.  I'll look at this tonight, then.

Statistics For Abx

Reply #83
I think the easiest and safest method would be to disallow in-between stops generally. It wouldn't seem very logical to the user to be allowed to stop at some points but not at others.
If you find a way to keep the total below 0.05, though, it could be added. But I don't think it's possible (or maybe for certain profiles only).

The best (easiest) way for all in-between points probably is:

Quote
1. Show the progress to the listener, but do not allow him to quit the test here (or with worst case for the next look-point).

Statistics For Abx

Reply #84
Quote
I think the easiest and safest method would be to disallow in-between stops generally. It wouldn't seem very logical to the user to be allowed to stop at some points but not at others.
If you find a way to keep the total below 0.05, though, it could be added. But I don't think it's possible (or maybe for certain profiles only).

The best (easiest) way for all in-between points probably is:

Quote
1. Show the progress to the listener, but do not allow him to quit the test here (or with worst case for the next look-point).

Showing progress (while disallowing stopping in between) does seem more attractive.  Especially if the listener is not allowed to stop at important places, such as trials 7 through 11.  Let me work through all the places where I should eliminate stop points, and then reconsider.  There could even be a hybrid solution.  For example:  don't show progress on trials 1 through 4, but allow a stop point at trial 5.

ff123

Statistics For Abx

Reply #85
Hybrid version - sounds interesting!

Especially the 5/5 possibility could be useful, although personally, I still prefer to know my progress at each trial...

Statistics For Abx

Reply #86
Allowing in-between stops increases the chances of type-2 errors (failing the test when a difference was heard).

For example, if someone hears a difference and chooses to stop at trial 5, there is a chance they may fail the test (this is possible because they didn't know the results from trials 1 to 4). This can happen even though the full test may (very likely) result in a passing score. This means the Pr(type-2 error) has increased, versus a test without in-between stops.

The look points, on the other hand, actually decrease the chances of type-2 errors because they only terminate the test early in the event of a pass.

Edit: Actually, there are scenarios where the in-between stop points could result in an avoided instance of a type-2 error. But I still think the net result is that the Pr(type-2 error) is increased.  One thing is for sure... allowing the in-between stop points adds a lot of complexity.
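
The claim about look points can be illustrated numerically. The sketch below compares the chance of passing for a listener who gets each trial right with probability p, once with the look points and once when he must run all 28 trials and needs the same 20 correct at the end. The thresholds (6/6, 10/12, 14/18, 17/23, 20/28) are inferred from this thread, and both function names are hypothetical. It does not model in-between stops.

```python
from math import comb

# Assumed profile: (look-point trial, minimum number correct to pass there).
LOOK_POINTS = [(6, 6), (12, 10), (18, 14), (23, 17), (28, 20)]


def sequential_pass_probability(p):
    """Chance of passing at some look point when each trial is right with prob. p."""
    alive = {0: 1.0}  # number correct -> probability of being there, not yet passed
    passed = 0.0
    prev_trial = 0
    for trial, need in LOOK_POINTS:
        steps = trial - prev_trial
        reached = {}
        for correct, prob in alive.items():
            for hits in range(steps + 1):
                w = prob * comb(steps, hits) * p ** hits * (1 - p) ** (steps - hits)
                reached[correct + hits] = reached.get(correct + hits, 0.0) + w
        passed += sum(w for c, w in reached.items() if c >= need)
        alive = {c: w for c, w in reached.items() if c < need}
        prev_trial = trial
    return passed


def fixed_pass_probability(p, trials=28, need=20):
    """Chance of at least `need` correct in a fixed-length test."""
    return sum(comb(trials, k) * p ** k * (1 - p) ** (trials - k)
               for k in range(need, trials + 1))


for p in (0.5, 0.7, 0.8, 0.9):
    seq = sequential_pass_probability(p)
    fix = fixed_pass_probability(p)
    print(f"p={p}: type 2 risk {1 - seq:.4f} with look points, {1 - fix:.4f} without")
```

Every path that ends with 20 or more correct also passes the look-point version, so the look-point test can only have the lower type 2 risk in this comparison. The price is the higher type 1 risk at p = 0.5.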

Statistics For Abx

Reply #87
Good point about the type 2 error, although the 28-trial profile is not particularly concerned with type 2 errors in the first place.  I think I'm reaching the conclusion that, especially for the benefit of people not familiar with ABX testing, the results for trials 1 through 4 should be visible, which means not allowing a stop before trial 6.

So to summarize:

1. number correct displayed for every trial.  There will be a table showing the stopping points and the number correct required to pass.
2. overall alpha value also displayed at every stopping point.
3. test can only be terminated at trials 6, 12, 18, 23 and 28 (with the number correct as specified previously to get a "passing" score).
4. 9 wrong terminates the test.
5. the listener can choose to continue with the test even if he achieves a passing score, but then runs the risk of failing at a later stopping point.
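
The procedure summarized above is fully prescribed, so the overall alpha at each stopping point can be computed exactly rather than simulated. The following is a sketch under stated assumptions: passing scores of 6/6, 10/12, 14/18, 17/23 and 20/28 (inferred from this thread), a listener who guesses at random, and a test that stops at the first look point with a passing score. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb

# Assumed profile: (look-point trial, minimum number correct to pass there).
LOOK_POINTS = [(6, 6), (12, 10), (18, 14), (23, 17), (28, 20)]

# Largest number of misses that still allows a pass at any look point.
max_wrong_allowed = max(trial - need for trial, need in LOOK_POINTS)


def cumulative_alphas():
    """Return (trial, needed, overall alpha) for each look point, as exact fractions."""
    alive = {0: Fraction(1)}  # number correct -> probability, not yet passed
    passed = Fraction(0)
    prev_trial = 0
    table = []
    for trial, need in LOOK_POINTS:
        steps = trial - prev_trial
        reached = {}
        for correct, prob in alive.items():
            for hits in range(steps + 1):
                w = prob * Fraction(comb(steps, hits), 2 ** steps)
                reached[correct + hits] = reached.get(correct + hits, Fraction(0)) + w
        passed += sum(w for c, w in reached.items() if c >= need)
        alive = {c: w for c, w in reached.items() if c < need}
        table.append((trial, need, passed))
        prev_trial = trial
    return table


for trial, need, alpha in cumulative_alphas():
    print(f"{need:2d} of {trial:2d}: overall alpha {float(alpha):.6f}")
print("misses allowed before the test is lost:", max_wrong_allowed)
```

With these thresholds the last row comes out at about 0.049155, and at most 8 misses are compatible with a pass, which is why the ninth wrong answer can end the test.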

The final values for the overall (passing) alphas are in the following table:


Statistics For Abx

Reply #88
The table brings up an interesting point.  Should the profile be designed to keep a constant nominal alpha, or a constant overall alpha?

ff123

Edit:  never mind -- it's impossible to keep a constant overall alpha!

Statistics For Abx

Reply #89
Quote
2. overall alpha value also displayed at every stopping point.
What overall alpha? Depending on the results? I'm not sure how this could be calculated.

Quote
5. the listener can choose to continue with the test even if he achieves a passing score, but then runs the risk of failing at a later stopping point.
Hmm, couldn't that lead to incorrect conclusions by the listener? I think he can't increase his confidence with this, because this test is of a very strict passed-failed type.
Problem: which score is better, 6/6 or 14/18?

I think there should be a different profile for people who want high confidence results, because the 28-profile can't say more than passed with confidence 0.95 or failed.

Statistics For Abx

Reply #91
Quote
Hmm, couldn't that lead to incorrect conclusions by the listener? I think he can't increase his confidence with this, because this test is of a very strict passed-failed type.
Problem: which score is better, 6/6 or 14/18?


6 of 6 is better than 14 of 18 if the listener always follows the procedure of quitting at the earliest stopping point whenever it is possible.  Otherwise, I don't know.  Point taken.  The program should terminate automatically.

However, the converse (program forces the listener to continue) is not possible.  The listener could decide to terminate at any time (by not continuing the test).  The program only displays the overall alpha at stopping points, though.  For example:

5 of 6:  0.109
13 of 18: 0.058

It is possible to calculate an overall alpha only when the listener is forced to stop at the earliest possible time.

ff123

Statistics For Abx

Reply #92
Quote
However, the converse (program forces the listener to continue) is not possible. The listener could decide to terminate at any time (by not continuing the test). The program only displays the overall alpha at stopping points, though. For example:

5 of 6: 0.109
13 of 18: 0.058

It is possible to calculate an overall alpha only when the listener is forced to stop at the earliest possible time.
You could calculate a worst case scenario for the next look point, e.g. the listener's score is 5/6, 9/12, 13/18 and 17/22 -> passed. 16/21 -> failed.

Statistics For Abx

Reply #93
I think it is enough to display what the overall alpha would be at any particular lookpoint assuming the listener were to stop at that point.

so 5 of 6:  0.109
and 6 of 6: 0.0156

9 of 12: 0.078
10 of 12: 0.0295
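
One way to arrive at figures like these, sketched here under assumptions: take the overall alpha for "k of n" as the chance that a random guesser either passed at an earlier look point or reaches look point n with at least k correct. The passing scores (6/6, 10/12, 14/18, 17/23, 20/28) are inferred from this thread and the function name is hypothetical.

```python
from math import comb

# Assumed profile: (look-point trial, minimum number correct to pass there).
LOOK_POINTS = [(6, 6), (12, 10), (18, 14), (23, 17), (28, 20)]


def alpha_if_stopping_at(look_trial, score):
    """Overall alpha for stopping at `look_trial` with at least `score` correct.

    Counts earlier passes too, since the test would already have ended there.
    """
    alive = {0: 1.0}  # number correct -> probability, not yet passed
    passed_earlier = 0.0
    prev_trial = 0
    for trial, need in LOOK_POINTS:
        steps = trial - prev_trial
        reached = {}
        for correct, prob in alive.items():
            for hits in range(steps + 1):
                w = prob * comb(steps, hits) / 2 ** steps
                reached[correct + hits] = reached.get(correct + hits, 0.0) + w
        if trial == look_trial:
            return passed_earlier + sum(w for c, w in reached.items() if c >= score)
        passed_earlier += sum(w for c, w in reached.items() if c >= need)
        alive = {c: w for c, w in reached.items() if c < need}
        prev_trial = trial
    raise ValueError(f"{look_trial} is not a look point")


for look_trial, score in [(6, 5), (6, 6), (12, 9), (12, 10)]:
    print(f"{score} of {look_trial}: {alpha_if_stopping_at(look_trial, score):.4f}")
```

Under this reading the four values printed match the ones listed in this post to the digits shown.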

Statistics For Abx

Reply #94
Quote
I think it is enough to display what the overall alpha would be at any particular lookpoint assuming the listener were to stop at that point.
For what? Wouldn't it be possible to achieve a score below 0.05 but still fail the 28-profile test?

What correct conclusions could be drawn from the displayed information?

Statistics For Abx

Reply #95
I think the test should be strictly pass or fail. I think the statistics become shaky if we try to go beyond that. How would we interpret an overall alpha?

What about 10/12 versus 11/12? Both are possible with the current plan. They are both passing scores but the calculated overall alphas are different (right?). I think distinguishing these scores with overall alphas would be tricky (i.e., how would one interpret this?).

Another possibility is to terminate the test once 10/11 is reached, because 10/12 (and therefore a pass) has already been achieved. The same would apply to 14/17, 17/22, and 20/27.
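
The early-termination check amounts to a worst-case rule: assume every remaining trial up to the next look point is wrong and see whether the score still passes. A minimal sketch, assuming passing scores of 6/6, 10/12, 14/18, 17/23 and 20/28 (inferred from this thread); the function name is hypothetical.

```python
# Assumed profile: (look-point trial, minimum number correct to pass there).
LOOK_POINTS = [(6, 6), (12, 10), (18, 14), (23, 17), (28, 20)]


def pass_already_secured(correct, trial):
    """True if the score would pass at the next look point even if every
    trial between now and that look point were answered wrongly."""
    for look_trial, need in LOOK_POINTS:
        if look_trial >= trial:
            return correct >= need
    return False  # beyond trial 28 there is nothing left to pass


for correct, trial in [(10, 11), (14, 17), (17, 22), (20, 27), (16, 21)]:
    print(f"{correct}/{trial}: secured = {pass_already_secured(correct, trial)}")
```

Stopping at such a point does not change who passes and who fails, only how many trials are needed, so the pass probability for a guesser stays the same as without early termination.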

Statistics For Abx

Reply #96
Quote
I think the test should be strictly pass or fail. I think the statistics become shaky if we try to go beyond that. How would we interpret an overall alpha?

What about 10/12 versus 11/12? Both are possible with the current plan. They are both passing scores but the calculated overall alphas are different (right?). I think distinguishing these scores with overall alphas would be tricky (i.e., how would one interpret this?).

I agree. It would be possible to claim differences between different results in a fixed-length test, e.g. 12/16 or 15/16, because the probability to score the same or a better result (=alpha!!), which can be calculated easily in this case, is different.
But with the 28-profile (or any test with look-points) it's not trivial to determine which scores are better than a given one.
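
For the fixed-length case the alpha is just a binomial tail, which a few lines can show. This is a sketch for a random guesser (probability 0.5 per trial); the function name is hypothetical.

```python
from math import comb


def fixed_test_alpha(correct, trials):
    """Probability of scoring `correct` or better out of `trials` by guessing."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


print(f"12/16: {fixed_test_alpha(12, 16):.6f}")
print(f"15/16: {fixed_test_alpha(15, 16):.6f}")
```

Here "the same or a better result" has an obvious meaning, because every outcome of the 16 trials can be ranked by its number correct. With look points, outcomes such as 6/6 and 14/18 have no such natural ordering.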

What we can calculate is the probability of passing the entire test by guessing. Nothing more should be shown to the user.

Quote
Another possibility is to terminate the test once 10/11 is reached, because 10/12 (and therefore a pass) has already been achieved. The same would apply to 14/17, 17/22, and 20/27.

True. But maybe this would be too confusing?

Statistics For Abx

Reply #97
I still don't see the problem with calculating and displaying an overall alpha.

10 of 12 is 0.0295
11 of 12 is 0.0171
12 of 12 is impossible

These figures assume that a listener must have terminated if he achieved 6 of 6.  The procedure (and therefore the exact odds of getting to any particular point) is completely prescribed now.

ff123

Statistics For Abx

Reply #98
Quote
Wouldn't it be possible to achieve a score below 0.05 but still fail the 28-profile test?

No.  Once a score with an overall alpha below 0.05 is achieved at a stop point, the test is forced to stop.

ff123

Also, if a listener refuses to continue to a look-point, his overall alpha is not displayed, and he is not considered to have passed the ABX even if he stops at a score like 10 out of 11.  This is one disadvantage of displaying the number correct at every trial.  Misunderstandings about what constitutes a passing score might sometimes arise if the listener thinks he can stop at any time.  If no score is displayed, the listener knows that he must get to a stop point to see both the number correct and the overall alpha.

ff123

Statistics For Abx

Reply #99
I think shday's idea is a good one.  What you lose is the ability to get a lower overall alpha at the look point, but what you gain, of course, is a time saving.  That might be worth it.  The other advantage is that there is then no conflict over showing the number correct for every trial.  Nobody gets unfairly penalized for not continuing a test at 10 of 11, because the program will automatically stop and count this as a success.

Let me change the simulation at home tonight to take into account an early termination and see what pops out, but I think it should be fine.
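
A simulation with early termination might look roughly like the sketch below. It is not the actual simulation code, only an illustration under assumptions: passing scores of 6/6, 10/12, 14/18, 17/23 and 20/28 (inferred from this thread), an automatic stop as soon as the next look point's score is already reached, and a failure on the ninth wrong answer. All names are hypothetical.

```python
import random

# Assumed profile: (look-point trial, minimum number correct to pass there).
LOOK_POINTS = [(6, 6), (12, 10), (18, 14), (23, 17), (28, 20)]
MAX_WRONG = 8  # a ninth miss makes every remaining look point unreachable


def required_correct(trial):
    """Number correct needed at the next look point at or after `trial`."""
    for look_trial, need in LOOK_POINTS:
        if look_trial >= trial:
            return need
    raise ValueError("trial is beyond the last look point")


def run_test(rng, p_correct=0.5):
    """Simulate one listener; return (passed, number of trials taken)."""
    correct = wrong = 0
    last_trial = LOOK_POINTS[-1][0]
    for trial in range(1, last_trial + 1):
        if rng.random() < p_correct:
            correct += 1
        else:
            wrong += 1
        if correct >= required_correct(trial):
            return True, trial  # pass is already secured, stop early
        if wrong > MAX_WRONG:
            return False, trial  # pass is no longer possible
    return False, last_trial


def estimate_alpha(runs, seed=0):
    """Fraction of random guessers who pass, estimated over `runs` simulated tests."""
    rng = random.Random(seed)
    passes = sum(run_test(rng)[0] for _ in range(runs))
    return passes / runs


print(estimate_alpha(50_000))
```

Because early termination only shortens tests whose outcome is already decided, the estimate should land near the 0.049 overall alpha of the profile without it, up to simulation noise.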

ff123