statistics code
ff123
post Sep 24 2001, 00:07
Post #1


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



Cool, I'm the first to post in this forum ;-)

I've written a command-line tool in C to perform Friedman-type analyses of codec ratings. The source and win32 executable can be found in a zipfile at:

http://www.worldzonesupport.com/~fastforw/...friedman100.zip

type:

friedman aq1.txt

to perform an analysis of Roel's first AQ test. It should not be hard to insert this code into a server tool that automatically analyzes the results each time a listener submits ratings.

ff123
Dibrom
post Sep 24 2001, 00:17
Post #2


Founder


Group: Admin
Posts: 2958
Joined: 26-August 02
From: Nottingham, UK
Member No.: 1



Welcome ff123 :D

I'll have to take a look at your utility, I'm sure it could come in quite handy. Thanks for distributing the source and such :)
ff123
post Sep 26 2001, 17:43
Post #3


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



The more I read about ABC/HR (ABC with hidden reference), the more I think it would be a good way to test for small impairments (though it may not suit tests at 128 kbit/s or lower). In short: A is always the reference. B and C are assigned at random, so that one is the test sample and the other is a hidden copy of the reference. The listener rates B against A and C against A, so in every trial the listener is unknowingly rating the reference alongside the test sample.

This makes it easier to post-screen listeners as described in ITU-R BS.1116-1: a listener who grades the hidden reference below the test sample has failed to identify the reference, and that shows up directly in the results.

ff123
ff123
post Sep 29 2001, 21:32
Post #4


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



I've updated the command-line statistical tool so it can also perform a blocked ANOVA, along with the corresponding Fisher LSD (least significant difference) multiple comparisons. The blocked ANOVA makes stronger assumptions than the Friedman test (normally distributed ratings, an equal-interval rating scale), but it is more powerful if those assumptions hold.

To do:

1. I forgot to output the ANOVA table (there is a specific format people are used to seeing), but I did verify the final results against the example in my book.

2. Add Tukey's HSD for both the blocked ANOVA and the Friedman rank analysis. This will give more conservative results than the Fisher comparisons, because Tukey's HSD is a simultaneous multiple-comparison procedure: the whole family of comparisons holds at the desired significance level, not just each individual comparison.

3. Add a test for normality (if I can find the appropriate reference).

I've included Roel's AQ test 1 data in the archive. A blocked ANOVA analysis can be run on it by typing:

friedman -a aq1.txt

The zip archive is located at:

http://www.worldzonesupport.com/~fastforw/...n/friedman110.zip

ff123
Garf
post Sep 29 2001, 22:51
Post #5


Server Admin


Group: Admin
Posts: 4885
Joined: 24-September 01
Member No.: 13



Err, I think you forgot to include the AQ test 1 data. There's no .txt file in there.

I'd like to point out two things just so there's no confusion:

a) the AQ test data is not normally distributed, so the results you'll get from running a blocked ANOVA on it are not reliable (the equal-interval scale assumption may also be doubtful)

b) the output line 'The following comparisons are each true with 95.0 percent confidence' is misleading, because it hides the number of comparisons actually made and the significance level of each result.

For example: if I look at the results, I see 4 pairs, each with 95% confidence. As explained before, the chance that at least one of those is wrong is greater than 5%. Omitting that data makes it impossible to determine exactly how much greater.

Edit: for example, if I know there were 8 codecs (28 pairwise comparisons, each at 95% confidence), then 0.95^28 ≈ 0.24, so the chance that all four presented results are correct could be as low as 24%. I assume it's actually higher, but there's no way to tell from the output.

I think that presenting the data in that way makes it too easy for someone who isn't aware of the details behind it to draw the wrong conclusion.

Of course, depending on what you want to achieve with the utility this may or may not matter.

(ack, two boards, two posts, two threads?)

--
GCP



