Topic: Second in the series of 128 tests

Second in the series of 128 tests

Your participation is invited.

Test of various audio codecs which average about 128 kbit/s:
MP3, AAC, Ogg Vorbis, MPC, and WMA8

See http://ff123.net/128test/instruct.html

for instructions.  I have posted RAR'd binaries under the title "128 kbit/s listening test" in alt.binaries.test.  There are also binaries available from my site.

Please do not discuss your results here!

ff123

Second in the series of 128 tests

Reply #1
Nice instruction page...

I hope to participate as soon as I get the paper I'm writing out of the way. How long is this test scheduled to run?

Second in the series of 128 tests

Reply #2
No Psytel again :-( ... I'm sure you have your reasons (as in Liquid is the best in its class), but I would have liked to see it there....

...I look forward to the results (will they differ much compared to the last test, which used dogies?)

Cheers,
-Nic

Second in the series of 128 tests

Reply #3
Vorbis RC3 ???

Second in the series of 128 tests

Reply #4
[deleted]

Second in the series of 128 tests

Reply #5
Hmm, how's RC3 alpha supposed to compare against RC3? Last time I checked, Monty was concentrating on bitrate control for the streaming modes. Hopefully that means the quality tweaks that will be implemented in RC3 are mostly done. (?)
Juha Laaksonheimo

Second in the series of 128 tests

Reply #6
Quote
Hmm, how's RC3 alpha supposed to compare against RC3? Last time I checked, Monty was concentrating on bitrate control for the streaming modes. Hopefully that means the quality tweaks that will be implemented in RC3 are mostly done. (?)


This is my understanding of it.  Anyway, I have at least a week to complete the test before the real RC3 is out :-)

Quote
No Psytel again :-( ... I'm sure you have your reasons (as in Liquid is the best in its class), but I would have liked to see it there....


Ivan is running his own tests of psytel at 128.

Quote
I hope to participate as soon as I get the paper I'm writing out of the way. How long is this test scheduled to run?


There's no hurry.  Probably at least a couple weeks.

ff123

Second in the series of 128 tests

Reply #7
Sorry FF, I still haven't done the test; I have just started, though. How long will it be running, and how many people have taken part so far?

I hope people will help FF out by doing the test! Otherwise he will need a quantum computer to calculate results which have statistical significance. 

http://ff123.net/128test/instruct.html
Juha Laaksonheimo

Second in the series of 128 tests

Reply #8
The test can run indefinitely, but I'll probably release comments and individual ratings after a decent amount of time, or if I ever get 30 people to rate the files.

I updated the results at:

http://ff123.net/128test/interim.html

ff123

Second in the series of 128 tests

Reply #9
Just a note that Monty has tried his hand at the tests and has posted some interesting comments.  Participants have access to the comments pages, which are currently private.  For those who don't have access but are interested in what Monty had to say: take the tests, and I will give you the links.

ff123

Second in the series of 128 tests

Reply #10
I'm a bit confused as to how we're supposed to do the test.  It says to give a rating indicating whether we hear a difference from the original, but all I see for download are ZIPs which each contain six WAVs, presumably the results of the six different codecs; where do we get the original samples to compare with?

Second in the series of 128 tests

Reply #11
The originals are included along with the six encodes in the zip archives.

ff123

Second in the series of 128 tests

Reply #12
Quote
Originally posted by ff123
The originals are included along with the six encodes in the zip archives.


Well, I'm confused then, because I see exactly six samples in each archive, presumably the six encodes.  They're all labeled with what appear to be random numbers; none is labeled "original" or anything of that sort to distinguish it.  Perhaps I am just being dense at 3am, but I can't seem to find them...

FWIW I downloaded the WAV archives.

Second in the series of 128 tests

Reply #13
Ah, shit, I fucked those up (the plain zips were added a couple days ago by request from somebody).  I'll add the originals as a separate zip file.

ff123

Edit:  Ok, I've fixed it.  You'll have to download another 2 to 3 MB, depending on the sample.

Second in the series of 128 tests

Reply #14
Some lessons I learned from this test:

1.  I should have chosen more difficult samples.  Although the best listeners (Monty, for example) could reliably hear what was wrong with nearly every encoded file and describe what they heard in great detail (Monty said of the others' comments: "looking at these results, I have to wonder how many people bothered even listening"), the results from fossiles and rawhide indicate that even at 128 kbit/s, many samples will be essentially transparent to many people.  More difficult samples may be less representative of normal music, but the results will be more reliable.

2.  I erred on some of the settings.  For AAC, I should have chosen "transparent 128" (VBR), but lowpassed at 16 kHz.  Liquid Audio is probably still the AAC codec to beat at 128, but I would perform some pretests vs. Psytel -internet to find out for sure for the next test.  Also, I'm beginning to think that FastEnc would be a better choice for the "good" mp3.  And it might be worthwhile investigating what RealAudio can do.

3.  The next group test I organize will use ABC/HR.  More listeners don't necessarily mean better results.  One listener, if he is good enough, can yield better results than twenty untrained listeners.  Hopefully ABC/HR can help to identify and remove noisy listeners from the data set (a sketch of one such screen follows this list).

4.  I would like to have had at least a dozen different samples to test.  But this is highly unrealistic for a web-based test of different formats.  FTP/web space and bandwidth is one issue.  Download time is another.  I don't know of a good way around this difficulty.
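
For what it's worth, here is a minimal sketch of the kind of hidden-reference screen that item 3 has in mind.  The column names, the 5.0 "no difference" anchor, and the ratings are assumptions for illustration only, not the tool's actual data format.

```python
# Hedged sketch of a hidden-reference post-screen. Column names, the 5.0
# anchor, and the ratings are assumptions, not ABC/HR's real data layout.
import pandas as pd

def screen_trials(trials: pd.DataFrame, anchor: float = 5.0) -> pd.DataFrame:
    """Drop trials where the hidden reference was rated as impaired."""
    # A rating below the "no difference" anchor for the hidden reference
    # means the listener "heard" artifacts in the original -- that trial
    # is treated as noise and dropped.
    return trials[trials["reference_rating"] >= anchor]

# Hypothetical trials: listener 2 dinged the hidden reference once.
trials = pd.DataFrame({
    "listener":         [1, 1, 2, 2],
    "codec":            ["mpc", "wma8", "mpc", "wma8"],
    "rating":           [4.5, 3.0, 4.0, 4.2],
    "reference_rating": [5.0, 5.0, 4.0, 5.0],
})
print(screen_trials(trials))
```

With this toy data, listener 2's first trial is dropped because the hidden reference was rated 4.0.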

ff123

Second in the series of 128 tests

Reply #15
Sounds great. But erm.. can you explain ABC/HR to us? Thanks.

Second in the series of 128 tests

Reply #16
Quote
Originally posted by ff123
4.  I would like to have had at least a dozen different samples to test.  But this is highly unrealistic for a web-based test of different formats.  FTP/web space and bandwidth is one issue.  Download time is another.  I don't know of a good way around this difficulty.

ff123


Having the different test samples hosted by different people? It might not be as reliable in terms of availability, but it's better than nothing, I gather.


Second in the series of 128 tests

Reply #18
I'm going to close the test on 1-12-02.

The results look stable, though not significant on two of the samples.  The pre-RC3 test also looks past its time, now that some fixes have been incorporated into the official RC3.

ff123

Second in the series of 128 tests

Reply #19
The second test is now closed, and comments are now linked from the main test page:

http://ff123.net/128test/instruct.html

ff123

Second in the series of 128 tests

Reply #20
Minor nitpick: Your page still states 'Ogg Vorbis RC3 has not yet been released'.

Maybe also clarify that it may improve the quality of the encoded files.

Edit: on the interim results page:

The next test I organize will hopefully use a tool better suited to post-screening, such that results from listeners who consistently rate the original better than encoded files will be discarded.

Didn't you mean it the other way around?

--
GCP

Second in the series of 128 tests

Reply #21
I guess the results for wayitis (piano-heavy) confirm what was a generally held perception:

Ogg, MPC, and AAC are better than WMA8, with 95% confidence that the results are not due to chance alone.
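
For readers who want to see how such a confidence figure can be checked, here is a minimal sketch of a paired significance test on per-listener ratings. The scores below are hypothetical, and this is not necessarily the analysis used for the actual test results.

```python
# Minimal sketch of a paired significance check like the claim quoted above.
# Illustration only: the ratings are hypothetical, and this need not be the
# analysis actually used for the test.
from scipy.stats import wilcoxon

# One rating per listener for each codec on the same sample.
ogg  = [4.2, 3.8, 4.5, 4.0, 3.9, 4.4, 4.1, 3.7]
wma8 = [3.1, 3.5, 3.0, 3.6, 2.9, 3.3, 3.2, 3.4]

# Paired test: does ogg score consistently higher than wma8 on this sample?
stat, p = wilcoxon(ogg, wma8, alternative="greater")
print(f"p = {p:.3f}")  # p < 0.05 -> better than wma8 at 95% confidence
```

A Wilcoxon signed-rank test is just one reasonable choice for paired ratings; a paired t-test or an ANOVA across all codecs would be alternatives.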

Stick that up your jumper MS!
Ruse
____________________________
Don't let the uncertainty turn you around,
Go out and make a joyful sound.

Second in the series of 128 tests

Reply #22
About the weird results of listener 28 of wayitis.wav:

I have plotted some of the error function curves I made for dogies.wav.  I ranked the listeners by sensitivity to artifacts, assuming that the lowest total score indicated the most sensitive listener.  Then I plotted the curves of ratings vs. ranked listener.  You can see these graphs at:

http://ff123.net/128test/outlier.html

The third most sensitive listener is listener 28 of the raw data.  (BTW, xiphmont is the most sensitive listener for this sample).  You can see, especially for Xing, WMA8, and MPC, that this listener's results are highly at odds with the trend.

I don't know what formal statistical test could detect such outliers, or what I should do with them even if I did find them.  But it's interesting that this person should have such opposing preferences, when I found the quality ranking to be so clear-cut (as did many others).
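
For anyone who wants to reproduce this kind of plot, here is a rough sketch of the ranking procedure described above.  The column names and ratings are made up; the real raw data may be organized quite differently.

```python
# Rough sketch of the ranking/plotting described above. Column names and the
# ratings themselves are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

# One row per (listener, codec) rating for a single sample.
ratings = pd.DataFrame({
    "listener": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "codec":    ["mpc", "ogg", "wma8"] * 3,
    "rating":   [4.0, 4.2, 3.0, 4.5, 4.6, 3.5, 3.2, 3.5, 2.4],
})

# Lowest total score = most sensitive listener (hears the most artifacts).
totals = ratings.groupby("listener")["rating"].sum().sort_values()
rank = {listener: i + 1 for i, listener in enumerate(totals.index)}
ratings["sensitivity_rank"] = ratings["listener"].map(rank)

# One curve per codec: rating vs. listener ranked by sensitivity.
for codec, grp in ratings.groupby("codec"):
    grp = grp.sort_values("sensitivity_rank")
    plt.plot(grp["sensitivity_rank"], grp["rating"], marker="o", label=codec)

plt.xlabel("listener (1 = most sensitive)")
plt.ylabel("rating")
plt.legend()
plt.show()
```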

ff123

Second in the series of 128 tests

Reply #23
Why don't you analyse and publish the results without listener 28, for comparison purposes? There must be some statistically valid basis for excluding "wonky" data. I think the plots you have shown above indicate that listener 28 is an "outlier".

Can't you just exclude him on the basis of being more than 2 standard deviations from the mean?
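
As an illustration only (the totals below are hypothetical), a 2-standard-deviation screen on per-listener total scores might look like this:

```python
# Sketch of a 2-standard-deviation screen on per-listener total scores.
# The totals are hypothetical.
import numpy as np

totals = np.array([18.5, 20.0, 19.2, 21.0, 30.5, 19.8, 20.4])  # one total per listener
z = (totals - totals.mean()) / totals.std(ddof=1)

keep = np.abs(z) <= 2.0
print("excluded listener indices:", np.where(~keep)[0])
```

One caveat: with only a couple of dozen listeners, a single extreme listener inflates the standard deviation, so a 2-sigma cutoff is a fairly blunt screen.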
Ruse
____________________________
Don't let the uncertainty turn you around,
Go out and make a joyful sound.

Second in the series of 128 tests

Reply #24
Well, I think listener 28's data is a valid ranking - it's quite possible that he is sensitive to certain artifacts that most people are not, and not sensitive to those that most people are sensitive to.  I'm not sure how statistically one should go about averaging the data; perhaps it would be useful to do some sort of post-screening to break people down into groups based on hearing and preference (i.e. "most sensitive to pre-echo", "most sensitive to treble distortion," "most sensitive to bass scratching," and so on).  Then you'd get results like "for people most sensitive to pre-echo, Ogg RC3 is best," rather than blanket preference claims that might not be true for everyone.