Vorbis Listening Test

2004-04-03 14:41:42

After much discussion and misunderstanding (mostly on my part), I think it is about time we start testing the Vorbis encoders and eventually choose the best one to participate in Roberto's 128 kbps multiformat test, which will start on April 14. We should come to a decision by April 10, which gives us about 7 days starting from now.

Due to a lack of time, we can't make this listening test as formal as the other ones, so it will be a self-prepared test, where each listener will prepare the coded test set and ABC/HR tests themselves. Ultimately that means it is possible to fudge the results but I have full confidence that it's not going to happen since there is barely any zealotry within the Vorbis community itself.

The Encoders

The Vorbis encoders to be tested, and their respective quality values, are:

1. Vorbis 1.0.1 CVS at q 4.25: http://www.rarewares.org/files/ogg/oggenc2.3CVSMinGW.zip
2. aoTuV 20040402 at q 4: http://www.geocities.jp/aoyoume/aotuv/test.html
3. Modest Tuning beta 3 at q 4: http://homepage3.nifty.com/nyaochi/soft/dist/oggencmtb3.zip
4. QKTune beta 3.2 at q 4.25: http://www.rarewares.org/quantumknot/oggencqk32.exe

Optional: GT3b2 with HF reduction at q 4.25: http://www.rarewares.org/quantumknot/oggenchfr.exe (late addition to the list)

The above quality values were determined by guruboolez to give approximately 128 kbps on normal music. Given the time constraints, and since these values are all we have to go by, they will do for now.

The Test Samples

The test samples that will be used are from ff123's 64 kbps listening test. There are 12 samples which can be downloaded from http://ff123.net/samples.html (right at the bottom of the page).

Test Results

As for the results, a tabular format such as this one would be nice, and you can upload or post it in this thread. Otherwise, you can e-mail your ABC/HR results to: s dot so at griffith dot edu dot au

If you e-mail your result, please write a small post here to let me know so that I can confirm the submission was successful.

The test will end on April 10. Hopefully I can inform Roberto of the result the next day.

If there is anything I've missed or forgotten to mention, please tell me ASAP and I'll make the changes/additions here. I apologise for the lack of co-ordination or notice, since things have come upon me suddenly. We should still manage to get some decent results.

Many thanks,

QK

EDIT: Edited aoTuV link
EDIT 2: Changed to mingw32 compile of 1.0.1 CVS
EDIT 3: Added GT3b2 with hfr. If you are already well into the test before I made this addition and don't feel like redoing everything again, continue with the original 4.
EDIT 4: I messed up the GT3b2/hfr link, which pointed to a QK3.2 and GT3b2 combo. Replaced with the proper GT3b2/hfr binary.

Reply #1 – 2004-04-03 15:03:54

The link for aoTuV 20040402 is wrong.
The correct one is:
http://www.geocities.jp/aoyoume/aotuv/aotu...nt_20040402.zip

However, since a direct download may not work, please choose "aoTuV experiment [20040402]" from the following page:
aoTuV test page

Thanks

Reply #2 – 2004-04-03 15:17:00

A small step-by-step HOWTO do the test, to make things easier:

1) download all 5 Encoders listed in QuantumKnot's Post and place/unpack them in the same folder
2) download FLAC and OGGDEC and extract flac.exe and oggdec.exe to the same folder
3) download the samples you want to test from ff123's 64 kbit/s listening test samples (right at the bottom of the page)
4) copy paste the following code to a *.txt file:

Code:

rem Decode the FLAC sample to WAV
flac -d -o sample.wav Layla.flac

rem Encode with each Vorbis encoder (the script expects aoTuV to write sample.wav.ogg)
oggenc23 -q4,25 sample.wav -o sample_101.ogg
encoder -q4 sample.wav
oggencmtb3 -q4 sample.wav -o sample_mtb3.ogg
oggencqk32 -q4,25 sample.wav -o sample_qk32.ogg
oggenchfr.exe -q4,25 sample.wav -o sample_gt3b2hfr.ogg

rem Decode each Ogg file back to WAV for ABC/HR
oggdec sample_101.ogg
oggdec sample.wav.ogg -o sample_aotuv.wav
oggdec sample_mtb3.ogg
oggdec sample_qk32.ogg
oggdec sample_gt3b2hfr.ogg

rem Remove the intermediate Ogg files
del sample_101.ogg
del sample.wav.ogg
del sample_mtb3.ogg
del sample_qk32.ogg
del sample_gt3b2hfr.ogg

5) VERY IMPORTANT: if you live in the US or Japan, replace "-q4,25" with "-q4.25"; if you live in Europe (dunno about Britain), leave it as is. Otherwise you will only encode with "-q4", which wouldn't be good
6) replace "Layla.flac" with the actual name of the file you want to test
7) rename your *.txt to *.bat and execute it (now the listening samples will get prepared for testing)

8) download the ABC/HR listening test tool
9) open it and go to "file -> setup test"; under "orig wav" open the sample.wav you now have on your disc, and under "wav" open the other .wav files (don't worry about the offset)
10) enjoy your listening test
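
For anyone preparing several samples, the steps above can be sketched as a small Python helper that writes out the same command lines for each file. This is only a sketch under stated assumptions: the executable names (`oggenc23`, `encoder`, `oggencmtb3`, `oggencqk32`, `oggenchfr`) are copied from the batch file above and assume the binaries sit in the same folder under those names, while `build_commands`, `stem` and `decimal_sep` are hypothetical names introduced here for the comma/period issue in step 5.

```python
# Sketch only: reproduces the command lines of the batch file above.
# Executable names come from that batch file; adjust them to whatever the
# binaries are actually called on your disk.

def build_commands(flac_name, stem="sample", decimal_sep="."):
    """Return decode, encode, decode-back and cleanup commands for one sample."""
    q425 = f"-q4{decimal_sep}25"          # "-q4.25" or "-q4,25"
    wav = f"{stem}.wav"
    aotuv_ogg = f"{wav}.ogg"              # the batch file expects aoTuV to write this name

    lines = [f'flac -d -o {wav} "{flac_name}"']
    lines += [
        f"oggenc23 {q425} {wav} -o {stem}_101.ogg",
        f"encoder -q4 {wav}",
        f"oggencmtb3 -q4 {wav} -o {stem}_mtb3.ogg",
        f"oggencqk32 {q425} {wav} -o {stem}_qk32.ogg",
        f"oggenchfr {q425} {wav} -o {stem}_gt3b2hfr.ogg",
    ]

    oggs = [
        f"{stem}_101.ogg",
        aotuv_ogg,
        f"{stem}_mtb3.ogg",
        f"{stem}_qk32.ogg",
        f"{stem}_gt3b2hfr.ogg",
    ]
    for ogg in oggs:
        if ogg == aotuv_ogg:
            # give the aoTuV decode a recognisable output name
            lines.append(f"oggdec {ogg} -o {stem}_aotuv.wav")
        else:
            lines.append(f"oggdec {ogg}")
    lines += [f"del {ogg}" for ogg in oggs]
    return lines


if __name__ == "__main__":
    # Write one batch file's worth of commands for a single sample.
    print("\n".join(build_commands("Layla.flac", stem="layla", decimal_sep=",")))
```

Using a different `stem` per sample keeps the decoded WAV files of one sample from overwriting those of another.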

Quote

1. Vorbis 1.0.1 CVS at q 4.25: http://www.rarewares.org/files/ogg/oggenc2.3CVS.zip
4. QKTune beta 3.2 at q 4.25: http://www.rarewares.org/quantumknot/oggencqk32.exe

note that with these encoders -q 4.25 will not use 4.25 but 4!
you have to use -q 4,25 if you are in europe!

edit: added a small how to

Reply #3 – 2004-04-03 15:21:16

Quote

Quote
1. Vorbis 1.0.1 CVS at q 4.25: http://www.rarewares.org/files/ogg/oggenc2.3CVS.zip
4. QKTune beta 3.2 at q 4.25: http://www.rarewares.org/quantumknot/oggencqk32.exe

note that with these encoders -q 4.25 will not use 4.25 but 4!
you have to use -q 4,25!

hmm....could be a Windows localisation problem.

On mine:

Code:

E:\vsamples>oggencqk32 -q 4 violin.wav

NOTE: This version of QKTune beta 3.2 is an experimental release and is not suitable for archiving. Use for testing
 only!

Opening with wav module: WAV file reader
Encoding "violin.wav" to
         "violin.ogg"
at quality 4.00
        [ 69.1%] [ 0m00s remaining] /

Done encoding file "violin.ogg"

        File length:  0m 02.0s
        Elapsed time: 0m 01.0s
        Rate:         2.7237
        Average bitrate: 133.0 kb/s


E:\vsamples>oggencqk32 -q 4.25 violin.wav

NOTE: This version of QKTune beta 3.2 is an experimental release and is not suitable for archiving. Use for testing
 only!

Opening with wav module: WAV file reader
Encoding "violin.wav" to
         "violin.ogg"
at quality 4.25
        [ 69.1%] [ 0m00s remaining] /

Done encoding file "violin.ogg"

        File length:  0m 02.0s
        Elapsed time: 0m 01.0s
        Rate:         2.7237
        Average bitrate: 138.6 kb/s


E:\vsamples>oggencqk32 -q 4,25 violin.wav

NOTE: This version of QKTune beta 3.2 is an experimental release and is not suitable for archiving. Use for testing
 only!

Opening with wav module: WAV file reader
Encoding "violin.wav" to
         "violin.ogg"
at quality 4.00
        [ 69.1%] [ 0m00s remaining] /

Done encoding file "violin.ogg"

        File length:  0m 02.0s
        Elapsed time: 0m 01.0s
        Rate:         2.7237
        Average bitrate: 133.0 kb/s
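
One way to read the three runs above: a number parser that stops at the first character it does not accept as the decimal separator turns `4,25` into `4` where the separator is a period, and `4.25` into `4` where the separator is a comma. The Python below only mimics that behaviour. It is an assumption about how the encoder front end reads `-q` (for instance through a locale-aware C routine such as `atof`), not something taken from its source, and `parse_leading_number` is a hypothetical helper.

```python
def parse_leading_number(text, decimal_point="."):
    """Mimic an atof-style parse: read as much of a number as possible,
    accept only `decimal_point` as the separator, and ignore the rest."""
    kept = []
    seen_point = False
    for ch in text:
        if ch in "0123456789":
            kept.append(ch)
        elif ch == decimal_point and not seen_point:
            kept.append(".")          # normalise to a period for float()
            seen_point = True
        else:
            break                     # first unrecognised character ends the number
    cleaned = "".join(kept)
    if cleaned in ("", "."):
        return 0.0                    # nothing numeric was found
    return float(cleaned)


# Period as the decimal separator: the comma cuts the number short.
print(parse_leading_number("4.25", "."))   # 4.25
print(parse_leading_number("4,25", "."))   # 4.0

# Comma as the decimal separator: the period cuts it short instead.
print(parse_leading_number("4,25", ","))   # 4.25
print(parse_leading_number("4.25", ","))   # 4.0
```

Either way the encoder would silently fall back to quality 4, which matches the "at quality 4.00" lines in the output.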

Reply #4 – 2004-04-03 15:50:06

Regarding the aoTuV encoder, to encode at q 4:

Code:

encoder -q4 test.wav

Note that there is no space between q and the value, in contrast to the other oggenc encoders.

Reply #5 – 2004-04-03 16:19:16

Quote

note that with these encoders -q 4.25 will not use 4.25 but 4!
you have to use -q 4,25!

It's -q 4.25 here, too. IIRC the comma is used in the EU while the period is used in the United States and Japan. (I don't know about other regions.)

Reply #6 – 2004-04-03 16:30:51

Quote

Quote
1. Vorbis 1.0.1 CVS at q 4.25: http://www.rarewares.org/files/ogg/oggenc2.3CVS.zip
4. QKTune beta 3.2 at q 4.25: http://www.rarewares.org/quantumknot/oggencqk32.exe

note that with these encoders -q 4.25 will not use 4.25 but 4!
you have to use -q 4,25!

The same problem occurred in Roberto's past multiformat test. I hope your reminder will be taken into account in future tests. Providing an "accepts only dots" version is another solution.

Reply #7 – 2004-04-03 16:54:39

Garf suggested I add GT3b2 with HF reduction at q 4.25 to the list so I'll make it an optional encoder:

http://www.rarewares.org/quantumknot/oggenchfr.exe

If you're already well into the test and don't want to redo everything again, just continue with the original 4 encoders.

EDIT: I messed up. The binary I originally had here (OggDropXPd) was a combo of QKTune b3.2 and GT3b2. I've updated the URL to point to the proper GT3b2 with hfr binary. I apologise for this. It's 2.30 am here so my brain stopped a long time ago.

Reply #8 – 2004-04-03 17:29:11

ok just a first small finding:

the tunings are significantly better than 1.0.1 here (on the layla sample, but still)!

so it definitely makes sense to join the test and help find which tuning suits best!!!

Reply #9 – 2004-04-04 00:07:52

Quote

The option of providing an "accepts only dots" version is another solution.

That's what was done in the former 128kbps test. Case provided a compile that accepted only dots.
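
For illustration, an "accepts only dots" check could look roughly like the Python below. It is a sketch of the idea only and says nothing about how Case's compile actually did it; `parse_quality_dots_only` is a hypothetical name. The point is to reject anything that is not digits with an optional period, instead of silently truncating.

```python
import re

# Optional sign, digits, and at most one period followed by digits.
_QUALITY_RE = re.compile(r"-?[0-9]+(\.[0-9]+)?")


def parse_quality_dots_only(text):
    """Parse a quality value, accepting a period as the only decimal separator."""
    if not _QUALITY_RE.fullmatch(text):
        raise ValueError(f"quality must look like 4 or 4.25, got {text!r}")
    return float(text)   # float() itself does not depend on the system locale


print(parse_quality_dots_only("4.25"))   # 4.25
try:
    parse_quality_dots_only("4,25")
except ValueError as err:
    print("rejected:", err)
```

Failing loudly on `4,25` means nobody ends up encoding at q 4 without noticing.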

Reply #10 – 2004-04-04 11:06:51

i added a small howto to my first post to make things easier for everyone

Reply #11 – 2004-04-04 11:11:47

Quote

Quote
note that with these encoders -q 4.25 will not use 4.25 but 4!
you have to use -q 4,25!

It's -q 4.25 here, too. IIRC the comma is used in the EU while the period is used in the United States and Japan. (I don't know about other regions.)

I can't comment on the specifics but in the UK the decimal point is "." not ",".

How this affects Vorbis I know not.

Fairy

Reply #12 – 2004-04-04 13:18:04

Quote

I can't comment on the specifics but in the UK the decimal point is "." not ",".

That was my mistake. Thanks for the correction.

Reply #13 – 2004-04-04 19:48:16

OK guys, I've finished. Raw results are also available here.

Reply #14 – 2004-04-04 22:29:11

Quote from: harashin,Apr 4 2004, 10:48 AM

OK guys, I've finished. Raw results are also available here.

Code:

         qktune   aotuv    mtb3     101cvs   
gt3b2    0.261    0.061    0.000*   0.000*   
qktune            0.439    0.000*   0.000*   
aotuv                      0.002*   0.000*   
mtb3                                0.628    
-----------------------------------------------------------------------

gt3b2 is better than mtb3, 101cvs
qktune is better than mtb3, 101cvs
aotuv is better than mtb3, 101cvs

for harashin

gt3b2+hfr and qktune are clear winners
mtb3 and 101cvs are clear losers
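
The "is better than" lines can be read straight off the p-value matrix. The sketch below assumes that the encoders are listed from highest to lowest mean rating and that an entry below 0.05 (the starred ones) counts as significant; `significant_pairs` is a hypothetical helper, and the numbers are the ones from the table above.

```python
def significant_pairs(names, p_upper, alpha=0.05):
    """names: encoders ordered from highest to lowest mean rating.
    p_upper[i][j]: p-value for names[i] versus names[i + 1 + j]."""
    better = {}
    for i, row in enumerate(p_upper):
        worse = [names[i + 1 + j] for j, p in enumerate(row) if p < alpha]
        if worse:
            better[names[i]] = worse
    return better


names = ["gt3b2", "qktune", "aotuv", "mtb3", "101cvs"]
p_upper = [
    [0.261, 0.061, 0.000, 0.000],   # gt3b2 vs qktune, aotuv, mtb3, 101cvs
    [0.439, 0.000, 0.000],          # qktune vs aotuv, mtb3, 101cvs
    [0.002, 0.000],                 # aotuv vs mtb3, 101cvs
    [0.628],                        # mtb3 vs 101cvs
]

for winner, losers in significant_pairs(names, p_upper).items():
    print(f"{winner} is better than {', '.join(losers)}")
```

Pairs such as gt3b2 vs qktune (0.261) stay unlisted because the difference is not significant at that level.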

Reply #15 – 2004-04-05 00:47:27

Harashin: Thank you for the results. And thanks for taking the trouble to tabulate them as well

ff123: Was that analysis done using the tool you describe here? It looks perfect for choosing which encoder to use. So after we receive, say, 10 results from 10 different people, do we run this on each listener's results or on the whole group?

Reply #16 – 2004-04-05 01:01:17

Quote

ff123: Was that analysis done using the tool you describe here?

Yes. That's the only analytical tool for listening test results I know of that is publicly available.

Quote

It looks perfect for choosing which encoder to use. So after we receive, say, 10 results from 10 different people, do we run this on each listener's results or on the whole group?

You must first create result tables for each sample. One column for each encoder tested, and one line for each listener.

So, you'll end up with 12 tables. Run each one of these tables through Friedman, and you'll get something similar to this:

Code:

friedman.exe -a results05.txt
FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
Blocked ANOVA analysis

Number of listeners: 20
Critical significance:  0.05
Significance of data: 2.70E-002 (significant)
---------------------------------------------------------------
ANOVA Table for Randomized Block Designs Using Ratings

Source of         Degrees     Sum of    Mean
variation         of Freedom  squares   Square    F      p

Total               99          67.80
Testers (blocks)    19          40.26
Codecs eval'd        4           3.65    0.91    2.91  2.70E-002
Error               76          23.89    0.31
---------------------------------------------------------------
Fisher's protected LSD for ANOVA:   0.353

Means:

Compaact Real     Faac     iTunes   Nero
  4.68     4.37     4.32     4.21     4.11     <====

---------------------------- p-value Matrix ---------------------------

         Real     Faac     iTunes   Nero
Compaact 0.080    0.049*   0.011*   0.002*
Real              0.822    0.400    0.163
Faac                       0.537    0.240
iTunes                              0.574
-----------------------------------------------------------------------

Compaact is better than Faac, iTunes, Nero

(from the latest aac test)

Now, notice the line I marked with <==== above. Take that same line (the means) from each sample's result, and create a final table with one line per sample. Run this table through Friedman, and you'll get your final ranking.
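
To make the two-stage procedure concrete, here is a rough Python sketch. It does the blocked ANOVA arithmetic by hand (sums of squares and the F ratio) and then builds the final table from the per-sample means. It is not friedman.exe's code, it leaves out the p-value and the LSD, the function names are hypothetical, and the ratings are made up.

```python
def column_means(table):
    """table: one row per listener, one column per encoder."""
    n = len(table)
    return [sum(row[j] for row in table) / n for j in range(len(table[0]))]


def blocked_anova_f(table):
    """F ratio for encoders, with listeners treated as blocks."""
    n, k = len(table), len(table[0])            # listeners, encoders
    grand = sum(sum(row) for row in table) / (n * k)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_blocks = k * sum((sum(row) / k - grand) ** 2 for row in table)
    ss_codecs = n * sum((m - grand) ** 2 for m in column_means(table))
    ss_error = ss_total - ss_blocks - ss_codecs
    df_codecs = k - 1
    df_error = (n - 1) * (k - 1)                # 19 * 4 = 76 in the output above
    return (ss_codecs / df_codecs) / (ss_error / df_error)


def final_table(per_sample_tables):
    """One row of encoder means per sample, ready for the same analysis."""
    return [column_means(t) for t in per_sample_tables]


# Made-up ratings: 3 listeners (rows) x 3 encoders (columns) for one sample.
sample_a = [
    [5, 2, 2],
    [5, 4, 3],
    [2, 3, 1],
]
print(column_means(sample_a))      # [4.0, 3.0, 2.0]
print(blocked_anova_f(sample_a))   # 3.0
```

With 12 samples, `final_table` gives 12 rows of means, which are then analysed the same way as a single sample's table.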

Reply #17 – 2004-04-05 01:32:16

Thanks Roberto.

Reply #18 – 2004-04-05 02:20:05

Forgot to mention: You can use Phong's wonderful Chunky to parse the abc/hr result files into tables that can be fed to friedman. It reduces 2 hours of work to 5 seconds :B
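
To give an idea of what such parsing involves, here is a minimal Python sketch. It is not Chunky and makes no claim about how Chunky works; it assumes result files containing lines of the form `1R File: <path>` and `1R Rating: <number>`, as written by ABC/HR 0.9b, and `parse_abchr` is a hypothetical name.

```python
import re
from pathlib import PureWindowsPath

_FILE_RE = re.compile(r"^(\d+[LR]) File: (.+)$")
_RATING_RE = re.compile(r"^(\d+[LR]) Rating: ([0-9.]+)$")


def parse_abchr(text):
    """Return {encoded file name without extension: rating} for one result file."""
    files, ratings = {}, {}
    for line in text.splitlines():
        line = line.strip()
        m = _FILE_RE.match(line)
        if m:
            # Paths in the result file are Windows paths, so parse them as such.
            files[m.group(1)] = PureWindowsPath(m.group(2)).stem
            continue
        m = _RATING_RE.match(line)
        if m:
            ratings[m.group(1)] = float(m.group(2))
    return {files[slot]: score for slot, score in ratings.items() if slot in files}


# Shortened, illustrative excerpt in the assumed format.
SAMPLE = r"""
1R File: C:\My Music\test_samples\waiting\QK32.wav
1R Rating: 3.9
1R Comment:
2L File: C:\My Music\test_samples\waiting\1.0.1CVS.wav
2L Rating: 4.5
2L Comment:
"""

print(parse_abchr(SAMPLE))   # {'QK32': 3.9, '1.0.1CVS': 4.5}
```

One such dictionary per listener and sample gives the rows of the per-sample tables that go into the analysis.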

Reply #19 – 2004-04-05 10:22:21

ok i now finished all 12 samples too

i will not post my results till april 10 to avoid biasing anyone, but just this much:
1.0.1 is definitely last, lying 1.19 points behind the first one (funny, it's the same for harashin)

Reply #20 – 2004-04-06 02:26:27

Quote

1.0.1 is definitely last

Nice. That definitely agrees well with the other listening tests.

Reply #21 – 2004-04-06 05:10:23

Because I don't have too much time, I tested Waiting only. It's 12:03 after midnight as well... I'm not sure if the results below are valid or not, because I didn't test all the files, but since I took the time to do the test, I will post them anyway. Below, you can see that I'm not too sensitive or picky about Vorbis artifacts.

-------------------------------------------------------------------
ABC/HR Version 0.9b, 30 August 2002
Testname: Waiting - ogg

1R = C:\My Music\test_samples\waiting\QK32.wav
2L = C:\My Music\test_samples\waiting\1.0.1CVS.wav
3R = C:\My Music\test_samples\waiting\GT3b2hfr.wav
4L = C:\My Music\test_samples\waiting\MTb3.wav
5L = C:\My Music\test_samples\waiting\aoTuV20040402.wav

---------------------------------------
General Comments:
all abxable, but they are all not really annoying.
---------------------------------------
1R File: C:\My Music\test_samples\waiting\QK32.wav
1R Rating: 3.9
1R Comment:
---------------------------------------
2L File: C:\My Music\test_samples\waiting\1.0.1CVS.wav
2L Rating: 4.5
2L Comment:
---------------------------------------
3R File: C:\My Music\test_samples\waiting\GT3b2hfr.wav
3R Rating: 3.9
3R Comment:
---------------------------------------
4L File: C:\My Music\test_samples\waiting\MTb3.wav
4L Rating: 3.9
4L Comment:
---------------------------------------
5L File: C:\My Music\test_samples\waiting\aoTuV20040402.wav
5L Rating: 4.5
5L Comment:
---------------------------------------
ABX Results:
Original vs C:\My Music\test_samples\waiting\QK32.wav
7 out of 7, pval = 0.008
Original vs C:\My Music\test_samples\waiting\1.0.1CVS.wav
7 out of 7, pval = 0.008
Original vs C:\My Music\test_samples\waiting\GT3b2hfr.wav
7 out of 7, pval = 0.008
Original vs C:\My Music\test_samples\waiting\MTb3.wav
7 out of 7, pval = 0.008
Original vs C:\My Music\test_samples\waiting\aoTuV20040402.wav
7 out of 7, pval = 0.008
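
The `pval = 0.008` figures are consistent with a one-sided binomial test, that is, the chance of getting at least that many ABX trials right by pure guessing. The sketch below shows that arithmetic; it is an assumption about what ABC/HR reports rather than a description of its code, and `abx_p_value` is a hypothetical name.

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of at least `correct` right answers out of `trials`
    when every answer is a 50/50 guess."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


print(round(abx_p_value(7, 7), 3))   # 0.008
```

For 7 out of 7 this is 1/128, which rounds to the 0.008 shown in the log.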

Reply #22 – 2004-04-06 05:21:03

Thanks for the result. I'm not too sure whether it is statistically ok to include the result for just one sample. Can someone offer some advice here?

Reply #23 – 2004-04-06 05:24:49

Quote

Thanks for the result. I'm not too sure whether it is statistically ok to include the result for just one sample. Can someone offer some advice here?

Of course it is. Just put it together with the other results you receive.

I get one-sample participants all the time

Reply #24 – 2004-04-06 07:22:25

Finished too... My results are very different from harashin's. I'm the first to be surprised by them. I could publish them, but it's probably better to wait until the end of the test.
