Personal Listening Test of Opus, Celt, AAC at 75-100kbps, ABC/HR blind test, 1 Listener
Kamedo2
post Nov 17 2012, 09:25
Post #1





Group: Members
Posts: 220
Joined: 16-November 12
From: Kyoto, Japan
Member No.: 104567



Abstract:
Blind comparison of the new (2012/09) opusenc (tfsel5), the old celtenc 0.11.2, and Apple AAC-LC in tvbr and cvbr modes.
This is an English version of my original post in Japanese. http://d.hatena.ne.jp/kamedo2/20121116/1353099244#seemore

Encoders:
libopus 0.9.11-146-gdc4f83b-exp_analysis
https://people.xiph.org/~greg/opus-tools_exp_dc4f83be.zip
celt-0.11.2-win32
https://people.xiph.org/~greg/celt-0.11.2-win32.zip
qaac 1.40

Settings:
opusenc --bitrate 66 input.wav output.opus
celtenc input.48k.raw --bitrate 75 --comp 10 output.oga
qaac --cvbr 72 -o output.m4a input.wav
qaac --tvbr 27 -o output.m4a input.wav
opusenc --bitrate 90 input.wav output.opus
celtenc input.48k.raw --bitrate 100 --comp 10 output.oga
qaac --cvbr 96 -o output.m4a input.wav
qaac --tvbr 45 -o output.m4a input.wav

Samples:
20 sound samples of various genres, from easy to moderately critical.
http://zak.s206.xrea.com/bitratetest/main.htm
To download, follow the link above and use the 3rd to 6th links in the 2nd paragraph (40_30sec - Run up).

Hardware:
Sony PSP-3000 + RP-HT560 (1st), RP-HJE150 (2nd); I took the average of the two results.

Results:




Conclusions & Observations:
I could not detect a significant improvement in the new September 1st version of Opus over the old celtenc from 2011.
Possibly the new Opus inflates bitrate more than it improves quality, although the sample set contains easy samples.
At 75kbps, Opus/Celt are markedly better than AAC-LC. At 100kbps, there is no big difference between the codecs.
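
As a rough check of the first observation, the Opus and Celt scores at 75kbps can be compared sample by sample. The sketch below is not the analysis used for this test (the raw data is formatted for ff123's FRIEDMAN); it swaps in a plain paired t statistic as an assumption, and the variable names are made up for illustration. The scores are copied from the opus_75k and celt_75k columns of the raw data in this post.

```python
import math
import statistics

# Scores copied from the opus_75k and celt_75k columns of the raw data (20 samples).
opus_75k = [3.05, 3.75, 2.80, 2.70, 4.00, 2.60, 3.40, 3.45, 2.95, 3.10,
            3.35, 3.75, 3.55, 3.10, 3.40, 3.25, 3.60, 3.70, 3.10, 3.65]
celt_75k = [3.10, 2.95, 2.55, 3.15, 3.40, 2.55, 3.95, 3.50, 2.70, 3.40,
            3.10, 3.35, 3.30, 3.35, 3.45, 3.30, 3.80, 3.35, 3.60, 4.05]

# Per-sample difference (Opus minus Celt), since both codecs were rated on the same samples.
diffs = [o - c for o, c in zip(opus_75k, celt_75k)]

mean_diff = statistics.mean(diffs)
std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
t_stat = mean_diff / std_err

# Two-sided 5% critical value of Student's t for 19 degrees of freedom.
T_CRIT = 2.093

print(f"mean difference (Opus - Celt): {mean_diff:.4f}")
print(f"paired t statistic: {t_stat:.2f} (|t| > {T_CRIT} would be significant at 5%)")
```

With a single listener and 20 samples this is only a sanity check; a rank-based test such as Friedman's may be a better fit for ABC/HR scores.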

Raw data:
40 logs, plus the encoder and decoder logs
http://zak.s206.xrea.com/bitratetest/log_o...kbps100kbps.zip
CODE
% Opus, AAC 75kbps, 100kbps ABC/HR Score
% This format is compatible with my graphmaker, as well as ff123's FRIEDMAN.
opus_75k    celt_75k    cvbr_75k    tvbr_75k    opus100k    celt100k    cvbr100k    tvbr100k
%features 6 75kbps 75kbps 75kbps 75kbps 100kbps 100kbps 100kbps 100kbps
%features 7 OPUS OPUS AAC-LC AAC-LC OPUS OPUS AAC-LC AAC-LC
3.050    3.100    2.500    2.750    3.500    3.750    3.700    3.800    
3.750    2.950    2.700    2.750    4.050    3.800    4.000    3.950    
2.800    2.550    3.000    3.000    3.600    3.250    4.050    3.900    
2.700    3.150    2.350    2.300    3.350    3.800    3.600    3.700    
4.000    3.400    2.850    2.850    4.350    3.900    3.550    3.550    
2.600    2.550    2.800    2.800    3.350    3.150    3.950    3.900    
3.400    3.950    3.000    3.200    3.850    4.500    3.700    3.800    
3.450    3.500    2.900    2.800    3.850    4.050    4.050    4.150    
2.950    2.700    3.550    3.450    3.250    3.450    4.000    3.850    
3.100    3.400    2.750    2.600    3.800    3.850    4.150    4.000    
3.350    3.100    2.600    2.600    3.750    3.400    3.450    3.500    
3.750    3.350    2.800    2.950    4.050    3.750    3.800    3.850    
3.550    3.300    2.600    2.650    4.250    3.950    3.750    3.600    
3.100    3.350    2.750    2.550    3.650    3.700    3.850    3.800    
3.400    3.450    2.900    2.900    3.650    3.950    3.750    3.900    
3.250    3.300    2.750    2.800    3.650    3.850    3.950    3.750    
3.600    3.800    3.300    3.300    3.550    4.000    3.650    3.700    
3.700    3.350    3.300    3.300    3.900    3.650    4.100    4.000    
3.100    3.600    3.150    3.000    3.700    3.800    4.100    3.850    
3.650    4.050    3.000    2.900    4.050    4.250    3.750    3.550

Some scores fall on a 0.050 grid because I tested each sample twice and averaged the two results.
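
For readers who want a quick summary of the table above, here is a minimal sketch that parses the same text format and prints the mean score per column. The parser is hypothetical (it is neither ff123's FRIEDMAN nor my graphmaker) and assumes that lines starting with % are comments and that the first remaining line holds the column names.

```python
import statistics

# The score table from this post, copied as-is (the %features lines are omitted).
RAW = """\
% Opus, AAC 75kbps, 100kbps ABC/HR Score
opus_75k celt_75k cvbr_75k tvbr_75k opus100k celt100k cvbr100k tvbr100k
3.050 3.100 2.500 2.750 3.500 3.750 3.700 3.800
3.750 2.950 2.700 2.750 4.050 3.800 4.000 3.950
2.800 2.550 3.000 3.000 3.600 3.250 4.050 3.900
2.700 3.150 2.350 2.300 3.350 3.800 3.600 3.700
4.000 3.400 2.850 2.850 4.350 3.900 3.550 3.550
2.600 2.550 2.800 2.800 3.350 3.150 3.950 3.900
3.400 3.950 3.000 3.200 3.850 4.500 3.700 3.800
3.450 3.500 2.900 2.800 3.850 4.050 4.050 4.150
2.950 2.700 3.550 3.450 3.250 3.450 4.000 3.850
3.100 3.400 2.750 2.600 3.800 3.850 4.150 4.000
3.350 3.100 2.600 2.600 3.750 3.400 3.450 3.500
3.750 3.350 2.800 2.950 4.050 3.750 3.800 3.850
3.550 3.300 2.600 2.650 4.250 3.950 3.750 3.600
3.100 3.350 2.750 2.550 3.650 3.700 3.850 3.800
3.400 3.450 2.900 2.900 3.650 3.950 3.750 3.900
3.250 3.300 2.750 2.800 3.650 3.850 3.950 3.750
3.600 3.800 3.300 3.300 3.550 4.000 3.650 3.700
3.700 3.350 3.300 3.300 3.900 3.650 4.100 4.000
3.100 3.600 3.150 3.000 3.700 3.800 4.100 3.850
3.650 4.050 3.000 2.900 4.050 4.250 3.750 3.550
"""


def parse_scores(text):
    """Return (column names, rows of floats), skipping blank and % comment lines."""
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("%")]
    names = lines[0].split()
    rows = [[float(x) for x in ln.split()] for ln in lines[1:]]
    return names, rows


names, rows = parse_scores(RAW)

# zip(*rows) transposes the table so each codec/bitrate column can be averaged.
means = {name: statistics.mean(col) for name, col in zip(names, zip(*rows))}

for name in names:
    print(f"{name}: {means[name]:.3f}")
```

The means only summarize the table; they say nothing about statistical significance on their own.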
Kamedo2
post Nov 23 2012, 21:51
Post #2








I measured the average bitrate over a wide range of 15 ordinary music tracks (63 min).

opusenc --bitrate 66 input.wav output.opus
65.9kbps

celtenc input.48k.raw --bitrate 75 --comp 10 output.oga
73.4kbps

qaac --cvbr 72 -o output.m4a input.wav
75.2kbps

qaac --tvbr 27 -o output.m4a input.wav
66.0kbps

opusenc --bitrate 90 input.wav output.opus
88.9kbps

celtenc input.48k.raw --bitrate 100 --comp 10 output.oga
98.9kbps

qaac --cvbr 96 -o output.m4a input.wav
100.0kbps

qaac --tvbr 45 -o output.m4a input.wav
91.9kbps
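
One way to get a collection-wide average like the figures above is to divide the total encoded size by the total playing time. This is a minimal sketch under that assumption; the post does not say how the averages were actually measured. The function name and the three-track example (sizes and durations) are invented for illustration.

```python
def average_bitrate_kbps(sizes_bytes, durations_s):
    """Collection average: total bits divided by total seconds, in kbps.

    File sizes include container overhead, so the result is slightly
    higher than the pure audio bitrate.
    """
    total_bits = 8 * sum(sizes_bytes)
    total_seconds = sum(durations_s)
    return total_bits / total_seconds / 1000.0


# Hypothetical collection: encoded file sizes in bytes and durations in seconds.
sizes = [2_000_000, 1_500_000, 3_000_000]
durations = [240.0, 200.0, 360.0]

avg = average_bitrate_kbps(sizes, durations)

# For comparison: the unweighted mean of per-track bitrates, which ignores track length.
per_track = [8 * s / d / 1000.0 for s, d in zip(sizes, durations)]
unweighted = sum(per_track) / len(per_track)

print(f"collection average: {avg:.1f} kbps")
print(f"mean of per-track bitrates: {unweighted:.1f} kbps")
```

The duration-weighted figure is the one that matters for storage, because it is what determines how much space the whole collection takes.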

The 15 songs I used are, of course, different from the 20 samples I used in this listening test.
I understand the importance of calibration, but I'm still reluctant to use these values as the x-axis of the bitrate vs quality graph.

Imagine you make a scatter graph of x=height vs y=body weight over many people, and you plot Alice's height as x and Bob's weight as y. That's a chimera.
When I plot Charlie, I use Charlie's height and Charlie's weight. When I plot Dave, I use Dave's height and Dave's weight.
jmvalin
post Nov 24 2012, 19:49
Post #3


Xiph.org Speex developer


Group: Developer
Posts: 487
Joined: 21-August 02
Member No.: 3134



QUOTE (Kamedo2 @ Nov 23 2012, 15:51) *
The 15 songs I used are, of course, different from the 20 samples I used in this listening test.
I understand the importance of calibration, but I'm still reluctant to use these values as the x-axis of the bitrate vs quality graph.

Imagine you make a scatter graph of x=height vs y=body weight over many people, and you plot Alice's height as x and Bob's weight as y. That's a chimera.
When I plot Charlie, I use Charlie's height and Charlie's weight. When I plot Dave, I use Dave's height and Dave's weight.


The method certainly has drawbacks, but it's not as silly as you might think. Here's another way to reason about it. I have a music player with a fixed capacity (e.g. 8 GB) and I need to fit my entire music collection on it. I compute the average rate I can afford and use that to encode all songs. Now imagine I wanted to figure out which codec to use. I can apply that process with many different codecs, and then compare the codecs to see which one provides me with the best music quality. Ideally, I'd listen to each song, but that would take way too long. The alternative is to pick only a relatively small sample. I *could* pick my samples at random, but in practice, I probably want to bias my selection towards harder samples (because it's not OK to have 90% good files with 10% awful ones), while still having "normal" samples too. So I do the listening test on those samples and pick the best codec. When I do that, I have used the quality of a few samples as the y axis, with the rate over a large sample as the x axis. And it makes sense to do so.
Kamedo2
post Nov 25 2012, 15:34
Post #4








QUOTE (jmvalin @ Nov 25 2012, 03:49) *
QUOTE (Kamedo2 @ Nov 23 2012, 15:51) *
The 15 songs I used are, of course, different from the 20 samples I used in this listening test.
I understand the importance of calibration, but I'm still reluctant to use these values as the x-axis of the bitrate vs quality graph.

Imagine you make a scatter graph of x=height vs y=body weight over many people, and you plot Alice's height as x and Bob's weight as y. That's a chimera.
When I plot Charlie, I use Charlie's height and Charlie's weight. When I plot Dave, I use Dave's height and Dave's weight.


The method certainly has drawbacks, but it's not as silly as you might think. Here's another way to reason about it. I have a music player with a fixed capacity (e.g. 8 GB) and I need to fit my entire music collection on it. I compute the average rate I can afford and use that to encode all songs. Now imagine I wanted to figure out which codec to use. I can apply that process with many different codecs, and then compare the codecs to see which one provides me with the best music quality. Ideally, I'd listen to each song, but that would take way too long. The alternative is to pick only a relatively small sample. I *could* pick my samples at random, but in practice, I probably want to bias my selection towards harder samples (because it's not OK to have 90% good files with 10% awful ones), while still having "normal" samples too. So I do the listening test on those samples and pick the best codec. When I do that, I have used the quality of a few samples as the y axis, with the rate over a large sample as the x axis. And it makes sense to do so.


One of the main objectives of this test is to compare Opus with Celt.
Does VBR-enabled Opus offer even better efficiency than Celt? Or did the developers do something silly, so that efficiency isn't improved at all?

Let's assume the latter. Suppose Celt and Opus are two ideal encoders, CBR and VBR respectively, with exactly the same performance.
The ideal CBR encoder does not change the bitrate at all, so harder songs sit lower on a bitrate-vs-quality plot.
The ideal VBR encoder does not change the quality at all, so harder songs sit further to the right on the bitrate-vs-quality plot.
Take a normal music collection of 21 songs, each 4 minutes long. Assume we have a 63MB storage device, and we test which encoder offers better quality.
However, ABC/HRing the entire music collection is painstaking, so we omit the 8 easiest songs and test only the 13 harder songs.
We know the quality of only the 13 harder songs, but we know the bitrate of all 21, so we can use either the 13-song average or the 21-song average bitrate.
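
The numbers in this setup imply a specific average bitrate for the whole collection. A short worked sketch, assuming 1 MB = 10^6 bytes and ignoring container overhead (both are assumptions, not stated in the post):

```python
# Figures from the thought experiment: 21 songs, 4 minutes each, 63MB of storage.
songs = 21
minutes_per_song = 4
storage_bytes = 63 * 10**6  # assumption: 1 MB = 10^6 bytes

total_seconds = songs * minutes_per_song * 60

# Average bitrate the whole collection can afford, in kbps.
budget_kbps = storage_bytes * 8 / total_seconds / 1000

print(f"total playing time: {total_seconds} s")
print(f"affordable average bitrate: {budget_kbps:.1f} kbps")
```

Under these assumptions the budget works out to 100 kbps, which is the 21-song average that both the CBR and the VBR encoder would have to hit.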

Which bitrate should we use?


Remember, these two encoders have exactly the same performance.
Compare the CBR's consensus plot (red) with my plot (blue). They imply roughly the same performance.
Then compare the red plot with the dark blue plot. The same implication is not there, and the graph suggests that VBR is superior.
(Some may say VBR is superior because of its smaller quality spread; but this is an ideal encoder, and actual behavior may vary. Look at the error bars of the actual Opus!)
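
The thought experiment can also be played through numerically. This is only a toy sketch: the quality model, the difficulty values and all names are invented for illustration, and the 100 kbps budget assumes 63MB means 63 * 10^6 bytes spread over 21 four-minute songs. Both encoders share the same quality function, so they have identical performance by construction.

```python
import math


def quality(bitrate_kbps, difficulty):
    """Hypothetical 1-5 quality model shared by both encoders (an assumption)."""
    return 1.0 + 4.0 * (1.0 - math.exp(-bitrate_kbps / (50.0 * difficulty)))


def mean(values):
    return sum(values) / len(values)


BUDGET_KBPS = 100.0  # collection-wide average both encoders must hit

# 21 equal-length songs with invented difficulties from 0.5 (easy) to 1.5 (hard).
difficulties = [0.5 + 0.05 * i for i in range(21)]
mean_difficulty = mean(difficulties)

# Ideal CBR: same bitrate for every song, so quality varies with difficulty.
cbr_bitrates = [BUDGET_KBPS for _ in difficulties]
# Ideal VBR: bitrate proportional to difficulty, so quality is the same for every song.
vbr_bitrates = [BUDGET_KBPS * d / mean_difficulty for d in difficulties]

# Drop the 8 easiest songs and keep the 13 harder ones for the "listening test".
hard = sorted(range(len(difficulties)), key=lambda i: difficulties[i])[8:]

cbr_quality_hard = mean([quality(cbr_bitrates[i], difficulties[i]) for i in hard])
vbr_quality_hard = mean([quality(vbr_bitrates[i], difficulties[i]) for i in hard])
vbr_rate_all = mean(vbr_bitrates)                  # 21-song average bitrate
vbr_rate_hard = mean([vbr_bitrates[i] for i in hard])  # 13-song average bitrate

print(f"CBR point: x={BUDGET_KBPS:.1f} kbps, y={cbr_quality_hard:.2f}")
print(f"VBR point, 21-song bitrate: x={vbr_rate_all:.1f} kbps, y={vbr_quality_hard:.2f}")
print(f"VBR point, 13-song bitrate: x={vbr_rate_hard:.1f} kbps, y={vbr_quality_hard:.2f}")
```

In this toy model the VBR point plotted at the 21-song bitrate sits above the CBR point at the same x value, even though the two encoders are identical; plotting it at the 13-song bitrate moves it to the right, where a higher score is expected anyway.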

I think the emphasis on bitrate calibration at hydrogenaudio is well deserved, and I respect it. From now on, if I repeat the same test, I'll calibrate the bitrate on a larger collection so that the bitrates are roughly the same. That's what people really want to know.

I'm going to use the bitrate of the sample set itself only when the problem explained above can actually happen: that is, when hard samples are overemphasized and CBR codecs are involved.


