Personal Listening Test of Opus, Celt, AAC at 75-100kbps, ABC/HR blind test, 1 Listener
Kamedo2
post Nov 17 2012, 09:25
Post #1
Group: Members
Posts: 219
Joined: 16-November 12
From: Kyoto, Japan
Member No.: 104567

Abstract:
A blind comparison of the new 2012/09 Opusenc (tfsel5), the old Celtenc 0.11.2, and Apple AAC-LC in tvbr and cvbr modes.
This is an English version of my original post in Japanese: http://d.hatena.ne.jp/kamedo2/20121116/1353099244#seemore

Encoders:
libopus 0.9.11-146-gdc4f83b-exp_analysis
https://people.xiph.org/~greg/opus-tools_exp_dc4f83be.zip
celt-0.11.2-win32
https://people.xiph.org/~greg/celt-0.11.2-win32.zip
qaac 1.40

Settings:
opusenc --bitrate 66 input.wav output.wav
celtenc input.48k.raw --bitrate 75 --comp 10 output.wav
qaac --cvbr 72 -o output.m4a input.wav
qaac --tvbr 27 -o output.m4a input.wav
opusenc --bitrate 90 input.wav output.wav
celtenc input.48k.raw --bitrate 100 --comp 10 output.wav
qaac --cvbr 96 -o output.m4a input.wav
qaac --tvbr 45 -o output.m4a input.wav
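
For reference, here is a minimal batch-encoding sketch (my own Python helper, not part of the original test): the command lines are taken from the Settings above, while the directory layout, file naming, and output extensions are assumptions.
CODE
import subprocess
from pathlib import Path

# One command template per test condition, copied from the Settings list.
# "{wav}", "{raw}" and "{stem}" are hypothetical placeholders filled in below.
CONDITIONS = {
    "opus_75k": "opusenc --bitrate 66 {wav} {stem}_opus75.opus",
    "celt_75k": "celtenc {raw} --bitrate 75 --comp 10 {stem}_celt75.oga",
    "cvbr_75k": "qaac --cvbr 72 -o {stem}_cvbr75.m4a {wav}",
    "tvbr_75k": "qaac --tvbr 27 -o {stem}_tvbr75.m4a {wav}",
    "opus100k": "opusenc --bitrate 90 {wav} {stem}_opus100.opus",
    "celt100k": "celtenc {raw} --bitrate 100 --comp 10 {stem}_celt100.oga",
    "cvbr100k": "qaac --cvbr 96 -o {stem}_cvbr100.m4a {wav}",
    "tvbr100k": "qaac --tvbr 45 -o {stem}_tvbr100.m4a {wav}",
}

for wav in Path("samples").glob("*.wav"):
    stem = str(wav.with_suffix(""))       # e.g. samples/track01
    raw = stem + ".48k.raw"               # celtenc wants headerless 48 kHz PCM
    for name, template in CONDITIONS.items():
        cmd = template.format(wav=wav, stem=stem, raw=raw)
        subprocess.run(cmd.split(), check=True)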

Samples:
20 sounds of various genres, from easy to modestly critical.
http://zak.s206.xrea.com/bitratetest/main.htm
To download, follow the link above; the samples are the 3rd-6th links in the 2nd paragraph (40_30sec - Run up).

Hardware:
Sony PSP-3000 with RP-HT560 (1st run) and RP-HJE150 (2nd run); I took the average of the two results.

Results:

[results graph]

Conclusions & Observations:
I could not detect a significant improvement of the new September 2012 build of Opus over the old 2011 Celtenc.
This is possibly because the new Opus inflates bitrates more than it improves quality, even though the sample set contains easy samples.
At 75 kbps, Opus/Celt are markedly better than AAC. At 100 kbps, there is no big difference between these codecs.

Raw data:
The 40 listening logs, plus the encoder and decoder logs:
http://zak.s206.xrea.com/bitratetest/log_o...kbps100kbps.zip
CODE
% Opus, AAC 75kbps, 100kbps ABC/HR Score
% This format is compatible with my graphmaker, as well as ff123's FRIEDMAN.
opus_75k    celt_75k    cvbr_75k    tvbr_75k    opus100k    celt100k    cvbr100k    tvbr100k
%features 6 75kbps 75kbps 75kbps 75kbps 100kbps 100kbps 100kbps 100kbps
%features 7 OPUS OPUS AAC-LC AAC-LC OPUS OPUS AAC-LC AAC-LC
3.050    3.100    2.500    2.750    3.500    3.750    3.700    3.800    
3.750    2.950    2.700    2.750    4.050    3.800    4.000    3.950    
2.800    2.550    3.000    3.000    3.600    3.250    4.050    3.900    
2.700    3.150    2.350    2.300    3.350    3.800    3.600    3.700    
4.000    3.400    2.850    2.850    4.350    3.900    3.550    3.550    
2.600    2.550    2.800    2.800    3.350    3.150    3.950    3.900    
3.400    3.950    3.000    3.200    3.850    4.500    3.700    3.800    
3.450    3.500    2.900    2.800    3.850    4.050    4.050    4.150    
2.950    2.700    3.550    3.450    3.250    3.450    4.000    3.850    
3.100    3.400    2.750    2.600    3.800    3.850    4.150    4.000    
3.350    3.100    2.600    2.600    3.750    3.400    3.450    3.500    
3.750    3.350    2.800    2.950    4.050    3.750    3.800    3.850    
3.550    3.300    2.600    2.650    4.250    3.950    3.750    3.600    
3.100    3.350    2.750    2.550    3.650    3.700    3.850    3.800    
3.400    3.450    2.900    2.900    3.650    3.950    3.750    3.900    
3.250    3.300    2.750    2.800    3.650    3.850    3.950    3.750    
3.600    3.800    3.300    3.300    3.550    4.000    3.650    3.700    
3.700    3.350    3.300    3.300    3.900    3.650    4.100    4.000    
3.100    3.600    3.150    3.000    3.700    3.800    4.100    3.850    
3.650    4.050    3.000    2.900    4.050    4.250    3.750    3.550

It is not strange that some scores fall on a 0.050 grid: I tested each piece of music twice and averaged the two results.
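
Since the block above is in a FRIEDMAN-compatible format, the per-condition means and a Friedman test can be recomputed directly from it. A minimal Python sketch (the file name "scores.txt" and the use of NumPy/SciPy are my assumptions, not part of the original analysis):
CODE
import numpy as np
from scipy.stats import friedmanchisquare

names, rows = [], []
with open("scores.txt") as f:              # the table above, saved as-is
    for line in f:
        line = line.strip()
        if not line or line.startswith("%"):
            continue                       # skip comment and %features lines
        fields = line.split()
        if fields[0][0].isalpha():
            names = fields                 # header row of condition names
        else:
            rows.append([float(x) for x in fields])

scores = np.array(rows)                    # 20 samples x 8 conditions
for name, mean in zip(names, scores.mean(axis=0)):
    print(f"{name}: {mean:.3f}")

stat, p = friedmanchisquare(*scores.T)     # conditions as related groups
print(f"Friedman chi-squared = {stat:.2f}, p = {p:.4f}")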
Dynamic
post Nov 22 2012, 19:03
Post #2
Group: Members
Posts: 812
Joined: 17-September 06
Member No.: 35307

I think the objectives of tests (experiments) matter, and the questions they aim to answer should determine the primary graphs produced. Scatter plots of the actual bitrates of the test samples are often not important to the question being asked, but they may be interesting secondary information.

I suspect that for a lot of consumers, bitrate really amounts to a proxy for "How much music can I store on my device?" or "How much space can I save on my music to take photos on my smartphone?".

Instantaneous bitrate or even bitrate over a whole song isn't that important to them.

They may also ask: "What's the most music I can store on my device at a reasonably good quality?"

The meaning of reasonably good quality will vary.

For some, perceptible but not annoying differences may be tolerable much of the time, with something mildly annoying and unmusical noticed just once or twice every few hours.
For others, the lower limit is entire transparency most of the time, with a difference that is perceptible but not annoying noticed once or twice every few hours.

These are just two examples from the range of possible requirements.

For example, suppose we are testing 160 encoded variants (e.g. 20 samples over 2 bitrates of 4 encoders, or 10 samples over 4 bitrates of 4 encoders). Two questions could be asked:

Question 1:
At the same bitrate over a general representative collection (thus being able to store the same amount of music on my device), which encoder offers the best quality?

(This is the question your tests are roughly set up to answer)

Question 2:
At the same bitrate over a general representative collection (thus being able to store the same amount of music on my device), what typical quality can we expect from each encoder (ignoring problem samples unless they account for 5% or more of general music)?

(This is the question 'some people' seem to want answered)

Each question might warrant a different test:

Question 1 might warrant a large number of problem samples of various classes, to make rare annoyances easy to detect and to penalize encoders that deal with problem samples worse. This might be best for people who wish to avoid occasional annoying and unmusical artifacts.

Question 2 might warrant a collection of normal samples, not known to be problematic, to get a representative idea of typical quality, giving people an idea of which bitrates might suit them. This might be best for people who are more forgiving of occasional artifacts, but who would start to penalize any encoder that exposes artifacts frequently rather than rarely. It might also be sensible to include the same 'typical' samples over a range of a few bitrates, and to limit the number of samples in the test corpus accordingly (e.g. make it 10 samples but test over 4 bitrates per encoder).

There are also reasons of academic interest, of limited practical use to the generic listener, where you'd really want to see a scatter plot: to learn how variable VBR is in each encoder, or to compare quality against actual bitrate and gauge the bitrate-efficiency of the coding tools available in a particular format. Even then, the length of the sample may have a big influence on the reported bitrate. For example, a 2-second passage requiring lots of high-bitrate short blocks within a 30-second clip of otherwise ordinary sound will exhibit a lower average bitrate than the same 2 seconds within a 5-second clip, even though the instantaneous bitrate of the difficult part is the same. Then again, there are some samples, like fatboy, where the problem occurs almost throughout the song (Kalifornia by Fatboy Slim).
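
To put rough numbers on that clip-length effect, here is a small sketch (the 256 kbps and 96 kbps figures are illustrative assumptions, not measurements):
CODE
# Average bitrate of a clip that contains one hard passage.
def average_bitrate(hard_s, hard_kbps, total_s, easy_kbps):
    easy_s = total_s - hard_s
    return (hard_s * hard_kbps + easy_s * easy_kbps) / total_s

# A 2 s passage needing 256 kbps inside otherwise 96 kbps material:
print(average_bitrate(2, 256, 30, 96))   # 30 s clip -> ~106.7 kbps
print(average_bitrate(2, 256, 5, 96))    #  5 s clip -> 160.0 kbps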

In a sense, it ought to be made clear to the average listener that the average bitrate, not the scattered per-sample bitrates, is what matters for the question of "How much music can I fit on my device?" or "How much space will be left on my phone's microSD card for taking photos and videos?". On the quality axis, the average quality has some importance, but the spread of quality values may also matter (less spread usually being good), as does the lowest quality reported, if you will be rather unforgiving of a really nasty artifact. In my case, it's the 'birdies' and 'warbling' of bad MP3 encoders that drive me crazy.

I think Question 1 is where you're leaning when you say that changing the type of sample changes the quality (easier samples give higher quality) even for VBR.
I think Question 2 is the sort of listening test 'some people' have asked you for.

Where many of us at Hydrogen Audio differ from you is in our belief that average bitrate over a collection of CDs is the important bitrate calibration even for Question 1.

When making an experiment to test Question 1, we use hard samples to differentiate encoders more easily and find out which is best. At the same time, we realise that the reported quality represents only the rare problem cases, and is therefore penalised relative to normal music, so we wouldn't claim to have useful information about the general quality that can be expected.

Question 2 is actually quite hard to test, partly because the Quality is often very close to transparent (5.0) especially at around 96-100 kbps on recent encoders with easy samples.

I guess a useful graph for a single codec and mode (e.g. Opus in VBR mode) could be one that shows Quality (y) versus Average Bitrate (x), plotted as two different lines.
The upper line (blue) would be general music quality (based on 5 to 20 samples of normal music of various genres). This might start rather low at 32 kbps, increase at 48 kbps, get pretty high at 64 kbps and reach 5.0 at 96 kbps.
The lower line (red) would be 'problem sample quality', where a collection of typical codec-killers is used (fatboy, tomsdiner, eig, etc.). This might start terribly low at 32 kbps, still be pretty bad at 48 kbps, get a bit better at 64 kbps and reach something like 4.0 at 96 kbps; if extended to, say, 128 kbps, it might get quite close to 5.0.

The lines could also be fit lines, with quality scatter points above and below them to give an indication of the spread of quality for general samples (blue) and for problem samples (red).
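
A minimal matplotlib sketch of that graph; the y-values below are only the rough figures suggested above, not measured data:
CODE
import matplotlib.pyplot as plt

kbps = [32, 48, 64, 96, 128]
general = [2.5, 3.5, 4.5, 5.0, 5.0]   # blue: normal music (illustrative)
problem = [1.2, 2.0, 3.0, 4.0, 4.8]   # red: codec-killers (illustrative)

plt.plot(kbps, general, "b-o", label="general music")
plt.plot(kbps, problem, "r-o", label="problem samples")
plt.xlabel("average bitrate (kbps)")
plt.ylabel("quality (ABC/HR score)")
plt.ylim(1.0, 5.2)
plt.legend()
plt.show()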

The results of such a test might be quite informative, especially when expressed as total space occupied by your music or total music duration per amount of storage space (e.g. hours per gigabyte, or gigabytes per 10 hours) rather than as bitrate.

Kamedo2
post Nov 22 2012, 22:42
Post #3
Group: Members
Posts: 219
Joined: 16-November 12
From: Kyoto, Japan
Member No.: 104567

QUOTE (Dynamic @ Nov 23 2012, 03:03)
Question 1:
At the same bitrate over a general representative collection (thus being able to store the same amount of music on my device), which encoder offers the best quality?

(This is the question your tests are roughly set up to answer)

This is relatively easy to answer. If occasional problem samples matter to you and you want to avoid ugly artifacts, this test tells you how many of these exceptionally bad moments exist.

QUOTE (Dynamic @ Nov 23 2012, 03:03)
Question 2:
At the same bitrate over a general representative collection (thus being able to store the same amount of music on my device), what typical quality can we expect from each encoder (ignoring problem samples unless they account for 5% or more of general music).

(This is the question 'some people' seem to want answered)

The problem with using a general representative collection is that I don't know the quality of the collection. I may know in the future, after 3, 4, or 5 months of listening tests, but not now. Don't expect me to know it yet.
But something similar can be done. I removed the problem samples, namely finalfantasy (harpsichord), FloorEssence (techno), VelvetRealm (techno, sharp attacks), and Tom's Diner (female a cappella), and replotted the bitrate vs. score graph for the remaining 16 samples.
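
Reusing the "scores" array and "names" list from the parsing sketch in the first post, dropping the problem samples is just a matter of removing their rows before averaging (the row indices below are hypothetical placeholders, since the posted table does not label its rows):
CODE
import numpy as np

PROBLEM_ROWS = [3, 7, 11, 15]            # placeholder indices, not from the post
mask = np.ones(scores.shape[0], dtype=bool)
mask[PROBLEM_ROWS] = False
subset = scores[mask]                    # the 16 remaining samples
for name, mean in zip(names, subset.mean(axis=0)):
    print(f"{name}: {mean:.3f}")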
[replotted bitrate vs. score graph, 16 samples]
Hope it helps.