Personal Listening Test of MP3 encoders at 224kbps, ABC/HR blind test, 1 Listener
Kamedo2
post May 16 2013, 17:59
Post #1





Group: Members
Posts: 220
Joined: 16-November 12
From: Kyoto, Japan
Member No.: 104567



Abstract:
Blind Comparison between Lame 3.100i V2+, Lame 3.99 V1, LAME 3.98 CBR 224kbps -q 0 , Helix -V146, BladeEnc CBR 224kbps(low anchor).

Encoders:
LAME 3.100i
http://www.hydrogenaudio.org/forums/index....showtopic=99483
LAME 3.99.5 VBR V1
http://www.rarewares.org/mp3-lame-bundle.php
LAME 3.98.4 CBR 224kbps -q 0(slowest)
Helix mp3enc v5.1 Open Source encoder 2005-12-20 -V146
http://www.rarewares.org/mp3-others.php
BladeEnc 0.94.2 CBR 224kbps (low anchor)

Settings:
lame3100i -S -V2+ input.wav output.mp3
lame3.99.5 -S -V1 input.wav output.mp3
lame3.98.4 -S -q 0 -b 224 input.wav output.mp3
hmp3 input.wav output.mp3 -X2 -U2 -V146
bladeenc -quit -nocfg input.wav output.mp3 -224
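If you want to repeat this on your own samples, the five command lines above can be generated per file with a small script. This is only a sketch: the binary names and flags are copied from the settings above, but the output naming scheme and the `encoder_commands` helper are my own assumptions, and the script only prints the commands rather than running them.

```python
import shlex
from pathlib import Path


def encoder_commands(wav, out_dir="encoded"):
    """Build one argv list per encoder, mirroring the settings listed above."""
    stem = Path(wav).stem

    def out(tag):
        # Hypothetical naming scheme: <out_dir>/<sample>.<tag>.mp3
        return str(Path(out_dir) / f"{stem}.{tag}.mp3")

    return {
        "3.100iV2+": ["lame3100i", "-S", "-V2+", wav, out("3100i")],
        "3.99V1": ["lame3.99.5", "-S", "-V1", wav, out("399")],
        "3.98CBR": ["lame3.98.4", "-S", "-q", "0", "-b", "224", wav, out("398")],
        "Helix-V146": ["hmp3", wav, out("helix"), "-X2", "-U2", "-V146"],
        "BladeEncCBR": ["bladeenc", "-quit", "-nocfg", wav, out("blade"), "-224"],
    }


if __name__ == "__main__":
    # Print the commands instead of running them; each list can be passed to
    # subprocess.run(argv, check=True) once the binaries are on your PATH.
    for label, argv in encoder_commands("sample01.wav").items():
        print(f"{label:12s} {shlex.join(argv)}")
```

Keeping the arguments as lists avoids shell quoting problems with sample names that contain spaces.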

Samples:
25 Sounds of various genres.

Hardware:
Sony PSP-3000 + RP-HT560.

Results:



Conclusions & Observations:
I could not find a significant difference except for the low anchor. There are no big differences in the average quality of the other four encoders.

ANOVA analysis:
CODE
FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
Blocked ANOVA analysis

Number of listeners: 25
Critical significance: 0.05
Significance of data: 0.00E+000 (highly significant)
---------------------------------------------------------------
ANOVA Table for Randomized Block Designs Using Ratings

Source of Degrees Sum of Mean
variation of Freedom squares Square F p

Total 124 23.56
Testers (blocks) 24 7.75
Codecs eval'd 4 9.24 2.31 33.80 0.00E+000
Error 96 6.56 0.07
---------------------------------------------------------------
Fisher's protected LSD for ANOVA: 0.147

Means:

Helix-V1 3.100iV2 3.98CBR 3.99V1 BladeEnc
4.60 4.58 4.57 4.54 3.90

---------------------------- p-value Matrix ---------------------------

3.100iV2 3.98CBR 3.99V1 BladeEnc
Helix-V1 0.829 0.706 0.419 0.000*
3.100iV2 0.871 0.553 0.000*
3.98CBR 0.666 0.000*
3.99V1 0.000*
-----------------------------------------------------------------------

Helix-V146 is better than BladeEncCBR
3.100iV2+ is better than BladeEncCBR
3.98CBR is better than BladeEncCBR
3.99V1 is better than BladeEncCBR

Raw data:
CODE
% MP3 224kbps ABC/HR Score
% This format is compatible with my graphmaker, as well as ff123's FRIEDMAN.
3.100iV2+ 3.99V1 3.98CBR Helix-V146 BladeEncCBR
%feature 7 LAME LAME LAME Other Other
4.700 4.600 4.000 4.300 3.100
4.300 4.200 4.600 4.800 3.800
4.500 4.500 4.400 5.000 4.700
4.800 5.000 4.600 5.000 4.300
4.700 4.500 4.200 4.400 3.500
4.700 4.300 5.000 4.600 4.500
4.400 5.000 3.800 4.700 3.900
4.200 4.500 4.400 4.500 3.800
4.300 4.200 4.000 4.500 3.200
4.400 4.300 5.000 4.600 3.400
4.000 4.300 4.500 4.600 3.500
4.500 4.200 4.400 4.600 3.600
4.200 4.500 5.000 4.700 4.000
4.300 4.100 4.300 4.100 3.500
4.200 4.200 4.400 4.600 3.900
5.000 4.500 5.000 5.000 4.100
5.000 4.300 4.700 4.400 4.000
4.500 4.400 4.200 4.000 3.200
5.000 5.000 5.000 4.500 4.400
5.000 5.000 4.700 5.000 4.400
5.000 4.800 4.800 4.600 4.200
4.700 5.000 5.000 4.500 3.900
4.800 5.000 5.000 4.600 4.200
5.000 5.000 4.700 5.000 4.200
4.400 4.100 4.600 4.400 4.100
%samples 41_30sec hihats
%samples finalfantasy cemb
%samples ATrain Jazz
%samples BigYellow Pops
%samples FloorEssence Techno
%samples macabre orch
%samples mybloodrusts guitar
%samples Quizas Latin
%samples VelvetRealm Techno
%samples Amefuribana Pops
%samples Trust Gospel
%samples Waiting Rock
%samples Experiencia Latin
%samples Heart to Heart Pops
%samples Tom's Diner Vocal
%samples Reunion Blues Jazz
%samples French Speech
%samples undelete Pops
%samples Dimmu Borgir Metal
%samples Run up Pops
%samples German Speech
%samples ItCouldBeSweet Pops
%samples OnTheRoofWith Pops
%samples easy game Pops
%samples Tears Infection Pops
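For anyone who wants to check the numbers without ff123's FRIEDMAN tool, the blocked ANOVA can be redone in a few lines of plain Python from the raw scores above. This is a sketch, not the FRIEDMAN source: each sample is treated as a block, and the t critical value `T_CRIT` is my assumption, read from a standard t table (two-sided 5%, 96 degrees of freedom) rather than computed.

```python
# Raw ABC/HR scores, one row per sample, copied from the table above.
# Column order: 3.100iV2+  3.99V1  3.98CBR  Helix-V146  BladeEncCBR
RAW = """
4.700 4.600 4.000 4.300 3.100
4.300 4.200 4.600 4.800 3.800
4.500 4.500 4.400 5.000 4.700
4.800 5.000 4.600 5.000 4.300
4.700 4.500 4.200 4.400 3.500
4.700 4.300 5.000 4.600 4.500
4.400 5.000 3.800 4.700 3.900
4.200 4.500 4.400 4.500 3.800
4.300 4.200 4.000 4.500 3.200
4.400 4.300 5.000 4.600 3.400
4.000 4.300 4.500 4.600 3.500
4.500 4.200 4.400 4.600 3.600
4.200 4.500 5.000 4.700 4.000
4.300 4.100 4.300 4.100 3.500
4.200 4.200 4.400 4.600 3.900
5.000 4.500 5.000 5.000 4.100
5.000 4.300 4.700 4.400 4.000
4.500 4.400 4.200 4.000 3.200
5.000 5.000 5.000 4.500 4.400
5.000 5.000 4.700 5.000 4.400
5.000 4.800 4.800 4.600 4.200
4.700 5.000 5.000 4.500 3.900
4.800 5.000 5.000 4.600 4.200
5.000 5.000 4.700 5.000 4.200
4.400 4.100 4.600 4.400 4.100
"""
NAMES = ["3.100iV2+", "3.99V1", "3.98CBR", "Helix-V146", "BladeEncCBR"]

rows = [[float(x) for x in line.split()] for line in RAW.strip().splitlines()]
n, k = len(rows), len(NAMES)  # 25 blocks (samples), 5 codecs

grand = sum(map(sum, rows)) / (n * k)
col_means = [sum(r[j] for r in rows) / n for j in range(k)]
row_means = [sum(r) / k for r in rows]

# Sums of squares for a randomized block design.
ss_total = sum((x - grand) ** 2 for r in rows for x in r)
ss_codec = n * sum((m - grand) ** 2 for m in col_means)
ss_block = k * sum((m - grand) ** 2 for m in row_means)
ss_error = ss_total - ss_codec - ss_block

df_codec, df_error = k - 1, (k - 1) * (n - 1)
ms_error = ss_error / df_error
f_stat = (ss_codec / df_codec) / ms_error

T_CRIT = 1.985  # assumption: two-sided 5% Student t for 96 df, from a table
lsd = T_CRIT * (2 * ms_error / n) ** 0.5  # Fisher's LSD between two means

for name, m in zip(NAMES, col_means):
    print(f"{name:12s} {m:.2f}")
print(f"F = {f_stat:.2f}, LSD = {lsd:.3f}")
```

Two codecs differ significantly at the 5% level when their means are further apart than the LSD, which here is only the case for BladeEnc against each of the other four.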

Bitrates:
CODE
259222 250962 224500 270110 224109
212206 190894 224404 200702 224012
210626 223963 224651 248472 224040
256744 246869 224559 243848 224081
272211 268745 224645 225813 224060
212126 222561 224771 234717 224101
227229 274100 224802 226008 224228
252353 237478 224475 243034 224060
264467 270433 225449 245293 224317
230309 214030 224517 219325 224051
243315 240742 224427 240749 224024
232944 226612 224709 251829 224129
256994 236299 224619 229343 224034
245990 237097 224533 243966 224121
220848 204723 224825 225298 224235
226596 224930 224500 248784 224110
274433 235266 224408 176326 224012
240666 234456 224458 230376 224032
218750 222924 224796 232772 224208
234844 237687 224774 229799 224189
210745 167946 224583 104856 224026
219320 180893 224796 161194 224211
211124 209905 224500 200196 224110
214539 204183 224648 182829 224167
226791 224161 224673 215760 224121
average:
235016 227514 224641 221256 224112
Gainless
post May 16 2013, 19:46
Post #2





Group: Members
Posts: 173
Joined: 28-October 11
Member No.: 94764



Thanks for the work, nice to have such a detailed comparison. Did you use the -HF switch with the Helix encoder, btw?
halb27
post May 16 2013, 20:26
Post #3





Group: Members
Posts: 2439
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015



It must have been a hard test, thank you very much.
Very interesting result. Judging from this, using 3.100i -V2+ isn't very useful compared to using -V1.

This post has been edited by halb27: May 16 2013, 20:32


--------------------
lame3100m -V1 --insane-factor 0.75
Kamedo2
post May 17 2013, 19:04
Post #4





Group: Members
Posts: 220
Joined: 16-November 12
From: Kyoto, Japan
Member No.: 104567



QUOTE (Gainless @ May 17 2013, 03:46) *
Thanks for the work, nice to have such a detailed comparison. Did you use the -HF switch with the Helix encoder, btw?

hmp3 input.wav output.mp3 -X2 -U2 -V146
I didn't use the -HF switch, as the default option is typically the most recommended option by the developer(s).
But it may be interesting to test -HF, along with 3.100 alpha2, and 3.99.5f. I won't start testing them now because I'll be very busy until June, and the rainy season starts in June.
halb27
post May 17 2013, 19:24
Post #5





Group: Members
Posts: 2439
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015



I would very much welcome it if you could test Lame3.100 alpha2 as well. In your current test lame3.100i is compared against 3.99.5, which is not the same basis (though I don't think things will change essentially; I even expect 3.100 alpha2 to come out a little bit better than 3.99.5 does).


--------------------
lame3100m -V1 --insane-factor 0.75
greynol
post May 17 2013, 19:39
Post #6





Group: Super Moderator
Posts: 10040
Joined: 1-April 04
From: San Francisco
Member No.: 13167



It seems the basis here is ~224 kbits. If there is a desire to determine if there are advantages between Lame versions using the same V level, run a new test and present the results in a new discussion rather than attempt to co-opt this one.


--------------------
Your eyes cannot hear.
halb27
post May 17 2013, 19:45
Post #7





Group: Members
Posts: 2439
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015



???
I think it's interesting to see how 3.100a2 -V1 compares against 3.99.5 -V1 in its own right.
Sure I'm interested to see how 3.100i -V2+ compares against 3.100a2 -V1 (because the underlying basis is the same - except for the -V level of course).


--------------------
lame3100m -V1 --insane-factor 0.75
greynol
post May 17 2013, 19:54
Post #8





Group: Super Moderator
Posts: 10040
Joined: 1-April 04
From: San Francisco
Member No.: 13167



Neither of those is on-topic. See TOS #5 and #7 if you have any questions. Further posts on the matter will be binned.


--------------------
Your eyes cannot hear.
Destroid
post May 18 2013, 01:06
Post #9





Group: Members
Posts: 554
Joined: 4-June 02
Member No.: 2220



Astounding listening test, and quite interesting. I am a long-time fan of HMP3 (a great time saver on a hum-drum machine).
QUOTE (Kamedo2 @ May 17 2013, 18:04) *
I didn't use the -HF switch, as the default option is typically the most recommended option by the developer(s).
But it may be interesting to test -HF, along with 3.100 alpha2, and 3.99.5f.
I wanted to add that simply adding -HF1 / -HF2 will shift the bitrate slightly higher. Here's my quick test results (optional remarks follow):
CODE
-X2 -U2 -B146 ....... ~234kbps
-X2 -U2 -B146 -HF1 .. ~235kbps
-X2 -U2 -B145 -HF1 .. ~233kbps
-X2 -U2 -B143 -HF2 .. ~234kbps
-X2 -U2 -B142 -HF2 .. ~233kbps

Note: all these bitrates are above 224 kbps,
I am just going with the OP's switches.

Note2: Spectrogram observations of HMP3 regarding bitrate increase
(NOT a quality metric)
- Without -HF the material is clearly cut-off above 16kHz;
- Using -HF1 looks similar to LAME 3.99 -V2/-V3 (some material encoded >16kHz);
- Using -HF2 encodes like LAME 3.99 -V0/-V1 (gradual roll-off between 16-20kHz).
With regard to the 16kHz cutoff and -HF switches, you can see from the OP that the results are not as dramatic as some may believe :)


This post has been edited by Destroid: May 18 2013, 01:13


--------------------
"Something bothering you, Mister Spock?"
IgorC
post May 19 2013, 04:10
Post #10





Group: Members
Posts: 1577
Joined: 3-January 05
From: ARG/RUS
Member No.: 18803



Kamedo2,
Thank you very much for sharing this test here. Great one! :)
I have followed your tests for some time, and it's clear to me why you used CBR for LAME 3.98.4: it performs better. http://d.hatena.ne.jp/kamedo2/20111214

I was reading your test and trying to process the information. Here are my thoughts.
Well, except for the low anchor, all encoders are on par. But some additional analysis would be useful to draw some extra conclusions.
  • The lowest score per encoder.
    All individual scores are >= 4.0 per sample, except for 3.98.4 CBR. That's where CBR fails, imo. I would prefer a slightly lower average score as long as the score for each particular sample stays at 4.0 or above. Yes, it's only one sample where 3.98.4 CBR did worse than 4.0, but it also does the same in your previous test http://d.hatena.ne.jp/kamedo2/20111214
  • It's hardly a coincidence that the Helix MP3 encoder ends up with a slightly higher score each time, as here http://listening-tests.hydrogenaudio.org/s...8-1/results.htm and http://www.hydrogenaudio.org/forums/index....st&p=808142 . The Helix encoder is 7 years old and it still shines.
  • All average scores are >4.5 (except the low anchor) and you are an experienced listener. It means these encoders will be transparent for an average listener.
  • halb27's 3.100i V2+ looks good. It has a slightly higher average score than the other LAME encoders (though the difference is not statistically significant) and all its individual scores are higher than or equal to 4.0.
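The lowest-score check is easy to redo from the raw data in the first post. A rough sketch follows; the scores are copied column by column from that table, and the 4.0 threshold is simply the one used in the bullets above.

```python
# Scores per encoder, copied column by column from the raw data in post #1.
SCORES = {
    "3.100iV2+": [4.7, 4.3, 4.5, 4.8, 4.7, 4.7, 4.4, 4.2, 4.3, 4.4, 4.0, 4.5, 4.2,
                  4.3, 4.2, 5.0, 5.0, 4.5, 5.0, 5.0, 5.0, 4.7, 4.8, 5.0, 4.4],
    "3.99V1": [4.6, 4.2, 4.5, 5.0, 4.5, 4.3, 5.0, 4.5, 4.2, 4.3, 4.3, 4.2, 4.5,
               4.1, 4.2, 4.5, 4.3, 4.4, 5.0, 5.0, 4.8, 5.0, 5.0, 5.0, 4.1],
    "3.98CBR": [4.0, 4.6, 4.4, 4.6, 4.2, 5.0, 3.8, 4.4, 4.0, 5.0, 4.5, 4.4, 5.0,
                4.3, 4.4, 5.0, 4.7, 4.2, 5.0, 4.7, 4.8, 5.0, 5.0, 4.7, 4.6],
    "Helix-V146": [4.3, 4.8, 5.0, 5.0, 4.4, 4.6, 4.7, 4.5, 4.5, 4.6, 4.6, 4.6, 4.7,
                   4.1, 4.6, 5.0, 4.4, 4.0, 4.5, 5.0, 4.6, 4.5, 4.6, 5.0, 4.4],
    "BladeEncCBR": [3.1, 3.8, 4.7, 4.3, 3.5, 4.5, 3.9, 3.8, 3.2, 3.4, 3.5, 3.6, 4.0,
                    3.5, 3.9, 4.1, 4.0, 3.2, 4.4, 4.4, 4.2, 3.9, 4.2, 4.2, 4.1],
}

THRESHOLD = 4.0  # the per-sample floor discussed above

lowest = {name: min(scores) for name, scores in SCORES.items()}
below = {name: sum(s < THRESHOLD for s in scores) for name, scores in SCORES.items()}

for name in SCORES:
    print(f"{name:12s} lowest={lowest[name]:.1f}  samples below {THRESHOLD}: {below[name]}")
```

Among the four contenders only 3.98 CBR dips below 4.0, on a single sample (3.8).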
Kamedo2
post May 19 2013, 07:07
Post #11





Group: Members
Posts: 220
Joined: 16-November 12
From: Kyoto, Japan
Member No.: 104567



QUOTE (IgorC @ May 19 2013, 12:10) *

Yes, my thoughts are the same. Even though the advantage of the Helix MP3 encoder over LAME is slight, I like the way Helix behaves. The number of badly encoded samples is low.
I collected 3 different test results and combined them in one image. People use encoders at many bitrates and settings, and this collection gives a fair approximation of the overall average quality they will experience. Average score: LAME3.98=4.27 Helix=4.33, Number of samples: 25+20+14=59

CODE
%Kamedo2's Personal Listening Test of MP3 224kbps
LAME3.98 Helix
4 4.3
4.6 4.8
4.4 5
4.6 5
4.2 4.4
5 4.6
3.8 4.7
4.4 4.5
4 4.5
5 4.6
4.5 4.6
4.4 4.6
5 4.7
4.3 4.1
4.4 4.6
5 5
4.7 4.4
4.2 4
5 4.5
4.7 5
4.8 4.6
5 4.5
5 4.6
4.7 5
4.6 4.4

%IgorC's Personal Listening Test of MP3 encoders (part II) LAME vs Helix MP3 encoders at 130 kbps.
4.1 4
3.9 3.8
3.1 3
3.4 3.8
4 3.7
3.2 3.9
4.1 4.2
3.3 3
2 4.5
4.4 3.8
3.2 3.1
4.3 4.1
4.4 3.6
3.1 4
4 4.5
4.5 3.3
3.9 4.3
4 4.3
3.3 3
4.2 4.2


%Results of the public MP3 listening test @ 128 kbps (October 2008)
3.68 4.74
4.34 4.67
4.64 4.6
4.12 4.39
4.58 4.75
4.65 4.77
4.55 4.8
4.57 4.41
4.82 4.22
4.79 4.59
4.75 4.08
4.44 4.74
4.62 4.7
4.54 4.75
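The pooled averages quoted above can be reproduced from the three tables with a short script. This is just a sketch: the score pairs are copied from the listing above, the dictionary keys are my own shorthand labels, and pooling simply weights every sample equally regardless of which test it came from.

```python
from statistics import mean

# (LAME 3.98 scores, Helix scores) per test, copied from the tables above.
TESTS = {
    "Kamedo2 224k": (
        [4, 4.6, 4.4, 4.6, 4.2, 5, 3.8, 4.4, 4, 5, 4.5, 4.4, 5,
         4.3, 4.4, 5, 4.7, 4.2, 5, 4.7, 4.8, 5, 5, 4.7, 4.6],
        [4.3, 4.8, 5, 5, 4.4, 4.6, 4.7, 4.5, 4.5, 4.6, 4.6, 4.6, 4.7,
         4.1, 4.6, 5, 4.4, 4, 4.5, 5, 4.6, 4.5, 4.6, 5, 4.4],
    ),
    "IgorC 130k": (
        [4.1, 3.9, 3.1, 3.4, 4, 3.2, 4.1, 3.3, 2, 4.4,
         3.2, 4.3, 4.4, 3.1, 4, 4.5, 3.9, 4, 3.3, 4.2],
        [4, 3.8, 3, 3.8, 3.7, 3.9, 4.2, 3, 4.5, 3.8,
         3.1, 4.1, 3.6, 4, 4.5, 3.3, 4.3, 4.3, 3, 4.2],
    ),
    "Public 128k (2008)": (
        [3.68, 4.34, 4.64, 4.12, 4.58, 4.65, 4.55,
         4.57, 4.82, 4.79, 4.75, 4.44, 4.62, 4.54],
        [4.74, 4.67, 4.6, 4.39, 4.75, 4.77, 4.8,
         4.41, 4.22, 4.59, 4.08, 4.74, 4.7, 4.75],
    ),
}

all_lame, all_helix = [], []
for label, (lame, helix) in TESTS.items():
    print(f"{label:20s} n={len(lame):2d}  LAME3.98={mean(lame):.2f}  Helix={mean(helix):.2f}")
    all_lame += lame
    all_helix += helix

print(f"{'Pooled':20s} n={len(all_lame):2d}  "
      f"LAME3.98={mean(all_lame):.2f}  Helix={mean(all_helix):.2f}")
```

Note that the three tests were run at different bitrates and by different listeners, so the pooled figure is a rough summary rather than a formal result.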

Dynamic
post May 19 2013, 11:09
Post #12





Group: Members
Posts: 825
Joined: 17-September 06
Member No.: 35307



Thanks for the effort and ability you put into this substantial test, Kamedo2.

I wonder if this thread should be added to the Wikipedia list of Codec Listening Tests (which includes those high bitrate individual tests by Guruboolez some years ago).

I also tend to look at the lower bound and/or the tightness of the distribution to attempt to reduce the likelihood of really nasty artifacts, though my artifact detection training is fairly poor. The problem, as always, is there might be extreme cases that one psymodel just doesn't deal with adequately that are missed in the test corpus, but a general idea of the spread and lower bounds of quality is still helpful.

HELIX VBR seems to do very well at 128 kbps and 224 kbps, and I'd feel confident using it anywhere from 128kbps upwards.

Pros and Cons
  • Encoding speed: Helix MP3 wins
  • Quality (~128 to ~224kbps): Helix MP3 and LAME tie
  • Gapless support: LAME wins


I do use Helix at ~131kbps for loudness-levelled background music compilations on hardware where gapless support is impossible. Otherwise, easy gapless support in sufficiently good players keeps me using LAME, and I'm happy that the likes of Amazon use LAME at around -V0 for that reason.


Halb27's special LAME -Vn+ does also have specific uses for certain types of track (e.g. solo harpsichord or other music having heavy transients with strong steady tones). I haven't completely kept up with how the main LAME3.100 copes with these (I think there's some improvement over 3.99), but the strategy of providing maximum bitrate for short blocks seemed to work for halb27's version where LAME 3.99 and Helix MP3 both fall down unless the bitrate gets very high generally. I might well adopt that version for specific types of content or to fix a problem sample.
Kamedo2
post May 21 2013, 15:57
Post #13





Group: Members
Posts: 220
Joined: 16-November 12
From: Kyoto, Japan
Member No.: 104567



QUOTE (Dynamic @ May 19 2013, 19:09) *
I wonder if this thread should be added to the Wikipedia list of Codec Listening Tests (which includes those high bitrate individual tests by Guruboolez some years ago).

Yes, it should be. And Hydrogenaudio knowledgebase too.
Kohlrabi
post May 21 2013, 16:30
Post #14





Group: Super Moderator
Posts: 1081
Joined: 12-March 05
From: Kiel, Germany
Member No.: 20561



QUOTE (Kamedo2 @ May 21 2013, 16:57) *
QUOTE (Dynamic @ May 19 2013, 19:09) *
I wonder if this thread should be added to the Wikipedia list of Codec Listening Tests (which includes those high bitrate individual tests by Guruboolez some years ago).

Yes, it should be. And Hydrogenaudio knowledgebase too.
With the conclusion that at these high bitrates all modern encoders perform equally well?

This post has been edited by Kohlrabi: May 21 2013, 16:31


--------------------
Ceterum censeo Masterdiskem esse delendam.
Kamedo2
post May 21 2013, 18:09
Post #15





Group: Members
Posts: 220
Joined: 16-November 12
From: Kyoto, Japan
Member No.: 104567



QUOTE (Kohlrabi @ May 22 2013, 00:30) *
QUOTE (Kamedo2 @ May 21 2013, 16:57) *
QUOTE (Dynamic @ May 19 2013, 19:09) *
I wonder if this thread should be added to the Wikipedia list of Codec Listening Tests (which includes those high bitrate individual tests by Guruboolez some years ago).

Yes, it should be. And Hydrogenaudio knowledgebase too.
With the conclusion that at these high bitrates all modern encoders perform equally well?

Yes, the conclusion is a 4-way tie (all except BladeEnc). The word 'tie' is preferred over 'equal', for obvious reasons.
greynol
post May 21 2013, 18:16
Post #16





Group: Super Moderator
Posts: 10040
Joined: 1-April 04
From: San Francisco
Member No.: 13167



BladeEnc != modern encoder


--------------------
Your eyes cannot hear.
Kamedo2
post May 21 2013, 19:15
Post #17





Group: Members
Posts: 220
Joined: 16-November 12
From: Kyoto, Japan
Member No.: 104567



QUOTE (greynol @ May 22 2013, 02:16) *
BladeEnc != modern encoder

That's why I said 4-way tie. (I assume the 3 LAME encoders and Helix are the modern encoders. These 4 encoders are the winners and BladeEnc is the obvious loser.)
greynol
post May 21 2013, 21:51
Post #18





Group: Super Moderator
Posts: 10040
Joined: 1-April 04
From: San Francisco
Member No.: 13167



I just like stating the obvious. ;)


--------------------
Your eyes cannot hear.
shadowking
post May 24 2013, 09:09
Post #19





Group: Members
Posts: 1527
Joined: 31-January 04
Member No.: 11664



Very good and interesting test. It shows that for a modern MP3 encoder above a certain threshold (192k), the bitrate is a strong indicator of quality, no matter whether VBR, CVBR, or CBR is used. It also proves, as I've said before, that CBR will not be 'starved' of bits given sufficient bitrate, and that the popular 320 CBR encodings on the internet are a huge waste, as 224 yields excellent quality.

This post has been edited by shadowking: May 24 2013, 09:16


--------------------
Wavpack -b450s0.7
Gecko
post May 24 2013, 18:23
Post #20





Group: Members
Posts: 945
Joined: 15-December 01
From: Germany
Member No.: 662



Thank you very much for the test!

I realize you have extensive artifact training, but I am still surprised that so few samples are 100% transparent.
greynol
post May 24 2013, 19:01
Post #21





Group: Super Moderator
Posts: 10040
Joined: 1-April 04
From: San Francisco
Member No.: 13167



It seems that if you're particularly sensitive to pre-echo then just about anything with hard transients won't be transparent with mp3, even at 320kbits.


--------------------
Your eyes cannot hear.
Kamedo2
post May 24 2013, 20:11
Post #22





Group: Members
Posts: 220
Joined: 16-November 12
From: Kyoto, Japan
Member No.: 104567



The ABX criterion was 15+/20 (p=0.02). All samples and all encoders were ABXed 20 times. So there were 25 (samples) x 5 (encoders) = 125 tests, of which I failed 25 (14 or fewer correct answers) and thus scored them 5.0.
The 15+/20 criterion allows me to get up to 25% of the 20 trials wrong, so it explains why only 20% of the tests were transparent.
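The p=0.02 figure is the one-sided binomial tail: the chance of getting 15 or more of 20 ABX trials right by pure guessing. A minimal sketch to check it (the function name `abx_p_value` is just illustrative):

```python
from math import comb


def abx_p_value(correct, trials=20):
    """Probability of at least `correct` right answers out of `trials` by guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


for correct in (14, 15, 16):
    print(f"{correct}/20 -> p = {abx_p_value(correct):.4f}")
```

15/20 gives p of about 0.021, while 14/20 gives about 0.058 and so would not pass a 0.05 threshold.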

The graph and table in this thread were plotted with the following web application. Feel free to use it.
http://zak.s206.xrea.com/bitratetest/graphmaker3.htm
Help page:
http://zak.s206.xrea.com/bitratetest/faq.htm

QUOTE (Dynamic @ May 19 2013, 19:09) *
I wonder if this thread should be added to the Wikipedia list of Codec Listening Tests (which includes those high bitrate individual tests by Guruboolez some years ago).

I will refrain from adding the result to Wikipedia, because writing articles about oneself or one's own results is something that should be avoided.
greynol
post May 24 2013, 20:45
Post #23





Group: Super Moderator
Posts: 10040
Joined: 1-April 04
From: San Francisco
Member No.: 13167



It would help since Guruboolez's data is classical-centric, and, quite frankly, I'm tired of seeing his results get raised during discussions where they aren't a good fit.


--------------------
Your eyes cannot hear.
Gecko
post May 24 2013, 22:07
Post #24





Group: Members
Posts: 945
Joined: 15-December 01
From: Germany
Member No.: 662



QUOTE (greynol @ May 24 2013, 20:01) *
It seems that if you're particularly sensitive to pre-echo then just about anything with hard transients won't be transparent with mp3, even at 320kbits.

I'm only familiar with some of the samples, but would you say that most of them contain hard transients? The "speech" and "vocal" samples don't but still are not transparent.

Kamedo, could you maybe elaborate a little on the problems you heard?
greynol
post May 24 2013, 22:52
Post #25





Group: Super Moderator
Posts: 10040
Joined: 1-April 04
From: San Francisco
Member No.: 13167



It was a general comment.

Stop consonants in speech could be qualified as hard transients, however.


--------------------
Your eyes cannot hear.