Public MP3 Listening Test @ 128 kbps - FINISHED
sld
post Nov 25 2008, 04:51
Post #26





Group: Members
Posts: 1017
Joined: 4-March 03
From: Singapore
Member No.: 5312



Regarding statistics... the confidence intervals will decrease in size if there are more participants?

Great to have updated results for 2008; thanks Sebastian!
Squeller
post Nov 25 2008, 08:46
Post #27





Group: Members
Posts: 2351
Joined: 28-August 02
Member No.: 3218



Is this claim correct? Has there been no improvement to the Helix encoder since 2005?
Sebastian Mares
post Nov 25 2008, 09:02
Post #28





Group: Members
Posts: 3630
Joined: 14-May 03
From: Bad Herrenalb
Member No.: 6613



QUOTE (sld @ Nov 25 2008, 04:51) *
Regarding statistics... the confidence intervals will decrease in size if there are more participants?

Great to have updated results for 2008; thanks Sebastian!


Yes, that is correct. The more people submit results, the narrower the confidence intervals.
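
As a rough illustration of why this holds: the half-width of a confidence interval for a mean shrinks with the square root of the number of ratings. The sketch below assumes a normal approximation (z = 1.96 for 95%) and a made-up rating standard deviation of 1.0; neither value comes from this test.

```python
import math

Z_95 = 1.96  # normal-approximation quantile for a 95% interval (assumption)


def ci_half_width(std_dev, n):
    """Half-width of an approximate 95% confidence interval for a mean."""
    return Z_95 * std_dev / math.sqrt(n)


# Hypothetical standard deviation of 1.0 rating points, not taken from the test.
for n in (25, 100, 400):
    print(n, round(ci_half_width(1.0, n), 3))
```

Quadrupling the number of ratings only halves the interval, so each additional participant helps a little less than the one before.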


--------------------
http://listening-tests.hydrogenaudio.org/sebastian/
melomaniac
post Nov 25 2008, 09:19
Post #29





Group: Members
Posts: 43
Joined: 1-August 08
From: Brussels
Member No.: 56565



I analyzed my results, and the ranking of the encoders is different for each sample. So indeed there's no undisputed winner here.
Still, I have Helix in first place on some samples. I would never have expected that! Nice surprise ;)
Another surprise to me is that, on some samples, I found LAME 3.97 worse than Fhg or iTunes.
And finally, I don't have any results where LAME 3.97 is better than 3.98.2.

QUOTE (DigitalDictator @ Nov 24 2008, 23:56) *
This is indeed surprising. I'm sure I've seen smaller, recent, ABX-tests where Lame has outperformed Helix quite clearly. I think Guruboolez and maybe also Halb27 have done a couple, but I might be mistaken.

The last time I saw Francis do an MP3 listening evaluation with LAME and Helix was in this post.

QUOTE (Squeller @ Nov 25 2008, 08:46) *
Is this claim correct? Has there been no improvement to the Helix encoder since 2005?

It's correct. Here's the latest compile (v5.1 2005.08.09) used in this test.

EDIT: LAME 3.97 comments

This post has been edited by melomaniac: Nov 25 2008, 09:32
halb27
post Nov 25 2008, 09:19
Post #30





Group: Members
Posts: 2435
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015



A zoomed view is formally correct, but it tends to have a misleading emotional impact on the reader because it emphasizes differences. In its extreme form it can give a picture of extreme differences where in fact the differences are not worth mentioning.
If this test's zoomed view showed only the averages, without the confidence intervals, we would have this extreme form here.

Information at a glance, that's what graphs are for. They easily give a wrong impression if they are not 'ground-based' but instead start high in the air, just a small step below the lowest results.

That's why I would prefer if there was no 'zoomed' view.
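
To put a number on the effect, here is a minimal sketch comparing how far apart two results look depending on where the y-axis starts. It uses the overall averages for Helix and iTunes quoted later in this thread (post #36) and assumes the usual 1 to 5 rating scale; the zoomed axis start of 4.0 is a hypothetical value, not read off the actual plot.

```python
def apparent_ratio(score_a, score_b, axis_start):
    """Ratio of the drawn heights above the axis when the y-axis starts at axis_start."""
    return (score_a - axis_start) / (score_b - axis_start)


HELIX, ITUNES = 4.59, 4.26  # overall averages quoted in post #36

# Axis starting at 1.0, the bottom of the rating scale ('ground-based').
full_scale = apparent_ratio(HELIX, ITUNES, axis_start=1.0)
# Axis starting at 4.0: a hypothetical zoomed view, chosen only for illustration.
zoomed = apparent_ratio(HELIX, ITUNES, axis_start=4.0)

print(f"ground-based: {full_scale:.2f}x, zoomed: {zoomed:.2f}x")
```

On the full scale Helix sits about 10% higher than iTunes; with the assumed zoomed axis it appears more than twice as high, although the underlying difference is the same 0.33 points.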


--------------------
lame3100m -V1 --insane-factor 0.75
Squeller
post Nov 25 2008, 09:28
Post #31





Group: Members
Posts: 2351
Joined: 28-August 02
Member No.: 3218



QUOTE (halb27 @ Nov 25 2008, 10:19) *
A zoomed view is formally correct, but it tends to have a misleading emotional impact on the reader because it emphasizes differences. In its extreme form it can give a picture of extreme differences where in fact the differences are not worth mentioning.
If this test's zoomed view showed only the averages, without the confidence intervals, we would have this extreme form here.

Information at a glance, that's what graphs are for. They easily give a wrong impression if they are not 'ground-based' but instead start high in the air, just a small step below the lowest results.

That's why I would prefer if there was no 'zoomed' view.

Basically you are right, but: People with dysfunctional brains who don't find this out themselves aren't the target audience of HA, I guess ;)

About Helix: Let's not forget Guru's listening test from 2007, where Helix clearly failed on classical music.

This post has been edited by Squeller: Nov 25 2008, 09:39
halb27
post Nov 25 2008, 09:44
Post #32





Group: Members
Posts: 2435
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015



QUOTE (Squeller @ Nov 25 2008, 10:28) *
Basically you are right, but: People with dysfunctional brains who don't find this out themselves aren't the target audience of HA, I guess ;) ....

Sure, but blind people aren't the target audience either. The non-zoomed view gives all the information we need.


--------------------
lame3100m -V1 --insane-factor 0.75
melomaniac
post Nov 25 2008, 09:46
Post #33





Group: Members
Posts: 43
Joined: 1-August 08
From: Brussels
Member No.: 56565



QUOTE (Squeller @ Nov 25 2008, 09:28) *
About Helix: Let's not forget Guru's listening test from 2007, where Helix clearly failed on classical music.

I've already posted the link Squeller.
memomai
post Nov 25 2008, 10:08
Post #34





Group: Members
Posts: 264
Joined: 13-February 05
From: Germany, Kempten
Member No.: 19808



Just confused. Helix worse than Lame, Helix better than Lame, Fraunhofer better than 3.97??
I'm just waiting for someone to say "lossless is lossy"; then my confusion will be complete.


--------------------
FB2K,APE&LAME
halb27
post Nov 25 2008, 10:30
Post #35





Group: Members
Posts: 2435
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015



QUOTE (memomai @ Nov 25 2008, 11:08) *
Just confused. Helix worse than Lame, Helix better than Lame, Fraunhofer better than 3.97??
I'm just waiting for someone to say "lossless is lossy"; then my confusion will be complete.

There has always been a tendency at HA to expect Lame to be seriously superior to other encoders. And listening tests have too often been taken as 'proof' of this, whereas they contribute experience with encoders in a fairly objective way, but only within the limits of the samples tested and the listening abilities of the participants. It's the best we can do, but it has its limits.

Why worry? Isn't it a good thing that all the encoders perform very well on the samples?
As for Lame 3.98.2: isn't it a good thing that it scores so well? All we knew so far was that it improves on 3.97 for certain classes of problems where 3.97 had rather weak quality. We did not have much evidence that 3.98 brings no serious regression, which would have been possible. Now we have reason to believe that this is not the case, and we can expect with good reason that 3.98 is real progress.


--------------------
lame3100m -V1 --insane-factor 0.75
Alexxander
post Nov 25 2008, 10:44
Post #36





Group: Members
Posts: 463
Joined: 15-November 04
Member No.: 18143



Before anything I have to thank Sebastian again for having conducted this nice MP3 Listening Test @ 128 kbps!

I think I was too harsh in rating the samples, but my results are very similar to the overall results:

CODE
               My Average  Test Results
iTunes 8.0.1      2,45        4,26
Lame 3.98.2       2,94        4,51
l3enc 0.99a       1,00        1,56
Fraunhofer        2,84        4,44
Lame 3.97         2,77        4,28
Helix v5.1        3,20        4,59

"Test Results" are the results of all participants. "My Average" is a simple arithmetic mean of my results, as I don't remember how to do other types of analysis (too long ago :D). Dropping the highest and lowest results of all encoders produces a result similar to the one presented above. If anyone can tell me which formula to use in MS Excel to get the error margin, please do.

I'm really surprised that an encoder that hasn't been tuned since 2005 gets such good results. I have more samples where Helix does better than Lame 3.98.2 than the other way around, although the differences are small. When doing the test I clearly noticed that 2 encoders were better than the rest, and I thought they were the Lame ones :'(
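
The similarity between the two columns of the table above can be checked by comparing the order in which each column ranks the encoders. This is only a sketch using the six averages from that table (decimal commas written as points); it says nothing about whether the differences are statistically significant.

```python
# Averages copied from the table in this post.
my_average = {
    "iTunes 8.0.1": 2.45, "Lame 3.98.2": 2.94, "l3enc 0.99a": 1.00,
    "Fraunhofer": 2.84, "Lame 3.97": 2.77, "Helix v5.1": 3.20,
}
test_results = {
    "iTunes 8.0.1": 4.26, "Lame 3.98.2": 4.51, "l3enc 0.99a": 1.56,
    "Fraunhofer": 4.44, "Lame 3.97": 4.28, "Helix v5.1": 4.59,
}


def ranking(scores):
    """Encoder names ordered from best to worst average score."""
    return sorted(scores, key=scores.get, reverse=True)


same_order = ranking(my_average) == ranking(test_results)
print(ranking(my_average))
print("identical ranking:", same_order)
```

With these numbers both columns put the encoders in the same order, from Helix at the top to l3enc at the bottom, even though the absolute ratings differ by more than a point.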
Alexxander
post Nov 25 2008, 10:55
Post #37





Group: Members
Posts: 463
Joined: 15-November 04
Member No.: 18143



QUOTE (halb27 @ Nov 25 2008, 10:30) *
...
Why worry? Isn't it a good thing that all the encoders perform very well on the samples?
...

I'm worried now, not because Helix is very competitive with Lame 3.98.2 in quality, but because Helix encodes so much faster, and that's very useful when I encode albums from my lossless archive to take them on the road.

I wonder why Lame doesn't do better compared to Helix, with 3 more years of development behind it. I have just added Helix to my foobar2000 Converters list and will play with it in my preferred bitrate range (160-220 kbps).
muaddib
post Nov 25 2008, 11:15
Post #38





Group: Developer
Posts: 398
Joined: 14-October 01
Member No.: 289



It is not good to conclude, from the results of this test, that Helix will be the best option in 160-220 kbps range. You should check quality after encoding to this bitrate.
Jan S.
post Nov 25 2008, 12:33
Post #39





Group: Admin
Posts: 2550
Joined: 26-September 01
From: Denmark
Member No.: 21



Wouldn't it be possible to compare the variance within each encoder to get an idea of the robustness of each encoder?
Alexxander
post Nov 25 2008, 12:37
Post #40





Group: Members
Posts: 463
Joined: 15-November 04
Member No.: 18143



QUOTE (muaddib @ Nov 25 2008, 11:15) *
It is not good to conclude, from the results of this test, that Helix will be the best option in 160-220 kbps range. You should check quality after encoding to this bitrate.

This is very obvious. I added Helix to foobar2000 to do just this: compare quality with some songs and samples :)
Sebastian Mares
post Nov 25 2008, 12:44
Post #41





Group: Members
Posts: 3630
Joined: 14-May 03
From: Bad Herrenalb
Member No.: 6613



QUOTE (Jan S. @ Nov 25 2008, 12:33) *
Wouldn't it be possible to compare the variance within each encoder to get an idea of the robustness of each encoder?


I am not quite sure I understand what you mean.


--------------------
http://listening-tests.hydrogenaudio.org/sebastian/
robert
post Nov 25 2008, 13:01
Post #42


LAME developer


Group: Developer
Posts: 788
Joined: 22-September 01
Member No.: 5



I would be more interested in quartiles than in variance.

http://de.wikipedia.org/wiki/Quantil
halb27
post Nov 25 2008, 13:04
Post #43





Group: Members
Posts: 2435
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015



QUOTE (Alexxander @ Nov 25 2008, 11:55) *
...I have just added Helix to my foobar2000 Converters list and will play with it in my preferred bitrate range (160-220 kbps).

You may want to try level's finding about a quality improvement in this bitrate range, which you can find in the Helix thread. The quality improvement was not confirmed by other people, though.


--------------------
lame3100m -V1 --insane-factor 0.75
kwanbis
post Nov 25 2008, 13:06
Post #44





Group: Developer (Donating)
Posts: 2362
Joined: 28-June 02
From: Argentina
Member No.: 2425



QUOTE (robert @ Nov 25 2008, 12:01) *
I would be more interested in quartiles than in variance.

http://de.wikipedia.org/wiki/Quantil

http://en.wikipedia.org/wiki/Quantile

(I think more people know English ;) )


--------------------
MAREO: http://www.webearce.com.ar
westgroveg
post Nov 25 2008, 13:55
Post #45





Group: Members
Posts: 1236
Joined: 5-October 01
Member No.: 220



If anything the test shows samples where LAME needs improvement at 128 kbps.
Pio2001
post Nov 25 2008, 14:04
Post #46


Moderator


Group: Super Moderator
Posts: 3936
Joined: 29-September 01
Member No.: 73



QUOTE (melomaniac @ Nov 25 2008, 09:19) *
And finally, I don't have any results where LAME 3.97 is better than 3.98.2.


I do. If Lame 3.98.2 is file 2 and 3.97 is file 5, then sample 8 sounds near-transparent to me with Lame 3.97, not with 3.98.2. I also find sample 11 better with Lame 3.97.
Sebastian Mares
post Nov 25 2008, 14:47
Post #47





Group: Members
Posts: 3630
Joined: 14-May 03
From: Bad Herrenalb
Member No.: 6613



QUOTE (robert @ Nov 25 2008, 13:01) *
I would be more interested in quartiles than in variance.

http://de.wikipedia.org/wiki/Quantil


All results are available for download already so you can calculate whatever you wish. Tukey HSD is something around 0.5 IIRC (I'm at work right now and don't have access to the exact value) so the tolerance bars are around 0.25 in each direction.
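
As a sketch of how that figure is read: two encoders count as significantly different only if their means are further apart than the HSD, which is the same as their plus/minus HSD/2 bars not overlapping. The means below are the overall "Test Results" quoted in post #36, and 0.5 is only the approximate HSD recalled above, so the outcome should be treated as indicative.

```python
from itertools import combinations

HSD = 0.5  # approximate value recalled in this post, not the exact figure

# Overall means as quoted in post #36 of this thread.
overall = {
    "iTunes 8.0.1": 4.26, "Lame 3.98.2": 4.51, "l3enc 0.99a": 1.56,
    "Fraunhofer": 4.44, "Lame 3.97": 4.28, "Helix v5.1": 4.59,
}


def differs(a, b, hsd=HSD):
    """True if the two encoder means are further apart than the HSD."""
    return abs(overall[a] - overall[b]) > hsd


# Every pair of encoders whose difference exceeds the HSD.
significant = [(a, b) for a, b in combinations(overall, 2) if differs(a, b)]
for pair in significant:
    print(pair)
```

Under this assumed HSD, the only pairs that come out as different are those involving the low anchor; the largest gap among the other five encoders is 0.33.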


--------------------
http://listening-tests.hydrogenaudio.org/sebastian/
Alex B
post Nov 25 2008, 14:51
Post #48





Group: Members
Posts: 1303
Joined: 14-September 05
From: Helsinki, Finland
Member No.: 24472



QUOTE (westgroveg @ Nov 25 2008, 14:55) *
If anything the test shows samples where LAME needs improvement at 128 kbps.

I think we should analyze the results sample by sample and discuss the severity of the problems found. It would be useful to find out whether certain obvious problems with certain encoders were confirmed by the majority of the testers.

In general, I found the choice of the low anchor a bit problematic. The encoder is clearly badly broken. Obviously the 0.99 alpha version is not the version that was involved when the 128 kbps MP3 = CD quality myth was created. In my experience the release version was already a lot better.

A low anchor that is too bad can adversely affect the rating scale the testers choose to use. It can make the differences between the contenders appear less significant.

For comparison, here are my results:

CODE
% Result file produced by chunky-0.8.4-beta
% ..\chunky.exe --codec-file=..\codecs.txt -n --ratings=results --warn -p 0.05

% Sample Averages:
%    iTunes    L398    Anchor    Fhg    L397    Helix
01.    3.80    2.60    1.00    3.40    1.80    4.30
02.    3.80    2.60    1.40    2.80    2.20    3.10
03.    3.90    3.10    1.00    4.30    3.70    2.90
04.    4.20    4.40    1.00    4.40    3.30    4.40
05.    2.70    3.50    1.00    3.00    3.00    3.80
06.    2.20    3.50    1.00    3.00    3.90    4.00
07.    3.70    4.00    1.00    3.80    2.00    4.00
08.    2.40    4.00    1.00    3.00    4.30    3.00
09.    3.00    3.40    1.00    3.60    3.40    2.50
10.    4.50    4.50    1.00    4.20    4.00    4.50
11.    3.90    2.70    1.00    3.50    3.90    2.40
12.    2.00    3.70    1.00    2.60    3.30    3.60
13.    3.70    3.20    1.00    3.80    2.50    4.00
14.    2.00    3.60    1.00    3.10    3.30    4.00

% Codec averages:
%%%    3.27    3.49    1.03    3.46    3.19    3.61
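
The variance and quartile ideas raised earlier in the thread can be tried on the per-sample ratings above. The following is a minimal sketch for this one listener's table only: it treats the 14 samples as independent observations and uses a plain Student t interval (t = 2.160 for 13 degrees of freedom), which is not the blocked analysis behind the official plots.

```python
import math
import statistics

# Per-sample ratings copied from the table above (samples 01 to 14).
ratings = {
    "iTunes": [3.8, 3.8, 3.9, 4.2, 2.7, 2.2, 3.7, 2.4, 3.0, 4.5, 3.9, 2.0, 3.7, 2.0],
    "L398":   [2.6, 2.6, 3.1, 4.4, 3.5, 3.5, 4.0, 4.0, 3.4, 4.5, 2.7, 3.7, 3.2, 3.6],
    "Anchor": [1.0, 1.4, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "Fhg":    [3.4, 2.8, 4.3, 4.4, 3.0, 3.0, 3.8, 3.0, 3.6, 4.2, 3.5, 2.6, 3.8, 3.1],
    "L397":   [1.8, 2.2, 3.7, 3.3, 3.0, 3.9, 2.0, 4.3, 3.4, 4.0, 3.9, 3.3, 2.5, 3.3],
    "Helix":  [4.3, 3.1, 2.9, 4.4, 3.8, 4.0, 4.0, 3.0, 2.5, 4.5, 2.4, 3.6, 4.0, 4.0],
}

T_95_DF13 = 2.160  # two-sided 95% Student t quantile for 14 - 1 degrees of freedom


def summarize(values):
    """Mean, sample variance, quartiles and a simple 95% error margin."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "margin": T_95_DF13 * statistics.stdev(values) / math.sqrt(len(values)),
    }


summary = {codec: summarize(values) for codec, values in ratings.items()}
for codec, stats in summary.items():
    print(codec, {name: round(value, 2) for name, value in stats.items()})
```

A larger variance or a wider gap between the first and third quartile suggests an encoder whose quality depends more strongly on the sample, which is one way to read "robustness" here.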


This post has been edited by Alex B: Nov 25 2008, 15:01


--------------------
http://listening-tests.freetzi.com
/mnt
post Nov 25 2008, 15:19
Post #49





Group: Members
Posts: 697
Joined: 22-April 06
Member No.: 29877



Just try some Metal tracks on Helix at V60, I guarantee it will struggle.


--------------------
"I never thought I'd see this much candy in one mission!"
Neasden
post Nov 25 2008, 16:07
Post #50





Group: Banned
Posts: 185
Joined: 1-July 08
Member No.: 55148



/mnt told me that Helix is not gapless, which to me is a serious shortcoming. Another thing is that Helix is not as robust as LAME. But what is stunning people here is the encoding speed of an encoder that hasn't been worked on for 3 years, while the latest LAME is so much slower to encode!