

Public MP3 Listening Test @ 128 kbps - FINISHED
Synthetic Soul
post Nov 25 2008, 21:07
Post #76





Group: Super Moderator
Posts: 4887
Joined: 12-August 04
From: Exeter, UK
Member No.: 16217



QUOTE (guruboolez @ Nov 25 2008, 20:00) *
The biggest advantage of HELIX lies in its encoding speed; manipulating cuesheets and external tools would simply ruin this advantage.
Exactly. If it can't be achieved in normal working mode then I'm not interested.

@Alex B. Possibly common knowledge; however knowledge that I personally did not have, and could only assume.


--------------------
I'm on a horse.
Sebastian Mares
post Nov 25 2008, 21:10
Post #77





Group: Members
Posts: 3630
Joined: 14-May 03
From: Bad Herrenalb
Member No.: 6613



In case you are interested, here is a quick and dirty "quality distribution" across the samples:



--------------------
http://listening-tests.hydrogenaudio.org/sebastian/
halb27
post Nov 25 2008, 21:13
Post #78





Group: Members
Posts: 2435
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015



QUOTE (Jillian @ Nov 25 2008, 20:13) *
I like the part where the test results (quality and encoding speed) should raise the popularity of Helix, but instead people try to prove that Helix is bad in their own tests, while others blame Helix for not supporting gapless playback.

I second that.
Nobody complained about the samples or a potential bias they might give to some encoders before the test.
A listening test's outcome is seriously influenced by the samples used (and the degree the participants are sensitive towards the issues with them).
But that's a natural thing we have to accept. It's been like that with any prior listening test.

As I wrote, personal conclusions are another story, and everybody would do well to look at the test's details with respect to personal relevance before making decisions about encoder usage. But what we learnt here about Helix' behavior with metal, for instance, is correct in general: metal lovers won't like to use Helix.

Unfortunately there's a high degree of over-simplification even in this forum.
Many people want to see a best encoder (in a universal sense), and they expect it to be Lame (and we see again that Lame is great; it's just not the greatest encoder), and they expect serious quality differences between encoders in a general sense.

What a pity! We should be glad that we have a variety of excellent encoders to pick from, which also makes it easy to respect non-quality-related properties like gapless playback or encoding speed according to personal demands.

This post has been edited by halb27: Nov 25 2008, 21:16


--------------------
lame3100m -V1 --insane-factor 0.75
Canar
post Nov 25 2008, 21:14
Post #79





Group: Super Moderator
Posts: 3361
Joined: 26-July 02
From: princegeorge.ca
Member No.: 2796



The point I was trying to make was that though Helix is inherently gapful, it doesn't necessarily need to be, nor do any of the encoders here. If anyone wants to start using it for regular use to hunt for problem samples and still wants gapless, it's possible to do. It's not particularly straightforward to make them gapless, it's just possible.


--------------------
You cannot ABX the rustling of jimmies.
No mouse? No problem.
halb27
post Nov 25 2008, 21:26
Post #80





Group: Members
Posts: 2435
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015



QUOTE (Sebastian Mares @ Nov 25 2008, 22:10) *
In case you are interested, here is a quick and dirty "quality distribution" across the samples:...

Thanks for the graph, very interesting.
Helix keeps well above 4.0 throughout all the samples, Lame 3.98 is getting close to that, and FhG is also not far behind. iTunes and Lame 3.97 are showing several weaknesses of a more serious kind.

Very interesting is the encoders' varying performance on sample 12 (Helix and L3.98 are doing pretty well and the other ones rather badly) and on sample 1 (both Lame versions, especially L3.97, perform quite a bit worse than the other encoders). Samples 6, 7, and 14 show specific weaknesses of iTunes and L3.97.

Sample 11 is interesting too, as it addresses the Helix metal issue we learnt about in this thread. Yes, it's the weakest sample for Helix, but obviously the majority of testers didn't see a real issue with it. Of course, anybody can come to a different individual conclusion. Quality judgement is personal and vital for choosing an encoder.

This post has been edited by halb27: Nov 25 2008, 21:45


--------------------
lame3100m -V1 --insane-factor 0.75
Neasden
post Nov 25 2008, 21:42
Post #81





Group: Banned
Posts: 185
Joined: 1-July 08
Member No.: 55148



In this graph, LAME 3.98.2 seems to be the most stable encoder...
TechVsLife
post Nov 25 2008, 21:44
Post #82





Group: Members
Posts: 195
Joined: 29-May 07
Member No.: 43837



@sebastian mares: thanks for the test!

@guruboolez

Helix doesn't please me at all

Your personal quick test shows (if not a typo):

helix : lame398 :: lame398 : lame397. So according to this, if Helix is worse than LAME 3.98, then for you LAME 3.98 is just as much worse than LAME 3.97. How would you explain that?

I take it that it's impossible for any individual's result in the general test to be (statistically) meaningless, because it's repeated and blinded etc. (so it's never a "fluke"). So I'm trying to explain what that means in the context of a statistical tie for the group, esp. in those individual cases, like guruboolez, where there is NOT a tie.

1. all encoders are so close, that individual sensitivities/variances (or quirks, depending on your view of their significance) dominate more, even (or especially) in a group of more sensitive than average listeners. [this could sometimes result in ties for just one listener, if we are not talking about a specific weakness of an encoder repeated across several selections of music but very fine and specific differences, limited to specific sounds or instruments or genres, see #2]

2. the division by music genre (or instruments used etc.) seems important. is there a way to know if there is a division in the results this way, i.e. producing something other than a tie for the whole result set? [this could be true along with #1]. (I esp. care about classical.)

3. is there a way to know whether and to what extent an unduly low anchor masked or could mask substantial quality differences?

P.S. The sample-by-sample discussion is good and does address #2. The graph by sample is helpful (I wonder if there is enough data to make informed, statistically sound judgments by music type?).


QUOTE (guruboolez @ Nov 25 2008, 12:04) *
The 2 or 3 first seconds were already ignored in this test.

Interesting results anyway. The conclusion is far from what I reached in the past. I only tested the last 11 samples; my results are therefore not totally comparable, but they are significantly different:

iTunes: 2.98
Lame 3.98: 3.30
l3enc: 1.171.00
fraunhofer: 3.51
LAME 3.97: 3.68
Helix: 2.95

This is also the very first test I performed with my new headphones, which I got just the day before I started the test. The new sound signature was so different, and therefore so disturbing, that I didn't bother to spend more than a few minutes testing and evaluating each sample. It was a strange experience for me. I wonder how much a different pair of headphones may change results, but it has become clear to me that a different equipment configuration can heavily disturb a listener.

Anyway, even in this highly confused listening environment my results in this test tend to confirm that Helix doesn't please me at all, even with completely different samples / musical genre.
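A quick worked check of TechVsLife's proportion, using only the scores guruboolez posted above (Helix 2.95, LAME 3.98 3.30, LAME 3.97 3.68). The thread doesn't say whether ":" means a score difference or a ratio, so this sketch assumes nothing and computes both; under either reading the two steps come out nearly equal.

```python
# guruboolez's personal scores on the last 11 samples, copied from the quoted post.
scores = {"Helix": 2.95, "LAME 3.98": 3.30, "LAME 3.97": 3.68}

# Reading ":" as a difference between mean scores.
gap_helix_to_398 = round(scores["LAME 3.98"] - scores["Helix"], 2)
gap_398_to_397 = round(scores["LAME 3.97"] - scores["LAME 3.98"], 2)

# Reading ":" as a ratio between mean scores.
ratio_helix_to_398 = round(scores["LAME 3.98"] / scores["Helix"], 3)
ratio_398_to_397 = round(scores["LAME 3.97"] / scores["LAME 3.98"], 3)

print(gap_helix_to_398, gap_398_to_397)      # 0.35 0.38
print(ratio_helix_to_398, ratio_398_to_397)  # 1.119 1.115
```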


This post has been edited by TechVsLife: Nov 25 2008, 21:55
benski
post Nov 25 2008, 21:50
Post #83


Winamp Developer


Group: Developer
Posts: 670
Joined: 17-July 05
From: Brooklyn, NY
Member No.: 23375



QUOTE (Sebastian Mares @ Nov 25 2008, 15:10) *
In case you are interested, here is a quick and dirty "quality distribution" across the samples:


I doubt the individual samples have a large enough sample base to make the statistics meaningful, but two things of note:

1) iTunes encoder and LAME 3.98.2 never perform the best on any sample.
2) FhG encoder never performs the worst on any sample. (Except #5 where the graph shows it is tied for the worst)
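benski's two observations come down to a per-sample max/min tally. A minimal sketch of that tally is below; the function name and the scores in `toy` are hypothetical placeholders made up for illustration, not the published results. Comparing bare means also ignores the error bars, which is exactly benski's caveat about per-sample statistics.

```python
def best_and_worst(per_sample):
    """Map each sample to (best encoders, worst encoders) by mean score.
    Encoders sharing the top or bottom score are all listed, i.e. a tie."""
    result = {}
    for sample, scores in per_sample.items():
        top, bottom = max(scores.values()), min(scores.values())
        best = sorted(e for e, s in scores.items() if s == top)
        worst = sorted(e for e, s in scores.items() if s == bottom)
        result[sample] = (best, worst)
    return result

# Placeholder scores for illustration only; these are NOT the test's results.
toy = {
    "sample A": {"enc1": 4.5, "enc2": 4.1, "enc3": 4.1},
    "sample B": {"enc1": 3.9, "enc2": 4.4, "enc3": 4.0},
}

tally = best_and_worst(toy)
encoders = {e for scores in toy.values() for e in scores}
# Encoders that never appear in any sample's best (or worst) list.
never_best = sorted(encoders - {e for best, _ in tally.values() for e in best})
never_worst = sorted(encoders - {e for _, worst in tally.values() for e in worst})
print(never_best, never_worst)  # ['enc3'] []
```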
Alex B
post Nov 25 2008, 22:06
Post #84





Group: Members
Posts: 1303
Joined: 14-September 05
From: Helsinki, Finland
Member No.: 24472



QUOTE (Neasden @ Nov 25 2008, 22:42) *
In this graph, LAME 3.98.2 seems to be the most stable encoder...

As halb27 said, just by looking at the picture it is obvious that LAME 3.98.2 has a problem only with sample no. 1. It seems to produce good quality with all the other samples. It should be possible for the LAME developers to fix this "sample 1" problem, because the other three encoders handle it better. That said, I could name a few other samples that are especially problematic for LAME.

It is also obvious that Helix did not fail with any of the samples (it didn't go below 4). Personally, I didn't like how it handled samples 3, 9, and 11 (in my results, Helix was the worst encoder with these samples).


--------------------
http://listening-tests.freetzi.com
Sebastian Mares
post Nov 25 2008, 22:18
Post #85





Group: Members
Posts: 3630
Joined: 14-May 03
From: Bad Herrenalb
Member No.: 6613



The graphs for all samples are available on the results page. I will add the corresponding text tomorrow, since it's 22:00 and I just finished cooking. :P


--------------------
http://listening-tests.hydrogenaudio.org/sebastian/
singaiya
post Nov 25 2008, 22:21
Post #86





Group: Members
Posts: 365
Joined: 21-November 02
Member No.: 3830



Is anybody else not surprised that each contender is statistically tied? It was the same in the multi-format test at 128 kbps from 2005.

I'm wondering if testing at lower bitrates will increase separation in rankings. Regardless, I'd like to test 96 kbps next instead of 80 kbps.
halb27
post Nov 25 2008, 22:25
Post #87





Group: Members
Posts: 2435
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015



QUOTE (Sebastian Mares @ Nov 25 2008, 23:18) *
The graphs for all samples are available on the results page. I will add the corresponding text tomorrow, since it's 22:00 and I just finished cooking. :P

Thanks a lot for your hard work. Enjoy your meal.


--------------------
lame3100m -V1 --insane-factor 0.75
Sunhillow
post Nov 25 2008, 22:35
Post #88





Group: Members (Donating)
Posts: 483
Joined: 13-October 01
From: Stuttgart
Member No.: 286



Thank you for this great checkup, Sebastian! I think it shows that we can be comfortable with bitrates in the -V3 to -V2 range :-)

Now enjoy a Tannenzäpfle with your meal...
Zilog Jones
post Nov 25 2008, 22:53
Post #89





Group: Members
Posts: 41
Joined: 29-April 07
Member No.: 43028



Like many others, I was also very surprised by the results. Seeing how well Helix coped with the electronic tracks (sample 12 was the only one where I could ABX 5 of the 6 encoded versions, Helix being the only one I couldn't), and considering the speed of the encoder, I am definitely going to try it out on my FLAC transcodes for my MP3 player instead of LAME 3.98.2. If it weren't for this test, I would never have bothered considering any other encoder, so thanks! :)

Also, do you have the full artist names and titles for all the samples? I know sample 12 is Kalifornia by Fatboy Slim (from You've Come a Long Way, Baby), and Tom's Diner is by Suzanne Vega, but after that I'm lost. Sample 01 certainly sounds like it could be a Nobuo Uematsu composition (he did the music for most of the Final Fantasy games), but it sounds like PC MIDI, which is a bit odd.
sizetwo
post Nov 25 2008, 23:16
Post #90





Group: Members
Posts: 143
Joined: 22-April 03
From: Kristiansand
Member No.: 6114



Without adding fuel to the fire, I think it's strange reading some of the comments on this test. The forum is so hellbent on factual tests and ABX'ing, yet when the result in a way contradicts the paradigm of the forum, a lot of people start questioning it. It's almost as if there is a preferred encoder, and it's ... not ... Helix ... It's as if ... some people really like defending LAME. Wow, I would never have thought. No, honestly though, thanks for the test, Sebastian.

Let's look at the facts of the test; the proof is in the pudding. Does it mean that Helix is >= LAME at 128kbps? Yes, apparently it is. If the forum people really want people to use cold hard facts when making a claim, well, here they are. Now get over it. Seriously. If we want to yell "ABX and ABC" at people making encoder claims, we really need to be content with the results we're given. Sometimes LAME doesn't yield a superior result, and sometimes it does. Does that mean we all have to switch to Helix? Absolutely not. But let's not turn the replies into some frantic, strangely twisted sales pitch for LAME (free as it is), as some people seem to want.

Anyway, it certainly was a baffling result.
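Since the thread keeps coming back to ABX, a minimal sketch of how an ABX score is usually judged may help: the one-sided binomial probability of getting at least that many trials right by pure guessing (p = 0.5 per trial). The function name is illustrative and not taken from any particular ABX tool.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided binomial p-value: the chance of getting at least `correct`
    out of `trials` ABX trials right by guessing (p = 0.5 per trial)."""
    if not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    # Count every outcome with `correct` or more hits, out of 2**trials equally likely ones.
    ways = sum(comb(trials, k) for k in range(correct, trials + 1))
    return ways / 2 ** trials

print(round(abx_p_value(12, 16), 4))  # 0.0384
print(abx_p_value(8, 8))              # 0.00390625
```

A result below a threshold chosen before testing (0.05 is a common choice) is usually read as evidence that the listener was not guessing.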
Sebastian Mares
post Nov 25 2008, 23:24
Post #91





Group: Members
Posts: 3630
Joined: 14-May 03
From: Bad Herrenalb
Member No.: 6613



QUOTE (Zilog Jones @ Nov 25 2008, 22:53) *
Like many others, I was also very surprised by the results. Seeing how well Helix coped with the electronic tracks (sample 12 was the only one where I could ABX 5 of the 6 encoded versions, Helix being the only one I couldn't), and considering the speed of the encoder, I am definitely going to try it out on my FLAC transcodes for my MP3 player instead of LAME 3.98.2. If it weren't for this test, I would never have bothered considering any other encoder, so thanks! :)

Also, do you have the full artist names and titles for all the samples? I know sample 12 is Kalifornia by Fatboy Slim (from You've Come a Long Way, Baby), and Tom's Diner is by Suzanne Vega, but after that I'm lost. Sample 01 certainly sounds like it could be a Nobuo Uematsu composition (he did the music for most of the Final Fantasy games), but it sounds like PC MIDI, which is a bit odd.


Sample names and sources are now on the results page. :)

QUOTE (sizetwo @ Nov 25 2008, 23:16) *
Does it mean that Helix is >= LAME at 128kbps?


Statistically, for the people who tested and for the given samples, it is on par with LAME, not better.

This post has been edited by Sebastian Mares: Nov 26 2008, 23:39


--------------------
http://listening-tests.hydrogenaudio.org/sebastian/
DigitalDictator
post Nov 25 2008, 23:30
Post #92





Group: Members
Posts: 313
Joined: 9-August 02
From: SoFo
Member No.: 3002



I've asked this a couple of times, but I guess there's no answer to it: is it possible to tune Helix further, since there seems to be some headroom for it?

If not, can better quality be achieved by fiddling with the command line switches? Earlier on, there were different command lines for Helix floating around here at HA.
sizetwo
post Nov 25 2008, 23:34
Post #93





Group: Members
Posts: 143
Joined: 22-April 03
From: Kristiansand
Member No.: 6114



QUOTE
Statistically, for the people who tested and for the given samples, it is on par with LAME, not better.


Sorry then, Helix == Lame at 128kbps. End of discussion... ? Probably not.
guruboolez
post Nov 25 2008, 23:52
Post #94





Group: Members (Donating)
Posts: 3474
Joined: 7-November 01
From: Strasbourg (France)
Member No.: 420



QUOTE (sizetwo @ Nov 26 2008, 00:34) *
Sorry then, Helix == Lame at 128kbps.

No. It's Helix == Lame at 128kbps for the people who tested and for the given samples, as Sebastian just said. Nothing more.
Unlike you, I don't see anybody defending LAME in this thread.

A listening test doesn't give an answer. It's just a lead for further investigation and a brick in building an encoder's reputation. LAME's positive reputation took years to build, and it wasn't built by one listening test. I don't think HELIX is currently as trustworthy as LAME. Collective experience may help to get a better view of HELIX's quality and flaws. This experience will make the pudding bigger and the proof clearer.

This post has been edited by guruboolez: Nov 25 2008, 23:54
kwanbis
post Nov 26 2008, 00:36
Post #95





Group: Developer (Donating)
Posts: 2362
Joined: 28-June 02
From: Argentina
Member No.: 2425



QUOTE (guruboolez @ Nov 25 2008, 22:52) *
No. It's Helix == Lame at 128kbps for the people who tested and for the given samples, as Sebastian just said. Nothing more. Unlike you, I don't see anybody defending LAME in this thread.

A listening test doesn't give an answer. It's just a lead for further investigation and a brick in building an encoder's reputation. LAME's positive reputation took years to build, and it wasn't built by one listening test. I don't think HELIX is currently as trustworthy as LAME. Collective experience may help to get a better view of HELIX's quality and flaws. This experience will make the pudding bigger and the proof clearer.
100% with you guru.

This post has been edited by kwanbis: Nov 26 2008, 00:36


--------------------
MAREO: http://www.webearce.com.ar
/mnt
post Nov 26 2008, 00:49
Post #96





Group: Members
Posts: 697
Joined: 22-April 06
Member No.: 29877



I have posted some ABX logs and samples of tracks that show Helix's major flaws.


--------------------
"I never thought I'd see this much candy in one mission!"
sld
post Nov 26 2008, 05:14
Post #97





Group: Members
Posts: 1017
Joined: 4-March 03
From: Singapore
Member No.: 5312



QUOTE (sizetwo @ Nov 26 2008, 06:16) *
Does it mean that Helix is >= LAME at 128kbps? Yes, apparently it is.

You are probably the type that derives satisfaction from counting all those "smug, self-satisfied, self-proclaimed intellectuals" wrong. Unfortunately, your claim can never use ">=" to compare the encoders. As Guru interpreted the graphs correctly, the comparison to make is "==" .

You should have brought in your peers (yourself too) to inflate the sample size (no. of participants), so that the magical black bars decrease in length, and that you may have a shot at using ">=" instead of "==" .
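The "==" versus ">=" point can be made concrete with the common eyeball rule for confidence-interval plots: if two encoders' intervals overlap, call it a tie; only when they are fully separated can one be ranked above the other. This is a simplified sketch with hypothetical numbers, not the test's own analysis. Overlap is a conservative heuristic, and a formal significance test can sometimes separate means whose intervals overlap slightly.

```python
from dataclasses import dataclass

@dataclass
class Interval:
    mean: float
    half_width: float  # e.g. the length of one "black bar" above or below the mean

    @property
    def low(self) -> float:
        return self.mean - self.half_width

    @property
    def high(self) -> float:
        return self.mean + self.half_width

def compare(a: Interval, b: Interval) -> str:
    """Eyeball rule: fully separated intervals give '>' or '<'; any overlap gives '=='.
    Note that '>=' is never a possible outcome."""
    if a.low > b.high:
        return ">"
    if a.high < b.low:
        return "<"
    return "=="

# Hypothetical means and half-widths, not the published results.
print(compare(Interval(4.4, 0.15), Interval(4.3, 0.15)))  # ==
print(compare(Interval(4.4, 0.05), Interval(4.2, 0.05)))  # >
```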
singaiya
post Nov 26 2008, 05:46
Post #98





Group: Members
Posts: 365
Joined: 21-November 02
Member No.: 3830



QUOTE (sld @ Nov 25 2008, 20:14) *
You should have brought in your peers (yourself too) to inflate the sample size (no. of participants), so that the magical black bars decrease in length


That's what I thought happens too, but it seems not to have had an effect: If you look at the first sample which had 39 listeners, the bars are about as long as the second sample which had 26 listeners, and definitely longer than the third sample which also had 26 listeners.
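A possible explanation for this observation: the half-width of a confidence interval scales as spread / sqrt(n), so more listeners only narrow the bars if their scores don't also spread out more. The sketch below assumes a normal-approximation 95% interval for a single mean (the test's own analysis may differ), and the standard deviations are hypothetical, chosen only to show the effect.

```python
from math import sqrt
from statistics import NormalDist

def ci_half_width(sd: float, n: int, level: float = 0.95) -> float:
    """Half-width of a normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return z * sd / sqrt(n)

# Hypothetical spreads: 39 listeners who disagree more vs. 26 who agree more.
wide = ci_half_width(sd=1.0, n=39)
narrow = ci_half_width(sd=0.6, n=26)
print(round(wide, 3), round(narrow, 3))  # 0.314 0.231

# With the spread held fixed, doubling the listeners only shrinks the bar by 1/sqrt(2).
print(ci_half_width(1.0, 78) / ci_half_width(1.0, 39))
```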
JasonQ
post Nov 26 2008, 06:18
Post #99





Group: Members
Posts: 43
Joined: 12-October 07
Member No.: 47794



Good test. Good to see that Helix had a solid showing. What I take away from this is that Lame 3.98 is rock solid. It seems to be a bit more consistent than 3.97 and should probably be used instead. If you want a fast encoder, Helix or Fraunhofer will do the trick with no worries.

This post has been edited by JasonQ: Nov 26 2008, 06:28
sizetwo
post Nov 26 2008, 07:18
Post #100





Group: Members
Posts: 143
Joined: 22-April 03
From: Kristiansand
Member No.: 6114



QUOTE (guruboolez @ Nov 25 2008, 15:52) *
QUOTE (sizetwo @ Nov 26 2008, 00:34) *

Sorry then, Helix == Lame at 128kbps.

No. It's Helix == Lame at 128kbps for the people who tested and for the given samples, as Sebastian just said. Nothing more.
Unlike you, I don't see anybody defending LAME in this thread.

A listening test doesn't give an answer. It's just a lead for further investigation and a brick in building an encoder's reputation. LAME's positive reputation took years to build, and it wasn't built by one listening test. I don't think HELIX is currently as trustworthy as LAME. Collective experience may help to get a better view of HELIX's quality and flaws. This experience will make the pudding bigger and the proof clearer.


You make a valid point, but then I think this should also be the mantra of any listening test: the result is only valid for the people who did the test; it's not a qualitative indicator for the format/encoder. But what you are saying here is that we need quantity to get the "real" proof; in other words, there were few participants? My issue is this: had LAME (3.97 or 3.98.2) come out on top, none of this (IMHO) would have happened. Am I right, or has the box of crazy pills been opened again?
