Topic: Public MP3 Listening Test @ 128 kbps - FINISHED

Public MP3 Listening Test @ 128 kbps - FINISHED

Reply #75
The biggest advantage of HELIX lies in encoding speed; juggling cue sheets and an external tool would simply ruin this advantage.
Exactly.  If it can't be achieved in normal working mode then I'm not interested.

@Alex B.  Possibly common knowledge; however, it's knowledge that I personally did not have and could only assume.
I'm on a horse.

 

Public MP3 Listening Test @ 128 kbps - FINISHED

Reply #77
I like the part where the test result (quality and encoding speed) should raise the popularity of Helix, but instead people try to prove that Helix is bad in their own tests, while others blame Helix for not supporting gapless playback.

I second that.
Nobody complained about the samples or a potential bias they might give to some encoders before the test.
A listening test's outcome is seriously influenced by the samples used (and by the degree to which the participants are sensitive to the issues in them).
But that's a natural thing we have to accept. It's been like that with any prior listening test.

As I wrote, personal conclusions are another story, and everybody does well to look at the test's details with respect to personal relevance before making decisions about encoder usage. But it's correct in general: given, for instance, what we learnt here about Helix's behavior with metal, metal lovers won't like to use Helix.

Unfortunately there's a high degree of over-simplification even in this forum.
Many people like to see the best encoder (in a universal sense), and they expect it to be Lame (and we see again that Lame is great, it's just not the greatest encoder), and they expect that there should be serious quality differences between encoders in a general sense.

What a pity! We should be glad that we have a variety of excellent encoders to pick from, so that it's also easy to weigh non-quality-related properties like gapless playback or encoding speed according to personal demands.
lame3995o -Q1.7 --lowpass 17

Public MP3 Listening Test @ 128 kbps - FINISHED

Reply #78
The point I was trying to make was that though Helix is inherently gapful, it doesn't necessarily need to be, nor do any of the encoders here. If anyone wants to start using it regularly to hunt for problem samples and still wants gapless playback, it's possible to do. It's not particularly straightforward to make the encodes gapless, it's just possible.
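For what it's worth, here is a minimal sketch of the cue-sheet workaround mentioned earlier, assuming a Helix binary (the "hmp3" invocation below is a placeholder; check your build's actual usage) and mp3splt built with cue support. The idea is to encode the whole album as one continuous MP3 so that no track boundary falls on encoder delay or padding, then cut the stream at the cue points without re-encoding (the usual bit-reservoir caveat at split points applies).

```python
import subprocess

def gapless_album(album_wav: str, cue_file: str, album_mp3: str = "album.mp3") -> None:
    # 1. One continuous encode for the whole album image.
    #    Placeholder invocation -- substitute your Helix (hmp3) command line here.
    subprocess.run(["hmp3", album_wav, album_mp3], check=True)

    # 2. Cut the MP3 on frame boundaries at the cue points, without re-encoding,
    #    so consecutive tracks play back seamlessly.
    subprocess.run(["mp3splt", "-c", cue_file, album_mp3], check=True)

gapless_album("album.wav", "album.cue")
```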

Public MP3 Listening Test @ 128 kbps - FINISHED

Reply #79
In case you are interested, here is a quick and dirty "quality distribution" across the samples:...

Thanks for the graph, very interesting.
Helix keeps well above 4.0 throughout all the samples, Lame 3.98 is getting close to that, and FhG is also not far behind. iTunes and Lame 3.97 are showing several weaknesses of a more serious kind.

Very interesting is the encoders' varying performance on sample 12 (Helix and L3.98 are doing pretty well and the others rather badly), and on sample 1 (both Lame versions, especially L3.97, are performing quite a bit worse than the other encoders). Samples 6, 7, and 14 show specific weaknesses of iTunes and L3.97, respectively.

Sample 11 is interesting too as it addresses Helix's metal issue we learnt about in this thread. Yes, it's the weakest sample for Helix, but obviously the majority of testers didn't see a real issue with it. Of course anybody can come to a different individual conclusion. Quality judgement is personal and vital for choosing an encoder.
lame3995o -Q1.7 --lowpass 17

Public MP3 Listening Test @ 128 kbps - FINISHED

Reply #80
in this graph, LAME 3.98.2 seems to be the most stable encoder...

Public MP3 Listening Test @ 128 kbps - FINISHED

Reply #81
@sebastian mares: thanks for the test!

@guruboolez

Helix doesn't please me at all

Your personal quick test shows (if not a typo):

helix:lame398::lame398:lame397.  So according to this, if Helix is worse than LAME 3.98, then for you LAME 3.98 is just as much worse than 3.97.  How would you explain that?

I take it that no individual's result in the overall test can be (statistically) meaningless, because it's repeated, blinded, etc. (so it's never a "fluke").  So I'm trying to work out what that means in the context of a statistical tie for the group, especially in those individual cases, like guruboolez's, where there is NOT a tie.

1. All encoders are so close that individual sensitivities/variances (or quirks, depending on your view of their significance) dominate more, even (or especially) in a group of more-sensitive-than-average listeners.  [This could sometimes result in ties for just one listener, if we are not talking about a specific weakness of an encoder repeated across several selections of music but about very fine and specific differences, limited to specific sounds, instruments, or genres; see #2.]

2. The division by music genre (or instruments used, etc.) seems important. Is there a way to know if there is a division in the results this way, i.e. producing something other than a tie for the whole result set? [This could be true along with #1.] (I especially care about classical; see the sketch after this list.)

3. Is there a way to know whether, and to what extent, an unduly low anchor masked or could mask substantial quality differences?

P.S. The sample-by-sample discussion is good and does address #2. The graph by sample is helpful; I wonder if there is enough data to make informed, statistically sound judgments by music type.
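On question #2, here is a minimal sketch of how one could slice the existing ratings by genre, assuming the scores are available as plain records with encoder, genre, and score fields (the field names and genre labels are hypothetical, not the test's actual data format). If, say, the classical samples showed non-overlapping intervals where the pooled result does not, that would be the kind of division asked about; with only a few samples per genre the intervals will usually be wide.

```python
import math
from collections import defaultdict
from statistics import mean, stdev

def per_genre_summary(ratings):
    """ratings: iterable of dicts with keys 'encoder', 'genre', 'score'."""
    groups = defaultdict(list)
    for r in ratings:
        groups[(r["genre"], r["encoder"])].append(r["score"])

    for (genre, encoder), scores in sorted(groups.items()):
        n = len(scores)
        m = mean(scores)
        # 95% interval half-width via the normal approximation (1.96 * s / sqrt(n));
        # the real test uses a blocked analysis, so treat this only as a rough screen.
        half = 1.96 * stdev(scores) / math.sqrt(n) if n > 1 else float("nan")
        print(f"{genre:>10}  {encoder:<12}  n={n:3d}  mean={m:.2f} +/- {half:.2f}")
```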


The first 2 or 3 seconds were already ignored in this test.

Interesting results anyway. The conclusion is far from what I reached in the past. I only tested the last 11 samples; my results are therefore not totally comparable, but they are significantly different:

iTunes: 2.98
Lame 3.98: 3.30
l3enc: 1.171.00
fraunhofer: 3.51
LAME 3.97: 3.68
Helix: 2.95

This is also the very first test I performed with my new headphones, which I had acquired just the day before I started the test. The new sound signature was so different, and therefore so disturbing, that I didn't bother to spend more than a few minutes testing and rating each sample. It was a strange experience for me. I wonder how much a different headphone may change results. But it has become clear to me that a different hardware configuration can heavily disturb a listener.

Anyway, even in this highly confusing listening environment, my results in this test tend to confirm that Helix doesn't please me at all, even with completely different samples / musical genres.

Public MP3 Listening Test @ 128 kbps - FINISHED

Reply #82
In case you are interested, here is a quick and dirty "quality distribution" across the samples:


I doubt the individual samples have a large enough listener base to make the statistics meaningful, but two things are of note:

1) iTunes encoder and LAME 3.98.2 never perform the best on any sample.
2) The FhG encoder never performs the worst on any sample (except #5, where the graph shows it is tied for the worst).

Public MP3 Listening Test @ 128 kbps - FINISHED

Reply #83
in this graph, LAME 3.98.2 seems to be the most stable encoder...

As halb27 said, just by looking at the picture it is obvious that LAME 3.98.2 has a problem only with sample no. 1. It seems to produce good quality with all the other samples. It should be possible for the LAME developers to fix this "sample 1" problem because the other three encoders can handle it better. That said, I could name a few other samples that are especially problematic for LAME.

It is also obvious that Helix did not fail with any of the samples (it didn't go below 4). Personally, I didn't like how it handled the samples 3, 9 and 11. (In my results Helix was the worst encoder with these samples).


Public MP3 Listening Test @ 128 kbps - FINISHED

Reply #85
Is anybody else not surprised that each contender is statistically tied? It was the same in the multi-format test at 128 kbps from 2005.

I'm wondering if testing at lower bitrates will increase separation in rankings. Regardless, I'd like to test 96 kbps next instead of 80 kbps.

Public MP3 Listening Test @ 128 kbps - FINISHED

Reply #86
The graphs for all samples are available on the results page. I will add the corresponding text tomorrow since it's 22 o'clock and I just finished cooking.

Thanks a lot for your hard work. Enjoy your meal.
lame3995o -Q1.7 --lowpass 17

Public MP3 Listening Test @ 128 kbps - FINISHED

Reply #87
Thank you for this great checkup, Sebastian! I think it shows that we can be comfortable with bitrates in the -V3 to -V2 range  :-)

Now enjoy a Tannenzäpfle with your meal...

Public MP3 Listening Test @ 128 kbps - FINISHED

Reply #88
I, like many others, was also very surprised by the results. Seeing how well Helix coped with the electronic tracks (sample 12 was the only one where I could ABX 5 out of the 6 encodes, with Helix being the only one I couldn't) and considering the speed of the encoder, I am definitely going to try it out on my FLAC transcodes for my MP3 player instead of LAME 3.98.2. If it wasn't for this test I would never have bothered considering any other encoder, so thanks!

Also, do you have the full artist names and titles for all the samples? I know sample 12 is Kalifornia by Fatboy Slim (from You've Come a Long Way, Baby), and Tom's Diner is by Suzanne Vega, but after that I'm lost. Sample 01 certainly sounds like it could be a Nobuo Uematsu composition (he did the music for most of the Final Fantasy games), but it sounds like PC MIDI, which is a bit odd.

Public MP3 Listening Test @ 128 kbps - FINISHED

Reply #89
Without adding fuel to the fire, I think it's strange reading some of the comments on this test. The forum is so hell-bent on factual tests and ABXing, yet when the result in a way contradicts the paradigm of the forum, a lot of people start questioning it.  It's almost as though there is a preferred encoder, and it's... not... Helix... It's as if... some people really like defending LAME. Wow, I would never have thought.  No, honestly though, thanks for the test, Sebastian.

Let's look at the facts of the test; the proof is in the pudding. Does it mean that Helix is >= LAME at 128 kbps? Yes, apparently it is.  If the forum really wants people to use cold hard facts when making a claim, well, here they are.  Now get over it. Seriously.  If we want to yell "ABX and ABC" at people making encoder claims, we really need to be content with the results we're given.  Sometimes LAME doesn't yield a superior result, and sometimes it does.  Does that mean we all have to switch to Helix? Absolutely not.  But let's not turn the replies into some frantic, strange, twisted sales pitch for LAME (free as it is), as it seems some people want to.

Anyway, it certainly was a baffling result.

Public MP3 Listening Test @ 128 kbps - FINISHED

Reply #90
I, like many others, was also very surprised by the results. Seeing how well Helix coped with the electronic tracks (sample 12 was the only one where I could ABX 5 out of the 6 encodes, with Helix being the only one I couldn't) and considering the speed of the encoder, I am definitely going to try it out on my FLAC transcodes for my MP3 player instead of LAME 3.98.2. If it wasn't for this test I would never have bothered considering any other encoder, so thanks!

Also, do you have the full artist names and titles for all the samples? I know sample 12 is Kalifornia by Fatboy Slim (from You've Come a Long Way, Baby), and Tom's Diner is by Suzanne Vega, but after that I'm lost. Sample 01 certainly sounds like it could be a Nobuo Uematsu composition (he did the music for most of the Final Fantasy games), but it sounds like PC MIDI, which is a bit odd.


Sample names and sources are now on the results page.

Does it mean that Helix is >= LAME at 128kbps?


Statistically, for the people who tested and for the given samples, it is on par with LAME, not better.

Public MP3 Listening Test @ 128 kbps - FINISHED

Reply #91
I've been asking this a couple of times, but I guess there's no answer to it - is it possible to tune Helix further, since there seems to be some headroom for it?

If not, can better quality be achieved by fiddling with the command line switches? Earlier on, there were different command lines for Helix floating around here at HA.
//From the barren lands of the Northsmen

Public MP3 Listening Test @ 128 kbps - FINISHED

Reply #92
Statistically, for the people who tested and for the given samples, it is on par with LAME, not better.


Sorry then, Helix == Lame at 128kbps. End of discussion... ? Probably not.

Public MP3 Listening Test @ 128 kbps - FINISHED

Reply #93
Sorry then, Helix == Lame at 128kbps.

No. It's Helix == Lame at 128kbps for the people who tested and for the given samples - as Sebastian just said. Nothing more.
Unlike you, I don't see anybody defending LAME in this thread.

A listening test doesn't give an answer. It's just a lead for further investigation and a brick in building an encoder's reputation. LAME's positive reputation took years to construct, and it wasn't built by one listening test. I don't think HELIX is currently as trustworthy as LAME. Collective experience may help to get a better picture of HELIX's quality and flaws. This experience will make the pudding bigger and the proof clearer.

Public MP3 Listening Test @ 128 kbps - FINISHED

Reply #94
No. It's Helix == Lame at 128kbps for the people who tested and for the given samples - as Sebastian just said. Nothing more. Unlike you, I don't see anybody defending LAME in this thread.

A listening test doesn't give an answer. It's just a lead for further investigation and a brick in building an encoder's reputation. LAME's positive reputation took years to construct, and it wasn't built by one listening test. I don't think HELIX is currently as trustworthy as LAME. Collective experience may help to get a better picture of HELIX's quality and flaws. This experience will make the pudding bigger and the proof clearer.
100% with you guru.

Public MP3 Listening Test @ 128 kbps - FINISHED

Reply #95
I have posted some ABX logs and samples of tracks that show Helix's major flaws.
"I never thought I'd see this much candy in one mission!"

Public MP3 Listening Test @ 128 kbps - FINISHED

Reply #96
Does it mean that Helix is >= LAME at 128kbps? Yes, apparently it is.

You are probably the type that derives satisfaction from proving all those "smug, self-satisfied, self-proclaimed intellectuals" wrong. Unfortunately, your claim can never use ">=" to compare the encoders. As Guru correctly interpreted from the graphs, the comparison to make is "==".

You should have brought in your peers (and yourself too) to inflate the sample size (number of participants), so that the magical black bars decrease in length and you might have a shot at using ">=" instead of "==".

Public MP3 Listening Test @ 128 kbps - FINISHED

Reply #97
You should have brought in your peers (and yourself too) to inflate the sample size (number of participants), so that the magical black bars decrease in length


That's what I thought would happen too, but it seems not to have had that effect: if you look at the first sample, which had 39 listeners, the bars are about as long as for the second sample, which had 26 listeners, and definitely longer than for the third sample, which also had 26 listeners.
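For what it's worth, the length of those black bars (the confidence intervals) shrinks roughly as s / sqrt(n), so more listeners only shortens them if the spread of scores doesn't grow at the same time. A toy calculation, with made-up standard deviations purely for illustration (not the test's actual data):

```python
import math

def ci_half_width(stdev: float, n: int, z: float = 1.96) -> float:
    # Approximate 95% confidence-interval half-width for a mean score.
    return z * stdev / math.sqrt(n)

print(round(ci_half_width(stdev=0.80, n=39), 2))  # 0.25 -- more listeners, wider spread of scores
print(round(ci_half_width(stdev=0.65, n=26), 2))  # 0.25 -- fewer listeners, tighter spread
```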

Public MP3 Listening Test @ 128 kbps - FINISHED

Reply #98
Good test.  Good to see that Helix had a solid showing.  What I take away from this is that Lame 3.98 is rock solid.  It seems to be a bit more consistent than 3.97, and should probably be used instead.  If you want a fast encoder, Helix or Fraunhofer will do the trick with no worries.

Public MP3 Listening Test @ 128 kbps - FINISHED

Reply #99

Sorry then, Helix == Lame at 128kbps.

No. It's Helix == Lame at 128kbps for the people who tested and for the given samples - as Sebastian just said. Nothing more.
Unlike you, I don't see anybody defending LAME in this thread.

A listening test doesn't give an answer. It's just a lead for further investigation and a brick in building an encoder's reputation. LAME's positive reputation took years to construct, and it wasn't built by one listening test. I don't think HELIX is currently as trustworthy as LAME. Collective experience may help to get a better picture of HELIX's quality and flaws. This experience will make the pudding bigger and the proof clearer.


You make a valid point, but then I think this should also be the mantra of any listening test: the result is only valid for the people who did the test; it's not a qualitative indicator for the format/encoder.  But what you are saying here is that what we need is quantity to get the "real" proof; in other words, there were too few participants?  My issue is this: had LAME (3.97 or 3.98.2) come out on top, none of this (IMHO) would have happened. Am I right, or has the box of crazy pills been opened again?