Public MP3 Listening Test @ 128 kbps - FINISHED
guruboolez
post Nov 26 2008, 07:55
Post #101





Group: Members (Donating)
Posts: 3474
Joined: 7-November 01
From: Strasbourg (France)
Member No.: 420



QUOTE (sizetwo @ Nov 26 2008, 08:18) *
But what you are saying here is that we need quantity to get the "real" proof; in other words, that there were few participants?

Participants, but also samples, or as I said before, experience. And even then, you won't get any real proof or universal answer. The very best encoder in the world won't necessarily please every single user. When people here start using HELIX, reporting their good impressions and also their problem samples, and when developers start fixing those issues, then HELIX will surely become a true alternative, or maybe the obvious choice for MP3 encoding at x bitrate. Trust is something that needs a long time to grow. LAME is not the best MP3 encoder, but it is the most tested and therefore the most trustworthy. LAME is not better, simply safe (up to a point).
Anyway, if HELIX really pleases some people here, I strongly suggest they start using it. Their experience will certainly be interesting for all other potential users.

This post has been edited by guruboolez: Nov 26 2008, 07:57
Sebastian Mares
post Nov 26 2008, 09:06
Post #102





Group: Members
Posts: 3629
Joined: 14-May 03
From: Bad Herrenalb
Member No.: 6613



QUOTE (singaiya @ Nov 26 2008, 05:46) *
QUOTE (sld @ Nov 25 2008, 20:14) *

You should have brought in your peers (yourself too) to inflate the sample size (no. of participants), so that the magical black bars decrease in length


That's what I thought would happen too, but it doesn't seem to have had an effect: if you look at the first sample, which had 39 listeners, its bars are about as long as those of the second sample, which had 26 listeners, and definitely longer than those of the third sample, which also had 26 listeners.


It does have an effect. I never said it is the only thing that influences the error margins. ;)
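A minimal sketch of why the listener count is only one factor: under a simple normal approximation (an assumption here, not necessarily how this test's error bars were computed), the half-width of a 95% confidence interval is 1.96 · s / √n. It shrinks with more listeners (n) but grows with the spread of their ratings (s). The ratings below are invented for illustration.

```python
import math
from statistics import mean, stdev

def ci_half_width(scores, z=1.96):
    """Approximate 95% confidence half-width of the mean score: z * s / sqrt(n).

    Normal approximation; a Student-t quantile would give slightly wider
    intervals for small n.
    """
    return z * stdev(scores) / math.sqrt(len(scores))

# Hypothetical ratings, NOT data from this listening test.
many_but_scattered = [2.0, 5.0] * 19 + [3.5]  # 39 listeners who disagree a lot
few_but_consistent = [4.0, 4.5] * 13          # 26 listeners who mostly agree

for label, ratings in [("39 listeners", many_but_scattered),
                       ("26 listeners", few_but_consistent)]:
    print(f"{label}: mean {mean(ratings):.2f} +/- {ci_half_width(ratings):.2f}")
```

In this made-up case the 39-listener interval comes out wider, because the ratings disagree much more even though n is larger.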


--------------------
http://listening-tests.hydrogenaudio.org/sebastian/
halb27
post Nov 26 2008, 10:04
Post #103





Group: Members
Posts: 2424
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015



QUOTE (guruboolez @ Nov 26 2008, 00:52) *
...Unlike you, I don't see anybody defending LAME in this thread. ... I don't think HELIX is currently as trustable as LAME. A possible collective experience may help to get a better vision of HELIX quality and flaws. This experience will make the pudding bigger and the proof clearer.

Well, as you can learn from recent posts there are some people feeling that there are posters here defending Lame in an inadequate way (though there is nothing to defend). Chance is high they wouldn't do something similar if Lame had come out clear on top. I am one of these who feel like that.
And you are one of those Lame defenders, and you do it in a way I really dislike. What you say isn't wrong, it's just killer statements which if taken seriously makes this test worthless.

It's true (as you can read in my own posts in this thread, for instance) that such a test only contributes to our experience with encoders. It is one of the most objective contributions available: a considerable number of participants with high demands on encoder quality spent a considerable amount of time evaluating this. It's the average judgement of active HA members (and comparable people) on the samples tested. Not more. Not less.

You are trying to relativize Helix's result by casting doubt on how far we can trust Helix, and on the other hand you try to give special merits to Lame because you think we can trust Lame more. This simply isn't fair. And it's even a bad argument, cause Lame 3.98 isn't Lame 3.97 and when going back in time we had significant changes in Lame technology when looking at the Lame history. Moreover what is this trust in Lame good for if for instance with Lame 3.97 the 'sandpaper problem' came up? We should just stick to the real experience we have with encoders. Talk of trust without hard facts is the non-audio variant of the warm-fuzzy-feeling argument.

I like the way AlexB talked about his judgement of Helix's behavior on 3 samples he didn't like. He says what he felt, but in a way which respects the results of the test (which is the judgement of all the participants).

If we look at the test results IMO we can conclude the following for practical purposes:

a) the overall outcome of the encoders averaged over all the samples doesn't give any hint as to which encoder to use

b) the detailed outcome of the encoders on the individual samples gives some hints which encoder to use:

b1) iTunes and Lame 3.97 aren't attractive candidates for encoding (things can look different in case those samples where these encoders perform weakly are not very relevant for the individual choosing the encoder)

b2) Lame 3.98, Helix, and FhG are all good candidates. Which encoder is 'best' is personal and can partially be answered by figuring out which samples are most relevant to the individual and looking at these encoders' outcome on those samples. Best of all is to back things up with additional personal tests on one's favorite music. And that's not even mentioning non-audio-quality topics, which are also relevant to encoder choice, but in a very individual way.

This post has been edited by halb27: Nov 26 2008, 10:19


--------------------
lame3100m -V1 --insane-factor 0.75
halb27
post Nov 26 2008, 10:34
Post #104





Group: Members
Posts: 2424
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015



QUOTE (/mnt @ Nov 26 2008, 01:49) *
I have posted some ABX logs and samples of tracks that shows Helix's major flaws.

We now know that Helix has major flaws with metal for you, so chances are this is relevant to other metal lovers too. This is also backed up by the test, where Helix shows its worst behavior on metal.

This post has been edited by halb27: Nov 26 2008, 10:35


--------------------
lame3100m -V1 --insane-factor 0.75
Synthetic Soul
post Nov 26 2008, 10:53
Post #105





Group: Super Moderator
Posts: 4887
Joined: 12-August 04
From: Exeter, UK
Member No.: 16217



I think guru has answered most of the nonsense far better than I ever could, but I can't let the statements below go without some comment.

QUOTE (halb27 @ Nov 25 2008, 20:13) *
QUOTE (Jillian @ Nov 25 2008, 20:13) *
I like the part where the test results (quality and encoding speed) should raise the popularity of Helix, but instead some people try to prove that Helix is bad in their own tests, while others blame Helix for not supporting gapless playback.
I second that. Nobody complained about the samples or a potential bias they might give to some encoders before the test.
A listening test's outcome is seriously influenced by the samples used (and by how sensitive the participants are to the issues in them).
It's incredible to me that you think we may sing the praises of Helix but not mention any of its cons. It is obvious, following the results of this test, that members are going to be drawn to Helix: we have had people suggesting that it become the new HA recommendation solely on the results of this test, and also members stating that it is proven better than LAME. I think that it is important for members to consider the reality, pros and cons.

As for you halb27, are you complaining that we are not complaining enough or too much? Should we start complaining about the samples or bias? Is this even relevant to your point?

QUOTE (halb27 @ Nov 26 2008, 09:04) *
QUOTE (guruboolez @ Nov 26 2008, 00:52) *

...Unlike you, I don't see anybody defending LAME in this thread. ... I don't think HELIX is currently as trustable as LAME. A possible collective experience may help to get a better vision of HELIX quality and flaws. This experience will make the pudding bigger and the proof clearer.
Well, as you can learn from recent posts there are some people feeling that there are posters here defending Lame in an inadequate way (though there is nothing to defend). Chance is high they wouldn't do something similar if Lame had come out clear on top. I am one of these who feel like that.
And you are one of those Lame defenders, and you do it in a way I really dislike. What you say isn't wrong, it's just killer statements which if taken seriously makes this test worthless.
I find this attack most strange. How many LAME users are there compared to Helix users? Which encoder do you think has had the most testing? Or do you feel that these fourteen samples are enough to usurp the thousands of tracks that LAME users have thrown at LAME?

I just don't get it.

I don't think that you should see it as LAME fanboys shooting down Helix for no reason; I would rather see it as users who have shown a fresh interest in Helix attempting to make an informed decision.


--------------------
I'm on a horse.
halb27
post Nov 26 2008, 11:24
Post #106





Group: Members
Posts: 2424
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015



QUOTE (Synthetic Soul @ Nov 26 2008, 11:53) *
...As for you halb27, are you complaining (about sample bias) that we are not complaining enough or too much? Should we start complaining about the samples or bias?

I didn't complain, and IMO nobody should (anyone with concerns about the samples should have raised them during sample selection). What I'm trying to say is: we should take the test as it is. Some posters in this thread tend to sound as if they want to play down the Helix results. This isn't good. Look at Helix's behavior on metal, for instance. It is reflected in the test. And it's okay for people with experience in this field to provide additional warnings about it. But am I over-sensitive when I feel that the way it's done tends to bring Helix down in a more general way? Maybe I am, but that's how I feel about it. And it looks like I'm not the only one.
BTW I personally don't use Helix (I'm personally considering converting from Lame to FhG), but I can't see the catastrophe when Helix gets some attraction. After all it seems to be a good encoder (okay, not so much for metal and hard rock). Maybe some warning should be given that we can't expect any Helix development (guru gave this hint already) and have to take Helix as is.
But I don't expect further FhG stereo MP3 development either. I don't mind as long as I'm happy with what I've got. It's even something of a relief not having to care about new versions. We can be happy that Lame is still being developed, but we can also be happy with Lame 3.98. MP3 development has pretty much reached its good end, as shown in this test.

This post has been edited by halb27: Nov 26 2008, 11:29


--------------------
lame3100m -V1 --insane-factor 0.75
Gabriel
post Nov 26 2008, 11:54
Post #107


LAME developer


Group: Developer
Posts: 2950
Joined: 1-October 01
From: Nanterre, France
Member No.: 138



QUOTE (Sebastian Mares @ Nov 25 2008, 22:10) *
In case you are interested, here is a quick and dirty "quality distribution" across the samples:


Would it be possible for you to include this graph within the results page?

Btw, question for the audience: what are the relative speeds of FhG/Helix/Lame?
edit: sorry, speed is already mentioned within the test results

This post has been edited by Gabriel: Nov 26 2008, 11:57
Sebastian Mares
post Nov 26 2008, 12:06
Post #108





Group: Members
Posts: 3629
Joined: 14-May 03
From: Bad Herrenalb
Member No.: 6613



Sure, can be done when I get home. :)


--------------------
http://listening-tests.hydrogenaudio.org/sebastian/
Synthetic Soul
post Nov 26 2008, 12:14
Post #109





Group: Super Moderator
Posts: 4887
Joined: 12-August 04
From: Exeter, UK
Member No.: 16217



QUOTE (halb27 @ Nov 26 2008, 10:24) *
...but I can't see the catastrophe when Helix gets some attraction. After all it seems to be a good encoder (okay, not so much for metal and hard rock). Maybe some warning should be given that we can't expect any Helix development (guru gave this hint already) and have to take Helix as is.
I haven't seen anybody suggesting that it is catastrophic. I think the attention has been very positive on the whole. Very positive, given that many people had probably never heard of or tested the damn thing!

For my part I think that Helix's results are of great interest. So much interest that I considered running some tests of my own; however, I really enjoy the fact that I can play LAME MP3s gaplessly with foobar, and when my Creative Nano fails at this I find it really jarring. The fact that Helix cannot currently do this natively (please, no-one bother pointing out Canar's tongue-in-cheek suggestion) is a major drawback in my eyes. I'm not saying that it is not a minor fix.

I am not one of these members that is willing to encode every track with various encoders at various settings to see which makes a better job of it. I decided upon LAME -V5 a couple of years back and I stick with it. That's not to say that I can't change, but I don't have the time to be so picky when encoding new albums.

Now, that is not to say that Helix will never be a contender. It is open source, and improvements can be made, if anyone cares to undertake it.

I'm very much in favour of some positive attention to Helix - as you rightly point out, the more the merrier - but I'm not in favour of glossing over its failings just because it's in vogue in November 2008.

This post has been edited by Synthetic Soul: Nov 26 2008, 12:27


--------------------
I'm on a horse.
sizetwo
post Nov 26 2008, 12:56
Post #110





Group: Members
Posts: 143
Joined: 22-April 03
From: Kristiansand
Member No.: 6114



Derived from this test and the consequent forum postings, this is what I have learned about listening tests:

1: Whichever encoder(s) comes out on top of a test does not indicate that it is a superior encoder at that bitrate, regardless, as in this case, if its a tie.
2: The sample size and the number of participants need to be increased, and participants need some knowledge of audio compression and what to listen for (artifacts).
3: The samples selected for a test will never be enough to make an encoder "safe"; in other words, we will not be able to know whether the samples are representative of the various types of music one would want to compress.
4: The test results should be interpreted in a highly subjective manner, as everyone seems to interpret the results differently.
5: The final outcome is for most people to end up saying "test for yourself", thus negating the empirical evidence we can draw from such a test, and ultimately making it rather pointless, other then saying that people should stay away from the low anchor.

The question remains, how then can a test be deviced so that it can yield results that are in fact conclusive and create a form of intersubjective opinion regarding the prefered codec at a specific bitrate ?
Sebastian Mares
post Nov 26 2008, 13:12
Post #111





Group: Members
Posts: 3629
Joined: 14-May 03
From: Bad Herrenalb
Member No.: 6613



The problem which a lot of people do not understand is that you cannot generalize the results by saying "Encoder x is the best" when a finite number of participants test a finite number of samples at a certain bitrate.
If you have average hearing and listen to all the types of music that were covered in this test, you could actually choose any of the contenders with regard to quality. If you only listen to metal, you would put the encoders that performed best on metal on your list. What these public listening tests actually serve for is to let you narrow down the encoders you should consider before starting your own tests. Then you cut more and more encoders from your list depending on whether you need fast speed, gapless playback, support for platforms like Linux or Mac, etc., and in the end you come up with the one encoder that is best suited to your individual needs. I hope you get my point. :)

This post has been edited by Sebastian Mares: Nov 26 2008, 13:16


--------------------
http://listening-tests.hydrogenaudio.org/sebastian/
halb27
post Nov 26 2008, 13:24
Post #112





Group: Members
Posts: 2424
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015



QUOTE (sizetwo @ Nov 26 2008, 13:56) *
...5: The final outcome is for most people to end up saying "test for yourself", thus negating the empirical evidence we can draw from such a test, and ultimately making it rather pointless, other then saying that people should stay away from the low anchor.

The question remains, how then can a test be deviced so that it can yield results that are in fact conclusive and create a form of intersubjective opinion regarding the prefered codec at a specific bitrate ?

It's true that the overall outcome averaged over all samples doesn't say much, especially if the results are tied. But by looking at the detailed results, every reader can get answers for whatever level of personal effort he is willing to put into interpreting them.
IMO it's like this (keep in mind it's only about interpreting the results of the test):

a) for take-it-easy people who don't want to struggle with details:
Helix is best, as it achieved good results on every sample. It's followed rather closely by Lame 3.98 and then FhG Surround, each of which shows a weakness (of minor to modest degree) on only 1 sample. iTunes and Lame 3.97 are quite a bit behind, both having 3 weaknesses (one of them of a higher degree).
From the test organization, a warning could be helpful that deciding things this way may lead to a suboptimal decision, as the personal relevance of the samples is not taken into account.

b) for the more caring people giving some effort to result interpretation but avoiding own listening tests:
Concentrate on those samples which are meaningful to you (which are roughly your kind of music) and ignore those samples which have no or nearly no relevance to you. Look at the outcome of the various encoders for this sample selection and pick your favorite.
From the test organization, this procedure could be supported by giving more detailed information about the samples (genre(s) in the first place), as not every reader will listen to the samples (which should nevertheless be highly recommended, because otherwise the reader doesn't know what he's reading about).

c) for very caring people willing to run their own listening tests: procedure b) can be a start and may make things easier, as it can exclude certain encoders from consideration.

In case several encoders end up as candidates for personal use this way: don't worry, enjoy the choice in its own right (and you always have the option of going the b) or c) way if you're coming from a level above).
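Procedure b) amounts to averaging each encoder's per-sample score over only the samples you care about. A minimal sketch follows; the sample names, genre labels, and scores are invented for illustration and are not the published results, and the plain averages ignore the confidence intervals discussed elsewhere in this thread.

```python
from statistics import mean

# Hypothetical per-sample mean scores (NOT the real test results).
SCORES = {
    "Sample01": {"Helix": 4.6, "LAME 3.98": 4.4, "FhG": 4.5},
    "Sample02": {"Helix": 3.9, "LAME 3.98": 4.5, "FhG": 4.3},
    "Sample03": {"Helix": 4.7, "LAME 3.98": 4.5, "FhG": 4.6},
    "Sample04": {"Helix": 4.0, "LAME 3.98": 4.4, "FhG": 4.2},
}
# Hypothetical genre label for each sample.
GENRES = {"Sample01": "classical", "Sample02": "metal",
          "Sample03": "pop", "Sample04": "metal"}

def rank_for(relevant_genres):
    """Rank encoders by their average score over the samples whose genre
    is in relevant_genres, best first."""
    chosen = [s for s, genre in GENRES.items() if genre in relevant_genres]
    if not chosen:
        raise ValueError("no samples match the selected genres")
    encoders = SCORES[chosen[0]].keys()
    averages = {enc: mean(SCORES[s][enc] for s in chosen) for enc in encoders}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)

print(rank_for({"metal"}))
print(rank_for({"classical", "pop"}))
```

With these made-up numbers a metal listener and a classical/pop listener would end up with different favorites, which is the point of looking at the samples that are personally relevant.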

This post has been edited by halb27: Nov 26 2008, 13:51


--------------------
lame3100m -V1 --insane-factor 0.75
uart
post Nov 26 2008, 14:36
Post #113





Group: Members
Posts: 795
Joined: 23-November 04
Member No.: 18295



QUOTE (sizetwo @ Nov 26 2008, 03:56) *
1: Whichever encoder(s) comes out on top of a test does not indicate that it is a superior encoder at that bitrate, regardless, as in this case, if its a tie.


This is called "statistical significance" and is a very important part of making judgements about which is better in cases where there is an element of randomness or uncertainty (aka variance) in measurement. There is well-developed statistical theory that analyses the difference in the means (averages) in relation to the variance of the scores and the number of samples, and determines whether the observed difference is likely to be a result of chance or more likely to be due to a genuine difference in the nature of the things being measured. Loosely speaking, these two cases correspond to "no significant difference" and "a significant difference" respectively.

What people really mean here when they say the scores are "tied" is that the cold, hard statistical mathematics says that the differences are not statistically significant. Essentially this just means that, given the underlying randomness of the data set, it is unreliable to assume that there is a real difference.
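As a rough sketch of the idea described above (not the analysis actually used for this test): Welch's statistic divides the difference between two mean scores by its standard error, and a large absolute value means the gap is unlikely to be chance. To stay within Python's standard library, the p-value below is taken from the normal distribution rather than Student's t, which is only reasonable for moderately large groups. Because every listener in an ABC/HR test rates all encoders, a paired analysis would usually be more appropriate; this unpaired version just keeps the illustration short. The scores are made up.

```python
import math
from statistics import NormalDist, mean, variance

def welch_z_test(a, b):
    """Two-sided test of 'no difference in mean score' between groups a and b.

    Returns (statistic, p_value). The statistic is Welch's: the difference of
    means divided by its standard error. As a simplification, the p-value is
    taken from the standard normal distribution instead of Student's t.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    stat = (mean(a) - mean(b)) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(stat)))
    return stat, p_value

# Hypothetical per-listener scores for two encoders on one sample.
encoder_x = [4.0, 4.8, 3.5, 4.6, 4.2, 3.9]
encoder_y = [4.1, 4.5, 3.6, 4.4, 4.0, 3.8]

stat, p = welch_z_test(encoder_x, encoder_y)
verdict = "significant" if p < 0.05 else "not significant ('tied')"
print(f"difference {mean(encoder_x) - mean(encoder_y):.2f}, p = {p:.2f}: {verdict}")
```

Here encoder_x has the higher average, yet the difference is small compared with the listener-to-listener spread, so the two would be reported as tied.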

This post has been edited by uart: Nov 26 2008, 16:47
Alex B
post Nov 26 2008, 15:02
Post #114





Group: Members
Posts: 1303
Joined: 14-September 05
From: Helsinki, Finland
Member No.: 24472



I think it would be good to quote Pio2001's valid comment here:

QUOTE (Pio2001 @ Nov 25 2008, 01:34) *
... Oh, and Greynol is right. Helix is not the winner. It is tied. The differences are within the confidence intervals, which means that they are just random. If you redid the test with the same samples and same listeners, the simple fact that ABC/HR presents them in a different order every time would probably lead Lame, or Fraunhofer, or iTunes to get a slightly, but not significantly, superior score.

We must consider this to be chance, unless we have more information to backup further claims.
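Pio2001's point can be illustrated with a small simulation sketch. Every encoder below is given exactly the same true quality; the listener count, true score, and noise level are hypothetical parameters, and ABC/HR presentation order is not modelled, only random rating noise. Re-running the simulated "test" many times still produces a different winner from run to run.

```python
import random
from collections import Counter
from statistics import mean

ENCODERS = ["Helix", "LAME 3.98", "LAME 3.97", "FhG", "iTunes"]

def simulated_winner(rng, n_listeners=30, true_score=4.5, noise=0.5):
    """Run one simulated test in which every encoder has the SAME true quality
    and return the encoder with the highest average rating."""
    averages = {
        enc: mean(true_score + rng.gauss(0, noise) for _ in range(n_listeners))
        for enc in ENCODERS
    }
    return max(averages, key=averages.get)

rng = random.Random(2008)  # fixed seed so the run is reproducible
wins = Counter(simulated_winner(rng) for _ in range(1000))
print(wins.most_common())
```

Since the encoders are identical by construction, each one "wins" roughly a fifth of the simulated runs; a first place on its own says nothing.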


To better understand the results I am going to start sample specific discussion threads - one for each sample.

The first two are here:

http://www.hydrogenaudio.org/forums/index....showtopic=67562
http://www.hydrogenaudio.org/forums/index....showtopic=67561

This post has been edited by Alex B: Nov 26 2008, 15:02


--------------------
http://listening-tests.freetzi.com
greynol
post Nov 26 2008, 17:42
Post #115





Group: Super Moderator
Posts: 10000
Joined: 1-April 04
From: San Francisco
Member No.: 13167



QUOTE (halb27 @ Nov 26 2008, 01:04) *
b1) iTunes and Lame 3.97 aren't attractive candidates for encoding (things can look different in case those samples where these encoders perform weakly are not very relevant for the individual choosing the encoder)

This is nonsense. Who's to say using 14 different samples would have given the exact same outcome? If they had then maybe you'd be right but there aren't more samples. If the difference between samples where Lame 3.98 scored consistently and significantly higher than Lame 3.97 was due to a known defect of Lame 3.97 that has been corrected in Lame 3.98 ("sandpaper problem"), then possibly. Perhaps a class of samples exist that show weaknesses new to Lame 3.98. This is not beyond the realm of possibility considering that we've seen regression in Lame's CBR method with at least one documented sample between 3.93 and 3.98, though one sample does not a class make.

Based on the test results the candidates were all tied. There is not enough statistical evidence to suggest the sound quality of any are more attractive than any other, period, end of discussion.

This post has been edited by greynol: Nov 26 2008, 17:56


--------------------
Placebophiles: put up or shut up!
guruboolez
post Nov 26 2008, 18:20
Post #116





Group: Members (Donating)
Posts: 3474
Joined: 7-November 01
From: Strasbourg (France)
Member No.: 420



I'm quoting halb27:

Well, as you can learn from recent posts there are some people feeling that there are posters here defending Lame in an inadequate way (though there is nothing to defend). Chance is high they wouldn't do something similar if Lame had come out clear on top. I am one of these who feel like that.
Of course, and that's perfectly normal. When a general consensus is confirmed, there's no debate. But when the same consensus is challenged by a new element (test, proof, theory), the pertinence of the latter is subject to strong debate. Take an example: if a scientist found a new proof that the earth turns around the sun, the scientific community wouldn't pay much attention to it. If another scientist brought a test proving that heliocentrism is wrong… guess what would happen. You see a bias where there's simply a very common attitude.

“What you say isn't wrong, it's just killer statements which if taken seriously makes this test worthless.”
So what I say is not wrong but you refuse to accept it because it makes the test worthless?! I said this result is "a lead" and "a brick" to a bigger building. No more and certainly not less. I don't call this "worthless".

“and on the other hand you try to give special merits to Lame because you think we can trust Lame more. This simply isn't fair. And it's even a bad argument, cause Lame 3.98 isn't Lame 3.97 and when going back in time we had significant changes in Lame technology when looking at the Lame history. ”
This argument looks dishonest to my eyes. LAME 3.98 is an improvement, not a radically different piece of code. A new release won't break the confidence people have in an encoder just because parts of the code changed. People trust LAME in general, Vorbis in general, MPC, FLAC, x264, Xvid in general... and not a single, past version of it. LAME has been trustworthy for years; LAME 3.98's quality didn't start from scratch; unsurprisingly, several people trust and use the latest version of the encoder. HELIX/Real has not built up that kind of trust over the years, and I don't see how I'm giving a special merit to LAME when I say that a single listening test won't make Helix as trustworthy as LAME, considering the different histories they have.

“Moreover what is this trust in Lame good for if for instance with Lame 3.97 the 'sandpaper problem' came up”
In case you forgot, the sandpaper issue occurred only in very specific situations, and the overall progress of LAME 3.97 over 3.96 was massive enough (especially with VBR in the mid-bitrate range) to prefer the more recent version. I posted several listening tests on LAME 3.97 betas a few years ago (in which the artifact you described was discovered).

“b) the detailed outcome of the encoders on the individual samples gives some hints which encoder to use:

b1) iTunes and Lame 3.97 aren't attractive candidates for encoding ”


So long on HA.org and still unable to read a listening test?!
ALL ENCODERS ARE TIED. HELIX is as good as iTunes according to this test. If you refuses it then you're implicitly admitting some limitation of collective listening tests.

This post has been edited by guruboolez: Nov 26 2008, 18:22
Sebastian Mares
post Nov 26 2008, 18:38
Post #117





Group: Members
Posts: 3629
Joined: 14-May 03
From: Bad Herrenalb
Member No.: 6613



For those interested, here are the samples (I don't know how long they will be available, since my account expires on December 1st):

http://rapidshare.com/files/167638538/Sample01.zip
http://rapidshare.com/files/167638513/Sample02.zip
http://rapidshare.com/files/167638551/Sample03.zip
http://rapidshare.com/files/167638514/Sample04.zip
http://rapidshare.com/files/167638550/Sample05.zip
http://rapidshare.com/files/167638524/Sample06.zip
http://rapidshare.com/files/167638545/Sample07.zip
http://rapidshare.com/files/167638522/Sample08.zip
http://rapidshare.com/files/167638544/Sample09.zip
http://rapidshare.com/files/167638527/Sample10.zip
http://rapidshare.com/files/167638543/Sample11.zip
http://rapidshare.com/files/167638554/Sample12.zip
http://rapidshare.com/files/167638529/Sample13.zip
http://rapidshare.com/files/167638525/Sample14.zip

Or an all-in-one ZIP from kwanbis:

http://www.megaupload.com/de/?d=13B7NWEP


--------------------
http://listening-tests.hydrogenaudio.org/sebastian/
Neasden
post Nov 26 2008, 18:50
Post #118





Group: Banned
Posts: 185
Joined: 1-July 08
Member No.: 55148



I've never seen such a heated debate. It's pretty cool. Perhaps discussing how things were in the past, how we were there and saw LAME crawl to its majesty, is useless now.
Facts:
Helix hasn't been tuned in 3 full years.
LAME latest tuning is from the last 3 months.
Helix is encoding at 90x. I got 30x on my PC, probably because of hardware limitations. But it's OK.
LAME encoding here is no more than 12x. And this bothers me.
Helix performed a bit better than LAME in this test.
LAME is showing weaknesses at 128 kbps (this could be with this set of samples, we don't know)
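The speed figures above are realtime factors ("90x" means 90 times faster than playback). A small sketch of the arithmetic converts them into wall-clock encoding time; the 60-minute album length is a hypothetical example, and the speeds are the ones quoted in the post.

```python
def encode_seconds(audio_minutes, realtime_factor):
    """Wall-clock seconds to encode audio_minutes of audio at the given
    realtime factor (e.g. 90 for '90x')."""
    return audio_minutes * 60 / realtime_factor

# Speeds quoted in the post; the 60-minute album length is hypothetical.
for label, speed in [("Helix (reported)", 90), ("Helix (this PC)", 30), ("LAME", 12)]:
    print(f"{label}: {encode_seconds(60, speed):.0f} s for a 60-minute album")
```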

This is just the tip of the iceberg that already started to bother the crowd.

Can you imagine if Helix had been developed and tuned? Would it have been surpassed LAME in light-years?

I guess this discussion does not end here, I see a lot of analytical people trying to make a point. Everyone's got their point, and I think we should deepen this investigation, make another test, perhaps a different listening test with more people and a vast amount of samples to "end this discussion".

This post has been edited by Neasden: Nov 26 2008, 18:53
Big_Berny
post Nov 26 2008, 18:58
Post #119





Group: Members
Posts: 242
Joined: 9-February 03
Member No.: 4921



QUOTE (guruboolez @ Nov 26 2008, 20:20) *
So long on HA.org and still unable to read a listening test?!
ALL ENCODERS ARE TIED. HELIX is as good as iTunes according to this test. If you refuses it then you're implicitly admitting some limitation of collective listening tests.

Well, to be 100% correct, you can't say that HELIX is as good as iTunes. This is not 'proven' by the test (you'd have to test the beta error instead of the alpha error). But since the differences between the two aren't significant, you also can't say that HELIX is better, as the difference MAY BE (!) random.

So what we can say (as conservative scientists) is: We can't be sure that there's a difference in quality between the different encoders. Nothing more.
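Big_Berny's alpha/beta distinction can be made concrete with a rough power calculation, sketched here under a normal approximation with equal spread in both groups. The true difference, spread, and listener counts are hypothetical. Power is the probability that a real difference of a given size would show up as significant; beta (the type II error) is 1 minus power. When power is low, a non-significant result says little about whether two encoders are actually equivalent.

```python
import math
from statistics import NormalDist

def approx_power(delta, sd, n, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    delta: true difference in mean score between two encoders
    sd:    per-listener standard deviation (assumed equal for both encoders)
    n:     number of listeners per encoder
    The tiny chance of rejecting in the wrong direction is ignored.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sd * math.sqrt(2 / n)
    return 1 - NormalDist().cdf(z_crit - delta / se)

for n in (30, 100, 500):
    power = approx_power(delta=0.1, sd=0.8, n=n)
    print(f"n={n}: power {power:.2f}, beta {1 - power:.2f}")
```

Under these assumed numbers, a real 0.1-point difference would usually go undetected with a few dozen listeners, so "not significantly different" must not be read as "proven equal".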
null-null-pi
post Nov 26 2008, 18:59
Post #120





Group: Members
Posts: 9
Joined: 11-October 08
Member No.: 59933



yay, this is exciting!!
seems like i'll have to run a few tests myself since i didn't check on FhG or Helix for a very long time. and it also seems like i underestimated their performance/progress in development...


--------------------
10 FOR I=1 TO 3:PRINT"DAMN":NEXT
greynol
post Nov 26 2008, 19:13
Post #121





Group: Super Moderator
Posts: 10000
Joined: 1-April 04
From: San Francisco
Member No.: 13167



QUOTE (Neasden @ Nov 26 2008, 09:50) *
Helix performed a bit better than LAME in this test.
No, it didn't.

QUOTE (Neasden @ Nov 26 2008, 09:50) *
LAME is showing weaknesses at 128 kbps (this could be with this set of samples, we don't know)
How so?

To point out the impotency of your analysis, based on Sebastian's colored graph, Lame 3.97 performed the best on the greatest number of samples (it appears to be tied with Fraunhofer on sample 10). The point is that you have to look at the totality of the test and understand something about statistics. Those vertical bars in the chart summarizing the results are there for a reason and it appears that you have no idea how to interpret them.
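Greynol's warning about counting per-sample wins can be shown with a deliberately contrived, hypothetical example: encoder "A" edges out "B" by tiny margins on most samples but fails badly on one, so it wins the most samples while having the lower overall mean. Neither number accounts for the error bars, which is why the statistical analysis of the whole test is still needed.

```python
from collections import Counter
from statistics import mean

# Hypothetical per-sample mean scores for two made-up encoders.
PER_SAMPLE = {
    "A": [4.52, 4.51, 4.53, 3.20],
    "B": [4.50, 4.50, 4.50, 4.40],
}

def count_sample_wins(table):
    """Count on how many samples each encoder had the highest score."""
    n_samples = len(next(iter(table.values())))
    return Counter(
        max(table, key=lambda enc: table[enc][i]) for i in range(n_samples)
    )

wins = count_sample_wins(PER_SAMPLE)
overall = {enc: mean(scores) for enc, scores in PER_SAMPLE.items()}
print("sample wins:", dict(wins))
print("overall means:", {enc: round(m, 3) for enc, m in overall.items()})
```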

QUOTE (Neasden @ Nov 26 2008, 09:50) *
Would it (Helix) have been surpassed LAME in light-years?
Quite possibly not.

This post has been edited by greynol: Nov 26 2008, 19:21


--------------------
Placebophiles: put up or shut up!
Soap
post Nov 26 2008, 19:13
Post #122





Group: Members
Posts: 1013
Joined: 19-November 06
Member No.: 37767



QUOTE (Neasden @ Nov 26 2008, 12:50) *
and I think we should deepen this investigation, make another test, perhaps a different listening test with more people and a vast amount of samples to "end this discussion".

We? Let's give Sebastian Mares credit: this was largely a one-man show.
We? How about you? Organize a new test if you like. Don't beg the collective audience to do the work for you.
More people? I'm sure that if Sebastian had a magic wand there would have been more people involved, but even with a Slashdotting and repeated extensions there was only a limited number of participants.


--------------------
Creature of habit.
Dingo_RG
post Nov 26 2008, 19:23
Post #123





Group: Members
Posts: 1
Joined: 9-November 08
Member No.: 62011



Neasden said:

"Helix hasn't been tuned in 3 full years"

"Can you imagine if Helix had been developed and tuned? Would it have surpassed LAME by light-years?"

-----------------------------------------------------------

Excellent point, exactly my thoughts...

From the test results, anyone could conclude that Helix is generally a good encoder, and it performed excellently here...

There are two main flaws in the Helix encoder: one concerns audio quality with metal music, and the other concerns gapless playback.

Well, Helix is open source... here is a good challenge for HA's software developers and beta testers: fix these two issues and tune Helix to its full potential.
guruboolez
post Nov 26 2008, 19:39
Post #124





Group: Members (Donating)
Posts: 3474
Joined: 7-November 01
From: Strasbourg (France)
Member No.: 420



QUOTE (Big_Berny @ Nov 26 2008, 19:58) *
So what we can say (as conservative scientists) is: We can't be sure that there's a difference in quality between the different encoders. Nothing more.

Exactly. Or at least: "We can't be sure that there's a difference in quality between the different encoders for this set of samples, these participants, etc."

To illustrate the debate about statistical difference and the practical meaning of the graph, I created a fake one in which I added a lossless encoding as a competitor. It's not quite accurate, since the confidence margins would change a bit, but I don't think a true graph would look very different:



LAME 3.98 and Helix are statistically tied with any lossless format.
What's the point of this? Simply imagine any lossy competitor at a higher bitrate (it could be LAME -V2, MPC standard or any other idol): it would appear on this graph slightly below my virtual lossless contender. Then what should people conclude? That Helix at ~130 kbps is as good as LAME at ~200 kbps, while also being much faster and producing much smaller files. What the hell have the LAME developers been doing all these years? Why are people on HA.org so conservative, and why don't they immediately switch to this encoder, which even competes with lossless?

Am I clear enough? The first, immediate and indubitable conclusion of this test was drawn by Sebastian Mares himself: it's the last time MP3 at 128 kbps will be tested (by him). The encoders are too close to transparency to reach other conclusions. From this test you could build the most foolish recommendations, including that LAME and Helix are a substitute for any lossless format. That is what the test would say. I'm not caricaturing things, and it's not even an aberration: the evidence that a group of listeners would be OK with several MP3 implementations at around 130 kbps is there. I find that nice, much nicer than the useless debate in this thread. It's not the conclusion I dreamt of, but I would thank Sebastian for bringing HA.org (which is sometimes a bit elitist) to a conclusion millions of people around the world have reached by themselves: MP3 at 128 kbps is often good, even with the fastest encoders.

Now, individual users are different from a group, and people won't replace their lossless collections with Helix or LAME at 130 kbps just because a test said it's safe to do so. We don't blindly obey listening tests.

This post has been edited by guruboolez: Nov 26 2008, 19:53
[JAZ]
post Nov 26 2008, 19:42
Post #125





Group: Members
Posts: 1767
Joined: 24-June 02
From: Catalunya(Spain)
Member No.: 2383



QUOTE (Neasden @ Nov 26 2008, 18:50) *
Can you imagine if Helix had been developed and tuned? Would it have surpassed LAME by light-years?


* Helix (including its former and current incarnations) has been developed for around 10 years (OK, make that 7-8 if we accept that the last modification was in 2005).

* Helix has always been developed by companies with full-time workers.

* The original goal of Xing (Helix's parent) was speed. Back then, the claim was: "it is 8 times faster than current encoders". And it was true!

* During the later days of Xing, development focused on quality (moving from i/s stereo to m/s stereo, allowing full-bandwidth encoding instead of the usual filtering at around 16 kHz, improving the VBR mode...).

* When Helix was born, as part of Real Networks' whole new attempt to embrace the open source community (Helix DNA, Helix server, Helix player...), the Helix MP3 encoder was further tuned and developed with quality in mind, while preserving its speed (for Real, having a fast encoder was valuable).


In contrast:

* LAME has been developed for 10 years.

* LAME's development has always been the work of volunteers, sometimes of a single person.

* The original goal of LAME (1.0) was to be an MP3 encoder for Amiga computers. That implied speed.
The current goal (since 2.0) of LAME is quality.
As such, LAME was based on the official dist10 reference MP3 encoder, improving its methods to get better quality.
This was further emphasized when the LAME developers tuned the encoder using Fraunhofer's output as a reference.

* LAME has always received both speed and quality improvements, with quality taking priority. GOGO prioritized speed.

* In recent years, LAME development has focused on new models, tweaking the behaviour on certain killer samples, and overall standards compliance. In practice, this means development didn't advance much, but it did advance, as the test shows.


In the end, Helix's behaviour doesn't surprise me. I would have found it strange if iTunes had shown Helix's behaviour.

As for the test results, I will just repeat the consensus: they are tied. End of discussion.
All of them have weaker and stronger areas.