MPC vs OGG VORBIS vs MP3 at 175 kbps
Reply #74 – 2004-07-24 22:04:52
"What are your conclusions here? I'm interested."

My conclusion is that codec B must be a bit underrated, since an "annoying difference" couldn't be distinguished from the original four times (unless the tester states that he hit the wrong button). The problem is that it could be. Some reasons are easy to explain.

Imagine that you're testing many formats in the same test. The first step is to rate each file. The first one (1L) is excellent, very hard to distinguish (4.5/5); you're not even sure that the difference really exists. The second file suffers by comparison: coarseness is clearly audible (2/5).

Second step: ABX. The first file is hard to ABX and needs a lot of concentration. I could distinguish a slight amount of pre-echo on a precise range, that's all. 14/16 [16 as a fixed value]: not bad. The second file should be much easier to ABX, but the first six trials are bad (2/6). Why? Because all my attention is focused on pre-echo I can't hear, simply because this file doesn't suffer from that problem. By changing the selected range a bit and focusing my attention on another problem, I find again the annoyance I detected immediately the first time, and perform a very nice 16/16 in two minutes. The final score is 18/22. Is your conclusion still the same: "codec B must be a bit underrated"?

There's a serious problem with tests including more than one encoded file: conditions are not equal for all. By changing the order, you could change the ABX scores. Beginning with an easy test could help you warm up your ears and give you confidence, but an easy "victory" could also handicap you by creating excessive confidence. You could be tired after two files if you begin with the two most difficult ones, and so on. Of course, the solution would be to rest your ears as often as needed and to watch your concentration, like a sportsman during a competition.
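The scores in the scenario above can be sanity-checked with a one-sided binomial test, the usual way ABX results are evaluated. This is a minimal sketch of my own (the helper name `abx_p_value` is mine, not from the thread):

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided binomial p-value: the probability of scoring at least
    `correct` out of `trials` by pure guessing (p = 0.5 per trial)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# The scores from the scenario above:
print(f"{abx_p_value(14, 16):.6f}")  # 14/16 -> 0.002090
print(f"{abx_p_value(16, 16):.6f}")  # 16/16 -> 0.000015
print(f"{abx_p_value(18, 22):.6f}")  # pooled 18/22 -> 0.002172
```

All three scores fall well below p = 0.05, which is exactly the point: the pooled 18/22 looks statistically solid even though it mixes a run spoiled by listening for the wrong artifact with a run made trivial by finding the right one.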
"Problem is that some people (including me) can't always spend three or four hours just to achieve one single test including 6 contenders."

"Exactly. I'm not going to spend a whole weekend trying to analyse partly sequential ABX results with additional conditions (...), especially after most people on this board have hammered (but I'm not sure if I repeated it in the ABX tutorial) the necessity of fixing the number of trials before the test begins OR not looking at them during the test for the results to be valid."

Nobody forces you to analyse these ABX results. What kind of conclusions could you draw by computing ABX scores (I'm serious, I still don't understand)? What could you conclude when you see that one file was ABXed at 10/16 and the other one at 15/16? That the second one has stronger flaws? That's a wrong conclusion. The tester is not a robot, does not live in a studio, and is not a champion. He can't necessarily maintain the same level of concentration during a whole test; he can't necessarily keep his ears at the same level of freshness; he logically doesn't have the same familiarity with the reference during the first ABX session as during the sixth and last one.

Fixing a strict number of trials solves these problems if and only if the tester has maintained the same listening abilities (a generic term covering freshness, concentration, motivation, patience, and silence in the room) during the whole test. If the tester admits that his listening conditions changed during the test, there's no need to spend a weekend, or even a minute, computing additional data based on ABX scores, which represent nothing (at least, they don't only reflect the difficulty of the samples; they could also reflect the variations in the listening conditions themselves).

"Roberto's results are perfectly valid:
- Tests were double blind
- The p-value is strictly below 0.05 (<0.01 is a good thing, <0.05 is required)"

And what about the number of listeners?
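The insistence on fixing the number of trials before the test begins can be illustrated with a small Monte Carlo sketch (my own illustration, not from the thread): a pure guesser who checks the running p-value after every trial and stops as soon as it dips below 0.05 "passes" far more often than the nominal 5%.

```python
from math import comb
import random

def min_correct_for_significance(n: int, alpha: float = 0.05) -> int:
    """Smallest k such that P(at least k correct out of n | guessing) < alpha."""
    total = 2 ** n
    tail = 0
    for k in range(n, -1, -1):
        tail += comb(n, k)
        if tail / total >= alpha:
            return k + 1
    return 0

def peeking_false_positive_rate(max_trials: int = 40,
                                n_sims: int = 10000,
                                seed: int = 1) -> float:
    """Fraction of pure guessers who reach p < 0.05 at SOME point
    when they are allowed to stop as soon as the score looks good."""
    rng = random.Random(seed)
    thresholds = [min_correct_for_significance(n) for n in range(1, max_trials + 1)]
    passes = 0
    for _ in range(n_sims):
        correct = 0
        for n in range(1, max_trials + 1):
            correct += rng.random() < 0.5   # a coin-flip "answer"
            if correct >= thresholds[n - 1]:
                passes += 1                 # guesser "passes" and stops
                break
    return passes / n_sims

# With a fixed 16-trial test, a guesser passes only with >= 12 correct,
# i.e. with probability ~0.038 -- below the nominal 5%.
print(min_correct_for_significance(16))     # -> 12
# With peeking after every trial, the guesser's pass rate is far above 5%.
print(peeking_false_positive_rate())
```

The exact inflated rate depends on `max_trials` and the seed, but it is always several times the nominal 5%, which is why the number of trials (or a no-peeking rule) has to be set in advance.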
What about the samples? Many people, including JohnV, ff123 and others, have pointed out that different samples might seriously change the results. Roberto's tests are probably valid (he can't use 100 samples and force 200 HA members to participate), but conclusions built upon the final results are often... questionable. FAAC tied with Nero AAC, or WMA@128 close to a "perceptible but not annoying" difference.

"Your result is valid, V.A.L.I.D. Can't you read the Anova log I posted and its conclusion?"

OK, I was a bit angry. Sorry.

"(...) Therefore the test IS double-blind. (...) A simple blind test would be (...)"

Thank you for the explanation. I thought that a double blind test was a single blind test repeated twice.

"When, rating MPC superior to MP3 9 times out of 10, you get p < 0.05 in the ABC/HR Anova analysis, it is mathematically equivalent to succeeding in a fixed ABX test with p < 0.05."

But that's only true under certain conditions, isn't it? The level of degradation (artifacts) could also play a role, I suppose.

"It has not been much pointed out outside Roberto's tests, but ABC/HR can be a substitute for ABX."

I think it's time to explain this in a tutorial. I'm learning different things (though it's sometimes confusing). A tutorial would be useful. If I have further questions, I'll probably ask them in French (private message): comprehension will be easier for me. Anyway, thanks for the long explanations, and sorry again for the irritating tone of my previous posts.
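The "9 times out of 10" claim quoted above can be checked with a one-sided sign test, which is the binomial calculation underlying that equivalence. A minimal sketch (the function name `sign_test_p` is mine):

```python
from math import comb

def sign_test_p(wins: int, n: int) -> float:
    """One-sided sign test: probability of at least `wins` of `n`
    paired ratings favouring one codec if both were truly equal."""
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n

# MPC rated above MP3 in 9 of 10 paired ABC/HR ratings:
print(f"{sign_test_p(9, 10):.4f}")  # -> 0.0107, comfortably below 0.05
```

The arithmetic is the same as for a 9/10 ABX run, which is why a consistent one-directional ABC/HR rating pattern can serve as a substitute for an explicit ABX test.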