Topic: "Audiophile" listening event @ Definitive Audio in Seattle

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #150
Why is it well established, then, that minimum or intermediate phase filter designs are preferable to linear phase (in terms of perceived quality), when they must all sound equal so as not to fall into "a problem with the theory"?
"well established"? I haven't seen any tests that would satisfy HA rules that show there's any audible difference. We're talking about 20kHz+ here, not in the audible range. Is that what you're referring to? I've no argument with you about the audible range.


Well done JA.  You've generated a nice diversion from discussion of your 'listening event'.

Mods, perhaps this discussion of filter audibility should be split off as a new topic?

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #151
Why is it well established, then, that minimum or intermediate phase filter designs are preferable to linear phase (in terms of perceived quality), when they must all sound equal so as not to fall into "a problem with the theory"?
"well established"? I haven't seen any tests that would satisfy HA rules that show there's any audible difference. We're talking about 20kHz+ here, not in the audible range. Is that what you're referring to? I've no argument with you about the audible range.


Well done JA.  You've generated a nice diversion from discussion of your 'listening event'.


It wasn't meant as a diversion, merely that as "2BeDecided" had mentioned the ringing of low-pass filters, I thought that HA posters would be interested in ABXing the files created by Keith Howard using 7 different filters that accompanied his January 2006 Stereophile article.

Regarding your multiple follow-up questions in recent posts, I feel it would be more worthwhile, as you suggested, your attending my next New York presentation (date tba) and experiencing the demonstration in person. (I do note that you asked some of the same questions on HA two years ago, and I did offer answers at that time.) In the meantime, as someone has posted a link to a segment of the original Linn recording, you can perform your own tests of the original vs various data-reduced versions.

John Atkinson
Editor, Stereophile

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #152
So proper, placebo-eliminating test procedures should also be in your best interest! Then if a magazine that you pay for, and that has considerable advertising cash flow, continually rejects placebo-eliminating procedures (it is really not that hard if there is a will), you should ask critical questions instead of becoming its apologist. Treating the matter as if it were all just opinion vs. opinion has a taste of Fox News literacy. Why would a placebo-elimination protocol, such as ABX, make a test invalid vs. the same test without that protocol properly implemented?* That you even parrot these "opinions" is telling.

* Please save my time and do not answer this with whatever you regard as "authority" but with good old arguments.


I'm afraid you've lost me! I believe I already said that I personally wish that Stereophile did more blind testing, though I won't go so far as to say "it is really not that hard if there was a will," because I'm not personally familiar with the practical constraints under which the magazine operates. Common sense suggests that it would be more practical for some devices -- DAC's, say, or preamps -- than for others -- loudspeakers, say, or power amps (since for example high output impedance power amplifiers can interact with the complex impedance of a specific loudspeaker).

As to ABX tests, they have some limitations having to do with the difficulty of detecting differences with a low probability of detection to a 95% CI in tests of practical scope. See Les Leventhal's letter here (second one on the page):

http://www.stereophile.com/content/highs-l...-testing-page-2

There are those who claim other limitations as well, such as the confusion and stress of the test, but I'm not aware of any scientific evidence to back up those assertions, so I'll put them in the "possibility" category.


"Audiophile" listening event @ Definitive Audio in Seattle

Reply #153
I'm sad that we're still pissing around with making stereo better, or moaning about the "evils" of near-as-damn-it transparent lossy encoding. It's no wonder normal, intelligent, and even quality-conscious people don't give a damn about "high end" audio any more.

Cheers,
David.


Since it's established that even high bit rate lossy encoding produces the occasional artifact, minor though they may be, I'm not sure why anyone would bother with lossy encoding at this point, except in a bandwidth-constrained medium such as radio. But I wanted to say that I agree 100% with your comment about stereo. And might add to that vinyl and other obsolete technologies. The counterargument, which someone made to me a few days ago, is that audiophiles are pretty much "stuck" with these formats because of the availability of source material.

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #154
Common sense suggests that it would be more practical for some devices -- DAC's, say, or preamps -- than for others -- loudspeakers, say, or power amps (since for example high output impedance power amplifiers can interact with the complex impedance of a specific loudspeaker).

Common sense suggests it is very practical for the topic at hand, which is not loudspeakers or power amps.

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #155
Common sense suggests it is very practical for the topic at hand, which is not loudspeakers or power amps.


But then, it's settled, insofar as lossy codecs are concerned.

OTOH, after reading some very interesting posts about preringing and apodizing filters, I'm just as flummoxed as I was by questions about the audibility of filters and whether sampling rates above 44.1 (possibly 48) kHz have any audible benefit in modern equipment. The apparently conflicting ABX tests don't help. Or the conflicting tests on the audibility threshold of jitter.

It seems like fertile ground for research, and one that doesn't require a speaker shuffler. :-) Though I'm not sure how you'd design a meaningful comparative testing regime for a magazine, if you wanted to do so, and if it was warranted, that is, if it were demonstrated that there were audible differences between players, converters, and sampling rates.


"Audiophile" listening event @ Definitive Audio in Seattle

Reply #156
But then, it's settled, insofar as lossy codecs are concerned.

...and it is settled as far as high-resolution audio is concerned as well.

Lest we forget Josh358, the topic at hand is JA's willful avoidance of proper objective testing methods.

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #157
Lest we forget Josh358, the topic at hand is JA's willful avoidance of proper objective testing methods.

That brings me down to the same boring, repeatable non-evidence found around the net.
I find myself too often discussing "theoretical" cosmetics on how to handle audio files...
I have to praise Hydrogenaudio once more for not relying on that silly esoteric sound-experience BS.
Is troll-adiposity coming from feederism?
With 24bit music you can listen to silence much louder!

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #158
Regarding your multiple follow-up questions in recent posts, I feel it would be more worthwhile, as you suggested, your attending my next New York presentation (date tba) and experiencing the demonstration in person. (I do note that you asked some of the same questions on HA two years ago, and I did offer answers at that time.) In the meantime, as someone has posted a link to a segment of the original Linn recording, you can perform your own tests of the original vs various data-reduced versions.

John Atkinson
Editor, Stereophile


Apparently a thing or two has changed in the past two years (e.g., you say you're no longer using Audition 1.0 to generate 128 kbps mp3s...but not saying what you ARE using now.)  It should be simple to answer this one at least:

what was the title of your presentation in Seattle?

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #159
As to ABX tests, they have some limitations having to do with the difficulty of detecting differences with a low probability of detection to a 95% CI in tests of practical scope. See Les Leventhal's letter here (second one on the page):

http://www.stereophile.com/content/highs-l...-testing-page-2
The stats are beyond me, but I think there were concerns raised about this article on HA at the time.

While the actual maths of the stats is beyond me, the common sense intuition of coin flipping is probably within the grasp of anyone...

Quote
Thus, with a 16-trial listening test analyzed at the conventional .05 level of significance, the probability of the investigator overlooking differences so subtle that the listener can correctly identify them only 60% of the time is a whopping .8334! Accordingly, when true differences between components are subtle, it is not surprising that 16-trial listening tests with (or without) the ABX comparator typically fail to find them.
Think about that for a second - "differences so subtle that the listener can correctly identify them only 60% of the time" - well, on average you guess right 50% of the time anyway, so in this situation the listener is doing better than chance only 10% of the time - i.e. there's only one trial in ten when they actually notice the difference!

A 1-in-10 chance, only checked 16 times - and each 9-in-10 "miss" delivers random data. Of course you're more likely than not to miss it if you only do 16 trials.
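
To make that arithmetic concrete, Leventhal's 0.8334 figure can be checked directly. A minimal sketch in Python - assuming SciPy is available; the code is mine, not anything from Leventhal's letter:

Code:
from scipy.stats import binom

n = 16          # trials in the listening test
p_subtle = 0.6  # a listener who is right 60% of the time

# Smallest score that is significant at the .05 level under pure
# guessing (p = 0.5); binom.sf(k - 1, n, 0.5) is P(X >= k).
k_crit = next(k for k in range(n + 1) if binom.sf(k - 1, n, 0.5) <= 0.05)

# Probability that the genuinely-60%-right listener still falls short
# of that criterion, i.e. the test misses the real difference.
miss = binom.cdf(k_crit - 1, n, p_subtle)
print(k_crit, round(miss, 4))  # -> 12 0.8334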

But what are we suggesting here? We're getting someone to do an ABX test. We're letting them choose the programme material, the listening room, how long they listen, and allowing them to switch between A, B and X as much as they want before committing to a decision. Despite all this, they only hear a difference 1 time out of 10. In only 16 trials, this fact is missed.

And somehow this 1 time out of 10 difference is supposed to be the same difference that, in a sighted trial, is immediately obvious to them the moment the equipment is switched on - a difference so significant that describing it takes three pages of flowery prose in Stereophile? A difference so robust that it's not impaired by the fact they're being paid to do the review, have a copy deadline to meet, and (typically) can't easily switch between this and another piece of equipment without getting up and (at least) changing some cables/connections?

I think they're taking the Michael.


I can't find it, but I'm sure there's a thread where someone found a slight difference when encoding a Madonna track to Musepack (or maybe it was mp3) - it was near as damn it perfect, but someone (I think it was Garf) got a positive ABX result by running some insane number of trials (50, or 100, or 150 - something like that). But at that kind of level, the tester's honest description of the audible difference isn't going to read like a Stereophile review - it usually reads "I thought I was just guessing, but it turned out there was something in it", or "the difference I thought I was hearing seemed to change, and I had to keep resting, but I finally got a significant result so there's something real there", or whatever.

Cheers,
David.

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #160
...I personally wish that Stereophile did more blind testing,


Given that it is arguable that they never really did any that were credible...

Quote
I won't go so far as to say "it is really not that hard if there was a will," because I'm not personally familiar with the practical constraints under which the magazine operates.


Other magazines have done it. These days people do ABX tests at home the same day they get the idea they want to do one, and for zero out-of-pocket cost.

Quote
Common sense suggests that it would be more practical for some devices -- DAC's, say, or preamps -- than for others -- loudspeakers, say, or power amps (since for example high output impedance power amplifiers can interact with the complex impedance of a specific loudspeaker).


Frankly, testing high output impedance power amps is a waste of time because their performance is so weird and dependent on their operational environment. One of the basic requirements of a reasonable test is that the item you are testing has properties that are somewhat stable and reasonably independent of their operational context. There's no reason to test something that changes dramatically every time you change its operational environment because your test results have zero generality.

Quote
As to ABX tests, they have some limitations having to do with the difficult of detecting differences with a low probability of detection to a 95% CI in tests of practical scope. See Les Leventhal's letter here (second one on the page):

http://www.stereophile.com/content/highs-l...-testing-page-2


That article is very old news. Les Leventhal was dealt with on a professional basis by The Audio Engineering Society.

The basic problem is that there are no known reliable bias-controlled listening test methodologies that produce the results that magazines like Stereophile need to justify their existence and support their credibility. Many people here understand that they are basically technically useless advertising vehicles.  They do contain some valid technical information, but it's there to create a perception of factuality that they generally lack.

Quote
There are those who claim other limitations as well, such as the confusion and stress of the test, but I'm not aware of any scientific evidence to back up those assertions, so I'll put them in the "possibility" category.


Leventhal's suppositions are no more credible than the whining about confusion and stress.  The real problem is that we have a large segment of the audio industry that is based on fallacious assertions and bad logic.

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #161
It should be simple to answer this one at least:

what was the title of your presentation in Seattle?


There was no formal title. I was introduced to the audience with the words "And now John Atkinson of Stereophile will play some of his high-resolution recordings."

John Atkinson
Editor, Stereophile

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #162
I can't find it, but I'm sure there's a thread where someone found a slight difference when encoding a Madonna track to Musepack (or maybe it was mp3) - it was near as damn it perfect, but someone (I think it was Garf) got a positive ABX result by running some insane number of trials (50, or 100, or 150 - something like that). But at that kind of level, the tester's honest description of the audible difference isn't going to read like a Stereophile review - it usually reads "I thought I was just guessing, but it turned out there was something in it", or "the difference I thought I was hearing seemed to change, and I had to keep resting, but I finally got a significant result so there's something real there", or whatever.

Cheers,
David.

What David said.  For a period, years ago, Les Leventhal's JAES articles were the cudgel of choice wielded by the Stereophile/TAS faithful against DBTs.  The fact that the differences typically asserted by the Mikey Fremers of the world seem quite apparent to them after brief audition, and therefore aren't what Leventhal is talking about, went by the wayside.

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #163
I'm afraid you've lost me! I believe I already said that I personally wish that Stereophile did more blind testing, though I won't go so far as to say "it is really not that hard if there was a will," because I'm not personally familiar with the practical constraints under which the magazine operates.


Constraints?  Well, it does sell advertising to lots of manufacturers and vendors who might not take kindly to 'no difference supported' DBT results....

But the audiophile press' antipathy to DBT seems as much philosophical -- bordering on religious -- as anything else.  I have heard Mr. Atkinson tell the tale of his Damascene conversion from DBT advocate (though I'm not sure how deep that ever ran) to one who seems to find DBT quite beside the point.  IIRC he set up a DBT between an amp* he liked and another amp that was cheaper.  The DBT didn't support an audible difference, so he went with the cheaper amp.  Some time later he found himself dissatisfied, swapped in the tube amp, and all was bliss again.  So to him, that meant DBTs aren't useful.

Now to me, the thing to do would be to re-do a DBT *then*, when presumably one is sensitized to the faulty 'sound' of the 2nd amp.  (Indeed, audiophiles are forever complaining that the DBTs they read about didn't allow enough time for the listener to 'learn' the sound of the devices under test.  One would think a clear published demonstration of this need, by Mr. Atkinson, would be a boon to their argument.)  I asked Mr. Atkinson why he didn't try that - his response, more or less, was that he didn't see the point.


(* an interesting twist here:  I seem to recall that amp #1 was a tube amp, and #2 was an SS amp -- so a priori, a positive DBT result would not be as remarkable as SS vs SS)

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #164
But then, it's settled, insofar as lossy codecs are concerned.

...and it is settled as far as high-resolution audio is concerned as well.

Lest we forget Josh358, the topic at hand is JA's willful avoidance of proper objective testing methods.


But is the audibility of high res settled? I know of several sets of ABX tests, including two published in JAES and one conducted here. The results are contradictory. So -- are the contradictions the result of experimental error? Specific equipment or algorithm limitations? Fundamental limitations on practical FIR filters, or obscure mechanisms, such as intermodulation in the loudspeakers, air, or ears?

So far, it seems to me that our knowledge belongs in the Journal of Irreproducible Results.

For me, anyway, this is more interesting than Stereophile's failure to do more objective testing, a policy with which, as I've said, I happen to disagree, or with the absence of scientific controls in an informal demonstration, something with which I have no argument, having witnessed over the years literally thousands of informal demonstrations at conventions, trade shows, and meetings, and having found some of them informative and educational.

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #165
As to ABX tests, they have some limitations having to do with the difficulty of detecting differences with a low probability of detection to a 95% CI in tests of practical scope. See Les Leventhal's letter here (second one on the page):

http://www.stereophile.com/content/highs-l...-testing-page-2
The stats are beyond me, but I think there were concerns raised about this article on HA at the time.

While the actual maths of the stats is beyond me, the common sense intuition of coin flipping is probably within the grasp of anyone...

Quote
Thus, with a 16-trial listening test analyzed at the conventional .05 level of significance, the probability of the investigator overlooking differences so subtle that the listener can correctly identify them only 60% of the time is a whopping .8334! Accordingly, when true differences between components are subtle, it is not surprising that 16-trial listening tests with (or without) the ABX comparator typically fail to find them.
Think about that for a second - "differences so subtle that the listener can correctly identify them only 60% of the time" - well, on average you guess right 50% of the time anyway, so in this situation the listener is doing better than chance only 10% of the time - i.e. there's only one trial in ten when they actually notice the difference!

A 1-in-10 chance, only checked 16 times - and each 9-in-10 "miss" delivers random data. Of course you're more likely than not to miss it if you only do 16 trials.

But what are we suggesting here? We're getting someone to do an ABX test. We're letting them choose the programme material, the listening room, how long they listen, and allowing them to switch between A, B and X as much as they want before committing to a decision. Despite all this, they only hear a difference 1 time out of 10. In only 16 trials, this fact is missed.

And somehow this 1 time out of 10 difference is supposed to be the same difference that, in a sighted trial, is immediately obvious to them the moment the equipment is switched on - a difference so significant that describing it takes three pages of flowery prose in Stereophile? A difference so robust that it's not impaired by the fact they're being paid to do the review, have a copy deadline to meet, and (typically) can't easily switch between this and another piece of equipment without getting up and (at least) changing some cables/connections?

I think they're taking the Michael.


I can't find it, but I'm sure there's a thread where someone found a slight difference when encoding a Madonna track to Musepack (or maybe it was mp3) - it was near as damn it perfect, but someone (I think it was Garf) got a positive ABX result by running some insane number of trials (50, or 100, or 150 - something like that). But at that kind of level, the tester's honest description of the audible difference isn't going to read like a Stereophile review - it usually reads "I thought I was just guessing, but it turned out there was something in it", or "the difference I thought I was hearing seemed to change, and I had to keep resting, but I finally got a significant result so there's something real there", or whatever.

Cheers,
David.


Whoa, there's a lot here. So let me begin by saying that, to a large extent, I agree. I don't know how many times I've heard people say that, when they tried a blind test, the differences they thought they heard either disappeared or became much subtler. I think we've all seen flowery reviews in which minor differences (if they're real) are presented as if they were major ones, or in which lateral or even backwards moves are presented as progress.

Also, for me, one of the benefits of ABXing is that even if practical tests do overlook some differences, it tends to separate the obvious ones from differences that, if they're real, are extremely subtle. This makes blind testing a very useful tool for manufacturers who are trying to design to a price point, and some manufacturers do use it that way.

That being said, I think you've misrepresented the technique of subjective reviewers, who typically listen to a component for a relatively lengthy period, to become familiar with its idiosyncrasies.

In theory, it's possible to conduct an ABX test of any length, but in practice there are constraints. So if lengthy listening does in fact have benefits, they will tend to be lost in an ABXing regime.

Another argument against ABX testing is that it's better suited to basic psychometric evaluations with test signals than it is to music, which is a complex signal and puts great demands on short-term memory. This would also be one of the arguments for lengthy testing, since long-term memory has a greater capacity than short-term memory.

OK, so I've presented the arguments. The problem is, I don't know how to demonstrate, objectively, that they're correct or not, without recourse to the statistics. And the statistics can only tell us so much. Statistics can quantify the practical difficulty of detecting subtle differences in ABX tests, but it can't demonstrate with certainty that such differences exist, or how common they are.

I can, however, make a recording that will fool almost any ABX test of practical length. I need only record a highly intermittent but obvious audio flaw. The computer audio I'm listening to now has at least three such flaws. One is apparently caused by a problem on the motherboard audio chipset. Every few days, it emits some fairly loud chirps. Obvious to anybody, but unlikely to be ABX'd. Another is a buzzing, only on some notes, pretty much only on piano. Probably caused by flaking of the protective coating on the neodymium bar magnets in the speakers. Another would be highly frequency-dependent planar resonances, again, obviously audible only on specific cuts.

They're all examples of low probability events, which per Leventhal's analysis would require an inordinate number of trials for a 95% CI.
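
For a sense of scale - a sketch with assumed numbers, assuming SciPy, not anything from Leventhal's analysis itself - here is the smallest number of trials a one-sided binomial test at the .05 level needs before it has 80% power at a given hit rate:

Code:
from scipy.stats import binom

def trials_needed(p_true, alpha=0.05, power=0.80, n_max=2000):
    # Smallest n such that the significance criterion under guessing
    # (p = 0.5) is reached with probability >= power when the listener
    # is actually right with probability p_true.
    for n in range(2, n_max):
        k_crit = next(k for k in range(n + 1)
                      if binom.sf(k - 1, n, 0.5) <= alpha)
        if binom.sf(k_crit - 1, n, p_true) >= power:
            return n
    return None

for p in (0.9, 0.7, 0.6, 0.55):
    print(p, trials_needed(p))
# An obvious difference (p = 0.9) needs fewer than ten trials;
# a subtle one (p = 0.55) needs several hundred.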

What I can't do is extend this to the other sort of subtlety that subjective listeners say they hear, such as differences in soundstaging, bass, grain, etc. It seems obvious that some of these differences, if they were nearly as pronounced as the listeners claim, would show up right away on an ABX comparison. But I don't know how to test whether subtler differences might or might not. The problem rapidly becomes circular, because even if you introduce a known degree of distortion, you have no objective way of knowing whether it's audible without recourse to the very ABX test you're testing.

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #166
In theory, it's possible to conduct an ABX test of any length, but in practice there are constraints. So if lengthy listening does in fact have benefits, they will tend to be lost in an ABXing regime.
Nonsense.

Another argument against ABX testing is that it's better suited to basic psychometric evaluations with test signals than it is to music, which is a complex signal and puts great demands on short-term memory. This would also be one of the arguments for lengthy testing, since long-term memory has a greater capacity than short-term memory.
Again, nonsense.

it can't demonstrate with certainty that such differences exist, or how common they are.
"Flying spaghetti monster" argument duly noted.

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #167
Another argument against ABX testing is that it's better suited to basic psychometric evaluations with test signals than it is to music, which is a complex signal and puts great demands on short-term memory. This would also be one of the arguments for lengthy testing, since long-term memory has a greater capacity than short-term memory.
That's one of Bob Stuart's arguments. I don't think it stacks up...

Quote
I can, however, make a recording that will fool almost any ABX test of practical length. I need only record a highly intermittent but obvious audio flaw. The computer audio I'm listening to now has at least three such flaws. One is apparently caused by a problem on the motherboard audio chipset. Every few days, it emits some fairly loud chirps. Obvious to anybody, but unlikely to be ABX'd. Another is a buzzing, only on some notes, pretty much only on piano. Probably caused by flaking of the protective coating on the neodymium bar magnets in the speakers. Another would be highly frequency-dependent planar resonances, again, obviously audible only on specific cuts.


In both critiques, it seems apparent to me that the "good" way of doing something like existing subjective testing, but with double-blind statistical certainty, is firstly to do double blind A/B testing where there's no limit on time or source material. Listen to A for a month if you want - play all the music you own (OK, that would take a few years for some of us, but you get the point). Then have a go at B. Then try direct comparisons if you wish.

Then you know what you think the differences are, you know their nature, you know what kind of content reveals them etc. Now you can go for a full ABX to prove that it's real.

Quote
They're all examples of low probability events, which per Leventhal's analysis would require an inordinate number of trials for a 95% CI.
They're not though. If a slightly broken speaker cone reveals problems with solo piano music, you run 16 ABX trials with piano music - not one each with each random CD you own. If there's a highly intermittent fault (even a subtle one) you pick X=A or X=B when you hear it - you don't pick anything until you do.
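
A rough simulation of that point, with assumed numbers (the listener truly hears the fault on 1 listen in 10 and otherwise guesses):

Code:
import numpy as np

rng = np.random.default_rng(0)
runs, n_trials, d = 10_000, 16, 0.10

# Naive protocol: answer all 16 trials, guessing when nothing is heard.
# Overall hit rate is d + (1 - d) / 2 = 0.55; passing needs >= 12 correct.
heard = rng.random((runs, n_trials)) < d
correct = heard | (rng.random((runs, n_trials)) < 0.5)
print((correct.sum(axis=1) >= 12).mean())  # rarely passes

# Patient protocol: log a trial only when the fault is actually heard,
# so every logged answer is correct and 16 logged trials always pass.
# The cost is waiting time: about n_trials / d = 160 listens on average.
print(n_trials / d)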

The fact that magazines rarely if ever do double-blind A/B testing, never mind the X part, speaks volumes IMO.

The fact that you could actually do a full standard sighted test, and then ABX whatever you found to be most revealing - AND THEN PEOPLE USUALLY FAIL - is also quite strange. Or not.

There's no great excuse against blind A/B - certainly not where all the testing happens 9-5 in the magazine's office. Obviously at home there are other issues, but it's not insurmountable.

I don't really think you should publish reviews if you can't manage to do them properly. But then, it's a free market. There are people who want to pay to read flawed test reports.

Cheers,
David.

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #168
Given that it is arguable that they never really did any that were credible...

That I can't comment on, not having been privy to the debate.

Quote
Other magazines have done [blind testing]. These days people do ABX tests at home the same day they get the idea they want to do one, and for zero out-of-pocket cost.

JA is really far more qualified to discuss this than I. But I think it has to be remembered that what is zero out-of-pocket for us isn't for a commercial endeavor, and that Stereophile's primary focus is on judging equipment in highly optimized settings. Frankly, it's not a strategy with which I entirely agree, because I believe that choosing equipment with (alleged) sonic characteristics that counterbalance the sonic characteristics of other equipment leads to impractical complications. But I have seen the consequences of tests in which I believe insufficient attention was paid to interface and setup, as for example Harman's test of a Martin Logan hybrid in mono, in acoustics and at a distance that may not have been appropriate for that speaker.

But while, as I've said, I do wish Stereophile would ABX some components, in particular the more controversial ones such as esoteric cables and power cords, and I'd like to see blind testing done whenever that is practical to avoid confirmation bias, I can only guess at the practicalities involved and the effect they would have on Stereophile's work flow and mission.

Quote
Frankly, testing high output impedance power amps is a waste of time because their performance is so weird and dependent on their operational environment. One of the basic requirements of a reasonable test is that the item you are testing has properties that are somewhat stable and reasonably independent of their operational context. There's no reason to test something that changes dramatically every time you change its operational environment because your test results have zero generality.

I think you're treading on dangerous ground here! Not because I don't personally think you're right, or subscribe to that design philosophy, but because I don't have any objective evidence that high output impedance power amplifiers don't have special sonic qualities that make them desirable, as some audiophiles report. Just my sense that the whole business is a crock which is based on intentionally introduced coloration, and ABX tests which find no audible difference in the linear range when frequency response aberrations are equalized out. Unfortunately, that's not proof.

Quote
That article is very old news. Les Leventhal was dealt with on a professional basis by The Audio Engineering Society.

How so? That's news to me.
Quote
The basic problem is that there are no known reliable bias-controlled listening test methodologies that produce the results that magazines like Stereophile need to justify their existence and support their credibility. Many people here understand that they are basically technically useless advertising vehicles.  They do contain some valid technical information, but it's there to create a perception of factuality that they generally lack.

This I think involves speculation on your part regarding the motives of Stereophile's staff. I try in general not to impute motive, because for the most part such imputations aren't falsifiable, which makes them a prime tool of confirmation bias. As in "She only ran in front of that truck to save the baby because she wanted to be on the evening news." Not, alas, much of an exaggeration of the way people use this tool to conclude whatever they want.

I do disagree with what I regard as excessive subjectivity on the part of the audiophile press, and a refusal to consider at least some controls. As Stereophile's founder said, "As far as the real world is concerned, high-end audio lost its credibility during the 1980s, when it flatly refused to submit to the kind of basic honesty controls (double-blind testing, for example) that had legitimized every other serious scientific endeavor since Pascal. [This refusal] is a source of endless derisive amusement among rational people and of perpetual embarrassment for me, because I am associated by so many people with the mess my disciples made of spreading my gospel." The irony here is that as often as not, ABX testing does confirm the existence of subjective differences, as why wouldn't it, given the current state of the art. Absent the questionable, there would still be lots to compare, including both differences that do show up in ABX tests and the gray area of subtle differences that practical blind testing may not reveal.

BTW, did you see Jim Austin's "As We See It" in the March issue of Stereophile? He concludes, "Yet a science-based activity without scientific constraints, in which the only distinction among tweaks that appear to be nothing more than snake-oil, well-designed amplifiers, and speakers with good dispersion characteristics are the vicissitudes of personal aural experience, makes me uncomfortable. I find myself craving some certainty, if only to put a little more space between the creations of a skilled audio designer and, say, a jar of pretty rocks."

Not necessarily John Atkinson's personal views, as he made clear in another forum, but he did see fit to publish the essay, not, I think, something he would have done had his intention been merely to defend Stereophile's methodology from all comers.

Quote
Leventhal's suppositions are no more credible than the whining about confusion and stress.  The real problem is that we have a large segment of the audio industry that is based on fallacious assertions and bad logic.

Here I would have to ask for supporting evidence. I have experienced fatigue and stress in forced-choice tests, and I know others have reported the same experience, including IIRC some testers on HA. What I can't do is objectify my personal experience, in terms of whether or not it had an effect on my ability to reliably detect subtle differences. It would be possible to design an experiment to test that hypothesis, but, I think, difficult to conduct one owing to the large number of trials required for a statistically meaningful result.

Frankly, I wish it were otherwise, because as you pointed out there's a lot of snake oil out there. But as I said, I can demonstrate that there are easily audible distortions that won't show up on an ABX test of practical length. I have no reason to believe that there aren't subtler forms of distortion that have such characteristics. Unfortunately, most of the candidates I can think of are loudspeaker-related, e.g., the frequency-dependent harmonic distortion that afflicts some ribbons, and so aren't suitable, since different loudspeakers can always be ABXed. One could intentionally introduce intermittent distortion to demonstrate the point, but it would either be trivially obvious or represent artificial conditions.

At the end of the day, I'm left with the impression that sighted testing over-reports (and sometimes under-reports) differences and introduces bias, while ABX testing can potentially under-report them and can't prove that they don't exist. Not very satisfying, I'm afraid, since it leaves a gray region about which nothing rigorous can be said.

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #169
I have experienced fatigue and stress in forced-choice tests, and I know others have reported the same experience, including IIRC some testers on HA.

Yes, it is quite difficult to look for differences where seemingly none exist.  Keep this in mind next time you read something from a placebophile talking about how music instantly comes to life and the other horseshit you will read in Stereophile and on forums which do not require objective methods when discussing sound quality.

You've read this insightless piece of garbage, haven't you, Josh358:
http://www.stereophile.com/features/308mp3cd
???

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #170
I'm afraid you've lost me! I believe I already said that I personally wish that Stereophile did more blind testing, though I won't go so far as to say "it is really not that hard if there was a will," because I'm not personally familiar with the practical constraints under which the magazine operates.


Constraints?  Well, it does sell advertising to lots of manufacturers and vendors who might not take kindly to 'no difference supported' DBT results....

But the audiophile press' antipathy to DBT seems as much philosophical -- bordering on religious -- as anything else.  I have heard Mr. Atkinson tell the tale of his Damascene conversion from DBT advocate (though I'm not sure how deep that ever ran) to one who seems to find DBT quite beside the point.  IIRC he set up a DBT between an amp* he liked and another amp that was cheaper.  The DBT didn't support an audible difference, so he went with the cheaper amp.  Some time later he found himself dissatisfied, swapped in the tube amp, and all was bliss again.  So to him, that meant DBTs aren't useful.

Now to me, the thing to do would be to re-do a DBT *then*, when presumably one is sensitized to the faulty 'sound' of the 2nd amp.  (Indeed, audiophiles are forever complaining that the DBTs they read about didn't allow enough time for the listener to 'learn' the sound of the devices under test.  One would think a clear published demonstration of this need, by Mr. Atkinson, would be a boon to their argument.)  I asked Mr. Atkinson why he didn't try that - his response, more or less, was that he didn't see the point.

(* an interesting twist here:  I seem to recall that amp #1 was a tube amp, and #2 was an SS amp -- so a priori, a positive DBT result would not be as remarkable as SS vs SS)


Yes, it sounds like a second DBT would have been useful and interesting. Though I can't really comment on why JA decided that another test wasn't worthwhile, whether he thought the result would be the same, whether he thought it was unnecessary from a personal perspective, what have you. He'd have to answer that.

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #171
In theory, it's possible to conduct an ABX test of any length, but in practice there are constraints. So if lengthy listening does in fact have benefits, they will tend to be lost in an ABXing regime.
Nonsense.

Another argument against ABX testing is that it's better suited to basic psychometric evaluations with test signals than it is to music, which is a complex signal and puts great demands on short-term memory. This would also be one of the arguments for lengthy testing, since long-term memory has a greater capacity than short-term memory.
Again, nonsense.

it can't demonstrate with certainty that such differences exist, or how common they are.
"Flying spaghetti monster" argument duly noted.


If you care to make a substantive argument, grounded in fact or logic, I'll be delighted to respond. I'm afraid that the word "nonsense" does not qualify.

Your reference to flying spaghetti monsterism, that is, the claim that assertions can be made on the basis of faith alone, is in error: rather obviously, I was referring to two limitations of statistical analysis, rather than asserting that because such differences cannot be proven or enumerated by statistics they can be said to exist or not exist. That would indeed be flying spaghetti monsterism -- whichever conclusion one reached.

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #172
I have experienced fatigue and stress in forced-choice tests, and I know others have reported the same experience, including IIRC some testers on HA.

Yes, it is quite difficult to look for differences where seemingly none exist.  Keep this in mind next time you read something from a placebophile talking about how music instantly comes to life and the other horseshit you will read in Stereophile and on forums which do not require objective methods when discussing sound quality.

You've read this insightless piece of garbage, haven't you, Josh358:
http://www.stereophile.com/features/308mp3cd
???


Believe me, I take reports of such differences with a grain of salt. Those whose opinions I do consider plausible (though not proven to my satisfaction) are those who point out that differences of the sort we're discussing are subtle. But I can't say that listener fatigue is a consequence of listening for non-existent differences. There just is no evidence for or against that. Personally I find it difficult to concentrate on the many aspects of sound reproduction simultaneously while worrying about a floating reference. You may not listen that way. I do. Whether it's valid or not, I don't know, but I've found ABX tests on musical material fatiguing, even when the results were positive.

I think JA made the mistake of letting the measurements speak here, rather than referencing tests which show that psychoacoustic phenomena mask most compression artifacts on musical material. OTOH, I agree with his argument about not using lossy codecs in high fidelity reproduction. After all, DBTs tell us that they can be audible. I'm not bothered by the artifacts in high bit rate MP3s, but that doesn't mean that others aren't. As long as they're hearing real artifacts and not imagined ones, I don't have any argument with their preference (or the preference of those who think they're just fine).

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #173
If you care to make a substantive argument, grounded in fact or logic, I'll be delighted to respond. I'm afraid that the word "nonsense" does not qualify.

I expect substantiation from you since you are the one putting up the theory.  The burden does not fall on me to disprove it.  Perhaps you have some verifiable psychological studies to present?

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #174
It should be simple to answer this one at least:

what was the title of your presentation in Seattle?


There was no formal title. I was introduced to the audience with the words "And now John Atkinson of Stereophile will play some of his high-resolution recordings."

John Atkinson
Editor, Stereophile


Even if that were the case (can you supply some objective proof?), the closest thing to a title would be the way your presentation was promoted in your fine publication, as stated in the lead post in this thread... and that is clearly not neutral in terms of leading expectation bias... is it?