Overcoming the Perception Problem
Nick.C
post Oct 12 2012, 21:51
Post #51


lossyWAV Developer


Group: Developer
Posts: 1814
Joined: 11-April 07
From: Wherever here is
Member No.: 42400



QUOTE (googlebot @ Oct 12 2012, 21:44) *
The belief that sound was coming from an impressively crafted sound system was able to significantly alter the subject's perception.
Is that not pretty much a summary of the placebo effect?


--------------------
lossyWAV -q X -a 4 --feedback 4| FLAC -8 ~= 320kbps
googlebot
post Oct 12 2012, 21:58
Post #52





Group: Members
Posts: 698
Joined: 6-March 10
Member No.: 78779



QUOTE (Nick.C @ Oct 12 2012, 22:51) *
Is that not pretty much a summary of the placebo effect?


Yes. Does this change anything? Placebos have been shown to have significant causal effects.

dhromed
post Oct 12 2012, 22:12
Post #53





Group: Members
Posts: 1339
Joined: 16-February 08
From: NL
Member No.: 51347



But is there a problem?
hlloyge
post Oct 12 2012, 22:14
Post #54





Group: Members
Posts: 701
Joined: 10-January 06
From: Zagreb
Member No.: 27018



QUOTE (Porcus @ Oct 12 2012, 16:04) *
Not necessarily. Suppose I want to establish a “sufficiently good” (for whatever purpose) end-user format. Then I am not satisfied with your score on your music, unless I am only targeting you as a customer. Even if your music does not have nasty enough artifacts for you to detect (or find annoying), it might be different with other ears and other signals. (Of course, you then need to use the appropriate method (test / design of experiment) to check whether the accuracy is better than random, but that is a practical obstacle.)
If 5 percent of the listeners hear differences on 10 percent of their music collection, is the format then “transparent”? I think not. It may be good enough for the purpose, by all means, but that does not mean that there are no audible differences.


You misunderstood me.
I am conducting an ABX test for MYSELF. I am not conducting an ABX test to gain statistical knowledge of whether people in general can hear a difference.
I understand you have to allow for some statistical chance of error when running a multiple-user test, but I am talking about a single person doing the test for his (or her) own advantage and knowledge.
That margin, if testing codecs for personal knowledge, is irrelevant, IMO.
Nick.C
post Oct 12 2012, 22:21
Post #55


lossyWAV Developer


Group: Developer
Posts: 1814
Joined: 11-April 07
From: Wherever here is
Member No.: 42400



QUOTE (googlebot @ Oct 12 2012, 21:58) *
Yes. Does this change anything? Placebos have been shown to have significant causal effects.
Yes, but in the case of sighted vs ABX tests, placebo causes the sighted test to be biased in favour of the source for which the test subject has a preconceived preference.


--------------------
lossyWAV -q X -a 4 --feedback 4| FLAC -8 ~= 320kbps
googlebot
post Oct 12 2012, 23:43
Post #56





Group: Members
Posts: 698
Joined: 6-March 10
Member No.: 78779



I do not see how calling the phenomenon "preconceived preference" changes anything. It is a variable one tries to eliminate in many tests, but why here? The subject, in their usual environment, produces different results than the same subject in a modified ("bias eliminating") environment. If the subject wants to compare a Burmester vs. a Teac vs. a Sansa Clip for future use in their usual environment, as the person he/she is, a sighted test might be more appropriate than a DBT for identifying the product with the best perceived (by this subject) performance.
AndyH-ha
post Oct 13 2012, 03:32
Post #57





Group: Members
Posts: 2224
Joined: 31-August 05
Member No.: 24222



The sighted test difference is not coming from the equipment; its origin is within the test subject. The results do not reveal anything about the equipment. Doing sighted tests just reinforces the individual's bias.

If the purpose of the test is to make the subject feel good about his preferences, then maybe the sighted test is useful: the person has just spent significant money. Buyer's remorse is starting to hit hard. Ahh, relief! The sighted test says he made the right choices after all.
Nick.C
post Oct 13 2012, 09:43
Post #58


lossyWAV Developer


Group: Developer
Posts: 1814
Joined: 11-April 07
From: Wherever here is
Member No.: 42400



@googlebot: You are now allowing the results to be heavily skewed in favour of the equipment that the test subject "wants to be the best" (for whatever reason - cost, etc.).

In your example it is no longer about whether the test subject can hear any differences in an objective test, but rather whether the test subject states a preference for the output of their preferred equipment in a blatantly subjective test.


--------------------
lossyWAV -q X -a 4 --feedback 4| FLAC -8 ~= 320kbps
greynol
post Oct 13 2012, 14:16
Post #59





Group: Super Moderator
Posts: 10339
Joined: 1-April 04
From: San Francisco
Member No.: 13167



QUOTE (AndyH-ha @ Oct 12 2012, 19:32) *
Buyer's remorse is starting to hit hard. Ahh, relief! The sighted test says he made the right choices after all.

...unless buyer's remorse has sunk in and the person is now biased against the new purchase.


--------------------
Your eyes cannot hear.
greynol
post Oct 13 2012, 14:25
Post #60





Group: Super Moderator
Posts: 10339
Joined: 1-April 04
From: San Francisco
Member No.: 13167



QUOTE (hlloyge @ Oct 12 2012, 14:14) *
I am conducting an ABX test for MYSELF.

Even so, a failed test only fails to demonstrate that the individual could distinguish a difference on that occasion. Training and/or rest, for example, may affect the outcome of a future test.

This post has been edited by greynol: Oct 13 2012, 15:10


--------------------
Your eyes cannot hear.
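To put a number on how easily a single failed test can mislead, here is a minimal sketch of the test's statistical power. The 16-trial length, the 0.05 alpha and the per-trial hit rates are all assumptions for illustration, not figures from this thread.

CODE
# Minimal sketch: power of a 16-trial ABX test at alpha = 0.05.
# The per-trial hit rates below are hypothetical.
from scipy.stats import binom

n, alpha = 16, 0.05

# Smallest score whose probability under pure guessing is below alpha;
# binom.sf(k - 1, n, 0.5) is P(at least k correct by chance).
threshold = min(k for k in range(n + 1) if binom.sf(k - 1, n, 0.5) < alpha)

for true_rate in (0.5, 0.6, 0.7, 0.8):
    power = binom.sf(threshold - 1, n, true_rate)
    print(f"true hit rate {true_rate:.1f}: passes with probability {power:.2f}")

Under these assumptions a listener who genuinely hears the difference 70% of the time still fails such a test roughly half the time, which is exactly why a single failure proves so little.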
krabapple
post Oct 13 2012, 22:05
Post #61





Group: Members
Posts: 2518
Joined: 18-December 03
Member No.: 10538



QUOTE (googlebot @ Oct 12 2012, 16:44) *
QUOTE (krabapple @ Oct 12 2012, 07:59) *
Except, DBT doesn't do that.


Why do sighted tests regularly lead to different results, then? Just calling it "bias that should be eliminated" doesn't change the fact.


You wrote: Double-blind testing of isolated senses...

DBT does not necessarily 'isolate' any senses. You can see, hear, taste, touch, smell. All that has changed is what you *know*.


QUOTE
Imagine the following test setup: a test subject is presented music supposedly sourced from either a Sansa Clip or his favorite Burmester rack. You present an expensive-looking switch to him that is basically a dummy: it only inserts a small pause and stays connected to the Clip at all times. Now imagine you get a statistically significant result that the subject rates the sound quality consistently higher when he believes it to be coming from his Burmester rack / not coming from the Sansa Clip.


Now do a second test, this time double blind with both sources actually connected. Imagine the subject now fails to identify a difference.


What can we draw from this, especially when the subject was an honest type, sincerely motivated to rate the quality exactly as he perceived it in the first setup, without trying to prove or defy anything?



The first time he failed to identify that there was in fact no difference, and we can reasonably attribute that to sighted bias. The second time he may well have successfully identified that there was no difference, or may have failed to identify a real, but small, difference.

QUOTE
First, per HA habit, the subject should stop claiming that his Burmester setup sounds better than a Sansa Clip, as shown by the DBT. HA usually stops here.

But maybe one shouldn't. The belief that sound was coming from an impressively crafted sound system was able to significantly alter the subject's perception. In addition, the subject's usual mode of listening is reflected much better in the first setup than in the second (DBT).


This is no different from putting the same cheap wine in differently-priced bottles. Subjects often think the pricier wine tastes better. So, what does that tell us about the *wine*? What do your listener's *beliefs* about a piece of gear tell us about the *gear*? What claims can reasonably be made about the relative performance of A and B?



hlloyge
post Oct 14 2012, 13:35
Post #62





Group: Members
Posts: 701
Joined: 10-January 06
From: Zagreb
Member No.: 27018



QUOTE (greynol @ Oct 13 2012, 15:25) *
QUOTE (hlloyge @ Oct 12 2012, 14:14) *
I am conducting an ABX test for MYSELF.

Even so, a failed test only fails to demonstrate that the individual could distinguish a difference on that occasion. Training and/or rest, for example, may affect the outcome of a future test.


Yes, but I am conducting the test at that one point in time. And the results are valid for that test.
Of course, you should do what, 16 full trials? But they don't have to be on the same day. Or in the same week.
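For reference, the arithmetic behind a figure like 16 trials is straightforward: under the null hypothesis each answer is a fair coin flip, so a perfect run of n trials happens by luck with probability 0.5^n. A minimal sketch (the trial counts are just examples):

CODE
# Minimal sketch: p-value of a perfect score under pure guessing.
# Each trial is a fair coin flip under the null, so p = 0.5 ** n.
for n in (8, 10, 12, 16):
    print(f"{n} trials, all correct: p = {0.5 ** n:.6f}")
# 16/16 gives p of about 0.000015; even an imperfect 12/16 stays
# below the usual 0.05 cutoff (see the p-vs-alpha post further down).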
Porcus
post Oct 14 2012, 23:31
Post #63





Group: Members
Posts: 1995
Joined: 30-November 06
Member No.: 38207



QUOTE (hlloyge @ Oct 14 2012, 14:35) *
Yes, but I am conducting the test at that one point in time.


Strange to read your initial postings in this thread now, after you have tried to downplay the test's applicability to a one-time personal experience with clear-cut, full sensitivity/specificity.


--------------------
One day in the Year of the Fox came a time remembered well
2Bdecided
post Oct 15 2012, 12:20
Post #64


ReplayGain developer


Group: Developer
Posts: 5364
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



I think Googlebot is making a valid philosophical point. If the shiny thing sounds best to you (because of placebo), and you want the thing that sounds best to you, you can be very grateful to the shiny thing and placebo for delivering it.

Though you'd better not probe too deeply - because, while the only guaranteed way to completely remove placebo is to take away the knowledge of what you're listening to, you can certainly reduce placebo (or change the direction in which it operates) by introducing doubt as to whether something really does sound better.

This latter effect is probably the cause of the audiophile's never-ending upgrade path.


The downsides to this are many, e.g.:
1) your entire investment can be rendered worthless to you by anything that causes placebo to break down - that's a pretty risky investment.
2) if you had blind tested before purchase, you would probably have chosen the cheapest well-made thing that sounded as good as everything else - saving you money, and giving you your own unshakable placebo effect in enjoying that equipment. You see, ABX-lovers can enjoy their own placebo experience after having chosen the equipment: they've proven scientifically that it's as good as it needs to be, and then placebo can add to the subjective perception that it's as good as they could possibly perceive it to be.
3) some nice-looking equipment sounds objectively awful, and doesn't work very well. While you might be able to convince yourself that it sounds wonderful, you'll still have the pain of unreliability, difficult operation, and the anxiety of damaging or wearing your music collection away every time you play an LP (if sighted testing led you to choose vinyl over CD).

However, sighted-listening equipment purchasing is great for the economy (you just keep spending money), and it avoids time-consuming things (e.g. proper listening tests) and difficult questions such as "how well can you hear anyway?"

Cheers,
David.

This post has been edited by 2Bdecided: Oct 15 2012, 12:22
pisymbol
post Oct 15 2012, 13:48
Post #65





Group: Members
Posts: 43
Joined: 22-March 09
Member No.: 68274



QUOTE (2Bdecided @ Oct 15 2012, 07:20) *
I think Googlebot is making a valid philosophical point. If the shiny thing sounds best to you (because of placebo), and you want the thing that sounds best to you, you can be very grateful to the shiny thing and placebo for delivering it. [...]


I believe, as a friend pointed out, you mean "expectation bias", not "placebo".

Your environment certainly plays a big role when testing gear.

I mean look, the DBT is CLEARLY pointless in Evan's Sound Room (the "couple" test is a better metric):

https://www.youtube.com/watch?v=ovr1TvQSQII

(safe for work)

This post has been edited by pisymbol: Oct 15 2012, 13:50
mzil
post Oct 15 2012, 17:14
Post #66





Group: Members
Posts: 735
Joined: 5-August 07
Member No.: 45913



QUOTE (krabapple @ Oct 11 2012, 13:24) *
QUOTE (skamp @ Oct 11 2012, 09:28) *
If ABXing negatively alters one's ability to hear differences, it's only a problem if you're using negative results to prove that there is no difference, which is a fallacy in any case: while a positive ABX result shows with a high degree of probability that there IS an audible difference, a negative result never proves anything.


Basically, what a 'negative' ABX result means is that the hypothesis 'there is an audible difference' was not supported, with a 'p' chance (typically 1 in 20) that an audible difference nevertheless exists.


This doesn't seem correct to me. What skamp said is, I think, accurate. One can apply the statistical analysis you mention to a test where the test subject, the listener, showed a strong ability to differentiate between the two sources; however, one can't make the same statement if he or she produced only *random* results.

Besides there not being any actual audible differences between the two sources to mortal ears, other possibilities for such random results might include:

A. The listener wasn't trying very hard or was sleepy/fatigued/ill etc.
B. The listener was mischievous and *intentionally* gave random results.
C. The test conditions, such as the resolution/accuracy of the loudspeakers used, weren't up to the task that day, etc.

What's important to note is that these three possibilities, A, B, and C, are *precluded* when the listener successfully *does* differentiate between the two DUTs. That's why one can correctly apply the statistics to such an outcome only. Sure, there's a one-in-twenty chance the listener's results were just dumb luck; however, there's a 95% chance it was because they truly could hear a difference.

This post has been edited by mzil: Oct 15 2012, 17:30
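What the "one in twenty" buys you is easy to check by simulation. A minimal sketch follows; the 16-trial length, the 12-correct pass mark and the listener count are assumptions for illustration. Listeners who answer purely at random still pass an individual test at roughly the alpha rate.

CODE
# Minimal sketch: how often pure guessers "pass" a 16-trial ABX test.
# A pass mark of 12/16 keeps the chance level just below alpha = 0.05.
import random

random.seed(1)
n_listeners, n_trials, pass_mark = 100_000, 16, 12

passes = sum(
    sum(random.random() < 0.5 for _ in range(n_trials)) >= pass_mark
    for _ in range(n_listeners)
)
print(f"{passes / n_listeners:.3%} of pure guessers passed")  # about 3.8%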
krabapple
post Oct 15 2012, 20:51
Post #67





Group: Members
Posts: 2518
Joined: 18-December 03
Member No.: 10538



You're right that different terms apply when we are talking about rejecting vs accepting the *null* hypothesis (in this case, the 'no difference' hypothesis). Rejecting the null H when it is true is a Type I error; accepting the null H when it is false is a Type II.
When we get results, we do statistics to calculate the probability that those results would have been obtained 'by chance'. This is the p value. We compare the p value to a pre-determined, more or less arbitrary (though traditions exist) maximum p value, usually 1 in 20 (0.05), the alpha value. So if our p < alpha, we reject the null H ('null H not supported'), otherwise not.


Alpha and p are really values for the probability of making a Type I error -- alpha is the pre-set threshold for the 'acceptable' chance of a Type I error, p is the calculated value for the obtained results. If we get p < alpha, then we say the chance that we made a Type I error, while by no means eliminated, is within our comfort zone.


It's true that my original post was really talking about a Type II error. But in either case we use statistics to call our results 'random' or not, so I don't see how you can say that statistics only work for 'positive' ABX results. Or maybe I'm just not understanding what you are getting at. I didn't disagree with what skamp wrote... at least, not intentionally!

This post has been edited by krabapple: Oct 15 2012, 21:07
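In code, the p-vs-alpha comparison described above is one line of binomial arithmetic. A minimal sketch, where the 13/16 score is a hypothetical example and not a result from this thread:

CODE
# Minimal sketch: p vs alpha for one ABX session, scored as
# correct answers out of total trials. 13/16 is a made-up score.
from scipy.stats import binomtest

correct, trials, alpha = 13, 16, 0.05

# One-sided test against guessing (p = 0.5): the chance of scoring
# at least this well by luck alone.
result = binomtest(correct, trials, p=0.5, alternative='greater')
print(f"p = {result.pvalue:.4f}")  # about 0.0106
print("null rejected" if result.pvalue < alpha else "null not rejected")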
skamp
post Oct 15 2012, 22:49
Post #68





Group: Developer
Posts: 1454
Joined: 4-May 04
From: France
Member No.: 13875



QUOTE (mzil @ Oct 15 2012, 18:14) *
B. The listener was mischievous and *intentionally* gave random results.


QUOTE (krabapple @ Oct 15 2012, 21:51) *
I don't see how you can say that statistics only work for 'positive' ABX results. Or maybe I'm just not understanding what you are getting at. I didn't disagree with what skamp wrote...at least, not intentionally!


What good are statistics if the listener acted in bad faith? If he decided to answer randomly, the result is meaningless, whereas he could hardly act in bad faith in the other direction.


--------------------
See my profile for measurements, tools and recommendations.
greynol
post Oct 15 2012, 22:57
Post #69





Group: Super Moderator
Posts: 10339
Joined: 1-April 04
From: San Francisco
Member No.: 13167



There are ways of cheating to get positive ABX results such as altering the log or using some sort of workaround that un-blinds the test subjects.


--------------------
Your eyes cannot hear.
krabapple
post Oct 16 2012, 03:58
Post #70





Group: Members
Posts: 2518
Joined: 18-December 03
Member No.: 10538



QUOTE (skamp @ Oct 15 2012, 17:49) *
What good are statistics if the listener acted in bad faith? If he decided to answer randomly, the result is meaningless, whereas he could hardly act in bad faith in the other direction.


Sure he could, if he's determined and the test isn't very carefully proctored. But once you assume 'bad faith', the results either way are useless. All reported ABX results on HA could be considered invalid if you assume that cheating was involved.

(Btw, if the stats show that answers are *more* wrong than chance would allow, that can also be a useful thing to know.)

I've lost track, but why exactly are we going down this 'what if they're answering randomly on purpose' road? The supposed 'perception problem' is not one of bad faith.



mzil
post Oct 16 2012, 04:32
Post #71





Group: Members
Posts: 735
Joined: 5-August 07
Member No.: 45913



[Trying to bring this back on topic]

There is not a big distinction between consciously giving random answers (acting unethically / in bad faith) and simply not trying very hard because one thinks, perhaps at least subconsciously, that A and B *should* sound alike, and so doesn't bring one's "A game" and simply "phones it in". That's another form of expectation bias, and we don't have a good way to preclude it. This is why applying statistical analysis to such results seems unsettling to me. You never know for sure why the results are random.

Here's an example, for all: if asked to participate in a DBT of "the bass response of aftermarket power cords", all of adequate gauge thickness to conduct the current required by the CD player, how many of you would bow out on the grounds that you wouldn't be a good test subject because you find the premise laughable and would therefore be biased? If you were to participate, do you really, honestly think you'd be giving it your best possible effort and that there's no way your bias could be influencing your selections, at least at a subconscious level?

This post has been edited by mzil: Oct 16 2012, 04:34
Woodinville
post Oct 16 2012, 04:33
Post #72





Group: Members
Posts: 1414
Joined: 9-January 05
From: In the kitchen
Member No.: 18957



QUOTE (mzil @ Oct 15 2012, 20:32) *
[Trying to bring this back on topic]

There is not a big distinction between consciously giving random answers (acting unethically / in bad faith) and simply not trying very hard because one thinks, perhaps at least subconsciously, that A and B *should* sound alike, and so doesn't bring one's "A game" and simply "phones it in". That's another form of expectation bias, and we don't have a good way to preclude it. This is why applying statistical analysis to such results seems unsettling to me. You never know for sure why the results are random.


Baloney, that's what positive controls are for. You did build both negative and positive controls into your test, right?


--------------------
-----
J. D. (jj) Johnston
mzil
post Oct 16 2012, 04:39
Post #73





Group: Members
Posts: 735
Joined: 5-August 07
Member No.: 45913



Please enlighten me. I am not a scientist, nor have I had any training or study in this area. What positive and negative controls would one use to avoid this particular problem of bias I just mentioned? Please be specific, thanks.

This post has been edited by mzil: Oct 16 2012, 04:46
Porcus
post Oct 16 2012, 06:19
Post #74





Group: Members
Posts: 1995
Joined: 30-November 06
Member No.: 38207



QUOTE (mzil @ Oct 16 2012, 05:32) *
Here's an example, for all: if asked to participate in a DBT of "the bass response of aftermarket power cords", all of adequate gauge thickness to conduct the current required by the CD player, how many of you would bow out on the grounds that you wouldn't be a good test subject because you find the premise laughable and would therefore be biased?


Sure. Turning to medicine: what if we were to test the effect of homeopathy? I would have a fairly negative expectation bias, especially if I was told it was actually prepared the homeopathically “proper” way. This, and the particular case you mention, could be mitigated by not telling the subjects what specifically is being tested. I assume you really shouldn't tell them.

And if you have anything suitable at hand, introducing a third thingy with a known effect could help the analysis. I.e., if you have A, B and C, where the difference between A and C is well-established and quantified, and the listeners are biased as you describe (or merely not sufficiently randomly drawn – in practice you would have to deal with self-selection), then you might check whether they can distinguish A and C better or worse than “the known average”. That could have been done in the homeopathy case as well. Problem is, it only tells you that you have no test.


--------------------
One day in the Year of the Fox came a time remembered well
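A minimal sketch of that positive-control check; the known detection rate, the control score and the 0.05 cutoff are all assumed numbers, not anything established in the thread. Each listener also judges the A/C pair, and their score on it is compared against its known detectability.

CODE
# Minimal sketch: validating a listening session with a positive control.
# known_rate is an assumed, well-established detection rate for the A/C
# pair; the 9/16 control score is likewise hypothetical.
from scipy.stats import binomtest

known_rate = 0.9
control_correct, control_trials = 9, 16

# Did this listener do significantly worse on the control than the
# known rate predicts? If so, the session tells us little about the
# pair actually under test.
res = binomtest(control_correct, control_trials,
                p=known_rate, alternative='less')
if res.pvalue < 0.05:
    print("control pair not detected as expected: we have no test")
else:
    print("control behaved as expected: the main result is interpretable")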
hlloyge
post Oct 16 2012, 09:54
Post #75





Group: Members
Posts: 701
Joined: 10-January 06
From: Zagreb
Member No.: 27018



QUOTE (Porcus @ Oct 15 2012, 00:31) *
QUOTE (hlloyge @ Oct 14 2012, 14:35) *
Yes, but I am conducting the test at that one point in time.

Strange to read your initial postings in this thread now, after you have tried to downplay the test's applicability to a one-time personal experience with clear-cut, full sensitivity/specificity.


Downplay?
If I am conducting an ABX test of codec settings because I think I hear a difference, how is simply encoding the WAV file, loading it into foobar2000 and running the ABX comparator not valid?
Where does perception come into it? I am listening to music either on speakers or headphones (mostly headphones) - speakers being Brand A and headphones Brand B - so where exactly does my (mis)perception kick in? I am sorry, but your theory doesn't explain that - tell me where, in my case, the ABX test fails.
Or, for that matter, for anyone doing the same exact test? Or a technically properly designed DAC/speaker test with an ABX switchbox?
And correct me if I am wrong, but an ABX test is primarily a personal experience, from which results can be collected and statistically processed. The more personal results, the more accurate the statistics.
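On that last point, pooling is mechanically simple if the sessions are independent: correct answers and trial counts just add, and the combined score gets the same binomial treatment. A minimal sketch, with made-up per-session scores:

CODE
# Minimal sketch: pooling several personal ABX sessions into one test.
# Under the null every trial is a fair coin flip, so scores simply add.
# The (correct, trials) pairs below are hypothetical.
from scipy.stats import binomtest

sessions = [(7, 8), (6, 8), (7, 8)]

correct = sum(c for c, _ in sessions)
trials = sum(t for _, t in sessions)
res = binomtest(correct, trials, p=0.5, alternative='greater')
print(f"pooled score {correct}/{trials}, p = {res.pvalue:.4f}")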
