Should HA promote a more rigorous listening test protocol?, was: "HA -- guilty as charged?" (TOS #6)
krabapple
post Nov 23 2012, 19:01
Post #1





Group: Members
Posts: 2328
Joined: 18-December 03
Member No.: 10538



I was taken aback today to read this exchange on Gearslutz, from earlier this year:

QUOTE ("Bob Ohlsson")
It's important to understand that what JJ considers a listening test and what the ABX/Hydrogen Audio skeptics crowd considers a listening test are two very very different things.


QUOTE ("Kees de Visser")
Perhaps JJ can explain what he considers a listening test and how it's different from the Hydrogenaudio standpoint.
I was somehow under the impression they were not that different.


QUOTE ("j_j")
Including positive and negative controls, lots of training for the test as well as familiarity with the equipment and music, and equipment validation are the biggies.

Test evaluation might be an issue, too. Many tests, including some of the MPEG tests and 1116 make assumptions that the entire population reacts the same to impairments. While basic masking is universal, what people dislike when they can hear something is NOT universal.



http://www.gearslutz.com/board/7672621-post329.html
http://www.gearslutz.com/board/7674886-post337.html
http://www.gearslutz.com/board/7677113-post348.html


Now, I agree with Kees -- I don't think the HA community 'take' on listening tests is that different from what JJ mentions. Few here, I suspect, would dismiss the real utility of training, positive controls, familiarity, etc. in making a listening test maximally sensitive. (As for the rest, I confess I'm not really clear whether JJ's criticism of test evaluation is directed at HA.)

What I think is happening is a difference in what listening tests are used for. Most individual HA reports of ABX tests are from users wanting to know if file X sounds different from file Y to them, as they are now, using the equipment they have, not as they would be after training to hear artifacts, on the most revealing equipment. They aren't doing basic research into a difference's audibility, as JJ did, for example, when developing lossy codecs. For that purpose, trained listeners, positive & negative controls, familiarity and 'validated' equipment are necessities.

Still, HA *does* host mass listening tests from time to time -- which are more akin to 'basic research' -- and its few 'official' guidelines on setting up listening tests -- the HA wiki, and Pio's sticky threads -- make no mention of training, +/- controls, etc. as factors in such tests.

Time to change this?

This post has been edited by krabapple: Nov 23 2012, 19:05
Replies
greynol
post Nov 28 2012, 16:51
Post #2





Group: Super Moderator
Posts: 10042
Joined: 1-April 04
From: San Francisco
Member No.: 13167



If the contenders are statistically tied, changing the anchors isn't going to magically untie them. Also, having only a few listeners and a few samples doesn't make for very compelling results, especially when the listeners are untrained.

Unlike ABX, where you rely on continued trials to demonstrate that you can consistently distinguish between two things, MUSHRA tests rely on many samples and well-chosen controls to help weed out bad data. When working with contenders that are near-transparent, a hidden reference makes sense, otherwise it is a poor control that is too easy to identify. Same goes for low anchors if they are too low.
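To make the "continued trials" point concrete, here is a minimal sketch of the usual way an ABX score is judged: a one-sided binomial test against guessing. The function name `abx_p_value` is just an illustrative choice, and the assumption is that every trial is an independent 50/50 guess under the null hypothesis.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` right out of `trials`
    by pure guessing (each trial assumed to be an independent 50/50 guess)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Sum the upper tail of the binomial distribution with p = 0.5.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    # 12 correct out of 16 trials
    print(round(abx_p_value(12, 16), 4))
```

Under those assumptions, 12 of 16 correct comes out at roughly 0.038, which is why more trials are needed before a modest hit rate becomes convincing.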

When the anchors are too close, low anchors may get ranked better than the contenders. High anchors may get ranked worse. This is not exactly unreasonable. What needs to be taken seriously is that judging is subjective; not everyone ranks different artifacts the same way. It could be that the low anchors actually do sound better or the high anchor actually does sound worse. It is also not unreasonable to get differing rankings between all stimuli based on the specific clips being auditioned. What may be unreasonable is to dismiss discrepancies like these from the "expected" results as "wrong".

With this in mind, I only take seriously the clear trends in very large tests (many participants and many worthwhile, typical real-life sample clips). I somewhat reject the idea that all participants must be trained when there are large numbers of them, however. While the testers should be able to distinguish and categorize artifacts, they should not be steered into thinking one is less desirable than another.

Lastly, all too often people treat the results of small tests posted here as definitive. They really aren't.

This post has been edited by greynol: Nov 28 2012, 18:34


--------------------
Your eyes cannot hear.
Woodinville
post Nov 28 2012, 18:32
Post #3





Group: Members
Posts: 1402
Joined: 9-January 05
From: JJ's office.
Member No.: 18957



QUOTE (greynol @ Nov 28 2012, 07:51)
If the contenders are statistically tied, changing the anchors isn't going to magically untie them. Also, having only a few listeners and a few samples doesn't make for very compelling results, especially when the listeners are untrained.

Actually, having too low an anchor can make things tie by changing the listeners' scaling of the test results.
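One way to picture the scaling effect is a toy model, which is purely an assumption for illustration and not taken from any standard: suppose listeners stretch the 0-100 scale linearly between the worst stimulus they hear (the low anchor) and the reference. The latent quality numbers and the names `rescale` and `contender_gap` below are hypothetical.

```python
def rescale(latent: float, low_anchor: float, reference: float = 10.0) -> float:
    """Map a hypothetical latent quality onto a 0-100 scale whose ends are
    pinned to the low anchor (0) and the reference (100)."""
    if reference <= low_anchor:
        raise ValueError("reference must be better than the low anchor")
    return 100.0 * (latent - low_anchor) / (reference - low_anchor)


def contender_gap(a: float, b: float, low_anchor: float,
                  reference: float = 10.0) -> float:
    """Score difference between two contenders on the rescaled 0-100 axis."""
    return abs(rescale(a, low_anchor, reference) - rescale(b, low_anchor, reference))


if __name__ == "__main__":
    # Two contenders with assumed latent qualities 8.0 and 8.5 (reference = 10).
    print(contender_gap(8.0, 8.5, low_anchor=6.0))  # moderate low anchor
    print(contender_gap(8.0, 8.5, low_anchor=0.0))  # very low anchor
```

In this toy model the same two contenders sit 12.5 points apart with the moderate anchor but only 5 points apart with the very low one. If rating noise stays the same size, the smaller gap is harder to separate statistically.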
QUOTE
Unlike ABX, where you rely on continued trials to demonstrate that you can consistently distinguish between two things, MUSHRA tests rely on many samples and well-chosen controls to help weed out bad data. When working with contenders that are near-transparent, a hidden reference makes sense, otherwise it is a poor control that is too easy to identify. Same goes for low anchors if they are too low.

Please don't use that test for near-transparent codecs. It's not appropriate. ABX or ABC/hr are appropriate. But you still need both negative and positive controls.
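As a rough sketch of how controls might be used to screen ABX results (an assumed procedure for illustration, not a description of any official protocol): a listener's data is kept only if they reliably detect the positive control, a known audible impairment, and score at chance on the negative control, two identical files. The names `passes_controls` and `alpha`, and the 0.05 threshold, are hypothetical.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """One-sided binomial p-value against 50/50 guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def passes_controls(pos_correct: int, pos_trials: int,
                    neg_correct: int, neg_trials: int,
                    alpha: float = 0.05) -> bool:
    """Keep a listener only if both controls behave as expected.

    Positive control: the listener should beat guessing (p < alpha).
    Negative control: the listener should NOT beat guessing (p >= alpha);
    a "significant" score on identical files points to a broken setup.
    """
    hears_known_difference = abx_p_value(pos_correct, pos_trials) < alpha
    no_false_detection = abx_p_value(neg_correct, neg_trials) >= alpha
    return hears_known_difference and no_false_detection


if __name__ == "__main__":
    print(passes_controls(15, 16, 9, 16))   # detects positive, chance on negative
    print(passes_controls(9, 16, 9, 16))    # misses the positive control
    print(passes_controls(15, 16, 15, 16))  # "hears" a difference in identical files
```

The first listener would be kept. The second may be untrained or on insensitive equipment, and the third suggests a leak or fault in the test setup.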
QUOTE
Not everyone ranks different artifacts the same way.


That is part of my problem with tests that compare many different codecs simultaneously along only one axis (scale). But it's only part of the problem. There are many others.


--------------------
-----
J. D. (jj) Johnston


