Why Live-vs-Recorded Listening Tests Don't Work
kdo
post Jul 13 2010, 01:02
Post #26





Group: Members (Donating)
Posts: 304
Joined: 18-April 02
From: Russia
Member No.: 1812



QUOTE (solive @ Jul 13 2010, 01:43) *
The participants in a listening test are normally the listeners - not the singers/performers, who act as one of the stimuli.

It is pure semantics. And terminology differences.

In ABX and in other kinds of small impairment tests, yes, the listeners are usually called 'participants' and recorded/live sounds are the 'stimuli'.

An RCT is a different kind of test, and in an RCT things would be labeled differently.
The performers would be called test subjects (participants).
The fact of being recorded would be called the 'treatment' or 'intervention'.
And the listeners would evaluate the effect of the 'treatment' or 'no treatment'.


QUOTE (solive @ Jul 13 2010, 01:43) *
It is the listeners who decide whether or not the reproduction of the recording is similar to the live performance -- not the performers.

Yes. This is quite obvious. So?

And no, I have not misunderstood the intent of the live-vs-recorded test. :lol:

solive
post Jul 13 2010, 04:09
Post #27





Group: Members
Posts: 162
Joined: 21-February 04
From: Los Angeles
Member No.: 12173



QUOTE (kdo @ Jul 12 2010, 17:02) *
QUOTE (solive @ Jul 13 2010, 01:43) *
The participants in a listening test are normally the listeners - not the singers/performers, who act as one of the stimuli.

It is pure semantics. And terminology differences.

In ABX and in other kinds of small impairment tests, yes, the listeners are usually called 'participants' and recorded/live sounds are the 'stimuli'.

An RCT is a different kind of test, and in an RCT things would be labeled differently.
The performers would be called test subjects (participants).
The fact of being recorded would be called the 'treatment' or 'intervention'.
And the listeners would evaluate the effect of the 'treatment' or 'no treatment'.


QUOTE (solive @ Jul 13 2010, 01:43) *
It is the listeners who decide whether or not the reproduction of the recording is similar to the live performance -- not the performers.

Yes. This is quite obvious. So?

And no, I have not misunderstood the intent of the live-vs-recorded test. :lol:


Sorry, then I guess I misunderstood you. Let me reread what you've written.

This post has been edited by solive: Jul 13 2010, 04:10


--------------------
Sean Olive
[url="http://seanolive.com"]Audio Musings[/url]
solive
post Jul 13 2010, 04:53
Post #28





Group: Members
Posts: 162
Joined: 21-February 04
From: Los Angeles
Member No.: 12173



QUOTE (kdo @ Jul 11 2010, 08:25) *
QUOTE (kdo @ Jul 10 2010, 17:53) *
Would it be possible to design a valid test with the opposite approach? That is, instead of trying to reproduce a single performance identically, could we use various different performances and recordings every time?


A quick follow-up on my first question.

I did some googling and found that I was actually thinking of a kind of test called "Randomized controlled trial" (RCT).
The "explanatory" type of RCT with "parallel-group" design and "allocation concealment", in particular. The goal of such RCT is to test the 'efficacy' of a treatment or medicine given to a group of patients.

So, here goes my analogy:
* 'efficacy' = ability to create illusion of a live performance.
* Participants (patients) are the singers/performers.
* Half of the participants are allocated to receive the 'treatment' (record and playback via loudspeakers),
and the other half is allocated to receive no 'treatment' (perform live).

The problem, I guess, is that it might be not quite properly triple-blind, since our 'participants' (singers) know which 'treatment' they are receiving, obviously.

But maybe this bias could be eliminated, too: let's make all singers perform live, but let some of them ("control group") perform before dummy-listeners in one room, and the other group perform before the actual audience in another room. Or something like that. Mix them and confuse them. :)
Then the singers wouldn't know which of their performances would actually count, and the test becomes fully triple-blind, I hope. :rolleyes:

So, on the surface it seems that it should be possible to eliminate the need for identical stimuli in a live-vs-recorded test - by using RCT method.

Any thoughts? Anybody? :unsure:


OK. First the Randomized Control Trial method you refer to is designed to control "selection bias" by randomly assigning different treatments to subjects. In medical/drug studies they would give a different drug or dosage to a different subject. What this means is you need a lot more subjects which makes the study expensive.

In audio, we typically use a repeated measures design where each subject evaluates all of the available treatments (e.g. different loudspeakers), not just one treatment. The main advantage is that it reduces the number of subjects required to estimate the variance or effect on the subject due to the treatment, so it is far more efficient than what I think you are proposing. A team of 15 trained listeners is typically the statistical equivalent of 100-200 untrained listeners, because the latter are less reliable and less discriminating in their judgements. If you are proposing assigning a single treatment (known as a single stimulus test) to different groups of listeners, good luck getting a meaningful result.
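To make the efficiency argument concrete, here is a minimal sketch (not from the original post) comparing the standard error of a mean rating difference under the two designs. It assumes equal rating variance for both treatments and a made-up within-listener correlation `rho`; the function name and all numbers are illustrative only.

```python
import math

def se_of_mean_difference(sigma, n, rho=None):
    """Standard error of the mean difference between two treatments.

    sigma: SD of a single rating (assumed equal for both treatments)
    n:     listeners per treatment
    rho:   correlation between one listener's two ratings (repeated measures);
           None means two independent groups of n listeners each.
    """
    if rho is None:
        var_diff = 2 * sigma**2 / n              # independent groups
    else:
        var_diff = 2 * sigma**2 * (1 - rho) / n  # same listeners rate both
    return math.sqrt(var_diff)

# Hypothetical numbers: rating SD of 1.5 points, 15 listeners per treatment.
independent = se_of_mean_difference(sigma=1.5, n=15)
repeated = se_of_mean_difference(sigma=1.5, n=15, rho=0.8)
print(f"independent groups: {independent:.3f}")
print(f"repeated measures:  {repeated:.3f}")
```

Note that the independent-groups case also needs 2n listeners in total, while the repeated measures case needs only n, so the real gap in cost is larger than the standard errors alone suggest.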

If I understand you correctly (and please correct me if I am wrong here), you are proposing having one group of listeners evaluate the recording/reproduction of a live performance on an accuracy scale. The other group of listeners would evaluate the live performance on the same scale. By definition the live performance is a 10/10 on the accuracy scale (notwithstanding Taylor Swift's wrong notes). The recording group has no idea how accurate the recording is without having heard the live performance, so they would be guessing based on some internal reference of what they consider to be faithful or accurate. However, if the recording/reproduction group gave every recording a 10/10, would you accept that as proof that the recording is 100% accurate?

I don't think live-versus-recorded apologists would accept that as a valid result or conclusion because their fundamental argument is that accuracy in sound reproduction can only be measured against a reference, which for them, is the live performance.

This post has been edited by solive: Jul 13 2010, 04:58


--------------------
Sean Olive
[url="http://seanolive.com"]Audio Musings[/url]
Arnold B. Kruege...
post Jul 13 2010, 17:25
Post #29





Group: Members
Posts: 3700
Joined: 29-October 08
From: USA, 48236
Member No.: 61311



QUOTE (solive @ Jul 12 2010, 01:46) *
1) I think a basic tenet of a good scientific experiment is that it is repeatable. So using human musicians as sound sources is going to cause a lot of errors and biases if the live performance doesn't perfectly match the recorded one. If you can devise a way to compare the live performance (via live mic feeds w. no delay) and compare that double-blind to the performance that would eliminate some of the errors.


The non-repeatability of live performances was illustrated to me by the following experience:

Some years back some studio techs prepared and sold sets of CDs that were designed to illustrate the characteristic colorations of microphones and mic preamps. I invested in a set.

I decided to see what would happen if I tried to ABX them. While preparing the samples for ABXing, I found that the purportedly identical musical samples, which were supposed to differ only in the equipment used, differed in fairly gross ways. The samples had different lengths once trimmed to be musically alike. Their average levels varied by more than enough to be audible. Once those basic issues were dealt with, there were still clearly audible differences in timing, inflection and intonation. I never had any trouble ABXing them and obtaining perfect or nearly perfect scores in short order based on just the musical differences.
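As an illustration of the level-matching step only, here is a minimal sketch assuming mono samples stored as floats with full scale at 1.0. The helper names and the two synthetic "takes" are hypothetical and not taken from the CDs described above. Matching RMS level removes one giveaway, but it does nothing about the timing, inflection and intonation differences.

```python
import math

def rms_dbfs(samples):
    """RMS level of non-silent float samples (full scale = 1.0), in dBFS."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)

def match_level(samples, reference):
    """Scale `samples` so that its RMS level equals that of `reference`."""
    gain_db = rms_dbfs(reference) - rms_dbfs(samples)
    gain = 10 ** (gain_db / 20)
    return [s * gain for s in samples], gain_db

# Two hypothetical takes of the same phrase: a 440 Hz tone, one 1.5 dB quieter.
rate = 8000
take_a = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
take_b = [s * 10 ** (-1.5 / 20) for s in take_a]

matched_b, applied_db = match_level(take_b, take_a)
residual_db = rms_dbfs(take_a) - rms_dbfs(matched_b)
print(f"gain applied to take B: {applied_db:+.2f} dB")
print(f"residual level difference: {residual_db:+.4f} dB")
```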

The second issue is that the musical reproduction chain can be broken down into three general areas: microphones and microphone technique, audio signal storage and production, and speakers and room acoustics. By various means we can show that signal storage and production can be sonically transparent. It is well known that neither of the other two areas of music reproduction has attained that level of refinement.
zane9
post Jul 13 2010, 18:50
Post #30





Group: Members
Posts: 13
Joined: 19-May 09
Member No.: 69939



QUOTE (Arnold B. Krueger @ Jul 13 2010, 11:25) *
...The second issue is that the musical reproduction chain can be broken down into three general areas: microphones and microphone technique, audio signal storage and production, and speakers and room acoustics. By various means we can show that signal storage and production can be sonically transparent. It is well known that neither of the other two areas of music reproduction has attained that level of refinement.


Notwithstanding the deficiencies of the other two areas mentioned by Arnold, my default preference is listening to a recording of unamplified music in a non-studio setting, done with a pair of microphones and no mix.

Not so easy to find these recordings, these days.

kdo
post Jul 13 2010, 20:10
Post #31





Group: Members (Donating)
Posts: 304
Joined: 18-April 02
From: Russia
Member No.: 1812



QUOTE (solive @ Jul 13 2010, 05:09) *
Sorry, then I guess I misunderstood you.

That's alright, no problem. I know I'm probably not being all too clear. It's almost like I'm thinking out loud, trying to figure out what we can do about these 'live-vs-recorded' tests. And I don't have a clear picture yet.


QUOTE (solive @ Jul 13 2010, 05:53) *
OK. First the Randomized Control Trial method you refer to is designed to control "selection bias" by randomly assigning different treatments to subjects. In medical/drug studies they would give a different drug or dosage to a different subject. What this means is you need a lot more subjects which makes the study expensive.

Well, an expensive and 'hard-to-do' study is still a lot easier than an 'impossible-to-do' one. :)

I'm no expert in statistics, so someone will have to work out all those probabilities, statistical power, sample size, etc etc.
In a quick google search I found some articles and tables, where the recommended sample size (number of test subjects) in RCT seems to vary anywhere between 50 and 200, sometimes much more, sometimes less.
I suppose that to test for a marginal effect we'd need a very large sample size, and to test for 'night-and-day' effect a smaller sample size would do.
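A rough way to see how the required number of performers scales with the size of the effect is the textbook normal-approximation formula for comparing two proportions. This is only a sketch under stated assumptions: the listener's "judged live" rates, `alpha` and `power` below are made-up illustrative values, and `performers_per_group` is a hypothetical helper, not an established audio-testing procedure.

```python
import math
from statistics import NormalDist

def performers_per_group(p_live, p_recorded, alpha=0.05, power=0.80):
    """Approximate performers per group for a two-sided two-proportion z-test.

    p_live:     assumed probability that a live performance is judged "live"
    p_recorded: assumed probability that a reproduction is judged "live"
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_live * (1 - p_live) + p_recorded * (1 - p_recorded)
    n = (z_alpha + z_beta) ** 2 * variance / (p_live - p_recorded) ** 2
    return math.ceil(n)

# Hypothetical scenarios, all with live sound judged "live" 70% of the time.
for label, p_rec in [("night-and-day", 0.20), ("moderate", 0.50), ("marginal", 0.65)]:
    n = performers_per_group(p_live=0.70, p_recorded=p_rec)
    print(f"{label:>14}: about {n} performers per group")
```

Under these assumptions the night-and-day case needs on the order of ten performers per group, while the marginal case runs into the thousands, which matches the intuition above.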

And one more thought:
In ABX we are testing just one performer, but we need at least 10 trials with 3 performances per trial (A, B, X), so at least 30 performances in total.
In an RCT we'd need, say, 50 or 100 test subjects (performers), but we'd use just 1 performance from each subject, so that's 50 to 100 performances in total.

So, okay, the bad news is that we'd have to invite a lot more performers for an RCT than for ABX, but the workload on the listeners (number of performances to evaluate) is not all that different, and that's good news.


QUOTE (solive @ Jul 13 2010, 05:53) *
In audio, we typically use a repeated measures design where each subject evaluates all of the available treatments (e.g. different loudspeakers), not just one treatment. The main advantage is that it reduces the number of subjects required to estimate the variance or effect on the subject due to the treatment, so it is far more efficient than what I think you are proposing. A team of 15 trained listeners is typically the statistical equivalent of 100-200 untrained listeners, because the latter are less reliable and less discriminating in their judgements. If you are proposing assigning a single treatment (known as a single stimulus test) to different groups of listeners, good luck getting a meaningful result.

If I understand you correctly (and please correct me if I am wrong here) you are proposing having one group of listeners evaluate the recording/reproduction of live performance on an accuracy scale. The other group of listeners would evaluate the live performance on the same scale.

I see we are not quite on the same wavelength yet. I'll try to explain a bit more what I'm proposing.

We should have only one group of listeners. Could be just one sole listener. Or a small group of trained listeners. These listeners will evaluate a randomized sequence of reproductions/live performances. No communication between listeners, strictly individual evaluations.

(The reason I mentioned "dummy-listeners" earlier is that, perhaps, it could be part of the plot to eliminate performer's bias. But that's a technicality.)

It is the performers (our 'test subjects') who will be split in 2 parallel groups. One group of performers will be recorded and the listeners would be exposed only to reproductions of their recordings. The other group of performers would only perform live.

Thus, the listeners would hear each performer's sound only once (either recorded or live).

The objective of the listener is to guess which is which, live or reproduction.
No scale, no accuracy grade. Just simple "yes/no" evaluation.

Just like the infamous "An artist or an ape?" quiz. :)
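One way to score such a yes/no session is an exact one-sided binomial test on the number of correct guesses. The sketch below assumes a balanced session (as many live as reproduced performances), independent trials, and a chance rate of 0.5; the session size and hit counts are hypothetical.

```python
from math import comb

def binomial_p_value(correct, trials, chance=0.5):
    """One-sided exact p-value: P(at least `correct` hits by guessing alone)."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )

# Hypothetical session: 60 performances, 30 live and 30 reproduced.
for hits in (33, 38, 45):
    print(f"{hits}/60 correct -> p = {binomial_p_value(hits, 60):.4f}")
```

A small p-value suggests the listener can tell live from reproduced sound. A large one does not show that the two are indistinguishable, for the same reason a failed ABX is not proof.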

QUOTE (solive @ Jul 13 2010, 05:53) *
(The listeners) have no idea of how accurate the recording is w/o having heard the live performance, so they would be guessing based on some internal reference of what they consider to be fidelity or accurate.

Yes, exactly.

QUOTE (solive @ Jul 13 2010, 05:53) *
However, if the recording/reproduction group gave every recording a 10/10, would you accept that as proof that the recording is 100% accurate?

Strictly speaking, it cannot be 'proof' (just like a failed ABX is not a 'proof'), because it might be false negative. Then, by the same logic as in ABX, we can say that it would be a strong indication that these reproductions (taken collectively) provide a realistic illusion of being in the presence of a live performance.

It doesn't prove anything w.r.t. whether any reproduction was a truly accurate copy of the original performance.

But the opposite result (when listeners successfully guess which one is recorded and which is live) would be proof that our recording-reproduction system is not at all able to create the illusion of a live performance. Then our system is a piece of junk. No need to test its accuracy any further.


I already tried to explain this in one of the previous posts (See above, in Post #24, right after the words "I must emphasize one important point")

So, okay, we cannot grade the accuracy of the reproduction on a fine scale, the need for identical live performances being one of the major problems.
And then I ask, if we cannot do ABX, what is the next best thing we could do?
Let's try and evaluate reproductions using coarse scale, the simplest scale (0/1, yes/no).

I realize that it may be far from what you are interested in, in your research of loudspeakers etc., but I think it would still be an interesting test. As a consumer I would be interested to know: is my system really 'good enough', is it able to create a believable illusion of a live performance, or is it all marketing, hype and self-suggestion.


QUOTE (solive @ Jul 13 2010, 05:53) *
I don't think live-versus-recorded apologists would accept that as a valid result or conclusion because their fundamental argument is that accuracy in sound reproduction can only be measured against a reference, which for them, is the live performance.

I'm guessing there must be some sort of eternal debate with "live-versus-recorded apologists" somewhere. I guess I totally missed out on that one. Oh, well... :)


/EDIT: by the way, I can see now that it may be difficult to control false negatives in the test I'm proposing.
Well, there must be some standard techniques for minimizing the damage. Hopefully...
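The usual handle on false negatives is statistical power: pick the number of trials so that a listener with some assumed real ability would rarely fail the test. A minimal sketch, assuming a one-sided exact binomial test with a chance rate of 0.5; the 60% true hit rate and `alpha` are illustrative assumptions and the helper names are hypothetical.

```python
from math import comb

def tail(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

def false_negative_rate(n, true_hit_rate, alpha=0.05):
    """Chance that a listener with `true_hit_rate` fails a one-sided exact test."""
    # Smallest hit count that pure guessing reaches with probability <= alpha.
    critical = next(k for k in range(n + 2) if tail(k, n, 0.5) <= alpha)
    return 1 - tail(critical, n, true_hit_rate)

for n in (20, 60, 200):
    rate = false_negative_rate(n, 0.60)
    print(f"n={n:3d}: miss rate at a 60% true hit rate = {rate:.2f}")
```

With a subtle effect, a short session misses it most of the time, so a null result from a small test says very little.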

This post has been edited by kdo: Jul 13 2010, 20:31
Arnold B. Kruege...
post Jul 13 2010, 20:21
Post #32





Group: Members
Posts: 3700
Joined: 29-October 08
From: USA, 48236
Member No.: 61311



QUOTE (zane9 @ Jul 13 2010, 13:50) *
QUOTE (Arnold B. Krueger @ Jul 13 2010, 11:25) *
...The second issue is that the musical reproduction chain can be broken down into three general areas: microphones and microphone technique, audio signal storage and production, and speakers and room acoustics. By various means we can show that signal storage and production can be sonically transparent. It is well known that neither of the other two areas of music reproduction has attained that level of refinement.


Notwithstanding the deficiencies of the other two areas mentioned by Arnold, my default preference is listening to a recording of unamplified music in a non-studio setting, done with a pair of microphones and no mix.


I've made in excess of 500 recordings of live, unamplified music using 2 microphones chosen for flat on-axis response, with no equalization or other processing, for hire, in just the past 5 years.

Changing just the position and orientation of the 2 microphones, I can adjust the timbre and soundstaging of the recording over a fairly wide range. My choices are informed by the desires of the clients, who are professional musicians, mostly high school and college educators. I'm usually trying to duplicate the sound in a particular range of locations in the auditorium.

In no case would I consider the resulting recordings to be "sonically accurate" in the sense that they would frustrate or even challenge attempts at identification in an ABX test. I don't think they would be very hard to differentiate from live sound in a live versus recorded comparison.

I've also made a goodly number of multitrack recordings using close micing, distant micing and even the capture of raw electrical signals from amplified electronic instruments. I think that most people would find carefully mixed recordings made this way to *not* be obviously less "lifelike" than the ones made using minimal micing and no mixing or other processing.

In terms of recreation of lifelike sound, the procedure that seems to get the closest is, IME, close-micing and loudspeaker reproduction in the same room, given that the room is extremely reverberant.
Arnold B. Kruege...
post Jul 13 2010, 20:26
Post #33





Group: Members
Posts: 3700
Joined: 29-October 08
From: USA, 48236
Member No.: 61311



QUOTE (solive @ Jul 12 2010, 19:43) *
It is the listeners who decide whether or not the reproduction of the recording is similar to the live performance -- not the performers. I think you have misunderstood the original intent of the live-vs-recorded test.


An important point. While some performers have some sense of what their music sounds like, the audience knows far better what their music sounds like to the audience. That only makes common sense.

There's a reason why the preferred location for the mixer of a live performance is generally near the middle of the audience, and not the middle of the performers! ;-)
kdo
post Jul 14 2010, 02:19
Post #34





Group: Members (Donating)
Posts: 304
Joined: 18-April 02
From: Russia
Member No.: 1812



I sense a big fat TOS-8 violation right here:
QUOTE (Arnold B. Krueger @ Jul 13 2010, 21:21) *
In terms of recreation of lifelike sound, the procedure that seems to get the closest is, IME, close-micing and loudspeaker reproduction in the same room, given that the room is extremely reverberant.

Can you back up that assertion by a rigorous 'live-vs-recorded' test with statistically significant results?


Hint: this is exactly why I believe that 'live-vs-recorded' tests are necessary. To validate any such claims about "lifelike sound".

Ed Seedhouse
post Jul 14 2010, 02:35
Post #35





Group: Members
Posts: 156
Joined: 19-May 09
Member No.: 69959



One form of "live vs. recorded" test with fewer problems than others might be to record a high quality speaker playing music in an anechoic chamber, then see how much another speaker sounds like it when they both play in the same room.

So we record speaker A playing musical recordings in an anechoic chamber. Then we listen in a room to speaker "B" playing the recording made in the chamber and compare it with speaker A playing the original recordings. Assuming speaker A has some colorations, how well would speaker B do in reproducing these colorations accurately?

We could even use speaker A as its own comparator. If it could play itself back unchanged vs. the original recordings then we would know it was certainly highly accurate. In fact I doubt any actual production speaker could pass that test, but it would be great to be proven wrong.

This would at least be much easier to arrange than with live musicians.


--------------------
Ed Seedhouse
kdo
post Jul 14 2010, 02:57
Post #36





Group: Members (Donating)
Posts: 304
Joined: 18-April 02
From: Russia
Member No.: 1812



QUOTE (Ed Seedhouse @ Jul 14 2010, 03:35) *
One form of "live vs. recorded" test with fewer problems than others might be to record a high quality speaker playing music in an anechoic chamber, then see how much another speaker sounds like it when they both play in the same room.

And how do we know whether our high quality speaker is able to produce lifelike sound to start with?
Can we escape this catch 22 somehow?


Personally I have nothing against this type of testing (comparison with a reference speaker).

But I think it rather falls into the category of 'recorded-vs-recorded'.

Plus there is more to the "lifelike sound" than a good loudspeaker -
various microphones and recording techniques play as big a role as loudspeakers.

This post has been edited by kdo: Jul 14 2010, 02:59
Arnold B. Kruege...
post Jul 14 2010, 03:12
Post #37





Group: Members
Posts: 3700
Joined: 29-October 08
From: USA, 48236
Member No.: 61311



QUOTE (kdo @ Jul 13 2010, 21:19) *
I sense a big fat TOS-8 violation right here:
QUOTE (Arnold B. Krueger @ Jul 13 2010, 21:21) *
In terms of recreation of lifelike sound, the procedure that seems to get the closest is, IME, close-micing and loudspeaker reproduction in the same room, given that the room is extremely reverberant.

Can you back up that assertion by a rigorous 'live-vs-recorded' test with statistically significant results?


I think you aren't getting the point of the qualifier "that seems". If I was trying to be rigorous, I would have said "that is".

Please notice my recent comments about the non-recreatability of live music events using real world musicians.
kdo
post Jul 14 2010, 03:30
Post #38





Group: Members (Donating)
Posts: 304
Joined: 18-April 02
From: Russia
Member No.: 1812



QUOTE (Arnold B. Krueger @ Jul 14 2010, 04:12) *
Please notice my recent comments about the non-recreatability of live music events using real world musicians.

Please notice my recent comments that recreatability of live music events, it seems, is not required if all we want is to verify claims of "lifelike sound" using RCT methods.
Arnold B. Kruege...
post Jul 14 2010, 10:42
Post #39





Group: Members
Posts: 3700
Joined: 29-October 08
From: USA, 48236
Member No.: 61311



QUOTE (kdo @ Jul 13 2010, 22:30) *
QUOTE (Arnold B. Krueger @ Jul 14 2010, 04:12) *
Please notice my recent comments about the non-recreatability of live music events using real world musicians.

Please notice my recent comments that recreatability of live music events, it seems, is not required if all we want is to verify claims of "lifelike sound" using RCT methods.


I don't see where Randomized Controlled Trials do anything but add complexity.

Please show otherwise, if you can.

The non-repeatability of live music events means that there is no possibility of using the same stimulus. It's like a drug trial where every drug can be tried only once for all time.
kdo
post Jul 14 2010, 16:56
Post #40





Group: Members (Donating)
Posts: 304
Joined: 18-April 02
From: Russia
Member No.: 1812



QUOTE (Arnold B. Krueger @ Jul 14 2010, 11:42) *
I don't see where Randomized Controlled Trials do anything but add complexity.

Please show otherwise, if you can.

Did you read my previous posts? Especially post #31.

Could you please be a bit more specific, what is not clear in my reasoning?
I'd be glad to try and rephrase and/or add more explanations.

And please keep in mind that I don't have all the answers. That is why I brought up the question here. I was hoping that our audio experts might be able to help.


QUOTE (Arnold B. Krueger @ Jul 14 2010, 11:42) *
The non-repeatability of live music events means that there is no possibility of using the same stimulus.

Yes, correct.

QUOTE (Arnold B. Krueger @ Jul 14 2010, 11:42) *
It's like a drug trial where every drug can be tried only once for all time.

This analogy is true in the "Repeated Measures Design" category of tests, which are typically used in audio. Sean Olive noted it in one of his comments above.

However, RCT is a whole different category. Different approach. Unconventional to audio. So I'm trying to think outside-the-box. Please bear with me for a moment.


My understanding is that in the RCT framework the analogy should go like this:

Performer is the 'patient' (test subject).
Or better yet, not performer himself, but the live sound of the performer is the 'patient' (test subject).

The process of recording/playback is the 'drug' ('treatment', 'intervention') that affects our 'patient' (the sound of the performer).

Thus, live sound is a patient who didn't receive any drug.
Recorded sound is a patient who was given the drug.

Obviously, there is no such thing as two identical patients (two identical live sounds), therefore we use multiple patients (multiple live performances by different performers). One group of patients is given the drug (sound is recorded and reproduced). The control group of patients is not given any drug (sound is live).

A trained listener is the 'expert/doctor' who evaluates a randomized sequence of 'patients'.
The listener ('doctor') must answer a question: is the 'patient' dead (recorded sound) or still alive ("lifelike sound")?
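The allocation step in this analogy can be sketched as follows. This is only an illustration under assumptions: the performer IDs, the seed, and the `allocate` helper are hypothetical, and a real trial would also need to keep the key away from whoever runs the listening session.

```python
import random

def allocate(performers, seed):
    """Randomly split performers into equal 'recorded' and 'live' arms, then
    return (performer, arm) pairs in a randomized presentation order."""
    rng = random.Random(seed)      # fixed seed so the schedule can be audited
    shuffled = performers[:]
    rng.shuffle(shuffled)          # random assignment to arms
    half = len(shuffled) // 2
    schedule = [(p, "recorded") for p in shuffled[:half]]
    schedule += [(p, "live") for p in shuffled[half:]]
    rng.shuffle(schedule)          # random presentation order
    return schedule

performers = [f"performer_{i:02d}" for i in range(1, 11)]  # hypothetical IDs
schedule = allocate(performers, seed=2010)

# The listener's answer sheet would show only slot numbers; this printout is
# the organizer's key.
for slot, (performer, arm) in enumerate(schedule, start=1):
    print(f"slot {slot:2d}: {performer} [{arm}]")
```

Because each performer appears exactly once and in only one arm, no performance ever has to be repeated identically.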

This post has been edited by kdo: Jul 14 2010, 16:57
Arnold B. Kruege...
post Jul 14 2010, 17:15
Post #41





Group: Members
Posts: 3700
Joined: 29-October 08
From: USA, 48236
Member No.: 61311



QUOTE (kdo @ Jul 14 2010, 11:56) *
QUOTE (Arnold B. Krueger @ Jul 14 2010, 11:42) *
The non-repeatability of live music events means that there is no possibility of using the same stimulus.

Yes, correct.

QUOTE (Arnold B. Krueger @ Jul 14 2010, 11:42) *
It's like a drug trial where every drug can be tried only once for all time.

This analogy is true in the "Repeated Measures Design" category of tests, which are typically used in audio. Sean Olive noted it in one of his comments above.

However, RCT is a whole different category. Different approach. Unconventional to audio. So I'm trying to think outside-the-box. Please bear with me for a moment.

My understanding is that in the RCT framework the analogy should go like this:

Performer is the 'patient' (test subject).
Or better yet, not performer himself, but the live sound of the performer is the 'patient' (test subject).

The process of recording/playback is the 'drug' ('treatment', 'intervention') that affects our 'patient' (the sound of the performer).

Thus, live sound is a patient who didn't receive any drug.
Recorded sound is a patient who was given the drug.

Obviously, there is no such thing as two identical patients (two identical live sounds), therefore we use multiple patients (multiple live performances by different performers). One group of patients is given the drug (sound is recorded and reproduced). The control group of patients is not given any drug (sound is live).

A trained listener is the 'expert/doctor' who evaluates a randomized sequence of 'patients'.
The listener ('doctor') must answer a question: is the 'patient' dead (recorded sound) or still alive ("lifelike sound")?


Thanks for the explanation.

OK, I think I get it. Yes I think it could work. It seems to be horribly expensive in terms of time and qualified test participants as compared to say an ABX test. Note that the amount of time required to do ABX tests is widely objected to.


kdo
post Jul 14 2010, 17:58
Post #42





Group: Members (Donating)
Posts: 304
Joined: 18-April 02
From: Russia
Member No.: 1812



QUOTE (Arnold B. Krueger @ Jul 14 2010, 18:15) *
Thanks for the explanation.

OK, I think I get it. Yes I think it could work. It seems to be horribly expensive in terms of time and qualified test participants as compared to say an ABX test. Note that the amount of time required to do ABX tests is widely objected to.

You're welcome.

But how horrible would it be, really?
It would be very interesting to work out some ballpark numbers. Perhaps we need to ask a statistician.


Earlier I tried to make a simple 'uneducated estimate' of complexity. It seems that the work load on the listeners (number of performances to evaluate) in RCT may be not so different from ABX.

I'll just quote the relevant part here:
QUOTE (kdo @ Jul 13 2010, 21:10) *
In a quick google search I found some articles and tables, where the recommended sample size (number of test subjects) in RCT seems to vary anywhere between 50 and 200, sometimes much more, sometimes less.
...
In ABX we are testing just one performer, but we need at least 10 trials with 3 performances per trial (A, B, X), so at least 30 performances in total.
In an RCT we'd need, say, 50 or 100 test subjects (performers), but we'd use just 1 performance from each subject, so that's 50 to 100 performances in total.

So, okay, the bad news is that we'd have to invite a lot more performers for an RCT than for ABX, but the workload on the listeners (number of performances to evaluate) is not all that different, and that's good news.


But now I'm also thinking: can we, perhaps, get away with using just 1 performer to produce all live and recorded sounds?
This might considerably reduce complexity of the test. (No need for hundreds of performers - that would be fantastic).

This post has been edited by kdo: Jul 14 2010, 17:59
2Bdecided
post Jul 14 2010, 18:54
Post #43


ReplayGain developer


Group: Developer
Posts: 5104
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



QUOTE (solive @ Jul 11 2010, 20:16) *
The reference is what the artist heard in the studio plain and simple. That is the "live performance"

What do you mean - in the studio while they were performing, or in the studio as they listened to playback?

I would suggest that both have little relevance. What a person sounds like to themselves is very different from the way they sound to others. What a given recording sounds like through a specific set of studio loudspeakers is of little relevance to me - especially if those studio speakers are crap.

I know the engineer will make decisions based on the sound they hear through those speakers, but a skilled engineer mixes for all speakers, and isn't likely to tailor the sound specifically to overcome the deficiencies of one pair of speakers. They may do so to some extent, but the more skill and experience they have, the less this will be an issue.

I can justify this with a great example: recordings from the 1930s sound closer to "real" instruments when replayed today than they did when replayed in the 1930s, yet according to your argument, the sound heard in the control room in 1935 is "accurate", and the sound heard on "better" equipment today is less accurate.

Cheers,
David.
2Bdecided
post Jul 14 2010, 19:04
Post #44


ReplayGain developer


Group: Developer
Posts: 5104
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



QUOTE (Arnold B. Krueger @ Jul 13 2010, 17:25) *
The second issue is that the musical reproduction chain can be broken down into three general areas being microphones and microphone technique, audio signal storage and production, and speakers and room acoustics. By various means we can show that signal storage and production can be sonically transparent. It is well known that neither of the other two areas of music reproduction have attained that level of refinement.
I think this is probably factually accurate - but until we have all three areas "transparent", I wonder how we can truthfully say that any one or two of them are. How do we know?!

I suppose there are spatial dimensions of sound (e.g. directional response in the room) which are simply not present on stereo (or arguably conventional multi-channel) recordings. Since they're not present, then I guess any speaker which reproduces the other cues (which are present) correctly, can be said to be transparent. Which may mean that two arguably "transparent" speakers can sound different in a real room, due to their different spatial response patterns. I think.

(I've confused myself here. I'm not trying to start an argument).

It would be nice to have an audio recording and reproduction system that was end-to-end transparent - even if we started with one sound source (e.g. someone talking) in an anechoic chamber. If we can't even manage that after a century, what have we been playing at? wink.gif

Cheers,
David.

analog scott
post Jul 14 2010, 20:40
Post #45





Group: Members
Posts: 332
Joined: 26-July 09
Member No.: 71796



QUOTE (2Bdecided @ Jul 14 2010, 19:04) *
QUOTE (Arnold B. Krueger @ Jul 13 2010, 17:25) *
The second issue is that the musical reproduction chain can be broken down into three general areas being microphones and microphone technique, audio signal storage and production, and speakers and room acoustics. By various means we can show that signal storage and production can be sonically transparent. It is well known that neither of the other two areas of music reproduction have attained that level of refinement.
I think this is probably factually accurate - but until we have all three areas "transparent", I wonder how we can truthfully say that any one or two of them are. How do we know?!

I suppose there are spatial dimensions of sound (e.g. directional response in the room) which are simply not present on stereo (or arguably conventional multi-channel) recordings. Since they're not present, then I guess any speaker which reproduces the other cues (which are present) correctly, can be said to be transparent. Which may mean that two arguably "transparent" speakers can sound different in a real room, due to their different spatial response patterns. I think.

(I've confused myself here. I'm not trying to start an argument).

It would be nice to have an audio recording and reproduction system that was end-to-end transparent - even if we started with one sound source (e.g. someone talking) in an anechoic chamber. If we can't even manage that after a century, what have we been playing at? wink.gif

Cheers,
David.


This is why it is trickier to talk about speaker "accuracy." You take something like a preamp (maybe one of the easiest components to judge for accuracy) and you can literally take the input and output and directly compare them. With speakers you definitely can't compare the input and the output realistically, since the output of a speaker is so fundamentally different from its input.

So long as audio recording and playback systems are designed to create an aural illusion of the original event from a single perspective rather than an accurate reconstruction of the original sound field, transparency of the system in its entirety becomes a bit dodgy. No matter how transparent the perceived sound may be to the original event, the sound fields of the original event and the playback will always be incomparable.
Arnold B. Kruege...
post Jul 16 2010, 07:58
Post #46





Group: Members
Posts: 3700
Joined: 29-October 08
From: USA, 48236
Member No.: 61311



QUOTE (2Bdecided @ Jul 14 2010, 14:04) *
It would be nice to have an audio recording and reproduction system that was end-to-end transparent - even if we started with one sound source (e.g. someone talking) in an anechoic chamber. If we can't even manage that after a century, what have we been playing at? wink.gif


IMO way too much time has been spent playing with the part of the chain that has been sonically transparent for 2-3 decades.

I see that the AES is still fighting the battle of 24/192:

Yet another example of people who should know better wasting time building sand castles

:-(

This post has been edited by Arnold B. Krueger: Jul 16 2010, 07:58
2Bdecided
post Jul 16 2010, 16:29
Post #47


ReplayGain developer


Group: Developer
Posts: 5104
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



QUOTE (Arnold B. Krueger @ Jul 16 2010, 07:58) *
QUOTE (2Bdecided @ Jul 14 2010, 14:04) *
It would be nice to have an audio recording and reproduction system that was end-to-end transparent - even if we started with one sound source (e.g. someone talking) in an anechoic chamber. If we can't even manage that after a century, what have we been playing at? wink.gif


IMO way too much time has been spent playing with the part of the chain that has been sonically transparent for 2-3 decades.
You might be right.

QUOTE
I see that the AES is still fighting the battle of 24/192:

Yet another example of people who should know better wasting time building sand castles

:-(
You're upset that the discussion happened. I'm upset that the flipping AUDIO engineering society can't even record the discussion and post it on-line!!!

(unless I've missed it)

Still, a report says...
QUOTE
These thought provoking presentations gave some teaching for psychoacoustic test methods and some fascinating recent results on perception thresholds. Peter Craven gave an insight into subjective testing and how the forced decision ABX test may in fact fail to find out what the ear/brain perception is doing, where the test blocks the natural perceived response to audio quality variation unless the differences are relatively gross. Milind Kunchur outlined the extreme care necessary to establish sensitive tests to establish a 5uS or so temporal detection threshold, backed by a theoretical analysis of this aspect of hearing.
from http://www.hificritic.com/downloads/HDA2010.pdf
...so it might have been good, or it might have been nonsense. It would be good to have some papers to read.

Critics of (some caricature of) ABX need to show that some other double-blind test methodology can allow listeners to hear a difference that ABX masks. Did this happen here? Who can tell!

Cheers,
David.
Arnold B. Kruege...
post Jul 16 2010, 23:59
Post #48





Group: Members
Posts: 3700
Joined: 29-October 08
From: USA, 48236
Member No.: 61311



QUOTE (2Bdecided @ Jul 16 2010, 11:29) *
Critics of (some caricature of) ABX need to show that some other double-blind test methodology can allow listeners to hear a difference that ABX masks. Did this happen here? Who can tell!


That is the meat of the discussion. It is easy to say that ABX sucks or that all blind tests suck. It seems to be very hard to actually ring up strong reliable results any other way that doesn't also give away the store by giving clues about what people are listening to, other than plain old sound quality.
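
For anyone wondering what a "strong reliable result" looks like in numbers, here is a minimal sketch of the usual way an ABX score is checked against guessing. It assumes every trial is an independent 50/50 guess under the null hypothesis and uses a one-sided exact binomial tail; the function name is made up for illustration, and any pass/fail threshold (0.05 is common) is a convention, not something established in this thread.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` right out of `trials` by pure guessing."""
    # Sum the binomial tail for a fair coin (p = 0.5), then normalise by 2**trials.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


# How convincing is each score in a 10-trial ABX run?
for correct in (7, 8, 9, 10):
    print(f"{correct}/10 correct: p = {abx_p_value(correct, 10):.4f}")
```

With 10 trials, 8 correct is just short of the conventional 0.05 level and 9 correct is comfortably inside it, which is why short ABX runs need near-perfect scores to count for much.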

As a somewhat OT aside, I am fighting a similar battle at work. We've got some people who probably have classic hypersensitive hearing (due to age and/or hearing damage) who are objecting strongly when the music peaks briefly to over 90 dB at their seats. I see their point. The music gets a little loud and they get a headache. A few other people report the same problem. The audiologist at their hearing aid dealer says that their hearing is good. Most people find that a few peaks up to 100 dB are fun. Some of the sources that get really loud are acoustic instruments, so the sound guy, who gets some of their wrath, can't do anything about it anyway.

The similarity is that what they perceive completely supports their viewpoint. How could they be wrong?