Measuring/ranking lossy difference from input+implications for quality, [TOS #5 from “iTunes AAC@256 vs 320 LAME: Different Experience?”/95519]
mzil
post Jun 21 2012, 08:40
Post #1





Group: Members
Posts: 735
Joined: 5-August 07
Member No.: 45913



I'm of the mind that a SNR of 130 dB could be described as "better" than 120 dB, even though I'm confident human beings would be unable to tell.

I'm of the mind a car that can accelerate from 0 to 60 mph in 9 seconds is "better" than one which takes 9.01 seconds, even though no human being could tell, either.

This person's question seems perfectly legit to me and I would think there must be a "null test" [please forgive my layman's term] comparing the compressed song to the uncompressed, using the two codecs in question, which can analyze the difference and give an answer in quantifiable terms, at least for a given song in question.

[If my term "null test" isn't understood by all, I can elaborate, but basically it means playing the two songs simultaneously and combining them out of phase with each other. The distortion and artifacts introduced by the codec are then exposed in their pure state, without the music to mask them, and the level of these distortions can be given a value relative to the level of the actual song from moment to moment. Do we have such a test? With amplifiers, I believe it was David Hafler who first thought of this in the '70s.]
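For readers who want to see what such a null test amounts to numerically, here is a minimal Python sketch. It assumes both signals are already decoded to equal-length, sample-aligned lists of floats. The function name `null_test` and the synthetic sine-wave "codec error" are purely illustrative and do not come from any real encoder. Real lossy codecs typically add delay and padding, so actual files would need to be aligned first.

```python
import math

def null_test(original, decoded):
    """Mix the original with a polarity-inverted copy of the decoded signal."""
    if len(original) != len(decoded):
        raise ValueError("signals must be time-aligned and equal in length")
    # Inverting one signal and summing is the same as subtracting it.
    return [o + (-d) for o, d in zip(original, decoded)]

# Synthetic stand-ins (assumptions for illustration only):
rate = 8000
original = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
# Pretend the "codec" added a small 3 kHz error component.
decoded = [s + 0.001 * math.sin(2 * math.pi * 3000 * n / rate)
           for n, s in enumerate(original)]

residual = null_test(original, decoded)    # what the codec changed
identical = null_test(original, original)  # a perfect null: all zeros
```

The residual only shows what changed numerically. By itself it carries no information about whether that change is audible while the music is playing.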

Results would be along the lines of "Codec A had distortions 70 dB below the music level on average, with occasional peaks 60 dB below, whereas Codec B was, um, 'better' because its distortions were 80 dB below the music level on average, with occasional peaks only 75 dB below." Does it matter to a human listener? NO. But that doesn't mean we are no longer allowed to use the term "better", in my opinion.
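A rough sketch of how figures of that kind could be computed, assuming a residual (difference) signal is already available and aligned with the original. The helper names, the block size, and the synthetic signals are assumptions chosen for illustration, and the "average" here is a plain arithmetic mean of per-block dB values, which is only one of several possible conventions.

```python
import math

def rms(block):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in block) / len(block))

def residual_levels_db(original, residual, block=1024):
    """Per-block residual level relative to the signal level, in dB."""
    levels = []
    for start in range(0, len(original) - block + 1, block):
        sig = rms(original[start:start + block])
        err = rms(residual[start:start + block])
        if sig > 0 and err > 0:  # skip digital silence to avoid log(0)
            levels.append(20 * math.log10(err / sig))
    return levels

# Synthetic example: residual amplitude is 1/1000 of the signal amplitude.
rate = 8000
original = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
residual = [0.0005 * math.sin(2 * math.pi * 3000 * n / rate) for n in range(rate)]

levels = residual_levels_db(original, residual, block=800)
average_db = sum(levels) / len(levels)  # mean of per-block values
peak_db = max(levels)                   # worst (least negative) block
print(round(average_db, 1), round(peak_db, 1))
```

A ratio of 1/1000 in amplitude corresponds to a relative level of about -60 dB. The number describes the size of the difference only, not how it sounds.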

Granted, some sorts of distortions and artifacts are more annoying (if discernible) than others, and weighting this according to where the ear is most sensitive (say around 3.5 kHz) makes sense to me, but I still think an automated system with absolute numbers we can compare, instead of "Go ABX your entire 500 GB music collection yourself to see what the answer is", would help this person out.
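As one concrete example of a frequency weighting, here is a sketch of the standard A-weighting curve in Python. This is a stand-in chosen for illustration, not something proposed in the post: A-weighting is a simple loudness-related curve that de-emphasizes low and very high frequencies relative to the low-kHz region, and it is not a masking model. The function name `a_weighting_db` is hypothetical.

```python
import math

def a_weighting_db(freq_hz):
    """Standard A-weighting gain in dB at freq_hz (about 0 dB at 1 kHz)."""
    f2 = freq_hz ** 2
    num = (12194.0 ** 2) * f2 ** 2
    den = ((f2 + 20.6 ** 2)
           * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    # The +2.00 dB offset normalizes the curve to 0 dB at 1 kHz.
    return 20 * math.log10(num / den) + 2.00

# Print the weighting at a few frequencies of interest.
for f in (100, 1000, 3500, 15000):
    print(f, round(a_weighting_db(f), 1))
```

A weighting like this could be applied to the spectrum of a difference signal before summing its energy, but it would still ignore masking by the music itself.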

This post has been edited by mzil: Jun 21 2012, 08:46
Kohlrabi
post Jun 21 2012, 09:16
Post #2





Group: Super Moderator
Posts: 1150
Joined: 12-March 05
From: Kiel, Germany
Member No.: 20561



QUOTE (mzil @ Jun 21 2012, 09:40) *
I'm of the mind that a SNR of 130 dB could be described as "better" than 120 dB, even though I'm confident human beings would be unable to tell.

A single metric, which might not even be correlated well, if at all, with perceived audio quality, is never a good thing. Look at video codec development, where some people heavily rely (relied?) on the PSNR metric. Encoders mainly optimized for that metric can fail spectacularly in real-world perception tests. Noise level is certainly not the only determining factor, not in audio and not in video.

QUOTE (mzil @ Jun 21 2012, 09:40) *
Granted, some sorts of distortions and artifacts are more annoying (if discernible) than others, and weighting this according to where the ear is most sensitive (say around 3.5 kHz) makes sense to me, but I still think an automated system with absolute numbers we can compare, instead of "Go ABX your entire 500 GB music collection yourself to see what the answer is", would help this person out.

While I try to adhere to the ToS here at this forum, I can accept that a metric with a high correlation with perceived audio quality could be useful, but so far I haven't seen one in the audio world. Also, ABX tests are not as involved as you suggest: just take a small sample of the music you listen to, encode it, and conduct the tests. If you don't have enough time, even a single song might be enough. There is also no reason to go OCD over your codec choice. If you find a piece of music which doesn't encode well, just turn up the quality dial a bit on your encoder and see if it helps. It is also no problem to just choose an "overcompensating" higher encoding setting overall if you don't want to test a lot, but keep in mind that this is wasteful and not a real solution.

Transparency of lossy encoder settings is inherently subjective; there is no objectively perfect setting for every person. You can never be sure that a certain encode is transparent to everyone. That's the reason why we tell everyone to just ABX themselves. And that's also why selling lossy audio is a bad concept to begin with.

This post has been edited by Kohlrabi: Jun 21 2012, 09:17


--------------------
It's only audiophile if it's inconvenient.
Ouroboros
post Jun 21 2012, 09:20
Post #3





Group: Members
Posts: 291
Joined: 30-May 08
From: UK
Member No.: 53927



@mzil: A set of numbers wouldn't help anyone out, it would simply perpetuate the myth that measuring is a substitute for listening. Also, no-one is suggesting that anyone should ABX a 500GB music collection, and that sort of baseless hyperbole doesn't help the discussion.

The flaws in measuring numerical differences between compressed and uncompressed music as a mechanism for evaluating the quality of lossy music compression have been discussed repeatedly. Try this topic as an example.
db1989
post Jun 21 2012, 11:05
Post #4





Group: Super Moderator
Posts: 5275
Joined: 23-June 06
Member No.: 32180



QUOTE (mzil @ Jun 21 2012, 08:40) *
[…] I still think an automated system with absolute numbers we can compare, instead of "Go ABX your entire 500 GB music collection yourself to see what the answer is", would help this person out.
Kohlrabi is right. There’s no point designing some catch-all algorithm to evaluate perceptual quality when there’s no way to determine a priori what is perceptually transparent to any one user.

QUOTE
I'm of the mind that a SNR of 130 dB could be described as "better" than 120 dB, even though I'm confident human beings would be unable to tell.
This is rather irrelevant, given that SNR is much more easily quantified than is the probability of perceptual transparency.

QUOTE
If my term "null test" isn't understood by all, I can elaborate, but basically it means playing the two songs simultaneously and combining them out of phase with each other. The distortion and artifacts introduced by the codec are then exposed in their pure state, without the music to mask them, and the level of these distortions can be given a value relative to the level of the actual song from moment to moment. Do we have such a test? With amplifiers, I believe it was David Hafler who first thought of this in the '70s.
Zooming in:
QUOTE
without the music to mask them
…which is largely the purpose of lossy encoding.

Again, if the listener cannot tell a difference, it doesn’t matter what the difference signal or any other derived metric says about the ‘quality’. As Kohlrabi said, they can bump down to as low a setting as they can stand and never have to worry about it. That’s the aim of lossy formats.

This post has been edited by db1989: Jun 21 2012, 11:07
greynol
post Jun 21 2012, 15:04
Post #5





Group: Super Moderator
Posts: 10348
Joined: 1-April 04
From: San Francisco
Member No.: 13167



QUOTE (mzil @ Jun 21 2012, 00:40) *
I'm of the mind that a SNR of 130 dB could be described as "better" than 120 dB, even though I'm confident human beings would be unable to tell.
You clearly aren't talking about lossy encoding.

QUOTE
If my term "null test" isn't understood by all, I can elaborate, but basically it means playing the two songs simultaneously and combining them out of phase with each other.
Such tests are easy to conduct. Unfortunately they are completely useless.

QUOTE
The distortion and artifacts introduced by the codec are then exposed in their pure state, without the music to mask them
...which is the entire point of a perceptual coder!!!!!

QUOTE
that doesn't mean we are no longer allowed to use the term "better"
TOS #8 is quite clear about what one must provide as evidence in order to be allowed to use the term "better" as it relates to sound quality. Difference signals do not qualify!

QUOTE
instead of "Go ABX your entire 500 GB music collection yourself to see what the answer is" would help this person out.
What part of "find music with transients" equates to "ABX your whole collection?"

QUOTE (db1989 @ Jun 21 2012, 03:05) *
There’s no point designing some catch-all algorithm to evaluate perceptual quality when there’s no way to determine a priori what is perceptually transparent to any one user.
If there was such a test and it worked then it would completely revolutionize lossy encoding. Until that day comes, sound quality of perceptual encoding must be judged by the ears and nothing more. Graphs, SNR, null tests and the color of your car are irrelevant metrics.

This post has been edited by greynol: Jun 21 2012, 15:47


--------------------
Breath is found in plots and DR figures.
db1989
post Jun 21 2012, 15:52
Post #6





Group: Super Moderator
Posts: 5275
Joined: 23-June 06
Member No.: 32180



Of course, I should have said "trying to design and probably failing" rather than “designing”. ;)
mzil
post Jun 21 2012, 18:14
Post #7





Group: Members
Posts: 735
Joined: 5-August 07
Member No.: 45913



QUOTE (Kohlrabi @ Jun 21 2012, 04:16) *
QUOTE (mzil @ Jun 21 2012, 09:40) *
I'm of the mind that a SNR of 130 dB could be described as "better" than 120 dB, even though I'm confident human beings would be unable to tell.

A single metric, which might not even be correlated well, if at all, with perceived audio quality, is never a good thing. Look at video codec development, where some people heavily rely (relied?) on the PSNR metric. Encoders mainly optimized for that metric can fail spectacularly in real-world perception tests. Noise level is certainly not the only determining factor, not in audio and not in video.

[emphasis mine]

QUOTE
This is rather irrelevant, given that SNR is much more easily quantified than is the probability of perceptual transparency.
- db1989

and
QUOTE
You clearly aren't talking about lossy encoding.
- greynol.

Correct, I'm not.

Yikes! I clearly shouldn't have given an analogy that had anything to do with audio perception and should have just used the 0-60 mph car acceleration analogy, since my choice seems to have confused several people here. My bad. I was attempting to give examples where instrumentation easily exceeds the limits of human perception, that's all. SNR as it relates to the topic at hand was not my point. Oops. It was a terrible choice on my part, since I see now how people would have thought I really was talking about SNR in particular. It was just a fluky coincidence.
---

TOS #8 is fantastic. I love it. However, it doesn't address the point I was attempting to discuss, because it relates to things which are being argued to be perceptible to humans, hence its use of the term "subjective sound quality":

"8. All members that put forth a statement concerning subjective sound quality, must -- to the best of their ability -- provide objective support for their claims. Acceptable means of support are double blind listening tests (ABX or ABC/HR) demonstrating that the member can discern a difference perceptually, together with a test sample to allow others to reproduce their findings. Graphs, non-blind listening tests, waveform difference comparisons, and so on, are not acceptable means of providing support."

I never argued that humans can tell a difference between 9.00 seconds and 9.01 seconds. However, in our endeavour to determine more precisely what is and what isn't perceptible, I think having the instrumentation to record such subtle differences (which are beyond what humans can detect) has value and is worthy of discussion. I will do my best to refrain from describing one figure as "better" than another, because what "better" means seems to be a sticking point. I never meant to imply that "better" always means "perceptible to humans," yet it seems to be taken that way by some here, so I will stop using the term.

QUOTE
What part of "find music with transients" equates to "ABX your whole collection?"

Rather than listen to any of my own collection at all, I'd be more inclined to ABX "killer samples" that have been selected from a vastly larger library than I have. Also transients in particular aren't my main concern, but I can't speak for the OP. I can hear the pre-echo problem in some killer samples of electronic music with sharp click sounds but can't say I've ever experienced the same problem with the music I actually listen to.

[I asked in another thread if there was a name for the kind of distortion artifact I'm more concerned with, but wasn't given a more precise answer than "undercoded", if I recall correctly.]

This post has been edited by mzil: Jun 21 2012, 18:20
mzil
post Jun 21 2012, 18:29
Post #8





Group: Members
Posts: 735
Joined: 5-August 07
Member No.: 45913



QUOTE (Kohlrabi @ Jun 21 2012, 04:16) *
A single metric, which might not even be correlated well, if at all, with perceived audio quality, is never a good thing. Look at video codec development, where some people heavily rely (relied?) on the PSNR metric. Encoders mainly optimized for that metric can fail spectacularly in real-world perception tests. ...

Yes, they need work and refinement. However, should we just give up and rely on human testing forever? Or should we try to determine what things the humans are keying on and then learn how to quantify those things using instrumentation? I vote for the latter.

QUOTE
...I can accept the possibility of a metric with high correlation with perceived audio quality to be useful, but so far I haven't seen one in the audio world.
Maybe some day, I hope.

edit to add: I also don't think we need to limit it to a single metric. We could have several working together at once.

This post has been edited by mzil: Jun 21 2012, 18:35
mzil
post Jun 21 2012, 18:43
Post #9





Group: Members
Posts: 735
Joined: 5-August 07
Member No.: 45913



QUOTE (greynol @ Jun 21 2012, 10:04) *
QUOTE
If my term "null test" isn't understood by all, I can elaborate, but basically it means to play the two songs simultaneously and combining them but out of phase with each other.
Such tests are easy to conduct. Unfortunately they are completely useless.

That's great news. [That they seem to exist :)] I would be interested to know if there is software which will allow me to do this null test on my own. Might you recommend some for me, a newb, to try out? Thanks.

This post has been edited by mzil: Jun 21 2012, 18:47
greynol
post Jun 21 2012, 18:44
Post #10





Group: Super Moderator
Posts: 10348
Joined: 1-April 04
From: San Francisco
Member No.: 13167



QUOTE (mzil @ Jun 21 2012, 10:14) *
Rather than listen to any of my own collection at all, I'd be more inclined to ABX "killer samples" that have been selected from a vastly larger library than I have.

I couldn't care less about a killer sample if it doesn't occur in my library, personally.

QUOTE (mzil @ Jun 21 2012, 10:14) *
I can hear the pre-echo problem in some killer samples of electronic music with sharp click sounds but can't say I've ever experienced the same problem with the music I actually listen to.

hi-hat, snare drum, acoustic guitar
...if I had a penchant for harpsichord (I don't)

QUOTE (mzil @ Jun 21 2012, 10:29) *
Or should we try to determine what things the humans are keying on and then learn how to quantify those things using instrumentation?

...as if this hasn't been done over the evolution of perceptual coding. :rolleyes:

QUOTE (mzil @ Jun 21 2012, 10:43) *
I would be interested to know if there is software which will allow me to do this null test on my own. Might you recommend some for me, a newb, to try out?

Adobe Audition -> mix paste + invert

Knock your socks off!

This post has been edited by greynol: Jun 21 2012, 18:56


--------------------
Breath is found in plots and DR figures.
Apesbrain
post Jun 21 2012, 18:55
Post #11





Group: Members
Posts: 500
Joined: 3-January 04
From: East Coast, USA
Member No.: 10915



QUOTE (mzil @ Jun 21 2012, 13:43) *
I would be interested to know if there is software which will allow me to do this null test on my own. Might you recommend some for me, a newb, to try out? Thanks.

Audacity is free and can easily do this. Open the first file, then import the second. Invert one of them and push play.
Others have already commented on the value of this exercise. What exactly do you plan to compare? AAC to MP3 would be particularly meaningless. FLAC to AAC then FLAC to MP3 is at least curiously interesting. Let us know what you find out.
greynol
post Jun 21 2012, 18:59
Post #12





Group: Super Moderator
Posts: 10348
Joined: 1-April 04
From: San Francisco
Member No.: 13167



QUOTE (Apesbrain @ Jun 21 2012, 10:55) *
FLAC to AAC then FLAC to MP3 is at least curiously interesting.

If by interesting you mean misleading, then sure. :)

One word: masking. I'd put that in 1000 pt. font if I thought it would make a difference.

This post has been edited by greynol: Jun 21 2012, 18:59


--------------------
Breath is found in plots and DR figures.
mzil
post Jun 21 2012, 19:10
Post #13





Group: Members
Posts: 735
Joined: 5-August 07
Member No.: 45913



QUOTE (greynol @ Jun 21 2012, 13:44) *
Adobe Audition/Cool Edit Pro:
mix paste + invert

Great. I will try a trial version of Cool Edit Pro when I get a chance!
---

QUOTE
The flaws in measuring numerical differences between compressed and uncompressed music as a mechanism for evaluating the quality of lossy music compression have been discussed repeatedly. Try this topic as an example.


Thanks, Ouroboros. That thread looks like pay dirt. I will read it.

In my attempt to search for the topic I used the term "null test" only because that's what Hafler called it back in the day, but is there some better terminology I should be using for lossy codec testing using this method?

This post has been edited by mzil: Jun 21 2012, 19:24
Apesbrain
post Jun 21 2012, 19:15
Post #14





Group: Members
Posts: 500
Joined: 3-January 04
From: East Coast, USA
Member No.: 10915



QUOTE (greynol @ Jun 21 2012, 13:59) *
If by interesting you mean misleading, then sure. :)

Ha! I didn't say that I was going to waste my time doing this. Anything that gets the OP to do some work to answer his own question to his own satisfaction seems like a step forward.
greynol
post Jun 21 2012, 19:19
Post #15





Group: Super Moderator
Posts: 10348
Joined: 1-April 04
From: San Francisco
Member No.: 13167



QUOTE (mzil @ Jun 21 2012, 11:10) *
I will read it.

My condolences.

QUOTE (mzil @ Jun 21 2012, 11:10) *
In my attempt to search for the topic I used the term "null test" only because that's what Hafler called it back in the day, but is there some better terminology I should be using for lossy codec testing using this method?

"Difference signal" or "error signal" though they are not necessarily better worse.

Please don't dignify it by referring to it as a "method." It is not a method.

This post has been edited by greynol: Jun 21 2012, 19:57


--------------------
Breath is found in plots and DR figures.
mzil
post Jun 21 2012, 20:01
Post #16





Group: Members
Posts: 735
Joined: 5-August 07
Member No.: 45913



QUOTE (greynol @ Jun 21 2012, 14:19) *
QUOTE (mzil @ Jun 21 2012, 11:10) *
In my attempt to search for the topic I used the term "null test" only because that's what Hafler called it back in the day, but is there some better terminology I should be using for lossy codec testing using this method?

"Difference signal," though it is not necessarily better worse.

Please don't refer to it as a "method." It is not a method.


It is a method (or "way") to determine the difference between a lossy compressed and an uncompressed audio file. No inference that the difference found proves the two original files are necessarily audibly different to humans should be made, nor was one implied by me.

This post has been edited by mzil: Jun 21 2012, 20:58
rick.hughes
post Jun 21 2012, 20:28
Post #17





Group: Members
Posts: 131
Joined: 16-February 07
Member No.: 40679



QUOTE (mzil @ Jun 21 2012, 15:01) *
It is a method (or "way") to determine the difference between a lossy compressed and an uncompressed audio file. No inference that the difference found is necessarily audible to humans should be made, nor was one implied by me.

The difference if listened to by itself may very well be audible, but this is not useful.

Lossy encoding works by taking advantage of masking. Discovering what was masked is trivial. Any encoding that retains more of the masked audio is not doing a better job.
greynol
post Jun 21 2012, 20:48
Post #18





Group: Super Moderator
Posts: 10348
Joined: 1-April 04
From: San Francisco
Member No.: 13167



Beautifully put!


--------------------
Breath is found in plots and DR figures.
mzil
post Jun 21 2012, 21:24
Post #19





Group: Members
Posts: 735
Joined: 5-August 07
Member No.: 45913



QUOTE (rick.hughes @ Jun 21 2012, 15:28) *
The difference if listened to by itself may very well be audible, but this is not useful...


My wording was poor/sloppy, sorry, so I have edited the post that you just quoted and therefore won't comment on the above.


QUOTE
Lossy encoding works by taking advantage of masking. Discovering what was masked is trivial.
The difference file generated by the null test (the one I'm talking about) is not exclusively the masked material that was discarded during encoding. It also contains the added artifacts/distortions, such as pre-echo, which have been inadvertently added to the lossy compressed version and never existed in the original file.

As for it being "useful" to have this at hand, that would depend on what one wants it for. If you are suggesting that I mean it in some way "proves" an audible difference between the original and the lossy compressed version of the audio sample, you'd be mistaken.
drewfx
post Jun 22 2012, 01:02
Post #20





Group: Members
Posts: 98
Joined: 17-October 09
Member No.: 74078



QUOTE (mzil @ Jun 21 2012, 16:24) *
The difference file generated by the null test (the one I'm talking about) is not exclusively the masked material that was discarded during encoding. It also contains the added artifacts/distortions, such as pre-echo, which have been inadvertently added to the lossy compressed version and never existed in the original file.

Are you sure you're talking about two different things here? I suspect you might find that there's more than a little overlap between "masked material that was discarded" and "added artifacts/distortion".