Observing the loss, How good a criterion for quality measurement
atici
post Jul 12 2003, 20:35
Post #1





Group: Members (Donating)
Posts: 1180
Joined: 21-February 02
From: Chicago
Member No.: 1367



OK, after another discussion prodded me into this, I decided to give it a try again.

That is, I calculated the pure loss with the mp3 and mpc encoders in CoolEdit (Mix Paste, both channels inverted & Overlap) and listened to the pure loss in each case with different quality settings. I observed that with the mp3 standard preset, I can still figure out the melody because I can still hear some instruments (probably because of the lowpass filter). With MPC I hear a swoosh sound intensifying in some parts of the sample, especially when the original sample's volume is high. The average volume of the loss decreases as I increase the quality setting.
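For anyone who wants to reproduce the subtraction outside CoolEdit, here is a minimal sketch (all names hypothetical; real files would first need decoding to WAV and alignment for encoder delay):

```python
import math

def difference_signal(original, decoded):
    # Sample-wise Copy - Original: the same thing as inverting one
    # file and mixing it over the other in CoolEdit.
    assert len(original) == len(decoded), "align for codec delay first"
    return [d - o for o, d in zip(original, decoded)]

def rms(samples):
    # Root-mean-square level, a rough proxy for the "average volume".
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# Toy stand-in for an encode/decode cycle: a 440 Hz sine plus a bit
# of quiet high-frequency noise pretending to be coding noise.
sr = 44100
original = [math.sin(2 * math.pi * 440 * n / sr) for n in range(4410)]
decoded = [s + 0.001 * math.sin(2 * math.pi * 7000 * n / sr)
           for n, s in enumerate(original)]
loss = difference_signal(original, decoded)
print(rms(loss), rms(original))  # the loss is far quieter here
```

With real codecs the loss is of course not a clean tone, but the same RMS comparison is what "average volume of the loss" amounts to.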

Could you tell me your reasons why this is not a good way of objectively evaluating how successful a lossy codec is? I think it's nice because the difference is not masked by the rest of the sample (which is usually higher in volume and dominates). But I can also imagine that with this method one cannot detect stereo separation artifacts, and that even when the pure loss, listened to as a sample on its own, sounds tolerable, the actual encoding result might have noticeable differences and be non-transparent. But isn't this still a reasonable method to supplement the results about which encoder is more successful? Can't we conclude anything, objectively or subjectively, by observing the pure loss? It sounds to me like the discarded information is a more tolerable loss with MPC at q4 than with LAME 3.93.1 --preset standard mp3.

This post has been edited by atici: Jul 12 2003, 20:51


--------------------
The object of mankind lies in its highest individuals.
One must have chaos in oneself to be able to give birth to a dancing star.
upNorth
post Jul 12 2003, 20:59
Post #2





Group: Members
Posts: 1099
Joined: 18-March 03
From: Oslo, Norway
Member No.: 5569



QUOTE (atici @ Jul 12 2003, 09:35 PM)
I think it's nice because the difference is not masked by the rest of the sample (which is usually higher in volume).

My understanding of the problem:
Masking is the essence of psychoacoustics. What you do, as I see it, is remove the very foundation the lossy codec is built upon. It is essential to have the higher-volume sounds present to do the masking; they are meant to hide the introduced noise.

Btw: I guess better and more elaborate explanations will be given by others. I just want to test my understanding a little, to see if it gets picked to pieces... smile.gif
Xerophase
post Jul 12 2003, 21:20
Post #3





Group: Members
Posts: 38
Joined: 12-July 03
Member No.: 7722



QUOTE (atici @ Jul 12 2003, 02:35 PM)
OK, after another discussion prodded me into this, I decided to give it a try again.

That is, I calculated the pure loss with the mp3 and mpc encoders in CoolEdit (Mix Paste, both channels inverted & Overlap) and listened to the pure loss in each case with different quality settings.
...
Could you tell me your reasons why this is not a good way of objectively evaluating how successful a lossy codec is?

I don't have a lot to add immediately (although I plan to research this, as it really interests me), but I wanted to applaud your thinking, atici. Hopefully some of the members will come in with some constructive brainstorming. It would be helpful if even some part of this process could eventually be used to analyze a codec, because theoretically the loss is something you can't obfuscate through hardware, human perception, etc.

My only concern is that, based on the way lossy codecs work, we couldn't always rely on this data, because we wouldn't be taking into account the "ear tricking" factor and how well the codec is doing that despite what it's discarding, along with the other acoustical properties you mentioned.
atici
post Jul 12 2003, 21:25
Post #4





Group: Members (Donating)
Posts: 1180
Joined: 21-February 02
From: Chicago
Member No.: 1367



Once there was a program called EAQUAL. I guess it's still around as warez; I don't know whether it's being developed anymore. It did provide an objective, algorithmic quality-loss measure. Although it was a neat idea, it wasn't very useful most of the time. For instance, for some samples it thought the Ogg-encoded file was more accurate than the original laugh.gif

I don't know how this discussion reminded me of that. I guess just because, like this thread, it suggested another way of evaluating how successful a lossy codec is.

This post has been edited by atici: Jul 12 2003, 21:26


2Bdecided
post Jul 12 2003, 22:21
Post #5


ReplayGain developer


Group: Developer
Posts: 5059
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



Haven't we been here very recently, in a joint HA and Creative forum thread?

The method tells you almost nothing about how good the encoded version sounds.

Someone find the thread - I can't bear to type it all again, and I have to go and book a holiday!

D.
tigre
post Jul 12 2003, 22:56
Post #6


Moderator


Group: Members
Posts: 1434
Joined: 26-November 02
Member No.: 3890



QUOTE
Can't we conclude anything objectively or subjectively by observing the pure loss?


1. It's not as easy as it seems, methinks.

Example (extreme case):
"Original" = sine wave, frequency = <x> Hz, amplitude = <y>
"Lossy copy" = sine wave, frequency = <x> Hz, amplitude = <y>, 180° phase shift
Wave subtraction result = sine wave, frequency = <x> Hz, amplitude = 2*<y>

"Original" and "Lossy copy" will sound identical, while the result of the subtraction will be even louder than the original. blink.gif


2. Using this method, what would you conclude?
- "Loss" consists of pure noise, no representation of the original signal noticeable, low volume = good lossy compression?
- The louder the "Loss" (while Original vs. Copy is not ABXable) the better (-> the psychoacoustic model successfully adds loads of noise / cuts sounds without it being noticeable)?
- ... ?


Maybe something similar could work to help detect some kinds of artifacts:

O = original, L = lossy copy, D = Difference, E = "Exaggerated lossy copy"

D = L - O (wave substraction)

E(x) = O + D*x (D*x: amplify D)

For x > 1 (e.g. 1.5 or 2), some sorts of "loss" or problems caused by encoding could be more noticeable, so trying to ABX E(x) vs. O could be used to find artifacts more easily, or as a kind of ABX training (lowering x step by step -> 1).
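In code, the exaggeration idea might look like this (hypothetical names, samples as plain lists):

```python
def exaggerate(original, lossy, x):
    # tigre's E(x) = O + D*x with D = L - O.
    # x = 1 reproduces the lossy copy exactly; x > 1 amplifies
    # whatever the encoder changed.
    return [o + x * (l - o) for o, l in zip(original, lossy)]

# Hypothetical toy samples: the "encoder" shaved a bit off the peak.
orig = [0.0, 0.5, 1.0, 0.5, 0.0]
lossy = [0.0, 0.45, 0.9, 0.55, 0.0]
boosted = exaggerate(orig, lossy, 2.0)
# The coding error at the peak grows from -0.1 to -0.2 (1.0 -> 0.8).
```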


--------------------
Let's suppose that rain washes out a picnic. Who is feeling negative? The rain? Or YOU? What's causing the negative feeling? The rain or your reaction? - Anthony De Mello
atici
post Jul 12 2003, 23:27
Post #7





Group: Members (Donating)
Posts: 1180
Joined: 21-February 02
From: Chicago
Member No.: 1367



@tigre:

For point 1), we may assume the lossy encoder does not shift the phase of the original. And I don't think it does.

For point 2), I think we may conclude that if the volume is low, the lossy encoder does a better job (as shown by the fact that increasing the quality setting lowers the loss volume). Also, can't we somehow conclude that if the "loss" consists of random noise in which the original signal is not noticeable, then the lossy encoder is doing a better job?

I think the lossy encoder's effect and the addition of dither noise are in that sense somewhat similar. Just as there are dither algorithms that introduce higher-energy but less audible noise, there may be lossy algorithms that introduce a higher-volume "loss" but have a more transparent effect. But in general, the lower the noise, the better.

This post has been edited by atici: Jul 12 2003, 23:52


Jebus
post Jul 12 2003, 23:30
Post #8





Group: Developer
Posts: 1293
Joined: 17-March 03
From: Calgary, AB
Member No.: 5541



Look,

The more bits you throw away, the more information will be in the diff file. So the diff file of an MPC encode averaging 160 kbps will have more information in it than that of a LAME file at, say, 190 kbps. This is NOT to say, though, that the MPC is lower quality; it actually means that MPC is simply smarter at throwing away stuff that gets masked anyhow.

If you remove the masking effects, you are in essence ignoring the goal of psychoacoustic compression in the first place: to remove information that would otherwise be masked anyhow.
atici
post Jul 12 2003, 23:36
Post #9





Group: Members (Donating)
Posts: 1180
Joined: 21-February 02
From: Chicago
Member No.: 1367



Jebus, I guess you missed my point, because I agree with what you say in the first paragraph. Don't think of it in terms of thrown-away bits and the difference in bitrates, but as the noise introduced when each lossy encoder is applied. Just consider the WAV files only (the original and decode(encode(original))), because that's the effect of the lossy encoder.

I think that when I listen to the loss file and it sounds like the original, then I'd say a good amount of information about the original has been lost.

This post has been edited by atici: Jul 12 2003, 23:43


lucpes
post Jul 12 2003, 23:50
Post #10





Group: Members
Posts: 517
Joined: 9-October 01
Member No.: 254



Blah... take a wave file. Apply WaveGain or normalize to reduce it by 12 dB. Invert and mix with the original. I'd have to say that this was a very lossy process tongue.gif - it sounds just like the original...

Anyway, the best lossy codec is the one that removes the biggest amount of 'information' but with good results: non-ABXable, i.e. no audible differences between the encoded file and the original.

This post has been edited by lucpes: Jul 12 2003, 23:51
guruboolez
post Jul 12 2003, 23:52
Post #11





Group: Members (Donating)
Posts: 3474
Joined: 7-November 01
From: Strasbourg (France)
Member No.: 420



QUOTE (atici @ Jul 12 2003, 08:35 PM)
(...) I observed that with the mp3 standard preset, I can still figure out the melody because I can still hear some instruments (probably because of the lowpass filter). With MPC I hear a swoosh sound intensifying in some parts of the sample (...) It sounds to me like the discarded information is a more tolerable loss with MPC at q4 than with LAME 3.93.1 --preset standard mp3.

Just a question: have you decoded your mp3 with LAME first, in order to remove the additional samples? If not, your test is biased: the FhG decoding engine will keep the 'gap'.


I had fun with this technique some time ago. It's interesting to measure the real loss of the encoding process. Nevertheless, you can't evaluate the relative quality of two encodings this way. The strongest difference (= noise) isn't necessarily the most ABXable file. I tried to compare MPC and Vorbis this way; MPC seemed to be the more degraded one, but after a careful blind listening test, I only heard a difference (hiss) in the Vorbis file.

Note that this technique is useful for detecting artifacts similar to the erhu effect.
atici
post Jul 12 2003, 23:58
Post #12





Group: Members (Donating)
Posts: 1180
Joined: 21-February 02
From: Chicago
Member No.: 1367



QUOTE
Just a question: have you decoded your mp3 with LAME first, in order to remove the additional samples? If not, your test is biased: the FhG decoding engine will keep the 'gap'.


Yes, I did. I used the same version of LAME to decode, and it notified me that the encoder gap was taken into account.

QUOTE
The stronger difference (= noise) isn't necessary the most ABXable file.


I agree with that. But there is still a link between the quality lost and the "loss" file, just like the noise introduced in the dithering process, as I was trying to show with my analogy. Some dither algorithms, at the expense of being more audible, aim for the lowest noise volume (like Waves L2 type 2). In general the lower the noise the better, but of course the difference might be audible, and that's something we want from neither lossy encoders nor dithering algorithms.

I was just trying to suggest a supplement to the comparison methods used, not to offer a panacea.

This post has been edited by atici: Jul 13 2003, 00:05


ErikS
post Jul 13 2003, 02:08
Post #13





Group: Members
Posts: 757
Joined: 8-October 01
Member No.: 247



QUOTE (atici @ Jul 12 2003, 11:27 PM)
I think we may conclude that if the volume [of the difference file] is low, the lossy encoder does a better job (as shown by the fact that increasing the quality setting lowers the loss volume). Also, can't we somehow conclude that if the "loss" consists of random noise in which the original signal is not noticeable, then the lossy encoder is doing a better job?

I would say these conclusions are invalid.

1. You have to show the implication both ways. Showing that A implies B doesn't mean that B implies A until you show that too somehow. And I'd say it's very difficult to do that, because you would have to start with the difference file, then find a pair of one original and one encoded file that matches this difference, and then check the quality setting after that.

2. How do you draw the second conclusion? The same way as the first?
Pio2001
post Jul 13 2003, 03:13
Post #14


Moderator


Group: Super Moderator
Posts: 3936
Joined: 29-September 01
Member No.: 73



Personally, I would be less worried by a difference file with audible music. That way I can imagine that the difference is always masked by the music, since it's loud when the music is loud, quiet when the music is quiet, high-pitched when the music is high-pitched, etc. A noisy difference file would be more worrisome, as music can't always mask noise.

In fact I'm worried by none, because I trust the masking effect.

QUOTE (2Bdecided @ Jul 13 2003, 12:21 AM)
Someone find the thread - I can't bear to type it all again, and I have to go and book a holiday!

There's another one in the FAQ
tangent
post Jul 13 2003, 07:34
Post #15





Group: Members
Posts: 674
Joined: 29-September 01
Member No.: 63



Look around for the "masking effect"; I'm tired of explaining it again.
It is possible to test the difference data to compare codecs. All you really have to do is build a frequency graph of the difference and compare it to the masking curve built from the original data. Anything in the difference data that rises above the masking curve is potentially audible noise. I have no idea how EAQUAL or Earguy's objective comparator works, but they may work similarly.
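A very rough sketch of that idea, with a toy per-block DFT and a crude fixed-ratio threshold standing in for a real masking curve (a proper psychoacoustic model with spreading functions, tonality estimation and the ATH is far more involved; all names here are hypothetical):

```python
import cmath, math

def spectrum(block):
    # Naive DFT magnitudes (a real implementation would use an FFT).
    N = len(block)
    return [abs(sum(block[n] * cmath.exp(-2j * math.pi * k * n / N)
                    for n in range(N)))
            for k in range(N // 2)]

def flagged_bins(orig_block, diff_block, margin=0.1):
    # Crude stand-in for a masking curve: flag any difference bin
    # whose magnitude exceeds `margin` times the original's magnitude
    # in the same bin of the same block.
    o, d = spectrum(orig_block), spectrum(diff_block)
    return [k for k in range(len(o)) if d[k] > margin * o[k] + 1e-9]

N = 64
orig = [math.sin(2 * math.pi * 4 * n / N) for n in range(N)]   # tone in bin 4
diff = [0.05 * math.sin(2 * math.pi * 20 * n / N) for n in range(N)]
print(flagged_bins(orig, diff))  # -> [20]: noise far from the masking tone
```

Doing this block by block, rather than over the whole file, is what preserves the temporal information.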
Pio2001
post Jul 13 2003, 11:21
Post #16


Moderator


Group: Super Moderator
Posts: 3936
Joined: 29-September 01
Member No.: 73



But... masking effects occur when two frequencies are played simultaneously. Building a frequency graph of a whole file gives no temporal information. If the original has -5 dB at 1000 Hz and the noise -70 dB at 999 Hz, how do you know whether they occur at the same time, and are thus masked, or whether the noise occurs during complete silence while the reference tone is a minute away?
And again, not even speaking of temporal masking, the ATH, etc., won't this result in using a very bad "codec" as the reference for perfect quality?
tangent
post Jul 13 2003, 13:24
Post #17





Group: Members
Posts: 674
Joined: 29-September 01
Member No.: 63



Obviously you do the frequency analysis over time, block by block.
2Bdecided
post Jul 14 2003, 10:08
Post #18


ReplayGain developer


Group: Developer
Posts: 5059
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



It's not the same question, but it's close enough...
http://www.hydrogenaudio.org/forums/index....opic=10522&st=0


This is from my PhD thesis:

Figure 2.6: Input Output difference analysis

In a digital system, providing any delay due to the device is known and corrected for, the input signal can be subtracted exactly from the output signal, as shown in Figure 2.6. The residue consists of any noise and distortion added by the device. This technique may be used to determine the noise that is added by an audio codec in the presence of an input signal. If a test signal is applied, standard noise measuring techniques (e.g. [ITU-R BS.468-4, 1986] weighting followed by RMS averaging) may be used to calculate a single noise measurement. Alternatively, a Signal to Noise like Ratio may be computed, where the noise level is measured in the presence of the signal, rather than with the signal absent. This noise measurement may be used in equation (2-1), in place of VN. The measurement is objective and repeatable.

Unfortunately, this measurement is almost useless for audio quality assessment. It is useless because the measured value does not correlate with the perceived sound quality of the audio codec. In fact, the noise measurement gives no indication of the perceived noise level.

The problem is that the noise measurement is quantifying inaudible noise. An audio codec is designed to add noise. The intention is to add noise within spectral and temporal regions of the signal where it cannot be perceived by a human listener. Subtracting the input signal from the output of the codec will expose this noise, and the noise measurement will quantify it. If the inaudible noise could somehow be removed from the measurement, then the resulting quantity would match human perception more accurately, since it would reflect what is audible. This task is complex, and many other approaches have been suggested which avoid this task. Some of these approaches, and the reasons why they are inappropriate, are discussed below.

A measurement of coding noise will include both audible and inaudible noise. Many analyses assume that all codecs will add equal amounts of inaudible noise. If this is true, then the codec that adds the most noise will sound worst, since it must add the most audible noise. However, a good codec may add a lot of noise, but all the noise may be masked. This codec will cause no audible degradation of the signal. Conversely, a poor codec may add only a little noise, but if the noise is above the masking threshold, then the codec will sound poor to a human listener. Hence, this approach is flawed, because the basic assumption is incorrect.

Many codec analyses published on the World Wide Web include plots of the long-term spectrum of the signal and coding noise. This approach assumes that where the coding noise lies below the signal spectrum, it will be inaudible, and where the noise is above the signal spectrum, it will be audible. Unfortunately, these assumptions are false. Noise above the signal spectrum may be masked, because masking extends upwards in the frequency domain. Noise below the signal spectrum may be audible, because the spectrum must be calculated over a finite time (ranges from 1024 samples to three minutes have been encountered). Hence, the signal that apparently masks the codec noise may not occur at the same time as the noise itself. This is especially true for sharp attacks, where many encoders generate audible pre-echo before the attack. This pre-echo is below the spectral level of the attack, so appears "masked" using this mistaken analysis method.

The problem with all these techniques is that they side-step the basic problem: it is necessary to determine which noise components are audible, and which are inaudible, before the audible effect of the codec upon the signal may be quantified.
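The basic measurement the excerpt starts from (subtract the delay-corrected input from the output, then form an SNR-like ratio with the signal present) can be sketched as follows; the ITU-R BS.468 weighting and delay correction are omitted, and the signals are hypothetical toy data:

```python
import math

def snr_like_db(input_, output):
    # The residue is the noise the "device" added in the presence of
    # the signal.  Unweighted in this sketch (no BS.468 filter), so it
    # quantifies inaudible noise too -- which is exactly why the thesis
    # calls the number almost useless for perceived quality.
    residue = [y - x for x, y in zip(input_, output)]
    p_sig = sum(x * x for x in input_)
    p_noise = sum(r * r for r in residue)
    return 10.0 * math.log10(p_sig / p_noise)

sig = [math.sin(2 * math.pi * n / 100) for n in range(1000)]
# Pretend codec output: the signal plus small deterministic "noise".
out = [s + 0.01 * ((n * 37 % 19) / 19 - 0.5) for n, s in enumerate(sig)]
print(snr_like_db(sig, out))  # objective and repeatable, as stated above
```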


http://www.mp3-tech.org/programmer/docs/Ro...nson_thesis.zip

Cheers,
David.
DonP
post Jul 14 2003, 12:38
Post #19





Group: Members (Donating)
Posts: 1471
Joined: 11-February 03
From: Vermont
Member No.: 4955



Here's another issue to chew on...

Even allowing/assuming/accepting that the difference file doesn't tell you the quality, what do you think of using it as a crib in ABXing? That is, using the difference file to identify artifacts which you then go and try to find in an ABX between the original and encoded files, knowing exactly where to look?

Is it that all is fair as long as, in the end, you can identify the encoded file in a blind test, or is it cheating a valid model if you could never pick the encoded file without "looking under the covers" at the diff file?

Should this be a poll?
Vietwoojagig
post Jul 14 2003, 12:46
Post #20





Group: Members
Posts: 247
Joined: 28-November 02
From: Germany, Trier
Member No.: 3916



QUOTE (atici @ Jul 12 2003, 02:27 PM)
@tigre:

For point 1) we may assume the lossy encoder would not shift the phase of the original. And I don't think it does.

I would say it's inevitable that this sometimes happens. How else could you explain clipping?

Let's say you have a given frequency spectrum at a given moment. The lossy encoder removes some of these frequencies. How could the sum of the remaining frequencies be higher than the original if none of them has its phase shifted? And exactly that happens during clipping.
2Bdecided
post Jul 14 2003, 12:54
Post #21


ReplayGain developer


Group: Developer
Posts: 5059
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



QUOTE (DonP @ Jul 14 2003, 11:38 AM)
Here's another issue to chew on..

Even allowing/assuming/accepting that the difference file doesn't tell you the
quality, what do you think of using it as a crib in ABX'ing?  That is, using the
difference file to identify artifacts which you go and try to find in an ABX between
the original and encoded files, knowing exactly where to look?

Is it that all is fair as long as in the end you can identify the encoded file in a blind test, or
is it cheating a valid model if you could never pick the encoded file without "looking under
the covers" at the diff file?

That's not cheating - it's not making something audible that was inaudible. Rather, it's making something noticeable that you hadn't previously noticed.

If you compared the original to the coded version 100 times, the chances are that you could hear the difference eventually, if you had the patience. So listening to the diff signal first just makes it a much quicker process. Maybe.


You are very likely to imagine that you hear the diff signal within the coded signal, once you've learnt it. But if it's pure imagination, ABX will take care of that!

Cheers,
David.
Pio2001
post Jul 14 2003, 14:13
Post #22


Moderator


Group: Super Moderator
Posts: 3936
Joined: 29-September 01
Member No.: 73



QUOTE (Vietwoojagig @ Jul 14 2003, 02:46 PM)
How could the sum of those frequencies be higher than the original, if none of the frequencies has its phase shifted? And exactly that happens during clippings.

It is enough for the sum to be higher at a given instant to introduce clipping. The spectral view only shows you the average level over the analyzed window.
Remove harmonics from a square wave without changing the phase and it clips.
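That square-wave remark is easy to check numerically: the truncated Fourier series of a unit square wave overshoots the original peak (the Gibbs phenomenon), so discarding harmonics without touching any phase already produces samples above the original full-scale level:

```python
import math

def bandlimited_square(t, n_harmonics):
    # Fourier series of a unit square wave, truncated: only the first
    # n_harmonics odd harmonics survive, phases untouched.
    return (4 / math.pi) * sum(math.sin((2 * k + 1) * t) / (2 * k + 1)
                               for k in range(n_harmonics))

peak = max(abs(bandlimited_square(2 * math.pi * i / 20000, 20))
           for i in range(20000))
print(peak)  # about 1.18: above the square wave's own peak of 1.0
```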
tigre
post Jul 14 2003, 14:13
Post #23


Moderator


Group: Members
Posts: 1434
Joined: 26-November 02
Member No.: 3890



QUOTE (2Bdecided @ Jul 14 2003, 03:54 AM)
You are very likely to imagine that you hear the diff signal within the coded signal, once you've learnt it. But if it's pure imagination, ABX will take care of that!

Additionally, you never know whether the diff signal consists of dropped information or of things added to the original. (Diff1 = Original - Copy and Diff2 = Copy - Original sound the same. wink.gif ) Take a sine wave as an example. In the lossy compression step its amplitude is quantized, so the compressed copy will be a little louder or a little quieter than the original, but otherwise the same sine wave. In both cases the same tone (at a much lower volume) will be audible in the diff signal...
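tigre's sine example, spelled out as a sketch (toy numbers; a real quantizer would not scale the whole waveform uniformly, but the point is the same):

```python
import math

sr = 8000
orig = [math.sin(2 * math.pi * 100 * n / sr) for n in range(sr)]
copy = [0.98 * s for s in orig]              # amplitude quantized a bit low
diff1 = [o - c for o, c in zip(orig, copy)]  # Original - Copy
diff2 = [c - o for o, c in zip(orig, copy)]  # Copy - Original

# diff1 is the very same 100 Hz tone at 2 % amplitude, and
# diff2 = -diff1: an inverted copy that sounds identical.
print(max(abs(a + b) for a, b in zip(diff1, diff2)))  # 0.0
```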


ErikS
post Jul 14 2003, 14:33
Post #24





Group: Members
Posts: 757
Joined: 8-October 01
Member No.: 247



This is interesting... Diff1 = -Diff2 by your own definition, so the question comes down to whether you can hear the difference between a waveform and its inverse. My uneducated guess would be that you can't. Other opinions?
Vietwoojagig
post Jul 14 2003, 15:11
Post #25





Group: Members
Posts: 247
Joined: 28-November 02
From: Germany, Trier
Member No.: 3916



QUOTE (ErikS @ Jul 14 2003, 05:33 AM)
This is interesting... Diff1 = -Diff2 by your own definition, so the question comes down to whether you can hear the difference between a waveform and its inverse. My uneducated guess would be that you can't. Other opinions?

That's exactly what tigre is saying. Diff1 is the noise you add, Diff2 is the noise you drop. You can't tell the difference.