Does subtracting MP3 from WAV reveal artifacts?, was: "How True Is This" (TOS #6)
Native_Soulja
post Jul 18 2010, 01:10
Post #1





Group: Members
Posts: 25
Joined: 7-July 10
Member No.: 82108



I was wondering whether the MP3 audio codec leaves behind artifacts in the files it encodes. I am asking because I saw this video on YouTube and want to know if it is true:
link> http://www.youtube.com/watch?v=u5gdwpPrv_8...feature=related
Ouroboros
post Jul 18 2010, 01:32
Post #2





Group: Members
Posts: 291
Joined: 30-May 08
From: UK
Member No.: 53927



His assertion is nonsense. He is assuming that the difference is artifacts added by the mp3 encoder, whereas in reality it's the parts of the signal discarded by the mp3 encoder, plus any artifacts added by the mp3 encoder. He has no way of telling which is which. Now, we know that the mp3 encoding process discards sounds that your ear can't hear because they are masked by other, more important parts of the signal. That's one of the ways the mp3 encoder gets the files down to such a small size, so it's perfectly normal and expected for there to be a difference between the original wav and the encoded mp3.

Further, you would expect to discard more when the overall signal is louder and the sound busier, because there will be more sound there to do the masking, hence more sound able to be masked and discarded, hence the difference will be greater - which is exactly what his experiment shows.
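The level-dependent behaviour described above is easy to demonstrate numerically. The sketch below is not an MP3 encoder; `toy_encode` is an invented stand-in that quantizes each frame with a step proportional to the frame's peak level, mimicking one property of perceptual coders: louder, busier passages absorb more coding noise, so the difference signal grows with them.

```python
import math
import random

random.seed(0)
SR = 8000  # sample rate in Hz; kept small for speed

# one second of audio: a quiet pure tone, then a loud "busy" passage
quiet = [0.05 * math.sin(2 * math.pi * 440 * n / SR) for n in range(SR // 2)]
loud = [0.8 * math.sin(2 * math.pi * 440 * n / SR) + 0.3 * random.gauss(0, 1)
        for n in range(SR // 2)]
signal = quiet + loud

def toy_encode(sig, frame=256, bits=6):
    """Crude stand-in for a lossy codec: per-frame quantization whose
    step size scales with the frame's peak level, so louder frames
    tolerate (and receive) more quantization noise."""
    out = []
    for i in range(0, len(sig), frame):
        fr = sig[i:i + frame]
        step = max(max(abs(s) for s in fr), 1e-9) / (2 ** bits)
        out.extend(round(s / step) * step for s in fr)
    return out

def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))

diff = [a - b for a, b in zip(signal, toy_encode(signal))]
quiet_rms = rms(diff[:SR // 2])
loud_rms = rms(diff[SR // 2:])
# the difference signal is bigger where the audio is louder and busier
assert loud_rms > quiet_rms
```

The specific frame size, bit count, and test signal are arbitrary choices for illustration, not values taken from any real codec.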

This post has been edited by Ouroboros: Jul 18 2010, 01:32
saratoga
post Jul 18 2010, 01:36
Post #3





Group: Members
Posts: 5119
Joined: 2-September 02
Member No.: 3264



If you subtract them you just get whatever was lost when you encoded the MP3. Essentially that guy has proved to himself that MP3 isn't a lossless format like WAV. Why he felt the need to tell YouTube, well, that's anyone's guess smile.gif
DVDdoug
post Jul 18 2010, 06:32
Post #4





Group: Members
Posts: 2676
Joined: 24-August 07
From: Silicon Valley
Member No.: 46454



Subtraction gives you the sample-by-sample mathematical difference, but it does NOT give you the SOUND difference! MP3 does not retain the original timing & phase information, so subtracting the samples doesn't tell you anything about the sound. It does not tell you what was added or removed from the sound during encoding.

The experiment does prove that MP3 is lossy. That is, it proves that the bytes in the file have changed.

I'll give a couple of examples that you can "try at home" with your audio editor...

Starting with two identical WAV files for each of the following experiments -

- Just as a starting point, subtract the two files. You'll get silence proving that the files are identical, and proving that no mathematical difference means no sound/audio difference.

- Invert one file. Listen to both files and compare the sound (most people won't hear a difference). Now, subtract these files (or invert again & add/mix). The result is truly the mathematical difference, but since you've "subtracted a negative", you've mathematically added the two identical files and doubled the volume. You are hearing the mathematical difference between the two files, but you're not hearing the difference in the sound.

- Take one of the identical files and add 10 milliseconds of silence to the beginning. Subtract and listen. Again, what you're hearing does NOT represent sound difference between the files, which will sound identical.

- Take one of the files and speed it up by 1/2% (I can't hear a half-percent tempo/pitch change). Subtract the two files and listen. Again, the mathematical difference doesn't represent the true sound difference.

Note that in every case above, no artifacts are added and no sound is removed from the 2nd file before subtracting the two files, yet in every case there is a clearly audible mathematical difference when you subtract the two files.
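These experiments can be reproduced with a few lines of code instead of an audio editor. The sketch below uses synthetic sine tones (the signals and thresholds are my own choices, not from the post): identical signals cancel exactly, subtracting an inverted copy doubles the amplitude, and a 10 ms shift leaves a large residue even though both copies sound the same.

```python
import math

SR = 8000
a = [0.5 * math.sin(2 * math.pi * 440 * n / SR) for n in range(SR)]  # 1 s tone
b = list(a)  # bit-identical copy

def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))

# 1) identical files: the difference is exact silence
diff_same = [x - y for x, y in zip(a, b)]
assert max(abs(d) for d in diff_same) == 0.0

# 2) invert one copy: a - (-a) = 2a, so the "difference" is twice as loud
diff_inv = [x - (-y) for x, y in zip(a, b)]
assert all(abs(d - 2 * x) < 1e-12 for d, x in zip(diff_inv, a))

# 3) prepend 10 ms of silence: both copies *sound* the same,
#    but the sample-by-sample difference is far from silent
shift = int(0.010 * SR)  # 80 samples
delayed = [0.0] * shift + a[:-shift]
diff_shift = [x - y for x, y in zip(a, delayed)]
assert rms(diff_shift) > 0.5 * rms(a)
```

At 440 Hz a 10 ms shift is 4.4 cycles, so the subtraction leaves a difference signal nearly as loud as the originals, even though no sound was added or removed.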

This post has been edited by DVDdoug: Jul 18 2010, 06:37
mjb2006
post Jul 18 2010, 07:37
Post #5





Group: Members
Posts: 848
Joined: 12-May 06
From: Colorado, USA
Member No.: 30694



Now if you can just squeeze all that into a little comment that will fit on that YouTube video smile.gif
DVDdoug
post Jul 18 2010, 07:41
Post #6





Group: Members
Posts: 2676
Joined: 24-August 07
From: Silicon Valley
Member No.: 46454



P.S. A couple more silly subtraction experiments -

- Record something twice. (Yourself or someone else reading something, singing a song, playing a song on an instrument, etc.) Subtract the two recordings and listen to the "difference". laugh.gif

- Subtract two completely different files (two different songs, or one song and a recording of someone speaking, etc.) and listen to the "difference". laugh.gif
Arnold B. Kruege...
post Jul 19 2010, 03:48
Post #7





Group: Members
Posts: 4304
Joined: 29-October 08
From: USA, 48236
Member No.: 61311



QUOTE (saratoga @ Jul 17 2010, 20:36) *
If you subtract them you just get whatever was lost when you encoded the MP3. Essentially that guy has proved to himself that MP3 isn't a lossless format like WAV. Why he felt the need to tell YouTube, well, that's anyone's guess smile.gif


Lots of people have gotten hung up on this factoid. One of them was the editor of a well known high end audio magazine. I had a chance to comment on the issue before publication and tried to disabuse him of the error, but he went ahead and published it anyway.

I recently subtracted a wav file that was made by decoding an MP3 from the .wav file that the MP3 was made from, and found that the difference file looked and sounded a lot like either wave file. This suggests to me that a lot of the difference was due to latency (time delay) in the encode/decode cycle.
JapanAudio
post Nov 7 2010, 03:57
Post #8





Group: Members
Posts: 86
Joined: 3-November 10
Member No.: 85187



OK. I'm bumping this topic because in my attempt at reviving the subject in another thread, my post was binned. But it's all good, no hard feelings.

To start off, it is my belief that there is no actual flaw in this experiment. Signal theory is largely based on transform theory, and this is where I will find most of my arguments. I take it that some of you already know all this, but I would still like to reiterate a few points that I feel are relevant to this subject. Just bear with me.

For more info on signals and systems, DSP, filtering, and general theory, I recommend any book by Chi-Tsong Chen on the subject.
http://www.ece.sunysb.edu/index.php?option...8&Itemid=26
--

The very foundation of signal theory lies upon the presupposition that signals can be added and subtracted together. For example, periodic signals can be constructed by means of a Fourier series: a sum of discrete sinusoidal signals, each with their own phase and amplitude. (See: http://www.dspguide.com/ch13/4.htm)

Real-world signals are rarely periodic, but no worries, because we can extend this theorem by viewing a non-periodic signal as a periodic signal of infinite period. This leads us to the Fourier series in continuous form: the Fourier transform. I would like to review a basic but key property of the Fourier transform (FT): linearity and thus, inherently, additivity. Let F(a + b) = F(a) + F(b); hence, the FT can be applied before or after the addition of two signals and yields the same output signal. (For a more detailed explanation and visuals, see: http://www.dspguide.com/ch10/1.htm)
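The linearity property can be checked directly. The sketch below writes out a naive O(n²) DFT (so nothing beyond the standard library is assumed) and verifies that transforming a sum equals summing the transforms:

```python
import cmath
import random

def dft(x):
    """Naive discrete Fourier transform, O(n^2); fine for small n."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

random.seed(1)
a = [random.uniform(-1, 1) for _ in range(64)]
b = [random.uniform(-1, 1) for _ in range(64)]

# linearity/additivity: DFT(a + b) == DFT(a) + DFT(b), bin by bin
lhs = dft([x + y for x, y in zip(a, b)])
rhs = [x + y for x, y in zip(dft(a), dft(b))]
assert max(abs(l - r) for l, r in zip(lhs, rhs)) < 1e-9
```

The only discrepancy between the two sides is floating-point rounding, hence the tolerance.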

The FT is also invertible, which indicates that the time (amplitude) and frequency (magnitude, phase) domains exist conjointly. It follows that any operation on one domain or the other affects its counterpart simultaneously. MP3 encoding relies largely on operations applied in the frequency domain, i.e. band filtering, noise shaping, etc., but these operations have the direct effect of altering the signal (time domain) by adding 'encoding noise', if you will. Adding, here, has its literal meaning: addition of amplitude values. This addition can be reversed by subtracting the encoding noise from the encoded signal, thereby recovering the initial signal, unaltered.

(This is exactly how residual coding works. In lossless compression, the encoding noise, or prediction noise, rather, is deflated and kept, instead of being discarded like lossy compression does. Whether you add the amplitude values digitally or in the analog stage (playback), the result is the same.)
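A minimal sketch of the add/subtract identity described above, using coarse quantization as a stand-in for a lossy round trip. Note this demonstrates the arithmetic only; it holds sample-by-sample only if the decoded signal is exactly aligned with the original.

```python
import random

random.seed(2)
original = [random.uniform(-1, 1) for _ in range(4096)]

# stand-in for a lossy round trip: quantize samples to a coarse grid
step = 0.05
lossy = [round(s / step) * step for s in original]

# the "encoding noise" is simply the sample-by-sample difference...
residual = [o - l for o, l in zip(original, lossy)]

# ...and adding it back reconstructs the original signal
restored = [l + r for l, r in zip(lossy, residual)]
assert all(abs(o - s) < 1e-12 for o, s in zip(original, restored))
```

This is the same relationship residual coding exploits: lossless formats keep the residual, lossy ones discard it.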

Example: You could try an experiment where you digitally add noise to a mono track, and compare it (at playback) with a stereo track with noise only in the right channel plus the clean signal in the left (the speakers would need to be close together and/or the listener far away to eliminate spatial distance between channels; more simply, you could render the sound through a mono device without stereo input, it's all the same). The output is your noisy track.

This is usually a very intuitive concept, so let me illustrate it with another example: two people talking simultaneously, recorded in mono. At the same time, record in stereo as a control, with each voice in an individual channel. How do you recover a single voice from the mono recording? You simply subtract the amplitude values of the other voice (a single stereo channel) from the mono recording. In the same manner, you can losslessly downmix 5.1 Dolby Digital to stereo by adding the amplitude values of all the left and right channels together; the sub and center channels are equally spread to both left and right.

Therefore... you can determine exactly what encoding noise is applied to an mp3 by subtracting it from the original source. You can subsequently listen to it, just as if it were another human voice (above example). At that point you may judge if the signal has a significant effect on your hearing and enjoyment. It is possible that you would not even hear it at all, conveying good lossy compression, poor hearing, a bad sound system, etc.
--

Finally, addressing some of your concerns:

QUOTE
- Just as a starting point, subtract the two files. You'll get silence proving that the files are identical, and proving that no mathematical difference means no sound/audio difference.

That is correct.

QUOTE
- Invert one file. Listen to both files and compare the sound (most people won't hear a difference). Now, subtract these files (or invert again & add/mix). The result is truly the mathematical difference, but since you've "subtracted a negative", you've mathematically added the two identical files and doubled the volume. You are hearing the mathematical difference between the two files, but you're not hearing the difference in the sound.

Like you said, inverting a signal, or negating all its samples, has the effect of adding the signal to itself when you subtract it from the original signal, i.e. a - (-a) = a + a = 2a: amplification by a gain of 2. In the same fashion, two recordings playing the exact same waveform at exactly the same time will create amplification by a factor of 2.

Shifting the phase of a signal is an operation intuitively done in the frequency domain, but this directly alters the signal by inverting it (if shifted by pi rad). Although it may still sound the same, in this instance, subtracting one from the other results in amplification due to the nature of the shift operation.

QUOTE
- Take one of the identical files and add 10 milliseconds of silence to the beginning. Subtract and listen. Again, what you're hearing does NOT represent sound difference between the files, which will sound identical.

The signals may sound identical disregarding the time shift, but they are not the same signal. So subtracting them creates an effect similar to delaying one channel in a stereo system.

QUOTE
- Take one of the files and speed it up by 1/2% (I can't hear a half-percent tempo/pitch change). Subtract the two files and listen. Again, the mathematical difference doesn't represent the true sound difference.

Speeding up a signal is equivalent to modulation (frequency shift). Although it may sound like the original, a slight modulation, again, alters the signal. It must be demodulated before any operation can be applied.

QUOTE
- Record something twice. (Yourself or someone else reading something, singing a song, playing a song on an instrument, etc.)
Subtract the two recordings and listen to the "difference". laugh.gif

It would be difficult to record exactly the same thing twice. The signals are different.

QUOTE
- Subtract two completely different files (two different songs, or one song and a recording of someone speaking, etc.) and listen to the "difference".

This is an interesting example because you present it the other way around. Subtracting two signals that have no relation to each other has the same effect as having one signal travel through the air inverted, i.e. compression of air becomes rarefaction and vice versa. This makes little physical sense and is difficult to imagine. Nonetheless, you could probably still hear the non-inverted signal through the distortion.

If you want an output that makes sense, you could try adding them together instead of subtracting them. This would result in hearing both recordings at the same time.

Subtraction is the opposite of addition, but you need to do it right. In our initial case we subtract to calculate the encoding noise, but we need to add this noise back in order to retrieve the original signal. You can also do it the other way around: anticipate what kind of noise your encoder makes, then compute and subtract it; this way the noise will be inverted with respect to the first case. That being said, subtracting for fun makes no sense in your example.
--

Note that all these situations you enumerate one after the other are mostly examples of slight modifications that might not necessarily manifest themselves audibly; however, they fundamentally modify the signal. Usually, a slight offset error is more irritating and distracting than a larger one, e.g. two slightly detuned guitar strings will produce rapid beating while a larger interval might produce harmony; a realistic humanoid might seem creepy while a robot like ASIMO (Honda) looks friendly.

Please revise your analogies and comparisons, maybe signal theory, or your own intuition, before rejecting my arguments or deleting my post.

This post has been edited by JapanAudio: Nov 7 2010, 04:33
Ouroboros
post Nov 7 2010, 05:00
Post #9





Group: Members
Posts: 291
Joined: 30-May 08
From: UK
Member No.: 53927



QUOTE (JapanAudio @ Nov 7 2010, 02:57) *
Therefore... you can determine exactly what encoding noise is applied to an mp3 by subtracting it from the original source.

Correct, you can do this.

QUOTE (JapanAudio @ Nov 7 2010, 02:57) *
You can subsequently listen to it,

Correct.

QUOTE (JapanAudio @ Nov 7 2010, 02:57) *
At that point you may judge if the signal has a significant effect on your hearing and enjoyment. It is possible that you would not even hear it at all, conveying good lossy compression, poor hearing, a bad sound system, etc.

Nonsense. Listening to what's been discarded tells you nothing of any value. The whole design goal of the psychoacoustic model associated with lossy music compressors is to work out what can safely be discarded with no perceptual impact on the listener.

Here's an analogy for you. Cook a meat sauce (like a bolognese sauce) for a family of four, with 2 teaspoons of salt, then cook an identical one but with only 1.5 teaspoons of salt. Taste what's been left out (the half teaspoon of salt), I guarantee that it will be very noticeable and not very pleasant to taste. However, do a double blind tasting of the two sauces and you're unlikely to be able to taste the difference - tasting the difference has told you nothing about the difference between the two versions of the sauce. It's the same with lossy encoding - the only valid test is to compare the uncompressed file with the compressed one in a double blind test.
HibyPrime
post Nov 7 2010, 05:33
Post #10





Group: Members
Posts: 8
Joined: 7-November 10
Member No.: 85497



QUOTE (JapanAudio @ Nov 7 2010, 03:57) *
OK. I'm bumping this topic because in my attempt at reviving the subject in another thread, my post was binned. But it's all good, no hard feelings....

Truncated


While I'm inclined to agree with your post, there's a small point that DVDdoug made:

QUOTE (DVDdoug @ Jul 18 2010, 06:32) *
MP3 does not retain the original timing & phase information, so subtracting the samples doesn't tell you anything about the sound.


If this is true, and the timing in an MP3 encoding is different from its original, then the subtraction method used in the video doesn't truly show what was removed by the encoding process.

I tried to google what doug was talking about, but couldn't find any information, so I'm going to create a scenario. This is just hypothetical because I don't have a major in psychoacoustics and don't know exactly where the information that can be safely removed is, this is just my best guess at a scenario.

If you have a tom-tom that is hit at exactly the same time as a snare at a similar volume, assume that there is information that can be removed from the waveform without a perceptible loss. If that information is removed, all is well and the listener (hopefully) hears no difference. But what if the tom-tom was hit 10 ms before the snare? An encoder would have to reproduce the early attack of both drums separately, and it is possible that the encoder would benefit (as in a smaller file size) from moving the attack of the tom-tom up to match the attack of the snare. In this hypothetical case, changing the timing information would serve the file size while not appreciably altering the sound.

If (yes, another IF) the MP3 encoder does this, then simply subtracting the MP3 from the WAV will not show exactly what was removed, it will also produce an amount of distortion that is near impossible to determine.

This post has been edited by HibyPrime: Nov 7 2010, 05:39
JapanAudio
post Nov 7 2010, 05:39
Post #11





Group: Members
Posts: 86
Joined: 3-November 10
Member No.: 85187



QUOTE (Ouroboros @ Nov 7 2010, 00:00) *
QUOTE (JapanAudio @ Nov 7 2010, 02:57) *
Therefore... you can determine exactly what encoding noise is applied to an mp3 by subtracting it from the original source.

Correct, you can do this.

QUOTE (JapanAudio @ Nov 7 2010, 02:57) *
You can subsequently listen to it,

Correct.

QUOTE (JapanAudio @ Nov 7 2010, 02:57) *
At that point you may judge if the signal has a significant effect on your hearing and enjoyment. It is possible that you would not even hear it at all, conveying good lossy compression, poor hearing, a bad sound system, etc.

Nonsense. Listening to what's been discarded tells you nothing of any value. The whole design goal of the psychoacoustic model associated with lossy music compressors is to work out what can safely be discarded with no perceptual impact on the listener.

Here's an analogy for you. Cook a meat sauce (like a bolognese sauce) for a family of four, with 2 teaspoons of salt, then cook an identical one but with only 1.5 teaspoons of salt. Taste what's been left out (the half teaspoon of salt), I guarantee that it will be very noticeable and not very pleasant to taste. However, do a double blind tasting of the two sauces and you're unlikely to be able to taste the difference - tasting the difference has told you nothing about the difference between the two versions of the sauce. It's the same with lossy encoding - the only valid test is to compare the uncompressed file with the compressed one in a double blind test.

Yes, I agree. The psychoacoustic model determines what kind of noise could be added to the mix for the effect to be negligible. Another analogy would be a yelling crowd: you can determine which people you can remove from the crowd while making sure the overall yelling sounds the same. However, my point is, that to a certain extent, it is reliable to hear the people you've discarded from the crowd to realize what kind of noise the overall crowd is missing.

The role of the psychoacoustic model is to ensure that the discarded sound is as inaudible as possible while achieving the target bitrate. Our analogies are good for visualization, but they are hard to compare on the same scale, for many acoustic phenomena act on logarithmic and exponential scales.

On the other hand, if you eat meat sauce every day for a whole year with the precise same recipe and exact quantities, perhaps at the end of the year you would be able to detect a slight change in taste, slight as it may be.
JapanAudio
post Nov 7 2010, 05:59
Post #12





Group: Members
Posts: 86
Joined: 3-November 10
Member No.: 85187



QUOTE (HibyPrime @ Nov 7 2010, 00:33) *
QUOTE (JapanAudio @ Nov 7 2010, 03:57) *
OK. I'm bumping this topic because in my attempt at reviving the subject in another thread, my post was binned. But it's all good, no hard feelings....

Truncated


While I'm inclined to agree with your post, there's a small point that DVDdoug made:

QUOTE (DVDdoug @ Jul 18 2010, 06:32) *
MP3 does not retain the original timing & phase information, so subtracting the samples doesn't tell you anything about the sound.


If this is true, and the timing in an MP3 encoding is different from its original, then the subtraction method used in the video doesn't truly show what was removed by the encoding process.

I don't have any specific references to falsify this quote, but I am familiar enough with the MPEG-1 audio standard to believe that converting to MP3 does not deliberately change any timing (the quote from DVDdoug is pretty vague). In simple terms, it only filters out certain frequencies and introduces noise shaping according to the psychoacoustic model. One argument that could incline you to believe this is that the MPEG-1 Layer 3 standard uses the MDCT, which does not allow explicit expression and modification of phase values, as the FFT does.

But even so, modifying phase at certain frequencies (actually, just thinking about it now) wouldn't really matter, because this would be translated into noise in the time domain. As for timing, there is really no benefit in delaying a signal for compression purposes, and the MP3 standard does not delay the input signal unless specified, afaik.
carpman
post Nov 7 2010, 06:02
Post #13





Group: Developer
Posts: 1334
Joined: 27-June 07
Member No.: 44789



QUOTE (JapanAudio @ Nov 7 2010, 05:39) *
On the other hand, if you eat meat sauce everyday for a whole year with the precise same recipe and exact quantities, perhaps at the end of the year you would be able to detect a slight change in taste, a little as it may be.

Precisely, you detect the difference by tasting the meal you've become familiar with, NOT by eating the half a spoonful of salt that was removed.

C.


--------------------
TAK -p4m :: LossyWAV -q 6 | TAK :: Lame 3.98 -V 2
Ouroboros
post Nov 7 2010, 06:07
Post #14





Group: Members
Posts: 291
Joined: 30-May 08
From: UK
Member No.: 53927



QUOTE (JapanAudio @ Nov 7 2010, 04:39) *
However, my point is, that to a certain extent, it is reliable to hear the people you've discarded from the crowd to realize what kind of noise the overall crowd is missing.

I'm not sure what you mean by reliable in this context, but your point about listening to the discarded signal is just wrong. The discarded signal has been discarded because it can't be heard underneath the retained signal. Listening to the discarded signal tells you nothing about how well the codec has identified what signal elements can be discarded, because you don't know what was going on with the retained signal at the same time.

Again, you can only assess the transparency of a lossy codec by listening to the original signal and the compressed signal in a DBT, not by examining or listening to the difference. The fact that you can see the difference, and that you can play it through a speaker system, doesn't mean that it's a useful measure of how your brain will perceive the compressed signal.
JapanAudio
post Nov 7 2010, 06:37
Post #15





Group: Members
Posts: 86
Joined: 3-November 10
Member No.: 85187



QUOTE (carpman @ Nov 7 2010, 01:02) *
QUOTE (JapanAudio @ Nov 7 2010, 05:39) *
On the other hand, if you eat meat sauce everyday for a whole year with the precise same recipe and exact quantities, perhaps at the end of the year you would be able to detect a slight change in taste, a little as it may be.

Precisely, you detect the difference by tasting the meal you've become familiar with, NOT by eating the half a spoonful of salt that was removed.

C.

That's true. Read below.


QUOTE (Ouroboros @ Nov 7 2010, 01:07) *
QUOTE (JapanAudio @ Nov 7 2010, 04:39) *
However, my point is, that to a certain extent, it is reliable to hear the people you've discarded from the crowd to realize what kind of noise the overall crowd is missing.

I'm not sure what you mean by reliable in this context, but your point about listening to the discarded signal is just wrong. The discarded signal has been discarded because it can't be heard underneath the retained signal. Listening to the discarded signal tells you nothing about how well the codec has identified what signal elements can be discarded, because you don't know what was going on with the retained signal at the same time.

Again, you can only assess the transparency of a lossy codec by listening to the original signal and the compressed signal in a DBT, not by examining or listening to the difference. The fact that you can see the difference, and that you can play it through a speaker system, doesn't mean that it's a useful measure of how your brain will perceive the compressed signal.

Well, I'd rephrase it like this: the discarded signal is the least likely to be heard when combined with the main signal; keep in mind that you still have to hit a target bitrate, so performance varies. Listening to the discard tells you exactly what is missing, and what is missing is the signal least likely to be heard alongside the main signal.

I am not disputing this: to assess or compare lossy compression performance, you NEED to listen to the lossy files, because it is difficult to determine what kind of noise may or may not be heard combined with the main track. I am only saying that listening to the discarded noise reveals the exact artifacts that are introduced by the encoder.
HibyPrime
post Nov 7 2010, 06:43
Post #16





Group: Members
Posts: 8
Joined: 7-November 10
Member No.: 85497



QUOTE (Ouroboros @ Nov 7 2010, 06:07) *
...your point about listening to the discarded signal is just wrong. The discarded signal has been discarded because it can't be heard underneath the retained signal.


That's the goal of psychoacoustics; the fact that many people claim to hear a difference between lossless and lossy says that they haven't quite achieved that goal yet - at least at 320 kbps and under.

QUOTE
Listening to the discarded signal tells you nothing about how well the codec has identified what signal elements can be discarded, because you don't know what was going on with the retained signal at the same time.

Again, you can only assess the transparency of a lossy codec by listening to the original signal and the compressed signal in a DBT, not by examining or listening to the difference. The fact that you can see the difference, and that you can play it through a speaker system, doesn't mean that it's a useful measure of how your brain will perceive the compressed signal.


I have to disagree with this.

When I started ripping/downloading all of my music in FLAC, I only did it because I had just gotten a bigger hard drive and saw no reason to keep ripping to MP3. I ran ABX tests in foobar and would consistently fail to tell a difference between 192+ kbps MP3s and FLAC. Over time I started to notice things I'd never heard before in tracks I'd had for a long time, but I would still fail ABX tests. At this point you would assume that the difference is all in my head, but when I heard the difference file in the YouTube link it sounded a lot like the sounds that I occasionally notice in FLAC files. Note I'm referring to the lower-volume part of the clip; once it hits the louder part it just sounds like a mess.

To use the sauce analogy, it's like eating the sauce with 1.5 tsp of salt for a month, then eating the sauce with 2 tsp of salt for a month and tasting a difference that you're not even sure is really there, until someone gives you 0.5 tsp of salt and it hits you - "that was what changed".

QUOTE (JapanAudio @ Nov 7 2010, 05:59) *
I don't have specific any references to falsify this quote, but i am fairly familiar with the mpeg-1 audio standard to believe that converting to mp3 does not change any timings deliberately (the quote from DVDdoug is pretty vague). In simple terms, it only filters out certain frequencies and introduces noise shaping according to the psychoacoustic model. One argument that could incline you to believe this is that the MPEG-1 layer 3 standard uses MDCT, which does not take into account explicit expression and modification of phase values, such as the FFT does.

But even so, modifying phase at certain frequencies (actually, just thinking about it now) wouldn't really matter because this would be translated into noise in the time domain. As for timing, there is really no interest in delaying a signal for compression purposes and the mp3 std does not delay the input signal unless specified afaik.


Should I take this to mean that the noise in the time domain won't result in noise in the frequency (or rather, audible) domain? I'm having a hard time wrapping my head around time-related noise.

Well, it seems a lot more intuitive in many ways not to alter the timing, so it's probably a good thing for the folks over at the LAME development team that they don't do it smile.gif

This post has been edited by HibyPrime: Nov 7 2010, 06:47
JapanAudio
post Nov 7 2010, 06:58
Post #17





Group: Members
Posts: 86
Joined: 3-November 10
Member No.: 85187



QUOTE
At that point you may judge if the signal has a significant effect on your hearing and enjoyment. It is possible that you would not even hear it at all, conveying good lossy compression, poor hearing, a bad sound system, etc.

To quote myself, I would like to clarify that not hearing the signal noise could indeed signify good lossy compression. But this is only a one-way implication; the opposite is not true (hearing it does not mean bad lossy compression). I will also rectify the "judging" part of that statement: it can only give you an objective rendition of what is not being heard, no judging involved.

That being said, I don't know why anyone would ever settle for lossy, unless portability or file size is an issue. Because, you know, that spoonful of salt may well be the one that triggers that fatal heart attack...
JapanAudio
post Nov 7 2010, 07:29
Post #18





Group: Members
Posts: 86
Joined: 3-November 10
Member No.: 85187



QUOTE (HibyPrime @ Nov 7 2010, 01:43) *
Should I take this to mean that the noise in the time domain won't result in noise in the frequency (or rather, audible) domain? I'm having a hard time wrapping my head around time-related noise.

Well, it seems a lot more intuitive in many ways not to alter the timing, so it's probably a good thing for the folks over at the LAME development team that they don't do it smile.gif

Nope, the two domains are tied together by the transform: altering the frequency content adds time-domain noise, and adding time-domain noise alters the frequency content. The magic is in the psychoacoustic noise shaping: it shifts the noise into less audible frequency bands. The noise has the same power, shaped or not, and by listening to the discards alone you may or may not be able to tell which encode is better. You could make an educated guess, since noise shaping tends to produce higher-pitched noise... but only after matching them with the original audio will you be in a position to tell which one performs better.
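The "same power, shaped or not" point is just Parseval's theorem. A quick numpy sketch (deliberately crude - a real encoder shapes the noise under a psychoacoustic model, not by shuffling spectrum halves as done here):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4096
noise = rng.standard_normal(n)

# Crude "shaping": swap the low and high halves of the spectrum so the
# noise energy sits in different frequency bands.
N = np.fft.rfft(noise)
half = N.size // 2
shaped = np.fft.irfft(np.concatenate([N[half:], N[:half]]), n)

# Parseval: relocating energy between bands leaves the total power
# essentially untouched, even though the noise now sounds quite different.
p_orig = np.mean(noise ** 2)
p_shaped = np.mean(shaped ** 2)
print(p_orig, p_shaped)
```

Same power, very different audibility once masking is taken into account - which is exactly why power measurements on the residual tell you nothing about quality.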
greynol
post Nov 7 2010, 08:52
Post #19





Group: Super Moderator
Posts: 10256
Joined: 1-April 04
From: San Francisco
Member No.: 13167



QUOTE (JapanAudio @ Nov 6 2010, 22:37) *
I am only saying that listening to the discarded noise reveals the exact artifacts that are introduced by the encoder.

...and this signal played by itself is completely meaningless in determining the sound quality of the lossy signal from which it was derived. The only proper way to determine this is through double-blind testing. There is no way to wiggle your way out of this; not on this forum.


--------------------
Your eyes cannot hear.
Alexey Lukin
post Nov 7 2010, 09:10
Post #20





Group: Members
Posts: 198
Joined: 31-July 08
Member No.: 56508



QUOTE (DVDdoug @ Jul 18 2010, 00:32) *
Subtraction gives you the sample-by-sample mathematical difference, but it does NOT give you the SOUND difference! MP3 does not retain the original timing & phase information, so subtracting the samples doesn't tell you anything about the sound.

I disagree!
Mp3 encoding does preserve timing and phase information. Unlike parametric coders, mp3 strives to preserve the waveform, including its phase. The distortion introduced by mp3 is quantization noise, and it is fully revealed when the mp3 is subtracted from the wav (provided that your decoder does not time-shift the decoded file). This distortion gets quite small as the bit rate gets higher.
Of course, you cannot judge the audibility of this quantization noise without the rest of the audio signal, due to masking.
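For anyone who wants to reproduce the subtraction correctly, the key step is aligning the decoded file with the original first - encoder/decoder pairs typically prepend some delay samples. A rough Python sketch using synthetic arrays in place of real decoded audio (the function name and the 576-sample delay here are just illustrative; the actual delay varies by implementation):

```python
import numpy as np

def difference_signal(original, decoded, delay):
    """Subtract a decoded lossy file from its source after removing the
    codec's leading delay. `delay` is the number of padding samples the
    encoder/decoder pair prepends (it varies by implementation)."""
    aligned = decoded[delay:delay + len(original)]
    n = min(len(original), len(aligned))
    return original[:n] - aligned[:n]

# Stand-in data: pretend the "decoded" file is the original plus a little
# quantization noise and a 576-sample leading delay.
rng = np.random.default_rng(2)
original = np.sin(2 * np.pi * 440 * np.arange(8000) / 44100)
noise = 1e-3 * rng.standard_normal(original.size)
decoded = np.concatenate([np.zeros(576), original + noise])

residual = difference_signal(original, decoded, delay=576)
# With correct alignment the residual is just the small noise floor;
# with delay=0 the misalignment dominates and the "difference" is huge.
print(np.max(np.abs(residual)))
```

Without the delay compensation the subtraction mostly measures the time shift, not the quantization noise - one of several ways the YouTube experiment can mislead.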
krabapple
post Nov 7 2010, 09:58
Post #21





Group: Members
Posts: 2446
Joined: 18-December 03
Member No.: 10538



QUOTE (JapanAudio @ Nov 7 2010, 01:37) *
I am not disputing this: To assess or compare lossy compression performance, you NEED to listen to the lossy files because it is difficult to determine what kind of noise may or may not be heard combined with the main track. I am only saying that listening to the discarded noise reveals the exact artifacts that are introduced by the encoder.



So what? It doesn't tell you whether they are audible in context.

Ouroboros
post Nov 7 2010, 13:18
Post #22





Group: Members
Posts: 291
Joined: 30-May 08
From: UK
Member No.: 53927



QUOTE (HibyPrime @ Nov 7 2010, 05:43) *
That's the goal of psychoacoustics; the fact that many people claim to hear the difference between lossless and lossy suggests that they haven't quite achieved that goal yet - at least at 320 kbps and under.

That's an incorrect generalisation - it's only true for a very few killer samples, and for a few people. In general, modern lossy codecs (MP3 / AAC / OGG) achieve perceptual transparency at much lower bit rates for the vast majority of people and for the vast majority of music. That's what all of the properly conducted tests have revealed, and that's why many people who do their own tests settle on LAME -V2 (around 220 kbps) or even lower for their MP3 encoding.

QUOTE (HibyPrime @ Nov 7 2010, 05:43) *
When I started ripping/downloading all of my music in FLAC, I only did it because I had just gotten a bigger hard drive and saw no reason to keep ripping in MP3. I ran ABX tests in foobar and would consistently fail to tell a difference between 192+ kbps MP3s and FLAC. Over time I started to notice things I'd never heard before in tracks I've had for a long time, but I would still fail ABX tests. At this point you would assume that the difference is all in my head, but when I heard the difference file in the youtube link, it sounded a lot like the sounds I occasionally notice in FLAC files. Note I'm referring to the lower-volume part of the clip; once it hits the louder part it just sounds like a mess.

It is illogical to claim that the difference file on Youtube sounds like the sounds you occasionally notice in FLAC files. If you can't ABX it then you clearly can't notice the difference, so it's all in your head, i.e. it's a placebo effect.

Again, the psychoacoustic models in lossy codecs exploit the way your ears and brain perceive sound, and ABX is the only generally available method that measures your perception of the sound. Looking at the difference file, or listening to it in isolation, tells you nothing.

This post has been edited by Ouroboros: Nov 7 2010, 13:23
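Since ABX keeps coming up: the statistics behind an ABX score are just a one-sided binomial test. A small sketch of the math (my own helper, not foobar2000's actual code):

```python
from math import comb

def abx_p_value(correct, trials):
    """Probability of scoring at least `correct` out of `trials` ABX
    trials by pure guessing (one-sided binomial test, p = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# 12/16 is reasonable evidence you hear a difference; 9/16 is not:
print(round(abx_p_value(12, 16), 4))   # 0.0384
print(round(abx_p_value(9, 16), 4))    # 0.4018
```

foobar2000's ABX comparator reports a figure along these lines as the probability that you were guessing, which is why a handful of trials proves very little either way.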
pdq
post Nov 7 2010, 18:21
Post #23





Group: Members
Posts: 3443
Joined: 1-September 05
From: SE Pennsylvania
Member No.: 24233



QUOTE (Alexey Lukin @ Nov 7 2010, 04:10) *
I disagree!
Mp3 encoding does preserve timing and phase information. Unlike parametric coders, Mp3 strives to preserve the waveform, including its phase.

There is nothing in the mp3 standard that says that you have to do anything of the kind. If current implementations DO preserve timing and phase then that may be because it is easiest to implement that way.

If someone discovered a way to encode to mp3 such that the timing and phase were not preserved, but the result was perceptually transparent at lower bitrates, then everybody would switch to doing it that way.

In fact, the only thing that the mp3 standard specifies is how to decode an mp3 file to wav.
greynol
post Nov 7 2010, 19:37
Post #24





Group: Super Moderator
Posts: 10256
Joined: 1-April 04
From: San Francisco
Member No.: 13167



I'd like to see DVDdoug defend his comment regarding phase response.


--------------------
Your eyes cannot hear.
Alexey Lukin
post Nov 7 2010, 19:56
Post #25





Group: Members
Posts: 198
Joined: 31-July 08
Member No.: 56508



QUOTE (pdq @ Nov 7 2010, 13:21) *
There is nothing in the mp3 standard that says that you have to do anything of the kind. If current implementations DO preserve timing and phase then that may be because it is easiest to implement that way. If someone discovered a way to encode to mp3 such that the timing and phase were not preserved, but the result was perceptually transparent at lower bitrates, then everybody would switch to doing it that way.

May I assure you that there is no practical way to encode an mp3 file better than with the standard phase-preserving quantization of the coefficients of a phase-preserving filter bank.

This post has been edited by Alexey Lukin: Nov 7 2010, 19:59
