What is this low-frequency content?
Zoetrap
post Apr 26 2012, 09:32
Post #1





Group: Members
Posts: 3
Joined: 26-April 12
Member No.: 99220



Hello everyone! New member here :)

I'm a student in media technology and I'm currently writing a paper on lossy audio compression. I was recently studying some spectral figures comparing an audio file in its "original" WAV format (ripped straight from a CD at 16-bit, 44.1 kHz) with two versions of the same file encoded as 128 kbps AAC and 128 kbps MP3. While randomly fiddling with the scale of the spectral view in Adobe Audition, learning how to use the program, I stumbled across what seems to be low-frequency content appearing in the 128 kbps versions:



The content that has appeared in the MP3 and AAC versions sits roughly in the 0-30 Hz range.

I've been studying the principles of perceptual coding and psychoacoustic models, but I haven't been able to draw any conclusion as to what this really is. Is it low-frequency noise that is allowed to rise because it's masked by other content?

The low-frequency content does not appear in MP3 and AAC versions encoded at 256 kbps, so I'm fairly certain it has something to do with coding quality...
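One way to put a number on what the spectral view shows is to measure the energy in the 0-30 Hz band of the same short frame in the original and in each decoded file. A minimal sketch, assuming the frame has already been read as a list of mono float samples; `band_energy` and `level_change_db` are hypothetical helpers (not Audition features), and the direct DFT with no window is chosen for clarity rather than speed:

```python
import math


def band_energy(samples, rate, f_lo, f_hi):
    """Energy of one frame between f_lo and f_hi (Hz), via a direct DFT.

    Only bins whose centre frequency k * rate / n lies in the band (and at or
    below Nyquist) are evaluated; no window is applied.
    """
    n = len(samples)
    k_lo = math.ceil(f_lo * n / rate)
    k_hi = min(math.floor(f_hi * n / rate), n // 2)
    energy = 0.0
    for k in range(k_lo, k_hi + 1):
        re = im = 0.0
        for t, x in enumerate(samples):
            angle = 2.0 * math.pi * k * t / n
            re += x * math.cos(angle)
            im -= x * math.sin(angle)
        energy += re * re + im * im
    return energy


def level_change_db(original, decoded, rate, f_lo=0.0, f_hi=30.0):
    """How much louder (in dB) the band is in the decoded frame than in the original."""
    tiny = 1e-30  # avoids log(0) when a band is completely silent
    e_orig = band_energy(original, rate, f_lo, f_hi)
    e_dec = band_energy(decoded, rate, f_lo, f_hi)
    return 10.0 * math.log10((e_dec + tiny) / (e_orig + tiny))
```

Running `level_change_db` on the same frame of the WAV and of each decoded file would turn "it looks brighter down there" into a figure that can be compared between 128 and 256 kbps.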


I'd be grateful for any insights! :)
Dynamic
post Apr 26 2012, 11:19
Post #2





Group: Members
Posts: 796
Joined: 17-September 06
Member No.: 35307



It appears to occur mostly during transients. It's possible there's even clipping, which produces distortion that looks like white noise. Try copying the files, then applying negative gain in the decoder (e.g. use foobar2000 with ReplayGain turned on in the converter dialogue).
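Decoded MP3/AAC output can overshoot full scale even when the original never did; the clipping only happens when it is squeezed back into 16 bits. The Python sketch below illustrates why negative gain in the decoder helps: the gain is a plain dB-to-linear factor applied before the 16-bit conversion. `decode_to_int16` is a hypothetical helper for illustration, not how foobar2000 implements ReplayGain internally.

```python
def decode_to_int16(samples, gain_db=0.0):
    """Convert decoded float samples (1.0 = full scale) to 16-bit integers.

    gain_db is applied *before* conversion; samples that are still outside
    the 16-bit range afterwards are clamped and counted as clipped.
    """
    factor = 10.0 ** (gain_db / 20.0)  # e.g. -6 dB -> about 0.501
    out, clipped = [], 0
    for x in samples:
        v = round(x * factor * 32767)
        if v > 32767 or v < -32768:
            clipped += 1
            v = max(-32768, min(32767, v))
        out.append(v)
    return out, clipped
```

With samples overshooting to 1.2 and -1.1, 0 dB gain clips both, while a few dB of negative gain leaves room for them.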
Zoetrap
post Apr 26 2012, 14:08
Post #3





Group: Members
Posts: 3
Joined: 26-April 12
Member No.: 99220



I tried re-encoding the original WAV file with ReplayGain enabled in foobar2000 (I hope that's what you meant? Being primarily a Mac user, I'm not very familiar with foobar2000) and the noise is still there, although it seems to be altered and kind of "shuffled around" quite a bit. It's still most prominent during transients, though.
Alexey Lukin
post Apr 26 2012, 18:36
Post #4





Group: Members
Posts: 191
Joined: 31-July 08
Member No.: 56508



This is most probably quantization noise spreading in frequency. Hard to tell without the WAV files...
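A rough way to see why the noise grows as the bitrate drops: with fewer bits the encoder must use a coarser quantizer step, and for a uniform quantizer with step Δ the error power is approximately Δ²/12, so doubling the step raises the noise by about 6 dB. The Python sketch below checks that approximation numerically. It uses a plain uniform quantizer on samples as a simplifying assumption; the actual MP3 and AAC quantizers are non-uniform and operate on transform coefficients.

```python
import math
import random


def quantize(x, step):
    """Uniform mid-tread quantizer: round to the nearest multiple of step."""
    return step * round(x / step)


def noise_power(signal, step):
    """Mean squared quantization error of the signal for a given step."""
    errors = [quantize(x, step) - x for x in signal]
    return sum(e * e for e in errors) / len(errors)


rng = random.Random(0)
signal = [rng.uniform(-1.0, 1.0) for _ in range(100_000)]

for step in (0.01, 0.02, 0.04):
    measured = noise_power(signal, step)
    predicted = step * step / 12.0
    print(f"step={step}: measured {measured:.3e}, predicted {predicted:.3e}, "
          f"{10 * math.log10(measured):.1f} dB")
```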
Zoetrap
post Apr 26 2012, 20:13
Post #5





Group: Members
Posts: 3
Joined: 26-April 12
Member No.: 99220



Here is part of the original WAV file (cut due to file size; no other modifications made):

http://dl.dropbox.com/u/518147/El_Colibri_part.wav


Forgive me if I'm not fully understanding all the inner workings of perceptual coding, but if the added noise in the encoded files is indeed quantization noise, then the difference in the noise between MP3 and AAC is surely due to the differences between the codecs, such as the psychoacoustic model used and tools like Temporal Noise Shaping?


I've been studying perceptual coding for quite some time now, but it's such a huge subject and many parts are not that easy to comprehend. I'm finding it quite fascinating, though. :)

This post has been edited by Zoetrap: Apr 26 2012, 20:19
Dynamic
post May 4 2012, 14:57
Post #6





Group: Members
Posts: 796
Joined: 17-September 06
Member No.: 35307



It seems there's no decoder clipping. Having now seen the file, it only reaches full scale once in one channel and has plenty of headroom the rest of the time.
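Checking for clipping like this amounts to looking at the peak level and counting samples that sit at the 16-bit limits, per channel. A minimal Python sketch using only the standard library, assuming a 16-bit PCM WAV such as the posted excerpt; the function names are illustrative.

```python
import array
import math
import sys
import wave


def read_int16_channels(path):
    """Read a 16-bit PCM WAV file and return one list of samples per channel."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        n_channels = wav.getnchannels()
        data = array.array("h")
        data.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        data.byteswap()  # WAV samples are stored little-endian
    return [list(data[c::n_channels]) for c in range(n_channels)]


def clipping_report(channel):
    """Return (peak level in dBFS, number of samples at the 16-bit limits)."""
    peak = max(abs(s) for s in channel)
    at_limit = sum(1 for s in channel if s >= 32767 or s <= -32768)
    peak_dbfs = 20.0 * math.log10(peak / 32767) if peak else float("-inf")
    return peak_dbfs, at_limit
```

Usage: `for ch in read_int16_channels("El_Colibri_part.wav"): print(clipping_report(ch))`.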

I just made an encoding using LAME 3.98.4 (not the latest version, I know):

CODE
lame -V5 filename.wav


This created filename.wav.mp3, which averaged about 144 kbps of variable-bitrate MP3. It's likely to use significantly higher bitrates at certain times (often transients) and lower bitrates at others. With so many picking sounds on the guitar strings, and possibly some timing differences in when these transients reach the left and right channels, I dare say this sample is more demanding of bitrate than most.
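The average bitrate figure is simply the total number of bits divided by the duration. For comparison, one MPEG-1 Layer III frame (1152 samples) at a given bitrate holds 144 × bitrate / sample rate bytes, plus one byte when the padding bit is set. A small Python sketch of both calculations; the file size also includes any tag data, so the average it gives slightly overestimates the audio bitrate.

```python
def average_kbps(file_bytes, duration_s):
    """Average bitrate from file size and duration (tags included)."""
    return file_bytes * 8 / duration_s / 1000


def mp3_frame_bytes(bitrate_bps, sample_rate, padding=0):
    """Size of one MPEG-1 Layer III frame: 1152 samples / 8 bits per byte = 144."""
    return 144 * bitrate_bps // sample_rate + padding


# At CBR 128 kbps and 44.1 kHz every frame is 417 or 418 bytes, apart from what
# the bit reservoir can shift between nearby frames; VBR can choose bigger frames.
print(mp3_frame_bytes(128_000, 44_100), mp3_frame_bytes(128_000, 44_100, padding=1))
```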

I then decoded this using
CODE
lame --decode filename.wav.mp3

to create filename.wav.mp3.wav

The decode also removed the timing offsets (encoder delay and padding) introduced by the encode-decode process, which most MP3 encoder-decoder pairs don't do.
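If a decoder does not remove that delay, the decoded file starts later than the original, and a sample-by-sample comparison (for example a difference signal) is meaningless until the two are aligned. A brute-force Python sketch of that alignment, assuming mono float samples and a delay of at most `max_lag` samples; `best_lag` and `difference` are hypothetical helpers, and a real tool would use FFT-based cross-correlation for speed.

```python
def best_lag(reference, decoded, max_lag):
    """Lag (0..max_lag) at which decoded[i + lag] best matches reference[i].

    Every candidate lag is scored by cross-correlation over the same number
    of samples, so the scores are directly comparable.
    """
    span = min(len(reference), len(decoded) - max_lag)
    if span <= 0:
        raise ValueError("decoded signal too short for max_lag")
    best, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(reference[i] * decoded[i + lag] for i in range(span))
        if score > best_score:
            best, best_score = lag, score
    return best


def difference(reference, decoded, lag):
    """Decoded minus original after removing the delay (what the codec added)."""
    span = min(len(reference), len(decoded) - lag)
    return [decoded[i + lag] - reference[i] for i in range(span)]
```

Viewing the spectrogram of `difference(...)` shows only what the codec changed, which makes low-frequency additions easier to spot than comparing two full spectrograms.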

I used an old version of Cool Edit 96 (predecessor to Adobe Audition) to view the spectrogram, mainly because I remembered where to find the menu to change the spectral resolution to 2048 bands with a Blackman window.
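When reading a spectrogram it helps to know how finely it can resolve the 0-30 Hz region. The classic Blackman window and the FFT bin spacing are sketched below in Python. Mapping Cool Edit's "2048 bands" setting to a 4096-point FFT is an assumption made only for this example, not something taken from Cool Edit's documentation.

```python
import math


def blackman(n_points):
    """Symmetric Blackman window with the classic 0.42 / 0.5 / 0.08 coefficients."""
    if n_points == 1:
        return [1.0]
    m = n_points - 1
    return [
        0.42 - 0.5 * math.cos(2 * math.pi * i / m) + 0.08 * math.cos(4 * math.pi * i / m)
        for i in range(n_points)
    ]


def bin_spacing_hz(sample_rate, fft_size):
    """Frequency distance between adjacent FFT bins."""
    return sample_rate / fft_size


# Assumption: 2048 bands ~ a 4096-point FFT. At 44.1 kHz that gives bins about
# 10.8 Hz apart, so 0-30 Hz spans only three bins, and a single tone's Blackman
# main lobe is about six bins (roughly 65 Hz) wide.
spacing = bin_spacing_hz(44_100, 4096)
print(f"{spacing:.2f} Hz per bin, main lobe ~{6 * spacing:.0f} Hz")
```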

As you can see in my capture image (not embeddable, so click the link), neither version exhibits significant content below 40 Hz (the capture shows the whole file, but only the lower portion of the frequency spectrum):

http://www.mediafire.com/i/?56sa66550vsxbjk

I then repeated this using LAME without the -V5 option, so it encodes to CBR 128 kbps (the same as the -b 128 option), and found the same as you.
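The two encode/decode runs can be scripted so the VBR and CBR results are produced the same way each time. A Python sketch that uses only the LAME options mentioned in this thread (-V5, -b 128, --decode) and passes explicit output file names; it assumes a `lame` binary is on the PATH, and the `.v5` / `.cbr128` name tags are made up for the example.

```python
import subprocess

# Encoder settings discussed in this thread; the dictionary keys are just labels.
MODES = {
    "v5": ["-V5"],            # VBR, averaged ~144 kbps on this sample
    "cbr128": ["-b", "128"],  # constant 128 kbps
}


def lame_commands(wav_path, tag, mode_args):
    """Build the encode and decode command lines for one encoder setting."""
    mp3_path = f"{wav_path}.{tag}.mp3"
    decoded_path = f"{mp3_path}.wav"
    return [
        ["lame", *mode_args, wav_path, mp3_path],
        ["lame", "--decode", mp3_path, decoded_path],
    ]


def encode_and_decode(wav_path):
    """Run both settings; raises CalledProcessError if lame fails."""
    for tag, mode_args in MODES.items():
        for cmd in lame_commands(wav_path, tag, mode_args):
            subprocess.run(cmd, check=True)
```

Usage: `encode_and_decode("El_Colibri_part.wav")` leaves one decoded WAV per setting next to the original, ready for the same spectral view or an ABX comparison.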

None of the areas involved were clipping (except maybe one towards the end of the sample). With so much picking during the piece, I'd imagine there are significant transients throughout, so it's plausible that the encoder switches to short blocks (which have poorer frequency resolution). In the CBR 128 kbps case it may not have enough bits available to encode with the accuracy needed to avoid this bleed-through, so it concentrates the available bits on encoding the most important parts of the spectrum more accurately. But this is a very hand-waving guess at the causes.
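Some rough arithmetic is consistent with the short-block guess. Both codecs split the 0 to fs/2 range into a fixed number of spectral lines per block: MP3 uses 576 lines in a long block and 192 per short block, while AAC uses 1024 and 128. The Python sketch below computes the approximate width of each line at 44.1 kHz; it ignores window leakage and MP3's mixed-block mode, so treat the numbers as orders of magnitude.

```python
def line_width_hz(sample_rate, lines_per_block):
    """Approximate bandwidth of one spectral line: Nyquist split evenly."""
    return sample_rate / 2 / lines_per_block


BLOCKS = {
    "MP3 long": 576,
    "MP3 short": 192,
    "AAC long": 1024,
    "AAC short": 128,
}

for name, lines in BLOCKS.items():
    print(f"{name:9s}: {line_width_hz(44_100, lines):6.1f} Hz per line")
```

In a short block the lowest line alone covers roughly 0-115 Hz (MP3) or 0-172 Hz (AAC), so coarse quantization of that line can put error energy across the whole 0-30 Hz region even where the original had almost nothing. That fits the spectrogram, though it doesn't prove the cause.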

The important thing is not whether the spectrum looks identical but whether it sounds identical. Transients introduce temporal masking, so more distortion can go unnoticed for a short time after a transient (and a shorter time before it), and it might be that you can't hear the difference. The best way to find out is to run an ABX test comparing the decoded MP3 or AAC with the original WAV. A spectrogram that looks great can sound awful (try the old BLADE encoder), and one that looks like a poor match can sound indistinguishable: LAME at -V5 or so will often look lacking in the treble area but sound perfect, thanks to its very well-tuned psychoacoustic model (which applies to the VBR modes but not to CBR).
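An ABX result only means something with enough trials; the usual check is the probability of scoring at least that well by pure guessing, which is a one-sided binomial tail with p = 0.5. A short Python sketch (`abx_p_value` is an illustrative name, not a function from any particular ABX tool):

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of getting at least `correct` of `trials` right by guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials


# 12/16 correct is unlikely to be luck (p ~ 0.038); 9/16 easily could be.
print(abx_p_value(12, 16), abx_p_value(9, 16))
```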

If you want to find out whether it's significant, forget 'measurement' and rule-of-thumb engineering specs (like the 20 Hz to 20 kHz rule of thumb for the range of human hearing) and use ABX to find out how it sounds to a human being. That's absolutely necessary with psychoacoustic encoders when dealing with real music rather than test tones.

[edit: minor typos]

This post has been edited by Dynamic: May 4 2012, 14:58
