Vorbis Noise Normalization
dsimcha
post Mar 16 2010, 16:12
Post #1





I've become very curious about how Vorbis noise normalization works. My (very vague) understanding is that it somehow avoids the metallic artifacts of MP3 and makes Vorbis's degradation when perceptual transparency can't be achieved sound much more like (less annoying) Gaussian noise. I can't seem to find a much more detailed explanation anywhere, though. Can someone please either explain it or point me to a decent explanation? I'm looking for a moderately technical explanation. Ideally, I want one that is intended for people with significant math background, etc., but is an overview rather than describing every small detail to someone who might want to implement noise normalization.
SebastianG
post Mar 16 2010, 18:46
Post #2





The idea is simple, really. A psychoacoustic model basically estimates how much distortion/quantization noise can be tolerated in specific time/frequency regions. But at low signal-to-noise ratios, simple undithered scalar quantization introduces nonlinear artefacts. This is what makes it sound "metallic". Also, for us humans it's very important to preserve the "energy" of the signal (think of energy = sum of squared samples). MPEG-4 AAC even has a tool called PNS (perceptual noise substitution), which preserves only the energy level and nothing else.

Back to Vorbis. The Vorbis encoder simply computes the sum of squared original samples over blocks of (I think) 32 samples in the high frequency area and compares it to the sum of squared quantized samples. If the latter sum is less than the former, we lost some energy. Then, some samples are "promoted" to have higher values. This goes both ways. People used to complain about a "HF boost" with Vorbis, which can partly be explained by quantization noise adding to the loudness, which is directly linked to "energy". Example:

CODE
original:
[ -2.35  -0.84   0.65  -1.39   1.25  -0.49  -0.85   0.89 ]

rounded (quantization)
[ -2  -1   1  -1   1  0  -1   1 ]

The sum of squares for the original is 11.9. The sum of squares of the second signal is only 10. We can try to fix that by "rounding" -0.49 to -1 instead of 0. Then we have a sum of squares which equals 11. This is closer to 11.9. In addition, we could "promote" the 4th sample to -2 instead of -1. This would add another 3 to the sum of squared samples. But 14 is a little too much. We should probably stick to
CODE
[ -2  -1   1  -1   1  -1  -1   1 ]

as our quantized signal. So, we're not only interested in finding a quantized vector that is close to the original but also in finding one that has about the same "length" (think of it as vector quantization).

This technique also seems to reduce the metallic artefacts.
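
The worked example above can be reproduced with a short Python sketch. This is only an illustration of the idea as described in this post, not the actual libvorbis or aoTuV code: the function names (`sum_sq`, `round_half_away`, `normalize_energy`) are made up for this example, the greedy "largest rounding error first" ordering is an assumption, and only the "lost energy" direction (promotion) is handled.

```python
import math


def sum_sq(values):
    """Energy as defined in the post: the sum of squared samples."""
    return sum(v * v for v in values)


def round_half_away(x):
    """Plain rounding to the nearest integer (ties away from zero)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def normalize_energy(original):
    """Hypothetical helper: round each sample, then promote samples that
    were rounded toward zero (largest rounding error first) as long as
    each promotion brings the sum of squares closer to the original's."""
    target = sum_sq(original)
    quantized = [round_half_away(x) for x in original]
    energy = sum_sq(quantized)

    # Candidates are the samples whose magnitude shrank during rounding.
    candidates = [i for i, x in enumerate(original)
                  if abs(x) > abs(quantized[i])]
    candidates.sort(key=lambda i: abs(original[i]) - abs(quantized[i]),
                    reverse=True)

    for i in candidates:
        promoted = abs(quantized[i]) + 1
        new_energy = energy - quantized[i] ** 2 + promoted ** 2
        # Accept the promotion only if it reduces the energy mismatch.
        if abs(new_energy - target) < abs(energy - target):
            quantized[i] = int(math.copysign(promoted, original[i]))
            energy = new_energy
    return quantized


block = [-2.35, -0.84, 0.65, -1.39, 1.25, -0.49, -0.85, 0.89]
print(normalize_energy(block))  # [-2, -1, 1, -1, 1, -1, -1, 1]
```

With the block from the example, only the -0.49 sample gets promoted (energy 10 becomes 11). Promoting the 4th sample as well would overshoot to 14, so the sketch rejects it, matching the reasoning above.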

This post has been edited by SebastianG: Mar 16 2010, 18:56
dsimcha
post Mar 17 2010, 14:00
Post #3





Thanks. This basically makes sense. I'd been experimenting with some listening tests lately and unfortunately you ruined my blissful inability to ABX -q 3 Vorbis. Now that I'm aware of noise normalization I definitely hear the high frequency boost in a few tracks with a lot of high frequency material and can ABX it. However, apparently noise normalization is turned off at -q 4, so given a passage with enough high frequency material (read: only in a few pathological cases), I can ABX -q 4 by listening for a drop in high frequency content.

On the other hand, I don't find these slight changes in the level of high frequency material at all annoying, since it sounds as if the music could have just been mixed slightly differently. I'd never be able to pick it out in casual listening and I can only even ABX it on samples with lots and lots of high frequency content. By contrast, I can ABX Nero AC3 and LAME VBR at similar average bitrates much more easily and the artifacts are subjectively much more annoying, as even in casual listening I would realize I was probably hearing digital artifacts, not just minutely different frequency response.
Garf
post Mar 17 2010, 15:12
Post #4


I thought the problem of the HF energy being increased by the quantization in noise normalization was fixed long ago in aoTuv, so I wonder if what you're hearing has anything at all to do with it to begin with.

The change at -q4 might be due to different stereo models, to give an example.
dsimcha
post Mar 17 2010, 15:24
Post #5





QUOTE (Garf @ Mar 17 2010, 10:12) *
I thought the problem of the HF energy being increased by the quantization in noise normalization was fixed long ago in aoTuv, so I wonder if what you're hearing has anything at all to do with it to begin with.

The change at -q4 might be due to different stereo models, to give an example.


I did these tests using libvorbis 1.2.2 (the version included with Foobar), which I thought had all of the more important aoTuv stuff folded in.

As far as stereo models, I thought that only -q 6 and above use lossless stereo coupling.
Garf
post Mar 17 2010, 15:26
Post #6


QUOTE (dsimcha @ Mar 17 2010, 15:24) *
As far as stereo models, I thought that only -q 6 and above use lossless stereo coupling.


Yes, which implies below -q 6 the stereo isn't lossless and can use different frequency thresholds :)
lvqcl
post Mar 17 2010, 16:42
Post #7





QUOTE (dsimcha)
I did these tests using libvorbis 1.2.2 (the version included with Foobar)

?? foobar2000 doesn't have any encoder included.

QUOTE
which I thought had all of the more important aoTuv stuff folded in

IIRC official libvorbis still incorporates aoTuV beta2 code (i.e. the same as libvorbis 1.1.0).
dsimcha
post Mar 17 2010, 17:16
Post #8





QUOTE (lvqcl @ Mar 17 2010, 11:42) *
QUOTE (dsimcha)
I did these tests using libvorbis 1.2.2 (the version included with Foobar)

?? foobar2000 doesn't have any encoder included.


My bad. It does have a UI for one though. I forgot that I had set this up a long time ago. Will see if this has been corrected in the most recent versions of AoTuv.
googlebot
post Mar 17 2010, 17:27
Post #9





I never understood why aoTuV has been there so long. It's not a classical experimental branch, but widely used as a preferable choice. So why doesn't the author commit to mainline directly? He seems to know what he is doing.
dsimcha
post Mar 17 2010, 17:28
Post #10





QUOTE (Garf @ Mar 17 2010, 10:12) *
I thought the problem of the HF energy being increased by the quantization in noise normalization was fixed long ago in aoTuv, so I wonder if what you're hearing has anything at all to do with it to begin with.


BTW, can you please define "long ago"? Do you mean before Beta 2, long enough ago that libvorbis would have incorporated it, or not?
dsimcha
post Mar 17 2010, 17:48
Post #11





QUOTE (googlebot @ Mar 17 2010, 12:27) *
I never understood why aoTuV has been there so long. It's not a classical experimental branch, but widely used as a preferable choice. So why doesn't the author commit to mainline directly? He seems to know what he is doing.


Yeah, I haven't kept up with the development of Vorbis for the past few years. I'm just starting to catch up now. Until today I was under the impression that aoTuV was a normal experimental branch and that everything worthwhile had been merged into libvorbis 1.1 and 1.2.
lvqcl
post Mar 17 2010, 18:02
Post #12





Changes described at http://www.geocities.jp/aoyoume/aotuv/ and http://www.geocities.jp/aoyoume/aotuv/old_beta.html :

QUOTE
aoTuV Beta5.5:
# Noise Normalization was reviewed. As a result, the bug is revised.

aoTuV Beta5:
# The action of noise normalization has been improved. This has an effect in the sound roughness and tremor problem etc. in the low bitrate.

Beta4:
# Tuning of Masking relation and Noise Normalization. These mainly influence balance and the quantity of distortion which can be heard.
dsimcha
post Mar 18 2010, 01:03
Post #13





QUOTE (lvqcl @ Mar 17 2010, 13:02) *
Changes described at http://www.geocities.jp/aoyoume/aotuv/ and http://www.geocities.jp/aoyoume/aotuv/old_beta.html :

QUOTE
aoTuV Beta5.5:
# Noise Normalization was reviewed. As a result, the bug is revised.

aoTuV Beta5:
# The action of noise normalization has been improved. This has an effect in the sound roughness and tremor problem etc. in the low bitrate.

Beta4:
# Tuning of Masking relation and Noise Normalization. These mainly influence balance and the quantity of distortion which can be heard.



Thanks. Tried aoTuv and it's truly amazing. Now I can't even consistently ABX -q 2. I can (barely) get it on some songs with lots and lots of high frequency content, but it's hard enough that I'm confident it will be transparent in more casual listening. I keep everything as FLAC anyhow on my hard drive, since storage on a PC is so cheap and plentiful that it's not even worth taking a chance of quality loss here. The Vorbis files are only for my portable player, where I listen with crappy headphones in crappy listening environments anyhow.

Now, to find more stuff to put into my newfound space on my Sansa...
HotshotGG
post Mar 18 2010, 14:13
Post #14





QUOTE
I've become very curious about how Vorbis noise normalization works. My (very vague) understanding is that it somehow avoids the metallic artifacts of MP3 and makes Vorbis's degradation when perceptual transparency can't be achieved sound much more like (less annoying) Gaussian noise. I can't seem to find a much more detailed explanation anywhere, though. Can someone please either explain it or point me to a decent explanation? I'm looking for a moderately technical explanation. Ideally, I want one that is intended for people with significant math background, etc., but is an overview rather than describing every small detail to someone who might want to implement noise normalization.


SebastianG summed it up pretty well. In addition to that, it uses a sorting technique (probably a quicksort or a bubblesort) to redistribute the energy to neighboring bands. It's by-band noise energy.


--------------------
College student/IT Assistant

 


