Lame 3.99.5x, a functional extension

2012-03-06 10:49:41

I finished work on 3.99.5x. It can be downloaded from here.

The functional extension gives Lame the possibility of further improving audio quality. It is invoked by using -Vx+ instead of -Vx. By design, the audio quality of -Vx+ is equal to or better than that of -Vx, at the expense of an increased average bitrate. In a listening test with the previous version 3.99.3x I found that -V5+ isn’t worthwhile: the average bitrate equivalent -V4.75 yielded better overall quality. With -V2+, however, the overall quality was better than that of the bitrate equivalent -V1.55. Using -V0+ I could not ABX the result against the original for some samples that I was able to ABX using -V0. I’m talking about severe problem samples here; usually -V0 is absolutely fine.

3.99.5x is 3.99.3x ported to 3.99.5 with one addition, which further improves quality for the quality levels above -V2+ up to -V0+. Compared to 3.99.3x, quality (and average bitrate) is increased a bit in this -V level range.

How to use
You can use lame3995x.exe the way you use lame.exe. Just use -Vx+ instead of -Vx.
Lame3995x.exe is compiled with Visual C++ 2010. For this reason it is necessary to install the Microsoft Visual C++ 2010 Redistributable Package (vcredist_x86.exe).
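As a minimal sketch of the usage described above, the following Python snippet builds the command line for a -Vx+ encode. The function name and the file names are made up for illustration; the only things taken from the post are the executable name and the -Vx+ switch, and the input/output order follows the usual lame.exe convention.

```python
def build_lame_command(input_wav, output_mp3, v_level="0", exe="lame3995x.exe"):
    """Return the argument list for an encode that uses -Vx+ instead of -Vx.

    input_wav / output_mp3 are hypothetical file names; v_level is the
    number that follows -V (for example "0", "2" or "1.5").
    """
    return [exe, f"-V{v_level}+", input_wav, output_mp3]


cmd = build_lame_command("track.wav", "track.mp3", v_level="2")
print(" ".join(cmd))  # lame3995x.exe -V2+ track.wav track.mp3

# To actually run the encoder (requires lame3995x.exe on the PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```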

The functional extension - technically speaking
a) inaccurately encoded frames are avoided
Compared to 3.98, Lame 3.99 does a good job of avoiding inaccurate frames, which are produced when there is not enough audio data space available to fulfill the VBR quality demands. 3.99.5x goes a bit beyond that. -V0+ usually avoids roughly 50 per cent of the out-of-data-space situations left over from -V0. 3.99.5x does so primarily by using a different frame packaging strategy which keeps the bit reservoir close to the maximum possible value.
b) keeping a minimum audio data bitrate
Lame 3.99.5x controls the audio data bitrate and keeps it above a certain value depending on the -V level. The default minimum audio data bitrate can be overridden by using --adbr_min x.
Minimum audio data bitrate control is done by adjusting the masking threshold in the quantization stage (as was formerly possible by using the --ns-bass/alto/treble switches, which are now used by Lame internally). This masking variation is controlled in a way that doesn’t worsen the inaccurate frame situation.
c) masking and hearing threshold of the psymodel analysis stage are made more demanding
The fact that roughly 50% of the inaccurately encoded frames are avoided gives some headroom for increasing the psy model demands a bit.
For the very high quality -Vx+ levels above -V2+ it is the V-level dependent psy model parameters 'masking_lower' and 'ATHfixpoint' that are made more demanding. Usually this eats up a minor amount of the improved inaccurate frames avoidance, and it was tuned to never worsen the inaccurate frame situation of the corresponding -Vx usage for all the samples I tested.
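The --adbr_min override from item b) is passed like any other switch. Here is a small sketch in Python that adds it to an existing argument list. The helper name, the file names and the value 150 are purely illustrative; the post does not state the unit of x (presumably kbps) or the defaults per -V level, so treat the number as a placeholder.

```python
def with_adbr_min(cmd, adbr_min):
    """Return a copy of a lame3995x argument list with --adbr_min inserted.

    cmd[0] is assumed to be the executable name; the override is placed
    right after it, in front of the other options and the file names.
    """
    return [cmd[0], "--adbr_min", str(adbr_min)] + list(cmd[1:])


base = ["lame3995x.exe", "-V3+", "track.wav", "track.mp3"]  # hypothetical files
print(" ".join(with_adbr_min(base, 150)))  # 150 is a placeholder value
```

The original list is left untouched, so the same base command can be reused with different minimum values when comparing results.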

The functional extension - properties of the various -Vx+ levels
The functional extension works from -V7.5+ to -V0+.
For levels -V7.5+ to -V2+ it is assumed that users care much about quality and efficiency. That's why the bitrate increase from -Vx to -Vx+ is very moderate in this quality level range. Nearly nothing is done for avoiding inaccurate frames - it's not necessary here. Minimum audio data bitrate requirements are not very demanding. Masking and hearing threshold of the psymodel analysis stage are not touched.
For the levels above -V2+ up to -V0+ it is assumed that users care very much about quality, but not about efficiency. That's why this quality level range covers the average bitrate range from 200 up to nearly 320 kbps. -Vx+ uses a more demanding -V level internally. -V1.5+ for instance makes internal use of -V1, -V1+ uses -V0 internally. Masking and hearing threshold of the psymodel analysis stage are more demanding in this quality level range.

Optimizing output - a suggestion
For the levels above -V2+ up to -V0+ expect 9 to 12 kbps in the output to be used for nothing. This is a consequence of the new frame packaging strategy as well as the fact that a lot of 320 kbps frames are used which cannot efficiently be filled entirely with audio data (same problem as with CBR 320). In case this bothers you, use the fast and lossless mp3packer tool, which squeezes the unused data out of the file. It is comfortable to use with a .bat file invokable from Explorer that works on an entire folder containing mp3s.
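For those who prefer a script over a .bat file, here is a rough Python sketch of the folder-wide repacking idea. It only builds the command lines. It assumes mp3packer is called as "mp3packer in.mp3 out.mp3" with no options; check this against the tool's own help before relying on it. The function name and the "packed" subfolder are made up for illustration.

```python
from pathlib import Path


def repack_commands(folder, out_subdir="packed", exe="mp3packer"):
    """Build one mp3packer command per .mp3 file found in `folder`.

    Assumption: mp3packer accepts "<exe> <input> <output>" with no options.
    Output files get the same name inside a hypothetical subfolder.
    """
    folder = Path(folder)
    out_dir = folder / out_subdir
    commands = []
    for src in sorted(folder.glob("*.mp3")):
        commands.append([exe, str(src), str(out_dir / src.name)])
    return commands


# To actually repack (requires mp3packer on the PATH):
#   import subprocess
#   (Path("my_album") / "packed").mkdir(exist_ok=True)
#   for cmd in repack_commands("my_album"):
#       subprocess.run(cmd, check=True)
```

Writing to a separate subfolder keeps the original encodes around until the repacked files have been checked.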

A listening test
Instead of repeating the listening test of 3.99.3x, the results of which are expected to carry over to 3.99.5x, I compared 3.99.5 -V0 to the average bitrate equivalent 3.99.5x -V1.1+.
The sec. 3.0 issue of eig was audible with either candidate to the same degree.
I was able to ABX harp40_1 and herding_calls with both variants, but it was hard, maybe a tiny bit harder using -V1.1+.
lead-voice was fine with -V1.1+, but easily ABXable using -V0.
I was able to ABX the tremolo of trumpet_myPrince with both versions.
I did not get sufficiently good ABX results for sample 1 of the last 128 kbps mp3 listening test, but my feeling is that it is ABXable (with both variants).
I also was not able to ABX trumpet with either version (though I could ABX 3.99.3 -V0 in my 3.99.3x test).
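For readers wondering what "sufficiently good" ABX results means in numbers: the usual yardstick is the probability of reaching a given score by pure guessing. The sketch below computes that one-sided binomial probability. The trial counts in the example are illustrative only; the post does not give the actual scores.

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of getting at least `correct` of `trials` right by guessing.

    Each ABX trial is treated as a fair coin flip, so this is the upper
    tail of a binomial distribution with p = 0.5.
    """
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


# Illustrative score, not taken from the test above: 12 of 16 correct.
print(round(abx_p_value(12, 16), 3))  # 0.038
```

A result is commonly counted as a successful ABX when this probability falls below 0.05.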

In conclusion, the tremolo issue of samples like lead-voice or trumpet_myPrince is the only problem I know of where Lame has significant room for improvement. trumpet_myPrince shows that the -Vx+ bitrate equivalent to -V0 doesn’t solve the issue. It is solved for lead-voice because this is a mono sample, where the minimum bitrate feature of -Vx+ has a lot more headroom for quality improvement.
So for getting what -Vx+ is intended to achieve it is recommended to use a higher quality -Vx+ level like -V0+, which solves the tremolo issue of trumpet_myPrince as well as the issues of some other problem samples (see the 3.99.3x test). This is where 3.99x shines.

Reply #1 – 2012-03-06 10:55:43

Why not implement such mp3packer functionality into LAME itself?

Does mp3packer have optimized Huffman compression or something?

Reply #2 – 2012-03-06 11:14:23

I was thinking about that and had a look at the mp3packer source, only to find out that it's quite a complex thing, at least to me, not to mention the programming language used.
And it wouldn't be too good an idea anyway. By keeping it a separate procedure you can always get the benefits of newer versions independent of Lame399x development. And usage with a bat file invokable from the explorer's context menu to work on an entire folder is really a very comfortable procedure.
Last but not least: it's not a major issue having roughly 10 kbps of unused data in the encoded file (9 kbps (less than 3%) when using -V0+ with my test set).

AFAIK Omion's great repacking tool when used with no options does not touch the audio data at all. It just repackages them into another frame container structure ignoring the unused data space. Audio data optimization is done when using mp3packer's -z switch, but it's not necessary in our context, and it's slow.
