Lame 3.99.5z, a functional extension
Lame 3.99.5z, a functional extension
Sep 18 2012, 23:06
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015
You can download it from here.
Whatís the functional extension?
It offers VBR quality settings -V3+ to -V0+ and -V0+eco (economic version of -V0+).
What are -Vn+ and -V0+eco good for?
They improve pre-echo behavior.
Beyond that, they combine the quality advantages of VBR (regarding pre-echo) with the quality advantages of CBR/ABR (with respect to ringing and other tonal issues).
Lame users can be classified into three categories:
a) Users who donít care about rare quality issues and/or care much about small file size.
The common way for these users to work with Lame is to use -V5, -V4, or similar.
b) Users who donít like to have obvious and especially ugly issues in their music even when theyíre rare but who care about file size as well.
The common way for these users to work with Lame is to use -V2, -V3, or similar.
CBR 192 or similar, or ABR in this bitrate range, is an alternative (but seldom used).
c) Users who want overall transparency or at least a quality which comes close to it, and who donít care much about file size.
The common way for these users to work with Lame is to use -V0, -V1, or similar as a VBR method, or to use CBR 320 or 256. Using very high bitrate ABR is an alternative (seldom used).
For users of group b) and c) -Vn+/-V0+eco offers significant quality advantages:
We have two major issue classes with most of the lossy codecs:
- temporal smearing (pre-echo) issues
- ringing (tremolo) and other tonal issues.
Letís look at the worst samples I know for these classes:
- eig (extremely strong temporal smearing)
- lead-voice (extreme ringing, for instance at sec. 0..2)
With samples like these users of group b) canít be very happy when using -V2 or -V3, because the ringing issues are very obvious and ugly. The temporal smearing of eig is pretty obvious as well, especially around sec. 3. Using CBR/ABR 192 or similar is a good procedure to fight the ringing, but temporal smearing is much worse than with VBR, itís real ugly.
Things donít really change when using slightly increased quality settings.
For users of group c) itís exactly the same thing, with quality requirements and quality received just both on a higher level.
So the traditional way of doing things isnít totally satisfactory.
Users of group b) can use -V3+ or -V2+ (recommended) and get much better results in the overall view.
Users of group c) can use -V1+ or -V0+eco (recommended) or -V0+ (recommended for the paranoid like me) and get transparency or close-to-transparency. Sure itís impossible to prove transparency for the universe of music, but itís true for the samples mentioned. And as these are very outstanding samples within their problem classes and because of the technical details of -Vn+ described below itís plausible that the approach works rather universally.
How is it done?
-Vn+ uses -Vn internally (-V0+eco uses -V0), but the accuracy demands for short blocks are increased. Short blocks are used when the encoder takes care of good temporal resolution. Audio data bitrate is kept rather high also with long blocks which are normally used.
These audio data requirements are helpful for any kind of problem, they are not restricted to ringing or pre-echo issues.
Moreover a strategy is used which is targeting at providing close to maximum possible audio data space for short blocks.
Whatís the price to pay?
Compared to -Vn the increased accuracy demands of -Vn+ raise average bitrate. As -Vn+ is targeting at significant quality improvements compared to -V2 for real bad samples, we need an average bitrate around 200 kbps at least.
-V3+ and -V2+ are designed for users of group b) above, and as such take care of average bitrate not to be much higher than 200 kbps. For my test set of various pop music average bitrate is 205 kbps for -V3+, and 217 kbps for -V2+.
For users of group c) I allow for the full quality resp. average bitrate range mp3 can offer.
-V1+ takes an average bitrate of 257 kbps for my test set, -V0+ takes 317 kbps.
-V0+eco (economic version of -V0+) takes 266 kbps. So -V0+eco comes nearly for free as -V0 takes 260 kbps for my test set.
Unlike versions I published before, mp3packer isnít really needed any more to squeeze the unused bits out of the mp3 file (with the exception of fractional settings like -V0.5+ between -V1+ and -V0+).
mp3packer brings average bitrate down by only 1 kbps maximum for -Vn+ between -V3+ and -V2+, by 1 to 2 kbps for -Vn+ between -V2+ and -V1+, and by 2 kbps for -V0+ and -V0+eco. So I think we can forget about mp3packer with these settings.
sets the minimum audio data bitrate for short blocks to x [kbps] when using -Vn+ or -V0+eco, with x in the range 150..450.
Defaults are 360,370,420,440,440 kbps for -V3+,-V2+,-V1+,-V0+eco,-V0+ resp.
sets the minimum audio data bitrate for long blocks to x [kbps] when using -Vn+ or -V0+eco, with x in the range 110..310.
Defaults are 160,170,215,220,290 kbps for -V3+,-V2+,-V1+,-V0+eco,-V0+ resp.
prints detailed information for each frame (L/R or M/S representation, blocktype of both granules, available audio data bits, audio data bits used, etc.). Works for both -Vn and -Vn+.
lame3100m -V1 --insane-factor 0.75
Oct 28 2012, 11:33
Joined: 17-September 06
Member No.: 35307
I haven't tried --frameAnalysis but I was thinking more about digging into problem samples to try to detect certain problems like highly tonal signals during short blocks and coming up with a detection algorithm that could be rolled back into general LAME VBR codebase, which means that the original PCM, the MP3 decoder output and the FFT analysis spectrum would be useful pointers - and just the sort of things mp3x provided, IIRC.
I had the impression that mp3x was once very closely linked to LAME encoder up until around 3.90 time (especially before Dibrom's --alt-preset standard work when early investigations were tuning it from poor performance against the then best-in-class like FhG). But I suspect mp3x now is more than just a compile-time switch in the lame codebase, and would probably need an unjustifiable amount of work to get it running. Maybe if I find time to look into this, I'll set up a compiler, possibly under Linux until I can build a current standard LAME compile or the 3.99.5z version myself, then I can break out at short blocks and dump out the info I'm interested in to plot in LibreOffice Calc (or MS Excel) and try tweaking the code to trigger extra high bitrate only for problem samples and perhaps generate some samples artificially to ABX the correct threshold in the absence of other variables. I might even search for a very old mp3x version and see if provides enough info to germinate some ideas before I put in the effort of getting myself set up to compile the current LAME code frequently.
I'd be hopeful that it must be algorithmically determinable, and thereby of cracking a few of the problem samples like trumpet and Angels Fall First by specific analysis used only when short-blocks are active, and of doing so without increasing LAME VBR bitrate on general currently-transparent music. It clearly won't help herding calls, which only uses long blocks. Presumably that either needs a different short-block detection threshold (unlikely from the sound of it) or needs more bits in the long blocks for some frequency resolution reason that the psymodel is missing.
This work might be some time away, as I have too much going on in life for now, but I'm quite keen to give it a go in some downtime.
|Lo-Fi Version||Time is now: 25th January 2015 - 11:30|