Topic: Lame 3.99.5z, a functional extension

Lame 3.99.5z, a functional extension

Reply #75
Thanks for the pointer, Robert.

I've grabbed 3.100 alpha 2 from rarewares.org / LAME bundles so at least I can see how the latest version handles certain problem samples before I plough ahead.
Dynamic – the artist formerly known as DickD


Reply #76
This thread has me curious: Robert, have you considered incorporating some of halb27's "minimal bitrate" approaches to combat some of the issues discussed in this thread?
I suppose a superior approach would be to tweak the psychoacoustic model itself so that it better handles such samples.  But I can imagine how difficult it must be to do so without degrading the model's performance elsewhere.


Reply #77
BFG, I can't speak for Robert, but my own thoughts are along the psymodel lines.
I think that:

a fairly large proportion of the cases where Lame3.99.5 -Vn has problems that halb27's -Vn+ version fixes consist of a sharp transient (highly localized in the time domain but spread out in the frequency domain) occurring simultaneously with a tonal signal (highly localized in the frequency domain but spread out in the time domain).

The time-frequency product tradeoff (localized in one means spread out in the other) is analogous to Heisenberg's Uncertainty Principle (Δt·Δf ~ constant).
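As a rough numerical illustration of that tradeoff, the sketch below computes the time span and bin width of the two MPEG-1 Layer III block types (576 frequency lines for a long block, 192 for each short block). The 44.1 kHz sample rate and the function name are assumptions made for the example, and the time span counts only the new samples per block, ignoring the overlap of the MDCT windows.

```python
# Time/frequency resolution of the two MPEG-1 Layer III block types.
# Assumption: 44.1 kHz sampling; other rates scale the same way.
SAMPLE_RATE_HZ = 44_100


def block_resolution(freq_lines, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return (time span in seconds, bin width in Hz) for a block with `freq_lines` lines."""
    time_span_s = freq_lines / sample_rate_hz          # new samples covered by one block
    bin_width_hz = (sample_rate_hz / 2) / freq_lines   # 0..Nyquist split evenly
    return time_span_s, bin_width_hz


for name, lines in (("long", 576), ("short", 192)):
    dt, df = block_resolution(lines)
    print(f"{name:5s} block: {dt * 1000:6.2f} ms, {df:7.2f} Hz per line, product {dt * df:.2f}")
```

Under these assumptions a long block covers about 13.06 ms at 38.28 Hz per line and a short block about 4.35 ms at 114.84 Hz per line. The product is 0.5 in both cases: better time resolution is paid for with proportionally worse frequency resolution.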

The mathematics of transforms such as MDCT (or FT) means that:

if you have a long block, you have a lot of frequency bins, each of which is fairly narrow in bandwidth, allowing fairly precise reproduction of long-duration tonal signals (localized peaks in the frequency domain) even with relatively imprecise values* stored for each frequency bin (the imprecision implies lower bit-depth and hence lower bitrate). As these tonal signals are spread out in the time domain, any time-domain variation is slow enough not to need precise representation.

*for a Fourier transform these frequency-domain values are complex numbers, carrying information about both amplitude and phase; MDCT coefficients are real, with the phase information spread implicitly across neighbouring bins and overlapping blocks. Either way, values from neighbouring bins interfere when transformed back into the time domain, allowing reproduction of frequencies more precisely defined than the bin-width itself.

Still in a long block, if you have an event that is localized in time, such as a transient, you can reproduce it, but the values of each frequency bin need much greater precision so that they sum together in the time domain with the correct phase. Otherwise the time-localization is lost and the event is smeared out like a soft noise over a longer time (which produces pre-echo and post-echo, though post-echo is more readily masked). Such precision (or bit-depth) over so many frequency bins requires a high bitrate.

An alternative is to detect these time-localized transients and split the time into, say, three short blocks. There are now fewer frequency bins in each short block (each having greater bandwidth) but there's less smearing in time (the maximum smearing being the duration of the whole short block), and sufficient time-localization can be achieved with a modest precision of the values for each frequency bin, thus a modest bit-depth and bitrate (at the expense of frequency-smearing). As time-localized signals are frequency-unlocalized (broad spectrum, noiselike) that's often not a problem.
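The effect can be reproduced with a toy experiment. The sketch below is not LAME's filterbank: it uses a plain orthonormal DCT per block as a stand-in for the MDCT (no windowing, no overlap), a single click as the transient, and one uniform quantizer step for every coefficient. Block lengths, the step size and all names are assumptions chosen for illustration.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a list of floats (direct O(N^2) evaluation)."""
    n = len(x)
    return [
        math.sqrt((1 if k == 0 else 2) / n)
        * sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(x))
        for k in range(n)
    ]


def idct(c):
    """Inverse of dct() above."""
    n = len(c)
    return [
        sum(
            math.sqrt((1 if k == 0 else 2) / n) * ck * math.cos(math.pi * (i + 0.5) * k / n)
            for k, ck in enumerate(c)
        )
        for i in range(n)
    ]


def code_signal(x, block_len, step):
    """Split x into blocks, quantize each block's coefficients to multiples of `step`, decode."""
    out = []
    for start in range(0, len(x), block_len):
        coeffs = dct(x[start:start + block_len])
        out.extend(idct([round(c / step) * step for c in coeffs]))
    return out


def energy(values):
    return sum(v * v for v in values)


# 48 samples of silence with a single click at sample 40.
signal = [0.0] * 48
signal[40] = 1.0

STEP = 0.05  # hypothetical coarse quantizer step, identical for both block sizes
long_out = code_signal(signal, block_len=48, step=STEP)   # one long block
short_out = code_signal(signal, block_len=16, step=STEP)  # three short blocks

# Samples 0..31 precede the short block that contains the click; the input is
# silent there, so any energy in the decoded output is pre-echo.
pre_echo_long = energy(long_out[:32])
pre_echo_short = energy(short_out[:32])
print("pre-echo energy, long block:  ", pre_echo_long)
print("pre-echo energy, short blocks:", pre_echo_short)
```

With the long block, the quantization error is spread over all 48 samples, including the silence well before the click. With short blocks, the two leading blocks are pure silence and decode to exact zeros, so the error stays inside the 16-sample block that contains the click.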

If there is a tonal (frequency-localized but time-smeared) signal to be represented within the short block that we don't think will be masked by the loud transient, its frequency can be reproduced more accurately only by increasing the precision of the values for each of the frequency bins (because the summation of interfering components of neighbouring broad-bandwidth frequency bins, when we convert back to the time domain, will then accurately preserve the frequency and phase of the tonal signal). This greater precision, as before, requires greater bit-depth to represent the values in the transform domain and thus higher bitrate.

It's this latter case that -Vn+ seems to solve, but it doesn't currently detect that there actually IS an important tonal component that isn't masked by the transient (pre-masking and post-masking). It just assumes that there might be, so, to be on the safe side, it employs a much higher bitrate (much higher precision of bin values) during all short blocks.

For any encoder, with enough processing time, it should be possible to derive an extra measurement on the analysis FFT in the psymodel, but only do the check once short blocks have been triggered (and only test the check on short blocks). That check would look for tonal signals (frequency-localized) during these short blocks, and probably during the switching windows too (long->short and short->long), to determine whether any of them might not be masked entirely by the transient and whether they require higher precision in the transform-domain quantization (and thus higher bitrate) to maintain their frequency precision despite the wide bin-width.

It might be possible to determine a suitable mathematical function to determine the required quantization precision from listening tests on tone+transient signals of varying relative amplitudes (and varying tone frequency ranges) and to build in enough margin of safety to account for practical limitations arising from window functions and the like, or failing that to simply determine a threshold of 'tonality' that triggers the encoder to turn up the precision to the maximum for the affected short blocks.

Either way would mainly solve the problem cases without boosting bitrate for many general unproblematic short blocks, which is the efficient approach normally adopted in LAME VBR tuning.
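One simple way to picture the "threshold of tonality" variant is a spectral flatness measure on the analysis spectrum of a short block. The sketch below is only an illustration of that idea and not what LAME's psymodel does: the flatness measure is a swapped-in, generic technique, the threshold value and all function names are hypothetical, and masking by the transient is not modelled at all.

```python
import cmath
import math


def power_spectrum(block):
    """Power of DFT bins 1 .. N/2-1 (DC and Nyquist skipped), via a direct O(N^2) DFT."""
    n = len(block)
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(block))) ** 2
        for k in range(1, n // 2)
    ]


def spectral_flatness(block, floor=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum.

    Close to 1 for a flat (noise-like or click-like) spectrum,
    close to 0 when a few narrow peaks dominate (tonal).
    """
    powers = [p + floor for p in power_spectrum(block)]
    log_mean = sum(math.log(p) for p in powers) / len(powers)
    return math.exp(log_mean) / (sum(powers) / len(powers))


# Hypothetical threshold; a usable value would have to come from listening tests.
TONALITY_THRESHOLD = 0.3


def needs_extra_precision(block):
    """Hypothetical rule: flag a short block whose spectrum is dominated by narrow peaks."""
    return spectral_flatness(block) < TONALITY_THRESHOLD


N = 64
click = [1.0 if i == 5 else 0.0 for i in range(N)]
tone = [math.sin(2 * math.pi * 8 * i / N) for i in range(N)]
click_plus_tone = [c + 0.5 * t for c, t in zip(click, tone)]

for name, block in (("click", click), ("tone", tone), ("click + tone", click_plus_tone)):
    print(f"{name:12s} flatness={spectral_flatness(block):.3f} flag={needs_extra_precision(block)}")
```

A bare click has a flat spectrum and is left alone, while the click with a tone underneath is flagged. A real check would additionally have to decide whether the tone is audible next to the transient, which is the part that needs listening tests.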

Robert has improved the lead-voice problem sample in the latest 3.100 alpha, which I'd have put into this category, so I'll do some keen listening tests to see what might be fixed. From a quick look, the diffs for the latest psymodel.c seem to include a good deal of stuff relating to tonality measures, so I'm hopeful that a lot of the problem samples are going to be hard to ABX using 3.100 alpha2 when I get time to try.

There remain some problems that don't fit this tonal+transient during short-block description, so halb27's -Vn+ modes will still have mileage while the psymodel hasn't fixed them.
Dynamic – the artist formerly known as DickD


Reply #78
"BFG, I can't speak for Robert, but my own thoughts are along the psymodel lines. I think that: [...]"

Thanks for the explanation Dynamic; I'll need to read through it a couple more times to ensure I fully understand it.
I have a lot of interest in this area, but not much understanding!

In the meantime, I have a (perhaps silly) question: would adding a twopass system to LAME help the tonal or sharp attack problems in any way?
That is, if LAME already knew what the data in future frames looked like, would it be able to more accurately encode the current frames and/or anticipate tonal or sharp attack problems?


Reply #79
Yes, a lowpass can help.
lame3995o -Q1.7 --lowpass 17


Reply #80
"Yes, a lowpass can help."

Twopass, not lowpass...that is, fully analyzing the file once, and THEN going back and compressing it.
I don't know if it's a viable option for LAME but I have seen it work wonders for DivX and other lossy video codecs.


Reply #81
Sorry.
A twopass system could help a bit to better manage potential out-of-data-space situations. I reduce the accuracy requirements a bit for a frame with granules of type start and short because there is a chance that the next frame will have a short block too. If I knew that, I could do better. But it's of minor concern IMO because I think it's best to provide a large data space for the first frame in a sequence of short blocks. From the second frame on we can't have much more than the data space of a 320 kbps frame. As the accuracy reduction for a startshort frame isn't severe (and can be personalized down to 0), I don't think it's worthwhile going through the pain of developing a twopass system and accepting the speed penalty when using the encoder.
lame3995o -Q1.7 --lowpass 17


Reply #82
"A twopass system could help a bit to better manage potential out of data space situations..."

I suppose I was suggesting a twopass more as part of the "primary" LAME build, not your variant.  I agree with your comments, however; I can only see a twopass system being beneficial if a person is trying to create the most accurate MP3 possible, given the 320kbps limit.  In your variant, I would think the additional time requirement would only be worthwhile for -V0+ or higher.


Reply #83
I think what I said holds true for mp3 in general.
AFAIK video encoding is essentially based on the relation between consecutive pictures in the video. Something similar holds for lossless audio encoding, where the current wave sample is encoded as a formula-based prediction from the previous samples plus a coding of the prediction error. In these cases a multipass system might help.
Lossy codecs like mp3, however, partition a track into frames, each of which is encoded individually. The encoding process of the current frame does not depend on the contents of the neighboring frames. So what should a multipass process do? The only exception to this is bitreservoir usage. A multipass system can in principle lead to better bitreservoir management. But for quality settings that are not very high, running out of data space is a minor issue. Even with very high quality settings we are hard-pressed to run out of data space for tonal frames, and we are hard-pressed not to run out of data space in case of a sequence of several short block granules, and a single-pass bitreservoir management for just one short block granule is possible (I do that with the extension variant).
So within the mp3 framework, it would be hard to find a niche where a multipass process could bring a real advantage.
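To make the bitreservoir argument concrete, here is a deliberately simplified single-pass model. It assumes MPEG-1 Layer III at 44.1 kHz stereo with every frame written at 320 kbps, which leaves roughly 8071 bits of audio payload per frame after header and side info, and a reservoir capped at the 511 bytes the format can address. Decoder buffer constraints are ignored, and the demand figures and function name are made up for illustration.

```python
FRAME_PAYLOAD_BITS = 8071   # approx. audio bits in one 320 kbps frame (assumption: 44.1 kHz stereo, no CRC)
RESERVOIR_MAX_BITS = 4088   # 511 bytes, the most the 9-bit main_data_begin pointer can address


def encode_sequence(demands, reservoir=RESERVOIR_MAX_BITS):
    """Grant each frame min(demand, payload + reservoir); unused bits refill the reservoir."""
    granted = []
    for demand in demands:
        available = FRAME_PAYLOAD_BITS + reservoir
        used = min(demand, available)
        granted.append(used)
        reservoir = min(RESERVOIR_MAX_BITS, available - used)
    return granted


# Three demanding short-block frames in a row, followed by an easy frame
# (hypothetical demands in bits).
demands = [12000, 12000, 12000, 3000]
print(encode_sequence(demands))  # -> [12000, 8230, 8071, 3000]
```

Only the first frame of the run gets its full demand. After that the reservoir is empty and each frame is limited to its own payload, no matter how much the encoder knows in advance about the frames to come.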
lame3995o -Q1.7 --lowpass 17


Reply #84
I'm back having finished everything on my mind.

As I said in a previous post I wanted to get rid of unused bits by either filling up the unused bits with audio data or giving away the unused bits together with some audio data, whichever seems most appropriate. I was successful in the 270-300 kbps average bitrate range, but I did not succeed in generalizing the approach to the entire bitrate spectrum.
I had a hard time thinking it over, and finally dropped the idea. After all, filling up unused bits with audio data produced for only this reason, or (worse in terms of quality) giving away audio data just because of this, is a bit questionable.
So I did something else: I call Omion's fast and lossless mp3packer tool from within lame3995f to do the job. All it takes is to put mp3packer.exe into lame3995f's folder. Of course lame3995f works also without having put mp3packer there, but the mp3 files will be somewhat larger.

While trying to realize my ultimately unsuccessful approach I had to care about speed, because filling the otherwise unused bits with audio data turned out to be an iterative process. As a result I redesigned the iterative process for arriving at the target bitrate. Though some of the other things I did are not in favor of speed, the final speed increase will be noticeable, especially with the higher quality settings.

With these changes I could bring things to the limits: keeping bitreservoir high is done now without any compromise, and target bitrate for short blocks is 460 kbps now for any -Vn+.
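As a back-of-the-envelope check on the order of magnitude of that figure, the sketch below computes how much audio data a single frame can carry at most when it is written at 320 kbps and can also draw on a completely full reservoir. It assumes MPEG-1 Layer III at 44.1 kHz stereo without CRC; whether the 460 kbps target was derived this way is not stated in the post.

```python
SAMPLE_RATE_HZ = 44_100
SAMPLES_PER_FRAME = 1152                     # MPEG-1 Layer III
MAX_BITRATE_BPS = 320_000
HEADER_AND_SIDE_INFO_BITS = (4 + 32) * 8     # 4-byte header + 32-byte side info (stereo, no CRC)
RESERVOIR_MAX_BITS = 511 * 8                 # main_data_begin is a 9-bit byte offset

frame_duration_s = SAMPLES_PER_FRAME / SAMPLE_RATE_HZ
frame_bits = MAX_BITRATE_BPS * frame_duration_s

# Audio bits available to one frame: its own payload plus a full reservoir.
peak_audio_bits = frame_bits - HEADER_AND_SIDE_INFO_BITS + RESERVOIR_MAX_BITS
peak_kbps = peak_audio_bits / frame_duration_s / 1000

print(f"frame duration: {frame_duration_s * 1000:.2f} ms")
print(f"peak audio data rate for a single frame: {peak_kbps:.0f} kbps")
```

Under these assumptions the ceiling comes out at roughly 465 kbps, so a 460 kbps target for short blocks sits just below what one frame plus a full reservoir can hold.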

I did a lot of listening tests to decide where to put the internal parameters for the various -Vn+ levels, that is, which average bitrates are appropriate overall. Sure, this is a matter of taste to a certain extent. I ended up with the following system:
Users who care about universal good quality in the first place, but also about file size (the typical -V2 users) need a bitrate of 200 kbps +/- ~20 kbps according to their needs. So I designed -V5+ to -V2+ to be in this range.
Users who care more or less only about universal top quality can use -V1+ (~257 kbps) or -V0+ (~300 kbps).

You can use the --adbr_xxxx options to finetune longblock and shortblock behavior according to your needs.
Please tell me if you have other ideas of --adbr_xxxx defaults for the various -Vn+.
In the final 3995f version to be published in a new thread I'd like to give away the --adbr_xxxx options with the exception of --adbr_long. Please tell me if you don't agree.

Here is 3.99.5f for download.
lame3995o -Q1.7 --lowpass 17

 


Reply #85
"In the final 3995f version to be published in a new thread I'd like to give away the --adbr_xxxx options with the exception of --adbr_long. Please tell me if you don't agree."

Do you mean by that: to remove the other options? I only experimented with --adbr_long (and --lowpass) on the previous versions, so I'm not gonna say I need the others.
In theory, there is no difference between theory and practice. In practice there is.


Reply #86
Yes, I'd like to remove all the special options of the functional extension except for --adbr_long.
lame3995o -Q1.7 --lowpass 17


Reply #87
"Yes, I'd like to remove all the special options of the functional extension except for --adbr_long."

Personally, I like having the additional options, but I can't say I need them.  I prefer to have standard -V0 behavior on long frames, and -V0+ behavior on short/start/stop frames, and I can approximate that with only the --adbr_long option.

I have a different question regarding --adbr_long, however.  (This may have an obvious answer but I have not checked into it yet.)
Will your variant still allow silence to be encoded at 32kbps, and other low-complexity frames to be encoded at 128kbps or below, as default -V0 behavior will allow?


Reply #88
a) I can keep all the --adbr_xxxx options if this is preferred. I just wanted to have things simpler again, but that's not essential to me.

b) Silence is encoded as in original Lame with the exception that bitreservoir is held at maximum. For what this means for frame size, see c).

c) Non-silent long block frames are encoded according to the --adbr_long value and the energy in the frame. If the energy is very low, the audio data bitrate can be significantly lower than the --adbr_long value. During the encoding process, frame size is chosen as the smallest frame that holds the corresponding audio data while keeping bitreservoir size extremely close to 4000 bit. Once this has served its purpose of providing maximum audio data space for short blocks, the final mp3packer run repackages everything into frames of the most appropriate size.
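The frame-size rule in c) could look roughly like the following sketch. This is a guess at the logic, not code from lame3995f: it assumes MPEG-1 Layer III at 44.1 kHz stereo, unpadded frames and a fixed reservoir target of 4000 bit, and the function names are invented for the example.

```python
BITRATES_KBPS = (32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320)  # MPEG-1 Layer III
SAMPLE_RATE_HZ = 44_100
OVERHEAD_BITS = (4 + 32) * 8      # header + side info (stereo, no CRC)
RESERVOIR_TARGET_BITS = 4000      # "extremely close to 4000 bit" in the post


def payload_bits(bitrate_kbps):
    """Audio payload of one unpadded frame at the given bitrate."""
    frame_bytes = 144 * bitrate_kbps * 1000 // SAMPLE_RATE_HZ
    return 8 * frame_bytes - OVERHEAD_BITS


def pick_bitrate(audio_bits, reservoir_bits):
    """Smallest frame that holds the audio data and leaves the reservoir at its target."""
    needed_payload = audio_bits + RESERVOIR_TARGET_BITS - reservoir_bits
    for kbps in BITRATES_KBPS:
        if payload_bits(kbps) >= needed_payload:
            return kbps
    return BITRATES_KBPS[-1]  # nothing fits: take the largest frame available


print(pick_bitrate(audio_bits=0, reservoir_bits=4000))      # silence, reservoir already full
print(pick_bitrate(audio_bits=3000, reservoir_bits=4000))   # low-energy long block
print(pick_bitrate(audio_bits=3000, reservoir_bits=0))      # same data, reservoir just drained
```

In this model the same 3000 bits of audio data lead to a small frame when the reservoir is full and to the largest frame right after a short block has drained it, which is why a later repackaging step can still recover space.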
lame3995o -Q1.7 --lowpass 17


Reply #89
I'm considering doing a listening test of the encoder.
I have had four successful 192kbps ABC/HR tests, and halb27 says at least 200kbps is needed, but Helix mp3 encoder VBR can't do over 230kbps over typical songs.
So I want to test somewhere in between (for example, 224k). I'm thinking of:

lame3995f -S -V3+ %i %o
lame3995 -S -V1 %i %o
lame3984 -S -q 0 -b 224 %i %o
helix/hmp3 %i %o -X2 -U2 -V148
bladeenc -quit -nocfg %i %o -224

Average bitrate of 25 samples I'll be using:
229 - this encoder V3+
228 - LAME 3.99.5 V1
225 - LAME 3.98.4 CBR
224 - Helix mp3 encoder VBR(2005)
224 - BladeEnc CBR(low anchor)

Is there a better option that I should use? Should I wait for another extension(s) that is coming soon?


Reply #90
Such a test is wonderful, Kamedo2. You are welcome to use version 3.99.5f, because I have put everything I had in mind into this version.

But again, the question from your other listening thread comes up: what samples to use for determining the average bitrate.
I'm also an advocate of the 'take a collection of normal music' approach for this. I can't see a problem with reproducibility because the collection can be a collection of 30 sec. snippets which can be published. The collection needn't be huge, but it should be more or less representative of the genres included in the test.

With my personal test set of various pop music average bitrates are:
3.99.5 -V1: 223 kbps
3.99.5f -V3+: 206 kbps
3.99.5f -V2+: 220 kbps
Helix -X2 -U2 -V148: 226 kbps.

This corresponds pretty well with your results for 3.99.5 and Helix, but not with your 3.99.5f result. Did you put mp3packer.exe into the Lame3995f folder? For version f this is essential because I've optimized every internal detail for quality and rely on mp3packer to squeeze otherwise unused bits out of the output file.

As for 3.100a2: could you do a small a priori test to check whether it's really not worthwhile including it? While I agree that alpha versions should not participate, my impression is that the improvements on some problems are so strong that it may be worthwhile including it. Sure, this version should not be treated like a final version when judging the results.
lame3995o -Q1.7 --lowpass 17


Reply #91
Halb27, thank you for your great advice! I put mp3packer into the lame3995f folder and the bitrate was significantly reduced.

I think I should use V2+ then, what most people use, according to a poll.

I calibrated these encoders to around 224kbps on 63min of various pop songs.

219k - this encoder V2+
221k - LAME 3.99.5 V1
224k - LAME 3.98.4 CBR
225k - Helix mp3 encoder VBR148(2005)
224k - BladeEnc CBR(low anchor)
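A quick way to see how tight this calibration is: the sketch below takes the five averages listed above and expresses each as a deviation from the 224 kbps target. The labels are shortened for the example; the numbers are the ones from this post.

```python
TARGET_KBPS = 224

# Average bitrates from the list above (63 min of various pop songs).
measured_kbps = {
    "lame3995f -V2+": 219,
    "LAME 3.99.5 -V1": 221,
    "LAME 3.98.4 CBR": 224,
    "Helix VBR148": 225,
    "BladeEnc CBR": 224,
}

deviation_pct = {
    name: 100 * (kbps - TARGET_KBPS) / TARGET_KBPS for name, kbps in measured_kbps.items()
}
for name, dev in deviation_pct.items():
    print(f"{name:16s} {measured_kbps[name]:4d} kbps  {dev:+.1f} %")

spread_kbps = max(measured_kbps.values()) - min(measured_kbps.values())
print(f"spread between encoders: {spread_kbps} kbps")
```

All five encoders land within about 2.5 % of the target, with a spread of 6 kbps between the lowest and the highest.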

As for 3.100a2: Is 5 samples ABC/HR on 160k, LAME 3.99.5 VBR and LAME 3.100a2 VBR and LAME 3.100a2 CBR OK?


Reply #92
Fine.
Nice that you want to give 3.100a2 a try. The details of how to do it are all up to you IMO.
lame3995o -Q1.7 --lowpass 17