LAME resampling/lower sample rate questions

I have a couple hundred hours of recorded speech (recorded in mono, 16-bit, mostly 48kHz but some 44.1kHz) I need to make available on the Internet, and as much as I'd like to use Opus for this, it's probably going to have to be MP3 for compatibility reasons. A bit of off-the-cuff testing shows that somewhere around 40-48 kbps seems to be fine for my purposes, but I'd like to be somewhat more confident about the choices I'm making before I start encoding all this. Thought I'd use the opportunity to educate myself a little while I'm at it too.

I know LAME automatically resamples the input when targeting low bitrates. I'd like to know exactly how it determines the output sample rate. I'm not fantastic with C (reading others' source takes me an inordinate amount of time), and a little grepping through the LAME sources didn't turn up the relevant code. Could somebody point me to where in the sources this decision is made?

Also, how are the thresholds for switching to lower sample rates tuned? Might I be better off resampling to a lower rate than LAME would normally choose, since speech has so much less energy / useful information at high frequencies than music?

One thing that I did come across when looking for how the output rate is set was the following line from lame.c:
Code: [Select]
cfg->mode_gr = cfg->samplerate_out <= 24000 ? 1 : 2; /* Number of granules per frame */

I don't know much about the details of the MP3 format, but my initial guess is that using only one granule per frame increases the overhead from headers but allows for better accuracy in seeking etc. Since a granule at 24kHz is only 24ms, this doesn't strike me as a very good guess. Could someone enlighten me about the reason for this switch? What other thresholds/decision points in either bitrate or sample rate are interesting or might be worth being informed about?

Reply #1
Quote
Could someone enlighten me about the reason for this switch?

MPEG-1 Layer III frames consist of two granules (you may call them "sub-frames" if that helps you). MPEG-2 LSF Layer III frames consist of only one granule.

So the reason is to follow Layer3 specs.
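
As a rough illustration of what the one-granule rule means in practice, here is a small C sketch. It is not taken from LAME, and the function names are made up for this example. It assumes the standard Layer III granule size of 576 samples and reuses the 24000 Hz cutoff from the lame.c line quoted in the first post.

```c
/* Samples per granule in Layer III (assumed here; fixed by the format). */
#define SAMPLES_PER_GRANULE 576

/* Same rule as the quoted lame.c line: one granule per frame at or
 * below 24 kHz output, two granules above that. */
int granules_per_frame(int samplerate_out)
{
    return samplerate_out <= 24000 ? 1 : 2;
}

/* Duration of one frame in milliseconds at the given output rate. */
double frame_duration_ms(int samplerate_out)
{
    int samples = SAMPLES_PER_GRANULE * granules_per_frame(samplerate_out);
    return 1000.0 * samples / samplerate_out;
}
```

Under these assumptions, frame_duration_ms(48000) and frame_duration_ms(24000) both come out at 24 ms, so dropping to one granule keeps the frame duration at the low sample rates in the same range as at the high ones.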

Reply #2
The --resample and --lowpass switches give you good control for fine-tuning the result the way you like it (within what's possible with MP3).
For instance, use --resample 11.025 --lowpass 4.5 for speech if the defaults don't suit your purposes.
Run lame --longhelp and look for the possible resampling frequencies listed at the end of the help. I guess the MPEG-2.5 values are the ones that best suit your needs.

I suggest you use a rather good quality -V setting like -V5 together with --resample 11.025 --lowpass 4.5 or similar. That should give you a very low bitrate for mono recordings, in the range you're considering (or even a bit lower), as well as decent quality for speech. You can use a fractional -V value like -V5.5 if that is useful to you.

An alternative to using -V5 is using --abr 35 or similar. In the low bitrate range ABR is sometimes considered to be superior to VBR.
lame3995o -Q1.7 --lowpass 17

Reply #3
Thanks, lvqcl. I still wonder what the practical upshot is and why MPEG made it that way.

halb27- I don't think I need to use the <16kHz MPEG-2.5 rates. Though speech is certainly comprehensible in narrowband and 12kHz isn't bad, there is a noticeable quality difference, and since nothing's forcing me to use <32kbps bitrates I don't imagine I need to make that tradeoff. I'll probably stick with 16, 22, or 24.

Also, I gather that some mp3 players may not be able to play the MPEG-2.5 sample rates (MPEG 2.5 was never really standardized, it was a proprietary Fraunhofer extension). Probably not much of an issue but I don't really know.

I have been using ABR; I had thought the recommendation to use it rather than VBR for sub-64kbps was pretty definite. Maybe there's more debate on that question than I realized.

Still looking for where in the source the default output sample rate is determined...

Reply #4
In lame.c, function int optimum_samplefreq(int lowpassfreq, int input_samplefreq):

Code: [Select]
/*
* Rules:
*  - if possible, sfb21 should NOT be used
*
*/
...
    if (lowpassfreq <= 15250)
        suggested_samplefreq = 32000;
    if (lowpassfreq <= 11220)
        suggested_samplefreq = 24000;
    if (lowpassfreq <= 9970)
        suggested_samplefreq = 22050;
...

About ABR. Do you use LAME 3.99.x or earlier version?
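
To make the behaviour of that cascade concrete, here is a partial C sketch. It reproduces only the three comparisons quoted above; the branches elided with "..." are deliberately left out, and `fallback` is a hypothetical parameter standing in for whatever rate the elided code has already picked. Because of the missing lower thresholds, results for lowpass values well below 9970 Hz should not be expected to match LAME.

```c
/*
 * Partial sketch of the threshold cascade quoted from lame.c.
 * `fallback` is hypothetical: it stands in for the rate chosen by
 * the elided code before these tests run.
 */
int suggest_samplefreq_partial(int lowpassfreq, int fallback)
{
    int suggested_samplefreq = fallback;

    /* The tests are plain ifs, not else-ifs: each later match
     * overrides the earlier one, so the lowest threshold that the
     * lowpass frequency fits under decides the result. */
    if (lowpassfreq <= 15250)
        suggested_samplefreq = 32000;
    if (lowpassfreq <= 11220)
        suggested_samplefreq = 24000;
    if (lowpassfreq <= 9970)
        suggested_samplefreq = 22050;

    return suggested_samplefreq;
}
```

For example, a lowpass of 10500 Hz passes the first two tests but not the third, so the sketch returns 24000.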

Reply #5
Well, that simply means we need to find out how the lowpass frequency gets set. For ABR that turns out to be fairly simple: the optimum_bandwidth function is called, giving a lowpass frequency which depends only on the target bitrate; the result is then multiplied by 1.5 for mono, giving us the following table of lowpass frequencies and resampling rates:
Code: [Select]
bitrate >= (kbps)   lowpass freq (Hz)   sampling rate (Hz)
60                  16500               48000
52                  15000               32000
44                  11250               32000
36                  10500               24000
28                   8250               22050
20                   5850               16000
12                   5550               16000
0                    3000                8000

This doesn't seem particularly carefully tuned. I see no reason why just multiplying the stereo lowpass frequencies by 1.5 should work all the way across this range of bitrates, and this completely skips the 44.1kHz, 12kHz, and 11.025kHz sampling rates.

I had thought that I'd be learning more about what makes sense from LAME's carefully tuned defaults. While I imagine the stereo ABR defaults have been carefully tuned, it may not be at all difficult to improve on the above for mono, and it would be simple to cobble together a patch implementing such improvements.
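
For anyone who wants to check a particular target bitrate against the mono table above, a lookup can be sketched in C as follows. The struct and function names are made up for this example, and the rows are simply copied from the table as posted, not read out of LAME.

```c
#include <stddef.h>

/* One row of the mono ABR table posted above. */
struct abr_row {
    int min_bitrate_kbps; /* row applies when bitrate >= this value */
    int lowpass_hz;
    int samplerate_hz;
};

/* Rows copied from the table, highest bitrate first. */
static const struct abr_row mono_abr_table[] = {
    { 60, 16500, 48000 },
    { 52, 15000, 32000 },
    { 44, 11250, 32000 },
    { 36, 10500, 24000 },
    { 28,  8250, 22050 },
    { 20,  5850, 16000 },
    { 12,  5550, 16000 },
    {  0,  3000,  8000 },
};

/* Return the first row whose minimum bitrate does not exceed the
 * target, or NULL for a negative bitrate. */
const struct abr_row *mono_abr_lookup(int bitrate_kbps)
{
    size_t n = sizeof mono_abr_table / sizeof mono_abr_table[0];
    size_t i;

    for (i = 0; i < n; i++)
        if (bitrate_kbps >= mono_abr_table[i].min_bitrate_kbps)
            return &mono_abr_table[i];
    return NULL;
}
```

Going by the table, a 40 kbps target lands in the 36 row (10500 Hz lowpass, 24000 Hz output) while 48 kbps lands in the 44 row (11250 Hz lowpass, 32000 Hz output), so the 40-48 kbps range mentioned in the first post straddles a resampling boundary.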

Reply #6
The Lame defaults have music in mind, not speech.

Reply #7
For comparison, here's the corresponding table for stereo ABR:
Code: [Select]
bitrate >= (kbps)   lowpass freq (Hz)   sampling rate (Hz)
120                 17000               48000
104                 15600               44100
88                  15100               32000
72                  13500               32000
60                  11000               24000
52                  10000               24000
44                   7500               22050
36                   7000               16000
28                   5500               16000
20                   3900                8000
12                   3700                8000
0                    2000                8000

12kHz and 11.025kHz are again skipped. Curious. BTW in both cases there are higher lowpass frequency cutoffs at higher bitrates, all the way up to 320kbps, but I only included the lower bitrates where there's more of a difference in lowpass frequency and where resampling comes into play.
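
As a sanity check on the 1.5 factor, the rows the two tables have in common (60 kbps and below) can be compared with a few lines of C. This is only a sketch over the numbers as posted in this thread; the struct and function names are invented for the example.

```c
#include <stddef.h>

/* Lowpass frequencies (Hz) for the bitrate rows shared by the
 * stereo and mono tables posted in this thread. */
struct lowpass_pair {
    int bitrate_kbps;
    int stereo_hz;
    int mono_hz;
};

static const struct lowpass_pair lowpass_pairs[] = {
    { 60, 11000, 16500 },
    { 52, 10000, 15000 },
    { 44,  7500, 11250 },
    { 36,  7000, 10500 },
    { 28,  5500,  8250 },
    { 20,  3900,  5850 },
    { 12,  3700,  5550 },
    {  0,  2000,  3000 },
};

/* Count rows where mono != 1.5 * stereo. Written as 3*stereo versus
 * 2*mono so the comparison stays in exact integer arithmetic. */
int count_factor_mismatches(void)
{
    size_t n = sizeof lowpass_pairs / sizeof lowpass_pairs[0];
    size_t i;
    int mismatches = 0;

    for (i = 0; i < n; i++)
        if (3 * lowpass_pairs[i].stereo_hz != 2 * lowpass_pairs[i].mono_hz)
            mismatches++;
    return mismatches;
}
```

With the values as posted, every shared row satisfies mono = 1.5 x stereo, so the function returns 0.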

Also, I am suddenly finding it annoying that HA doesn't support the BBCode for tables and the only option seems to be to go back to the fixed-width ascii-art past.

Reply #8
halb27, I'm aware of that- that's why I wanted to learn more about this in the first place, since if LAME had modes tuned for speech I would have just trusted it to make good decisions on its own. But some of these, esp in mono, don't seem at first glance like they would make sense for music either.

Of course I'm no expert, I have no idea what goes on with the psymodel, and the LAME devs have been at it for quite a while. Maybe there are very good reasons for every single odd-looking behavior. But I imagine most of the tuning effort has gone into higher-bitrate stereo VBR (the -V6 to -V2 "sweet spot" everybody wants to use for encoding their CDs) rather than low-bitrate mono ABR, and I've heard some people claim that some other mp3 encoders outperform LAME at low bitrates (though I've not seen this proven).

BTW, lvqcl- I've been using 3.99.4. Why do you ask? Were there ABR-related changes in 3.99 which didn't show up on the changelog?

Reply #9
The changes in CBR/ABR modes are mentioned in the ChangeLog and history.html files:

Code: [Select]
LAME 3.99 beta 0   not officially released

All encoding modes use the PSY model from new VBR code, addresses Bugtracker item [ 3187397 ] Strange compression behavior


btw, 3.99.5 was released several days ago.

Reply #10
jensend, since the Lame defaults have music in mind and are probably not extensively tuned for very low bitrate mono sources, why don't you just choose a resampling frequency and lowpass to your liking?
Obviously you know what you're doing and have reasonable settings in mind, and you want to use Lame for specific purposes. Do you expect miracles from the Lame defaults?

As for the quality: when I proposed -V5/--abr 35 --resample 11.025 --lowpass 4.5, I had done a test with some speech from my smartphone, and to me the quality was very decent. If your speech source is of very good quality, it is of course better to use a higher sampling frequency and lowpass (and ABR setting), but quality-wise I think Lame is a good choice. I guess you tried it. What were your findings?

One of the advantages of Lame is that you can choose parameters like sampling frequency and lowpass according to your specific needs. You can't expect that from other encoders.