LAME resampling/lower sample rate questions
jensend
post Mar 3 2012, 06:44
Post #1





Group: Members
Posts: 143
Joined: 21-May 05
Member No.: 22191



I have a couple hundred hours of recorded speech (recorded in mono, 16-bit, mostly 48kHz but some 44.1kHz) I need to make available on the Internet, and as much as I'd like to use Opus for this, it's probably going to have to be MP3 for compatibility reasons. A bit of off-the-cuff testing shows that somewhere around 40-48 kbps seems to be fine for my purposes, but I'd like to be somewhat more confident about the choices I'm making before I start encoding all this. Thought I'd use the opportunity to educate myself a little while I'm at it too.

I know LAME automatically resamples the input when targeting low bitrates. I'd like to know exactly how it determines the output sample rate. I'm not fantastic with C (reading others' source takes me an inordinate amount of time), and a little grepping through the LAME sources didn't turn up the relevant code. Could somebody point me to where in the sources this decision is made?

Also, how are the thresholds for switching to lower sample rates tuned? Might I be better off resampling to a lower rate than LAME would normally choose, since speech has so much less energy and useful information at high frequencies than music?

One thing that I did come across when looking for how the output rate is set was the following line from lame.c:
CODE
cfg->mode_gr = cfg->samplerate_out <= 24000 ? 1 : 2; /* Number of granules per frame */

I don't know much about the details of the MP3 format, but my initial guess was that using only one granule per frame increases header overhead but allows more accurate seeking etc. Since a granule at 24kHz is only 24ms long, this doesn't strike me as a very good guess. Could someone enlighten me about the reason for this switch? What other thresholds/decision points in either bitrate or sample rate are interesting or worth knowing about?
lvqcl
post Mar 3 2012, 09:30
Post #2





Group: Developer
Posts: 3325
Joined: 2-December 07
Member No.: 49183



QUOTE
Could someone enlighten me about the reason for this switch?

QUOTE (smack @ May 2 2007, 18:12)
MPEG-1 Layer III frames consist of two granules (you may call them "sub-frames" if that helps you). MPEG-2 LSF Layer III frames consist of only one granule.

So the reason is to follow Layer3 specs.


This post has been edited by lvqcl: Mar 3 2012, 14:49
halb27
post Mar 3 2012, 14:38
Post #3





Group: Members
Posts: 2424
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015



The --resample and --lowpass switches give you good control for fine-tuning the result to your liking (within what's possible with MP3).
For instance, use --resample 11.025 --lowpass 4.5 for speech if you're not content with the defaults.
Run lame --longhelp and look at the list of possible resampling frequencies at the end of the help. I guess the MPEG-2.5 values are the ones that best suit your needs.

I suggest you use a fairly high-quality -V setting like -V5 together with --resample 11.025 --lowpass 4.5 or similar. That should give you a very low bitrate for mono recordings, in the range you're considering (or even a bit lower), as well as decent quality for speech. You can use a fractional -V value like -V5.5 if that is useful to you.

An alternative to using -V5 is using --abr 35 or similar. In the low bitrate range ABR is sometimes considered to be superior to VBR.

This post has been edited by halb27: Mar 3 2012, 15:38


--------------------
lame3100m -V1 --insane-factor 0.75
jensend
post Mar 3 2012, 21:35
Post #4





Group: Members
Posts: 143
Joined: 21-May 05
Member No.: 22191



Thanks, lvqcl. I still wonder what the practical upshot is and why MPEG made it that way.

halb27, I don't think I need the sub-16kHz MPEG-2.5 rates. Though speech is certainly comprehensible in narrowband and 12kHz isn't bad, there is a noticeable quality difference, and since nothing forces me below 32kbps I don't think I need to make that tradeoff. I'll probably stick with 16, 22.05, or 24kHz.

Also, I gather that some MP3 players may not be able to play the MPEG-2.5 sample rates (MPEG 2.5 was never really standardized; it was a proprietary Fraunhofer extension). Probably not much of an issue, but I don't really know.

I have been using ABR; I had thought the recommendation to use it rather than VBR for sub-64kbps was pretty definite. Maybe there's more debate on that question than I realized.

Still looking for where in the source the default output sample rate is determined...
lvqcl
post Mar 3 2012, 22:01
Post #5





Group: Developer
Posts: 3325
Joined: 2-December 07
Member No.: 49183



In lame.c, function int optimum_samplefreq(int lowpassfreq, int input_samplefreq):

CODE
/*
* Rules:
*  - if possible, sfb21 should NOT be used
*
*/
...
    if (lowpassfreq <= 15250)
        suggested_samplefreq = 32000;
    if (lowpassfreq <= 11220)
        suggested_samplefreq = 24000;
    if (lowpassfreq <= 9970)
        suggested_samplefreq = 22050;
...

About ABR: are you using LAME 3.99.x or an earlier version?

This post has been edited by lvqcl: Mar 3 2012, 22:12
jensend
post Mar 4 2012, 00:58
Post #6





Group: Members
Posts: 143
Joined: 21-May 05
Member No.: 22191



Well, that simply means we need to find out how the lowpass frequency gets set. For ABR that turns out to be fairly simple: the optimum_bandwidth function is called, giving a lowpass frequency which depends only on the target bitrate; the result is then multiplied by 1.5 for mono, giving us the following table of lowpass frequencies and resampling rates:
CODE
bitrate (kbps) >=   lowpass (Hz)   sample rate (Hz)
60                  16500          48000
52                  15000          32000
44                  11250          32000
36                  10500          24000
28                   8250          22050
20                   5850          16000
12                   5550          16000
0                    3000           8000

This doesn't seem particularly carefully tuned. I see no reason why simply multiplying the stereo lowpass frequencies by 1.5 should work across this whole range of bitrates, and it completely skips the 44.1kHz, 12kHz, and 11.025kHz sampling rates.

I had thought that I'd be learning more about what makes sense from LAME's carefully tuned defaults. While I imagine the stereo ABR defaults have been carefully tuned, it may not be at all difficult to improve on the above for mono, and it would be simple to cobble together a patch implementing such improvements.
halb27
post Mar 4 2012, 01:10
Post #7





Group: Members
Posts: 2424
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015



The Lame defaults have music in mind, not speech.


jensend
post Mar 4 2012, 01:16
Post #8





Group: Members
Posts: 143
Joined: 21-May 05
Member No.: 22191



For comparison, here's the corresponding table for stereo ABR:
CODE
bitrate (kbps) >=   lowpass (Hz)   sample rate (Hz)
120                 17000          48000
104                 15600          44100
88                  15100          32000
72                  13500          32000
60                  11000          24000
52                  10000          24000
44                   7500          22050
36                   7000          16000
28                   5500          16000
20                   3900           8000
12                   3700           8000
0                    2000           8000

12kHz and 11.025kHz are again skipped. Curious. BTW, in both cases there are higher lowpass cutoffs at higher bitrates, all the way up to 320kbps, but I only included the lower bitrates, where the lowpass frequency differs more and where resampling comes into play.

Also, I am suddenly finding it annoying that HA doesn't support the BBCode for tables and the only option seems to be to go back to the fixed-width ascii-art past.
jensend
post Mar 4 2012, 02:08
Post #9





Group: Members
Posts: 143
Joined: 21-May 05
Member No.: 22191



halb27, I'm aware of that; that's why I wanted to learn more about this in the first place, since if LAME had modes tuned for speech I would have just trusted it to make good decisions on its own. But some of these choices, especially in mono, don't seem at first glance like they would make sense for music either.

Of course I'm no expert, I have no idea what goes on with the psymodel, and the LAME devs have been at it for quite a while. Maybe there are very good reasons for every single odd-looking behavior. But I imagine most of the tuning effort has gone into higher-bitrate stereo VBR (the -V6 to -V2 "sweet spot" everybody wants to use for encoding their CDs) rather than low-bitrate mono ABR, and I've heard some people claim that some other mp3 encoders outperform LAME at low bitrates (though I've not seen this proven).

BTW, lvqcl- I've been using 3.99.4. Why do you ask? Were there ABR-related changes in 3.99 which didn't show up on the changelog?

This post has been edited by jensend: Mar 4 2012, 02:14
lvqcl
post Mar 4 2012, 08:51
Post #10





Group: Developer
Posts: 3325
Joined: 2-December 07
Member No.: 49183



The changes in CBR/ABR modes are mentioned in the ChangeLog and history.html files:

CODE
LAME 3.99 beta 0   not officially released

All encoding modes use the PSY model from new VBR code, addresses Bugtracker item [ 3187397 ] Strange compression behavior


btw, 3.99.5 was released several days ago.
halb27
post Mar 4 2012, 10:38
Post #11





Group: Members
Posts: 2424
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015



jensend, as the Lame defaults have music in mind and probably are not excessively tuned for very low bitrate mono sources, why don't you just use a resampling frequency and lowpass according to your likings?
Obviously you know what you're doing and have reasonable settings in mind, and you want to use Lame for specific purposes. Do you expect miracles from the Lame defaults?

As for the quality: when I proposed -V5/--abr 35 --resample 11.025 --lowpass 4.5 I did a test with some speech from my smartphone, and to me the quality was very decent. If your speech source is of very good quality, it's of course better to use a higher sampling frequency and lowpass (and ABR setting), but as for quality I think Lame is a good choice. I guess you tried it. What were your findings?

One of the advantages of Lame is that you can choose parameters like sampling frequency and lowpass according to your specific needs. You can't expect that from other encoders.

This post has been edited by halb27: Mar 4 2012, 10:45



Time is now: 23rd July 2014 - 13:51