Best tweaks for encoding speech with Vorbis
post Oct 20 2008, 17:26
Post #1

Group: Members
Posts: 2
Joined: 14-April 08
Member No.: 52781

The title says it all: I want to encode a lot of spoken messages (mostly plain human speech, mono) with Vorbis. I want to go as low in bitrate as possible without suffering too much from lossy-compression artifacts.

My question is: has anybody found a reasonably optimal set of oggenc tweaks that gives nice-sounding audio at a low bitrate without noticeable lossy artifacts?

In the past I have typically used 48kbps compression:

oggenc -o out.ogg --bitrate=48 --downmix src.wav

Something like that. It yields sound that is better than MP3 (in my opinion; I could be wrong, since there are so many more MP3 encoders now), but the more I listen, the more I notice a kind of strange "echo" here and there, especially on rich sounds like the American "are". The strange "echo" is somewhat like the "robot" voice in movies. I can upload a sample Vorbis stream to show it (please let me know how to upload it; I am new to this forum).

I have been using oggenc version 1.0.2 as provided by Ubuntu 7.04. The original stream has a 44.1 kHz sampling rate. I tried a simple tweak: I compiled aoTuV beta 5.5 (b5.5_20080330) and used its shared library in place of the stock libvorbis by invoking oggenc through this kind of Bourne shell script:

#!/bin/sh
# Load the aoTuV build of the Vorbis library ahead of the system copy, then run oggenc.
export LD_LIBRARY_PATH=/usr/local/aotuv-b5.5_20080330/lib
exec oggenc "$@"

Still, the artifact is there.

As another attempt, I tried reducing the sampling rate with "ssrc" before invoking oggenc. Here is what I got from encoding the resampled streams with oggenc:


Encoding speech: TEST 04

Subdir: /data1/wirawan/test/vorbis/speech04
Sample: pet_30.flac
The sample was cut from LS Peter radio message #30 (1 minute long).

                         Sample   Bitrate                      File size
Filename                 rate     Nominal  Avg     Inflation   Actual   Inflation  Notes
                         (kHz)    (kbps)   (kbps)  (%)         (bytes)  (%)
16khz/oggenc-32kbps.ogg  16       32       29.81   -6.85       226969   -41.41
16khz/oggenc-48kbps.ogg  16       48       38.84   -19.09      294674   -23.93
16khz/oggenc-64kbps.ogg  16       64       48.52   -24.18      367650   -5.1
16khz/oggenc-80kbps.ogg  16       80       61.93   -22.59      468206   20.86
22khz/oggenc-32kbps.ogg  22       32       39.03   21.96       296110   -23.56
22khz/oggenc-48kbps.ogg  22       48       59.89   24.76       452540   16.82
22khz/oggenc-64kbps.ogg  22       64       75.60   18.12       570709   47.32
22khz/oggenc-80kbps.ogg  22       80       91.83   14.79       692450   78.74
32khz/oggenc-32kbps.ogg  32       32       37.83   18.22       287232   -25.86     Very robotic
32khz/oggenc-48kbps.ogg  32       48       55.48   15.59       419620   8.32       OK, but second man's voice is not great
32khz/oggenc-64kbps.ogg  32       64       65.38   2.16        493593   27.41
32khz/oggenc-80kbps.ogg  32       80       74.31   -7.12       560914   44.79
44khz/oggenc-32kbps.ogg  44       32       37.65   17.64       285854   -26.21
44khz/oggenc-48kbps.ogg  44       48       51.18   6.64        387396   Baseline
44khz/oggenc-64kbps.ogg  44       64       63.96   -0.06       482875   24.65
44khz/oggenc-80kbps.ogg  44       80       70.87   -11.41      534853   38.06

Bitrate inflation is the percent difference of the average kbps relative to
the nominal (target) kbps.

File size inflation is relative to the "baseline" 44khz/48kbps encoding.
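
Both inflation columns are plain percent differences. As a minimal sketch (the helper name inflation_pct is made up for illustration, and awk is assumed to be available), here is the calculation applied to two file sizes from the table:

```shell
#!/bin/sh
# inflation_pct VALUE REFERENCE
# Prints the percent change of VALUE relative to REFERENCE, to two decimals.
# (Hypothetical helper, not part of oggenc or any other tool.)
inflation_pct() {
    awk -v v="$1" -v r="$2" 'BEGIN { printf "%.2f\n", (v - r) / r * 100 }'
}

# File size of 16khz/oggenc-32kbps.ogg against the 44khz/48kbps baseline.
inflation_pct 226969 387396   # prints -41.41

# File size of 16khz/oggenc-80kbps.ogg against the same baseline.
inflation_pct 468206 387396   # prints 20.86
```

Recomputing the bitrate inflation from the rounded averages in the table can differ from the listed value in the last digit, presumably because the listed values came from unrounded averages.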

Interesting! At the lower sampling rates (22 and 32 kHz), the file size is actually larger (at 48, 64, and 80 kbps). That could be a topic of its own, but my main question remains: how do I optimize compression versus quality?
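
A test matrix like the one in the table could be generated with a loop along these lines. This is only a sketch under several assumptions: it prints the commands instead of running them, it uses oggenc's own --resample option rather than ssrc, src.wav is a placeholder input name, and 22050 Hz and 44100 Hz are assumed to be the exact rates behind the "22khz" and "44khz" rows.

```shell
#!/bin/sh
# sweep_commands SRC
# Prints one oggenc command per sample-rate/bitrate pair (a dry run).
# (Hypothetical helper; the output naming mirrors the table's filenames.)
sweep_commands() {
    src=$1
    for rate in 16000 22050 32000 44100; do
        dir="$((rate / 1000))khz"          # 16khz, 22khz, 32khz, 44khz
        for kbps in 32 48 64 80; do
            echo "oggenc --downmix --resample $rate --bitrate=$kbps -o $dir/oggenc-${kbps}kbps.ogg $src"
        done
    done
}

sweep_commands src.wav
```

To actually run the encodes, create the output directories first and pipe the printed lines to sh. Resampling inside oggenc may not sound the same as resampling with ssrc, so the results are not guaranteed to match the table.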

One note that may be relevant: the original audio may not come directly from a raw source (that is, recorded directly or taken from a faithful CD-quality recording). In the case above, it actually comes from a high-quality mono MP3 stream (which I guess is an 80 kbps mono stream).

The Linux "file" utility reports the following for the original file (the filename is different, but the files are of the same kind):

/data1/wirawan/test/vorbis/speech04 $ file /d/temp/ls/luk/Luke_01.mp3
/d/temp/ls/luk/Luke_01.mp3: MPEG ADTS, layer III, v1, 160 kBits, 44.1 kHz, Monaural

Any help and pointers will be appreciated. Unfortunately I don't have time to study this matter deeply, so it is best to go straight to the point and leave the deeper explanations (web pages, wiki) as side notes.

post Oct 20 2008, 21:38
Post #2

Group: Members
Posts: 116
Joined: 28-September 04
From: Germany
Member No.: 17360

Is there a special reason why you want to use Vorbis?
Speex http://www.speex.org/ is specifically designed for voice recordings.
post Oct 21 2008, 02:46
Post #3

Group: Members
Posts: 2
Joined: 14-April 08
Member No.: 52781

I did try Speex a little, but I did not find it very satisfactory. Probably I wasn't trying seriously. Another problem, as many other members have already pointed out, is that Speex is not widely available on systems other than "computers". It is not yet supported on small hardware like portable audio players. I want to create a single Ogg file that can be played on computers and portable audio players alike.
post Oct 21 2008, 05:02
Post #4

Group: Members
Posts: 1593
Joined: 24-March 02
From: Revere, MA
Member No.: 1607

I did try speex a little bit, but I did not find it very satisfactory. probably I wasn't trying seriously.

Did you try ultra-wideband mode? Speex also has echo cancellation.

Another problem, as many other members already point out, is that speex is not widely available on systems other than "computer".

It is supported by the Rockbox open-source firmware, which is used on many DAPs. Take a look at the website:


This post has been edited by HotshotGG: Oct 21 2008, 05:08

College student/IT Assistant
post Oct 21 2008, 22:28
Post #5

Group: Members
Posts: 152
Joined: 29-December 05
Member No.: 26719

If you're still open to the idea of using mp3 for your application, try LAME. I find the following parameters to provide amazingly small files that are transparent for me:
lame -V8 -m m --resample 24

If you have the time, try it and let us know what you think.
post Dec 2 2008, 16:48
Post #6

Group: Members (Donating)
Posts: 1491
Joined: 11-February 03
From: Vermont
Member No.: 4955

FWIW, I have some Vorbis files. I don't recall the options used, but they show as mono, 44.1 kHz sampling, 30 kbps.

They play ok in DBpoweramp player and my Rockbox Sansa, but won't play in foobar2000 or winamp.

If I recall correctly, when I first started playing with mono, DBpoweramp played it back at double speed (like it split the available mono samples between the L and R channels,) but Spoon fixed it promptly when I reported the problem.