What is the recommended setting for speeches?

I was going to look into Speex because I had heard about it once.
Now I have found out it is obsolete and Opus is the way to go.

I usually release the speech recordings as LAME V5 (MP3). One hour of speech is about 31 MB.

What setting would be recommended for Opus?

Thanks

Reply #1
Use the lowest bitrate at which the artifacts are not too annoying.  Also encode in mono.
E.g. --downmix-mono --bitrate 32, then decrease the bitrate as required.

Reply #2
Opus will do anything that Speex can do, usually better.  You should be able to push it well below 32 for mono speech, but I guess it depends on how good the original recording is and whether you can tolerate any artefacts.  The codec will likely downmix to mono automatically at low bitrates.  opusenc also has an option to optimise for speech.  That will encourage things like bandpass restriction that reduce the bitrate without affecting the quality of speech too much.

Reply #3
Quote
opusenc also has an option to optimise for speech


What command would that be? I looked through the options and didn't see anything related.

I tried 20, 32 and 40 kbps. 40 is acceptable. At 32 there is a watery artifact, but it's not that annoying. At 20 there are a lot of artifacts.

I also noticed that it always outputs audio at 48000 Hz, and I read some pages here that justify this. But I imagine that someone encoding music would stay away from that, right?

Reply #4
opusenc --speech

Which version of opusenc do you have?  Might not be in your program.  I have opus-tools 0.1.2 with libopus 1.1.  opus-tools 0.1.8 is available, maybe it has *less* options?

Are you sure you're outputting mono?  40kbps seems high, but maybe if you are really looking for transparency.

Opus always runs at 48kHz.  This is normal.  Up-sampling from 44.1kHz shouldn't scare anyone.  If you feel you need 96kHz or higher then maybe Opus isn't for you.  At high quality (say bitrates 128kbps upwards) Opus doesn't offer much over other modern codecs such as AAC or Ogg Vorbis.

Reply #5
I seem to have the latest version...
The speeches are recorded in mono FLAC, then converted to lossy for streaming, so the input is already mono.
The help text doesn't show anything about --speech.

opusenc opus-tools 0.1.8-git (using libopus 1.1.x-git)

Update:

I just confirmed on IRC that this option was removed a long time ago.

Reply #6
OK, I settled on 32 kbps, which is good enough.
I'm getting something like 10 MB per hour of speech.
Very good codec.

My main issue is that it is not compatible with Internet Explorer.
I'll get complaints about audio not opening... 
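
For anyone checking the sizes quoted so far, the arithmetic is simple enough to sketch. This is only a rough illustration: the function names are made up for this example, MB is taken as 10^6 bytes, and container overhead is ignored. Keep in mind that with VBR the requested bitrate is a target, so the real average can land below it, which would fit getting around 10 MB per hour at a nominal 32 kbps.

```python
def size_mb(bitrate_kbps: float, seconds: float) -> float:
    """Approximate file size in MB (10**6 bytes) for an average bitrate."""
    return bitrate_kbps * 1000 * seconds / 8 / 1e6


def average_kbps(size_in_mb: float, seconds: float) -> float:
    """Average bitrate in kbps implied by a file size in MB (10**6 bytes)."""
    return size_in_mb * 1e6 * 8 / seconds / 1000


HOUR = 3600  # seconds

if __name__ == "__main__":
    # Nominal size of one hour at a steady 32 kbps.
    print(f"32 kbps for 1 h: {size_mb(32, HOUR):.1f} MB")
    # Average bitrates implied by the per-hour sizes quoted in the thread.
    print(f"31 MB per hour (LAME V5 figure): {average_kbps(31, HOUR):.1f} kbps")
    print(f"10 MB per hour (Opus figure): {average_kbps(10, HOUR):.1f} kbps")
```

So a steady 32 kbps would be about 14.4 MB per hour, the 31 MB LAME V5 files average roughly 69 kbps, and 10 MB per hour corresponds to an average of about 22 kbps.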

Reply #7
This is mostly directed at lithopsian's confusion.

Yes, they removed the --speech command line option back in September 2012; they gave two reasons. First, they said people were confused about what it did. It was a hint to the encoder which could help it slightly improve some decisions, not a necessary forced mode switch; you could always encode any content without specifying the --speech or --music options. Second, the newer 1.1 encoder does a passable job of automatic classification anyway, which reduces the impact of the hint. (Cf. the section on Automagic Speech vs. Music Discrimination in Monty's 1.1 demo page.) Though the speech classifier isn't perfect, stuff that's borderline to the classifier is usually also close to borderline as far as which encoder mode is better, so the output quality is fine.

The functionality is still there, just "hidden." You can still give it the hint using opusenc --set-ctl-int with some magic numbers. For normal use this probably won't make a noticeable difference. But for some more particular use cases as well as for some kinds of testing the set-ctls can be helpful.

The only place these magic numbers are documented is in opus_defines.h in the source tree. The numbers could be changed at any time; the names are more descriptive and are frozen as part of the API, but opusenc can't handle the names.

To hint to the encoder that your content is speech you set OPUS_SET_SIGNAL_REQUEST to OPUS_SIGNAL_VOICE, i.e. opusenc --set-ctl-int 4024=3001. Music would be 4024=3002. These do exactly the same thing under the hood that --speech and --music used to do.

The other ctl most likely to come in handy is OPUS_SET_MAX_BANDWIDTH_REQUEST, which allows you to tell Opus not to encode any content above a frequency cutoff (4, 6, 8, or 12 kHz). Good for stuff that will be played back on very constrained devices or for when you know that the high frequencies in a recording are only noise.
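
To make the magic numbers less error-prone, here is a minimal Python sketch that maps the API names to their values and builds the opusenc argument list. The helper name and file names are made up for illustration. The numeric values are the ones I know from opus_defines.h and, as said above, they are not guaranteed to stay fixed, so check your own copy of the header. I am also assuming opusenc accepts --set-ctl-int more than once on one command line. The script only prints the command; it does not run opusenc.

```python
import shlex

# Values as found in opus_defines.h; verify against your own copy of the header.
OPUS_SET_SIGNAL_REQUEST = 4024
OPUS_SET_MAX_BANDWIDTH_REQUEST = 4004

OPUS_SIGNAL_VOICE = 3001
OPUS_SIGNAL_MUSIC = 3002

# Audio bandwidth cutoff in kHz -> OPUS_BANDWIDTH_* constant.
MAX_BANDWIDTH_BY_KHZ = {
    4: 1101,   # OPUS_BANDWIDTH_NARROWBAND
    6: 1102,   # OPUS_BANDWIDTH_MEDIUMBAND
    8: 1103,   # OPUS_BANDWIDTH_WIDEBAND
    12: 1104,  # OPUS_BANDWIDTH_SUPERWIDEBAND
}


def speech_hint_args(bitrate_kbps, max_bandwidth_khz=None):
    """Build opusenc options that hint 'speech' (hypothetical helper)."""
    args = ["--bitrate", str(bitrate_kbps),
            "--set-ctl-int", f"{OPUS_SET_SIGNAL_REQUEST}={OPUS_SIGNAL_VOICE}"]
    if max_bandwidth_khz is not None:
        # Raises KeyError for a cutoff that is not in the table above.
        value = MAX_BANDWIDTH_BY_KHZ[max_bandwidth_khz]
        args += ["--set-ctl-int", f"{OPUS_SET_MAX_BANDWIDTH_REQUEST}={value}"]
    return args


if __name__ == "__main__":
    cmd = ["opusenc", *speech_hint_args(32, max_bandwidth_khz=8),
           "speech.flac", "speech.opus"]
    print(shlex.join(cmd))
```

Keeping the names next to the numbers means that if a future header changes a value, there is only one place to update.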

Reply #8
Yes, Microsoft has not been friendly as far as integrating free codecs into IE or Windows Media. See caniuse.com for a table showing which browsers have native Opus playback.

For some of the browsers which don't have native support, it's possible to use a third-party solution. I'm not aware of any plugins etc. for IE which play Opus yet.

The WebM folks publish a framework that allows for Ogg and VP8 playback in IE. They're currently looking into extending this to VP9 and Opus.

Reply #9
If I may hijack the thread a bit: what are people using to get transparent speech encodes?
I know it's subjective, but it would be nice to know the general idea of what people tend to use, as I guess the values should be in a fairly close range to one another.


Reply #11
http://www.opus-codec.org/comparison/GoogleTest1.pdf
"Opus at 32 kbps is almost transparent" (mono)


P.S. Another one http://research.nokia.com/files/public/%5B..._Opus_Codec.pdf


So, according to these, Opus is near transparency at 32 kbps, and that was a while ago (at least the 2011 one). They made quite an update with 1.1 if I remember correctly, especially with CVBR, which I prefer to use. (Can someone tell me whether I should? I can't find any real information on this; I just treat it as a safety net compared to VBR, whose bitrate can swing high and low, which worries me.)

So, for full transparency with some headroom, what is realistic: 96, 128?

I'd rather have too much bitrate than too little; I don't like living on the edge, so to speak.

Reply #12
I agree with the term "near transparent" at 32kbps.  Definitely not transparent at 24kbps, although quite acceptable for many uses.  96-128 seems like massive overkill for (mono?) speech.  That will produce near transparent output for most stereo music.  Perhaps 48kbps-64kbps if you really are more bothered about quality than space.  Of course taking up more space is really the only downside to using a higher bitrate than really necessary.

If you're concerned about varying bitrates then maybe you shouldn't be using Opus.  Let it do what it does best.  Fixed bitrate encoding becomes progressively less effective at very low bitrates, and progressively less effective in advanced codecs.  The documentation specifically states that VBR mode produces more consistent quality.  We're not in 1980 any more, Toto.

Reply #13
Oh, well, 64 seems like the sweet spot, I guess :)

But wait, I am pretty sure CVBR has been said to be better than VBR (or something along those lines?). I'm not talking about CBR here, which I know is a waste in pretty much all scenarios.

Reply #14
Though I searched hard, I can't find a '--speech' option in the current opusenc (opus-tools).
Has it become obsolete or what?
I read somewhere that Opus analyses the input and switches by itself between the CELT and Opus libraries. I'm not sure about it, though, and I agree it would be convenient if the user could decide which library to use.

Reply #15
An earlier post in this thread said that it was removed because it was confusing, and that it was never a forced option.

 

Reply #16
Thanks. So is there a command to force opusenc to use the CELT codec?

Reply #17
Internally, there is CELT, SILK, or a hybrid mode.  You have very little direct choice about which one is used, although certain modes are always chosen at certain bitrates - basically CELT is used at high bitrates (high for speech), SILK at low bitrates, and the hybrid mode in between.  Again, don't worry about it, any more than you'd try to control the level of channel coupling in an MP3 file (you don't, do you?).

Reply #18
Quote
Thanks. So is there a command to force opusenc to use the CELT codec?

In short, no.  Why would you want to?  Get it wrong and it will sound just awful.  Get it a bit wrong and it will be worse than it should without you necessarily noticing.  In any case, there is no "pure CELT" inside Opus, just an MDCT encoding based on CELT.

Reply #19
Quote
But wait, I am pretty sure CVBR has been said to be better than VBR (or something along those lines?), ...

No. You've probably heard that in the Apple AAC topics.
For Opus, VBR is better than CVBR in quality terms.


It's worth reading the official documentation:
http://www.opus-codec.org/docs/
(see the HTML version of the man page listed there as "opusenc (.wav to .opus)")

Quote
--cvbr

Use constrained variable bitrate encoding.

Outputs to a specific bitrate. This mode is analogous to CBR in AAC/MP3 encoders and managed mode in vorbis coders. This delivers less consistent quality than VBR mode but consistent bitrate.

Reply #20
Read about CELT, SILK, and hybrid mode here.

Reply #21
Oh, that seems to be the case, I guess.

I just have a hard time understanding the wording, though. As I read it, it simply says:

CVBR gives you more fluctuation in quality than VBR, but the bitrate is steadier?

Which in turn would simply mean a steady flow with less consistent quality?

Is this interpretation correct?

"This mode is analogous to CBR in AAC/MP3 encoders"

That doesn't tell me much, though. I thought AAC/MP3 at CBR meant no variation at all, just a forced bitrate, which I guess is what Opus CBR does, while CVBR is "VBR" with a certain limit attached to it.


Also, just to make sure: frame size only matters for VoIP stuff, right?
For archiving and playback, you should always set it to the maximum (60?), right?

Reply #22
I think it is confusing to say that Opus cvbr is analogous to AAC/MP3 CBR.  Perhaps closer to AAC cvbr, but don't assume it is the same.  Opus also has a hard-cbr mode which really is constant bitrate, exactly the same number of bytes in every compressed frame, useful for some specialised applications.  I'm not familiar with the internals of AAC and MP3, but I don't think either is exactly equivalent to Opus cvbr, or even exactly the same as each other.

Constraining the bitrate variation results in lower quality audio.  Bits cannot be saved in passages that could be encoded well at a very low bitrate, hence there are fewer bits available for passages that require more bits to sound good.  End result is that, for the same file size, you will hear more audio artefacts.

The default frame size is 20ms.  In 90% of cases, leave it at 20ms.  Smaller frame sizes will allow for lower latency, but for most applications, 20ms latency (plus a few ms internally) is negligible.  Smaller frame sizes mean more overhead, which means either bigger files or lower quality, although in some rare situations a smaller frame size can reduce some audio artefacts.  CELT in particular (used for higher bitrates, higher quality) produces lower quality output with smaller frames, while SILK is less sensitive to frame size.  The overhead for 2.5ms frames nearly doubles the file size compared to 20ms frames.  Above 20ms the overhead hardly changes so there isn't much point going beyond that.  The CELT encoder doesn't even support frame sizes larger than 20ms, although it can create 60ms packets from three 20ms frames.  The SILK encoder can create 60ms frames and can form 120ms packets from smaller frames.  Smaller frames may be helpful where packet loss is expected, and obviously where low latency is critical.
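
A rough way to see why small frames cost more: the packet rate is 1000 divided by the frame size in ms, and every packet carries some fixed overhead. The sketch below assumes 2 bytes of overhead per packet purely for illustration (that figure is an assumption, not taken from the Opus or Ogg specifications), and it ignores the codec's own loss of efficiency with short frames, which is a separate effect. The function names are made up for this example.

```python
FRAME_SIZES_MS = [2.5, 5, 10, 20, 40, 60]

# ASSUMPTION: illustrative fixed cost per packet, in bytes. Not a spec value.
ASSUMED_OVERHEAD_BYTES = 2


def packets_per_second(frame_ms: float) -> float:
    """Number of packets per second when each packet holds one frame."""
    return 1000 / frame_ms


def overhead_kbps(frame_ms: float,
                  overhead_bytes: int = ASSUMED_OVERHEAD_BYTES) -> float:
    """Bitrate spent on the assumed fixed per-packet overhead, in kbps."""
    return packets_per_second(frame_ms) * overhead_bytes * 8 / 1000


if __name__ == "__main__":
    for ms in FRAME_SIZES_MS:
        print(f"{ms:>4} ms: {packets_per_second(ms):5.0f} packets/s, "
              f"~{overhead_kbps(ms):.2f} kbps overhead")
```

With these assumed numbers, going from 20 ms to 60 ms saves only about half a kbps, while dropping to 2.5 ms costs several kbps, which is a big share of a speech-level bitrate.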

Reply #23
I am thinking something like this would yield transparent results for mono (mic) content:

Code:
opusenc --bitrate 70 --vbr --framesize 60 --comp 10 --ignorelength


Does that make sense?

Reply #24
Probably would be transparent, although you really should listen to some results and then decide if they are suitable for your needs.

As for making sense ... well.  --vbr is the default.  You can specify it without harm but it is unnecessary.  Similarly --comp 10 is also the default, it is provided only so that the high CPU load (aka SLOW) of Opus encoding can be reduced in situations where it would be a problem.  I would not specify --ignorelength unless you are actually getting problems without it.  In most cases, opusenc will work without this option, but for streaming audio through stdin it is probably needed, and occasionally for input files that don't specify the data length appropriately.  With it, you may get undefined results from bad input files.

Lastly, ignoring everything that has been said so far, you have specified a non-standard frame size.  I see nothing to indicate that you need this frame size, or that it will be of any benefit to you.  The default is 20ms; use it unless you really know why you need some other value.  At a bitrate of 70kbps for mono speech, you will almost certainly get a straight CELT coding, and CELT does not even support a 60ms frame size, so you are fudging together weird composite packets, increasing latency and complexity, for no good reason.
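
Putting the advice above together, here is a small Python sketch that builds an opusenc command and leaves out anything that is already the default (VBR, complexity 10, 20 ms frames). The function and file names are hypothetical, and the defaults are the ones stated in this thread, so check opusenc --help for your version. It only prints the command; it does not run the encoder.

```python
import shlex

# Defaults as described in this thread; ASSUMED to match your opusenc version.
DEFAULTS = {"mode": "vbr", "comp": 10, "framesize": 20}
MODE_FLAGS = {"vbr": "--vbr", "cvbr": "--cvbr", "hard-cbr": "--hard-cbr"}


def opusenc_command(src, dst, bitrate_kbps, mode="vbr", comp=10, framesize=20):
    """Return an opusenc argv list, omitting options left at their defaults."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown rate mode: {mode}")
    cmd = ["opusenc", "--bitrate", str(bitrate_kbps)]
    if mode != DEFAULTS["mode"]:
        cmd.append(MODE_FLAGS[mode])
    if comp != DEFAULTS["comp"]:
        cmd += ["--comp", str(comp)]
    if framesize != DEFAULTS["framesize"]:
        cmd += ["--framesize", str(framesize)]
    return cmd + [src, dst]


if __name__ == "__main__":
    # The 48-64 kbps suggestion from earlier in the thread, all defaults.
    print(shlex.join(opusenc_command("talk.flac", "talk.opus", 64)))
    # Constrained VBR only if a steadier bitrate really matters to you.
    print(shlex.join(opusenc_command("talk.flac", "talk.opus", 32, mode="cvbr")))
```

Fed the options from the command in question (70 kbps, VBR, complexity 10, 60 ms frames), this keeps only --bitrate 70 and --framesize 60, which makes the one non-default choice easy to spot.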