Can audio encoders target quality w/o caring about bit rate/file size?, [OP = softrunner / split from “IETF Opus codec now ready for testing”]
softrunner
post Feb 14 2013, 02:33
Post #1





Group: Members
Posts: 48
Joined: 19-July 12
Member No.: 101579



QUOTE (Martel @ Jan 1 2013, 14:46) *
QUOTE (softrunner @ Dec 29 2012, 04:50) *
I don't know whether it is possible for an encoder to do such an analysis of the source audio, but it would be great if so.
It's only a matter of finding the right formula/algorithm.

The x264 video encoder has an encoding mode called Constant Rate Factor. In this mode a number (16, 17, etc.) defines the desired quality (lower means better quality and higher bitrate), and the encoder does not care about bitrate, only about keeping the rate factor constant. The question is why nobody has invented something similar for audio encoding (except lossyWAV, which needs too much bitrate for acceptable quality).
----------------
Opus 1.1 alpha has some bugs, which can be found using the samples from the thread "High Frequency Listening Test Samples". For example, at 16-24 kbps Opus gives this:

and for 32-40 kbps it gives this:

For samples 1_12kHz, 1_20kHz, 2_8kHz, 2_12kHz and 2_20kHz, Opus sounds wrong even at 512 kbps.
The full set of files is here (problematic samples are marked with an exclamation mark). I hope the developers will use these samples in their work.

This post has been edited by softrunner: Feb 14 2013, 02:34
Big_Berny
post Feb 14 2013, 12:11
Post #2





Group: Members
Posts: 242
Joined: 9-February 03
Member No.: 4921



QUOTE (softrunner @ Feb 14 2013, 02:33) *
The x264 video encoder has an encoding mode called Constant Rate Factor. In this mode a number (16, 17, etc.) defines the desired quality (lower means better quality and higher bitrate), and the encoder does not care about bitrate, only about keeping the rate factor constant. The question is why nobody has invented something similar for audio encoding (except lossyWAV, which needs too much bitrate for acceptable quality).

I think every encoder with real VBR (not ABR) does that? LAME has -V (0 to 9), QT AAC has --tvbr (0 to 127), Vorbis has -q (-2 to 10). The bitrate may vary a lot with these settings between different songs/genres.
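
As a rough illustration of these quality-based modes, here is a minimal sketch that only builds the quality arguments and does not run any encoder. The quality ranges are the ones quoted above; the executable names (`lame`, `qaac` as a front end for QT AAC, `oggenc` for Vorbis) and the helper `quality_args` are assumptions made for the example. Note that no bitrate appears anywhere in the arguments. For LAME a lower number means higher quality, while for the other two a higher number does.

```python
# Quality ranges as quoted in the post; tool names are assumptions for this sketch.
QUALITY_MODES = {
    "lame":   {"flag": "-V",     "min": 0,  "max": 9},
    "qaac":   {"flag": "--tvbr", "min": 0,  "max": 127},
    "oggenc": {"flag": "-q",     "min": -2, "max": 10},
}


def quality_args(tool, quality):
    """Return the quality-mode arguments for a tool, checking the quoted range.

    Hypothetical helper: it only assembles arguments, it does not encode anything.
    """
    mode = QUALITY_MODES[tool]
    if not mode["min"] <= quality <= mode["max"]:
        raise ValueError(
            f"{tool} quality must be in {mode['min']}..{mode['max']}"
        )
    return [tool, mode["flag"], str(quality)]
```

Input and output file arguments are left out on purpose, since their order differs between tools.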
softrunner
post Feb 14 2013, 19:33
Post #3





Group: Members
Posts: 48
Joined: 19-July 12
Member No.: 101579



QUOTE (Big_Berny @ Feb 14 2013, 15:11) *
I think every encoder with real VBR (not ABR) does that? LAME has -V (0 to 9), QT AAC has --tvbr (0 to 127), Vorbis has -q (-2 to 10). The bitrate may vary a lot with these settings between different songs/genres.

QUOTE (LigH @ Feb 14 2013, 15:32) *
Opus has that too. It just calculates a quality factor from the given target bitrate, based on statistics. Therefore the resulting bitrate may vary depending on the audio source, Opus will not try to approximate the given target bitrate in true VBR mode.

Well, if you mix an audiobook and complex electronic music in one file, which bitrate will you use for that file? Opus at 64 kbps will give good quality for the part that contains the audiobook, but the quality of the musical part will be very low. And 176 kbps will give good quality for the music, but that bitrate will be excessive for the audiobook. I would like to have an encoder that takes "good quality" from me as an input option and gives ~64 kbps for the audiobook part of the file and ~176 kbps for the musical part. None of the modern audio encoders can do this. :(
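
To make the numbers in this example concrete, here is a minimal sketch of the arithmetic. It assumes a hypothetical one-hour file split evenly between audiobook and music; the 30-minute durations are an assumption, only the two bitrates come from the example. It compares giving each part its own bitrate with encoding the whole file at a fixed 176 kbps.

```python
# Toy arithmetic only: durations are assumed, bitrates are the ones from the example.
SEGMENTS = [
    ("audiobook", 30 * 60, 64),   # (label, duration in seconds, kbps)
    ("music", 30 * 60, 176),
]


def size_megabytes(seconds, kbps):
    """Size of a stream at a constant bitrate, in megabytes (10**6 bytes)."""
    return seconds * kbps * 1000 / 8 / 1e6


def average_kbps(segments):
    """Duration-weighted average bitrate over all segments."""
    total_seconds = sum(seconds for _, seconds, _ in segments)
    total_kilobits = sum(seconds * kbps for _, seconds, kbps in segments)
    return total_kilobits / total_seconds


total_seconds = sum(seconds for _, seconds, _ in SEGMENTS)
per_segment_size = sum(size_megabytes(s, k) for _, s, k in SEGMENTS)
fixed_176_size = size_megabytes(total_seconds, 176)
```

Under these assumptions the per-part encode averages 120 kbps and comes to about 54 MB, against about 79 MB for the whole file at 176 kbps.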
splice
post Feb 18 2013, 21:08
Post #4





Group: Members
Posts: 143
Joined: 23-July 03
Member No.: 7935



QUOTE (softrunner @ Feb 14 2013, 11:33) *
Well, if you mix an audiobook and complex electronic music in one file, which bitrate will you use for that file? Opus at 64 kbps will give good quality for the part that contains the audiobook, but the quality of the musical part will be very low. And 176 kbps will give good quality for the music, but that bitrate will be excessive for the audiobook. I would like to have an encoder that takes "good quality" from me as an input option and gives ~64 kbps for the audiobook part of the file and ~176 kbps for the musical part. None of the modern audio encoders can do this. :(


Your problem hinges on the point that we usually accept a lower level of quality for the spoken word than we do for music. Likewise, we can accept poor quality printing of text, so long as it is legible, but we prefer images to be high quality.

You have two choices: either develop (or have developed for you) an encoder that recognises sections containing the spoken word and adopts a different quality metric for them, or do it manually by encoding the spoken words separately from the music and joining (editing) the sections together afterwards. I assume that you already generate the speech and music separately and join them afterwards, so it should not be too much of a change in workflow. I know you can do this with MP3. I don't know about Opus.



--------------------
Regards,
Don Hills
jensend
post Feb 18 2013, 23:34
Post #5





Group: Members
Posts: 149
Joined: 21-May 05
Member No.: 22191



QUOTE (splice @ Feb 18 2013, 13:08) *
Your problem hinges on the point that we usually accept a lower level of quality for the spoken word than we do for music. Likewise, we can accept poor quality printing of text, so long as it is legible, but we prefer images to be high quality.
*sigh* No. Quality!=PSNR. Long ago, a wise man once said,
QUOTE (jensend @ Feb 17 2013, 16:10) *
Some of you seem to be saying "maybe it's just that the speech is equally degraded but people don't find that as unacceptable as they do for music." Since people's preferences are what define quality, this makes zero sense. A VBR encoder that encodes speech at the same bitrate as music when listeners find the degradation of music at that bitrate to be annoying but would not be annoyed with speech at a marginally lower bitrate is simply not managing to maintain constant quality.


Nessuno: no, this isn't about recognizability either. (Recognizability could be considered for music too, e.g. "regardless of how awful it sounds, I can tell, just barely, this is Beethoven's 9th.") It's about quality. This can't be reduced to a binary distinction, but if you must have a binary distinction to start with and you want something more descriptive than "good vs bad", perhaps the best one is "annoying vs not annoying" (cf. MUSHRA, ABC/HR). It may be distinguishably different under ideal conditions; so what? Is it any worse, or would you be perfectly fine with listening to this instead of that? On the opposite end of the quality spectrum, it may be recognizable; so what? Is it any good, or would you tear your hair out if you had to listen to it for any substantial amount of time?

Softrunner: It appears you were substantially more confused than I thought you were. Others esp. ggf31416 and db1989 are doing a good job of explaining why.

IgorC: The test was done with the same low anchors and quite likely a subset of the same listeners using the same equipment. You think that the differences significantly biased the results in one coherent direction? Whatever. My opinion is of course not based on these tests but on my own 12-64kbps listening comparisons. Feel free to try your own. Of course, as I already said, no test protocol can make cross-sample quality comparisons blind, so whenever you can ABX the speech there's nothing preventing you from saying "gee, I'm going to rate this a 2 and the encoded music a 4.9, just 'cause I wanna show jensend is wrong."

Please note that despite what it may look like from the pile-on in this thread, my view appears to be in the majority. Just about everybody recommends bitrates for speech they would not recommend for mono music (or recommend bitrates for speech less than half what they recommend for stereo music despite the savings of channel coupling). This is not because they expect people to just put up with being more annoyed.

This post has been edited by jensend: Feb 18 2013, 23:35
Nessuno
post Feb 19 2013, 11:36
Post #6





Group: Members
Posts: 423
Joined: 16-December 10
From: Palermo
Member No.: 86562



QUOTE (jensend @ Feb 18 2013, 23:34) *
Nessuno: no, this isn't about recognizability either. (Recognizability could be considered for music too, e.g. "regardless of how awful it sounds, I can tell, just barely, this is Beethoven's 9th.") It's about quality. This can't be reduced to a binary distinction, but if you must have a binary distinction to start with and you want something more descriptive than "good vs bad", perhaps the best one is "annoying vs not annoying" (cf. MUSHRA, ABC/HR). It may be distinguishably different under ideal conditions; so what? Is it any worse, or would you be perfectly fine with listening to this instead of that? On the opposite end of the quality spectrum, it may be recognizable; so what? Is it any good, or would you tear your hair out if you had to listen to it for any substantial amount of time?

Do you know what the quality parameter in all true VBR modes of every modern encoder stands for? Do you know that, for example, AAC accepts 128 different quality level values in VBR mode? Do you know that this quality parameter is a dimensionless number and in fact is a (guess what?!?) qualitative property of the desired output?

What misleads you is that you still think that you always set a desired bitrate (which is wrong, as I showed you before with numbers). In fact you still say:
QUOTE
Just about everybody recommends bitrates for speech they would not recommend for mono music (or recommend bitrates for speech less than half what they recommend for stereo music despite the savings of channel coupling).

This remark is completely out of context when you set a VBR mode.

In the end, what you want is an encoding mode which takes no parameter at all and selects the right (for what?!?!?) output quality level by understanding whether its input is speech or music or anything in between, because it knows how much in each case you'll be more or less annoyed by artifacts.

So it must be smart enough to understand that a musical piece (target: transparency) could contain a speech segment (opera anyone?), that a speech (target: not annoying) could contain background music, that even if the input is music, this time the user would accept a lower quality output (target: a few artifacts above the audible threshold) because he's planning to use it for listening on the go, that even if the input is speech, the user would like to have a higher quality output (target: better than just enough, but not that much anyway) because it is a lecture in a foreign language and the speaker also has a strong regional accent, so it is harder to comprehend...

All of the above can be easily accomplished just by selecting a VBR mode and an appropriate quality level among those that the given encoder accepts. Then the encoder will choose the lowest bitrate possible depending on that quality level, on its psychoacoustic model and on the instantaneous properties of the input signal. It is just self-evident that the desired quality level must be a user choice, not an encoder one!
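
As a rough illustration of that selection step, here is a toy sketch of quality-targeted bitrate choice. Everything in it is hypothetical: the `complexity` score, the `estimated_quality` formula and the bitrate ladder stand in for a real psychoacoustic model and are not taken from any actual encoder. For each frame it picks the lowest bitrate whose estimated quality reaches the user-chosen target.

```python
import math

# Hypothetical set of bitrates the toy "encoder" may choose from, lowest first.
BITRATE_LADDER_KBPS = [16, 24, 32, 48, 64, 96, 128, 176, 256]


def estimated_quality(kbps, complexity):
    """Made-up stand-in for a psychoacoustic model.

    Quality lies in [0, 10), rises with bitrate and falls with signal complexity.
    """
    return 10 * (1 - math.exp(-kbps / (40 * complexity)))


def pick_bitrate(complexity, target_quality):
    """Return the lowest ladder bitrate whose estimated quality meets the target."""
    for kbps in BITRATE_LADDER_KBPS:
        if estimated_quality(kbps, complexity) >= target_quality:
            return kbps
    # Target not reachable on this ladder: fall back to the highest bitrate.
    return BITRATE_LADDER_KBPS[-1]


# Assumed per-frame complexity: two speech-like frames, then two music-like frames.
frames = [0.5, 0.5, 2.0, 2.0]
target = 8.0  # the user's choice, not the encoder's
chosen = [pick_bitrate(c, target) for c in frames]
```

With the same target, the speech-like frames end up at a lower bitrate than the music-like ones, and lowering `target` lowers both.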


--------------------
... I live by long distance.
