An idea of audio encode algorithm, based on maximum allowed volume of signals difference, WavPack hybrid mode test included
softrunner
post Mar 6 2013, 00:11
Post #1





Group: Members
Posts: 48
Joined: 19-July 12
Member No.: 101579



Full topic title: "An idea of audio encode algorithm, based on maximum allowed volume of signals difference"

Recently I discovered that the difference between the source and the encoded audio can easily be obtained by inverting the source audio and mixing it with the encoded one. That gave me an idea for an encoding algorithm: keep the level of the signal difference at or below a limit defined by the user. Audio quality is then measured simply by the volume of the difference signal, and this difference is nothing but the distortion produced by the encoder.
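The inversion-and-mix measurement described above can be sketched in a few lines. This is only an illustration under stated assumptions: samples are floats in the range -1.0 to 1.0, "volume" is taken to mean RMS level in dBFS (the post does not say which measure is meant), and the coarsely rounded copy merely stands in for the output of a real lossy encoder. The function names are hypothetical.

```python
import math

def difference(source, encoded):
    """Invert the source and mix it with the encoded signal, sample by sample."""
    return [e + (-s) for s, e in zip(source, encoded)]

def rms_dbfs(samples):
    """RMS level in dB relative to full scale (1.0); -inf for silence."""
    if not samples:
        return float("-inf")
    mean_square = sum(x * x for x in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)

# Toy data (assumption): 0.1 s of a 1 kHz sine at 44.1 kHz.
source = [0.5 * math.sin(2 * math.pi * 1000 * n / 44100) for n in range(4410)]
# Stand-in for a lossy encode/decode round trip: round to steps of 1/128.
encoded = [round(x * 128) / 128 for x in source]

residual = difference(source, encoded)
print(f"residual level: {rms_dbfs(residual):.1f} dBFS")
```

Mixing a signal with its own inversion gives digital silence, so `rms_dbfs(difference(source, source))` returns `-inf`; anything above that is what the encoder changed.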
The whole algorithm looks like this:
1. Take the maximum allowed volume of the signal difference from the user.
2. Make a copy of the source audio and invert it.
3. Split both the source and the inverted audio into frames of the same size.
4. Encode the first frame of the source audio, mix the result with the first frame of the inverted audio, and calculate the volume of the resulting difference.
5. If the volume of the difference is higher than the user allows, add some bitrate and repeat from item 4.
6. If the volume of the difference is not higher than the user allows, add the encoded frame to the final output.
7. Repeat items 4-6 with the second, third, etc. frames until the end of the source file.
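The loop described in the list above might be sketched as follows. This is a rough illustration, not an implementation of any real codec: `toy_encode` is a hypothetical stand-in that quantises to a given number of bits (the "bitrate" knob), "volume" is assumed to mean RMS level in dBFS, and all names, the frame size, and the test signal are assumptions made for the example.

```python
import math

SAMPLE_RATE = 44100  # assumption: CD-rate mono float samples in -1.0..1.0

def rms_dbfs(samples):
    """RMS level in dB relative to full scale; -inf for silence."""
    mean_square = sum(x * x for x in samples) / len(samples) if samples else 0.0
    return 10.0 * math.log10(mean_square) if mean_square > 0.0 else float("-inf")

def toy_encode(frame, bits):
    """Hypothetical encoder stand-in: quantise to `bits` bits and decode again."""
    scale = float(2 ** (bits - 1))
    return [round(x * scale) / scale for x in frame]

def encode_with_difference_limit(source, max_diff_dbfs, frame_size,
                                 min_bits=2, max_bits=24):
    inverted = [-x for x in source]                  # inverted copy of the source
    encoded_frames = []
    for start in range(0, len(source), frame_size):  # equal-sized frames
        src_frame = source[start:start + frame_size]
        inv_frame = inverted[start:start + frame_size]
        for bits in range(min_bits, max_bits + 1):   # add "bitrate" and retry
            decoded = toy_encode(src_frame, bits)
            diff = [d + i for d, i in zip(decoded, inv_frame)]  # mix with inverted
            level = rms_dbfs(diff)
            if level <= max_diff_dbfs:               # within the user's limit
                break
        # If the limit was never met, the max_bits attempt is kept.
        encoded_frames.append({"bits": bits, "diff_dbfs": level, "samples": decoded})
    return encoded_frames

def sine(amplitude, n_samples, freq_hz=441.0):
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(n_samples)]

frame_size = 4410
source = sine(0.001, frame_size) + sine(0.5, frame_size)  # quiet frame, then loud frame
frames = encode_with_difference_limit(source, max_diff_dbfs=-60.0, frame_size=frame_size)
for index, frame in enumerate(frames):
    print(f"frame {index}: {frame['bits']} bits, difference {frame['diff_dbfs']:.1f} dBFS")
```

In this toy setup the very quiet frame already meets a -60 dBFS limit at the lowest setting, because even discarding it entirely leaves a residual below the limit, while the loud frame needs several more bits. The sketch measures only the level of the difference; it says nothing about whether that difference is audible.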

Of course, this algorithm is much slower than a direct encode, but it should definitely not be slower than video encoding (and people are prepared to wait many hours while their videos are encoded).

I tried to reproduce this algorithm manually in a test using WavPack hybrid mode as the encoder (the source audio sample was split into 11 parts of 1 second each), and it showed that 23.4 % of space/bitrate could be saved. Another important point is that the user is guaranteed not to get distortion at a level higher than he expects, so he can safely encode many files at once without looking at the content. The user is freed both from unnecessary waste of bitrate and from uncontrolled distortion.

The only thing needed is for some audio developers to take an interest in this idea and implement it as a computer program.

The whole set of files of the WavPack test I've made is here.

This post has been edited by softrunner: Mar 6 2013, 00:20
2Bdecided
post Mar 22 2013, 16:37
Post #2


ReplayGain developer


Group: Developer
Posts: 5136
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



Sorry db1989, I'm not trying to personally attack you! I was even a little worried it would look like that when I replied to yet another of your posts. It's just coincidence that several of your recent posts have made me want to jump in and explore that particular topic in more depth. Case in point.

Let's think outside the box: of course CD audio is lossy. It's lossy compared to the stereo source signal. You could even say that it has a very simple static psychoacoustic and listening room model: it assumes we cannot hear anything above 20kHz*, and that noise 90dB* down from peak signal level will typically be inaudible. Those assumptions aren't entirely true 100% of the time, but they'll do for almost any music listening you can imagine.

(* = as envisaged; CD can do better in practice these days with good oversampling and good noise shaping.)

In terms of preserving, say, an analogue tape or vinyl record, we say that CD really is transparent. The frequency limit is beyond all but the best hearing, and the noise floor is below that of the source. Note that this is not the same as saying it preserves even the noise level perfectly, because adding a lower level noise (-90dB) will raise the noise floor (typ. -60dB) a little - it's just that the change is inaudible (a fraction of a dB). That's psychoacoustics again. ;)
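The "fraction of a dB" figure can be checked with a short calculation. A minimal sketch, assuming the two noise sources are uncorrelated so that their powers (not their decibel values) add; the function name is illustrative, and the -60 dB and -90 dB inputs are the figures quoted above:

```python
import math

def combined_noise_db(*levels_db):
    """Level of the sum of uncorrelated noise sources: convert to power, add, convert back."""
    total_power = sum(10.0 ** (level / 10.0) for level in levels_db)
    return 10.0 * math.log10(total_power)

source_floor_db = -60.0  # typical tape/vinyl noise floor, as quoted above
cd_floor_db = -90.0      # CD noise floor, as quoted above

combined = combined_noise_db(source_floor_db, cd_floor_db)
print(f"noise floor rises by {combined - source_floor_db:.4f} dB")
```

With these inputs the rise works out to roughly 0.004 dB, since the added noise carries one thousandth of the power of the noise already present.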


There are things you can do to a CD quality signal that make barely any more psychoacoustic assumptions than any of the above. They are lossy (referenced to the CD quality signal), but so "safe" that some people are happy saying they are transparent. I call them near-lossless, but there might be a better name.

The point I think I'm trying to make is that, apart from mathematically perfect coding of a set of numbers (which is lossless), it's all shades of grey (though hopefully not 50 of them), rather than black and white. Is it transparent? Is it lossless? Does it use a psychoacoustic model? I am usually perfectly happy with the general definitions we use of these words. However, when you get very very picky, or start to talk about absolute transparency of lossy codecs, or start to talk about the "psychoacoustic" model of near-lossless codecs, I think you have to be really careful.


I agree that anyone who is worried about these things should use a lossless format for the audio ripped from their CDs.

Cheers,
David.
Nessuno
post Mar 23 2013, 11:00
Post #3





Group: Members
Posts: 422
Joined: 16-December 10
From: Palermo
Member No.: 86562



QUOTE (2Bdecided @ Mar 22 2013, 16:37) *
Let's think outside the box: of course CD audio is lossy. It's lossy compared to the stereo source signal. You could even say that it has a very simple static psychoacoustic and listening room model: it assumes we cannot hear anything above 20kHz*, and that noise 90dB* down from peak signal level will typically be inaudible. Those assumptions aren't entirely true 100% of the time, but they'll do for almost any music listening you can imagine.

Kind of reductio ad infinitum: if we go along with this line of reasoning, we should say that what our hearing system transmits to our brain is a lossy version of the real sound event, and the vibration of air molecules is a sampled and quantized lossy reproduction of the actual instrument's vibrations...
But, if a tree falls in the forest when nobody's there, does it make any noise? ;)

As I see it, sound, or better, music production is all about psychoacoustics and the reference model is always our hearing system, so arguing about bandwidth and SNR limits of CD format (*) means wanting to (re)produce something that not only nobody could realistically hear, but that wasn't even in composer's or player's or instrument builder's mind in the first place!

(*) as an end user's format at least, just to take into account meaningful reasons in favour of 24/96 the highest technically feasible format at the moment for recording, mixing, mastering stages etc...

This post has been edited by Nessuno: Mar 23 2013, 11:06


--------------------
... I live by long distance.
2Bdecided
post Mar 28 2013, 10:34
Post #4


ReplayGain developer


Group: Developer
Posts: 5136
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



QUOTE (Nessuno @ Mar 23 2013, 10:00) *
QUOTE (2Bdecided @ Mar 22 2013, 16:37) *
Let's think outside the box: of course CD audio is lossy. It's lossy compared to the stereo source signal. You could even say that it has a very simple static psychoacoustic and listening room model: it assumes we cannot hear anything above 20kHz*, and that noise 90dB* down from peak signal level will typically be inaudible. Those assumptions aren't entirely true 100% of the time, but they'll do for almost any music listening you can imagine.

Kind of reductio ad infinitum: if we go along with this line of reasoning, we should say that what our hearing system transmits to our brain is a lossy version of the real sound event
...which it absolutely is...
QUOTE
and the vibration of air molecules is a sampled and quantized lossy reproduction of the actual instrument's vibrations...
But, if a tree falls in the forest when nobody's there, does it make any noise? ;)
I don't want to go that far. ;) I only care about what we can hear. Which is exactly what you said...

QUOTE
As I see it, sound, or better, music production is all about psychoacoustics and the reference model is always our hearing system, so arguing about bandwidth and SNR limits of CD format (*) means wanting to (re)produce something that not only nobody could realistically hear, but that wasn't even in composer's or player's or instrument builder's mind in the first place!
My point was that, in a really esoteric discussion like this one, we have to be 100% clear what we mean by transparent, and what we mean by psychoacoustic model. It's a failure to understand these two things properly that lets the OP make some statements that many reading here will judge to be ridiculous.

If you can hear 23kHz tones (some people can), CD isn't transparent, by any accurate definition of that word.
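The 23 kHz remark follows from the sampling rate. A minimal sketch, assuming the standard CD sample rate of 44.1 kHz and an idealised sampler; the function names are illustrative. A tone above half the sample rate cannot be stored as itself: it is either removed by the anti-alias filter before sampling or, if unfiltered, folds back to a lower frequency.

```python
CD_SAMPLE_RATE_HZ = 44100  # standard CD audio sample rate

def nyquist_hz(sample_rate_hz):
    """Highest frequency that can be represented at this sample rate."""
    return sample_rate_hz / 2

def is_representable(tone_hz, sample_rate_hz):
    """True if the tone lies below the Nyquist frequency."""
    return tone_hz < nyquist_hz(sample_rate_hz)

def alias_hz(tone_hz, sample_rate_hz):
    """Frequency at which an unfiltered tone would appear after sampling."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

print(nyquist_hz(CD_SAMPLE_RATE_HZ))               # 22050.0
print(is_representable(23000, CD_SAMPLE_RATE_HZ))  # False
print(alias_hz(23000, CD_SAMPLE_RATE_HZ))          # 21100
```

In a properly designed converter the filter removes the 23 kHz tone rather than letting it alias, so a listener who can hear it on the source would simply find it missing on the CD.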

Cheers,
David.
db1989
post Mar 28 2013, 13:41
Post #5





Group: Super Moderator
Posts: 5275
Joined: 23-June 06
Member No.: 32180



QUOTE (2Bdecided @ Mar 28 2013, 09:34) *
If you can hear 23kHz tones (some people can), CD isn't transparent, by any accurate definition of that word.
Such people are very rare, at least as adults. In any case, being able to hear a tone does not necessarily predict what one can and cannot hear in actual music where, by definition, multiple tones are composited. As usual, a DBT is the only way to assess transparency or the lack thereof in such a case, and I suspect even people who can hear beyond 20 kHz in pure tones might not have such luck with actual musical signals.
DonP
post Mar 28 2013, 17:45
Post #6





Group: Members (Donating)
Posts: 1471
Joined: 11-February 03
From: Vermont
Member No.: 4955



QUOTE (db1989 @ Mar 28 2013, 07:41) *
QUOTE (2Bdecided @ Mar 28 2013, 09:34) *
If you can hear 23kHz tones (some people can), CD isn't transparent, by any accurate definition of that word.
Such people are very rare, at least as adults. In any case, being able to hear a tone does not necessarily predict what one can and cannot hear in actual music where, by definition, multiple tones are composited. As usual, a DBT is the only way to assess transparency or the lack thereof in such a case, and I suspect even people who can hear beyond 20 kHz in pure tones might not have such luck with actual musical signals.


First, music is not by definition multiple tones at once. It could be a single line melody, or even just rhythm on a single note. Or a CD could have non musical audio.

Second, depending on masking to make it transparent takes it into the realm of lossy, which was the original point of this sub-topic.

Third, why limit the domain to people old enough to have presumably reduced hearing?





db1989
post Mar 28 2013, 17:50
Post #7





Group: Super Moderator
Posts: 5275
Joined: 23-June 06
Member No.: 32180



QUOTE (DonP @ Mar 28 2013, 16:45) *
First, music is not by definition multiple tones at once. It could be a single line melody, or even just rhythm on a single note. Or a CD could have non musical audio.
OK, then allow me to clarify what I hoped was clear but perhaps was badly worded: by multiple tones, I meant timbres more complex than single sinewaves.

QUOTE
Third, why limit the domain to people old enough to have presumably reduced hearing?
I have no desire to do this.

I was making some general and simplistic points about the ability to hear pure tones at a given frequency vs. that frequency’s relevance in common types of material containing multiple harmonics. I’m not trying to reshape how people develop codecs or claim that I know better. Developers are obviously free to test and process in whichever ways and on whichever types of material, ‘realistic’ or not, they choose. softrunner in particular might need some radical new methodologies to get this project off the ground…


