lossywav for lossy codecs?
doccolinni
post Mar 19 2010, 16:11
Post #26





Group: Members
Posts: 173
Joined: 28-May 09
From: Zagreb, Croatia
Member No.: 70204



QUOTE (halb27 @ Mar 19 2010, 16:07) *
Just theoretical reasoning whether an idea works or not and theoretical reasoning about why it doesn't is a bit strange.

Why? We're humans, that's what we do.
2Bdecided
post Mar 19 2010, 16:30
Post #27


ReplayGain developer


Group: Developer
Posts: 5137
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



QUOTE (doccolinni @ Mar 19 2010, 15:03) *
The whole point of below-the-noise-floor-bit-zeroing was to remove the bits that are noisy, and now when you've done that you want to also make more bits (that were previously not noise) noisy. A good related expression is "you're sawing the branch on which you're sitting."
That's true, and it's certainly possible to remove more bits while increasing the bitrate (the opposite of what you want) - but there's probably a sweet spot somewhere between no noise shaping (just quantise to the noise floor), and 100% noise shaping to the masked threshold.

Whether it's worth it, I don't know. It seemed to be somewhat worth it (if by worth it, you mean "the bitrate for transparent coding decreases") when I played with it at the start (unreleased due to a Sony patent which may cover part of it - due to expire this year IIRC), but lossyWAV without noise shaping is now more efficient than my early effort with noise shaping, so who knows.

SebG's right that, at the limit, it makes lossyWAV and mp3 (for example) very similar in several ways - but you're right that adding the maximum possible amount of inaudible noise isn't necessarily the most efficient thing with lossyWAV. It isn't with mp3 either - but then mp3 encoders only "add noise" which saves bits - it's not immediately obvious to me how you'd do that with lossyWAV sitting outside a lossless codec - other than trial and error or some kind of adaptive/learning algorithm which figured out what upset the specific lossless codec, and avoided doing it.

A more important point is this: lossyWAV is, or may be, somewhat useful if/because it doesn't interact in nasty ways with psychoacoustic-based codecs. You'll lose this if you introduce noise shaping into lossyWAV, so it's not better for those users.

We've had this discussion at least once, quite a while ago.

Cheers,
David.
halb27
post Mar 19 2010, 16:35
Post #28





Group: Members
Posts: 2435
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015



QUOTE (doccolinni @ Mar 19 2010, 16:11) *
QUOTE (halb27 @ Mar 19 2010, 16:07) *
Just theoretical reasoning whether an idea works or not and theoretical reasoning about why it doesn't is a bit strange.

Why? We're humans, that's what we do.

Yes, we humans do a lot of strange things.
Luckily we don't do it in every single situation. In our situation look at it from the scientist's point of view (think of Popper): one falsification of a theory is sufficient to make the theory obsolete. In our case the falsification is simply done by applying the approach to a number of tracks, something the OP could have easily done before posting. That's all.

This post has been edited by halb27: Mar 19 2010, 16:37


--------------------
lame3100m -V1 --insane-factor 0.75
SebastianG
post Mar 19 2010, 16:52
Post #29





Group: Developer
Posts: 1318
Joined: 20-March 04
From: Göttingen (DE)
Member No.: 12875



The original question has been answered. I just take issue with some statements doccolinni made. If I got him right, he was questioning the use of "a proper psymodel" for lossyWav. He also said that lossyWav's only freedom is to decide the number of "bits to remove" for every block. This is not true, because it could also decide about the noise's spectral shape. A lot of what I read w.r.t. "noise floor" is misleading. The concept of a "noise floor" that is determined by the "lowest FFT bin" (in spirit) is pretty much useless. There's no reason why lossyWav should be restricted to adding white noise. If you know what you're doing you can get a much better quality-per-bit ratio.

Of course, if you don't know what you're doing, some of the bits you "gained" (by setting them to zero) are lost again when FLAC fails to predict the signal. But this happens only if the signal-to-noise ratio is locally very low somewhere. It's not the case if the noise stays locally below the signal everywhere. And by "locally" I refer to both time and frequency. Since we listen to music and not white noise, we can add colored noise and still stay locally well below the signal. LossyWav is currently "over-coding" much of the lower frequency spectrum.
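
To make the difference between white and coloured quantisation noise concrete, here is a minimal sketch in Python. It is not lossyWAV's code and not the signal-adaptive shaping argued for above: it assumes a fixed first-order error-feedback filter, and the function names `zero_lsbs_rounding` and `zero_lsbs_shaped` are made up for illustration. Both functions zero the same number of LSBs; the second one only changes how the rounding error is distributed over frequency.

```python
import math

def zero_lsbs_rounding(samples, bits):
    """Round each sample to the nearest multiple of 2**bits (roughly white error)."""
    step = 1 << bits
    return [((x + (step >> 1)) // step) * step for x in samples]

def zero_lsbs_shaped(samples, bits):
    """Same quantiser, but with first-order error feedback.

    The error made on one sample is subtracted from the next sample before
    quantising, so the total error is e[n] - e[n-1]: a high-pass (coloured)
    spectrum instead of a flat one.
    """
    step = 1 << bits
    out, err = [], 0
    for x in samples:
        target = x - err                               # pre-compensate previous error
        y = ((target + (step >> 1)) // step) * step    # round to the remaining bits
        err = y - target                               # error carried to the next sample
        out.append(y)
    return out

# Hypothetical test signal: 440 Hz sine at 44.1 kHz, 6 LSBs zeroed.
signal = [round(10000 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(4096)]
plain = zero_lsbs_rounding(signal, 6)
shaped = zero_lsbs_shaped(signal, 6)

# With error feedback the accumulated error never exceeds half a step,
# i.e. the error has almost no energy at the lowest frequencies.
print(abs(sum(shaped) - sum(signal)))
```

A real implementation would pick the filter per block from a psychoacoustic model so that the noise stays locally below the signal; the fixed filter here only shows that the spectral shape is a free parameter in addition to the number of zeroed bits.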

This post has been edited by SebastianG: Mar 19 2010, 16:53
doccolinni
post Mar 19 2010, 16:58
Post #30





Group: Members
Posts: 173
Joined: 28-May 09
From: Zagreb, Croatia
Member No.: 70204



QUOTE (SebastianG @ Mar 19 2010, 16:52) *
The original question has been answered. I just take issue with some statements doccolinni made. If I got him right, he was questioning the use of "a proper psymodel" for lossyWav. He also said that lossyWav's only freedom is to decide the number of "bits to remove" for every block.

No, that is not what I was trying to say. Unfortunately I'm too sleepy and tired at the moment to rummage through the things that I've posted and correct the things I've said wrongly to lead you to that impression, so I guess you'll just have to believe me that I didn't mean to say that. :D

Seriously.
SebastianG
post Mar 19 2010, 17:05
Post #31





Group: Developer
Posts: 1318
Joined: 20-March 04
From: Göttingen (DE)
Member No.: 12875



QUOTE (doccolinni @ Mar 19 2010, 16:58) *
No, that is not what I was trying to say.


Weird. Because this quote ....

QUOTE (doccolinni) *
[...] but I am unsure though at the moment how precisely could lossyWAV benefit from a proper psymodel when basically all the freedom lossyWAV has got to change the bit values in a single block is one bit - the least significant one of those that remain non-zeroed, because all less significant end up being zeroed and all more significant shouldn't be changed because they're above the noise floor. In fact I have a feeling that anything other than rounding would just produce worse results.

...sounds just like it to me.
doccolinni
post Mar 20 2010, 09:16
Post #32





Group: Members
Posts: 173
Joined: 28-May 09
From: Zagreb, Croatia
Member No.: 70204



QUOTE (SebastianG @ Mar 19 2010, 17:05) *
QUOTE (doccolinni @ Mar 19 2010, 16:58) *
No, that is not what I was trying to say.


Weird. Because this quote ....

QUOTE (doccolinni) *
[...] but I am unsure though at the moment how precisely could lossyWAV benefit from a proper psymodel when basically all the freedom lossyWAV has got to change the bit values in a single block is one bit - the least significant one of those that remain non-zeroed, because all less significant end up being zeroed and all more significant shouldn't be changed because they're above the noise floor. In fact I have a feeling that anything other than rounding would just produce worse results.

...sounds just like it to me.

What I meant by that is my feeling is that if lossyWAV diffused the quantisation error to more than just the least significant of the remaining non-zeroed bits it would only make the lossless encoding of the remaining bits more difficult. I admit, saying that only rounding would work well was a bit of an overkill, but if lossyWAV keeps "spoiling" the predictable bits that remain I think it would just make things worse. Or maybe it wouldn't, but then the quantisation error diffusion has to be done in a predictable way. Dithering may not help in this case because it produces pure white noise, but noise shaping might. However, even though noise shaping does produce noise that is more prominent on some frequencies than others and that would help make the added noise more difficult to hear for a human if you transfer more noise to the highest frequencies, it's still quite a random noise in itself and would still be relatively difficult to compress losslessly.

This post has been edited by doccolinni: Mar 20 2010, 09:17
SebastianG
post Mar 20 2010, 10:45
Post #33





Group: Developer
Posts: 1318
Joined: 20-March 04
From: Göttingen (DE)
Member No.: 12875



QUOTE (doccolinni @ Mar 20 2010, 09:16) *
What I meant by that is my feeling is that if lossyWAV diffused the quantisation error to more than just the least significant of the remaining non-zeroed bits it would only make the lossless encoding of the remaining bits more difficult.

Understood. But I disagree. It depends on what you're doing. What should be done is actually not really difficult to figure out.

QUOTE (doccolinni @ Mar 20 2010, 09:16) *
Dithering may not help in this case because it produces pure white noise, but noise shaping might.

If noise shaping is done the way I keep suggesting it, dithering won't be necessary to suppress nonlinear artefacts -- provably. If it's done like I suggest, all the conditions for when a signal is "self-dithering" apply automatically.

QUOTE (doccolinni @ Mar 20 2010, 09:16) *
However, even though noise shaping does produce noise that is more prominent on some frequencies than others and that would help make the added noise more difficult to hear for a human if you transfer more noise to the highest frequencies, it's still quite a random noise in itself and would still be relatively difficult to compress losslessly.

It depends on how the noise relates to the signal. As I said multiple times already: If the noise stays locally below the signal everywhere (say, an SNR of at least 10 dB), all the bits that are gained by zeroing won't affect the signal's predictability in any significant way. Here's why: Keep in mind that the amount of bits per sample you need when doing linear prediction roughly equals the sum over f of log2(1+0.5*psd[f])+o where log2 is the base-2 logarithm and psd[f] refers to the power density at frequency f and o is some offset. With this in mind let's predict the gain in power density for various SNRs. For example, if the SNR is 10 dB, the power gain (assuming non-correlated noise) will be 0.4 dB which translates to 0.07 bits/sample for that particular "region". For lower frequencies we'd be using much higher SNRs because that's what typical psychoacoustic models suggest. Higher SNRs mean lower gains in power. So, on average you'll lose probably 0.02 bits/sample while gaining a lot more due to the zeroed LSBs.

The concept of "noise floor" and "bits above the noise floor" is useless. Just think about what FLAC's doing. Think about how linear predictive coding works. Linear prediction produces a near-"white noise" residual and all the bits you spend for coding are due to the power of this whitish noise. Suddenly the residual is all "noise floor" and nothing predictable anymore. If you think about how what I'm suggesting affects the residual you'll come to the conclusion that the residual -- assuming the same LPC analysis filter -- will stay near white at roughly the same power which requires roughly the same amount of bits to encode. Q.E.D.
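
The 10 dB figures above can be checked with a few lines of Python. This is only a sketch under one stated assumption: that the coding cost of a region changes by half the base-2 logarithm of its power ratio (about one bit per 6.02 dB), which is what reproduces the 0.07 bits/sample quoted above. The function names are hypothetical.

```python
import math

def power_gain_db(snr_db):
    """Rise in total power (dB) when uncorrelated noise is added at the given SNR."""
    noise_to_signal = 10 ** (-snr_db / 10)
    return 10 * math.log10(1 + noise_to_signal)

def extra_bits_per_sample(snr_db):
    """Extra bits/sample, assuming cost = 0.5 * log2(power ratio)."""
    noise_to_signal = 10 ** (-snr_db / 10)
    return 0.5 * math.log2(1 + noise_to_signal)

# Print SNR, power gain in dB and extra bits/sample for a few SNRs.
for snr in (10, 20, 30):
    print(snr, round(power_gain_db(snr), 3), round(extra_bits_per_sample(snr), 4))
```

At 10 dB this gives roughly 0.41 dB and 0.07 bits/sample, and each further 10 dB of SNR cuts the penalty by about a factor of ten, which is why the average loss across a psychoacoustically shaped spectrum can plausibly be much smaller than the 10 dB case.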

This post has been edited by SebastianG: Mar 20 2010, 10:52
doccolinni
post Mar 20 2010, 12:37
Post #34





Group: Members
Posts: 173
Joined: 28-May 09
From: Zagreb, Croatia
Member No.: 70204



QUOTE (SebastianG @ Mar 20 2010, 10:45) *
QUOTE (doccolinni @ Mar 20 2010, 09:16) *
What I meant by that is my feeling is that if lossyWAV diffused the quantisation error to more than just the least significant of the remaining non-zeroed bits it would only make the lossless encoding of the remaining bits more difficult.

Understood. But I disagree. It depends on what you're doing. What should be done is actually not really difficult to figure out.

QUOTE (doccolinni @ Mar 20 2010, 09:16) *
Dithering may not help in this case because it produces pure white noise, but noise shaping might.

If noise shaping is done the way I keep suggesting it, dithering won't be necessary to suppress nonlinear artefacts -- provably. If it's done like I suggest, all the conditions for when a signal is "self-dithering" apply automatically.

QUOTE (doccolinni @ Mar 20 2010, 09:16) *
However, even though noise shaping does produce noise that is more prominent on some frequencies than others and that would help make the added noise more difficult to hear for a human if you transfer more noise to the highest frequencies, it's still quite a random noise in itself and would still be relatively difficult to compress losslessly.

It depends on how the noise relates to the signal. As I said multiple times already: If the noise stays locally below the signal everywhere (say, an SNR of at least 10 dB), all the bits that are gained by zeroing won't affect the signal's predictability in any significant way. Here's why: Keep in mind that the amount of bits per sample you need when doing linear prediction roughly equals the sum over f of log2(1+0.5*psd[f])+o where log2 is the base-2 logarithm and psd[f] refers to the power density at frequency f and o is some offset. With this in mind let's predict the gain in power density for various SNRs. For example, if the SNR is 10 dB, the power gain (assuming non-correlated noise) will be 0.4 dB which translates to 0.07 bits/sample for that particular "region". For lower frequencies we'd be using much higher SNRs because that's what typical psychoacoustic models suggest. Higher SNRs mean lower gains in power. So, on average you'll lose probably 0.02 bits/sample while gaining a lot more due to the zeroed LSBs.

The concept of "noise floor" and "bits above the noise floor" is useless. Just think about what FLAC's doing. Think about how linear predictive coding works. Linear prediction produces a near-"white noise" residual and all the bits you spend for coding are due to the power of this whitish noise. Suddenly the residual is all "noise floor" and nothing predictable anymore. If you think about how what I'm suggesting affects the residual you'll come to the conclusion that the residual -- assuming the same LPC analysis filter -- will stay near white at roughly the same power which requires roughly the same amount of bits to encode. Q.E.D.

Well, as far as I understand it, your noise shaping method is about to be implemented in 1.3.0, and I hope it turns out to actually work the way you've envisioned it. I'm not trying to be sarcastic BTW; it may or may not work, I just seriously hope it does! The way you explain it, it actually sounds like it should work, although I still doubt that it's going to work right out of the box as soon as it gets implemented and will require a bit of testing and fine-tuning.
2Bdecided
post Mar 22 2010, 10:49
Post #35


ReplayGain developer


Group: Developer
Posts: 5137
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



QUOTE (doccolinni @ Mar 20 2010, 11:37) *
I still doubt that it's going to work right out of the box as soon as it gets implemented and will require a bit of testing and fine-tuning.
Undoubtedly - the closer you push things to the limit of audibility, the more accurate the psychoacoustic model needs to be to prevent audible artefacts from slipping through.

What SebG has been describing for years is certainly right - but don't read from his last post that you could just take FLAC, keep the LPC bits, dump all the non-predictable (i.e. stored as-is) bits, and have something that sounds nice. There's a mismatch between what we hear, and what the LPC can predict. If the two matched perfectly, psychoacoustic codecs would be trivial to get right.


Another sober thought is what kind of gains noise shaping can bring. SebG is right that we're overcoding most signals by a significant amount - but often that "overcoding" is small compared to the amount that's already been thrown away (at the bottom of the dynamic range) by lossyWAV. Remember any "unused" part at the top of the dynamic range (i.e. when the signal is quiet) has always been efficiently stored by FLAC. For many signals, lossyWAV+FLAC is spending bits on the middle chunk of dynamic range where the signal actually sits. Noise shaping helps you to focus this more carefully, using even fewer bits. How many fewer? I'd love to know!

Cheers,
David.

This post has been edited by 2Bdecided: Mar 22 2010, 10:50
Garf
post Mar 22 2010, 17:38
Post #36


Server Admin


Group: Admin
Posts: 4885
Joined: 24-September 01
Member No.: 13



QUOTE (2Bdecided @ Mar 19 2010, 16:30) *
A more important point is this: lossyWAV is, or may be, somewhat useful if/because it doesn't interact in nasty ways with psychoacoustic-based codecs. You'll lose this if you introduce noise shaping into lossyWAV, so it's not better for those users.


To be honest, I think there is nothing wrong with a patent-free scalable-to-lossless hardware-supported psychoacoustic codec, which is what you'd be making.

Clearly someone thought there was a case for MPEG-4 SLS, for example.
2Bdecided
post Mar 22 2010, 18:48
Post #37


ReplayGain developer


Group: Developer
Posts: 5137
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



That's a great way of looking at it.

The codec would be scalable, but the streams wouldn't be: bitrate peeling (at more than a single lossless vs lossy granularity) would be a challenge, I think.

Cheers,
David.
moozooh
post Mar 23 2010, 00:54
Post #38





Group: Members
Posts: 357
Joined: 22-September 04
From: Moscow
Member No.: 17192



I guess it's somewhat orthogonal (thanks for the word, I'm going to use it more often now!) to the discussion on the last 1.5 pages or so, but is there a way for WavPack lossy to take advantage of LossyWAV, or does it do basically the same thing to the original signal so there is no need to mix them? Thanks in advance.


--------------------
Infrasonic Quartet + Sennheiser HD650 + Microlab Solo 2 mk3. 
Dynamic
post Mar 23 2010, 03:33
Post #39





Group: Members
Posts: 819
Joined: 17-September 06
Member No.: 35307



{edit: started this post, then came back to it. Now I see moozooh also mentions WavPack's approach in the post above. I'll leave my text unmodified, however}

I'm going off on a bit of a wild tangent that I haven't looked into fully, and I expect a number of ideas to be shot down, but maybe there's something viable, though I suspect this whole idea would be a heck of a lot of work.

I wonder if the WavPack hybrid approach would allow bitrate-peeling (or preferably VBR quality layering). David Bryant's book chapter about the design of WavPack was a good read if you're interested in the way he coded the residual to allow the lossy part to use a curtailed code, while the least significant part was put straight into the correction file.

The predictor has to be based on the lossy version's decoded signal so that if the correction file isn't used it still decodes properly and doesn't drift.

If you had multiple degrees of lossiness you'd base the predictor on the most lossy then provide correction files for each level of scaling towards lossless that you wish to retain. As an example, you might envisage something like:
1) fully aggressive psymodel VBR aimed at transparency, guessing perhaps somewhere in the range 200-300 kbps, with strong noise shaping to exploit tonal masking, noise masking and temporal masking close to the psymodel's predicted limits.
2) less aggressive psymodel approach or something like lossyWAV portable, perhaps 350-400 kbps (and portable would be readily transcodable to FLAC)
3) lossyWAV standard, perhaps 450-550 kbps, probably a suitable near-lossless archive for transcoding to conventional lossy.
4) lossless

You might also add
0) Non-transparent VBR, aimed at decent non-critical listening. Pure guess, maybe 150-200 kbps.

Whichever level is lowest is the waveform on which the predictor must act in case the higher levels of residual correction are not provided (or fail to verify).
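
As a toy illustration of the layering idea (and only that), the following Python sketch splits a block of samples into a coarse base layer plus correction layers that step towards lossless. It is not WavPack's hybrid format: there is no predictor, no entropy coding and no psymodel, the quantiser is plain LSB rounding, and every name (`quantise`, `split_into_layers`, `reconstruct`) is invented for the example.

```python
def quantise(samples, bits):
    """Round each sample to the nearest multiple of 2**bits (bits=0 is lossless)."""
    step = 1 << bits
    return [((x + (step >> 1)) // step) * step for x in samples]

def split_into_layers(samples, bits_per_level):
    """Return (base, corrections) for levels ordered from most lossy to lossless.

    bits_per_level lists the zeroed LSBs per level, e.g. [8, 4, 0]. Each
    correction layer is the sample-wise difference between one level and
    the next finer one.
    """
    levels = [quantise(samples, b) for b in bits_per_level]
    base = levels[0]
    corrections = [
        [fine - coarse for coarse, fine in zip(lo, hi)]
        for lo, hi in zip(levels, levels[1:])
    ]
    return base, corrections

def reconstruct(base, corrections, upto):
    """Add the first `upto` correction layers back onto the base layer."""
    out = list(base)
    for layer in corrections[:upto]:
        out = [o + c for o, c in zip(out, layer)]
    return out

samples = [1234, -567, 89, 32000, -32768, 7]
base, corrections = split_into_layers(samples, [8, 4, 0])
print(reconstruct(base, corrections, 2) == samples)
```

Dropping trailing correction layers degrades gracefully to the next coarser level, which is the property the interleaved container would rely on; whether the layers stay cheap to code once a predictor is run on the lowest level is exactly the open question.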

The file format could be a simple form of interleaving in time chunks (similar to audio and video interleaving in AVI or MP4 containers for example) in which you'd interleave as many layers as you wished to keep for each purpose or strip only those you need.

The WavPack residual approach might allow finer temporal variation of allowed noise, finer than the codec block size of 512 samples that's typical of lossyWAV. It wouldn't necessarily exploit the wasted bits feature in WV. In conjunction with a psymodel, I'd imagine this could be exploited aggressively for temporal pre-masking and post-masking curves that allow more broad band noise near transient events, mainly for some milliseconds after them, but also for a smaller time before them (which must be controlled to minimise pre-echo).

In conjunction with frequency masking it's not as simple as it might sound. I don't know if it's possible to aggressively noise shape at one layer and progressively add better residual coding to lower noise and flatten the noise spectrum, or whether instead one would have to retain the spectral shaping and simply lower the noise across the spectrum by improving the residual accuracy.

The other approach to this sort of thing might be if there were a lossless reversible transformation into subbands that might still be compatible with predictor/residual coding. But I suspect the types of transform that work for Musepack and MP2 aren't reversible in integer mathematics, so wouldn't allow scaling to lossless.
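
For what it's worth, band splits that are exactly reversible in integer arithmetic do exist if they are built from lifting steps. The Python sketch below shows the simplest case, a two-sample Haar-style split sometimes called the S-transform. It is far cruder than the polyphase filterbanks in MP2 or Musepack and says nothing about whether those particular transforms can be made reversible; the function names are made up.

```python
def forward(a, b):
    """Split an integer pair into a low band (floored average) and a high band (difference)."""
    d = a - b
    s = b + (d >> 1)      # equals floor((a + b) / 2); >> floors for negative ints too
    return s, d

def inverse(s, d):
    """Exactly undo forward() using the same integer operations in reverse order."""
    b = s - (d >> 1)
    a = d + b
    return a, b

# Round trip over a small grid of sample values, including negatives.
ok = all(inverse(*forward(a, b)) == (a, b)
         for a in range(-40, 41) for b in range(-40, 41))
print(ok)
```

The trick is that the rounding in the forward step is recomputed identically in the inverse step, so nothing is lost even though the "average" is floored.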

Is this all rubbish?

This post has been edited by Dynamic: Mar 23 2010, 03:35
Garf
post Mar 23 2010, 07:49
Post #40


Server Admin


Group: Admin
Posts: 4885
Joined: 24-September 01
Member No.: 13



QUOTE (2Bdecided @ Mar 22 2010, 18:48) *
That's a great way of looking at it.

The codec would be scalable, but the streams wouldn't be: bitrate peeling (at more than a single lossless vs lossy granularity) would be a challenge, I think.

Cheers,
David.


I think you can operate on the residuals directly (of course, then we go beyond lossywav). But I'm still trying to wrap my head around the results of doing that.
