Near-lossless / lossy FLAC, An idea & MATLAB implementation
Nick.C
post Oct 2 2007, 20:41
Post #201


lossyWAV Developer


Group: Developer
Posts: 1791
Joined: 11-April 07
From: Wherever here is
Member No.: 42400



QUOTE (SebastianG @ Jun 13 2007, 15:23) *
If you want a similar preprocessing for FLAC or WavPack you'd do something like this:
- estimate LPC filter coeffs (H(z)) and temporarily filter the block to get the residual
- check the residual's power and select "wasted_bits" accordingly
- quantize original (unfiltered) samples so that the "wasted_bits" least significant bits are zero
- use 1/H(z) as noise shaping filter.
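
For readers following along, a minimal Delphi-style sketch of those four steps (my own hypothetical rendering, not SebastianG's code; the names, the 16-bit limits and the safety margin are all assumptions):

CODE
{ Hypothetical sketch of the four quoted steps.  Assumes LPC
  coefficients a[1..p] for H(z) have already been estimated for
  this block (e.g. with Levinson-Durbin; a[0] is unused), 16-bit
  samples, and an untuned safety margin.  Needs the Math unit. }
procedure PreprocessBlock(var x: array of Integer;
                          const a: array of Double; p: Integer);
var
  n, k, wastedBits, mask, half: Integer;
  pred, resid, power: Double;
begin
  { steps 1+2: temporarily filter the block to get the residual,
    then measure its power }
  power := 0;
  for n := p to High(x) do
  begin
    pred := 0;
    for k := 1 to p do
      pred := pred + a[k] * x[n - k];
    resid := x[n] - pred;
    power := power + Sqr(resid);
  end;
  power := power / Max(1, High(x) - p + 1);

  { step 3: select wasted_bits from the residual power; the "-1"
    safety margin is a guess, not a tuned value }
  wastedBits := Max(0, Floor(0.5 * Log2(power + 1)) - 1);

  { step 4: quantize the ORIGINAL samples so the wasted_bits
    least significant bits are zero.  A real encoder would feed
    the rounding error back through 1/H(z) as the noise shaper. }
  if wastedBits > 0 then
  begin
    mask := not ((1 shl wastedBits) - 1);
    half := 1 shl (wastedBits - 1);
    for n := 0 to High(x) do
      x[n] := EnsureRange((x[n] + half) and mask, -32768, 32767 and mask);
  end;
end;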
Sebastian,

This is the second time that David has pointed me in the direction of your suggestion - unfortunately, I am unable to take these concepts and convert them to code, as I have no idea where to start with the algorithms required. If you have any second-hand code which you would be willing to share, I would gratefully receive it and attempt to implement it in the lossyWAV Delphi project.

Best regards,

Nick.


--------------------
lossyWAV -q X -a 4 --feedback 4| FLAC -8 ~= 320kbps
2Bdecided
post Oct 3 2007, 11:16
Post #202


ReplayGain developer


Group: Developer
Posts: 5142
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



Nick,

Sorry, not that bit. I've already done that bit, but haven't released it due to concern over a Sony patent.

The part I meant was this...

QUOTE (SebastianG @ Jun 13 2007, 15:23) *
If you further check what psychoacoustic models usually do you'll notice that they allocate more bits to lower frequencies than to higher frequencies (higher SNR for lower freqs) most of the time. You then can tweak the noise shaping filter to W(z)/H(z) where W(z) is some fixed weighting so that you have a higher SNR for lower freqs.
QUOTE (SebastianG @ Jun 13 2007, 17:42) *
(I derived W(z) by feeding OggEnc with mono pink noise).

...where you can use that weighting for exactly what you're doing now.

So it's "just": feed noise into Ogg, subtract input from output, check noise (implies SNR) at given frequencies using, say, spectral view in Cool Edit, and simulate that rough spectral shape in your code.

Just an idea. I keep meaning to try it but have other things to do!

Cheers,
David.

This post has been edited by 2Bdecided: Oct 3 2007, 11:16
Nick.C
post Oct 3 2007, 11:24
Post #203


lossyWAV Developer


Group: Developer
Posts: 1791
Joined: 11-April 07
From: Wherever here is
Member No.: 42400



Doh! <slaps forehead> That sounds like a plan to me.... I'll get onto it tonight.


--------------------
lossyWAV -q X -a 4 --feedback 4| FLAC -8 ~= 320kbps
Nick.C
post Oct 4 2007, 12:27
Post #204


lossyWAV Developer


Group: Developer
Posts: 1791
Joined: 11-April 07
From: Wherever here is
Member No.: 42400



Didn't get round to Ogg noise analysis last night; however, reading the Ars Technica MP3 explanation, it struck me that there may be some merit in the following:

Instead of a spreading function where values are averaged (in the default case over 4 bins), why not take the max of (last_bin, this_bin, next_bin), sliding progressively along the FFT bin results? (Both variants are sketched below.)
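
A sketch of the two variants being compared, under assumed array layouts (not the actual lossyWAV internals; Max comes from the Math unit):

CODE
{ Sliding 3-bin maximum: each bin becomes the max of itself and
  its two neighbours.  Hypothetical bounds and names. }
procedure Spread3Max(const fft_result: array of Double;
                     var fft_result2: array of Double);
var
  i: Integer;
begin
  for i := 1 to High(fft_result) - 1 do
    fft_result2[i] := Max(fft_result[i - 1],
                      Max(fft_result[i], fft_result[i + 1]));
end;

{ The default 4-bin average it is being compared against. }
procedure Spread4Avg(const fft_result: array of Double;
                     var fft_result2: array of Double);
var
  i: Integer;
begin
  for i := 0 to High(fft_result) - 3 do
    fft_result2[i] := (fft_result[i] + fft_result[i + 1]
                     + fft_result[i + 2] + fft_result[i + 3]) / 4;
end;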

I have made a test implementation and the difference in bits_to_remove (average) between 4 bin average and this 3 bin max seems to be small.

[edit] Well, that was my impression, but when I ran my 52 sample set at default quality, 4 bin averaging = 39.48MB, 3 bin max = 38.83MB; [/edit]

[edit2] For Guru's 150 sample set at default quality, 4 bin averaging = 89.56MB, 3 bin max = 87.99MB;

Maybe, averaging the two highest values, disregarding the minimum value would be better - I'll try it. [/edit2]

[edit3] For Guru's 150 sample set, at default quality, 2-highest-of-3-average = 90.86MB; [/edit3]

[edit4] For my 52 sample set, at default quality, 2-highest-of-3-average = 40.23MB; [/edit4]

This post has been edited by Nick.C: Oct 4 2007, 14:20


--------------------
lossyWAV -q X -a 4 --feedback 4| FLAC -8 ~= 320kbps
Nick.C
post Oct 8 2007, 12:39
Post #205


lossyWAV Developer


Group: Developer
Posts: 1791
Joined: 11-April 07
From: Wherever here is
Member No.: 42400



Looking at the way that bit reduction / dither noise is calculated for each of the dither options, it appears that I neglected to ensure that the rounded value remained within the permissible sample limits when calculating the noise from rounding and dithering. I have re-written my noise calculation subroutine and will revise the constants used in the code to recreate the dither noise surfaces (1..32 bits x 6..15 bit fft length x 3 dither options).
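
A minimal sketch of the fix being described, assuming 16-bit samples (not the actual lossyWAV routine; EnsureRange comes from the Math unit):

CODE
{ Round a sample to (bits) fewer bits, then clamp so the result
  stays inside the legal 16-bit range *and* keeps its low bits
  zero -- the step that was missing from the noise calculation. }
function RoundAndClamp(sample, bits: Integer): Integer;
var
  mask, half: Integer;
begin
  if bits <= 0 then
  begin
    Result := sample;
    Exit;
  end;
  mask := not ((1 shl bits) - 1);
  half := 1 shl (bits - 1);
  { upper limit is 32767 and mask: a plain clamp to 32767 would
    re-acquire non-zero least significant bits }
  Result := EnsureRange((sample + half) and mask, -32768, 32767 and mask);
end;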

On the experimental spreading function front, I am looking at a spreading function which changes from averaging at small fft lengths to simple maximum at long fft lengths as follows:

CODE
  begin
    { analysis bin range, shifted to 0-based indexing }
    pcll := low_frequency_bin[analysis_number] - 1;
    pchl := high_frequency_bin[analysis_number] - 1;

    { slide a 3-bin window along the FFT results; stop two bins
      early so fft_result[pci + 2] stays inside the range }
    for pci := 0 to pchl - pcll - 2 do
    begin
      v1 := fft_result[pci];
      v2 := fft_result[pci + 1];
      v3 := fft_result[pci + 2];

      vMax := max(v1, max(v2, v3));
      vMin := min(v1, min(v2, v3));
      vTot := v1 + v2 + v3;
      vMid := vTot - vMax - vMin;   { the middle value of the three }
      vAvg := vTot / 3;

      { blend from plain averaging at short FFT lengths to simple
        maximum at long ones; each weight set sums to 3 }
      case fft_bit_length[analysis_number] of
         0.. 6 : fft_result2[pci + 1] := vAvg;
         7     : fft_result2[pci + 1] := (vMax * 1.50 + vMid + vMin * 0.50) / 3;
         8     : fft_result2[pci + 1] := (vMax * 2.00 + vMid) / 3;
         9     : fft_result2[pci + 1] := (vMax * 2.50 + vMid * 0.50) / 3;
        10..15 : fft_result2[pci + 1] := vMax;
      end;
    end;
  end;


This post has been edited by Nick.C: Oct 8 2007, 12:42


--------------------
lossyWAV -q X -a 4 --feedback 4| FLAC -8 ~= 320kbps
SebastianG
post Oct 8 2007, 15:21
Post #206





Group: Developer
Posts: 1318
Joined: 20-March 04
From: Göttingen (DE)
Member No.: 12875



Hi, Nick, David!

From what I understand you are looking for some kind of weighting to determine the wasted_bits count, right? I'm not sure whether the weighting trick I described is appropriate here since I used this filter for noise shaping. I calculated the amount of bits to use for steganography (in your case wasted_bits) solely based on the power of the linear prediction residual. Combined with the fixed non-recursive part of the noise shaper the effect was quantization noise with a more or less constant (constant over time) SNR for a specific frequency region.

To be honest I really don't understand why you guys insist on introducing white-only noise. It's like travelling from A (lossless) to B (perceptual lossy) and stopping right in the middle where both disadvantages are combined: lossy encoding (B) + high bitrate (A) necessary due to lack of noise shaping.

IMHO the best thing to do here is to follow Edler, Faller and Schuller: Perceptual Audio Coding Using a Time-Varying Linear Pre- and Post-Filter. Their psychoacoustic analysis results in a "pre-filter" and a "post-filter". The post-filter acts like a noise shaper. To make it work for lossy FLAC, just
  • skip the prefilter, we don't need it.
  • derive wasted_bits according to the first sample of the post-filter's impulse response. This first sample tells you the optimal quantizer step size.
  • use the ("normalized") post-filter as the noise shaping filter. (Normalized: a noise shaping filter's impulse response must start with the coefficient '1' and have an average log response of 0 dB on a linear frequency scale.)
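
A literal-minded sketch of the second and third bullets (names invented; the log2 mapping from step size to wasted_bits is a guess that would need tuning; Max, Floor and Log2 come from the Math unit):

CODE
{ h[0..High(h)] is the post-filter's impulse response, h[0] > 0.
  Its first sample is the optimal quantizer step size; dividing
  through by it "normalizes" the filter so h[0] = 1. }
procedure SplitGainAndShaper(var h: array of Double;
                             out wastedBits: Integer);
var
  i: Integer;
  gain: Double;
begin
  gain := h[0];                          { optimal quantizer step size }
  wastedBits := Max(0, Floor(Log2(gain)));
  for i := 0 to High(h) do
    h[i] := h[i] / gain;                 { "normalized": now h[0] = 1 }
end;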
About sharing code: I'd have to locate the source code first. It's been a while since I touched it. Exactly what are you interested in? The "complicated" part of it was the Levinson-Durbin algorithm. I could share a Java version if you like. It's not hard to find other source code for it with the help of Google, I suppose. If you want to follow the "Edler et al. type approach" you could borrow a lot of Speex code for handling the filters.
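
Until the original source turns up, here is a plain textbook Levinson-Durbin recursion in Delphi-style Pascal (the standard algorithm, not Sebastian's code):

CODE
{ Levinson-Durbin: solves the Toeplitz normal equations, turning
  autocorrelation values r[0..p] into LPC coefficients a[1..p]
  (prediction: x[n] ~ sum of a[k]*x[n-k]).  Assumes r[0] > 0;
  a[0] is left unused. }
procedure LevinsonDurbin(const r: array of Double;
                         var a: array of Double; p: Integer);
var
  i, j: Integer;
  k, err, acc: Double;
  tmp: array of Double;
begin
  SetLength(tmp, p + 1);
  for i := 1 to p do
    a[i] := 0;
  err := r[0];
  for i := 1 to p do
  begin
    acc := r[i];
    for j := 1 to i - 1 do
      acc := acc - a[j] * r[i - j];
    k := acc / err;                     { reflection coefficient }
    for j := 1 to i - 1 do
      tmp[j] := a[j] - k * a[i - j];
    for j := 1 to i - 1 do
      a[j] := tmp[j];
    a[i] := k;
    err := err * (1 - k * k);           { remaining prediction error }
  end;
end;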

Cheers!
SG

This post has been edited by SebastianG: Oct 9 2007, 10:06
2Bdecided
post Oct 9 2007, 12:59
Post #207


ReplayGain developer


Group: Developer
Posts: 5142
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



Hi Seb,

QUOTE (SebastianG @ Oct 8 2007, 15:21) *
To be honest I really don't understand why you guys insist on introducing white-only noise.


1. It works.
2. I didn't stop there. I've done a noise shaping version. See the previous page!
(It's not truly psychoacoustic though)


QUOTE
It's like travelling from A (lossless) to B (perceptual lossy) and stopping right in the middle where both disadvantages are combined: lossy encoding (B) + high bitrate (A) necessary due to lack of noise shaping.


I see both advantages being combined: no problem samples, little or no transcoding issues, lower bitrate than lossless.

You could probably use Vorbis at high bitrates instead, with possibly slightly more transcoding worries. Also I'm not sure you could be so confident with multi-generation coding; set the threshold correctly, and lossyFLAC seems to go many generations (e.g. 50) without issue.


You could, of course, make this a proper psychoacoustic codec, but I'd only do this for fun - what would be the practical point? You'd be forcing the underlying issues of FLAC onto a psychoacoustic codec - why would you do that? Surely it would be much better to use Vorbis or something without these issues? I don't think Nick or I are up for designing a new psychoacoustic model(!), though I guess we could "borrow" one.


QUOTE
IMHO the best thing to do here is to follow Edler, Faller and Schuller: Perceptual Audio Coding Using a Time-Varying Linear Pre- and Post-Filter. Their psychoacoustic analysis results in a "pre-filter" and a "post-filter". The post-filter acts like a noise shaper. To make it work for lossy FLAC, just
  • skip the prefilter, we don't need it.
  • derive wasted_bits according to the first sample of the post-filter's impulse response. This first sample tells you the optimal quantizer step size.
  • use the ("normalized") post-filter as the noise shaping filter. (Normalized: a noise shaping filter's impulse response must start with the coefficient '1' and have an average log response of 0 dB on a linear frequency scale.)
About sharing code: I'd have to locate the source code first. It's been a while since I touched it. Exactly what are you interested in? The "complicated" part of it was the Levinson-Durbin algorithm. I could share a Java version if you like. It's not hard to find other source code for it with the help of Google, I suppose. If you want to follow the "Edler et al. type approach" you could borrow a lot of Speex code for handling the filters.


Thank you for this. All pointers gratefully received!

Does it have any IP attached?
What form is the "post-filter" in?

The reason for the first question is obvious! I ask the second because I know what a noise shaping filter should be like (you missed minimum phase off your list) and it's not trivial getting exactly what you want - the LPC-based method delivers filters which check all the boxes - does this one? If not, is "normalization"/conversion easy?

Cheers,
David.
SebastianG
post Oct 9 2007, 16:24
Post #208





Group: Developer
Posts: 1318
Joined: 20-March 04
From: Göttingen (DE)
Member No.: 12875



Hi Dave,

QUOTE (2Bdecided @ Oct 9 2007, 13:59) *
2. I didn't stop there. I've done a noise shaping version. See the previous page!

Sorry, I wasn't aware of that.

QUOTE (2Bdecided @ Oct 9 2007, 13:59) *
You could, of course, make this a proper psychoacoustic codec, but I'd only do this for fun - what would be the practical point? You'd be forcing the underlying issues of FLAC onto a psychoacoustic codec - why would you do that? Surely it would be much better to use Vorbis or something without these issues?

It depends. Why were you tackling "lossy FLAC" again? :D
I just wanted to mention the benefits of noise shaping. Given a specific target bitrate one can maximize the lowest MNR (mask-to-noise ratio) via noise shaping. This means higher quality at the same rate. To guarantee a certain minimum MNR if only white quantization noise is introduced you have to raise the bitrate -- sometimes by a great amount.

QUOTE (2Bdecided @ Oct 9 2007, 13:59) *
I don't think Nick or I are up for designing a new psychoacoustic model(!), though I guess we could "borrow" one.

Me neither. :)

QUOTE (2Bdecided @ Oct 9 2007, 13:59) *
Thank you for this. All pointers gratefully received!

Does it have any IP attached?
What form is the "post-filter" in?

The reason for the first question is obvious! I ask the second because I know what a noise shaping filter should be like (you missed minimum phase off your list) and it's not trivial getting exactly what you want - the LPC-based method delivers filters which check all the boxes - does this one? If not, is "normalization"/conversion easy?

I don't know about the IP issue. The pre- and post-filters are minimum-phase IIR filters and each other's inverse. They are just a "frequency-warped" version of the LPC/autocorrelation method, where the autocorrelation coefficients are determined by the output of the psychoacoustic model. Frequency warping is used to match the varying bandwidths of the critical bands.

Regarding the missing "minimum phase" property: it may not be obvious, but it follows from the two properties I mentioned. If a filter's impulse response starts with the sample X and the average log response is log(X), then the filter is also a minimum-phase filter. By normalizing I just meant scaling the impulse response so that X = 1.

The difference between what Edler et al. did and how it can be applied to FLAC is that their time-varying "post-filter" does both shaping in frequency and shaping in time, whereas the noise shaping filter for FLAC can only shape in frequency - shaping in time is done by varying the wasted_bits count. To isolate these you have to extract the "gain" of the post-filter, which in this case is equal to its first sample. The post-filter (including gain) is supposed to represent the masking curve, so it makes sense to use it as the noise shaper.

Edit: You asked about the form of the post-filter:
H(z) = 1 / [1 + a1 D(z) + a2 D^2(z) + a3 D^3(z) + ... + an D^n(z)] (a frequency-warped all-pole filter)
where D(z) is a non-linear-phase allpass used as a replacement for z^-1.
Using it as a noise shaper is no more difficult than using it as a synthesis filter for linear predictive coding. However, it is a bit tricky because this form in general contains a delay-free loop; Edler et al. point to another paper that describes how to resolve that.
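
For anyone wanting to experiment, the warped delay element itself is simple even if the delay-free loop is not. A sketch of one D(z) section (lambda is the warping coefficient; a value around 0.75 at 44.1 kHz is often quoted for Bark warping, but that is my assumption, not Sebastian's figure):

CODE
{ One section of a warped delay line: the first-order allpass
  D(z) = (-lambda + z^-1) / (1 - lambda*z^-1) standing in for a
  unit delay.  Chaining n of these gives the D^k(z) terms above.
  The delay-free loop of the recursive post-filter still needs
  the separate treatment Edler et al. reference. }
type
  TWarpedDelay = record
    state: Double;
  end;

function WarpedDelayTick(var d: TWarpedDelay;
                         x, lambda: Double): Double;
begin
  { y[n] = -lambda*x[n] + x[n-1] + lambda*y[n-1] }
  Result := d.state - lambda * x;
  d.state := x + lambda * Result;
end;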

Cheers,
SG

This post has been edited by SebastianG: Oct 10 2007, 09:04
jmvalin
post Oct 11 2007, 00:15
Post #209


Xiph.org Speex developer


Group: Developer
Posts: 481
Joined: 21-August 02
Member No.: 3134



QUOTE (2Bdecided @ Jun 13 2007, 04:31) *
The idea is simple: lossless codecs use a lot of bits coding the difference between their prediction and the actual signal. The more complex (hence unpredictable) the signal, the more bits this takes up. However, the more complex the signal, the more "noise like" it often is. It seems silly spending all these bits carefully coding noise / randomness.

So, why not find the noise floor, and dump everything below it?

This isn't about psychoacoustics. What you can or can't hear doesn't come into it. Instead, you perform a spectrum analysis of the signal, note what the lowest spectrum level is, and throw away everything below it. (If this seems a little harsh, you can throw in an offset to this calculation, e.g. -6dB to make it more careful, or +6dB to make it more aggressive!).


Sounds like you're trying to get the worst of standard lossy and lossless codecs. What you have now is a *lossy* codec that just uses a really crappy psychoacoustic model *and* is stuck with time-domain linear prediction instead of frequency transforms. BTW, the main reason why lossless codecs use time-domain linear prediction is not because it's better; it's only because that's the only sane way of getting back *exactly* what you encoded without numerical errors or having to code irrelevant information. By going lossy anyway, that advantage of LP no longer applies. I can't see any advantage of your idea compared to a lossy codec at a very high rate (e.g. Vorbis q10 or something like that).
2Bdecided
post Oct 11 2007, 11:09
Post #210


ReplayGain developer


Group: Developer
Posts: 5142
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



That's what I like - real encouragement!

Cheers,
David.
halb27
post Oct 11 2007, 11:37
Post #211





Group: Members
Posts: 2435
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015



Please don't feel discouraged.
I think it's okay if somebody thinks there is no use in this approach.
Purely practically-minded persons won't consider using it anyway. It's a way to encode for perfectionists or near-perfectionists. And even these are free to prefer a transform codec with high quality settings if they like. Ask 5 perfectionists about what they prefer and you'll get (nearly) 5 different answers, possibly with strong underlying emotions. BTW, it's the same thing with lossless codecs, where the differences between many codecs are very small. And for the practically minded it's no different: everybody loves his champion, though in an overall sense the differences between codecs and encoders may be rather small (looking for instance at AAC, Vorbis, and MPC; at high bitrates even MP3 is competitive most of the time).

This post has been edited by halb27: Oct 11 2007, 11:49


--------------------
lame3100m -V1 --insane-factor 0.75
Nick.C
post Oct 11 2007, 11:43
Post #212


lossyWAV Developer


Group: Developer
Posts: 1791
Joined: 11-April 07
From: Wherever here is
Member No.: 42400



Ran my 52 sample set through Ogg aoTuV 4.51 @ 10 and lossyWAV -2 -spread. The lossyWAV output is smaller when compressed to FLAC with the corresponding codec_block_size (in this case 1152 samples): 485 kbps vs 488 kbps.

Using fb2k bit compare as a quick way to "see" differences, lossyWAV has fewer samples which are different to the lossless original than OGG and a smaller maximum magnitude of difference than OGG.


--------------------
lossyWAV -q X -a 4 --feedback 4| FLAC -8 ~= 320kbps
j7n
post Oct 11 2007, 11:52
Post #213





Group: Members
Posts: 813
Joined: 26-April 04
Member No.: 13720



QUOTE (Nick.C @ Oct 11 2007, 13:43) *
Using fb2k bit compare as a quick way to "see" differences, lossyWAV has fewer samples which are different to the lossless original than OGG and a smaller maximum magnitude of difference than OGG.

What happened to the strong argument that audio quality should not be "seen" and codecs not evaluated by subtracting...
Nick.C
post Oct 11 2007, 11:57
Post #214


lossyWAV Developer


Group: Developer
Posts: 1791
Joined: 11-April 07
From: Wherever here is
Member No.: 42400



QUOTE (j7n @ Oct 11 2007, 11:52) *
QUOTE (Nick.C @ Oct 11 2007, 13:43) *
Using fb2k bit compare as a quick way to "see" differences, lossyWAV has fewer samples which are different to the lossless original than OGG and a smaller maximum magnitude of difference than OGG.
What happened to the strong argument that audio quality should not be "seen" and codecs not evaluated by subtracting...
Yes, I know, sorry, I won't do it again. However, as lossyWAV only ever rounds a sample to fewer bits, the sample value barely changes. Surely fewer changed samples must have some merit?

This post has been edited by Nick.C: Oct 11 2007, 12:07


--------------------
lossyWAV -q X -a 4 --feedback 4| FLAC -8 ~= 320kbps
2Bdecided
post Oct 11 2007, 12:26
Post #215


ReplayGain developer


Group: Developer
Posts: 5142
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



QUOTE (halb27 @ Oct 11 2007, 11:37) *
Please don't feel discouraged.
I think it's okay if somebody thinks there is no use in this approach.


Thanks halb27. I wasn't discouraged though. It's there for whoever wants to use it, and I'm fully aware of the strengths and weaknesses.

FWIW there are circumstances where a real psychoacoustic model (even backed off from the assumed threshold of audibility by several dB via the use of "insane" quality settings) is still inferior to having no psychoacoustic model at all. The places should be obvious: where the psychoacoustic model is wrong, where the psychoacoustic model is crippled by the format, and where the psychoacoustic model will interact (unpredictably) with something downstream.

LossyFLAC is there for those instances, and for those people who would like to use lossless, but recognise that sometimes you're wasting 1000kbps+ on making a "perfect" copy of something that has been smashed to pieces before it reached you.


It still surprises me that LossyFLAC works as well as it does. I'm very grateful (we should all be very grateful!) to Nick for all the experimenting he's done. He probably felt deflated by positive ABX results against some of his changes, but what they showed is that lossyFLAC is hitting more or less exactly the right bitrate for the technique to work. A higher bitrate doesn't add anything, and a lower bitrate rapidly falls apart. It seems to have a very sharp "sweet spot".

I don't think for one minute all possible issues are ironed out. This low frequency thing has to be nailed properly in a way that makes some sense, so it'll fix problem samples we haven't found yet! Then there is the question of what happens with M/S (surround) decoding. It's easy to add something to prevent problems - but no one has even looked for problems here yet AFAIK. Finally, there are times when dither is necessary, but in the vast majority of cases it isn't. I'm wondering if there could be a check for this? It would probably slow encoding down, but I'll think about it anyway.

Anyway, thank you programmers for all your hard work, and thank you Nick too for spotting some bugs and implementing genuine improvements.

Cheers,
David.


QUOTE (Nick.C @ Oct 11 2007, 11:57) *
QUOTE (j7n @ Oct 11 2007, 11:52) *
QUOTE (Nick.C @ Oct 11 2007, 13:43) *
Using fb2k bit compare as a quick way to "see" differences, lossyWAV has fewer samples which are different to the lossless original than OGG and a smaller maximum magnitude of difference than OGG.
What happened to the strong argument that audio quality should not be "seen" and codecs not evaluated by substracting...
Yes, I know, sorry, I won't do it again. However, as lossyWAV only ever rounds a sample to fewer bits the sample value barely changes. Surely fewer changed samples has some merit?


You can't draw any conclusions about perceived audio quality from this, but there are obvious reasons to test and report this behaviour, e.g. to understand something (not everything) about what the algorithm is doing.

It tells you what you already know though: Ogg makes no attempt to preserve the original samples numerically, while lossyFLAC will, on average, keep the original value exactly in 1 of every 2^bits_removed samples (with 3 bits removed, 1 sample in 8 is untouched). This doesn't tell you anything about what it sounds like. Neither does the maximum difference.

Cheers,
David.
Nick.C
post Oct 11 2007, 23:43
Post #216


lossyWAV Developer


Group: Developer
Posts: 1791
Joined: 11-April 07
From: Wherever here is
Member No.: 42400



I've got to the pre-alpha test stage of the Bark-related bin averaging - I haven't managed to listen to anything yet at a high enough volume (everyone else in the house is sleeping!), but on size of output alone this is an interesting development.

My 52 sample set: WAV: 121.5 MB; FLAC: 68.2 MB; lossyWAV -2: 39.5 MB; lossyWAV -2 -spread: 35.3 MB.

Late now, must sleep - will listen to the samples in the morning.

[edit] Sounds promising (pardon the pun!) Will post as alpha v0.3.7 [/edit]


Development of Bark-related bin averaging has stopped in favour of a frequency-dependent variable-length spreading function.

This post has been edited by Nick.C: Oct 16 2007, 22:45


--------------------
lossyWAV -q X -a 4 --feedback 4| FLAC -8 ~= 320kbps
SebastianG
post Oct 21 2007, 11:53
Post #217





Group: Developer
Posts: 1318
Joined: 20-March 04
From: Göttingen (DE)
Member No.: 12875



QUOTE (jmvalin @ Oct 11 2007, 01:15) *
Sounds like you're trying to get the worst of standard lossy and lossless codecs. What you have now is a *lossy* codec that just uses a really crappy psychoacoustic model *and* is stuck with time-domain linear prediction instead of frequency transforms. [...] I can't see any advantage of your idea compared to a lossy codec at a very high rate (e.g. Vorbis q10 or something like that).


There ARE some advantages, though:
  • Decoding FLAC is really simple. This is not true for transform-based methods -- especially if you can't use floating point math.
  • The decoder doesn't need to know anything about how (spectral) noise shaping has been done. Spectral noise shaping is completely in the hands of the encoder and no extra side information needs to be transmitted. In the MP3/AAC case you need to code scalefactors and codebook indices for each scalefactor band.
LPC-based methods for perceptual lossy coding can't compete with AAC/MP3 at low bitrates; on that we agree. But at higher bitrates the advantages of MP3/AAC-like methods are probably close to insignificant and outweighed by the LPC method's decoding simplicity, I suppose.

QUOTE (2Bdecided @ Oct 11 2007, 13:26) *
FWIW there are circumstances where a real psychoacoustic model (even backed off from the assumed threshold of audibility by several dB via the use of "insane" quality settings) is still inferior to having no psychoacoustic model at all.

I totally disagree. Having no model at all is for sure inferior to having a model that's a bit off. Also, even if you don't trust the raw output of a psy model you can still enforce some safety conditions, as is possible with MusePack (--minSMR so_and_so).

Maybe we interpret "having no/some psychoacoustic model" differently. Let's say we do 2-pass VBR to achieve some target bitrate. How can an encoder without an idea of how we perceive things perform better than an encoder that knows about psychoacoustics?

Cheers!
SG

This post has been edited by SebastianG: Oct 22 2007, 10:36
2Bdecided
post Oct 24 2007, 17:14
Post #218


ReplayGain developer


Group: Developer
Posts: 5142
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



QUOTE (SebastianG @ Oct 21 2007, 11:53) *
QUOTE (2Bdecided @ Oct 11 2007, 13:26) *

FWIW there are circumstances where a real psychoacoustic model (even backed off from the assumed threshold of audibility by several dB via the use of "insane" quality settings) is still inferior to having no psychoacoustic model at all.

I totally disagree. Having no model at all is for sure inferior to having a model that's a bit off. Also, even if you don't trust the raw output of a psy model you can still enforce some safety conditions, as is possible with MusePack (--minSMR so_and_so).

Maybe we interpret "having no/some psychoacoustic model" differently. Let's say we do 2-pass VBR to achieve some target bitrate. How can an encoder without an idea of how we perceive things perform better than an encoder that knows about psychoacoustics?
You can't shoot for a given bitrate (CBR or VBR) with lossyFLAC. You can only shoot for a given quality. Even there, options are limited!

As for "backing off a psychoacoustic model" - well, yes, and at some point you will hit/match lossyFLAC. The idea here is to have a codec which delivers transparency, or transparency plus resilience to anything upstream/downstream. What settings should people use to get that with Vorbis or MPC? I have some ideas, but with lossyFLAC it will be -2 and -1 - that's it. If it works! wink.gif

Cheers,
David.
SebastianG
post Nov 5 2007, 18:32
Post #219





Group: Developer
Posts: 1318
Joined: 20-March 04
From: Göttingen (DE)
Member No.: 12875



(*)
QUOTE
FWIW there are circumstances where a real psychoacoustic model (even backed off from the assumed threshold of audibility by several dB via the use of "insane" quality settings) is still inferior to having no psychoacoustic model at all.

Let's assume it's true. How would you explain it?

QUOTE (2Bdecided @ Oct 24 2007, 17:14) *
You can't shoot for a given bitrate (CBR or VBR) with lossyFLAC. You can only shoot for a given quality. Even there, options are limited!

I know. I was just being hypothetical. In either case (2-pass VBR with a target bitrate, or quality-controlled VBR) an encoder would benefit from a component that estimates the optimal distribution of distortions in the time/frequency plane. Without such a component you'll get highly varying MNRs. What good is a high mask-to-noise ratio in some time/frequency region when in another it's too low? The goal needs to be to maximize the minimum mask-to-noise ratio.

By saying (*), aren't you implying that the benefit of a psychoacoustic model is outweighed by its uncertainty? I don't think current models are that bad.

QUOTE (2Bdecided @ Oct 24 2007, 17:14) *
The idea here is to have a codec which delivers transparency, or transparency plus resilience to anything upstream/downstream.

<=> high min(MNR).


Cheers!
SG

This post has been edited by SebastianG: Nov 7 2007, 13:43
