Near-lossless / lossy FLAC, An idea & MATLAB implementation
Jun 12 2007, 20:31
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409
This is an (unoriginal) idea / work in progress. I make no claims for it, but it might be interesting or useful for someone. It is not competitive with wavpack lossy. It is not "finished" either! As far as I know, it is 100% compatible with existing recent lossless FLAC implementations.
The idea is simple: lossless codecs use a lot of bits coding the difference between their prediction and the actual signal. The more complex (hence, unpredictable) the signal, the more bits this takes up. However, the more complex the signal, the more "noise-like" it often is. It seems silly to spend all these bits carefully coding noise / randomness.
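To make the "more complex signal, more bits" point concrete, here is a minimal sketch that estimates the cost of coding a prediction residual. It assumes FLAC's fixed order-2 predictor (e[n] = x[n] - 2x[n-1] + x[n-2]) and a single Rice parameter for the whole block; real FLAC encoders choose among several predictors and partition the residual, so the absolute numbers are only indicative. The helper names `rice_bits` and `fixed2_bits_per_sample` are hypothetical.

```python
import math
import random

def rice_bits(residual):
    """Estimated bits to Rice-code `residual` with the best single parameter k."""
    # Zigzag-map signed residuals to unsigned values, as FLAC's Rice coding does.
    u = [2 * e if e >= 0 else -2 * e - 1 for e in residual]
    best = None
    for k in range(31):
        # Unary quotient (q zeros plus a stop bit) followed by k raw low bits.
        cost = sum((v >> k) + 1 + k for v in u)
        if best is None or cost < best:
            best = cost
    return best

def fixed2_bits_per_sample(samples):
    """Bits per sample for the residual of FLAC's fixed order-2 predictor."""
    residual = [samples[n] - 2 * samples[n - 1] + samples[n - 2]
                for n in range(2, len(samples))]
    return rice_bits(residual) / len(residual)

rng = random.Random(1)
sine = [round(10000 * math.sin(2 * math.pi * n / 100)) for n in range(4096)]
noise = [rng.randint(-10000, 10000) for _ in range(4096)]
print(fixed2_bits_per_sample(sine), fixed2_bits_per_sample(noise))
```

With similar peak levels, the predictable sine costs only a handful of bits per sample, while the noise costs well over twice as many: the predictor cannot help, so nearly every bit of the noise has to be stored.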
So, why not find the noise floor, and dump everything below it?
This isn't about psychoacoustics. What you can or can't hear doesn't come into it. Instead, you perform a spectrum analysis of the signal, note what the lowest spectrum level is, and throw away everything below it. (If this seems a little harsh, you can throw in an offset to this calculation, e.g. -6dB to make it more careful, or +6dB to make it more aggressive!).
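One possible way to turn "the lowest spectrum level" into a number of bits to throw away is sketched below. This is an assumption-laden illustration, not the algorithm in the attached MATLAB script: it Hann-windows one block, averages the power over small groups of bins (raw single-bin minima are very noisy), treats the lowest group average as the noise floor, and picks the largest k whose quantisation noise (step 2^k, variance 4^k/12, so roughly 6 dB per bit) stays at or below that floor. `offset_db` follows the convention above: negative is more careful, positive more aggressive. The function name and its parameters are hypothetical, and the plain DFT is only there to keep the example dependency-free; an FFT would be used in practice.

```python
import cmath
import math

def bits_to_drop(block, offset_db=0.0, group=8, max_bits=15):
    """Hypothetical helper: how many LSBs can be zeroed in this block."""
    n = len(block)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    xw = [s * w for s, w in zip(block, window)]
    # Plain O(n^2) DFT over bins 1..n/2-1 (DC and Nyquist skipped).
    power = []
    for f in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * f * i / n) for i, x in enumerate(xw))
        power.append(abs(acc) ** 2)
    # Average groups of adjacent bins, then take the quietest group as the floor.
    averages = [sum(power[i:i + group]) / len(power[i:i + group])
                for i in range(0, len(power), group)]
    floor = min(averages) * 10 ** (offset_db / 10)
    # White noise of variance s2 gives an expected bin power of s2 * sum(w^2).
    window_energy = sum(w * w for w in window)
    k = 0
    while k < max_bits and (4 ** (k + 1) / 12) * window_energy <= floor:
        k += 1
    return k
```

For a block of white noise with a standard deviation of roughly 1150 (about 2^10), this returns 10 or 11, and lowering `offset_db` can only lower the result.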
How is this applied to FLAC? FLAC has a nice feature called "wasted_bits". If it finds that all bits below a certain position are zero in every sample, it simply stores, e.g., "the bottom 3 bits are all zeros" and then takes no more effort in encoding them. It checks this once per frame (strictly, once per channel, in each subframe header). In FLAC, frames can be variable length, but current encoders use a fixed block size of 4096 samples.
This means that if you have a 24-bit file that only contains 16-bit audio data (i.e. the bottom 8 bits are zero throughout), FLAC encodes it just as efficiently as a 16-bit file. The only overhead is a few bits every 4096 samples saying "wasted_bits=8".
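The wasted-bits check itself is cheap. A minimal sketch of the idea (not libFLAC's actual code): OR all samples of a block together and count the trailing zero bits of the result. The function name `wasted_bits` is hypothetical; an all-zero block returns 0 here, since encoders typically code such a block as a constant subframe instead.

```python
def wasted_bits(block):
    """Number of low-order bits that are zero in every sample of the block."""
    acc = 0
    for s in block:
        acc |= s  # any bit set in any sample survives the OR
    if acc == 0:
        return 0
    # acc & -acc isolates the lowest set bit (also works for negative ints).
    return (acc & -acc).bit_length() - 1
```

For 16-bit audio padded into a 24-bit container, every sample is a multiple of 256, so `wasted_bits([s << 8 for s in pcm16])` gives 8, matching the "wasted_bits=8" case above.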
It also means that if, say, you have a normal 16-bit CD and find that the noise floor in a certain block of 4096 samples never falls below the 12th most significant bit, you can set the 4 least significant bits (bits 13-16) to zero, feed the result to FLAC, and it will automatically use a lower bitrate for that frame than if you had fed it all 16 bits.
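Setting the bottom k bits to zero can be done by simple truncation, but rounding to the nearest multiple of 2^k keeps the error centred on zero. The sketch below assumes round-to-nearest with ties upward, plus clipping so that a sample near full scale cannot round past the largest representable multiple; the original MATLAB script may quantise differently. `zero_lsbs` is a hypothetical name.

```python
def zero_lsbs(block, k, bits=16):
    """Round each sample to the nearest multiple of 2**k so its k LSBs are zero."""
    if k == 0:
        return list(block)
    step = 1 << k
    hi = ((1 << (bits - 1)) - 1) // step * step  # e.g. 32764 for bits=16, k=2
    lo = -(1 << (bits - 1))                      # -32768 is already a multiple
    out = []
    for s in block:
        q = ((s + step // 2) >> k) << k          # floor((s + step/2) / step) * step
        out.append(min(max(q, lo), hi))
    return out
```

In the example above (bits 13-16 dropped), this would be called as `zero_lsbs(block, 4)`; the maximum error per sample is then half a step, i.e. 8 LSBs of the original 16-bit signal.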
Hence "lossy FLAC" is a wav pre-processor for regular lossless FLAC. The interim stage is a "lossy" wav file with 0s in some least significant bits. The final output is a 100% compliant FLAC file, which faithfully reproduces this "lossy" wav file. All of the loss happens in the pre-processor; the processed wav file, when encoded to FLAC, gives a lower bitrate than the original wav file would.
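The pre-processing loop can be sketched as below, assuming a single channel of integer samples and an encoder run with the same fixed block size (the reference `flac` tool has a `--blocksize` option), so that each processed block lines up with exactly one FLAC frame starting at sample 0. How many bits to drop per block is left to a caller-supplied function; `preprocess` and `choose_bits` are hypothetical names. This sketch clears the bits by plain truncation, which is the simplest choice but adds a small negative DC offset; rounding avoids that.

```python
from typing import Callable, List

BLOCK = 4096  # assumed to equal the FLAC encoder's fixed block size

def preprocess(samples: List[int], choose_bits: Callable[[List[int]], int],
               block: int = BLOCK) -> List[int]:
    """Zero a per-block number of LSBs, with block boundaries starting at sample 0."""
    out: List[int] = []
    for start in range(0, len(samples), block):
        chunk = samples[start:start + block]
        k = choose_bits(chunk)
        if k == 0:
            out.extend(chunk)  # nothing to drop: this frame stays lossless
        else:
            mask = ~((1 << k) - 1)  # clear the k low bits (truncation)
            out.extend(s & mask for s in chunk)
    return out
```

If the encoder's frames started anywhere else, or used a different size, a frame would straddle two blocks with different k and could only claim the smaller wasted-bits count, losing part of the gain.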
Potentially the quality is very near to what you started with, and more than good enough for many applications. In most places where mp3 doesn't work, I believe that lossy FLAC will.
On music which FLAC already compresses very well, lossy FLAC gives little advantage. Often it does exactly nothing (full 16 bits preserved), or nearly nothing (the last bit or two dropped occasionally). On music which causes the FLAC bitrate to go comparatively high, lossy FLAC usually brings a significant gain. I've seen bitrates fall by 20%-50%. Still, it's not low bitrate encoding, and it's pure VBR.
Problem samples? I don't know - I'm hoping some HA regulars can lend their ears and detective skills here. Standard lossy codec problem samples are probably not that relevant. Wavpack lossy problem samples are more relevant, but lossy FLAC does seem to spot some of these and either quantises less aggressively or not at all (i.e. encoding is pure lossless).
So what can people download? Well, sadly, I'm not a C programmer. I'm attaching a MATLAB script that works as a lossy FLAC pre-processor. You run a .wav file through this, and then encode it to FLAC as normal.
If you haven't got MATLAB, but have an idea for a useful sample to test, upload it to HA (maximum 30 seconds; shorter=better because MATLAB is slow and the code isn't optimised at all!) and I'll upload a lossy FLAC version when I get a chance.
I'll post more about the algorithm later.
P.S. the attachment should be "lossyFLAC.m" but HA won't allow me to upload .m, so I've changed it to .txt.
lossyFLAC.txt (8.14K)
Jun 15 2007, 19:10
Joined: 20-March 04
From: Göttingen (DE)
Member No.: 12875
Three rather unrelated but still on-topic comments:
(1) I'd like to note that it's not only the "frame size" that should match. This preprocessor and any lossless encoder exploiting zeroed LSBs should be in perfect sync (not only the same frame sizes but the same frame boundary positions).
(2) It's nice to have those isolated tools ("simplifier" and lossless encoder) but this also limits the performance. So one should either go for a combined tool with variable length blocks or a modified lossless encoder which is smart enough to detect varying "wasted_bits" and partitions the stream accordingly.
(3) Here's another technical thought which might be interesting for Thomas in case he wants to add lossy support to TAK:
Selecting "wasted_bits" to be an integer allows an encoder to control the signal-to-noise ratio in steps of 6 dB only. Compared to other lossy codecs (MP3, AAC control the SNR in steps of roughly 1.1 dB = 1.5*(3/4) dB) this 6 dB step size is quite large. This is an old idea of mine of how to get more resolution: Make it probabilistic. You can store in each frame or subframe (you might want to allow changing the resolution within a frame) the information "wasted_bits = x with probability p and x+1 with probability (1-p)" and use the same pseudo-random number generator in encoder and decoder for deciding the "wasted_bits" value per sample. Also you should think about generating the actual "wasted bits" via this RNG instead of zeroing them. This would be equivalent to subtractive dithering and avoids nonlinear distortions. Entropy coding might be a bit more complicated, though.
Per sample coding could be done like this:
wbits = minWasted + (RNG.nextfloat() > p ? 1 : 0); // randomly chosen wasted bits count (parentheses needed: + binds tighter than >)
waste = RNG.nextIntBits(wbits); // randomly generated LSBs
quantized_to_code = round( (float)(current_sample - waste) / (1 << wbits) ); // sample to code (2^wbits, not XOR)
quantized_actual = quantized_to_code * (1 << wbits) + waste; // dequantized sample (multiply avoids left-shifting a negative value)
Of course, the encoder's RNG state should match the decoder's (ie. same seed).
Good news: Noise shaping doesn't need to be part of the format specification but can later be added to the encoder without breaking anything.
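The per-sample scheme from (3) can be written as a small runnable sketch: encoder and decoder each build a pseudo-random generator from the same seed, draw the wasted-bits count (minWasted with probability p, minWasted + 1 otherwise) and the pseudo-random LSBs, and only the quantised values are passed between them. Python's `random.Random` stands in for the unspecified RNG, the function names are hypothetical, and the entropy coding stage and clipping to the sample range are left out.

```python
import random

def _draw(rng, min_wasted, p):
    # wasted bits = min_wasted with probability p, min_wasted + 1 otherwise
    wbits = min_wasted + (1 if rng.random() > p else 0)
    # Pseudo-random LSBs act as subtractive dither (getrandbits(0) avoided for old Pythons).
    waste = rng.getrandbits(wbits) if wbits > 0 else 0
    return wbits, waste

def encode(samples, min_wasted, p, seed):
    rng = random.Random(seed)
    coded = []
    for x in samples:
        wbits, waste = _draw(rng, min_wasted, p)
        half = (1 << wbits) >> 1
        coded.append((x - waste + half) >> wbits)  # round((x - waste) / 2**wbits)
    return coded

def decode(coded, min_wasted, p, seed):
    rng = random.Random(seed)  # same seed -> same wbits/waste sequence as the encoder
    out = []
    for q in coded:
        wbits, waste = _draw(rng, min_wasted, p)
        out.append((q << wbits) + waste)  # dequantised sample
    return out
```

Because the decoder regenerates exactly the same `wbits` and `waste` sequence, no per-sample side information is stored. The reconstruction error is at most half a step, so at most 2^minWasted, and with minWasted = 0 and p = 1 the round trip is lossless.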
This post has been edited by SebastianG: Jun 15 2007, 20:03