
Near-lossless / lossy FLAC

Reply #125
Thanks for all the testing and input, halb27, Nick.C, shadowking, TBeck, SebG, smok3 - and thanks for the file, Porcupine. Special thanks to Josh Coalson and David Bryant.

Thanks for your great idea and all the effort you are putting into it!

frame_size = dynamic* or 1024 (lossless target format dependent)
...
frame_size = default* / fixed: 1024 / 2048 / 4096 / 8192 etc (lossless codec and sample rate dependent)
...
* = not tried / implemented yet.

Can you define a maximum resolution (granularity?) for dynamic frame sizes? For example: variable frame sizes are always an integer multiple of, say, 128 or 256 samples? I assume this would make it easier to add TAK support for dynamic frame sizes later.

TAK always uses fixed frame sizes, but then partitions those frames into an appropriate number of variable-size sub-frames. If one of your dynamic frames crosses a frame border (of a frame containing, for instance, 4096 samples), encoding will be more efficient if at least 128 or 256 samples fall into the next frame rather than just a handful.
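
As a rough sketch (nothing implemented, just to show what I mean, and the numbers are made up), snapping a dynamic frame length onto a 256-sample grid could look like this:

Code: [Select]
% Sketch only: force a dynamic frame length onto a 256-sample grid, so that
% every boundary also lands on a multiple of 256 inside a larger TAK frame.
granularity = 256;                 % assumed minimum resolution
wanted_length = 1700;              % whatever the analysis would like to use
snapped_length = max(granularity, round(wanted_length/granularity)*granularity);
% snapped_length is 1792 here; all boundaries stay on the 256-sample grid.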

Well, I hope I could make it clear despite my bad English...

  Thomas

Near-lossless / lossy FLAC

Reply #126
David, could you possibly link to a revised MATLAB source? I am going to try to convert it to Scilab and would prefer to start with the "latest" version.

Thanks,

Nick.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Near-lossless / lossy FLAC

Reply #127
Nick,

This is the latest version. It doesn't include any of the algorithm refinements I discussed in my last post.

It's also missing disc buffering. It loads the whole file into memory, which limits the file length to what your PC's memory can hold, which is much less than you might expect with MATLAB.

It's also missing an adjustment to the block boundaries which I need to add.

However, it generated all the files (apart from the truncated ones) I've uploaded recently, so here it is:

[attachment=3387:attachment]

Good luck with Scilab!

Cheers,
David.

Near-lossless / lossy FLAC

Reply #128
Many thanks - this will be a challenge to the rusty cogs in my ageing grey matter.......

[edit]
Oh, and I decided to download GNU Octave rather than Scilab.
[/edit]

Near-lossless / lossy FLAC

Reply #129
Quote

frame_size = dynamic* or 1024 (lossless target format dependent)
...
frame_size = default* / fixed: 1024 / 2048 / 4096 / 8192 etc (lossless codec and sample rate dependent)
...
* = not tried / implemented yet.

Can you define a maximum resolution (granularity?) for dynamic frame sizes? For example: variable frame sizes are always an integer multiple of, say, 128 or 256 samples? I assume this would make it easier to add TAK support for dynamic frame sizes later.

TAK always uses fixed frame sizes, but then partitions those frames into an appropriate number of variable-size sub-frames. If one of your dynamic frames crosses a frame border (of a frame containing, for instance, 4096 samples), encoding will be more efficient if at least 128 or 256 samples fall into the next frame rather than just a handful.

Well, I hope I could make it clear despite my bad English...



It's clear.


To take advantage of as many "wasted bits" as possible, you need a small block size, probably equivalent to half the smallest FFT length (currently that's 64/2=32 at 44.1kHz sampling). Then, if the smaller FFT size is the limiting factor at a given moment, and it allows more wasted bits for a very short time, you can take advantage of this, whereas you couldn't with a longer block size.

This only brings compression advantages if the lossless codec can take advantage of the smaller block size. Otherwise, it doesn't change the compression at all, since the lossless codec will only take advantage of the lowest number of wasted_bits across whatever block size it's working with. Also, it means you're adding more noise than you need to - you're adding noise for no benefit (though it should be inaudible).
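
To illustrate the point (a toy example in MATLAB/Octave, not something from the script):

Code: [Select]
% Toy example: the lossless codec can only exploit the minimum number of
% wasted bits over each of its own blocks.
pre_blocksize   = 32;                      % small pre-processor block
codec_blocksize = 1024;                    % e.g. FLAC block size
wasted = floor(7*rand(1,2048));            % made-up wasted bits per 32-sample block
ratio  = codec_blocksize/pre_blocksize;    % 32 pre-processor blocks per codec block
effective = min(reshape(wasted, ratio, []), [], 1);
% mean(wasted) - mean(effective) is noise you've added that the codec
% can't turn into a smaller file at this block size.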

So, there are 3 possibilities:

1) pre-processor completely independent from the lossless codec (static block size).
2) pre-processor separate from the lossless codec, but able to run the lossless codec with (e.g.) a -blocksize n command and check the resulting file size / bitrate (dynamic block size). It can choose the best blocksize depending on file size. (clunky!)
3) "pre-processor" integrated completely into the lossless codec, meaning that the optimal block size can be decided jointly between the lossy and lossless parts of the algorithm without clunky calling of separate code.

(1) and (3) are obvious.

(2) means you aggressively (minimum block size) pre-process a block or file of audio, and then run it through the lossless codec at a variety of block sizes. Whichever one gives the best compression is the one you use - and you go back and re-pre-process the file to ensure you only remove the wasted bits that can be taken advantage of with that block size. In other words, you avoid the problem of adding more noise than brings any benefit.

I don't know whether it's worth doing this - I haven't tried.
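
If anyone wants to try it, (2) might look something like this (a rough sketch only - nothing like this is in the script, the filenames are made up, and it assumes flac.exe is on the path):

Code: [Select]
% "test.ss.wav" is assumed to be already pre-processed at the minimum block size.
blocksizes = [512 1024 2048 4096];
bytes = zeros(size(blocksizes));
for k = 1:length(blocksizes)
    system(['flac -f -b ' num2str(blocksizes(k)) ' -o test.tmp.flac test.ss.wav']);
    d = dir('test.tmp.flac');
    bytes(k) = d.bytes;
end
[best_bytes, best_k] = min(bytes);
% ...then re-run the pre-processor with blocksize = blocksizes(best_k), so
% only the wasted bits that this block size can exploit are removed.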


The block sizes can be anything that the lossless codec can exploit. I need to check the code is robust for block sizes which aren't related to the FFT size, and there's little benefit to using these from the pre-processor point of view, but no good reason not to if it makes the lossless codec more efficient.

Cheers,
David.

Near-lossless / lossy FLAC

Reply #130
To take advantage of as many "wasted bits" as possible, you need a small block size, probably equivalent to half the smallest FFT length (currently that's 64/2=32 at 44.1kHz sampling). Then, if the smaller FFT size is the limiting factor at a given moment, and it allows more wasted bits for a very short time, you can take advantage of this, whereas you couldn't with a longer block size.

Well, my question was unnecessarily complicated. Indeed it should have been as simple as "What is the minimum (theoretical) blocksize?" And the answer is: 32. Thank you.

(2) means you aggressively (minimum block size) pre-process a block or file of audio, and then run it through the lossless codec at a variety of block sizes. Whichever one gives the best compression is the one you use - and you go back and re-pre-process the file to ensure you only remove the wasted bits that can be taken advantage of with that block size. In other words, you avoid the problem of adding more noise than brings any benefit.

I don't know whether it's worth doing this - I haven't tried.

I suppose that later tests will show us useful lower limits for the blocksize. Then possibly even such an exhaustive approach will be practicable. My intuition tells me that blocks of 128 or even 256 samples are the minimum for current codec implementations, which are not specifically prepared for the preprocessor.

Some experiments with my own simple preprocessor hinted at a significant interaction between bit count reduction and predictor efficiency. If many bits had been removed, the predictor lost most of its efficiency. Depending on the predictability of the signal, the loss can be equivalent to 1 bit per sample. It's pure speculation, but possibly it's sometimes even advantageous to take fewer bits away. But this could be most efficiently evaluated with an integrated solution (3).
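
A toy experiment along these lines (my own sketch, nothing from the preprocessor) could look like this:

Code: [Select]
% Compare the size of a simple first-order prediction residual before and
% after truncating k LSBs.
x  = round(cumsum(randn(1,100000)) * 32);  % smooth, fairly predictable test signal
k  = 6;                                    % bits to remove
xt = round(x / 2^k) * 2^k;                 % truncated version
res  = diff(x);                            % residual of the predictor y(n) = x(n-1)
rest = diff(xt);
cost_orig  = log2(std(res));               % rough coding cost in bits/sample
cost_trunc = log2(std(rest)) - k;          % after the codec shifts the k bits out
saving = cost_orig - cost_trunc;           % compare with k: the shortfall is the
                                           % efficiency the predictor has lost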

You have opened a very interesting field for later research...

Near-lossless / lossy FLAC

Reply #131
Some experiments with my own simple preprocessor hinted at a significant interaction between bit count reduction and predictor efficiency. If many bits had been removed, the predictor lost most of its efficiency. Depending on the predictability of the signal, the loss can be equivalent to 1 bit per sample. It's pure speculation, but possibly it's sometimes even advantageous to take fewer bits away.

I saw that too in some of my early testing, where I was simply removing bits to check compression ratios, without caring about the audible consequences. I haven't seen it yet with the pre-processor, but then I haven't really looked. I thought it happened with "annoyingly loud sample", but when I went back to check, it hadn't. As you say, it's difficult to handle this properly unless integrated within the lossless codec. Just removing all inaudible bits with a 1024 block size seems to work well enough most of the time, though it will be interesting to see how much more efficiency you can squeeze out with a more careful method.

Cheers,
David.

Near-lossless / lossy FLAC

Reply #132
Terrific performance of 2Bdecided's VBR pre-processor on annoyingloudsong, that's great. A transparent reduction from 1269 kbps to 342 kbps is very dynamic, so I would say the VBR pre-processor is meeting the goal of true VBR. I didn't realize that this sample would introduce clipping problems (not necessarily audible in this case, but present), so that is interesting too. I agree with everything 2Bdecided said regarding possible solutions to the clipping. In any case, as he said, a proper solution could be integrated within the codec itself, so I don't really consider it a true problem, just a nuisance.

I guess the main thing left to do is to incorporate this kind of VBR algorithm directly into the encoder(s), which might increase overall efficiency (in terms of how much compression can be achieved transparently). Although the VBR is working near-perfectly as is, right now I think the file sizes are probably a little larger than they need to be, and getting optimal results might require the pre-processor to be incorporated into the encoder.

I suppose the VBR algorithm itself could also be improved slightly by using spreading functions and more accurate psychoacoustics rather than just picking the lowest coefficients out of an FFT, but to me it's probably good enough as is. Doing that would only make the VBR algorithm smarter, and it's already smart enough. More important, I think, would be to increase efficiency and reduce the bitrates without sacrificing transparency. We can gauge the limits of what can be achieved by comparing against things like WavPack's lossy mode at bitrates that we check manually.

Near-lossless / lossy FLAC

Reply #133
So, I got the source to work in GNU Octave - quite pleased. Now, does it *really* take about 1 hour to process just over 5MB of WAV file? (Core2 T2500 @ 2.0GHz, 1GB DDR2-667). Using 41_30sec.flac converted to WAV and then processed, I get a 1951kB FLAC (-8) from 5169kB of WAV - delighted!

The best way forward might be a generic transcoder model, using the preprocessor to process the WAV file decoded from the input file before re-encoding to the output file, preserving tags, metadata, etc.
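
Something like this is what I have in mind (just a sketch; it assumes flac.exe and metaflac.exe are on the path, and the filenames are made up):

Code: [Select]
% Decode, keep the tags to one side, pre-process, re-encode, put the tags back.
in = 'album01.flac';
system(['metaflac --export-tags-to=tags.txt "' in '"']);
system(['flac -d -f -o temp.wav "' in '"']);
% ... run the pre-processor on temp.wav to produce temp.ss.wav ...
system('flac -b 1024 -f -o album01.lossy.flac temp.ss.wav');
system('metaflac --import-tags-from=tags.txt album01.lossy.flac');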

<goes looking for the relevant PASCAL fft libraries and digs out freepascal>

Near-lossless / lossy FLAC

Reply #134
So, I got the source to work in GNU Octave - quite pleased. Now, does it *really* take about 1 hour to process just over 5MB of WAV file? (Core2 T2500 @ 2.0GHz, 1GB DDR2-667).


Heck, no. It's slow, but not that slow.

It takes about 45 seconds for a ~4MB (~25 second) wav file on my PC (2GHz P4, Windows XP).

I remember the initial ReplayGain implementation in MATLAB was painfully slow on my 300MHz P2(!), whereas the implementations in mp3gain and foobar2k are a joy to use these days!

So, properly coded, I don't see why lossy FLAC should be any slower than a half-decent mp3 encoder, i.e. much faster than real time. Even in MATLAB, it's not optimised at all.


Anyway, congratulations on getting it working!

Cheers,
David.

Near-lossless / lossy FLAC

Reply #135
Another question: I take it that the lossy_variables file is specific to the WAV file being processed?

[edit] And I think that GNU Octave is using a whole lot of swapfile rather than RAM, so that may immediately explain the slowdown - will try on a machine with 2GB...... [/edit]

Near-lossless / lossy FLAC

Reply #136
The lossy_variables file stores the FFT noise thresholds for each bit, which depend on the frequency limits, FFT size, sample frequency, etc.

If you don't change any parameters, it'll always be the same for all 44.1kHz 16-bit stereo files.


If you're hitting virtual memory, then it'll take forever. Try a smaller wavefile to see if this is the issue.


Did you need to make many/any alterations to make it run under Octave? If so, can I have a look please? Apparently it's possible to make code which works with both Octave and MATLAB.

Cheers,
David.

Near-lossless / lossy FLAC

Reply #137
If you're hitting virtual memory, then it'll take forever. Try a smaller wavefile to see if this is the issue.
Did you need to make many/any alterations to make it run under Octave? If so, can I have a look please? Apparently it's possible to make code which works with both Octave and MATLAB.


In my case, using Octave:

- I had to replace backslashes with slashes (see the snippet after this list);
- I couldn't get file collections to work (though I didn't try that hard), so I went for processing samples one by one;
- I had a serious problem with a free version of wavwrite I downloaded from the web, which corrupts the last bit when saving to 16 bits;
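
For reference, the backslash fix is just a one-liner, e.g. for the script's source_path:

Code: [Select]
% Convert any backslashes in the path - forward slashes work on Windows too.
source_path = strrep(source_path, '\', '/');
filenamelist = dir([source_path '*.wav']);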

Apart from that it works quite impressively under Octave (great idea and promising implementation, by the way), though it's painfully slow:

- About 12 minutes for a 20-second sample on a Prescott 2.8GHz (around 30 times slower than realtime);
- About 5 minutes for a 20-second sample on an Orleans 2.2GHz (around 15 times slower than realtime);

That is with carefully selected file sizes which don't fill the PC's physical RAM; otherwise virtual memory swapping can make the process take far too long to be worthwhile.

Near-lossless / lossy FLAC

Reply #138
Thanks for the feedback, Josef.

My own hack of wavread has the option of adding dither, but that has to be switched off for this script because it would mess up the least significant bit.

Do you have a link to a legal free version of wavread/write?


In MATLAB, there's a function called profile which lets you see which parts of a script/function are taking all the time.

In mine, it's currently the hanning() call, the result of which can be trivially stored rather than re-generated every time, so I'll change that!
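
Roughly what I do (MATLAB - I don't know whether Octave has an equivalent profiler):

Code: [Select]
profile on
lossyFLAC              % run the script as normal
profile off
profile viewer         % hanning() shows up near the top of the list

% The fix is then to generate each window once per analysis length and keep
% it, rather than calling hanning() for every block, e.g.:
for analysis_number = 1:number_of_analyses
    window_function{analysis_number} = hanning(fft_length(analysis_number));
end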

Cheers,
David.

Near-lossless / lossy FLAC

Reply #139
This is quicker...

(though it could be quicker still!)

Cheers,
David.

Near-lossless / lossy FLAC

Reply #140
I've just got your latest code and have found Octave wavread & wavwrite code. I've modified the wav handling code *not* to convert to ±1, as we just convert back again! (Although the code I found does not read 24-bit WAV (yet).)
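
In essence the change is this ('test.wav' is just a placeholder):

Code: [Select]
% MATLAB's wavread hands back doubles scaled to +/-1 ...
[y, fs, bs] = wavread('test.wav');
y_int = round(y * 2^(bs-1));   % ... which the script immediately turned back into integers

% ... so the wavread_raw / wavwrite_raw routines keep the integer samples
% throughout and skip the round trip.
[inaudio, fs, bs] = wavread_raw('test.wav');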

[edit] Been looking at Task Manager while Octave is running the process. There seems to be a very large number of "I/O Other Bytes" (Read 3293097, Write 23038, Other 135,247,986 and climbing.....). Does this include read / write to the swapfile? [/edit]

Near-lossless / lossy FLAC

Reply #141
Been playing about at home (C2D @ 3.0GHz, 2GB) and it's not really that much faster - maybe GNU Octave 2.1.73 is not really that fast. Anyway, while playing around, I got to messing about with the spreading function as follows, to see what effect weighting the middle values had.

Code: [Select]
  if (choose_spread==1) || (choose_spread==2) || (choose_spread==3) || (choose_spread==4) || (choose_spread==5),
    tcs = (0.5/(choose_spread+1));
    tcl = (0.5-tcs);
    spreading_function{1}=[tcs,tcl,tcl,tcs];
    spreading_function{2}=[tcs,tcl,tcl,tcs];
  else
    spreading_function{1}=[0.250,0.250,0.250,0.250];
    spreading_function{2}=[0.250,0.250,0.250,0.250];
  end


Basically, it makes the files a little bit bigger, although I haven't done much real testing - as I said, it's s.....l.....o.....w.....

Near-lossless / lossy FLAC

Reply #142
If you're using the lossy_variables file to store the noise thresholds, you might need to re-calculate for different spreading functions. On average, if they still sum to 1, it shouldn't matter.

If you do re-calculate, they don't have to sum to 1, since re-calculating will self normalise.


Proper psychoacoustically-based spreading functions aren't consistent on a linear scale, since they're approximately log-spaced. I don't know if they'd help or hinder.
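
i.e. a quick check / fix would be:

Code: [Select]
% If you only change the shape of the spreading function:
sf = [0.125 0.375 0.375 0.125];
sum(sf)             % should still be 1 if you're re-using the stored thresholds
sf = sf / sum(sf);  % otherwise normalise (or just re-calculate the thresholds)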

Cheers,
David.

Near-lossless / lossy FLAC

Reply #143
In accordance with David's wishes that any changes be shared....... 

Code: [Select]
%==========================================================
% lossyFLAC.m
%==========================================================
% David Robinson, 2007
% This code is open source. I'll pick a licence later (need advice), but:
% Any changes to this MATLAB code must be shared freely
% Any changes to the algorithm must be shared freely
% Any other implementations / interpretations must be shared freely
% You are free to include any implementation in commercial and/or closed source code, but must give an acknowledgement in help/about or similar.

% No warranty / guarantee. This is work in progress and not debugged.

% Things to add:
% dither
% ms checking
% ignore digital silence
% work on .wav files in blocks
% look one before and after for short FFT
% Add 31/32 + retain 5 bits declipping option - done
% Lossless correction file
% optimise, inc:
%  put 20log10 outside of min()
%  buffer hanning

% Set the source path
%==========================================================
source_path='c:/data_nic/octave/wav/';

% external variables (i.e. 'switches' when implemented properly!)
%==========================================================

noise_threshold_shift=0;
% noise threshold shift. average level of added quantisation
% noise relative to the lowest amplitude frequency bin (default=0=equal!)

low_frequency_limit=20;
high_frequency_limit=16000;
% Frequency range over which to check for lowest amplitude signal

minimum_bits_to_keep=0;

choose_spread=1;

fix_clipped=0;
% 0 = do nothing;
% 1 - 8 = reduce by (2^n-1)/(2^n), stay at 16-bits, keep at least 5 bits;
% 9 = reduce by 6dB and switch to 24-bit;

flac_blocksize=1024;
% blocksize of lossless codec

% internal constants (i.e. no need to change them)
%==========================================================

noise_averages=1000;
inaudible=1E-10;    % small number to add to audio to prevent log(0)

% Get a list of all .wav files in that source path
%==========================================================
filenamelist=dir([source_path "*.wav"]);

% Loop through them all, creating lossy versions
%==========================================================
for loop=1:length(filenamelist),
    filename=filenamelist(loop).name;
   
    % load file
    [inaudio,fs,bs]=wavread_raw([filename]); % Octave puts path & filename into "filename".

    % This reduces the amplitude by (2^n-1)/(2^n) and keeps at least 5 bits
    % (1 = 1/2, 2 = 3/4, 3 = 7/8, 4 = 15/16, 5 = 31/32, 6 = 63/64, 7 = 127/128, 8 = 255/256)
    if (fix_clipped>=1) & (fix_clipped<=8),
        inaudio=inaudio.*((2^(fix_clipped)-1)/2^fix_clipped);
        minimum_bits_to_keep=max(minimum_bits_to_keep,5);
    end
    % This pads it to 23 bits: It's treated as 24 bit data at the end, effectively dropping it by 6dB - Same as fix_clipped=1 but saves last bit.
    if (fix_clipped==9),
        inaudio = inaudio.*(2^(23-bs));
        bs=23;
    end

    [samples channels]=size(inaudio);

    % create integer copy (MATLAB wav(e)read loads audio with the range +/-1) - removed, now dealing with integer values.
    % inaudio_int=inaudio.*(2^(bs-1))+inaudible;

    % Set up the FFT analysis lengths - define these in time (seconds)
    % (will then use the nearest power of two based on sampling frequency)
    clear analysis_time spreading_function fft_length low_frequency_bin high_frequency_bin reference_thresholds reference_threshold min_bin bits_to_remove bits_to_remove_table;
    analysis_time(1)=2.0E-2;  % 20ms
    analysis_time(2)=1.5E-3;  % 1.5ms
    number_of_analyses=length(analysis_time);

    % spreading function to apply to FFT before determining lowest amplitude. Keep peak at centre, even if it means padding with zeros
    if (choose_spread >= 1) & (choose_spread <= 10),
        tcs=(0.5/(choose_spread+1));
    else
        tcs=(0.25);
    end
    spreading_function{1}=[tcs,0.5-tcs,0.5-tcs,tcs];
    spreading_function{2}=[tcs,0.5-tcs,0.5-tcs,tcs];
   
    % Loop through each analysis length (typically only two) and set the FFT values
    for analysis_number=1:number_of_analyses,
        % Calculate the closest FFT length
        fft_length(analysis_number)=2^round(log10(analysis_time(analysis_number)*fs)/log10(2));

        % Generate window function
        window_function{analysis_number}=hanning(fft_length(analysis_number));
       
        % Calculate which FFT bin corresponds to the low frequency limit
        low_frequency_bin(analysis_number)=round(fft_length(analysis_number)*low_frequency_limit/fs+((length(spreading_function)-1)/2));
        if low_frequency_bin(analysis_number)<2, low_frequency_bin(analysis_number)=2; end;
        if low_frequency_bin(analysis_number)>fft_length(analysis_number)/2, error('low frequency too high'); end;

        % Calculate which FFT bin corresponds to the high frequency limit
        high_frequency_bin(analysis_number)=round(fft_length(analysis_number)*high_frequency_limit/fs+((length(spreading_function)-1)/2));
        if high_frequency_bin(analysis_number)<2, error('high frequency too low'); end;
        if high_frequency_bin(analysis_number)>fft_length(analysis_number)/2, high_frequency_bin(analysis_number)=fft_length(analysis_number)/2; end;
    end

    variables_filename=['lossy_variables__fs' num2str(fs) '_bs' num2str(bs) '_noa' num2str(number_of_analyses) '_fft' num2str(fft_length) '_lfb' num2str(low_frequency_bin) '_hfb' num2str(high_frequency_bin) '_nts' num2str(noise_threshold_shift) '_sprfunc' num2str(choose_spread) '.mat'];

    % Find out if we've stored the quantisation noise thresholds before
    if exist(variables_filename,'file'),
        load(variables_filename);
        % If not, estimate quantisation noise at each bit in these FFTs
    else
        for analysis_number=1:number_of_analyses,
            clear reference_thresholds;
            reference_thresholds(1:noise_averages,1:bs)=zeros;
            for av_number=1:noise_averages,
                noise_sample=rand(fft_length(analysis_number),1);
                for bits_to_remove=1:bs,
                    % This models the quantisation noise introduced by truncating the last "bits_to_remove" bits from the audio data:
                    this_noise_sample=floor(noise_sample.*((2^bits_to_remove)))-(2^(bits_to_remove-1));
                    fft_result=20*log10(conv(abs(fft(this_noise_sample.*window_function{analysis_number})),spreading_function{analysis_number}));
                    reference_thresholds(av_number,bits_to_remove)=mean(fft_result(low_frequency_bin(analysis_number):high_frequency_bin(analysis_number)));
                end
            end
            reference_threshold{analysis_number}=mean(reference_thresholds)-noise_threshold_shift;

            for threshold=1:round(20*log10(2^(bs+4))),
                if isempty(find(reference_threshold{analysis_number}<threshold)),
                    threshold_index{analysis_number}(threshold)=0;
                else
                    threshold_index{analysis_number}(threshold)=max(find(reference_threshold{analysis_number}<threshold));
                end;
            end;
        end;
        save('-mat',variables_filename,'threshold_index');
    end;

    % Loop through each analysis length (typically only two) finding minimum value (min_bin) in each FFT
    for analysis_number=1:number_of_analyses,
        min_bin{analysis_number}(1:floor(samples/(fft_length(analysis_number)/2))-1,1:channels)=zeros;

        % Perform spectral analysis
        for block_start=1:fft_length(analysis_number)/2:samples-fft_length(analysis_number),
            block_number=1+(block_start-1)/(fft_length(analysis_number)/2);
            % On last (partial) block, just do to end of file (better than processing beyond end of file with zero pad, because that would
            % add a hard cut-off transition on gapless files, giving an artificially high spectrum)
            if block_start<samples-fft_length(analysis_number),
                actual_block_start=block_start;
            else
                actual_block_start=samples-fft_length(analysis_number);
            end;
            for channel=1:channels,
                fft_result=conv(abs(fft(window_function{analysis_number}.*inaudio(actual_block_start:actual_block_start+fft_length(analysis_number)-1,channel))),spreading_function{analysis_number});
                min_bin{analysis_number}(block_number,channel)=20*log10(min(fft_result(low_frequency_bin(analysis_number):high_frequency_bin(analysis_number))));
            end;
        end;
        min_bin_length(analysis_number)=length(min_bin{analysis_number}(:,1));
    end;

    clear bits_to_remove;
    bits_to_remove(1:ceil(samples/flac_blocksize))=zeros;

    % loop through flac blocks
    for block_start=1:flac_blocksize:samples,
        block_number=1+round(block_start/flac_blocksize);

        block_end=block_start+flac_blocksize-1;
        if block_end>samples, block_end=samples; end; % Don't jump past end of file!

        for analysis_number=1:number_of_analyses,

            first_block=(block_start-1)/(fft_length(analysis_number)/2);
            last_block=first_block+(flac_blocksize/(fft_length(analysis_number)/2));
            if first_block<1, first_block=1; end; % Don't jump before start of file
            if last_block>min_bin_length(analysis_number), last_block=min_bin_length(analysis_number); end; % Don't jump past end of file!
            if last_block<first_block, first_block=last_block; end;

            for channel=1:channels,
                this_min_bin=round(min(min_bin{analysis_number}(first_block:last_block,channel)));
                if this_min_bin<1, % i.e. if it's quieter than quantisation noise at the least significant bit
                    bits_to_remove_table(analysis_number,channel)=0; % don't remove any bits!
                else
                    bits_to_remove_table(analysis_number,channel)=threshold_index{analysis_number}(this_min_bin);
                end;
            end;
        end;

        bits_to_remove(block_number)=min(min(bits_to_remove_table));

        bits_to_remove(block_number)=bs-max((bs-bits_to_remove(block_number)),minimum_bits_to_keep);

        if bits_to_remove(block_number)>0,
            twoval=(2^bits_to_remove(block_number));
            inaudio(block_start:block_end,1:channels)=round(inaudio(block_start:block_end,1:channels)/twoval).*twoval;
        end
    end

    if fix_clipped==9, bs=24; end

    wavwrite_raw([filename(1:length(filename)-4) '.ss.wav'],inaudio,fs,bs)

    % Make a .bat file to call FLAC twice for comparison: lossless and lossy
    % Note comparison might not be fair, since -b1024 itself gives better
    % compression on _some_ samples
    fid=fopen('temp.bat','w');
    dummy=fprintf(fid,'%s\n',['"C:\\Program Files\\FLAC\\flac.exe" -b' num2str(flac_blocksize) ' -f "' filename '"']);
    dummy=fprintf(fid,'%s\n',['"C:\\Program Files\\FLAC\\flac.exe" -b' num2str(flac_blocksize) ' -f "' filename(1:length(filename)-4) '.ss.wav"']);
    dummy=fclose(fid);
    % Run the .bat file
    % !temp.bat

%==========================================================
end
%==========================================================

Having fun with Octave now - it just seems to be slow on the first processing - subsequent processing is quicker 

 

Nick.

Near-lossless / lossy FLAC

Reply #144
I've done a little and definitely NON-definitive comparison to see how different codecs cope with the script.

I've chosen FLAC, WV, TAK and ALS as candidates; these are the major codecs that I know of which support wasted-bits detection and a customizable frame size. For each codec, the more complex modes did not always turn out to be the most efficient ones.

OFR supports wasted bits, but there's no way to fix the frame size.
LA and APE don't even support wasted bits. Other codecs may easily be added later.

I've chosen 11 non-critical real-world music samples and processed them with FFT sizes of 2048, 1024, 512, 256 and 128. Please note that every step down in FFT size reduces the S/N ratio by about a couple of dB (which, on the other hand, can also be seen as better efficiency of the algorithm).

The compressed samples' bitrates vary from around 300kbps to 560kbps. The next table shows average results in kbps.

Code: [Select]
              FLAC -8   ALS -7      WV -x6     TAK -p4m
Avg - 2048      470       426        447         420
Avg - 1024      457       410        442         408
Avg - 0512      444       395        451         399
Avg - 0256      437       382        498          NO
Avg - 0128      447       383        616          NO


Results are promising, though further optimizations may be expected. At least 3:1 compression (470kbps) seems to be generally achievable for all codecs on 44.1kHz/16-bit samples. Obviously, listening tests should clarify whether this is competitive with other available options (WVLossy, DualStream).

- In general, 256 seems to be the best option as a frame size, with the exception of WV, which doesn't seem to work well with smaller frames but offers acceptable performance with an FFT of 1024. It may be very interesting to see whether this is competitive with WV's own lossy mode;
- When a frame size of 256 is used, FLAC offers slightly better performance than WV;
- TAK's performance is quite impressive, better than both FLAC and WV by over 10% when the same frame sizes are compared;
- ALS's performance is more or less on par with TAK's: slightly worse at bigger frame sizes, but scaling a little better at smaller ones.

Near-lossless / lossy FLAC

Reply #145
Joseph,

That's very interesting.

What was the lossless bitrate for each codec, for your sample collection?


I'm not sure the script behaves exactly as I would like it to when the frame size is reduced below 1024. More work to do! (No time now  ).

Cheers,
David.

Near-lossless / lossy FLAC

Reply #146
What was the lossless bitrate for each codec, for your sample collection?


The next table shows detailed results for all samples at frame size = 1024 (the first column is the average bitrate of the four codecs on the lossless samples, the second is the average bitrate of the four codecs on the lossy versions, and the third is the ratio).

Code: [Select]
       Lsl   Lsy   Lsy/Lsl
F01   1053   374   35.54%
F02    871   490   56.23%
F03    910   398   43.71%
F04    880   374   42.50%
F05    962   462   48.05%
F06    947   419   44.27%
F07    865   351   40.61%
F08    823   430   52.19%
F09    919   358   38.93%
F10    877   521   59.44%
F11    764   546   71.43%

AVG.   897   429   47.84%

Near-lossless / lossy FLAC

Reply #147
- In general, 256 seems to be the best option as a frame size, with the exception of WV, which doesn't seem to work well with smaller frames but offers acceptable performance with an FFT of 1024. It may be very interesting to see whether this is competitive with WV's own lossy mode;

Thanks for the testing! 

Yes, WavPack is not well tuned for very small block sizes. As you found, it should be okay at 1024, but would really rather be up at 4096. If the smaller blocks turn out to be useful, however, I could make it smart enough to intelligently concatenate blocks with the same bit depth (and skip very short ones if the saving wasn't enough), because WavPack does not require all blocks in a file to be the same length.
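
Just to sketch the sort of thing I mean (illustration only, in the same MATLAB/Octave style as the script - this is not WavPack code):

Code: [Select]
% Merge runs of pre-processor blocks that ended up with the same number of
% wasted bits into single, longer WavPack blocks.
bits_to_remove = [3 3 3 5 5 0 0 0 0 2];       % hypothetical, one entry per 256-sample block
change = [true, diff(bits_to_remove) ~= 0];   % where the wasted-bit count changes
run_starts  = find(change);
run_lengths = diff([run_starts, length(bits_to_remove)+1]);
run_bits    = bits_to_remove(run_starts);
% Each run of run_lengths(k) blocks with run_bits(k) wasted bits can become a
% single WavPack block; very short runs could be merged with a neighbour if
% the saving isn't worth the block overhead.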

Another interesting note about the clipping discussion above is that WavPack should be happy with blocks that have clipped samples because it looks for any redundancies in the LSBs (not just all zeros). It would definitely be useful to leave the option in to do nothing special for that case.

However, I really believe that the most useful application of this is for FLAC because of the large installed base of hardware players. If it really turns out to be robustly transparent across samples (even artificial ones) then it could be incorporated directly into the WavPack lossy mode (including the correction file) without any format changes. Additionally, it could be made even more efficient because it would not be limited to just the quantization levels that were powers of two.

Near-lossless / lossy FLAC

Reply #148
Right, spent a bit of time processing some of the files in 69/70, etc. and got the following:
Code: [Select]
Title                                   WAV FLAC  PP10  PP37
=============================================================
annoyingloudsong.ss                         1106   341   356
birds.ss                                     763   465   450  
E50_PERIOD_ORCHESTRAL_E_trombone_strings.ss  862   467   438
glass_short.ss                               776   635   641
jump_long.ss                                 946   481   464
S30_OTHERS_Accordion_A.ss                    709   709   722!!
S35_OTHERS_Maracas_A.ss                      679   616   539
S53_WIND_Saxophone_A.ss                      598   486   493
=============================================================
Average                                1411  805   525   513
                                       100% 57.0% 37.2% 36.4%
                                             100% 65.2% 63.7%
=============================================================
FLAC = Normal FLAC;
PP10 = PreProcessed, [0.250,0.250,0.250,0.250] spreading, no reduction by multiplication, minimum bits to keep=0;
PP37 = PreProcessed, [0.125,0.375,0.375,0.125] spreading, 127/128 reduction (0.07dB), minimum bits to keep.=0.

No audible artifacts, to my ears anyway. Currently praying to the code gods to produce an Octave compiler to make the whole thing quicker...... or, alternatively, psyching myself up to port it to Pascal (as it's the only compilable language I really know.....)