Topic: lossyWAV Development

lossyWAV Development

Reply #275

BTW. just a little anecdote. By accident I lossyFlac-ed a track that had only silence. Foobar2000 replaygained +64 dB and the noise was very clear to hear. ...
Which version of lossyWAV was that? Recent versions default to no dither, so this problem should not happen unless you dither.
Nick.

It was v0.3.5 with -skew 7 and nothing else.
I don't see it as a real problem though; it is more like a side effect in combination with ReplayGain. But it seems to prove that bits can be removed even from silence.
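
A rough back-of-the-envelope check of why such noise becomes audible. This is only a sketch: it assumes the residual noise peaks at about 1 LSB of a 16-bit file, which the post does not state; the +64 dB figure is the one quoted above.

```python
import math

def dbfs(amplitude, full_scale=32768):
    """Level of a peak amplitude relative to 16-bit full scale, in dB."""
    return 20 * math.log10(amplitude / full_scale)

# Assumption: the noise left in the "silent" track peaks at 1 LSB.
lsb_level = dbfs(1)
# Gain applied by ReplayGain according to the post.
after_gain = lsb_level + 64
# Size of one bit in dB, the same figure the -nts help text uses.
db_per_bit = 20 * math.log10(2)

print(f"1 LSB: {lsb_level:.1f} dBFS, after +64 dB: {after_gain:.1f} dBFS")
print(f"one bit corresponds to {db_per_bit:.4f} dB")
```

Under that assumption the noise ends up at a level far above where it would normally sit, which fits the observation that it was clearly audible.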

It's not the normal dither - silence and near-silence should be (and with the MATLAB script, are) transparent irrespective of system gain or dither chosen, because lossyFLAC won't touch silence - it won't even re-dither it.

Nick, did you have "always dither" set to on in that version?

Cheers,
David.

There shouldn't have been - it was removed at about v0.3.2.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)


Reply #276
lossyWAV alpha v0.3.7 attached. Removed due to suspect spreading function and superseded by alpha v0.3.8 below.

"-spread" parameter now enables Bark spreading function rather than previous experimental 3 bin average to 3 bin max spreading function.

As stated in the original thread, for my 52 sample set:

WAV : 121.5MB;
FLAC : 68.2MB;
lossyWAV -2 : 39.5MB;
lossyWAV -2 -spread : 35.3MB;
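
To put the sizes above in proportion, here is a small sketch that converts them to a fraction of the WAV size and to an approximate bitrate. It assumes the samples are CD-format audio (44.1 kHz, 16 bit, stereo) and that the lossyWAV figures are sizes after FLAC encoding; neither is stated in the post.

```python
# Sizes of the 52 sample set in MB, copied from the post.
sizes_mb = {
    "WAV": 121.5,
    "FLAC": 68.2,
    "lossyWAV -2": 39.5,
    "lossyWAV -2 -spread": 35.3,
}

CD_KBPS = 1411.2  # assumed source format: 44.1 kHz * 16 bit * 2 channels

def relative_size(name):
    """Size of a variant as a fraction of the uncompressed WAV size."""
    return sizes_mb[name] / sizes_mb["WAV"]

def approx_kbps(name):
    """Approximate average bitrate, valid only under the CD-format assumption."""
    return CD_KBPS * relative_size(name)

for name in sizes_mb:
    print(f"{name:22s} {relative_size(name):6.1%} {approx_kbps(name):7.1f} kbps")
```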

The reassuring thing about the new spreading function is that those files from which you would expect very few bits to be removed (based on simple 3 or 4 bin averaging) still have very few bits removed.

Reply #277
I don't believe it: Nick, did you work throughout the night? How do you manage to be so fast?

A big, big thank you to you!

And the result looks very, very promising.

Sure I'll try my usual test samples with this new version using -spread.
lame3995o -Q1.7 --lowpass 17


Reply #278
a) The codec blocksize of -1/-2/-3 is now 2304/1152/576?
a) Yes;

Could you please add a switch to set the block sizes to 2048/1024/512? I would like to evaluate the optimum encoder settings for TAK. Unfortunately TAK currently only supports block sizes which are powers of 2...

I am very impressed by your (and 2BDecided's) work! For me LossyFlac is an exciting new option. Thanks also to the hard working testers.

  Thomas


Reply #279
Could you please add a switch to set the block sizes to 2048/1024/512? I would like to evaluate the optimum encoder settings for TAK. Unfortunately TAK currently only supports block sizes which are powers of 2...


Thomas, I will enable the "-flac" and "-tak" parameters tonight which will set the codec_block_size for FLAC to 2304/1152/576 and for TAK to 2048/1024/512.

I would also welcome any feedback whatsoever regarding my Bark spreading function - I can't hear anything wrong with the output, but I want independent critical input to determine whether it's worth keeping, needs work, or just needs to be trashed.

Nick.

Reply #280
I will enable the "-flac" and "-tak" parameters tonight which will set the codec_block_size for FLAC to 2304/1152/576 and for TAK to 2048/1024/512.

But the blocksizes of -tak could also be used with FLAC. 
Maybe it's somewhere in this thread, but where did the 576 size come from again?
In theory, there is no difference between theory and practice. In practice there is.


Reply #281
Sorry, but -spread as of this version isn't so good.

I have got used to producing the .lossy.wavs via the command interpreter only and watching the messages lossyWAV produces, and I was astonished at the rather high average of bits removed for Atem-lied and keys_1644ds. So I was very curious about the audio quality.

Atem-lied is relatively good with so many bits removed (acceptable for -3 IMO), but I could abx it 9/10.
keys_1644ds however is bad (no abxing required).

So I guess the current implementation is a bit aggressive.

Nick: For experimentation maybe you can provide a parameter for the -spread option.
Something like:
One parameter value gives a spreading_length of 1 for low frequencies at a short or moderate fft_length, plus a strong overall cap, like 4, on any spreading_length.
Another parameter value gives a spreading_length of 1 for low frequencies at a short fft_length and 2 for low frequencies at a moderate fft_length, plus a rather strong overall cap, like 6, on any spreading_length, with a spreading_length of 6 reached only when fft_length is long.
These parameter values have quality in mind. More parameter values are welcome of course switching gradually from the pure quality target towards the efficiency target.

Reply #282
Before abandoning the Bark averaging method, I think that it should be expanded. At the moment each of the first 25 Bark ranges (0 to 24) is averaged, then the minimum average value is taken as the value from which to calculate bits to remove. I feel that this is too coarse and that the granularity should be made finer by using half or even quarter Bark averaging. I will have a think about this and post v0.3.8 soon.
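
The Bark averaging described here (average the FFT bins inside each band, then take the lowest average) can be sketched roughly as follows. This is not the lossyWAV code: the band edges are the commonly tabulated critical-band edges, which give 24 bands up to 15.5 kHz rather than the 25 ranges mentioned, and `steps_per_bark` is a hypothetical parameter illustrating half (2) or quarter (4) Bark averaging.

```python
# Commonly tabulated critical-band edges in Hz. Assumed here; the post does
# not say which edges lossyWAV used.
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700,
                 9500, 12000, 15500]

def bark_band_minimum(bin_levels, sample_rate, fft_length, steps_per_bark=1):
    """Average FFT bin levels per (sub-)Bark band and return the lowest average."""
    # Split every Bark band into equal-width sub-bands for finer granularity.
    edges = []
    for lo, hi in zip(BARK_EDGES_HZ, BARK_EDGES_HZ[1:]):
        for k in range(steps_per_bark):
            edges.append(lo + (hi - lo) * k / steps_per_bark)
    edges.append(BARK_EDGES_HZ[-1])

    bin_hz = sample_rate / fft_length  # frequency spacing of the FFT bins
    averages = []
    for lo, hi in zip(edges, edges[1:]):
        band = [lvl for i, lvl in enumerate(bin_levels) if lo <= i * bin_hz < hi]
        if band:  # short FFTs leave some narrow bands without any bin
            averages.append(sum(band) / len(band))
    return min(averages)
```

With narrower sub-bands a single low bin is diluted by fewer neighbours, so the minimum can only stay the same or drop, which is the more cautious direction.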

The -spread in v0.3.6 used 3 bin averaging at short FFT lengths (<=64 samples) and gradually changed to 3 bin maximum at long FFT lengths (>=1024 samples). This seems to be closer to what you mention above (although not exactly).

Thanks for the listening time!

I will enable the "-flac" and "-tak" parameters tonight which will set the codec_block_size for FLAC to 2304/1152/576 and for TAK to 2048/1024/512.
But the blocksizes of -tak could also be used with FLAC. 
Maybe it's somewhere in this thread, but where did the 576 size come from again?

Maybe what is required is a CD sector related codec_block_size (2304/1152/576 samples) or a power of two equivalent (2048/1024/512 samples). This could be easily implemented by a "-cd" or "-CD" switch to change from power of two blocks to CD sector multiple blocks. I will incorporate this for v0.3.8.

Thanks for the input!

Reply #283
I see: you immediately did the whole thing and averaged over an entire critical band.

Not exactly what I have in mind.
I wouldn't bring the critical band as such so much into focus. Guess that's what 2Bdecided is afraid of.
I'd rather have the original averaging in primary focus, but with (cautious) corrections according to the widths of the critical bands.
Qualitywise I think it is essential to concentrate on the lower spectrum and use the critical band idea to hold the spreading_length very small when only one or few bins fall into a critical band.
With this it's not even necessary to look at every single critical band, but just do the averaging differently within larger frequency ranges (for instance for low to moderate fft_length use a spreading_length of 1 below ~ 800 Hz, 2 in the ~ 800-2000 Hz range, 3 in the ~ 2-8 kHz range, and 4 for the ~ 8+ kHz range, and increase these spreading_lengths very softly with increasing fft_length).

This is all with quality in mind.
Once high quality is settled (we still have an open problem with guruboolez' sample) we might become less cautious and try a bit more adventurous tactics.

Reply #284
...Maybe it's somewhere in this thread, but where did the 576 size come from again?

Maybe what is required is a CD sector related codec_block_size (2304/1152/576 samples) or a power of two equivalent (2048/1024/512 samples). This could be easily implemented by a "-cd" or "-CD" switch to change from power of two blocks to CD sector multiple blocks. I will incorporate this for v0.3.8.

This would break the idea of -flac, -tak, etc. as targeting specific lossless encoders. Why do you want to do that?
-tak does everything that is needed.
I think GeSomeone's question is aimed at why the blocksizes are currently 2304/1152/576, and perhaps why they should be like that for -flac.
According to the FLAC documentation it looks as if the FLAC blocksize should be a multiple of 576, but this is not so, as I have used FLAC with a blocksize of 1024. That is why I suggested a default blocksize of 1024 with -1, -2, and -3 when not using -flac, -tak, etc., especially as my experiments didn't show a significant saving in bitrate when using 576 instead of 1024.

Anyway I welcome the activation of -tak, -flac, etc.

Reply #285
From the FLAC format page:

Code:
Block size in inter-channel samples:

    * 0000 : reserved
    * 0001 : 192 samples
    * 0010-0101 : 576 * (2^(n-2)) samples, i.e. 576/1152/2304/4608
    * 0110 : get 8 bit (blocksize-1) from end of header
    * 0111 : get 16 bit (blocksize-1) from end of header
    * 1000-1111 : 256 * (2^(n-8)) samples, i.e. 256/512/1024/2048/4096/8192/16384/32768
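
The table above translates directly into a small lookup. The sketch below only mirrors the quoted table; the function name and the `trailing_value` parameter (the "blocksize-1" value read from the end of the header) are illustrative and not part of any FLAC library.

```python
def flac_block_size(code, trailing_value=None):
    """Map the 4-bit block size code from the quoted table to a sample count."""
    if not 0 <= code <= 0b1111:
        raise ValueError("block size code must fit in 4 bits")
    if code == 0b0000:
        raise ValueError("code 0000 is reserved")
    if code == 0b0001:
        return 192
    if 0b0010 <= code <= 0b0101:
        return 576 * 2 ** (code - 2)      # 576/1152/2304/4608
    if code in (0b0110, 0b0111):
        # 8 or 16 bit (blocksize-1) stored at the end of the frame header.
        if trailing_value is None:
            raise ValueError("this code needs the value from the end of the header")
        return trailing_value + 1
    return 256 * 2 ** (code - 8)          # 256 ... 32768

print([flac_block_size(c) for c in range(0b0010, 0b0110)])
print([flac_block_size(c) for c in range(0b1000, 0b10000)])
```

Both the 576-based sizes and the power-of-two sizes have their own fixed codes, and any other size can be signalled through codes 0110 and 0111.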

I like 576 because it increases the bits_to_remove by processing over a shorter time frame. If the consensus is that standard codec_block_size should be 1024 samples, then so be it.

The reason that -flac and -tak have not yet been activated is that, basically, there are no codec specific settings yet. The only reason to implement them now would be because of the codec_block_size issue.

Reply #286
The reason that -flac and -tak have not yet been activated is that, basically, there are no codec specific settings yet. The only reason to implement them now would be because of the codec_block_size issue.

Yes, but it already gives certainty to any user, whatever lossless codec he uses. By using -tak a TAK user knows lossyWAV will work fine with TAK. No need IMO to think of -tak etc. as a super-optimized version for the specific codec. Things start with codec blocksize.

As for the blocksize without a target codec option I still think it's good to default it to 1024 universally. Clear thing, easy to memorize, and should also do it efficiently in any situation known so far. Optimizing blocksize is then the clear task of -flac, etc. However it's not really of primary concern. To me it's fine also with the way it is.

Reply #287
Default to 1024 for all quality settings will be implemented in v0.3.8

Reply #288
Thank you.

Reply #289
I see: you immediately did the whole thing and averaged over an entire critical band.

Not exactly what I have in mind.
I wouldn't bring the critical band as such so much into focus. Guess that's what 2Bdecided is afraid of.
I'd rather have the original averaging in primary focus, but with (cautious) corrections according to the widths of the critical bands.
Qualitywise I think it is essential to concentrate on the lower spectrum and use the critical band idea to hold the spreading_length very small when only one or few bins fall into a critical band.
With this it's not even necessary to look at every single critical band, but just do the averaging differently within larger frequency ranges (for instance for low to moderate fft_length use a spreading_length of 1 below ~ 800 Hz, 2 in the ~ 800-2000 Hz range, 3 in the ~ 2-8 kHz range, and 4 for the ~ 8+ kHz range, and increase these spreading_lengths very softly with increasing fft_length).

This is all with quality in mind.
Once high quality is settled (we still have an open problem with guruboolez' sample) we might become less cautious and try a bit more adventurous tactics.
Oops - missed this post entirely. I'm getting disillusioned with my approach to Bark averaging - will park it and start on something akin to what you've just mentioned, i.e. spreading_function_lengths increase with both frequency and fft_length. Looking at the geometric fft_length increase, should the spreading_function_length also increase in that manner, i.e. sfl[n+1]:=sfl[n]*2; or should it increase more slowly?

Reply #290
More slowly. At the moment I think it would be good to keep spreading_length pretty much in the region we're used to even for long fft lengths. Spreading length must not increase with each increase of fft length.
As with frequency dependency for the spreading length I am thinking of only a very rough dependency on fft_length.
Something like: use the frequency dependency I mentioned (spreading length 1 to 4 according to a rough frequency classification - let's call this the basic frequency dependency rule) for an fft length <= 256, add 1 to the spreading length of the basic rule for an fft length > 256 but <= 1024, and add 2 for an fft length > 1024. Maybe add 3 to the spreading length of the basic rule for extremely long ffts.

You see: even with highest frequency and longest fft length a spreading length of 6 or 7 as a maximum.
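
The rule proposed in this post can be written down in a few lines. This is only a sketch of the suggestion, not lossyWAV code: the frequency boundaries are the approximate ones from the earlier post (~800 Hz, ~2 kHz, ~8 kHz), and `extreme_fft_length` is a hypothetical parameter because the post does not say what counts as an extremely long fft.

```python
def basic_spreading_length(freq_hz):
    """Basic frequency dependency rule: spreading length 1 to 4."""
    if freq_hz < 800:
        return 1
    if freq_hz < 2000:
        return 2
    if freq_hz < 8000:
        return 3
    return 4

def spreading_length(freq_hz, fft_length, extreme_fft_length=None):
    """Basic rule plus a coarse bonus that depends on the fft length."""
    if extreme_fft_length is not None and fft_length >= extreme_fft_length:
        bonus = 3  # optional "maybe add 3" case; threshold is an assumption
    elif fft_length > 1024:
        bonus = 2
    elif fft_length > 256:
        bonus = 1
    else:
        bonus = 0
    return basic_spreading_length(freq_hz) + bonus

# Largest values the rule can produce.
print(spreading_length(16000, 2048))
print(spreading_length(16000, 8192, extreme_fft_length=8192))
```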

I guess this is a bit too conservative, but as long as we don't know it's better to play it safe. Variations can be done later (or by means of a -spread parameter value).

Reply #291
lossyWAV alpha v0.3.8 attached. Superseded.

Having made an abortive attempt at Bark related bit reduction determination, I have been changing the spreading method a bit, firstly having reverted to the original FFT bin averaging (3 or 4 bins dependent on quality level). As can be seen below, I have introduced two elements to the method: firstly, average 3 bins below 3.7kHz and 4 bins above; secondly, use the "square mean root" value as a slightly more conservative result (compared to simple averaging).

Reducing to very few bins (i.e. 1 or 2) drastically reduces the bits_to_remove and has not been implemented.
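
For readers unfamiliar with the term, a "square mean root" is the square of the mean of the square roots. A minimal sketch comparing it with a simple average, assuming it is applied to non-negative bin values (the post does not say exactly which quantity lossyWAV feeds into it):

```python
import math

def arithmetic_mean(values):
    """Simple averaging over a group of FFT bin values."""
    return sum(values) / len(values)

def square_mean_root(values):
    """Square of the mean of the square roots; never above the simple average."""
    return (sum(math.sqrt(v) for v in values) / len(values)) ** 2

bins = [1.0, 4.0, 9.0, 16.0]  # made-up example values, not real FFT output
print(arithmetic_mean(bins), square_mean_root(bins))
```

Because the square mean root is never larger than the simple average, it yields a lower estimate for the same bins, which is consistent with the "slightly more conservative" description.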

Code:
lossyWAV alpha v0.3.8 : WAV file bit depth reduction method by 2Bdecided.
Transcoded to Delphi by Nick.C & Halb27 from a script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Options:

-1, -2 or -3  quality level (1:overkill, 2:default, 3:compact)
-nts <n>      noise_threshold_shift=n (-15.0<=n<=0.0, default -1.5dB)
              (reduces overall bits to remove, -1 bit = -6.0206dB)
-o <folder>   destination folder for the output file
-force        forcibly over-write output file if it exists.

Advanced Options:

-spread <n>   select spreading method : 0<=n<=3; default=0
              0 = fft bin averaging : 3 or 4 bins, (original method);
              1 = fft bin averaging : 3 bins below 3.7kHz, 4 bins above;
              2 = fft bin square mean root : 4 bins;
              3 = fft bin square mean root : 3 bins below 3.7kHz, 4 bins above
-skew <n>     skew results of fft analyses by n dB (0.0<=n<=12.0, default=0.0)
              with a (sin-1) shaping over the frequency range 20Hz to 3.7kHz.
              (artificially decrease low frequency bins to take into account
              higher SNR requirements at low frequencies)

-dither       dither output using triangular dither; default=off
-noclip       clipping prevention amplitude reduction; default=off

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailled output mode
-below        set process priority to below normal.
-low          set process priority to low.

Reply #292
I'm not sure how useful this is, or whether it makes any sense to integrate into lossyWAV, but I have created a “smart” normalization program that I think might fix one of the troubling issues of lossyWAV (at least for me). It might even work well in other situations where normalization is desired, although I don't know enough about those to say.

Most normalization programs work by applying a scaling factor on every audio sample such that a maximum value sample (i.e., -32768/+32767) is reduced to some desired lower value. After applying the scale factor, they may or may not apply dither and noise-shaping (they probably should, but most I've seen don't). This works great at normal audio levels, but can cause trouble at very low levels. The problem is that by using various forms of noise shaping, well produced CDs contain information below the LSB. To preserve this information (and the characteristics of the original noise floor spectrum) it is important to preserve the exact sample values at low levels.

This suggests an alternative algorithm that maps low-level samples to the output exactly, but then goes non-linear at higher values to ensure that the desired peak limit is not exceeded (this is sometimes called soft clipping). This fixes the low-level sample problem, however soft-clipping introduces unacceptably high levels of harmonic distortion in full-scale signals.

The algorithm I chose for this program combines the two methods by calculating a running RMS level (with attack and decay) and using that to determine the ideal transfer function. At low levels it maps samples without modification to the output (with rogue high samples being softly clipped). At high levels it uses the simple scaling factor (where there's enough signal that dither and noise-shaping are not needed). In between the high and low level areas is a 12 dB transition zone where the program linearly interpolates between the two methods based on the position in the zone. In this transition zone a small amount of odd harmonic distortion is added to the signal, but it's very low in level.
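
A very rough sketch of the idea described above, not the attached program. Everything specific here is an assumption made for illustration: samples are floats in the range -1 to 1, the soft clipper uses a tanh curve above a knee, and `low_db`, `attack` and `decay` are hypothetical values; only the 12 dB transition zone and the overall structure come from the post.

```python
import math

def soft_clip(x, limit, knee=0.5):
    """Identity below knee*limit; above it, bend smoothly towards limit."""
    k = knee * limit
    if abs(x) <= k:
        return x  # low-level samples pass through unmodified
    room = limit - k
    return math.copysign(k + room * math.tanh((abs(x) - k) / room), x)

def blend_weight(rms_db, low_db, zone_db=12.0):
    """0.0 at or below low_db, 1.0 at or above low_db + zone_db, linear between."""
    return min(1.0, max(0.0, (rms_db - low_db) / zone_db))

def smart_normalize(samples, scale, limit, low_db=-40.0, attack=0.01, decay=0.001):
    """Blend pass-through (soft clipped) and plain scaling by running RMS level."""
    out = []
    mean_square = 0.0
    for x in samples:
        power = x * x
        # Running mean square with separate attack and decay rates.
        coeff = attack if power > mean_square else decay
        mean_square += coeff * (power - mean_square)
        rms_db = 10 * math.log10(mean_square) if mean_square > 0 else -math.inf
        w = blend_weight(rms_db, low_db)
        out.append((1 - w) * soft_clip(x, limit) + w * scale * x)
    return out
```

In this sketch quiet passages come out bit-for-bit unchanged, loud passages get the plain scale factor, and the transition zone mixes the two curves in proportion to the position in the zone.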

I am attaching a zip file with the program source and a Windows executable (the program compiles fine on Ubuntu Linux and probably most others). This has not been tested too much (especially in error conditions) so be careful!

David


Reply #293
Thanks for the code - I will certainly have a look at it to see how you did it!

On amplitude reduction, lossyWAV no longer reduces amplitude by default - the user has to specify the "-noclip" parameter.

Many thanks,

Nick.

Reply #294
Reducing to very few bins (i.e. 1 or 2) drastically reduces the bits_to_remove and has not been implemented.

Thank you for the new version. Will try it out as soon as possible.
If averaging over 1 or 2 bins yields inappropriate bitrates, your current approach is the most appropriate, I think.
Just for clarity:
a) is bits_to_remove too low also when applied to very short fft lengths when averaging over 1 or 2 bins in the frequency range below ~ 700 Hz?
    In the end you must have done something like that when averaging over entire critical bands - bits_to_remove was not too low then.
b) is it also not worth while averaging over say 2 bins in the low frequency range with very short fft lengths when considering it being applied to quality mode -1?

Another question as -tak etc. is not enabled yet:
Is codec blocksize now a constant 1024 with any quality mode?

BTW as you are doing the hard work: Please remove me from the author list of lossyWav.exe. It's not appropriate. I'm glad I could contribute a bit with the wavIO unit, but in the end it's an absolutely minor contribution. Of course I will continue to maintain wavIO, so feel free to tell me about any changes you would like to have made.

Reply #295
"-tak" is not yet enabled, default codec_block_size is 1024 samples for all quality levels as previously discussed.

I am looking at other permutations of spreading, including one which has 3 intermediate frequency splits and averages as follows:

20Hz to 800Hz : 2 bins;
800Hz to 3.7kHz : 3 bins;
3.7kHz to 8kHz : 4 bins;
8kHz to 16kHz : 5 bins;

I'll let you know how this one works out.

Nick.

Reply #296
Sounds good.

Please don't see it as a bad thing in case bits_to_remove should go down a bit.
After all we are still left with guruboolez' sample he could abx.

Reply #297
lossyWAV alpha v0.3.9 attached. Superseded.

Default spreading method made slightly more conservative;
Code rationalised for spreading methods 1 to 3;
Spreading method 4 introduced, 2 fft bin averaging 20Hz to 800Hz; 3 fft bin averaging 800Hz to 3.7kHz; 4 bin averaging 3.7kHz to 16kHz. (5 fft bin averaging 8kHz to 16kHz was not successful - too many bits removed).
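
Spreading method 4 can be pictured with the sketch below. It is not the Delphi implementation: the band limits come from the description above, but the window alignment (starting at the current bin and extending upwards) and the use of the minimum average as the result are assumptions, and the function names are made up for illustration.

```python
def bins_to_average(freq_hz):
    """Number of FFT bins averaged at a given frequency for spreading method 4."""
    if 20 <= freq_hz < 800:
        return 2
    if 800 <= freq_hz < 3700:
        return 3
    if 3700 <= freq_hz <= 16000:
        return 4
    return None  # outside the 20Hz to 16kHz analysis range

def spread_minimum(bin_levels, sample_rate, fft_length):
    """Average neighbouring bins with a band-dependent width; return the minimum."""
    bin_hz = sample_rate / fft_length  # frequency spacing of the FFT bins
    averages = []
    for i in range(len(bin_levels)):
        width = bins_to_average(i * bin_hz)
        if width is None or i + width > len(bin_levels):
            continue  # skip bins outside the range or without a full window
        window = bin_levels[i:i + width]
        averages.append(sum(window) / width)
    return min(averages)
```

Narrow windows at low frequencies let a single quiet bin pull the result down more strongly, which is why fewer bits are removed than with a uniform 3 or 4 bin average.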

Code:
lossyWAV alpha v0.3.9 : .....WAV file bit depth reduction method by 2Bdecided.
Transcoded to Delphi by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Options:

-1, -2 or -3  quality level (1:overkill, 2:default, 3:compact)
-nts <n>      noise_threshold_shift=n (-15.0<=n<=0.0, default -1.5dB)
              (reduces overall bits to remove, -1 bit = -6.0206dB)
-o <folder>   destination folder for the output file
-force        forcibly over-write output file if it exists.

Advanced Options:

-spread <n>   select spreading method : 0<=n<=4; default=0
              0 = fft bin averaging : 3 or 4 bins, (less agressive than orig.);
              1 = fft bin averaging : 3 bins below 3.7kHz, 4 bins above;
              2 = fft bin square mean root : 4 bins;
              3 = fft bin square mean root : 3 bins below 3.7kHz, 4 bins above
              4 = fft bin averaging : 2 bins from 20Hz to 800Hz; 3 bins from
                  800Hz to 3.7kHz; 4 bins from 3.7kHz to 16kHz.
-skew <n>     skew results of fft analyses by n dB (0.0<=n<=12.0, default=0.0)
              with a (sin-1) shaping over the frequency range 20Hz to 3.7kHz.
              (artificially decrease low frequency bins to take into account
              higher SNR requirements at low frequencies)

-dither       dither output using triangular dither; default=off
-noclip       clipping prevention amplitude reduction; default=off

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailled output mode
-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.

Reply #298
Wonderful. Thanks a lot.

Reply #299
I have been processing permutations with v0.3.10 (only faster than v0.3.9) and -spread 4 seems to be a candidate for default spreading function. However, I feel that the 800Hz / 3.7kHz / 8kHz intermediate steps might need to be moved to more suitable points in the frequency range between 20Hz and 16kHz.

Another thing I need advice with is licensing - portions of the code are (heavily modified) LGPL, so LGPL seems to be the way to go, however, I don't know exactly what I need to add to the .exe or license.txt file to enact it. As well as that, the method is David Robinson's implementation of an idea - all I have done is transcode and tweak a bit.