Topic: lossyWAV Development (Read 559319 times)

lossyWAV Development

Reply #350
I am currently looking at what impact a spreading_function_length of 1 would have and how to implement it. It could be as simple as if FFT_length<256 then spreading_function_length=1. if 256 or 512 then 1,2,3,4. if 1024 or above then 2,3,4,5.

Wonderful, thank you. If this brings bits-to-remove down too far there is still room for compromise, especially for FFT_length < 256. I guess that for the high frequency range spreading_length need not be 1, even with short FFT lengths.
lame3995o -Q1.7 --lowpass 17

Reply #351
I added a final table to the bottom of the spreadsheet which takes the max(1,int(log2(number_of_bins_in_critical_band_width))) - this yields a sensible starting point.
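The max(1,int(log2(...))) rule can be sketched in Python as follows. This is an illustration only: the spreadsheet's own critical band formula is not given in the thread, so the sketch assumes Zwicker's approximation for critical bandwidth and a 44.1 kHz sample rate, and all function names are made up for the example.

```python
import math

SAMPLE_RATE = 44100  # assumption: CD-rate input


def critical_band_width_hz(freq_hz):
    """Zwicker approximation of critical bandwidth (assumed, not taken from the spreadsheet)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (freq_hz / 1000.0) ** 2) ** 0.69


def bins_in_critical_band(freq_hz, fft_length, sample_rate=SAMPLE_RATE):
    """Width of the critical band around freq_hz, measured in FFT bins."""
    bin_width_hz = sample_rate / fft_length
    return critical_band_width_hz(freq_hz) / bin_width_hz


def starting_spreading_length(freq_hz, fft_length, sample_rate=SAMPLE_RATE):
    """max(1, int(log2(number_of_bins_in_critical_band_width)))."""
    bins = bins_in_critical_band(freq_hz, fft_length, sample_rate)
    return max(1, int(math.log2(bins)))


if __name__ == "__main__":
    for fft_length in (64, 256, 1024):
        row = [starting_spreading_length(f, fft_length)
               for f in (100, 1000, 4000, 10000, 16000)]
        print(fft_length, row)
```

Under these assumptions a critical band at 100 Hz is much narrower than one bin of a 64 sample FFT, so the rule bottoms out at 1 there, while the high frequency end of a 1024 sample FFT comes out at several bins.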
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #352
Fine. This table shows under what circumstances 'Width of Critical Band Width in FFT Bins' is < 1, which is the most critical case IMO. It should be > 1 (better: >= 2); where 'Width of Critical Band Width in FFT Bins > 1' cannot be achieved, spreading_length should be 1.
This applies to the places where these requirements are not fulfilled at the moment. I'm not talking about making spreading length larger than 5 in the high frequency area with long FFTs, though to a cautiously chosen extent this may be possible, especially for -2 and more so for -3. That is something that can be considered later.
lame3995o -Q1.7 --lowpass 17

Reply #353
I see where you're coming from.

With respect to the matrix calculation mentioned earlier, please note the average bitrates for my 52 sample set, processed at quality level -2 with -SNR and -SKEW as the only other parameters.
Code:
BitRate   SNR=00  SNR=03  SNR=06  SNR=09  SNR=12  SNR=15  SNR=18  SNR=21  SNR=24  SNR=27  SNR=30
SKEW=00   468.4   468.4   468.4   468.4   468.4   469.2   471.4   476.2   483.2   494.7   508.7
SKEW=03   468.7   468.7   468.7   468.7   468.8   469.8   472.0   477.3   484.9   497.3   512.1
SKEW=06   468.9   468.9   468.9   468.9   469.0   470.3   472.8   478.5   486.9   499.9   515.5
SKEW=09   469.5   469.5   469.5   469.5   469.6   471.0   473.8   479.9   488.9   502.4   518.7
SKEW=12   470.1   470.1   470.1   470.1   470.2   471.8   474.9   481.4   491.1   505.1   522.1
SKEW=15   470.9   470.9   470.9   470.9   471.1   472.7   476.2   483.1   493.5   507.7   525.4
SKEW=18   471.9   471.9   471.9   471.9   472.1   473.9   477.6   484.8   495.9   510.2   528.7
SKEW=21   473.3   473.3   473.3   473.3   473.5   475.3   479.2   486.7   498.3   513.0   531.9
SKEW=24   475.2   475.2   475.2   475.2   475.4   477.0   481.3   488.9   500.9   515.6   535.1
SKEW=27   477.5   477.5   477.5   477.5   477.7   479.2   483.6   491.2   503.7   518.6   538.3
SKEW=30   480.5   480.5   480.5   480.5   480.6   482.0   486.4   494.0   506.6   521.7   541.6
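To put numbers on how mildly bitrate reacts, the relative increase over the SNR=00 / SKEW=00 corner can be worked out from a few cells of the table above. A small Python sketch (only a handful of cells are copied in; the dictionary layout and the helper name are just for illustration):

```python
# Average bitrates in kbps copied from the table above, keyed by (skew, snr).
BITRATE_KBPS = {
    (0, 0): 468.4,
    (12, 12): 470.2,   # the SKEW=12 / SNR=12 cell
    (24, 18): 481.3,
    (30, 30): 541.6,
}


def increase_percent(skew, snr, baseline=(0, 0)):
    """Bitrate increase relative to the baseline cell, in percent."""
    base = BITRATE_KBPS[baseline]
    return 100.0 * (BITRATE_KBPS[(skew, snr)] - base) / base


for key in sorted(BITRATE_KBPS):
    print(key, f"{increase_percent(*key):+.1f}%")
```

On these figures SKEW=24 with SNR=18 costs under 3% over the corner value, while SKEW=30 with SNR=30 costs more than 15%.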
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #354
So, judging from this table, a higher skew value than we have used so far isn't critical for bitrate as long as the snr value isn't chosen very high.
We're in a world of heuristics, but to me the skew option is more meaningful than the snr option.
So values up to, say, skew=21 or 24 and snr=18 are perfectly acceptable IMO for -1, judging from your table.
(Of course I have in mind keeping some headroom for the variable spreading function modifications.)
lame3995o -Q1.7 --lowpass 17

Reply #355
I think that the higher skew values increase bitrate on some samples, but not all, e.g. Atem_Lied.

I have re-written the spread procedure and it is now prepared to accept spreading_function_lengths which vary with fft_length, although I have not yet nailed down the exact relationship between fft_length  / bin frequency and spreading_function_length - that's a job for tomorrow. The price of the re-write is about 5% added to the process time.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #356
I wouldn't care about the 5% added processing time.

Sure, everybody is different, but as a first approximation I guess anybody who accepts the file size increase from ~200 kbps for a transform codec to ~450 kbps for this approach, in exchange for an expected extremely high quality, doesn't care very much about encoding speed (which is a two-stage process here anyway).
More speed is welcome, but everything is fine as long as processing time doesn't really hurt.
lame3995o -Q1.7 --lowpass 17

Reply #357
I've tried a first attempt at spreading which varies with every fft_length. Reference: FLAC=788.6kbps / 67.91MB

1st iteration: no averaging at 64 sample fft_length, -2 yields 619.6kbps / 53.36MB (64:1,1,1,1,1; 256:1,1,2,2,3; 1024:2,3,3,4,5).

2nd iteration : less conservative version, -2 yields 485.8kbps / 41.84MB (64:2,2,2,3,3; 256:2,2,3,3,4; 1024:2,3,3,4,5).

3rd iteration (64:2,2,2,2,2; 256:2,2,2,3,3; 1024:2,3,3,4,5) yields 510.3kbps / 43.95MB. This same iteration with "-nts 0" yields 491.7kbps / 42.35MB.

This in comparison with the current fixed spreading yields 470.2 kbps / 40.49MB.

I've decided to release the 3rd iteration as alpha v0.3.16 - attached. Superseded.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #358
I've tried a first attempt at spreading which varies with every fft_length. Reference: FLAC=788.6kbps / 67.91MB

When there is no averaging at 64 sample fft_length, -2 yields 619.6kbps / 53.36MB (64:1,1,1,1,1; 256:1,1,2,2,3; 1024:2,3,3,4,5).

A less conservative version (still more conservative than previous 2,3,3,4,5 for all fft_lengths) yields 485.8kbps / 41.84MB (64:2,2,2,3,3; 256:2,2,3,3,4; 1024:2,3,3,4,5).

Another iteration (64:2,2,2,2,2; 256:2,2,2,3,3; 1024:2,3,3,4,5) yields 510.3kbps / 43.95MB

This in comparison with the current fixed spreading yields 470.2 kbps / 40.49MB.

Thank you.
IMO this shows the routes that are not promising and those that are:

(64:1,1,1,1,1; 256:1,1,2,2,3; 1024:2,3,3,4,5): far too conservative, probably because spreading_length is too short in the mid and high frequency range.

(64:2,2,2,3,3; 256:2,2,3,3,4; 1024:2,3,3,4,5): this or a variation of this is a promising candidate IMO for a -1 spreading length strategy.

Do you mind trying: (64:1,1,2,3,4; 256:1,2,3,3,4; 1024:2,3,3,4,5)? I still care most about the very low frequency edge.

Just a question: What's your sample set? If it's regular music we should try to hold bitrate down. If it's problem samples we shouldn't care about bitrate going up. Ideally bitrate is kept rather low with regular music and increases significantly with problem samples (not necessarily individually but as classes of well- and bad-behaving samples).
lame3995o -Q1.7 --lowpass 17

Reply #359
Done - attached alpha v0.3.16b : 494.2kbps / 42.56MB. Superseded.

My sample set is:
Code:
04 - Black Sabbath - Iron Man.wav
06_florida_seq.wav
10 - Dungeon - The Birth- The Trauma Begins.wav
14_Track03beginning.wav
16_Track03entreaty.wav
18_Track04cakewithtea.wav
34_Gabriela_Robin___Cats_on_Mars.wav
41_30sec.wav
A02_metamorphose.wav
A03_emese.wav
Angelic.wav
annoyingloudsong.wav
aps_Killer_sample.wav
Atem_lied.wav
ATrain.wav
Bachpsichord.wav
badvilbel.wav
bibilolo.wav
BigYellow.wav
birds.wav
bruhns.wav
cricket__insect___edit_.wav
dither_noise_test.wav
E50_PERIOD_ORCHESTRAL_E_trombone_strings.wav
eig.wav
Furious.wav
glass_short.wav
harp40_1.wav
herding_calls.wav
jump_long.wav
keys_1644ds.wav
ladidada_10s.wav
Liebe_so_gut_es_ging.wav
Moon_short.wav
Poets_of_the_fall___Shallow.wav
rach_original.wav
rawhide.wav
Rush___Hold_Your_Fire___Turn_the_Page.wav
S13_KEYBOARD_Harpsichord_C.wav
S30_OTHERS_Accordion_A.wav
S34_OTHERS_GlassHarmonica_A.wav
S35_OTHERS_Maracas_A.wav
S53_WIND_Saxophone_A.wav
SeriousTrouble.wav
swarm_of_wasps__edit_.wav
thewayitis.wav
the_product.wav
triangle.wav
triangle_2_1644ds.wav
trumpet.wav
VELVET.wav
wait.wav
If you're worried about the low frequency range, use more -skew.....
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #360
Thank you. So as 494.2kbps is the result of (64:1,1,2,3,4; 256:1,2,3,3,4; 1024:2,3,3,4,5) I think that's very, very promising, and this is especially true as your sample set consists more or less of short problem samples.
With this in mind I guess it's even acceptable to go a bit more conservative (as a target for -1 when we're done), something like
(64:1,1,1,2,4; 256:1,1,2,3,4; 1024:1,3,3,4,5) - looking at your wonderful 'Width of Critical Band Width in FFT Bins' table more closely.

I'd love to go through the 51-song regular music collection I used before with this setting, if you can provide such a version. BTW, is the default for -skew and -snr still 12 for each?
lame3995o -Q1.7 --lowpass 17

Reply #361
Your wish is my command...... lossyWAV alpha v0.3.16c attached : 536.5 kbps / 46.20MB Superseded. Yes, -snr 12 -skew 12 is the default for all options. The spreading table is fixed (currently, this will change) for all quality levels.

Looking at the quality levels more carefully, maybe all 3 should use the 64/256/1024 sample fft analyses that -2 uses and the only other variables would be -snr, -skew, codec_block_size (512 samples for -3) and -nts.

Oh, and I realise that I was taking the lower of the two values, min(min_result,average_result-snr), and then adding the (negative) noise threshold shift to that. I've changed this to min(min_result+noise_threshold_shift,average_result-snr), which will reduce bitrate slightly but not carelessly. lossyWAV alpha v0.3.16d attached : 527.8kbps / 45.45MB. Superseded, although default spreading is the same in v0.3.18.
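The change to where the noise threshold shift is applied can be written out as a small Python sketch. The variable names follow the post; the function names and the sample numbers are invented for illustration, and all values are taken to be in dB.

```python
def threshold_old(min_result, average_result, snr, noise_threshold_shift):
    """Earlier behaviour: take the lower value first, then add the (negative) shift."""
    return min(min_result, average_result - snr) + noise_threshold_shift


def threshold_new(min_result, average_result, snr, noise_threshold_shift):
    """Changed behaviour: the shift is applied to the minimum only."""
    return min(min_result + noise_threshold_shift, average_result - snr)


# Hypothetical values: minimum 30 dB, average 40 dB, -snr 12 and a shift of -1.5 dB.
print(threshold_old(30.0, 40.0, 12.0, -1.5))  # 26.5
print(threshold_new(30.0, 40.0, 12.0, -1.5))  # 28.0
```

As long as the shift is not positive, the new result is never lower than the old one. Presumably that is what lets slightly more bits be removed, which would fit the drop from 536.5 kbps to 527.8 kbps.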
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #362
... lossyWAV alpha v0.3.16c attached : 536.5 kbps / 46.20MB. ...

Thank you. Appropriate result for your more-or-less problem sample set IMO. But behavior on regular music is important. I'll run this version on my regular music sample set tonight and will report tomorrow.
lame3995o -Q1.7 --lowpass 17

Reply #363
Results (average bitrate according to foobar) for my 50 (51 was wrong) regular song collection:

a) prior result I had a few weeks ago (don't remember the version but certainly with a fixed spreading_length of 4): 507 kbps.

b) result of 0.3.16d: 475 kbps.

c) For comparison result of 0.3.15: 425 kbps.

No special options specified.


So I think for -1 this is an adequate spreading length strategy (more exactly: a good start; fine tuning is necessary).

A closer look at fiocco, the sample guruboolez was on the verge of ABXing (his result was at least good enough to show that the fiocco quality should improve, though it was very good already):
guruboolez's version (I guess it was 0.3.1, but certainly a version with a fixed spreading_length of 4) result: 436 kbps.
0.3.16d result: 507 kbps. So it is reasonable to expect that the small remaining problem is gone this way. Of course it would be most welcome if guruboolez could confirm.
0.3.15 result for comparison: 472 kbps. Already a very good step in the right direction, and all the more remarkable as average bitrate generally came down when switching from a fixed spreading length of 4 to the variable spreading length.

As for fine tuning, judging from what we have got so far:
- if the goal is to keep average bitrate low, it is essential to keep spreading length relatively long at the high frequency edge. Luckily this can easily be done while still respecting the heuristic requirement that several bins (at least 1) should fall into each critical band.
- if the goal is to uphold the heuristic requirement that several bins should fall into each critical band (as far as that is possible at all), it is essential to keep spreading length low (usually 1) at the low frequency edge. Luckily, if done carefully, this doesn't seem to have an unacceptable impact on average bitrate.

So fine tuning (finding promising compromises) can be done with these considerations in mind, looking at the extreme ends, and especially with the target in mind that average bitrate of regular samples should be kept low while it is welcome if it goes up with problem samples. All of this, of course, within the restricted possibilities we have.

I very much welcome your idea to have a fixed fft analysis strategy (fft lengths of 64, 256, 1024) for any quality setting (as done with -2 so far).
That is sufficient IMO and makes fine tuning a lot easier.
For fine tuning purposes, can you provide spreading length options of the kind:
-spreading64 11234
-spreading256 12334
-spreading1024 23345
or similar.
This way anybody can try to find a promising spreading length strategy.
I'd love to search for such strategies for -1, -2, -3, and I wouldn't have to bother you with building new lossyWav versions for whatever comes to my mind.
lame3995o -Q1.7 --lowpass 17

Reply #364
I was thinking about this early this morning: it might be easier to implement a -spread parameter that takes a 15 character hexadecimal numeric input (would we ever exceed spreading_function_length=15?) and puts the results in the spreading_function table for each analysis length. This would be independent of the number of actual analyses (128=64, 512=256, 2048=1024). I'm very glad that the problem samples are improving while the average bitrate is not growing too much.

So, expect a new build with the possibility to use "-spread 112341233423345" to control the spreading function. Now, where's the cliParameter unit, I must rip it apart and rebuild it.......

Okay,  cliParameter unit duly ripped and rebuilt. There is an unexplained considerable slowdown of processing, but for evaluation of spreading functions it should be okay. lossyWAV alpha v0.3.17 attached. Superseded, slowdown "cured".
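How such a 15 character string could be unpacked is sketched below in Python. This is not the Delphi code from the cliParameter unit, just an illustration of the idea; `parse_spreading` and its error messages are hypothetical, and the rule that zero is not allowed is taken from the usage text.

```python
FFT_LENGTHS = (64, 256, 1024)   # one 5-entry spreading function per analysis length
ENTRIES_PER_FUNCTION = 5


def parse_spreading(spec):
    """Split a 15-character hex string into {fft_length: [5 spreading lengths]}."""
    expected = len(FFT_LENGTHS) * ENTRIES_PER_FUNCTION
    if len(spec) != expected:
        raise ValueError(f"expected {expected} hex characters, got {len(spec)}")
    values = [int(ch, 16) for ch in spec]   # int() raises ValueError for non-hex characters
    if 0 in values:
        raise ValueError("spreading lengths must be 1..F (zero excluded)")
    return {
        fft_length: values[i * ENTRIES_PER_FUNCTION:(i + 1) * ENTRIES_PER_FUNCTION]
        for i, fft_length in enumerate(FFT_LENGTHS)
    }


print(parse_spreading("112341233423345"))
```

With the example string this gives 1,1,2,3,4 for the 64 sample analysis, 1,2,3,3,4 for 256 and 2,3,3,4,5 for 1024, i.e. the same three functions as the separate -spreading64 / -spreading256 / -spreading1024 suggestion.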

Code:
lossyWAV alpha v0.3.17 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Options:

-1, -2 or -3  quality level (1:overkill, 2:default, 3:compact)
-nts <n>      set noise_threshold_shift to n dB (-15dB<=n<=0dB, default=-1.5dB)
              (reduces overall bits to remove by 1 bit for every 6.0206dB)
-snr <n>      set minimum average signal to added noise ratio to n dB;
              (0dB<=n<=48dB, default=12dB)
-skew <n>     skew fft analysis results by n dB (0db<=n<=48db, default=12dB)
              in the frequency range 20Hz to 3.45kHz
-spf <15hex>  manually input the 3 spreading functions as 3 x 5 hex characters;
              e.g. 444444444444444, default=111241123423345; Hex characters
              must be one of 1,2,3,4,5,6,7,8,9,A,B,C,D,E,F (zero excluded).
-o <folder>   destination folder for the output file
-clipping     disable clipping prevention by iteration; default=off
-force        forcibly over-write output file if it exists; default=off

Advanced / System Options:

-dither       dither output using triangular dither; default=off

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailled output mode

-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #365
Excellent. Thank you.
Means I will have a lot of (interesting) work this evening.

... This would be independent of the number of actual analyses (128=64, 512=256, 2048=1024). ...

Sorry, I don't understand this. Can you please explain it a bit?
lame3995o -Q1.7 --lowpass 17

Reply #366
Basically, you will need to input a 15 character hexadecimal string, regardless of how many analyses will actually be carried out at the specified quality level (-1 = 2048/1024/256/64 sample fft_length; -2 = 1024/256/64 sample fft_length; -3 = 1024/64 sample fft_length). What would happen is that the user always inputs 3 spreading functions and those three are mapped to 64, 256 and 1024 fft_length spreading. Then, copies are made into the spreading functions for 128, 512 and 2048 fft_length spreading functions.
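The mapping described above can be sketched in Python. The list of analyses per quality level and the copies from 64 to 128, 256 to 512 and 1024 to 2048 are taken from the post; the names `ANALYSES_BY_QUALITY`, `expand` and `functions_for_quality` are made up for the example.

```python
# fft_length analyses carried out at each quality level, as listed above.
ANALYSES_BY_QUALITY = {
    1: (2048, 1024, 256, 64),
    2: (1024, 256, 64),
    3: (1024, 64),
}

# Each fft_length the user cannot set directly reuses the function of the length below it.
COPY_FROM = {128: 64, 512: 256, 2048: 1024}


def expand(user_functions):
    """Add copies for 128, 512 and 2048 to the three user-supplied functions."""
    full = {n: list(f) for n, f in user_functions.items()}
    for target, source in COPY_FROM.items():
        full[target] = list(user_functions[source])
    return full


def functions_for_quality(user_functions, quality):
    """Spreading functions actually used at quality level 1, 2 or 3."""
    full = expand(user_functions)
    return {n: full[n] for n in ANALYSES_BY_QUALITY[quality]}


user = {64: [1, 1, 2, 3, 4], 256: [1, 2, 3, 3, 4], 1024: [2, 3, 3, 4, 5]}
print(functions_for_quality(user, 1)[2048])  # same entries as the 1024 function
```

One consequence of this layout is that whatever is chosen for the 1024 sample analysis is also applied to the 2048 sample analysis at -1; the two cannot be tuned separately.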
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #367
I imagined it to be like that - just wanted to make sure.
In this case the user doesn't have full control of the spreading length for every fft length.
If, for instance, it turns out to be important for the 1024 bin fft that there is a 1 in the spreading function, as in (1,3,3,4,5), the same would apply to the 2048 bin fft as well and might have a negative impact on bitrate.
These are dependencies which I'd prefer to see avoided.

I thought you wanted to be content with 3 analyses. So are you still thinking of using an fft length of 2048 for -1?
If yes, I'd prefer a 20 character hex string covering all fft lengths used (64, 256, 1024, 2048) in this order; you would ignore the 256 and 2048 parts if -3 is used, and ignore the 2048 part if -2 is used.
lame3995o -Q1.7 --lowpass 17

Reply #368
Changed to 20 hexchar string, 128 & 512 fft_length removed. I do want to move to only 3 analyses, just don't want to upset anybody..... lossyWAV alpha v0.3.18 attached. Superseded.
Code:
lossyWAV alpha v0.3.18 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Options:

-1, -2 or -3  quality level (1:overkill, 2:default, 3:compact)
-nts <n>      set noise_threshold_shift to n dB (-15dB<=n<=0dB, default=-1.5dB)
              (reduces overall bits to remove by 1 bit for every 6.0206dB)
-snr <n>      set minimum average signal to added noise ratio to n dB;
              (0dB<=n<=48dB, default=12dB)
-skew <n>     skew fft analysis results by n dB (0db<=n<=48db, default=12dB)
              in the frequency range 20Hz to 3.45kHz
-spf <4x5hex> manually input the 4 spreading functions as 4 x 5 hex characters;
              e.g. 44444-44444-44444-44444, default=11124-11234-23345-34456;
              Hex characters must be one of 1 to 9 and A to F (zero excluded).
-o <folder>   destination folder for the output file
-clipping     disable clipping prevention by iteration; default=off
-force        forcibly over-write output file if it exists; default=off

Advanced / System Options:

-dither       dither output using triangular dither; default=off

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailled output mode

-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
Test results for v0.3.18:

My 52 sample set: WAV: 121.53MB; FLAC: 68.2MB / 792.0kbps; -1: 46.42MB / 539.0kbps; -2: 45.45MB / 527.8kbps; -3: 38.88MB / 451.5kbps.

Guru's 150 sample set: WAV: 252.36MB; FLAC: 122.17MB / 683.2kbps; -1: 95.95MB / 536.5kbps; -2: 93.81MB / 524.6kbps; -3: 84.96MB / 475.1kbps.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #369
Wonderful. Thank you.
lame3995o -Q1.7 --lowpass 17

Reply #370
We're in a world of heuristics, but to me the skew option is more meaningful than the snr option.

As I understood it, SKEW is an "offset" to SNR that gives the low freqs (where we would more easily discern noise) a better snr. (With a stretch you could call it a form of noise shaping.)

So if you change SNR, this will impact the values where SKEW is applied too.

If I'm correct the effect on quality (==snr?) would be
- when you raise SKEW you (only) give better snr to the lower frequencies
- when you raise SNR and lower SKEW (at the same time) you (only) give the high freqs a better snr.

So choose where you want the extra quality... or just vary the SNR.

BTW, has anybody found that SKEW above 9 improves a problem sample?
In theory, there is no difference between theory and practice. In practice there is.

Reply #371
Well, the skew option is more meaningful to me than the snr option just because I have an idea of the effect of skew (though I don't really know how useful it is), whereas I personally don't really understand the idea behind snr. Maybe Nick can help.
I personally accept that we are partly doing a bit of rather wild experimenting, as long as this is done in a pretty conservative way that preserves the very good quality already achieved.
I have liked the idea of skew as IMO I have always seen too much averaging at the low frequency edge. Now that this is going to change due to variable spreading, maybe the skew option will partially lose its usefulness. For being conservative, however, especially with -1, skew may still be welcome.
I also see snr as serving conservatism, but because I lack insight so far my heart is more with skew.
Let's see what will come out.
lame3995o -Q1.7 --lowpass 17

 

Reply #372
To me, -snr is a safety net that calculates the average of all the relevant fft bins and then deducts the value (default=12) to derive a threshold value. If the minimum result of the relevant fft bins is below the threshold value then the minimum result is used, if above then the threshold value is used. It is easily disabled with -snr 0.
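A minimal Python sketch of that safety net, assuming the "relevant fft bins" are available as a list of dB values and that the average is a plain arithmetic mean of those values (the thread does not say how the averaging is done; the function name and the numbers are invented, and the noise threshold shift is left out):

```python
def snr_threshold(bin_results_db, snr_db=12.0):
    """Apply the -snr safety net to the relevant FFT bin results (in dB)."""
    minimum = min(bin_results_db)
    average = sum(bin_results_db) / len(bin_results_db)
    threshold = average - snr_db
    # Use the minimum if it is already below the threshold, otherwise the threshold.
    return min(minimum, threshold)


bins = [30.0, 50.0, 52.0, 48.0]                  # hypothetical values, average = 45 dB
print(snr_threshold(bins))                       # 30.0 (minimum is below 45 - 12 = 33)
print(snr_threshold([40.0, 50.0, 52.0, 48.0]))   # 35.5 (average 47.5 - 12 is lower)
print(snr_threshold(bins, snr_db=0.0))           # 30.0
```

With -snr 0 the threshold equals the average, which can never be below the minimum, so the minimum is always used; that is why the option is effectively disabled at 0.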
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #373
If that's all then they are not related, and I was wrong.  I must be mixing up -SNR with some other noise threshold.
In theory, there is no difference between theory and practice. In practice there is.

Reply #374
If you introduce a large -skew value then the minimum *may* be affected, but the average will definitely be affected as the fft results are skewed before the spreading / averaging is done.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)