Topic: lossyWAV Development (Read 561681 times)

lossyWAV Development

Reply #25
I can't ABX, but don't have a quiet environment so please don't rely on me!

I haven't had chance to try your code, but the bitrates are comparable to the original code with ns=6. Look back in the original thread to see what halb27 could ABX at ns=6 - I think it was "furious". It might be worth trying.

I think resampling to 32k is the way to go for lower bitrates, if your DAP supports it and your ears can't hear it (I'm OK on both counts!).

Sorry I haven't had time to add anything constructive.

Cheers,
David.

Reply #26
Right - revised source (and 1 external function). It uses wavreadraw and wavwriteraw, which are not attached; basically they are wavread and wavwrite changed so that they don't convert the raw audio data into the +/- 1.0 range.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #27
I really like this idea of a preprocessor, and of course the near-lossless small FLAC files! But how can I use the MATLAB script? I don't have MATLAB and it seems impossible to get a trial. Could this preprocessor be turned into a foobar2000 DSP plugin by any chance, or a command-line program?
And why aren't wavreadraw and wavwriteraw attached?

Thanks
Bobby

Reply #28
Please attach wavreadraw and wavwriteraw so I can give this program a spin. I don't know how to code MATLAB to use raw WAV.

I'm a noob!

Reply #29
Wavread and wavwrite are copyrighted Matlab code and I will not post them - however they are easily modifiable - look for a section which multiplies (wavwrite) or divides (wavread) the audio data by 32767 or 32768 and insert a "%" before that line to "REM" it out - that will sort it for 16 bit audio. Oh, and save the functions to a different name or you will have broken the originals.
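
For anyone without MATLAB, the idea of "raw" reading and writing can be sketched in Python. This is only an illustration, not the wavreadraw/wavwriteraw functions the script uses: the names read_wav_raw16 and write_wav_raw16 are made up here, and only 16-bit PCM is handled. The point is simply that the samples stay as integers (-32768..32767) and are never divided down to the +/- 1.0 range.

```python
import io
import struct
import wave


def read_wav_raw16(fileobj):
    """Return (channels, sample_rate, samples) with samples as raw 16-bit integers."""
    with wave.open(fileobj, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = w.readframes(w.getnframes())
        count = len(frames) // 2
        # WAV PCM is little-endian; no scaling to +/- 1.0 is applied.
        samples = list(struct.unpack("<%dh" % count, frames))
        return w.getnchannels(), w.getframerate(), samples


def write_wav_raw16(fileobj, channels, sample_rate, samples):
    """Write raw 16-bit integer samples (interleaved) without any rescaling."""
    with wave.open(fileobj, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))


# Round trip through an in-memory file: the integers come back unchanged.
buf = io.BytesIO()
write_wav_raw16(buf, 2, 44100, [0, 1, -1, 32767, -32768, 1234])
buf.seek(0)
print(read_wav_raw16(buf))
```

Both functions also accept a file name instead of a file object, since wave.open takes either.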

Reply #30
Thanks. That was a bit of an oversight on my part

 

Reply #31
Realising that there are only so many parameters to be played with without destroying the audio quality of the output.......

I've been playing around with the spreading function - previously length=4 (i.e. [0.25,0.25,0.25,0.25]) - I've tried even numbers of length from 6 to 16 and am pleasantly surprised by the results. Atem_Lied attached for spreading function lengths of 8, 12 and 16 for your listening pleasure(?!).

[edit]
Following the processing of these samples (constant spreading_function_length with variable fft_length per analysis), I've started "playing about" with variable spreading_function_length with variable fft_length per analysis. There should be some processed results later tonight.
[/edit]

[edit2]
Right, samples attached:
.ssx1.flac is 3 analyses (1024,256,64 fft_lengths) with corresponding spreading_function_lengths 16,8,4;
.ssx2.flac is 3 analyses (1024,256,64 fft_lengths) with corresponding spreading_function_lengths 64,16,4.
[/edit2]
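
To make the spreading function lengths concrete, here is a small Python sketch. It assumes (this is not taken from the script itself) that the spreading function is a uniform averaging kernel run across the FFT bin magnitudes and that the lowest smoothed value is then used as the noise floor estimate. The bin values are invented for illustration.

```python
def uniform_spreading(length):
    """Uniform kernel, e.g. length 4 gives [0.25, 0.25, 0.25, 0.25]."""
    return [1.0 / length] * length


def spread(bins, kernel):
    """Moving average of the bin values, keeping only fully overlapping positions."""
    n = len(kernel)
    return [
        sum(k * b for k, b in zip(kernel, bins[i:i + n]))
        for i in range(len(bins) - n + 1)
    ]


# Hypothetical magnitudes with a narrow two-bin dip in the spectrum.
bins = [8, 8, 8, 8, 8, 8, 1, 1, 8, 8, 8, 8, 8, 8, 8, 8]

for length in (4, 8):
    floor_estimate = min(spread(bins, uniform_spreading(length)))
    print(length, floor_estimate)
```

In this toy case the longer kernel averages the narrow dip away, so the estimated floor comes out higher and more bits would be judged removable. That is at least consistent with the longer lengths giving smaller files and being easier to ABX, but it is a sketch under the stated assumptions, not a description of the actual script.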



<files removed - obsolete>

Reply #32
The ss12, ss16, ssx1 and ssx2 versions are easily abxable.
Not quite so with ss8 - it took me a lot of concentration. I guess that with 'normal', even concentrated, listening it will go unnoticed.

But what are the advantages over 2Bdecided's original approach? Are you attaining a significantly lower bitrate?
lame3995o -Q1.7 --lowpass 17

Reply #33
Thanks for the listening time!

The bitrate is coming down a fair amount. For the 41 samples in the set, all using triangular dither:

WAV=98.6MiB;
FLAC=56.9MiB;
2Bdecided's (fft_length=1024,64; codec_block_length=1024; spreading_function_length=4,4)=35.4MiB;
NIC .ss20 (fft_length=1024,64; codec_block_length=576; spreading_function_length=4,4)=34.0MiB;
NIC .ss30 (fft_length=1024,256,64; codec_block_length=576; spreading_function_length=4,4,4)=34.7MiB;
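
As a rough way of reading these totals, the sizes can be turned into ratios and approximate bitrates. The sketch below assumes the 41 samples are 16-bit / 44.1 kHz stereo (that is not stated above), ignores file header overhead, and uses made-up helper names.

```python
# Total sizes in MiB, as quoted above.
SIZES_MIB = {
    "WAV": 98.6,
    "FLAC": 56.9,
    "2Bdecided": 35.4,
    "NIC .ss20": 34.0,
    "NIC .ss30": 34.7,
}

# Assumption: CD-format source, i.e. 44100 Hz * 16 bit * 2 channels.
CD_KBPS = 44100 * 16 * 2 / 1000


def approx_kbps(name):
    """Average bitrate implied by the size ratio to the WAV total."""
    return SIZES_MIB[name] / SIZES_MIB["WAV"] * CD_KBPS


def saving_percent(name, reference):
    """How much smaller `name` is than `reference`, in percent."""
    return 100.0 * (1.0 - SIZES_MIB[name] / SIZES_MIB[reference])


for name in SIZES_MIB:
    print("%-10s %6.1f kbps  %5.1f%% smaller than FLAC"
          % (name, approx_kbps(name), saving_percent(name, "FLAC")))
```

On these figures .ss20 is a little under 4% smaller than 2Bdecided's settings, so the gain over the original approach is modest compared with the gain of either over plain FLAC.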

Revised script appended.

<files removed - obsolete>

Reply #34
OK. Today I was able to ABX all 3 versions: ss12, ssx1 and ss8.

What do you want now with all these attached files above?
Is troll-adiposity coming from feederism?
With 24bit music you can listen to silence much louder!

Reply #35
@ Nick.C:

I appreciate 2BDecided's and your work very much.
But if you keep producing a flood of variants, I think we're heading for a problem.
On the one hand, I'm afraid not many members will want to do listening tests on the 121st of your variants. What's worse, you may find a variant that produces a good Atem-lied encoding and saves 15% over 2BDecided's version, but what about general quality outside of Atem-lied?

IMO it would be best if you and 2BDecided worked together even more closely, following one specific approach which you both think is most promising, and provided various listening samples on which we can give you quality feedback.
Though a saving in bitrate is very welcome, the more important target at the moment IMO is robust, excellent quality. Don't worry, but so far it seems to me that an approach closer to 2BDecided's original one produces the more reliable results. Still, I think that if you bring both your ideas together something great will come out. Maybe it's not so appropriate to produce something that makes lossy FLAC encoding competitive with, say, wavPack lossy on bitrate. After all, we have wavPack lossy for that. But as FLAC is widely supported on music players, there is sense in having lossy FLAC files of extremely high quality that are significantly smaller than the lossless ones.

Reply #36
Apologies for "going through the permutations" on the various options available in the script. Simplistically, it comes down to:

2 or 3 analyses? (processing time implication, slight size increase on 3);
fixed or variable spreading_function_length? (smaller size on variable);
ELF on or off - still unproven.

So, the only ones that are likely to be "better" than .ss8.flac are .ss20, .ss21, .ss30 and .ss31, i.e. .ss(2 or 3 analyses)(0=fixed; 1=variable spreading_function_length).

From Halb27's and Wombat's comments earlier I would guess .ss20 or .ss31 are realistic candidates.

Having narrowed it down to two (or possibly one - .ss20, as it's the closest to the original concept), is it worth producing a set of selected samples for ABX? If so, which samples would you recommend of those previously mentioned in the main thread (or others...)?

Reply #37
I'd welcome it most if you could narrow it down to one, all the more so if that one is the closest to 2BDecided's original version, provided .ss31 doesn't offer a real chance of significantly improving on .ss20.

I propose
  • atem-lied
  • furious
  • keys (pointed to by shadowking)
  • triangle (pointed to by shadowking)
  • badvilbel
These are specific problem samples where problems with this kind of codec should be most obvious.
We should also have samples where 'normal' hiss is most prominent.
I know just
  • bruhns (given by guruboolez)
but we should have more samples.

Reply #38
I agree with Halb27. I don't have time to test all these modes and I don't know what happens at lower bitrates. Wavpack usually sounds good from 230k, but others I tested don't - shorten, rkau (violent bursts of noise etc). On the metamorphose sample I heard a similar phenomenon with the preprocessor.

I am happy with the original 2Bdecided method. People will be very suspicious of the idea of lossy FLAC etc. If we can from the start produce a near-lossless reduction that is virtually not *abxable* under any condition and as good as lossless from a practical point of view, then that will be more acceptable than another threshold that won't always hold. Once someone with lots of time and effort finds some fault, people will start spreading bad rumours that we are destroying lossless compression etc etc.

On the other hand 512k is not small, but it is still much smaller than lossless. If one desires an extremely high quality that holds up to anything, then that will be a new 'lossless' to the masses @ 512k. Size won't be the issue, but imperfection will.

So wavpack , optimfrog, flac @ 512k end-to-all quality is better than 350k - 99% perfect quality when you package the lossy mode with FLAC name.

Reply #39
.. So wavpack , optimfrog, flac @ 512k end-to-all quality is better than 350k - 99% perfect quality when you package the lossy mode with FLAC name. ..

Perfectly said.
Never thought about quality demands being higher for lossy .flac files but I think this is absolutely true.
A lossy flac file should be indistinguishable from the original with a probability of 1 (within the limits of making sure of that in practice).

Reply #40
Which prompts me to consider introducing a *negative* noise_threshold_shift value (say -1 or -2) to the parameter settings (i.e. slightly reduce the number of bits to remove).

Using 2 analyses, fixed length spreading function, NTS=-1, the sample set increases from 34.0MiB to 34.9MiB lossy flac (56.9MiB flac / 98.6MiB wav).

<files all now found - thanks to Halb27!>

Atem_Lied, Badvilbel, Bruhns, Furious, Keys & Triangle_2 attached - 2 analyses; fixed length spreading_function; NTS=-1.

Reply #41
I think there will be room for two or three settings only...

1. Transcode and multi-gen proof (or overkill option for cautious people). Re-encode it 20 times at this setting and it'll still be alright. Transcode it to anything and it'll still sound (about) as good as encoding straight from the original.

2. Normal. Chances of ABXing original from lossyFLAC normal should tend to zero, but it probably won't stand up to 20 generations of re-encoding.

3. Compact. Allows you to introduce known compromises to get the filesize down if you want to, e.g. resampling to 32kHz.


I have tried to deliver number 2 on that list. If it fails ABX with anything, then some of the parameters will need to be tightened up. So far it hasn't, but let's see. I never dreamed that someone would be as inventive as Nick in using these parameters to reduce the bitrate - I intended to use them to tweak the code to improve quality (if necessary).

I think it's obvious how to deliver number 1 - shift the noise threshold (already implemented) down and put in some extra checks (e.g. extra FFT size - already implemented, M/S checking - not yet implemented). Some of the extra checks might end up in number 2 anyway if it's ABXed - we'll see.

I believe Nick is trying to deliver number 3 on that list. To be honest, with a flat noise floor, I don't think there's much that can be done to deliver this. The noise floor is already pretty much where I think it should be - at the same level (or, if it's shifted, related to the level) of the minimum noise floor in the recording. If the existing calculation is wrong, and it puts noise above or below the existing noise level, then this should be fixed and integrated into number 2. The only extra steps you can take are to ignore stuff above a fixed frequency (already implemented by myself), or to take account of the MAF (already implemented by Nick). Anything else, however clever, must by definition be pushing the noise above the noise floor of the original recording. It may be audible, it may not - you'd need a psychoacoustic model to decide. However, I've already seen people ABX tracks with the noise threshold 6dB up (i.e. 1 more bit removed) so it doesn't seem that there's much room for improvement. There could be some - it depends on the signal, how much you want to lower the bitrate by, and how hard you're willing to work to do it.
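
The "6dB up = 1 more bit removed" relation can be written out as a small Python sketch. It is not the calculation in the script: the function name, the choice of reference level and the assumption that the noise threshold shift is expressed in dB are all mine. It only illustrates that each removed bit raises the added noise by 20*log10(2), about 6.02 dB.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB of noise per removed bit


def bits_to_remove(headroom_db, shift_db=0.0):
    """Whole bits that fit under the threshold.

    headroom_db: hypothetical distance (in dB) between the recording's
                 minimum noise floor and the noise of the untouched LSB.
    shift_db:    threshold shift, assumed here to be in dB; negative
                 values are more cautious.
    """
    return max(0, math.floor((headroom_db + shift_db) / DB_PER_BIT))


print(bits_to_remove(20.0))              # threshold as calculated
print(bits_to_remove(20.0, DB_PER_BIT))  # threshold one bit's worth higher
print(bits_to_remove(20.0, -2.0))        # threshold shifted down by 2 dB
```

Because only whole bits can be removed, a small negative shift changes the result only for blocks that were close to a bit boundary, which would fit with a modest overall size increase.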

What can deliver number 3 (at least for most signals) is to use a shaped noise floor, as suggested by SebG on page one of the original thread...

http://www.hydrogenaudio.org/forums/index....st&p=498376

This is basically what's described here...

http://telecom.vub.ac.be/Research/DSSP/Pub.../AES-2002-B.pdf
(there are other similar papers by the same authors)


I tried a cheat's version by designing the minimum phase noise feedback filter directly from the desired magnitude response (quite easy, and already built into MATLAB sig proc toolbox, though I'd coded it myself before I found this!), but that doesn't take account of the constraints of gain (which should average to unity on a log scale, if I understand it correctly), and needing the first filter coefficient to be 1. If scaling the coefficients to make the first coefficient be 1 also happens to result in a reasonable gain, it works well. Normally this won't happen, and you'll add tens of dB of extra noise!
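
For what it's worth, the two constraints mentioned here are linked: for a minimum phase filter the first coefficient equals exp(mean of log magnitude), so a response whose gain averages to unity on a log scale has a first coefficient of 1. The Python sketch below shows this with the standard cepstral (homomorphic) construction. It is not the method from the paper (which uses LPC) and not necessarily what was coded in MATLAB; it uses a naive DFT so that it runs without extra libraries, and the example response is invented.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT, good enough for a small illustration."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def min_phase_from_magnitude(mag):
    """mag: desired magnitude on a full, even-length FFT grid (symmetric, > 0)."""
    n = len(mag)
    log_mag = [math.log(m) for m in mag]
    mean_log = sum(log_mag) / n
    # Normalise so the gain averages to unity (0 dB) on a log scale.
    log_mag = [v - mean_log for v in log_mag]
    cep = [c.real for c in dft(log_mag, inverse=True)]  # real cepstrum
    # Fold the cepstrum onto its causal half to get the minimum phase version.
    folded = [0.0] * n
    folded[0] = cep[0]
    for i in range(1, n // 2):
        folded[i] = 2.0 * cep[i]
    folded[n // 2] = cep[n // 2]
    spectrum = [cmath.exp(v) for v in dft(folded)]
    return [h.real for h in dft(spectrum, inverse=True)]


# Invented example: magnitude of 3 * (1 - 0.5 z^-1) on a 64-point grid.
n = 64
mag = [3.0 * abs(1 - 0.5 * cmath.exp(-2j * math.pi * k / n)) for k in range(n)]
h = min_phase_from_magnitude(mag)
print([round(v, 6) for v in h[:4]])
```

The catch is that the normalisation moves the whole curve up or down, so the shape of the noise floor can be chosen but its absolute level cannot be chosen independently.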

So to make it work, I (or someone!) will have to implement what's described in that paper. I haven't worked on LPCs before, but they seem to describe a short cut, and I'll give it a go when I get chance.

Cheers,
David.

Reply #42
IMO the efficiency option 3 can be considered separately.
As you mentioned anyone who is out for smaller file size can achieve it right now by resampling to 32 kHz in advance (that's what I do with wavPack lossy).
This kind of noise shaping sounds interesting, but it's a new building block and can be done later.
At the moment it makes things more complicated and thus keeps us further away from what is needed most: that some nice guy comes along and creates an exe program from your idea.
Maybe it would help if you could provide a more detailed description of it that can be understood by a programmer without very detailed DSP knowledge.

Reply #43
... Currently hunting for those furious & bruhns - can't find them - they seem to have been removed. ...

Here they are:
[attachment=3578:attachment] [attachment=3579:attachment]

Reply #44
Many thanks!

Reply #45
...Atem_Lied, Badvilbel, Keys & Triangle attached - 2 analyses; fixed length spreading_function; NTS=-1. ...

Atem_lied: 9/10 (pretty hard for me to abx)
badvilbel: could not abx
keys: 8/10 (easier to abx than shown by the score - didn't catch the problem with my first two guesses)

triangle: guess I wasn't specific enough with the triangle sample I was thinking of. Thought of this one:
[attachment=3580:attachment]
I don't have the original of your triangle version.
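
As a side note on reading scores like these: the chance of reaching a given ABX score by pure guessing is a one-sided binomial tail, which a few lines of Python can compute. The function name is made up for this example.

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of at least `correct` right answers out of `trials` by guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


for score in (9, 8):
    print("%d/10: p = %.4f" % (score, abx_p_value(score, 10)))
```

So 9/10 would happen by chance about 1.1% of the time and 8/10 about 5.5% of the time; a longer run of trials would be needed to firm up the weaker result.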

Reply #46
IMO the efficiency option 3 can be considered separately.
As you mentioned anyone who is out for smaller file size can achieve it right now by resampling to 32 kHz in advance (that's what I do with wavPack lossy).
This kind of noise shaping sounds interesting, but it's a new building block and can be done later.
At the moment it makes things more complicated and thus keeps us further away from what is needed most: that some nice guy comes along and creates an exe program from your idea.
Maybe it would help if you could provide a more detailed description of it that can be understood by a programmer without very detailed DSP knowledge.


Last point first: I'd have thought that "a programmer without very detailed DSP knowledge" could work from the MATLAB code (and an FFT library) more easily than from a description. If there's anything confusing about the code, I'd be more than happy to help. I would stress that it's not optimised. It's there for people to find problem samples, and update it. However, I guess this will be much easier if it's an exe, so to solve the chicken and egg situation, an exe would be great!

The noise shaping will have to wait until someone has the time to do it anyway. It might end up in option 2 if it works well enough, or be switchable separately.

So yes, certainly, if anyone can take on the task of coding it properly, please go for it. Nick's code is clearer than mine, but I don't think the experimental quality reducing options should be included, unless they work.

Cheers,
David.

Reply #47
............but I don't think the experimental quality reducing options should be included, unless they work.


Neither do I - I'm only now realising the importance of maintaining excellent quality in any processing to be implemented and subsequently encoded in the flac format - the last thing I would want to do is adversely skew "public" opinion against flac due to a poor lossy implementation.

I will post a clean version of the script without any extraneous experimental gubbins - in the hope that someone can turn it into a usable binary.

Reply #48
... Bruhns, Furious, ....Triangle_2 ...

Bruhns: Did two sessions on two different spots that were suspicious to me and got 7/10 in each session.
            Very hard for me.
Triangle: Could not abx a difference.
Furious: Could not abx a difference.

So as far as my results on these samples go, the quality of your variant is very good, keeping in mind that these are hard problems for wavPack lossy and were suspected to be not easy for this preprocessor either.
A good candidate for 2BDecided's option 3 when it comes to that.

Reply #49
If you're up to some more listening, 2Bdecided originally added a third analysis as an "overkill" option. The other way to increase bitrate is to introduce a negative noise_threshold_shift. The attached samples were processed with 3 analyses, noise_threshold_shift=-2; triangular_dither; force_dither_lsb=1; fix_clipped automatically if necessary after bit reduction and rounding.


Revised script attached - no longer requires external amplitude function but still requires modified wavread/write functions.
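
For readers wondering what bit reduction with triangular dither looks like in practice, here is a generic Python sketch. It is not the script's code: the function name is invented, the simple clamp is only a stand-in for whatever fix_clipped actually does, and force_dither_lsb is not modelled at all.

```python
import random


def reduce_bits_tpdf(samples, bits_to_remove, rng, sample_bits=16):
    """Requantise integer samples to multiples of 2**bits_to_remove with TPDF dither."""
    step = 1 << bits_to_remove
    lo = -(1 << (sample_bits - 1))
    hi = (1 << (sample_bits - 1)) - 1
    hi_q = hi - (hi % step)  # largest representable multiple of step
    out = []
    for s in samples:
        # Difference of two uniform values gives a triangular PDF spanning +/- 1 step.
        dither = (rng.random() - rng.random()) * step
        q = round((s + dither) / step) * step
        # Stand-in for clipping repair: keep the value inside the legal range.
        q = max(lo, min(q, hi_q))
        out.append(q)
    return out


rng = random.Random(0)
print(reduce_bits_tpdf([-32768, -5, 0, 7, 1000, 32767], 3, rng))
```

Every output value is a multiple of 8 here, so the three lowest bits are always zero, which is what lets FLAC store the block with fewer bits per sample.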