lossyWAV Development, WAV bit reduction by 2BDecided
halb27
post Nov 27 2007, 09:48
Post #551





Group: Members
Posts: 2446
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015



QUOTE (Axon @ Nov 27 2007, 08:53) *
I don't see a use for a quasi-lossy bitrate reduction of a lossless format, if the reduction is known to produce artifacts in reasonable configurations. If people are able to ABX this in a wide variety of different modes, that doesn't give me much confidence in using lossyWAV at all, no matter what the settings. If I can deal with a probabilistic chance of artifact audibility, why not stay with lossy?

a) We have a system of lossyWAV + a lossless codec, which together make up a lossy codec.
So you do lossy encoding when using lossyWAV, and the good and bad of the procedure must be measured against that of other lossy codecs, which is of course a very subjective thing when comparing lossy codecs of very good quality.
b) AFAIK nobody has ever experienced an artifact, even with our lowest quality mode -3. Quality was extremely good from the very start. You're welcome to do some listening tests and report back.
QUOTE (Axon @ Nov 27 2007, 08:53) *
This doesn't seem like the sort of algorithm that lends itself to tuning.

??? For a very long period we had great quality, but at a bitrate of ~500 kbps on average. Since then we have investigated and optimized David Bryant's idea of averaging the FFT outcome according to the length of the critical bands, and we differentiate this depending on FFT length. We've optimized the -skew parameter, where a rather high -skew value does an extremely good job of distinguishing the spots in the music which have to be handled defensively from those which don't. We've introduced the -snr parameter, which complements the differentiation work of -skew. We've found a solution to the theoretical clipping issue. We've improved the way the FFT analyses cover the lossyWAV blocks, for safety's sake.
So we ended up with an average bitrate of ~350 kbps for -3, with no known quality issues at all. -2 and -1 IMO vary the internals enough to make them promising for the cautious-minded of various kinds.
As a consequence, IMO the only really useful option apart from the quality parameter is -nts. Personally, however, I wouldn't mind if the advanced options were kept even in the final release, as long as they are clearly marked as such (maybe hidden in the command-line help, but documented in the external documentation).

This post has been edited by halb27: Nov 27 2007, 10:10


--------------------
lame3100m -V1 --insane-factor 0.75
Nick.C
post Nov 27 2007, 11:00
Post #552


lossyWAV Developer


Group: Developer
Posts: 1815
Joined: 11-April 07
From: Wherever here is
Member No.: 42400



QUOTE (Axon @ Nov 27 2007, 06:53) *
Forgive me for asking a fundamental (and admittedly critical) question; I'm very late to this particular party. Before I start, I must say this idea (and all the work that has gone into it) is incredible, and I would not hesitate to use it once the kinks are ironed out. From the original post by 2BDecided:
QUOTE
This isn't about psychoacoustics. What you can or can't hear doesn't come into it. Instead, you perform a spectrum analysis of the signal, note what the lowest spectrum level is, and throw away everything below it. (If this seems a little harsh, you can throw in an offset to this calculation, e.g. -6dB to make it more careful, or +6dB to make it more aggressive!).
I don't see a use for a quasi-lossy bitrate reduction of a lossless format, if the reduction is known to produce artifacts in reasonable configurations. If people are able to ABX this in a wide variety of different modes, that doesn't give me much confidence in using lossyWAV at all, no matter what the settings. If I can deal with a probabilistic chance of artifact audibility, why not stay with lossy?

This doesn't seem like the sort of algorithm that lends itself to tuning. If the technique is independent of psychoacoustics, then the only advanced setting that ought to exist is -skew.

Is that too harsh? Perhaps I'm being overly critical of beta code?
The beta nature only really reflects the status of the code with respect to the bug reports which will (probably) come in. This method / pre-processor was initially intended to allow the benefits of a lossy codec to be "wrapped" in a lossless codec. The method is David's; Halb27 and I have only implemented it in Delphi and added a few tweaks along the way.

At various points along the way, people have assisted with setting determination through personal ABX'ing of particularly problematic samples (Big thanks to Halb27, Shadowking, Wombat & Gurubooleez). Valued input has been made by 2Bdecided, Bryant, TBeck, Mitch 1 2, Josef Pohm, SebastianG, user, collector, Dynamic, GeSomeone, Robert, verbajim, [JAZ], BGonz808, M & Jesseq.

At the present time I don't think that the method is "known" to produce any artifacts with default settings (however if anyone can tell me differently, I would be very appreciative of the particular sample to try and iron it out).

Yes there have been very few individuals involved in ABX'ing / settings development, but I take it that that just means that this is a niche program only wanted by a few people.

From a purely personal perspective, I have found the drive to develop it through feedback from those who have made comments along the way, and from a desire to use lossyFLAC on my iPAQ (GSPlayer v2.25 & GSPFlac.DLL).

In keeping with David's wishes, the only command line options in the final revision will be quality levels -1, -2 & -3 and the -nts parameter (unless, as Halb27 has indicated, we leave the advanced options in the code but don't "advertise" them outside of the accompanying PDF / TXT file).

Why don't you give it a try? It's certainly robust enough to handle a Foobar2000 transcode of about 1500 files without falling over (the largest of which was circa 60 minutes).

This post has been edited by Nick.C: Nov 27 2007, 11:40


--------------------
lossyWAV -q X -a 4 --feedback 4| FLAC -8 ~= 320kbps
Synthetic Soul
post Nov 27 2007, 12:02
Post #553





Group: Super Moderator
Posts: 4887
Joined: 12-August 04
From: Exeter, UK
Member No.: 16217



QUOTE (Nick.C @ Nov 27 2007, 10:00) *
Yes there have been very few individuals involved in ABX'ing / settings development, but I take it that that just means that this is a niche program only wanted by a few people.
Personally, I have been following this thread avidly from the start, but I lack the ears to test very high quality lossy audio, or the expertise to offer technical advice or spark debate.

This gives me an opportunity to thank you all though for the work that you have put in. I think this is an extremely exciting development.

I think Axon's question was well worth asking: much of the discussion in this thread is - to complete laymen like myself - of a complex technical nature. Given that lossyWAV sits somewhere between high quality 'psychoacoustic' lossy and lossless quality, it is necessary to explain to the general masses what users can expect from this process.

Personally I have been considering a Wavpack lossy backup of my music for a while. It is possible that using lossyWAV as a pre-processor may be more suited to my needs (or whims).

Also, I cannot simply 'give it a try'. I am highly unlikely to find an issue. What I need to know is that people with excellent ears and technical knowledge can assure me that this process will create a near-perfect archive from which I can safely transcode to lossy for use on my DAP, or car stereo.

After re-reading my post I think I've just realised why we're called 'users'. smile.gif


--------------------
I'm on a horse.
Nick.C
post Nov 27 2007, 12:58
Post #554


lossyWAV Developer


Group: Developer
Posts: 1815
Joined: 11-April 07
From: Wherever here is
Member No.: 42400



QUOTE (Synthetic Soul @ Nov 27 2007, 11:02) *
QUOTE (Nick.C @ Nov 27 2007, 10:00) *
Yes there have been very few individuals involved in ABX'ing / settings development, but I take it that that just means that this is a niche program only wanted by a few people.
Personally, I have been following this thread avidly from the start, but I lack the ears to test very high quality lossy audio, or the expertise to offer technical advice or spark debate.

This gives me an opportunity to thank you all though for the work that you have put in. I think this is an extremely exciting development.

I think Axon's question was well worth asking: much of the discussion in this thread is - to complete laymen like myself - of a complex technical nature. Given that lossyWAV sits somewhere between high quality 'psychoacoustic' lossy and lossless quality, it is necessary to explain to the general masses what users can expect from this process.

Personally I have been considering a Wavpack lossy backup of my music for a while. It is possible that using lossyWAV as a pre-processor may be more suited to my needs (or whims).

Also, I cannot simply 'give it a try'. I am highly unlikely to find an issue. What I need to know is that people with excellent ears and technical knowledge can assure me that this process will create a near-perfect archive from which I can safely transcode to lossy for use on my DAP, or car stereo.

After re-reading my post I think I've just realised why we're called 'users'. smile.gif
Thanks are always appreciated.

I totally agree that the question is valid and requires an answer. Technically, I am not really the person to answer it, just the programmer.

Also, I will be using my lossyFLAC collection in tandem with my FLAC collection rather than replacing the latter with the former; essentially, lossyFLAC is my lossy transcode.

Until more ears have validated the current quality level settings, we're not going to be in the position to reassure new users of the quality of the output.


--------------------
lossyWAV -q X -a 4 --feedback 4| FLAC -8 ~= 320kbps
halb27
post Nov 27 2007, 13:21
Post #555





Group: Members
Posts: 2446
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015



QUOTE (Synthetic Soul @ Nov 27 2007, 13:02) *
Personally I have been considering a Wavpack lossy backup of my music for a while. It is possible that using lossyWAV as a pre-processor may be more suited to my needs (or whims).

Also, I cannot simply 'give it a try'. I am highly unlikely to find an issue. What I need to know is that people with excellent ears and technical knowledge can assure me that this process will create a near-perfect archive from which I can safely transcode to lossy for use on my DAP, or car stereo.

After re-reading my post I think I've just realised why we're called 'users'. smile.gif

The more I'm into audio compression, the more I think it's up to personal decisions (and personal a priori preferences) what codec and settings to use. Objective findings always have a limited scope.
My personal key event was the 128 kbps listening test of Lame 3.97, where Lame came out more or less on par with codecs like Vorbis. I have no doubt this test was done with great care, but I personally would never use 3.97 at a bitrate of 128 kbps (due to the 'sandpaper' noise and similar problems). Luckily 3.98 has overcome these problems, and is still improving.

So it's true that more listening experience by especially well-respected ears is most welcome, but IMO it's not a sine qua non thing. Technical knowledge can't assure transparency anyway.

So in the end what counts IMO is that all experience so far says everything is fine (we finally do have public experience, though we would like to get more). And of course any potential user must like the idea of being close to lossless (from the technical view of the overall procedure, which is not necessarily related to quality), and must not mind a bitrate of 350 kbps or higher. Otherwise he wouldn't use it.

As you have considered using WavPack lossy, you don't care about an extremely high bitrate, and you like the idea of the clean signal path associated with going a near-lossless way, because otherwise you would use very high quality Vorbis or similar. Using lossyWAV you're more or less in the same situation as if you used WavPack lossy. We can expect WavPack lossy high mode at 400 kbps using dynamic noise shaping to give transparent results in nearly any situation and non-annoying results even on the hardest stuff, and all this without real quality control so far. With lossyWAV the situation is the same (hopefully even better, due to the existing quality control, which can be said to have proved effective).

The main problem with very high quality codecs is: while it's easy to prove a codec has an issue by giving a sample, it's impossible to prove a codec is transparent in a universal sense. So in the end the most adequate attitude IMO, once very high quality is assured at least in a basic sense, is: don't worry as long as no counterexamples are given.

This post has been edited by halb27: Nov 27 2007, 13:34


--------------------
lame3100m -V1 --insane-factor 0.75
Synthetic Soul
post Nov 27 2007, 14:10
Post #556





Group: Super Moderator
Posts: 4887
Joined: 12-August 04
From: Exeter, UK
Member No.: 16217



Thank you both for your responses.

QUOTE (halb27 @ Nov 27 2007, 12:21) *
Technical knowledge can't assure transparency anyway.
If it's technical knowledge of a lossless operation then it can.

The techniques that are being used in lossyWAV are complete gibberish to me. In my limited understanding, though, what was originally proposed was the removal of near-useless bits from the WAVE, to make more efficient use of basic compression routines within the encoders (e.g. FLAC's wasted_bits). You speak below of "a clean signal path": this is really what I am discussing. If someone with a technical knowledge of the algorithms used can assure users that the resulting signal has merely had some negligible information removed with no further processing, then that to me would suggest that there was less room for a bug in the algorithm, or that the decision-making process was simpler and therefore less prone to erratic behaviour. I don't think I'm making myself clear. smile.gif

QUOTE (halb27 @ Nov 27 2007, 12:21) *
As you have considered using WavPack lossy, you don't care about an extremely high bitrate, and you like the idea of the clean signal path associated with going a near-lossless way, because otherwise you would use very high quality Vorbis or similar. Using lossyWAV you're more or less in the same situation as if you used WavPack lossy. We can expect WavPack lossy high mode at 400 kbps using dynamic noise shaping to give transparent results in nearly any situation and non-annoying results even on the hardest stuff, and all this without real quality control so far. With lossyWAV the situation is the same (hopefully even better, due to the existing quality control, which can be said to have proved effective).
Exactly.

QUOTE (halb27 @ Nov 27 2007, 12:21) *
The main problem with very high quality codecs is: while it's easy to prove a codec has an issue by giving a sample, it's impossible to prove a codec is transparent in a universal sense. So in the end the most adequate attitude IMO, once very high quality is assured at least in a basic sense, is: don't worry as long as no counterexamples are given.
Agreed. And, of course, such claims will be taken with a pinch of salt until a lot of testing has been undertaken. And of course, testing high quality encodes is not easy.


--------------------
I'm on a horse.
Nick.C
post Nov 27 2007, 14:18
Post #557


lossyWAV Developer


Group: Developer
Posts: 1815
Joined: 11-April 07
From: Wherever here is
Member No.: 42400



QUOTE (Synthetic Soul @ Nov 27 2007, 13:10) *
If someone with a technical knowledge of the algorithms used can assure users that the resulting signal has merely had some negligible information removed with no further processing, then that to me would suggest that there was less room for a bug in the algorithm, or that the decision-making process was simpler and therefore less prone to erratic behaviour. I don't think I'm making myself clear. smile.gif
As I have an implicit knowledge of the workings of the 3 main procedures involved in the process (having transcoded them from Matlab > Delphi > IA-32 Assembler) I will work on a process flow explanation.


--------------------
lossyWAV -q X -a 4 --feedback 4| FLAC -8 ~= 320kbps
Synthetic Soul
post Nov 27 2007, 15:08
Post #558





Group: Super Moderator
Posts: 4887
Joined: 12-August 04
From: Exeter, UK
Member No.: 16217



QUOTE (Nick.C @ Nov 27 2007, 13:18) *
As I have an implicit knowledge of the workings of the 3 main procedures involved in the process (having transcoded them from Matlab > Delphi > IA-32 Assembler) I will work on a process flow explanation.
I would be very interested to read a non-technical explanation of the processes involved; however I feel awful for increasing your workload.

Please only do so if you believe that other users will also need it to make their decision.

Thanks again.


--------------------
I'm on a horse.
halb27
post Nov 27 2007, 15:15
Post #559





Group: Members
Posts: 2446
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015



QUOTE (Synthetic Soul @ Nov 27 2007, 15:10) *
... If someone with a technical knowledge of the algorithms used can assure users that the resulting signal has merely had some negligible information removed with no further processing, then that to me would suggest that there was less room for a bug in the algorithm, or that the decision-making process was simpler and therefore less prone to erratic behaviour. ...

Yes, that's what makes the procedure attractive to me too, though I'm afraid we won't get that kind of assurance from the process alone.
I can try to describe the procedure from my understanding which isn't perfect at all:

As you write, the basic idea is to form (now) 512-sample blocks and decide for each block how many of the least significant bits not to use (i.e. to set to 0). Lossless codecs like FLAC can make use of the reduced number of bits per sample in these blocks, and to be effective the block size of the lossless codec should be identical to the lossyWAV block size (or an integer multiple of it, in case the lossless codec works more efficiently overall with longer blocks). FLAC works fine with a blocksize of 512.
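As a small illustration (made-up Python, not lossyWAV or FLAC source code, and the sample values are invented), here is why bits that lossyWAV has set to 0 translate directly into bits that a lossless codec with wasted-bits detection, such as FLAC, no longer has to store:

CODE
def common_wasted_bits(block):
    """Count the low-order bits that are zero in every 16-bit sample of the block."""
    acc = 0
    for s in block:
        acc |= s & 0xFFFF          # view each sample as a 16-bit word and OR them together
    if acc == 0:
        return 16                  # digital silence: every bit position is zero
    wasted = 0
    while acc & 1 == 0:            # count the trailing zero bits common to the whole block
        acc >>= 1
        wasted += 1
    return wasted

# Example: clear the 5 least significant bits of each sample (lossyWAV actually
# rounds rather than truncates, but the stored result looks the same to the codec).
block = [(s >> 5) << 5 for s in (12345, 2048, 672, 31000)]
print(common_wasted_bits(block))   # -> 5, i.e. 5 fewer bits per sample to store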

The usual 16-bit accuracy of wave samples is needed mainly to represent low-volume passages in the music accurately and to allow for a good dynamic range. At moderate to low volume far fewer than 16 bits are used for signal representation (that's why lossless codecs yield a good compression ratio in these cases). At high volume the entire 16 bits are usually not needed either. Roughly speaking, loud passages need a certain number of the higher-value bits (while the lower-value bits can be zero), and quieter passages need a certain number of the lower-value bits (while the higher-value bits are zero). That's the main background of the method: we target the louder passages and reduce the accuracy of representation there.
Dropping a certain number of least significant bits means adding noise to the original. This added noise is not necessarily perceived as the kind of analog noise/hiss known from, for instance, tape recordings.

So the main thing is to decide how many least significant bits to drop. From a bird's eye view, the frequency spectrum of the 512-sample block is calculated and the frequency region with the lowest energy is found. The idea is to preserve this energy and not let it get drowned in the added noise; this is done by keeping sample accuracy high enough, looking up the minimum energy level in a table that tells how many bits can be removed depending on energy level and frequency. The table was found a priori by examining the behaviour of white noise with respect to our purposes.

The real process is a bit more complicated, letting several FFTs do the frequency spectrum analysis according to what they're best at: short FFTs respond to quickly changing signals but have very restricted resolution at low to medium frequencies, while long FFTs give good frequency resolution but don't respond very quickly. Nick.C has done a good job in letting the FFTs cover the lossyWAV blocks very accurately - more than was done originally.
Moreover, so that high accuracy isn't forced by pure chance, a certain amount of averaging is done over the outcome of the FFT analyses. A lot of tuning has been done on this in order to achieve good quality at a relatively low bitrate.
A large sensitivity bias is given to the low to medium frequency range by using the -skew and -snr options. This is done in analogy to the way the usual transform codecs give priority to the accurate representation of low to medium frequencies. The improvement in quality control from -skew is so strong that we have decided a noise threshold of +6 is sufficient for -3 (in the a priori theory -nts should be 0).
For -2 we also default to the slightly positive -nts 2, and only with -1 do we use a defensive -2. Other than that, the quality levels differ mainly in how they do the FFT analyses. With -3 we use 2 different FFT lengths for each block, -2 uses 3 different FFT lengths, and -1 uses a total of 4 FFT lengths. Moreover, the averaging of the FFT results is done in an increasingly defensive way going from -3 to -1.

After deciding how many least significant bits to remove (set to 0), the samples of the lossyWAV block are rounded to the corresponding values. This rounding can lead to clipping, but we have found a solution to avoid it (by simply dropping fewer bits in the block until no clipping occurs).
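To make the per-block decision a little more concrete, here is a rough Python sketch of it as described above. It is illustrative only: the names are invented, and threshold_table is a stand-in for lossyWAV's precalculated tables (which come from the white noise analysis mentioned earlier). Roughly speaking, each removed bit raises the added noise floor by about 6 dB, and that is essentially what such a table encodes.

CODE
import numpy as np

def analyse_block(block, threshold_table, fft_lengths=(64, 256), nts_db=0.0):
    """Return how many least significant bits may be zeroed for this block."""
    block = np.asarray(block, dtype=float)
    candidates = []
    for n in fft_lengths:                                        # short and long analyses
        for start in range(0, len(block) - n + 1, n // 2):       # overlapping windows
            spectrum = np.abs(np.fft.rfft(block[start:start + n] * np.hanning(n)))
            min_level_db = 20 * np.log10(spectrum.min() + 1e-12) + nts_db
            candidates.append(threshold_table(n, min_level_db))  # stand-in: level -> removable bits
    return min(candidates)                                       # the most cautious analysis wins

def remove_bits(block, bits):
    """Round samples to multiples of 2^bits; retry with fewer bits if rounding clips."""
    block = np.asarray(block, dtype=float)
    while bits > 0:
        step = 1 << bits
        rounded = np.round(block / step) * step
        if rounded.max() <= 32767 and rounded.min() >= -32768:   # stays within the 16-bit range
            return rounded.astype(np.int16), bits
        bits -= 1                                                # rounding clipped: be less aggressive
    return block.astype(np.int16), 0

The real lossyWAV differs in many details (the FFT windows also overlap into the neighbouring blocks, the spectrum is skewed and spread before the minimum is taken, and the table is indexed per FFT length), but the shape of the loop - analyse, look up, take the minimum, round, back off on clipping - is the same.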

Hope that helps.

This post has been edited by halb27: Nov 27 2007, 15:53


--------------------
lame3100m -V1 --insane-factor 0.75
Mitch 1 2
post Nov 27 2007, 15:28
Post #560





Group: Members
Posts: 31
Joined: 3-October 06
From: Australia
Member No.: 35904



I've started a new wiki article here. The article is incomplete and probably inaccurate. It is also in need of a "technical details" section, possibly along the lines of what you posted above.


--------------------
lossyFLAC (lossyWAV -q 0; FLAC -b 512 -e)
halb27
post Nov 27 2007, 15:35
Post #561





Group: Members
Posts: 2446
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015



QUOTE (Mitch 1 2 @ Nov 27 2007, 16:28) *
I've started a new wiki article here. The article is incomplete and probably inaccurate. It is also in need of a "technical details" section, possibly along the lines of what you posted above.

Wonderful idea, good job.


--------------------
lame3100m -V1 --insane-factor 0.75
Nick.C
post Nov 27 2007, 16:22
Post #562


lossyWAV Developer


Group: Developer
Posts: 1815
Joined: 11-April 07
From: Wherever here is
Member No.: 42400



QUOTE (halb27 @ Nov 27 2007, 14:15) *
.....Nick.C has done a good job in letting the FFTs cover the lossyWAV blocks very accurately - more than was done originally.....
I don't think that that is the case, the original method overlaps the ends of the codec_block by half an fft_length and overlaps fft's by half an fft_length. The -overlap parameter overlaps the ends by half an fft_length and overlaps fft's by 5/8 of an fft_length.


--------------------
lossyWAV -q X -a 4 --feedback 4| FLAC -8 ~= 320kbps
Synthetic Soul
post Nov 27 2007, 16:36
Post #563





Group: Super Moderator
Posts: 4887
Joined: 12-August 04
From: Exeter, UK
Member No.: 16217



QUOTE (halb27 @ Nov 27 2007, 14:15) *
I can try to describe the procedure from my understanding which isn't perfect at all:
...
Hope that helps.
Yes. Thank you for your time. I'm slowly getting there. smile.gif

I'm not sure if you can answer this, and it may be better left for the documentation, but I am left wondering about the differences between -1, -2 and -3. Is -3 thought to be transparent in all known situations now? The obvious next question being: so why bother with -2 and -3?

I guess the same could be said with LAME -V0 and 320kbps CBR, but I'm expecting lossyWAV to have less of a grey area.

Personally, I'd like to hope that -2 (as default) is 'considered transparent until a problem sample can be found', -3 is overkill for the more paranoid amongst us, and -1 introduces a slight amount of risk. Apologies if the description of these presets has been discussed recently elsewhere.


--------------------
I'm on a horse.
Nick.C
post Nov 27 2007, 16:42
Post #564


lossyWAV Developer


Group: Developer
Posts: 1815
Joined: 11-April 07
From: Wherever here is
Member No.: 42400



QUOTE (Synthetic Soul @ Nov 27 2007, 15:36) *
QUOTE (halb27 @ Nov 27 2007, 14:15) *
I can try to describe the procedure from my understanding which isn't perfect at all:
...
Hope that helps.
Yes. Thank you for your time. I'm slowly getting there. smile.gif

I'm not sure if you can answer this, and it may be better left for the documentation, but I am left wondering about the differences between -1, -2 and -3. Is -3 thought to be transparent in all known situations now? The obvious next question being: so why bother with -2 and -3?

I guess the same could be said with LAME -V0 and 320kbps CBR, but I'm expecting lossyWAV to have less of a grey area.

Personally, I'd like to hope that -2 (as default) is 'considered transparent until a problem sample can be found', -3 is overkill for the more paranoid amongst us, and -1 introduces a slight amount of risk. Apologies if the description of these presets has been discussed recently elsewhere.
Exactly those last descriptions, but in reverse order: -1 = overkill; -2 = what you said; -3 = may (although not yet proven to) introduce a slight amount of risk.

The reason for -1 is that you may want to do other things with the output of lossyWAV; -2 is considered to be a very robust intermediate between -1 and -3; -3 is the "I want a lower bitrate and I want "acceptable" (rather than transparent) output" setting, which at the moment is better than its target.

My view of the process (a rough sketch of the skew / spread / threshold steps follows after the list):

Read WAV header from input file;
Write WAV header to output file;

Create reference_threshold tables for each fft_length for each bits_to_remove (1 to 32) - not required as precalculated data is used to re-create the surface for each window / dither combination (yes, it changes with both..... sad.gif) - This calculates the mean fft output from the analysis of the difference between the random noise signal and its bit_removed compatriot;

Create threshold_indices from selected reference_threshold table (window / dither combo) - basically, determine how many bits_to_remove for a given minimum dB value;

Read WAV data in a codec_block_size chunk (all channels at once) and for each channel:

Carry out FFT analyses (3 for 1024 sample fft on 512 codec_block_size up to 33 for a 64 sample fft on 512 codec_block_size) on each channel of the codec_block, for each fft_analysis:

Calculate magnitudes of FFT output (from complex number);

Skew magnitudes (currently -36dB at 20Hz to 0dB at 3545Hz, following a 1-sin(angle) curve where angle is the proportion of 1 given by (log(this_bin_frequency)-log(min_bin_frequency))/(log(max_bin_frequency)-log(min_bin_frequency))) by the relevant amount;

Spread skewed magnitudes using the relevant spreading function (e.g. 23358-...... means average 2 bins in the first zone, 3 in the second and third zones, 5 in the fourth zone and 8 in the fifth zone), retaining the minimum value and the average value of the skewed results;

minimum_threshold=floor(min(minimum_skewed_result+nts,average_skewed_result-snr));

Look up Threshold_Index table for the relevant fft_length to determine bits to remove for that particular fft_analysis;

When all fft_analyses for a particular codec_block are complete, determine the minimum bits_to_remove value and use that to:

Remove_bits: For each sample in each channel of the codec_block bit_removed_sample:=round(sample/(2^bits_to_remove))*(2^bits_to_remove). If in the remove_bits process a sample falls outwith the upper or lower bound then decrease bits_to_remove and start the remove_bits process again.

Write processed codec_block and repeat;

Close files and exit.
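Here is that rough sketch of the skew / spread / threshold steps, in Python. Illustrative only: the names are invented, the zone boundaries and group sizes are simplified stand-ins, and the skew curve assumes the 0..1 "angle" proportion maps onto 0..90 degrees.

CODE
import numpy as np

def skew_curve_db(freqs_hz, max_db=36.0, f_lo=20.0, f_hi=3545.0):
    """Attenuation per bin: -max_db at f_lo rising to 0 dB at and above f_hi."""
    f = np.clip(np.asarray(freqs_hz, dtype=float), f_lo, f_hi)
    angle = (np.log(f) - np.log(f_lo)) / (np.log(f_hi) - np.log(f_lo))
    return -max_db * (1.0 - np.sin(angle * np.pi / 2))           # 1-sin(angle) curve as described above

def spread(skewed_db, group_sizes=(2, 3, 3, 5, 8)):
    """Split the bins into five zones and average groups of 2/3/3/5/8 bins within them;
    return the minimum and the mean of all averaged values."""
    averages = []
    for zone, g in zip(np.array_split(np.asarray(skewed_db), len(group_sizes)), group_sizes):
        for i in range(0, len(zone) - g + 1, g):
            averages.append(zone[i:i + g].mean())
    averages = np.array(averages)
    return averages.min(), averages.mean()

def minimum_threshold(min_db, avg_db, nts=0.0, snr=0.0):
    """minimum_threshold = floor(min(minimum_skewed_result + nts, average_skewed_result - snr))"""
    return int(np.floor(min(min_db + nts, avg_db - snr)))

The bits_to_remove for one analysis then comes from looking this threshold up in the precalculated table for the relevant fft_length, and the minimum over all analyses of the codec_block is what finally goes into the remove_bits step.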

This post has been edited by Nick.C: Nov 27 2007, 17:48


--------------------
lossyWAV -q X -a 4 --feedback 4| FLAC -8 ~= 320kbps
Synthetic Soul
post Nov 27 2007, 17:20
Post #565





Group: Super Moderator
Posts: 4887
Joined: 12-August 04
From: Exeter, UK
Member No.: 16217



QUOTE (Nick.C @ Nov 27 2007, 15:42) *
Exactly those last descriptions, but in reverse order: -1 = overkill; -2 = what you said; -3 = may (although not yet proven to) introduce a slight amount of risk.
Excellent news. smile.gif I will have to spend some time reading your explanation as, on a quick skim, it still seems quite technical to me. Perhaps, as I try to comprehend it myself, I can suggest a n00b translation of your technical explanation that may help to produce the final documentation?

Anyway, the reason I came to post again: WOW!

I have tested lossyWAV previously, but - given the frequency of releases - have really been waiting for it to get to beta before testing fully.

I have just used it and FLAC on my TAK corpus, and am astounded by the savings, using the default settings.

CODE
File    FLAC    lossyWAV+FLAC    (bitrates in kbps)
===================================================
00 1054 376
01 728 366
02 765 390
03 1013 413
04 883 425
05 860 469
06 1084 455
07 981 419
08 1052 399
09 873 393
10 1026 511
11 853 367
12 834 422
13 1016 435
14 954 403
15 867 390
16 1068 397
17 861 376
18 787 442
19 909 394
20 1142 400
21 760 384
22 1022 410
23 1030 394
24 917 433
25 914 384
26 810 401
27 878 354
28 1040 449
29 912 442
30 895 419
31 913 411
32 1010 402
33 1018 397
34 831 429
35 939 410
36 1038 402
37 1084 439
38 825 381
39 999 413
40 1007 408
41 1037 505
42 1054 408
43 897 418
44 839 364
45 924 425
46 898 431
47 890 398
48 1014 414
49 999 412

Bloody good work gentlemen!

I am under the impression that I can also use TAK and WavPack already. I need to do some more reading to see what, if anything, I need to do to test these also.


--------------------
I'm on a horse.
halb27
post Nov 27 2007, 17:37
Post #566





Group: Members
Posts: 2446
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015



QUOTE (Nick.C @ Nov 27 2007, 17:22) *
QUOTE (halb27 @ Nov 27 2007, 14:15) *
.....Nick.C has done a good job in letting the FFTs cover the lossyWAV blocks very accurately - more than was done originally.....
I don't think that that is the case, the original method overlaps the ends of the codec_block by half an fft_length and overlaps fft's by half an fft_length. The -overlap parameter overlaps the ends by half an fft_length and overlaps fft's by 5/8 of an fft_length.

Oops, I thought the new overlapping was done throughout. So without the -overlap option FFT overlapping is done as before and it takes the -overlap option to do the new overlapping (we discussed something like 8 pages back)?

This post has been edited by halb27: Nov 27 2007, 17:37


--------------------
lame3100m -V1 --insane-factor 0.75
Nick.C
post Nov 27 2007, 17:39
Post #567


lossyWAV Developer


Group: Developer
Posts: 1815
Joined: 11-April 07
From: Wherever here is
Member No.: 42400



QUOTE (halb27 @ Nov 27 2007, 16:37) *
QUOTE (Nick.C @ Nov 27 2007, 17:22) *
QUOTE (halb27 @ Nov 27 2007, 14:15) *
.....Nick.C has done a good job in letting the FFTs cover the lossyWAV blocks very accurately - more than was done originally.....
I don't think that that is the case, the original method overlaps the ends of the codec_block by half an fft_length and overlaps fft's by half an fft_length. The -overlap parameter overlaps the ends by half an fft_length and overlaps fft's by 5/8 of an fft_length.
Oops, I thought the new overlapping was done throughout. So without the -overlap option FFT overlapping is done as before and it takes the -overlap option to do the new overlapping (we discussed something like 8 pages back)?
Yes, exactly - the new 5/8 fft_length overlapping system hasn't totally "sold" me on making it the default, but it is still a selectable option.

@Synthetic Soul - smile.gif Glad you like it sir - now, does it bear listening to? Oh, and which quality level was that?

@Axon - Thanks for stimulating a very interesting series of posts!

This post has been edited by Nick.C: Nov 27 2007, 17:53


--------------------
lossyWAV -q X -a 4 --feedback 4| FLAC -8 ~= 320kbps
Mitch 1 2
post Nov 27 2007, 17:55
Post #568





Group: Members
Posts: 31
Joined: 3-October 06
From: Australia
Member No.: 35904



QUOTE (Synthetic Soul @ Nov 28 2007, 02:20) *
Anyway, the reason I came to post again: WOW!

I have tested lossyWAV previously, but - given the frequency of releases - have really been waiting for it to get to beta before testing fully.

I have just used it and FLAC on my TAK corpus, and am astounded by the savings, using the default settings.
You ain't seen nothin' yet! You should try using lossyWAV -3 with FLAC -8 -b 512.

This post has been edited by Mitch 1 2: Nov 27 2007, 17:58


--------------------
lossyFLAC (lossyWAV -q 0; FLAC -b 512 -e)
Synthetic Soul
post Nov 27 2007, 18:17
Post #569





Group: Super Moderator
Posts: 4887
Joined: 12-August 04
From: Exeter, UK
Member No.: 16217



QUOTE (Nick.C @ Nov 27 2007, 16:39) *
@Synthetic Soul - smile.gif Glad you like it sir - now, does it bear listening to? Oh, and which quality level was that?
I've been casually listening to the files while testing, and of course can hear no discernable difference. Default settings for both lossyWAV (-2) and FLAC (-5).

I will soon be posting results for WavPack and TAK defaults also.

QUOTE (Nick.C @ Nov 27 2007, 16:39) *
@Axon - Thanks for stimulating a very interesting series of posts!
Indeed. I've not felt it was the time to get involved before now, but I think it's now time for us more casual testers to show our interest. smile.gif


--------------------
I'm on a horse.
Synthetic Soul
post Nov 27 2007, 18:30
Post #570





Group: Super Moderator
Posts: 4887
Joined: 12-August 04
From: Exeter, UK
Member No.: 16217



OK, here are my results for FLAC, WavPack and TAK on default settings:

CODE
Encoder          |  Command
===================================================================
FLAC 1.2.1       |  flac -b 512 <source>
WavPack 4.42a2   |  wavpack --merge-blocks --blocksize=512 <source>
TAK 1.0.2 Final  |  takc -e -fsl512 <source>

===============================================================
File  |    FLAC    Lossy  |    WavPack Lossy  |    TAK    Lossy   (kbps)
===============================================================
00 | 1054 376 | 1048 367 | 1034 360
01 | 728 366 | 728 374 | 708 359
02 | 765 390 | 766 395 | 742 378
03 | 1013 413 | 1013 421 | 997 406
04 | 883 425 | 880 421 | 867 413
05 | 860 469 | 858 491 | 798 445
06 | 1084 455 | 1077 458 | 1071 447
07 | 981 419 | 976 418 | 955 410
08 | 1052 399 | 1046 395 | 1040 391
09 | 873 393 | 871 401 | 823 372
10 | 1026 511 | 1029 524 | 1011 504
11 | 853 367 | 853 374 | 827 355
12 | 834 422 | 832 429 | 811 414
13 | 1016 435 | 1010 435 | 1000 425
14 | 954 403 | 948 402 | 927 396
15 | 867 390 | 864 397 | 841 380
16 | 1068 397 | 1066 400 | 1059 393
17 | 861 376 | 860 382 | 829 365
18 | 787 442 | 783 440 | 774 431
19 | 909 394 | 907 393 | 879 382
20 | 1142 400 | 1140 396 | 1130 394
21 | 760 384 | 767 390 | 740 370
22 | 1022 410 | 1014 408 | 1004 400
23 | 1030 394 | 1025 391 | 1022 385
24 | 917 433 | 913 444 | 888 423
25 | 914 384 | 910 381 | 884 371
26 | 810 401 | 811 404 | 784 383
27 | 878 354 | 871 366 | 855 346
28 | 1040 449 | 1033 459 | 1019 443
29 | 912 442 | 911 444 | 877 421
30 | 895 419 | 889 431 | 843 403
31 | 913 411 | 914 415 | 874 389
32 | 1010 402 | 1003 401 | 992 393
33 | 1018 397 | 1009 398 | 994 387
34 | 831 429 | 859 457 | 793 411
35 | 939 410 | 940 417 | 908 395
36 | 1038 402 | 1032 399 | 1027 393
37 | 1084 439 | 1088 453 | 1071 430
38 | 825 381 | 829 392 | 796 367
39 | 999 413 | 993 408 | 986 399
40 | 1007 408 | 999 405 | 990 398
41 | 1037 505 | 1029 516 | 1012 497
42 | 1054 408 | 1046 403 | 1035 395
43 | 897 418 | 901 426 | 882 408
44 | 839 364 | 830 377 | 798 354
45 | 924 425 | 920 425 | 909 414
46 | 898 431 | 899 435 | 881 426
47 | 890 398 | 882 393 | 875 384
48 | 1014 414 | 1006 412 | 997 401
49 | 999 412 | 992 409 | 984 400
==============================================================
Avg | 940 412 | 937 415 | 917 400


This post has been edited by Synthetic Soul: Nov 27 2007, 18:36


--------------------
I'm on a horse.
Josef Pohm
post Nov 27 2007, 18:47
Post #571





Group: Members
Posts: 40
Joined: 2-April 06
Member No.: 29099



QUOTE (Mitch 1 2 @ Nov 27 2007, 15:28) *
I've started a new wiki article here. The article is incomplete and probably inaccurate. It is also in need of a "technical details" section, possibly along the lines of what you posted above.

As your documentation reports which codecs support LossyWAV and which don't, the following is my experience about the missing ones.

MP4ALS and LPAC support LossyWAV very very well.

SHN should, but I didn't bother to actually check.

On the other side, unless I made some kind of mistake, in my tests APE, LA and ALAC didn't even appear to support wasted bits detection at all! OFR supports wasted bits, but I can't see a way for it to use a 512-sample frame size (nor, in my opinion, was OFR designed to work with such a small frame size).
Synthetic Soul
post Nov 27 2007, 18:54
Post #572





Group: Super Moderator
Posts: 4887
Joined: 12-August 04
From: Exeter, UK
Member No.: 16217



QUOTE (Mitch 1 2 @ Nov 27 2007, 16:55) *
You ain't seen nothin' yet! You should try using lossyWAV -3 with FLAC -8 -b 512.
Using -8 does little for my corpus by the looks of it. I've only tested the first 25 files so far, but it only takes the average bitrate from 933 to 930 for those files.

In fact, using lossyFLAC and encoding using -5 yields, on average, a file 43.90% the size of the standard FLAC, but with -8 it is merely 43.93% the size. wink.gif

Edit: Sorry, in my haste to test I have forgotten that I'm still using lossyWAV files processed using -2. Perhaps with -3 there is a more drastic improvement.

This post has been edited by Synthetic Soul: Nov 27 2007, 18:56


--------------------
I'm on a horse.
Nick.C
post Nov 27 2007, 20:44
Post #573


lossyWAV Developer


Group: Developer
Posts: 1815
Joined: 11-April 07
From: Wherever here is
Member No.: 42400



QUOTE (Josef Pohm @ Nov 27 2007, 17:47) *
QUOTE (Mitch 1 2 @ Nov 27 2007, 15:28) *
I've started a new wiki article here. The article is incomplete and probably inaccurate. It is also in need of a "technical details" section, possibly along the lines of what you posted above.
As your documentation reports which codecs support LossyWAV and which don't, the following is my experience about the missing ones.

MP4ALS and LPAC support LossyWAV very very well.

SHN should, but I didn't bother to actually check.

On the other side, unless I made some kind of mistake, in my tests APE, LA and ALAC didn't even appear to support wasted bits detection at all! OFR supports wasted bits, but I can't see a way for it to use a 512-sample frame size (nor, in my opinion, was OFR designed to work with such a small frame size).
As long as the target codec can work on a multiple of the lossyWAV codec_block_size it should be fine; alternatively, use -cbs xxx to set the lossyWAV codec_block_size to the same as the target codec's, or I could get off my behind and implement an -ofr parameter to specify codec-specific settings (as for WMALSL).

We may be early beta, but if anyone has any ideas as to improvements / additions / changes they might like to see, then let me know; you can PM me or e-mail me from here if you don't want to post publicly.

I am gratified to see that the code is quite robust, as the error reports have dwindled.... <avalanche!>

Mitch 1 2 is doing a great job with the wiki article; I should get round to my bit of it.


--------------------
lossyWAV -q X -a 4 --feedback 4| FLAC -8 ~= 320kbps
Axon
post Nov 27 2007, 21:33
Post #574





Group: Members (Donating)
Posts: 1985
Joined: 4-January 04
From: Austin, TX
Member No.: 10933



Thanks for the excellent responses.

I think I may not have stated my concerns accurately or completely in my first post. I was certainly wrong to assume that artifacts have been found in recent -1 and -2 tests. But my beef isn't quite with the existence of artifacts, or that the bit reduction process is necessarily obscure (although halb's and Nick's posts did a lot to explain them). It's that the entire design process of the algorithm seems obscure, and clarifying it (and potentially formalising it) would go a long way to help explain to users exactly what this is good for.

2BDecided's original post seemed to imply that the transparency of bit reduction can be solely proved based on one psychoacoustic principle: spectral masking below a noise floor. This appears to be one of the more fundamental results of psychoacoustics, and fairly hard limits on audibility can be determined a priori to listening tests, based on the literature.

This is the biggest advantage lossyWAV may have compared to other lossy formats. Most lossy encoders exploit multiple psychoacoustic effects to reduce bitrate while maintaining transparency. If one effect is exploited too aggressively, out of several effects being exploited in parallel, transparency is lost and an artifact is audible. But lossyWAV, if it only relies upon spectral masking, has only one point of failure, and one that is very well understood. The quantization distortion should not induce artifacts under any other psychoacoustic effect. That's an incredibly strong selling point to convince people to use lossyWAV for many, many applications.

But the sheer number of tunings that have occurred in the final product (regardless of whether or not they are eventually made available to the user) made me question how ironclad this advantage really is. It seems to me that the algorithm should be proven transparent prior to any listening tests, based entirely on signal processing principles and only minimal psychoacoustic principles (based only on masking the quantization noise with the background noise). But instead, the settings seem like they are based primarily on listening tests. Those are a correct testing method for lossy codecs, but for an encoder this agonizingly close to being able to be formally verified? The tunings have the slight air of a sausage factory behind them. The end result is tasty, but the means to the end are rather unsavory.

Perhaps lossyWAV has simply evolved to use slightly more psychoacoustic phenomena than a simple theory of spectral masking. That appears to be the justification for -skew and the spreading functions. Certainly, a tight argument can be made for taking into account the width of the critical bands to adjust the sensitivity of low/high frequencies. But it still seems like the other options are pretty much pulled out of a hat.

What would be ideal is if each step of the algorithm is shown to follow logically from critical band masking theory, or from a small finite set of psychoacoustic effects, and to show that the algorithm is immune to artifacts from other effects.

Perhaps I'm talking out of line by asserting that an algorithm like this can be formally verified?
Nick.C
post Nov 27 2007, 21:53
Post #575


lossyWAV Developer


Group: Developer
Posts: 1815
Joined: 11-April 07
From: Wherever here is
Member No.: 42400



QUOTE (Axon @ Nov 27 2007, 20:33) *
Thanks for the excellent responses.

I think I may not have stated my concerns accurately or completely in my first post. I was certainly wrong to assume that artifacts have been found in recent -1 and -2 tests. But my beef isn't quite with the existence of artifacts, or that the bit reduction process is necessarily obscure (although halb's and Nick's posts did a lot to explain them). It's that the entire design process of the algorithm seems obscure, and clarifying it (and potentially formalising it) would go a long way to help explain to users exactly what this is good for.

2BDecided's original post seemed to imply that the transparency of bit reduction can be solely proved based on one psychoacoustic principle: spectral masking below a noise floor. This appears to be one of the more fundamental results of psychoacoustics, and fairly hard limits on audibility can be determined a priori to listening tests, based on the literature.

This is the biggest advantage lossyWAV may have compared to other lossy formats. Most lossy encoders exploit multiple psychoacoustic effects to reduce bitrate while maintaining transparency. If one effect is exploited too aggressively, out of several effects being exploited in parallel, transparency is lost and an artifact is audible. But lossyWAV, if it only relies upon spectral masking, has only one point of failure, and one that is very well understood. The quantization distortion should not induce artifacts under any other psychoacoustic effect. That's an incredibly strong selling point to convince people to use lossyWAV for many, many applications.

But the sheer number of tunings that have occurred in the final product (regardless of whether or not they are eventually made available to the user) made me question how ironclad this advantage really is. It seems to me that the algorithm should be proven transparent prior to any listening tests, based entirely on signal processing principles and only minimal psychoacoustic principles (based only on masking the quantization noise with the background noise). But instead, the settings seem like they are based primarily on listening tests. Those are a correct testing method for lossy codecs, but for an encoder this agonizingly close to being able to be formally verified? The tunings have the slight air of a sausage factory behind them. The end result is tasty, but the means to the end are rather unsavory.

Perhaps lossyWAV has simply evolved to use slightly more psychoacoustic phenomena than a simple theory of spectral masking. That appears to be the justification for -skew and the spreading functions. Certainly, a tight argument can be made for taking into account the width of the critical bands to adjust the sensitivity of low/high frequencies. But it still seems like the other options are pretty much pulled out of a hat.

What would be ideal is if each step of the algorithm is shown to follow logically from critical band masking theory, or from a small finite set of psychoacoustic effects, and to show that the algorithm is immune to artifacts from other effects.

Perhaps I'm talking out of line by asserting that an algorithm like this can be formally verified?
Certainly not talking out of line, but it is beyond my limited knowledge - as I said, I'm just the programmer. The -skew and -spread (and -snr, I suppose) functions and settings have certainly been arrived at heuristically. I've worked up beta v0.5.2 (attached) Superseded... to allow the original concept settings to be implemented using a -0 parameter (as closely as possible, due to slight changes in the combined conv / spread function). Use -0 -clipping to emulate the original method settings, and -0 -fft 10101 -clipping to emulate the three-analysis version. -nts is the only other parameter available to you under the original method.

As to number of tunings, -fft, -nts, -snr, -skew and -spread are the only tunings used in the 3 default quality settings, others such as -clipping, -dither, -overlap, -window, -allowable are all defaulted to off.

I must stress that looking at the file sizes of the output of vanilla -0, I am fairly certain that artifacts will show in Atem_lied at the very least.

***** -0 is not a permanent quality setting, merely a response to a request. *****

This post has been edited by Nick.C: Nov 28 2007, 17:35


--------------------
lossyWAV -q X -a 4 --feedback 4| FLAC -8 ~= 320kbps
