Horrible performance of lossless codecs
Garf
post Jun 17 2003, 18:14
Post #1


Server Admin


Group: Admin
Posts: 4885
Joined: 24-September 01
Member No.: 13



http://sjeng.org/ftp/vorbis/Garf_Bl33p!.flac

Any idea why the performance of most lossless codecs is so horrible on this very simple signal?
I would expect prediction to be almost perfect for it.

But:

FLAC bitrate: 580 kbps
APE bitrate: 750 kbps
WavPack: also >500 kbps

ZIP bitrate: 21 kbps
7z/RAR bitrate: 2.5 kbps
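The sample is essentially a full-scale square wave. A hypothetical reconstruction (the 102/100-sample pattern is taken from NumLOCK's hex analysis later in the thread; the sample rate, channel count, and length are guesses):

```python
import os
import struct
import tempfile
import wave

# Hypothetical reconstruction of the test signal: a full-scale square wave
# alternating 102 samples of +32767 and 100 samples of -32768. The pattern is
# inferred from the hex dump discussed below; rate and length are guesses.
def make_bleep(path, cycles=1000, rate=44100):
    pattern = [32767] * 102 + [-32768] * 100
    samples = pattern * cycles
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit PCM
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return len(samples)

path = os.path.join(tempfile.gettempdir(), "bleep.wav")
print(make_bleep(path), "samples written to", path)
```

Feeding a file like this to flac/zip/rar should reproduce the disparity above, though the exact numbers depend on the guessed parameters.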
k.m.krebs
post Jun 17 2003, 18:39
Post #2





Group: Members
Posts: 61
Joined: 21-August 02
From: vancouver, b.c.
Member No.: 3145



I just tried it with optimfrog using the 'normal' compression setting, and the output size was 51.9 KB.


--------------------
k.m.krebs \ 833-45: http://833-45.net
rompel
post Jun 17 2003, 21:12
Post #3





Group: Members (Donating)
Posts: 37
Joined: 3-November 02
From: California
Member No.: 3678



QUOTE (Garf @ Jun 17 2003 - 09:14 AM)
http://sjeng.org/ftp/vorbis/Garf_Bl33p!.flac

Any idea why the performance of most lossless codecs is so horrible on this very simple signal?
I would expect prediction to be almost perfect for it.

But:

FLAC bitrate:580kbps
APE bitrate:750kbps
WavPack: also >500kbps

ZIP bitrate: 21kbps
7z/RAR bitrate: 2.5kbps

bzip2 bitrate: 131 bps

I think the problem is with the use of Rice coding. My understanding is that it was designed for geometrically (Laplacian-) distributed residuals, and it is decidedly sub-optimal when you have mostly zeros with an occasional +65535 or -65535.
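The mismatch is easy to quantify. A sketch with hypothetical residuals (one full-scale step per half-cycle, zeros elsewhere; not FLAC's actual residuals):

```python
import math

# Rice code length for unsigned value u with parameter k:
# unary quotient (u >> k), one stop bit, then k remainder bits.
def rice_bits(u, k):
    return (u >> k) + 1 + k

# zigzag-map signed residuals to unsigned, as Rice coders typically do
def zigzag(x):
    return 2 * x if x >= 0 else -2 * x - 1

# one cycle of hypothetical residuals: zeros except two full-scale steps
residuals = [0] * 200 + [65535, -65535]

# best total Rice cost over a range of parameters k
best = min(sum(rice_bits(zigzag(r), k) for r in residuals) for k in range(20))

# ideal entropy of the same symbol distribution, in bits
n = len(residuals)
probs = [200 / n, 1 / n, 1 / n]
entropy = -n * sum(p * math.log2(p) for p in probs)

print("best Rice total: %d bits, ideal entropy: %.1f bits" % (best, entropy))
```

Even with the best parameter, Rice coding spends two orders of magnitude more bits than the distribution's entropy, because every zero still costs at least k+1 bits.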

Good test sample!

--John
bryant
post Jun 17 2003, 22:59
Post #4


WavPack Developer


Group: Developer (Donating)
Posts: 1291
Joined: 3-January 02
From: San Francisco CA
Member No.: 900



Those last few percent of compression that the best lossless compressors get come at the cost of an enormous amount of complexity in the prediction algorithms. They look at dozens of previous samples and have dozens of adapting coefficients, so a single transition will generate a whole train of non-zero residual values to encode. A version that simply used the previous sample as the prediction would encode this sample much better, but would virtually never win on any sample of real music (in fact, WavPack's "fast" mode compresses this sample to about half the size of the "high" mode for this reason).

Also, no lossless audio compressor is going to take advantage of exactly repeating sequences of numbers the way a general data compressor does, because these never occur in audio data and require completely different coding algorithms (i.e. dictionary-based, not Rice coding).

An "ideal" compressor could try several different simple and complex algorithms to detect cases like this, but most people would not be willing to put up with the encoding time penalty unless it also improved performance on "real" samples.
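A toy illustration of the trade-off (neither predictor is WavPack's real algorithm; a 16-tap normalized-LMS filter stands in for the "complex" adaptive predictor):

```python
# Toy comparison: an order-1 "previous sample" predictor vs. a 16-tap
# normalized-LMS adaptive predictor on the square-wave sample.
pattern = [32767] * 102 + [-32768] * 100
signal = pattern * 20

# order-1 predictor: the residual is nonzero only at the two steps per cycle
res1 = [signal[i] - signal[i - 1] for i in range(1, len(signal))]
nonzero_simple = sum(1 for r in res1 if r != 0)

# 16-tap NLMS predictor: every step perturbs the coefficients, so each
# transition drags a train of nonzero residuals behind it
taps, mu = 16, 0.5
w = [0.0] * taps
nonzero_lms = 0
for i in range(taps, len(signal)):
    hist = signal[i - taps:i]
    pred = sum(wj * xj for wj, xj in zip(w, hist))
    err = signal[i] - pred
    if round(err) != 0:
        nonzero_lms += 1
    norm = sum(xj * xj for xj in hist) + 1e-9
    w = [wj + mu * err * xj / norm for wj, xj in zip(w, hist)]

print("order-1 nonzero residuals:", nonzero_simple)
print("16-tap NLMS nonzero residuals:", nonzero_lms)
```

The trivial predictor leaves a couple of spikes per cycle; the adaptive one leaves many more nonzero residuals, exactly the "train" described above.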

BTW, your sample scared my cat! smile.gif
jcoalson
post Jun 18 2003, 09:01
Post #5


FLAC Developer


Group: Developer
Posts: 1526
Joined: 27-February 02
Member No.: 1408



QUOTE (Garf @ Jun 17 2003 - 12:14 PM)
http://sjeng.org/ftp/vorbis/Garf_Bl33p!.flac

Any idea why the performance of most lossless codecs is so horrible on this very simple signal?
I would expect prediction to be almost perfect for it.

But:

FLAC bitrate:580kbps
APE bitrate:750kbps
WavPack: also >500kbps

ZIP bitrate: 21kbps
7z/RAR bitrate: 2.5kbps

I think Bryant's right. Note that for regular signals like this, tuning the blocksize in FLAC gains you a lot, e.g. "flac -8 --lax --blocksize=384" is about 169 kbps. But BWT compressors like bzip2 really shine on this signal; they'll probably use just a few bits to encode each cycle, plus a little for the dictionary.
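For intuition, a naive Burrows-Wheeler transform sketch (O(n^2 log n), nothing like bzip2's actual implementation, but fine for a toy input) on bytes shaped like this sample:

```python
# Naive BWT: sort all rotations, keep the last column. After the transform,
# equal bytes clump into a handful of long runs, which run-length plus
# entropy coding then crushes.
def bwt(data):
    t = data + b"\x01"  # sentinel byte, assuming it never occurs in the input
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return b"".join(r[-1:] for r in rotations)

# five cycles of the raw sample bytes (little-endian 16-bit PCM)
data = (b"\xff\x7f" * 102 + b"\x00\x80" * 100) * 5
out = bwt(data)

# count runs of equal bytes in the transformed output
runs = 1 + sum(1 for a, b in zip(out, out[1:]) if a != b)
print(len(out), "bytes in", runs, "runs")
```

Two thousand input bytes collapse into a few dozen runs at most, which is why block-sorting compressors need only a handful of bits per cycle.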

Josh
NumLOCK
post Jun 18 2003, 09:53
Post #6


Neutrino G-RSA developer


Group: Developer
Posts: 852
Joined: 8-May 02
From: Geneva
Member No.: 2002



Garf, I haven't looked at the WAV yet, but that FLAC file is very, very strange! It looks like a killer sample for RAR!

Original (FLAC) = 4'394'931 bytes
RAR (Best) => 80'162 bytes
RAR (Good) => 81'523 bytes
RAR (Normal) => 27'680 bytes blink.gif
RAR (Fast) => 55'240 bytes
RAR (Fastest) => 105'302 bytes

Just looking at the FLAC output with a hex editor shows that it's extremely redundant.


--------------------
Try Leeloo Chat at http://leeloo.webhop.net
NumLOCK
post Jun 18 2003, 10:05
Post #7


Neutrino G-RSA developer


Group: Developer
Posts: 852
Joined: 8-May 02
From: Geneva
Member No.: 2002



I win! B)

My packer project brings the .WAV down to 76 bytes.

Compression ratio: 1:139264 blink.gif


Atlantis
post Jun 18 2003, 10:07
Post #8





Group: Members
Posts: 250
Joined: 27-December 02
From: ROMA, Italy
Member No.: 4269



QUOTE (bryant @ Jun 17 2003 - 10:59 PM)
An "ideal" compressor could be made to try several different simple and complex algorithms to detect cases like this, but most people would not be willing to put up with the encoding time penalty unless it improved performance on any "real" samples.

QUOTE (NumLOCK @ Jun 18 2003 - 09:53 AM)
Just looking at the FLAC output with a hex editor shows that it's extremely redundant.


Couldn't this problem be avoided by adding a function to lossless codecs that checks whether the output file is redundant, as NumLOCK noted, and switches the compression mode used?

Just my 0,0000002 euros
smile.gif

edit = well, the problem of the time penalty would still be there...

This post has been edited by Atlantis: Jun 18 2003, 10:12


--------------------
Vital papers will demonstrate their vitality by spontaneously moving from where you left them to where you can't find them.
Peter
post Jun 18 2003, 10:08
Post #9


foobar2000 developer


Group: Admin
Posts: 3285
Joined: 30-September 01
Member No.: 84



QUOTE (NumLOCK @ Jun 18 2003 - 09:53 AM)
Garf, I haven't looked at the WAV yet, but that FLAC file is very very strange ! It looks like a killer sample for RAR !

Original (FLAC) = 4'394'931 bytes
RAR (Best) => 80'162 bytes
RAR (Good) => 81'523 bytes
RAR (Normal) => 27'680 bytes  blink.gif
RAR (Fast) => 55'240 bytes
RAR (Fastest) => 105'302 bytes

Just looking at the FLAC output with a hex editor shows that it's extremely redundant.

If we're into RAR killer samples, I remember seeing some files (it was some Unreal Tournament mod IIRC) where RAR miserably lost to ZIP, even to its own ZIP compressor. I can dig them up if anyone's interested.
NumLOCK
post Jun 18 2003, 10:11
Post #10


Neutrino G-RSA developer


Group: Developer
Posts: 852
Joined: 8-May 02
From: Geneva
Member No.: 2002



QUOTE (zZzZzZz @ Jun 18 2003 - 10:08 AM)
If we're into RAR killer samples, I remember seeing some files (it was some Unreal Tournament mod IIRC) where RAR miserably lost to ZIP, even to its own ZIP compressor. I can dig them up if anyone's interested.

Your file might make RAR's heuristics fail, and thus pick the wrong algorithm biggrin.gif

If you still have it I'd be interested, yes.


eltoder
post Jun 18 2003, 12:01
Post #11





Group: Members
Posts: 158
Joined: 16-May 03
From: nsk.su
Member No.: 6653



Sorry, Garf, but why did you upload FLAC instead of ZIP, then? rolleyes.gif

-Eugene


--------------------
The greatest programming project of all took six days; on the seventh day the programmer rested. We've been trying to debug the !@#$%&* thing ever since. Moral: design before you implement.
Tripwire
post Jun 18 2003, 12:02
Post #12





Group: Members
Posts: 156
Joined: 28-December 02
Member No.: 4272



QUOTE (Atlantis @ Jun 18 2003 - 01:07 AM)
QUOTE (NumLOCK @ Jun 18 2003 - 09:53 AM)
Just looking at the FLAC output with a hex editor shows that it's extremely redundant.


Couldn't this problem be avoided by adding a function to lossless codecs that checks whether the output file is redundant, as NumLOCK noted, and switches the compression mode used?

Had the same idea. Some ultrasuperduperhigh compression mode would run two compressors at once, the regular audio compressor and e.g. a ZIP compressor, and then store whichever output is smaller for each block.
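That hybrid could look something like this sketch (the container format is invented here, and zlib stands in for the ZIP compressor):

```python
import struct
import zlib

# Per-block "store whichever is smaller" sketch.
# Each block is written as: 1 tag byte (0 = raw, 1 = deflated),
# a 4-byte little-endian payload length, then the payload.
def hybrid_pack(data, blocksize=4096):
    out = bytearray()
    for i in range(0, len(data), blocksize):
        block = bytes(data[i:i + blocksize])
        packed = zlib.compress(block, 9)
        tag, payload = (1, packed) if len(packed) < len(block) else (0, block)
        out += bytes([tag]) + struct.pack("<I", len(payload)) + payload
    return bytes(out)

def hybrid_unpack(buf):
    out, i = bytearray(), 0
    while i < len(buf):
        tag = buf[i]
        n = struct.unpack("<I", buf[i + 1:i + 5])[0]
        payload = buf[i + 5:i + 5 + n]
        out += zlib.decompress(payload) if tag == 1 else payload
        i += 5 + n
    return bytes(out)

# square-wave-like PCM bytes compress enormously under the zlib branch
data = (b"\xff\x7f" * 102 + b"\x00\x80" * 100) * 100
packed = hybrid_pack(data)
assert hybrid_unpack(packed) == data
print(len(data), "->", len(packed))
```

On real music the zlib branch would almost never win, so the cost is mostly the wasted encoding time, exactly the penalty discussed above.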
thop
post Jun 18 2003, 12:31
Post #13





Group: Members
Posts: 18
Joined: 6-June 03
Member No.: 7034



QUOTE (eltoder @ Jun 18 2003 - 03:01 AM)
Sorry, Garf, but why did you upload FLAC instead of ZIP, then? rolleyes.gif

laugh.gif
anza
post Jun 18 2003, 12:59
Post #14





Group: Members
Posts: 1317
Joined: 4-January 03
From: Finland
Member No.: 4418



At least that sample made me understand the true performance of my dial-up modem (56k): biggrin.gif
Garf
post Jun 18 2003, 13:02
Post #15


Server Admin


Group: Admin
Posts: 4885
Joined: 24-September 01
Member No.: 13



Your modem uses compression; you could say it zips on the fly.

I didn't realize it zipped so well until after I uploaded it.
NumLOCK
post Jun 18 2003, 13:19
Post #16


Neutrino G-RSA developer


Group: Developer
Posts: 852
Joined: 8-May 02
From: Geneva
Member No.: 2002



Zero-knowledge compressors have a big advantage on this "audio" sample: they don't believe it's audio.

After seeing a short header, followed by a few alternations of:
FF7F (i.e. 32767 in hex, byte-order reversed) repeated 102 times
and:
0080 (i.e. -32768 in hex, byte-order reversed) repeated 100 times

... the packer will assume this continues for a long time.

For example, a well-tuned arithmetic coder can encode a whole alternation (2*102 + 2*100 bytes) in a matter of bits.

Using arithmetic coding, if you remove the WAV header you could encode this whole file in less than 10 bytes, and still be able to handle any input file (or unexpected data at the end of this file) correctly.

Even if you know the repeating sequence has a very high probability of appearing, you must still reserve that remaining 0.000001% of probability for all the other cases. That's also why the file takes 10 bytes, and not zero bits biggrin.gif

Huffman coding, on the other hand, can only assign whole bits, i.e. it can assign "0" to the most common sequence and "1xxxxxxxxxxxxxxx..." to all possible others.
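Putting illustrative numbers on that asymmetry (the repeat count and probability below are guesses, not measurements):

```python
import math

# Cost in bits of arithmetic-coding one "the pattern repeats again" event,
# given probability p, is -log2(p). All figures here are illustrative.
p = 1 - 1e-6       # model says "repeats", reserving a tiny escape probability
repeats = 26000    # roughly 10 MB of WAV / 404 bytes per alternation (a guess)

arith_bits = repeats * -math.log2(p)  # fractional bits accumulate very slowly
huffman_bits = repeats * 1            # Huffman can't spend less than 1 whole bit

print("arithmetic: %.3f bits total" % arith_bits)
print("huffman:    %d bits total" % huffman_bits)
```

The arithmetic coder spends a tiny fraction of a bit on the whole file, so the output is dominated by the header and the few escape cases, while a Huffman coder is stuck at one bit per symbol no matter how skewed the model is.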

This post has been edited by NumLOCK: Jun 18 2003, 13:25


eltoder
post Jun 18 2003, 13:29
Post #17





Group: Members
Posts: 158
Joined: 16-May 03
From: nsk.su
Member No.: 6653



Yes, and bzip2 is the winner here. It compressed the file to 983 bytes. WinRAR (with PPM forced) compressed it to 4755 bytes.

Such results are pretty obvious, and very similar to those of Short_Block_Test_2, which is very sparse as well: http://eltoder.nm.ru/temp/Short_Block_Test_2.res

-Eugene


NumLOCK
post Jun 18 2003, 13:37
Post #18


Neutrino G-RSA developer


Group: Developer
Posts: 852
Joined: 8-May 02
From: Geneva
Member No.: 2002



RAR 3.x can shrink the WAV further, to ~4756 bytes, when forcing Markov ("text") compression at maximum order (99).

Edit:
LOL eltoder, what did you do to RAR to gain that extra byte? laugh.gif
As for bzip2, cool, but that's still far from my 76 bytes.

This post has been edited by NumLOCK: Jun 18 2003, 13:53


eltoder
post Jun 18 2003, 14:00
Post #19





Group: Members
Posts: 158
Joined: 16-May 03
From: nsk.su
Member No.: 6653



QUOTE (NumLOCK @ Jun 18 2003 - 04:37 AM)
RAR 3.x can shrink the WAV further, to ~4756 bytes, when forcing Markov ("text") compression at maximum order (99).

Edit:
LOL eltoder, what did you do to RAR to gain that extra byte? laugh.gif
As for bzip2, cool, but that's still far from my 76 bytes.

The maximal order is in fact 63, not 99. And the best results are obtained at order 60 tongue.gif
And could you share your great program with us? wink.gif

-Eugene


NumLOCK
post Jun 18 2003, 14:09
Post #20


Neutrino G-RSA developer


Group: Developer
Posts: 852
Joined: 8-May 02
From: Geneva
Member No.: 2002



QUOTE (eltoder @ Jun 18 2003 - 02:00 PM)
The maximal order is in fact 63, not 99. And the best results are obtained at order 60  tongue.gif
And could you share your great program with us? wink.gif

-Eugene

Oh. I should have tried all possible values then laugh.gif

Well, why not.. (thanks for the compliment btw) but I'm a bit ashamed: it's awfully slow, and optimized for data, not audio (except for Garf audio, of course). Also, the names of the input and output files are hardcoded in the source.. rolleyes.gif

Edit: Btw I absolutely love your signature!

This post has been edited by NumLOCK: Jun 18 2003, 14:10


eltoder
post Jun 18 2003, 14:19
Post #21





Group: Members
Posts: 158
Joined: 16-May 03
From: nsk.su
Member No.: 6653



QUOTE (NumLOCK @ Jun 18 2003 - 05:09 AM)
it's awfully slow, and optimized for data, not audio (except for Garf audio of course). Also the names of the input and output files are hardcoded in the source.. rolleyes.gif

I absolutely need this program! laugh.gif

And I like my sig too, but now I think it's a bit long wink.gif

-Eugene


NumLOCK
post Jun 18 2003, 14:32
Post #22


Neutrino G-RSA developer


Group: Developer
Posts: 852
Joined: 8-May 02
From: Geneva
Member No.: 2002



Ok, I'll send it to you tonight wink.gif

Be advised though: to compress such files well I had to hack the probability modelling curve.. so you (or we) would have to find the parameters again.

Also, it's a bit-based program (it compresses bits, not bytes). However, it uses heuristics and still tries to exploit byte, word, and dword alignments when possible.

The probability estimation part is a mess; it uses variable-length PPM, hashtables, dynamic decay curves, etc. It looks more like a nuclear physics simulator than a packer. I think a complete rewrite should be done ASAP laugh.gif

IIRC, I have an improved arithmetic coding backend, and crazy ideas lying around, that I could use in the new project (when time allows).

Edit: I'll have a bit more time when my laptop's back from Shinjuku-ku with a new hdd.

This post has been edited by NumLOCK: Jun 18 2003, 14:40


JamesBond
post Jun 18 2003, 15:22
Post #23





Group: Members
Posts: 25
Joined: 3-August 02
Member No.: 2921



SBC Archiver 0950 beta:

sbc.exe c -m3 -b63 newarchive Garf_Bl33p!.wav

compresses to 217 bytes... smile.gif

This post has been edited by JamesBond: Jun 18 2003, 15:23
NumLOCK
post Jun 18 2003, 15:36
Post #24


Neutrino G-RSA developer


Group: Developer
Posts: 852
Joined: 8-May 02
From: Geneva
Member No.: 2002



Very impressive. It blasts away RKIVE, 777... even though the latter uses arithmetic coding.

Edit: SBC is one of the first archivers to combine block sorting with arithmetic coding wink.gif

This post has been edited by NumLOCK: Jun 18 2003, 15:50


eltoder
post Jun 18 2003, 15:50
Post #25





Group: Members
Posts: 158
Joined: 16-May 03
From: nsk.su
Member No.: 6653



All of them do. Can you imagine PPM or BWT without arithmetic coding? wink.gif

-Eugene


