Compressing audio by analyzing it as an image, was: "New compression idea?"
justsomeguy
post Aug 8 2010, 05:49
Post #1





Group: Members
Posts: 12
Joined: 30-December 02
Member No.: 4322



I am by no means an expert on audio compression but I just was thinking about compressing audio by analyzing it as an image.

Just as a test I created a 1-bit image that was 441 x 65536 in size: 441 to represent 1/100th of a second at 44.1 kHz and 65536 for the amplitude range. I drew a 1-pixel line straight across so that each individual pixel represented a sample point for the audio (I could have drawn a sine wave, but a straight line was easier). So I had 1 pixel in each of the 441 columns that was white instead of black. Anyhow, I saved this as a PNG file (lossless) with a size of 3,803 bytes, then compressed this to .7z (lossless) and got a size of 337 bytes. So if my thinking is right, 441 x 65536 should cover all possible sample values in 1/100th of a second of audio with a size of 337 bytes, which would be 337 bytes x 100 x 60 = 2,022,000 bytes for 1 minute of audio, and unless I'm mistaken it would be lossless as well.

Ok I'm actually probably missing something here that makes this not possible so go easy on me if I'm totally off base here.
What do you think?
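
For anyone who wants to repeat the test, here is a minimal sketch of it in Python. It is not the exact procedure used above: it assumes a bit-packed bitmap with each 441-pixel row padded to 56 bytes, and it swaps in Python's `lzma` module for the PNG and .7z steps, so the byte counts will not match the ones quoted. The helper names are made up for illustration.

```python
import lzma
import struct

SAMPLES = 441                    # 1/100 s at 44.1 kHz
LEVELS = 65536                   # 16-bit amplitude range
ROW_BYTES = (SAMPLES + 7) // 8   # 441 bits padded to 56 bytes per row (assumption)

def samples_to_bitmap(samples):
    """Set one white pixel per column: row = amplitude, column = sample index."""
    bitmap = bytearray(LEVELS * ROW_BYTES)
    for col, level in enumerate(samples):
        bitmap[level * ROW_BYTES + col // 8] |= 0x80 >> (col % 8)
    return bytes(bitmap)

def bitmap_to_samples(bitmap):
    """Recover the samples by locating the white pixel in each column."""
    samples = [0] * SAMPLES
    for level in range(LEVELS):
        row = bitmap[level * ROW_BYTES:(level + 1) * ROW_BYTES]
        if not any(row):
            continue  # skip all-black rows quickly
        for col in range(SAMPLES):
            if row[col // 8] & (0x80 >> (col % 8)):
                samples[col] = level
    return samples

# The "straight line" signal: every sample sits at mid-scale.
flat = [32768] * SAMPLES
bitmap = samples_to_bitmap(flat)
pcm = struct.pack("<441H", *flat)   # the same 1/100 s as plain 16-bit PCM

image_lzma = len(lzma.compress(bitmap))
pcm_lzma = len(lzma.compress(pcm))

print("bitmap bytes:          ", len(bitmap))
print("bitmap after lzma:     ", image_lzma)
print("raw PCM bytes:         ", len(pcm))
print("raw PCM after lzma:    ", pcm_lzma)
print("estimate, bytes/minute:", 337 * 100 * 60)
```

The number to compare against is the raw PCM size: 441 samples at 2 bytes each is 882 bytes per 1/100th of a second, or 5,292,000 bytes per minute of mono audio, before any compression at all.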
washu
post Aug 8 2010, 07:08
Post #2





Group: Members
Posts: 135
Joined: 16-February 03
From: Ottawa
Member No.: 5032



Nope, won't work. No real signal is as simple as what you made. Make anything more complicated and it won't compress to 337 bytes.
Juha
post Aug 8 2010, 08:04
Post #3





Group: Members
Posts: 443
Joined: 14-February 07
From: EU-FIN
Member No.: 40610



After reading the thread subject, I was waiting for the word "Fractal" to be mentioned in the text ...

Since it's 'bout an idea ... here's one lossless method (at least for Windows):

- compress the hard drive using the built-in system feature

Juha

This post has been edited by Juha: Aug 8 2010, 08:17
[JAZ]
post Aug 8 2010, 11:09
Post #4





Group: Members
Posts: 1764
Joined: 24-June 02
From: Catalunya(Spain)
Member No.: 2383



Interesting.

I did a similar test, saving to a monochrome .bmp with a straight line. The size, as expected, was around 3.6 MB. Compressing it with .7z gave 739 bytes (so we can conclude that the PNG compression step was unnecessary).

With a slightly more complex drawing (a curve), the final .7z came to 1.3 KB. I believe that for a real signal (and stereo!), there would still be some compression compared to a .wav file, but I doubt that it could be better than standard lossless audio encoders. Also, the encoders/decoders would need to manage a notably bigger amount of data (1 sec of CD audio in this image format = 300 MB!)

7-Zip does dictionary encoding on the data, which is only effective here because, with lots of empty space, the image is quite predictable. The more the signal varies, the less predictable the bytes holding those single dots will be.
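
The sizes quoted above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it counts the packed pixel data of a 441 x 65536 1-bit image and ignores the BMP header and any row padding.

```python
WIDTH, HEIGHT = 441, 65536             # samples per 1/100 s, 16-bit amplitude levels

bitmap_bytes = WIDTH * HEIGHT // 8     # 1 bit per pixel, packed
per_second = bitmap_bytes * 100        # 100 such images per second, mono
pcm_per_second = 44100 * 2             # 16-bit mono PCM for the same second
ratio = per_second / pcm_per_second

print("one image:         ", bitmap_bytes, "bytes")
print("one second (image):", per_second, "bytes")
print("one second (PCM):  ", pcm_per_second, "bytes")
print("image / PCM ratio: ", ratio)
```

Each sample costs 65536 bits (8192 bytes) as an image column against 2 bytes as PCM, so the uncompressed image is 4096 times larger than the audio it represents.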
Juha
post Aug 8 2010, 11:52
Post #5





Group: Members
Posts: 443
Joined: 14-February 07
From: EU-FIN
Member No.: 40610



QUOTE
After reading the thread subject, I was waiting for the word "Fractal" to be mentioned in the text ...


... looks like "fractal audio coding" has been researched already - http://research.cs.queensu.ca/home/xiao/doc/Thesis.pdf

Juha
justsomeguy
post Aug 8 2010, 23:00
Post #6





Group: Members
Posts: 12
Joined: 30-December 02
Member No.: 4322



Well, I was bored and tired last night when I was thinking about it. Obviously a straight line would compress far more easily than an actual signal. So as another test I generated a text file of 441 x 65536 zeroes, then set a 1 in a random position for each of the 441 columns. Probably the worst-case signal I could think of. Well, the smallest I could get the final file after playing with different compressors was 4090 bytes, which is 24,540,000 bytes per minute; then double that for stereo without any kind of joint stereo concept.

So ya, not very efficient.
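
A rough sketch of that worst-case test, for reference. It assumes a bit-packed bitmap rather than a text file, uses Python's `lzma` module as a stand-in compressor, and seeds the random generator only so the run is repeatable, so the sizes it prints will differ from the 4090 bytes reported above.

```python
import lzma
import random
import struct

SAMPLES, LEVELS = 441, 65536
ROW_BYTES = (SAMPLES + 7) // 8          # 441 bits padded to 56 bytes per row (assumption)

rng = random.Random(0)                  # fixed seed, purely for repeatability
noise = [rng.randrange(LEVELS) for _ in range(SAMPLES)]

# One set bit per column, at a random row.
bitmap = bytearray(LEVELS * ROW_BYTES)
for col, level in enumerate(noise):
    bitmap[level * ROW_BYTES + col // 8] |= 0x80 >> (col % 8)

pcm = struct.pack(f"<{SAMPLES}H", *noise)   # the same samples as 16-bit PCM
raw_size = len(pcm)
image_size = len(lzma.compress(bytes(bitmap)))

print("raw PCM bytes:         ", raw_size)
print("compressed image bytes:", image_size)
print("4090 bytes per 1/100 s, per minute:", 4090 * 100 * 60)
```

The point of the comparison: 441 independent random 16-bit values carry 441 x 16 bits = 882 bytes of information, so no lossless scheme can average less than that on such input. The image route has to spend extra bytes describing where each dot sits in a huge empty canvas, and ends up larger than simply storing the samples.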
Zarggg
post Aug 9 2010, 04:11
Post #7





Group: Members
Posts: 547
Joined: 18-January 04
From: bethlehem.pa.us
Member No.: 11318



audio != text != image

Without even getting into the details, this much should already be apparent. They do not compress the same way.
justsomeguy
post Aug 9 2010, 07:23
Post #8





Group: Members
Posts: 12
Joined: 30-December 02
Member No.: 4322



yes I'm aware audio != text != image. I was just thinking of a different way of interpreting the audio. Such as grooves on a record or magnetic material in cassettes. You see a representation of audio as an image all the time like looking at a wave in audacity. Technically you should be able to create a program that could look at the actual wave form and produce the audio from that, like digital grooves. That's what I was thinking. Anyways doesn't matter, it would never be practical anyways.
Iain
post Aug 9 2010, 09:51
Post #9





Group: Members
Posts: 126
Joined: 16-August 03
Member No.: 8386



Computers do not know what the data is that they are compressing, it is just a bunch of numbers. So a mono audio signal is the same as a long 1 pixel high image as far as a computer is concerned. The best compression approach is determined by the information contained in the data, and that is something a human is best at determining.

A method that I just thought of that might be helpful for sample-based music, is to compare the current bar to the previous and encode the differences. So if the same drum samples are being used at a constant tempo there is some redundancy to be exploited.
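
A toy sketch of the bar-difference idea, under heavy assumptions: the "drum" and "melody" are synthetic, a "bar" is a hypothetical 4410 samples, and `zlib` stands in for a real entropy coder. None of this reflects how an actual audio encoder is built.

```python
import math
import struct
import zlib

RATE = 44100
BAR = RATE // 10   # hypothetical bar length, kept short for the demo

def drum(n):
    """Synthetic decaying 110 Hz thump, identical in every bar."""
    return int(8000 * math.exp(-n / 400) * math.sin(2 * math.pi * 110 * n / RATE))

def melody(n, freq):
    """Synthetic sine 'instrument' mixed on top of the drum."""
    return int(3000 * math.sin(2 * math.pi * freq * n / RATE))

def residual(current, previous):
    """Encode a bar as its sample-by-sample difference from the previous bar."""
    return [c - p for c, p in zip(current, previous)]

def packed_size(values):
    """Size after packing as 16-bit integers and compressing with zlib."""
    return len(zlib.compress(struct.pack(f"<{len(values)}h", *values), 9))

bar1 = [drum(n) + melody(n, 440) for n in range(BAR)]
bar2_same = list(bar1)                                       # exact repeat
bar2_mixed = [drum(n) + melody(n, 523) for n in range(BAR)]  # same drum, new note

exact = residual(bar2_same, bar1)
mixed = residual(bar2_mixed, bar1)
restored = [p + r for p, r in zip(bar1, mixed)]              # decoder side

print("bar as-is:              ", packed_size(bar1))
print("residual, exact repeat: ", packed_size(exact))
print("residual, changed note: ", packed_size(mixed))
```

When the bar repeats exactly, the residual is all zeros and costs almost nothing. As soon as anything else in the mix changes, the residual is no longer empty, even though the drum part itself is untouched.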

This post has been edited by Iain: Aug 9 2010, 09:55
Soap
post Aug 9 2010, 11:17
Post #10





Group: Members
Posts: 1013
Joined: 19-November 06
Member No.: 37767



QUOTE (Iain @ Aug 9 2010, 04:51) *
A method that I just thought of that might be helpful for sample-based music, is to compare the current bar to the previous and encode the differences. So if the same drum samples are being used at a constant tempo there is some redundancy to be exploited.

Perhaps you should read about how audio encoders work. You'd possibly enjoy it. The method you describe is simplistic compared to the tools they carry in their bag.


--------------------
Creature of habit.
Iain
post Aug 9 2010, 12:20
Post #11





Group: Members
Posts: 126
Joined: 16-August 03
Member No.: 8386



QUOTE (Soap @ Aug 9 2010, 03:17) *
Perhaps you should read about how audio encoders work. You'd possibly enjoy it. The method you describe is simplistic compared to the tools they carry in their bag.


I realise that audio encoders are very clever and use all kinds of methods. I was not aware that my suggestion was one of them, and in the spirit of this thread (novel compression ideas) I thought I would throw it out there regardless of its merit.
dhromed
post Aug 9 2010, 12:38
Post #12





Group: Members
Posts: 1287
Joined: 16-February 08
From: NL
Member No.: 51347



QUOTE (justsomeguy @ Aug 9 2010, 08:23) *
yes I'm aware audio != text != image. I was just thinking of a different way of interpreting the audio. Such as grooves on a record or magnetic material in cassettes. You see a representation of audio as an image all the time like looking at a wave in audacity.


PCM audio is exactly equivalent to a %samples%×1 greyscale image with a certain pixel bit depth. One could go for a 1-bit image that's 2^n high and %samples% wide like you did, but staying with a 1-D sample axis and a bit depth per sample makes other audio concepts (like filtering, noise, dither, antialiasing etc.) transferable to the image domain as well.
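
To make the equivalence concrete, here is a minimal sketch that stores 16-bit PCM as a one-row, 16-bit greyscale image in binary PGM (P5) format and reads it back. The function names are invented for this example, the parser only handles files written by this sketch, and signed samples are shifted by 32768 because PGM pixels are unsigned.

```python
import math
import struct

def pcm_to_pgm(samples):
    """Signed 16-bit samples -> binary PGM: width = len(samples), height = 1."""
    header = f"P5\n{len(samples)} 1\n65535\n".encode("ascii")
    # PGM with maxval above 255 stores 2 bytes per pixel, most significant first.
    pixels = struct.pack(f">{len(samples)}H", *(s + 32768 for s in samples))
    return header + pixels

def pgm_to_pcm(data):
    """Inverse of pcm_to_pgm; accepts only the exact layout written above."""
    magic, dims, maxval, pixels = data.split(b"\n", 3)
    width, height = map(int, dims.split())
    if magic != b"P5" or height != 1 or maxval != b"65535":
        raise ValueError("not a 1-row 16-bit PGM produced by this sketch")
    return [p - 32768 for p in struct.unpack(f">{width}H", pixels)]

# 1/100 s of a 440 Hz tone at 44.1 kHz.
tone = [int(20000 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(441)]
pgm = pcm_to_pgm(tone)

print("image bytes:", len(pgm))
print("round trip identical:", pgm_to_pcm(pgm) == tone)
```

Apart from the short text header, the image is exactly 2 bytes per sample, the same as the PCM it came from.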

You won't be able to paint any reasonable audio in Photoshop, though, because it has a size limit of 300,000 pixels, which at 44.1 kHz would amount to a few seconds of audio.

The waveform, as you may realise, is a plot of relative sound pressure generated by the audio signal. Like that, all other visual displays of sound, however accurate or intuitive, are interpretations of the 1s and 0s. A spectrogram is closest to how we perceive sound, but it's actually grossly inaccurate and utterly unsuitable as a "visual" method of storage*.



*) this doesn't mean that there aren't any programs that can generate audio from an image by interpreting it as a spectrogram. There was an interesting little experimental program I played with some years ago, but I can't remember the name. Something with a C.
db1989
post Aug 9 2010, 12:41
Post #13





Group: Super Moderator
Posts: 5275
Joined: 23-June 06
Member No.: 32180



QUOTE (Soap @ Aug 9 2010, 11:17) *
QUOTE (Iain @ Aug 9 2010, 09:51) *
Computers do not know what the data is that they are compressing, it is just a bunch of numbers. So a mono audio signal is the same as a long 1 pixel high image as far as a computer is concerned. The best compression approach is determined by the information contained in the data, and that is something a human is best at determining.

A method that I just thought of that might be helpful for sample-based music, is to compare the current bar to the previous and encode the differences. So if the same drum samples are being used at a constant tempo there is some redundancy to be exploited.

Perhaps you should read about how audio encoders work. You'd possibly enjoy it. The method you describe is simplistic compared to the tools they carry in their bag.

Simplistic 'on paper', maybe, but surely not in terms of computation?

Any samples will be mixed with other instruments, perhaps interpolated due to having originated in a different sample rate, etc. So an encoder (lossless or lossy, though I suppose the latter is slightly more feasible) can't just say "four of these".

It's not much more far-fetched than expecting an encoder to be able to compress, almost to nothing, a whole chorus that has the same notes/lyrics as the last; that can't be done, since the instrumental/vocal takes will be different, etc.

Iain actually hinted at this himself with "that is something a human is best at determining." The samples, section, etc. may sound the same to us, but the subtle differences may be enough to make lossless compression quite inefficient, i.e. they won't be the same to a computer. And it certainly wouldn't be lossless to encode such regions as "just put another sample/chorus that sounds (about) the same here" (and I imagine a lossy method for this is unlikely to emerge).

This post has been edited by dv1989: Aug 9 2010, 12:45
Cubist Castle
post Aug 9 2010, 12:52
Post #14





Group: Members
Posts: 16
Joined: 5-July 08
Member No.: 55323



QUOTE (dhromed @ Aug 9 2010, 12:38) *
*) this doesn't mean that there aren't any programs that can generate audio from an image by interpreting it as a spectrogram. There was an interesting little experimental program I played with some years ago, but I can't remember the name. Something with a C.

Coagula. Pretty fun.
Juha
post Aug 9 2010, 15:26
Post #15





Group: Members
Posts: 443
Joined: 14-February 07
From: EU-FIN
Member No.: 40610



Photosounder is a one-of-a-kind image-sound editing program. - http://photosounder.com/

Juha
SCOTU
post Aug 9 2010, 18:07
Post #16





Group: Members
Posts: 118
Joined: 9-July 10
Member No.: 82156



QUOTE (Iain @ Aug 9 2010, 04:51) *
Computers do not know what the data is that they are compressing, it is just a bunch of numbers. So a mono audio signal is the same as a long 1 pixel high image as far as a computer is concerned. The best compression approach is determined by the information contained in the data, and that is something a human is best at determining.


The computer itself may not know what the data is, but that doesn't mean it doesn't care. Music compression, video compression, image compression, and archive compression are all largely different because the data is interpreted as different things. Namely, the difference is in which similarities can be exploited. If you assume that the signal is just random, you have to work with a set of rules where you can't look for similarity too often. However, if you know that there are 5.1 channels of audio with large similarities between the channels, you can already start to look somewhere else. If you know it's an image, you can look at adjacent pixels for similarity. If you know it's a video, you can look at adjacent frames for motion changes.

It's perfectly valid to attempt to find different ways to find similarities, and different interpretations can offer that.

tl;dr: the computer does care what type of data something is, as it has specialized algorithms for compressing different types of data.
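
A small sketch of what "knowing it's audio" can buy, assuming the simplest possible predictor: guess that each sample equals the previous one and keep only the error. Real lossless audio coders are built around prediction of this general kind but use far better predictors; the function names here are made up for the example.

```python
import math

RATE = 44100

def first_order_residual(samples):
    """Predict each sample as the previous one and keep only the error."""
    prev = 0
    out = []
    for s in samples:
        out.append(s - prev)
        prev = s
    return out

def reconstruct(residual):
    """Undo the prediction by running a cumulative sum."""
    total = 0
    out = []
    for r in residual:
        total += r
        out.append(total)
    return out

def bits_needed(values):
    """Width of a fixed-size signed code able to hold every value."""
    return max(abs(v) for v in values).bit_length() + 1

# 1/100 s of a smooth 220 Hz tone: neighbouring samples are close together.
tone = [int(20000 * math.sin(2 * math.pi * 220 * n / RATE)) for n in range(RATE // 100)]
res = first_order_residual(tone)

print("bits per sample, raw:     ", bits_needed(tone))
print("bits per sample, residual:", bits_needed(res))
print("lossless:", reconstruct(res) == tone)
```

A general-purpose archiver has no reason to try this, since for arbitrary bytes the neighbour is no better a guess than anything else. For audio, adjacent samples are usually close, so the errors are small numbers that take fewer bits to store.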
