FLACCL: CUDA-enabled FLAC encoder by Gregory S. Chudov (prev. FlaCuda), formerly "lossless codecs and CUDA"
Jul 13 2008, 05:34
Joined: 20-January 03
From: A Tropical Isle
Member No.: 4640
With my recent purchase of a 9000 series nVidia graphics card, I started thinking: has anyone investigated whether nVidia's CUDA could be useful for lossless compression? I'm not even remotely close to being a programmer, so I haven't a clue how the code works, but it seems like CUDA could be valuable for encoding/decoding. I know nVidia is already holding a contest to speed up LAME (which ends in about 2 weeks), so perhaps it could be used to speed up lossless compressors too? The fastest modes of several codecs are already blazing fast, approaching the limits of hard drives, but perhaps the high-compression modes could be sped up through CUDA. Maybe, if the speed-up is enough, developers could even implement more ways to gain compression while still maintaining good encoding rates. It would be pretty cool if compression levels like La's best could be done at 50x or something.
Anyway, my curiosity is large, so just thought I'd ask. :)
Sep 18 2009, 19:05
Joined: 2-October 08
Member No.: 59035
Is there anybody here who knows the math behind Cholesky decomposition used in ffmpeg as an alternative method of LPC coefficients search?
This method is too slow for the CPU, but I thought I'd give it a shot on the GPU.
The problem is, the GPU doesn't do double precision very well.
The lls code from ffmpeg doesn't work in single precision due to overflows.
My first idea was to scale down the signal to avoid overflows, but results were poor.
There's something I don't understand about this algorithm: in theory, the LPC coeffs shouldn't depend on the scale of the signal; after all, the predictor is linear in the signal.
I have a suspicion that in practice this algorithm does depend on the scale of the signal a lot. I don't pretend to understand this math, but:
First suspicious piece of code is this (from av_solve_lls):
double sum= covar[i][j];
for(k=i-1; k>=0; k--)
    sum -= factor[i][k]*factor[j][k];
When the signal is multiplied by 10, covar[i][j] is multiplied by 100, and the Cholesky factor entries factor[i][k] and factor[j][k] are each multiplied by 10, so factor[i][k]*factor[j][k] is also multiplied by 100. So the sum itself should scale consistently, which means the scale dependence must be coming from somewhere else.
I also don't understand this magic 'threshold' business.
if(sum < threshold)
How should the threshold scale with the signal? Should the sum always be set to 1.0 if it's below the threshold, or to some value depending on the scale of the signal? Or am I on the wrong track completely?
I also found this old post from Josh:
I have actually been doing experiments solving the full prediction linear system with SVD; this should give a lower bound on the compression achievable by the FLAC filter.
Is there any working code left from those experiments, and how successful were they?
This post has been edited by Gregory S. Chudov: Sep 18 2009, 19:23