FLACCL: CUDA-enabled FLAC encoder by Gregory S. Chudov (prev. FlaCuda), Formerly "lossless codecs and CUDA"
gib
post Jul 13 2008, 05:34
Post #1





Group: Members
Posts: 227
Joined: 20-January 03
From: A Tropical Isle
Member No.: 4640



With my recent purchase of a 9000 series nVidia graphics card, I started wondering: has anyone investigated whether nVidia's CUDA could be useful for lossless compression? I'm not even remotely close to being a programmer, so I haven't a clue how the code works, but it seems like CUDA is valuable for coding/decoding. I know nVidia is already holding a contest to speed up LAME (which ends in about 2 weeks), so perhaps it could be used to speed up lossless compressors too? The fastest modes of several codecs are already blazing fast, approaching the limits of hard drives, but perhaps the high-compression modes could be sped up through CUDA. Maybe, if the speed-up is large enough, developers could even implement more ways to gain compression while still maintaining good encoding rates. It would be pretty cool if compression levels like La's best could be done at 50x or something.

Anyway, my curiosity is large, so just thought I'd ask. :)
Martel
post Jul 13 2008, 09:29
Post #2





Group: Members
Posts: 553
Joined: 31-May 04
From: Czech Rep.
Member No.: 14430



I apologize for being completely incorrect. :(

This post has been edited by Martel: Jul 13 2008, 10:53


--------------------
IE4 Rockbox Clip+ AAC@192; HD 668B/HD 518 Xonar DX FB2k FLAC;
Garf
post Jul 13 2008, 10:00
Post #3


Server Admin


Group: Admin
Posts: 4883
Joined: 24-September 01
Member No.: 13



QUOTE (Martel @ Jul 13 2008, 10:29) *
If I'm not mistaken, lossless coding usually employs dictionary methods (like LZW/LZMA) which generate a lot of random access and branching operations.


Not at all!

Most lossless audio compressors use large predictive LPC filters. This would be an operation well suited to a GPU, if it weren't for a small detail: because the result has to be LOSSLESS, the operations are often integer, not floating point. It would be possible to do it in floating point too, but then the operations, rounding, and precision would need to be PRECISELY defined. That is exactly what GPUs don't have.
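
A minimal sketch of the integer point, assuming a toy two-tap predictor with quantised coefficients and a right shift. The coefficients, shift, sample values, and function names are made up for illustration and are not taken from any real encoder. Because every step is integer arithmetic, the decoder repeats exactly the same prediction and the samples come back bit for bit:

```python
# Toy integer linear predictor: encode to residuals, decode back exactly.
# All values and names here are illustrative assumptions.

def predict(history, coeffs, shift):
    """Weighted sum of past samples (most recent first), then an integer shift."""
    acc = sum(c * s for c, s in zip(coeffs, reversed(history)))
    return acc >> shift  # floor shift on integers, no floating-point rounding


def encode(samples, coeffs, shift):
    """Return (warm-up samples stored verbatim, prediction residuals)."""
    order = len(coeffs)
    warmup = list(samples[:order])
    residual = [
        samples[i] - predict(samples[i - order:i], coeffs, shift)
        for i in range(order, len(samples))
    ]
    return warmup, residual


def decode(warmup, residual, coeffs, shift):
    """Rebuild the samples by adding each residual to the same prediction."""
    out = list(warmup)
    order = len(coeffs)
    for r in residual:
        out.append(r + predict(out[-order:], coeffs, shift))
    return out


samples = [0, 12, 25, 35, 44, 50, 53, 52, 49, 41, 30, 18]
coeffs, shift = [7, -3], 2  # roughly 1.75*x[n-1] - 0.75*x[n-2]

warmup, residual = encode(samples, coeffs, shift)
restored = decode(warmup, residual, coeffs, shift)
print(restored == samples)
```

With floating-point prediction, the round trip would only be guaranteed if encoder and decoder rounded identically, which is the concern raised above.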

Despite all the hype, there aren't that many things GPUs are actually good at.
gib
post Jul 14 2008, 03:52
Post #4





Group: Members
Posts: 227
Joined: 20-January 03
From: A Tropical Isle
Member No.: 4640



Ah, I see now. Thanks very much for the response, Garf.
Gregory S. Chudo...
post Sep 10 2009, 03:27
Post #5





Group: Developer
Posts: 690
Joined: 2-October 08
From: Ottawa
Member No.: 59035



Here is good news.

An alpha version of a FLAC encoder for GPU.

I have only tested it on a GTS 250, so I'm eager to hear from people with other cards.

Like all my applications, this requires the .NET Framework.

And this time, of course, a CUDA-enabled graphics card.

Source code as usual on SourceForge.

UPD1: A bit more optimized version re-tuned to not so paranoid compression levels.
UPD2: added pipe encoding for use with fb2k (encoder parameters: -5 - -o %d)
UPD3: seeking problem with pipe encoding in fb2k fixed, lower compression levels speed up.
UPD4: general speed improvement
UPD5: wasted_bits/lossyWav support
UPD6: final optimizations
UPD7: rice partitioning on GPU (--gpu-only), multi-core CPU utilization support (--cpu-threads #)
UPD8: default compression level changed to -7, rice partitioning on GPU on by default, memory/IO optimizations
UPD9: bugfix release; UPD91 - fb2k pipe input fix

* Download: Attached File  FlaCuda091.rar ( 97.7K ) Number of downloads: 1613
* Old version: Attached File  FlaCuda06.rar ( 84.9K ) Number of downloads: 954


This post has been edited by Gregory S. Chudov: Jan 10 2010, 17:30


--------------------
CUETools 2.1.4
Dr_Colossus
post Sep 10 2009, 05:58
Post #6





Group: Members
Posts: 71
Joined: 8-July 08
Member No.: 55505



Sounds awesome. Care to elaborate on the performance for those of us without a CUDA-capable card?
Gregory S. Chudo...
post Sep 10 2009, 06:05
Post #7





Group: Developer
Posts: 690
Joined: 2-October 08
From: Ottawa
Member No.: 59035



Less impressive than I had hoped, but this is only an initial version, and GPUs grow faster each day.
On my GTS 250 it's approximately as fast as my C# encoder (which is fast, by the way).
FlaCuda -4 achieves the same compression ratio as reference flac -8 (version 1.2.1 on a Core 2 Duo @ 3 GHz) at approximately two to three times the speed.
FlaCuda -8 is as slow as flac -8, but improves the compression ratio by an extra 0.5%.
It would be nice if someone could thoroughly compare them on different hardware and post the results here.

This post has been edited by Gregory S. Chudov: Sep 10 2009, 06:17


Grunpfnul
post Sep 10 2009, 09:24
Post #8





Group: Members
Posts: 51
Joined: 30-May 09
From: Germany
Member No.: 70242



No love for ati? *sniff*
Gregory S. Chudo...
post Sep 10 2009, 09:49
Post #9





Group: Developer
Posts: 690
Joined: 2-October 08
From: Ottawa
Member No.: 59035



There is love, but there's no implementation ^^
But I guess someone else can do it, now that we have a proof of concept.


Case
post Sep 10 2009, 16:30
Post #10





Group: Developer (Donating)
Posts: 2181
Joined: 19-October 01
From: Finland
Member No.: 322



I ran some tests with my Core i7 940 (stock speed) and GeForce GTX 285. Original wav file was 237368588 bytes in size. Not too impressive results:
FLAC -5 : Elapsed Time : 00:00:08.268 (181929373 bytes)
FLAC -8 : Elapsed Time : 00:00:30.560 (181788832 bytes)
FlaCuda -4 : Elapsed Time : 00:00:09.204 (181892106 bytes)
FlaCuda -5 : Elapsed Time : 00:00:10.904 (181763725 bytes)
FlaCuda -8 : Elapsed Time : 00:00:12.370 (181676614 bytes)
FlaCuda -11: Elapsed Time : 00:00:23.883 (181734405 bytes)
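
To put these figures on a common scale, here is a small sketch that turns them into a compression ratio (output size divided by input size) and a throughput in MB/s. The numbers are copied from the list above and nothing is newly measured; the `summarize` helper is a made-up name for illustration.

```python
# Figures copied from the results above: (setting, seconds, output bytes).
WAV_BYTES = 237368588

results = [
    ("FLAC -5", 8.268, 181929373),
    ("FLAC -8", 30.560, 181788832),
    ("FlaCuda -4", 9.204, 181892106),
    ("FlaCuda -5", 10.904, 181763725),
    ("FlaCuda -8", 12.370, 181676614),
    ("FlaCuda -11", 23.883, 181734405),
]


def summarize(rows, wav_bytes):
    """Return (name, ratio, MB/s) per row; ratio is output size / input size."""
    return [
        (name, size / wav_bytes, wav_bytes / secs / 1e6)
        for name, secs, size in rows
    ]


for name, ratio, mb_per_s in summarize(results, WAV_BYTES):
    print(f"{name:<12} ratio={ratio:.4f}  {mb_per_s:6.1f} MB/s")
```

The ratios all fall within a fraction of a percent of each other, so the encoding time is where the settings really differ.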
Gregory S. Chudo...
post Sep 10 2009, 20:34
Post #11





Group: Developer
Posts: 690
Joined: 2-October 08
From: Ottawa
Member No.: 59035



Thank you!


Ron Jones
post Sep 10 2009, 20:44
Post #12





Group: Members
Posts: 412
Joined: 9-August 07
From: Los Angeles
Member No.: 46048



I'm anxious to see how this would perform on the next generation of NVIDIA hardware (GT300), which is supposedly significantly faster in general computational performance than the previous architecture (G200).

Very exciting -- thank you!
thundat00th
post Sep 10 2009, 21:20
Post #13





Group: Members
Posts: 120
Joined: 13-September 08
From: Louisville, KY
Member No.: 58234



QUOTE (Grunpfnul @ Sep 10 2009, 03:24) *
No love for ati? *sniff*

:'( as much as i love ati i wish they had as much support for things as nvidia does (they need to get to work on that hardware havok physics)

maybe the "evergreen" release here in a bit will improve things (i hope)

as far as this goes, i would be interested in lossy gpu encoding, and that might work a bit better regarding the inaccurate floating point calculations

ati stream support :'( pwease?

This post has been edited by thundat00th: Sep 10 2009, 21:21


--------------------
My $.02, may not be in the right currency
hlloyge
post Sep 10 2009, 22:42
Post #14





Group: Members
Posts: 695
Joined: 10-January 06
From: Zagreb
Member No.: 27018



Here are my test results:

Klaus Schulze - Dreams Deluxe Edition, size 797 MB
Core2Duo 8200, Geforce 9600GT with passive cooling

Encoding with FLAC 1.2.1 in command line, -6, version from Sourceforge, 38 seconds

And this...

PS D:\temp_2> .\CUETools.FlaCuda.exe -6 '.\Klaus Schulze - Dreams Deluxe Edition.wav'
CUETools.FlaCuda, Copyright © 2009 Gregory S. Chudov.
This is free software under the GNU GPLv3+ license; There is NO WARRANTY, to
the extent permitted by law. <http://www.gnu.org/licenses/> for details.
Filename : .\Klaus Schulze - Dreams Deluxe Edition.wav
File Info : 44100kHz; 2 channel; 16 bit; 01:19:00.8800000
Results : 61,11x; 499280528 bytes in 00:01:17.5764372 seconds;

Windows 7 32 bit.
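
The `61,11x` in the output above appears to be a realtime factor, that is, the audio duration divided by the encoding time. A minimal sketch to check that reading against the two times printed by the encoder; the helper names are made up for illustration.

```python
def to_seconds(hms):
    """Convert an 'HH:MM:SS.fraction' string to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def speed_factor(duration, elapsed):
    """Audio duration divided by encoding time (times realtime)."""
    return to_seconds(duration) / to_seconds(elapsed)


# Both values are taken from the console output above.
factor = speed_factor("01:19:00.8800000", "00:01:17.5764372")
print(f"{factor:.2f}x")  # should land close to the reported 61,11x
```

Reference flac encoding the same 79-minute file in 38 seconds would be roughly twice that factor, which matches the "not that impressive" verdict for this card at -6.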

Well... not that impressive :D

(edit) wrote 10 seconds too much for flac encode...

This post has been edited by hlloyge: Sep 10 2009, 22:43
Gregory S. Chudo...
post Sep 10 2009, 23:33
Post #15





Group: Developer
Posts: 690
Joined: 2-October 08
From: Ottawa
Member No.: 59035



What was the file size for flac -6? We should compare the speed at the same compression ratio, i.e. output file size, not at the same compression level, because -6 for flac gives much lower compression than -6 for FlaCuda. Please try comparing FlaCuda -5 vs flac -8, and compare both execution times and file sizes.

Here's a graph I made of Case's results:

[Graph: Case's results]

This shows a 3x speedup over flac -8 compression.
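
The "same output size" comparison can be sketched as follows: take flac -8 as the reference, keep only the FlaCuda settings whose output is no larger, and pick the fastest of those. The figures are Case's from earlier in the thread; the function name is hypothetical.

```python
# Case's figures from earlier in the thread: (setting, seconds, output bytes).
flac_8 = ("FLAC -8", 30.560, 181788832)
flacuda = [
    ("FlaCuda -4", 9.204, 181892106),
    ("FlaCuda -5", 10.904, 181763725),
    ("FlaCuda -8", 12.370, 181676614),
    ("FlaCuda -11", 23.883, 181734405),
]


def fastest_at_least_as_small(reference, candidates):
    """Fastest candidate whose output is no larger than the reference's."""
    ref_size = reference[2]
    eligible = [c for c in candidates if c[2] <= ref_size]
    return min(eligible, key=lambda c: c[1]) if eligible else None


best = fastest_at_least_as_small(flac_8, flacuda)
if best is not None:
    name, secs, size = best
    print(f"{name}: {flac_8[1] / secs:.2f}x faster than {flac_8[0]}, "
          f"{flac_8[2] - size} bytes smaller")
```

On these numbers the matching setting is FlaCuda -5, at a little under three times the speed of flac -8.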

This post has been edited by Gregory S. Chudov: Sep 10 2009, 23:40


Wombat
post Sep 11 2009, 00:33
Post #16





Group: Members
Posts: 985
Joined: 7-October 01
Member No.: 235



Not too shabby. Tried it on a C2D@3600+GTX260

Dream Theater, Awake

Original 793.976.444 Bytes
Flac 1.2.1 -8 568.604.561 Bytes ~94 sec. encoding time
FlaCuda -8 567.956.198 Bytes ~53 sec.

I don't have a recent Flake version at hand, so I don't know how much of the gain comes from CUDA alone.

Edit:
FlaCuda -6 568.280.716 Bytes ~48 sec.

This post has been edited by Wombat: Sep 11 2009, 00:50
GHammer
post Sep 11 2009, 02:21
Post #17





Group: Members
Posts: 224
Joined: 11-May 03
From: China
Member No.: 6546



This is on a 9500 GT

FlaCuda
Filename : Clocks.wav
File Info : 44100kHz; 2 channel; 16 bit; 00:05:07.4670000
Results : 43.10x; 35657424 bytes in 00:00:07.1331000 seconds;

Flac 1.2.1
Clocks.wav: wrote 35796074 bytes, ratio=0.660
2.91 seconds

Both were just run as <executable> Clocks.wav
Gregory S. Chudo...
post Sep 11 2009, 03:24
Post #18





Group: Developer
Posts: 690
Joined: 2-October 08
From: Ottawa
Member No.: 59035



QUOTE (GHammer @ Sep 11 2009, 05:21) *
Flac 1.2.1
Clocks.wav: wrote 35796074 bytes, ratio=0.660
2.91 seconds

That file is a bit too small for a comparison. And it's better to compare against flac -8. The default flac compression level is very fast; I don't think it can be beaten by FlaCuda, at least not yet. FlaCuda focuses on higher compression.


Lucho
post Sep 11 2009, 08:28
Post #19





Group: Members
Posts: 14
Joined: 19-November 08
Member No.: 62733



GPU audio encoding will be useful once OpenCL is adopted by both ATI and Nvidia; for now it is just a "proof of concept".
hlloyge
post Sep 11 2009, 19:52
Post #20





Group: Members
Posts: 695
Joined: 10-January 06
From: Zagreb
Member No.: 27018



Here I am again, this time, more detailed:

Flac 1.2.1 vs Cuda 01

File: album.wav 643566044

Windows 7, C2Q9400 @ 2.66 GHz, Geforce 9500 GS

flac -8: wrote 405957413 bytes, ratio=0,631 in 99 seconds
cuda -8: 34,98x; 405731414 bytes in 00:01:44.2910429 seconds;

Is there a multicore flac encoder? :) That would be a nice thing to test...
Justin Ruggles
post Sep 12 2009, 02:45
Post #21





Group: Developer
Posts: 165
Joined: 3-June 06
From: Raleigh, NC
Member No.: 31393



QUOTE (hlloyge @ Sep 11 2009, 14:52) *
Is there a multicore flac encoder? :) That would be a nice thing to test...

http://softlab-pro-web.technion.ac.il/Proj.../downloads.html

I haven't tested this personally or done anything about trying to adapt the code for inclusion in Flake.
gib
post Sep 12 2009, 06:10
Post #22





Group: Members
Posts: 227
Joined: 20-January 03
From: A Tropical Isle
Member No.: 4640



Hey, wow. This topic of mine was bumped, and with proof of concept software to boot. Thank you, Gregory!

Here are my results to add to the data (I used flac 1.2.1 -8 and Flacuda01 -8 as suggested):

CPU: Athlon X2 @ 2.35 GHz
GPU: 9600 GSO @ 600 MHz

File 1: 656647868 bytes
Flac: 466183490 in 148 seconds
cuda: 465898530 in 65 seconds

File 2: 654389948 bytes
Flac: 362792762 in 145 seconds
cuda: 360670158 in 63 seconds

More than 2x faster and better compression too. That's pretty impressive.
PatchWorKs
post Sep 12 2009, 09:40
Post #23





Group: Members
Posts: 498
Joined: 2-October 01
Member No.: 168



Well, I believe that even a small gain is always welcome.

I'm not a developer, so I dunno if possible, but: what about a liboil-like library but for GPGPU encodings, so *any* codec could benefit from GPU computations ?
hlloyge
post Sep 12 2009, 10:32
Post #24





Group: Members
Posts: 695
Joined: 10-January 06
From: Zagreb
Member No.: 27018



Again: C2D8200, Geforce 9600GT

album.wav to flac -8

original: 578046380
flac: 344489508 in 80 seconds
cuda: 344226134 bytes in 00:00:52.8150209 seconds

Nice.
Gregory S. Chudo...
post Sep 12 2009, 11:33
Post #25





Group: Developer
Posts: 690
Joined: 2-October 08
From: Ottawa
Member No.: 59035



QUOTE (PatchWorKs @ Sep 12 2009, 12:40) *
I'm not a developer, so I dunno if possible, but: what about a liboil-like library but for GPGPU encodings, so *any* codec could benefit from GPU computations ?

Not sure. The code I wrote is quite codec-specific. The catch is the relatively slow connection between CPU and GPU. I had to implement practically the whole FLAC algorithm on the device, so that I don't have to transfer intermediate values between host and GPU, only the final result.

FLAC turned out to be very convenient for GPU. Probably the most convenient. One look at e.g. the ALAC algorithm was enough to understand that it can never get the same benefit.

QUOTE (hlloyge @ Sep 12 2009, 13:32) *
original: 578046380
flac: 344489508 in 80 seconds
cuda: 344226134 bytes in 00:00:52.8150209 seconds

Nice.

Thank you. And how about FlaCuda -5? It should provide enough compression to beat flac -8.

This post has been edited by Gregory S. Chudov: Sep 12 2009, 11:34

