FLACCL: CUDA-enabled FLAC encoder by Gregory S. Chudov (prev. FlaCuda), Formerly "lossless codecs and CUDA"
Dologan
post Sep 12 2009, 11:40
Post #26





Group: Members (Donating)
Posts: 478
Joined: 22-November 01
From: United Kingdom
Member No.: 519



Wow, nice work Gregory!

Just wondering... how did you get around the limitation mentioned by Garf earlier on this thread about GPUs only doing floating point and therefore not being suitable for lossless encoding?
Gregory S. Chudo...
post Sep 12 2009, 11:52
Post #27





Group: Developer
Posts: 689
Joined: 2-October 08
From: Ottawa
Member No.: 59035



Current GPUs do integer computations quite alright.


--------------------
CUETools 2.1.4
Dologan
post Sep 12 2009, 12:15
Post #28





Group: Members (Donating)
Posts: 478
Joined: 22-November 01
From: United Kingdom
Member No.: 519



Hmm, does the encoder do pipe encoding (i.e. for proper foobar2000 use)?
Wombat
post Sep 12 2009, 13:41
Post #29





Group: Members
Posts: 977
Joined: 7-October 01
Member No.: 235



Some questions regarding Flaccuda.
Back when flake was new I had problems encoding at higher compression than standard Flac and playing back on my Slimdevice.
Does Flaccuda use the same options at the corresponding compression level of Flac? At least it looks like I can play back Flaccuda -8 on my Slimdevice. How come it compresses better, then?
Shouldn't it be named "FlakeCuda" in the end?
Maurits
post Sep 12 2009, 14:48
Post #30





Group: Members
Posts: 370
Joined: 30-September 05
From: London, Europe
Member No.: 24805



How hard would it be to convert this CUDA version into a more versatile OpenCL implementation? It is said that OpenCL is largely based on CUDA but non vendor-specific. That suggests it should be easy to adapt.

That way it wouldn't be limited to NVIDIA GPUs. In fact, it would even remove the limit of just using a GPU as OpenCL can combine all available GPUs and processor cores in the system as if it was one unit.

This post has been edited by Maurits: Sep 12 2009, 14:59
Gregory S. Chudo...
post Sep 12 2009, 15:07
Post #31





Group: Developer
Posts: 689
Joined: 2-October 08
From: Ottawa
Member No.: 59035



QUOTE (Dologan @ Sep 12 2009, 15:15) *
Hmm, does the encoder do pipe encoding (i.e. for proper foobar2000 use)?

It does now (version 02), but I would suggest being careful with your precious files while this is still an alpha version.

QUOTE (Wombat @ Sep 12 2009, 16:41) *
Some questions regarding Flaccuda.
Back when flake was new I had problems encoding at higher compression than standard Flac and playing back on my Slimdevice.
Does Flaccuda use the same options at the corresponding compression level of Flac? At least it looks like I can play back Flaccuda -8 on my Slimdevice. How come it compresses better, then?
Shouldn't it be named "FlakeCuda" in the end?

It doesn't use the same options at the corresponding compression levels. But it does stick to the so-called FLAC subset (supported by hardware devices) for compression levels 0-8. Compression levels 9-11 are non-subset, and might not play on some devices. Flake has the same conventions.

Better compression is achieved mainly by brute-force search of optimal compression parameters (stereo modes, LPC orders, and window functions). Flac does this only at level 8, and it only tries one window function, and not the best one.

As grateful as I am to Justin for his wonderful Flake encoder, FlaCuda, unlike my C# Flake port, is not a derivative work. Flake's algorithm was written for the CPU, not the GPU, and those are two very different realms. Flake does a great job of smartly guessing the best compression parameters, while FlaCuda just does a brute-force search on a GPU. FlaCuda does, however, contain a C# Flake library, and uses it for FLAC decompression if the source file is FLAC or if --verify mode is enabled.
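
To illustrate what "brute-force search" means here, the following Python sketch tries every stereo mode and, per channel, every predictor order, keeping the cheapest combination. It is only an illustration under stated assumptions, not FlaCuda's code: it substitutes FLAC's fixed predictors (orders 0-4) for the LPC orders and window functions mentioned above, uses the sum of absolute values as a crude stand-in for encoded size, and all function names are hypothetical.

```python
# Illustrative sketch only: exhaustive search over stereo mode and predictor order.
# Assumption: fixed predictors (orders 0-4) stand in for the real LPC search,
# and sum(|residual|) stands in for the real encoded-size estimate.

# The four FLAC stereo modes, each mapping (left, right) to two coded channels.
STEREO_MODES = {
    "left/right": lambda l, r: (l, r),
    "left/side": lambda l, r: (l, [a - b for a, b in zip(l, r)]),
    "right/side": lambda l, r: ([a - b for a, b in zip(l, r)], r),
    "mid/side": lambda l, r: ([(a + b) >> 1 for a, b in zip(l, r)],
                              [a - b for a, b in zip(l, r)]),
}


def fixed_residual(samples, order):
    """Residual of a fixed predictor: the order-th finite difference."""
    res = list(samples)
    for _ in range(order):
        res = [res[i] - res[i - 1] for i in range(1, len(res))]
    return res


def channel_cost(samples, order):
    """Crude size proxy: warm-up samples plus residuals, by absolute value."""
    warmup = samples[:order]
    residual = fixed_residual(samples, order)
    return sum(abs(x) for x in warmup) + sum(abs(x) for x in residual)


def best_parameters(left, right, max_order=4):
    """Try every stereo mode and every order per channel; keep the cheapest."""
    best = None
    for name, transform in STEREO_MODES.items():
        total, orders = 0, []
        for channel in transform(left, right):
            cost, order = min((channel_cost(channel, k), k)
                              for k in range(max_order + 1))
            total += cost
            orders.append(order)
        if best is None or total < best[0]:
            best = (total, name, tuple(orders))
    return best


# Two identical linear ramps: the side channel is all zeros.
left = list(range(0, 640, 10))
right = list(left)
result = best_parameters(left, right)
```

Each (mode, channel, order) combination is evaluated independently of the others, which is the kind of workload that maps naturally onto many parallel GPU threads, whereas a CPU encoder tends to prune the search with heuristics.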

QUOTE (Maurits @ Sep 12 2009, 17:48) *
How hard would it be to convert this Cuda version into a more versatile OpenCL implementation?

That way it wouldn't be limited to NVIDIA GPUs. In fact, it would even remove the limit of just using a GPU as OpenCL can combine all available GPUs and processor cores in the system as if it was one unit.

I'm not yet experienced enough in this matter, but I assume that this versatility will come at the price of speed. I will try to verify this later.

This post has been edited by Gregory S. Chudov: Sep 12 2009, 15:24


Wombat
post Sep 12 2009, 15:39
Post #32





Group: Members
Posts: 977
Joined: 7-October 01
Member No.: 235



Thanks for explaining it. Really nice work you have done, thanks for that. Now I know what I can use CUDA for; it should really be mentioned on the NVIDIA CUDA pages.
Maurits
post Sep 12 2009, 15:57
Post #33





Group: Members
Posts: 370
Joined: 30-September 05
From: London, Europe
Member No.: 24805



QUOTE (Gregory S. Chudov @ Sep 12 2009, 15:07) *
QUOTE (Maurits @ Sep 12 2009, 17:48) *
How hard would it be to convert this Cuda version into a more versatile OpenCL implementation?

That way it wouldn't be limited to NVIDIA GPUs. In fact, it would even remove the limit of just using a GPU as OpenCL can combine all available GPUs and processor cores in the system as if it was one unit.

I'm not yet experienced enough in this matter, but I assume that this versatility will come at the price of speed. I will try to verify this later.


That's possible. Although a performance hit might be offset by the fact that OpenCL combines the CPU and all available GPUs. The biggest difference I seem to find after some research is that there are a couple of things implemented in CUDA that OpenCL doesn't have yet. However, if you don't use any of these additional features for your specific implementation that wouldn't matter.

I must say that I am only speculating, I don't know much about this matter either, I was just wondering...
Dologan
post Sep 12 2009, 17:40
Post #34





Group: Members (Donating)
Posts: 478
Joined: 22-November 01
From: United Kingdom
Member No.: 519



QUOTE (Gregory S. Chudov @ Sep 12 2009, 15:07) *
QUOTE (Dologan @ Sep 12 2009, 15:15) *
Hmm, does the encoder do pipe encoding (i.e. for proper foobar2000 use)?

It does now (version 02), but I would suggest being careful with your precious files while this is still an alpha version.

Wow, thanks! Pipe encoding seems to be working, with no differences in the decoded data of the resulting file. Speed for -8 is ~70x on my 8800GT vs ~40x on a single core of my Q6600. However, the resulting file appears to be lacking any length and bitrate information, so seeking is impossible.

Also, obviously foobar2000 isn't ready to properly handle GPU encoding. With the converter set to handle three simultaneous encoding processes for my quad core, FlaCuda actually slows down to around 35x overall, whereas the standard Flac naturally scales well to ~110x.

So yeah, not quite flac-replacement ready, then ;) Looks promising for inherently single-threaded things like rip+encodes, though (once/if it gets tagging arguments implemented, that is)

This post has been edited by Dologan: Sep 12 2009, 18:02
guruboolez
post Sep 12 2009, 20:09
Post #35





Group: Members (Donating)
Posts: 3474
Joined: 7-November 01
From: Strasbourg (France)
Member No.: 420



My results on an old Core2Duo E6300 and a small Nvidia GeForce 9400 GT. I took two discs: one solo piano disc that compresses very well (<400 kbps) and a baroque orchestral work that doesn't (750 kbps).


PIANO MUSIC

CODE
WAV        594.191 KB
FLAC -5    163.122 KB    49313 milliseconds    x69.94
FLAC -8    159.276 KB   116641 milliseconds    x29.57
CUDA -0    158.750 KB    60188 milliseconds    x57.30
CUDA -4    158.024 KB    88531 milliseconds    x38.96
CUDA -8    156.881 KB   176656 milliseconds    x19.52
CUDA 11    156.799 KB   527922 milliseconds     x6.53



VIVALDI

CODE
WAV        754.037 KB
FLAC -5    393.834 KB    68047 milliseconds    x64.32
FLAC -8    393.279 KB   160109 milliseconds    x27.33
CUDA -0    394.796 KB    78688 milliseconds    x55.62
CUDA -4    394.034 KB   111469 milliseconds    x39.26
CUDA -8    393.191 KB   223328 milliseconds    x19.59
CUDA 11    392.079 KB   675656 milliseconds     x6.47


On this cheap GPU, FlaCuda 0.2 performs rather well. It can't be as fast as the CPU, but this encoder can approach CPU speed at -0 and sometimes compresses better than flac.exe -8! Note, however, that the CPU has two cores and only one was used for this benchmark.
If I'm not wrong, a similar 9400 GPU is used in the ION system. It means that cheap, low-powered nettops or netbooks with the ION chipset could perfectly well be used for batch FLAC encoding. To be confirmed...
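
The speed multiples in the tables above can be sanity-checked from the WAV size and the encoding time. The sketch below assumes the source is CD audio (44.1 kHz, 16-bit, stereo), that "KB" means 1024 bytes, and that the dots in the sizes are thousands separators; none of this is stated in the post, and the helper names are made up for illustration.

```python
# Sanity check of the benchmark figures (assumptions: CD audio, KB = 1024 bytes).
CD_BYTES_PER_SECOND = 44100 * 2 * 2  # sample rate * channels * bytes per sample


def speed_multiple(wav_kb, encode_ms):
    """Encoding speed relative to real time (playback duration / encode time)."""
    duration_s = wav_kb * 1024 / CD_BYTES_PER_SECOND
    return duration_s / (encode_ms / 1000)


def compression_ratio(encoded_kb, wav_kb):
    """Encoded size as a fraction of the original WAV size."""
    return encoded_kb / wav_kb


# Piano disc, FLAC -5 row: 594191 KB WAV, 163122 KB FLAC, 49313 ms.
piano_speed = speed_multiple(594191, 49313)
piano_ratio = compression_ratio(163122, 594191)

# Vivaldi disc, FLAC -5 row: 754037 KB WAV, 68047 ms.
vivaldi_speed = speed_multiple(754037, 68047)
```

Under these assumptions the computed values agree with the x69.94 and x64.32 figures reported in the tables to within rounding, and the piano ratio of about 0.27 corresponds to a bitrate just under 400 kbps.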

SMALL DECODING SPEED:

CODE
FLAC -8:   x409
CUDA -8:   x392
CUDA 11:   x285


As you can see there's a drastic fall in decoding speed with flacuda -11 (tested with latest foobar2000). On my Sansa Clip (2GB) the playback seems to be fine (I just tried one file though).


More tests are needed but it looks like a very interesting encoder which should work nicely on an ION chipset.
Garf
post Sep 12 2009, 21:35
Post #36


Server Admin


Group: Admin
Posts: 4881
Joined: 24-September 01
Member No.: 13



QUOTE (Gregory S. Chudov @ Sep 12 2009, 12:52) *
Current GPUs do integer computations quite alright.


It used to be the case, on the Nvidia side, that you could only do 24-bit arithmetic, which might be enough for FLAC. I don't know about ATI. 32-bit (i.e. normal) arithmetic was only possible with a huge performance penalty.

New versions of CUDA or the cards might have changed this, or FLAC might have been simple enough that it wasn't an issue.
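
For context on why integer arithmetic matters: a lossless decoder has to reproduce the encoder's prediction bit for bit, which is straightforward with integers. The Python sketch below shows a FLAC-style integer linear predictor (quantized coefficients plus a right shift) round-tripping exactly, together with a conservative estimate of how many bits the accumulator needs. The 12-bit coefficient precision, the order, and the function names are assumptions chosen for illustration, not values taken from FlaCuda.

```python
# Illustrative sketch: integer linear prediction round-trips exactly.
# Assumption: coefficients are already quantized to integers, with a shift.


def predict(history, n, coeffs, shift):
    """Integer prediction of sample n from the preceding len(coeffs) samples."""
    acc = sum(c * history[n - 1 - j] for j, c in enumerate(coeffs))
    return acc >> shift  # arithmetic shift, identical in encoder and decoder


def lpc_encode(samples, coeffs, shift):
    """Return warm-up samples followed by integer residuals."""
    order = len(coeffs)
    out = list(samples[:order])
    for n in range(order, len(samples)):
        out.append(samples[n] - predict(samples, n, coeffs, shift))
    return out


def lpc_decode(residual, coeffs, shift):
    """Rebuild the original samples from warm-up samples and residuals."""
    order = len(coeffs)
    samples = list(residual[:order])
    for n in range(order, len(residual)):
        samples.append(residual[n] + predict(samples, n, coeffs, shift))
    return samples


def accumulator_bits(sample_bits, coeff_bits, order):
    """Conservative upper bound on the signed width of the prediction sum."""
    return sample_bits + coeff_bits + (order - 1).bit_length()


samples = [0, 3, 7, 12, 18, 25, 31, 36, -40, -41]
coeffs, shift = [8, -4], 2  # roughly 2*s[n-1] - s[n-2]
restored = lpc_decode(lpc_encode(samples, coeffs, shift), coeffs, shift)
width = accumulator_bits(16, 12, 8)  # assumed: 16-bit audio, 12-bit coeffs
```

Under these assumed widths the estimate comes to 31 bits, so whether 24-bit multiplies would suffice depends on the sample width and coefficient precision actually in use.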

PS. Are these posts comparing multithreaded FLAC implementations on the host? (I don't know if those exist)
arri
post Sep 12 2009, 22:16
Post #37





Group: Members
Posts: 9
Joined: 22-October 07
Member No.: 48099



Just finished my tests:

image.wav:
flac 1.2.1   -8 : 52 sec
flac-cuda  -8 : 32 sec

image.wav divided in 10 songs (1.wav, 2.wav etc.)
flac 1.2.1  -8 : 52 sec
flac-cuda  -8 : 32 sec

flac 1.2.1-icl : 30 sec

flac 1.2.1-icl is operating on both cores on my processor.
Intel Core 2 Duo E8500; Nvidia 8800 GT

flac 1.2.1-icl I found some time ago somewhere on hydrogenaudio B)
Wombat
post Sep 12 2009, 22:51
Post #38





Group: Members
Posts: 977
Joined: 7-October 01
Member No.: 235



QUOTE (arri @ Sep 12 2009, 22:16) *
Just finished my tests:

image.wav:
flac 1.2.1   -8 : 52 sec
flac-cuda  -8 : 32 sec

image.wav divided in 10 songs (1.wav, 2.wav etc.)
flac 1.2.1  -8 : 52 sec
flac-cuda  -8 : 32 sec

flac 1.2.1-icl : 30 sec

flac 1.2.1-icl is operating on both cores on my processor.
Intel Core 2 Duo E8500; Nvidia 8800 GT

flac 1.2.1-icl I found some time ago somewhere on hydrogenaudio B)


Afaik there isn't a good Multi-Core version and i can't believe a different compile can speed up by 75%. Please upload this version somewhere or link to its source.
arri
post Sep 12 2009, 23:40
Post #39





Group: Members
Posts: 9
Joined: 22-October 07
Member No.: 48099



QUOTE (Wombat @ Sep 12 2009, 23:51) *
Afaik there isn't a good Multi-Core version and i can't believe a different compile can speed up by 75%. Please upload this version somewhere or link to its source.


I think those different flac encoders I have came from rarewares

Wombat
post Sep 12 2009, 23:53
Post #40





Group: Members
Posts: 977
Joined: 7-October 01
Member No.: 235



QUOTE (arri @ Sep 12 2009, 23:40) *
QUOTE (Wombat @ Sep 12 2009, 23:51) *
Afaik there isn't a good Multi-Core version and i can't believe a different compile can speed up by 75%. Please upload this version somewhere or link to its source.


I think those different flac encoders I have came from rarewares

It can't be that compile, and please don't waste my time with trying some versions you link to because you "think" it may be the one.
Case
post Sep 13 2009, 10:00
Post #41





Group: Developer (Donating)
Posts: 2177
Joined: 19-October 01
From: Finland
Member No.: 322



I made a more thorough comparison with the new version. I combined a wav from 18 different genres giving hopefully a better representation of real abilities. This compares each compression mode. Horizontal scale is compression ratio and vertical scale is encoding speed vs realtime. With this test set CUDA version was more efficient starting from compression mode 6 but then only faster than FLAC's modes 7 and 8.
Attached Image
hlloyge
post Sep 13 2009, 14:36
Post #42





Group: Members
Posts: 695
Joined: 10-January 06
From: Zagreb
Member No.: 27018



It sure isn't that compile, as they (at least for me) run at the same speed for -8.
alvaro84
post Sep 13 2009, 16:19
Post #43





Group: Members
Posts: 128
Joined: 9-August 06
Member No.: 33830



I've done a quick test of how a 2+ year old full-fledged mainstream CPU (to be more precise: one core of it) stands against a pretty cheap, slightly better than low-end GPU of its own era, both overclocked. The Core2 Duo (E6420, Conroe core) runs at 3328 MHz with DDR2-832 CL4; the 8600GT runs at 580/1296/837 MHz, which is all it can do with passive cooling (probably at a decreased core voltage).

CPU: 49.8x (3328/416MHz)
GPU: 69.4x (580/1296/837MHz)
GPU: 66.4x (540/1188/702MHz)

lv6:
GPU: 54.3x (580/1296/837MHz)

I've tested both -5 and -6 because for my test material file size with FLAC 1.2.1 -8 fell right between FLACuda -5 and -6.
Decoding speed (performed by fb2k):
1.1.2 -8: 615x
CUDA -5: 618x
CUDA -6: 572x
(FLACUDA -11 encoded much slower, ~12x; and it also decoded slower, ~300x)

Considering how insanely powerful (and extremely power-hungry) GPUs are these days, a GPU FLAC encoder seems like a good idea.

I just found one glitch: the decoded voice data seems identical but the FLAC/Cuda files are not seekable in my fb2k 0.9.6.9. The parameters were -6 - -o %d
(OK, I see, I'm not alone with this problem)

[p.s. I also made a comparison with TAK -p2m, which I regularly use: 77.7x encoding on one CPU core, 3.5% smaller (968 vs 1002 kbps), and it decodes at 384x speed - definitely slower than FLAC, except for extreme FLACuda files]
Gregory S. Chudo...
post Sep 13 2009, 16:47
Post #44





Group: Developer
Posts: 689
Joined: 2-October 08
From: Ottawa
Member No.: 59035



Thank you for the detailed test results. Looking at them, I decided to focus on optimizing performance at lower compression levels. Version 03 should be noticeably faster at levels 0..7. I also fixed the problem with files being unseekable when using pipe encoding from fb2k.

This post has been edited by Gregory S. Chudov: Sep 13 2009, 16:50


alvaro84
post Sep 13 2009, 19:33
Post #45





Group: Members
Posts: 128
Joined: 9-August 06
Member No.: 33830



QUOTE (Gregory S. Chudov @ Sep 13 2009, 16:47) *
Thank you for the detailed test results. Looking at them, I decided to focus on optimizing performance at lower compression levels. Version 03 should be noticeably faster at levels 0..7. I also fixed the problem with files being unseekable when using pipe encoding from fb2k.


The good #1: the resulting FLAC is seekable!
The good #2: -6 is definitely faster, 60.4x vs 54.3x
The bad: the files are slightly larger; now I need -7 to get a smaller result than Flac 1.1.2 -8 (CPU -8: 37810k; CUDA -6: 37857k; CUDA -7: 37791k)
The ugly: FLACuda -7 is slower than CPU FLAC -8. On my 'nose heavy' system, that is.

Hm, I probably should try with different tracks (my ad-hoc test sample is a ZUN theme from the Changeability of Strange Dream album, strictly speaking it's not a Touhou soundtrack, but similar to the game background music).
Is it the seek table that makes -6 files larger?

update: In case of a Rammstein track GPU -6 got smaller than CPU -8. Need more samples to test.
(Sorry, I was a bit hasty to post about it. Human error :unsure:)

This post has been edited by alvaro84: Sep 13 2009, 19:38
Gregory S. Chudo...
post Sep 13 2009, 19:42
Post #46





Group: Developer
Posts: 689
Joined: 2-October 08
From: Ottawa
Member No.: 59035



QUOTE (alvaro84 @ Sep 13 2009, 22:33) *
Is it the seek table that makes -6 files larger?

Nope. Old-style -6 can be invoked by parameters "-5 -l 12". That's a lower-case L there, not a digit 1.

This post has been edited by Gregory S. Chudov: Sep 13 2009, 19:43


Case
post Sep 13 2009, 21:39
Post #47





Group: Developer (Donating)
Posts: 2177
Joined: 19-October 01
From: Finland
Member No.: 322



Seems to me like other modes got a speed boost too:
Attached Image
Gregory S. Chudo...
post Sep 14 2009, 20:25
Post #48





Group: Developer
Posts: 689
Joined: 2-October 08
From: Ottawa
Member No.: 59035



Phew. I think i finally squeezed everything i could out of it, at least for now.

Version 04 should be faster than anything.


Case
post Sep 14 2009, 21:18
Post #49





Group: Developer (Donating)
Posts: 2177
Joined: 19-October 01
From: Finland
Member No.: 322



Impressive.
Attached Image
Gregory S. Chudo...
post Sep 14 2009, 21:26
Post #50





Group: Developer
Posts: 689
Joined: 2-October 08
From: Ottawa
Member No.: 59035



Thank you.

