FLACCL: CUDA-enabled FLAC encoder by Gregory S. Chudov (prev. FlaCuda), Formerly "lossless codecs and CUDA"
odyssey
post Jan 29 2010, 17:53
Post #151





Group: Members
Posts: 2296
Joined: 18-May 03
From: Denmark
Member No.: 6695



QUOTE (Maggi @ Jan 29 2010, 17:20) *
according to nVidia's driver page, you are not using the latest drivers .. try these:

http://www.nvidia.com/object/win7_winvista...96.21_whql.html
Not sure they will work. The GeForce 9300 isn't listed under supported products. I've struggled before to get newer releases to work on this. Note that while it's called GeForce 9300, it's a mainboard chipset based on an nForce 730i chip, not to be confused with the GeForce 9300 GS chip, which is entirely different.

QUOTE (Maggi @ Jan 29 2010, 17:20) *
as for your graphics card, it should be able to handle CUDA, if it has at least 256MB of local memory
I have not yet verified with 100% accuracy that I have activated 256MB on it, but according to Windows, it seems that I have. I'll be back on this one :)

QUOTE (Maggi @ Jan 29 2010, 17:20) *
but to be honest, I wouldn't expect any miracles, since it seems to be equipped with a mere 16 cores
Well, it IS sold as CUDA-capable, so it should work no matter how well it performs, and I'll be glad if I can just get FlaCUDA up and running. It compresses better than native FLAC, so if it can just compress my lossless music even further, I'm happy :)

This post has been edited by odyssey: Jan 29 2010, 17:55


--------------------
Can't wait for a HD-AAC encoder :P
alvaro84
post Jan 29 2010, 19:11
Post #152





Group: Members
Posts: 128
Joined: 9-August 06
Member No.: 33830



Having a mere 16 cores is not that bad - my 8600GT has only 32 and it's faster at FLAC encoding (I installed new drivers a few weeks ago and tried the current flaCUDA) than 2 stock flac encoders running in parallel on my 3.33GHz Conroe Core 2 Duo. So if these 16 cores have the same clock rate (which I'm not sure about at all...), it can still be faster than a single-threaded software encoder on virtually any non-overclocked CPU.
It's almost scary how well these low-end GPUs stand up against much higher-class CPUs of their own age :)
odyssey
post Jan 29 2010, 20:37
Post #153





Group: Members
Posts: 2296
Joined: 18-May 03
From: Denmark
Member No.: 6695



Problem found: It's due to Microsoft's RDP-lameness. When using remote desktop, the graphics adapter is disabled and replaced by the one used for RDP.

So thanks MS, I can't use CUDA programs using RDP!


--------------------
Can't wait for a HD-AAC encoder :P
Maggi
post Feb 1 2010, 09:43
Post #154





Group: Members
Posts: 122
Joined: 31-May 07
Member No.: 43892



QUOTE (odyssey @ Jan 29 2010, 17:53) *
QUOTE (Maggi @ Jan 29 2010, 17:20) *
according to nVidia's driver page, you are not using the latest drivers .. try these:

http://www.nvidia.com/object/win7_winvista...96.21_whql.html
Not sure they will work. The GeForce 9300 isn't listed under supported products. I've struggled before to get newer releases to work on this. Note that while it's called GeForce 9300, it's a mainboard chipset based on an nForce 730i chip, not to be confused with the GeForce 9300 GS chip, which is entirely different.


look again ... ;)
QUOTE
GeForce 9 series:
9500 GS, 9600 GT, 9200, 9800 GX2, 9500 GT, 9600 GS, 9300, 9800 GT, 9400 GT, 9300 GS, 9400, 9600 GSO, 9300 GE, 9800 GTX/GTX+



QUOTE (odyssey @ Jan 29 2010, 17:53) *
QUOTE (Maggi @ Jan 29 2010, 17:20) *
as for your graphics card, it should be able to handle CUDA, if it has at least 256MB of local memory
I have not yet verified with 100% accuracy that I have activated 256MB on it, but according to Windows, it seems that I have. I'll be back on this one :)


You could run GPU-Z to get those details, as well as information about which APIs your card supports:

http://www.techpowerup.com/gpuz/


QUOTE (odyssey @ Jan 29 2010, 17:53) *
QUOTE (Maggi @ Jan 29 2010, 17:20) *
but to be honest, I wouldn't expect any miracles, since it seems to be equipped with a mere 16 cores
Well, it IS sold as CUDA-capable, so it should work no matter how well it performs, and I'll be glad if I can just get FlaCUDA up and running. It compresses better than native FLAC, so if it can just compress my lossless music even further, I'm happy :)


fair enough ... :)


QUOTE (odyssey @ Jan 29 2010, 20:37) *
Problem found: It's due to Microsoft's RDP-lameness. When using remote desktop, the graphics adapter is disabled and replaced by the one used for RDP.

So thanks MS, I can't use CUDA programs using RDP!


Now that's a major bummer ... how about using e.g. TightVNC for your remote activities?

http://www.tightvnc.com/

Cheers,
Maggi
RED_404
post Apr 24 2010, 09:44
Post #155





Group: Members
Posts: 1
Joined: 24-April 10
Member No.: 80115



I'm getting "Error : Exception of type 'GASS.CUDA.CUDAException' was thrown."

CODE
CUETools.FlaCuda.exe -11 Priceless.wav
FlaCuda#.91, Copyright © 2009 Gregory S. Chudov.
This is free software under the GNU GPLv3+ license; There is NO WARRANTY, to
the extent permitted by law. <http://www.gnu.org/licenses/> for details.
Filename : Priceless.wav
File Info : 44100kHz; 2 channel; 16 bit; 00:04:07.6270000
Error : Exception of type 'GASS.CUDA.CUDAException' was thrown.


I ran the deviceQuery.rar and got this
CODE
CUDA Device Query (Driver API) statically linked version
There is 1 device supporting CUDA

Device 0: "GeForce GTX 480"
CUDA Driver Version: 3.0
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 0
Total amount of global memory: 1576468480 bytes
Number of multiprocessors: 15
Number of cores: 120
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 0.81 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)

Test PASSED


OS: Windows 7 x64
GPU: GeForce GTX 480
Graphics Driver: 197.55 (8.17.11.9755)


Gregory S. Chudo...
post Apr 24 2010, 09:54
Post #156





Group: Developer
Posts: 699
Joined: 2-October 08
From: Ottawa
Member No.: 59035



Wow. Congrats on getting a GTX 480 :) Sorry, Fermi cards are not supported yet.
I think I'll have to wait for the release of the GTX 460, because the GTX 480/470 are a bit over my budget.


--------------------
CUETools 2.1.4
dragmore
post Apr 24 2010, 20:11
Post #157





Group: Members
Posts: 1
Joined: 23-December 09
Member No.: 76270



Hi. I did some tests on Bel Canto's CD, comparing FLAC 1.2.1 to your newest version, 0.91.
HW: Intel Core 2 Quad Q9550, 8GB RAM, Nvidia GTX260 Driver: 197.45 @ Win7x64, Intel X-25 SSD

BelCanto.wav File Info : 44100kHz; 2 channel; 16 bit; 00:47:44.2000000

Results:

FLAC 1.2.1
Mode -3 : Belcanto.wav: wrote 293901039 bytes, ratio=0,582, 15,2 sec
Mode -6 : Belcanto.wav: wrote 284872007 bytes, ratio=0,564, 20,4 sec
Mode -8 : Belcanto.wav: wrote 283904326 bytes, ratio=0,562, 72,4 sec

FlaCuda#.91,

Mode -3 : 495,34x; 284585708 bytes in 00:00:05.7823308 seconds;
Mode -6 : 504,41x; 283252159 bytes in 00:00:05.6783248 seconds;
Mode -8 : 418,60x; 283217473 bytes in 00:00:06.8423914 seconds;

CPU Options:
c:\CDRIPS\cuda>CUETools.FlaCuda.exe -8 --cpu-threads 2 ..\BelCanto.wav
Results : 433,81x; 283217473 bytes in 00:00:06.6023776 seconds;

c:\CDRIPS\cuda>CUETools.FlaCuda.exe -8 --cpu-threads 3 ..\BelCanto.wav
Results : 392,28x; 283217473 bytes in 00:00:07.3014177 seconds;

c:\CDRIPS\cuda>CUETools.FlaCuda.exe -8 --cpu-threads 4 ..\BelCanto.wav
Results : 406,71x; 283217473 bytes in 00:00:07.0424028 seconds;
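The "x" figures above are realtime factors: audio length divided by encoding time. A minimal Python sketch of that arithmetic follows, using the length and timings quoted in this post. The function names are made up for illustration and are not part of FlaCuda, and the compression ratio assumes plain 16-bit stereo PCM at 44100 Hz with the WAV header ignored.

```python
def parse_duration(text):
    """Convert an 'HH:MM:SS.fraction' string to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def realtime_factor(audio_length, encode_time):
    """How many seconds of audio are encoded per second of wall-clock time."""
    return parse_duration(audio_length) / parse_duration(encode_time)


def compression_ratio(encoded_bytes, audio_length,
                      sample_rate=44100, channels=2, bits=16):
    """Encoded size divided by raw PCM size (WAV header ignored)."""
    pcm_bytes = parse_duration(audio_length) * sample_rate * channels * bits / 8
    return encoded_bytes / pcm_bytes


LENGTH = "00:47:44.2000000"  # BelCanto.wav, as reported by FlaCuda

# FlaCuda mode -3 timing from the post
print(round(realtime_factor(LENGTH, "00:00:05.7823308"), 2))

# FLAC mode -3 output size from the post
print(round(compression_ratio(293901039, LENGTH), 3))
```

Both printed values should agree with the 495,34x and ratio=0,582 reported above, up to rounding.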


Every other time I get:

Error : Exception of type 'GASS.CUDA.CUDAException' was thrown.
Unhandled Exception: ErrorLaunchTimeout

Description:
Stopped working

Problem signature:
Problem Event Name: CLR20r3
Problem Signature 01: cuetools.flacuda.exe
Problem Signature 02: 1.0.0.0
Problem Signature 03: 4b49fea7
Problem Signature 04: CUDA.NET
Problem Signature 05: 2.3.7.0
Problem Signature 06: 4ae56b31
Problem Signature 07: 345
Problem Signature 08: 22
Problem Signature 09: GASS.CUDA.CUDAException
OS Version: 6.1.7600.2.0.0.256.1
Locale ID: 1044

Besides the crash, I must say: IMPRESSIVE ;)

This post has been edited by dragmore: Apr 24 2010, 20:13
me7
post Apr 25 2010, 19:00
Post #158





Group: Members
Posts: 177
Joined: 23-August 06
Member No.: 34375



Wow, FLAC -8 works at ~50x on my laptop, FlaCuda -11 does ~150x, very impressive.

Is FlaCuda with the "--verify" switch considered to be safe for archive use? I understand that software can never be guaranteed to be error free and I don't ask for it, I just wonder if you consider your code (with the verify option) robust enough to be an alternative to the official FLAC.

As far as I understand, you use the parallel processors on the GPU to find the best "next step" (I don't know what it's called in FLAC terminology) and then execute it on the CPU. Is this approach limited to FLAC or can similar computations of other audio/video formats use it?
Gregory S. Chudo...
post Apr 26 2010, 11:24
Post #159





Group: Developer
Posts: 699
Joined: 2-October 08
From: Ottawa
Member No.: 59035



QUOTE (dragmore @ Apr 24 2010, 23:11) *
Unhandled Exception: ErrorLaunchTimeout

Exactly how often does it happen? Is there any pattern to this?
Does it look like the screenshot in this article: http://www.microsoft.com/whdc/device/displ...dm_timeout.mspx ?
Anybody else having those problems?

QUOTE (me7 @ Apr 25 2010, 22:00) *
Is FlaCuda with the "--verify" switch considered to be safe for archive use?

Yes. --verify guarantees that the produced file can be decoded, at least with the CUETools.Flake decoder, and that its audio content is identical to the source.
In theory, it cannot give a 100% guarantee that the produced file can be decoded with the reference FLAC decoder, because --verify uses a different decoder, but so far nobody has reported any such problems.
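For anyone who wants that extra assurance, one way to cross-check is to run the finished file through the reference decoder's test mode. The sketch below is an assumption-laden illustration, not part of FlaCuda: it assumes the reference `flac` command-line tool is installed and on the PATH, and uses its `--test` and `--silent` options. The helper names are hypothetical.

```python
import subprocess


def reference_test_command(flac_file, flac_exe="flac"):
    """Build the command that asks the reference decoder to test-decode a file.

    --test decodes without writing any output; --silent suppresses progress.
    """
    return [flac_exe, "--test", "--silent", flac_file]


def passes_reference_decoder(flac_file, flac_exe="flac"):
    """Return True if the reference decoder exits with status 0 for this file."""
    result = subprocess.run(reference_test_command(flac_file, flac_exe))
    return result.returncode == 0
```

Calling `passes_reference_decoder("Priceless.flac")` after encoding would then report whether the reference decoder accepted the file (the file name is just an example).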

QUOTE (me7 @ Apr 25 2010, 22:00) *
As far as I understand, you use the parallel processors on the GPU to find the best "next step" (I don't know what it's called in FLAC terminology) and then execute it on the CPU.

More or less. Recent versions can do almost everything on the GPU, and the latest version does this by default (it can be disabled with the --slow-gpu option). The CPU only does some sanity checks, formats the resulting data as a FLAC bitstream and writes it to a file.

QUOTE (me7 @ Apr 25 2010, 22:00) *
Is this approach limited to FLAC or can similar computations of other audio/video formats use it?

Effective parallel processing is possible only if the format is suitable for it. For example, ALAC uses adaptive compression, which makes it very inconvenient for parallel processing. Maybe FLAC isn't the only codec which can benefit from GPU encoding, but for most codecs the task will be much harder and the speed won't be that impressive. Most of the GPU code in FlaCuda is very specific to FLAC.

As for video, there are several GPU encoders for the H.264 video codec; most if not all of them are proprietary.


--------------------
CUETools 2.1.4
MachineHead
post May 4 2010, 00:23
Post #160





Group: Members
Posts: 403
Joined: 17-September 02
From: Hell
Member No.: 3380



Getting a crash when I try to convert individual wave files using FlaCuda 091. The error codes are nearly the same as mentioned in an earlier post. Interestingly, FlaCuda does not crash if converting a wavpack image file with embedded cue to flac image with embedded cue.

Windows error report below.

Problem signature:
Problem Event Name: CLR20r3
Problem Signature 01: cuetools.flacuda.exe
Problem Signature 02: 1.0.0.0
Problem Signature 03: 4b49fea7
Problem Signature 04: mscorlib
Problem Signature 05: 2.0.0.0
Problem Signature 06: 4a27471d
Problem Signature 07: 349e
Problem Signature 08: 1c5
Problem Signature 09: System.IO.IOException
OS Version: 6.1.7600.2.0.0.256.48
Locale ID: 1033


This was using foobar2000. I also grabbed that error code:


Conversion failed: The encoder has terminated prematurely with code -532459699 (0xE0434F4D); please re-check parameters
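The two numbers in that message are the same 32-bit value, shown once as a signed integer and once in hexadecimal. As far as I know, 0xE0434F4D is the generic code the .NET 2.0 runtime uses for an unhandled managed exception, which fits the System.IO.IOException in the Windows error report above. A small Python sketch of the conversion (the helper name is made up for illustration):

```python
def describe_exit_code(code):
    """Return the unsigned hex form of a signed 32-bit exit code,
    plus the ASCII text hidden in its low three bytes."""
    value = code & 0xFFFFFFFF                      # reinterpret as unsigned 32-bit
    tag = value.to_bytes(4, "big")[1:].decode("ascii")
    return f"0x{value:08X}", tag


print(describe_exit_code(-532459699))  # ('0xE0434F4D', 'COM')
```

So the code itself only says "a .NET exception escaped"; the exception type in the error report is the more useful clue.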


Commandline parameters are set to: -8 - -o %d


--------------------
Looking for a digital idiot? Look no further.
modernartistry
post May 29 2010, 07:26
Post #161





Group: Members
Posts: 12
Joined: 29-May 10
Member No.: 80969



Hell, FlaCuda 0.91 is ultra fast. I'm using an NVidia 8800 GT with foobar v1.0.3 and the "-8 - -o %d --verify" parameters. The only thing is that my HDD is limiting the encoding speed. A 46-minute WAV file took less than 10 seconds. Detailed results coming up soon.
Thanks for this encoder. I hope that FlaCuda's output will be compatible with all other software players, devices and decoders.

This post has been edited by modernartistry: May 29 2010, 07:28
Bad Monkey
post May 29 2010, 19:54
Post #162





Group: Members
Posts: 90
Joined: 22-August 07
Member No.: 46407



nVidia 8800 GT with Intel Q6600 Quad core.

Foobar 1.03 transcoding FLAC to FLAC (Pink Floyd Final Cut [13 tracks])

CODE
*** FLAC 1.2.1 @ 4 threads ***

level 8
Total encoding time: 0:22.277, 124.90x realtime

*** FlaCuda 0.91 ***

-8 - -o %d --verify
Total encoding time: 1:01.184, 45.47x realtime

-8 --cpu-threads 2 - -o %d --verify
Total encoding time: 0:50.529, 55.06x realtime

-8 --cpu-threads 3 - -o %d --verify
Total encoding time: 0:42.807, 65.00x realtime

-8 --cpu-threads 3 - -o %d
Total encoding time: 0:42.011, 66.23x realtime

-8 --cpu-threads 4 - -o %d --verify
Total encoding time: 0:42.027, 66.20x realtime

-8 --cpu-threads 4 - -o %d
Total encoding time: 0:41.356, 67.28x realtime

-8 --slow-gpu --cpu-threads 4 - -o %d --verify
Total encoding time: 0:37.939, 73.34x realtime


CPU usage with FlaCuda never peaks above 25% per core. Seems for a practical scenario with a quad core CPU it doesn't compete.
NullC
post Jun 4 2010, 06:58
Post #163





Group: Developer
Posts: 200
Joined: 8-July 03
Member No.: 7653



I'm amused by flacuda's speed.... I can't think of too much use for 800x realtime flac encoding, but I thought I'd throw out something that I'm too lazy to implement and that flacuda's speed would make almost reasonable:

_Optimal_ block size selection. Flac lets you change the frame size on the fly. Truly optimal selection across all supported sizes would be a bit insane, but globally optimal selection on a subset of sizes is not too terrible.

Let's consider all powers of two from 64 to 32768; there are ten sizes. At every 64-sample offset through the file, encode all ten block sizes and store the resulting encoded sizes. Making the hand-wavy assumption that the computation per sample is constant, this will be 1023x slower than normal.
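The 1023x figure can be checked with a few lines of Python. This is only a sketch under the same hand-wavy assumption (constant cost per sample, one trial encode of every size at every offset spaced by the smallest size); the function names are made up for illustration.

```python
def block_sizes(smallest, largest):
    """All power-of-two multiples of `smallest` up to and including `largest`."""
    sizes = []
    size = smallest
    while size <= largest:
        sizes.append(size)
        size *= 2
    return sizes


def work_multiplier(smallest, largest):
    """Samples processed per input sample when every size is tried
    at every offset that is a multiple of `smallest`."""
    return sum(block_sizes(smallest, largest)) // smallest


for lo, hi in [(64, 32768), (64, 4096), (32, 4096)]:
    print(lo, hi, len(block_sizes(lo, hi)), work_multiplier(lo, hi))
```

For 64 to 32768 this gives ten sizes and a multiplier of 1023; the same function gives 127 for 64 to 4096 and 255 for 32 to 4096.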

Take the sizes and construct a directed graph with a vertex at every 64th sample and 10 edges leaving the sample connecting it to the vertex for the sample 64,128,256,etc. away. Assign the coding cost for the block at each of the sizes to each of the edges. Now run the Dijkstra shortest path algorithm from the first to last or last to first vertex. The result will be the globally optimal frame size selection given the available block sizes.

Either re-encode or, if you wasted a lot of RAM saving the results of the first pass, reassemble the final stream.

Limiting yourself to powers of two in the FLAC subset over the range 64-4096 would be 127x the number of samples processed; 32-4096 would be 255x. The cuda implementation might be able to maintain almost decent speeds while doing this extra work. ;) This isn't limited to power of two sizes, but you probably want to arrange it so that your smallest size is a common factor of all the sizes you use.
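A minimal Python sketch of the graph search described above, with some liberties taken. Because every edge points forward in the file, the sketch uses a single forward dynamic-programming pass instead of Dijkstra; on this kind of graph it finds the same shortest path. Positions and sizes are measured in units of the smallest block size, and `cost(start, size)` is a hypothetical callback standing in for "encode this block and return its size in bytes". The toy cost function is invented purely to have something to run.

```python
def optimal_block_sizes(total_units, sizes_in_units, cost):
    """Cheapest way to tile `total_units` with blocks from `sizes_in_units`.

    sizes_in_units must include 1 so that every position is reachable.
    Returns (total cost, list of block sizes in stream order).
    """
    inf = float("inf")
    best = [inf] * (total_units + 1)    # best[v]: cheapest encoding of units [0, v)
    prev = [None] * (total_units + 1)   # prev[v]: size of the last block ending at v
    best[0] = 0
    for start in range(total_units):
        if best[start] == inf:
            continue
        for size in sizes_in_units:
            end = start + size
            if end > total_units:
                continue                # block would run past the end of the stream
            candidate = best[start] + cost(start, size)
            if candidate < best[end]:
                best[end] = candidate
                prev[end] = size
    # Walk back from the end to recover the chosen block sizes.
    blocks = []
    pos = total_units
    while pos > 0:
        blocks.append(prev[pos])
        pos -= prev[pos]
    blocks.reverse()
    return best[total_units], blocks


def toy_cost(start, size):
    """Invented cost: 4 bytes of overhead per block, 1 byte per unit,
    and a 20-byte penalty if the block straddles a signal change at unit 8."""
    penalty = 20 if start < 8 < start + size else 0
    return 4 + size + penalty


print(optimal_block_sizes(12, [1, 2, 4, 8], toy_cost))  # (20, [8, 4])
```

In a real encoder the cost callback is where all the time goes, which is exactly the 127x to 1023x extra work estimated above; the path search itself is cheap by comparison.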


This post has been edited by NullC: Jun 4 2010, 07:12
Brat2007
post Jul 29 2010, 19:36
Post #164





Group: Members
Posts: 1
Joined: 29-July 10
Member No.: 82629



QUOTE (Gregory S. Chudov @ Sep 18 2009, 19:05) *
Is there anybody here who knows the math behind Cholesky decomposition used in ffmpeg as an alternative method of LPC coefficients search?
This method is too slow for CPU, but i thought i'd give it a shot on GPU.
The problem is, GPU doesn't do double precision very well.



Gregory,

maybe you can find background info here : http://www.cise.ufl.edu/research/sparse/ch...OLMOD/Cholesky/
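For readers who just want the gist of the method mentioned in the quote, here is a rough Python sketch: build the least-squares normal equations for a linear predictor and solve them with a Cholesky factorization. This is a textbook illustration under my own assumptions, not ffmpeg's or FlaCuda's actual code; all function names are made up, and it uses ordinary double-precision floats.

```python
import math


def cholesky(a):
    """Lower-triangular L with L * L^T == a, for symmetric positive definite a."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def solve_cholesky(a, b):
    """Solve a * x = b via forward and back substitution on the Cholesky factor."""
    l = cholesky(a)
    n = len(b)
    y = [0.0] * n
    for i in range(n):                      # forward: L y = b
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):            # backward: L^T x = y
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


def lpc_least_squares(samples, order):
    """Coefficients c minimising sum over t of
    (samples[t] - sum_i c[i] * samples[t-1-i])^2."""
    n = len(samples)
    a = [[sum(samples[t - 1 - i] * samples[t - 1 - j] for t in range(order, n))
          for j in range(order)] for i in range(order)]
    b = [sum(samples[t] * samples[t - 1 - i] for t in range(order, n))
         for i in range(order)]
    return solve_cholesky(a, b)


# Toy signal that follows x[t] = 1.5*x[t-1] - 0.7*x[t-2] exactly
signal = [1.0, 0.5]
for _ in range(28):
    signal.append(1.5 * signal[-1] - 0.7 * signal[-2])

print([round(c, 6) for c in lpc_least_squares(signal, 2)])
```

On the toy signal the recovered coefficients should come out very close to 1.5 and -0.7. The square roots and divisions are where limited precision hurts, which is the concern raised in the quote about GPUs and double precision.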
Mataglap
post Jul 30 2010, 02:24
Post #165





Group: Members
Posts: 18
Joined: 24-December 02
Member No.: 4222



Man, that's some fast encoding! Nice work!

Is there any chance that tag writing will be added to the binary, so that it can be used with EAC?

Wombat
post Jul 30 2010, 13:12
Post #166





Group: Members
Posts: 1036
Joined: 7-October 01
Member No.: 235



QUOTE (Mataglap @ Jul 30 2010, 03:24) *
Man, that's some fast encoding! Nice work!

Is there any chance that tag writing will be added to the binary, so that it can be used with EAC?


You already can with metaflac. I gave an example here: "flacuda.exe & metaflac.exe in EAC"
SCOTU
post Jul 30 2010, 17:31
Post #167





Group: Members
Posts: 118
Joined: 9-July 10
Member No.: 82156



Has anyone tested running this multiple times in parallel? Being able to run single-threaded encodes of several files at once is pretty amazing, so I wondered whether one instance uses up the whole GPU or whether several could also run in parallel. I don't run into a hard drive bottleneck as easily as most, since I use a RAID 0 configuration of high-end desktop hard drives.
alvaro84
post Jul 30 2010, 19:58
Post #168





Group: Members
Posts: 128
Joined: 9-August 06
Member No.: 33830



My 8600GT was fully used by one instance of the CUDA encoder; more threads gave no advantage. On top of that, it can become I/O limited very quickly (I tried it with the source on a different HDD than the target).
Mataglap
post Jul 30 2010, 20:46
Post #169





Group: Members
Posts: 18
Joined: 24-December 02
Member No.: 4222



QUOTE (Wombat @ Jul 30 2010, 05:12) *
QUOTE (Mataglap @ Jul 30 2010, 03:24) *
Man, that's some fast encoding! Nice work!

Is there any chance that tag writing will be added to the binary, so that it can be used with EAC?


You already can with metaflac. I gave an example here: "flacuda.exe & metaflac.exe in EAC"


Yep, you did. :) Clever! Thanks.
modernartistry
post Aug 3 2010, 01:39
Post #170





Group: Members
Posts: 12
Joined: 29-May 10
Member No.: 80969



Is there any further development on FlaCuda? Is the current version still 0.91?

Another test:

FlaCuda 091 + Foobar 1.03
Music: Edenbridge - Solitaire / symphonic metal album in a single WAV file, length 57:25
Hardware: Intel Dual Core E8400 / Nvidia GT8800 (with newer 92b core)

FLAC 1.2.1 level 8 (2 threads)
Total encoding time: 1:13.711, 46.74x realtime

FlaCuda -8 - -o %d --verify
Total encoding time: 0:31.216, 110.37x realtime

FlaCuda -8 --cpu-threads 2 - -o %d --verify
Total encoding time: 0:24.492, 140.67x realtime

FlaCuda is a good choice for a dual-core system. As Bad Monkey wrote above, a quad core may be faster than the GPU.

This post has been edited by modernartistry: Aug 3 2010, 02:17
Bad Monkey
post Aug 3 2010, 07:11
Post #171





Group: Members
Posts: 90
Joined: 22-August 07
Member No.: 46407



I have an 8800GT too but I was unable to get much above 70x, per my post above, with FlaCuda. Your results would beat my Q6600's benchmark of 125x. Am I missing something?
modernartistry
post Aug 3 2010, 20:10
Post #172





Group: Members
Posts: 12
Joined: 29-May 10
Member No.: 80969



QUOTE (Bad Monkey @ Aug 3 2010, 08:11) *
I have an 8800GT too but I was unable to get much above 70x, per my post above, with FlaCuda. Your results would beat my Q6600's benchmark of 125x. Am I missing something?


Hm. As I wrote, I have the newer version of the 8800GT that came out in February 2008, with 512MB RAM instead of 378MB. This GPU core (G92) was the fastest out there, followed a year later by the G200b core, which is nearly the same.
Maybe you have an older model or older drivers? I used the same settings as you. Maybe your hard drive is too slow?

My GPU card spec:
512 MByte GDDR3
65 nm
Stream processors: 112
RAM bus width: 256-bit
Core frequency: 600 MHz
Shader frequency: 1500 MHz
RAM frequency: 900 MHz

This post has been edited by modernartistry: Aug 3 2010, 20:21
Bad Monkey
post Aug 4 2010, 07:10
Post #173





Group: Members
Posts: 90
Joined: 22-August 07
Member No.: 46407



Yeah I have 512 MB but core clock is only 450 MHz / VRAM 700 MHz. Okay.

Am going to upgrade to a GTX 460 sometime soon. So that'll be interesting. Haha.
Go to the top of the page
+Quote Post
alvaro84
post Aug 4 2010, 07:23
Post #174





Group: Members
Posts: 128
Joined: 9-August 06
Member No.: 33830



There must be some other limit, because this 70x matches my results with an 8600GT and an early version of flaCUDA. Any 8800GT should be much faster than that.
HDD speed, perhaps? Those are mechanical and thus seriously limited when they have to serve several read/write threads at once (they have to move their heads back and forth). Whenever I tested any encoder I used a different HDD for the destination and never used more than 2 threads (more wouldn't even benefit my Core 2 Duo, to begin with :D).
I'm planning on getting an SSD in a few months (for system and some temp area) so I'll test 2-thread encoding again.

Edit: I forgot that I'm also planning on replacing my video card with a Radeon. Well, so much for CUDA...

This post has been edited by alvaro84: Aug 4 2010, 07:27
Bad Monkey
post Aug 4 2010, 07:58
Post #175





Group: Members
Posts: 90
Joined: 22-August 07
Member No.: 46407



If there is another limit clearly it would have to be something not shared with the CPU [turning in faster results @ 125x], which is obviously not the case with a HDD restriction. In any case the FLAC result above is only 260 MB.
