FLACCL: CUDA-enabled FLAC encoder by Gregory S. Chudov (prev. FlaCuda), Formerly "lossless codecs and CUDA"
SCOTU
post Nov 11 2010, 18:04
Post #226





Group: Members
Posts: 118
Joined: 9-July 10
Member No.: 82156



377MB in 7 seconds is easily within the realm of HDD speed. A good drive like an F3 should be able to sustain over 100MB/s, so it's obviously not limited by hard drive speed.
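The arithmetic behind this is simple: the sustained rate needed is the file size divided by the encoding time. This is a minimal sketch. It assumes decimal megabytes and uses the 377MB / 7 s figures from the post. The 100MB/s drive figure is the poster's estimate, not a measurement.

```python
def required_rate_mb_s(size_mb: float, seconds: float) -> float:
    """Sustained throughput (MB/s) needed to move size_mb in the given time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return size_mb / seconds


rate = required_rate_mb_s(377, 7)   # figures from the post above
drive_rate = 100                    # poster's estimate for a good HDD such as an F3
print(f"needed: {rate:.1f} MB/s, headroom: {drive_rate / rate:.1f}x")
```

This counts only one direction of traffic. Reading the WAV and writing the FLAC on the same drive adds seeks and extra traffic on top of it.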
Wombat
post Nov 11 2010, 18:45
Post #227





Group: Members
Posts: 986
Joined: 7-October 01
Member No.: 235



Yes, most likely. At these speeds the whole I/O system has to do some work, so HDD speed is only one factor. For example, reading and writing the file at the same time on the same HDD is something completely different from just writing. As mentioned earlier, on my HDD, an older 500GB WD, I get hiccups that worsen the overall encoding speed.
Using compression -0 makes things much faster, up to 800x, so my SSD isn't the limit.
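One way to see why 800x at -0 rules out the drive is to convert the realtime factor into a byte rate. This is a rough sketch. It assumes the input is 16-bit stereo 44.1 kHz CD audio, and the writes of the compressed output come on top of the figure it computes.

```python
# Red Book CD audio: 44.1 kHz, 2 channels, 16-bit (2-byte) samples
CD_BYTES_PER_SECOND = 44_100 * 2 * 2  # 176,400 bytes of PCM per second of audio


def pcm_read_rate_mb_s(speed_x: float, bytes_per_second: int = CD_BYTES_PER_SECOND) -> float:
    """Decimal MB/s of uncompressed PCM an encoder consumes at speed_x times realtime."""
    return speed_x * bytes_per_second / 1_000_000


print(f"{pcm_read_rate_mb_s(800):.1f} MB/s")  # 800x figure from the post
```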

One other thing I noticed: FLACCL seems to stress my GPU more than FlaCuda. EVGA Precision shows a GPU load of ~75% for CUDA and up to 90% for FLACCL. It also produces higher temperatures. Or is it not reaching 100% because it is already faster than the data can be delivered?

Gregory S. Chudo...
post Nov 11 2010, 19:00
Post #228





Group: Developer
Posts: 690
Joined: 2-October 08
From: Ottawa
Member No.: 59035



One possible bottleneck is the PCI Express transfer speed.

For example, I've read many reports on AMD forums that AMD GPU drivers fail to provide decent DMA transfer speeds on certain (i.e. non-AMD) chipsets.

NVIDIA drivers usually don't have this problem, and recent motherboards with PCI Express 2.0 x16 can do up to 6Gbit per second (which should be enough for up to ~1600x encoding speed), but older PCI Express 1.0 can be capped at ~800x encoding speed. If you have some kind of CrossFire/SLI configuration, or another adapter (SATA 6Gbit, USB 3.0, SSD) using a second PCI Express slot, speeds can drop drastically.
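The post does not say how much data crosses the bus per second of audio. The sketch below therefore leaves that as a hypothetical overhead factor: for instance, if samples were widened to 32 bits before upload, the factor would be at least 2. It only shows how a link rate, here the 6 Gbit/s figure from the post, translates into a ceiling on realtime speed. It is not meant to reproduce the ~1600x and ~800x figures above.

```python
CD_BYTES_PER_SECOND = 44_100 * 2 * 2  # 16-bit stereo at 44.1 kHz


def max_speed_x(link_gbit_s: float, bus_bytes_per_pcm_byte: float) -> float:
    """Upper bound on realtime speed imposed by the bus alone.

    bus_bytes_per_pcm_byte is a hypothetical overhead factor: how many bytes
    cross PCIe (uploads plus readbacks) per byte of input PCM.
    """
    link_bytes_s = link_gbit_s * 1e9 / 8
    return link_bytes_s / (CD_BYTES_PER_SECOND * bus_bytes_per_pcm_byte)


for factor in (1.0, 2.0, 4.0):
    print(f"overhead x{factor}: ~{max_speed_x(6, factor):.0f}x realtime")
```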


--------------------
CUETools 2.1.4
viktor
post Nov 11 2010, 19:51
Post #229





Group: Members
Posts: 297
Joined: 17-November 06
Member No.: 37682



QUOTE (Metroid @ Nov 11 2010, 00:17)
QUOTE (Wombat @ Nov 10 2010, 21:48)
Interesting. Having such close numbers for SSD and HDD makes me wonder. Do you have Win7 with some kind of ReadyBoost kicking in?


I'm using Windows 7 Professional 64-bit, but all HDD features are disabled. The Intel SSD Toolbox did the trick, which is one way of disabling them all; the other way is manually via the registry, but I bet you know all this. Even the pagefile is disabled, which makes me wonder if that was the case.


The pagefile shouldn't be disabled on an SSD.

Read:

http://blogs.msdn.com/b/e7/archive/2009/05...drives-and.aspx

(search for pagefile)
Metroid
post Nov 11 2010, 21:21
Post #230





Group: Members
Posts: 8
Joined: 6-November 10
Member No.: 85469



QUOTE (viktor @ Nov 11 2010, 18:51)
The pagefile shouldn't be disabled on an SSD.

read:

http://blogs.msdn.com/b/e7/archive/2009/05...drives-and.aspx

(search for pagefile)


I read that a year ago. It depends on how much RAM you have anyway.

This post has been edited by Metroid: Nov 11 2010, 21:23
Kykc
post Nov 12 2010, 13:10
Post #231





Group: Members
Posts: 2
Joined: 1-December 08
Member No.: 63601



QUOTE (Gregory S. Chudov @ Nov 9 2010, 15:50)
FLACCL 0.2: [attachment: flaccl02.rar]

Thanks! Works perfectly on my GTX 470. Any chance this encoder will support formats other than 16 bits per sample?
CODE
Unhandled Exception: System.Exception: Bits per sample must be 16.
   at CUETools.Codecs.FLACCL.FLACCLWriter..ctor(String path, Stream IO, AudioPCMConfig pcm)
   at CUETools.FLACCL.cmd.Program.Main(String[] args)
alvaro84
post Nov 16 2010, 14:20
Post #232





Group: Members
Posts: 128
Joined: 9-August 06
Member No.: 33830



I'm back after replacing my video card with a Radeon HD 5670. I've tried your OpenCL encoder too, and... I can't tell you exact speeds (they vary wildly), but it looks at least twice as fast as the stock CPU encoder on my Core 2 Duo (Conroe) at 3.1GHz, and it produces slightly smaller output.
At least it isn't worse than my 8600GT + CUDA encoder used to be, which I unfortunately can't re-test (or rather don't want the hassle of re-testing).
I'll search back through the topic, as I haven't copied my last test results to my new drives yet (this is why I should have stored them on my pendrive, as I do with many things).
Anyway, this time the result shouldn't be limited by I/O; it's been tested on my new SSD.

Edit: it seems it's faster. Thank goodness, replacing an HDD with an SSD and a video card with a faster one actually led to some improvement. Back then I wrote that it was faster than CPU encoding on 2 threads (and I mentioned speeds of ~70x at 3.33GHz). Now it's twice as fast.

Is this encoder compatible with an onboard HD3200, or does it need something newer?

This post has been edited by alvaro84: Nov 16 2010, 14:39
Case
post Nov 19 2010, 18:16
Post #233





Group: Developer (Donating)
Posts: 2182
Joined: 19-October 01
From: Finland
Member No.: 322



Just got a GTX 580 and had to test encoding speeds. FlacCL 0.2 seems to lose to the old GTX 285 at compression levels above 8 but wins at the others. Once again the --cpu-threads 2 setting was fastest for me, but this card got a bit slower with the --slow-gpu setting.

Attached Image
viktor
post Nov 19 2010, 18:26
Post #234





Group: Members
Posts: 297
Joined: 17-November 06
Member No.: 37682



Reporting that I got further with the 10.11 driver, but it still fails:



Maybe it'll get even further with 10.12...

This post has been edited by viktor: Nov 19 2010, 18:27
Wombat
post Nov 19 2010, 19:10
Post #235





Group: Members
Posts: 986
Joined: 7-October 01
Member No.: 235



QUOTE (Case @ Nov 19 2010, 19:16)
Just got a GTX 580 and had to test encoding speeds. FlacCL 0.2 seems to lose to the old GTX 285 at compression levels above 8 but wins at the others. Once again the --cpu-threads 2 setting was fastest for me, but this card got a bit slower with the --slow-gpu setting.

Attached Image


Funny to see 1400x speed, holy sh...

Btw, I found some files FLACCL fails to encode. If anyone else finds some: Gregory has fixed it and will hopefully release another version soon.
Gregory S. Chudo...
post Nov 25 2010, 22:51
Post #236





Group: Developer
Posts: 690
Joined: 2-October 08
From: Ottawa
Member No.: 59035



QUOTE (Kykc @ Nov 12 2010, 16:10)
Thanks! Works perfectly on my GTX 470. Any chance this encoder will support formats other than 16 bits per sample?

Maybe, if I figure out how to do 64-bit arithmetic efficiently on the GPU.
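A worst-case bound on the LPC prediction sum shows why higher bit depths push toward 64-bit arithmetic. This is a sketch under stated assumptions: the coefficient precision of 12 bits and order 8 are illustrative values, not FLACCL's actual settings. libFLAC, for example, has separate 32-bit and 64-bit ("wide") residual routines for the same reason.

```python
def accumulator_bits(sample_bits: int, coef_precision: int, order: int) -> int:
    """Conservative bit width for an LPC prediction sum.

    Each product of a signed sample (sample_bits) and a signed quantized
    coefficient (coef_precision bits) fits in sample_bits + coef_precision
    bits; summing `order` products adds ceil(log2(order)) more bits.
    """
    if order < 1:
        raise ValueError("order must be >= 1")
    return sample_bits + coef_precision + (order - 1).bit_length()


def fits_int32(sample_bits: int, coef_precision: int, order: int) -> bool:
    """True if the conservative bound fits a 32-bit signed accumulator."""
    return accumulator_bits(sample_bits, coef_precision, order) <= 32


print(fits_int32(16, 12, 8))   # 16-bit input, illustrative settings
print(fits_int32(24, 12, 8))   # 24-bit input, same settings
```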

QUOTE (alvaro84 @ Nov 16 2010, 17:20)
Is this encoder compatible with an onboard HD3200, or does it need something newer?

Unfortunately, HD3xxx and HD4xxx do not seem to support OpenCL properly.

QUOTE (Case @ Nov 19 2010, 21:16)
Just got a GTX 580 and had to test encoding speeds.

Thanks a lot! Your graphs are very helpful as always.

QUOTE (viktor @ Nov 19 2010, 21:26)
Reporting that I got further with the 10.11 driver, but it still fails:

I'm afraid I give up on HD4xxx... It was bad enough that it doesn't support atomics, but it doesn't seem to support barrier synchronization properly either, and I can't think of a way to do without them.
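This is a generic illustration, not FLACCL's code. It shows a work-group tree reduction, a typical GPU pattern for summing per-item results without atomics, simulated in Python. `threading.Barrier` stands in for OpenCL's `barrier(CLK_LOCAL_MEM_FENCE)`, and a shared list stands in for local memory. The example shows why this pattern depends on barriers: each step reads values that other work-items wrote in the previous step.

```python
import threading


def workgroup_sum(values: list[int]) -> int:
    """Simulate a work-group tree reduction with one thread per work-item.

    Each step halves the number of active items; the barrier guarantees every
    write from one step is visible before any item reads in the next step.
    Without it, an item could read a partner's slot before it was updated.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    data = list(values)              # stands in for OpenCL __local memory
    barrier = threading.Barrier(n)   # stands in for barrier(CLK_LOCAL_MEM_FENCE)

    def work_item(lid: int) -> None:
        stride = n // 2
        while stride > 0:
            if lid < stride:
                data[lid] += data[lid + stride]
            barrier.wait()           # every item must reach this, active or not
            stride //= 2

    threads = [threading.Thread(target=work_item, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return data[0]


print(workgroup_sum(list(range(16))))
```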

QUOTE (Wombat @ Nov 19 2010, 22:10)
Funny to see 1400x speed, holy sh...

Let's hope it can do even better.


Gregory S. Chudo...
post Nov 25 2010, 23:01
Post #237





Group: Developer
Posts: 690
Joined: 2-October 08
From: Ottawa
Member No.: 59035



Attached File: flaccl03.rar (114.43K)

Supported devices:

1) NVIDIA GeForce 4XX (Fermi) and older GeForce 200 series GPUs
Requires recent drivers (e.g. http://www.nvidia.com/object/win7-winvista...hql-driver.html)
2) ATI Radeon HD 5XXX
Requires recent drivers (e.g. http://sites.amd.com/us/game/downloads/Pag...on_win7-64.aspx)
Be sure to download "AMD Catalyst™ Accelerated Parallel Processing (APP) Technology Edition", not "Catalyst Software Suite (64 bit) English Only". It contains both the display drivers and OpenCL.
Option to select the OpenCL platform if you have both NVIDIA and AMD installed on a single computer: --opencl-platform "ATI Stream"

3) Multicore CPU
Requires "AMD Catalyst™ Accelerated Parallel Processing (APP) Technology Edition" or "Intel OpenCL SDK" (alpha version, 32-bit systems only) http://software.intel.com/en-us/articles/intel-opencl-sdk/
Option to use CPU encoding: --opencl-type cpu --opencl-platform "Intel OpenCL" (or "ATI Stream")

New in this version: the experimental option --fast-gpu forces the encoder to do even more work on the GPU. This can slow things down, but it can be a bit faster if you are limited by PCIe transfer speeds in lower compression modes. It can also be effective if you don't want to give additional CPU threads to the encoder with --cpu-threads, or if you use --verify. You should use --verify if you use --fast-gpu, because it's experimental and might corrupt your data.
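Verification means decoding the output and checking that it matches the input bit for bit. This is a generic sketch, not how FLACCL implements --verify. It compares MD5 hashes of interleaved signed 16-bit samples packed little-endian. FLAC's STREAMINFO block stores an MD5 of the unencoded audio in this form, which is also how a finished file can be checked later.

```python
import hashlib
import struct


def pcm_md5(samples: list[int]) -> str:
    """MD5 of interleaved signed 16-bit samples packed little-endian."""
    return hashlib.md5(struct.pack(f"<{len(samples)}h", *samples)).hexdigest()


def verify_roundtrip(original: list[int], decoded: list[int]) -> bool:
    """True if decoding reproduced the input exactly."""
    return pcm_md5(original) == pcm_md5(decoded)


original = [0, 1, -1, 32767, -32768, 42]   # toy interleaved stereo samples
corrupted = original[:-1] + [43]           # one sample off by one
print(verify_roundtrip(original, list(original)))
print(verify_roundtrip(original, corrupted))
```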

This version processes 32 frames at a time (the previous one did 16) to better utilize high-end GPUs. However, this can make a slower GPU a bit unresponsive during encoding, in which case you can use the option --task-size 16.
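The trade-off depends on how much audio each GPU task carries. Bigger tasks keep a fast GPU busy, but each one presumably occupies the GPU longer before the display can get a turn. This is a rough sketch. It assumes a 4096-sample block size and 44.1 kHz audio; neither value is stated in the post.

```python
def task_seconds(frames_per_task: int, block_size: int, sample_rate: int) -> float:
    """Seconds of audio covered by one GPU task of frames_per_task frames."""
    return frames_per_task * block_size / sample_rate


# block_size=4096 is an assumed value; FLACCL's real block size depends on settings
for frames in (32, 16):  # new default vs. --task-size 16
    print(frames, round(task_seconds(frames, 4096, 44_100), 2))
```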

This post has been edited by Gregory S. Chudov: Nov 25 2010, 23:03


Wombat
post Nov 26 2010, 01:37
Post #238





Group: Members
Posts: 986
Joined: 7-October 01
Member No.: 235



I did encode a bunch of files. The ones that caused an error before encode without a problem now!

-8 with --verify and 2 threads on 965P/GTX260/Q9550 doesn't run much faster with "--fast-gpu" enabled. It seems my GTX260 is on the slow side by now. Strangely, this switch uses about the same amount of video memory as without it. I expected a change here.

Thanks for the new version, good to know it works on Fermi now.

viktor
post Nov 26 2010, 08:32
Post #239





Group: Members
Posts: 297
Joined: 17-November 06
Member No.: 37682



I'll keep testing this with each new Catalyst release; maybe it's just a driver problem. Or have you been told the HD4xxx is unable to do this in hardware?
Gregory S. Chudo...
post Nov 26 2010, 08:50
Post #240





Group: Developer
Posts: 690
Joined: 2-October 08
From: Ottawa
Member No.: 59035



This version can work without atomics, which are not supported by HD4xxx hardware. The rest of the problem might in theory be driver-related, but I don't think AMD would fix it even if they could. They are more interested in selling new cards than in extending the life of old ones.


viktor
post Nov 26 2010, 10:51
Post #241





Group: Members
Posts: 297
Joined: 17-November 06
Member No.: 37682



QUOTE (Gregory S. Chudov @ Nov 26 2010, 09:50)
This version can work without atomics, which are not supported by HD4xxx hardware. The rest of the problem might in theory be driver-related, but I don't think AMD would fix it even if they could. They are more interested in selling new cards than in extending the life of old ones.


Ah, I see the point there.
Case
post Nov 26 2010, 18:17
Post #242





Group: Developer (Donating)
Posts: 2182
Joined: 19-October 01
From: Finland
Member No.: 322



Seems to be somewhat faster than the previous version. On my system the '--fast-gpu' switch is faster up to compression mode -7; -8 and higher are faster without it. Combining '--fast-gpu' with any '--cpu-threads' setting slows things down regardless of compression mode. This time '--cpu-threads 4' was faster than the usual '--cpu-threads 2'. Also, for some reason, compression speed results varied from run to run a bit more than usual, so I ran each test 3-6 times and picked the fastest results for the stats.
The highest compression modes are so close to each other in performance that the graph gets unclear there. CL finally beats the old CUDA results in speed for me, but file size is a bit larger.
Attached Image
Attached Image


Edit: added benchmark results for Radeon HD 5870 with the new FLACCL v0.3 encoder.
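The "run several times, keep the fastest" method described above can be scripted. This is a minimal sketch. `run` is a hypothetical callable that performs one encode (for example, a wrapper around the encoder's command line), and `audio_seconds` is the total duration of the test set.

```python
import time
from typing import Callable


def best_speed_x(run: Callable[[], None], audio_seconds: float, repeats: int = 3) -> float:
    """Run an encode `repeats` times and report the fastest as x-realtime."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        best = min(best, time.perf_counter() - start)
    if best <= 0:
        raise ValueError("run finished too quickly to time")
    return audio_seconds / best


# Stand-in workload: a 10 ms sleep "encoding" 1 second of audio (roughly 100x)
print(round(best_speed_x(lambda: time.sleep(0.01), audio_seconds=1.0)))
```

Taking the minimum filters out interference from other processes, which suits run-to-run variation like that described above. The median is an alternative if you want typical rather than best-case speed.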

This post has been edited by Case: Nov 28 2010, 09:25
Wombat
post Nov 26 2010, 19:06
Post #243





Group: Members
Posts: 986
Joined: 7-October 01
Member No.: 235



QUOTE (Case @ Nov 26 2010, 19:17)
CL finally beats the old CUDA results in speed for me, but file size is a bit larger.


I wonder if it's possible to squeeze out a few more KB, or at least match FlaCuda's compression? At these speeds I'd trade some speed for better compression.
alvaro84
post Nov 29 2010, 17:40
Post #244





Group: Members
Posts: 128
Joined: 9-August 06
Member No.: 33830



Ugh... it's almost scary. I've seen speeds above 200x mid-file during conversion... Decoding from TAK itself wasn't much faster than 400x, so it became the bottleneck when encoding to FLAC.

And it's a stock-clocked Radeon 5670 with passive cooling...
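Whether a 400x decoder caps the whole conversion depends on whether decoding and encoding overlap. Below is a sketch of the two textbook cases. The post does not say which one applies to the converter used here. If the stages ran strictly one after the other, a 400x decoder paired with an equally fast encoder would already halve the overall speed.

```python
def sequential_speed(decode_x: float, encode_x: float) -> float:
    """Overall speed when each chunk is decoded, then encoded, one after the other."""
    return 1 / (1 / decode_x + 1 / encode_x)


def pipelined_speed(decode_x: float, encode_x: float) -> float:
    """Overall speed when decoding and encoding overlap: the slower stage wins."""
    return min(decode_x, encode_x)


print(round(sequential_speed(400, 400), 1))
print(pipelined_speed(400, 1000))
```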

This post has been edited by alvaro84: Nov 29 2010, 17:49
Nowings69
post Jan 29 2011, 13:33
Post #245





Group: Members
Posts: 95
Joined: 22-December 09
From: nicyoume
Member No.: 76223



There is no official logo yet (I don't have a Fermi-series GPU either),
so here is a simple one:
FLACCL

FlaCuda



Wombat
post Feb 23 2011, 01:39
Post #246





Group: Members
Posts: 986
Joined: 7-October 01
Member No.: 235



Just curious: did anyone get the CUDA version running on a Fermi GPU somehow?
Wombat
post Mar 7 2011, 16:20
Post #247





Group: Members
Posts: 986
Joined: 7-October 01
Member No.: 235



Just a tiny quirk in FLACCL#0.4, which came with CUETools 2.11: when using it with a group size of 256, I get "Error: size reported incorrectly" and it crashes.
motion_blur
post Mar 8 2011, 23:18
Post #248





Group: Members
Posts: 13
Joined: 8-March 11
Member No.: 88816



The bitrates are not getting any better.
My test sample: Opeth - For Absent Friends. The sample is very "basic", just two guitars.

FLAC 1.2.1 -8 472 kbps
libFlake#0.1 -11 457 kbps
FlaCuda#.91 -11 460 kbps
FLACCL#0.3 -11 461 kbps
FLACCL#0.4 -11 462 kbps

This post has been edited by motion_blur: Mar 8 2011, 23:22
Wombat
post Mar 8 2011, 23:32
Post #249





Group: Members
Posts: 986
Joined: 7-October 01
Member No.: 235



QUOTE (motion_blur @ Mar 8 2011, 23:18)
It is not getting better with the bitrates.
My test sample: Opeth - For Absent Friends. The sample is very "basic", just two guitars.

FLAC 1.2.1 -8 472 kbps
libFlake#0.1 -11 457 kbps
FlaCuda#.91 -11 460 kbps
FLACCL#0.3 -11 461 kbps
FLACCL#0.4 -11 462 kbps


What do you want to show with that? Encode a few hundred files and report again.
Wombat
post Mar 9 2011, 01:16
Post #250





Group: Members
Posts: 986
Joined: 7-October 01
Member No.: 235



Here is a quick and dirty comparison I did: 414 files of mixed music, -8.

FlaCuda#0.91
9.25GB (9 934 589 919 bytes), 739kbps
FlacCL#0.3 Groupsize 256
9.25GB (9 936 741 894 bytes), 739kbps
FlacCL#0.4
9.25GB (9 936 310 351 bytes), 739kbps
Flake#0.1
9.25GB (9 941 925 617 bytes), 739kbps

So a ~2MB difference over 9GB of music is not really a degradation for the OpenCL port.
Seeing the GPU encoder speeds next to Flake#0.1 is still a funny eye-opener.
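The "~2MB" can be put in proportion by computing the differences directly from the byte counts listed above. The data comes from the post; only the arithmetic is added.

```python
# Total output sizes in bytes, copied from the comparison above (414 files, -8)
sizes = {
    "FlaCuda 0.91": 9_934_589_919,
    "FlacCL 0.3 (groupsize 256)": 9_936_741_894,
    "FlacCL 0.4": 9_936_310_351,
    "Flake 0.1": 9_941_925_617,
}

baseline = sizes["FlaCuda 0.91"]
for name, size in sizes.items():
    diff = size - baseline
    print(f"{name:28} +{diff / 1e6:5.2f} MB ({diff / baseline:.4%})")
```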

Edit: I never liked the idea that Flake and similar encoders can produce non-standard files, so -11 is a setting I'll never touch. Better to use a different codec then.
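The post does not spell out "non-standard". Presumably it refers to streams outside FLAC's "streamable subset", which decoders, especially hardware players, are not guaranteed to handle. Below is a partial check of some subset limits from the FLAC format specification: block size, LPC order and Rice partition order only. The settings passed in are illustrative, not the actual parameters of any -11 preset.

```python
def is_flac_subset(sample_rate: int, block_size: int, lpc_order: int,
                   rice_partition_order: int) -> bool:
    """Check a few of the FLAC 'streamable subset' limits (not all of them)."""
    if block_size > 16384 or rice_partition_order > 8:
        return False
    # Stricter limits apply at sample rates of 48 kHz and below
    if sample_rate <= 48_000 and (block_size > 4608 or lpc_order > 12):
        return False
    return True


print(is_flac_subset(44_100, 4096, 8, 6))    # typical CD settings
print(is_flac_subset(44_100, 4096, 32, 8))   # LPC order beyond the subset limit
```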

This post has been edited by Wombat: Mar 9 2011, 01:24
