aoTuVbeta6.02
john33
post May 5 2011, 11:06
Post #26


xcLame and OggDropXPd Developer


Group: Developer
Posts: 3760
Joined: 30-September 01
From: Bracknell, UK
Member No.: 111



QUOTE (lvqcl @ May 1 2011, 22:05) *
In the meantime you can test my compile.
It doesn't have a built-in FLAC reader or resampler, but since you use oggenc2 as the encoding backend for foobar2000, they would be useless anyway.

What compiler optimisations are you using?


--------------------
John
----------------------------------------------------------------
My compiles and utilities are at http://www.rarewares.org/
Anakunda
post May 5 2011, 11:49
Post #27





Group: Members
Posts: 450
Joined: 24-November 08
Member No.: 63072



I compared the speed of my test oggenc compile (posted earlier) with the Rarewares one, and the speeds are on par. I used Intel C++ with maximum optimizations. I'd also be curious about the LancerMod compile parameters; maybe reduced floating-point precision? Still, I wouldn't expect that much of a speedup from that alone.

This post has been edited by Anakunda: May 5 2011, 11:51
Steve Forte Rio
post May 5 2011, 13:02
Post #28





Group: Members
Posts: 441
Joined: 4-October 08
From: Ukraine
Member No.: 59301



Quick comparison (single thread):

john33 (x64) - 45.15x realtime

lvqcl (x64, SSE3) - 60.50x realtime

Windows 7 x64, Intel Core i3 530
lvqcl
post May 5 2011, 17:59
Post #29





Group: Developer
Posts: 3325
Joined: 2-December 07
Member No.: 49183



Some SSE optimizations from the Lancer code (for aoTuV 5) are still applicable to aoTuV 6, but the speed increase is only 25-30%.

My tests (Core2 Q9300 @2.5 GHz):
CODE
venc: 20.9x realtime

Rarewares compiles:
generic: 21.2x
P4: 34.5x
x64: 37.1x

My compiles of oggenc2 without code from Lancer:
32-bit: 34.2x
x64: 36.8x
(almost the same as oggenc2 from Rarewares)

My compiles of oggenc2 with code from Lancer (these were uploaded):
32-bit SSE: 38.1x
32-bit SSE2: 46.1x
32-bit SSE3: 46.0x

64-bit SSE2: 47.8x
64-bit SSE3: 48.9x

I DIDN'T test these compiles on AMD processors.


QUOTE (john33 @ May 5 2011, 14:06) *
What compiler optimisations are you using?

Compiler: MSVS 2010 SP1 + Intel Composer XE 2011 upd3.
Options:
Whole program optimization = Yes
C/C++ optimization: /O3 /Ob2 /Oi /Ot /Qipo
Code Generation: /GF /MT /GS- /arch:SSE3 /fp:fast=2

Since your compiles are a bit faster (well, less than 1%, but anyway), may I ask about your compiler options?
john33
post May 5 2011, 18:45
Post #30





QUOTE (lvqcl @ May 5 2011, 17:59) *
Some SSE optimizations from the Lancer code (for aoTuV 5) are still applicable to aoTuV 6, but the speed increase is only 25-30%.
...

I didn't realise that you had ported some of the Lancer mods.
QUOTE (lvqcl @ May 5 2011, 17:59) *
Compiler: MSVS 2010 SP1 + Intel Composer XE 2011 upd3.
Options:
Whole program optimization = Yes
C/C++ optimization: /O3 /Ob2 /Oi /Ot /Qipo
Code Generation: /GF /MT /GS- /arch:SSE3 /fp:fast=2

Since your compiles are a bit faster (well, less than 1%, but anyway), may I ask about your compiler options?

Compiler: MSVS 2008 + Intel Compiler 11.1.067.
Options:
Whole program optimization = No
C/C++ optimisation: /O3 /Ob2 /Oi /Ot /Og /Qip /Qfp-speculation:fast
Code Generation: /GF /EHsc /MT /GS /QaxSSSE3 /fp:fast
(That's for x64)
I've not tried fast=2, does that win you anything?

The P4 compile is the same except: /arch:IA32 /QaxSSE2 in place of /QaxSSSE3

This post has been edited by john33: May 5 2011, 18:49


lvqcl
post May 5 2011, 19:38
Post #31








QUOTE (john33)
I've not tried fast=2, does that win you anything?

I just tested and it turns out that /fp:fast=2 is ~0.3% faster (IMHO it is within statistical error).
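
One rough way to check whether a gap like ~0.3% is within run-to-run noise is to time each build several times and compare the gap between the mean times with the spread of the individual runs. The sketch below is only an illustration: the function names are hypothetical, the timings have to come from your own repeated runs, and comparing against the relative standard deviation is a crude heuristic, not a proper significance test.

```python
import statistics


def relative_difference(times_a, times_b):
    """Fraction by which build B's mean time is lower than build A's."""
    mean_a = statistics.mean(times_a)
    mean_b = statistics.mean(times_b)
    return (mean_a - mean_b) / mean_a


def relative_spread(times):
    """Sample standard deviation divided by the mean (needs >= 2 runs)."""
    return statistics.stdev(times) / statistics.mean(times)


def within_noise(times_a, times_b):
    """True when the gap between the means is smaller than the larger spread."""
    gap = abs(relative_difference(times_a, times_b))
    noise = max(relative_spread(times_a), relative_spread(times_b))
    return gap < noise
```

Pass in two lists of wall-clock encode times (in seconds) for the same input file, one list per build. If `within_noise` returns True, a difference of that size cannot be told apart from ordinary timing jitter with so few runs.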

Destroid: can you patch oggenc2 from Rarewares with an iccpatch utility (several are mentioned on this page) and test again?
john33
post May 5 2011, 19:48
Post #32





QUOTE (lvqcl @ May 5 2011, 19:38) *
QUOTE (john33)
I've not tried fast=2, does that win you anything?

I just tested and it turns out that /fp:fast=2 is ~0.3% faster (IMHO it is within statistical error).
...

I'll give it a try. ;)


_mē_
post May 5 2011, 20:43
Post #33





Group: Members
Posts: 231
Joined: 6-April 09
Member No.: 68706



QUOTE
I just tested and it turns out that /fp:fast=2 is ~0.3% faster (IMHO it is within statistical error).

But how about sound quality? Is it affected? You know, 0.3% ain't much.

This post has been edited by _mē_: May 5 2011, 20:43
Destroid
post May 6 2011, 12:22
Post #34





Group: Members
Posts: 545
Joined: 4-June 02
Member No.: 2220



QUOTE (lvqcl @ May 5 2011, 18:38) *
Destroid: can you patch oggenc2 from Rarewares with an iccpatch utility (several are mentioned on this page) and test again?

I'm sorry to say that I have not tried compiling these encoders before.
But I can concur with some of your other benchmarks:
QUOTE
My tests (Core2 Q9300 @2.5 GHz):
CODE
venc: 20.9x realtime
...
My compiles of oggenc2 without code from Lancer:
32-bit: 34.2x

If this is about ICL's "bias" against AMD, I'm not 100% sure that is the case.

Would it be worth asking john33 to attempt MSVC compiles that use SSE/SSE2? I thought the generic compile only went as far as ASM (just a half-witted suggestion).

edit: lvqcl, I just realized it is a patch, not a compiler thing; I will report back later. Also, I seem to recall something about 'early' SSE2 vs. 'true' SSE2 instructions. After all, this is an early Athlon64 processor, and a dilapidated one :\

edit2: a quick test of iccpatch definitely improved the Rarewares P4 compile on my AMD, by about 15-20 percent at the default Vorbis -q 3 setting.

This post has been edited by Destroid: May 6 2011, 12:51


--------------------
"Something bothering you, Mister Spock?"
Destroid
post May 6 2011, 21:10
Post #35








Back with a new batch of test results. Same commentary track as the previous test in this thread, but at -q3 (still an overkill bitrate). I threw in the Blacksword Lancer build only to give some perspective on the optimizations.
CODE
Oggenc 2.83 aoTuv 5 Lancer 20061103 SSE2     31.956x  89.0 kb/s
Oggenc 2.87 aoTuv 6.03 lvqcl SSE2            25.679x  89.9 kb/s
OggEnc 2.87 aoTuV 6.03 john33 P4 w/ICCPATCH  20.078x  89.9 kb/s
Venc aoTuV 6.03                              13.381x  89.9 kb/s
OggEnc 2.87 aoTuV 6.03 john33 P4             12.335x  89.9 kb/s

The ICCPATCH really does have quite an impact on this particular AMD processor when running the Rarewares P4 compile.


Destroid
post May 7 2011, 07:03
Post #36








I re-ran the Vorbis tests, this time at -q2. I tested the effect of ICCpatch on lvqcl's compile and switched to the last Blacksword compile (one whole week newer). I was also curious to test the LAME compiles from Rarewares with ICCpatch. Here are the results:

CODE
using test WAV 16bit, 48KHz, 2ch, 1,025,507,372 bytes

encoder & version (all run at -q2)            time    rate     filesize
____________________________________________  ______  _______  ________________
Oggenc 2.83 aoTuv 5 Lancer 20061110 SSE2      2m 57s  30.196x  52,521,704 bytes
Oggenc 2.87 aoTuv 6.03 lvqcl SSE2             3m 27s  25.801x  51,621,665 bytes
Oggenc 2.87 aoTuv 6.03 lvqcl SSE2 w/ICCpatch  3m 34s  24.959x  51,621,665 bytes
OggEnc 2.87 aoTuV 6.03 john33 P4 w/ICCpatch   4m 30s  19.782x  51,621,285 bytes
Venc aoTuV 6.03                               6m 22s  13.978x  51,621,326 bytes
OggEnc 2.87 aoTuV 6.03 john33 P4              7m 10s  12.421x  51,621,530 bytes

Foobar2000 bit-compare tracks:
OGG files of lvqcl patched vs. unpatched = No differences in decoded data found
OGG files of john33 patched vs. unpatched = Differences found: 47294972 sample(s), starting at 3.2973333 second(s), peak: 0.0511622 at 4980.8489065 second(s), 2ch


version (all run at -V6)     time    rate     filesize
___________________________  ______  _______  ________________
LAME 3.98.4                  4m 52s  18.256x  60,421,848 bytes
LAME 3.98.4 (ICCpatch)       4m 46s  18.673x  60,421,848 bytes
LAME 3.99 beta 0             6m 29s  13.706x  59,409,552 bytes
LAME 3.99 beta 0 (ICCpatch)  4m 36s  19.306x  59,409,552 bytes

Foobar2000 bit-compare tracks:
MP3 files of 3.98.4 patched vs. unpatched = No differences in decoded data found
MP3 files of 3.99 beta 0 patched vs. unpatched = No differences in decoded data found
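
For anyone who wants to sanity-check the "rate" column, the realtime factor can be approximated from the WAV size and the elapsed time alone. This is a minimal sketch under stated assumptions: the helper names are made up for illustration, the whole file is treated as PCM data (the small WAV header is ignored), and the times in the table are rounded to whole seconds, so the result only agrees with the encoder's own figure to about two decimal places.

```python
def wav_duration_seconds(data_bytes, sample_rate=48000, channels=2, bits_per_sample=16):
    """Playing time of raw PCM data, ignoring the small WAV header."""
    bytes_per_second = sample_rate * channels * (bits_per_sample // 8)
    return data_bytes / bytes_per_second


def realtime_rate(data_bytes, elapsed_seconds):
    """Encoding speed as a multiple of realtime (audio seconds per wall-clock second)."""
    return wav_duration_seconds(data_bytes) / elapsed_seconds


def to_seconds(minutes, seconds):
    """Convert an 'Xm Ys' table entry to seconds."""
    return 60 * minutes + seconds


TEST_WAV_BYTES = 1_025_507_372  # size of the test WAV given in the table header

print(f"{wav_duration_seconds(TEST_WAV_BYTES):.1f} s of audio")        # 5341.2 s of audio
print(f"{realtime_rate(TEST_WAV_BYTES, to_seconds(3, 27)):.2f}x")      # 25.80x
```

The second figure matches the 25.801x reported for the unpatched lvqcl SSE2 build at 3m 27s.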


lvqcl
post May 7 2011, 10:09
Post #37








QUOTE (Destroid @ May 7 2011, 00:10) *
CODE
Oggenc 2.87 aoTuv 6.03 lvqcl SSE2        25.679x   89.9 kb/s
OggEnc 2.87 aoTuV 6.03 john33 P4 w/ICCPATCH    20.078x      89.9 kb/s

As I said, my compiles (with some optimizations from Lancer) are 25-30% faster than pure C code. 25.679/20.078 = 1.28, as expected.
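
The same ratio can be computed for any pair of realtime rates from the tables in this thread. A minimal sketch; the helper names are hypothetical and exist only for this illustration.

```python
def speedup(rate_a, rate_b):
    """How many times faster build A is than build B, given their realtime rates."""
    return rate_a / rate_b


def percent_faster(rate_a, rate_b):
    """The same comparison expressed as a percentage gain of A over B."""
    return (speedup(rate_a, rate_b) - 1.0) * 100.0


# lvqcl SSE2 vs. john33 P4 w/ICCPATCH, rates taken from the quoted results
print(round(speedup(25.679, 20.078), 2))         # 1.28
print(round(percent_faster(25.679, 20.078), 1))  # 27.9
```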

QUOTE (Destroid @ May 7 2011, 00:10) *
CODE
OggEnc 2.87 aoTuV 6.03 john33 P4        12.335x      89.9 kb/s

IMHO using the /arch:.... option in addition to (or instead of) /Qax... should increase encoding speed on non-Intel processors.