aoTuV 5.7/bs1 x64/x86 Compiles for AMD chips
AshenTech
post Jan 6 2010, 01:41
Post #1





Group: Members
Posts: 78
Joined: 11-November 08
Member No.: 62144



http://www.agner.org/optimize/blog/read.php?i=49

As this article explains, Intel's compiler automatically generates multiple optimized code paths when compiling, and at run time its CPU dispatcher sends any CPU it detects as non-Intel down a far slower path. This means that if this guide is followed
http://www.agner.org/optimize/#manual_cpp
the compiled software would most likely end up being a lot faster on non-Intel CPUs such as AMD and VIA.

I was hoping somebody with the skill (john33, if he has the time) could compile a non-CPU-biased version so we can see how this extremely biased dispatcher is affecting non-Intel users.

By the way, in PCMark the difference on a VIA Nano was a 47.4% performance boost just from making the program see it as an Intel CPU rather than a VIA one. That's huge.
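
For reference, what the dispatcher keys on is just the 12-byte vendor ID returned by CPUID leaf 0. A minimal sketch of reading it (assuming GCC/Clang on x86; purely illustrative, not taken from ICC's runtime):

CODE
/* Read the CPUID vendor string that a vendor-based dispatcher branches on.
 * Leaf 0 returns the 12-byte vendor ID in EBX, EDX, ECX:
 * "GenuineIntel", "AuthenticAMD", or "CentaurHauls" (VIA). */
#include <cpuid.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;
    char vendor[13];

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 1;

    memcpy(vendor + 0, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);
    vendor[12] = '\0';

    /* A dispatcher that branches on this string, instead of on feature
     * flags, sends every non-"GenuineIntel" CPU down the baseline path
     * even when it supports the same SSE levels. */
    printf("Vendor: %s\n", vendor);
    return 0;
}
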
lvqcl
post Jan 6 2010, 02:17
Post #2





Group: Developer
Posts: 3418
Joined: 2-December 07
Member No.: 49183



Obviously, you can use the generic aoTuV 5.7 compile from http://www.rarewares.org/ogg-oggenc.php

You can also test the aoTuV 5.7 P4 compile vs. the P3 and generic ones... Usually an ICC compile is faster than a generic (MSVC) one, not only on Intel but on AMD processors too (usually, but not always).
AshenTech
post Jan 6 2010, 02:54
Post #3





Group: Members
Posts: 78
Joined: 11-November 08
Member No.: 62144



QUOTE (lvqcl @ Jan 5 2010, 20:17)
Obviously, you can use the generic aoTuV 5.7 compile from http://www.rarewares.org/ogg-oggenc.php

You can also test the aoTuV 5.7 P4 compile vs. the P3 and generic ones... Usually an ICC compile is faster than a generic (MSVC) one, not only on Intel but on AMD processors too (usually, but not always).


Yes, but my point is that Intel's compiler sends AMD chips and any other non-Intel chip down a less-than-optimal code path. If you can make the software think you are using a "GenuineIntel" CPU you get the optimal path; otherwise Intel's CPU dispatcher chooses a less-than-optimal one (often still faster than other compilers' output, but still slower than it should be).

This is just a request to give it a shot and see if it makes a difference, until Intel puts out an unbiased version of their compiler (they have already signed an agreement with AMD to do this for AMD chips, but that isn't going to help VIA if Intel doesn't switch from using CPUID vendor strings to choose the code path to querying the CPU for the features it actually supports).

Ars Technica showed a 47.4% boost on the VIA Nano by faking the CPUID string as Intel, and a smaller boost (around 10%) by identifying the CPU as AMD...

QUOTE
My my. Swap CentaurHauls for AuthenticAMD, and Nano's performance magically jumps about 10 percent. Swap for GenuineIntel, and memory performance goes up no less than 47.4 percent. This is not a test error or random occurrence; I benchmarked each CPUID multiple times across multiple reboots on completely clean Windows XP installations. The gains themselves are not confined to a small group of tests within the memory subsystem evaluation, but stretch across the entire series of read/write tests. Only the memory latency results remain unchanged between the two CPUIDs.


http://arstechnica.com/hardware/reviews/20...no-review.ars/6

If the link won't work you can use Google's cache to view it (it has been loading very slowly the last few days).

This is what I'm talking about: huge differences just from changing the CPUID string.
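
For comparison, a vendor-neutral dispatcher only needs the feature bits from CPUID leaf 1, not the vendor string at all. A minimal sketch (again assuming GCC/Clang on x86; illustrative only, not ICC's actual dispatch code):

CODE
/* Pick a code path from CPUID feature bits, ignoring the vendor string,
 * so an AMD or VIA chip that reports SSE2/SSE3/SSE4.1 gets the same
 * optimized path as an Intel one. */
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 1;

    if (ecx & bit_SSE4_1)
        printf("use SSE4.1 path\n");
    else if (ecx & bit_SSE3)
        printf("use SSE3 path\n");
    else if (edx & bit_SSE2)
        printf("use SSE2 path\n");
    else
        printf("use plain C/x87 path\n");
    return 0;
}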

This post has been edited by AshenTech: Jan 6 2010, 03:08
roozhou
post Jan 28 2010, 17:52
Post #4





Group: Members
Posts: 14
Joined: 18-March 08
Member No.: 52124



And why not manually optimize the time-critical parts in assembly, just like the ffmpeg and x264 developers have done?
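
Roughly, that approach looks like the sketch below: a portable C fallback plus a hand-optimized routine, with one of them selected at init time through a function pointer based on CPUID feature bits (function names are invented for illustration, not taken from ffmpeg, x264 or libvorbis):

CODE
/* ffmpeg/x264-style runtime dispatch: C fallback + SIMD version of a hot
 * loop, chosen once at startup by feature detection, never by vendor. */
#include <cpuid.h>
#include <emmintrin.h>   /* SSE/SSE2 intrinsics */
#include <stdio.h>

static float dot_c(const float *a, const float *b, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

static float dot_sse(const float *a, const float *b, int n)
{
    __m128 acc = _mm_setzero_ps();
    float tmp[4], s;
    int i;

    for (i = 0; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(a + i),
                                         _mm_loadu_ps(b + i)));
    _mm_storeu_ps(tmp, acc);
    s = tmp[0] + tmp[1] + tmp[2] + tmp[3];
    for (; i < n; i++)                    /* scalar tail */
        s += a[i] * b[i];
    return s;
}

/* The encoder calls dot() everywhere; dsp_init() rebinds it once. */
static float (*dot)(const float *, const float *, int) = dot_c;

static void dsp_init(void)
{
    unsigned int eax, ebx, ecx, edx;
    /* SSE2 implies the SSE instructions used above are available. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (edx & bit_SSE2))
        dot = dot_sse;
}

int main(void)
{
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {1, 1, 1, 1, 1, 1, 1, 1};
    dsp_init();
    printf("%f\n", dot(a, b, 8));   /* 36.000000 either way */
    return 0;
}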
Yirkha
post Feb 13 2010, 03:43
Post #5





Group: FB2K Moderator
Posts: 2359
Joined: 30-November 07
Member No.: 49158



Because unless you can find a nice mathematical or programming trick to do things differently, the extra work is rarely worth it with today's optimizing compilers.


--------------------
Full-quoting makes you scroll past the same junk over and over.
forart.eu
post Mar 7 2010, 14:59
Post #6





Group: Members
Posts: 74
Joined: 10-December 09
From: italy
Member No.: 75798



Another interesting solution could be to adopt Orc*, which - according to the newest Schrödinger release notes - seems to optimize code a great deal:
QUOTE
  • Orc: Complete conversion to Orc and removal of liboil dependency.
  • Added a lot of orc code to make things faster. A lot faster.
QUOTE
we’ve switched over to using Orc instead of liboil for signal processing code. Dirac is a very configurable format, and normally would require thousands of lines of assembly code — Orc generates this at runtime from simple rules. (Hey, it was easier to write Orc than write all that assembly!)


I'm not a developer (nor a binary builder), so I simply don't know whether it's applicable to Vorbis (and, why not, Theora) too.

BTW, I hope that inspires someone...

* note: Orc stands for Oil Runtime Compiler, not Open Research Compiler...
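
(For what it's worth, from what I read those "simple rules" are just per-element kernels. A scalar C equivalent of a typical one - hypothetical example, not taken from the Schrödinger, GStreamer or Orc sources - would be something like the following; Orc generates the SIMD version of such a rule at run time.)

CODE
/* Scalar equivalent of a typical Orc-style per-element rule:
 * saturated 16-bit add, as used when mixing audio samples. */
#include <stdint.h>

void add_s16_sat(int16_t *dst, const int16_t *a, const int16_t *b, int n)
{
    for (int i = 0; i < n; i++) {
        int32_t v = (int32_t)a[i] + b[i];
        if (v >  32767) v =  32767;   /* clamp to the int16_t range */
        if (v < -32768) v = -32768;
        dst[i] = (int16_t)v;
    }
}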

This post has been edited by forart.eu: Mar 7 2010, 15:27
X-Fi6
post Mar 30 2010, 08:07
Post #7





Group: Members
Posts: 13
Joined: 8-October 09
Member No.: 73798



"Runtime Compiler" really should give your answer away rolleyes.gif not to be anti-Java or anything. Though a well-developed compiler with the right optimizations should typically produce code that's faster than a self-compiling program, unless it was done right... Could we see some benchmarks of Orc?

This post has been edited by X-Fi6: Mar 30 2010, 08:09


--------------------
Mixing audio perfectly doesn't take more than onboard.
