integer multiplications on IA32 architecture.
wkwai
post Aug 6 2003, 14:24
Post #1


MPEG4 AAC developer


Group: Developer
Posts: 398
Joined: 1-June 03
Member No.: 6943



Hi,


I am used to assembly language programming on the Pentium processor generation (166-200 MHz MMX). I noticed that for operations like int16 and int32 multiplications / divisions, it used to take as long as 20 clock cycles to complete the an instruction execution. However I noticed that on a Celeron processor (using the VTune 7.0 evaluation kit from Intel's website), it takes only 1 clock cycle to execute. Could anyone verify this? In the past, we would use a combination of shift and add operations to implement integer multiplications / divisions.
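For readers unfamiliar with the shift-and-add technique mentioned here, a minimal sketch in C follows. The function names are made up for illustration, and the code assumes unsigned 32-bit operands with results truncated to 32 bits; it is not taken from any particular codebase.

```c
#include <assert.h>
#include <stdint.h>

/* n * 10 as shifts and adds, using 10 = 8 + 2 (hypothetical helper). */
uint32_t mul10_shift_add(uint32_t n)
{
    return (n << 3) + (n << 1);
}

/* General shift-and-add multiply: add a shifted copy of a for every
   set bit of b. Returns the low 32 bits of the product. */
uint32_t mul_shift_add(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    while (b != 0) {
        if (b & 1u)
            result += a;
        a <<= 1;   /* next power of two of the multiplicand */
        b >>= 1;   /* move on to the next bit of the multiplier */
    }
    return result;
}

/* Unsigned division by 2^k is a plain right shift (assumes k < 32). */
uint32_t div_pow2(uint32_t n, unsigned k)
{
    return n >> k;
}

/* Sanity check: the shift-and-add forms agree with the plain operators. */
void check_shift_add(void)
{
    for (uint32_t n = 0; n < 1000; ++n) {
        assert(mul10_shift_add(n) == n * 10u);
        assert(mul_shift_add(n, 37u) == n * 37u);
        assert(div_pow2(n, 3) == n / 8u);
    }
}
```

The general loop costs one iteration per bit of the multiplier, which is why it only paid off when the hardware multiply was slow.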


wkwai
NumLOCK
post Aug 6 2003, 15:06
Post #2


Neutrino G-RSA developer


Group: Developer
Posts: 852
Joined: 8-May 02
From: Geneva
Member No.: 2002



Hi,

QUOTE
I noticed that for operations like int16 and int32 multiplications / divisions, it used to take as long as 20 clock cycles to complete the an instruction execution.

What is the 'an' instruction ?

edit: ok, if I ignore the 'an': 20 cycles seems way out of line. Ensure your mul instruction doesn't fetch its argument from memory.

QUOTE
However I noticed that on a Celeron processor (using the VTune 7.0 evaluation kit from Intel's website), it takes only 1 clock cycle to execute. Could anyone verify this?

If you mean 1-cycle latency for a 32x32-bit 'mul' or 'imul' instruction, it is impossible.
Any x86-compatible processor to date will need at the very least 2 cycles (IIRC) because of the high clock frequency. I think the fastest one was the K6, with 2-cycle latency and 3-cycle execution time for mul/imul.

edit: I think on K6, the 32 lowest bits were available in 2 cycles, and the higher 32 bits were available 1 cycle later.

QUOTE
In the past, we would use a combination of shift and add operations to implement integer multiplications / divisions.

Yeah.
Nowadays, it's a bit different though: thanks to improved multiplication circuitry it's usually worth using special instructions only for:

- result = n*2^k => shl reg, k
- result = 3*n+k => lea reg, [reg+2*reg+k]
- result = 5*n+k => lea reg, [reg+4*reg+k]

In most other cases the multiply will be faster. Plus (depending on your program) you'll avoid saturating the AGU (address generation unit). Also while the mul runs, you can do something else.
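The three special cases above can be sketched in C as follows. The function names are hypothetical, and whether a compiler actually emits shl, lea or imul for each one depends on the compiler, the target CPU and the optimization flags, so treat the comments as the intended mapping rather than a guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* result = n * 2^k: the shape that maps to "shl reg, k" (assumes k < 32). */
uint32_t times_pow2(uint32_t n, unsigned k)
{
    return n << k;
}

/* result = 3*n + k: the shape that maps to "lea reg, [reg+2*reg+k]". */
uint32_t times3_plus_k(uint32_t n, uint32_t k)
{
    return n + 2u * n + k;
}

/* result = 5*n + k: the shape that maps to "lea reg, [reg+4*reg+k]". */
uint32_t times5_plus_k(uint32_t n, uint32_t k)
{
    return n + 4u * n + k;
}

/* Any other factor: just multiply and let the hardware multiplier do it. */
uint32_t times_general(uint32_t n, uint32_t m)
{
    return n * m;
}

/* Sanity check: each special form equals the plain multiplication. */
void check_special_forms(void)
{
    for (uint32_t n = 0; n < 1000; ++n) {
        assert(times_pow2(n, 4) == times_general(n, 16u));
        assert(times3_plus_k(n, 7u) == times_general(n, 3u) + 7u);
        assert(times5_plus_k(n, 7u) == times_general(n, 5u) + 7u);
    }
}
```

The lea forms are limited to scale factors of 2, 4 and 8, which is why only the 3*n and 5*n cases (plus 9*n) can be done this way in a single instruction.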

Regards

This post has been edited by NumLOCK: Aug 6 2003, 15:17

