integer multiplications on IA32 architecture.
I am used to Assembly Language programming on the Pentium processor generation (166 - 200 MHz MMX). On those processors, operations like int16 and int32 multiplications / divisions used to take as long as 20 clock cycles to complete a single instruction. However, I noticed that on a Celeron processor (using the VTune 7.0 evaluation kit from Intel's website) it takes only 1 clock cycle to execute. Could anyone verify this? In the past, we would use a combination of shift and add operations to implement integer multiplications / divisions.
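For readers unfamiliar with the shift-and-add technique mentioned above, here is a minimal sketch in C. The function names (`shift_add_mul`, `mul10`, `div8`) are made up for illustration; this is not code from the original poster, just one common way the idea is expressed.

```c
#include <stdint.h>

/* Multiply two unsigned 32-bit values using only shifts and adds.
   The result wraps modulo 2^32, like the low 32 bits of a hardware MUL. */
uint32_t shift_add_mul(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    while (b != 0) {
        if (b & 1u)        /* lowest bit of b set: add the current multiple of a */
            result += a;
        a <<= 1;           /* a becomes a * 2 */
        b >>= 1;           /* move on to the next bit of b */
    }
    return result;
}

/* Multiplying by a known constant needs no loop: x * 10 = x * 8 + x * 2. */
uint32_t mul10(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* Unsigned division by a power of two is a single right shift: x / 8. */
uint32_t div8(uint32_t x)
{
    return x >> 3;
}
```

The general loop costs up to 32 iterations, so the trick mainly paid off for multiplications or divisions by constants known in advance, as in `mul10` and `div8`.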

Another question: I have a Celeron 650 MHz. I thought that the Celeron processor was almost identical to the PII, but someone just said that the latest versions of the Celeron processors are based on the new P4 architecture. I am wondering if mine would support SSE2 instructions.

The very first Celerons (PPGA, not FCPGA) were PIIs with less L2 cache.
Starting at the 533 MHz clock rate (and going to about 1.4 GHz), the Celerons used the PIII architecture with less L2 cache, meaning SSE (1, not 2).
This is the type of Celeron you have.

After 1.4 GHz or so, the Celeron moved to the NetBurst architecture (PIV with less cache, SSE2).
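If you would rather ask the processor itself than go by clock rate, the CPUID instruction reports SSE and SSE2 in the EDX feature flags of leaf 1 (bit 25 and bit 26). A rough sketch, assuming GCC or Clang and their `<cpuid.h>` helper `__get_cpuid`; the function names `has_sse` and `has_sse2` are made up for this example, and on a non-x86 build they simply return -1.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang helper header, assumed to be available */
#endif

/* Return 1 if the given EDX bit of CPUID leaf 1 is set, 0 if it is clear
   or the leaf is unavailable, and -1 when not compiled for x86. */
static int cpuid_leaf1_edx_bit(unsigned bit)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                     /* leaf 1 not supported */
    return (int)((edx >> bit) & 1u);
#else
    (void)bit;
    return -1;                        /* not an x86 build */
#endif
}

int has_sse(void)  { return cpuid_leaf1_edx_bit(25); }  /* EDX bit 25 = SSE  */
int has_sse2(void) { return cpuid_leaf1_edx_bit(26); }  /* EDX bit 26 = SSE2 */
```

Going by the description above, a PIII-based Celeron would be expected to report SSE but not SSE2.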

Edit: note that there were 500 and 533 MHz Celerons in both PPGA and FCPGA packages, the former being quite easy to spot because of the heat spreader. For more information, visit sandpile.org.

