Tweak3D - Your Freakin' Tweakin' Source!
Intel Pentium 4 Guide (Page 5/17)


Posted: December 10, 2000
Written by: Tuan "Solace" Nguyen

Advanced Dynamic Execution

With the Pentium 4’s 20-stage hyper pipeline, processing instructions should be a breeze for the CPU, right? Wrong.

Starting with the original Pentium, Intel introduced a technique called Branch Prediction. Rather than waiting to find out which way a conditional branch goes, the processor uses the history of previous branches to guess the outcome and keeps fetching and executing along the predicted path, which minimizes latency and increases efficiency. What if a branch is mispredicted? The work done along the wrong path has to be thrown away, and the correct instructions have to be processed again from the start of the pipeline. The processor usually guesses correctly more than 93% of the time, and since the Pentium had only a 5-stage pipeline, it didn’t take long to recover from a mistake.
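To make the idea concrete, here is a minimal Python sketch of a classic textbook scheme: a table of 2-bit saturating counters indexed by branch address. It illustrates branch prediction in general and is not presented as Intel’s exact Pentium design; the class and function names, the starting state, and the loop trace are all hypothetical choices for illustration.

```python
class TwoBitPredictor:
    """Per-branch 2-bit saturating counters (a generic textbook predictor).

    States 0-1 predict "not taken", states 2-3 predict "taken". One surprise
    only moves the counter a single step, so a loop branch that is taken many
    times and falls through once is not immediately flipped to "not taken".
    """

    def __init__(self):
        self.counters = {}  # branch address -> counter state (0..3)

    def predict(self, addr):
        # Assumption: unseen branches start in the "weakly not taken" state (1).
        return self.counters.get(addr, 1) >= 2

    def update(self, addr, taken):
        c = self.counters.get(addr, 1)
        self.counters[addr] = min(c + 1, 3) if taken else max(c - 1, 0)


def accuracy(predictor, trace):
    """Fraction of (addr, taken) outcomes the predictor guessed correctly."""
    hits = 0
    for addr, taken in trace:
        if predictor.predict(addr) == taken:
            hits += 1
        predictor.update(addr, taken)  # learn the real outcome
    return hits / len(trace)


# Hypothetical trace: a loop branch at 0x40, taken 9 times then not taken, repeated 10 times.
trace = [(0x40, i % 10 != 9) for i in range(100)]
print(accuracy(TwoBitPredictor(), trace))  # 0.89
```

After warming up, the predictor only misses the loop exit each time around, which is the kind of behavior that keeps accuracy high on loop-heavy code.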

The Pentium 4, however, has a 20-stage pipeline. If a branch mispredict is discovered near the end of the pipeline, the hit to performance is much larger: the processor has to go all the way back to the beginning and start over. Intel realized this and implemented a system of complex caching and buffers to avoid such cases.
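A back-of-the-envelope calculation shows why the deeper pipeline hurts more. The Python sketch below assumes, as a simplification, that every mispredict costs roughly one full pipeline refill (one cycle per stage); the helper name and the 200-branch workload are hypothetical, and the 93% accuracy figure is carried over from the Pentium discussion above.

```python
def mispredict_penalty(branches, accuracy_pct, pipeline_stages):
    """Approximate cycles lost to branch mispredictions.

    Simplifying assumption: each mispredict costs about one cycle per
    pipeline stage, i.e. a full refill from the front of the pipeline.
    """
    mispredicts = branches * (100 - accuracy_pct) / 100
    return mispredicts * pipeline_stages


# Hypothetical workload: 200 branches predicted with 93% accuracy.
for stages in (5, 20):
    print(f"{stages}-stage pipeline: {mispredict_penalty(200, 93, stages)} cycles lost")
# 5-stage pipeline: 70.0 cycles lost
# 20-stage pipeline: 280.0 cycles lost
```

Same 14 mispredicts, four times the cost. That gap is why a deep pipeline makes avoiding mispredicts so important.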


[Image caption: “I always thought it went A B C D”]

The Advanced Dynamic Execution engine is a very deep, out-of-order, speculative execution engine that keeps the execution units constantly busy. It does this by providing a large pool of instructions from which the execution units can choose. Some instructions can be executed ahead of earlier ones, as long as they don’t need those earlier results. Others that depend on the result of another instruction cannot be executed out of order. It would be like chewing gum before you actually put it in your mouth: not possible. A case where one instruction relies on the outcome of another is called a dependency. Dependencies take time to resolve and thus slow down the pipeline flow; one of the most common stalls of this kind is waiting for data to be loaded from memory after a cache miss. The NetBurst architecture can hold up to 126 instructions in flight in this pool. Compare this to the P6’s much smaller pool of 42 instructions and you begin to see the great potential of this processor.
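The toy scheduler below, a minimal Python sketch, shows why a bigger pool helps. Each cycle it looks at the oldest `window` instructions that have not issued yet and issues one whose inputs are ready; with a window of 1 it behaves like a strictly in-order machine. The `Instr` format, the latencies and the five-instruction program are hypothetical, and real hardware (register renaming, multiple execution ports, retirement) is far more involved.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare by identity so list.remove() drops this exact object
class Instr:
    name: str      # destination register; doubles as the instruction's id
    srcs: tuple    # names of earlier instructions whose results are needed
    latency: int   # cycles until the result is available (>= 1)


def simulate(program, window, width=1):
    """Return the cycle at which the last result of `program` is available.

    Each cycle, scan the oldest `window` unissued instructions in program order
    and issue up to `width` whose source values are ready. window=1 gives
    strict in-order issue. Assumes every source names an instruction in `program`.
    """
    finish = {}              # name -> cycle its result becomes available
    pending = list(program)  # unissued instructions, oldest first
    cycle = 0
    while pending:
        issued = 0
        for instr in pending[:window]:  # slice is a copy, so removing is safe
            if issued == width:
                break
            if all(s in finish and finish[s] <= cycle for s in instr.srcs):
                finish[instr.name] = cycle + instr.latency
                pending.remove(instr)
                issued += 1
        cycle += 1
    return max(finish.values())


program = [
    Instr("r1", (), 10),           # load that misses the cache (slow)
    Instr("r2", ("r1",), 1),       # add that depends on the load
    Instr("r3", (), 3),            # independent multiply
    Instr("r4", (), 3),            # independent multiply
    Instr("r5", ("r3", "r4"), 1),  # add that depends on both multiplies
]
print(simulate(program, window=1))  # 16 (in-order)
print(simulate(program, window=4))  # 11 (out-of-order)
```

In order, the two independent multiplies sit behind the add that is waiting on the slow load. With a window of 4 they run while the load is still outstanding, and the whole program finishes 5 cycles sooner. A 126-instruction pool is the same idea on a much larger scale.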

You can see what a branch mispredict can do to a processor’s efficiency. The Pentium 4’s Advanced Dynamic Execution engine helps get things moving again when a stall such as a mispredict occurs. But this isn’t the only weapon up Intel’s sleeve.

Rapid Execution Engine

Intel has departed from convention and created a very distinct arithmetic unit. Through a combination of architectural, physical and circuit design techniques, the Pentium 4’s ALUs run at twice the frequency of the processor core, so the ALUs in our 1.5GHz Pentium 4 are effectively running at a blazing 3GHz. This lets the ALUs execute certain instructions with a latency of half a core clock cycle, which results in both higher execution throughput and lower latency for arithmetic and logic operations.
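The arithmetic behind the “3GHz” claim is simple enough to sketch in Python. The helper names are hypothetical, and the sketch assumes, as described above, that a simple ALU operation completes in one ALU clock (half a core cycle). A chain of dependent adds is used because each one must wait for the previous result, so latency is what matters.

```python
def alu_clock_ghz(core_ghz, multiplier=2):
    """Effective ALU clock when the ALUs run at `multiplier` x the core clock."""
    return core_ghz * multiplier


def dependent_chain_ns(n_ops, core_ghz, multiplier=2):
    """Time for n_ops back-to-back dependent simple ALU ops.

    Assumes each op takes exactly one ALU clock, i.e. 1/multiplier core cycles.
    """
    core_cycles = n_ops / multiplier
    return core_cycles / core_ghz  # cycles divided by GHz gives nanoseconds


print(alu_clock_ghz(1.5))                        # 3.0
print(dependent_chain_ns(6, 1.5))                # 2.0 (double-speed ALUs)
print(dependent_chain_ns(6, 1.5, multiplier=1))  # 4.0 (ALUs at core speed)
```

Under these assumptions, six dependent adds finish in half the time, which is where the “double the ALU performance” idea comes from; real code mixes in slower operations that don’t get this benefit.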


[Image caption: “Double the ALU performance”]

With this technique, dependent instructions get the data they need sooner, since simple results come back in half the time. This keeps critical data within arm’s reach and speeds up mathematically intensive applications, such as those that do 3D rendering in real time. Do I sense Quake 3 fragging anyone?
