IDF Spring 2005 - Predicting Future CPU Architecture Trends
by Anand Lal Shimpi on March 3, 2005 7:43 PM EST
The successor to Pentium 4 is...
For quite some time now we've been trying to figure out what the successor to the Pentium 4's Netburst architecture would be. When the Pentium M was first released, everyone expected it to be the direct successor to the Pentium 4, but things obviously didn't work out that way.
Intel had Tejas ready to go, the successor to Prescott, but at that time it was clear that the path that they had chosen for the Pentium 4 had come to an end - limited by power. The Pentium M was a reasonable competitor, but not exactly a revolutionary successor to the Pentium 4. Based on our conversations and our experiences at IDF we're finally able to start piecing together what the eventual successor to the Pentium 4 will be. Remember that the Pentium 4 architecture will continue to exist throughout 2005 as the Pentium D and Pentium Extreme Edition, but with Intel's decision to drop the number 4 it's clear that they are ready for a departure from the Pentium 4 brand and architecture.
The first question has always been pipeline depth: will the successor to Netburst have a long pipeline like Prescott or a short pipeline like the Pentium M? The answer appears to be somewhere in between, realistically much closer to Willamette's 20-stage integer pipeline than to Prescott's 31-stage pipe, strictly for power reasons. Intel is no longer doing as much research in branch prediction as it once was, which indicates an end to the extreme pipeline growth that we've seen since the introduction of the Pentium 4. There has been a lot of research into areas such as continuous flow pipelines, but it's unclear whether that sort of technology will make its way into the next iteration of the Pentium line.
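To make the link between pipeline depth and branch prediction concrete, here is a minimal back-of-the-envelope sketch. It is not Intel's model: it assumes, as a simplification, that every mispredicted branch costs a full pipeline's worth of cycles. The function name `effective_cpi` and the workload figures (base CPI, branch fraction, misprediction rate) are hypothetical values chosen for illustration. Only the 20 and 31 stage counts come from the text above.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, pipeline_stages):
    """Return base CPI plus average stall cycles per instruction from mispredicts.

    Simplifying assumption: each mispredicted branch flushes the whole
    pipeline, so the penalty in cycles equals the number of stages.
    """
    penalty_cycles = pipeline_stages
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical workload numbers for illustration only, not measurements.
BASE_CPI = 1.0          # cycles per instruction with perfect prediction
BRANCH_FRACTION = 0.20  # share of instructions that are branches
MISPREDICT_RATE = 0.05  # share of branches that are mispredicted

for name, stages in [("Willamette (20 stages)", 20), ("Prescott (31 stages)", 31)]:
    cpi = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, stages)
    print(f"{name}: effective CPI = {cpi:.2f}")
```

Under these assumed numbers the deeper pipeline loses more cycles to the same misprediction rate, which is why longer pipelines depend on either better branch predictors or higher clock speeds to pay for themselves.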
A lot of the lessons learned in the Pentium M will of course be applied to the Netburst successor, with Micro Ops Fusion being mentioned quite frequently. Intel management is finally aware that clock speed isn't the sole seller of CPUs, so they are more willing to design more elegant, high-IPC cores at lower clock speeds this time around. A lot of this is due to the success of Centrino, which is part of the reason why you see a switch to the Pentium brand name instead of Pentium 4; Pentium may very well become a platform much like Centrino.
For the next-generation desktop microarchitecture, Intel still appears to be committed to the current style of big out-of-order cores, meaning that we won't see any Cell-style architectures from Intel this time around. For the most part, we think this makes a lot of sense at present given the applications that are currently being run. Intel's thinking is this: if they were to move immediately to a simpler core architecture and use a large number of those cores in parallel, that would leave too much opportunity for another company to build a CPU made up of fewer, more powerful cores, which on current applications would perform better, or at least be easier to program for.
In the generation after the Pentium 4 successor, things may change, as Intel has talked about having a handful of big cores alongside multiple smaller cores for more specific, extremely parallel workloads. Intel's vision of its microprocessors from around 2010 onward starts to look a lot like Cell. What may happen is that Cell turns out to be a bit ahead of its time in the marketplace.
Hyper Threading (SMT) will not die with the Pentium 4. In fact, the number of threads per core will go from 2 to 4 before the end of the decade. The move to 8 threads per core won't happen anytime soon, however. Apparently there is a pretty sizeable performance gain from enabling 4 threads per core, but not as much from going from 4 to 8.
Larger and software-controlled caches will be much more common going forward, which is also eerily similar to the Cell architecture (the Cell SPEs only have local memory, which is similar to the idea of a software-controlled cache).
You can expect a continued focus on SIMD performance; a perfect example is the improvement in SIMD performance in Yonah's core that we reported on earlier.
Although we're quite convinced that an on-die memory controller would result in the best performance per transistor expended on a new architecture, we're doubtful that Intel would consider one. We may have to wait until stacked die and wafer technology arrives before we see any sort of serious reduction in memory latency through techniques other than more caches and more cores.