AMD Launches Opteron 6300 series with "Piledriver" cores
by Johan De Gelas on November 5, 2012 12:00 PM EST
Taking Small Steps Forward
We did an in-depth analysis of the Bulldozer core and came to the conclusion that three primary weak spots resulted in its underwhelming performance:
- The L1 instruction cache: when running two threads simultaneously, the cache miss rate increased significantly; the associativity is too low.
- The branch misprediction penalty
- Lower than expected clock speed
Secondary bottlenecks were the high latency and low bandwidth of the L2 cache, and the very high latency of the L3 cache, which significantly increased the overall memory latency.
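To illustrate the L1 instruction cache point, here is a minimal sketch of a set-associative cache with two instruction streams sharing it. The cache geometry (64KB, 2-way, 64-byte lines), the LRU replacement policy, and the synthetic address streams are all assumptions chosen for illustration; none of them come from AMD, and the access pattern is a deliberately contrived worst case rather than a real workload.

```python
from collections import OrderedDict

CACHE_BYTES = 64 * 1024   # assumed L1 I-cache size
LINE_BYTES = 64           # assumed cache line size


def miss_rate(accesses, ways):
    """Simulate an LRU set-associative cache; return the fraction of misses."""
    num_sets = CACHE_BYTES // (LINE_BYTES * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in accesses:
        line = addr // LINE_BYTES
        index, tag = line % num_sets, line // num_sets
        entries = sets[index]
        if tag in entries:
            entries.move_to_end(tag)         # mark as most recently used
        else:
            misses += 1
            if len(entries) >= ways:
                entries.popitem(last=False)  # evict least recently used
            entries[tag] = True
    return misses / len(accesses)


def hot_loop(base, iterations):
    """Hypothetical thread: a loop touching two lines that map to the same set."""
    return [base, base + 32 * 1024] * iterations


thread_a = hot_loop(0, 100)
thread_b = hot_loop(1 << 20, 100)
# Interleave the two threads fetch by fetch, as if they shared one cache.
shared = [addr for pair in zip(thread_a, thread_b) for addr in pair]

print(f"one thread,  2-way: {miss_rate(thread_a, 2):.2f}")
print(f"two threads, 2-way: {miss_rate(shared, 2):.2f}")
print(f"two threads, 4-way: {miss_rate(shared, 4):.2f}")
```

In this toy case each thread fits in the 2-way cache on its own, but the two threads together need four ways in the same set and evict each other on every fetch. Doubling the associativity at the same capacity removes the conflict.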
The lack of clock speed has been partially solved in Piledriver with the use of hard-edge flops and the resonant clock mesh, which is especially useful for clock speeds beyond 3GHz. Vishera, the desktop chip with Piledriver cores, runs at clock speeds of up to 4GHz, 11% higher than Bulldozer, without any measurable increase in power consumption. As you can see further below, the clock speed increases are a lot smaller for the Opteron 6300: about 4-6%. The fastest but hottest (140W TDP) Opteron now clocks at 2.8GHz instead of 2.7GHz, and the "regular" Opteron 6380 now runs at 2.5GHz instead of 2.4GHz (Opteron 6278). That means that the Opteron is still not able to fully leverage the deeply pipelined, high clock speed architecture: the power envelope of 115W is still limiting the maximum clock speed. The more complex and less deeply pipelined Intel Xeon E5 runs at 2.7GHz with a 115W TDP.
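The clock speed comparison is simple arithmetic, sketched below. The Opteron pairs are the ones quoted above; the 3.6GHz Bulldozer desktop baseline is an assumption inferred from the 11% figure, not a number stated in this article.

```python
def pct_increase(old_ghz, new_ghz):
    """Relative clock speed gain in percent."""
    return (new_ghz - old_ghz) / old_ghz * 100


clock_pairs = {
    "Desktop, Bulldozer -> Vishera (3.6GHz baseline assumed)": (3.6, 4.0),
    "Opteron, 140W TDP top bin": (2.7, 2.8),
    "Opteron 6278 -> Opteron 6380": (2.4, 2.5),
}

for label, (old, new) in clock_pairs.items():
    print(f"{label}: {old}GHz -> {new}GHz = +{pct_increase(old, new):.1f}%")
```

For these pairs the desktop part gains about 11%, while the two Opteron pairs gain roughly 4% each.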
Piledriver also comes with a few small improvements in the branch prediction unit. Two out of three of the worst bottlenecks got somewhat wider. The most important bottleneck, the L1 Icache, is only going to be fixed with the next iteration, Steamroller.
The L2 cache latency and bandwidth have not changed, but AMD did make quite a few optimizations. From AMD engineering:
"While the total bandwidth available between the L2 and the rest of the core did not change from Bulldozer to Piledriver, the existing bandwidth is now used more effectively. Some unnecessary instruction decode hint data writes to the L2 that were present in Bulldozer have been removed in Piledriver. Also, some misses sent to the L2 that would get canceled in Bulldozer are prevented from being sent to the L2 at all in Piledriver. This allows the L2’s existing resources to be applied toward more useful work."
We talked about the whole list of other improvements when we looked at Trinity:
- Smarter prefetching
- A perceptron branch predictor that supplements the primary BPU
- Larger L1 TLB
- Schedulers that free up tokens more quickly
- Faster FP and integer dividers and SYSCALL/SYSRET (kernel/system call instructions)
- Faster Store-to-Load forwarding
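For readers unfamiliar with the perceptron predictor mentioned in the list above, here is a minimal sketch of the general textbook technique. It is not AMD's implementation, whose details are not described here; the table size, history length, training threshold, and the class and function names are all assumptions for illustration, and weight saturation is left out for brevity.

```python
class PerceptronPredictor:
    """Toy perceptron branch predictor (illustrative, not AMD's design)."""

    def __init__(self, history_len=8, table_size=64, threshold=29):
        self.table_size = table_size
        self.threshold = threshold  # assumed tuning value
        # One weight vector per table entry: a bias plus one weight per history bit.
        self.weights = [[0] * (history_len + 1) for _ in range(table_size)]
        # Global history: +1 for taken, -1 for not taken.
        self.history = [1] * history_len

    def _output(self, pc):
        w = self.weights[pc % self.table_size]
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc):
        """Predict taken when the weighted sum is non-negative."""
        return self._output(pc) >= 0

    def update(self, pc, taken):
        """Train on a misprediction or when confidence is below the threshold."""
        y = self._output(pc)
        t = 1 if taken else -1
        if (y >= 0) != taken or abs(y) <= self.threshold:
            w = self.weights[pc % self.table_size]
            w[0] += t
            for i, hi in enumerate(self.history):
                w[i + 1] += t * hi
        self.history = self.history[1:] + [t]


def accuracy(predictor, pc, outcomes):
    """Fraction of correct predictions while training on the outcome stream."""
    hits = 0
    for taken in outcomes:
        hits += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return hits / len(outcomes)


predictor = PerceptronPredictor()
alternating = [i % 2 == 0 for i in range(100)]   # taken, not taken, taken, ...
warmup = accuracy(predictor, pc=0x40, outcomes=alternating)
steady = accuracy(predictor, pc=0x40, outcomes=alternating)
print(f"warm-up accuracy: {warmup:.2f}, steady-state accuracy: {steady:.2f}")
```

The appeal of this approach is that each weight learns how strongly one past branch outcome correlates with the current branch, so long histories cost only linear storage.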
Lastly, the new Opteron 6300 can now support one DDR3 DIMM per channel (DPC) at 1866MHz. With 2 DPC, you get 1600MHz at 1.5V.
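As a rough guide to what the faster memory setting is worth, the sketch below computes theoretical peak bandwidth. It assumes the quoted speeds are effective transfer rates in MT/s, a 64-bit (8-byte) DDR3 channel, and four memory channels per socket; the channel count is an assumption not stated in this article, and sustained bandwidth will be lower than these peaks.

```python
BYTES_PER_TRANSFER = 8    # 64-bit DDR3 channel
CHANNELS_PER_SOCKET = 4   # assumption, not stated in the article


def peak_gb_per_s(mega_transfers_per_s, channels=1):
    """Theoretical peak bandwidth in GB/s (1 GB = 1000 MB)."""
    return mega_transfers_per_s * BYTES_PER_TRANSFER * channels / 1000


for label, rate in [("1 DPC, DDR3-1866", 1866), ("2 DPC, DDR3-1600", 1600)]:
    per_channel = peak_gb_per_s(rate)
    per_socket = peak_gb_per_s(rate, CHANNELS_PER_SOCKET)
    print(f"{label}: {per_channel:.1f} GB/s per channel, "
          f"{per_socket:.1f} GB/s per socket")

gain = (1866 / 1600 - 1) * 100
print(f"1866 vs 1600: +{gain:.1f}% peak bandwidth")
```

Under these assumptions, moving from 1600 to 1866 raises the theoretical peak by about 17%, at the cost of being limited to one DIMM per channel.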
We're still working to get hardware in house for testing, but we wanted to provide some analysis of what to expect with Abu Dhabi in the meantime.