AMD's Steamroller Detailed: 3rd Generation Bulldozer Core
by Anand Lal Shimpi on August 28, 2012 4:39 PM EST
Posted in: CPUs, Bulldozer, AMD, Steamroller
Cache Improvements
The shared L1 instruction cache grew in size with Steamroller, although AMD isn’t telling us by how much. Bulldozer featured a 2-way 64KB L1 instruction cache, with each “core” using one of the ways. This approach gave Bulldozer less cache per core than previous designs, so the increase here makes a lot of sense. AMD claims the larger L1 can reduce i-cache misses by up to 30%. There’s no word on any possible impact to L1 d-cache sizes.
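To put rough numbers on the Bulldozer figures quoted above, here is a minimal back-of-the-envelope sketch. The 64-byte line size and the baseline miss count are assumptions for illustration only; AMD has not disclosed Steamroller's new i-cache size, so only the Bulldozer configuration is modeled.

```python
# Back-of-the-envelope geometry for Bulldozer's shared L1 instruction cache.
CACHE_BYTES = 64 * 1024   # 64KB, from the article
WAYS = 2                  # 2-way set associative, from the article
LINE_BYTES = 64           # ASSUMPTION: line size is not stated in the article
CORES_PER_MODULE = 2      # two "cores" share the cache


def cache_geometry(cache_bytes, ways, line_bytes):
    """Return (number of sets, bytes per way) for a set-associative cache."""
    bytes_per_way = cache_bytes // ways
    sets = bytes_per_way // line_bytes
    return sets, bytes_per_way


def misses_after_reduction(baseline_misses, reduction):
    """Misses remaining after a fractional reduction (0.30 means 30%)."""
    return baseline_misses * (1.0 - reduction)


sets, bytes_per_way = cache_geometry(CACHE_BYTES, WAYS, LINE_BYTES)
per_core_share = CACHE_BYTES // CORES_PER_MODULE

# Hypothetical workload with 1000 i-cache misses on Bulldozer; "up to 30%"
# is AMD's best-case claim, so this is the most optimistic outcome.
best_case_misses = misses_after_reduction(1000, 0.30)

print(f"sets={sets}, bytes per way={bytes_per_way}, per-core share={per_core_share}")
print(f"best-case misses remaining: {best_case_misses:.0f} of 1000")
```

With one way per core, each core effectively sees half the cache, which is why even a modest size increase can matter.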
Although AMD doesn’t like to call it a cache, Steamroller now features a decoded micro-op queue. As x86 instructions are decoded into micro-ops, the address and decoded op are both stored in this queue. Should a fetch come in for an address that appears in the queue, Steamroller’s front end will power down the decode hardware and simply service the fetch request out of the micro-op queue. This is similar in nature to Sandy Bridge’s decoded uop cache, although it is likely smaller. AMD wasn’t willing to disclose how many micro-ops could fit in the queue, other than to say that it’s big enough to get a decent hit rate.
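The behavior described above can be sketched as a toy model: a bounded, address-indexed store that is consulted before the decoders. Everything here is hypothetical, including the class name, the capacity of four entries, and the oldest-first eviction policy; AMD has disclosed none of these details.

```python
from collections import OrderedDict


class MicroOpQueue:
    """Toy model of a decoded micro-op queue (not AMD's actual design)."""

    def __init__(self, capacity):
        self.capacity = capacity          # ASSUMPTION: real size is undisclosed
        self.entries = OrderedDict()      # fetch address -> decoded micro-ops
        self.hits = 0                     # fetches served with decoders idle
        self.decodes = 0                  # fetches that needed the decoders

    def fetch(self, address, decode):
        # Hit: serve the stored micro-ops; the decode hardware can stay off.
        if address in self.entries:
            self.hits += 1
            return self.entries[address]
        # Miss: decode, then remember the address and the decoded result.
        self.decodes += 1
        uops = decode(address)
        self.entries[address] = uops
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # ASSUMPTION: evict oldest entry
        return uops


def fake_decode(address):
    """Stand-in for the x86 decoders; returns a placeholder micro-op tuple."""
    return (f"uop@{address:#x}",)


# A tight three-instruction loop executed ten times.
queue = MicroOpQueue(capacity=4)
loop_body = [0x100, 0x104, 0x108]
for _ in range(10):
    for addr in loop_body:
        queue.fetch(addr, fake_decode)

print(f"decodes={queue.decodes}, hits={queue.hits}")
```

In this sketch the decoders are only needed on the first pass through the loop, which is the kind of workload where a structure like this would save the most power.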
The L1 to L2 interface has also been improved. Some queues have grown and logic is improved.
Finally on the caching front, Steamroller introduces a dynamically resizable L2 cache. Based on workload and hit rate in the cache, a Steamroller module can choose to resize its L2 cache (powering down the unused slices) in one-quarter increments. AMD believes this is a huge power win for mobile client applications such as video decode (not so much for servers), where the CPU only has to wake up for short periods of time to run minor tasks that don’t have large L2 footprints. The L2 cache accounts for a large chunk of AMD’s core leakage, so shutting half or more of it down can definitely help with battery life. The resized cache is no faster (same access latency); it just consumes less power.
Steamroller brings no significant reduction in L2/L3 cache latencies. According to AMD, they’ve isolated the reason for the unusually high L3 latency in the Bulldozer architecture, but fixing it isn’t a top priority. Given that most consumers (read: notebooks) will only see L3-less processors (e.g. Llano, Trinity), and many server workloads are less sensitive to latency, AMD’s stance makes sense.
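As a rough illustration of resizing in quarter steps, the sketch below picks the smallest number of L2 slices that covers a workload's footprint. This is not AMD's policy: the article says the real decision uses workload and hit rate, whereas this sketch uses a known footprint as a stand-in, assumes a hypothetical 2MB L2, and assumes leakage scales linearly with the number of powered slices.

```python
import math

QUARTERS = 4  # the cache resizes in one-quarter steps, per the article


def choose_l2_quarters(footprint_bytes, l2_bytes):
    """Smallest number of quarter-slices (1..4) that covers the footprint."""
    slice_bytes = l2_bytes // QUARTERS
    needed = math.ceil(footprint_bytes / slice_bytes)
    # Keep at least one slice powered and never exceed the full cache.
    return max(1, min(QUARTERS, needed))


def relative_leakage(active_quarters):
    """ASSUMPTION: L2 leakage is proportional to the slices left powered."""
    return active_quarters / QUARTERS


L2_BYTES = 2 * 1024 * 1024  # ASSUMPTION: hypothetical 2MB L2 per module

# Hypothetical workloads: a light background task and a heavier one.
for name, footprint in [("video decode helper", 300 * 1024),
                        ("heavier task", 1200 * 1024)]:
    quarters = choose_l2_quarters(footprint, L2_BYTES)
    print(f"{name}: {quarters}/4 of L2 powered, "
          f"leakage x{relative_leakage(quarters):.2f}")
```

A real controller would also need hysteresis so the cache does not thrash between sizes, but that is beyond what AMD has described.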
Looking Forward: High Density Libraries
This one falls into the reasons-we-bought-ATI column: future AMD CPU architectures will employ higher levels of design automation and new high density cell libraries, both heavily influenced by AMD’s GPU group. Automated place and route is already commonplace in AMD CPU designs, but AMD is going even further with this approach.
The methodology comes from AMD’s work in designing graphics cores, and we’ve already seen some of it used in AMD’s ‘cat cores (e.g. Bobcat). As an example, AMD demonstrated a 30% reduction in area and power consumption when these new automated procedures with high density libraries were applied to a 32nm Bulldozer FPU.
The power savings come from not having to route clocks and signals as far, while the area savings are a result of the computer-automated transistor placement/routing and higher density gate/logic libraries.
The tradeoff is peak frequency. These heavily automated designs won’t be able to clock as high as the older hand-drawn designs. AMD believes the sacrifice is worth it, however, because in power-constrained environments (e.g. a notebook) you won’t hit max frequency regardless, and you’ll instead see a 15-30% reduction in energy per operation. AMD equates this with the power savings you’d get from a full process node improvement.
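To see what a 15-30% cut in energy per operation means in practice, here is a small worked calculation. The function names and the 100-joule baseline are illustrative assumptions, and the efficiency figure ignores the lower frequency ceiling mentioned above, so treat it as an idealized bound rather than a performance prediction.

```python
def ops_per_joule_gain(energy_reduction):
    """Work done per joule, relative to baseline, when each operation
    costs (1 - energy_reduction) of its original energy."""
    return 1.0 / (1.0 - energy_reduction)


def energy_for_fixed_work(baseline_joules, energy_reduction):
    """Energy needed to finish the same amount of work after the reduction."""
    return baseline_joules * (1.0 - energy_reduction)


# AMD's quoted range: 15% to 30% less energy per operation.
for reduction in (0.15, 0.30):
    gain = ops_per_joule_gain(reduction)
    joules = energy_for_fixed_work(100.0, reduction)  # hypothetical 100 J task
    print(f"{reduction:.0%} reduction: {gain:.2f}x work per joule, "
          f"{joules:.0f} J for a task that used to cost 100 J")
```

At the top of the range, the same battery does roughly 1.4 times the work, which gives a sense of why AMD compares the gain to a process node shrink.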
We won’t see these new libraries and automated designs in Steamroller, but rather its successor in 2014: Excavator.
Final Words
Steamroller seems like a good evolutionary improvement to AMD’s Bulldozer and Piledriver architectures. While Piledriver focused more on improving power efficiency, Steamroller should make a bigger impact on performance.
The architecture is still slated to debut in 2013 on GlobalFoundries' 28nm bulk process. The improvements look good on paper, but the real question remains whether or not Steamroller will be enough to go up against Haswell.
126 Comments
mantikos - Tuesday, August 28, 2012
This is what Bulldozer should've been from the get-go, pretty much.
CeriseCogburn - Wednesday, August 29, 2012
So they're going to build... First they Bulldoze the place
Then they bring in the Piledriver laying the postings
Then the Steamroller for the building surroundings
Next comes Excavator ! to destroy all the former work
Great plan amd...
shtldr - Thursday, August 30, 2012
And then will come the ultimate AMD CPU, called Undertaker, and bury the company once and for all.
rarson - Tuesday, September 18, 2012
Yeah, because screw competition. I want crappy products at high prices. Long live Intel!
MrSpadge - Wednesday, August 29, 2012
It looks really promising, indeed. Lots of fine tuning there, actually more than just "fine". And they don't need to beat Intel for top performance anyway, just keep up the pressure and give us good mainstream chips with solid single thread performance!
CeriseCogburn - Wednesday, August 29, 2012
That's what they get beat on all the time, single thread performance - oh and multi thread for that matter.
They've been getting creamed on single thread, specifically.
Your exclamation point sure points to a fine fantasy never happening future though given the failure that the present is.
Must take a lot of fanboyism and some strong prozac in the water.
Spunjji - Thursday, August 30, 2012
Anyone would think that seeing as he's *hoping* for better single-thread performance, that he thus knows *they don't have it now*. But no. You didn't.
Must take being a catastrophic ass-hat and some serious piss on your chips for you to jump on somebody for a completely inoffensive post.
CeriseCogburn - Friday, October 12, 2012
Oh, from the uker amd fanboy where prozac is in the water, who prays to god for a return to what he misses about amd.
News flash idiot: no one pisses on chips here. That's done where you live.
Now : " just keep up the pressure and give us good mainstream chips with solid single thread performance! "
Anyone with a brain would think we're already there, and will continue to be, pissy boy.
d3mag0gu3 - Tuesday, April 2, 2013
Lol dude. You sound about three years old. Go sit in the corner until you can participate in class properly.
redwarrior - Sunday, March 31, 2013
Less and less applications are single-threaded, it's a dying part of the market. AMD is every bit as good as Intel and better in its price class. Most apps perform better on FX-8350 than I5 3570k. The FPS are up there with 3770k on many new games. This will only get better over the next year as more and more games offer 8 core processor support. There is absolutely no compelling reason to go Intel for cpu's under $300. With Steamroller the ascension of AMD to a BETTER alternative to Intel will only accelerate. All the initial bad reviews which were based on erroneous testing procedures and old benchmarks are proving to be ancient history and poor analysis.