AMD Kaveri Docs Reference Quad-Channel Memory Interface, GDDR5 Option
by Anand Lal Shimpi on January 16, 2014 10:51 PM EST

Our own Ryan Smith pointed me at an excellent thread on Beyond3D where forum member yuri ran across a reference to additional memory controllers in AMD's recently released Kaveri APU. AMD's latest BIOS and Kernel Developer's Guide (BKDG 3.00) for Kaveri includes a reference to four DRAM controllers (DCT0 - 3) with only two in use (DCT0 and DCT3). The same guide also references a Gddr5Mode option for each DRAM controller.
Let me be very clear here: there's no chance that the recently launched Kaveri will be capable of GDDR5 or 4 x 64-bit memory operation (Socket-FM2+ pin-out alone would be an obvious limitation), but it's very possible that there were plans for one (or both) of those things in an AMD APU. Memory bandwidth can be a huge limit to scaling processor graphics performance, especially since the GPU has to share its limited bandwidth to main memory with a handful of CPU cores. Intel's workaround with Haswell was to pair it with 128MB of on-package eDRAM. AMD has typically shied away from more exotic solutions, leaving the launched Kaveri looking pretty normal on the memory bandwidth front.
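Peak theoretical bandwidth is simply bus width times transfer rate, which makes the gap easy to quantify. A minimal sketch in Python (DDR3-2133 is Kaveri's officially supported maximum; the Crystalwell figure is quoted rather than computed, and all numbers are theoretical peaks):

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak bandwidth in GB/s: (bus width in bytes) x (transfers/s)."""
    return bus_width_bits / 8 * transfer_rate_mts / 1000

# Kaveri as shipped: dual-channel (2 x 64-bit) DDR3-2133, shared by CPU and GPU
print(peak_bandwidth_gbs(128, 2133))  # ~34.1 GB/s

# For contrast, Haswell's 128MB Crystalwell eDRAM adds roughly 50 GB/s in each
# direction on top of a similar dual-channel DDR3 main memory interface.
```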
In our Kaveri review, we asked whether any of you would be interested in a big Kaveri option with 12 - 20 CUs (768 - 1280 SPs) enabled, basically a high-end variant of the Xbox One or PS4 SoC. AMD would need a substantial increase in memory bandwidth to make such a thing feasible, but based on AMD's own docs it looks like that may not be too difficult to get.
There were rumors a while back of Kaveri using GDDR5 on a stick, but it looks like nothing ever came of that. The options for a higher-end Kaveri APU would have to be one of the following (a rough bandwidth comparison follows the list):
1) 256-bit wide DDR3 interface with standard DIMM slots, or
2) 256-bit wide GDDR5 interface with memory soldered down on the motherboard
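Neither option appears in AMD's docs as a product plan, so treat the following as back-of-the-envelope math only. Here is a sketch comparing the two options against the shipping part (the 5.0 Gbps GDDR5 data rate is an assumption, typical for the era; PS4 numbers are included for scale):

```python
def peak_bandwidth_gbs(bus_width_bits, transfer_rate_mts):
    # Peak theoretical bandwidth: (bus width in bytes) x (transfers per second)
    return bus_width_bits / 8 * transfer_rate_mts / 1000

configs = {
    "Kaveri as shipped (128-bit DDR3-2133)":     (128, 2133),
    "Option 1: 256-bit DDR3-2133":               (256, 2133),
    "Option 2: 256-bit GDDR5 @ 5.0 Gbps":        (256, 5000),
    "PS4, for scale (256-bit GDDR5 @ 5.5 Gbps)": (256, 5500),
}
for name, (bits, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(bits, rate):.1f} GB/s")
# -> ~34.1, ~68.3, ~160.0 and ~176.0 GB/s respectively
```

Either option would at least double the bandwidth available to the GPU, with GDDR5 putting an APU squarely in console territory.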
I do wonder if AMD would consider taking the first option and tossing some high-speed memory on-die (similar to the Xbox One SoC's eSRAM).
All of this is an interesting academic exercise though, which brings me back to our original question from the Kaveri review. If you had the building blocks AMD has (Steamroller cores and GCN CUs) and the potential for a wider memory interface, would you try building a high-end APU for the desktop? If so, what would you build and why?
I know I'd be interested in a 2-module Steamroller + 20 CU part with a 256-bit wide DDR3 interface, assuming AMD could stick some high-bandwidth memory on-die as well. More or less a high-end version of the Xbox One SoC. Such a thing would interest me, but I'm not sure anyone else would buy it. Leave your thoughts in the comments below, I'm sure some important folks will get to read them :)
127 Comments
PEJUman - Thursday, January 16, 2014 - link
My thoughts exactly! When I finally digested the Kaveri reviews, I wondered if these 6-8 CUs @ 700MHz are bandwidth starved, and wondered what if it had a 256MB GDDR5 L3 on-board/on-package. In fact, this realization stopped me from buying one (and I really, really, really wanted AMD to stay alive). Worse, the memory controller seems to top out at ~2400MHz DDR3.
Intel had its strategy right: realizing that an iGPU is not yet able to do AAA games at 1080p, it opted to optimize for 720p and lower, bringing those into playable territory (a lower number of 'CUs' with a bandwidth stopgap: Crystalwell).
On the other hand, I think AMD has a good understanding of balanced design. IMHO Temash is very balanced: single-channel DDR3 @ 1600 feeding 2 CUs @ 300MHz + 4 LP cores, i.e. ~1/3 of Kaveri's bandwidth for ~1/6 to 1/8 of Kaveri's compute.
AMD would have earned my money if they had come out with quad/triple-channel DDR3, or dual-channel DDR3 + a GDDR5 L3.
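PEJUman's ratios roughly check out. A quick sanity check using his figures (single-channel DDR3-1600 for Temash with 2 GCN CUs at an assumed 300MHz, against Kaveri parts at 720MHz; all numbers are theoretical peaks):

```python
def ddr3_bandwidth_gbs(channels, transfer_rate_mts):
    # Each DDR3 channel is 64 bits (8 bytes) wide
    return channels * 8 * transfer_rate_mts / 1000

def gcn_gflops(shader_cores, clock_mhz):
    # Single-precision GCN peak: 2 FLOPs (one FMA) per shader core per clock
    return shader_cores * 2 * clock_mhz / 1000

temash_bw, kaveri_bw = ddr3_bandwidth_gbs(1, 1600), ddr3_bandwidth_gbs(2, 2133)
temash_fp, kaveri_fp = gcn_gflops(128, 300), gcn_gflops(512, 720)  # 2 vs. 8 CUs

print(temash_bw / kaveri_bw)             # ~0.38 -> roughly 1/3 the bandwidth
print(temash_fp / kaveri_fp)             # ~0.10 -> ~1/10 vs. the full 8-CU part
print(temash_fp / gcn_gflops(384, 720))  # ~0.14 -> ~1/7 vs. the 6-CU parts
```

The 1/6 to 1/8 compute estimate lands closest to the 6-CU Kaveri parts; against the full 8-CU A10 it is nearer 1/10.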
Bindibadgi - Thursday, January 16, 2014 - link
The quad-channel interface is very likely aimed at GDDR5; there's almost no chance of 256-bit DDR3. A 35-45W Kaveri with a 128 to 256-bit GDDR5 interface in a thin, light laptop would be very price:performance competitive with Intel. It would even have "Radeon" graphics branding, which is essential in some markets conditioned by years of only buying discrete GPUs (even if they are very low end).

There's no JEDEC spec for socketed GDDR5, so it would have to be soldered down, which makes PC motherboards out of the question. Even if you think a niche mini-ITX board would be cool, it won't happen, because the price would be "astronomical" compared to other boards, and there's limited facility to convey that it has memory built in.
What BGA options does AMD offer for mobile?
Also, you'll never see an Xbox-size SoC at retail - those divisions are completely separate, and if AMD wants to grow its custom semiconductor business it wouldn't push that silicon into gaming PCs that directly compete against Xboxes.
smartypnt4 - Thursday, January 16, 2014 - link
I'm not sure AMD would care if it competes with itself against Xboxes and PS4s, since (as has been stated in every XB1 vs. PS4 argument ever) it's the games that sell the console, not the hardware. Even if such a mini-ITX build does pull buyers away from consoles, why should AMD care?

Also, I do agree that a monolithic die that size would never make it to consumer retail, which I personally feel is a damn shame.
I hadn't thought about the laptop aspect, though... That's incredibly insightful and a very interesting proposition. That might actually even work. I was miffed with Intel because they didn't include Iris Pro on any lower wattage (<35W) SKUs because I feel that Iris Pro would've helped something like the retina MacBook Pro quite a bit. If AMD could do a high-bandwidth (relatively) low-ish power SoC first, that'd be killer.
Bindibadgi - Thursday, January 16, 2014 - link
Of course AMD should care, because their customers care. Why would customer X go to AMD to develop and pay for a new chip if AMD then spins it off for itself, competing with said customer? If only games sold consoles, no one would care in the slightest that the PS4 is more powerful or the Wii U is significantly underpowered, yet that's all fanboys yap on about.

RE: laptops - due to cost, you'll likely only ever see 128-bit GDDR5 used, because AMD has never commanded a premium price in notebooks. 256-bit requires more PCB layers, more RAM chips, and more dev time, whereas 128-bit is a relatively easy and cheap 4-layer affair. I'm not sure how well GloFo's 28HPM process scales down in voltage/power, and AMD's current core designs require frequency (and thus voltage) to achieve significant per-core performance, which is the enemy of low power.
Iris Pro with its L4 cache requires a LOT of extra transistors and WideIO packaging tech, which is still very cost intensive. Therefore it must be paired with a premium chip to make it worthwhile. More EUs are still cheaper to add than an L4, so a 40EU + 2C/4T chip is the better cost and thermal alternative.
smartypnt4 - Friday, January 17, 2014 - link
Oh, I'm aware it's incredibly cost prohibitive to do a big L4 like that. But I still maintain that a 30W SKU with 2 cores and Iris Pro would've been nice. You'd save a little money on the die size (admittedly not much), and doing 2 cores would keep the power manageable.

And the point about not hurting relations with Sony and MSFT is a good one. Why bite the hand that feeds you, etc. From a consumer standpoint I'd wager AMD doesn't care much, because they're still selling chips, but partner relations is another matter I didn't think of.
Bindibadgi - Friday, January 17, 2014 - link
Iris Pro's lowest TDP is 55W, or 45W with 64MB of L4, AFAIK. Either way, hitting 30W on 22nm (not sure if the L4 eDRAM, which is a separate die, is also 22nm) would require a very low CPU/GPU clock to compensate, or halving/thirding the L4 again, which would bring no benefit.

AMD cares about selling chips, for sure, but ALL tech companies care about IP protection and long-term existence. Trust is fundamental to doing tech business; they rely on others to work with them, so IP considerations are a big deal. Visiting TSMC, for example, is like going into a fortified government facility.
smartypnt4 - Friday, January 17, 2014 - link
Iris Pro is available on 57W and 47W processors. The nominal clock rates differ by 400MHz between the Iris Pro and regular SKUs (2.0/2.4, 2.3/2.7, 2.4/2.8 for HD 5200/HD 4600); the Turbo rates differ by 300MHz.

I hadn't actually looked at those before. Much higher than I thought. I'd wonder how much of that is due to differences in configuration between the 40 EU (HD 5x00) and 20 EU (HD 4{6,4}00) parts, and how much is due to the additional power needed by the eDRAM and its accompanying changes (pinout, controller, etc.). I suppose it likely wouldn't be worth it to do eDRAM on a dual-core chip if the total cost of the chip would be basically the same as a quad-core anyway, just with a slightly lower TDP.
In essence, yes, you're correct. I do hope they do a dual core Broadwell SKU with eDRAM, though. If the GPU improvements are even close to what Intel says they are, they're gonna need a higher bandwidth memory solution.
mczak - Thursday, January 16, 2014 - link
No, 256-bit GDDR5 would make no sense at all; the chip simply isn't fast enough to benefit from it. 128-bit GDDR5 or 256-bit DDR3 are options which would make sense.

That said, it's not just those docs talking about 4 channels that got discussions going. If you look at the die shots compared to Llano and Trinity, the DDR3 area seems doubled (on the upper edge for Llano/Trinity, the lower edge for Kaveri) - you can even compare it to quad-channel (Xeon) Sandy Bridge chips and it looks pretty similar. Which would imply those 4 controllers really are present (just not wired up for FM2+ chips). But this is just speculation.
Gabrielsp85 - Thursday, January 16, 2014 - link
isn't X1 DDR3 at 256bit?

hojnikb - Friday, January 17, 2014 - link
nope