AMD Kaveri Docs Reference Quad-Channel Memory Interface, GDDR5 Option
by Anand Lal Shimpi on January 16, 2014 10:51 PM EST

Our own Ryan Smith pointed me at an excellent thread on Beyond3D where forum member yuri ran across a reference to additional memory controllers in AMD's recently released Kaveri APU. AMD's latest BIOS and Kernel Developer's Guide (BKDG 3.00) for Kaveri includes a reference to four DRAM controllers (DCT0 - 3) with only two in use (DCT0 and DCT3). The same guide also references a Gddr5Mode option for each DRAM controller.
Let me be very clear here: there's no chance that the recently launched Kaveri will be capable of GDDR5 or 4 x 64-bit memory operation (Socket-FM2+ pin-out alone would be an obvious limitation), but it's very possible that there were plans for one (or both) of those things in an AMD APU. Memory bandwidth can be a huge limit to scaling processor graphics performance, especially since the GPU has to share its limited bandwidth to main memory with a handful of CPU cores. Intel's workaround with Haswell was to pair it with 128MB of on-package eDRAM. AMD has typically shied away from more exotic solutions, leaving the launched Kaveri looking pretty normal on the memory bandwidth front.
In our Kaveri review, we asked whether any of you would be interested in a big Kaveri option with 12 - 20 CUs (768 - 1280 SPs) enabled, basically a high-end variant of the Xbox One or PS4 SoC. AMD would need a substantial increase in memory bandwidth to make such a thing feasible, but based on AMD's own docs it looks like that may not be too difficult to get.
There were rumors a while back of Kaveri using GDDR5 on a stick, but it looks like nothing ever came of that. The options for a higher-end Kaveri APU would have to be:
1) 256-bit wide DDR3 interface with standard DIMM slots, or
2) 256-bit wide GDDR5 interface with memory soldered down on the motherboard
I do wonder if AMD would consider the first option combined with some high-speed memory on-die (similar to the Xbox One SoC).
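For a rough sense of what's at stake, here's a quick back-of-the-envelope sketch of the peak bandwidth each option would deliver. The DDR3-2133 and 5.5 Gbps GDDR5 data rates are my own illustrative assumptions for the comparison, not figures pulled from AMD's docs:

```python
# Peak bandwidth = bus width in bytes * transfers per second.
# The data rates below (DDR3-2133, 5.5 Gbps GDDR5) are illustrative assumptions.

def peak_bandwidth_gb_s(bus_width_bits, data_rate_mt_s):
    """Theoretical peak bandwidth in GB/s."""
    return (bus_width_bits / 8) * data_rate_mt_s / 1000

configs = {
    "Kaveri as launched (128-bit DDR3-2133)": (128, 2133),
    "Option 1 (256-bit DDR3-2133)": (256, 2133),
    "Option 2 (256-bit GDDR5 at 5.5 Gbps)": (256, 5500),
}

for name, (width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gb_s(width, rate):.1f} GB/s")

# Roughly 34.1, 68.3 and 176.0 GB/s respectively; the last two happen to line up
# with the Xbox One's DDR3 figure and the PS4's GDDR5 figure.
```

Even the plain 256-bit DDR3 option would double what Kaveri ships with today, which is why a wider interface plus a pool of fast on-die memory starts to look like a plausible recipe for a console-class APU.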
All of this is an interesting academic exercise though, which brings me back to our original question from the Kaveri review. If you had the building blocks AMD has (Steamroller cores and GCN CUs) and the potential for a wider memory interface, would you try building a high-end APU for the desktop? If so, what would you build and why?
I know I'd be interested in a 2-module Steamroller setup with 20 CUs and a 256-bit wide DDR3 interface, assuming AMD could stick some high-bandwidth memory on-die as well. More or less a high-end version of the Xbox One SoC. It would certainly interest me, but I'm not sure how many others would buy it. Leave your thoughts in the comments below; I'm sure some important folks will get to read them :)
127 Comments
testbug00 - Friday, January 17, 2014 - link
Being custom doesn't magically make it faster. The Jaguar cores in XBO are pretty much custom with cache stuff, and perhaps a bit of optimization to get higher clocks at the same power.
mczak - Saturday, January 18, 2014 - link
From all we know, the Jaguar cores in both XBO and PS4 are 100% stock. This is why these systems have 2 CPU modules with 4 cores in each (both modules having 2MB L2 cache), because a standard module with Jaguar cores doesn't scale further than 4 cores. But certainly there is additional logic _outside_ the CPU cores - in particular the interconnect between the CPU modules and northbridge logic very obviously is different (and more complicated - the CPU modules reportedly are cache-coherent).
SaberKOG91 - Friday, January 17, 2014 - link
I just want to weigh in here since I'm a computer engineer and I can clarify a few things. XB1 and PS4 use the same Jaguar cores. I'm not sure where Gabrielsp85 got the idea that they aren't the same. What I do know is that the PS4 uses GDDR5 memory, while the XB1 uses DDR3 + eSRAM. Since GDDR5 is more geared towards graphics workloads, Sony likely chose this option to improve graphics performance at the expense of having to spend more time optimizing the CPU side to not be as latency-bound to RAM. The XB1 on the other hand uses the more CPU-centric DDR3, with eSRAM to increase bandwidth for GPU operations.

All that said, I can't be sure of the specifics of the memory buses on these chips. The XB1 memory bus has been listed as 68.3 GB/s, which would indicate a quad-channel DDR3 controller with 4 64-bit lanes of DDR3-2133. The PS4 on the other hand uses a GDDR5 bus (guessing a 256-bit width) for both the CPU and GPU and has a peak bandwidth of 176 GB/s. In addition there seems to be 256 MB of DDR3 for CPU "background tasks".
I can tell you right now that the PS4 is listed as having a peak throughput of 1.84 TFLOPS, while the XB1 has a peak throughput of 1.31 TFLOPS. The PS4's 1.84 TFLOPS comes from its 6 additional compute units adding 384 shaders, and also indicates that the GPU runs closer to 800 MHz. These numbers of course neglect bandwidth to the GPU: you would need 7.36 TB/s of bandwidth to feed all of those calculations from system memory on the PS4, and 5.28 TB/s on the XB1.
In the end, I would expect the PS4 to run faster because of the higher system memory bandwidth, especially if a Mantle-like API is used to reduce draw-call overhead on the CPU, reducing the CPU's need for bandwidth.
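For readers following the arithmetic above, here's a minimal sketch of how those TFLOPS and terabytes-per-second figures fall out of the commonly cited console specs. The 800 MHz and 853 MHz GPU clocks, and the worst-case assumption of one 4-byte operand fetched from memory per FLOP, are illustrative assumptions on my part:

```python
# Peak compute = shader count * 2 FLOPs per clock (fused multiply-add) * clock in GHz.
ps4_tflops = 1152 * 2 * 0.800 / 1000  # 18 CUs x 64 SPs at ~800 MHz -> ~1.84 TFLOPS
xb1_tflops = 768 * 2 * 0.853 / 1000   # 12 CUs x 64 SPs at 853 MHz -> ~1.31 TFLOPS

# The "7.36 TB/s" figure assumes every FLOP fetches one 4-byte operand from
# system memory (TFLOPS * 4 bytes); real workloads reuse data in caches and registers.
ps4_worst_case_tb_s = ps4_tflops * 4  # ~7.4 TB/s
xb1_worst_case_tb_s = xb1_tflops * 4  # ~5.2 TB/s

print(f"PS4: {ps4_tflops:.2f} TFLOPS, naive feed rate {ps4_worst_case_tb_s:.2f} TB/s")
print(f"XB1: {xb1_tflops:.2f} TFLOPS, naive feed rate {xb1_worst_case_tb_s:.2f} TB/s")
```

Set against a real-world 176 GB/s (PS4) or 68.3 GB/s plus eSRAM (XB1), that gap is exactly why on-chip memory and data reuse matter so much, and why the same question hangs over any big-GPU Kaveri.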
jabber - Friday, January 17, 2014 - link
I still reckon the reason for the DDR3/GDDR5 difference between the Xbox and the PS4 is more down to MS starting their R&D and supply contracts 6-12 months or so earlier than Sony and therefore not getting a good deal on GDDR5 at the time of enquiry. Had the timing been different, they would probably both be GDDR5-equipped.

jasonelmore - Friday, January 17, 2014 - link
No, that's definitely not it. GDDR5 availability is scarce. That's why Microsoft could keep pumping out weekly shipments of the Xbox One while the PS4 was getting shipped out every 2 weeks with far fewer units.

Also, since the Xbox was built to run 2 OSes at the same time with a hypervisor, onboard cache was needed to prevent constantly having to read/write to main memory. The X1 CPU is more efficient because the majority of its memory bandwidth is served on-die at very low power, compared to the PS4's GDDR5 constantly reading and writing with higher latencies and greater power.
Apple does the same thing with the A7, and Intel is doing it with Iris Pro. They take advantage of super-low-latency, low-power caches, making the chip more efficient. (Note: efficient doesn't mean more powerful; however, MSFT seems to have struck a good balance.)
nathanddrews - Friday, January 17, 2014 - link
Microsoft is shipping more units? Shipping data and sales data thus far would contradict that statement, with the PS4 outpacing the Xbone by about 45%. The PS4 is currently having trouble meeting demand. (For the record, I own neither and plan not to, as I believe they are both underpowered, overpriced, and over-DRMed.)
As far as wanting AMD to make a higher-end or more potent APU, I can't say that it would interest me at all. Let's assume that I could buy a version for $250 that was an order of magnitude more powerful than the 7850K and designed around a NUC-sized form factor. Chances are good that the entire package would be around $500 when all was said and done. Now what's my upgrade path?
A DIY mITX setup in a slightly larger enclosure with a weaker CPU and a more potent GPU would likely not only cost about the same (or less) but also provide more options for upgrades - at least providing one more generation of improvements for less than the cost of a new NUC+APU setup.
I'm not saying a NUC-style AMD platform would be a waste, just that the value diminishes when longevity is considered. This seems like a question to ask when (if) AMD ever delivers on their promises.
PEJUman - Friday, January 17, 2014 - link
I think AMD's 45W is still way too hot for a NUC-style platform, and I don't think they are really chasing that market. The whole NUC thing is a bit odd; ITX is already small enough for most people to simply plop on a desktop or VESA mount. Going even smaller to NUC size while increasing the price premium and reducing performance (thermal headroom, IO options and memory bandwidth are typically less on a NUC) is quite odd. At the current $500+ NUC price point, an ultrabook for another $50-200 adds a 768p/1080p IPS screen, battery backup, keyboard, plus real portability.

Suffice it to say, additional L3 or a 256-bit wide DDR3 interface would also add to the socketing requirements, again relegating the whole APU system to an mATX (maybe ITX if they opted for L3 on package) minimum size.
I would imagine then your upgrade path would be as simple as swapping the APU and/or adding memory/L3 (if socketed).
nathanddrews - Friday, January 17, 2014 - link
I thought we were talking about a future iteration, like Excavator.

Alexvrb - Sunday, January 19, 2014 - link
Why do you think NUC is odd? You said it yourself, they're expensive. Now you know why it exists: to boost margins. You can't do that unless you have something "different" to push, in this case primarily size as compared to previous SFF designs.

With that being said, I don't like it. I don't mind SFF in general, but I like a greater degree of customization, variety, and competitive pricing. So if I want a SFF machine, I look to ITX.
testbug00 - Friday, January 17, 2014 - link
No, Microsoft chose stuff that would let them scale down the price of their console faster. GDDR5 prices will not fall by any significant amount; meanwhile, the parts for the XBO should.
Remember, Microsoft is a few billion in the hole on the Xbox brand if you look at all the losses and gains by the brand over the years. The goal is to make money.