Server CPUs in 2010

AMD’s best core in 2010 is a slightly improved revision of the current six-core Opteron “Istanbul” with the following additions:

• Finally a “real” C1E state, which reduces power for each core that is idling
• Support for DDR3

In theory, DDR3-1333 offers 66% higher bandwidth, but in practice the Stream benchmark measures no more than a 25% boost in bandwidth. The latency of going off-die is about the same. That means that the performance increase in most server applications will not be tangible. Only the most bandwidth-intensive HPC applications will get a boost of 10 to 20%.
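
As a rough check on these figures, the theoretical gain can be recomputed from the memory transfer rates. This is a minimal sketch that assumes the baseline is DDR2-800 (the article only names the DDR3 speed grade) and a 64-bit (8-byte) memory channel; the function and variable names are illustrative only.

```python
# Peak per-channel bandwidth. Assumptions (not stated in the article):
# the baseline is DDR2-800 and each channel is 64 bits (8 bytes) wide.

def peak_bandwidth_gb_s(mega_transfers_per_s: float, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth of one memory channel in GB/s (10^9 bytes)."""
    return mega_transfers_per_s * 1e6 * bus_bytes / 1e9

ddr2_800 = peak_bandwidth_gb_s(800)
ddr3_1333 = peak_bandwidth_gb_s(1333)

theoretical_gain = ddr3_1333 / ddr2_800 - 1  # roughly two thirds
measured_gain = 0.25                         # upper bound seen in Stream, per the text

print(f"DDR2-800:  {ddr2_800:.1f} GB/s per channel")
print(f"DDR3-1333: {ddr3_1333:.1f} GB/s per channel")
print(f"Theoretical gain: {theoretical_gain:.1%}, measured at most {measured_gain:.0%}")
```

The gap between the theoretical and the measured gain is why only the most bandwidth-bound workloads are expected to benefit.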

Currently, AMD's six-core Opteron can match the performance of Intel’s quad-core Xeon 5500 at the same clock speed in some important server applications: OLAP databases, virtualization and web applications. Intel’s best Xeon wins by a significant margin in OLTP, ERP and rendering. A large part of the HPC market is a lost cause: a quad-core Intel Xeon 5570 at 2.93 GHz is about twice as fast as an AMD Opteron 2389 at 2.9 GHz. The fact that we could not find any Opteron 2435 results in LS-Dyna is another indication of what to expect: the 10-20% higher performance in HPC applications will not be a large step forward.

Intel is going to increase performance by 20-30% per CPU (50% more cores), while AMD’s CPUs will see only marginal increases. So basically, Intel’s performance advantage is going to grow by 20 to 30%, except in HPC workloads where it is already running circles around the competition. Not an enviable position to be in for AMD.

Suppose that you are the strategic brain behind AMD. The competition offers better “per chip” and “per core” performance. The last thing you want to do is to offer the same kind of server platform. If a six-core Opteron (“Lisbon”) goes head to head with a six-core Xeon (“Westmere EP”), it will not be pretty: the Intel chip will beat the AMD chip in performance and performance/watt (remember, Westmere EP is a 32 nm CPU). Despite this, AMD found some clever ways to make their server platforms interesting…

Cheaper 4-Socket Servers

“Know your enemies and know yourself.”

In which usage scenarios are Intel’s offerings less compelling? The Nehalem EX is a powerful platform, but it is also completely different from the “Westmere EP” platform. The Nehalem EX's most important market is the 4-socket/8-socket x86 market, where about 400,000 servers are sold per year, or about 5% of the total x86 server market. It is also a pretty complex platform, with two I/O hubs and 16 (!) memory buffer chips on a 4-socket board. The Nehalem EX platform does not only want to conquer the high-end 4-socket and 8-socket x86 server market; it also wants to convince the more paranoid RISC and Itanium buyers.

AMD uses the same building blocks for its midrange 4-socket platform as it does for the high-end 2-socket platform and calls it the G34 infrastructure. The consequence is that the RAS features stay the same, and as a result, AMD cannot fully compete with the Nehalem EX platform when it comes to RAS. But that is not really a problem, as some of the "high-end" RAS features aren't used by 98% of the x86 crowd who buy the more expensive 2-socket and 4-socket servers. To compete with the 8-core/16-thread Nehalem EX, AMD puts two DDR3 Istanbuls together, which communicate via a HyperTransport link, and calls the result a twelve-core Opteron 6100 (Socket G34). A server based on the Opteron 6100 can probably come close to the performance of the lower-end and midrange Nehalem EX, but it is a lot cheaper to design and produce. The disadvantage is that it only has 12 DIMM slots per CPU, while the Nehalem EX has 16 DIMM slots per CPU.

Our first impression is that AMD will find it hard to win the high-end database and ERP market. The quad-core Nehalem 5500 already outperforms the six-core Opteron “Istanbul” by a large margin (30-50%). The Opteron 6100 does have 50% more cores than the Nehalem EX, but it is likely that a “native octal-core” will scale a bit better than a design built from two six-core dies. For the virtualization market, the larger number of DIMM slots is an advantage for the Nehalem EX. At first sight, it looks like it will be pretty tough for AMD to regain market share in this part of the server market.
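
The per-server arithmetic behind this comparison can be sketched as follows. The inputs are the per-CPU core, thread and DIMM slot counts quoted in the text; the class and field names are hypothetical, and the Opteron 6100 thread count assumes one thread per core.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cpu:
    name: str
    cores: int
    threads: int
    dimm_slots: int  # DIMM slots per CPU socket

SOCKETS = 4  # the 4-socket servers discussed in this section

# Per-CPU figures as quoted in the article; threads for the Opteron 6100
# are an assumption (one thread per core).
opteron_6100 = Cpu("Opteron 6100", cores=12, threads=12, dimm_slots=12)
nehalem_ex = Cpu("Nehalem EX", cores=8, threads=16, dimm_slots=16)

def per_server(cpu: Cpu, sockets: int = SOCKETS) -> dict:
    """Scale the per-CPU counts up to a whole server."""
    return {
        "cores": cpu.cores * sockets,
        "threads": cpu.threads * sockets,
        "dimm_slots": cpu.dimm_slots * sockets,
    }

amd = per_server(opteron_6100)
intel = per_server(nehalem_ex)

core_advantage_amd = amd["cores"] / intel["cores"] - 1             # AMD's extra cores
dimm_advantage_intel = intel["dimm_slots"] / amd["dimm_slots"] - 1  # Intel's extra slots

for cpu, totals in ((opteron_6100, amd), (nehalem_ex, intel)):
    print(f"{cpu.name}: {totals}")
print(f"AMD core advantage: {core_advantage_amd:.0%}")
print(f"Intel DIMM slot advantage: {dimm_advantage_intel:.0%}")
```

Under these assumptions, the AMD server has more physical cores while the Intel server has more hardware threads and more DIMM slots, which is the trade-off the paragraph describes.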

107 Comments


  • BaronMatrix - Tuesday, November 24, 2009 - link

    I mean every other sentence was, but Intel has this, Intel has that. Are you trying to be Fox News?
  • Chlorus - Tuesday, November 24, 2009 - link

    How could mentioning their competition possibly be relevant? I know, right?
  • jeffrey - Tuesday, November 24, 2009 - link

    Any updates on Tukwila?

    Last I heard was Q1 of 2010 for OEM shipment.

    Since this article mentioned Top500, maybe a conversation or Blog post on the status of that processor platform as a whole?
  • HurleyBird - Tuesday, November 24, 2009 - link

    • 4 Bulldozer modules are about 60 to 80% faster than one six-core Opteron 6100 CPU in SPECInt_rate.

    The 6100 is a 12-core product, no? So either "six-core" or "Opteron 6100" is a typo. If it's 60-80% faster than the 12-core product, then that's a pretty huge increase.
  • JohanAnandtech - Tuesday, November 24, 2009 - link

    Thanks for the feedback. I adapted the text and made it clearer: one 2x 8-core Interlagos is 60 to 80% faster than a 2x 6-core Magny-Cours.
  • Adsski - Tuesday, November 24, 2009 - link

    Given a 60 - 80% increase whilst going from 12 to 16 cores, would it be reasonable to state that this is a 20 - 35% improvement per core?
  • MDme - Tuesday, November 24, 2009 - link

    It begs the question though,

    Is this 8 core or module interlagos CPU? because an 8 "core" interlagos cpu might be a 4 module (ergo 8 integer core) cpu or is this an 8 core (as in module) with 16 integer cores.

    If it is a 4 module (8 integer core) cpu then the improvement will be as much as 40-70% per module.

    If it is an 8 module (16 integer core) cpu then it will be as stated by a few a 20-35% improvement per module

    This is confusing since AMD and the article did not clarify their statement whether they are talking interlagos as a 16-core in the traditional sense or were they referring to it as a 16 core ( 8 module) CPU - an octal core with 16 integer "cores")
  • JFAMD - Tuesday, November 24, 2009 - link

    Perhaps we have not been clear, so let me clarify.

    There is a bulldozer module, NOT a bulldozer core.

    Each bulldozer module has 2 integer cores in it.

    A 16-core interlagos will consist of 8 modules and 16 total integer cores.

    Just for the record, the hardware will see 16 cores, not 8 modules. The operating system will see 16 cores, not 8 modules. The application will see 16 cores, not 8 modules. There are only 2 places that people will see the modules: in the architectural layout and on my powerpoint slides. Beyond those two places, it is all cores.
  • MDme - Wednesday, November 25, 2009 - link

    I see. Thanks for the clarification. With this info however, do you think AMD can achieve its goal of having the "fastest" uarch out there when they do release? Isn't that what they claimed for Bulldozer? Since this is 2011, Intel will have further improvements down the line too.
  • JohanAnandtech - Tuesday, November 24, 2009 - link

    That is reasonable. It is hard to tell as Bulldozer will scale quite a bit better too. But it is clear that AMD has - finally - made some serious progress in integer single threaded performance too.
