Intel Woodcrest, AMD's Opteron and Sun's UltraSparc T1: Server CPU Shoot-out
by Johan De Gelas on June 7, 2006 12:00 PM EST
Posted in IT Computing
The New Intel Platform
The biggest advantage of Intel's newest Bensley platform is longevity: the Dempsey, Woodcrest and quad-core Clovertown Xeon all use the same socket and platform.
Bensley also eliminates the shared Xeon bus by giving each CPU an independent bus running at 1333 MHz. This is somewhat similar to the old Athlon MP platform, and it should be noted that this makes the Blackford Northbridge or MCH a pretty complex chip. Blackford also offers up to 4 memory channels and 24 PCI Express lanes.
The Dual Independent Bus (DIB) will not make much difference for Woodcrest and Dempsey, as only some HPC applications are really limited by FSB bandwidth. Three years of benchmarking tell us that most server and workstation applications are not bottlenecked by modern FSB speeds. Nor does the Opteron platform scale that much better in dual and quad core configurations because of NUMA bandwidth; in most applications it is the low latency integrated memory controller that makes the difference, not FSB/NUMA bandwidth. Of course, with Clovertown, essentially two Woodcrest dies in one package, a shared FSB might become a bottleneck, and in that case a DIB is a good idea.
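To put some rough numbers on this, here is a quick back-of-the-envelope Python sketch of the theoretical peak FSB bandwidth involved; these are illustrative peak figures, not measured results.

# Theoretical peak front-side bus bandwidth (illustrative figures only).
FSB_TRANSFERS_PER_SEC = 1333e6  # 1333 MT/s quad-pumped bus
BUS_WIDTH_BYTES = 8             # 64-bit data bus

per_bus_gbs = FSB_TRANSFERS_PER_SEC * BUS_WIDTH_BYTES / 1e9
print(f"One 1333 MHz bus: {per_bus_gbs:.1f} GB/s")           # ~10.7 GB/s

# With the Dual Independent Bus each socket gets its own ~10.7 GB/s.
# On a single shared bus, two sockets (and with Clovertown, eight cores)
# would have to split that same ~10.7 GB/s.
print(f"Two independent buses: {2 * per_bus_gbs:.1f} GB/s")  # ~21.3 GB/s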
The biggest innovation of Blackford is the introduction of fully buffered DIMMs (FB-DIMMs). On the FB-DIMM PCB we still find parallel DDR-2, but the Advanced Memory Buffer (AMB) converts this parallel data stream into a serial one that runs to the Blackford chip. The serial links between the memory subsystem and the chipset not only eliminate skew problems but also greatly simplify the routing on the motherboard. Routing quad-channel DDR-2 would be a nightmare.
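As a quick sanity check of why four memory channels pair nicely with the two independent buses, the theoretical peaks line up. This is a minimal Python sketch assuming DDR2-667 FB-DIMMs (Bensley also supports DDR2-533); peak figures only.

# Peak bandwidth of Blackford's four FB-DIMM channels vs. its two FSBs
# (theoretical peaks; assumes DDR2-667 FB-DIMMs).
DDR2_TRANSFERS_PER_SEC = 667e6  # DDR2-667 behind each AMB
CHANNEL_WIDTH_BYTES = 8         # 64-bit DDR2 channel
CHANNELS = 4

memory_peak = CHANNELS * DDR2_TRANSFERS_PER_SEC * CHANNEL_WIDTH_BYTES / 1e9
fsb_peak = 2 * 1333e6 * 8 / 1e9
print(f"Four DDR2-667 channels: {memory_peak:.1f} GB/s")  # ~21.3 GB/s
print(f"Two 1333 MHz FSBs:      {fsb_peak:.1f} GB/s")     # ~21.3 GB/s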
The AMB, which you can see under the heatsink in the middle of the DIMM, solves the skew and routing problems and comes with a relatively small price premium. It also allows full duplex operation between the chipset and the AMB, whereas other memory bus designs are half duplex and introduce extra latency when alternating between send and receive modes. However, each AMB dissipates about 5 W and increases latency. This means that with 8 DIMMs or more, the advantage of using 65 W Woodcrest CPUs over 89-92 W Opterons will be gone.
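A rough tally in Python, using the ~5 W per AMB figure and the TDPs quoted above, shows why the break-even point sits around eight DIMMs in a dual-socket system; treat these as ballpark numbers, not measured power draw.

# Ballpark power comparison for a dual-socket system (figures from the text).
AMB_WATT = 5                                 # approximate dissipation per AMB
DIMMS = 8
WOODCREST_TDP = 65                           # W per CPU
OPTERON_TDP_LOW, OPTERON_TDP_HIGH = 89, 92   # W per CPU

amb_overhead = DIMMS * AMB_WATT                          # ~40 W of extra heat
cpu_gap_low = 2 * (OPTERON_TDP_LOW - WOODCREST_TDP)      # 48 W
cpu_gap_high = 2 * (OPTERON_TDP_HIGH - WOODCREST_TDP)    # 54 W

print(f"AMB overhead with {DIMMS} FB-DIMMs: ~{amb_overhead} W")
print(f"Dual-socket CPU power advantage: {cpu_gap_low}-{cpu_gap_high} W")
# At 8 DIMMs the AMB overhead already eats most of the CPU power advantage;
# add a few more DIMMs and it is gone entirely.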
The Blackford chipset uses x8 PCI Express links to talk to various other chips, such as the ESB-2 I/O bridge, or "southbridge" to keep it simple. The other PCI Express links can be used for 10 Gbit Ethernet or a SATA or SAS controller. A workstation version of Blackford, Greencreek, will offer dual x16 PCI Express for running multiple workstation graphics cards.
91 Comments
blackbrrd - Wednesday, June 7, 2006 - link
Finally Intel can give AMD some real competition in the two socket server market. This shows why Dell only wanted to go with AMD for 4S and not 2S server systems... 245W vs 374W and a huge performance lead over the previous Intel generation is a huge leap for Intel.
It will be interesting to see how much these systems are going to cost:
1) are the FB-DIMMs gonna be expensive?
2) are the CPUs gonna be expensive?
3) are the motherboards gonna be expensive?
For AMD neither the RAM nor the motherboards are expensive, so I am curious how this goes...
If anybody thinks I am an Intel fanboy, I have bought in this sequence: Intel, AMD, Intel, Intel, and I would have gotten an AMD instead of an Intel for the last computer, except I wanted a laptop ;)
JarredWalton - Wednesday, June 7, 2006 - link
For enterprise servers, price isn't usually a critical concern. You often buy what runs your company best, though of course there are plenty of corporations that basically say "Buy the fastest Dell" and leave it at that.
FB-DIMMs should cost slightly more than registered DDR2, but not a huge difference. The CPUs should actually be pretty reasonably priced, at least for the standard models. (There will certainly be models with lots of L3 cache that will cost an arm and a leg, but that's a different target market.)
Motherboards for 2S/4S are always pretty expensive - especially 4S. I would guess Intel's boards will be a bit more expensive than equivalent AMD boards on average, but nothing critical. (Note the "equivalent" - comparing boards with integrated SCSI and 16 DIMM slots to boards that have 6 DIMM slots is not fair, right?)
Most companies will just get complete systems anyway, so the individual component costs are only a factor for small businesses that want to take the time to build and support their own hardware.
blackbrrd - Wednesday, June 7, 2006 - link
Registered DDR2 is dirt cheap, so if FB-DIMMs are only slightly more expensive, that's really good.
About comparing 6 DIMM slot and 16 DIMM slot motherboards, I agree, you can't do it. The number of banks is also important: we have a motherboard at work with an 8-rank limit and 6 DIMM slots, so only two of the slots can be filled with the cheapest 2GB dual rank memory. Currently 2GB single rank modules are 50% more expensive than dual rank modules.
Which brings another question.. Does FB-DIMM have the same "problem" with rank limit in addition to slot limit? Or does the FB part take care of that?
BaronMatrix - Wednesday, June 7, 2006 - link
Why are we running servers with only 4GB RAM? I have that in my desktop. Not to nitpick, but I think you should load up 16GB and rerun the tests. If not, this is a low end test, not HPC. I saw the last Apache comparison and it seems like the benchmark is different. Opteron was winning by 200-400% in those tests. What happened?
JohanAnandtech - Wednesday, June 7, 2006 - link
Feel free to send me 12 GB of FB-DIMMs. And it sure isn't an HPC test, it is a server test.
"I saw the last Apache comparison and it seems like the benchmark is different. Opteron was winning by 200-400% in those tests. What happened?"
A new Intel architecture called "Core" was introduced :-)
BaronMatrix - Wednesday, June 7, 2006 - link
I didn't say the scores, I said the units in the benchmark. I'm not attacking you. It just stuck out in my head that the units didn't seem to be the same as the last test with Paxville. By saying HPC, I mean apps that use 16GB RAM, like Apache/Linux/Solaris. I'm not saying you purposely couldn't get 12 more GB of RAM but all things being equal 16GB would be a better config for both systems.
I've been looking for that article but couldn't find it.
JohanAnandtech - Wednesday, June 7, 2006 - link
No problem. Point is your feedback is rather unclear. AFAIK, I haven't tested with Paxville. Maybe you are referring to my T2000 review, where we used a different LAMP test, as I explained in this article. In this article the LAMP server has a lot more PHP and MySQL work.
http://www.anandtech.com/IT/showdoc.aspx?i=2772
See the first paragraph
And the 4 GB was simply a matter of the fact that the Woodcrest system had 4 GB of FB-DIMMs.
JarredWalton - Wednesday, June 7, 2006 - link
Most HPC usage models don't depend on massive amounts of RAM, but rather on data that can be broken down into massively parallel chunks. IBM's BlueGene for example only has 256MB (maybe 512MB?) of RAM per node. When I think of HPC, that's what comes to mind, not 4-way to 16-way servers.
The amount of memory used in these benchmarks is reasonable, since more RAM only really matters if you have data sets that are too large to fit in memory. Since our server data sets are (I believe) around 1-2GB, having more than 4GB of RAM won't help matters. Database servers are generally designed to have enough RAM to fit the vast majority of the database into memory, at least where possible.
If we had 10-14GB databases, we would likely get lower results (more RAM = higher latency among other things), but the fundamental differences between platforms shouldn't change by more than 10%, and probably closer to 5%. Running larger databases with less memory would alter the benchmarks to the point where they would largely be stressing the I/O of the system - meaning the HDD array. Since HDDs are so much slower than RAM (even 15K SCSI models), enterprise servers try to keep as much of the regularly accessed data in memory as possible.
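To give a feel for why keeping the working set in RAM matters so much, here is a rough Python comparison of access times; the latency figures are generic ballpark assumptions for DDR2-era memory and a 15K RPM drive, not measurements from this review.

# Ballpark access-time comparison: DDR2 DRAM vs. a 15K RPM SCSI disk.
DRAM_ACCESS_NS = 100                  # rough random access latency, DDR2-era memory (assumed)
SEEK_MS = 3.8                         # typical average seek of a 15K RPM drive (assumed)
ROTATIONAL_MS = 60_000 / 15_000 / 2   # average rotational latency: 2 ms

disk_access_ns = (SEEK_MS + ROTATIONAL_MS) * 1e6
print(f"Disk random access: ~{disk_access_ns / 1e6:.1f} ms")
print(f"DRAM random access: ~{DRAM_ACCESS_NS} ns")
print(f"Disk is roughly {disk_access_ns / DRAM_ACCESS_NS:,.0f}x slower")
# Once the hot part of the database no longer fits in RAM, every cache miss
# pays this penalty, and the benchmark turns into an I/O test.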
As for the Paxville article, click on the "IT Computing" link at the top of the website. Paxville is the second article in that list (and it was also linked once or twice within this article). Or here's the direct link: http://www.anandtech.com/IT/showdoc.aspx?i=2745
BaronMatrix - Wednesday, June 7, 2006 - link
Thx for the link, but the test I was looking at was Apache and showed concurrency tests. At any rate, just don't think I was attacking you. I was curious as to the change in units I noticed.
MrKaz - Wednesday, June 7, 2006 - link
How much will it cost?
If Conroe XE at 2.9GHz is $1000, then I assume that this will cost more.
I think it looks good, but it will depend a lot on the final price.
Also, does that FB-DIMM have a premium price over the regular ones?