AMD's 3rd generation Opteron versus Intel's 45nm Xeon: a closer look
by Johan De Gelas on November 27, 2007 6:00 AM EST- Posted in
- IT Computing
Software Rendering: zVisuel (32-bit Windows)
This benchmark is the zVisuel Kribi 3D test, which is exclusive to AnandTech.com and which simulates the assembly of a mechanical watch. The complete model is very detailed with around 300,000 polygons and a lot of texture, bump, and reflection maps. We render more than 1000 frames and report the average FPS (frames per second). All this is rendered on the "Kribi 3D" engine, an ultra-powerful real-time software rendering 3D engine. That all this happens at reasonable speeds is a result of the fact that the newest AMD and Intel architectures contain four cores and can perform up to eight 32-bit FP operations per clock cycle and per core. The people of zVisuel told us that - in reality - the current Core architecture can sustain six FP operations in well-optimized loops. Profiling for Barcelona architecture is not yet complete, so we did our best with CodeAnalyst 2.74 for Windows. We only profiled the non-AA benchmark so far.
ZVisuel Kribi3D Profiling | |
Profile | Total |
Average IPC (on Opteron 2350) | 1 |
Instruction mix | |
Floating Point | 31% |
SSE | 35% |
Branches | 6% |
L1 datacache ratio | 0.63 |
L1 Instruction ratio | 0.22 |
Performance indicators on Opteron 2350 | |
Branch misprediction | 8% |
L1 datacache miss | 1% |
L1 Instruction cache miss | 1% |
L2 cache miss | 0% |
This is a very different engine than the scanline-rendering engine of 3ds Max. SSE instructions play a very dominant role, and the zVisuel Kribi 3D benchmark gives us a view on how the different CPUs perform on well-optimized SSE applications. While the application seems to run almost perfectly from the L2 cache, this seems to be a result of well-tuned, predictable access to the memory. We noticed that hardware prefetching and the new Seaburg chips help this benchmark a lot:
Zvisuel Intel Platform Performance Comparison | |||
CPU | HW Prefetch on | HW Prefetch disabled | Difference |
Dual Xeon E5365 3.0 (Blackford) | 99.9 | 87.7 | 14% |
Dual Xeon E5365 3.0 (Seaburg) | 110 | 104.2 | 6% |
Dual Xeon E5472 3.0 (Seaburg) | 124.8 | 110 | 13% |
Let us see all the results.
Although we haven't done a detailed analysis, we can assume that the "Super Shuffle Engine" and "Radix-16" divider that Intel has implemented in the Xeon 5472 is paying off here. AMD Opteron 2360 SE at 2.5GHz can overtake the best Xeon at 65nm, but the new Xeon has a tangible lead. A silver lining to the cloud hanging over AMD is that the Opteron 23xx series scale perfectly with clock speeds: compare the 2GHz with the 2.5GHz results. Still, Intel has the advantage when it comes to SSE processing.
The results with AA show that the memory subsystem of the Xeon 53xx is a major bottleneck, but the new Seaburg chipset has made this bottleneck a bit smaller. The result is a crushing victory for the latest Intel architecture. Enough FP testing, let us see what Barcelona can do when running typical integer server workloads.
43 Comments
View All Comments
Regs - Tuesday, November 27, 2007 - link
I would not expect any from vendors and wholesalers until early next year.Matter of fact I wouldn't want one until then anyhow. I would at least wait until B3 stepping.
TA152H - Tuesday, November 27, 2007 - link
Johan,From my understanding, x87 is now obsolete and not even supported in x86-64. Can you verify this? I know I had read it, from your article you state that Intel improved it, so I'm not as sure. I had assumed one of AMD's handicaps was the disproportionate, and nearly useless, x87 processing power their processors carried, but now I am not as sure. Is x87 supported in x86-64, and if not, why would Intel increase their x87 capabilities when it's clearly a deprecated technology?
JohanAnandtech - Tuesday, November 27, 2007 - link
The x87 instructions can be used in legacy mode and long mode. But it is true that Scalar SSE instructions are preferred by AMD and Intel.x87 performance as many 32 bit programs are still important (look at 3DSMAx 32 bit).
If Intel's newest Core architecture would not have improved the x87 FP it would probably have looked silly as so many 32 bit programs still use it intensively. Secondly, as you can see, things like the Radix-16 circuitry are used by both the SIMD as the x87 units.
Gholam - Tuesday, November 27, 2007 - link
Do you have any plans to benchmark Opteron vs Xeon in an ESX Server environment?DeepThought86 - Tuesday, November 27, 2007 - link
This is exactly what I was thinking of too. I want to change my mode of working to run several separate VM's, one for programming, one for Office etc and really want to know how Phenom compares to Q6600 for those uses. Well, this article looks at the server versions of those chips but for VMware the performance might be more comparable than, say, SuperPi 1M benchmarks!DeepThought86 - Tuesday, November 27, 2007 - link
I forgot to add, since Phenom would presumably also have the nested table support as Barcelona, how much performance improvement would this yield? I'd love to knowsht - Tuesday, November 27, 2007 - link
I was about to ask the same question after reading the concludingYou may feel for example that using four instances in our SPECjbb test favors AMD too much, but there is no denying that using more virtual machines on fewer physical servers is what is happening in the real world.
Since the CPUs have features that should accelerate virtualization, it would really be interesting to see how they compete there. My only addition to your request would be to add KVM as host as well (and XEN and what not as well if you care, though I really think only KVM is of interest).
JohanAnandtech - Tuesday, November 27, 2007 - link
Indeed, we are working on that. The software that we described here (http://www.anandtech.com/IT/showdoc.aspx?i=2997&am...">http://www.anandtech.com/IT/showdoc.aspx?i=2997&am... is being adapted to testing virtualized applications. We are also looking into the parameters that can really influence the results of a benchmark on a virtualized server.JohanAnandtech - Tuesday, November 27, 2007 - link
Indeed, we are working on that. The software that we described here (http://www.anandtech.com/IT/showdoc.aspx?i=2997&am...">http://www.anandtech.com/IT/showdoc.aspx?i=2997&am... is being adapted to testing virtualized applications. We are also looking into the parameters that can really influence the results of a benchmark on a virtualized server.AssBall - Tuesday, November 27, 2007 - link
Thanks, Johan.This has been one of the clearer and better proofread articles I have read here lately. It was interesting, unbiased, and insightful. I am excited to see what you get into for your next project.