AMD's 3rd generation Opteron versus Intel's 45nm Xeon: a closer look
by Johan De Gelas on November 27, 2007 6:00 AM EST - Posted in IT Computing
Software Rendering: zVisuel (32-bit Windows)
This benchmark is the zVisuel Kribi 3D test, which is exclusive to AnandTech.com and simulates the assembly of a mechanical watch. The complete model is very detailed, with around 300,000 polygons and many texture, bump, and reflection maps. We render more than 1000 frames and report the average FPS (frames per second). All of this is rendered on "Kribi 3D", an ultra-powerful real-time software rendering 3D engine. That this happens at reasonable speeds is due to the fact that the newest AMD and Intel architectures contain four cores and can perform up to eight 32-bit FP operations per clock cycle per core. The people at zVisuel told us that, in reality, the current Core architecture can sustain six FP operations per clock cycle in well-optimized loops. Profiling for the Barcelona architecture is not yet complete, so we did our best with CodeAnalyst 2.74 for Windows. So far, we have only profiled the non-AA benchmark.
zVisuel Kribi 3D Profiling | |
Profile | Total |
Average IPC (on Opteron 2350) | 1 |
Instruction mix | |
Floating Point | 31% |
SSE | 35% |
Branches | 6% |
L1 data cache ratio | 0.63 |
L1 instruction ratio | 0.22 |
Performance indicators on Opteron 2350 | |
Branch misprediction | 8% |
L1 data cache miss | 1% |
L1 instruction cache miss | 1% |
L2 cache miss | 0% |
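To put the "eight 32-bit FP operations per clock cycle and per core" figure in perspective, here is a rough back-of-the-envelope sketch of what it implies for one quad-core socket. The function name `peak_gflops` is our own, and the result is a theoretical ceiling (cores × operations per clock × clock speed), not a measured number. The 3.0 GHz clock matches the Xeons tested here, and the six-operations case uses the sustained figure zVisuel quoted for the Core architecture.

```python
def peak_gflops(cores: int, fp_ops_per_clock: int, clock_ghz: float) -> float:
    """Theoretical 32-bit FP throughput of one socket, in GFLOPS.

    This is simply cores x FP operations per clock x clock (GHz).
    Real code will land below this ceiling.
    """
    return cores * fp_ops_per_clock * clock_ghz


# One quad-core socket at 3.0 GHz, eight FP operations per clock (theoretical peak).
xeon_peak = peak_gflops(cores=4, fp_ops_per_clock=8, clock_ghz=3.0)

# Same socket with six FP operations per clock, the sustained figure
# quoted by zVisuel for well-optimized loops.
xeon_sustained = peak_gflops(cores=4, fp_ops_per_clock=6, clock_ghz=3.0)

print(f"peak: {xeon_peak:.0f} GFLOPS, sustained: {xeon_sustained:.0f} GFLOPS")
```

Under these assumptions, the gap between the theoretical and the sustained rate is 25%, before any memory or branch effects are taken into account.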
This is a very different engine from the scanline-rendering engine of 3ds Max. SSE instructions play a dominant role, and the zVisuel Kribi 3D benchmark shows how the different CPUs perform on well-optimized SSE applications. The application seems to run almost perfectly from the L2 cache, but this appears to be the result of well-tuned, predictable memory access. We noticed that hardware prefetching and the new Seaburg chipset help this benchmark a lot:
zVisuel Intel Platform Performance Comparison | |||
CPU | HW Prefetch on (FPS) | HW Prefetch disabled (FPS) | Difference |
Dual Xeon E5365 3.0 (Blackford) | 99.9 | 87.7 | 14% |
Dual Xeon E5365 3.0 (Seaburg) | 110 | 104.2 | 6% |
Dual Xeon E5472 3.0 (Seaburg) | 124.8 | 110 | 13% |
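The "Difference" column can be reproduced from the two FPS columns. The sketch below assumes the column is the relative gain of prefetch-on over prefetch-off, rounded to a whole percent. The table does not state this, but the numbers are consistent with it. The function name `prefetch_gain_percent` is our own.

```python
def prefetch_gain_percent(fps_on: float, fps_off: float) -> int:
    """Relative FPS gain from enabling hardware prefetch, in whole percent."""
    return round((fps_on / fps_off - 1) * 100)


# (FPS with HW prefetch on, FPS with HW prefetch disabled), from the table above.
results = {
    "Dual Xeon E5365 3.0 (Blackford)": (99.9, 87.7),
    "Dual Xeon E5365 3.0 (Seaburg)": (110.0, 104.2),
    "Dual Xeon E5472 3.0 (Seaburg)": (124.8, 110.0),
}

gains = {cpu: prefetch_gain_percent(on, off) for cpu, (on, off) in results.items()}

for cpu, gain in gains.items():
    print(f"{cpu}: {gain}%")
```

The same calculation applied to the two E5365 rows with prefetch enabled (110 versus 99.9 FPS) puts the gain from the Seaburg chipset alone at about 10%.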
Let us see all the results.
Although we haven't done a detailed analysis, we can assume that the "Super Shuffle Engine" and "Radix-16" divider that Intel has implemented in the Xeon E5472 are paying off here. The AMD Opteron 2360 SE at 2.5GHz can overtake the best 65nm Xeon, but the new Xeon has a tangible lead. A silver lining to the cloud hanging over AMD is that the Opteron 23xx series scales perfectly with clock speed: compare the 2GHz with the 2.5GHz results. Still, Intel has the advantage when it comes to SSE processing.
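For readers who want to check the clock-scaling claim against the charted results, here is a minimal sketch of how scaling efficiency can be computed. The function name `scaling_efficiency` is our own, and the FPS values in the example are placeholders chosen to illustrate perfect scaling. They are not measurements from this article.

```python
def scaling_efficiency(fps_low: float, fps_high: float,
                       clock_low_ghz: float, clock_high_ghz: float) -> float:
    """Fraction of the clock speed increase that shows up as extra FPS.

    1.0 means perfect scaling: a 25% higher clock gives 25% more FPS.
    """
    fps_gain = fps_high / fps_low - 1
    clock_gain = clock_high_ghz / clock_low_ghz - 1
    return fps_gain / clock_gain


# PLACEHOLDER FPS values (not from the article): 80 FPS at 2.0 GHz
# and 100 FPS at 2.5 GHz would be a case of perfect scaling.
efficiency = scaling_efficiency(fps_low=80.0, fps_high=100.0,
                                clock_low_ghz=2.0, clock_high_ghz=2.5)
print(f"scaling efficiency: {efficiency:.2f}")
```

Going from 2GHz to 2.5GHz is a 25% clock increase, so a result close to 1.0 means the benchmark is limited by the core and not by memory.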
The results with AA show that the memory subsystem of the Xeon 53xx is a major bottleneck, but the new Seaburg chipset has made this bottleneck a bit smaller. The result is a crushing victory for the latest Intel architecture. Enough FP testing; let us see what Barcelona can do when running typical integer server workloads.
43 Comments
aeternitas - Thursday, December 13, 2007 - link
Then why are you here? Details is what technology is about! I for one have a pet peeve with tech sites that use the wrong formats in their stories. Slightly damages credibility. Not to say this is a big deal in this case, though .gif is pretty much dead, unless you use an old browser on old tech, but then why would you be reading this story?
Look on the bright side, at least this isnt a Codec vs. Codec story, where the author uses jpgs for such color-limited screenshots.
SonicIce - Tuesday, November 27, 2007 - link
I think the color depth was decreased alot more than 8 bit. That image only has 33 unique colors in it. Something went wrong with the dithering maybe? 256 is usually more than enough.
Justin Case - Friday, November 30, 2007 - link
Who cares? The only part that suffers is the gradient at the top, all the relevant information is there, and this file is about half the size of what a PNG would be.