AMD's 3rd generation Opteron versus Intel's 45nm Xeon: a closer look
by Johan De Gelas on November 27, 2007 6:00 AM EST - Posted in IT Computing
The Opteron 2360SE - the Facts
Getting back to AMD, the new quad-core chip still lives under a veil of secrecy. Quite a few rumors and myths are going around, and we investigated them one by one so that you get only the facts.
Fact 1: The B2 stepping does not have a much faster memory controller than the B1 stepping
The controller found in the B2 stepping might be a tiny bit faster, but we have not found any significant difference. Our Stream benchmark results were only a tiny bit better on the 2.5GHz chip (stepping B2) than on the 2GHz chip (stepping DR-B1), and the same goes for the latency numbers.
The 2.5GHz Barcelona is a newer stepping than the 2GHz sample we tested earlier.
Fact 2: The 2.5GHz review sample is running at 1.2V; it is not overclocked
CPU-Z reported that the 2.5GHz chip was running at 1.5V, while the 2GHz quad-core was running at 1.2V.
Power measurements show that the BIOS of our ASUS board is accurate, but CPU-Z is not. The final piece of evidence is, of course, the laser marking on the quad-core Opteron.
AMD is capable of producing 2.5GHz quad-core chips, but not in large quantities at this time. The 2.5GHz part should arrive at the end of this year, with large quantities expected in the first quarter of 2008.
Fact 3: the memory controller always runs RAM at the rated frequency
In this respect, the new quad-core Opteron is completely different from previous Opteron (and Athlon 64) processors. In the first and second generation Opteron, the memory controller ran at the CPU clock divided by a whole-number divisor. This resulted in very odd memory clock speeds at times, particularly with odd multipliers. For example, a 2.2GHz Opteron (11X multiplier) uses a divisor of 7 and ends up running DDR2-667 (333MHz clock) at 314MHz. That gives DDR2-628 instead of DDR2-667. In reality, this doesn't have any major performance impact, and it is only measurable with "Stream-like" benchmarks. In contrast, Barcelona's memory controller runs at its own frequency and will run the DIMMs at the rated speed.
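The divisor arithmetic for the older Opterons can be sketched in a few lines of Python. The selection rule used here (pick the smallest whole divisor that keeps the DIMMs at or below their rated clock) is an assumption on our part, and the function name is purely illustrative; the rule does reproduce the 2.2GHz example above.

```python
import math
from fractions import Fraction

# Rated DDR2-667 memory clock: exactly one third of 1000MHz (333.33MHz).
DDR2_667_MHZ = Fraction(1000, 3)


def divided_memory_clock(cpu_mhz, rated_mem_mhz=DDR2_667_MHZ):
    """Return (divisor, actual memory clock in MHz) for a divisor-based controller.

    Assumption: the controller picks the smallest whole divisor that keeps
    the memory clock at or below the rated clock.
    """
    # Fractions avoid floating point rounding when the ratio is a whole number.
    divisor = math.ceil(Fraction(cpu_mhz) / rated_mem_mhz)
    return divisor, float(Fraction(cpu_mhz) / divisor)


for cpu_mhz in (2000, 2200, 2400):
    divisor, clock = divided_memory_clock(cpu_mhz)
    print(f"{cpu_mhz}MHz core: divisor {divisor}, memory clock {clock:.1f}MHz")
```

Under this rule, only core clocks that are a whole multiple of the rated memory clock (2GHz in this loop) reach the full DDR2-667 speed; the others fall short of it.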
Fact 4: By running the Northbridge at a lower speed, the new quad-core loses a bit of performance but saves power
The core of the Opteron 2360 runs at 2.5GHz, but the L3 cache runs at Northbridge frequency, which is 2GHz. It seems that AMD's engineers felt that running the L3 at core frequency would not have yielded significantly higher performance, but would have significantly increased power dissipation. From another point of view, given a certain power envelope, running the Northbridge and the L3 cache at higher frequencies would result in lower core frequencies.
Fact 5: The L3 cache was a good choice, but…
The L3 cache does increase latency of accessing the main memory but decreases the average latency seen by the CPU. This leads to the question of whether the relatively slow L3 cache is really an advantage. The L3 cache has a latency of 43 cycles (2GHz) or 48 cycles (2.5GHz), but it's still quite a bit faster than system memory, which takes about 130 to 170 cycles to access.
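The trade-off between a slower path to memory and a lower average latency can be put into numbers with a simple weighted average. This is only a sketch: the 48-cycle L3 latency and the 150-cycle memory latency (the midpoint of the 130 to 170 range) come from the figures above, while the 30% hit rate and the 10-cycle penalty that an L3 miss adds to a memory access are hypothetical values chosen for illustration, not measurements.

```python
def average_latency(l3_hit_rate, l3_cycles, memory_cycles, miss_penalty_cycles):
    """Average latency in cycles for a request that has missed the L2."""
    hit_part = l3_hit_rate * l3_cycles
    # A miss pays the normal memory latency plus the time lost checking the L3.
    miss_part = (1.0 - l3_hit_rate) * (memory_cycles + miss_penalty_cycles)
    return hit_part + miss_part


def break_even_hit_rate(l3_cycles, memory_cycles, miss_penalty_cycles):
    """L3 hit rate at which the average latency equals going straight to memory."""
    return miss_penalty_cycles / (memory_cycles + miss_penalty_cycles - l3_cycles)


L3_CYCLES = 48        # from the article (2.5GHz part)
MEMORY_CYCLES = 150   # midpoint of the 130 to 170 cycles quoted above
MISS_PENALTY = 10     # hypothetical extra cycles added by an L3 miss
HIT_RATE = 0.30       # hypothetical L3 hit rate

with_l3 = average_latency(HIT_RATE, L3_CYCLES, MEMORY_CYCLES, MISS_PENALTY)
print(f"average with L3: {with_l3:.1f} cycles, without L3: {MEMORY_CYCLES} cycles")
print(f"break-even hit rate: {break_even_hit_rate(L3_CYCLES, MEMORY_CYCLES, MISS_PENALTY):.1%}")
```

With these assumed numbers, the L3 pays for itself as soon as its hit rate exceeds the break-even value, which is why even a relatively slow L3 can lower the average latency.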
In addition, it has one main advantage for server workloads. If more than one core is accessing a cacheline in the L3 cache, it will remain in the L3. If not, the L3 cache will behave like a fully exclusive cache: it will send the cacheline to the L1 and throw the cacheline out of the L3 to make room for a "victim" of the L2. This allows relatively fast sharing of data between threads, which is important for applications with a large code footprint, such as databases. To a single-threaded application, it looks like a 2.5MB L2 cache, although with an average latency of about 20 cycles.
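The policy described above can be illustrated with a toy model. Everything in it is an assumption made for illustration: the class and method names are hypothetical, and tracking a set of cores per cacheline is just one simple way to express "more than one core is accessing the line"; it is not a description of how the hardware actually detects sharing.

```python
class ToyL3:
    """Toy model of a victim L3 that keeps shared lines and gives up private ones."""

    def __init__(self):
        # Maps a cacheline address to the set of cores seen using that line.
        self.lines = {}

    def insert_victim(self, core, addr):
        """A core's L2 evicts a line; the L3 takes it in as a victim."""
        self.lines.setdefault(addr, set()).add(core)

    def read(self, core, addr):
        """Return True on an L3 hit, False on a miss."""
        if addr not in self.lines:
            return False
        sharers = self.lines[addr]
        sharers.add(core)
        if len(sharers) == 1:
            # Only one core uses the line: act as an exclusive cache, hand the
            # line to that core and free the L3 slot for the next L2 victim.
            del self.lines[addr]
        # Otherwise the line is shared and stays in the L3 for the other cores.
        return True


l3 = ToyL3()
l3.insert_victim(core=0, addr=0x40)
l3.read(core=0, addr=0x40)      # private line: moves out of the L3
l3.insert_victim(core=0, addr=0x80)
l3.read(core=1, addr=0x80)      # a second core reads it: stays in the L3
print(sorted(hex(a) for a in l3.lines))
```

In this sketch, the line read back by its own core leaves the L3, while the line touched by a second core remains available to both.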
Still, there is no doubt that the L3 cache of Barcelona could have been a bit bigger to score even better in the larger database benchmarks such as TPC. We have to guess that a larger L3 cache became a victim (pun intended) of the already large 283 mm² die size. Still, a 44 cycle latency (and more) is rather disappointing for only 2MB of L3 cache.
Fact 6: Dual-Link is possible with AMD 2xxx Opterons
Several readers asked us how it was possible that our ASUS KFSN4-DRE board linked our 2350 CPUs with two HyperTransport point-to-point connections instead of one, as the 23xx Opteron supports only one coherent HyperTransport link. However, the constraint is not the number of links but actually the number of coherent responses that are supported. Our ASUS board does feature twice as much bandwidth for CPU-to-CPU traffic (snoop, access to remote memory, etc.).
Regs - Tuesday, November 27, 2007
I would not expect any from vendors and wholesalers until early next year. Matter of fact I wouldn't want one until then anyhow. I would at least wait until B3 stepping.
TA152H - Tuesday, November 27, 2007
Johan,
From my understanding, x87 is now obsolete and not even supported in x86-64. Can you verify this? I know I had read it, from your article you state that Intel improved it, so I'm not as sure. I had assumed one of AMD's handicaps was the disproportionate, and nearly useless, x87 processing power their processors carried, but now I am not as sure. Is x87 supported in x86-64, and if not, why would Intel increase their x87 capabilities when it's clearly a deprecated technology?
JohanAnandtech - Tuesday, November 27, 2007
The x87 instructions can be used in legacy mode and long mode. But it is true that Scalar SSE instructions are preferred by AMD and Intel. x87 performance is still important, as many 32 bit programs still use it (look at 3DSMax 32 bit).
If Intel's newest Core architecture had not improved the x87 FP, it would probably have looked silly, as so many 32 bit programs still use it intensively. Secondly, as you can see, things like the Radix-16 circuitry are used by both the SIMD and the x87 units.
Gholam - Tuesday, November 27, 2007
Do you have any plans to benchmark Opteron vs Xeon in an ESX Server environment?
DeepThought86 - Tuesday, November 27, 2007
This is exactly what I was thinking of too. I want to change my mode of working to run several separate VM's, one for programming, one for Office etc and really want to know how Phenom compares to Q6600 for those uses. Well, this article looks at the server versions of those chips but for VMware the performance might be more comparable than, say, SuperPi 1M benchmarks!
DeepThought86 - Tuesday, November 27, 2007
I forgot to add, since Phenom would presumably also have the nested table support as Barcelona, how much performance improvement would this yield? I'd love to know
sht - Tuesday, November 27, 2007
I was about to ask the same question after reading the concluding
"You may feel for example that using four instances in our SPECjbb test favors AMD too much, but there is no denying that using more virtual machines on fewer physical servers is what is happening in the real world."
Since the CPUs have features that should accelerate virtualization, it would really be interesting to see how they compete there. My only addition to your request would be to add KVM as host as well (and XEN and what not as well if you care, though I really think only KVM is of interest).
JohanAnandtech - Tuesday, November 27, 2007
Indeed, we are working on that. The software that we described here (http://www.anandtech.com/IT/showdoc.aspx?i=2997&am...) is being adapted to testing virtualized applications. We are also looking into the parameters that can really influence the results of a benchmark on a virtualized server.
AssBall - Tuesday, November 27, 2007
Thanks, Johan.
This has been one of the clearer and better proofread articles I have read here lately. It was interesting, unbiased, and insightful. I am excited to see what you get into for your next project.