The Secret Boost of the Opteron 2224

Socket F Opterons have a small secret weapon: a speed bump offers more than just a faster CPU. To understand this, take a look at the table below. We measured the L2 cache's bandwidth with Lavalys Everest 3.51.

Lavalys Everest 3.51 L2 Bandwidth
CPU                        Read (MB/s)   Write (MB/s)   Copy (MB/s)
Dual Xeon 5160 3.0 GHz     22019         17751          23628
Xeon E5345 2.33 GHz        17610         14878          18291
Opteron 2224 SE 3.2 GHz    14636         12636          14630
Opteron 8218HE 2.6 GHz     11891         10266          11891

The L2 cache of the Opteron 8218 HE at 2.6 GHz is slower than the Core 2's L2 cache at 2.33 GHz. At about 10-12 GB/s it barely exceeds the theoretical peak bandwidth that dual-channel DDR2-667 can deliver (10.6 GB/s), while its exclusive nature also forces it to exchange quite a bit of data with the L1 cache. Now combine this table with the following one, where we measured memory bandwidth.

Lavalys Everest 3.51 Memory Bandwidth
CPU                        Read (MB/s)   Write (MB/s)   Copy (MB/s)   Latency (ns)
Dual Xeon 5160 3.0 GHz     3656          2771           3800          112.2
Xeon E5345 2.33 GHz        3578          2793           3665          114.9
Opteron 2224 SE 3.2 GHz    7466          6980           6863          58.9
Opteron 8218HE 2.6 GHz     6944          6186           5895          64
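
The 10.6 GB/s peak quoted for DDR2-667 can be reproduced with simple arithmetic. The sketch below is illustrative only: it assumes two 64-bit (8-byte) channels per socket and 667 million transfers per second, and the function name is our own, not part of any tool. It then expresses the measured Opteron read bandwidth from the table above as a fraction of that peak.

```python
def peak_bandwidth_mb_s(mega_transfers_per_s, bus_width_bytes=8, channels=2):
    """Theoretical peak in MB/s (1 MB = 10**6 bytes).

    Assumes a 64-bit channel (8 bytes per transfer) and dual-channel operation.
    """
    return mega_transfers_per_s * bus_width_bytes * channels


# DDR2-667: 667 million transfers per second on each channel.
peak = peak_bandwidth_mb_s(667)

# Measured read bandwidth (MB/s), copied from the Everest memory table.
measured_read = {
    "Opteron 2224 SE 3.2 GHz": 7466,
    "Opteron 8218HE 2.6 GHz": 6944,
}

for cpu, mb_s in measured_read.items():
    print(f"{cpu}: {mb_s} MB/s is {mb_s / peak:.0%} of the {peak} MB/s peak")
```

Under these assumptions, both Opterons deliver well over half of the theoretical peak in a read test, which is why an L2 cache that only just exceeds that peak matters.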

It is no secret that a higher-clocked integrated memory controller can increase the bandwidth actually delivered by the same DDR2 modules. It also helps that a faster L2 cache is able to absorb the bandwidth that the memory is capable of delivering. Also notice that without the use of SSE2 instructions, the memory subsystem of the 5000P chipset delivers relatively disappointing amounts of bandwidth. As most applications do not use carefully tuned SSE2 code to get data from memory, this should reflect the real-world situation most of the time. And of course, until Intel introduces the Nehalem family, memory latency will continue to be one of AMD's strong points.
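
To see how much of the clock speed increase shows up in each measurement, the two Opteron rows can be compared directly. This is a rough sketch using only the figures from the tables above. The two chips are different models, possibly tested in different systems, so the ratios are indicative rather than a controlled comparison.

```python
# Read bandwidth (MB/s) from the two Everest tables, plus rated clocks (GHz).
fast = {"clock_ghz": 3.2, "l2_read": 14636, "mem_read": 7466}  # Opteron 2224 SE
slow = {"clock_ghz": 2.6, "l2_read": 11891, "mem_read": 6944}  # Opteron 8218HE


def gain(key):
    """Ratio of the faster part's figure to the slower part's figure."""
    return fast[key] / slow[key]


clock_gain = gain("clock_ghz")
l2_gain = gain("l2_read")
mem_gain = gain("mem_read")

print(f"clock speed:           +{clock_gain - 1:.1%}")
print(f"L2 read bandwidth:     +{l2_gain - 1:.1%}")
print(f"memory read bandwidth: +{mem_gain - 1:.1%}")
```

The L2 read bandwidth tracks the core clock almost exactly, while memory read bandwidth rises by a smaller but still noticeable amount on the same DDR2-667 modules.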

Processor Latency Comparison
CPU                                 L1   L2   L3    min mem   max mem   Absolute latency (ns)
Xeon 5160 3.0 - DDR2 533            3    14   -     69        380       127
Xeon 5160 3.0 - DDR2 667            3    14   -     67        338       113
Core 2 Duo 2.933 - DDR2 533         3    14   -     67        180       61
Quad Xeon E5345 2.33 - DDR2 533     3    14   -     80        280       120
Quad Xeon E5345 2.33 - DDR2 667     3    14   -     80        271       116
Xeon 7130M 3.2 - DDR2 400           4    29   109   245       624       195
Opteron 880 2.4 - DDR333            3    12   -     84        228       95
Opteron 2224 SE - DDR2 667          3    12   -     72        189       59
Opteron 2218 HE - DDR2 667          3    12   -     62        157       60

The latency penalty that FB-DIMM introduces is huge. To give an idea of its size, we added the latency measured with a Core 2 Duo 2.933 GHz using 2x 2GB of 533MHz DDR2. The staggering conclusion is that registered FB-DIMMs add - in the worst case - about 200 cycles or 66 ns of latency. Sure, some of that latency can be attributed to the buffering that is necessary for server memory. Buffered memory contains registers which hold data for one full clock cycle before it is passed on. This means that registered memory should add about 8 ns (2 clock cycles at a 266MHz base clock, DDR2-533).
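
The numbers in this paragraph can be checked against the latency table. The sketch below assumes the "max mem" column is in CPU clock cycles, as the "200 cycles" figure implies. The helper name is ours. Note that the Core 2 Duo runs at 2.933 GHz rather than 3.0 GHz, so the cycle comparison is only approximate.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a cycle count to nanoseconds (at 1 GHz, one cycle takes 1 ns)."""
    return cycles / clock_ghz


# "max mem" column: Xeon 5160 3.0 - DDR2 533 row vs. Core 2 Duo 2.933 - DDR2 533 row.
penalty_cycles = 380 - 180

# "Absolute latency (ns)" column for the same two rows.
penalty_ns_from_table = 127 - 61

# The same penalty derived from the cycle count at the Xeon's 3.0 GHz clock.
penalty_ns_at_3ghz = cycles_to_ns(penalty_cycles, 3.0)

# Expected cost of registering alone: 2 cycles at the 266 MHz base clock.
register_delay_ns = cycles_to_ns(2, 0.266)

print(f"worst-case penalty: {penalty_cycles} cycles, "
      f"{penalty_ns_from_table} ns measured, {penalty_ns_at_3ghz:.1f} ns from cycles")
print(f"expected register delay: {register_delay_ns:.1f} ns")
```

The register delay works out to roughly a tenth of the measured worst-case penalty, so ordinary buffering explains only a small part of the added latency.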

The secondary benefit of FB-DIMMs is that motherboards can use more DIMMs per bank, potentially increasing total memory capacity. However, AMD already gets around this quite easily with up to eight DIMM sockets per CPU socket, so this benefit doesn't really materialize in any meaningful way. The bottom line is that while FB-DIMMs were a potentially good idea from a purely theoretical point of view, it is rather obvious that in practice they have some pretty bad consequences.

30 Comments

  • piroroadkill - Tuesday, August 7, 2007 - link

    it is a car analogy
  • Gul Westfale - Monday, August 6, 2007 - link

    good analogy there, except that mustangs (and various other cars) use pickup truck engines for cost reasons. large trucks use larger engines (often diesels) because they offer considerably more torque at much lower RPM than a smaller gasoline engine; and thus provide more pulling power.
  • Gul Westfale - Monday, August 6, 2007 - link

    these are not regular consumer cpus, but intended for use in commercial servers and workstations. they and their motherboards cost more because they support features such as multiple sockets (so in addition to having multiple cores on one chip you can also have multiple chips on one motherboard).

  • yyrkoon - Monday, August 6, 2007 - link

    quote:

    Intel has a clear lead in the rendering market. If you are rendering complex high resolutions images, the quad core Xeon is clearly the best choice.


    they win 1 of 2 tests, and it is clear they are the winner ? Why ? Because they won the software rendering also ? Anyone interested enough in rendering, and HAVING to have this sort of hardware for it is NOT going to bother with software . . .

    This means your conclusion on this point is incorrect, and in which case, it boils down to which application the rendering machine is going to do.

    Man you guys come to the weirdest conclusions based on your own data, and I am not even the first to notice/mention this sort of thing . . .
  • JohanAnandtech - Monday, August 6, 2007 - link

    The Quadcore wins all high resolution rendering tests. Where do you see the DC opterons win against the Quadcore Intel in high resolution rendering? Show me a rendering engine where a 3 GHz K8 DC core is faster in high resolution rendering than a 2.33 GHz Quadcore. All decent rendering engines that are used in the real world will more or less show the same picture.

    In fact, the "rendering performance" situation will get worse for the K8 as SSE-2 tuning will get more common. All Intel CPUs since core and all AMD CPUs since Barcelona will show (or are already showing) high performance boost from using better SSE-2 code.
  • yyrkoon - Monday, August 6, 2007 - link

    Ok, I see now with the graphs 'lower is better' on 3ds max, I missed that with the tables, which is actually what I meant this morning 'table obfustication'. I personally do not mind tables, but when the data is not in a uniform spot, it confuses/makes it harder to read at a glance.

    Anyhow, I was tired when I posted this morning, cranky, and was overly harsh I think. However it *is* much easier for me personally to read the graphs at a glance (I cannot speak for everyone though).
  • yyrkoon - Monday, August 6, 2007 - link

    Oh, and while on the subject, you guys here at anandtech have lately mastered the art of graph obfustication. Is it really THAT hard leaving items in the same rows / columns for different tests ? Are we trying to confuse the results, or is there some other reason this happens, and has gone completely over my head ?
  • JohanAnandtech - Monday, August 6, 2007 - link

    The only reason is that until very recently I didn't master the graphing engine. I got some weird error messages and gave up. But I have found the error, and you should see some nice graphs which don't obfusticate...
  • Spoelie - Monday, August 6, 2007 - link

    the gif on page 2 is non-looping, so after a very quick jump from 1ghz -> 2.8ghz (why??) -> 3.2ghz , it stays put on the 3.2ghz image. If reading the article, by the time the reader sees the image, it's already 5 minutes on the last image and staying there, making it for all intents and purposes a static image instead of an animated one

    :)
  • JohanAnandtech - Monday, August 6, 2007 - link

    Thanks, fixed that. The reason to show 2.8 GHz is that for example Specjbb and other applications sometimes don't completely stress the CPU and then the cpu dynamically goes back to 2.8 GHz. These are simply the 3 stages I saw the most, and found the most interesting to show.
