AMD Socket-F Opteron vs. Intel Woodcrest
by Jason Clark & Ross Whitehead on December 18, 2006 12:05 AM EST. Posted in IT Computing
Benchmark Methodology
For AnandTech Database Benchmarks, we have always focused on "real world" Benchmarks. To achieve this, we have used real applications with loads such that CPU utilization was 80-90%. Recently we discussed how most Enterprise Database Servers do not average 80-90% CPU utilization, but rather something closer to the 30-60% range. We thought it would make more sense to show performance where it is most likely going to be used, as well as the saturation numbers for the situations where the CPU is maxed.
We feel this is consistent with how GPUs are reviewed, and how you might test drive a car. With GPUs, the cards are tested with varying resolutions, and anti-aliasing levels. With a car, you don't just hit the highway and see what the top end is.
We settled on six load points for testing the varying ranges of load: 20%, 40%, 60%, 80%, 100%, and 120+%. These load points are consistent across all platforms and are throttled from the client, independent of the platform being measured. We chose these load points because they split the load range into six roughly equal parts and allow us to interpolate data between the points. The 120+% load point was included to verify that our 100% load point really was 100%.
The 100% load point was determined by starting an execution of the client and adding threads until the CPU utilization was between 95% and 100%. The other load points were determined by altering the number of client threads, thus adjusting the rate of client requests per second, until the appropriate ratio of Orders/Minute for Dell and Transactions/Minute for Forums was obtained relative to the 100% load point. These thread counts were recorded and kept consistent across all platforms.
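The calibration step described above can be sketched roughly as follows. This is only an illustration: the function name `pick_thread_counts` and the calibration numbers are hypothetical, not taken from our test harness. It assumes a throughput figure has already been recorded for each candidate thread count on the reference run.

```python
def pick_thread_counts(throughput_by_threads, saturation_threads, targets):
    """For each target fraction of full load, pick the thread count whose
    throughput ratio to the 100% load point is closest to that target."""
    full = throughput_by_threads[saturation_threads]
    chosen = {}
    for target in targets:
        chosen[target] = min(
            throughput_by_threads,
            # Ties are broken in favour of the smaller thread count.
            key=lambda t: (abs(throughput_by_threads[t] / full - target), t),
        )
    return chosen


# Invented calibration data: thread count -> measured Orders/Minute.
calibration = {1: 2500, 2: 5000, 4: 9800, 6: 14500, 8: 19000, 10: 24000, 14: 24500}

# 10 threads is assumed to be the point where CPU utilization reached 95-100%.
load_points = pick_thread_counts(
    calibration, saturation_threads=10, targets=[0.2, 0.4, 0.6, 0.8, 1.0]
)
```

In this made-up data set, going from 10 to 14 threads barely raises throughput, which is the kind of behavior the 120+% load point is meant to confirm.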
For any given load point, there is a defined number of threads. Each test is 20 minutes in duration, which includes an 8 minute warm-up period followed by a 12 minute measured period. For a given load point, the client submits requests to the DB server as fast as the DB server will respond. The rate at which the client is able to submit requests is measured during the final 12 minutes of the test and averaged to determine the Orders/Minute for Dell and the Transactions/Minute for Forums.
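As a minimal sketch of the measurement window, assuming the client logs a completion time (in seconds since the start of the run) for every finished request, the per-minute rate could be computed like this. The function name and the synthetic timestamps are illustrative assumptions; only the 8 and 12 minute periods come from the methodology above.

```python
WARMUP_SECONDS = 8 * 60      # 8 minute warm-up period, excluded from the result
MEASURED_SECONDS = 12 * 60   # 12 minute measured period


def per_minute_rate(completion_times):
    """Average completions per minute over the measured window only."""
    end = WARMUP_SECONDS + MEASURED_SECONDS
    measured = [t for t in completion_times if WARMUP_SECONDS <= t < end]
    return len(measured) / (MEASURED_SECONDS / 60)


# Synthetic run: one completed order every 0.5 s for the full 20 minutes.
timestamps = [i * 0.5 for i in range(2400)]
rate = per_minute_rate(timestamps)
```

Requests that complete during the warm-up are simply dropped, so caches and buffer pools filling up early in the run do not skew the reported figure.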
After much blood, sweat, and almost tears we were able to produce repeatable loads with an average deviation of 1.6%.
For each platform we ran the test 5 times at each load point and then averaged the 5 results. This was repeated for all loads, all tests, and all platforms: that is 300 test executions!
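The averaging and repeatability check can be illustrated with a short sketch. It assumes "average deviation" means the mean absolute deviation from the mean, expressed as a percentage of the mean; that definition and the five sample results are assumptions for illustration, not our recorded data.

```python
from statistics import mean


def average_deviation_pct(results):
    """Mean absolute deviation from the mean, as a percentage of the mean."""
    centre = mean(results)
    return 100 * mean(abs(r - centre) for r in results) / centre


# Invented Orders/Minute results for five runs at one load point.
runs = [24000, 24400, 23600, 24200, 23800]

score = mean(runs)                    # the figure reported for this load point
spread = average_deviation_pct(runs)  # repeatability of the five runs

# 5 runs x 6 load points x 2 tests (Dell and Forums) per platform.
executions_per_platform = 5 * 6 * 2
```

With 60 executions per platform, a total of 300 would correspond to five platforms, if the total counts exactly these combinations.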
Dell & Forum SQL Trace Analysis
The Dell and Forum benchmarks are quite different workloads, which you will see in the benchmark results. Dell executes approximately 10 times more queries during the test, and its query durations are approximately one quarter of those in the Forum benchmark. To summarize, Dell is a workload with a high transaction volume, and each query executes in a very short amount of time. The Forum workload has a medium transaction volume, and its queries execute in a reasonable amount of time but are much more read intensive (larger datasets are returned).
Dell DVD Store Information
38 Comments
proteus7 - Monday, December 18, 2006 - link
Not sure how the conclusion that Socket-F wins is reached.
True performance benchmarking can only occur with CPU as close as possible to 100%, and NO other bottlenecks in the system. For a TPC-C style OLTP database workload, for example, usually about 400+ HDDs would be required on a 4-core system to ensure this, and a lot more memory (16GB minimum would be realistic for 4 cores).
In every benchmark posted, at full load, Woodcrest wins. If you try to spin "load points", "perf per watt", etc, it then muddies the waters.
Finally, you should put a disclaimer that this test is for a very specific workload. The good news is that there was no deliberate attempt to skew the results towards AMD. If so, you would have picked a large OLAP, DSS, or Data Warehouse type workload, which takes better advantage of Socket-F's superior memory latency on out-of-cache workloads.
JarredWalton - Monday, December 18, 2006 - link
Actually, Woodcrest isn't the clear winner. It is faster, at a higher power draw, and there are reportedly situations where Opteron will still come out with a (sometimes significant) lead. We didn't test such situations here, but a "clear win" would be what we see on the desktop where Core 2 Duo is typically faster than any X2 processor while using less power - although the power situation is still somewhat up for debate.
I know I worked in a data center for several years where we had at least 12 servers. I don't think any of those servers was running at more than about 25% capacity, so there are definitely companies that aren't going to care too much about performance at maximum load. Of course, it's kind of funny that the data center I worked at was a 100% Dell shop (at least for desktops, laptops, and non-UNIX servers), so all of the servers were running the Xeon DP/MP processors during a time when Opteron was clearly providing better performance and lower power requirements.
mlittl3 - Monday, December 18, 2006 - link
Jarred,
I think the problem with many of these comments questioning the conclusions of the article is that not many people understand how enterprise level workstations and servers are bought and used. It would be nice if Anandtech did an article about some company that uses a lot of workstations and servers and went through their thought process when it comes to what hardware to buy. Then the article could go into how the servers manage workloads and what factors are important (performance, power, stability, etc.).
Too many readers here think that a server cluster is bought as soon as new hardware is released and that these enterprise level IT professionals go to hardware review sites and see which hardware has a better 3dmark score. This is of course not the case.
Strunf - Tuesday, December 19, 2006 - link
I think it’s pretty common knowledge how companies get their systems, I mean after Intel owning so much with the crap they had, it’s pretty obvious that performance and power consumption are secondary... but articles have to stay objective because no one knows what the deals between companies and the OEMs really “hide”...
“Too many readers here think that a server cluster is bought as soon as new hardware is released...”
Actually I would say many server clusters are bought at the same time new hardware is released...
mino - Wednesday, December 20, 2006 - link
"Actually I would say many server clusters are bought at the same time new hardware is released..."
Yeah, they are. The purchase is mostly waiting for a certain speed-bump in architecture/design to appear...
chucky2 - Thursday, December 21, 2006 - link
I work in IT as a PM at what is probably now the largest telecom in the US - not on a standards committee, nor as a purchaser or an operations person who actually sets the hardware up - and at least for the boxes that host the services my org cares for, I know the hardware folks don't like to see any one of them go over 50% CPU or RAM utilization, and that's simply because of failover.
If a machine in a cluster goes down, the other machine(s) are expected to pick up that load and not incur downtime...downtime is bad. You could have .003% downtime just for one small but main part of IT (like mine) for a quarter and that might equate to a million dollars lost.
Hardware is not necessarily ordered when new hardware is released...in fact, that's more than likely not the case. New hardware is not necessarily tested and proven hardware. Just because the parts folks (Intel, AMD, whoever) and the vendors (IBM, Sun, HP, whoever) are selling it doesn't mean it's stable, or at least proven to work for what a company is going to use it for. When the expectation is 100% uptime except for maintenance periods, you want your standards folks to have tested the stuff everyone in the company will be allowed to order...or at least have them look over the changes to whatever the company standard is and say, OK, that's an acceptable upgrade, we're comfortable with that change.
So, no, ordering the latest and greatest hardware when it comes out is not really the smart way to do things when you're talking about reliability...and the same goes for software...putting on the latest AIX or Oracle or JRE patch/version almost never happens. It's the same thing there: unless there's an absolute need for that specific hardware/software, you go with tried and true, because that's what's delivering for sure.
The above is most likely why AMD had such a hard time breaking into the Enterprise sector...they had to prove that their hardware could get the job done as reliably as Intel, Sun, and IBM. Now that they have, hopefully the major Enterprise folks will give them more consideration...with as good as Opteron has been, they deserve it.
Chuck
LoneWolf15 - Monday, December 18, 2006 - link
Are pages three and four supposed to have tables/graphs? I'm getting two paragraphs of text on each page using Firefox 2.0, and that's it. Seems like there'd be more under your testing methodology.
Jason Clark - Monday, December 18, 2006 - link
Yep, I was fixing something in the article and juggling pages around. Should be ok now.
LoneWolf15 - Monday, December 18, 2006 - link
Still doesn't look right. Paragraph formatting is really off, and there's a couple of HTML tags showing. Makes it kind of hard to read.
LoneWolf15 - Monday, December 18, 2006 - link
That's on Page 3 btw. Page 4 now looks fine.