Server Guide Part 1: Introduction to the Server World
by Johan De Gelas on August 17, 2006 1:45 PM EST - Posted in IT Computing
TCO
Originally described by the Gartner Group, TCO sounds like something that does not belong on a hardware enthusiast site. It has frequently been abused by managers and financial people with very little understanding of IT to delay necessary IT investments, so many view it as a pejorative term.
However, it is impossible to make a well thought-out server buying decision without understanding TCO, and many typical server hardware features are based on the idea of lowering TCO. Hardware enthusiasts mostly base their buying decisions on TCA, or Total Cost of Acquisition. The enthusiast motherboard and chipset business is a typical example of how to ignore TCO. As the products are refreshed every six months, many of the new features don't work properly, and you find yourself flashing the BIOS, installing new drivers and tweaking configurations before you hopefully get that RAID, firewall or sound chip to work properly. Luckily you don't have to pay yourself for all the hours you spend...
TCO is a financial estimate of the total cost of buying and using a server. Think of it as the cost it takes to buy, deploy, support and adapt a certain server during its lifecycle. So when evaluating servers you should look at the following costs (a rough worked example follows the list):
- The total cost of buying the server
- The time you will spend installing it in your network
- The time you will spend on configuring the software and remote management
- Facility management: the space it takes in your datacenter and the electricity it consumes
- The hours you spend on troubleshooting, reconfiguring, securing and repairing the server
- The costs associated with users waiting for the system to respond
- The costs associated with outages and failures, with users not being able to reach your server
- The upgrade costs and the time you spend on upgrading your server to meet new demands
- Cost of security breaches, etc.
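To make these line items concrete, here is a minimal back-of-the-envelope sketch of a TCO estimate over a four-year lifecycle. Every figure in it is a hypothetical placeholder chosen purely for illustration; none of these numbers come from this article or from the study cited below.

```python
# Hypothetical back-of-the-envelope TCO estimate for a single server.
# All numbers are placeholders; substitute your own figures.

lifecycle_years = 4
hourly_rate = 75          # assumed cost of one admin hour, in $
power_watts = 400         # assumed average power draw of the server
kwh_price = 0.10          # assumed electricity price, $/kWh

costs = {
    "acquisition":    10_000,                                            # purchase price
    "deployment":     12 * hourly_rate,                                  # install + initial configuration
    "facility":       power_watts * 24 * 365 * lifecycle_years / 1000 * kwh_price,
    "administration": 20 * hourly_rate * lifecycle_years,                # troubleshooting, patching, upgrades
    "downtime":       2 * 500 * lifecycle_years,                         # outage hours x cost per outage hour
}

tco = sum(costs.values())
for item, cost in costs.items():
    print(f"{item:15s} ${cost:9,.0f}  ({cost / tco:5.1%} of TCO)")
print(f"{'total':15s} ${tco:9,.0f}")
```

With these placeholder numbers the purchase price ends up around 45% of the four-year TCO, which happens to land in the same ballpark as the study cited below; the point is simply that acquisition is a large slice of the pie, but far from the whole pie.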
There are two big problems with the "hardware choice does not matter much" kind of reasoning. The first is that the TCA is still a big part of the total TCO. For example, this study[1] estimates that the price of buying the server is still about 40-50% of the TCO, while maintenance comprises a bit more than 10% and operating costs take about 40% of the TCO pie. Thus we can't help but be wary when a vendor claims that a high price is okay because the maintenance costs of its product are so much lower than the competition's.
Secondly, certain hardware choices have an enormous impact on the rest of the TCO picture. One example is hot-spare and hot-swappable RAID arrays, which on average significantly reduce the time that a server is unreachable. This will become clearer as we dig deeper into the different hardware features of modern servers and the choices you will have to make.
RAS features
Studies done by IBM indicate that about 50% of hardware failures are related to hard disk problems and 25% are due to power supply failure. Fans, at 8%, are a distant third, so it is clear you need power supplies and hard disks of high reliability, the R of RAS. You also want to increase availability, the A of RAS, by adding redundancy for the most vulnerable parts of your server: RAID, redundant power supplies and fans are a must for a critical server. The S in RAS stands for Serviceability, which relates to hot-swappable/pluggable drives and other areas: do you need to shut down the server to perform maintenance, and which items can be replaced or repaired while keeping the system running? All three items are intertwined, and higher-end (and more expensive) servers will have features designed to improve all three areas.
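To see why redundancy for the most failure-prone parts pays off, here is a small sketch of how availability is commonly estimated from MTBF and MTTR. The failure and repair figures below are invented placeholders, not IBM's numbers, and the redundancy formula assumes independent failures.

```python
# Minimal availability sketch; the MTBF/MTTR values are invented placeholders.

def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of a single component."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def redundant(a: float, n: int = 2) -> float:
    """Availability of n identical redundant parts, where any single one
    keeps the server up (assumes independent failures)."""
    return 1.0 - (1.0 - a) ** n

psu = availability(mtbf_hours=100_000, mttr_hours=24)  # one power supply
print(f"single PSU:    {psu:.5f}")                      # ~0.99976
print(f"redundant PSU: {redundant(psu):.8f}")           # ~0.99999994
```

The same reasoning applies to hot-spare RAID arrays and redundant fans: each individual part is only so reliable, but removing single points of failure drives the expected downtime toward zero, which is exactly what the higher-end servers charge for.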
32 Comments
AtaStrumf - Sunday, October 22, 2006 - link
Interesting stuff! Keep up the good work!
LoneWolf15 - Thursday, October 19, 2006 - link
I'm guessing this is possible, but I've never tried it... Wouldn't it be possible to use a blade server and just have the OS on each blade, but have a large, high-bandwidth (read: gigabit Ethernet) NAS box? That way, each blade would have, say (for example), two small hard disks in RAID-1 with the boot OS for ensuring uptime, but any file storage would be redirected to RAID-5 volumes created on the NAS box(es). Sounds like the best of both worlds to me.
dropadrop - Friday, December 22, 2006 - link
This is what we've had in all of the places I've been working at during the last 5-6 years. The term used is SAN, not NAS, and servers have traditionally been connected to it via fiber optics. It's not exactly cheap storage; actually it's really damn expensive. To give you a picture, we just got a 22TB SAN at my new employer, and it cost way over $100,000. If you start counting the price per gigabyte, it's not cheap at all. Of course this does not take into consideration the price of fiber connections (cards on the server, fiber switches, cables, etc). Now a growing trend is to use iSCSI instead of fiber. iSCSI is SCSI over Ethernet and ends up being a lot cheaper (though not quite as fast).
Apart from having central storage with higher redundancy, one advantage is performance. A SAN can stripe the data over all the disks in it; for example, we have a RAID stripe consisting of over 70 disks...
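As a quick sanity check of the "not cheap per gigabyte" remark, the raw figures quoted in this comment work out roughly as follows (only the 22TB and $100,000 numbers come from the comment; the rest is plain arithmetic):

```python
# Rough price-per-gigabyte check using the figures quoted above.
capacity_tb = 22
price_usd = 100_000            # "way over $100,000", so treat this as a lower bound

price_per_gb = price_usd / (capacity_tb * 1024)
print(f"at least ~${price_per_gb:.2f} per GB")  # roughly $4.4/GB, before fiber HBAs,
                                                # switches and cabling are counted
```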
LoneWolf15 - Thursday, October 19, 2006 - link
(Since I can't edit) I forgot to add that it even looks like Dell has some boxes like these that can be attached directly to their servers with cables (I don't remember, but it might be a SAS setup). Support for a large number of drives, and multiple RAID volumes if necessary.
Pandamonium - Thursday, October 19, 2006 - link
I decided to give myself the project of creating a server for use in my apartment, and this article (along with its subsequent editions) should help me greatly in this endeavor. Thanks AT!
Chaotic42 - Sunday, August 20, 2006 - link
This is a really interesting article. I just started working in a fairly large data center a couple of months ago, and this stuff really interests me. Power is indeed expensive for these places, but given the cost of the equipment and maintenance, it's not too bad. Cooling is a big issue though, as we have pockets of hot and cold air throughout the DC. I still can't get over just how expensive 9GB WORM media is and how insanely expensive good tape drives are. It's a whole different world of computing, and even our 8-CPU Sun system is too damned slow. ;)
at80eighty - Sunday, August 20, 2006 - link
Target Reader here - SMB owner contemplating my options in the server route
again - thank you
you guys fucking \m/
peternelson - Friday, August 18, 2006 - link
Blades are expensive but not so bad on eBay (regular server gear is affordable second-hand too).
Blades can mix architectures, e.g. IBM Cell processor blades could be mixed with Pentium or maybe Opteron blades.
How important U size is depends on whether it's YOUR rack or a datacentre rack. Cost per square foot is higher in a datacentre.
Power is not just the cents per kWh paid to the utility supplier.
It is also the cost of cabling and PDUs,
the cost (and efficiency overhead) of a UPS,
the cost of remote boot (APC MasterSwitch),
the cost of a transfer switch to let you swap out UPS batteries,
and the cost of having generator power waiting just in case.
Some of these scale with capacity, so they cost more if you use more.
Yes, virtualisation is important.
IBM have been advertising server consolidation (i.e. no more invasion of beige boxes).
But also look at STORAGE consolidation, e.g. an EMC array on a SAN. You have virtual storage across all platforms, adding disks as needed or moving the free space virtually onto a different volume as needed. Unused data can migrate to slower drives or tape.
Tujan - Friday, August 18, 2006 - link
"[(o)]/..\[(o)]"Zaitsev - Thursday, August 17, 2006 - link
Fourth paragraph of intro.
Haven't finished the article yet, but I'm looking forward to the series.