The Intel Broadwell Review Part 2: Overclocking, IPC and Generational Analysis
by Ian Cutress on August 3, 2015 8:00 AM EST

Overclocking Broadwell
For any user that has overclocked an Intel processor since Sandy Bridge, there is not much new to see here. Overclocking, for those unfamiliar with the term, means adjusting the settings of the system to make a component run faster, typically outside its specifications and at the expense of power but with the benefit of a faster system.
There is a large community around overclocking, with motherboard manufacturers offering special options to make overclocking easier, as well as bigger and better CPU coolers to move the extra heat away from the processor faster and keep it cool. Some users employ liquid cooling, either prebuilt arrangements or custom designs, on the processor, the graphics card, or both. One of the original attractions of overclocking was to buy a cheap component and end up with performance similar to an expensive one. Since 2011, however, Intel has restricted overclocking to a few high end models, meaning that the goal is now to make the fastest even faster.
Asking a processor to run faster than its specifications requires more power, usually provided in the form of extra voltage. This raises power consumption and increases the energy lost as heat, which has to be removed (and efficiency usually goes down as well). Financial services and high frequency trading are examples of industries that rely on the fastest possible response times regardless of efficiency, so overclocking is par for the course to complete a trade faster than the next guy. Beyond that, we are typically left with individuals who need to process work quicker, or gamers looking for a better frame rate or the ability to increase settings without losing immersion. There is a separate group of individuals, called extreme overclockers, who are not concerned with everyday performance and insist on pushing the hardware to the limit by using coolants such as liquid nitrogen to remove the extra heat (350W+). These individuals live on the precipice of stability, only needing to be stable enough to run a benchmark and compare scores around the world. The best extreme overclockers are picked up by PC component manufacturers to help build future products (e.g. HiCookie and Sofos at GIGABYTE, NickShih at ASRock, Coolice and Shamino at ASUS) or by retailers to build a brand (8Pack at OverclockersUK).
Extreme overclocking at MSI’s HQ
Here at AnandTech, we mainly focus on 24/7 stability (although I have roots in the extreme overclocking community), as our diverse readership ranges from non-overclockers to enthusiasts. This means a good high end air cooler or a liquid cooler, in this case either the Cooler Master Nepton 140XL liquid cooler in a push/pull configuration with the supplied fans or a 2kg TRUE Copper air cooler with a 150CFM Delta fan. Both of these are more than sufficient to push the hardware for general overclocking and 24/7 use (though I hesitate to recommend the TRUE Copper for a regular system due to its mass, unless the case sits upright).
The Cooler Master Nepton 140XL
In our testing, we keep it relatively simple. The frequency of a modern Intel processor is determined by the base frequency (~100 MHz) and the multiplier (20-45+). These two numbers are multiplied together to give the final frequency, and our overclocking is performed by raising the multiplier.
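As a quick back-of-the-envelope illustration (a Python sketch of our own, not part of any test software), the arithmetic is simply the base clock multiplied by the ratio:

```python
# Minimal sketch of the frequency arithmetic described above.
# The example multipliers are illustrative values for this chip.

BASE_CLOCK_MHZ = 100.0  # modern Intel platforms run the base clock (BCLK) at ~100 MHz

def core_frequency_mhz(multiplier: int, bclk_mhz: float = BASE_CLOCK_MHZ) -> float:
    """Final core frequency = base clock x multiplier."""
    return bclk_mhz * multiplier

if __name__ == "__main__":
    # 33x is the i7-5775C's stock ratio; 42x is the 4.2 GHz overclock reached below.
    for mult in (33, 42):
        print(f"{mult}x -> {core_frequency_mhz(mult):.0f} MHz")
```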
The other variable in overclocking is the voltage. All processors have an operating voltage out of the box, known as the VID or stock voltage. In general, a processor architecture will have a stock voltage within a certain range, and individual processors of that architecture will fall somewhere on that spectrum. As time goes on, the average VID of new processors within the same architecture may fall due to improvements in the manufacturing process, but ultimately it is the luck of the draw. Requesting a faster frequency draws more power, and in order to remain stable the voltage has to be increased. Most motherboards have an auto calibration tool that picks a voltage based on the set frequency, though these tend to be very conservative values chosen so that all processors are capable of them. Users can adjust the voltage with an offset (e.g. +0.100 volts) or, in most cases, set the absolute voltage (e.g. 1.200 volts). For a given frequency there will be a minimum voltage at which the processor is stable, and finding it is by and large a case of trial and error. When the system works, the frequency/voltage combination is typically verified with stress tests to ensure proper operation, while probing temperatures to avoid overheating, which causes the processor to override the settings and drop into a low voltage/frequency mode to cool down.
There is a tertiary concern: when a processor is performing work, the voltage across the processor will drop. This can result in instability, and there are two ways to approach it - a higher initial voltage, or adjusting what is called the load line calibration, which reacts to this drop. Both methods have their downsides, such as higher power consumption or temperatures, but where possible most users should adjust the load line calibration, as this keeps the voltage constant no matter the processor workload.
At AnandTech, our overclocking regime is thus: we test the system at default settings and record the stock voltage at the stock frequency. Then we set the processor multiplier one higher than normal and round the voltage down to the nearest 0.1 volts (e.g. a 1.227 VID becomes 1.200). The system is then tested for stability, which in our case is a simple regimen consisting of the POV-Ray benchmark, five minutes of the OCCT stress test and a run of 3DMark Firestrike. If this test regime is successful and the CPU remains below 95C throughout, we mark it as a success and raise the multiplier by one. If any test fails (the system does not boot, gets stuck, or blue screens), we raise the voltage by 0.025 volts and repeat the process at the same multiplier. All adjustments are made in the BIOS, and we end up with an overall picture of how processor performance and temperature scale with voltage.
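For clarity, the loop can be summarised in a short Python sketch. This is purely illustrative: the apply_settings, is_stable and peak_temp_c hooks are hypothetical placeholders, since every change is really made by hand in the BIOS and stability is judged with the benchmarks listed above.

```python
import math
from typing import Callable, List, Tuple

# A simplified sketch of the trial-and-error loop described above. Nothing here
# touches real hardware: the caller supplies hooks for applying settings and for
# judging stability and temperature, since in practice every change is made by
# hand in the BIOS and stability is judged with POV-Ray, OCCT and Firestrike.

VOLTAGE_STEP = 0.025      # volts added after a failed run
TEMP_LIMIT_C = 95         # thermal cut-off used in our testing
MAX_EXTRA_VOLTS = 0.300   # illustrative ceiling before giving up on a multiplier
MAX_MULTIPLIER = 50       # sanity cap so the sketch always terminates

def round_down_to_tenth(volts: float) -> float:
    """A 1.227 V stock VID becomes a 1.200 V starting point."""
    return math.floor(volts * 10) / 10

def overclock(stock_multiplier: int,
              stock_vid: float,
              apply_settings: Callable[[int, float], None],
              is_stable: Callable[[], bool],
              peak_temp_c: Callable[[], float]) -> List[Tuple[int, float]]:
    """Raise the multiplier one bin at a time, adding 25 mV after each failure."""
    start_voltage = round_down_to_tenth(stock_vid)
    multiplier, voltage = stock_multiplier + 1, start_voltage
    passes: List[Tuple[int, float]] = []
    while multiplier <= MAX_MULTIPLIER:
        apply_settings(multiplier, voltage)
        if is_stable() and peak_temp_c() < TEMP_LIMIT_C:
            passes.append((multiplier, voltage))   # success: note it, try one bin higher
            multiplier += 1
        elif voltage + VOLTAGE_STEP <= start_voltage + MAX_EXTRA_VOLTS:
            voltage += VOLTAGE_STEP                # failure: add 25 mV, retry this multiplier
        else:
            break                                  # out of voltage headroom at this multiplier
    return passes
```

The 0.300 volt ceiling is our own choice for the sketch, mirroring the point at which our 4.3 GHz attempt below refused to stabilise even with an extra 0.300 volts.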
Here are our results with the Broadwell i7-5775C in the MSI Z97A Gaming 6:
Our top result was 4.2 GHz on all cores, reaching 80C. When we selected 4.3 GHz, even with another 0.300 volts, the system would not be stable.
To a number of people, this is very disappointing. Previous Intel architectures have overclocked to between 4.4 GHz and 5.0 GHz, so any increase in base performance for Broadwell is overshadowed by the higher frequencies possible on older platforms. This has been an unfortunate recent trend in the overclocking performance of Intel’s high end processors since Sandy Bridge:
Intel 24/7 Overclocking Expected Results (MHz)
                | Stock Speed | Good OC | Great OC
Sandy Bridge i7 | 3400        | 4700    | 4900
Ivy Bridge i7   | 3500        | 4500    | 4700
Haswell i7      | 3500        | 4300    | 4500
Broadwell i7    | 3300        | 4100    | 4300
Not mentioned in the table: for Haswell, a Devil's Canyon based processor (such as the i7-4790K) could yield an extra 100-200 MHz in temperature limited situations, as we found during our testing.
It is worth noting at least two points here. When Intel reduces the process node size, the elements of the processor are smaller and removing the heat generated becomes more problematic. Some of this can be mitigated through the fundamental design of the processor, such as not placing blocks of heat generating logic next to each other when a program will use them in quick succession, which would create a hotspot. However, if a processor is fundamentally designed as a mobile-first platform, overclocking may not even be a consideration at the design phase, and is merely tacked on as a ‘feature’ for certain models at the end.
Other methods have been used in the past to increase overclockability, such as changing the thermal interface material between the processor die and the heatspreader. Intel did this on its Devil’s Canyon line of processors as a ‘Haswell upgrade’, and most results showed that it afforded another 10ºC of headroom. To that end, many users interested in getting the most out of their Haswell processors worked out the best ways to remove the heatspreader entirely, voiding the warranty but getting better overclocking performance.
With all that said, it is important to consider what we are dealing with here in Broadwell. This is a Crystal Well design, which looks like this:
This is an image taken for our review of the i7-4950HQ, the first Crystal Well based processor, aimed specifically at high powered laptops and all-in-one devices. On the left is the processor die, and on the right is the eDRAM die, both on the same package. The thing to note here is that when the heatspreader is applied, different parts of the package will generate different amounts of heat. As a result, cooling needs to be planned with the package layout in mind.
What I’m specifically getting at here is thermal paste application. Many users will have different opinions about the best way to apply thermal paste, and those following the industry will remember how the suggested methods have changed over time based on the silicon in the package. For the most part, the suggested methods revolve around a pea-sized blob in the center of the processor and a heatsink mounted with sufficient force to spread the paste. This minimizes air bubbles, which can hurt thermal performance.
As a personal side note, I heavily discourage the credit card/spreading method because of the air bubbles it can trap. The only arrangement where the spreading method should be used is sub-zero overclocking.
With Broadwell, I took the pea-sized blob approach, strapped on a big cooler, and went to work. Almost immediately the processor temperature under load rose to 90ºC, which seemed extremely high. I turned the system off, removed the cooler, and placed it back on without doing anything else, and the temperature under load dropped a few degrees. After some trial and error, the best arrangement I found (anecdotally) was a line of thermal paste running from the top to the bottom of the CPU (with the arrow on the corner of the CPU in the bottom left).
Put bluntly, the reason this method works better than the pea-sized blob comes down to where the heat generating spots on the CPU are. With a blob in the middle and a slightly off-center mounting, the paste can spread towards the eDRAM rather than over the processor die. A line ensures that both are covered, transferring heat from each to the cooler.
Now I should note that this method matters most when you are in a temperature limited overclock situation. It would seem that our CPU simply would not go above 4.2 GHz regardless of the voltage applied, but in terms of thermal management, thermal paste application became important again.
121 Comments
name99 - Monday, August 3, 2015 - link
Well think about WHY these results are as they are:
- There is one set of benchmarks (most of the raytracing and sci stuff) that can make use of AVX. They see a nice boost from initial AVX (implemented by routing each instruction through the FPU twice) to AVX on a wider execution unit to the introduction of AVX2.
- There is a second set of benchmarks (primarily WinRAR) that manipulate data which fits in the Crystal Well cache but not in the 8MB L3. Again a nice win there; but that's a specialized situation. In data streaming examples (which better describe most video encode/decode/filtering) that large L4 doesn't really buy you anything.
- There WOULD be a third set of benchmarks (if AnandTech tested for this) that showed a substantial improvement in indirect branch performance going from IB to Haswell. This is most obvious on interpreters and similar such code, though it also helps virtual functions in C++/Swift style code and Objective C method calls. My recollection is that you can see this jump in the GeekBench Lua benchmark. (Interestingly enough, Apple's A8 seems to use this same advanced TAGE-like indirect predictor because it gets Lua IPC scores as good as Intel).
OK, now we get to Skylake. Which of these apply?
- No AVX bump except for Xeons.
- Usually no CrystalWell
So the betting would be that the BIG jumps we saw won't be there. Unless they've added something new that they haven't mentioned yet (eg a substantially more sophisticated prefetcher, or value prediction), we won't even get the small targeted boost that we saw when Haswell's indirect predictor was added. So all we'll get is the usual 1 or 2% improvement from adding 4 or 6 more physical registers and ROB slots, maybe two more issue slots, a few more branch predictor slots, the usual sort of thing.
There ARE ideas still remaining in the academic world for big (30% or so) improvements in single-threaded IPC, but it's difficult for Intel to exploit these given how complex their CPUs are, and how long the pipeline is from starting a chip till when it ships. In the absence of competition, my guess is they continue to play it safe. Apple, I think, is more likely to experiment with these ideas because their base CPU is a whole lot easier to understand and modify, and they have more competition.
(Though I don't expect these changes in the A9. The A7 was adequate to fight off the expected A57; the A8 is adequate to fight off the expected A72; and all the A9 needs to do to maintain a one year plus lead is add the ARMv8.1a ISA and the same sort of small tweaks and a two hundred or so MHz boost that we saw applied to the A8. I don't expect the big microarchitectural changes at Apple until
- they've shipped the ARMv8.1a ISA
- they've shipped their GPU (tightly integrated HSA style with not just VM and shared L3, but with tighter faster coupling between CPU and GPU for fast data movement, and with the OS able to interrupt and to some extent virtualize the GPU)
- they're confident enough in how widespread 64-bit apps are that they don't care about stripping out the 32-bit/Thumb ISA support in the CPU [with what that implies for the pipeline, in particular predication and the barrel shifter] and can create a microarchitecture that is purely optimized for the 64-bit ISA.
Maybe this will be the A10, IF the A9 has ARMv8.1a and an Apple GPU.)
Speedfriend - Tuesday, August 4, 2015 - link
"The A7 was adequate to fight off the expected A57;"In hindsight the A7 was not very good at all, it was the reason that Apple was unable to launch a large screen phone with decent battery life. Look at he improvements made to A8, around 10% better performance, but 50% more battery life.
Speedfriend - Tuesday, August 4, 2015 - link
"they've shipped their GPU" by the way, why do you expect them to ship their own GPU and not use IMG's. The IMG GPU have consistently been the best in the market.nunya112 - Monday, August 3, 2015 - link
by the looks of it, the 4790K seems to be the best CPU until Skylake, that is. But even then I doubt there will be much improvement.
nunya112 - Monday, August 3, 2015 - link
unless u have the older ivy's then yeah maybe worth it?
TheinsanegamerN - Monday, August 3, 2015 - link
Nah, the older ivys can be overclocked to easily meet these chips. The IPC of Broadwell is overshadowed by a 400MHz lower clock rate on a typical OC. The only reason to upgrade is if you NEED something on the new chipset or are running some Nehalem-era chip.
Teknobug - Monday, August 3, 2015 - link
Ivy's are the best overclockers.
TheinsanegamerN - Monday, August 3, 2015 - link
Sandy overclocked better than ivy.
Hulk - Monday, August 3, 2015 - link
Ian - Very nice job on this one! Thanks.
Meaker10 - Monday, August 3, 2015 - link
A slight correction: on the image of Crystal Well, it is the die on the left (the much larger one) which is the cache, and the small one on the right is the CPU.