Intel at Hot Chips 2018: Showing the Ankle of Cascade Lakeby Ian Cutress on August 19, 2018 7:30 PM EST
Purley Mark Two: Cascade Lake-SP
On the processor front, the on-paper hardware specifications of Cascade Lake Xeons offer no surprises, mainly because the stock design is identical to Skylake Xeons. Users will be offered up to 28 cores with hyperthreading, the same levels of cache, the same UPI connectivity, the same number of PCIe lanes, the same number of channels of memory, and the same maximum supported frequency of memory.
Questions still to be answered will be if the XCC/HCC/LCC silicon dies, from which the processor stack will come, will be the same. There is also no information about memory capacity limitations.
What Intel is saying on this slide however is in the second bullet point:
- Process tuning, frequency push, targeted performance improvements
We believe this is a tie-in to Intel improving its 14nm process further, tuning it for voltage and frequency, or a better binning. At this point Intel has not stated if Cascade Lake is using the ‘14++’ process node, to use Intel’s own naming scheme, although we expect it to be the case. We suspect that Intel might drop the +++ naming scheme altogether, if this isn’t disclosed closer to the time. However a drive to 10% better frequency at the same voltage would be warmly welcomed.
Where some of the performance will come from is in the new deep learning instructions, as well as the support for Optane DIMMs.
AVX-512 VNNI Instructions for Deep Learning
The world of AVX-512 instruction support is completely confusing. Different processors and different families support various sets of instructions, and it is hard to keep track of them all, let alone code for them. Luckily for Intel (and others), companies that invest into deep learning tend to focus on one particular set of microarchitectures for their work. As a result, Intel has been working with software developers to optimize code paths for Xeon Scalable systems. In fact, Intel is claiming to have already secured a 5.4x increase in inference throughput on Caffe / ResNet50 since the launch of Skylake – partially though code and parallelism optimizations, but partially though reduced precision and multiple concurrent instances also.
With VNNI, or Vector Neural Network Instructions, Intel expects to double its neural network performance with Cascade Lake. Behind VNNI are two key instructions that can optimized and decoded, reducing work:
Both instructions aim to reduce the number of required manipulations within inner convulsion loops for neural networks.
VPDPWSSD, the INT16 version of the two instructions, fuses two INT16 instructions and uses a third INT32 constant to replace PMADDWD and VPADD math that current AVX-512 would use:
VPDPBUSD does a similar thing, but takes it one stage back, using INT8 inputs to reduce a three-instruction path into a one-instruction implementation:
The key part from Intel here is that with the right data-set, these two instructions will improve the number of elements processed per cycle by 2x and 3x respectively.
Framework and Library for these new instructions will be part of Caffe, mxnet, TensorFlow, and Intel’s MKL-DNN.