Hot Chips 2021 Live Blog: CPUs (Alder Lake, Zen3, IBM Z, Sapphire Rapids)

Author: Dr. Ian Cutress

by Dr. Ian Cutress on August 23, 2021 10:30 AM EST


11:40AM EDT - Welcome to Hot Chips! This is the annual conference all about the latest, greatest, and upcoming big silicon that gets us all excited. Stay tuned during Monday and Tuesday for our regular AnandTech Live Blogs. Today we start at 8:45am PT, so set your watches and notifications to return here! The first set of talks is all about CPUs: Intel Alder Lake, AMD Zen 3, IBM Z, and Intel Sapphire Rapids.

11:45AM EDT - The stream should be starting momentarily

11:45AM EDT - It usually starts with 15 minutes of pre-show info to begin

11:47AM EDT - Here we go

11:48AM EDT - Apparently some attendees are having issues with too many from the same company on the same VPN

11:49AM EDT - There's a slack channel for all attendees

11:49AM EDT - Behind the scenes

11:51AM EDT - Lots of members on the committees

11:51AM EDT - Selecting the best talks

11:51AM EDT - These people identify keynote speakers, solicit papers for talks

11:52AM EDT - Tutorials were yesterday

11:54AM EDT - Three keynotes

11:55AM EDT - Synopsys is on AI in EDA

11:55AM EDT - Skydio on autonomous flight

11:55AM EDT - DoE on AI Chips and challenges

11:56AM EDT - 'Chips enabling Chips'

11:57AM EDT - For those attending

11:57AM EDT - Posters as part of the conference as well

11:58AM EDT - First session is CPUs, about to start

11:59AM EDT - 'State of the art CPUs'

11:59AM EDT - Efi Rotem for Intel on Alder Lake

12:00PM EDT - The why and how of Alder Lake

12:00PM EDT - Most apps are Single or lightly MT

12:01PM EDT - Increase in support of ML

12:01PM EDT - Working on smarter structures and new instructions for ML

12:01PM EDT - Duplicating multicore

12:02PM EDT - Moore's Law and Dennard Scaling

12:02PM EDT - Same arch, different uArch, different optimization point

12:03PM EDT - This is what we saw in the Alder Lake part of the Architecture Day

12:03PM EDT - P-Core and E-Core

12:04PM EDT - E-core has shared L2

12:04PM EDT - P-core is +50% ST performance over the E-core

12:05PM EDT - Scalable SoC architecture

12:05PM EDT - UP3/UP4 for mobile, Desktop

12:05PM EDT - 2+8, 6+8 and 8+8 for P-core + E-core

12:06PM EDT - modular design

12:06PM EDT - mix and match for future products

12:06PM EDT - 96 EUs on mobile, 32 EUs on desktop

12:07PM EDT - Only mobile will get native Thunderbolt

12:08PM EDT - Smartness is built into the hardware

12:08PM EDT - Thread Director is mostly for Windows 11

12:09PM EDT - Onboard microcontroller

12:10PM EDT - Thread Director will predict the class of workload and bucket it into classes for the OS scheduler on the order of 30 microseconds

12:10PM EDT - Core-to-Core IPC is the main metric

12:11PM EDT - Intel EHFI

12:12PM EDT - This is more detail about Thread Director

12:12PM EDT - So every processor gets a section in the table, and it has a value for Perf and Efficiency, and workload is compared

12:14PM EDT - Sometimes it makes sense to coalesce a software thread to fewer cores, or one type of core

12:14PM EDT - Thread Director Table updated less often than thread classification

12:14PM EDT - OS has idea of priority of thread

12:15PM EDT - OS scheduler is final arbiter

12:15PM EDT - Table is topology agnostic
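As a rough illustration of the table described above, here is a minimal Python sketch. It is not Intel's implementation or the real table layout: the `CoreEntry` and `pick_core` names, the two workload classes, and every score are made up for the example. It only shows the idea that each core has a per-class Perf and Efficiency value, and that the OS scheduler, as final arbiter, compares them when placing a thread.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CoreEntry:
    """One row of a hypothetical per-core hint table (illustrative only)."""
    core_id: int
    perf: Dict[int, int]  # workload class -> relative performance score
    eff: Dict[int, int]   # workload class -> relative efficiency score


def pick_core(table, workload_class, prefer_efficiency=False, busy=()):
    """Return the id of the best free core for a workload class, or None.

    The OS stays in charge: it decides whether to rank by performance or
    by efficiency, and which cores it considers unavailable.
    """
    busy_ids = set(busy)
    candidates = [entry for entry in table if entry.core_id not in busy_ids]
    if not candidates:
        return None
    if prefer_efficiency:
        best = max(candidates, key=lambda entry: entry.eff[workload_class])
    else:
        best = max(candidates, key=lambda entry: entry.perf[workload_class])
    return best.core_id


# Invented scores: core 0 behaves like a P-core, core 1 like an E-core.
# Class 0 stands in for generic scalar code, class 1 for vector/AI code.
TABLE = [
    CoreEntry(0, perf={0: 100, 1: 140}, eff={0: 60, 1: 50}),
    CoreEntry(1, perf={0: 70, 1: 60}, eff={0: 100, 1: 90}),
]
```

Because the lookup only compares scores per core, nothing in it depends on how many cores of each type exist, which is one way to read "topology agnostic".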

12:17PM EDT - Here's a scheduling example

12:18PM EDT - Helps with asymmetry between the threads

12:19PM EDT - All AI workloads go to P-Cores over anything else

12:19PM EDT - AVX + VNNI / INT8 get highest priority over anything

12:21PM EDT - EPP - Energy Performance Preference also takes a role in input to the scheduler

12:21PM EDT - For power constrained systems

12:21PM EDT - higher priority gets higher voltage and frequency regardless of P-core and E-core

12:22PM EDT - optimal P/V point is a function of physical properties (thermal, binning)

12:22PM EDT - Q&A time

12:23PM EDT - Q: Security of side channel attacks with Thread Director A: No security effect, only performance

12:24PM EDT - Q: Die photo, PCIe - how many PCIe 5/4/3 lanes? A: As shown, slide 11, 16x PCIe 5, 4x lanes of PCIe 4, Desktop has PCH

12:25PM EDT - Q: TDT for Linux, when? A: First enabling was Windows 11, work with Linux for time - it is coming, which version and build will be published later

12:25PM EDT - Now AMD Zen 3 talk

12:26PM EDT - Mark Evers from AMD

12:26PM EDT - The Zen Journey from 2017

12:27PM EDT - New era in the market for AMD

12:27PM EDT - Zen3 says AMD 3D Cache support

12:27PM EDT - Exceeding Industry Trends

12:28PM EDT - Scale-out for servers and supercomputers

12:28PM EDT - Socket compatibility for past products

12:29PM EDT - 4k op cache

12:30PM EDT - +19% IPC gains, which we verified at launch

12:31PM EDT - Large chunk of performance gain from the front end fetch/decode

12:31PM EDT - reduced bubble cycle latency

12:32PM EDT - supporting wider execution

12:32PM EDT - lower latencies for some instructions

12:33PM EDT - 10 issue per cycle up from 7

12:33PM EDT - More execution bandwidth ILP extraction

12:33PM EDT - Disaggregated the ALUs rather than just add more

12:34PM EDT - Without any additional increase in register file ports

12:34PM EDT - larger 6-wide FP unit

12:34PM EDT - Faster 4-cycle FMAC

12:34PM EDT - Reduced FMA latency

12:34PM EDT - Doubled INT8 throughput

12:35PM EDT - larger load-store

12:35PM EDT - L2 DTLB has 6 page walkers

12:36PM EDT - Changes from Zen2

12:36PM EDT - Removed bubble cycle with branch prediction

12:37PM EDT - Back on track faster when mispredict

12:37PM EDT - Quicker switching with I-cache overflow

12:38PM EDT - How AMD calculated IPC uplift

12:39PM EDT - Enterprise security additions

12:39PM EDT - SEV, SEV-ES, SEV-SNP

12:39PM EDT - SNP is the new feature for Zen 3

12:40PM EDT - Eliminates page table attack vectors through VMs/hypervisors

12:40PM EDT - No application modification needed

12:40PM EDT - New instruction support

12:41PM EDT - Double L3 cache

12:42PM EDT - access from cores, better for gaming

12:42PM EDT - reduction in effective L3 memory latency

12:42PM EDT - 2x32B data channels in opposite directions

12:43PM EDT - L3 is a non-inclusive cache

12:43PM EDT - L2 tags in L3

12:43PM EDT - support 192 misses from L3 to memory

12:44PM EDT - Built in support for AMD V-Cache

12:44PM EDT - Already demoed +64 MB L3

12:44PM EDT - +15% faster on gaming

12:47PM EDT - Ryzen performance gains in the same TSMC 7nm

12:47PM EDT - All from uarch and physical design

12:47PM EDT - Gaming was a main target for Zen 3

12:48PM EDT - Performance that matters for the user

12:49PM EDT - Summing up

12:49PM EDT - Zen4 by end of 2022

12:49PM EDT - On track in TSMC N5

12:50PM EDT - Time for Q&A

12:51PM EDT - Q: Is V-Cache applicable to all segments, or just desktop/server? A: Lots of different workloads benefit from V-Cache. Haven't announced specific products with V-Cache, but there are workloads across segments that benefit

12:52PM EDT - Q: Primary motivation for tripling table walkers? A: Some workloads have a large DRAM access footprint with outstanding TLB misses. Lots of workloads won't need more than 2, but benefits a few pages, but a clever way to add more without excessive

12:54PM EDT - Q: Is the chiplet technology scalable? A: When it comes to the 3D V-Cache - latency is not large. For chiplets of having CCDs and IODs, it can give you more flexibility than monolithic. Build best products with chiplets

12:55PM EDT - Next presentation is from IBM

12:55PM EDT - IBM Telum Processor

12:55PM EDT - Optimized for AI

12:55PM EDT - Next Gen Z processor

12:55PM EDT - This would be IBM z16, but this one gets a name

12:56PM EDT - How often do you use a mainframe? Probably yes if you've used your credit card

12:57PM EDT - Chief Architect

12:57PM EDT - Starting with background on IBM z

12:57PM EDT - large part of IT infrastructure

12:58PM EDT - Even startups with NFTs

12:58PM EDT - Insights with AI models are needed

12:58PM EDT - Telum is designed for this

12:59PM EDT - Enterprise workloads sensitive to ST performance and scalability

12:59PM EDT - Lots of workloads are heterogeneous

01:00PM EDT - Need embedded accelerators

01:00PM EDT - New AI accelerator

01:00PM EDT - New cache hierarchy and fabric design

01:00PM EDT - Encrypted memory and trusted execution environment

01:00PM EDT - Enclaves

01:01PM EDT - Reliability and Availability, 7x9 on z15
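To put that availability figure in context, here is a small Python sketch. It assumes "7x9" means seven nines (99.99999% availability) and a 365-day year; the function name is just for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds in a 365-day year


def downtime_seconds_per_year(nines: int) -> float:
    """Expected downtime per year for an availability of N nines."""
    unavailability = 10 ** (-nines)  # e.g. 7 nines -> 1e-07
    return SECONDS_PER_YEAR * unavailability


seven_nines = downtime_seconds_per_year(7)  # roughly 3.15 seconds per year
five_nines = downtime_seconds_per_year(5)   # roughly 315 seconds per year
```

Under those assumptions, seven nines works out to only a few seconds of downtime per year, versus about five minutes for the more commonly quoted five nines.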

01:02PM EDT - 8 cores + 4 MB L2

01:02PM EDT - 5 GHz+ with SMT2

01:02PM EDT - New branch prediction

01:02PM EDT - 270000 branch target table entries

01:02PM EDT - Private 32 MB of L2 cache

01:03PM EDT - 19-cycle load-use latency

01:03PM EDT - Moved away from shared L3 and off-chip L4

01:04PM EDT - 4 pipelines on L2 allowing overlapped traffic

01:04PM EDT - L3 and L4 are now virtual

01:05PM EDT - 320 GB/s ring bandwidth

01:06PM EDT - 8 GB L4 cache

01:07PM EDT - 2-cycle transfer path between chips

01:07PM EDT - 2:1 sync clock grids

01:08PM EDT - 8-chip has flat topology - direct connect to all 8 chips

01:08PM EDT - 40% socket performance over z15

01:08PM EDT - Some of this comes from the AI workload

01:09PM EDT - AI algorithms make machines more efficient

01:09PM EDT - Using the AI to increase security

01:10PM EDT - very low inference latency - every core has access

01:11PM EDT - Kinda like the Centaur CNS core

01:12PM EDT - 6 TFLOPs per chip per AI accelerator

01:13PM EDT - Accelerator is extensible with firmware releases as AI evolves

01:13PM EDT - New instructions for AI accelerator

01:13PM EDT - simpler instructions

01:13PM EDT - But you have to use the libraries to use the instructions

01:14PM EDT - supports virtualization and memory translation

01:14PM EDT - Manages all the data with the new instructions

01:14PM EDT - 6 TF per chip, 200 TF in 32-chip system
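A quick sanity check on those two numbers, as a Python sketch. It assumes throughput simply adds up across chips with no scaling loss, which is an assumption for the arithmetic and not something IBM stated here.

```python
def aggregate_tflops(per_chip_tflops: float, chips: int) -> float:
    """Sum of per-chip peak throughput, assuming perfectly linear scaling."""
    return per_chip_tflops * chips


system_tflops = aggregate_tflops(6, 32)  # 6 * 32 = 192
```

So the quoted 200 TF looks like 192 TF rounded up, or a per-chip figure slightly above 6 TF.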

01:15PM EDT - 8-way SIMD engine, 128 tiles, MAC array on MatMul and convolution. 32 tiles for activation

01:15PM EDT - focused on FP16

01:16PM EDT - 100 GB/s bandwidth to the AI accelerator

01:16PM EDT - 100 per core, 600 total

01:16PM EDT - Software is all through ONNX

01:17PM EDT - TensorFlow or through IBM Deep Learning Compiler through ONNX

01:18PM EDT - Client proxy model performance

01:19PM EDT - Samsung 7nm, 530 sq mm, 22.5B transistors

01:19PM EDT - 5 GHz+ base clock frequency

01:20PM EDT - Q&A time

01:21PM EDT - Q: Use ring for AI accelerator? A: yes

01:21PM EDT - Firmware does additional management through dedicated buses

01:22PM EDT - Q: Packaging technology for dual die A: Standard, no bridges, put them close together, less than 0.5mm, some interesting thermal and mechanical, but signalling is through the package. Cool innovation on signalling due to clock synchronization.

01:23PM EDT - Q: BW of Inter-socket and Intra-drawer links A: 320 GB/s between chips, drawer in each direction is 45 GB/s

01:25PM EDT - Q: Memory ordering preserved between cores and accelerators - A: magic! keep track of data, on a cache miss, broadcasts, memory state bits tracked to broadcast further out, even go across whole system, when data arrives, have to make sure it can be used, invalidate all other copies confirmed before working on data

01:29PM EDT - Q: How does Telum maintain linear scaling A: lots of scaling work on generic workloads, fabric design etc, optimize latency between chips and across drawers, that's standard. Investment in latency and bandwidth. AI chart shows almost perfect linear, because those are parallel tasks, and data is local

01:30PM EDT - Now for Sapphire Rapids from Intel

01:31PM EDT - Launching 1H 22

01:32PM EDT - Xeon is optimized for performance and CD Perf

01:33PM EDT - Still calling the cores P-cores even though there's no E-cores

01:33PM EDT - Modular SoC architecture

01:34PM EDT - CXL 1.1

01:34PM EDT - Virtualization and VM telemetry

01:34PM EDT - Low Jitter Architecture

01:34PM EDT - Next-Gen QoS

01:35PM EDT - THIS IS SAPPHIRE RAPIDS

01:35PM EDT - Here's the die shot

01:36PM EDT - We've been told there are two types of tile on SPR

01:36PM EDT - Every thread has full access to all resources on all tiles

01:37PM EDT - NUMA Clustering

01:37PM EDT - UPI 2.0

01:38PM EDT - One of the issues here is that each tile has two memory channels - we've been told each SPR core will have 8 channels, that means each SPR product will have to have 4 tiles

01:39PM EDT - New instructions

01:39PM EDT - AMX for Matrix

01:39PM EDT - AIA instructions for Accelerators

01:39PM EDT - HFNI for FP16 half precision

01:39PM EDT - CLDEMOTE

01:40PM EDT - Accelerator Engine improvements

01:41PM EDT - Avoid kernel mode overheads with AIA

01:41PM EDT - Providing base functions for deployment of acceleration engines

01:42PM EDT - DSA and QAT

01:42PM EDT - Doubled QAT

01:42PM EDT - Still requires a chipset

01:43PM EDT - ZLIB L9 98% offload

01:44PM EDT - Dynamic Load Balancer

01:44PM EDT - 400M load balancing decisions per second

01:44PM EDT - important for QoS

01:44PM EDT - ideal for packet processing and microservices

01:45PM EDT - 4 x24 UPI links at 16 GT/s (four PCIe 4.0 x16 links for multisocket)
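For a feel of what those link numbers mean, here is a back-of-the-envelope Python sketch. It assumes one bit per transfer per lane and ignores encoding and protocol overhead, neither of which was given in the talk, so treat the results as raw upper bounds and not as Intel's quoted bandwidth.

```python
def raw_link_gbytes_per_s(lanes: int, gtransfers_per_s: float,
                          bits_per_transfer: int = 1) -> float:
    """Raw one-direction bandwidth of a link in GB/s, before any overhead."""
    return lanes * gtransfers_per_s * bits_per_transfer / 8


per_link = raw_link_gbytes_per_s(24, 16)  # 24 * 16 / 8 = 48.0 GB/s
all_links = 4 * per_link                  # 192.0 GB/s across four links
```

By the same arithmetic a 16-lane link at 16 GT/s comes out at 32 GB/s raw per direction, so an x24 link is 1.5x that.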

01:46PM EDT - >100 MB LLC

01:46PM EDT - 8 memory channels

01:47PM EDT - Optane 300-series support

01:47PM EDT - SPR+HBM

01:47PM EDT - connected over EMIB

01:47PM EDT - Flat mode and caching mode with DRAM

01:47PM EDT - Can also support Optane

01:48PM EDT - INT8 improved through a new accelerator

01:49PM EDT - Industry standard frameworks for CPU based training and inference

01:49PM EDT - Large focus on microservices from initial design

01:50PM EDT - AIA to help service startup time

01:51PM EDT - Scalability with a monolithic view

01:51PM EDT - 10 lots of EMIB

01:51PM EDT - Q&A time

01:52PM EDT - Q: HBM Cache mode have a map? Where do you keep the tags if the HBM is a cache? A: No details quite yet

01:53PM EDT - Q: How is the AI perf of AMX compared to A100? A: No comparison yet

01:54PM EDT - Q: Intel CPU support DDIO, if HBM is cache, where does Data go first? A: Go to L3.

01:55PM EDT - Q: CXL - IBM did CAPI. Can you compare CXL to CAPI? A: Intent of CXL is similar to CAPI. CXL has IO similar to PCIe, but also can consider Accelerators with their own caches

01:55PM EDT - Intel will support CXL.mem in future products, not SPR

01:56PM EDT - Q: Interdie crossing latency A: low single digit nanosecond, little different between vertical and horizontal due to rectangular design

01:57PM EDT - DSA and QAT look like PCIe devices, require drivers (not bare metal), but they are part of the AIA framework. Works with AIA instructions, work with virtualization, but they look like PCIe devices

02:00PM EDT - .

39 Comments

arashi - Monday, August 23, 2021 - link
They can't even power it on/run workloads on it, every single vague chart/graph is simulated.
eastcoast_pete - Monday, August 23, 2021 - link
Thanks Ian! Question about IBM's Telum CPU for mainframe being fabbed at Samsung: Is Samsung considered a "Trusted foundry"? If not, quite a number of US government agencies cannot use (buy or lease ) a mainframe with a Telum inside.

On a different subject: How many people from Apple attend this conference? Reason I ask is that Apple, at least in the past, basically behaved like a parasite, as they never present anything at this and similar meetings. They typically take a lot of notes and ask questions, but it's all take, and no giving of information. If I am mistaken about Apple presenting, please correct me; would be nice to know they actually show signs of good corporate citizenship.
name99 - Tuesday, August 24, 2021 - link
(a) The Samsung question is very interesting! I'd be curious as to how that plays out.

(b) At least when I was at Apple (before Apple got into the CPU design business), plenty of Apple people attended. Your outrage is more based on ignorance than reality.

- Apple explain plenty of how their designs work if you make the effort to spelunk through the patents and run some experiments.

- BUT their design is what you would get if you started with a clean slate in say 2005, with strong opinions (that have been validated) as to how frequency vs power vs density will play out over the next few generations of process. Their design will not help anyone who's unwilling to burn their existing design and start from scratch.

- There is very little in their design that I had not previously encountered somewhere in the academic literature. They benefited massively from ZERO NIH concerns. You may think examining the literature is obvious. It's not. So many good ideas were published 20 years ago (plenty of them sponsored by Intel) but Intel's management, in their wisdom, have not been interested in restructuring their designs to the extent necessary to exploit those ideas.

- Which gets us to the final point. You'd be stunned, when you look at the details, at how much Apple changes (ie is willing to change) every design. There have been three big generations, the first one being internal PA Semi stuff ending at the A6; then A7..A10; then A11..A14. My guess is A15 begins a new generation.
Each generation is a huge visible change (eg A7 added 64b; A11 added clustering and everything that flows from that, and removed 32b). But it's also a massive design change. Apart from that, the annual changes are frequently, and silently, much larger than the sorts of things we see in these HotChips talks.
You have to have a team [and management!] that are willing to make these massive annual changes, plus a set of tools to validate the changes are worth doing, plus a set of tools to help implement the changes.

I used to think somewhat like you. Not any more. Apple didn't get to their position by some sort of nefarious tricks whereby they "stole" ideas in some way that prevented their use by others, and they aren't keeping their tricks secret. They got to where they are by
- very deep knowledge of the literature
- an imagination to combine ideas from many many places
- a willingness to take risks in the sense of constant redesign.

One way in which the best parts of Apple work well is that design and UI is separate from implementation (as individuals, not as collaborators). This has a VERY important (and under-appreciated) effect: the designers design for what would be great UI, and what the HW is capable of, but they don't have to do the work!
This is SO important. When the engineer is the designer, you always consider an idea in terms of "oh god, that sounds like so much hard work, so many changes". The Apple split means you rarely suffer from that failure mode: rather than engineers dismissing their own ideas bcs a few minutes thought suggests it's a lot of work, they are constantly being forced to implement good ideas -- and often discovering those good ideas can be implemented without nearly as much work as they imagined, or as part of a grand redesign that's worth doing because of so much more it opens up.

My GUESS is that Apple's CPU design works in much the same way, that there are a few lofty theorists, extremely familiar with the academic literature, who are constantly revisiting previous ideas and simulations and asking "why don't we change the register allocator in this way? the current scheme for sharing registers is OK, but look at this new scheme I thought up; etc etc"
The next level down of engineers probably groan and push back against every one of these ideas, but the important point is that in Apple all the weight is on the side of the grand designers, none on the side of the poor engineers who have to do the implementation.

In a way this is just the latest version of a computing argument as old as time. When do you stick with the existing, tried and true code base/design; and when do you engage in huge changes? Since the mid-70s Apple has been defined by being willing to engage in the huge changes, and pay the price of constant low-level irritation every year (every year many things are fixed but a few other things break). Since the same time both halves of Wintel have been defined by not engaging in large changes, by engaging mainly in minimal changes. For a few years Intel engaged in aggressive internal design changes even as the ISA was not changed much (think of 386 to 486 to Pentium to PPro) but not much since about Nehalem.
Meanwhile the classic MS mentality has been expressed by Joel Spolsky in many ways, not least here: https://www.joelonsoftware.com/2001/10/14/in-defen... and here https://www.joelonsoftware.com/2000/04/06/things-y...

I'm not interested in arguing about the extent to which Spolsky (or MS or Intel) are justified in their behavior. My point is the poster's original claim, that Apple is not sharing; and my claim that the issue is not that Apple is keeping secrets, it's that all the other companies find it (every year) easier to just evolve the existing design a little more in a few directions than to tear it all down and start again. Apple publishing a hundred papers would not change that...

It's interesting to compare this with semiconductor processes. Aren't I a hypocrite for complaining that Intel are too timid in redesigning their micro-architecture while also complaining that they should follow TSMC in how they design their process?
I think the difference is Intel's process failures (IMHO) result not so much from big leaps as from marketing/finance driven decisions.
The difference between INTC and TSMC that matters is that TSMC STFU until it has something validated along every scary dimension. If something CANNOT be validated yet, it is postponed (cf, eg, GAA on N3). Intel, on the other hand (for reasons that make zero sense to me) insists on claiming, well before the scheme is validated at a manufacturing level, that it will deliver technology X on date Y. Then they find themselves locked into that promise even when it makes no sense.
Would TSMC's cautious half-nodes help Intel? Well, not if Intel insisted on still describing every half node step they plan for the next ten years. (Wait, isn't that what they already DID with Intel 7, Intel 5, Intel 4, ...?) The issue is not that TSMC is making cautious half steps while Intel is rebuilding the process from scratch each generation; it's that TSMC is using, for each process, a suite of technologies that have all been validated, separately and together in the lab; while Intel is using, for each process, a suite of technologies that have all been validated, separately and together, on a marketing slide five years ago.
eastcoast_pete - Tuesday, August 24, 2021 - link
While I actually agree with you on a number of your points, my criticism of Apple was not that they don't disclose at least some of their hardware designs somewhere (patents actually require that, after they have been granted). Rather, it was and is about them never (AFAIK) presenting at Hot Chips; they absolutely attended, at least in the past. One of the attractions of such meetings is that attendees can ask presenters questions; and, often enough, "I can't talk about that" is also an important answer.
name99 - Tuesday, August 24, 2021 - link
This is not a technical point, so take it as you wish, but I would urge you to look inward as to why you ACTUALLY care about how Apple behaves in this respect.
We've agreed that there is nothing going on that deserves the term parasite, no "unfair" withholding of information by Apple, no insights that couldn't be acted upon by others.

And if you don't have the energy to work through patents and run your own tests to know how these CPUs work, well in the past people like Agner or Henry Wong provided the real, serious info at a much deeper level than HotChips talks, and for M1 people like Andrei, Dougall, and I have been doing the same thing, with deep dives published in various places.

So look into yourself, look past the tribalism and mindlessness, and ask what you're REALLY upset about.
My guess is that you want to bring the future forward; you want to experience that thrill of knowing what's new in the A15 now, not when Apple has their iPhone event in three weeks. You want to know today, not some time next year, how Apple will solve the issue of scaling up M1-sized concepts to the requirements of a Mac Pro.
And that's perfectly human, we all want to know the future. But you have to realize that, in this particular sort of case, it's something like an addiction. You'll get a one-time thrill of knowing 2022's design in 2021. But then what? Now, in 2022 you will want 2023's design. After that one hit, you're still limited to only learning one year's worth of new design every year. Neither your epistemic situation, nor your level of joy, have actually improved. And if your chosen company, like Intel, submits to this addiction, things go south really fast, with Intel, every year, trying to provide more than one year's worth of future prophecy beyond what they did last year, till they're demoing utterly meaningless ten year projections. This all ends like any addiction ends.

If you want to get a constant thrill of what might be coming, don't demand it from a company that has to produce real products; that simply cannot end well either for you (with a drastically telescoped future) or the company (locked into a roadmap that may make ever less sense). Instead read the stuff that *might* happen, but doesn't lock down the future -- read the academic literature, read what IMEC is doing.
Oxford Guy - Tuesday, August 24, 2021 - link
Apple’s business model has been about speeding up planned obsolescence since the Apple III.

(Demoing the Mac using a superior not-for-sale prototype surreptitiously is just one symptom of that.)
TristanSDX - Monday, August 23, 2021 - link
Great disappointment. For ADL, at such a conference I expected great detail of core design; instead there was a replay of pretty shallow marketing info, and an explanation of Thread Director. Crap and waste of time.
abufrejoval - Wednesday, August 25, 2021 - link
Some things don't seem to change, ever, like the z/Arch chips: Tons of really good ideas, but useless, because they stay hell bent on selling to a very affluent niche.

They've stuck with their mainframe snake oil since water cooled ECL, even when they went CMOS under the cover and yet for most credit card companies I know, their addiction was never really about the hardware, but the software stack. That software stack could run on the very same POWER chips that run the i-Series (or ARM for that matter) quite quickly and reliably enough for pretty much everyone.

You won't get these chips manufactured cheaply, but there is no technical hurdle to doing a lesser "E-Core" variant of z-Arch. And had they done so years ago, AMD64 might have never happened.

And on that front: I'd have never thought I'd see an AMD HotChips presentation *that* boring. I think there wasn't a single bit of news in all that and they got caught in a very awkward moment of their product roadmap. (And I don't forgive them, that they made all those VM encryption options "server only": That is a move so stupid, I want to fire someone)

It made all the trumpeting from Intel almost look impressive: Somebody sure thinks that there are major doubts on Intel's attractiveness in corporate/cloud decision makers' minds. They sure fire from all cannons, but it still sounds like stage thunder.
SystemsBuilder - Thursday, August 26, 2021 - link
My take on it, as someone who attended Hot Chips and this session live:
I was hugely disappointed by Intel's presentations. Completely marketing department controlled - pretty much a rerun of Intel's architecture days. I would even say that the way the Intel presenters were speaking (monotone unengaging tone and very controlled sentences) it was 100% scripted and they never even once went off script or expanded outside what has already been released at the architecture days. Not even in the Hot Chips Slack chat channel - 100% marketing messaging controlled. I feel bad for the Intel engineers being on a super tight leash from their masters at the Marketing department.

AMD did a much better job and it was a quite exciting presentation that actually released new exciting information.

my conclusion of this session together with the Packaging session was that TSMC is at least 2-4 years ahead of Intel in packaging technology and that means AMD will continue to be 2-4 years ahead for the foreseeable future in terms of core scaling and performance... I remember Intel presenter said something defensive since he was presenting directly after the TSMC presenter like: We are focusing on packaging technology "at scale" clearly feeling the need to differentiate with that towards TSMC since his presentation was 2-3 years behind TSMC in pure tech terms - in my view.
