Arm Announces Neoverse V1, N2 Platforms & CPUs, CMN-700 Mesh: More Performance, More Cores, More Flexibility

Name: Arm Announces Neoverse V1, N2 Platforms & CPUs, CMN-700 Mesh: More Performance, More Cores, More Flexibility
Item: Arm Announces Neoverse V1, N2 Platforms & CPUs, CMN-700 Mesh: More Performance, More Cores, More Flexibility
Author: Andrei Frumusanu

by Andrei Frumusanu on April 27, 2021 9:00 AM EST

95 Comments | Add A Comment

95 Comments

Eventual Design Performance Projections

Alongside the ISO-process node IPC, power and area projections, Arm also made projection of possible eventual implementations of the V1 and N2. These would naturally no longer be ISO-process, but the company’s expectations of what actual possible products might end up as in future designs.

The most important slide and disclosure in this regard is the fact that a Neoverse N2 design on TSMC’s 5nm is expected to achieve the same power as well as the same area as a TSMC 7nm Neoverse N1 design today.

In general, that’s a relatively large presumption, but could possibly pan out if the vendors are able to achieve a good implementation. We don’t have too many details as to the 7nm node generation of Amazon’s or Ampere’s current N1 chips, but I would assume that they’re baseline N7 – at least similar to that of what AMD uses in their EPYC 7002 and 7003 chips.

Still, a -40% power reduction from N7 to N5 is a very high goal and assumption to make. The only N5 chips we’ve had in-house to date, the Kirin 9000, and Apple A14, showcased only a rough 10% efficiency advantage over their N7P predecessors. N7P being roughly 15% better than N7, that’s still only somewhat 26% better efficiency.

Arm expects that the current generation N1 implementations to day to not have fully achieved their potential as it was the vendor’s first attempt with the IP. Arm expects that the following generations with more experience, better implementations with for example more metal layers, to be able to squeeze out more performance and power efficiency on the N5 node.

In terms of socket performance, Arm is expecting some very large generational gains versus a 64C N1 product today – it’s to be noted that these are Arm pre-silicon figures and not the Graviton2.

The “Traditional 2020” chips are the 24C Xeon 8268 and the 64C EPYC 7742. I would ignore the “Traditional 2021” parts here – Arm was aiming and estimating the performance of Intel’s newest 40C Ice-Lake and 64C Milan, however the presentation and figures here were integrated before AMD and Intel actually launched those systems – we have actual benchmark numbers in a custom graph below.

One metric Arm was focusing on was per-thread performance, where the “traditional” cores from AMD and Intel are falling short of the performance of Arm’s Neoverse cores.

Arm here is being somewhat sneaky in their presentation as they are trying to only focus on per-thread performance in cloud environments, where typically things operate on a vCPU basis, and essentially SMT-enabled designs from AMD and Intel naturally fall behind quite a lot in per-thread performance.

I can’t really blame Arm for depicting the performance figures like this – the cloud vendors today don’t really differentiate between real cores and SMT cores in vCPU environments, even having pricing that’s arguably unfair to SMT-enabled designs, which is why we’ve deemed Amazon’s Graviton2 m6g instances to vastly outperform AMD and Intel instances in terms of perf/W and perf/thread.

I wasn’t happy with Arm’s slides not including 1 thread per core performance figures for the SMT systems, so I included my own chart based on actual measured performance figures on the various platforms. The V1 and N2 figures use Arm’s performance scaling versus the Neoverse N1 datapoint, and I’ve baselined that to the Graviton2 scores we’ve measured earlier last year. Arm uses the same compiler flags as we do and also GCC 10.2, so the scores should also be compatible – with the only discrepancy being that Arm used 2MB page sizes.

The Neoverse V1 system uses 96 cores at 2.7GHz with 1MB L2 per core, on a 128MB 2GHz mesh, with 8 DDR5-4800 memory controllers. The N2 datapoint uses 128 3GHz cores at 1MB per L2, 96MB 2GHz mesh, with 10 DDR5-4800 memory controllers.

Arm’s per-thread performance lead doesn’t look that great here when looking at the 1T/C figures of AMD and Intel, but admittedly when in a vCPU scenario, Arm’s design would vastly outperform the SMT chips.

Generally speaking, the performance figures look good when it comes to per-socket performance, but generally that’s to be expected given the new 5nm process node and the more advanced memory controller technology in the projected figures.

AMD's next-generation Genoa should feature more massive performance jumps through the adoption of N5, DDR5, and transition away from their 14nm IO die. IPC and core count increases should also close the gap that’s depicted today. Intel’s next generation Sapphire Rapids should also improve the situation – albeit how that ends up depends on how much they’ll be able to squeeze out of 10nm SuperFin node in relation to what we’ve seen a few weeks ago on Ice Lake-SP.

Usually, I’m more open to Arm’s performance projections, however this time around the V1 and N2’s performance projections are extremely optimistic, especially since they’re completely dependent on the vendors achieving good implementations on N5 and actually reaching the projected 40% perf/W process node and implementation power efficiency gains. Based on what I’ve seen in the mobile space, I remain quite sceptical, and will be adopting a wait & see approach this time around.

The CMN-700 Mesh Network - Bigger, More Flexible First Thoughts & End Remarks

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

95 Comments

View All Comments

mode_13h - Tuesday, April 27, 2021 - link
> sample in the second half of 2022

Uh, that means new machines won't be using them until at least the end of next year. And if we want more cores than an ultraportable, it's still no good.
Raqia - Wednesday, April 28, 2021 - link
I wouldn't put it past them to do a desktop or server sized SoC eventually if they have a great in house core design that isn't a commoditized IP block that anyone can license from ARM. It would give them an advantage at the higher tiers of performance that they will want piece of for sure.

They also seem to be devoted to providing an open ARM computing platform in working with Linux developers and Windows when compared with Apple. That they added a hypervisor to the 888 should give you some indication to their future compute ambitions...
mode_13h - Wednesday, April 28, 2021 - link
> I wouldn't put it past them to do a desktop or server sized SoC

The already tried this, but their investors killed it. Lookup "Centriq". Building out a whole server infrastructure & ecosystem takes a lot of investment, and now they'd have established competitors with a multi-year lead.
Raqia - Wednesday, April 28, 2021 - link
I wasn't talking about servers (at least not right away), more consumer oriented and workstation scale compute. Amon did say that the designs they had in mind with Nuvia were "scalable" and that they were going to be addressing multiple markets.
mode_13h - Wednesday, April 28, 2021 - link
I hope you're right. If anyone can compete with Apple right now, it's probably Nuvia/Qualcomm.
name99 - Thursday, April 29, 2021 - link
You need three things to create a higher performance core than Apple
- designers (check)
- an implementation team (hmm. maybe? this means *enough* good people and superb simulation/design tools)
- management willing to pay the costs [design costs, and willing to accept a substantially larger core] (hmmmmmmmm? will they chicken out and assume no-one is willing to pay for such a core, they way they always have for watch, phone, then centriq?)

And Apple won't stand still...
mode_13h - Tuesday, April 27, 2021 - link
> so far except the HPE's A64FX

Gigabyte makes Altra motherboards and servers that I'm sure you can buy for less than a HPE A64FX-based machine.

And, if you're counting A64FX as a "consumer machine", you ought to include Avantek's Altra-based workstations that I mentioned below.
mode_13h - Tuesday, April 27, 2021 - link
> if these CPUs outperform the EPYC Milan technically AWS should replace all of them right ?

No, because a lot of people are still stuck on x86. Also, Amazon could be fab-limited, like just about everyone else. The sun might be setting on x86, but it's still a long time until dark.
Rudde - Tuesday, April 27, 2021 - link
An Avantek Ampere workstation might be available in a stand-alone system. Andrei expects Ampere to include N2 in their next gen systems instead of V1. Apple might also launch something in that segment in the coming years.
mode_13h - Tuesday, April 27, 2021 - link
A UK-based company called Avantek makes Ampere-based workstations. Their eMAG-based version was reviewed on this site, a couple years ago, and they now have one with Altra. So, I'd say better than average chances we might see one with a V1-based CPU by maybe the end of the year or so.

Arm Announces Neoverse V1, N2 Platforms & CPUs, CMN-700 Mesh: More Performance, More Cores, More Flexibility

Eventual Design Performance Projections

Post Your Comment

95 Comments

View All Comments

mode_13h - Tuesday, April 27, 2021 - link

Raqia - Wednesday, April 28, 2021 - link

mode_13h - Wednesday, April 28, 2021 - link

Raqia - Wednesday, April 28, 2021 - link

mode_13h - Wednesday, April 28, 2021 - link

name99 - Thursday, April 29, 2021 - link

mode_13h - Tuesday, April 27, 2021 - link

mode_13h - Tuesday, April 27, 2021 - link

Rudde - Tuesday, April 27, 2021 - link

mode_13h - Tuesday, April 27, 2021 - link

Log in

Don't have an account? Sign up now