Several years ago, at a local event detailing a new Arm microarchitecture core, I recall a conversation I had with a number of executives at the time: the goal was to get Arm into 25% of servers by 2020. A lofty goal, which hasn’t quite been reached, however after the initial run of Arm-based server designs the market is starting to hit its stride with Arm’s N1 core for data-centers getting its first outings. Out of those announcing an N1-based SoC, Ampere is leading the pack with a new 80-core design aimed at cloud providers and hyperscalers. The new Altra family of products is aiming to offer competitive performance, performance per watt, and scalability up to 210 W and tons of IO, for any enterprise positioning.

The Arm Server Market: 2010-2019 (Abridged)

We’ve seen companies such as Broadcom/Cavium/Marvell, Calxeda, Huawei, Fujitsu, Phytium, Annapurna/Amazon, AppliedMicro/Ampere, and even AMD put Arm-based IP into silicon and subsequently into the server market. Up until recently, most designs have been fairly lackluster – with companies either developing their own core on an Arm architecture license and not getting a performance lift, or using the standard Arm cores and not finding the right mix of performance, power, and software uptake needed to drive home the design. As a result, we’ve seen multiple companies fall by the wayside, be acquired, or limit their activities to specific customers and keep very hush-hush.


First Generation Ampere eMAG, Built on Applied Micro designs

A big example of the ‘be acquired’ type of company was Annapurna, whom Amazon acquired and eventually released its Graviton2 processor in recent months. This chip has 64 cores based on Arm’s N1 design, which is the leading microarchitecture layout for Arm server chips at this point. To that end, Ampere (who originally purchased Applied Micro) is now set to release its second generation product, with 80 of the N1-based cores, and it now has a name: Altra.

Ampere Altra

Ampere has already given a number of details away about Altra in an announcement late last year, however this time around we have concrete details and the company has performance projections. On the back of its first generation eMAG product, Ampere is looking to offer better-than-Graviton2 performance to any cloud provider or hyperscaler who isn’t called Amazon, given that Graviton2 is built by Amazon and only available to Amazon. In that regard, Ampere has taking Arm’s full recommendations for its N1 design, building a chip with the most number of cores that N1 is designed to support.

As with other N1-based products, Altra will be single threaded, ensuring that each thread has its own core, its own resources, and removing any potential core-sharing thread security issues that have occurred recently. The Altra SoC is built with containers in mind, ensuring high-levels of quality of service with multiple customers on the same chip, and additional RAS features to ensure consistent performance.

The N1 core is by design what we’ve covered when Arm detailed the microarchitecture design last year. There is a 4-cycle 64 KB L1I/L1D caches per core, along with a 9-11 cycle 1 MB of private L2 per core. This is partnered with 32 MB of system wide LLC distributed through the SoC mesh, and all these caches are ECC with SECDED operation. It’s worth noting that 32 MB across 80 cores is less per core than Amazon’s Graviton2, which has 32 MB for 64 cores. 32 MB is actually half of what Arm recommends, as in Arm’s presentation it stated that it would expect a 64-core design to have 64 MB.

On top of the 80 cores, the SoC will also have eight DDR4-3200 memory channels with ECC support, up to 4 TB per socket. There are also 128 PCIe 4.0 lanes, with which the CPU can use 32 of them to hook up to another CPU for dual socket operation. The dual socket system can then have a total of 192 PCIe 4.0 lanes between it, as well as support for up to 8 TB of memory. We are told that it’s actually the CCIX protocol that runs over these PCIe lanes, which means 25 GB/s per x16 linkup. That’s good for 50 GB/s in each direction.

Each of the PCIe lanes can be bifurcated down to x8/x4/x2, and every different variant of the Altra SoC will only be segmented on core count and frequency: all CPUs will have 4 TB support and 128 lanes of PCIe 4.0. Each CPU can also support up to four CCIX-based accelerators.

Altra is built on TSMC’s 7nm, and while is technically an Arm v8.2 design, it does borrow a couple of features from 8.3 and 8.5, namely hardware based mitigations for side channel attacks and a couple of other small micro-architectural features.

Each of the 80 cores is designed to run at 3.0 GHz all-core, and Ampere was consistent in its messaging in that the top SKU is designed to run at 3.0 GHz at all times, even when both 128-bit SIMD units per core are being used (thus an unlimited turbo at 3.0 GHz). The CPU range will vary from 45W to 210W, and vary in core count - we suspect these SKUs will be derived from the single silicon design, and it will depend on demand as well as binning as to what comes out of the fabs. Exact SKUs are going to be announced later this year.

Also on security, Ampere was keen to point out that its new SoC will have two control processors: an SM Pro and a PM Pro. These allow for server manageability, up to SBSA Level 4, as well as Secure Boot, RAS error reporting, and advanced power management/temperature control.

Ampere will be launching with two reference designs for Altra, one in single socket called Mt. Snow, and one in dual socket called Mt. Jade. Each design will be available in 1U and 2U form factors, with PCIe 4.0 and CCIX attach, and up to 16 memory modules per socket. We know that the partner for the single socket is the GIGABYTE Server team, however the dual socket partner has not be announced yet. We have been told that the CPUs are socketed, which makes mass scale production and testing (at least on our side) a little easier.

Projected Performance

Ampere has some performance numbers, which as always we take with a grain of salt. These include 2.23x the performance on SPEC2017_int rate over a single 28-core Intel Xeon Platinum 8280, and 1.04x over a single 64-core AMD EPYC 7742. This is obviously extended into a number of claims about improved TCO. Ampere didn’t provide similar numbers for SPEC2017_fp, because the company states that the SoC has been developed with INT workloads in mind. Exact power/performance numbers were not given, but based purely on TDP, which is somewhat of an unreliable metric at times. We’ll wait to run our own numbers in due course.

Developing a Roadmap: 2021, 2022

One of the key questions going into our briefings with Ampere is how closely they are working with Arm on the next generation enterprise server core designs for upcoming SoCs. They weren’t keen to position themselves as Arm’s key partner in this venture (which might be Amazon, given they were first), but did state that there is a lot of collaboration and feedback that goes into the future designs. As a result, Ampere is able to formally declare a long-term roadmap for its product portfolio.

In this instance, Ampere is stating that today it has the 80-core Altra design on 7nm. In 2021, it will launch its Mystique product, which is currently in development (and when asked, Ampere told us will share the same socket as Altra). In 2022, Ampere will launch Siryn, and at this time the product has been defined and requires development.

Having a sustained product cadence has been critical to a number of processor designs in the last couple of decades – it tells potential ODM partners and customers that the company is in for the long haul, and committed to future developments with targets to meet. Obviously with Ampere tying itself to Arm’s roadmap helps in those product definition stages. It’s a feature that has crippled previous Arm designs from coming to market – without a clear roadmap, customers are unwilling to invest in a one-generation wonder and provide long term support for it. There’s always the issue as to whether any investment funding might run out, so Ampere’s goal here with Altra is to be the obvious answer to Graviton2 for the other hyperscalers. With that large market on offer, the goal is to be profitable and self-sustaining as quickly as possible, which then in turn gives potential customers even more confidence.

Next Stage for Altra

At this point, Ampere has stated to us that Altra is currently sampling with its key customers who are looking to deploy the hardware. From previous experience, the key customers who are involved early tend to get priority for deployment, and in that respect Ampere has stated that an official SKU list will come to market mid-year, along with pricing, and with official SPEC submissions. Hopefully at that time we will also get instance pricing from the companies intending to deploy the new chip.

We’re currently in talks with Ampere in order to obtain Altra for in-house testing when they feel it is ready. We have a version of Ampere’s previous generation eMAG workstation that just arrived in the labs, which should help us provide a good base-line from the previous design to the new one. Stay tuned for our coverage of eMAG and Altra!

Related Reading

Gallery: Ampere Altra

Comments Locked

65 Comments

View All Comments

  • Reflex - Wednesday, March 4, 2020 - link

    There are far more failed products than successful ones.
  • Cogman - Tuesday, March 3, 2020 - link

    What I see with this is very high core counts for cheap. Where that makes a lot of sense (IMO) is the case of FAAS and ultra light weight services.

    There's plenty of market for super cheap CPUs that can run a lot of things in parallel. Cloud providers are swimming with those sorts of opportunities. (see aws t3.nano instances or aws lambda)
  • eek2121 - Tuesday, March 3, 2020 - link

    Doesn't matter if it's low IPC.
  • braxster - Wednesday, March 4, 2020 - link

    What makes you think that the IPC is low?
  • ProDigit - Wednesday, March 4, 2020 - link

    Pcie 4.0, 80 cores at 3Ghz, 100+ pcie Lanes,
    Nothing about it sounds cheap to me.
  • name99 - Tuesday, March 3, 2020 - link

    Every day there’s a new Graviton success story.
    Today’s: https://docs.keydb.dev/blog/2020/03/02/blog-post/
    And those successes transfer to Ampere.
  • eek2121 - Tuesday, March 3, 2020 - link

    It won't be. That's the issue. Read between the marketing lines. ARM, when scaled up, has equal perf/watt to anything x86 has. x86 can scale down to ARM levels as well. However, Neither Intel nor AMD have the desire to tackle insanely low margin, high volume products.

    The giveaways here are the numbers they do provide:

    "These include 2.23x the performance on SPEC2017_int rate over a single 28-core Intel Xeon Platinum 8280, and 1.04x over a single 64-core AMD EPYC 7742"

    24% more cores (forget the threads for a moment) and you only get 4% more performance?

    I try to have at least a bit of faith in benchmarks provided by the manufacturer, but the numbers they provided Anandtech are off the rails.
  • The_Assimilator - Wednesday, March 4, 2020 - link

    Bro, you're SOOO wrong, Arm on severs is happening any day now, just like the year of the Linmx desktop. /s
  • CiccioB - Wednesday, March 4, 2020 - link

    <quote>x86 can scale down to ARM levels as well.</quote>
    Experience does not support this statement, as Intel's Atom chips, precisely created to go against ARM chips in mobile system (where power consumption is factor n.1) did not manage to get to the same power/watt number despite being produced on a more advanced PP.

    You may be impressed by the fact that this chip single core size is about 1/8 of the Zen one against which it has been tested. Yet, the Zen core may have much more performances in many tasks that this chip cannot even think to afford, but the net result for the work that this chip has been designed for is that it is equal than an EPYC and probably much more better in AI tasks where AMD has still not added a single dedicated instruction to take care of that part of the market.
  • HStewart - Wednesday, March 4, 2020 - link

    Yes I would agree that actually workloads would be interested especially with more complex operations than synthetic bench marks - does not matter which x86 processor AMD on Intel on this test - that is different story.

    Maybe for some limited servers stuff this made be good option - but for workstation it would not be and serious doubt it would be good on Virtual x86 machine.

Log in

Don't have an account? Sign up now