Last week, Apple made industry news by announcing new Mac products based on the company’s new Apple Silicon M1 SoC, marking the first step of a planned two-year roadmap to transition from Intel x86 CPUs to the company’s own in-house microprocessors based on the Arm instruction set.

During the launch, we published an extensive article on the closely related Apple A14 chip found in the new iPhone 12 phones. It includes an in-depth microarchitectural analysis of Apple’s new Firestorm cores, which power both the A14 and the new Apple Silicon M1. I would recommend reading it if you haven’t had the opportunity yet.

For the past few days, we’ve had our hands on one of the first Apple Silicon M1 devices: the new 2020 Mac mini. While last week’s analysis based its numbers on the A14, this time around we’ve measured the real performance of the actual higher-power design. We haven’t had much time, but we’ll be bringing you the key data points relevant to the new Apple Silicon M1.

Apple Silicon M1: Firestorm cores at 3.2GHz & ~20-24W TDP?

As is typical for Apple, the launch event left out actual details on the design’s clock frequencies, as well as the TDP it can sustain at maximum performance.

We can confirm that in single-threaded workloads, Apple’s Firestorm cores now clock in at 3.2GHz, a roughly 6.7% increase over the 3GHz of the Apple A14. As long as there’s thermal headroom, this clock also applies to all-core loads: alongside the four performance cores at 3.2GHz, the four Icestorm efficiency cores run at 2064MHz, also quite a lot higher than the 1823MHz on the A14.
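As a quick sanity check of the uplift figures, the ratios can be computed directly. This is only a Python sketch of the arithmetic on the clocks quoted above; the helper name percent_increase is our own.

```python
def percent_increase(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0

# Clock figures quoted in the text (GHz for the big cores, MHz for the small ones).
firestorm_gain = percent_increase(3.2, 3.0)   # M1 vs A14 performance cores
icestorm_gain = percent_increase(2064, 1823)  # M1 vs A14 efficiency cores

print(f"Firestorm: +{firestorm_gain:.1f}%")
print(f"Icestorm:  +{icestorm_gain:.1f}%")
```

The efficiency cores see the larger relative bump, at a bit over 13%.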

Alongside the four Firestorm performance cores, the M1 also includes four Icestorm cores aimed at low idle power and improved efficiency for battery-powered operation. All four performance cores and four efficiency cores can be active in tandem, making this an 8-core SoC, although performance throughput isn’t identical across the two core types.

The biggest question during the announcement event was the power consumption of these designs. Apple presented several charts with performance and power axes, but we lacked the comparison data needed to draw any proper conclusion.

As we had access to the Mac mini rather than a MacBook, power measurement was rather simple, as we could just hook up a meter to the device’s AC input. This comes with a huge disclaimer: because we are measuring AC wall power, the figures aren’t directly comparable to those of battery-powered devices, since the Mac mini’s power supply incurs an efficiency loss that doesn’t apply to other mobile SoC measurements, nor are they comparable to the TDP figures that contemporary vendors such as Intel or AMD publish.

It’s especially important to keep in mind that what we usually call TDP in processors covers only a subset of the figures presented here: beyond the SoC itself, we’re also measuring DRAM and voltage regulation overhead, which is included neither in TDP figures nor in the typical package power readout on a laptop.

Apple Mac mini (Apple Silicon M1) AC Device Power

Starting off with the Mac mini sitting idle in its default state after power-on, connected via HDMI to a 2560p144 monitor, with Wi-Fi 6 and a mouse and keyboard, we’re seeing total device power of 4.2W. Given that we’re measuring AC power into the device, and power supplies can be quite inefficient at low loads, this is an excellent figure.

This idle figure also serves as the baseline for the following measurements, in which we calculate “active power” using our usual methodology: total measured power minus idle power.

During average single-threaded workloads on the 3.2GHz Firestorm cores, such as GCC code compilation, we’re seeing device power go up to 10.5W with active power at around 6.3W. The active power figure is very much in line with what we would expect from a higher-clocked Firestorm core, and is extremely promising for Apple and the M1.
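The active-power methodology boils down to a single subtraction. The Python sketch below applies it to the 4.2W idle baseline and the 10.5W single-threaded device figure quoted above; the names IDLE_W and active_power are illustrative only and not part of any tool we used.

```python
IDLE_W = 4.2  # idle device power measured at the wall, in watts

def active_power(total_w: float, idle_w: float = IDLE_W) -> float:
    """Active power: total measured device power minus the idle baseline."""
    return total_w - idle_w

# Single-threaded GCC compile: 10.5W at the wall.
st_active = active_power(10.5)
print(f"{st_active:.1f} W")
```

Keeping the baseline as a default argument makes it easy to re-run the same calculation if the idle configuration (monitor, peripherals) changes.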

In workloads that are more DRAM-heavy, and thus incur a larger power penalty on the Mac mini’s 16GB of 128-bit LPDDR4X-class DRAM, we’re seeing active power go up to 10.5W. Even with these figures, the new M1 is mighty impressive, drawing less than a third of the power of a high-end Intel mobile CPU.

In multi-threaded scenarios, power depends heavily on the workload. In memory-heavy workloads where CPU utilisation isn’t as high, we’re seeing 18W active power, rising to around 22W in average workloads and peaking at around 27W in compute-heavy workloads. These are generally the figures you’d want to compare against the “TDPs” of other platforms, although for an apples-to-apples comparison you’d need to further subtract some of the overhead measured on the Mac mini here; my best guess would be a 20 to 24W range.

Finally, on the GPU side, we’re seeing a lower power consumption figure of 17.3W in GFXBench Aztec High. This figure includes a larger share of DRAM power, so Apple’s GPU itself is clearly very low-power, drawing far less than the peak power the CPUs can.

Memory Differences

Besides the additional CPU and GPU cores, one main performance factor in which the M1 differs from the A14 is that it runs on a 128-bit memory bus rather than the mobile 64-bit bus. Across 8x 16-bit memory channels with LPDDR4X-4266-class memory, this means the M1 reaches a peak memory bandwidth of 68.25GB/s.
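The peak figure follows from bus width times transfer rate. Below is a minimal Python sketch, assuming a transfer rate of exactly 4266 MT/s and decimal gigabytes (10^9 bytes); the function name peak_bandwidth_gbs is hypothetical.

```python
def peak_bandwidth_gbs(channels: int, channel_bits: int, mts: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (10^9 bytes per second).

    channels * channel_bits is the total bus width in bits; dividing by 8
    gives bytes per transfer, and mts is the transfer rate in MT/s.
    """
    bus_bytes = channels * channel_bits / 8
    return bus_bytes * mts * 1e6 / 1e9

# M1: 8 channels x 16 bits = 128-bit bus, LPDDR4X-4266 (assumed 4266 MT/s).
m1 = peak_bandwidth_gbs(channels=8, channel_bits=16, mts=4266)
print(f"{m1:.2f} GB/s")
```

The printed value (roughly 68.26) differs from the quoted 68.25GB/s only in the last digit, which depends on rounding and on the exact transfer rate assumed. Halving the channel count models a 64-bit bus at the same memory speed.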

In terms of memory latency, we’re seeing a (rather expected) reduction compared to the A14, measuring 96ns at 128MB full random test depth, compared to 102ns on the A14.
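A “full random” latency test is commonly built as a pointer chase over one random cycle, so that each load depends on the previous one and the prefetchers can’t guess the next address. We don’t know the internals of the test tool used here; this Python sketch only shows one standard way of building such a chain (Sattolo’s algorithm). Python itself is far too slow to time nanosecond-scale loads, so in practice the walk would be done in native code.

```python
import random

def random_cycle(n: int, seed: int = 0) -> list[int]:
    """Sattolo's algorithm: a random permutation forming a single n-cycle.

    next_idx[i] is the slot visited after slot i, so following the chain
    from any start touches every slot exactly once before returning.
    """
    rng = random.Random(seed)
    next_idx = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j in [0, i), never i itself -> one single cycle
        next_idx[i], next_idx[j] = next_idx[j], next_idx[i]
    return next_idx

def chase(next_idx: list[int], steps: int, start: int = 0) -> int:
    """Follow the chain for `steps` hops; each hop depends on the previous one."""
    i = start
    for _ in range(steps):
        i = next_idx[i]
    return i
```

In a native implementation the chain would fill an array sized to the test depth (128MB in the figures above), and the average latency would be the total walk time divided by the number of hops.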

Of further note is the 12MB L2 cache of the performance cores, although it seems that Apple continues to partition how much of it a single core can use, as we’re still seeing a latency uptick after 8MB.

The M1 also contains a large system level cache (SLC), which should be accessible to all IP blocks on the chip. We’re not exactly certain, but the test results behave a lot like those on the A14, so we assume this is a similar 16MB block of cache; some access patterns extend beyond those of the A14, which makes sense given the larger L2.

One aspect we’ve never really had the opportunity to test is exactly how good Apple’s cores are in terms of memory bandwidth. On the M1, the results are ground-breaking: a single Firestorm core achieves memory reads of up to around 58GB/s, with memory writes coming in at 33-36GB/s. Most importantly, memory copies land at 60 to 62GB/s depending on whether you’re using scalar or vector instructions. The fact that a single Firestorm core can almost saturate the memory controllers is astounding and something we’ve never seen in a design before.
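To illustrate how a single-threaded copy bandwidth figure is obtained, here is a rough Python sketch. It is not the tool behind the numbers above; it covers only a plain buffer copy (no scalar vs. vector distinction), and the default buffer size and repetition count are arbitrary choices meant to sit well beyond the 12MB L2 and the presumed 16MB SLC.

```python
import time

def copy_bandwidth_gbs(size_bytes: int = 256 * 1024 * 1024, repeats: int = 5) -> float:
    """Best-of-N single-thread copy bandwidth in GB/s (10^9 bytes/s).

    Counts only the bytes copied, not read and written bytes separately;
    tools differ on this convention, so compare results within one method.
    """
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    dst[:] = src  # warm-up pass so page faults are not included in the timing
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        dst[:] = src  # equal-length slice assignment copies src into dst in C
        best = min(best, time.perf_counter() - t0)
    return size_bytes / max(best, 1e-12) / 1e9
```

Calling copy_bandwidth_gbs() on a machine gives a ballpark single-core figure; taking the best of several runs filters out interference from other processes.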

Because one core can use almost the whole memory bandwidth, having multiple cores access memory at the same time doesn’t actually increase system bandwidth; due to congestion, it actually lowers the effective aggregate bandwidth. Nevertheless, this 59GB/s peak bandwidth of one core is essentially also the speed at which memory copies happen, no matter the number of active cores in the system, which is again a great feat for Apple.

Beyond the clock speed and L2 increases, this memory boost is also very likely to help the M1 differentiate its performance from that of the A14, and to offer tough competition against the x86 incumbents.


  • lilmoe - Tuesday, November 17, 2020

    Beating Intel doesn't say much. Intel is a well-known issue in the tech industry. Apple isn't the only one complaining. Mobile Ryzen goes neck-and-neck with Intel desktop. This is a well-known fact. Apple doesn't have any breakthrough here; AMD did earlier this year. TSMC has a great 7nm and a breakthrough 5nm process (your move, Sammy).

    The M1 is Apple's A-Game in single thread. You won't see double digit improvements YoY.

    Apples to apples (pro review):
    - Consistent Charts.
    - M1+Rosetta2 VS Zen2(4800U): THAT's what's available today.
    - M1 Native vs Zen3 (Prediction/Analysis for Zen4): THAT is what M1/M2 Native will go up against.
    - M1 Chrome/Javascript VS 4800U Chrome, NOT Safari+M1(Native) VS Chrome/Edge+4800U. That's a browser benchmark, not a CPU benchmark. Chrome all the way, then an educated prediction into how much Native would improve that.
    - Actual popular games.

    I'm dismissing this entire review. Any monkey can install and run a couple of benchmark apps.
  • andreltrn - Tuesday, November 17, 2020

    Why Chrome? What you people don't get is that people buying a Mac mini or a MacBook Air are buying a device, like you would buy a refrigerator or a Nest thermostat. They will use what is best for that DEVICE. They don't care about the processor. Even a processor (CPU) comparison is bulls... when comparing the M1 with the AMD laptop offering. The M1 is an SoC with way more built-in functionality, such as an ML processor, accelerators and much more. An (Intel or AMD) laptop CPU couldn't do what the M1 does in a properly coded app, e.g. Final Cut Pro or Logic, or Safari for that matter. A device is the sum of its parts, not only a CPU, especially in a laptop.
  • vlad42 - Tuesday, November 17, 2020

    Why Chrome? Well, because it is available for both operating systems. Of course, another browser such as Firefox could also be used.

    We have seen time and time again that the web browser used can have an enormous impact on the results of browser-based benchmarks. As you can see in AnandTech's browser comparison from 9/10/2020, https://www.anandtech.com/show/16078/the-2020-brow... the Chromium version of Edge outperforms Firefox in Speedometer 2.0 by roughly 35%! Since this article is not trying to compare the performance of different web browsers, the browser used should be kept the same.

    In addition, since Speedometer 2.0 was made by Apple, it is highly likely that they put more weight on Safari updates improving the Speedometer score than, say, Google does with Chrome.
  • helpmeoutnow - Thursday, November 26, 2020

    @andreltrn lol keep it real. we are talking benchmarks. you can only compare software that runs on both systems.
  • Spunjji - Tuesday, November 17, 2020

    Also, Zen 3 is in some of those comparisons... It wins as you'd expect, but dropping to 5nm wouldn't magically bring it to M1 power levels.
  • vlad42 - Tuesday, November 17, 2020

    5nm would help by reducing the voltage, and thus power draw, for the chip. The bigger thing to remember is that there are no mobile versions of Zen 3 yet. Consider that the 5950X is only ~37% faster than the 4800U in single-threaded Cinebench despite having a TDP 7 times higher. If the 5800U ends up having the same clocks as the 4800U, then the M1 would roughly have a 7% perf/W advantage. Granted, this assumes the 5800U's score would be 19% faster than the 4800U's 1199.

    So, given the expected benefits that TSMC, Samsung, etc. have touted about 5nm, a die shrink from 7nm to 5nm would easily make up for this difference in power efficiency.
  • R3lay - Wednesday, November 18, 2020

    You can't compare single-core performance and then compare them to the TDP. At single core, the 5950X doesn't use 7x more power.
  • Kangal - Saturday, November 21, 2020

    To be honest, a lot of comparisons of the Apple Silicon M1 are vague, misrepresentative or blatantly off. The best representative benchmarks I've seen are:

    Single Core, Geekbench v5, 5min run, Rosetta2
    2020 Macbook Air (10W M1): ~1300 score
    2019 MacBook Pro 16in (35W i9-9880H): ~1100 points
    AMD Zen2+ Laptop (35W r9-4900HS): ~920 points
    2019 Macbook Pro 13in (15W i5-8257U): ~900 points
    AMD Zen2+ Laptop (20W r7-4800U): ~750 points

    Multi-Thread, CineBench r23, 10min run, Rosetta2
    AMD Zen2+ Laptop (35W r9-4900HS): ~11,000 score
    AMD Zen2+ Laptop (20W r7-4800U): ~9,200 score
    2019 MacBook Pro 16in (35W i9-9880H): ~9,100 score
    2020 Macbook Air (10W M1): ~7,100 score
    2019 Macbook Pro 13in (15W i5-8257U): ~5,100 score

    Rendering Performance, Final Cut Pro X, 10min clip
    AMD Zen2+ Laptop (35W r9-4900HS): error on ryzentosh
    AMD Zen2+ Laptop (20W r7-4800U): error on ryzentosh
    2019 MacBook Pro 16in (35W i9-9880H): ~360 seconds
    2020 Macbook Air (10W M1): ~410 seconds
    2019 Macbook Pro 13in (15W i5-8257U): ~1100 seconds

    GPU Performance, GFXBench v5 Aztec Ruins High, Rosetta2
    2019 MacBook Pro 16in (i9 5600M): ~79 fps
    2020 Macbook Air (M1 8CU): ~76 fps
    AMD Zen2+ Laptop (r9 Vega-8): ~39 fps
    AMD Zen2+ Laptop (r7 Vega-7): ~36 fps
    2019 Macbook Pro 13in (i5 Iris Pro): ~20 fps

    Gaming Performance, Rise of the Tomb Raider, 1080p High
    2019 MacBook Pro 16in (i9 5600M): ~70 fps
    2020 Macbook Air (M1 8CU): ~40 fps
    AMD Zen2+ Laptop (r9 Vega-8): ~23 fps
    AMD Zen2+ Laptop (r7 Vega-7): ~21 fps
    2019 Macbook Pro 13in (i5 Iris Pro): ~12 fps

    ....so I share the well-grounded outlook that Dave Lee (D2D) has on the matter, where Linus (LTT) was more pessimistic than Dave but I think his opinions are pretty neutral overall. I simply out-right reject the unprofessional and unrealistic look that Andrei (Anandtech) has displayed in the previous article. Nor am I fully on-board with the overly-optimistic perspective that Jonathan Morrison demonstrated.
  • Kangal - Saturday, November 21, 2020

    More thoughts on the matter...

    I get there's the argument to be made that: new modern and more efficient apps are coming natively, that single-core is most important, low TDP is very important, and race to idle (or at least race to small cores) is important. From that perspective, the M1 in the Macbook Air is the best by a HUGE margin. We're talking a x3 better overall experience than the best x86 devices in such comparisons.

    Then there's the alternate debate. That what you get is, what you get. So that legacy program performance is most important, single-core is no longer be-all-end-all, multi-thread being relevant for actual "Pro" users, and just as important as TDP is the Sustained Performance. When looking from that perspective, the Apple M1 in a MacBook Pro 16/13 is only equivalent to the very best x86 device performances. So basically a meh situation, not a paradigm shift.

    So what can we realistically postulate from this, and expect from Apple and the industry?
    Firstly, Apple disappointed us with the M1. In short, Apple played it safe and didn't really do their best. That means they purposely left performance on the table; it was artificial and it was deliberate. The why is simple: just so that they can incrementally introduce these increases, and that way they can incentivise customers. In long, what they have now, the 4/8 setup, is somewhat reminiscent of the current high-end phablets, or the 4c/8t hyperthreading setup of Intel CPUs, or the older AMD Bulldozer setup. At these thicknesses, there's really no need for the medium cores; they should have killed them and stuck with an 8-large-core design instead. These large ARM cores aren't too different from x86 cores in size, so they could have afforded that silicon cost. As for operation, simply undervolt/underclock (1.5GHz) the whole stack, and ramp up 1-2 cores to high clocks/volts (3.5GHz) dynamically when necessary. That makes thread allocation simple, and here simple means more efficient software. And this means we could see a performance difference moving from an 11inch passive-cooled device to a 17in active-cooled device. For example, 8 cores running at 4.0GHz, versus 2 cores running at 3.5GHz. And let's not forget the GPU, which is fine as an 8CU (~GTX 1050) on an "ultraportable" like an 11in Macbook Air. But we were expecting something more like 16CU (~GTX 1660) for the "regular laptop" 13in MacBook, and an even beefier 32CU (~RTX 2070) for a "large laptop" 17in MacBook Pro. On top of this, the new SoC demands less internal space, so we should have seen much more compact Mac devices, and Apple didn't take advantage of this.

    Other places Apple dropped the ball: they have fewer PCIe ports allocated. There is no dedicated GPU or eGPU option available. Their current iGPU is about on par with a GTX 1050, so impressive against AMD's and Intel's iGPUs... but it's still behind modern (low-profile) dedicated GPUs from Nvidia's Volta or AMD's RDNA2. There's no support for 32-bit x86 programs. And lastly, there is no bootloader support that would let people run another OS such as Android, a Linux distro, Windows 10 ARM/Mobile (or perhaps even boot an x86 OS via a low-level translator).

    And here's what Apple got right.
    They released the Mac Mini Zero/Development device a year early to get developers primed. Their new Operating System, which is definitely NOT the same OS X (macOS), but is an "iOS-Pro OS" actually is stable. Their forwards-compatibility with iOS Apps runs without issues. Their backwards-compatibility for 64bit-macOS Apps actually runs very very very well (some code, such as the gpu-APIs are actually processed natively). And we can only surmise that most current Apps will run (average 60%) almost as good as running natively (min 49%-to-94% max), something Microsoft dropped the ball on with Windows 8/RT and have dragged their feet since. Whilst in the near future (3-4 years), they will remove the actual hardware-coprocessors that handle this x86-to-ARM translation, and they will use that "silicon budget" to add to the SoC, slightly improving native further. So with updating Applications, improving microarchitecture, improving lithography, increasing silicon budget, and thus extending it from an efficient design (4B+4s) to a (8 Big) performance design...... we will see performance literally x2-x4 in the coming 2-4 year timeframe (think Apple M2, M3, M4 in 2024). And I didn't even mention GPU improvements either. That's just too much pressure on the whole industry (Lenovo, HP, Dell, ASUS), and more specifically on Microsoft, AMD Zen, and Intel (lol) when it comes to their roadmap.

    Plus, the current setup of 4-big, and 4-medium cores, is adequate, but works wonders for low-impact tasks and thermally limited devices. And they have demonstrated that their software is mature in handling these hybrid systems. So the current setup means the Macbook Air (ultra thin/light) has a phenomenal leap, and future iterations will benefit from this setup too. Also that means lower R&D time/effort/cost is necessary as most of the work between the smallest iPhone Mini, the medium-sized iPad-Mini, and the much larger Macs are closely related, as far as SoC is concerned. And it's a brilliant move to keep the current x86 line, and launch identical hardware with the M1 silicon. So all feedback will provide insight for future Silicon-M designs.

    I personally think, they're going to move to having a better quality keyboard (bye crappy Butterfly) as now there is more internal space to play around with. And they will add new features to the Macs that are already included in iPhones, like barometer, GPS, etc etc. Also, they will add Apple Pen support (but no silo), with probably a magnetic holder. Lastly, I think they're going to evolve the design of the Macbooks... they will all have OLED HDR10+ displays, maybe in the 4K-5K resolutions, have a proper touchscreen, and mimic the Lenovo Yoga style with a 360' hinge.
  • Spunjji - Monday, November 23, 2020

    @Kangal - I have a few disagreements with what you've written here.

    Firstly, I'm a little confused about why you see the Rosetta-based benchmarks as most relevant. I doubt that anyone buying an M1 device today will be getting rid of it before the majority of apps are converted across, so that performance is going to become increasingly *less* relevant as time passes.

    Secondly, this quote: "In short, Apple played it safe and didn't really do their best. That means they purposely left performance on the table, it was artificial and it was deliberate." - I just don't see how you could draw that conclusion. They used their highest-performing cores in the largest chip yet produced on 5nm. It would be bizarre for them to begin such a grand experiment from the top-down - it would produce an odd situation where their most demanding users, who are most likely to be using applications that currently need translation, would be expected to transition to an incomplete ecosystem with performance that doesn't exceed existing systems.

    To me, it makes perfect sense from both an engineering and a product perspective. They begin the transition with a relatively small (and thus high-yielding, despite the new process) chip as part of a platform for users who are relatively performance-insensitive, but who will still appreciate the immediate benefits of reduced heat and increased battery life.

    I'm also a bit confused about your perspective on their GPU. AFAIK the most modern low-profile low-power GPU out there is Nvidia's 1650 - and in terms of performance-per-watt, this iGPU thrashes it, with absolute performance being not far behind. Perf/Watt appears to be Apple's primary concern (for a given degree of absolute performance), so I see it as a resounding (and surprising) success. It's down to AMD and Nvidia to respond now.
