The Samsung Exynos M3 - 6-wide Decode With 50%+ IPC Increase
by Andrei Frumusanu on January 23, 2018 1:30 PM EST
Posted in: Mobile, CPUs, Samsung, Smartphones, Exynos 9810, Exynos M3
The Exynos 9810 was one of the first big announcements for 2018 and it was quite an exciting one. Samsung’s claim of doubling single-threaded performance was definitely eye-catching and got a lot of attention. The new SoC sports four of Samsung’s third-generation Exynos M3 custom architecture cores running at up to 2.9GHz, alongside four Cortex A55 cores at 1.9GHz.
Samsung LSI’s advertised target frequency for the CPUs doesn’t necessarily mean that the mobile division will release devices with the CPU running at those frequencies. The Exynos 8890 was advertised by SLSI to run up to 2.7GHz, while the S7 limited it to 2.6GHz. The Exynos M2’s DVFS tables showed that the CPU could go up to 2.8GHz, but it was instead released with a lower, more power-efficient 2.3GHz clock. Similarly, it’s very possible we might see more limited clocks on an eventual Galaxy S9 with the Exynos 9810.
Of course, even accounting for the fact that part of Samsung’s performance increase claim for the Exynos 9810 comes from the clockspeed jump from 2.3GHz to 2.9GHz, that still leaves a massive performance gap to the goal of doubling single-threaded performance. Thus, this performance delta must come from the microarchitectural changes. Indeed, the effective IPC increase must be in the 55-60% range for the math to make sense.
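The arithmetic behind that range can be checked with a short sketch. It assumes that performance scales as IPC multiplied by clock frequency, that the baseline is the 2.3GHz clock quoted above, and that "doubling" means exactly 2.0x. The function name is purely illustrative.

```python
def required_ipc_gain(perf_ratio: float, old_ghz: float, new_ghz: float) -> float:
    """Return the IPC ratio needed to reach perf_ratio given a clock change.

    Assumes performance = IPC * frequency, which ignores memory-bound effects.
    """
    clock_ratio = new_ghz / old_ghz
    return perf_ratio / clock_ratio


# Figures from the article: 2x claimed speedup, 2.3GHz -> 2.9GHz.
clock_gain = 2.9 / 2.3 - 1
ipc_ratio = required_ipc_gain(2.0, 2.3, 2.9)
print(f"Clock contributes {clock_gain:.1%}; IPC must rise by {ipc_ratio - 1:.1%}")
```

Under these assumptions the clock bump accounts for roughly 26% and the remaining IPC uplift works out to a little under 59%, which sits inside the 55-60% range.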
With the public announcement of the Exynos 9810 having finally taken place, Samsung engineers are now free to release information on the new M3 CPU microarchitecture. One source of information that’s been invaluable over the years for digging into the deeper workings of CPU µarchs is the companies' own submissions to open-source projects such as the GCC and LLVM compilers. Luckily, Samsung is a fantastic open-source contributor and yesterday posted the first patches describing the machine model for the M3 microarchitecture.
To better visualise the difference between the previous microarchitectures and the new M3, we take a step back in time to have a look at the high-level pipeline configuration of the Exynos M1/M2:
At heart the Exynos M1 and M2 microarchitectures are based on a 4-wide in-order stage for decode and dispatch. The wide decode stage was rather unusual at the time as ARM’s own Cortex A72 and A73 architectures made do with 3-wide and 2-wide instruction decoders respectively. With the Exynos M1/M2 being Samsung LSI’s first in-house microarchitecture it’s possible that the front-end wasn’t as advanced as ARM’s, as the latter’s 2-wide A73 microarchitecture was more than able to keep up in terms of IPC against the 4-wide M1 & M2. Samsung’s back-end for the M1 and M2 included 9 execution ports:
- Two simple ALU pipelines capable of integer additions.
- A complex ALU handling simple operations as well as integer multiplication and division.
- A load unit port
- A store unit port
- Two branch prediction ports
- Two floating point and vector operations ports leading to two mixed capability pipelines
The M1/M2 were effectively 9-wide dispatch and execution machines. In comparison the A73 dispatches up to 8 micro-ops into 7 pipelines and the A75 dispatches up to 11 µops into 8 pipelines, keeping in mind that we’re talking about very different microarchitectures here and the execution capabilities between the pipelines differ greatly. From fetch to write-back, the M1/M2 had a pipeline depth of 13 stages, which is 2 stages longer than that of the A73 and A75, resulting in worse branch-misprediction penalties.
This is only a rough overview of the M1/M2 cores. Samsung published a far more in-depth microarchitectural overview at HotChips 2016, which we’ve covered.
The Exynos M3 differs greatly from the M1/M2 as it completely overhauls the front-end and also widens the back-end. The M3 front-end fetch, decode, and rename stages now increase in width by 50% to accommodate a 6-wide decoder, making the new microarchitecture one of the widest in the mobile space alongside Apple’s CPU cores.
This comes at a cost however, as some undisclosed stages in the front-end become longer by 2 cycles, increasing the minimum pipeline depth from fetch to writeback from 13 to 15 stages. To counteract this, Samsung must have improved the branch predictor, although we can’t confirm what individual front-end stage improvements have been made. The reorder buffer on the rename stage has seen a massive increase from 96 entries to 228 entries, indicating that Samsung is trying to vastly increase its ability to extract instruction-level parallelism to feed the back-end execution units.
The depiction of the schedulers is my own best guess at what the M3 looks like, as it seemed to me like the natural progression from the M1 configuration. What we do know is that the core dispatches up to 12 µops into the schedulers and we have 12 execution ports:
- Two simple ALU pipelines for integer additions, same as on the M1/M2.
- Two complex ALUs handling simple integer additions and also multiplication and division. The doubling of the complex pipelines means that the M3 now has double the integer multiplication throughput compared to the M1/M2 and a 33% increase in simple integer arithmetic (from 3 to 4 ADDs).
- Two load units. Again, the M3 here doubles the load capabilities compared to the M1 and M2.
- A store unit port, same as on the M1/M2.
- Two branch prediction ports, likely the same setup as on the M1/M2, capable of feeding the two branches/cycle the branch prediction unit is able to complete.
- Instead of 2 floating point and vector pipelines, the M3 now includes 3 of them, all of them capable of complex operations, theoretically vastly increasing FP throughput.
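The two port lists can be tallied with a small sketch. The dictionaries below only restate the counts given in the lists above; the key names and helper functions are illustrative and are not taken from Samsung's compiler patches.

```python
# Execution port counts per core generation, as listed in the article.
M1_M2_PORTS = {
    "simple_alu": 2,
    "complex_alu": 1,
    "load": 1,
    "store": 1,
    "branch": 2,
    "fp_vector": 2,
}
M3_PORTS = {
    "simple_alu": 2,
    "complex_alu": 2,
    "load": 2,
    "store": 1,
    "branch": 2,
    "fp_vector": 3,
}


def total_ports(ports: dict) -> int:
    """Sum all execution ports of a core."""
    return sum(ports.values())


def int_adds_per_cycle(ports: dict) -> int:
    """Peak simple integer ADDs per cycle.

    Both simple and complex ALUs can execute simple additions.
    """
    return ports["simple_alu"] + ports["complex_alu"]


for name, ports in (("M1/M2", M1_M2_PORTS), ("M3", M3_PORTS)):
    print(f"{name}: {total_ports(ports)} ports, "
          f"{int_adds_per_cycle(ports)} integer ADDs/cycle")
```

The totals come out at 9 ports for the M1/M2 and 12 for the M3, matching the dispatch widths quoted in the text.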
The simple ALU pipelines already operate at single-cycle latencies so naturally there’s not much room for improvement there. On the side of the complex pipelines we still see 4-cycle multiplications for 64-bit integers, however integer division has been greatly improved from 21 cycles down to 12 cycles. I’m not sure if the division unit reserves both complex pipelines or only one of them, but what is clear, as mentioned before, is that integer multiplication execution throughput is doubled and the additional complex pipe also increases simple arithmetic throughput from 3 to 4 ADDs.
The load units have been doubled and their load latency remains 4 cycles for basic operations. The store unit also doesn’t seem to change in terms of its 1-cycle latency for basic stores.
The floating point and vector pipelines have seen the most changes in the Exynos M3. There are 3 pipelines now with distributed capabilities between them. Simple FP arithmetic operations and multiplication see a three-fold increase in throughput as all pipelines now offer the capability, compared to only one for the Exynos M1/M2. Beyond tripling the throughput, the latency of FP additions and subtractions (FADD, FSUB) is reduced from 3 cycles down to 2 cycles. Multiplication stays at a 4-cycle latency.
Floating point division sees a doubling of the throughput as two of the three pipelines are now capable of the operation, and latency has also been reduced from 15 cycles down to 12 cycles. Cryptographic throughput of AES instructions doubles as well, as two of the three pipelines are able to execute them. SHA instruction throughput remains the same. For simple vector operations we see a 50% increase in throughput due to the additional pipeline.
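The latency and throughput figures from the last few paragraphs can be gathered into one place. This is a sketch: the numbers are the ones quoted above, the operation names are made up for illustration, and the throughput ratio assumes each capable pipeline can accept one operation per cycle, which is unlikely to hold for long-latency operations such as division.

```python
# Latencies in cycles as quoted in the article: (M1/M2, M3).
LATENCY_CYCLES = {
    "int_mul_64": (4, 4),
    "int_div": (21, 12),
    "load": (4, 4),
    "store": (1, 1),
    "fadd_fsub": (3, 2),
    "fmul": (4, 4),
    "fdiv": (15, 12),
}

# Number of pipelines able to execute each operation class: (M1/M2, M3).
CAPABLE_PIPES = {
    "int_mul": (1, 2),
    "load": (1, 2),
    "fp_add_mul": (1, 3),
    "fp_div": (1, 2),
    "aes": (1, 2),
    "simple_vector": (2, 3),
}


def latency_reduction(op: str) -> float:
    """Fraction of cycles saved on the M3 relative to the M1/M2."""
    old, new = LATENCY_CYCLES[op]
    return (old - new) / old


def throughput_ratio(op: str) -> float:
    """Ratio of capable pipelines, used as a proxy for peak throughput."""
    old, new = CAPABLE_PIPES[op]
    return new / old


for op, (old, new) in LATENCY_CYCLES.items():
    print(f"{op:12s} {old:2d} -> {new:2d} cycles ({latency_reduction(op):.0%} lower)")

for op in CAPABLE_PIPES:
    print(f"{op:14s} peak throughput x{throughput_ratio(op):.1f}")
```

Laying the numbers out this way shows that the latency gains are concentrated in division and FP addition, while most of the throughput gains come from the extra pipelines.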
We’re only scratching the surface of what Samsung’s third-generation CPU microarchitecture is bringing to the table, but already one thing is clear: SLSI’s claim of doubling single-threaded performance does not seem far-fetched at all. What I’ve covered here are only the high-level changes in the pipeline configurations and we don’t know much at all about the improvements on the side of the memory subsystem. I’m still pretty sure that we’ll be looking at large increases in the cache sizes, up to 512KB private L2s for the cores with a large 4MB DSU L3. Given the floating point pipeline changes I’m also expecting massive gains for such workloads. The front-end of the M3 microarchitecture is still a mystery, so here’s hoping that Samsung will be able to re-attend Hot Chips this year for a worthy follow-up presentation covering the new design.
With all of these performance improvements, it’s also expected that the power requirements of the core will be greatly beyond those of existing cores. This seems a natural explanation for the two-fold single-core performance increase while the multi-core improvement remains at 40% - running all cores of such a core design at full frequency would indeed showcase some very high TDP numbers.
If all these projections come to fruition, I have no idea how Samsung’s mobile division is planning to equalise the CPU performance between the Exynos 9810 and an eventual Snapdragon 845 variant of the Galaxy S9, short of finding ourselves in a best-case scenario for ARM’s A75 vs a worst-case for the new Exynos M3. With 2 months to go, we’ll have to wait & see what both Samsung mobile and Samsung LSI have managed to cook up.
60 Comments
french toast - Tuesday, January 23, 2018 - link
Jesus...Samsung have gone all apple...power consumption is going to be very interesting indeed.
lilmoe - Tuesday, January 23, 2018 - link
Ironically, Samsung waited before going all in "to get it right".
The SoC wars are back ya'll.
french toast - Tuesday, January 23, 2018 - link
We will see, nothing comes for free and 2.9ghz is very high for a wide core, be interesting to see what frequency is typical outside of a short burst.
I find it odd that MT is only 40% higher, of course the big cores are going to be clocked much lower..but still with A55s and new fabric/likely new memory controller...i expected better.
Then again I didn't expect such an increase in ST performance, incredible really.
What a shame it will be mated underneath touchwhiz..
lilmoe - Tuesday, January 23, 2018 - link
I find it strange you don't like Touchwiz... It's way better than the pile of brown stuff that is AOSP, or even "Google's take". It's a double whammy in irony; Google has long ways before their software becomes anywhere near the practicality and efficiency of Samsung's, especially in stock apps, browser, connectivity, general device settings or even design.
Well, the design part is subjective, some find the inconsistent mess that is stock "appealing" for some reason, but the rest of the points aren't subjective.
generalako - Tuesday, January 23, 2018 - link
Lol, are you for real, mate? Stock Android's appeal isn't in looks (which, subjectively, is better than TW, imo), but smoothness and consistency. Stock Android has much less frame drops and achieves 60 FPS more frequently than TW. You notice this everywhere and all the time in actions and animations. Pixel UI just feels more fluid. Also, it's far more consistent with fewer hiccups and freezes than TW.
And did I mention lack of bloat? Quick and longer updates?
lilmoe - Tuesday, January 23, 2018 - link
I just HATE all the misinformation ya'll been spoon fed by the internet.
Stock doesn't sell. It just isn't popular. No one cares for it outside the internet reviewer bubble. Period. The only front end UX's consumers care about are iOS and Touchwiz, the rest of the popular skins (Chinese OEMs) are copies from those two. This is a fact no one can refute, no matter how much you're brainwashed to believe otherwise.
Quick updates have nothing to do with stock. Google builds their code against their own phones, that's why they get a head start, and even then, their updates are packed full of bugs, lots of them being major. Apple has been copying that from Google for some reason recently (sorry, had to throw that in). Google only supports phones for 2 years with updates, just like Samsung, they've only just recently promised to support them for 3 major updates, after promising not to f*** with the underlying code too much after Oreo. You can expect similar update support from Samsung and other OEMs that launch devices with Oreo, and probably even more so, since Samsung and others are the ones who pushed Google for the monthly patch initiative. The GS5 and Note4 are still getting updates to this day. Get this: NO ONE WANTS THE LATEST VERSION OF ANDROID. No one wants to deal with the software bugs and instability. Look at iPhone users. Security updates are what counts.
Pixels aren't faster than other Android devices, it's quite the opposite actually, and the smoothness you talk about (after lots of research) comes from increasing buffers, aggressive scheduling and other hacks to the OS that do more harm than good in the longer run. Android absolutely does NOT perform great out of the box. OEMs and other ROM devs have to literally fix Android before it's ready for prime time for actual users who want popular features and things to just work.
Android as an OS still has ways to go before reaching iOS and Windows Mobile level of efficiency and UI fluidity in an elegant manner. Massive re-writes need to be done. Even Google are seemingly giving up on all that and developing a new OS from scratch. Let's see where that goes.
Listen, I get it, Youtubers and the photographers and lawyers at the Verge are promoting Google's failure devices hard, but that doesn't make them any good. You need to understand these guys are businesses that make money from what they recommend, and have absolutely no idea what they're talking about. Most of the enthusiasts on XDA fall in the same category. I seriously wouldn't recommend a Pixel for someone I care about. They're the worst thing you can spend ~$850 on, they're basically a rip-off.
If you're spending more than $600 on a phone, do yourself a favor and get a Galaxy S/Note or an iPhone. Nothing else is worth the premium.
Zeratul56 - Tuesday, January 23, 2018 - link
I must say that I agree on your point about Samsung devices over pixels. I never understood the positive feedback for google branded devices that are released months after the Samsung phones with the same SoCs and a few less features usually. Great marketing I guess.
Windows mobile was a well architected platform, especially with the addition of .net native. However, sometimes technology is not the only factor in success and as always with Microsoft, their timing was terrible.
french toast - Wednesday, January 24, 2018 - link
I agree pixel phones are a rip off..just like majority of apple devices.
But let's be clear, if you value speed and responsiveness, touchwhiz is the worst..by far.
I've had many Samsungs, used other people's new Samsungs, and I've had nexus phones, night and day difference in responsiveness..I have a low end mi max 2..containing a puny Snapdragon 620?....smooth as butter..much faster than any Samsung device I've ever had or used...the fastest phones on the market for all round smoothness, web browsing and general app opening are OnePlus phones..the software is tuned to lightning speeds, it bests all other phones with same hardware but different software.. including Google's own rip off pixels.
Touchwhiz seems to me from using decent skins...to have a couple of milliseconds lag and some jank and stutter, just my personal experience.
generalako - Wednesday, January 24, 2018 - link
Maybe it's time you listen to the people who actually write their opinion about this matter based on their experience. I buy and sell phones for a living, and therefore get to own and test virtually every flagship phone out there. That includes all the Samsung Galaxy S flagships out there and all the Google Pixels and Nexuses. I also have a pretty large family, who have been Samsung Galaxy owners since the S2 first came out, with several Galaxy S and Note flagships in the house at all times. And not one of my family members or friends whom I've given OnePlus 3/5's (runs OxygenOS, which is not as smooth as Pixel UI, but close to it) or Pixels to, have preferred TouchWiz over them. They all have commented on the latter units being much more smooth and consistent with their use. On the other hand, over the years of using Samsung flagships, they have constantly complained about lag, slowdown over time, freezes, phones being completely useless after a while or after an update, etc., and have often bought new Samsung flagships because of it.
My experience is similar. Although Samsung's flagships are without a doubt the best phones out there in terms of hardware (best displays, top notch cameras, great CPUs that will now become market leading, amazing designs, etc.), the software just doesn't cut it. Even with the new Samsung Experience introduced by Note 8, in which TouchWiz has become a whole lot smoother, they're still a fair amount behind Pixel UI in smoothness. And you notice it everywhere and in everything, if you've ever used a Pixel phone.
You claim reviewers are sell-outs to Google, which is kind of ridiculous. I completely agree reviewers aren't really as neutral as they claim/seem, and cater to the industry in that they act as indirect advertisement for the products they showcase (this is even more true in consumer reviews than it is in journalism about politics -- you can read about how it works in detail in Manufacturing Consent). But Google doesn’t even have 1% market share in the smartphone industry; Samsung has well over 20%, if I remember correctly. Google are completely dwarfed by Samsung in this area. If there’s an influence on reviewers on how they talk about a unit, which we both agree there is, who do you think has most influence on those reviewers? Clearly, Samsung (and Apple) – not Google. Samsung is not just a larger company, but also provides units with more popularity among viewers/readers than Pixels to these reviewers. Not to mention more units as well. They have the most influence on reviewers of any OEM.
The fact that so many reviewers have Pixels as their daily drivers speaks for itself. These people even explain why this is the case: the software. The software defines how you use a device. And in that regard, the Pixel UI is simply better than TouchWiz. Maybe not in total software features. But in actual usage, it is prominently smoother and more consistent; everything works more effortlessly and reliably every time, every day, every week and every month. I know this for a fact based on my own experience with flagship phones from all the large OEMs.
Maybe software isn’t as important to you (or maybe you just haven’t tried the Pixel – which I suspect is the case). But it is to a lot of other people. And I know for a fact that when I introduced people who don’t know about the Pixel to it, they appreciate its software experience more than any other phone they’ve used.
generalako - Wednesday, January 24, 2018 - link
The level of absurdity in your post is ridiculous.
"Stock doesn't sell", what? Did you ever think that the correlation between marketing costs, like advertisements, and sales of units, has something to do with it? The high marketing costs of Samsung and Apple over the years, as well as appropriation of market share, is why their phones are so popular. It's not down to any independent popularity, no more than McDonald's superiority as the most popular food chain is down to their food being the best out there of any restaurant?
As for updates, Google’s updates are nowhere near as buggy as Apple's have been in recent years (iOS 11, for example, is a complete disaster). Nor are they as buggy as when OEMs like Samsung update their phones to newer Android versions, which they always implement poorly.
The effect of the updates on older devices is also different. I've had a wide range of Nexus devices over the years, and almost all of them have only gotten better with newer updates, as opposed to Apple iPhone and iPad units, or quite a few Samsung units. Nexus 5 just got better with newer updates, increasing in smoothness and battery life. Nexus 5X recently got quite a lot better battery because of the Oreo update. The update last year with HDR+ also made the camera better. Not to mention a general improvement in smoothness as well with Nougat and Oreo updates. My Nexus 7 kept getting better for every update as well.
You also claim iOS to be smoother, which is again wrong. Here you clearly reveal that you haven't used the Pixel or Pixel 2. Because if you did, you'd know that stock Android, or at least Pixel UI, has been smoother than iOS in general since Nougat. The rest of the garbage that OEMs deliver certainly isn’t smoother (EMUI, MIUI, LG UX, TouchWiz, Sense, etc) – but stock Android is. That is, it overall has fewer microstutters, frame drops and jitter and jank in animation and tasks than iOS. Sure, iOS is better in certain important instances, like scrolling, zooming -- the browser experience on iOS is far superior. But overall, Android is a smoother experience. I know this because I use devices on both platforms, and have been doing so for years. Hell, even TouchWiz is getting close to iOS in smoothness with the recent Samsung Experience update (which the Note 8 currently runs, and S8 will do with the Oreo update).