Investigating Performance of Multi-Threading on Zen 3 and AMD Ryzen 5000

Author: Dr. Ian Cutress

by Dr. Ian Cutress on December 3, 2020 10:00 AM EST


CPU Performance

For simplicity, we are listing the percentage performance differentials in all of our CPU testing – the number shown is the % performance of having SMT2 enabled compared to having the setting disabled. Our benchmark suite consists of over 120 tests, full details of which can be found in our #CPUOverload article.
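As a rough illustration of how such a relative figure can be derived, here is a minimal sketch. The article does not state its exact formula, so the function name, the `higher_is_better` flag, and the raw numbers below are all hypothetical, invented purely for the example.

```python
def smt_relative_percent(smt_off: float, smt_on: float,
                         higher_is_better: bool = True) -> float:
    """Return SMT-on performance as a percentage of the SMT-off baseline.

    For score-style results (higher is better) the ratio is on/off.
    For time-style results (lower is better) the ratio is inverted,
    so that a value above 100% always means SMT-on was faster.
    """
    if smt_off <= 0 or smt_on <= 0:
        raise ValueError("results must be positive")
    ratio = smt_on / smt_off if higher_is_better else smt_off / smt_on
    return 100.0 * ratio


# Hypothetical raw results, not taken from the article.
score_pct = smt_relative_percent(smt_off=8000.0, smt_on=9488.0)
time_pct = smt_relative_percent(smt_off=100.0, smt_on=80.0,
                                higher_is_better=False)

print(f"score-based test: {score_pct:.1f}%")
print(f"time-based test:  {time_pct:.1f}%")
```

Inverting the ratio for time-based tests keeps every row in the tables readable the same way: the SMT-off baseline is 100%, and anything above it is a gain from enabling SMT.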

Here are the single threaded results.

Single Threaded Tests AMD Ryzen 9 5950X
AnandTech	SMT Off Baseline	SMT On
y-Cruncher	100%	99.5%
Dwarf Fortress	100%	99.9%
Dolphin 5.0	100%	99.1%
CineBench R20	100%	99.7%
Web Tests	100%	99.1%
GeekBench (4+5)	100%	100.8%
SPEC2006	100%	101.2%
SPEC2017	100%	99.2%

Interestingly enough, our single threaded performance was within roughly a single percentage point across the stack (SPEC2006 showed the largest difference, at +1.2%). Given that SMT-off mode should arguably give more resources to each thread for consistency, the fact that we see no meaningful difference means that AMD’s implementation of giving a single thread access to all the resources even in SMT mode is quite good.

The multithreaded tests are a bit more diverse:

Multi-Threaded Tests AMD Ryzen 9 5950X
AnandTech	SMT Off Baseline	SMT On
Agisoft Photoscan	100%	98.2%
3D Particle Movement	100%	165.7%
3DPM with AVX2	100%	177.5%
y-Cruncher	100%	94.5%
NAMD AVX2	100%	106.6%
AIBench	100%	88.2%
Blender	100%	125.1%
Corona	100%	145.5%
POV-Ray	100%	115.4%
V-Ray	100%	126.0%
CineBench R20	100%	118.6%
HandBrake 4K HEVC	100%	107.9%
7-Zip Combined	100%	133.9%
AES Crypto	100%	104.9%
WinRAR	100%	111.9%
GeekBench (4+5)	100%	109.3%

Here we have a number of different factors affecting the results.

Starting with the two tests that scored statistically worse with SMT2 enabled: y-Cruncher and AIBench. Both tests are memory-bound and compute-bound in parts, where the memory bandwidth per thread can become a limiting factor in overall run-time. y-Cruncher is arguably a math synthetic benchmark, and AIBench is still a set of early-beta AI workloads for Windows, so both are quite far away from real world use cases.

Most of the rest of the benchmarks show a gain of between +5% and +35%, which includes a number of our rendering tests, molecular dynamics, video encoding, compression, and cryptography. This is where we can see both threads on each core interleaving inside the buffers and execution units, which is the goal of an SMT design. There are still some bottlenecks in the system that stop both threads from getting absolute full access, which could be buffer size, retire rate, op-queue limitations, memory limitations, and so on; each benchmark is likely different.

The outliers are 3DPM, 3DPM with AVX2, and Corona. All three gain 45% or more, with both 3DPM variants gaining 66% or more. These tests are very light on cache and memory requirements, and put the increased Zen 3 execution port distribution to good use. These benchmarks are compute heavy as well, so splitting some of that memory access and compute in the core helps SMT2 designs mix those operations to greater effect. The fact that 3DPM in AVX2 mode gets a higher benefit might be down to coalescing operations for an AVX2 load/store implementation: there is less waiting to pull data from the caches, and less contention, which adds some extra performance.
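The grouping described above can be reproduced from the multi-threaded table with a short sketch. The 45% outlier threshold comes from the text; the 2% noise band is an assumption added here (the article does not give a margin of error), chosen only so that Agisoft Photoscan at 98.2% is treated as roughly neutral rather than as a regression.

```python
# SMT-on results from the multi-threaded table (SMT off = 100%).
MT_RESULTS = {
    "Agisoft Photoscan": 98.2,
    "3D Particle Movement": 165.7,
    "3DPM with AVX2": 177.5,
    "y-Cruncher": 94.5,
    "NAMD AVX2": 106.6,
    "AIBench": 88.2,
    "Blender": 125.1,
    "Corona": 145.5,
    "POV-Ray": 115.4,
    "V-Ray": 126.0,
    "CineBench R20": 118.6,
    "HandBrake 4K HEVC": 107.9,
    "7-Zip Combined": 133.9,
    "AES Crypto": 104.9,
    "WinRAR": 111.9,
    "GeekBench (4+5)": 109.3,
}

NOISE_BAND = 2.0     # assumption: differences under 2% count as noise
OUTLIER_GAIN = 45.0  # from the text: the outliers gain 45% or more


def classify(percent: float) -> str:
    """Bucket one SMT-on result relative to the 100% SMT-off baseline."""
    delta = percent - 100.0
    if abs(delta) < NOISE_BAND:
        return "neutral"
    if delta < 0:
        return "regression"
    if delta >= OUTLIER_GAIN:
        return "outlier"
    return "moderate gain"


# Group test names by bucket, preserving table order.
groups: dict[str, list[str]] = {}
for name, percent in MT_RESULTS.items():
    groups.setdefault(classify(percent), []).append(name)

for label, names in groups.items():
    print(f"{label}: {', '.join(names)}")
```

With these thresholds, the two regressions and three outliers named in the text fall out directly, and the remaining ten tests land in the moderate group.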

Overall

In an ideal world, both threads on a core will have full access to all resources, and not block each other. However, that just means that the second thread looks like it has its own core completely. The reverse SMT method, of using one global core and splitting it into virtual cores with no contention, is known as VISC, and the company behind that was purchased by Intel a few years ago, but nothing has come of it yet. For now, we have SMT, and by design it will accelerate some key workloads when enabled.

In our CPU results, the single threaded benchmarks showed no uplift with SMT enabled/disabled in our real-world or synthetic workloads. This means that even in SMT enabled mode, if one thread is running, it gets everything the core has on offer.

For multi-threaded tests, there is clearly a spectrum of workloads that benefit from SMT.

Those that don’t are either hyper-optimized on a one-thread-per-core basis, or memory latency sensitive.

Most real-world workloads see a modest uplift, averaging 22%. Rendering and ray tracing can vary depending on the engine, and how much bandwidth/cache/core resources each thread requires, potentially moving the execution bottleneck somewhere else in the chain. Execution-limited tests that don’t probe memory or the cache at all, which to be honest are most likely to be hyper-optimized compute workloads, scored up to +77% in our testing.
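For readers who want to check an average against the table, the sketch below computes both an arithmetic and a geometric mean over all sixteen multi-threaded results. Which tests feed the 22% figure quoted above is not stated, so this is not a reproduction of that number: the plain arithmetic mean over the full table works out to about 120.6%, and a different subset or weighting would give a different value.

```python
from statistics import fmean, geometric_mean

# SMT-on results from the multi-threaded table, in table order (SMT off = 100%).
SMT_ON_PERCENT = [
    98.2, 165.7, 177.5, 94.5, 106.6, 88.2, 125.1, 145.5,
    115.4, 126.0, 118.6, 107.9, 133.9, 104.9, 111.9, 109.3,
]

# Arithmetic mean: simple average of the percentages.
arith = fmean(SMT_ON_PERCENT)

# Geometric mean: often preferred for averaging ratios, because a few
# very large gains pull it up less than they pull up the arithmetic mean.
geo = geometric_mean(SMT_ON_PERCENT)

print(f"arithmetic mean: {arith:.1f}% ({arith - 100:+.1f}%)")
print(f"geometric mean:  {geo:.1f}% ({geo - 100:+.1f}%)")
```

The geometric mean comes out lower than the arithmetic mean here because the two 3DPM results dominate the simple average.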


126 Comments


abufrejoval - Thursday, December 3, 2020 - link
It's hard to imagine a transistor defect that would break *only* SMT. As you say all non-SMT chips are really SMT chips internally and the decision to disable SMT doesn't really result in huge chunks of transistors going dark (the potential target area for physical defects).

I'd say most of the SMT vs. no-SMT decisions on individual CPUs are binning related: SMT can create significantly more heat because there is less idle which allows the chip to cool. So if you have a chip with higher resistance in critical vias and require higher voltage to function, you need to sacrifice clocks, TDP or utilization (and permutations).
leexgx - Saturday, December 5, 2020 - link
With HT off I have definitely noticed less smoothness in Windows, as with HT it can keep the CPU active when a thread is slightly stuck
iranterres - Thursday, December 3, 2020 - link
Why are people still testing SMT in 2020? Cache coherency and hierarchy design is mature enough to offset the possible instruction bottleneck issues. I don't even know the purpose of this article at all... Anyways, perhaps falling back to 2008? Come on...
quadibloc - Friday, December 4, 2020 - link
Well, instead of testing the concept of SMT, which has been around for a while, perhaps one could think of it as testing the implementation of SMT found on the chips we can get in 2020.
eastcoast_pete - Friday, December 4, 2020 - link
Thanks Ian! I always thought of SMT as a way of using whatever compute capacity a core has, but isn't being used in the moment. Hence it's efficient if many tasks need doing that each don't take a full core most of the time. However, that hits a snag if the cores get really busy. Hence (for desktop or laptop), 6 or 8 real cores are usually better than 4 cores that pretend to be 8.
AntonErtl - Friday, December 4, 2020 - link
I found the "Is SMT a good thing" discussion (and later discussion of the same topics) strange, because it seemed to take the POV of someone who wants to optimize some efficiency or utilization metric, or of someone who can choose the number of resources in the core. If you are in that situation, then the take of the EV8 designers was: we build a wide machine so that single-threaded applications can run fast, even though we know the wideness leads to low utilization; we also add SMT so that multi-threaded applications can increase utilization. Now, 20 years later, such wide cores become reality, although interestingly Apple and ARM do not add SMT.

Anyway, buyers and users of off-the-shelf CPUs are not in that situation, and for them the questions are: For buyers: How much benefit does the SMT capability provide, and is it worth the extra money? For users: Does disabling SMT on this SMT-capable CPU increase the performance or the efficiency?

The article shows that the answers to these questions depend on the application (although for the Zen3 CPUs available now the buyer's question does not pose itself).

It would be interesting to see whether the wider Zen3 design gives significantly better SMT performance than Zen or Zen2 (and maybe also a comparison with Intel), but that would require also testing these CPUs.

I did not find it surprising that the 5950X runs into the power limit with and without SMT. The resulting clock rates are mentioned in the text, but might be more interesting graphically than the temperature. What might also be interesting is the power consumed at the same clock frequency (maybe with fewer active cores and/or the clock locked at some lower clock rate).

If SMT is so efficient (+91%) for 3DPMavx, why does the graphics only show a small difference?
Bensam123 - Friday, December 4, 2020 - link
Anand, while I value your in depth articles you guys really need to drop the 95th percentile frame times and get on board with 1% and .1% lows. What disrupts gaming the most is the hiccups, not looking at a statistically smooth chart. SMT/HT affects these THE most, especially in heavily single threaded games. If you aren't testing what it influences, why test it at all? Youtube reviews are also having problems with tests that don't reflect real world scenarios. Sometimes it's a lot more disagreeable than others.

Completely invalid testing methodology at this point.

My advice based on my own testing. You turn off SMT/HT except in scenarios in which you become CPU bound, across all cores, not one. This improved .1 and 1% frame time... IE stutters. You turn it on when you reach a point of 90%+ utilization as it helps and a lot when your CPU is maxed out. Generally speaking <6 and soon to be 8 cores should always have it on.

You didn't even test where this helps the most and that's low end CPUs vs high end CPUs where you find the Windows scheduler messes things up.

Also if you're testing this on your own, always turn it off in the bios. If you use something like process lasso or manually change affinity, windows will still put protected services and process onto those extra virtual cores causing contention issues that lead to the stuttering.

Most obvious games that get a benefit from SMT/HT off are heavily single threaded games, such as MOBAS.
Gloryholle - Friday, December 4, 2020 - link
Testing Zen3 with 3200CL16?
peevee - Friday, December 4, 2020 - link
"Most modern processors, when in SMT-enabled mode, if they are running a single instruction stream, will operate as if in SMT-off mode and have full access to resources."

Which would have access to the whole microinstruction cache (L0I) in SMT mode?
Arbie - Friday, December 4, 2020 - link
Another excellent AT article, which happens to hit my level of knowledge and interest; thanks!
