NVIDIA Launches A2 Accelerator: Entry-Level Ampere For Edge Inference

by Ryan Smith on November 9, 2021 7:30 AM EST

Alongside a slew of software-related announcements this morning from NVIDIA as part of their fall GTC, the company has also quietly announced a new server GPU product for the accelerator market: the NVIDIA A2. The new low-end member of the Ampere-based A-series accelerator family is designed for entry-level inference tasks, and thanks to its relatively small size and low power consumption, it is being aimed at edge computing scenarios as well.

Along with serving as the low-end entry point into NVIDIA’s GPU accelerator product stack, the A2 seems intended to largely replace the last remaining member of NVIDIA’s previous-generation cards, the T4. Though a bit of a higher-end card, the T4 was designed for many of the same inference workloads, and came in the same HHHL single-slot form factor. So the release of the A2 finishes the Ampere-ification of NVIDIA’s accelerator lineup, giving NVIDIA’s server customers a fresh entry-level card.

NVIDIA ML Accelerator Specification Comparison
	A100	A30	A2
FP32 CUDA Cores	6912	3584	1280
Tensor Cores	432	224	40
Boost Clock	1.41GHz	1.44GHz	1.77GHz
Memory Clock	3.2Gbps HBM2e	2.4Gbps HBM2	12.5Gbps GDDR6
Memory Bus Width	5120-bit	3072-bit	128-bit
Memory Bandwidth	2.0TB/sec	933GB/sec	200GB/sec
VRAM	80GB	24GB	16GB
Single Precision	19.5 TFLOPS	10.3 TFLOPS	4.5 TFLOPS
Double Precision	9.7 TFLOPS	5.2 TFLOPS	0.14 TFLOPS
INT8 Tensor	624 TOPS	330 TOPS	36 TOPS
FP16 Tensor	312 TFLOPS	165 TFLOPS	18 TFLOPS
TF32 Tensor	156 TFLOPS	82 TFLOPS	9 TFLOPS
Interconnect	NVLink 3 (12 Links)	PCIe 4.0 x16 + NVLink 3 (4 Links)	PCIe 4.0 x8
GPU	GA100	GA100	GA107
Transistor Count	54.2B	54.2B	?
TDP	400W	165W	40W-60W
Manufacturing Process	TSMC 7N	TSMC 7N	Samsung 8nm
Form Factor	SXM4	FHFL-DS PCIe	HHHL-SS PCIe
Architecture	Ampere	Ampere	Ampere

Going by NVIDIA’s official specifications, the A2 appears to be using a heavily cut-down version of their low-end GA107 GPU. With only 1280 CUDA cores (and 40 tensor cores), the A2 is only using about half of GA107’s capacity. But this is consistent with the card’s size- and power-optimized goals. A2 only draws 60W out of the box, and can be configured to drop down even further, to 40W.
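
As a rough sanity check on the single precision figures in the table, peak FP32 throughput can be estimated from the core count and boost clock. The short Python sketch below assumes each CUDA core retires one fused multiply-add (2 FLOPs) per clock, which is the usual convention for these numbers; the function name and the SPECS dictionary are illustrative only, with the values taken from the table.

```python
# Peak FP32 estimate: CUDA cores * 2 FLOPs per clock (one FMA) * boost clock.
# Core counts and boost clocks are taken from the specification table;
# the factor of 2 is an assumption (one fused multiply-add per core per clock).
SPECS = {
    "A100": (6912, 1.41),  # (FP32 CUDA cores, boost clock in GHz)
    "A30": (3584, 1.44),
    "A2": (1280, 1.77),
}


def peak_fp32_tflops(cuda_cores: int, boost_ghz: float) -> float:
    """Return the estimated peak FP32 throughput in TFLOPS."""
    return cuda_cores * 2 * boost_ghz / 1000.0


for name, (cores, clock) in SPECS.items():
    print(f"{name}: {peak_fp32_tflops(cores, clock):.1f} TFLOPS")
# A100: 19.5 TFLOPS
# A30: 10.3 TFLOPS
# A2: 4.5 TFLOPS
```

Under that assumption, the estimates line up with the 19.5, 10.3, and 4.5 TFLOPS figures listed above.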

In contrast to the cut-down compute cores, NVIDIA is keeping GA107’s full memory bus for the A2 card. The 128-bit memory bus is paired with 16GB of GDDR6, which is clocked at a slightly unusual 12.5Gbps. This works out to a flat 200GB/second of memory bandwidth, so it would seem someone really wanted to have a nice, round number there.
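
The 200GB/second figure follows directly from the bus width and the per-pin data rate. A minimal Python sketch of that arithmetic is below; the function name is made up for illustration, and the inputs are the A2 values from the table.

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/sec.

    Each pin on the bus carries data_rate_gbps gigabits per second,
    so multiply by the bus width and divide by 8 to convert bits to bytes.
    """
    return bus_width_bits * data_rate_gbps / 8


# A2: 128-bit bus, 12.5Gbps GDDR6
print(memory_bandwidth_gbs(128, 12.5))  # 200.0
```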

Otherwise, as previously mentioned, this is a PCIe card in a half-height, half-length, single-slot (HHHL-SS) form factor. And like all of NVIDIA’s server cards, the A2 is passively cooled, relying on airflow from the host chassis. Speaking of the host, GA107 only offers 8 PCIe lanes, so the card gets a PCIe 4.0 x8 connection back to its host CPU.
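
To put the x8 link in context, the Python sketch below estimates the theoretical per-direction bandwidth of a PCIe 4.0 connection. The 16 GT/s per-lane rate and 128b/130b line encoding are PCIe 4.0 characteristics brought in as assumptions from outside the article, and the function name is illustrative; real-world throughput will be lower once protocol overhead is included.

```python
# Assumed PCIe 4.0 characteristics (not stated in the article):
PCIE4_GT_PER_SEC = 16.0           # raw transfer rate per lane, in GT/s
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line encoding


def pcie4_bandwidth_gbs(lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/sec for a PCIe 4.0 link."""
    usable_gbps = PCIE4_GT_PER_SEC * ENCODING_EFFICIENCY * lanes
    return usable_gbps / 8  # bits to bytes


print(f"x8:  {pcie4_bandwidth_gbs(8):.2f} GB/sec")   # x8:  15.75 GB/sec
print(f"x16: {pcie4_bandwidth_gbs(16):.2f} GB/sec")  # x16: 31.51 GB/sec
```

So under these assumptions the A2's x8 link tops out at roughly half of what an x16 card such as the A30 can move over the same generation of PCIe.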

Wrapping things up, according to NVIDIA the A2 is available immediately. NVIDIA does not provide public pricing for its server cards, but the new accelerator should be available through NVIDIA’s regular OEM partners.

Source: NVIDIA

16 Comments

sabot00 - Tuesday, November 9, 2021 - link
Can this play games if one virtualizes the card (to get around no display out)?
RU482 - Tuesday, November 9, 2021 - link
I doubt you'd be blown away by the performance if it could
mode_13h - Tuesday, November 9, 2021 - link
Yes, quite likely. Given that it's based on a GA107, performance would be similar to a RTX 3050.
Spunjji - Wednesday, November 10, 2021 - link
About 2/3 of an RTX 3050 when you balance out the clock speed and core count differences.
catavalon21 - Tuesday, November 9, 2021 - link
I'm trying to figure out the purpose of this card. With a 1/32 Double Precision ratio and a core count not seen so small in accelerators since Maxwell - what niche is this filling? As a T4 successor it falls a bit short in comparison. Less memory bandwidth. Fewer cores. Less compute capability. Just...why?
mode_13h - Tuesday, November 9, 2021 - link
As they say, its niche is inference. For that, you mostly need int8 and BF16.

Compared to their other Tesla products, what it gives you is a GPU that can run with no aux power and has the same CUDA capability as the rest of their Ampere line. That includes tensor cores with BF16, which I think T4 is lacking.
mode_13h - Tuesday, November 9, 2021 - link
> Less compute capability.

Uh, those words have a very specific meaning, in Nvidia/CUDA context. What I think you mean is less compute capacity or performance.

https://en.wikipedia.org/wiki/CUDA#Version_feature...
catavalon21 - Thursday, November 11, 2021 - link
point taken
Gigaplex - Tuesday, November 9, 2021 - link
Replacement for the low end Quadro line, often used in workstation grade office boxes that don't need much GPU power, but need something with certified drivers (Intel support is garbage).
Gigaplex - Tuesday, November 9, 2021 - link
Scratch that, I didn't see that this line doesn't have the display outputs. Stupid NVIDIA claiming that Quadro will be replaced by the A model line and then mixing it up with stuff like this.
