Nvidia Ampere Architecture

 



Ampere is the codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to both the Volta and Turing architectures, officially announced on May 14, 2020. It is named after the French mathematician and physicist André-Marie Ampère. Nvidia announced the GeForce 30 series consumer GPUs at a GeForce Special Event on September 1, 2020, followed by the A100 80GB GPU at SC20 on November 16, 2020. Mobile RTX graphics cards and the RTX 3060 were revealed on January 12, 2021. At GPU Technology Conference 2021, Nvidia previewed "Ampere Next Next" for a 2024 release, and at GTC 2022 it announced Ampere's successor, Hopper.


Ampere Graphics Processor Lineup

Highlights:


Third-Generation Tensor Cores


First introduced in the NVIDIA Volta™ architecture, NVIDIA Tensor Core technology has brought dramatic speedups to AI, bringing down training times from weeks to hours and providing massive acceleration to inference. The NVIDIA Ampere architecture builds upon these innovations by bringing new precisions—Tensor Float 32 (TF32) and floating point 64 (FP64)—to accelerate and simplify AI adoption and extend the power of Tensor Cores to HPC.

TF32 works just like FP32 while delivering speedups of up to 20X for AI without requiring any code change. With NVIDIA Automatic Mixed Precision and FP16, researchers can gain an additional 2X performance by adding just a couple of lines of code. And with support for bfloat16, INT8, and INT4, Tensor Cores in NVIDIA Ampere architecture GPUs make a versatile accelerator for both AI training and inference. Bringing the power of Tensor Cores to HPC, A100 and A30 GPUs also enable matrix operations in full, IEEE-compliant FP64 precision.
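To make the "works just like FP32" claim more concrete, here is a minimal sketch of what TF32 keeps from an FP32 value: the same 8-bit exponent, but only the top 10 of the 23 mantissa bits. The function name `to_tf32` is hypothetical, and the sketch uses plain truncation as a simplifying assumption; the rounding applied by actual hardware may differ.

```python
import struct

def to_tf32(x: float) -> float:
    """Emulate TF32 storage by truncating an FP32 value to 10 mantissa bits.

    Assumption: simple truncation; real hardware rounding may differ.
    """
    # Round the Python float (FP64) to FP32 and read its raw 32-bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # FP32 has 23 mantissa bits; TF32 keeps the top 10, so clear the low 13.
    bits &= 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# A step of 2**-10 above 1.0 survives; a step of 2**-11 is lost.
print(to_tf32(1.0 + 2**-10))
print(to_tf32(1.0 + 2**-11))
# The FP32-sized exponent keeps large magnitudes finite.
print(to_tf32(1e38))
```

The trade-off this illustrates: TF32 keeps the dynamic range of FP32 but carries roughly three decimal digits of precision, which is why it can stand in for FP32 in many AI workloads without code changes.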


Third-Generation NVLink


Scaling applications across multiple GPUs requires extremely fast movement of data. The third generation of NVIDIA® NVLink® in the NVIDIA Ampere architecture doubles the GPU-to-GPU direct bandwidth to 600 gigabytes per second (GB/s), almost 10X higher than PCIe Gen4. When paired with the latest generation of NVIDIA NVSwitch™, all GPUs in the server can talk to each other at full NVLink speed for incredibly fast data transfers. 

NVIDIA DGX™ A100 and servers from other leading computer makers take advantage of NVLink and NVSwitch technology via NVIDIA HGX™ A100 baseboards to deliver greater scalability for HPC and AI workloads.
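The 600 GB/s figure and the "almost 10X" comparison can be checked with a short back-of-the-envelope calculation. This is a sketch under stated assumptions: the 50 Gbit/s per-pair rate is the NVLink 3.0 figure quoted in this article, while the number of signal pairs per link, the number of links per GPU, and the PCIe Gen4 x16 parameters are assumptions not given in the text.

```python
# NVLink 3.0 per-pair signaling rate, as quoted in this article.
GBIT_PER_PAIR = 50
# Assumptions (not stated in the article):
PAIRS_PER_LINK_PER_DIRECTION = 4   # signal pairs in each direction of one link
LINKS_PER_GPU = 12                 # NVLink links on one A100

# Bandwidth of one link in one direction, in gigabytes per second.
link_one_way_gb_s = GBIT_PER_PAIR * PAIRS_PER_LINK_PER_DIRECTION / 8
# Both directions, summed over all links.
nvlink_total_gb_s = 2 * link_one_way_gb_s * LINKS_PER_GPU

# PCIe Gen4 x16 (assumed): 16 GT/s per lane, 128b/130b encoding, 16 lanes.
pcie_one_way_gb_s = 16 * (128 / 130) * 16 / 8
pcie_total_gb_s = 2 * pcie_one_way_gb_s

ratio = nvlink_total_gb_s / pcie_total_gb_s

print(f"NVLink total: {nvlink_total_gb_s:.0f} GB/s")
print(f"PCIe Gen4 x16 total: {pcie_total_gb_s:.1f} GB/s")
print(f"Ratio: {ratio:.1f}x")
```

Under these assumptions the ratio lands between 9 and 10, which is consistent with the "almost 10X" wording.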



Second-Generation RT Cores


The NVIDIA Ampere architecture’s second-generation RT Cores in the NVIDIA A40 deliver massive speedups for workloads like photorealistic rendering of movie content, architectural design evaluations, and virtual prototyping of product designs. RT Cores also speed up the rendering of ray-traced motion blur for faster results with greater visual accuracy and can simultaneously run ray tracing with either shading or denoising capabilities.




Architectural improvements of the Ampere architecture include the following:


  • CUDA Compute Capability 8.0 for A100 and 8.6 for the GeForce 30 series
  • TSMC's 7 nm FinFET process for A100
  • Custom version of Samsung's 8 nm process (8N) for the GeForce 30 series
  • Third-generation Tensor Cores with FP16, bfloat16, TensorFloat-32 (TF32) and FP64 support and sparsity acceleration. Each Tensor Core performs 256 FP16 FMA operations per clock, 4x the processing power of previous Tensor Core generations (GA100 only; 2x on GA10x); the Tensor Core count is reduced to one per SM partition (four per SM).
  • Second-generation ray tracing cores; concurrent ray tracing, shading, and compute for the GeForce 30 series
  • High Bandwidth Memory 2 (HBM2) on A100 40GB & A100 80GB
  • GDDR6X memory for GeForce RTX 3090, RTX 3080 Ti, RTX 3080, RTX 3070 Ti
  • Double FP32 cores per SM on GA10x GPUs
  • NVLink 3.0 with a throughput of 50 Gbit/s per pair
  • PCI Express 4.0 with SR-IOV support (SR-IOV is available only on A100)
  • Multi-instance GPU (MIG) virtualization and GPU partitioning feature in A100 supporting up to seven instances
  • PureVideo feature set K hardware video decoding with AV1 hardware decoding for the GeForce 30 series and feature set J for A100
  • 5 NVDEC for A100
  • New hardware-based 5-core JPEG decoder (NVJPG) supporting YUV420, YUV422, YUV444, YUV400, and RGBA. It should not be confused with Nvidia NVJPEG, the GPU-accelerated library for JPEG encoding and decoding
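As a rough illustration of what the Tensor Core figure in the list above implies, the sketch below turns 256 FP16 FMA operations per Tensor Core into a chip-level peak. It treats the 256 figure as a per-clock rate, and the Tensor Core count per SM, the SM count, and the boost clock are all assumptions for an A100 that do not appear in this article. The function name is hypothetical.

```python
FMA_PER_TENSOR_CORE_PER_CLOCK = 256  # from the list above, read as per clock
# Assumptions for an A100 (not stated in the article):
TENSOR_CORES_PER_SM = 4
SM_COUNT = 108
BOOST_CLOCK_HZ = 1.41e9

def peak_dense_fp16_tflops(fma_per_core, cores_per_sm, sm_count, clock_hz):
    """Peak dense FP16 Tensor Core throughput in TFLOPS.

    One fused multiply-add (FMA) counts as two floating-point operations.
    """
    fma_per_clock = fma_per_core * cores_per_sm * sm_count
    return fma_per_clock * 2 * clock_hz / 1e12

a100_tflops = peak_dense_fp16_tflops(
    FMA_PER_TENSOR_CORE_PER_CLOCK, TENSOR_CORES_PER_SM, SM_COUNT, BOOST_CLOCK_HZ
)
print(f"Estimated peak dense FP16: {a100_tflops:.1f} TFLOPS")
```

This is a theoretical peak only; sustained throughput depends on memory bandwidth, clocks under load, and how well a workload maps onto matrix operations.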




A Powerful Ampere GPU:




Nvidia GeForce RTX 3090 Ti

The GeForce RTX 3090 Ti is an enthusiast-class graphics card by NVIDIA, launched on January 27th, 2022. Built on the 8 nm process and based on the GA102 graphics processor in its GA102-350-A1 variant, the card supports DirectX 12 Ultimate. This ensures that all modern games will run on the GeForce RTX 3090 Ti. Additionally, the DirectX 12 Ultimate capability guarantees support for hardware ray tracing, variable-rate shading, and more in upcoming video games.

The GA102 graphics processor is a large chip with a die area of 628 mm² and 28,300 million transistors. It features 10752 shading units, 336 texture mapping units, and 112 ROPs. Also included are 336 tensor cores, which help improve the speed of machine learning applications, and 84 ray tracing acceleration cores. NVIDIA has paired 24 GB of GDDR6X memory with the GeForce RTX 3090 Ti, connected using a 384-bit memory interface. The GPU operates at a frequency of 1560 MHz, which can be boosted up to 1860 MHz; the memory runs at 1313 MHz (21 Gbps effective).

The NVIDIA GeForce RTX 3090 Ti is a triple-slot card that draws power from one 16-pin power connector, with power draw rated at 450 W maximum. Display outputs include 1x HDMI 2.1 and 3x DisplayPort 1.4a. The card connects to the rest of the system through a PCI-Express 4.0 x16 interface. Its dimensions are 336 mm x 140 mm x 61 mm. Its price at launch was 1999 US Dollars.
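Several headline figures for this card can be derived from the specifications quoted above. The sketch below is only arithmetic on those numbers, with two assumptions that the text does not state: that there is one ray tracing core per SM (so the RT core count equals the SM count), and that each shading unit completes one FMA (two floating-point operations) per clock at the boost frequency.

```python
# Figures quoted in the description above.
SHADING_UNITS = 10752
TENSOR_CORES = 336
RT_CORES = 84
BUS_WIDTH_BITS = 384
MEMORY_GBPS_PER_PIN = 21
BOOST_CLOCK_HZ = 1860e6

# Memory bandwidth: bus width in bytes times the effective per-pin data rate.
memory_bandwidth_gb_s = BUS_WIDTH_BITS / 8 * MEMORY_GBPS_PER_PIN

# Peak FP32 rate, assuming one FMA (2 FLOPs) per shading unit per clock.
peak_fp32_tflops = SHADING_UNITS * 2 * BOOST_CLOCK_HZ / 1e12

# Per-SM breakdown, assuming one RT core per SM.
sm_count = RT_CORES
shading_units_per_sm = SHADING_UNITS // sm_count
tensor_cores_per_sm = TENSOR_CORES // sm_count

print(f"Memory bandwidth: {memory_bandwidth_gb_s:.0f} GB/s")
print(f"Peak FP32: {peak_fp32_tflops:.1f} TFLOPS")
print(f"Per SM: {shading_units_per_sm} shading units, "
      f"{tensor_cores_per_sm} tensor cores")
```

The per-SM breakdown is a useful sanity check: the quoted shading unit and tensor core counts both divide evenly by the 84 RT cores.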



Ampere vs. Turing Architecture


Nvidia's fastest RTX graphics cards have arrived. Ampere GPUs, the successors to Turing, are the most powerful yet, as expected of a new generation. Ray tracing performance in particular has improved substantially.

The Turing architecture introduced ray tracing (RT) cores to accelerate photorealistic rendering. With Ampere, NVIDIA has continued to make significant improvements.









