
Intel Tiger Lake Architecture

 



Tiger Lake is Intel's codename for the 11th-generation Intel Core mobile processors, based on the new Willow Cove core microarchitecture and manufactured on Intel's third-generation 10 nm process node, known as 10SF ("10 nm SuperFin"). Tiger Lake replaces the Ice Lake family of mobile processors, representing the optimization step in Intel's process–architecture–optimization model.


Tiger Lake processors launched on September 2, 2020, are part of the Tiger Lake-U family and include dual-core and quad-core models with 9 W (7–15 W) and 15 W (12–28 W) TDPs. They power 2020 "Project Athena" laptops. The quad-core 96 EU die measures 13.6 × 10.7 mm (146.1 mm²), which is 19.2% wider than the 11.4 × 10.7 mm (122.5 mm²) quad-core 64 EU Ice Lake die. The 8-core 32 EU die used in Tiger Lake-H is around 190 mm².[8] According to Yehuda Nissan and his team, the architecture is named after a lake across Puget Sound, Washington.[9] Laptops based on Tiger Lake went on sale in October 2020.
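A quick check of the die-size arithmetic above (the quoted areas come out slightly larger than width × height, so the published dimensions are presumably rounded):

```python
# Cross-check of the die-size figures quoted above (dimensions from the text).
tgl_w, tgl_h = 13.6, 10.7   # Tiger Lake-U quad-core 96 EU die, mm
icl_w, icl_h = 11.4, 10.7   # Ice Lake quad-core 64 EU die, mm

tgl_area = tgl_w * tgl_h                      # ~145.5 mm² (quoted: 146.1 mm²)
icl_area = icl_w * icl_h                      # ~122.0 mm² (quoted: 122.5 mm²)
width_increase = (tgl_w / icl_w - 1) * 100    # ~19.3% wider
print(round(tgl_area, 1), round(icl_area, 1), round(width_increase, 1))
```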


The Tiger Lake-H35 processors were launched on January 11, 2021. These quad-core processors are designed for "ultraportable gaming" laptops with a 28–35 W TDP. Intel also announced that Tiger Lake-H processors with a 45 W TDP and up to eight cores would become available in Q1 2021. Intel officially launched the 11th Gen Intel Core H-series on May 11, 2021,[13] and announced the 11th Gen Intel Core Tiger Lake Refresh series on May 30, 2021.


Tiger Lake Processor Line-up





Features


CPU

Further information: Willow Cove (microarchitecture)

Intel Willow Cove CPU cores

Full memory (RAM) encryption

Indirect branch tracking and CET shadow stack

Intel Key Locker


GPU

Intel Xe-LP ("Gen12") GPU with up to 96 execution units (50% uplift compared to Ice Lake, up from 64) with some yet to be announced processors using Intel's discrete GPU, DG1

Fixed-function hardware decoding for HEVC 12-bit, 4:2:2/4:4:4; VP9 12-bit 4:4:4 and AV1 8K 10-bit 4:2:0

Support for a single 8K 12-bit HDR display or two 4K 10-bit HDR displays

Hardware accelerated Dolby Vision

Sampler Feedback support

Dual Queue Support


IPU

Image Processing Unit, a special co-processor to improve image and video capture quality

Not available on embedded models

Initially, the 1165G7, 1135G7, 1125G4 and 1115G4 models shipped without an IPU; embedded variants of these processors were introduced later instead


I/O

PCI Express 4.0 (Pentium and Celeron CPUs are limited to PCI Express 3.0)

Integrated Thunderbolt 4 (includes USB4)

LPDDR4X-4267 memory support

LPDDR5-5400 "architecture capability" (Intel expected Tiger Lake products with LPDDR5 to be available around Q1 2021 but never released them)

Miniaturization of CPU and motherboard into an M.2 SSD-sized small circuit board



Intel Iris Xe Graphics G7 96EUs



The Intel Xe Graphics G7 (the Tiger Lake-U GPU with 96 EUs) is an integrated graphics solution used in the high-end Tiger Lake-U CPUs (15–28 W). It uses the new Xe architecture (Gen12) and was introduced in September 2020. The GPU runs at a guaranteed base clock of 400 MHz in all CPUs and can boost up to 1340 MHz (i7-1185G7); the slowest variant boosts only to 1100 MHz (i5-1130G7, 12 W TDP).


Performance depends on the laptop's TDP settings and cooling. Early information indicates the chip can be configured for a default TDP of 12 or 28 W (like the Ice Lake-U chips), with 3DMark performance around the level of a dedicated GeForce MX350. For gaming we expect somewhat lower performance due to the missing dedicated graphics memory and immature driver support. Several games had problems when testing the various laptops (e.g. Horizon Zero Dawn and Cyberpunk 2077 did not start or crashed; see the list below), while less demanding games like Mass Effect Legendary Edition ran fine at medium settings. Compared to the older Ice Lake Iris Plus G7 GPU, the new Tiger Lake GPU should be approximately twice as fast. Even so, the iGPU is still suitable only for the lowest graphical settings and low resolutions in demanding games.


The Tiger Lake SoCs, and therefore the integrated GPU, are manufactured on Intel's modern 10nm+ (10nm SuperFin) process, an improved 10 nm process, and should therefore offer very good efficiency.

Nvidia Kepler Architecture



Kepler is the codename for a GPU microarchitecture developed by Nvidia, first introduced at retail in April 2012, as the successor to the Fermi microarchitecture. Kepler was Nvidia's first microarchitecture to focus on energy efficiency. Most GeForce 600 series, most GeForce 700 series, and some GeForce 800M series GPUs were based on Kepler, all manufactured in 28 nm. Kepler also found use in the GK20A, the GPU component of the Tegra K1 SoC, as well as in the Quadro Kxxx series, the Quadro NVS 510, and Nvidia Tesla computing modules. Kepler was followed by the Maxwell microarchitecture and used alongside Maxwell in the GeForce 700 series and GeForce 800M series.


Kepler Graphics Processor Line-up


Highlights :


Next Generation Streaming Multiprocessor (SMX)


The Kepler architecture employs a new streaming multiprocessor design called "SMX". SMXs are a major reason for Kepler's power efficiency, as the whole GPU uses a single unified clock speed.[5] Running many lower-clocked Kepler CUDA cores consumes about 90% less power than running fewer higher-clocked Fermi CUDA cores, but additional execution units are needed to execute a whole warp per cycle. Doubling the CUDA core arrays from 16 to 32 solves the warp-execution problem, and the SMX front end is doubled to match: the warp schedulers and dispatch units are doubled, and the register file is doubled to 64K entries to feed the additional execution units. At the risk of inflating die area, the SMX PolyMorph Engines were enhanced to version 2.0 rather than doubled alongside the execution units, enabling them to process polygons in fewer cycles. There are 192 shaders per SMX.[8] Dedicated FP64 CUDA cores are used because the regular Kepler CUDA cores are not FP64-capable, which saves die space. These SMX improvements yield increased GPU performance and efficiency. On GK110, the 48 KB texture cache is additionally unlocked for compute workloads, where it becomes a read-only data cache specialized for unaligned memory access patterns. Error-detection capabilities were also added to make it safer for workloads that rely on ECC. Finally, the per-thread register limit is doubled in GK110, to 255 registers per thread.
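One practical consequence of the register-file and per-thread-register figures above is occupancy: heavy register use limits how many warps an SMX can keep resident at once. A small illustrative sketch (the occupancy framing is ours, not from the source):

```python
# How the 64K-entry register file caps resident threads on a Kepler SMX.
REGISTER_FILE_ENTRIES = 64 * 1024   # 64K 32-bit registers per SMX (from the text)
MAX_REGS_PER_THREAD = 255           # GK110 per-thread limit (from the text)
WARP_SIZE = 32

def resident_warps(regs_per_thread):
    """Warps whose registers fit in the file at a given per-thread count."""
    threads = REGISTER_FILE_ENTRIES // regs_per_thread
    return threads // WARP_SIZE

print(resident_warps(32))    # light register use: many warps stay resident
print(resident_warps(255))   # at the GK110 limit, only a handful of warps fit
```

At the 255-register limit only about 8 warps fit, so kernels that need that many registers trade occupancy (latency hiding) for per-thread state.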




Microsoft Direct3D Support


Nvidia Fermi and Kepler GPUs of the GeForce 600 series support the Direct3D 11.0 specification. Nvidia originally stated that the Kepler architecture has full DirectX 11.1 support, which includes the Direct3D 11.1 path. The following "Modern UI" Direct3D 11.1 features, however, are not supported:

  • Target-Independent Rasterization (2D rendering only)
  • 16xMSAA Rasterization (2D rendering only)
  • Orthogonal Line Rendering Mode
  • UAV (Unordered Access View) in non-pixel-shader stages
According to Microsoft's definition, Direct3D feature level 11_1 must be complete; otherwise the Direct3D 11.1 path cannot be executed.[14] The integrated Direct3D features of the Kepler architecture are the same as those of the GeForce 400 series Fermi architecture.


Hyper-Q


Hyper-Q expands GK110's hardware work queues from 1 to 32. This matters because with a single work queue, Fermi could be under-occupied at times, as one queue might not hold enough work to fill every SM. With 32 work queues, GK110 can in many scenarios achieve higher utilization by placing different task streams on what would otherwise be an idle SMX. Hyper-Q's simplicity is reinforced by how easily it maps to MPI, the message-passing interface commonly used in HPC: legacy MPI-based algorithms originally designed for multi-CPU systems, which had become bottlenecked by false dependencies, now have a solution. By increasing the number of MPI jobs, Hyper-Q can improve the efficiency of these algorithms without changing the code itself.


Shuffle Instructions


At a low level, GK110 adds instructions and operations that further improve performance. New shuffle instructions allow threads within a warp to share data without going through memory, making the process much quicker than the previous load/share/store method. Atomic operations are also overhauled, increasing their execution speed and adding some FP64 operations that were previously available only for FP32 data.
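The idea behind the shuffle instructions can be sketched in plain Python: a warp-wide sum where each step reads another lane's register directly instead of bouncing through shared memory. This is a pure-Python model of a CUDA `__shfl_down`-style butterfly reduction, purely illustrative:

```python
# Model of a shuffle-based warp reduction: at each step, lane i reads the
# value held by lane i+offset directly, so partial sums never round-trip
# through shared memory (the load/share/store path shuffle replaces).
WARP_SIZE = 32

def warp_reduce_sum(values):
    """values: one entry per lane; returns the final sum held by lane 0."""
    assert len(values) == WARP_SIZE
    vals = list(values)
    offset = WARP_SIZE // 2
    while offset > 0:
        # Models __shfl_down(val, offset): lane i receives lane i+offset's value.
        vals = [vals[i] + (vals[i + offset] if i + offset < WARP_SIZE else 0)
                for i in range(WARP_SIZE)]
        offset //= 2
    return vals[0]

print(warp_reduce_sum(list(range(32))))  # 496 == sum(range(32))
```

Five shuffle steps replace five shared-memory round trips, which is where the speedup over the load/share/store method comes from.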

Dynamic Parallelism


Dynamic Parallelism is the ability of kernels to dispatch other kernels. With Fermi, only the CPU could dispatch a kernel, which incurs a certain amount of overhead from communicating back to the CPU. By giving kernels the ability to dispatch their own child kernels, GK110 both saves time by not having to go back to the CPU and, in the process, frees up the CPU to work on other tasks.


Video decompression/compression


NVDEC

NVENC
Main article: Nvidia NVENC
NVENC is Nvidia's power-efficient fixed-function encoder that can decode, preprocess, and encode H.264-based content. NVENC's output format is limited to H.264, but within that limitation it supports encoding at resolutions up to 4096x4096.

Like Intel's Quick Sync, NVENC is currently exposed through a proprietary API, though Nvidia does have plans to provide NVENC usage through CUDA.

TXAA Support


Exclusive to Kepler GPUs, TXAA is a new anti-aliasing method from Nvidia that is designed for direct implementation into game engines. TXAA is based on the MSAA technique and custom resolve filters. It is designed to address a key problem in games known as shimmering or temporal aliasing. TXAA resolves that by smoothing out the scene in motion, making sure that any in-game scene is being cleared of any aliasing and shimmering.

GPU Boost


GPU Boost is a new feature which is roughly analogous to turbo boosting of a CPU. The GPU is always guaranteed to run at a minimum clock speed, referred to as the "base clock". This clock speed is set to the level which will ensure that the GPU stays within TDP specifications, even at maximum loads. When loads are lower, however, there is room for the clock speed to be increased without exceeding the TDP. In these scenarios, GPU Boost will gradually increase the clock speed in steps, until the GPU reaches a predefined power target (which is 170 W by default). By taking this approach, the GPU will ramp its clock up or down dynamically, so that it is providing the maximum amount of speed possible while remaining within TDP specifications.

The power target, as well as the size of the clock increase steps that the GPU will take, are both adjustable via third-party utilities and provide a means of overclocking Kepler-based cards.
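The clock-stepping behaviour described above can be sketched as a toy model: step the clock up from the base clock while an estimated board power stays under the power target and a maximum boost bin is not exceeded. The step size, cap, and power model here are illustrative assumptions, not Nvidia's actual algorithm:

```python
# Toy model of GPU Boost as described above (all constants illustrative).
BASE_MHZ = 1006     # e.g. a GTX 680-style base clock
MAX_MHZ = 1110      # highest boost bin (assumption)
STEP_MHZ = 13       # one boost step (assumption)
TARGET_W = 170.0    # default power target from the text

def est_power(mhz, load):
    # Crude model: power scales linearly with clock and load (assumption).
    return load * TARGET_W * (mhz / 1100.0)

def boosted_clock(load):
    mhz = BASE_MHZ
    while mhz + STEP_MHZ <= MAX_MHZ and est_power(mhz + STEP_MHZ, load) <= TARGET_W:
        mhz += STEP_MHZ
    return mhz

print(boosted_clock(1.0))   # heavy load: power-limited below the top bin
print(boosted_clock(0.8))   # lighter load: reaches the maximum boost bin
```

The two cases show the behaviour the text describes: under heavy load the power target stops the climb early, while lighter loads leave headroom to reach the top bin.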


NVIDIA GPUDirect


NVIDIA GPUDirect is a capability that enables GPUs within a single computer, or GPUs in different servers located across a network, to directly exchange data without needing to go to CPU/system memory. The RDMA feature in GPUDirect allows third party devices such as SSDs, NICs, and IB adapters to directly access memory on multiple GPUs within the same system, significantly decreasing the latency of MPI send and receive messages to/from GPU memory.[16] It also reduces demands on system memory bandwidth and frees the GPU DMA engines for use by other CUDA tasks. Kepler GK110 also supports other GPUDirect features including Peer‐to‐Peer and GPUDirect for Video.




Features


  • PCI Express 3.0 interface
  • DisplayPort 1.2
  • HDMI 1.4a 4K x 2K video output
  • PureVideo VP5 hardware video acceleration (up to 4K x 2K H.264 decode)
  • Hardware H.264 encoding acceleration block (NVENC)
  • Support for up to 4 independent 2D displays, or 3 stereoscopic/3D displays (NV Surround)
  • Next Generation Streaming Multiprocessor (SMX)
  • Polymorph-Engine 2.0
  • Simplified Instruction Scheduler
  • Bindless Textures
  • CUDA Compute Capability 3.0 to 3.5
  • GPU Boost (Upgraded to 2.0 on GK110)
  • TXAA Support
  • Manufactured by TSMC on a 28 nm process
  • New Shuffle Instructions
  • Dynamic Parallelism
  • Hyper-Q (Hyper-Q's MPI functionality is reserved for Tesla only)
  • Grid Management Unit
  • NVIDIA GPUDirect (GPUDirect's RDMA functionality is reserved for Tesla only)



Intel Alder Lake-S Architecture




12th Gen Intel® Core™ CPUs adapt to the ways you work and play. When gaming, the processor prevents background tasks from interrupting or using your high-performance cores. When working, it provides a smoother system-level experience while using demanding applications.


12th Generation Processor Line-up [Intel Alder Lake-S Architecture]



Highlights :


12th Gen CPUs integrate two types of cores into a single die: performance-cores (P-cores) and efficient-cores (E-cores).

Performance-cores :


  • Physically larger high-performance cores designed for raw speed while maintaining efficiency.
  • Optimized for low-latency single-threaded performance and AI workloads.
  • Capable of hyper-threading, or running two software threads at once.
  • Measured at 19% better performance, on average, than 11th Gen Intel® Core™ CPUs across a wide range of workloads at ISO frequency.


Efficient-cores :


  • Physically smaller, with multiple E-cores fitting into the physical space occupied by one P-core.
  • Optimized for multi-core performance-per-watt—delivering scalable multithread performance and efficient offload of background tasks.
  • Capable of running a single software thread.
  • Capable of 40% more performance when running at the same power as a single Skylake core.




DDR5 Memory Details :


DDR5 is the next-generation specification for RAM and it comes with a host of improvements in speed and efficiency when compared to DDR4, the current standard.

  • Higher-bandwidth kits thanks to doubled burst length—the number of bits that can be read per cycle.
  • 12th Gen supports speeds up to 4,800 MT/s for DDR5 and 3,200 MT/s for DDR4.
  • DDR5 allows capacities of up to 128GB of RAM per module, whereas DDR4 allows only 32GB.
  • DDR5 doubles the number of memory bank groups and improves the speed at which groups can be refreshed.
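The headline transfer rates translate to peak bandwidth as follows. This treats each DIMM as one 64-bit channel; DDR5 actually splits it into two 32-bit subchannels, but the aggregate width is unchanged:

```python
# Peak theoretical bandwidth: transfers per second x bytes per transfer.
def channel_bandwidth_gb_s(mt_per_s, bus_bits=64):
    return mt_per_s * 1e6 * (bus_bits // 8) / 1e9

print(channel_bandwidth_gb_s(4800))   # DDR5-4800 -> 38.4 GB/s
print(channel_bandwidth_gb_s(3200))   # DDR4-3200 -> 25.6 GB/s
```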



PCIe 5.0 :


12th Gen Intel® Core™ CPUs are at the forefront of the industry transition to PCIe 5.0. PCIe 5.0 doubles the bandwidth of 4.0, which means your system will be ready for the next generation of SSDs and discrete GPUs.

PCIe is the high-bandwidth expansion bus used to connect graphics cards, SSDs, and other peripherals to your motherboard. Each PCIe generation doubles the throughput of the last, with PCIe 5.0 signaling at a theoretical maximum of 32 GT/s per lane.

  • Full backwards compatibility with PCIe 4.0 and 3.0 devices.
  • Double the bandwidth of 4.0 and four times the bandwidth of 3.0.
  • Up to 16 CPU PCIe 5.0 lanes and up to 4 CPU PCIe 4.0 lanes.
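The bandwidth doublings in the list follow directly from the signaling rates; a quick sketch (PCIe 3.0, 4.0, and 5.0 all use 128b/130b line encoding):

```python
# Usable bandwidth per direction: signaling rate x encoding efficiency x lanes.
def pcie_gb_s(gt_per_s, lanes):
    return gt_per_s * (128 / 130) / 8 * lanes

print(round(pcie_gb_s(32.0, 16), 1))  # PCIe 5.0 x16 -> ~63.0 GB/s
print(round(pcie_gb_s(16.0, 16), 1))  # PCIe 4.0 x16 -> ~31.5 GB/s
print(round(pcie_gb_s(8.0, 16), 1))   # PCIe 3.0 x16 -> ~15.8 GB/s
```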



Intel UHD Graphics 64EU :


The UHD Graphics 64EU is an integrated graphics solution by Intel, launched on January 4th, 2022. Built on the 10 nm process, and based on the Alder Lake GT1 graphics processor, the device supports DirectX 12. This ensures that all modern games will run on UHD Graphics 64EU. It features 512 shading units, 32 texture mapping units, and 16 ROPs. The GPU operates at a frequency of 300 MHz, which can be boosted up to 1400 MHz.
Its power draw is rated at 45 W maximum.


General info


UHD Graphics 64EU's architecture, market segment, and release date:

  • Architecture: Generation 12.2 (2021–2022)
  • GPU code name: Alder Lake GT1
  • Market segment: Desktop
  • Release date: 4 January 2022

Technical specs


  • Pipelines / shading units: 512
  • Boost clock speed: 1400 MHz
  • Manufacturing process: 10 nm
  • Thermal design power (TDP): 45 W
  • Texture fill rate: 44.80 GTexel/s
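The texture fill rate in the spec list can be reproduced from the unit counts and boost clock quoted above; the pixel fill rate (not listed) follows the same way:

```python
# Fill rates from unit counts x boost clock (figures from the text).
TMUS, ROPS, BOOST_MHZ = 32, 16, 1400

texture_fill = TMUS * BOOST_MHZ / 1000   # 44.8 GTexel/s, matching the list
pixel_fill = ROPS * BOOST_MHZ / 1000     # 22.4 GPixel/s
print(texture_fill, pixel_fill)
```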

Memory :


  • Memory: shared system memory (capacity, bus width, and clock depend on the host platform)

API support :


DirectX 12 (12_1)
Shader Model 6.4
OpenGL 4.6
OpenCL 3.0
Vulkan 1.3





Nvidia Ampere Architecture

 



Ampere is the codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to both the Volta and Turing architectures, officially announced on May 14, 2020. It is named after the French mathematician and physicist André-Marie Ampère. Nvidia announced the next-generation GeForce 30 series consumer GPUs at a GeForce Special Event on September 1, 2020, and the 80 GB A100 GPU at SC20 on November 16, 2020. Mobile RTX graphics cards and the RTX 3060 were revealed on January 12, 2021. Nvidia also announced Ampere's successor, Hopper, at GTC 2022, and "Ampere Next Next" for a 2024 release at GPU Technology Conference 2021.


Ampere Graphics Processor Line-up

Highlights :


Third-Generation Tensor Cores


First introduced in the NVIDIA Volta™ architecture, NVIDIA Tensor Core technology has brought dramatic speedups to AI, bringing down training times from weeks to hours and providing massive acceleration to inference. The NVIDIA Ampere architecture builds upon these innovations by bringing new precisions—Tensor Float 32 (TF32) and floating point 64 (FP64)—to accelerate and simplify AI adoption and extend the power of Tensor Cores to HPC.

TF32 works just like FP32 while delivering speedups of up to 20X for AI without requiring any code change. Using NVIDIA Automatic Mixed Precision, researchers can gain an additional 2X performance with automatic mixed precision and FP16 by adding just a couple of lines of code. And with support for bfloat16, INT8, and INT4, Tensor Cores in NVIDIA Ampere architecture GPUs create an incredibly versatile accelerator for both AI training and inference. Bringing the power of Tensor Cores to HPC, the A100 and A30 GPUs also enable matrix operations in full, IEEE-compliant FP64 precision.
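The "works just like FP32" claim comes down to TF32's number format: it keeps FP32's 8-bit exponent range but only 10 explicit mantissa bits. A minimal Python sketch of the format, modeling the precision loss by truncating the low 13 mantissa bits of the IEEE-754 encoding (real hardware rounds rather than truncates):

```python
import struct

# TF32 model: same 8-bit exponent as FP32, mantissa cut from 23 to 10 bits.
def to_tf32(x: float) -> float:
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return struct.unpack('<f', struct.pack('<I', bits & 0xFFFFE000))[0]

print(to_tf32(1.0))         # 1.0 - exactly representable
print(to_tf32(3.14159265))  # 3.140625 - only 10 mantissa bits survive
```

Because the exponent range is unchanged, FP32 code runs without modification; only the last few digits of precision differ, which is why no code change is needed.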


Third-Generation NVLink


Scaling applications across multiple GPUs requires extremely fast movement of data. The third generation of NVIDIA® NVLink® in the NVIDIA Ampere architecture doubles the GPU-to-GPU direct bandwidth to 600 gigabytes per second (GB/s), almost 10X higher than PCIe Gen4. When paired with the latest generation of NVIDIA NVSwitch™, all GPUs in the server can talk to each other at full NVLink speed for incredibly fast data transfers. 

NVIDIA DGX™A100 and servers from other leading computer makers take advantage of NVLink and NVSwitch technology via NVIDIA HGX™ A100 baseboards to deliver greater scalability for HPC and AI workloads.
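A quick sanity check of the 600 GB/s figure, using the 50 Gbit/s-per-pair rate quoted in the Ampere feature list later in this post; the pair and link counts are our assumptions about the A100 configuration:

```python
# NVLink 3.0 bandwidth arithmetic.
PAIR_GBIT_S = 50       # per differential signal pair (from the feature list)
PAIRS_PER_LINK = 4     # pairs per link, per direction (assumption)
LINKS = 12             # NVLink links on A100 (assumption)

per_link_gb_s = PAIR_GBIT_S * PAIRS_PER_LINK / 8   # 25 GB/s per direction
total_gb_s = per_link_gb_s * LINKS * 2             # 600 GB/s bidirectional
print(per_link_gb_s, total_gb_s)
```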



Second-Generation RT Cores


The NVIDIA Ampere architecture’s second-generation RT Cores in the NVIDIA A40 deliver massive speedups for workloads like photorealistic rendering of movie content, architectural design evaluations, and virtual prototyping of product designs. RT Cores also speed up the rendering of ray-traced motion blur for faster results with greater visual accuracy and can simultaneously run ray tracing with either shading or denoising capabilities.




Architectural improvements of the Ampere architecture include the following:


  • CUDA Compute Capability 8.0 for A100 and 8.6 for the GeForce 30 series
  • TSMC's 7 nm FinFET process for A100
  • Custom version of Samsung's 8 nm process (8N) for the GeForce 30 series
  • Third-generation Tensor Cores with FP16, bfloat16, TensorFloat-32 (TF32) and FP64 support and sparsity acceleration. At 256 FP16 FMA operations per clock, each Tensor Core has 4x the processing power of previous Tensor Core generations (GA100 only; 2x on GA10x), while the Tensor Core count is reduced to one per SM.
  • Second-generation ray tracing cores; concurrent ray tracing, shading, and compute for the GeForce 30 series
  • High Bandwidth Memory 2 (HBM2) on A100 40GB & A100 80GB
  • GDDR6X memory for GeForce RTX 3090, RTX 3080 Ti, RTX 3080, RTX 3070 Ti
  • Double FP32 cores per SM on GA10x GPUs
  • NVLink 3.0 with a 50Gbit/s per pair throughput
  • PCI Express 4.0 with SR-IOV support (SR-IOV is reserved only for A100)
  • Multi-instance GPU (MIG) virtualization and GPU partitioning feature in A100 supporting up to seven instances
  • PureVideo feature set K hardware video decoding with AV1 hardware decoding for the GeForce 30 series and feature set J for A100
  • 5 NVDEC for A100
  • Adds new hardware-based 5-core JPEG decode (NVJPG) with YUV420, YUV422, YUV444, YUV400, RGBA. Should not be confused with Nvidia NVJPEG (GPU-accelerated library for JPEG encoding/decoding)




Ampere's Most Powerful GPU :




Nvidia GeForce RTX 3090 Ti

The GeForce RTX 3090 Ti is an enthusiast-class graphics card by NVIDIA, launched on January 27th, 2022. Built on the 8 nm process, and based on the GA102 graphics processor, in its GA102-350-A1 variant, the card supports DirectX 12 Ultimate. This ensures that all modern games will run on the GeForce RTX 3090 Ti. Additionally, the DirectX 12 Ultimate capability guarantees support for hardware ray tracing, variable-rate shading, and more in upcoming video games. The GA102 graphics processor is a large chip with a die area of 628 mm² and 28.3 billion transistors. It features 10752 shading units, 336 texture mapping units, and 112 ROPs. Also included are 336 tensor cores, which help improve the speed of machine learning applications, and 84 ray-tracing acceleration cores. NVIDIA has paired 24 GB of GDDR6X memory with the GeForce RTX 3090 Ti, connected using a 384-bit memory interface. The GPU operates at a frequency of 1560 MHz, which can be boosted up to 1860 MHz; memory runs at 1313 MHz (21 Gbps effective).

Being a triple-slot card, the NVIDIA GeForce RTX 3090 Ti draws power from 1x 16-pin power connector, with power draw rated at 450 W maximum. Display outputs include: 1x HDMI 2.1, 3x DisplayPort 1.4a. GeForce RTX 3090 Ti is connected to the rest of the system using a PCI-Express 4.0 x16 interface. The card's dimensions are 336 mm x 140 mm x 61 mm, and it features a triple-slot cooling solution. Its price at launch was 1999 US Dollars.
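The 21 Gbps effective rate and 384-bit bus quoted above give the card's memory bandwidth directly:

```python
# GDDR6X bandwidth: effective data rate per pin x bus width in bytes.
EFFECTIVE_GBPS = 21    # Gbps per pin (21 Gbps effective, from the text)
BUS_BITS = 384

bandwidth_gb_s = EFFECTIVE_GBPS * BUS_BITS / 8   # 1008 GB/s
print(bandwidth_gb_s)
```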



Ampere Vs Turing Architecture


Nvidia's new Ampere GPUs, the successors to Turing, are the fastest RTX graphics cards yet, exactly what we expect from a new generation. Ray-tracing performance in particular has improved dramatically.

The Turing architecture introduced Ray Tracing cores to accelerate photorealistic rendering. With Ampere, NVIDIA has continued to make significant improvements to them.










Nvidia Fermi Architecture

 



Fermi is the codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia, first released to retail in April 2010, as the successor to the Tesla microarchitecture. It was the primary microarchitecture used in the GeForce 400 series and GeForce 500 series. It was followed by Kepler, and used alongside Kepler in the GeForce 600 series, GeForce 700 series, and GeForce 800 series, in the latter two only in mobile GPUs. In the workstation market, Fermi found use in the Quadro x000 series, Quadro NVS models, and Nvidia Tesla computing modules. All desktop Fermi GPUs were manufactured in 40 nm; mobile Fermi GPUs in 40 nm and 28 nm. Fermi is the oldest Nvidia microarchitecture to support Microsoft's Direct3D 12 rendering API, at feature level 11.



Fermi Graphics Processor Line-up



Highlights :


Fermi Graphics Processing Units (GPUs) feature 3.0 billion transistors; a schematic is sketched in Fig. 1.

  • Streaming Multiprocessor (SM): composed of 32 CUDA cores (see Streaming Multiprocessor and CUDA core sections).
  • GigaThread global scheduler: distributes thread blocks to SM thread schedulers and manages the context switches between threads during execution (see Warp Scheduling section).
  • Host interface: connects the GPU to the CPU via a PCI-Express v2 bus (peak transfer rate of 8GB/s).
  • DRAM: supported up to 6GB of GDDR5 DRAM memory thanks to the 64-bit addressing capability (see Memory Architecture section).
  • Clock frequency: 1.5 GHz (not released by NVIDIA, but estimated by Insight 64).
  • Peak performance: 1.5 TFlops.
  • Global memory clock: 2 GHz.
  • DRAM bandwidth: 192GB/s.
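The peak-performance and bandwidth entries above are mutually consistent; a quick check (the 512-core count and 384-bit bus are assumptions for the full GF100 die, not stated in the list):

```python
# Sanity check of the Fermi peak figures listed above.
CORES = 512            # 16 SMs x 32 CUDA cores (full-die assumption)
SHADER_GHZ = 1.5       # estimated clock from the list
peak_gflops = CORES * SHADER_GHZ * 2        # 1536 GFlops ~ 1.5 TFlops (FMA = 2 ops)

MEM_CLOCK_GHZ = 2.0    # global memory clock from the list
BUS_BYTES = 384 // 8   # 384-bit bus (assumption)
bandwidth_gb_s = MEM_CLOCK_GHZ * 2 * BUS_BYTES   # 192 GB/s (two transfers/clock)
print(peak_gflops, bandwidth_gb_s)
```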



Fermi Chips :


  • GF 100
  • GF 104
  • GF 106
  • GF 108
  • GF 110
  • GF 114
  • GF 116
  • GF 118
  • GF 119
  • GF 117


Architecture :

With these requests in mind, the Fermi team designed a processor that greatly increases raw compute horsepower, and through architectural innovations, also offers dramatically increased programmability and compute efficiency. The key architectural highlights of Fermi are:

• Third Generation Streaming Multiprocessor (SM)

o 32 CUDA cores per SM, 4x over GT200

o 8x the peak double precision floating point performance over GT200

o Dual Warp Scheduler simultaneously schedules and dispatches instructions from two independent warps

o 64 KB of RAM with a configurable partitioning of shared memory and L1 cache


• Second Generation Parallel Thread Execution ISA

o Unified Address Space with Full C++ Support

o Optimized for OpenCL and DirectCompute

o Full IEEE 754-2008 32-bit and 64-bit precision

o Full 32-bit integer path with 64-bit extensions

o Memory access instructions to support transition to 64-bit addressing

o Improved Performance through Predication


• Improved Memory Subsystem

o NVIDIA Parallel DataCache™ hierarchy with Configurable L1 and Unified L2 Caches

o First GPU with ECC memory support

o Greatly improved atomic memory operation performance


• NVIDIA GigaThread™ Engine

o 10x faster application context switching

o Concurrent kernel execution

o Out of Order thread block execution

o Dual overlapped memory transfer engines



More Details :

Optimized for OpenCL and DirectCompute 


OpenCL and DirectCompute are closely related to the CUDA programming model, sharing the key abstractions of threads, thread blocks, grids of thread blocks, barrier synchronization, per-block shared memory, global memory, and atomic operations. Fermi, a third-generation CUDA architecture, is by nature well optimized for these APIs. In addition, Fermi offers hardware support for OpenCL and DirectCompute surface instructions with format conversion, allowing graphics and compute programs to easily operate on the same data. The PTX 2.0 ISA also adds support for the DirectCompute instructions population count, append, and bit-reverse.







Fermi's Most Powerful GPU :


  • NVIDIA GeForce GTX 590


The GeForce GTX 590 was an enthusiast-class graphics card by NVIDIA, launched on March 24th, 2011. Built on the 40 nm process, and based on the GF110 graphics processor, in its GF110-351-A1 variant, the card supports DirectX 12. Even though it supports DirectX 12, the feature level is only 11_0, which can be problematic with newer DirectX 12 titles. The GF110 graphics processor is a large chip with a die area of 520 mm² and 3.0 billion transistors. The GeForce GTX 590 combines two graphics processors to increase performance. It features 512 shading units, 64 texture mapping units, and 48 ROPs per GPU. NVIDIA has paired 3,072 MB of GDDR5 memory with the GeForce GTX 590, connected using a 384-bit memory interface per GPU (each GPU manages 1,536 MB). The GPU operates at a frequency of 608 MHz; memory runs at 854 MHz (3.4 Gbps effective).
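The quoted 3.4 Gbps effective rate and the per-GPU bandwidth follow from the 854 MHz memory clock, since GDDR5 transfers four bits per pin per memory clock:

```python
# GTX 590 memory figures from the 854 MHz clock quoted above.
MEM_MHZ = 854
BUS_BITS = 384          # per GPU

effective_gbps = MEM_MHZ * 4 / 1000              # ~3.4 Gbps (quad-pumped GDDR5)
bandwidth_gb_s = effective_gbps * BUS_BITS / 8   # ~164 GB/s per GPU
print(round(effective_gbps, 1), round(bandwidth_gb_s, 1))
```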