Fermi is the codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia, first released to retail in April 2010 as the successor to the Tesla microarchitecture. It was the primary microarchitecture used in the GeForce 400 series and GeForce 500 series. It was followed by Kepler, and was used alongside Kepler in the GeForce 600 series, GeForce 700 series, and GeForce 800 series, in the latter two only in mobile GPUs. In the workstation market, Fermi found use in the Quadro x000 series and Quadro NVS models, as well as in Nvidia Tesla computing modules. All desktop Fermi GPUs were manufactured on a 40 nm process; mobile Fermi GPUs were manufactured on 40 nm and 28 nm. Fermi is the oldest NVIDIA microarchitecture to receive support for Microsoft's Direct3D 12 rendering API, at feature level 11_0.
Fermi Graphics Processor Line-up
Highlights :
Fermi graphics processing units (GPUs) feature 3.0 billion transistors; a schematic is sketched in Fig. 1.
- Streaming Multiprocessor (SM): composed of 32 CUDA cores (see Streaming Multiprocessor and CUDA core sections).
- GigaThread global scheduler: distributes thread blocks to SM thread schedulers and manages the context switches between threads during execution (see Warp Scheduling section).
- Host interface: connects the GPU to the CPU via a PCI-Express v2 bus (peak transfer rate of 8 GB/s).
- DRAM: supports up to 6 GB of GDDR5 DRAM thanks to the 64-bit addressing capability (see Memory Architecture section).
- Clock frequency: 1.5 GHz (not released by NVIDIA, but estimated by Insight 64).
- Peak performance: 1.5 TFLOPS.
- Global memory clock: 2 GHz.
- DRAM bandwidth: 192 GB/s.
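The peak figures above can be roughly reproduced with simple arithmetic. The sketch below is a back-of-the-envelope check, not NVIDIA's published method, and it rests on assumptions not stated in the list: a full chip has 16 SMs (16 x 32 = 512 CUDA cores), each core retires one fused multiply-add (counted as 2 FLOPs) per cycle, the 2 GHz memory clock carries two transfers per cycle, and the DRAM interface is 384 bits wide. It also shows why 6 GB needs addresses wider than 32 bits.

```python
# Back-of-the-envelope check of the Fermi figures listed above.
# Assumptions (not stated in the list): 16 SMs per full chip, one FMA
# (2 FLOPs) per core per cycle, two transfers per 2 GHz memory clock,
# and a 384-bit DRAM interface.

SMS_PER_CHIP = 16          # assumption: full GF100/GF110 configuration
CORES_PER_SM = 32          # from the list above
FLOPS_PER_CORE_CYCLE = 2   # assumption: one FMA = multiply + add
CORE_CLOCK_HZ = 1.5e9      # Insight 64 estimate quoted above

MEMORY_CLOCK_HZ = 2.0e9    # "Global memory clock" above
TRANSFERS_PER_CLOCK = 2    # assumption: double data rate on this clock
BUS_WIDTH_BITS = 384       # assumption: 384-bit GDDR5 interface


def peak_flops() -> float:
    """Single-precision peak throughput in FLOP/s."""
    cores = SMS_PER_CHIP * CORES_PER_SM
    return cores * FLOPS_PER_CORE_CYCLE * CORE_CLOCK_HZ


def dram_bandwidth_bytes_per_s() -> float:
    """Peak DRAM bandwidth in bytes/s."""
    bus_width_bytes = BUS_WIDTH_BITS / 8
    return bus_width_bytes * MEMORY_CLOCK_HZ * TRANSFERS_PER_CLOCK


def needs_more_than_32_bit_addresses(capacity_bytes: int) -> bool:
    """True if the capacity cannot be covered by 32-bit byte addresses."""
    return capacity_bytes > 2**32


if __name__ == "__main__":
    print(f"peak: {peak_flops() / 1e12:.3f} TFLOPS")                    # 1.536
    print(f"bandwidth: {dram_bandwidth_bytes_per_s() / 1e9:.0f} GB/s")  # 192
    print(needs_more_than_32_bit_addresses(6 * 2**30))                  # True
```

The result, 1.536 TFLOPS, matches the rounded 1.5 TFLOPS figure, and 48 bytes per transfer at 4 GT/s gives exactly 192 GB/s.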
Fermi Chips :
- GF100
- GF104
- GF106
- GF108
- GF110
- GF114
- GF116
- GF118
- GF119
- GF117
Architecture :
Drawing on requests from GPU computing users, the Fermi team designed a processor that greatly increases raw compute horsepower and, through architectural innovations, also offers dramatically increased programmability and compute efficiency. The key architectural highlights of Fermi are:
• Third Generation Streaming Multiprocessor (SM)
o 32 CUDA cores per SM, 4x over GT200
o 8x the peak double precision floating point performance over GT200
o Dual Warp Scheduler simultaneously schedules and dispatches instructions from two independent warps
o 64 KB of RAM with a configurable partitioning of shared memory and L1 cache
• Second Generation Parallel Thread Execution ISA
o Unified Address Space with Full C++ Support
o Optimized for OpenCL and DirectCompute
o Full IEEE 754-2008 32-bit and 64-bit precision
o Full 32-bit integer path with 64-bit extensions
o Memory access instructions to support transition to 64-bit addressing
o Improved Performance through Predication
• Improved Memory Subsystem
o NVIDIA Parallel DataCache™ hierarchy with Configurable L1 and Unified L2 Caches
o First GPU with ECC memory support
o Greatly improved atomic memory operation performance
• NVIDIA GigaThread™ Engine
o 10x faster application context switching
o Concurrent kernel execution
o Out-of-order thread block execution
o Dual overlapped memory transfer engines
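To make the "64 KB of RAM with a configurable partitioning of shared memory and L1 cache" point concrete, the sketch below models how the split changes how many thread blocks can be resident on one SM when shared memory is the limiting resource. It assumes the two Fermi splits are 48 KB shared / 16 KB L1 and 16 KB shared / 48 KB L1, and that an SM holds at most 8 resident blocks (the compute capability 2.x limit); registers, warps and thread counts are ignored. The `pick_split` heuristic is hypothetical and only illustrates the trade-off.

```python
# Sketch: effect of Fermi's configurable 64 KB on-chip memory split on the
# number of resident thread blocks per SM, counting shared memory only.
# Assumptions: splits of 48/16 KB and 16/48 KB (shared/L1), and at most
# 8 resident blocks per SM (compute capability 2.x).

from dataclasses import dataclass

KB = 1024
MAX_BLOCKS_PER_SM = 8  # assumption: compute capability 2.x limit


@dataclass(frozen=True)
class OnChipSplit:
    shared_bytes: int
    l1_bytes: int


PREFER_SHARED = OnChipSplit(shared_bytes=48 * KB, l1_bytes=16 * KB)
PREFER_L1 = OnChipSplit(shared_bytes=16 * KB, l1_bytes=48 * KB)


def resident_blocks(split: OnChipSplit, smem_per_block: int) -> int:
    """Blocks per SM allowed by shared memory alone, capped at the SM limit."""
    if smem_per_block <= 0:
        return MAX_BLOCKS_PER_SM
    return min(MAX_BLOCKS_PER_SM, split.shared_bytes // smem_per_block)


def pick_split(smem_per_block: int) -> OnChipSplit:
    """Hypothetical heuristic: take the larger L1 unless it costs blocks."""
    blocks_l1 = resident_blocks(PREFER_L1, smem_per_block)
    blocks_shared = resident_blocks(PREFER_SHARED, smem_per_block)
    return PREFER_L1 if blocks_l1 >= blocks_shared else PREFER_SHARED
```

For example, a kernel using 12 KB of shared memory per block fits four blocks per SM under the 48 KB split but only one under the 16 KB split, while a kernel using 2 KB per block reaches the 8-block cap either way and can keep the larger L1. In CUDA, this preference is expressed per kernel with `cudaFuncSetCacheConfig` and `cudaFuncCachePreferShared` or `cudaFuncCachePreferL1`.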
More Details :
Optimized for OpenCL and DirectCompute
OpenCL and DirectCompute are closely related to the CUDA programming model, sharing the key abstractions of threads, thread blocks, grids of thread blocks, barrier synchronization, per-block shared memory, global memory, and atomic operations. Fermi, a third-generation CUDA architecture, is by nature well-optimized for these APIs. In addition, Fermi offers hardware support for OpenCL and DirectCompute surface instructions with format conversion, allowing graphics and compute programs to easily operate on the same data. The PTX 2.0 ISA also adds support for the DirectCompute instructions population count, append, and bit-reverse.
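As a reference for what two of these instructions compute, the sketch below gives plain Python semantics for 32-bit population count and bit reversal. It is not GPU code and says nothing about how Fermi implements them; in CUDA C the same operations are available as the `__popc` and `__brev` intrinsics. Bit reversal is, for example, the index permutation used by radix-2 FFTs.

```python
# Reference semantics (not GPU code) for two 32-bit operations named above:
# population count and bit reversal.

MASK32 = 0xFFFF_FFFF


def popc32(x: int) -> int:
    """Number of set bits in the low 32 bits of x."""
    x &= MASK32
    count = 0
    while x:
        x &= x - 1  # clear the lowest set bit
        count += 1
    return count


def brev32(x: int) -> int:
    """Reverse the order of the low 32 bits of x (bit 0 <-> bit 31)."""
    x &= MASK32
    result = 0
    for _ in range(32):
        result = (result << 1) | (x & 1)  # shift in the current lowest bit
        x >>= 1
    return result


if __name__ == "__main__":
    print(popc32(0b1011))      # 3
    print(hex(brev32(0x1)))    # 0x80000000
    print(hex(brev32(0xF0)))   # 0xf000000
```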
Fermi's Most Powerful GPU :
The GeForce GTX 590 was an enthusiast-class graphics card launched by NVIDIA on March 24th, 2011. Built on the 40 nm process and based on the GF110 graphics processor in its GF110-351-A1 variant, the card supports DirectX 12, but only at feature level 11_0, which can be problematic with newer DirectX 12 titles. The GF110 is a large chip, with a die area of 520 mm² and 3,000 million transistors. The GeForce GTX 590 combines two of these graphics processors to increase performance; each GPU features 512 shading units, 64 texture mapping units, and 48 ROPs. NVIDIA paired 3,072 MB of GDDR5 memory with the GeForce GTX 590, split as 1,536 MB per GPU, with each GPU connected to its memory through its own 384-bit interface. The GPUs operate at 608 MHz, and the memory runs at 854 MHz (3.4 Gbps effective).
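The memory figures above determine the card's peak memory bandwidth. The sketch below assumes that GDDR5 moves 4 bits per pin per cycle of the quoted 854 MHz clock, which reproduces the "3.4 Gbps effective" rate, and uses 1 GB = 10^9 bytes. The result is a theoretical peak, not a measured number.

```python
# Rough peak memory bandwidth for the GeForce GTX 590 figures quoted above.
# Assumption: GDDR5 transfers 4 bits per pin per cycle of the 854 MHz memory
# clock, so 854 MHz * 4 ~= 3.4 Gbps per pin (the "effective" rate).

MEMORY_CLOCK_MHZ = 854
TRANSFERS_PER_CLOCK = 4     # assumption: GDDR5 relative to this clock
BUS_WIDTH_BITS = 384        # per GPU
GPUS_PER_CARD = 2


def per_pin_gbps() -> float:
    """Effective data rate per pin in Gbit/s."""
    return MEMORY_CLOCK_MHZ * TRANSFERS_PER_CLOCK / 1000


def bandwidth_gb_per_s(gpus: int = 1) -> float:
    """Peak bandwidth in GB/s summed over `gpus` GPUs, each with its own bus."""
    bus_width_bytes = BUS_WIDTH_BITS / 8
    return bus_width_bytes * per_pin_gbps() * gpus


if __name__ == "__main__":
    print(f"per GPU: {bandwidth_gb_per_s():.1f} GB/s")                 # 164.0
    print(f"card total: {bandwidth_gb_per_s(GPUS_PER_CARD):.1f} GB/s")  # 327.9
```

The card total is simply the sum of two independent buses: each GPU only sees its own 1,536 MB, so a single GPU never has the combined figure available to it.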