
NVIDIA GeForce3 Technology Preview
Lossless Z Compression / Z-Occlusion Culling / High-Resolution Antialiasing (HRAA)
February 27, 2001
By Vince Freeman
Lossless Z Compression
When processing complex scenes, especially in 32-bit, the Z-Buffer can consume a sizeable share of overall memory bandwidth. The Z-Buffer is an integral part of any 3D rendering process, and essentially records the depth of each pixel so the hardware can determine which surfaces end up in front in a rendered scene. To help minimize the bandwidth demands of the Z-Buffer, NVIDIA has implemented a lossless form of data compression that allows up to 4:1 compression of Z-Buffer memory transfers. Z Compression will lower overall bandwidth requirements, and NVIDIA promises no loss in image quality or Z depth accuracy.
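NVIDIA has not described how its Z Compression works, so the following is only a toy sketch of what "lossless" means in this context. It assumes integer depth values and uses a hypothetical scheme of our own: a run of depths that forms an exact linear ramp is stored as a base and a step, and anything else is stored raw. The function names `compress_tile` and `decompress_tile` are illustrative and do not correspond to any NVIDIA interface.

```python
def compress_tile(z_values):
    """Encode integer depths as ('plane', base, step, count) when they form
    an exact arithmetic progression, otherwise as ('raw', values).
    Hypothetical scheme for illustration only."""
    n = len(z_values)
    if n >= 2:
        step = z_values[1] - z_values[0]
        if all(z_values[i + 1] - z_values[i] == step for i in range(n - 1)):
            return ("plane", z_values[0], step, n)
    # Fallback: keep the values untouched so nothing is ever lost.
    return ("raw", list(z_values))


def decompress_tile(encoded):
    """Rebuild the exact original depths from either encoding."""
    if encoded[0] == "plane":
        _, base, step, n = encoded
        return [base + i * step for i in range(n)]
    return list(encoded[1])


flat_surface = [1000, 1004, 1008, 1012]   # depths along a flat polygon
object_edge = [1000, 1004, 500, 1012]     # a nearer object cuts in

assert decompress_tile(compress_tile(flat_surface)) == flat_surface
assert decompress_tile(compress_tile(object_edge)) == object_edge
```

The point of the sketch is the round trip: whichever path is taken, the decompressed depths are bit-for-bit identical to the input, which is why a scheme of this kind cannot affect Z accuracy. The "up to" in the 4:1 figure fits the same picture, since data that does not compress well has to be stored in full.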
Z-Occlusion Culling
Those familiar with the PowerVR and RADEON video cards will be well acquainted with this form of advanced rendering. It has been called Point of Sight, Overdraw, Hidden Surface Removal or any number of different terms, but it relates to the removal of pixels that will not be displayed because they are "occluded" by other objects. While this makes sense, the trick is to determine which pixels will be unseen in the displayed frame, and then make sure they are not textured, shaded or processed in any way.
The GeForce3's Z-Occlusion Culling unit performs this task, checking each pixel against the Z-Buffer to determine whether it will be displayed. If the pixel is found to be occluded, it is not sent on through the rest of the rendering pipeline, and no additional framebuffer memory is used to store it. By determining in advance which pixels will be visible and which occluded, there are potentially huge savings in overall memory bandwidth demands.
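The per-pixel test described above can be sketched in software. This is a minimal illustration and not the GeForce3's actual hardware logic: it assumes that a smaller Z value means closer to the viewer, and the function `render_with_early_z` and its `shade` callback are hypothetical names introduced here.

```python
def render_with_early_z(fragments, width, height, shade):
    """Depth-test every fragment before shading it.

    fragments: iterable of (x, y, z, payload) tuples.
    shade: callable standing in for the expensive texturing/shading work.
    Returns (color_buffer, number_of_fragments_actually_shaded).
    Assumption: smaller z means closer to the viewer.
    """
    far = float("inf")
    zbuf = [[far] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    shaded = 0
    for x, y, z, payload in fragments:
        if z >= zbuf[y][x]:
            continue  # occluded: skip shading and the color write entirely
        zbuf[y][x] = z
        color[y][x] = shade(payload)
        shaded += 1
    return color, shaded


# Two fragments land on the same pixel; the nearer one arrives first.
fragments = [(0, 0, 0.2, "near"), (0, 0, 0.8, "far")]
color, shaded = render_with_early_z(fragments, 1, 1, str.upper)
assert shaded == 1
```

Note that the saving depends on draw order. In the example the near fragment arrives first, so the far one is rejected before any shading work is done. If the far fragment had arrived first, both would have been shaded.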
In addition to the basic Z-Occlusion Culling, NVIDIA has also left the door open to greater returns through an "Occlusion Query". In this case, the game developer would issue a query to the GPU to test entire objects or regions for visibility. If this query comes back negative, then the entire object would not even be included in the rendering or T&L process. In this scenario, the bandwidth and processing savings would be much higher when compared to a per-pixel method of determining occlusion.
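To make the idea of an object-level query concrete, here is a rough software sketch. It is not NVIDIA's query interface, which this preview does not detail. It assumes the object is approximated by a screen-space bounding rectangle and its nearest depth, and that smaller Z means closer. The name `occlusion_query` is hypothetical.

```python
def occlusion_query(zbuf, x0, y0, x1, y1, nearest_z):
    """Return True if any pixel in the inclusive screen rectangle
    (x0, y0)-(x1, y1) could show an object whose closest point is at
    nearest_z. Assumption: smaller z means closer to the viewer."""
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            if nearest_z < zbuf[y][x]:
                return True  # at least one pixel is not already covered
    return False


# A wall at depth 0.3 already fills this 2x2 region of the Z-Buffer.
zbuf = [[0.3, 0.3],
        [0.3, 0.3]]

behind_wall = occlusion_query(zbuf, 0, 0, 1, 1, nearest_z=0.5)
in_front = occlusion_query(zbuf, 0, 0, 1, 1, nearest_z=0.1)
assert behind_wall is False
assert in_front is True
```

When the query returns False, the application can skip the object entirely, which is where the savings over per-pixel rejection would come from: none of its geometry is ever transformed, lit or rasterized.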
Possibly the most attractive aspect of these memory bandwidth saving technologies is that the GeForce3 enjoys an ample 7.36GB/sec memory bandwidth to begin with. While other Overdraw solutions have been employed to more effectively utilize a card's low memory bandwidth, the GeForce3 utilizes Z-Occlusion Culling in addition to its extremely robust memory bandwidth. Since increased memory bandwidth was the single most important performance factor for the GeForce2, these new Z-Buffer compression and occlusion technologies have tremendous performance potential.
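For readers wondering where the 7.36GB/sec figure comes from, the arithmetic can be reproduced as follows. The inputs are assumptions on our part, since this section does not state them: a 128-bit memory bus running at 230MHz with DDR signaling (two transfers per clock). The helper name is illustrative.

```python
def peak_bandwidth_gb_per_s(bus_bits, clock_mhz, transfers_per_clock):
    """Theoretical peak memory bandwidth in GB/sec (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_bits // 8
    return bytes_per_transfer * clock_mhz * 1e6 * transfers_per_clock / 1e9


# Assumed GeForce3 memory configuration: 128-bit bus, 230MHz DDR.
geforce3 = peak_bandwidth_gb_per_s(bus_bits=128, clock_mhz=230, transfers_per_clock=2)
assert abs(geforce3 - 7.36) < 1e-9
```

This is a theoretical peak. The Z compression and occlusion features described above do not raise that number; they reduce how much of it each frame has to consume.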
High-Resolution Antialiasing (HRAA)
Ever since 3dfx blew the barn doors off with their FSAA (Full Scene Anti-Aliasing), it has become an important feature for many users. While the established NVIDIA supersampling method of antialiasing (AA) produces acceptable image quality, it has a noticeable impact on framerates. Much like 3dfx, NVIDIA determined that the best way to get around the performance issue is to incorporate the antialiasing process in the hardware itself.
This new technology is called High-Resolution Antialiasing, or HRAA for short. NVIDIA's method for dealing with the extra pixels generated in HRAA is to use the wider data paths of the GeForce3 and then apply the same texture maps to all pixels. According to NVIDIA, this does not incur a texture processing penalty, and even the extra pixel memory is reduced due to the use of identical texture maps.
Not only has HRAA hardware support been added, but NVIDIA has added a new antialiasing pattern as well. In addition to the standard 2X and 4X modes, the GeForce3 also includes the Quincunx pattern, which gets its name from the five dot design on the "5" side of a 6-sided die. The key to the Quincunx pattern is that NVIDIA has hardcoded this into the GeForce3 and it can supply 2X AA framerates while approaching the image quality of 4X AA. It also lowers the antialiasing overhead, since the Quincunx pattern actually takes samples from neighboring pixels rather than creating new samples for each pixel.
This is very similar to the sampling requirements of 2X AA, but Quincunx AA uses these samples quite differently. It looks like a very efficient technology, since the amount of sample data remains similar to 2X AA, while each final pixel is generated from more input samples to produce higher quality AA images. This ensures that Quincunx AA uses the same amount of sample memory as 2X AA, but processes more actual pixel data. Although 4X AA should result in higher quality AA images, Quincunx AA seems to offer an excellent tradeoff between speed and image quality.
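A rough sketch of a Quincunx-style filter shows how sharing samples with neighboring pixels works. The layout and weights are assumptions and are not taken from NVIDIA's material in this section: one sample at each pixel center, one at each pixel corner (shared by the four pixels that meet there), and a commonly reported weighting of 1/2 for the center and 1/8 for each corner. The function name `quincunx_resolve` is hypothetical.

```python
def quincunx_resolve(centers, corners):
    """Blend five samples per output pixel in a quincunx pattern.

    centers: H x W grid of samples taken at pixel centers.
    corners: (H+1) x (W+1) grid of samples taken at pixel corners; each
             corner sample is shared by up to four neighboring pixels.
    Assumed weights: 0.5 for the center, 0.125 for each of four corners.
    """
    h, w = len(centers), len(centers[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            corner_sum = (corners[y][x] + corners[y][x + 1]
                          + corners[y + 1][x] + corners[y + 1][x + 1])
            row.append(0.5 * centers[y][x] + 0.125 * corner_sum)
        out.append(row)
    return out


# One pixel on a horizontal edge: bright center, dark top corners,
# bright bottom corners.
pixel = quincunx_resolve(centers=[[1.0]],
                         corners=[[0.0, 0.0],
                                  [1.0, 1.0]])
assert pixel == [[0.75]]
```

For a W x H frame this layout stores W*H center samples plus (W+1)*(H+1) corner samples, or roughly two samples per pixel, which matches the claim that sample memory is on par with 2X AA even though five samples feed each final pixel.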

Speaking of performance, the data NVIDIA supplied looks extremely promising. NVIDIA's stated goal is to achieve a minimum of 60 fps for HRAA and this seems to have been accomplished in both 2X and Quincunx AA modes. The GeForce3 also shows noticeable increases in AA performance when compared to previous NVIDIA cards. Comparing Quake 3 4X AA on both the GeForce3 and GeForce2 Ultra shows a 40-50% framerate increase when using GeForce3.

