Even with fast writes and everything, let's do some simple maths. That 1.6 gigatexel figure is based on the assumption that each of those 4 units can actually be kept going on and on and on, like the Energizer Bunny, each one doing two textures per clock.
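For what it's worth, here's a quick Python sketch of what that figure implies. The 1.6 gigatexel number, the 4 units and the two textures per clock are from above; reading the figure as "per second" and the variable names are my own assumptions.

```python
# Back out the core clock implied by the advertised texel rate.
ADVERTISED_TEXELS_PER_SEC = 1_600_000_000  # "1.6 gigatexel", assumed to be per second
UNITS = 4                                  # texturing units
TEXTURES_PER_UNIT_PER_CLOCK = 2            # two textures per clock, per unit

texels_per_clock = UNITS * TEXTURES_PER_UNIT_PER_CLOCK
implied_clock_hz = ADVERTISED_TEXELS_PER_SEC // texels_per_clock

print(texels_per_clock)   # 8
print(implied_clock_hz)   # 200000000
```

So the advertised number only holds if all 4 units deliver 8 texels on every single clock at roughly 200 MHz, with no stalls at all.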
Now let's say we're running in 32 bit colour.
A) Let's take a worst case scenario first, where everything is drawn back to front. Two textures per unit mean two memory reads to get that pixel. Plus one memory read from the Z-Buffer, plus one memory write to draw the pixel, plus a memory write to update the Z-Buffer. (If you also have transparency/translucency effects, add one read to get the old pixel. But let's assume we have no transparent textures.) That's five memory operations per cycle, and per texturing unit. Now multiply this by 4 units, and you get 20 memory accesses per clock. Times 32 bits, that's 640 bits moved per clock.
640 bits per clock on a 128 bit bus? Dream on. And that's assuming ideal memory. Real life memory will have at least a 6 cycle penalty for a page miss, and that'll happen a lot when reading the textures. And the memory writes aren't that fast, either.
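The worst case arithmetic from A), as a minimal Python sketch. The numbers are the ones used above; the names are made up for illustration, and it assumes every access (texel, pixel or Z value) is 32 bits and the bus moves 128 bits per clock.

```python
# Worst case: everything drawn back to front, no transparency.
UNITS = 4               # texturing units
BITS_PER_ACCESS = 32    # 32 bit colour; Z values assumed 32 bits as well
BUS_WIDTH_BITS = 128    # memory bus width

# 2 texture reads + 1 Z read + 1 pixel write + 1 Z write
OPS_PER_PIXEL = 2 + 1 + 1 + 1

bits_needed = OPS_PER_PIXEL * UNITS * BITS_PER_ACCESS
shortfall = bits_needed / BUS_WIDTH_BITS

# How many pixels per clock the bus could feed, even with ideal memory.
pixels_per_clock = BUS_WIDTH_BITS / (OPS_PER_PIXEL * BITS_PER_ACCESS)

print(bits_needed)        # 640
print(shortfall)          # 5.0
print(pixels_per_clock)   # 0.8
```

In other words, under these assumptions the bus can feed less than one pixel per clock, where the advertised figure needs four.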
B) Even in a best possible case scenario, things aren't looking that much better. As in: all the drawing is done front to back and there are no transparencies, and you're looking straight at a wall, so everything behind it gets obscured right away. (And assuming the card or game is actually smart enough to optimize drawing for this situation. That remains to be seen.) It's not quite the typical situation in a game, unless your only purpose in life is to view walls up close, but let's pretend it happens. And it only happens after the first layer of pixels has been drawn, as per the previous scenario. But even so, it's still at least one read per texturing unit for the Z-Buffer. Now it's 4 operations times 32 bits, and that's 128 bits moved per clock. On a 128 bit bus. It fits quite nicely, but you'd need some very ideal memory to actually get it. As in: memory that can do one operation per GPU clock, and again that just doesn't exist.
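To see how little slack B) leaves, here's a rough Python sketch. The function name and the clocks_per_op parameter are hypothetical, just a crude stand-in for "how many GPU clocks the memory really needs per operation". The value 2.0 is purely illustrative, not a measurement.

```python
# Best case: every pixel is rejected after a single Z-Buffer read.
UNITS = 4
BITS_PER_ACCESS = 32
BUS_WIDTH_BITS = 128

def fed_fraction(ops_per_pixel, clocks_per_op):
    """Fraction of clocks on which the bus can keep all units supplied."""
    bits_needed = ops_per_pixel * UNITS * BITS_PER_ACCESS * clocks_per_op
    return min(1.0, BUS_WIDTH_BITS / bits_needed)

print(fed_fraction(1, 1.0))   # 1.0  (ideal memory: one operation per clock)
print(fed_fraction(1, 2.0))   # 0.5  (illustrative: two clocks per operation)
```

With ideal memory the bus is exactly full, so anything slower than one operation per clock cuts straight into the rate.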
C) So far I've been assuming that all the textures and triangles are in the card's memory, and _nothing_ needs to be transferred on the AGP bus. I.e., not only with fast writes, but even with divine intervention on the AGP bus, it'll still fall a long way short of that advertised number.
D) Note that the above calculations have already been _very_ generous. E.g., I've been blissfully ignoring bus usage issues. To actually get that number of bits per clock, the 128 bit bus would have to be able to act like 4 independent 32 bit busses, with separate address and control lines. Furthermore, the memory would have to be quad-ported so reads from the same chip don't wait for each other. In practice, the situation would be a lot less nice.
E) I've also been ignoring the fact that the screen refresh itself needs to read the memory, too. At high resolutions and high refresh rates, this can eat a noticeable chunk of the memory bandwidth. At, say, 1280x1024 in 32 bit colour, and with a 75 Hz refresh rate, that's roughly 393 megabytes per second (375 if you count a megabyte as 2^20 bytes) eaten just by that. It's not much, compared to DDR bandwidth, but it's there.
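The screen refresh cost in E) is easy to check with a few lines of Python. The resolution, colour depth and refresh rate are the ones quoted above; the function name is just for illustration, and it assumes the whole frame is read once per refresh.

```python
def refresh_bytes_per_second(width, height, bits_per_pixel, refresh_hz):
    """Memory read each second just to scan the framebuffer out to the monitor."""
    return width * height * (bits_per_pixel // 8) * refresh_hz

rate = refresh_bytes_per_second(1280, 1024, 32, 75)

print(rate)           # 393216000 bytes per second
print(rate / 10**6)   # 393.216 (megabytes of 10^6 bytes)
print(rate / 2**20)   # 375.0   (megabytes of 2^20 bytes)
```

Either way you count the megabytes, it's a few hundred of them per second that the texturing units never get to use.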
Briefly: the GeForce 2 can't possibly achieve that advertised 1.6 gigatexel figure, and that's it. It'll exist just in the marketing hype, not in your computer. So wth, go buy it. We really need to support falsehood in advertising, you know.
------------------
Moraelin -- the proud member of the Idiots' Guild
[This message has been edited by Moraelin (edited 04-27-2000).]