Well, I guess many of you have already read the articles on the web stating that RDRAM is not worth the money and that Intel and Rambus are joined at the hip, trying to convince you, the consumer, to buy their faulty chipsets and poor-performing memory architecture. But let’s be fair here: why would Intel continue to support Rambus? To make more money? To some extent yes, but these articles also suggest that Intel owns a large share of Rambus. Well, you might be surprised to learn that Intel owns no shares of Rambus at this time. It has the right to exercise 1 million warrants if it meets pre-determined objectives, as stated in Rambus’ S-1 registration statement with the SEC when it filed to go public, but those objectives have not yet been met.
Even more important, there are currently more than 22 million shares of Rambus stock outstanding, so even if Intel were to exercise these warrants today, it would own less than 5% of Rambus. So there has to be some other valid reason for Intel’s continued support, as it still regards RDRAM as the memory architecture of choice for its upcoming CPUs and chipsets. But before we get down to the nitty-gritty, let’s take a look at the situation at hand and some of the past occurrences that have led many websites to believe that Intel and Rambus were indeed plotting some evil scam to get your money.
A Little History
For the last couple of months Intel, AMD, VIA and Rambus have all been trying hard to gain credibility. Intel’s i820 chipset, the only Intel chipset officially supporting the 133 MHz front side bus, a requirement for running the majority of its Coppermine processors, had its official introduction held back by problems with its memory architecture. VIA responded quickly by releasing its 693A chipset, with full support for the 133 MHz front side bus as well as some of the i820’s most compelling features. For the first time in years, Intel was unable to support its new processors with an accompanying chipset.
However, soon after its introduction it became apparent that the VIA 693A, while compatible with Intel’s 133 MHz CPUs, was not as fast as initially expected. In terms of both AGP performance and memory throughput, it failed even to surpass the aging BX chipset. Neither Intel nor VIA stopped work on their chipsets, however, and a few months later we saw the introduction of a somewhat limited i820 chipset and the VIA 694X. The 694X was intended to do away with the drawbacks of the 693A and include many of the i820’s features, like AGP 4X with fast writes and ATA-66, but obviously excluding the i820’s RDRAM memory architecture.
Whereas the i820 was promoted as the platform of choice for most desktop and office machines, its pricey RDRAM memory kept it from being a big success. VIA wisely positioned the 694X as a low-cost chipset with all of the i820’s features, and promoted it as a valid alternative to Intel’s offering.
From a consumer’s point of view things couldn’t be better, as we now have the ability to choose among four different chipsets to run our shiny new Intel CPUs on: the Intel 440BX, the Intel i820, VIA’s 693A and VIA’s 694X. And thanks to stiff competition, prices have been steadily dropping, making the choice even more complicated, and interesting.
Intel has never before found itself fighting such a stiff battle on two fronts; it has to wage a price and delivery battle against AMD with its successful Athlon CPU as well as against VIA over dominance in the chipset market.
Many argue that Intel may find itself in second place on both counts, and may not win these new battles with AMD and VIA over the next few months. I tend to disagree, however, as neither competitor has Intel’s track record of reliability and performance, its resources, or its customer support, considered among the best in the industry. From a business point of view, you’d think twice about investing in computer systems with unproven stability, reliability and performance. Still, Intel may have lost some of its feathers by the time the dust finally clears, and might even lose some market share in areas where it dominated a year ago.
But competition usually brings out the best in both sides. And although Intel has a vast network of resources to tap, one unlikely to dry up just because of some new players in the field, its track record and reputation alone won’t carry it; it still needs to work hard to keep up with what the competition is doing. With those resources, though, unless Intel sits still and waits for things to happen, which is highly unlikely, it’s not going to lose this battle anytime soon.
RDRAM vs. SDRAM
RDRAM promises twice the bandwidth of PC100 SDRAM; true to some extent, but only when comparing PC800 RDRAM with PC100 SDRAM. PC800? PC100? Confusing to say the least, as the naming suggests that PC800 is eight times the speed of PC100. Upon closer examination, RDRAM uses a 2-byte (16-bit) wide databus versus SDRAM’s 8-byte (64-bit) wide databus.
Furthermore, the PC800 rating is a bit confusing, as PC800 RDRAM is actually a double-pumped module running at a 400 MHz clockspeed. Double-pumped simply means data is transferred to the RDRAM on both the rising and falling edges of the clock, often referred to as double data rate (DDR), creating an effective 800 MHz memory rating. PC100 SDRAM is single data rate (SDR) and operates at a 100 MHz clockspeed; it can only transfer data on the rising edge of the clock, for an effective 100 MHz memory rating.
If we were to compare theoretical bandwidth without taking memory latency into consideration, we end up with the following:
PC800 RDRAM : 800 MHz x 2 Bytes = 1600 MB/s = 1.6 GB/s
PC100 SDRAM : 100 MHz x 8 Bytes = 800 MB/s = 0.8 GB/s
However, both the VIA chipset and 440BX support 133 MHz, or PC133, memory, though unofficially for the 440BX. If we look at the theoretical bandwidth of PC133 memory we end up with the following:
PC133 SDRAM : 133 MHz x 8 Bytes = 1064 MB/s = 1.064 GB/s
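These peak figures follow directly from multiplying the effective transfer rate by the bus width. As a quick sanity check, here is a small Python sketch reproducing them; the helper function and module table are my own illustration, not from any memory library:

```python
# Theoretical peak bandwidth = effective transfer rate (MHz) x bus width (bytes).
# The effective rate already includes RDRAM's double-pumping:
# a 400 MHz clock transferring on both edges gives 800 effective MHz.

def peak_bandwidth_mb_s(effective_mhz: float, bus_width_bytes: int) -> float:
    """Peak transfer rate in MB/s, ignoring latency entirely."""
    return effective_mhz * bus_width_bytes

modules = {
    "PC800 RDRAM": (800, 2),   # 16-bit bus, double data rate at 400 MHz
    "PC100 SDRAM": (100, 8),   # 64-bit bus, single data rate
    "PC133 SDRAM": (133, 8),   # 64-bit bus, single data rate
}

for name, (mhz, width) in modules.items():
    print(f"{name}: {peak_bandwidth_mb_s(mhz, width):.0f} MB/s")
```

Remember that these are best-case numbers; as discussed below, latency keeps real-world throughput well short of them.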
These numbers suggest performance well above what we have seen in real-world benchmarks. That is quite true, as they are theoretical figures that do not take memory latency into account, which makes a world of difference. It is a shame, however, to see these idealized numbers used to promote one architecture’s superiority over another.
As stated, theoretical bandwidth cannot be used alone to measure memory architecture superiority. Memory latency imposes too much of a penalty on actual memory bandwidth, and is different for every architecture. Therefore to be able to really determine architectural superiority these latencies must be accounted for.
To start with RDRAM: it is a memory architecture with a packet-based protocol, whose access latency depends on how far a given device resides from the memory controller. Although systems with multiple RDRAMs have slightly higher latencies than single-RDRAM systems, RDRAM latency is still roughly comparable to that of SDRAM systems. Moreover, the RDRAM protocol and architecture facilitate memory concurrency and minimize latency relative to SDRAM when multiple memory references are being serviced simultaneously. The number of RDRAMs does not affect peak bandwidth, and an RDRAM-based memory system provides twice the peak bandwidth of PC100 SDRAM. The 1.6 GB/s bandwidth of RDRAM is achieved with only a 16-bit data bus, and even including control signals, the memory controller needs only about one third of the I/O pins that SDRAM requires.
SDRAM uses a different approach; it is a parallel databus, 64 bits wide, and adding modules to the system has no effect on memory latency. In addition to the 64-bit databus, the memory controller must drive a multiplexed row and column address to the SDRAMs along with control signals.
RDRAM vs. SDRAM Performance
Memory performance is actually measured with two metrics: bandwidth and latency. Surprisingly, RDRAM does not only offer higher bandwidth; its latency is also better than SDRAM’s. What may be even more surprising is that PC133 SDRAM component latency is worse than PC100 SDRAM’s.
How is component latency defined? The accepted definition is the time from the moment the RAS (Row Address Strobe) is activated (ACT command sampled) to the moment the first data bit becomes valid. Synchronous device timing is always a multiple of the device clock period.
The fundamental latency of a DRAM is determined by the speed of the memory core. All SDRAMs use the same memory core technology, so all SDRAMs are subject to the same latency. Any differences in latency between SDRAM types are therefore only the result of the differences in the speed of their interfaces.
With its 400 MHz databus, the interface to an RDRAM operates with an extremely fine timing granularity of 1.25ns, resulting in a component latency of 38.75ns. The PC100 SDRAM interface runs with a coarse timing granularity of 10ns. Its interface timing matches the memory core timing very well, so its component latency ends up at 40ns. The PC133 SDRAM interface, with its timing granularity of 7.5ns, incurs a mismatch with the timing of the memory core that increases the component latency significantly, to 45ns.
The latency timing values can be computed easily from the device data sheets. For the PC100 and PC133 SDRAMs, the component latency is the sum of the tRCD and CL values. The RDRAM’s component latency is the sum of the tRCD and TCAC values, plus one half clock period for the data to become valid.
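As a worked example of that datasheet arithmetic, the sketch below reproduces the article’s component-latency figures. The specific cycle counts (tRCD, CL, tCAC) are illustrative splits I chose to match the totals quoted here; real values must come from the actual device datasheet:

```python
# Component latency per the text:
#   SDRAM: (tRCD + CL) cycles x clock period
#   RDRAM: (tRCD + tCAC) cycles x clock period, plus half a clock
#          for the data to become valid.
# Cycle counts below are assumptions chosen to reproduce the article's
# figures (40ns, 45ns, 38.75ns); check a real datasheet for actual values.

def sdram_latency_ns(t_rcd: int, cl: int, clock_ns: float) -> float:
    """SDRAM component latency: tRCD plus CAS latency, in clock periods."""
    return (t_rcd + cl) * clock_ns

def rdram_latency_ns(t_rcd: int, t_cac: int, clock_ns: float) -> float:
    """RDRAM component latency: tRCD plus tCAC, plus half a clock."""
    return (t_rcd + t_cac) * clock_ns + clock_ns / 2

print(sdram_latency_ns(2, 2, 10.0))   # PC100: 40.0 ns
print(sdram_latency_ns(3, 3, 7.5))    # PC133: 45.0 ns
print(rdram_latency_ns(7, 8, 2.5))    # PC800: 38.75 ns (400 MHz clock = 2.5 ns)
```

Note how PC133’s coarser fit between interface and core timing shows up directly: the same core needs six 7.5ns cycles (45ns) where PC100 needs only four 10ns cycles (40ns).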
Although component latency is an important factor in system performance, system latency is even more important, since it is system latency that determines overall performance. System latency is found by adding external address and data delays to the component latency. For PCs, system latency is measured as the time to return 32 bytes of data, a ‘cache line fill’, to the CPU.
In a system, SDRAMs suffer from what is known as the two-cycle addressing problem: the address must be driven for two clock cycles (20ns at 100 MHz) to give the signals time to settle on the SDRAM’s heavily loaded address bus. After the two-cycle address delay and the component delay, three more clocks are required to return the 32 bytes of data. System latency for PC100 and PC133 SDRAM thus adds five clocks to the component latency. The total SDRAM system latency is:
40 + (2 x 10) + (3 x 10) = 90ns for PC100 SDRAM
45 + (2 x 7.5) + (3 x 7.5) = 82.5ns for PC133 SDRAM
The superior electrical characteristics of RDRAM eliminate the two-cycle addressing problem, requiring only 10ns to drive the address to the RDRAM. The 32 bytes of data are transferred back to the CPU at 1.6 GB/s, which works out to 18.75ns. Adding in the component latency, the RDRAM system latency is:
38.75 + 10 + 18.75 = 67.5ns for PC800 RDRAM
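The same system-latency sums can be sketched in Python; the function names are mine, and the address/data delay figures are the ones quoted in the text above:

```python
# System latency = component latency + address delay + data-return time,
# using the article's figures for a 32-byte cache line fill.

def sdram_system_latency_ns(component_ns: float, clock_ns: float) -> float:
    """Two-cycle addressing plus three more clocks to return 32 bytes."""
    return component_ns + 2 * clock_ns + 3 * clock_ns

def rdram_system_latency_ns(component_ns: float,
                            addr_ns: float = 10.0,
                            data_ns: float = 18.75) -> float:
    """Single 10 ns address delay plus 32 bytes at 1.6 GB/s."""
    return component_ns + addr_ns + data_ns

print(sdram_system_latency_ns(40.0, 10.0))   # PC100: 90.0 ns
print(sdram_system_latency_ns(45.0, 7.5))    # PC133: 82.5 ns
print(rdram_system_latency_ns(38.75))        # PC800: 67.5 ns
```

The comparison makes the trade-off visible: SDRAM pays five extra clocks on every access, while RDRAM’s address delay is a single fixed 10ns.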
Measured at either the component or the system level, RDRAM has the lowest latency. Surprisingly, due to the mismatch between its interface and core timing, PC133 SDRAM component latency is actually higher than PC100’s, even though its faster clock still yields a lower system latency. RDRAM’s low latency, coupled with its 1.6 GB/s bandwidth, provides the highest possible sustained system performance.
From a performance point of view, we must note that L1 and L2 cache hits and misses contribute greatly to how a memory architecture performs. Individual programs also vary in their memory use and so stress the memory differently. For example, a program performing random database searches over a large chunk of memory will ‘thrash’ the caches, and the memory architecture with the lowest latency will have the advantage. On the other hand, large sequential memory transfers with little need for CPU processing can easily saturate SDRAM bandwidth; RDRAM will have the advantage here with its higher bandwidth. For code that fits nicely within the L1/L2 caches, memory type will have virtually no impact at all.
Intel has chosen to implement support for the RDRAM memory architecture in its i820/i840 and upcoming chipsets. Most people in the industry think they’ve bet on the wrong horse, as the promised performance benefits don’t show up in most of today’s benchmarks. But let’s be fair here; a couple of years ago we still used EDO RAM and SDRAM was something new and pretty expensive. In hindsight we’ve seen the benefits of using SDRAM and its impact on overall system performance. But if we look at the performance benefits SDRAM offered in its early days on applications that were then popular, it also didn’t seem to offer huge advantages. Still, we’ve come a long way, and SDRAM performance has indeed improved although the technology at first didn’t seem to promise that much of an improvement.
However, due to the growing demand for memory bandwidth, the arrival of GHz CPUs and the ever-growing demands of today’s software, SDRAM seems to have run into bandwidth limitations. While DDR SDRAM and VCDRAM might hold off the introduction of a new memory standard for a while, that introduction is inevitable. DDR SDRAM may promise increased memory bandwidth, but it will run into severe timing, latency and propagation-delay problems due to its wide databus and ever-increasing clockspeeds. The memory itself may be cheap to produce, but motherboards will then need six or even eight PCB layers to run these modules at such high data rates and clockspeeds, increasing motherboard costs substantially.
RDRAM is not perfect, but it is currently one of the most promising solutions to bandwidth, latency and propagation delay problems, and is scalable, a distinct advantage. It is expensive, but that’s partly because it’s new and the market has not caught on yet. Once more manufacturers start selling RDRAM and it becomes as commonplace as SDRAM now is, we will see its prices dropping, too. Due to the nature of the manufacturing process it will probably never be as affordable as SDRAM, but then again SDRAM doesn’t offer the same performance, which is what you’re actually paying for. Better technology usually comes at a price; you don’t expect your sub-$1000 PC to perform as well as a $5000 top-of-the-line model, do you?
Update Monday, May 8, 2000: In the follow up to this article you’ll find an in-depth analysis of Rambus’ RDRAM, what makes it tick, what the benefits are and above all we’ll look into its technological advantages and widespread misconceptions about them. We’ll round up this article with a broad range of real world benchmarks aimed at finding the advantages and disadvantages, checking the theory with practice. And naturally, the conclusion will summarize our findings and give a verdict on the SDRAM vs. RDRAM issue. The article can be found by following this link: