SDRAM vs. RDRAM, Facts and Fantasy

Our recent ‘Lies, Damned Lies and a Different Perspective’ article has raised quite a controversy. However, that article only skimmed the surface of the underlying issues; this article goes into greater detail about Rambus, SDRAM, RDRAM, the relationship between Intel and Rambus, and why both companies keep backing their technologies. If you haven’t read the original article yet, we’d really recommend you do so, as the foundations of the ongoing discussion are laid there. The article can be found by following this link:

Lies, Damned Lies and a Different Perspective

We’ll address most of the issues that generally seem to be considered ‘disadvantages’ of Rambus technology and investigate whether these are based on fact, or fantasy. At times we’ll go deeper into technical detail than some of you might be willing to follow, in order to prove a point or accurately describe a specific function. But overall we’ve tried to keep the technical details and analysis to a minimum and as basic and straightforward as possible. However, we’ve made sure that our benchmarking setup, the suite of benchmarks used, and our technical analysis are reproducible by anyone wishing to do so. We at HardwareCentral always aim to keep our benchmarks and reviews as objective as possible, and this SDRAM vs. RDRAM article is no exception.

In the following pages you’ll find an in-depth analysis of Rambus’ RDRAM, what makes it tick, what the benefits are and above all we’ll look into its technological advantages, as those will most likely determine whether RDRAM is worthwhile or not. The analysis will be followed by a discussion of benchmarks and why we’ve chosen to use this specific set, as well as an analysis on how each particular benchmark makes use of system resources.

We’ll round off this article with a broad range of real world benchmarks, aimed at exploring the technological advantages and disadvantages as pointed out in the theoretical analysis of the previous pages, checking theory with practice. And, naturally, the conclusion will summarize the results as well as give our verdict on SDRAM vs. RDRAM.

 

Rambus Direct RDRAM

Rambus’ latest offering, the Direct RDRAM, hereafter referred to as RDRAM, features an architecture and a protocol designed to achieve high effective bandwidth. The Rambus channel architecture has a single-device upgrade granularity, offering engineers the ability to balance performance requirements against system capacity and component count. The narrow, high-performance channel also offers performance and capacity scalability through the use of multiple channels in parallel. In addition, the validation program created by Intel and Rambus promotes system stability by ensuring that devices and modules conform to published specifications.

Although RDRAMs have a low pin count, a single device is capable of providing up to 1.6 GB/sec bandwidth. Memory systems that use RIMMs (RDRAM modules) employ a narrow, uniform-impedance transmission line, the Rambus Channel, to connect the memory controller to a set of RIMMs. Low pin count and uniform interconnection topology allow easy routing and reduction of pin count on the memory controller. While a single channel is capable of supplying 1.6 GB/sec of bandwidth, multiple channels can be used in parallel to increase this number. Systems that use, for example, the Intel 840 chipset have two parallel Rambus channels, and are able to handle up to 3.2 GB/sec.
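
As a quick sanity check on these figures, the peak numbers follow directly from the channel width and the data rate. The short Python sketch below is a back-of-the-envelope calculation only, not a description of any particular chipset:

# Peak bandwidth of one Rambus channel: 16 data pins, 400 MHz clock,
# data sampled on both clock edges (800 million transfers/sec per pin).
data_width_bits = 16
transfers_per_second = 800_000_000

peak_bandwidth = (data_width_bits // 8) * transfers_per_second
print(peak_bandwidth / 1e9)        # 1.6 GB/sec per channel

# Two channels in parallel, as on the Intel 840:
print(2 * peak_bandwidth / 1e9)    # 3.2 GB/sec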

Providing high bandwidth from a single device also allows memory systems to be constructed from small numbers of RDRAMs. The Sony PlayStation2 uses two RDRAM channels, each with a single RDRAM, to achieve a total of 3.2 GB/sec memory bandwidth.

In order to ensure stability of RDRAM memory systems, design guidelines and a validation program have been put in place that surpass the requirements set for previous memory technologies. Intel and Rambus have defined system specs to ensure robustness of RDRAMs and of the channel to the memory controller. In addition, they have created a rigorous validation program for certification of RDRAMs and RIMM modules.

Despite these features, there has been a considerable amount of misinformation regarding RDRAMs. The biggest misconceptions today allege high price premiums brought about by inherently higher manufacturing costs, lack of performance gain attributed to high latency, and high power consumption. Later we’ll delve deeper into these areas of contention. The next section covers the features of RDRAMs in more detail, followed by some of the misconceptions that surround the technology, a set of platform benchmarks, and finally a summary and conclusions.

 

Conventional Memory Systems

The diagram below illustrates a conventional SDRAM memory system in a PC. The memory controller is connected to multiple DIMM sockets through a 64-bit wide memory bus. Address and control signals are connected to the DIMM modules using a different topology than for the data bus, resulting in some signals being loaded differently than others. Row and Column addresses are transmitted on a shared set of address lines, with the memory controller scheduling this resource when multiple transactions are being serviced.

Today, DIMM modules commonly consist of eight 8-bit SDRAMs connected in parallel to compose the 64-bit data bus. All eight SDRAMs, together called a ‘rank’, operate in parallel, performing the same sequence of commands (RAS, CAS, etc.) required for reading and writing data. Some DIMMs have sixteen 8-bit SDRAMs, arranged as two ranks of eight devices each. One rank populates each face of the module, but only one rank can transmit or receive data at a time. The minimum number of devices in the memory system and the minimum upgrade granularity both depend on the width of the SDRAMs being used. With 8-bit SDRAMs, a minimum of eight devices is required to span the 64-bit data bus; this is also the minimum upgrade granularity of the memory system. With 16-bit SDRAMs, a minimum of four devices is required, which is likewise the minimum upgrade granularity. The clock speed of SDRAMs available in PC main memory systems today has reached 133 MHz (with address and data transmitted on one edge of the clock).
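
To make the granularity arithmetic explicit, here is a minimal Python sketch; it simply divides the bus width by the device width and multiplies the bus width by the clock rate, so the numbers are illustrative rather than vendor data:

# Devices needed to span a 64-bit data bus (also the minimum upgrade
# granularity), and the peak module bandwidth at PC100/PC133 clock speeds.
bus_width_bits = 64

for device_width in (8, 16):
    print(f"{device_width}-bit SDRAMs: {bus_width_bits // device_width} devices minimum")

for mhz in (100, 133):
    peak_bytes_per_sec = (bus_width_bits // 8) * mhz * 1_000_000
    print(f"{mhz} MHz: {peak_bytes_per_sec / 1e9:.2f} GB/sec peak")
# 100 MHz -> 0.8 GB/sec, 133 MHz -> ~1.06 GB/sec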

 

 

A Rambus memory system differs from today’s SDRAM-based memory systems in several ways. The memory controller is connected to the RIMM sockets through the Rambus Channel, a set of uniform-impedance transmission lines with a 16-bit data bus. The memory controller incorporates a narrow, high-speed interface called the Rambus ASIC Cell (or RAC) that communicates at high speeds across the channel to the RDRAMs. The RDRAMs also have a high-speed interface that allows them to send and receive data at the same speed as the RAC. The addition of the high-speed interface results in a small increase in die size, but also allows higher performance to be obtained. Address and data are routed in parallel on the Rambus Channel, resulting in uniform signal loading. In contrast to current SDRAMs, memory systems that use RDRAMs transmit addresses and data in a wave-pipelined manner on both edges of the clock.

 

 

The Rambus Channel is routed through the RIMM modules and is terminated on the motherboard after the last RIMM module. The Rambus channel requires that all sockets be populated with either a RIMM module or a C-RIMM (continuity module) so that the channel remains continuous to the termination resistors. RDRAMs are also 16 bits wide (some versions are 18 bits wide and use an 18-bit Rambus Channel), matching the width of the data path to the memory controller. Unlike SDRAM-based memory systems, in which multiple SDRAMs transmit in parallel across a 64-bit wide data bus, only one RDRAM transmits or receives data at a time across the 16-bit wide data bus. This means that RDRAM-based memory systems can be composed of a single RDRAM, and have a minimum upgrade granularity of a single device.

In addition, commands are passed along eight control wires routed in parallel to the data bus. The eight control wires are split into a Row control bus and a Column control bus. The Row control bus carries commands such as RAS and Refresh instructions, while the Column control bus carries commands such as CAS and write mask information. Splitting the control bus in this manner enhances transaction pipelining by allowing a RAS operation for one transaction to be specified at the same time as a CAS operation for a different transaction. Traditional technologies like SDRAMs require that Row and Column addresses be transmitted on the same set of address lines, resulting in a resource conflict when the memory system is placed under heavy load.

 

RDRAM Benefits

High Bandwidth

RDRAM memory systems use a high-speed bus and a low-swing signaling technology called RSL (Rambus Signaling Level) that allows data to be transferred at high speeds (up to 800 million samples/second) across each wire. The high-speed signaling is performed by dedicated interfaces on the memory controller and the RDRAM. These interfaces are not specific to computer memory systems, but can be placed on any type of chip to speed communication. In this way, Rambus’ signaling solution is really a general-purpose high-speed chip-to-chip interface rather than a solution for the DRAM market only. Recently, Rambus has announced design wins in another high bandwidth arena, computer networking.

Efficient Protocol

The high peak bandwidth of RDRAMs (1.6 GB/sec) is complemented by an efficient packet-based protocol and a high bank count that achieves high effective bandwidth. The benefits of the RDRAM protocol and architecture can be illustrated by comparing it to how data is accessed in memory systems that use conventional DRAMs like PC100 and PC133 SDRAMs. The following timing diagrams are for PC100 SDRAMs, but PC133 DRAMs are very similar.

The diagram illustrates two interesting points about SDRAM technology in the PC platform. The first is that in PC memory systems, unbuffered SDRAM DIMMs require ‘2-cycle addressing,’ in which Row and Column addresses occupy the address bus for two consecutive cycles. This is necessary due to the high capacitance of signal traces supporting multiple DIMM slots. Note that in some applications, such as graphics accelerators, where single devices are soldered down to the memory bus, 2-cycle addressing may not be needed. 2-cycle addressing is discussed in the following document.

Intel PC SDRAM Specification PDF

Another interesting point is that the location of data relative to read and write Column commands is different, resulting in a ‘bubble’ on the data bus whenever a write is followed by a read, or when a read is followed by a write.

Another source of performance loss is bank conflicts. Once a bank is accessed, there is a minimum amount of time that must pass before a different row of the same bank can be accessed. Bank conflicts result in a bubble that appears on the data bus, increasing latency and reducing bandwidth.

SDRAM Bank Conflicts

Although not shown, a further loss in performance occurs when back-to-back reads are serviced by different ranks of devices. In order to ensure that data is interpreted properly by the memory controller, a one-cycle bubble on the data bus must separate back-to-back data from different ranks of SDRAMs, again increasing latency and reducing bandwidth.

The following diagram shows how commands and data are scheduled on the Rambus channel. The transactions each read and write 32 bytes of data. Notice that the placement of data relative to Column addresses is similar for reads and writes. This means that very little bandwidth is lost when the memory controller transitions from writing data to reading data, and no loss occurs when transitioning from reading data to writing data.

The address lines of the Rambus channel are separated into two groups over which protocol commands are transmitted to the RDRAMs. The first set of address lines is used to specify Row information, while the second set of lines is used to specify Column information. This enhances transaction pipelining by allowing the Row address for one transaction to be transmitted at the same time as the Column address for another transaction. Traditional technologies such as SDRAMs specify Row and Column addresses over a shared set of wires. This can inhibit transaction concurrency, as conflicts can arise when scheduling the Row address for one transaction and the Column address for a different transaction.

Because RDRAMs and SDRAMs use a similar core technology, their timing characteristics are similar, and the size of the bubbles that arise due to bank conflicts is similar. However, bank conflicts will occur less often in RDRAM-based memory systems than in SDRAM-based memory systems due to the higher bank count of RDRAMs. Today, RDRAMs are shipping with 16 ‘doubled’ banks (neighboring banks cannot be accessed simultaneously, so an RDRAM can have up to 8 banks in use at a time), whereas SDRAMs used in DIMMs have 4 banks. The increased bank count of individual RDRAMs over SDRAMs is compounded at the system level by the differences in memory system architecture. Because eight SDRAMs (one rank) respond in unison to each read and write command, there are only 4 total banks for each rank of devices. In a memory system with a single-sided DIMM, there are only 4 banks. In a RIMM, each RDRAM acts independently (only one RDRAM responds to each read and write command), so for an equivalent-capacity RIMM (8 devices), there are 128 banks. This increase in bank count is important because bank conflicts cause large increases in latency and decreases in bandwidth. The larger bank count of RDRAM-based memory systems means that the probability of encountering a bank conflict is smaller with RDRAMs than with SDRAMs. This can have a profound impact on overall performance, especially in PC memory systems that use small numbers of memory modules.
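
To get a feel for why the bank count matters, consider a toy model in which each new request lands on a uniformly random bank and collides only if that bank is already busy. The sketch below uses that simplified uniform-access assumption, and the choice of two busy banks is arbitrary, so treat the percentages as illustrative only:

# Toy model: chance that a new request hits a bank that is already busy,
# assuming accesses are spread uniformly over all banks.
def conflict_probability(total_banks, busy_banks):
    return busy_banks / total_banks

for name, banks in (("single-sided DIMM (4 banks)", 4),
                    ("8-device RIMM (128 banks)", 128)):
    p = conflict_probability(banks, busy_banks=2)
    print(f"{name}: {p:.1%} chance of a bank conflict")
# single-sided DIMM (4 banks): 50.0%
# 8-device RIMM (128 banks): 1.6%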

Another feature of RDRAMs is that they employ a different I/O driver than SDRAMs. The RDRAM I/O driver eliminates the need for a bubble on the data bus when back-to-back reads are directed to different devices, reducing latency and increasing bandwidth.

 

Reducing System Cost

The high per-pin and per-device bandwidths of RDRAMs have several system-level advantages. Providing high bandwidth from a small number of pins means fewer bus traces to route. In addition, the fact that the Rambus channel is routed in parallel means simpler routing and reduced complexity of motherboard designs. Often, simplified routing can result in fewer motherboard layers, reducing manufacturing costs as well.

The high per-device bandwidth of RDRAM, along with the narrow channel, allows memory systems to be composed of a small number of RDRAMs. For example, the Sony PlayStation2 uses just two Rambus channels in parallel, each with a single RDRAM, to achieve a peak memory bandwidth of 3.2 GB/sec. Using conventional memory technology, the same bandwidth would require 32 8-bit PC100 SDRAMs in parallel. Two unwanted side effects of achieving bandwidth in this manner are that memory capacity and the number of DRAMs in the memory system increase. The PlayStation2, however, does not require high memory capacity, and the increased number of DRAMs would increase overall system cost, motherboard size, and routing complexity. The high per-device bandwidth of RDRAMs allows critical system parameters like component cost, memory capacity, memory bandwidth, and motherboard size to be optimized to achieve high performance that is also cost-effective.
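
The 32-device figure is simple arithmetic: an 8-bit PC100 SDRAM peaks at 100 MB/sec, so matching 3.2 GB/sec takes 32 of them, whereas two 16-bit RDRAMs at 800 million transfers/sec suffice. A minimal sketch of that comparison (peak rates only, ignoring protocol overhead):

# Devices needed to reach a 3.2 GB/sec peak, by technology.
target_bytes_per_sec = 3.2e9

rdram_per_device = 2 * 800e6     # 16-bit RDRAM at 800 MT/s -> 1.6 GB/sec
pc100_per_device = 1 * 100e6     # 8-bit PC100 SDRAM at 100 MHz -> 100 MB/sec

print(target_bytes_per_sec / rdram_per_device)   # 2.0 RDRAMs (one per channel)
print(target_bytes_per_sec / pc100_per_device)   # 32.0 PC100 SDRAMs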

The Rambus channel design also has an advantage in minimum upgrade granularity. Conventional memory systems based on SDRAMs, discussed earlier, require that DIMMs have eight 8-bit (or four 16-bit) SDRAMs to span the 64-bit bus. However, because a single RDRAM spans all 16 bits of the data bus and can provide 1.6 GB/sec of bandwidth, RIMMs can have as few as one RDRAM. In fact, RIMMs can have any number of RDRAMs, up to the maximum 32-device capacity of one Rambus channel.

This granularity advantage allows upgrades of RDRAM-based memory systems to be tailored to different segments of the PC market. High-performance PCs have the highest capacity requirements, followed by mainstream desktops and finally value PCs. Price sensitivity is typically reversed for these three categories, with value PCs being the most cost-conscious, followed by mainstream desktops, and finally high-performance PCs. In traditional SDRAM memory systems, the minimum upgrade granularity is eight devices if 8-bit SDRAMs are used, or four devices if 16-bit SDRAMs are used, independent of the market segment addressed. Hence, the minimum upgrade cost is the same for all three market segments. RDRAMs offer a more cost-effective alternative. Since RIMMs can have anywhere from 1 to 32 RDRAMs on them, different RIMM capacities can be created to address different market segments.

As a result, performance PCs and desktops may choose to use RIMMs with 8 or 16 RDRAMs, while value PCs may choose to use four-device RIMMs. The single-device upgrade granularity inherent in the Rambus channel design allows OEMs and system manufacturers to tailor initial system capacity and minimum upgrade cost based on PC market segment and supply and demand constraints. In times of DRAM shortages, this can be especially important, as component costs can rise dramatically based on availability. For good examples of just how volatile DRAM prices can be, recall the dramatic rise in DRAM prices in the Fall of 1999 and the high DRAM prices of the mid-1990s.

 

RDRAM Pricing

Price is undoubtedly the most controversial aspect of RDRAMs today. Many articles have been written citing the high price of RDRAM relative to SDRAM in the retail market. The high price is often attributed to factors including low yields, large die sizes, the cost of new equipment, and royalties charged by Rambus. However, several of these explanations have recently been called into question. An interesting point is that the price premium for RIMMs in the retail channel is higher than the premium for RIMMs purchased from the major OEMs. The Dell website, for example, allows prospective buyers to configure systems with varying amounts of RDRAM. The cost of upgrading a system to include an additional 128 MB RIMM module is much lower than for purchases made through retail channels, and has even been dropping recently. This calls into question the exact cause(s) of RDRAM price premiums. Central to this controversy is the difference between price (what a consumer pays for RIMMs) and cost (what it costs a manufacturer to produce RIMMs).

A recent article in PC World states that the price premium OEMs pay for RIMMs is much lower than the premiums in the retail channels, suggesting that the inherent costs of RDRAM are much lower than initially reported. While it is true that the high-speed interface on RDRAMs results in die sizes that are larger than comparable SDRAMs, the pricing information in this article indicates that yields are not unusually low, and that RDRAM dies are not dramatically larger. If either of these were true, then one would expect RDRAM manufacturing costs to be much, much higher. In turn, these costs would be passed on to the OEMs, who are the largest consumers of RDRAMs today.

The article also points out that development costs are being recouped, and that there are opportunity costs associated with producing RDRAMs instead of SDRAMs. This was true of the transition from EDO to SDRAM memory, and is true again today. Any new technology has lower yields than the incumbent technology, and has development costs that must be recovered, so one would expect some premium to be paid initially for RDRAMs. But this article suggests that the cost of manufacturing RDRAMs is not the reason for the RDRAM price premiums in the retail market.

 

As an aside, it has been rumored that RDRAMs are first tested when they are placed on RIMMs, and that low device yields mean that the entire RIMM must be thrown away if one bad device is found. This is not the case. RDRAMs are tested in much the same manner as SDRAMs, with functionality being verified before devices are assembled onto RIMMs. The high operating speed of RDRAMs, which exceeds the operating speeds of previous DRAMs, means that new high-speed testers are needed to test some functionality. While there is an investment for this new capital equipment (which must be recovered), it has the advantage of being able to test RDRAMs at full speed, meaning that test time per RDRAM is reduced and test throughput is increased. Agilent describes this advantage in a recent press release.

Another misconception is that high RDRAM prices are due to the royalty charged by Rambus. Analysts estimate the royalty Rambus receives to be about 2% for each RDRAM sold. This is very small compared to price premiums seen in the retail channels today, and is thus not a major contributor to these premiums.

Even with all these considerations, the pricing information in the PC World article indicates that the price premium in the retail channels is not at all representative of the inherent manufacturing costs of RDRAM versus SDRAM. Furthermore, if RDRAMs were very expensive to produce, then platforms like the PlayStation2 would be at a severe disadvantage, as a large fraction of the cost of the PlayStation2 would be devoted to the two RDRAMs. This is unreasonable given all the other features packed into the PlayStation2.

So if manufacturing costs and royalties are not the culprit, why are RIMM price premiums high compared to DIMMs in the retail market? Certainly there are some development costs to recoup, but the belief that RDRAM technology is inherently much more expensive than SDRAM technology is overblown. The real explanation is simply that demand is exceeding supply. When prices are negotiated, the major OEMs get the best prices because they sell the most RIMMs. In the retail channel, only small quantities of already scarce RIMMs are allocated to retailers, resulting in prices that are higher than OEM prices. A good example of the sensitivity of the DRAM market to supply and demand is the sudden jump in SDRAM prices in the Fall of 1999. Shortly after a set of earthquakes in Taiwan, SDRAM prices began to jump sharply due to demand exceeding supply. Over a short period of time, SDRAM prices rose dramatically to levels more than double their earlier price. There was nothing inherent in the manufacturing process that suddenly caused manufacturing costs to increase sharply. Rather, the price increase was brought about simply by a reduction in supply.

So what is the supply situation with RIMMs? At the beginning of this year, Samsung and Toshiba were the only volume manufacturers of RDRAMs. Toshiba’s output was dedicated to the PlayStation2, and hence didn’t impact the PC market. Since the beginning of the year, RDRAMs and RIMMs from NEC, Hyundai, and Infineon have been validated, and all three are beginning high-volume production. The entire process of producing a DRAM takes several months, so it is expected that by the third quarter of 2000 RDRAM supply will increase sharply. In addition, Toshiba has announced that it will increase production and begin selling into the PC market. Finally, Samsung has announced that it is increasing production to meet increased demand. It is expected that the increased supply will feed a growing demand from OEMs, and that some will find its way into the retail channels as well, which should help prices drop in both channels.

 

RDRAM Performance

Many independent reviewers have benchmarked RDRAM systems and compared them to SDRAM systems. Some benchmarks show little, if any, performance advantage for initial RDRAM-based systems. This is often attributed to a ‘high latency’ inherent to RDRAMs. But what is the latency for RDRAMs, and how does it compare to SDRAM latency?

The answer, not surprisingly, is that access latency depends on several factors, including system architecture and how busy the memory system is. The DRAMs themselves account for only a part of total latency. When the CPU requests data, the request is transmitted across the front side bus and is processed by the memory controller. The memory controller issues the commands necessary to retrieve data from the DRAMs, and the data is passed through the memory controller, across the front side bus, and finally back to the CPU. Differences in chipset architecture, for example, can cause differences in memory latency in systems that use the same DRAM technology.
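
In other words, the latency the CPU sees is the sum of several components, of which the DRAM core is only one. The sketch below adds up a purely illustrative latency budget; the individual numbers are assumptions chosen for the example, not measurements of any chipset:

# Illustrative latency budget for a single isolated read, in nanoseconds.
latency_ns = {
    "front side bus (request)": 10,
    "memory controller (command)": 20,
    "DRAM core access": 40,
    "memory controller (data)": 20,
    "front side bus (data)": 10,
}
print(sum(latency_ns.values()))   # 100 ns end-to-end in this made-up example
# The DRAM itself accounts for only 40 of the 100 ns here, which is why two
# systems using the same DRAM technology can still show different latencies.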

At the component level, datasheets for both SDRAMs and RDRAMs indicate that access latencies are similar. This is not surprising, since both use similar DRAM cores. But the component-level timings in the datasheets are not the entire story. Most SDRAM datasheets show timings for single devices, but once the devices are placed on DIMMs, system considerations change the access latencies. SDRAM motherboards allow for multiple DIMM sockets, but with unbuffered DIMMs the address bus settling time requires that Row and Column addresses be held for two consecutive clock cycles, the ‘2-cycle addressing’ mentioned above.

In addition, back-to-back reads to different ranks of devices (which can occur when a double-sided DIMM is used, or when multiple DIMMs are used) require a single cycle bubble on the data bus to ensure that data can be interpreted correctly at the memory controller. This increases latency and reduces bandwidth. Sample timing diagrams for SDRAMs were shown earlier.

In RDRAM-based memory systems, the memory bus can be multiple clock cycles in length, with the memory controller sending commands to the RDRAMs in a wave-pipelined manner. Addresses do not have to be held for consecutive clock cycles as in SDRAM-based PC memory systems, but the flight time of the address and data must be taken into account when determining memory latencies in RDRAM-based memory systems. Because the Rambus channel is routed through the RIMMs, the channel is longer when both RIMM slots are populated than if only the first RIMM slot is populated. From the memory controller’s point of view, managing the placement of addresses and data on the channel would be complex if different DRAMs have different access latencies. Instead, the Rambus channel is ‘levelized’ so that all RDRAMs have the same latency from the memory controller’s point of view. This is done during initialization by programming the devices closest to the memory controller with a delay that causes data to be returned to the memory controller with a timing that matches the devices furthest from it. Levelizing a fully populated Rambus channel adds a few bus clock cycles (2.5 ns clock cycles for PC800 RDRAMs) to access latency.
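
The cost of levelization is easy to bound: at PC800 speeds each channel clock is 2.5 ns, so even a handful of added cycles amounts to only a few nanoseconds. A minimal illustration, where the cycle count is an assumed example rather than a measured value:

# PC800 Rambus channel clock: 400 MHz -> 2.5 ns per cycle.
clock_period_ns = 1e9 / 400e6

added_cycles = 4   # assumed, for illustration only
print(added_cycles * clock_period_ns)   # 10.0 ns added to access latency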

 

The I/O driver structure of RDRAMs is different from that of SDRAMs, and RDRAMs do not require that back-to-back reads to different devices incur a gap on the memory bus. Sample timing diagrams for RDRAMs illustrating this were shown earlier.

An important factor affecting latency is the utilization of the memory system. The latency of an ‘isolated’ read transaction, serviced while the memory system is handling no other transactions, differs from the latency seen when other transactions are being processed concurrently. When 2-cycle addressing and levelization are factored in, the system latency for an isolated read is comparable for SDRAM-based and RDRAM-based memory systems.

However, when the memory system is under higher load the answer can be quite different. When one or more additional transactions are being serviced, factors such as bank conflicts and address bandwidth become important issues. The higher bank count of RIMMs versus DIMMs means that the probability of bank conflicts occurring is much lower. Therefore, the high latency and bandwidth penalties associated with bank conflicts occur far less often in RDRAM-based systems than in SDRAM-based systems. Furthermore, as illustrated in the previous timing diagrams, the need for 2-cycle addressing on an address bus used to specify both Row and Column addresses means that the address bus may not be available to start a subsequent transaction in SDRAM-based memory. During periods of higher memory utilization, when more than one request is sent to the memory controller, some memory requests may be delayed waiting for the address bus. For these reasons, under higher loads RDRAM-based memory can be much more efficient, achieving lower latency and higher bandwidth than SDRAM.

So why don’t RDRAM-based platforms substantially outperform SDRAM-based platforms on some of today’s benchmarks? The answer is that most benchmarks today do not utilize much memory bandwidth. Office applications and even most games are designed to run well on machines that are several generations behind the cutting edge in order to address the largest possible market. That is, they are designed to run well on platforms with older CPUs and memory (166 MHz Pentiums, for example). The memory technology for Pentium processors was EDO, with less memory bandwidth than SDRAM or RDRAM. If such programs really required high levels of bandwidth from the memory system, they would run well on SDRAM-based and RDRAM-based platforms but poorly on EDO-based platforms, ruling out a large number of potential PC market sales. Benchmarks based on these types of programs thus show almost identical SDRAM and RDRAM performance; in fact, EDO-based systems would probably perform just as well on some of them.

Even though memory bandwidth requirements are typically low in most programs today, memory requests are not evenly spaced in time. Rather, memory requests are typically ‘bursty’ in nature, tending to appear in groups followed by quiet periods with no requests. So although the average memory bandwidth required by a program may be low, the bandwidth requested during periods of bursty activity can be much higher. The high bank count and separate Row and Column address buses make RDRAMs much better suited to bursty communication than other memory technologies.

 

System Performance

Two technology trends should increase the gap in performance between systems that use SDRAM and RDRAM. The first is that faster, more advanced processors will increase the importance of memory systems in overall performance. For a given program, as CPUs get faster the amount of time the CPU takes to do its computation drops, but the time it spends idle waiting for data from the memory hierarchy stays about the same. Overall execution time (the sum of CPU busy time and CPU idle time) drops, but the fraction of time spent idle waiting for the memory hierarchy increases. In the future, as CPU clock speeds continue to rise much faster than memory system clock speeds, the aggressive use of techniques such as prefetching, predication, out-of-order execution, branch prediction, and speculative execution will require significantly higher memory bandwidths in order to better tolerate memory system latency.
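
A quick worked example makes this concrete. Assume, purely for illustration, that a unit of work costs 80 ns of CPU-busy time and 20 ns of memory-stall time; speeding up the CPU shrinks the first term but leaves the second untouched, so the fraction of time spent waiting on memory grows:

# Illustrative numbers only: CPU-busy vs. memory-stall time per unit of work.
cpu_time_ns, stall_time_ns = 80.0, 20.0

for speedup in (1, 2, 4):
    busy = cpu_time_ns / speedup
    total = busy + stall_time_ns
    print(f"{speedup}x CPU: {total:.0f} ns total, "
          f"{stall_time_ns / total:.0%} spent waiting on memory")
# 1x: 100 ns, 20% waiting; 2x: 60 ns, 33%; 4x: 40 ns, 50%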

A second important trend is that as hardware evolves, software inevitably evolves as well. Even office applications like Word and Excel stress CPUs and memory systems more today than previous versions did five years ago. Advances in CPU and memory technology allow incorporation of features like dynamic spelling and grammar checking, multimedia, and interactive graphic utilities. At the time of the transition from EDO to SDRAM, several studies showed almost no performance benefit for SDRAM-based systems on benchmarks prominent at the time. However, faster CPUs and the benefits of SDRAM technology allowed software to become more ambitious. Running today’s software on EDO-based machines and SDRAM-based machines produces more noticeable benchmarking differences than did applications designed four years ago. This will be true again as PC main memory transitions from SDRAM to RDRAM, allowing software writers to exploit new features that enhance productivity and allow products to distinguish themselves from previous versions and the offerings of competitors.

These trends (especially increasing processor speeds) point to the need for increased processor bandwidth. Today, as processor speeds reach 1 GHz, we are starting to see definite performance advantages for RDRAM-based platforms, as evidenced by this review from PC Magazine.

Of course, in some of the more bandwidth-intensive applications like AutoCAD and visualization benchmarks, RDRAM-based platforms outperform SDRAM-based platforms by wider margins at processor speeds below 1 GHz.

System performance can also be affected by many BIOS settings, and RDRAM-based motherboards are no exception. In fact, there are several such settings for the 820 chipset. One group that can have a dramatic impact on performance relates to ‘device pools.’ RDRAMs were designed to be used in many environments, including workstations, desktops, and portables. The portable market in particular is sensitive to the power consumption of all components, as it affects battery life; desktops and workstations are less sensitive. RDRAMs have four power states that balance power consumption and access latency to meet the needs of all three of these markets. The lowest-latency states are called Active and Standby, while the lowest power-consuming states are called Nap and Powerdown. Active and Standby consume more power than Nap and Powerdown, but the latter two states have higher access latencies.

RDRAM-based motherboards that use Intel’s 820 chipset allow RDRAMs to be split into multiple pools (A,B), with all devices in a pool being placed in similar power states. Pool A devices can be placed in Active or Standby, and Pool B devices can be placed into Standby or Nap. Many motherboards allow BIOS programmability of Pool B devices. In order to obtain the highest performance, the maximum number of devices should be placed in the Active state in Pool A, and Pool B devices should be set to ‘Standby.’

While allowing Pool B devices to be placed into Standby instead of Nap increases power consumption, it is a common misconception that this will readily cause the RDRAMs to overheat, necessitating special cooling. Systems from Dell, for example, do not ship with any special RDRAM cooling, yet achieve some of the best benchmark scores reported to date by reviewers such as PC Magazine and Maximum PC. Some BIOSes default to placing Pool B devices into Nap instead of Standby, which unnecessarily hinders the performance of RDRAM-based platforms. In the next section, the power consumption of RDRAMs will be explored further.

 

RDRAM Power Consumption

Another contention is that RIMMs have high power consumption. A point worth repeating is that systems from major OEMs like Dell achieve some of the highest performance levels without special cooling. This contradicts the notion that RIMMs have high power dissipation. While it is true that an individual RDRAM can consume more power than an individual SDRAM, at a module level (DIMM versus RIMM), RIMM modules can consume much less power while providing higher bandwidth than equivalent-capacity SDRAM DIMM modules. The reason is straightforward, and relates to how bandwidth is provided by DIMM and RIMM modules. On a DIMM, eight SDRAMs respond to each request to provide the data requested by the CPU, so eight devices in the same rank are all dissipating the same amount of power. On an equivalent RIMM module with eight devices, only one RDRAM is providing data; the other seven devices are in lower-power states. On DIMM modules, power is thus evenly spread among all devices in the same rank. On RIMM modules, power can be localized in one RDRAM, with others at much lower power levels. At maximum module bandwidth the power dissipated by the eight SDRAMs exceeds the total power dissipated by the one active RDRAM and seven other RDRAMs in lower-power states. An additional difference is that SDRAMs dissipate I/O power whether transmitting 1’s or 0’s, while RDRAMs do so only when transmitting 1’s.

Micron Technology recently presented a power analysis at the Platform 2000 conference. In this analysis, they computed ‘maximum’ and ‘typical’ power consumption for modules using PC133 SDRAM, DDR SDRAM, and RDRAM. Maximum power consumption is important for determining power delivery constraints and worst-case cooling requirements. This study concludes that PC133 DIMMs dissipate a maximum power of 11.6 Watts while providing 1.1 GB/sec of bandwidth, while DDR DIMMs dissipate 9.1 Watts while providing 2.1 GB/sec. The maximum power RIMMs dissipate is 4.6 Watts, while providing 1.6 GB/sec. These results clearly indicate that RDRAM provides more Bandwidth per Watt than the alternatives. More details about Micron’s presentation can be found here:

Micron’s Memory Module Power Consumption Presentation
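
Normalizing Micron’s maximum-power figures by the bandwidth each module provides makes the comparison explicit; the snippet below simply redoes that division on the numbers quoted above:

# Micron Platform 2000 figures quoted above: (peak bandwidth GB/sec, max power W).
modules = {
    "PC133 DIMM": (1.1, 11.6),
    "DDR DIMM": (2.1, 9.1),
    "RIMM": (1.6, 4.6),
}
for name, (gb_per_sec, watts) in modules.items():
    print(f"{name}: {gb_per_sec / watts:.2f} GB/sec per Watt")
# PC133 DIMM: 0.09, DDR DIMM: 0.23, RIMM: 0.35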

RIMM modules use an aluminum heatspreader to cover the RDRAMs, and some have argued this is evidence of the high power dissipation of RIMM modules. In reality, the heatspreader has two functions. The first, as its name implies, is to spread heat across the entire surface of the module when power is localized in one device. The flat surface also indicates that the power dissipated by a RIMM module is not all that high; otherwise the heatspreader would need to look like a more traditional heatsink (i.e. lots of fins to maximize surface area for heat dissipation). The second function of the heatspreader is simply to provide mechanical protection for the RDRAMs during shipping and installation.

 

Benchmark Applications

There are many ways to conduct a benchmark, and there are even more ways to interpret its results. Choosing the wrong benchmark for a certain product can greatly influence the results. Most benchmarks used by manufacturers to demonstrate a specific product’s performance are tailored to make efficient use of its features; they generally give an estimate of how it performs in relation to others, but more importantly they show the maximum performance it’s capable of delivering.

And that’s exactly the sore spot, because ‘capable of delivering’ and ‘actually delivering’ are two entirely different things, not to be confused. Real world application is an entirely different story than a controlled benchmark environment. Even the fastest CPU or system can be crippled by an ill-configured setup or by software not tailored to make use of its features. Thus, to determine a system’s real world performance, simple synthetic benchmarks will not do; the benchmarks must simulate a real world environment to estimate real world performance.

Furthermore, if you know how to ‘benchmark’ a product, it is very easy to make one come out on top by focusing on the performance-enhancing features of one and neglecting those of the other. To give an example, Apple’s G4 CPU was touted as offering ‘supercomputer performance on the desktop’, but after close examination of the benchmarks and SIMD optimizations used, it is very clear that it only does so under certain conditions. If another benchmark had been used, one that did not include the SIMD optimizations, the outcome would have been entirely different.

So nothing new here; as the saying goes, ‘Lies, damn lies and benchmarks.’ To do away with some of the obvious pitfalls, we’ve decided to use as balanced a set of benchmarks for this article as possible. We’ll use synthetic benchmarks to provide an insight into a system’s maximum performance, plus two real world benchmarks, one focused on business and desktop applications, the other on multimedia and games, to measure real-world operation.

The synthetic benchmark is SiSoft Sandra 2000, which measures CPU, FPU, memory and multimedia performance. BAPCo’s SYSmark 2000 will be used to determine business and desktop application performance; it actually consists of a whole slew of popular applications ranging from 3D modeling to word processing. Finally, id Software’s Quake III Arena will measure multimedia and game performance, as it is the most demanding 3D game currently available; any misconfiguration or bottleneck will bring performance to a grinding halt.

 

BAPco SYSmark 2000

Overview

Before we lay out the configurations of the systems used in benchmarking, let’s take a closer look at BAPco’s SYSmark 2000 and how exactly it works, and why we feel it is one of the most objective and comprehensive benchmarks available.

SYSmark 2000 is the latest release in the SYSmark family of benchmarking products designed and developed by the members of Business Applications Performance Corporation (BAPCo). In 1992, BAPCo developed the concept of application-based benchmarking using popular business software. Ever since the inception of SYSmark 92, others have been developed using the same concept. BAPCo’s SYSmark 2000 adheres to the concept by offering a new suite of application-based benchmarks consisting of today’s popular business applications. Particular emphasis has been given to Internet-related operations. It is a suite of twelve application-based tests to accurately evaluate and characterize the performance of a computer system. The advantage of using it lies in the weighting of its workloads, which reflect those of real applications. It allows comparisons between computer systems based on applications running on Windows 2000, NT 4.0, 98, and 95.

Benchmark Structure

SYSmark 2000 contains twelve application workloads and a ‘workload manager’ application responsible for setting them up, timing their execution, and reporting performance results. Each workload consists of a real application (for example, Adobe Photoshop) and a test script that sends commands to the application.

They are divided into two categories: the Office Productivity category contains CorelDRAW 9, Microsoft Excel 2000, Dragon Systems NaturallySpeaking Preferred 4.0, Netscape Communicator 4.61, Corel Paradox 9, Microsoft PowerPoint 2000, and Microsoft Word 2000. The Internet Content Creation category contains MetaCreations Bryce 4, Avid Elastic Reality 3.1, Adobe Photoshop 5.5, Adobe Premiere 5.1, and Microsoft Windows Media Encoder 4.0.

Rating Methodology

After SYSmark 2000 is run, it assigns the system a performance rating for each application, a rating for each category, and an overall rating. Application ratings are based on a comparison with a fixed calibration platform. A rating of 100 indicates the test system has performance equal to that of the calibration platform, a rating of 200 indicates twice the performance, etc. Each category rating is simply a geometric mean of the workload ratings in the category. The overall rating is a weighted geometric mean of the category ratings. The SYSmark 2000 calibration platform has the following configuration:

CPU : Intel Pentium III 450 MHz at 100 MHz FSB
Motherboard : Based on the Intel i440BX chipset
Memory : 128 MB PC100 SDRAM
Videocard : Diamond Viper V770 Ultra 32 MB
Harddisk : IBM DJNA 371800
Operating System : Windows 98 SE, typical installation
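
The rating arithmetic described above is easy to reproduce. The sketch below computes a category score and an overall score from made-up workload ratings; BAPCo’s actual category weights are not reproduced here, so equal weights are assumed purely for illustration:

import math

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical workload ratings for one category (100 = calibration platform).
office_ratings = [150, 160, 140, 155, 165, 145, 158]
office_score = geometric_mean(office_ratings)

# Overall rating: weighted geometric mean of the category scores
# (equal weights assumed here for illustration).
categories = {"Office Productivity": office_score,
              "Internet Content Creation": 170.0}
overall = math.exp(sum(0.5 * math.log(score) for score in categories.values()))
print(round(office_score), round(overall))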

Office Productivity

Corel CorelDraw 9.0

This script first takes an abstract design and applies a Corel ‘Art Stroke’ to it and then runs various filter effects (spheroid, charcoal, etc). It then creates and manipulates a scene composed of vector graphics. Next, it takes a raster image and applies several effects (add noise, blur, etc). It also creates several 3D objects and performs various 3D manipulations. Finally, it collates several of the images it created and publishes them as a web site.

Corel Paradox 9.0

Paradox imports a large text file and creates a database table (tens of thousands of entries). It does some SQL-style queries on this table, and also runs a find duplicates query. Next, it continues to import several other text files, formatting them and then exporting each to HTML. Then, it opens up some query forms and enters some more data and produces reports based on queries. Each of these reports is exported to HTML.

Microsoft Word 2000

The Microsoft Word 2000 workload invokes a range of word processing functions including editing, spell checking, search and replace, font change, copy and paste text, print preview, merge mail fields, insert hyperlinks, background and table formatting and opening and viewing HTML pages.

Microsoft Excel 2000

Operations in the Excel workload include closing and opening spreadsheets, HTML pages and data in text files, spell checks, editing, formula calculation, plotting data in chart or histogram, formatting charts and cells, analyzing data in Pivot tables and naming a cell and inserting hyperlinks.

Microsoft PowerPoint 2000

Operations in the PowerPoint workload include closing and opening PowerPoint slides and HTML pages, spell checking, editing, formatting and moving pictures, applying templates, formatting tables in slides, inserting hyperlinks, applying header and footer information, formatting and rotating charts, applying graphic and sound effects and adding movie files.

Dragon NaturallySpeaking Preferred 4.0

Dragon NaturallySpeaking Preferred v4.0 is a continuous speech recognition application that converts speech into text. The script plays a pre-recorded wave file (a recorded speech) using Dragon’s PlayWave utility. The utility feeds this wave file into NaturallySpeaking which then converts the wave file to text.

Netscape Communicator 4.61

The Netscape Communicator script simulates a user loading, viewing and navigating common web pages. First, it opens up a web site of Shakespeare plays and selects and loads the entire texts several times. Then, the script loads a page consisting of large tables, thumbnails and images, and cycles through viewing the images. Next, the script loads many pages that contain mixes of graphics, tables, and text and does a lot of text searching through these web pages.

Internet Content Creation

Adobe Premiere 5.1

This script composes several pictures, video clips and audio clips into a movie. It creates an animation of approximately 16 BMP files and some AVI clips, and puts various transitions between them (like scrolling text, fade outs, etc). It also superimposes two audio tracks and runs the audio through several filters, such as reverb effects, etc. The final video is a compressed AVI file.

Adobe Photoshop 5.5

The script uses the following operations: load, resize, zooming out images, applying a number of filters to the images, changing the mode and color settings of images, adjusting image brightness and contrast and saving the image to a JPEG file that is optimized for web use.

Avid Elastic Reality 3.1

Avid Elastic Reality 3.1 is an image processing application used to create ‘morphs’ between images. The Elastic Reality workload sets up and renders a morph between two MPEG2-sized images (720×480 pixels, roughly 750 kB). The workload has three phases: image loading, morph definition, and morph rendering. The rendering phase takes the majority of the workload run time.

MetaCreations Bryce 4

Bryce 4.0 is a ray tracing application where the user can create still and animated scenes, depicting real or fantasy objects and terrain. Those wire frame scenes are rendered, with realistic effects of light and shadow, as well as realistic interaction in animated scenes between the materials used in the objects and terrain. The Bryce 4.0 script opens an assembled wire frame scene and renders it to the final image. Once completed, the script opens a new image and saves it in Metastream video format, suitable for use as streaming video on the Internet. After saving in Metastream format, the script opens another image and saves it in HTML format, for use in Web pages.

Microsoft Windows Media Encoder 4.0

Media Encoder encodes audio and video content into an Advanced Streaming Format (ASF) stream. The output from Media Encoder is a stream of information that can be heard or viewed with Microsoft Windows Media Player, or sent to a server for multicasting, unicasting, or storage. The input file is an AVI clip, which is encoded using the MPEG-4 video codec.

 

Benchmark Setup

We’ll be using an Intel 440BX chipset platform, a VIA 694X Apollo Pro 133A platform and an Intel i820 chipset platform to determine the performance of the SDRAMs and RDRAMs. All platforms will be using the exact same hardware configuration and hardware drivers, except for the drivers needed by the chipset or busmaster/IDE controller. All platforms will run at their rated, official clockspeed, which means the Intel 440BX chipset will run at a 100 MHz FSB, while the VIA 694X Apollo Pro 133A and the Intel i820 will run at a 133 MHz FSB. Furthermore, the PC133 SDRAM will run with 2-2-2 timings at the 100 MHz FSB on the Intel 440BX and with 3-2-2 timings at the 133 MHz FSB on the VIA 694X, while the PC800 RDRAM will run at 400 MHz with ECC enabled on the i820.

System Configuration Intel 440BX

CPU : Intel Pentium III ES at 600 or 800 MHz with a 100 MHz FSB
Motherboard : Soyo SY-6BA+IV Intel i440BX BIOS v2BA1
Memory : Samsung 128 MB GA PC133 SDRAM CAS3
Videocard : Elsa ERAZOR X2 GeForce DDR 32 MB BIOS v7.05.00
Harddisk : Quantum Fireball Plus KA 13.6GB ATA-66
CDROM : Samsung SCR-3231 32X IDE-ATAPI
Floppy : Generic 1.44 MB
Case : Generic Midi-ATX
Powersupply : Enlight 250W AMD-Approved

Operating System : Windows98 SE, typical installation
DirectX Drivers : DirectX 7.0
Drivers Video: nVidia Detonator v3.68
Drivers ATA : HighPoint Technology HPT366 v1.22
Drivers Chipset : Intel Chipset Inf Update v2.2

System Configuration VIA 694X Apollo Pro 133A

CPU : Intel Pentium III ES at 600 or 800 MHz with a 133 MHz FSB
Motherboard : Soyo SY-6VCA VIA 694X Apollo Pro 133A BIOS v2AP2
Memory : Samsung 128 MB GA PC133 SDRAM CAS3
Videocard : Elsa ERAZOR X2 GeForce DDR 32 MB BIOS v7.05.00
Harddisk : Quantum Fireball Plus KA 13.6GB ATA-66
CDROM : Samsung SCR-3231 32X IDE-ATAPI
Floppy : Generic 1.44 MB
Case : Generic Midi-ATX
Powersupply : Enlight 250W AMD-Approved

Operating System : Windows98 SE, typical installation
DirectX Drivers : DirectX 7.0
Drivers Video: nVidia Detonator v3.68
Drivers ATA : VIA 4-in-1 v4.20
Drivers AGP : VIA 4-in-1 v4.20 in Turbo mode
Drivers Chipset : VIA 4-in-1 v4.20

System Configuration Intel i820

CPU : Intel Pentium III ES at 600 or 800 MHz with a 133 MHz FSB
Motherboard : Asus P3C-E Intel i820 BIOS v1008
Memory : Samsung 128 MB PC800 ECC RDRAM
Videocard : Elsa ERAZOR X2 GeForce DDR 32 MB BIOS v7.05.00
Harddisk : Quantum Fireball Plus KA 13.6GB ATA-66
CDROM : Samsung SCR-3231 32X IDE-ATAPI
Floppy : Generic 1.44 MB
Case : Generic Midi-ATX
Powersupply : Enlight 250W AMD-Approved

Operating System : Windows98 SE, typical installation
DirectX Drivers : DirectX 7.0
Drivers Video: nVidia Detonator v3.68
Drivers ATA : Intel ATA v5.01
Drivers Chipset : Intel Chipset Inf Update v2.2

The display resolution for all benchmarks was set to 1024x768x16 unless stated otherwise. The software used to test all combinations consists of the following:

Benchmark Software Configuration

Applications/business : Bapco SYSmark 2000 v1.0
CPU/FPU : SiSoft Sandra 2000 v2000.3.6.4
Multimedia/FPU : SiSoft Sandra 2000 v2000.3.6.4
Gaming/Multimedia : id Software Quake III Arena v1.11

Quake III Arena Graphics Settings :

Vsync : Disabled
GL Driver : Default
GL Extensions : On
Video Mode : 640×480 or 1024×768
Color Depth : 32-bit
Fullscreen : On
Lighting : Lightmap
Geometric Detail : High
Texture Detail : Maximum
Texture Quality : 32-bit
Texture Filter : Trilinear

Quake III Console Commands :

Timedemo 1
Demo001
Demo002

 

Benchmark Results Intel 440BX

Intel Pentium III 800MHz (100×8)

Quake III 640×480 : demo001/demo002 : 107.0/105.7 FPS
Quake III 1024×768 : demo001/demo002 : 53.4/55.8 FPS
SYSmark 2000 : 161
Sandra 2000 CPU/FPU : 2170/1077
Sandra 2000 MM/FPU : 2527/3364
Sandra 2000 Memory : 316/350

Sandra 2000 Memory Intel 440BX

SYSmark 2000 Pentium III 800EB

Intel Pentium III 600MHz (100×6)

Quake III 640×480 : demo001/demo002 : 97.2/93.9 FPS
Quake III 1024×768 : demo001/demo002 : 53.2/55.8 FPS
SYSmark 2000 : 134
Sandra 2000 CPU/FPU : 1629/807
Sandra 2000 MM/FPU : 1894/2522
Sandra 2000 Memory : 316/350

SYSmark 2000 Pentium III 600EB

 

Benchmark Results VIA 694X Apollo Pro 133A

Intel Pentium III 800MHz (133×6)

Quake III 640×480 : demo001/demo002 : 109.6/108.5 FPS
Quake III 1024×768 : demo001/demo002 : 53.8/56.2 FPS
SYSmark 2000 : 158
Sandra 2000 CPU/FPU : 2155/1069
Sandra 2000 MM/FPU : 2508/3340
Sandra 2000 Memory : 303/346

Sandra 2000 Memory VIA 694X Apollo Pro 133A

SYSmark 2000 Pentium III 800EB

Intel Pentium III 600MHz (133×4.5)

Quake III 640×480 : demo001/demo002 : 98.3/94.8 FPS
Quake III 1024×768 : demo001/demo002 : 53.7/56.2 FPS
SYSmark 2000 : 132
Sandra 2000 CPU/FPU : 1615/801
Sandra 2000 MM/FPU : 1881/2499
Sandra 2000 Memory : 303/346

SYSmark 2000 Pentium III 600EB

 

Benchmark Results Intel i820

Intel Pentium III 800MHz (133×6)

Quake III 640×480 : demo001/demo002 : 106.6/105.6 FPS
Quake III 1024×768 : demo001/demo002 : 48.7/48.6 FPS
SYSmark 2000 : 168
Sandra 2000 CPU/FPU : 2174/1078
Sandra 2000 MM/FPU : 2532/3371
Sandra 2000 Memory : 395/495

Sandra 2000 Memory Intel i820

SYSmark 2000 Pentium III 800EB

Intel Pentium III 600MHz (133×4.5)

Quake III 640×480 : demo001/demo002 : 98.3/96.0 FPS
Quake III 1024×768 : demo001/demo002 : 48.8/48.4 FPS
SYSmark 2000 : 139
Sandra 2000 CPU/FPU : 1630/808
Sandra 2000 MM/FPU : 1898/2527
Sandra 2000 Memory : 395/495

SYSmark 2000 Pentium III 600EB

 

Benchmark Evaluation

Id Software Quake III Arena

Looking at the Quake III Arena benchmarks, we can clearly see that at 1024×768 the GeForce DDR is fill-rate limited, as all motherboards are in the 50 FPS range. At 640×480, however, we can see quite a few differences. What is actually surprising is that the VIA 694X Apollo Pro 133A chipset has the highest scores in the 800 MHz benchmarks, but the Intel i820 has the highest scores at 600 MHz. This would indicate that the Intel i820 chipset has less CPU overhead than the other chipsets, which should have shown up in the 800 MHz scores too, but doesn’t. Also note the 1024×768 scores for the Intel i820 chipset; if there were less CPU overhead these would have been higher, yet the i820 is consistently last in that resolution, at any clockspeed.

If we take memory throughput into consideration, however, it is strange that the chipset with the lowest memory throughput of all, the VIA 694X Apollo Pro 133A, would best the other two. The Intel 440BX chipset is obviously operating at AGP 2X, as it has no support for AGP 4X, but we’d have expected better from the i820’s AGP 4X implementation. By the looks of it, VIA’s AGP 4X implementation requires less CPU overhead and is more efficient at higher clockspeeds. An explanation could be that the drivers VIA supplies with this chipset were installed in ‘Turbo Mode’, which could be why its AGP 4X performs so well, as it is optimized through software.

Quake III Arena 800 MHz Benchmark Scores

Red=demo001 FPS, Green=demo002 FPS

Quake III Arena 600 MHz Benchmark Scores

Red=demo001 FPS, Green=demo002 FPS

BAPco SYSmark 2000

The SYSmark 2000 scores, however, show an entirely different picture. The Intel i820 chipset comes out on top in both the 800 and 600 MHz benchmarks, clearly taking the lead. As mentioned, the SYSmark 2000 benchmark is a rather objective and comprehensive indication of real-world performance, so the Intel i820 chipset’s scores have to be attributed to its high-bandwidth RDRAM memory and fast ATA-66 implementation. The VIA 694X Apollo Pro chipset is severely limited by memory bandwidth; even though its memory is running at 133 MHz with 3-2-2 timings, it isn’t able to beat the Intel 440BX’s 100 MHz memory throughput at 2-2-2 timings, and it consistently came in last.

SYSmark 2000 Benchmark Scores

SiSoft Sandra 2000

SiSoft Sandra 2000 provided us with synthetic Dhrystone and Whetstone scores as well as memory and multimedia scores. The CPU/FPU results showed that the chipsets were within a few points of each other in raw CPU/FPU performance; not much of a surprise, and no need to investigate those scores further. The memory throughput results, however, showed a different picture. Although the benchmark isn’t designed to make efficient use of each memory architecture, and the scores don’t fully represent it, it is still a reasonable measure of raw memory throughput. In this benchmark the RDRAM really showed its muscle, besting both the VIA 694X Apollo Pro 133A chipset and the Intel 440BX.

Sandra 2000 Memory Benchmark Scores

Red=memory CPU MB/s, Green=memory FPU MB/s

 

Conclusion

Many have doubted Intel’s decision to keep supporting Rambus Direct RDRAM in both its current and upcoming chipsets. Early benchmarks of the Intel i820 chipset platform did not show much of an improvement over existing SDRAM technology. However, our benchmarks with Asus’ latest offering show beyond a shadow of a doubt that overall it performs better than any other chipset currently on the market.

We’ve gone to great lengths to make sure this article is well researched and an accurate, comprehensive and objective representation of current memory technologies, describing their advantages and disadvantages, and addressing any misconceptions. Our benchmark configurations and software used are documented in such a way that the results can be easily reproduced by anyone wishing to verify our findings. We feel that this article has more than adequately discussed most misconceptions about Rambus, SDRAM, RDRAM and why both Intel and Rambus keep backing the technology. Our benchmarks clearly show the performance edge the Intel i820 chipset equipped with Rambus memory has over the other chipsets.

In summary, we’re confident that Rambus will continue to work on improving the performance of its technology and lowering the price of RDRAMs. Due to the growing demand for memory bandwidth, driven by the arrival of GHz CPUs, Gigabit Networking, real-time audio/video and the ever-growing demands of today’s software, we’ll soon run into memory bandwidth problems. RDRAM is not perfect, but it is currently one of the most promising solutions to bandwidth, latency and propagation delay problems, and it is scalable, a distinct advantage. It is still expensive, but that’s partly because it’s new and the market has not caught on yet. Once more manufacturers start producing RDRAM and it becomes as readily available as SDRAM is now, we will see its prices drop, too.

We think it is about time to step away from the endless price/performance discussion and open our eyes to the potential Rambus Direct RDRAM has to offer. We’re not saying you shouldn’t keep an eye on good value, but due to all the negative press Rambus and Intel have received for adopting this new memory architecture, the focus hasn’t been on the performance potential of Rambus, but on the price and supposedly poorer performance of Rambus modules.

Given the technological advantages Rambus Direct RDRAM offers over current and upcoming memory technologies and its scalability we’re confident that once prices start dropping and the technology becomes more commonplace we’ll value its performance, bandwidth, robustness, and above all its scalability.

 
