






AMD Revs Up a Bulldozer

Rounding Up To the Nearest Core?



September 7, 2010
By Vince Freeman

AMD recently outlined its upcoming "Bulldozer" architecture, to be built on a 32-nanometer SOI process, and this is turning out to be more than just another CPU release. The Bulldozer design represents an entirely new direction for AMD, moving away from the old-school multicore format of the Phenom II toward a more efficient modular architecture.

AMD has committed to three Bulldozer versions, dubbed Zambezi (8-core client), Valencia (8-core server) and Interlagos (16-core server). The architecture is slated to begin sampling this year, with an official release sometime in 2011.

AMD Goes Modular

AMD has designed its latest CPU line around a modular architecture, combining the basic hardware of two conventional processor cores into a single module. This lets the two cores share certain resources and eliminates some of the duplication present in conventional multicore processors. As a result, Bulldozer can do more with less silicon: AMD can build an 8- or 16-core processor on a smaller die, and with lower power draw, than a conventional design with the same core count would need.

Each Bulldozer module contains two integer processing cores and one floating-point unit with two 128-bit pipes, which can be combined into a single 256-bit pipe. Each core has its own 16KB Level 1 data cache, while the two cores in a module share a 64KB L1 instruction cache and an L2 cache, supplemented by 8 to 16MB of shared L3. The exact L2 capacity has yet to be announced, though it is rumored to be 2MB per module.
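To make the per-core versus per-module split concrete, here is a small Python sketch that totals the cache on a hypothetical 4-module (8-core) Zambezi-class chip. The 16KB and 64KB figures come from AMD's disclosures above; the 2MB L2 per module is only the rumored value, and the shared L3 is left out because it is not divided among modules. The function name is illustrative.

```python
def cache_totals(modules: int, l2_per_module_kb: int = 2048) -> dict[str, int]:
    """Total per-chip L1/L2 capacity in KB for a Bulldozer-style chip.

    ASSUMPTION: l2_per_module_kb defaults to the rumored 2 MB, not a
    confirmed AMD figure. Shared L3 (8-16 MB per chip) is not included.
    """
    cores = modules * 2                     # two integer cores per module
    return {
        "cores": cores,
        "l1d_kb": cores * 16,               # 16 KB L1 data cache per core
        "l1i_kb": modules * 64,             # 64 KB L1 instruction cache per module
        "l2_kb": modules * l2_per_module_kb,  # L2 shared by the module's two cores
    }


print(cache_totals(4))
```

For four modules this reports 8 cores, 128KB of L1 data cache, 256KB of L1 instruction cache and, if the rumor holds, 8MB of L2.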

Since integer calculations make up the majority of PC workloads, AMD bases its core counts on the number of integer units per module. Of course, this sidesteps the fact that, whatever the pipeline design or SMT performance, the two cores in a module have to share a single floating-point unit. The thread schedulers follow the same format, with one for each integer core and one for the FPU, but at least the dual 128-bit floating-point pipes can theoretically handle two independent threads. It is not the same as two independent FPUs, but it is better than a single pipeline.
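How much do two threads lose by sharing one FPU? The toy model below is not AMD's actual issue logic; it simply assumes two 128-bit pipes, one-cycle operations with no dependencies, at most one operation per thread per cycle, and alternating priority between the two threads. Under those assumptions, 128-bit work from two threads runs side by side, while 256-bit operations, which need both pipes at once, are forced to take turns.

```python
from collections import deque


def fpu_cycles(thread_a: list[int], thread_b: list[int]) -> int:
    """Cycles needed to drain two threads' FP ops through one shared FPU.

    Each list entry is an op width in bits (128 or 256). Toy assumptions:
      - the FPU has two 128-bit pipes; a 256-bit op needs both in one cycle
      - every op occupies its pipe(s) for exactly one cycle
      - each thread issues at most one op per cycle, in program order
      - the thread that gets first pick alternates every cycle
    """
    for op in thread_a + thread_b:
        if op not in (128, 256):
            raise ValueError(f"unsupported op width: {op}")

    queues = [deque(thread_a), deque(thread_b)]
    cycles = 0
    first = 0
    while queues[0] or queues[1]:
        free_pipes = 2
        for t in (first, 1 - first):
            if queues[t]:
                need = queues[t][0] // 128   # 1 pipe for 128-bit, 2 for 256-bit
                if need <= free_pipes:
                    free_pipes -= need
                    queues[t].popleft()
        first = 1 - first
        cycles += 1
    return cycles


print(fpu_cycles([128] * 4, [128] * 4))  # both threads issue every cycle
print(fpu_cycles([256] * 4, [256] * 4))  # 256-bit ops from two threads serialize
print(fpu_cycles([256] * 4, []))         # one thread has the FPU to itself
```

In this model the three cases take 4, 8 and 4 cycles: sharing costs nothing for 128-bit work but doubles the time when both threads want the full 256-bit width.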

The module approach yields a smaller, more efficient chip that runs cooler and uses less power, but it comes with several concessions. Shared resources carry a performance penalty, and AMD estimates that a Bulldozer module will achieve up to 80 percent of the integer performance of a true dual-core alternative. The company has been quiet about floating-point performance, but with only one FPU per module, do not expect it to approach that of a true dual-core CPU.

AMD has already mimicked Intel's Turbo Boost technology with a dynamic overclocking feature of its own. Unfortunately, the rudimentary AMD Turbo Core of the Phenom II was not as flexible as Intel's solution, but AMD is improving it in its next-generation architecture.

Each Bulldozer module can be regulated independently, so a module can be turned off or clocked higher as the workload dictates. This is a significant improvement over the Phenom II, but because it happens at the module level, it still cannot match the flexibility of Intel's approach, which allows power gating down to each individual core.
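The cost of module-level control shows up when work is scattered across modules. This sketch counts how many idle cores could be switched off under per-core gating versus per-module gating. It assumes, hypothetically, that logical cores 2k and 2k+1 make up module k; real core numbering may differ, and the function name is illustrative.

```python
def gateable_cores(active_cores: set[int], num_cores: int = 8) -> tuple[int, int]:
    """Return (cores gateable per-core, cores gateable per-module).

    ASSUMPTION: cores 2k and 2k+1 form module k (hypothetical numbering).
    Per-core gating can switch off any idle core; per-module gating can
    only switch off a module when both of its cores are idle.
    """
    per_core = num_cores - len(active_cores)
    per_module = 0
    for module in range(num_cores // 2):
        pair = {2 * module, 2 * module + 1}
        if not pair & active_cores:        # both cores idle -> whole module can sleep
            per_module += 2
    return per_core, per_module


print(gateable_cores({0, 2, 4, 6}))  # one busy core in every module
print(gateable_cores({0, 1, 2, 3}))  # same load packed into two modules
```

With four busy cores spread one per module, per-core gating could still turn off four cores while module gating turns off none; packing the same four threads into two modules lets module gating catch up.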

AMD has also stated that Bulldozer's integrated memory controller will be significantly enhanced, representing its first true redesign since 2007. The specifics have not been released, and although supported memory speeds are a bit higher under Bulldozer, standard features such as dual-channel DDR3 and HyperTransport 3.1 support remain unchanged.

Bulldozer will use the new AM3+ socket, which is extremely similar to AM3 but different enough to limit backward compatibility. New AM3+ motherboards will support existing Socket AM3 processors, but not the other way around: potential upgraders will not be able to plug an AM3+ Bulldozer CPU into their current AM2 or AM3 motherboards.

One and a Half Cores

The controversy surrounding this new CPU architecture has been raging since its announcement, and it centers on module count versus core count. In some ways this is a dispute over semantics, since the operating system and programs will still properly identify and use the stated number of logical cores. The modular approach is also a more efficient way to design a new CPU, but it is still somewhat disingenuous to portray a 4-module Bulldozer design as a true 8-core processor.

Each module in the Bulldozer architecture is effectively a single super-processor that can handle two threads simultaneously at the hardware level. So while it is faster than a single CPU core with Intel's Hyper-Threading Technology, a Bulldozer module falls a bit short of an independent dual-core processor. It's a tweener that borrows from both camps.

While each Bulldozer module's two integer cores match the hardware of a true dual-core processor, pushing two threads through the single floating-point unit is similar to Hyper-Threading. In fact, multicore-aware operating systems like Windows will see each Bulldozer module as two logical processors, again much like a CPU core with Hyper-Threading. The key difference is that Intel does not market a quad-core with HT as an 8-core processor, and that is a very important distinction.
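Counting the units side by side makes the distinction clearer. The sketch below compares a 4-module Bulldozer with a quad-core Intel chip running Hyper-Threading, using this article's simplifications: one FPU per Bulldozer module, and each Intel core treated as one integer core plus one FPU shared by two hardware threads. The `Topology` class is purely illustrative, not a model of either vendor's actual microarchitecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    name: str
    units: int               # modules (Bulldozer) or cores (Intel)
    int_cores_per_unit: int  # full integer cores per unit
    fpus_per_unit: int       # floating-point units per unit
    threads_per_unit: int    # hardware threads the OS sees per unit

    def summary(self) -> dict[str, int]:
        """Chip-wide counts derived from the per-unit figures."""
        return {
            "logical_cpus": self.units * self.threads_per_unit,
            "integer_cores": self.units * self.int_cores_per_unit,
            "fpus": self.units * self.fpus_per_unit,
        }


bulldozer = Topology("4-module Bulldozer", 4, 2, 1, 2)
quad_ht = Topology("quad-core with Hyper-Threading", 4, 1, 1, 2)

for chip in (bulldozer, quad_ht):
    print(chip.name, chip.summary())
```

Both chips present eight logical processors and four FPUs to the operating system; the only line that differs is the integer core count, which is exactly where AMD's "8-core" label comes from.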

AMD's counterargument is that the module-based Bulldozer architecture is more efficient and supplies more physical cores in the same die space, while costing less than a comparable Intel solution. Again this is true, but AMD keeps belaboring the obvious point that modular hardware cores are preferable to running Hyper-Threading through a single core.

AMD contends that SMT technologies like Hyper-Threading overload a single core, while true multicore processors waste resources. In a recent blog post, AMD's director of product marketing, John Fruehe, compared this scenario to car engines, with Intel delivering a 4-cylinder powerhouse while AMD counters with a similarly priced 6-cylinder that uses less gas. The obvious problem emerges when Intel's 4-cylinder speeds past AMD's V6.

The argument for Bulldozer does make a lot of sense. Adding a second integer core to a module increases its size by only about 12 percent, and that small increase in real estate buys almost another core's worth of performance. Combining two cores into a single module with shared resources also pays off in greater scalability, which could translate into higher clock speeds than a comparable multicore Phenom II.
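A back-of-envelope calculation shows why the trade-off is attractive. It assumes that a conventional dual-core costs exactly twice the area of one core, that a module costs about 12 percent more than one core, and that the module delivers AMD's "up to 80 percent" of dual-core integer throughput; caches, uncore logic and clock speed are ignored. These are best-case inputs, not measurements.

```python
# Back-of-envelope integer throughput per unit of die area.
# ASSUMPTIONS (only the 12% and 80% figures come from AMD):
#   - area and throughput are normalized to one conventional core = 1.0
#   - a conventional dual-core costs exactly 2.0 units of area
#   - caches, uncore and clock-speed differences are ignored

CORE_AREA = 1.0
CORE_PERF = 1.0

dual_core_area = 2 * CORE_AREA
dual_core_perf = 2 * CORE_PERF

module_area = 1.12 * CORE_AREA        # second integer core adds ~12% area
module_perf = 0.80 * dual_core_perf   # "up to 80%" of dual-core throughput


def perf_per_area(perf: float, area: float) -> float:
    """Throughput delivered per unit of die area."""
    return perf / area


print(f"dual-core perf/area: {perf_per_area(dual_core_perf, dual_core_area):.2f}")
print(f"module perf/area:    {perf_per_area(module_perf, module_area):.2f}")
```

Under these inputs the module delivers about 1.43 units of integer throughput per unit of area against 1.00 for a conventional dual-core, yet its absolute throughput is 1.6 single-core equivalents rather than 2.0. That is the whole debate in two numbers: better per square millimeter, weaker per "core."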

While the strategy is sound, the danger lies in selling these dual-module processors as true quad-cores and pitting them against Intel's best hardware quads. AMD has played the price/performance angle for a long time, and there is no doubt that Bulldozer will do well in performance-per-watt and price/performance comparisons, but much of the industry mindset is still rooted in per-core performance.

Bulldozing the Software

By adopting such a different architecture, AMD needs to be concerned about possible repercussions. One is potential trouble with software, including both operating systems and individual programs. Since a Bulldozer module is not really two independent cores, software needs to recognize this and be optimized to take advantage of the performance benefits while avoiding the potential pitfalls.
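One common form of this optimization is module-aware thread placement: give every module one thread before doubling up, so that lightly threaded workloads do not have two threads contending for one FPU and one L2. The sketch assumes a hypothetical numbering in which logical CPUs 2k and 2k+1 are the two cores of module k, and `place_threads` is an illustrative name; it is not how any particular operating system's scheduler works.

```python
def place_threads(num_threads: int, num_modules: int) -> list[int]:
    """Return a logical CPU id for each thread, spreading across modules first.

    ASSUMPTION (hypothetical numbering): logical CPUs 2k and 2k+1 are the
    two cores of module k. The first pass puts one thread on each module;
    only then are the second cores used. Beyond 2 * num_modules threads
    the order wraps around (the chip is oversubscribed).
    """
    first_cores = [2 * m for m in range(num_modules)]
    second_cores = [2 * m + 1 for m in range(num_modules)]
    order = first_cores + second_cores
    return [order[i % len(order)] for i in range(num_threads)]


print(place_threads(4, 4))  # one thread per module, no shared FPU
print(place_threads(6, 4))  # two modules now carry a second thread
```

The trade-off is power: spreading threads keeps every module awake, so a scheduler tuned for efficiency under module-level power gating might prefer to pack threads into as few modules as possible instead.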

If "optimizing software" gives you a bit of a headache, that is understandable, and the same problems inherent in any SMT design will also affect Bulldozer. In ideal conditions, a Bulldozer module will deliver near-dual-core performance with significant power savings, but if the threads start hitting memory or shared-resource bottlenecks, performance could really dip.

The Bulldozer architecture is also tuned for efficiency in real-world usage, where much of a CPU's resources usually sit idle and less duplication can yield significant power savings. But this runs against conventional benchmark philosophy, which pushes the CPU to its limits and uses all of its available resources. In that scenario, Bulldozer looks very susceptible to being crushed in the press by Intel processors sporting the same core count.

The entire Bulldozer strategy is built on price/performance and on supplying end users with the most efficient processors on the market. AMD understands that, given the base architecture and the decision to market each module as two cores, there is virtually no chance of beating Intel in a performance race. A dual-module/quad-core Bulldozer will probably not outpace a quad-core Sandy Bridge, but it could well smoke any Intel dual-core processor.

Marketing a quad-module Bulldozer as an 8-core processor brings some risks. It would be nice to see AMD stop playing the part of Intel's red-headed stepchild, step up, and market Bulldozer by module count rather than logical core count. That way, AMD would be selling the fastest, most supercharged quad that money can buy, rather than extolling the virtues of the slowest 8-core processor.

AMD corporate vice president John Volkmann shed some light on the strategy in a recent blog post, stating that AMD knows "processor brands (CPUs and GPUs) are largely irrelevant in the PC buying decision" and "that the average PC buyer is unaware of what processors are under the hood of their PC". I believe most buyers are better informed than this, especially on the corporate side, and few would be shopping for a "bushel of 8-core processors."



 