AMD Flexes New Floating-Point Unit



November 5, 2010
By Vince Freeman

Our initial commentary on the upcoming AMD Bulldozer architecture covered its basic features and design, including the transition to a modular architecture. In a recent blog post, AMD expanded on the feature set by outlining the new Flex FP floating-point capabilities of the Bulldozer microarchitecture.

A floating-point unit operates on numbers with fractional parts, including very large and very small values, and handles higher-level math such as exponential and trigonometric calculations. This differs from integer processing, which works on whole numbers and more rudimentary operations. Floating-point performance does not affect all areas of computing, but it does have a significant impact when processing financial or technical data.

AMD's Flex FP goes back to the basic concept behind Bulldozer, that of a fully modular design where two integer cores are linked to a single floating-point unit in a processor module. This single FPU is now Flex FP.

Inside Flex FP

The single floating-point unit of the Bulldozer has come under a bit of fire, as an eight-core Bulldozer would have only four physical FPUs. Instead of a dedicated 128-bit FPU per core as in current Phenom II designs, the Bulldozer architecture will feature a single 256-bit FPU shared by two integer cores.
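The FPU arithmetic behind that criticism can be sketched in a few lines. This is only an illustration of the figures quoted above: the function name `fpu_summary` and the "dedicated" eight-core comparison chip are hypothetical, and the per-core figure assumes every core is issuing floating-point work at once.

```python
def fpu_summary(cores, cores_per_fpu, fpu_width_bits):
    """Return (physical FPU count, FPU bits per core when all cores are busy)."""
    fpus = cores // cores_per_fpu
    bits_per_core = (fpus * fpu_width_bits) // cores
    return fpus, bits_per_core

# Hypothetical eight-core chip with one dedicated 128-bit FPU per core.
dedicated = fpu_summary(cores=8, cores_per_fpu=1, fpu_width_bits=128)

# Eight-core Bulldozer as described: one 256-bit FPU per two integer cores.
shared = fpu_summary(cores=8, cores_per_fpu=2, fpu_width_bits=256)

print("dedicated:", dedicated)  # (8, 128)
print("shared:", shared)        # (4, 128)
```

Under these assumptions the shared design halves the number of physical FPUs while keeping the same 128 bits of FPU width per core under full load.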

The reason for this is simple: adding a second integer core to a Bulldozer module increases CPU real estate by only 12 percent. This is similar to the “shared resources” strategy Intel employs on a per-core basis with Hyper-Threading, but Flex FP does it at the component level and with a larger resource base.
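To put the 12 percent figure in perspective, here is a rough back-of-the-envelope sketch. It assumes the baseline is a module with one integer core, and that two fully separate cores would take twice the baseline area; both are simplifying assumptions for illustration, not figures from AMD.

```python
SECOND_CORE_AREA_INCREASE_PCT = 12  # figure quoted in the article

def module_area_vs_two_full_cores(increase_pct=SECOND_CORE_AREA_INCREASE_PCT):
    """Area of a two-core module relative to two fully separate cores."""
    module_area = 100 + increase_pct   # baseline single-core module = 100
    two_full_cores_area = 2 * 100      # assumption: full duplication doubles the area
    return module_area / two_full_cores_area

ratio = module_area_vs_two_full_cores()
print(f"module area is {ratio:.0%} of two separate cores")
```

Under these assumptions, the shared module delivers two integer cores in a little over half the silicon that two independent cores would need.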

AMD's stance is that because most programs contain significantly more integer code than floating-point code, a single integer core does not require its own dedicated 256-bit FPU. By adding a second integer unit and sharing the same FPU, AMD can target the Bulldozer directly at the most common instructions.

AMD's Flex FP also includes some additional enhancements designed to improve performance and keep the data pipelines flowing. Bulldozer has dedicated schedulers for both integer and floating-point commands, rather than using a single scheduler for both units as Intel does on its Core-based processors.

By designing a separate FPU scheduler, each of the floating-point processes can be handled independently. Not only can this speed up floating-point operations and keep the FPU path full, it also takes the floating-point scheduling load off the integer cores. The only caveat is that there is a scheduler for each physical unit: two for integer and one for the FPU per module.

AMD's Flex FP is designed around a full 256-bit FPU that can be further segmented into dual 128-bit data pipes. Flex FP certainly lives up to its name: it can handle two 128-bit SSE instructions from a single core, or each of the two cores can simultaneously process one 128-bit FPU instruction. Support for AVX (Advanced Vector Extensions) instructions allows Flex FP to handle full 256-bit floating-point execution, but programs need to be recompiled to take advantage of it.
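The configurations described above can be illustrated with a toy model of one module's FPU for a single cycle. This is a sketch under stated assumptions, not AMD's scheduler: the names `issue_one_cycle`, `PIPE_WIDTH_BITS` and `PIPES_PER_MODULE` are hypothetical, requests are served in list order, and a 256-bit request is assumed to occupy both pipes.

```python
PIPE_WIDTH_BITS = 128   # width of each data pipe, per the article
PIPES_PER_MODULE = 2    # the 256-bit FPU segmented into two pipes

def issue_one_cycle(requests):
    """Greedily assign (core, width_bits) requests to the two pipes.

    Returns (issued, waiting) lists for a single cycle of one module.
    """
    free_pipes = PIPES_PER_MODULE
    issued, waiting = [], []
    for core, width_bits in requests:
        pipes_needed = width_bits // PIPE_WIDTH_BITS
        if pipes_needed <= free_pipes:
            free_pipes -= pipes_needed
            issued.append((core, width_bits))
        else:
            waiting.append((core, width_bits))
    return issued, waiting

# Two 128-bit SSE instructions from a single core.
print(issue_one_cycle([("core0", 128), ("core0", 128)]))
# One 128-bit instruction from each core at the same time.
print(issue_one_cycle([("core0", 128), ("core1", 128)]))
# A 256-bit AVX instruction takes both pipes, so the other core waits.
print(issue_one_cycle([("core0", 256), ("core1", 128)]))
```

The last case shows the trade-off in this simplified model: a full-width AVX instruction from one core leaves nothing for its neighbor in that cycle.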

AMD is promoting Flex FP as a more flexible design that can easily handle both standard 128-bit floating-point code and the enhanced 256-bit AVX instructions. This differs from what Intel will offer with the Sandy Bridge FPU, which can process 1x128-bit in legacy mode and 1x256-bit with AVX code. Flex FP allows multiple configurations, so Bulldozer should also be able to operate as a full 256-bit FPU, just not in the same way.

The difference is that regardless of the configuration, Flex FP can handle only 128-bit pieces, and pairs them up into 2x128-bit for a 256-bit AVX instruction. Intel can handle a full 256-bit floating-point AVX command per core, as well as a dedicated 128-bit path for legacy applications. This may sound equivalent, but this slight difference means that a Sandy Bridge multi-core processor should be faster when using the AVX instruction set.
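The idea of building a 256-bit operation from two 128-bit pieces can be sketched functionally. The example below treats a 256-bit AVX register as eight single-precision lanes and performs the addition as two four-lane halves; the function names are hypothetical, and the sketch shows only that the results match, not how either vendor's hardware times the operation.

```python
LANES_256 = 8   # 256 bits / 32-bit single-precision floats
LANES_128 = 4   # 128 bits / 32-bit single-precision floats

def add_lanes(a, b):
    """Element-wise add, standing in for one vector add instruction."""
    return [x + y for x, y in zip(a, b)]

def add_256_as_two_128(a, b):
    """Perform a 256-bit add as two independent 128-bit halves."""
    assert len(a) == len(b) == LANES_256
    low = add_lanes(a[:LANES_128], b[:LANES_128])
    high = add_lanes(a[LANES_128:], b[LANES_128:])
    return low + high

a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [10.0] * LANES_256
print(add_256_as_two_128(a, b) == add_lanes(a, b))  # True
```

The answer is identical either way; the performance difference the article anticipates would come from how the halves are issued and paired in hardware, which this sketch does not model.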

Smaller, Cooler, More Energy Efficient

Sharing a 256-bit FPU across two integer cores provides many benefits compared to a standard processor with a dedicated FPU per core, most of which relate to factors other than pure floating-point performance.

The power draw of a Bulldozer module with two integer cores and a single Flex FP unit will be lower than that of a conventional dual-core processor with two FPUs. Production costs will also be lower, and by sharing resources, AMD will be able to decrease the relative die size of a Bulldozer processor. Even though Bulldozer will make the transition to 32nm, any savings in die size translate into a greater number of CPUs per wafer, which will pay off on the AMD balance sheet.

Both of these factors will also benefit AMD customers, resulting in lower prices for Bulldozer processors and decreased operating costs for AMD-powered client systems and servers. Sharing the FPU also leaves room for additional integer cores and higher integer performance at a given power envelope. According to AMD, integer performance is where most of the bottlenecks occur, and the Bulldozer architecture is designed to solve this problem.

These factors, combined with the fact that Bulldozer still offers a 128-bit FPU pipe for each integer core, should make it an attractive option, even if the design seems a bit more suited for the server market. AMD has been playing this “better bang for the buck” game for a very long time and is still losing ground to Intel, so it remains to be seen if the “more cores for less” efficiencies of Bulldozer and Flex FP will pay off in terms of greater market share.

Share and Share Alike

There are some concessions inherent in Flex FP, some of them real and others more of a mindset. Consumers like to get what they pay for, and no matter the innovative design and theoretical 128-bit FPU per core, some may view the Bulldozer module as “missing” a floating-point unit. Even taking the high road, it is slightly disingenuous to market a four-module Bulldozer as a true 8-core processor, at least against Intel's Sandy Bridge and its dedicated 128-bit/256-bit floating-point engine.

Performance is also a question mark, as AMD can blog all it wants about the architecture, flexibility and features of an upcoming processor, but Bulldozer will be called to the benchmark table eventually. Long gone are the days when AMD could flaunt its FPU superiority over Intel, and the Core-based processors are floating-point beasts. In a worst-case scenario, any performance gap between AMD and Intel could easily be (mis)attributed to the shared FPU, and would be something Intel fans could latch onto and run with.

With AMD painting a big bull's-eye on its shared Flex FP, you can also bet that floating-point performance evaluations will be a major part of any upcoming Bulldozer review. The shared resources of the module also extend to the integer unit, which AMD has already stated has only 80 percent of the performance of a dedicated multi-core design. If FPU performance tapers off even lower, that could spell bad news for Flex FP.

At the end of the day, how you view Flex FP comes back to your concept of a CPU core as it relates to the Bulldozer. If terms like "processor module" and "shared resources" are not your thing, then AMD could have a tough time, especially if core efficiency and price-performance are ignored in favor of a straight core-to-core performance comparison.



 