Intel Previews Deep Dives on CPU Technologies

Cores and Threads and Interconnects, Oh My



February 4, 2010
By Andy Patrizio

The International Solid-State Circuits Conference (ISSCC) is not a show for the casual technology enthusiast or mere mortals with less than a Ph.D. in electrical engineering. They don't simplify the topics for easy, mass consumption. It's a forum for serious engineering discussions and it's up to you to keep up.

One sample paper for discussion at the show: "A 320mV-to-1.2V On-Die Fine-Grained Reconfigurable Fabric for DSP/Media Accelerators in 32nm CMOS." Try saying that five times fast. The paper will discuss a chip fabric that could be reconfigured for a variety of purposes.

So keeping up with the advance briefing Intel provided Wednesday on the papers it will present at the show was a challenge in and of itself. Nasser Kurd, senior principal engineer in the Intel Architecture Group, and Randy Mooney, Intel Fellow and director of I/O Research at Intel Labs, provided a look into Intel's offerings.

They gave no release date or time frame for some of the advancements Intel is working on, but when they do ship, they will be significant improvements over today's technologies. Their preview also gives insight into how Intel works.

The 80-core test processor from 2007 and 48-core processor from 2009, for example, will never see the light of day as finished products, but they are test beds that will have an impact on all future Intel processors.

The news for the masses is that Intel has a six-core version of Westmere planned for desktops and servers. The processor, called Westmere-EP, will be the subject of a paper at ISSCC. It will not have the on-die GPU like the dual-core desktop Westmeres, but it will have Hyper-Threading, meaning one chip can handle 12 threads at a time, and it will have Turbo Boost, the feature that shuts off idle cores and lets cores in use run at a higher clock speed.

The six-core processor also has a much larger L3 cache to go with the extra cores. The L3 cache, shared among the six cores, is 12MB. That's 2MB per core, the same ratio as the 4MB of L3 on the dual-core processor. Still, it's a lot of cache and makes for a huge chip with 1.17 billion transistors.
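The arithmetic behind those figures is easy to check. The short Python sketch below uses only the numbers quoted above; the variable and function names are illustrative, and the two-threads-per-core figure is how Hyper-Threading turns six cores into 12 threads.

```python
# Figures quoted in the article; names here are illustrative only.
SIX_CORE_COUNT = 6
THREADS_PER_CORE = 2   # Hyper-Threading exposes two hardware threads per core
SIX_CORE_L3_MB = 12
DUAL_CORE_COUNT = 2
DUAL_CORE_L3_MB = 4


def l3_per_core(l3_mb, cores):
    """Return the share of L3 cache per core, in megabytes."""
    return l3_mb / cores


hardware_threads = SIX_CORE_COUNT * THREADS_PER_CORE
six_core_share = l3_per_core(SIX_CORE_L3_MB, SIX_CORE_COUNT)
dual_core_share = l3_per_core(DUAL_CORE_L3_MB, DUAL_CORE_COUNT)

print(hardware_threads)   # threads the six-core chip can handle at once
print(six_core_share)     # MB of L3 per core on the six-core part
print(dual_core_share)    # MB of L3 per core on the dual-core part
```

Both parts work out to the same per-core share of L3, which is why the larger cache is a matter of scale rather than a more generous ratio.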

Digging In

Kurd kicked off the talk by discussing Westmere. Westmere is Nehalem shrunk from 45nm to 32nm with a few new instructions and, in some cases, on-die GPUs. That's the 10,000-foot view. In reality, Intel did a lot more than that.

Westmere implemented core-level gates to completely power down the core when idle. Nehalem did not do this. That feature enables Westmere to effectively reduce idle power to a far greater extent than Nehalem by completely turning off transistors when not in use.

While Nehalem had power gates in the core, Westmere adds power gates in the "uncore." The core of a CPU is the processor cores and L1 and L2 caches, while the "uncore" is the memory controller and L3 cache. By adding gates there as well, even more power savings can be achieved.

Westmere also added SRAM to save the contents of a core's memory when that core is shut down. A core may be shut down and turned on a dozen times or more per second. When it's shut down and the power cut, what happens to its contents? They're stored in SRAM and quickly copied back to the core when it starts up.

Intel also introduced a low-voltage DDR3 technology for Westmere that lets the memory run at 1.35 volts instead of the usual 1.5 volts while also reducing CPU power by up to 20 percent.

Faster Interconnects

Mooney talked about several other Intel papers, most of which are related to interconnects. This is where the chipmaker's 48- and 80-core processors really have served as test beds. One of the problems is that with traditional interconnects, data often went out of the chip and onto the motherboard before coming back onto the chip and to another part of the CPU. Intel has been working on keeping intra-chip data transfers on the chip to improve speed.

Future tera-scale processing jobs also need faster data transfers, and those transfers need to be more power-efficient. One Intel paper will describe how, with today's interconnect and circuit technology, moving one terabyte of data from one chip to another would take 150 watts. But with new techniques Intel will discuss in a presentation at ISSCC, that terabyte of data can be moved with just 11 watts of power.
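To put the two wattage figures in perspective, the Python sketch below works out the implied reduction. It uses only the 150-watt and 11-watt numbers quoted above; the variable names are illustrative, and nothing here comes from Intel's paper itself.

```python
# Power figures quoted in the article for moving the same amount of data.
BASELINE_WATTS = 150   # with today's interconnect and circuit technology
NEW_WATTS = 11         # with the techniques Intel plans to present

# How many times less power the new approach draws.
reduction_factor = BASELINE_WATTS / NEW_WATTS

# The same comparison expressed as a percentage saved.
savings_pct = (1 - NEW_WATTS / BASELINE_WATTS) * 100

print(f"{reduction_factor:.1f}x less power")
print(f"{savings_pct:.1f}% saved")
```

By this arithmetic the new interconnect would need less than a tenth of the power of the current approach.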

Intel is working on a more aggressive sleep state for these chip-to-chip interconnects that draws just seven watts of power in sleep mode and wakes up 1,000 times faster.

The company is also working on more careful mapping of threads onto cores rather than placing them randomly in a many-core system, as well as putting threads onto the most aggressive cores in a many-core system. Mooney estimates Intel can net a 20- to 60-percent performance improvement this way, depending on the core network.
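Intel did not describe its mapping method in the briefing, so the Python sketch below is only a toy illustration of why placement matters. It assumes a hypothetical 2x2 mesh of four cores, a made-up traffic table between four threads, and a cost equal to traffic multiplied by the number of mesh hops between the cores involved. It then compares a scattered placement with the cheapest one found by trying every arrangement.

```python
from itertools import permutations

# Hypothetical 2x2 mesh: cores 0 and 1 on the top row, 2 and 3 on the bottom.
MESH_COLS = 2
NUM_CORES = 4

# Made-up traffic between thread pairs (higher means more data exchanged).
TRAFFIC = {(0, 1): 10, (2, 3): 10, (1, 2): 1}


def hops(core_a, core_b):
    """Manhattan distance between two cores on the mesh."""
    row_a, col_a = divmod(core_a, MESH_COLS)
    row_b, col_b = divmod(core_b, MESH_COLS)
    return abs(row_a - row_b) + abs(col_a - col_b)


def cost(placement):
    """Total traffic-weighted hop count; placement[t] is the core for thread t."""
    return sum(weight * hops(placement[a], placement[b])
               for (a, b), weight in TRAFFIC.items())


# A scattered placement that puts heavily communicating threads on opposite corners.
scattered = (0, 3, 1, 2)

# Exhaustive search is fine for four cores; a real many-core chip would need a heuristic.
best = min(permutations(range(NUM_CORES)), key=cost)

print(cost(scattered), cost(best))
```

In this toy case the careful placement roughly halves the communication cost. That figure is an artifact of the made-up traffic table and should not be read as a reproduction of the 20- to 60-percent estimate Mooney gave.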

ISSCC takes place in San Francisco and starts next Monday.



 