A few months ago we did a preview of Intel's new Willamette CPU. Now that more information has become available, we've revised our initial preview to include the latest details. Intel's Willamette CPU is said to run at higher clock speeds and deliver performance levels above those of any other IA-32 CPU. This report takes a closer look at some of these new features and their influence on overall performance.
Would it be possible for Intel to use some of that 'software layer' technology like Transmeta or even Apple (68k --> PPC) to use the SSE2 FPU as the x87 FPU? Certainly the SSE2 FPU could do everything the x87 FPU can do, so just have something that catches the x87 instructions and changes them to the appropriate SSE2 instructions. That would provide an immediate speed increase, and make it so developers would not have to worry as much about a rewrite and could just do it in the next version or something. I'm not quite sure if that would work, but it sounded reasonable to me.
Wouldn't that depend on whether there was a 1-to-1 match between the structure of the x87 instructions and the SSE2 ones? If there is a significant change in parameter and output structuring, or consolidation of instructions, etc., then the translation might become very elaborate.
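To make the "not 1-to-1" point concrete: x87 is a stack machine (operands are implicit stack slots), while SSE2 uses flat xmm registers, so a translator has to track the virtual stack just to pick operands. Here's a toy sketch in Python (my own illustration, not a real translator, and only handling two instructions):

```python
# Toy sketch of x87 -> SSE2 translation. x87 instructions like faddp take
# implicit operands from the register stack; SSE2's addsd needs explicit
# xmm registers, so the translator must model the stack as it goes.

def translate(x87_ops):
    stack = []          # virtual x87 stack: maps stack slots to xmm registers
    next_reg = 0        # next free xmm register (ignores spilling past xmm7)
    out = []
    for op, *args in x87_ops:
        if op == "fld":             # push a memory value onto the x87 stack
            reg = f"xmm{next_reg}"; next_reg += 1
            out.append(f"movsd {reg}, {args[0]}")
            stack.append(reg)
        elif op == "faddp":         # pop top two values, push their sum
            src = stack.pop()
            dst = stack[-1]
            out.append(f"addsd {dst}, {src}")
    return out

# fld a; fld b; faddp  -- leaves a + b on top of the virtual stack
print(translate([("fld", "a"), ("fld", "b"), ("faddp",)]))
# ['movsd xmm0, a', 'movsd xmm1, b', 'addsd xmm0, xmm1']
```

Even this tiny case shows the mapping is stateful rather than a simple instruction-for-instruction substitution, which is where the elaborateness comes from.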
Yes, I thought of that too -- but it could be done. With the amount of people and money Intel has, they could definitely pull it off. It would probably be worth it, as even after translation it should still be faster than x87, since according to the article SSE2 is over 10x faster.
About the instruction pipelines: making them deeper allows more MHz, but it decreases speed per MHz, is this so? If this is true, why don't they try to use FEWER pipeline stages and make a more efficient per-MHz CPU? It's kind of confusing to me, unless they can get SO MUCH more MHz with the extra stages that it doesn't matter that it's less efficient. That is what Intel is going for, correct? I'm new to this kind of thing (this being the more in-depth discussion of computer hardware), so excuse my ignorance.
This is not the first time Intel has increased the number of pipeline stages. In fact, Intel has pretty much doubled the number of pipestages in each of the past 3 generations: 386, 1 stage; 486, 2 stages(??); Pentium, 5 stages; Pentium Pro/II/III, 12 stages. Latency of the entire pipeline is not very important; it affects performance only to a minor degree. The stages where latency really matters are the register-read and execute stages. If you pipeline those stages, then performance will degrade by a double-digit percentage. You can see that the ALU in Willamette is still 1 clock. AMD also doubled their pipeline stages from K6 to Athlon, from 5 stages to around 10.
The part that really hurts performance in a hyperpipelined CPU is a branch mispredict, since there are double the number of pipestages to flush, and time was wasted executing instructions that need to be thrown out. So you can see that branch prediction has been beefed up to reduce this effect. Overall, the performance cost of going from 10 to 20 pipestages is maybe 5%(??) due to branch mispredicts, but that is more than made up for by the gain in frequency. And other microarchitecture features will account for the extra IPC.
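The frequency-versus-mispredict tradeoff above can be put into a back-of-envelope model. The numbers here (1% mispredicts per instruction, 1.0 vs 1.5 GHz) are my own illustrative assumptions, not figures from the article:

```python
# Rough model: a mispredicted branch flushes the whole pipeline, so the
# average cycles-per-instruction penalty grows with pipeline depth, but a
# deeper pipeline can clock higher. Which effect wins?

def relative_perf(stages, freq_ghz, mispredicts_per_instr=0.01):
    # average CPI = 1 base cycle + (flush penalty amortized per instruction)
    cpi = 1.0 + mispredicts_per_instr * stages
    return freq_ghz / cpi   # instructions per second, arbitrary units

shallow = relative_perf(stages=10, freq_ghz=1.0)  # PIII-style pipeline
deep    = relative_perf(stages=20, freq_ghz=1.5)  # Willamette-style pipeline
print(f"deep pipeline is {deep / shallow:.2f}x the shallow one")
```

With these assumed numbers the doubled flush cost only shaves a few percent off per-clock efficiency, while the 1.5x clock gain dominates, which matches the poster's "more than made up by the gain in frequency" argument.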
Intel will aim for a performance gain of 1.3x in IPC when a new microarchitecture is designed, so at the same frequency the new CPU should perform 1.3x faster. Then there is further gain from the increased frequency. A lot of performance simulation is done, so Intel already knows the real-world performance will meet the mark before the CPU is finished.
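The arithmetic behind that claim is just IPC gain multiplied by clock gain. The 1.3x IPC figure is from the post above; the clock speeds are illustrative assumptions on my part:

```python
# Overall speedup = (instructions per clock gain) x (clock frequency gain).
ipc_gain  = 1.3            # per-clock improvement target from the post
freq_gain = 1500 / 1000    # assumed: 1.5 GHz new part vs 1.0 GHz old part

overall = ipc_gain * freq_gain
print(f"{overall:.2f}x overall speedup")  # 1.95x overall speedup
```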
Don't hold your breath for Intel to have a real PRODUCTION chip running at 1.5 gig anytime soon - unless it's nitrogen cooled. Scuttlebutt from inside Intel scoffs at the 1.5 gig number - 1 gig if EVERYTHING goes well.