Pro Tour
• Nike Golf Victory Red VR Staff Bag Black Red Silver BG0246 016 Brand New $299 US $169.99
• 2011 Nike Golf Tour Cart Bag II 14 Way Full Length Dividers Navy Color $199 New US $109.95
• 2011 Taylormade Den Caddy Mini Staff Bag White Black Red Silver US $69.99
• TaylorMade R7 Special Edition TP Tour Pro Staff Bag Cart Bag Golf Club Bag US $53.00
• 2011 Nike Golf Tour Cart Bag II 14 Way Full Length Dividers Black Color $199 New US $109.99
• EXPENSIVE BELDING HANDMADE GOLF BAG MADE IN AMERICA US $325.00
• Autographed Casey Martin PGA Tour Bag Only Pro Golf Bag Ever Allowed on Cart US $199.00
• New 2011 Ping 4 Under Stand Bag Red White US $69.99
• NEW 2012 TAYLORMADE TMX R11 TOUR FULL SIZE STAFF GOLF BAG LOOK LIKE A PRO US $269.05
• Ecco Golf 2012 9 Inch Cart Trolley Bag US $156.45
• New Ping Limited Edition 4 Under Stand Bag White Navy US $69.99
• New 2012 Ping Hoofer Carry Stand Bag Black Charcoal Red US $134.99
• NEW Taylor Made R7 TP Tour Staff Golf Bag US $79.50
• NEW 2012 Ping Pioneer Cart Bag Black US $144.99
• PALM SPRINGS GOLF Tour Player Travel Cover BLUE US $59.99
• Wilson Staff Pro Tour Golf Practice Ball Bag White Tour Colours US $39.49
• PALM SPRINGS GOLF Tour Player Travel Cover GRAY US $59.99
• Nike Golf Pro Tour Swoosh Staff Bag Black Silver 95 Brand NEW US $125.00
• PROSiMMON GOLF DUAL STRAP BLUE SILVER STAND BAG NEW US $59.99
• NIKE GOLF PRO TOUR SWOOSH STAFF BAG BLACK SILVER 95 BRAND NEW GREAT PRICE US $199.95
• Wilson STAFF Lizard Golf Pro Carry Stand Bag Silver NEW Tour Quality PGA US $49.99
• Wilson STAFF Lizard Golf Pro Carry Stand Bag Navy NEW PGA Tour Quality US $84.99
• New 2012 Ping Latitude Stand Carry Golf Bag Black White US $159.99
• NEW 2012 Ping Golf Traverse Cart Bag Black Inferno Red US $139.99
• New 2012 Ping Golf Hoofer Carry Stand Bag Green US $139.99
• New 2012 Ping Golf Latitude Stand Carry Golf Bag Navy White US $154.99
• New 2012 Ping Golf 4 Series Stand Carry Bag Black White US $109.99
• New 2012 Ping Golf Hoofer Carry Stand Bag White US $139.99
• Tour Edge Exotics Extreme Cart Bag Black White US $97.95
• Ogio Monster Travel Bag Charcoal US $229.99
• ASAHI GOLF JAPAN TOUR Z ZC 7814 STAND CADDY BAG 65 x 46 22 kg 48 lb NEW US $200.00
• Ogio Straight Jacket Travel Bag Black Tech US $139.99
• RYDER CUP OAKLAND HILLS Top 25 GC PLAYERS STAFF BAG US $675.00
• Tour Edge Bazooka Geo Max Jr 4x1Box Set Right Hand Ages 9 12 Blue US $89.99
• Ogio Yeti Travel Bag Black Tech US $149.99
• NEW Nike Golf Travel Cover Tour Bag Golf Travel Case US $229.00
• NEW Nike Golf Club Travel Cover Tour Issue US $382.00
• CALLAWAY JAPAN LEGACY TOUR 90 2012 TWL JM STAFF CADDY BAG 52 kg 114 lb 9 US $800.00
• FUJIKURA STAFF BAG PRO TOUR STAND CARTPRO V NEW TITLEIST MPMIZUNOCOBRA US $200.00
• HONMA JAPAN BERES TOUR CADDY STAFF BAG HB 3017 45kg 9 47 2 colours US $599.00
• MIZUNO GOLF JAPAN MP TOUR STYLE 075 STAFF CADDY BAG 95x47 56 kg 13 lb NEW US $600.00
• CALLAWAY JAPAN BRAND NEW 2012 MODEL TWL JM STREAMLINE BOSTON BAG BLACK US $200.00
• CALLAWAY JAPAN 2011 MODEL NEW 90 JM TOUR CADDY BAG US $630.00
• CALLAWAY JAPAN BRAND NEW 2012 MODEL TWL JM TOUR SHOES CASE BLACK || WHITE US $100.00
• Custom Made JStewart Elite Pro Tour Bag US $405.00
• MIZUNO GOLF JAPAN JPX TOUR STYLE 079 CART CADDY BAG 95x47 38 kg 84 lb NEW US $400.00
• MIZUNO GOLF JAPAN TOUR STYLE WORLD REPLICA STAFF CADDY BAG 49 kg 108 lb NEW US $360.00
• MIZUNO GOLF JAPAN TOUR STYLE WORLD MODEL STAFF CADDY BAG 95x4757 kg 125 lb US $460.00
• Tour Edge Exotics Extreme Cart Bag Black US $93.84
• New TaylorMade Golf 2011 TMX R11 Tour Preferred Staff Bag US $239.98
• Tour Edge Exotics Extreme Stand Bag Gray White US $99.99
• Tour Edge Exotics Extreme Cart Bag Gray White US $99.99
• Tour Edge Exotics Extreme Stand Bag Black US $99.99
• Tour Edge Exotics Extreme Cart Bag Red White US $99.99
• PROSiMMON GOLF DUAL STRAP BLACK SILVER STAND BAG NEW US $59.99
• G6] White Pro Tour GOLF SUNDAY BAG Junior size US $10.00
• AspenBlue golfbag filled with Wilson Pro Staff OS clubsputterMORE reducedprice US $309.00
• Bridgestone Golf Tour Staff Pro Cart Bag Black New US $319.00
• 2011 FUJIKURA PRO PGA TOUR DEMO STAFF GOLF BAG CART US $284.05
• Callaway Golf RAZR Tour Staff Pro Bag Black New US $349.99
• Taylor Made R11 Golf Bag T2 and Golf Clubs US $550.00
• COBRA GOLF 10 TOUR STAFF BAG BLACK YELLOW WITH RAIN HOOD NEW US $148.00
• 2011 WILSON STAFF PRACTICE GOLF BALL SHAG BAG US $55.29
• BEN HOGAN TOUR FUTURES VISOR WHITE 1990 93 MADE IN USA US $19.99
• Wilson Lizard Hooters Pro Golf Tour Stand Bag Black 9 5 way top Golf US $129.99
• New Srixon Staff Bag White Black Orange US $269.99
• Vintage Ladies Sandra Haynie Top Flight Executive Golf Clubs with Leather Bag US $169.99
• UNIVERSITY OF MIAMI STAFF BAG 8 INCH LK MAKE OFFER US $179.99
• 2011 FUJIKURA PRO PGA TOUR MOTORE STAFF GOLF BAG CART US $332.49
• PROSiMMON GOLF DUAL SHOULDER STRAP BLUE SILVER LIGHTWEIGHT STAND BAG NEW US $59.99
• PROSiMMON GOLF DUAL SHOULDER STRAP BLACK SILVER LIGHTWEIGHT STAND BAG NEW US $59.99
• NEW Pro X7 Junior Kids Golf Bag w Stand Ball Marker Shoe Bag US $34.95
• TAYLORMADE R9 2010 PGA CHAMPION WHISTLING STRAITS MARTIN KAYMER STAFF BAG I US $450.00
• CALLAWAY HAWK EYE TOUR STAFF BAG 105 6 WAY TOP US $59.99
• NICE ADAMS TOUR STAFF BAG 105 6 WAY TOP US $169.99
• 1996 Titleist Cart Golf Bag near MINT Condition One Owner Very Well Kept US $89.99
• New 2012 Ping Hoofer Carry Stand Bag Black US $134.99
• New 2012 Ping 4 Series Stand Carry Bag Navy Charcoal US $109.99
• New 2012 Ping 4 Series Stand Carry Bag White Inferno Rd US $109.99
• New 2012 Ping 4 Series Stand Carry Bag White Royal US $109.99

A Tour of the Pentium® Pro Processor Microarchitecture
Introduction
One of the Pentium® Pro processor's primary goals was to significantly exceed the performance of the 100MHz Pentium® processor while being manufactured on the same semiconductor process. Using the same process as a volume production processor practically assured that the Pentium Pro processor would be manufacturable, but it meant that Intel had to rely on an improved microarchitecture for all of the performance gains. This guided tour describes how multiple architectural techniques (some proven in mainframe computers, some proposed in academia and some we innovated ourselves) were carefully interwoven, modified, enhanced, tuned and implemented to produce the Pentium Pro microprocessor. This unique combination of architectural features, which Intel describes as Dynamic Execution, enabled the first Pentium Pro processor silicon to exceed the original performance goal.
Building from an already high platform
The Pentium processor set an impressive performance standard with its pipelined,
superscalar microarchitecture. The Pentium processor's pipelined implementation uses five
stages to extract high throughput from the silicon; the Pentium Pro processor moves to a
decoupled, 12-stage, superpipelined implementation, trading less work per pipestage for
more stages. The Pentium Pro processor reduced its pipestage time by 33 percent compared
with a Pentium processor, which means the Pentium Pro processor can have a 33 percent
higher clock speed than a Pentium processor and still be equally easy to produce from a
semiconductor manufacturing process (i.e., transistor speed) perspective.
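As a quick check on how pipestage time relates to clock speed, here is a short worked calculation. It is only a sketch: the symbols t (pipestage time) and f (clock frequency) are introduced here, and it assumes that the clock period equals the pipestage time.

```latex
\begin{align*}
f &= \frac{1}{t} \\
t_{\text{new}} = (1 - 0.33)\,t_{\text{old}} &\;\Rightarrow\; \frac{f_{\text{new}}}{f_{\text{old}}} = \frac{1}{0.67} \approx 1.49 \\
\frac{f_{\text{new}}}{f_{\text{old}}} = 1.33 &\;\Rightarrow\; \frac{t_{\text{new}}}{t_{\text{old}}} = \frac{1}{1.33} \approx 0.75
\end{align*}
```

Under that assumption, a pipestage that is 33 percent shorter would permit a clock roughly 50 percent faster, while a clock 33 percent faster needs a pipestage only about 25 percent shorter. The two percentages quoted above are therefore best read as approximate.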
The Pentium processor's superscalar microarchitecture, with its ability to execute two
instructions per clock, would be difficult to exceed without a new approach.
The new approach used by the Pentium Pro processor removes the constraint of linear
instruction sequencing between the traditional "fetch" and "execute" phases, and opens up
a wide instruction window using an instruction pool. This approach allows the "execute"
phase of the Pentium Pro processor to have much more visibility into the program's
instruction stream so that better scheduling may take place. It requires the instruction
"fetch/decode" phase of the Pentium Pro processor to be much more intelligent in terms of
predicting program flow. Optimized scheduling requires the fundamental "execute" phase to
be replaced by decoupled "dispatch/execute" and "retire" phases. This allows instructions
to be started in any order but always be completed in the original program order. The
Pentium Pro processor is implemented as three independent engines coupled with an
instruction pool as shown in Figure 1 below.
What is the fundamental problem to solve?
Before starting our tour of how the Pentium Pro processor achieves its high performance, it
is important to note why this three-independent-engine approach was taken. A fundamental
fact of today's microprocessor implementations must be appreciated: most CPU cores are not
fully utilized.
Consider the four-instruction example of Figure 2. The first instruction is a load of r1
that, at run time, causes a cache miss. A traditional CPU core must wait for its bus
interface unit to read this data from main memory and return it before moving on to
instruction 2. This CPU stalls while waiting for this data and is thus under-utilized.
While CPU speeds have increased 10-fold over the past 10 years, the speed of main memory
devices has only increased by 60 percent. This increasing memory latency, relative to the
CPU core speed, is a fundamental problem that the Pentium Pro processor set out to solve.
One approach would be to place the burden of this problem onto the chipset, but a
high-performance CPU that needs very high-speed, specialized support components is not a
good solution for a volume production system.
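The two growth figures quoted above can be combined into a single number. As a rough worked calculation (reading "increased by 60 percent" as a factor of 1.6, which is an interpretation rather than a figure from the text; the symbols are introduced here):

```latex
\[
\frac{\text{CPU speed-up}}{\text{memory speed-up}} = \frac{10}{1.6} = 6.25
\]
```

On that reading, a main-memory access costs a bit more than six times as many core clocks as it did a decade earlier, which is the gap the rest of the tour sets out to hide.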
A brute-force approach to this problem is, of course, increasing the size of the L2 cache to reduce the miss ratio. While effective, this is another expensive solution, especially considering the speed requirements of today's L2 cache SRAM components. Instead, the Pentium Pro processor is designed from an overall system implementation perspective which will allow higher performance systems to be designed with cheaper memory subsystem designs.
Pentium Pro processor takes an innovative approach
To avoid this memory latency problem, the Pentium Pro processor "looks ahead" into its instruction pool at subsequent instructions and does useful work rather than stalling. In the example in Figure 2, instruction 2 is not executable since it depends upon the result of instruction 1; however, both instructions 3 and 4 are executable. The Pentium Pro processor speculatively executes instructions 3 and 4. We cannot commit the results of this speculative execution to permanent machine state (i.e., the programmer-visible registers) since we must maintain the original program order, so the results are instead stored back in the instruction pool awaiting in-order retirement. The core executes instructions depending upon their readiness to execute and not on their original program order (it is a true dataflow engine). This approach has the side effect that instructions are typically executed out-of-order.
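To make the dataflow idea concrete, here is a minimal sketch in Python. The four-instruction fragment is hypothetical: it is built only to match the description above (instruction 1 loads r1 and misses, instruction 2 needs r1, instructions 3 and 4 are independent). The latencies and register operands are invented for illustration, and the model ignores resource limits entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    num: int        # position in program order
    dest: str       # register written
    srcs: tuple     # registers read
    latency: int    # clocks until the result is available (invented values)


# Hypothetical fragment matching the description in the text.
PROGRAM = [
    Instr(1, "r1", ("r0",), 10),       # load that misses the cache
    Instr(2, "r2", ("r1", "r2"), 1),   # depends on instruction 1
    Instr(3, "r5", ("r5",), 1),        # independent
    Instr(4, "r6", ("r6", "r3"), 1),   # independent
]


def stalled_finish_times(program):
    """Each instruction starts only after the previous one has finished."""
    clock = 0
    finish = {}
    for ins in program:
        clock += ins.latency
        finish[ins.num] = clock
    return finish


def dataflow_finish_times(program):
    """Each instruction starts as soon as its source values are available."""
    ready_at = {}   # register -> clock at which its newest value arrives
    finish = {}
    for ins in program:
        start = max((ready_at.get(s, 0) for s in ins.srcs), default=0)
        finish[ins.num] = start + ins.latency
        ready_at[ins.dest] = finish[ins.num]
    return finish
```

Calling `dataflow_finish_times(PROGRAM)` shows instructions 3 and 4 finishing long before the load returns, whereas `stalled_finish_times(PROGRAM)` makes them wait behind it. In both models the results would still have to be committed in program order.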
The cache miss on instruction 1 will take many internal clocks, so the Pentium Pro processor core continues to look ahead for other instructions that could be speculatively executed and is typically looking 20 to 30 instructions in front of the program counter. Within this 20- to 30-instruction window there will be, on average, five branches that the fetch/decode unit must correctly predict if the dispatch/execute unit is to do useful work. The sparse register set of an Intel Architecture (IA) processor will create many false dependencies on registers, so the dispatch/execute unit will rename the IA registers to enable additional forward progress. The retire unit owns the physical IA register set, and results are only committed to permanent machine state when it removes completed instructions from the pool in original program order.
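Register renaming is easier to see on a small example. The sketch below is not the processor's actual mechanism; it is a generic illustration in which every write gets a fresh physical name. The `p0`, `p1`, ... names and the tuple encoding of instructions are assumptions made for the example.

```python
def rename(program):
    """Give every destination a fresh physical register.

    program: list of (dest, srcs) pairs using logical register names.
    Returns the same list with logical names replaced by physical ones.
    Sources that have not been written yet keep their logical name,
    meaning "read the committed architectural value".
    """
    alias = {}       # logical name -> most recent physical name
    next_free = 0
    renamed = []
    for dest, srcs in program:
        phys_srcs = tuple(alias.get(s, s) for s in srcs)  # look up before updating
        phys_dest = f"p{next_free}"
        next_free += 1
        alias[dest] = phys_dest
        renamed.append((phys_dest, phys_srcs))
    return renamed
```

For example, `rename([("eax", ("ebx",)), ("ecx", ("eax",)), ("eax", ("edx",)), ("esi", ("eax",))])` gives the two writes to `eax` different physical names, so the third instruction no longer has to wait for the second to read the old value. Only the true dependencies (second on first, fourth on third) remain.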
Dynamic Execution technology can be summarized as optimally adjusting instruction execution by predicting program flow, analysing the program's dataflow graph to choose the best order to execute the instructions, then having the ability to speculatively execute instructions in the preferred order. The Pentium Pro processor dynamically adjusts its work, as defined by the incoming instruction stream, to minimize overall execution time.
Overview of the stops on the tour
We have previewed how the Pentium Pro processor takes an innovative approach to overcome a key system constraint. Now let's take a closer look inside the Pentium Pro processor to understand how it implements Dynamic Execution. Figure 3 below extends the basic block diagram to include the cache and memory interfaces - these will also be stops on our tour. We shall travel down the Pentium Pro processor pipeline to understand the role of each unit:
• The FETCH/DECODE unit: An in-order unit that takes as input the user program instruction stream from the instruction cache, and decodes it into a series of micro-operations (uops) that represent the dataflow of that instruction stream. The program pre-fetch is itself speculative.
• The DISPATCH/EXECUTE unit: An out-of-order unit that accepts the dataflow stream, schedules execution of the uops subject to data dependencies and resource availability, and temporarily stores the results of these speculative executions.
• The RETIRE unit: An in-order unit that knows how and when to commit ("retire") the temporary, speculative results to permanent architectural state.
• The BUS INTERFACE unit: A partially ordered unit responsible for connecting the three internal units to the real world. The bus interface unit communicates directly with the L2 cache, supporting up to four concurrent cache accesses. The bus interface unit also controls a transaction bus, with MESI snooping protocol, to system memory.
Tour stop #1: The FETCH/DECODE unit
Let's start the tour at the Instruction Cache (ICache), a nearby place for instructions to reside so that they can be looked up quickly when the CPU needs them. The Next_IP unit provides the ICache index, based on inputs from the Branch Target Buffer (BTB), trap/interrupt status, and branch-misprediction indications from the integer execution section. The 512-entry BTB uses an extension of Yeh's algorithm to provide greater than 90 percent prediction accuracy. For now, let's assume that nothing exceptional is happening, and that the BTB is correct in its predictions. (The Pentium Pro processor integrates features that allow for rapid recovery from a misprediction, but more on that later.)
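The text does not spell out how the BTB's predictor works internally. As a rough sketch of the general two-level adaptive idea associated with Yeh's work, the Python below keeps a short history of recent outcomes per branch and uses it to select a 2-bit saturating counter. The history length of 4, the unbounded dictionaries (a real BTB has a fixed number of entries) and the class name are all assumptions for illustration, not details of the Pentium Pro design.

```python
class TwoLevelPredictor:
    """Per-branch history selecting a 2-bit saturating counter (illustrative)."""

    def __init__(self, history_bits=4):
        self.history_bits = history_bits
        self.history = {}    # branch address -> recent outcomes packed into an int
        self.counters = {}   # (branch address, history) -> counter in 0..3

    def predict(self, addr):
        h = self.history.get(addr, 0)
        # Counters start at 1 ("weakly not taken"); 2 or 3 means predict taken.
        return self.counters.get((addr, h), 1) >= 2

    def update(self, addr, taken):
        h = self.history.get(addr, 0)
        c = self.counters.get((addr, h), 1)
        self.counters[(addr, h)] = min(3, c + 1) if taken else max(0, c - 1)
        mask = (1 << self.history_bits) - 1
        self.history[addr] = ((h << 1) | int(taken)) & mask
```

In use, the fetch side would call `predict(addr)` when it meets a branch and the execution side would call `update(addr, taken)` once the branch resolves. Because each history pattern has its own counter, a branch that follows a short repeating pattern can be learned after a few repetitions.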
The ICache fetches the cache line corresponding to the index from the Next_IP, and the next line, and presents 16 aligned bytes to the decoder. Two lines are read because the IA instruction stream is byte-aligned, and code often branches to the middle or end of a cache line. This part of the pipeline takes three clocks, including the time to rotate the prefetched bytes so that they are justified for the instruction decoders (ID). The beginning and end of the IA instructions are marked.
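Why two lines are read can be shown with a few lines of Python. This is only one reading of the paragraph above: it assumes a 32-byte cache line (the text does not give the line size) and treats "rotation" as simply selecting the 16 bytes that begin at the fetch address. The function and constant names are hypothetical.

```python
LINE_BYTES = 32    # assumed line size, for illustration only
FETCH_BYTES = 16   # bytes handed to the decoders, from the text


def fetch_window(memory: bytes, ip: int) -> bytes:
    """Return the 16 bytes starting at ip, read from the line holding ip and the next line."""
    line_start = (ip // LINE_BYTES) * LINE_BYTES
    two_lines = memory[line_start:line_start + 2 * LINE_BYTES]
    offset = ip - line_start                           # where the branch target lands in the line
    return two_lines[offset:offset + FETCH_BYTES]      # shift so the first byte is the one at ip
```

With `memory = bytes(range(128))`, a fetch at address 28 needs bytes 28 to 43, which straddle the first and second lines. Reading a single line would have supplied only four useful bytes.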
Three parallel decoders accept this stream of marked bytes, and proceed to find and decode the IA instructions contained therein. The decoder converts the IA instructions into triadic uops (two logical sources, one logical destination per uop). Most IA instructions are converted directly into single uops, some are decoded into one to four uops, and the complex instructions require microcode (the box labeled MIS in Figure 4; this microcode is just a set of preprogrammed sequences of normal uops). Some instructions, called prefix bytes, modify the following instruction, giving the decoder a lot of work to do. The uops are enqueued and sent to the Register Alias Table (RAT) unit, where the logical IA-based register references are converted into Pentium Pro processor physical register references, and then to the Allocator stage, which adds status information to the uops and enters them into the instruction pool. The instruction pool is implemented as an array of Content Addressable Memory called the ReOrder Buffer (ROB).
We have now reached the end of the in-order pipe.
Tour stop #2: The DISPATCH/EXECUTE unit
The dispatch unit selects uops from the instruction pool depending upon their status. If the status indicates that a uop has all of its operands then the dispatch unit checks to see if the execution resource needed by that uop is also available. If both are true, it removes that uop and sends it to the resource where it is executed. The results of the uop are later returned to the pool. There are five ports on the Reservation Station and the multiple resources are accessed as shown in Figure 5 below:
The Pentium Pro processor can schedule at a peak rate of 5 uops per clock, one to each resource port, but a sustained rate of 3 uops per clock is typical. The activity of this scheduling process is the quintessential out-of-order process; uops are dispatched to the execution resources strictly according to dataflow constraints and resource availability, without regard to the original ordering of the program.
Note that the actual algorithm employed by this execution-scheduling process is vitally important to performance. If only one uop per resource becomes data-ready per clock cycle, then there is no choice. But if several are available, which should it choose? It could choose randomly, or first-come-first-served. Ideally it would choose whichever uop would shorten the overall dataflow graph of the program being run. Since there is no way to really know that at run-time, it approximates by using a pseudo FIFO scheduling algorithm favoring back-to-back uops.
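The selection problem described above can be sketched in a few lines. The Python below implements plain oldest-first selection per port. It is not the processor's actual pseudo-FIFO algorithm and makes no attempt to favor back-to-back uops. The dictionary fields (`name`, `age`, `port`, `ready`) are invented for the example.

```python
def dispatch(pool, free_ports):
    """Choose at most one data-ready uop for each free port, oldest first.

    pool: iterable of dicts with keys 'name', 'age', 'port', 'ready'
          (a smaller age means the uop entered the pool earlier).
    free_ports: set of port numbers whose execution resource is available.
    Returns a dict mapping port number -> name of the uop sent to it.
    """
    chosen = {}
    for uop in sorted(pool, key=lambda u: u["age"]):
        port = uop["port"]
        if uop["ready"] and port in free_ports and port not in chosen:
            chosen[port] = uop["name"]
    return chosen
```

If two ready uops compete for port 0, the one that entered the pool first wins. A uop whose operands are not ready is skipped regardless of age, which is what lets younger work overtake it.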
Note that many of the uops are branches, because many IA instructions are branches. The Branch Target Buffer will correctly predict most of these branches but it can't correctly predict them all. Consider a BTB that's correctly predicting the backward branch at the bottom of a loop: eventually that loop is going to terminate, and when it does, that branch will be mispredicted. Branch uops are tagged (in the in-order pipeline) with their fallthrough address and the destination that was predicted for them. When the branch executes, what the branch actually did is compared against what the prediction hardware said it would do. If those coincide, then the branch eventually retires, and most of the speculatively executed work behind it in the instruction pool is good.
But if they do not coincide (a branch was predicted as taken but fell through, or was predicted as not taken and it actually did take the branch) then the Jump Execution Unit (JEU) changes the status of all of the uops behind the branch to remove them from the instruction pool. In that case the proper branch destination is provided to the BTB which restarts the whole pipeline from the new target address.
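The check performed when a branch executes can be summarized as a small function. This is a simplified sketch: the pool is modelled as a plain Python list in program order, and the function name and arguments are hypothetical rather than taken from the hardware description.

```python
def resolve_branch(pool, branch_index, predicted_taken, actually_taken,
                   target, fallthrough):
    """Compare a branch's outcome with its prediction.

    pool: list of uops in program order; pool[branch_index] is the branch.
    Returns (pool to keep, restart address), where the restart address is
    None if the prediction was correct and the pipeline carries on.
    """
    if predicted_taken == actually_taken:
        return pool, None                          # speculative work behind the branch stands
    restart = target if actually_taken else fallthrough
    return pool[:branch_index + 1], restart        # discard everything younger than the branch
```

On a misprediction everything after the branch is dropped, and fetching resumes from whichever address the branch really went to: the target if it was taken, the fallthrough address if it was not.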
Tour stop #3: The RETIRE unit
The retire unit is also checking the status of uops in the instruction pool; it is looking for uops that have executed and can be removed from the pool. Once removed, the uops' original architectural target is written as per the original IA instruction. The retirement unit must not only notice which uops are complete, it must also re-impose the original program order on them. It must also do this in the face of interrupts, traps, faults, breakpoints and mispredictions.
There are two clock cycles devoted to the retirement process. The retirement unit must first read the instruction pool to find the potential candidates for retirement and determine which of these candidates are next in the original program order. Then it writes the results of this cycle's retirements to both the instruction pool and the Retirement Register File (RRF). The retirement unit is capable of retiring 3 uops per clock.
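In-order retirement can be sketched as follows. The example collapses the two retirement clocks into a single step and models the pool as a queue held oldest-first. The entry fields (`done`, `dest`, `value`) and the function name are assumptions for illustration. Only the limit of 3 uops per clock comes from the text.

```python
from collections import deque

RETIRE_WIDTH = 3   # uops retired per clock, from the text


def retire_one_clock(rob: deque, registers: dict) -> int:
    """Retire up to RETIRE_WIDTH completed uops from the head of the pool.

    rob: entries in original program order, oldest at the left; each entry
         is a dict with 'done', 'dest' and 'value'.
    registers: the committed (programmer-visible) register state.
    Returns the number of uops retired this clock.
    """
    retired = 0
    while rob and retired < RETIRE_WIDTH and rob[0]["done"]:
        entry = rob.popleft()
        registers[entry["dest"]] = entry["value"]   # commit to permanent state
        retired += 1
    return retired
```

Note that retirement stops at the first uop that has not finished, even if younger uops behind it are complete. That is how the original program order is re-imposed on work that was executed out of order.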
Tour stop #4: The BUS INTERFACE unit
There are two types of memory access: loads and stores. Loads only need to specify the memory address to be accessed, the width of the data being retrieved, and the destination register. Loads are encoded into a single uop. Stores need to provide a memory address, a data width, and the data to be written. Stores therefore require two uops, one to generate the address, one to generate the data. These uops are scheduled independently to maximize their concurrency, but must re-combine in the store buffer for the store to complete.
Stores are never performed speculatively, there being no transparent way to undo them. Stores are also never reordered among themselves. The Store Buffer dispatches a store only when the store has both its address and its data, and there are no older stores awaiting dispatch.
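The dispatch rule for stores is simple enough to write down directly. In the sketch below the store buffer is a Python list held oldest-first, and a store's address and data are `None` until the corresponding uop has produced them. These representations and the function name are assumptions for the example. The requirement that a store be non-speculative before it is performed is not modelled.

```python
def next_store_to_dispatch(store_buffer):
    """Return the store that may be dispatched now, or None.

    store_buffer: list of stores, oldest first; each is a dict with
                  'addr' and 'data', either of which may still be None.
    Only the oldest store is ever eligible, so stores never pass each other.
    """
    if not store_buffer:
        return None
    oldest = store_buffer[0]
    if oldest["addr"] is not None and oldest["data"] is not None:
        return oldest
    return None   # younger stores wait, even if they are already complete
```

A younger store that has both halves ready still waits behind an older one that is missing its data, which is the "never reordered among themselves" rule in action.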
What impact will a speculative core have on the real world? Early in the Pentium Pro processor project, we studied the importance of memory access reordering. The basic conclusions were as follows:
• Stores must be constrained from passing other stores, for only a small impact on performance.
• Stores can be constrained from passing loads, for an inconsequential performance loss.
• Constraining loads from passing other loads or from passing stores creates a significant impact on performance.
So what we need is a memory subsystem architecture that allows loads to pass stores. And we need to make it possible for loads to pass loads. The Memory Order Buffer (MOB) accomplishes this task by acting like a reservation station and Re-Order Buffer, in that it holds suspended loads and stores, redispatching them when the blocking condition (dependency or resource) disappears.
About the Author
Itech troubleshooter is a web development and professional software solutions company located in New Delhi, founded by Prabhakar Mishra in 2008. The company provides a wide range of services to help customers reach their target audiences at fixed, affordable prices. Its services, available by phone, include website design, web application development, application development, maintenance, re-engineering, Flash development, SEO services, computer AMC, computer networking, wireless networking, data recovery and ERP solutions.

