Cyrix 6x86™ Processor
Processor Brief

The Cyrix 6x86™ processor family offers the highest level of performance available for desktop PCs today. Through the use of innovative, sixth-generation architectural techniques, the 6x86 processors achieve best-in-class performance that surpasses the Pentium® processor in each performance class.

The superscalar, superpipelined 6x86 processor, available in PR200+, PR166+, PR150+, PR133+ and PR120+ performance classes, is optimized to run both 16-bit and 32-bit software. It is fully compatible with the x86 instruction set and delivers industry-leading performance running Windows® 95, Windows NT, Windows, OS/2®, DOS, Solaris UNIX® and other operating systems.

Our goal is to offer users of 6x86-based PCs an easy path to higher Windows NT performance and to MMX, one that protects today's PC investment. The next version of Cyrix's 6x86 processor, code-named M2, will provide optimum performance on 32-bit software and will be fully compatible with MMX. This new processor will leverage existing 6x86 motherboard platforms.

The Cyrix 6x86 processor achieves top performance through the use of two optimized superpipelined integer pipelines and an on-chip FPU. The integer and floating point units are optimized for maximum instruction throughput by using advanced architectural techniques including register renaming, out-of-order completion, data dependency removal, branch prediction and speculative execution. These design innovations eliminate many data dependencies and resource conflicts, so the processor achieves high performance when executing existing, non-recompiled software as well as future x86-compatible code. While the 6x86 achieves superior performance with existing software, it can gain an additional 5-10% performance increase from recompiled code.
Features and Benefits | Architectural Overview (Synopsis) | Architectural Comparison | Technical Specifications | Performance Benchmarks

Features and Benefits

Superscalar architecture: Provides two pipelines that execute multiple instructions in parallel for faster processing and higher performance.
Superpipelining: Divides instruction processing into more pipeline stages, keeping instructions flowing and allowing the design to scale to higher clock frequencies.
Register Renaming: Provides temporary data storage so that results are available immediately, without waiting for the CPU to access the on-chip cache or main system memory.
Data Dependency Removal: Provides instruction results to both pipelines simultaneously so that neither pipeline is stalled.
Multi-Branch Prediction: Boosts processor performance by predicting, with high accuracy, which instructions will be needed next.
Speculative Execution: Allows the pipelines to keep executing instructions that follow a branch without stalling.
Out-of-Order Completion: Lets a faster instruction exit the pipeline ahead of a slower, earlier one, saving processing time without disrupting program flow.
80-bit Floating Point Unit (FPU): Provides high performance by speculatively executing FPU and integer instructions in parallel.
16-KByte Unified Write-Back Cache: Stores the most recently used data and instructions for single-cycle, on-chip access.

Architectural Overview

The 6x86 is the first in a new generation of high-performance, x86-compatible processors. This sixth-generation processor achieves optimum performance on existing and emerging software applications. The superscalar architecture of the Integer Unit allows multiple instructions to be processed simultaneously in two separate pipelines.
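To make the benefit of two pipelines concrete, here is a minimal timing sketch. It uses the seven stages listed under Integer Unit (IF, ID1, ID2, AC1, AC2, EX, WB). It assumes an idealized machine: no stalls, every instruction pair can issue together, and instructions alternate between X and Y. These are simplifying assumptions, not statements from the brief. The function names are hypothetical.

```python
# Idealized timing for two seven-stage integer pipelines (X and Y).
# Assumptions, not taken from the brief: no stalls of any kind, every pair of
# instructions can issue together, and instructions alternate X, Y, X, Y.

STAGES = ["IF", "ID1", "ID2", "AC1", "AC2", "EX", "WB"]


def schedule(num_instructions, pipelines=2):
    """Return (pipeline, entry_cycle, completion_cycle) for each instruction.

    Cycles are numbered from 1. An instruction that enters IF in cycle c
    spends one clock in each stage and completes WB in cycle c + 6.
    """
    if pipelines not in (1, 2):
        raise ValueError("model supports one pipeline (X) or two (X and Y)")
    rows = []
    for i in range(num_instructions):
        entry = i // pipelines + 1        # up to `pipelines` new instructions per clock
        done = entry + len(STAGES) - 1    # one clock per stage
        rows.append(("XY"[i % pipelines], entry, done))
    return rows


def total_cycles(num_instructions, pipelines=2):
    """Clock cycles until the last instruction leaves WB (0 for no work)."""
    rows = schedule(num_instructions, pipelines)
    return max((done for _, _, done in rows), default=0)


def chart(num_instructions, pipelines=2):
    """Print which stage each instruction occupies in every cycle."""
    width = total_cycles(num_instructions, pipelines)
    for n, (pipe, entry, _) in enumerate(schedule(num_instructions, pipelines)):
        cells = []
        for cycle in range(1, width + 1):
            offset = cycle - entry
            cells.append(STAGES[offset] if 0 <= offset < len(STAGES) else ".")
        print(f"I{n} ({pipe}): " + " ".join(f"{c:>3}" for c in cells))


if __name__ == "__main__":
    chart(4)
    print(total_cycles(100), total_cycles(100, pipelines=1))  # 56 106
```

Under these assumptions, 100 instructions finish in 56 clocks with two pipelines and 106 clocks with one. Real programs fall short of this bound because of dependencies, branches and cache misses. The techniques described below are aimed at exactly those losses.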
Through the use of innovative architectural techniques, the 6x86 eliminates many data dependencies and resource conflicts inherent in other microprocessor designs.

The 6x86 consists of five major functional blocks: the Integer Unit, the Cache Unit, the Memory Management Unit, the Floating Point Unit and the Bus Interface Unit. Instructions are executed in the X and Y pipelines within the Integer Unit and in the Floating Point Unit. The Cache Unit stores the most recently used data and instructions, giving the Integer Unit and FPU fast access to this information. Physical addresses are calculated by the Memory Management Unit and passed to the Cache Unit and the Bus Interface Unit (BIU). The BIU provides the interface between the external system board and the processor's internal execution units.

Integer Unit

The Integer Unit provides parallel instruction execution using two seven-stage integer pipelines, X and Y, each of which can process several instructions simultaneously. The seven stages are IF, ID1, ID2, AC1, AC2, EX and WB.

Instruction Fetch (IF). The IF stage fetches 16 bytes of code from the cache unit in a single clock cycle and checks the code stream for any branch instructions that could affect normal program sequencing.

Instruction Decode (ID). ID1 evaluates the code stream and determines the number of bytes in each instruction. Up to two instructions per clock are delivered to the ID2 stages, one for each pipeline.

Address Calculation (AC). If an instruction refers to a memory operand, AC1 calculates its linear memory address. AC2 performs any required memory management functions, cache accesses and register file accesses. If a floating point instruction is detected, AC2 sends it to the FPU for processing.

Execute (EX). The EX stage executes instructions using the operands provided by the address calculation stages.

Write-Back (WB). The WB stage stores execution results either to a register file within the Integer Unit or to a write buffer in the cache control unit.

Out-of-order processing.
If an instruction executes faster than a previous instruction in the other pipeline, the instructions may complete out of order. Out-of-order completion occurs in the EX and WB stages.

Data dependency solutions. Data dependencies typically force serialized execution of instructions and can degrade performance. The 6x86, however, implements register renaming, data dependency removal (including operand and result forwarding) and data bypassing to resolve data dependencies effectively and allow parallel execution of instructions that contain them.

Branch control. Branch instructions occur on average every four to six instructions in x86-compatible programs. When a branch changes program flow, the pipeline stages may stall while the CPU fetches the new instruction stream. The 6x86 minimizes the performance degradation and latency of branch instructions through the use of branch prediction and speculative execution. It uses a 256-entry, four-way set associative Branch Target Buffer (BTB) to store branch target addresses and branch prediction information, and an eight-entry return stack to cache the target addresses of RET instructions. The decision to fetch the taken or not-taken target address is based on a four-state branch prediction algorithm that achieves approximately 90% accuracy.

Floating Point Unit

The on-chip FPU achieves high performance by executing floating point instructions in parallel with integer instructions through a 64-bit interface. It is compatible with the x87 instruction set and adheres to the IEEE-754 standard. The FPU incorporates a four-deep instruction queue and a four-deep store queue to facilitate parallel execution. Information is passed to and from the FPU using eight data registers accessed in a stack-like manner, a control register and a status register.

Cache Unit

The 6x86 contains two caches: a 16-KByte dual-ported unified cache and a 256-byte instruction line cache.
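The next paragraph argues that a unified cache can outperform split instruction and data caches of the same total size, because it can hold instructions and data in any ratio. The toy comparison below illustrates that argument under explicit assumptions. It compares a fully associative LRU cache with 32-byte lines (an assumption, since the brief gives no line size or organization), a single 16 KB cache against an 8 KB + 8 KB split, and a hypothetical loop that touches 2 KB of code and 12 KB of data per pass. It is not a model of the actual 6x86 cache.

```python
from collections import OrderedDict

LINE_BYTES = 32  # assumed line size; the brief does not state one


class LRUCache:
    """Fully associative cache with LRU replacement.

    A deliberate simplification: it ignores the real cache's set
    associativity, dual porting and write-back behavior.
    """

    def __init__(self, size_bytes, line_bytes=LINE_BYTES):
        self.capacity = size_bytes // line_bytes  # number of lines
        self.line_bytes = line_bytes
        self.lines = OrderedDict()                # line tag -> None, oldest first
        self.hits = 0
        self.accesses = 0

    def access(self, address):
        tag = address // self.line_bytes
        self.accesses += 1
        if tag in self.lines:
            self.hits += 1
            self.lines.move_to_end(tag)           # mark as most recently used
        else:
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)    # evict least recently used line
            self.lines[tag] = None


def compare(passes=10):
    """Return (unified_hit_rate, split_hit_rate) for a hypothetical loop that
    touches 2 KB of code and 12 KB of data on every pass."""
    code = [i * LINE_BYTES for i in range(2 * 1024 // LINE_BYTES)]
    data = [0x100000 + i * LINE_BYTES for i in range(12 * 1024 // LINE_BYTES)]

    unified = LRUCache(16 * 1024)                            # one 16 KB cache
    icache, dcache = LRUCache(8 * 1024), LRUCache(8 * 1024)  # 8 KB + 8 KB split

    for _ in range(passes):
        for k, d in enumerate(data):
            c = code[k % len(code)]
            unified.access(c)
            unified.access(d)
            icache.access(c)
            dcache.access(d)

    split_hits = icache.hits + dcache.hits
    split_accesses = icache.accesses + dcache.accesses
    return unified.hits / unified.accesses, split_hits / split_accesses


if __name__ == "__main__":
    u, s = compare()
    print(f"unified 16 KB: {u:.1%}   split 8+8 KB: {s:.1%}")
```

In this workload, the 12 KB of data cannot fit in an 8 KB data cache, so the split design misses on every data access after the first pass. The 16 KB unified cache holds both code and data, which gives about 94% versus 49% here. The advantage depends on the workload. A footprint that fits neither design, or data that crowds code out of a unified cache, can produce different results.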
As the unified cache can store instructions and data in any ratio, it generally offers a higher hit rate than separate data and instruction caches of equal total size. Supplementing the unified cache with a small, high-speed, fully associative instruction line cache increases the overall bandwidth between the caches and the Integer Unit.

Memory Management Unit

The Memory Management Unit (MMU) translates the linear address supplied by the Integer Unit into a physical address for use by the unified cache and the bus interface. Memory management procedures are x86 compatible and adhere to standard paging mechanisms.

Bus Interface Unit

The BIU provides the signals and timing required by external circuitry. The 64-bit data bus supports two burst cycle address sequence modes: the "one-plus-four" burst mode, which is compatible with the P54C burst order, and a linear burst mode. Operating the CPU in linear burst mode minimizes bus activity and results in higher performance. Linear burst mode is supported in many existing 64-bit chipsets.

System Management Mode (SMM) provides an interrupt that can be used for system power management or software-transparent emulation of I/O peripherals. Additionally, the 6x86 supports a hardware interface that allows the CPU to be placed into a low-power suspend mode.

Architectural Comparison

Technical Specifications

Copyright © 1997 by Cyrix Corporation, U.S.A.