Cyrix 6x86™ Processor
Processor Brief

The Cyrix 6x86™ processor family offers the highest level of performance
available for desktop PCs today. Through the use of innovative, sixth-generation
architectural techniques, the 6x86 processors achieve best-in-class
performance that surpasses the Pentium® processor in each performance
class.

The superscalar, superpipelined 6x86 processor, available in PR200+,
PR166+, PR150+, PR133+ and PR120+ performance classes, is
optimized to run both 16-bit and 32-bit software. It is fully compatible with the
x86 instruction set and delivers industry-leading performance running Windows®
95, Windows NT, Windows, OS/2®, DOS, Solaris UNIX® and other operating
systems.

The Cyrix 6x86 processor is optimized for both 16-bit and 32-bit applications.
Our goal is to offer users of 6x86-based PCs an easy path to higher
performance for Windows NT and to MMX that protects today’s PC investment.
The next version of Cyrix’s 6x86 processor, code-named M2, will provide
optimum performance on 32-bit software and will be fully compatible with MMX.
This new processor will leverage existing 6x86 motherboard platforms.

The Cyrix 6x86 processor achieves top performance through the use of two
optimized superpipelined integer units and an on-chip FPU. The integer and
floating point units are optimized for maximum instruction throughput by using
advanced architectural techniques including register renaming, out-of-order
completion, data dependency removal, branch prediction and speculative
execution. These design innovations eliminate many data dependencies and
resource conflicts to achieve high performance when executing existing
non-recompiled software programs as well as future x86-compatible code. While
the 6x86 achieves superior performance with existing software, recompiled code
yields an additional 5-10% performance increase.



     Features and Benefits
     Architectural Overview (Synopsis)
     Architectural Comparison
     Technical Specifications
     Performance Benchmarks



Features and Benefits

Superscalar architecture
     Provides two pipelines to execute multiple instructions in parallel for
     faster processing and higher performance.
Superpipelining
     Increases the number of pipeline stages to avoid execution stalls and
     keep information flowing faster for higher frequency scalability.
Register Renaming
     Provides temporary data storage for instant data availability without
     waiting for the CPU to access the on-chip cache or main system
     memory.
Data Dependency Removal
     Provides instruction results to both pipelines simultaneously so that
     neither pipeline is stalled.
Multi-Branch Prediction
     Boosts processor performance by predicting with high accuracy the next
     instructions needed.
Speculative Execution
     Allows the pipelines to continuously execute instructions following a
     branch without stalling the pipelines.
Out-of-Order Completion
     Lets the faster instruction exit the pipeline out of order, saving processing
     time without disrupting program flow.
80-bit Floating Point Unit (FPU)
     Provides high performance by speculatively executing FPU and integer
     instructions in parallel.
16-KByte Unified Write-Back Cache
     Stores the most recently used data and instructions for single-cycle,
     on-chip access.




Architectural Overview

The 6x86 is the first in a new generation of high-performance, x86-compatible
processors. This sixth-generation processor achieves optimum performance on
existing and emerging software applications. The superscalar architecture of the
Integer Unit allows multiple instructions to be processed simultaneously in two
separate pipelines. Through the use of innovative architectural techniques, the
6x86 eliminates many data dependencies and resource conflicts inherent in
other microprocessor designs.

The 6x86 consists of five major functional blocks: the Integer Unit, Cache Unit,
Memory Management Unit, Floating Point Unit and Bus Interface Unit.
Instructions are executed in the X and Y pipelines within the Integer Unit and the
Floating Point Unit. The Cache Unit stores the most recently used data and
instructions allowing fast access to the information by the Integer Unit and FPU.

Physical addresses are calculated by the Memory Management Unit and
passed to the Cache Unit and the Bus Interface Unit (BIU). The BIU provides the
interface between the external system board and the processor's internal
execution units.

Integer Unit
The Integer Unit provides parallel instruction execution using two seven-stage
integer pipelines. Each of the two pipelines, X and Y, can process several
instructions simultaneously.

     The Instruction Fetch (IF) stage fetches 16 bytes of code from the cache
     unit in a single clock cycle and checks the code stream for any branch
     instructions that could affect normal program sequencing.
     Instruction Decode (ID). ID1 evaluates the code stream and determines
     the number of bytes in each instruction. Up to two instructions per clock
     are delivered to the ID2 stages.
     Address Calculation (AC). AC1 calculates a linear memory address for
     the instruction if the instruction refers to a memory operand. AC2
     performs any required memory management functions, cache accesses
     and register file accesses. If a floating point instruction is detected, AC2
     sends it to the FPU for processing.
     The Execute (EX) stage executes instructions using the operands
     provided by the address calculation stage.
     The Write-Back (WB) stage stores execution results either to a register
     file within the Integer Unit or to a write buffer in the cache control unit.
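
The seven stages listed above can be laid out as a simple timing table. The
sketch below is an idealized model only: it assumes one clock per stage, no
stalls, and two instructions entering per clock, none of which the brief
guarantees. The names stage_at and timeline are illustrative.

```python
# Seven integer pipeline stages as named in the brief.
STAGES = ["IF", "ID1", "ID2", "AC1", "AC2", "EX", "WB"]

def stage_at(issue_clock, clock):
    """Return the stage an instruction occupies at `clock`, or None.

    Assumes (idealized) one clock per stage and no stalls.
    """
    index = clock - issue_clock
    if 0 <= index < len(STAGES):
        return STAGES[index]
    return None

def timeline(num_instructions, width=2):
    """Map each clock to {instruction number: stage}, issuing `width` per clock."""
    last_clock = (num_instructions - 1) // width + len(STAGES)
    table = {}
    for clock in range(last_clock):
        row = {}
        for i in range(num_instructions):
            stage = stage_at(i // width, clock)
            if stage is not None:
                row[i] = stage
        table[clock] = row
    return table

if __name__ == "__main__":
    for clock, row in timeline(4).items():
        print(clock, row)
```

Under these assumptions, four instructions drain in eight clocks rather than
the fourteen a single pipeline without overlap would need for two instructions
at a time.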

Out-of-order processing. If an instruction executes faster than a previous
instruction in the other pipeline, the instructions may complete out of order.
Out-of-order completion occurs in the EX and WB stages.

Data dependency solutions. Data dependencies typically force serialized
execution of instructions and can degrade performance. The 6x86, however,
implements register renaming, data dependency removal (including operand and
result forwarding), and data bypassing to effectively resolve data dependencies
and allow parallel execution of instructions containing these dependencies.
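
As a rough illustration of how register renaming removes a false dependency,
the sketch below gives every register write a fresh physical register. It is a
simplification under stated assumptions: instructions are written in a
three-operand form (real x86 instructions are mostly two-operand), the physical
register pool is unbounded, and the function name rename is hypothetical. It
does not describe the 6x86's actual renaming hardware.

```python
from itertools import count

def rename(instructions, arch_regs):
    """Rename destinations so each write gets a fresh physical register.

    `instructions` is a list of (dest, src1, src2) architectural register
    names. Returns the same instructions in physical register names.
    """
    fresh = count()
    # Current mapping from architectural to physical register.
    mapping = {reg: f"p{next(fresh)}" for reg in arch_regs}
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources read whichever physical register currently holds the value.
        s1, s2 = mapping[src1], mapping[src2]
        # Each write gets a new physical register, so a later write cannot
        # clobber a value that an earlier instruction still has to read.
        mapping[dest] = f"p{next(fresh)}"
        renamed.append((mapping[dest], s1, s2))
    return renamed

if __name__ == "__main__":
    # The second instruction overwrites bx, which the first one reads.
    program = [("ax", "bx", "cx"), ("bx", "dx", "dx")]
    print(rename(program, ["ax", "bx", "cx", "dx"]))
```

After renaming, the second instruction writes a different physical register
from the one the first instruction reads, so the two could run in separate
pipelines without waiting on each other.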

Branch control. Branch instructions occur on average every four to six
instructions in x86-compatible programs. At each branch, the pipeline stages may
stall while waiting for the CPU to process the new instruction stream. The 6x86
minimizes the performance degradation and latency of branch instructions through
the use of branch prediction and speculative execution.

The 6x86 uses a 256-entry, four-way set associative Branch Target Buffer (BTB)
to store branch target addresses and branch prediction information, and an
eight-entry return stack to cache the target address of RET instructions. The
decision to fetch the taken or not taken target address is based on a four-state
branch prediction algorithm that achieves approximately 90% accuracy.
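
The brief does not define the four states. A common four-state scheme is the
two-bit saturating counter, and the sketch below assumes that scheme purely for
illustration; the state names and the accuracy helper are hypothetical, and the
result is not a reproduction of the accuracy figure quoted above. The BTB
geometry (256 entries, four ways, hence 64 sets) follows from the numbers in
the text.

```python
# Assumed states of a classic two-bit saturating counter (four in total).
STRONG_NOT_TAKEN, WEAK_NOT_TAKEN, WEAK_TAKEN, STRONG_TAKEN = range(4)

BTB_ENTRIES = 256                    # from the brief
BTB_WAYS = 4                         # from the brief
BTB_SETS = BTB_ENTRIES // BTB_WAYS   # 64 sets of four entries each

def predict_taken(state):
    """Predict taken in the two 'taken' states."""
    return state >= WEAK_TAKEN

def update(state, taken):
    """Move one step toward the actual outcome, saturating at the ends."""
    if taken:
        return min(state + 1, STRONG_TAKEN)
    return max(state - 1, STRONG_NOT_TAKEN)

def accuracy(outcomes, state=WEAK_NOT_TAKEN):
    """Fraction of outcomes predicted correctly for a single branch."""
    correct = 0
    for taken in outcomes:
        if predict_taken(state) == taken:
            correct += 1
        state = update(state, taken)
    return correct / len(outcomes)

if __name__ == "__main__":
    # A loop branch: taken nine times, then falls through, repeated ten times.
    loop_branch = ([True] * 9 + [False]) * 10
    print(accuracy(loop_branch))
```

The point of the two-bit design is that a single loop exit does not flip the
prediction: the counter drops from strongly taken to weakly taken and still
predicts taken when the loop is entered again.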

Floating Point Unit
The on-chip FPU achieves high performance by executing floating point
instructions in parallel with integer instructions through a 64-bit interface. It is
x87 instruction set compatible and adheres to the IEEE-754 standard. The FPU
incorporates a four-deep instruction queue and a four-deep store queue to
facilitate parallel execution. Information is passed to and from the FPU using
eight data registers accessed in a stack-like manner, a control register, and a
status register.

Cache Unit
The 6x86 contains two caches: a 16-KByte dual-ported unified cache and a
256-byte instruction line cache. As the unified cache can store instructions and
data in any ratio, it offers a higher hit rate than separate data and instruction
caches of equal size. An increase in overall cache-to-integer unit bandwidth is
achieved by supplementing the unified cache with a small, high-speed, fully
associative instruction line cache.

Memory Management Unit
The Memory Management Unit (MMU) translates the linear address supplied by
the IU into a physical address to be used by the unified cache and the bus
interface. Memory management procedures are x86 compatible, adhering to
standard paging mechanisms.
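
For readers unfamiliar with x86 paging, the sketch below shows the standard
two-level translation with 4-KByte pages, in which a 32-bit linear address
splits into a 10-bit directory index, a 10-bit table index and a 12-bit offset.
The nested-dictionary page tables and the function names are illustrative
stand-ins, and present bits, protection checks and the TLB are all omitted.

```python
PAGE_SHIFT = 12   # 4-KByte pages
INDEX_BITS = 10   # 1024 entries per page directory or page table

def split_linear(linear):
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    offset = linear & ((1 << PAGE_SHIFT) - 1)
    table_index = (linear >> PAGE_SHIFT) & ((1 << INDEX_BITS) - 1)
    dir_index = (linear >> (PAGE_SHIFT + INDEX_BITS)) & ((1 << INDEX_BITS) - 1)
    return dir_index, table_index, offset

def translate(linear, page_directory):
    """Walk a two-level table and return the physical address.

    `page_directory` maps directory index -> {table index -> frame base}.
    A KeyError stands in for a page fault on an unmapped page.
    """
    dir_index, table_index, offset = split_linear(linear)
    frame_base = page_directory[dir_index][table_index]
    return frame_base | offset

if __name__ == "__main__":
    tables = {1: {3: 0x7F000}}   # one mapped page, chosen arbitrarily
    print(hex(translate(0x00403ABC, tables)))
```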

Bus Interface Unit
The BIU provides the signals and timing required by external circuitry. The 64-bit
data bus supports two different burst cycle address sequence modes:
"one-plus-four" and linear. The "one-plus-four" burst mode is compatible with the
P54C burst order. Operating the CPU in linear burst mode minimizes bus activity
and results in higher performance. Linear burst mode is supported in many
existing 64-bit chipsets.
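
The brief does not spell out the address sequences. As a hedged sketch, the
code below assumes that a burst is four 8-byte transfers filling one 32-byte
line, and that linear order means ascending addresses that wrap around within
that line. For contrast it also shows the interleaved order generally
documented for Pentium-class bursts; how the "one-plus-four" mode maps onto
that order is not covered here. The function names are illustrative.

```python
TRANSFER_BYTES = 8                           # 64-bit data bus, from the brief
BURST_LENGTH = 4                             # assumed: four transfers per burst
LINE_BYTES = TRANSFER_BYTES * BURST_LENGTH   # 32 bytes

def _split(start):
    """Return (line base address, 8-byte-aligned offset within the line)."""
    base = start & ~(LINE_BYTES - 1)
    first = start & (LINE_BYTES - 1) & ~(TRANSFER_BYTES - 1)
    return base, first

def linear_burst(start):
    """Assumed linear order: ascending, wrapping around within the line."""
    base, first = _split(start)
    return [base + (first + i * TRANSFER_BYTES) % LINE_BYTES
            for i in range(BURST_LENGTH)]

def interleaved_burst(start):
    """Pentium-style order: XOR the starting offset with 0, 8, 16, 24."""
    base, first = _split(start)
    return [base + (first ^ (i * TRANSFER_BYTES))
            for i in range(BURST_LENGTH)]

if __name__ == "__main__":
    print([hex(a) for a in linear_burst(0x108)])
    print([hex(a) for a in interleaved_burst(0x108)])
```

Both orders fetch the requested quadword first and cover the same four
addresses; they differ only in the order of the remaining three transfers,
which is why the chipset has to support the mode the CPU is using.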

System Management Mode (SMM) provides an interrupt that can be used for
system power management or software transparent emulation of I/O peripherals.
Additionally, the 6x86 supports a hardware interface that allows the CPU to be
placed into a low-power suspend mode.




Architectural Comparison





Technical Specifications






       Copyright & Legal Info © 1997 by Cyrix Corporation, U.S.A.