CRAY-2 Architecture and Design
In addition to the cooling technology, the CRAY-2's extremely high processing rates were achieved by a balanced integration of scalar and vector capabilities and a large Common Memory in a multiprocessing environment.
The significant architectural components of the CRAY-2 Computer System included four identical Background Processors, 256 million 64-bit words of Common Memory, a Foreground Processor and a maintenance control console.
Each of the four identical Background Processors contained registers and functional units to perform both vector and scalar operations. The single Foreground Processor supervised the four Background Processors, while the large Common Memory complemented the processors and provided architectural balance, thus assuring extremely high throughput rates.
Onsite maintenance was possible via the maintenance control console.
Background Processors
Each Background Processor consisted of a computation section, a control section, and a high-speed Local Memory. The computation section performed arithmetic and logical calculations. These operations and the other functions of a Background Processor were coordinated through the control section. Local Memory temporarily stored scalar and vector data during computations. Each Local Memory held 16,384 64-bit words.
Control and data paths for one Background Processor are shown in the block diagram.
Computation Section
The computation section contained registers and functional units that operated together to execute a program of instructions stored in memory.
Computation Section Characteristics
Two's-complement integer and signed-magnitude floating-point arithmetic
Address and Arithmetic Registers
Eight 32-bit address (A) registers
Eight 64-bit scalar (S) registers
Eight 64-element vector (V) registers; 64 bits per element
Address Functional Units
Add/Subtract
Multiply
Scalar Functional Units
Add/Subtract
Shift
Logical
Population/Parity
Leading Zero Count
Vector Functional Units
Logical
Integer
Shift
Add/Subtract
Population/Parity
Leading Zero Count
Compressed IOTA
Floating-point Functional Units
Add/Subtract
Multiply/Reciprocal/Square Root
Scatter and Gather vector operations to and from Common Memory
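As a rough illustration of the gather, scatter, and Compressed IOTA operations listed above, the following Python sketch models them on plain lists. The function names and the list standing in for Common Memory are assumptions made for illustration, not CRAY-2 instructions. The Compressed IOTA shown here (the positions of the set bits in a mask) reflects a common reading of that operation rather than a definition given in this text.

```python
def gather(memory, base, indices):
    """Load memory[base + i] for each index i into a vector."""
    return [memory[base + i] for i in indices]

def scatter(memory, base, indices, values):
    """Store each value to memory[base + i] for the matching index i."""
    for i, v in zip(indices, values):
        memory[base + i] = v

def compressed_iota(mask_bits):
    """Return the positions of the set bits in a mask, in ascending order."""
    return [pos for pos, bit in enumerate(mask_bits) if bit]

memory = list(range(100, 200))     # hypothetical Common Memory contents
mask = [1, 0, 0, 1, 1, 0, 1, 0]    # hypothetical vector mask
idx = compressed_iota(mask)        # index vector built from the mask
picked = gather(memory, 10, idx)   # read the selected words
scatter(memory, 50, idx, [v * 2 for v in picked])  # write results back
```

Pairing an index vector with gather and scatter in this way is what lets sparse or irregularly spaced data be processed in the vector registers.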
Local Memory
Each Background Processor contained 16,384 64-bit words of Local Memory. Local Memory was treated as a register file to hold scalar operands during computation. It could also be used for temporary storage of vector segments that were needed in the vector registers more than once during a calculation. The access time for Local Memory was four clock periods, and Local Memory accesses could overlap accesses to Common Memory. This Local Memory replaced the B and T registers of the CRAY-1 and was readily available to user jobs. One application was holding small matrices.
Local Memory Characteristics
16,384 64-bit words
Holds scalar and vector operands during computation
Temporary storage of vector segments
Four-clock-period access, overlapped with Common Memory accesses
Replaces CRAY-1 B and T registers
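The Local Memory figures above imply some simple capacity arithmetic, sketched below in Python. The constant and variable names are illustrative assumptions. The calculation only restates the stated sizes (16,384 words of 64 bits, 64-element vector registers) in other units.

```python
import math

LOCAL_MEMORY_WORDS = 16_384   # words per Background Processor
WORD_BITS = 64                # bits per word
VECTOR_ELEMENTS = 64          # elements per vector (V) register

# Capacity in bytes (8 bits per byte).
local_memory_bytes = LOCAL_MEMORY_WORDS * WORD_BITS // 8

# Number of full 64-element vector segments that fit at once.
full_vector_segments = LOCAL_MEMORY_WORDS // VECTOR_ELEMENTS

# Side length of the largest square matrix of 64-bit words that fits.
largest_square_matrix = math.isqrt(LOCAL_MEMORY_WORDS)

print(local_memory_bytes, full_vector_segments, largest_square_matrix)
```

Under these figures, Local Memory holds 128 KiB, which is room for 256 full vector segments or a single 128 x 128 matrix.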
Control Section
Each Background Processor contained an identical independent control section of registers and instruction buffers for instruction issue and control. Each Background Processor had a 64-bit real-time clock. These clocks and the Foreground Processor real-time clock were synchronized at system start-up and were advanced by one count in each clock period.
Control Section Characteristics
Eight instruction buffers, each holding 64 16-bit instruction parcels
128 basic instruction codes
32-bit Program Address Register
32-bit Base Address Register
32-bit Limit Address Register
64-bit real-time clock
Eight Semaphore flags to provide interlocks for Common Memory access
32-bit Status register
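The Base Address and Limit Address registers listed above suggest a relocate-and-check step on each memory reference. The Python sketch below shows one way such a check could work, assuming that a program-relative address is added to the base and that the result must fall below the limit. This text does not spell out those semantics, so the function `relocate`, the exception `AddressRangeError`, and the comparison rule are all assumptions. The sketch also totals the instruction buffer capacity from the stated figures.

```python
ADDRESS_BITS = 32
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1

class AddressRangeError(Exception):
    """Raised when a reference falls outside the job's memory field."""

def relocate(relative_address, base, limit):
    """Map a program-relative address to an absolute one (assumed semantics)."""
    absolute = (base + relative_address) & ADDRESS_MASK
    if not (base <= absolute < limit):
        raise AddressRangeError(hex(absolute))
    return absolute

# Instruction buffer capacity, from the stated figures.
INSTRUCTION_BUFFERS = 8
PARCELS_PER_BUFFER = 64
PARCEL_BITS = 16
buffered_parcels = INSTRUCTION_BUFFERS * PARCELS_PER_BUFFER
buffered_words = buffered_parcels * PARCEL_BITS // 64

absolute = relocate(0x10, base=0x1000, limit=0x2000)
```

A check of this kind keeps each job inside its own field of Common Memory, which matters when several jobs are resident at once.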
Background Processor Intercommunication
Synchronization of two or more Background Processors cooperating on a single job was achieved through the use of one of the eight Semaphore flags shared by the Background Processors. These flags were one-bit registers providing interlocks for common access to shared memory fields. A Background Processor was assigned access to one Semaphore flag by a field in the Status register. The Background Processor had instructions to test and branch on, set, and clear a Semaphore flag.
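The interlock pattern described above can be sketched in Python, with threads standing in for Background Processors. This is a software model under stated assumptions: the class `SemaphoreFlags`, its `test_and_set` method, and the use of a lock to imitate hardware atomicity are illustrative and are not the CRAY-2 instruction set.

```python
import threading
import time

class SemaphoreFlags:
    """Eight one-bit flags with an atomic test-and-set, modelled in software."""

    def __init__(self, count=8):
        self._flags = [0] * count
        self._guard = threading.Lock()  # stands in for hardware atomicity

    def test_and_set(self, n):
        """Set flag n and return its previous value (0 means acquired)."""
        with self._guard:
            previous = self._flags[n]
            self._flags[n] = 1
            return previous

    def clear(self, n):
        """Clear flag n, releasing the interlock."""
        with self._guard:
            self._flags[n] = 0

def worker(flags, shared, flag_number, increments):
    for _ in range(increments):
        while flags.test_and_set(flag_number):  # loop back while flag is set
            time.sleep(0)                       # yield to other threads
        shared["total"] += 1                    # update the shared field
        flags.clear(flag_number)

flags = SemaphoreFlags()
shared = {"total": 0}
threads = [threading.Thread(target=worker, args=(flags, shared, 0, 200))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every worker must win flag 0 before touching the shared total, the four workers never interleave their updates.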
Common Memory
One of the primary technological advantages of the CRAY-2 Computer System was its extremely large, directly addressable Common Memory. Featuring 268,435,456 words, this Common Memory was significantly larger than that offered by any other commercially available computer system. It allowed the individual user to run programs that would be impossible to run on any other system. It also enhanced multiprogramming by allowing a substantial increase in the number of jobs that could reside concurrently in memory (that is, that could be multiprogrammed).
Common Memory was arranged in four quadrants of 32 banks each, for a total of 128 banks. A word of memory consisted of 64 data bits and 8 error correction bits (SECDED: single-error correction, double-error detection). This memory was shared by the Foreground Processor, Background Processors, and peripheral equipment controllers. Each bank of memory had an independent data path to each of the four Common Memory ports. Each bidirectional Common Memory port was connected to a Background Processor and a foreground communications channel. Total memory bandwidth was 64 gigabits per second, or 1 billion 64-bit words per second.
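The figure of 8 error correction bits per 64-bit word is consistent with a Hamming-style SECDED code. The Python sketch below computes the number of check bits such a code needs, assuming the usual bound for single-error correction plus one overall parity bit for double-error detection. This text does not say which code the CRAY-2 actually used, and the function name is illustrative.

```python
def secded_check_bits(data_bits):
    """Check bits for a Hamming-style SECDED code over data_bits of data."""
    r = 0
    # Single-error correction needs 2**r >= data_bits + r + 1.
    while (1 << r) < data_bits + r + 1:
        r += 1
    # One extra overall parity bit adds double-error detection.
    return r + 1

check_bits_for_64 = secded_check_bits(64)
print(check_bits_for_64)
```

For 64 data bits, seven bits satisfy the correction bound and the extra parity bit brings the total to eight, giving a 72-bit stored word.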
Common Memory Characteristics
256 million words
64 data bits, 8 error correction bits per word
128 banks; 2 million words per bank
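The Common Memory figures can be cross-checked with a little arithmetic. The Python sketch below uses only the numbers stated in this section; the constant and variable names are illustrative assumptions.

```python
COMMON_MEMORY_WORDS = 268_435_456            # stated word count
QUADRANTS = 4
BANKS_PER_QUADRANT = 32
DATA_BITS = 64
CHECK_BITS = 8
BANDWIDTH_BITS_PER_SECOND = 64 * 10**9       # 64 gigabits per second

banks = QUADRANTS * BANKS_PER_QUADRANT
words_per_bank = COMMON_MEMORY_WORDS // banks
data_bytes = COMMON_MEMORY_WORDS * DATA_BITS // 8
stored_bits_per_word = DATA_BITS + CHECK_BITS
words_per_second = BANDWIDTH_BITS_PER_SECOND // DATA_BITS

print(banks, words_per_bank, data_bytes, stored_bits_per_word, words_per_second)
```

The word count is exactly 2**28, so "256 million words" and "2 million words per bank" use binary millions (2**20): each bank holds 2,097,152 words and the data capacity is 2 GiB.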
Dynamic MOS memory technology
Foreground Processor and I/O Section
The Foreground Processor supervised overall system activity among the Foreground Processor, Background Processors, Common Memory and peripheral controllers. System communication occurred through four high-speed synchronous data channels.
Firmware control programs for normal system operation and a set of diagnostic routines for system maintenance were integral to the Foreground Processor.
Control circuitry for external devices was also located within the CRAY-2 mainframe.
Foreground Communication Channels
The Foreground Processor was connected to four 4-gigabit communication channels. These channels linked the Background Processors, Foreground Processor, peripheral controllers, and Common Memory. Each channel connected one Background Processor, one group of peripheral controllers, one Common Memory port, and the Foreground Processor. Data traffic traveled directly between controllers and Common Memory.