Overview of the Tera MTA II
Each Tera processor is multithreaded: it holds as many as 128 hardware threads (called streams) and switches context among them every cycle, thereby hiding up to 128 cycles (384 ns) of memory latency.
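The two latency figures pin down the cycle time. The following is a quick check, assuming the 384 ns figure corresponds to exactly 128 cycles; the clock rate it derives is implied by those numbers rather than stated in the text.

```python
# Back-of-the-envelope check of the latency figures quoted above.
# Assumption: 384 ns corresponds to exactly 128 cycles.
MAX_STREAMS = 128          # hardware streams per processor
HIDDEN_LATENCY_NS = 384.0  # latency hidden when all streams are active

cycle_ns = HIDDEN_LATENCY_NS / MAX_STREAMS   # time per cycle in ns
clock_mhz = 1000.0 / cycle_ns                # implied clock rate in MHz

print(f"cycle time: {cycle_ns:.1f} ns")
print(f"implied clock: {clock_mhz:.0f} MHz")
```

Under that assumption the cycle time is 3 ns, which corresponds to a clock of roughly 333 MHz.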
Each processor has a 21-stage pipeline and so can have 21 separate streams executing simultaneously.
Each stream can issue as many as eight memory references without waiting for earlier ones to finish, further augmenting the memory latency tolerance of the processor.
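To see how stream count and outstanding references combine to hide latency, here is a minimal sketch of a toy model. It is not the real MTA scheduler: it assumes every instruction is a memory reference with a fixed latency, ignores the pipeline depth, and picks ready streams round-robin. The function name `utilization` and its parameters are hypothetical.

```python
from collections import deque

def utilization(num_streams, latency, max_outstanding, cycles=20_000):
    """Fraction of cycles in which some stream issues an instruction.

    Toy model (an assumption, not the real MTA): every instruction is a
    memory reference that completes `latency` cycles after issue, and a
    stream may issue only while it has fewer than `max_outstanding`
    references in flight.
    """
    # One queue of completion times per stream, oldest first.
    in_flight = [deque() for _ in range(num_streams)]
    issued = 0
    next_stream = 0  # round-robin pointer
    for now in range(cycles):
        for offset in range(num_streams):
            s = (next_stream + offset) % num_streams
            q = in_flight[s]
            # Retire references that have completed by this cycle.
            while q and q[0] <= now:
                q.popleft()
            if len(q) < max_outstanding:
                q.append(now + latency)   # issue one new reference
                issued += 1
                next_stream = (s + 1) % num_streams
                break                     # at most one issue per cycle
    return issued / cycles

if __name__ == "__main__":
    for streams, lookahead in [(128, 1), (32, 1), (32, 8)]:
        u = utilization(streams, latency=128, max_outstanding=lookahead)
        print(f"{streams:3d} streams, {lookahead} outstanding: {u:.2f}")
```

In this model, 128 streams with one outstanding reference each fully cover a 128-cycle latency, 32 such streams keep the processor busy only about a quarter of the time, and 32 streams with eight outstanding references each cover the latency again.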
A stream implements a load-store architecture with three addressing modes and 31 general-purpose 64-bit registers.
Switching between such streams (“threads”) is fully supported by hardware.
The peak memory bandwidth is 2.67 gigabytes per second.
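The bandwidth figure is consistent with moving one 64-bit word per cycle at the 3 ns cycle time implied by "128 cycles (384 ns)". The one-word-per-cycle rate is an assumption made for this check; the text does not state it.

```python
# Assumptions: one 64-bit (8-byte) word moves per cycle, and the cycle
# time is the 3 ns implied by "128 cycles (384 ns)".
WORD_BYTES = 8
CYCLE_NS = 384.0 / 128

# Bytes per nanosecond is numerically equal to GB/s (with 1 GB = 10^9 bytes).
peak_gb_per_s = WORD_BYTES / CYCLE_NS

print(f"peak bandwidth: {peak_gb_per_s:.2f} GB/s")
```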