Nonuniform Memory Access (NUMA) in the context of Shared Memory Architecture
One possible approach to building a scalable shared memory system is to
maintain uniform memory access, in what is called the "dancehall"
organization shown below, and to provide a scalable interconnect between
the processors and memory:
Every memory access is translated into a message transaction over the
network, just as it would be translated into a bus transaction in an
SMP architecture.
Disadvantage: every memory access incurs the latency of crossing the
network, and large bandwidth must be supplied to every processor in the network.
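To make the dancehall model concrete, here is a minimal sketch in C of a
read that crosses the interconnect as a request/reply message pair. The
message layout, module count, and routing function are illustrative
assumptions, not the protocol of any particular machine.

    /*
     * Sketch: in a dancehall organization every load becomes a message
     * transaction. Node counts and message format are assumptions made
     * up for this example.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define NUM_MODULES      4
    #define WORDS_PER_MODULE 1024

    /* Simulated memory modules on the far side of the interconnect. */
    static uint32_t memory[NUM_MODULES][WORDS_PER_MODULE];

    typedef struct {
        enum { READ_REQ, READ_REPLY } type;
        uint32_t address;   /* global word address */
        uint32_t data;      /* valid only in a reply */
    } message;

    /* Stand-in for the interconnect: route a request to the owning
     * module and produce the reply. In hardware this is a network
     * traversal, not a function call. */
    static message network_transact(message req)
    {
        uint32_t module = req.address / WORDS_PER_MODULE;
        uint32_t offset = req.address % WORDS_PER_MODULE;
        message reply = { READ_REPLY, req.address, memory[module][offset] };
        return reply;
    }

    /* Every load, even of a single word, is a full request/reply
     * round trip over the network. */
    static uint32_t mem_read(uint32_t address)
    {
        message req = { READ_REQ, address, 0 };
        message reply = network_transact(req);
        return reply.data;
    }

    int main(void)
    {
        memory[2][5] = 42;   /* word 2053 lives in module 2 */
        printf("read(2053) = %u\n",
               (unsigned)mem_read(2 * WORDS_PER_MODULE + 5));
        return 0;
    }

The point of the sketch is that even a one-word load costs a full
network round trip, which is exactly the latency and bandwidth problem
noted above.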
An alternative approach is to interconnect a set of complete
processors, each with not only its own cache ($) but also local
memory. This approach is referred to as Nonuniform Memory Access
(NUMA):
- In this organization, processor and memory modules are closely
integrated such that access to local memory is faster than access to
remote memory (see the allocation sketch after this list).
- In this approach, the I/O system may either be part of every
node or be consolidated into special I/O nodes.
- The ability to access local memory quickly has little effect, if
any, on the access time for remote data.
- The overall bandwidth demand placed on the network is also reduced,
since many accesses are satisfied locally.
- The NUMA approach is widely used at large scale because of its
inherent performance advantages and because it makes better use of
mainstream processor and memory system technology.
- Systems such as the CRAY T3E and SGI/CRAY Origin2000 incorporate the
NUMA design. The Origin2000 also has built-in hardware to support cache
coherence, so it is referred to as a cache-coherent NUMA (cc-NUMA)
architecture. Recently there has been a convergence between the T3E
and Origin2000 designs.
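As a concrete illustration of the local-versus-remote distinction
above, the sketch below uses the Linux libnuma API to place one buffer
on the calling CPU's node and one on another node. This is a minimal
sketch assuming a Linux system with libnuma installed (compile with
-lnuma) and at least two NUMA nodes; the buffer size is an arbitrary
choice for this example.

    /*
     * Sketch: NUMA-aware allocation with Linux libnuma. On a
     * single-node machine the "remote" buffer falls back to node 0.
     */
    #include <numa.h>
    #include <stdio.h>
    #include <string.h>

    #define BUF_SIZE (64 * 1024 * 1024)   /* 64 MiB working buffer */

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "libnuma is not available on this system\n");
            return 1;
        }

        int max_node = numa_max_node();
        int remote_node = max_node > 0 ? max_node : 0;

        /* Local allocation: pages are placed on the node of the
         * calling CPU, so accesses take the fast local path. */
        char *local = numa_alloc_local(BUF_SIZE);

        /* Remote allocation: pages are pinned to another node, so
         * every access from this CPU crosses the interconnect. */
        char *remote = numa_alloc_onnode(BUF_SIZE, remote_node);

        if (!local || !remote) {
            fprintf(stderr, "allocation failed\n");
            return 1;
        }

        /* Touch the buffers so pages are actually faulted in on
         * their respective nodes. */
        memset(local, 0, BUF_SIZE);
        memset(remote, 0, BUF_SIZE);

        printf("local buffer on the calling CPU's node, remote on node %d\n",
               remote_node);

        numa_free(local, BUF_SIZE);
        numa_free(remote, BUF_SIZE);
        return 0;
    }

Timing a pass over each buffer (not shown) would expose the
local/remote latency gap that the NUMA organization is designed
around.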