The Big Deal About Big Data

Typically, Big Data refers to large, unstructured databases containing huge amounts of disparate information. Many of these databases are far too large to fit in the main memory of a single computer system, so processing them requires accessing data spread across many (often hundreds or thousands of) memory systems. Conventional computers are designed around the assumption that the large majority of references are to local memory. For Big Data this is not the case, so processing slows to a crawl.


Our Solution

Emu is designed from the ground up to deal with data that has little or no locality: referencing data spread across many memories is not a problem. Now we can run more complex analytics on larger data sets.

Achieve the next level of insight

Faster through massive parallelism and efficient memory retrieval

Lower energy – less data moved shorter distances

Compute, memory size, memory bandwidth and software all scale simultaneously


Migratory Memory-Side Processing

Emu Technology has developed Migratory Memory-Side Processing, which is processing tightly coupled to a distributed shared memory, without needing buses or caches. We do this with:

Many lightweight cores tightly coupled to memory

  • Minimizes latency and energy use
  • Unnecessary data is not fetched to fill cache lines
  • No cache coherency traffic is required

Executing thread moves to the data

  • Network traffic is one way
  • Moving a thread context moves less data than reading a data block from a remote memory

Programmed in a true parallel language, Cilk, instead of library calls


Weak Locality

In EMU, reading a memory location on a different node causes the thread context to move to the node containing that location (the Locale of the reference), instead of sending a read across the network. This approach wins whenever more than one reference occurs at a locale. Because processors never stall for long periods waiting for remote reads, overall utilization improves. The network is simplified because it no longer requires round-trip read and response messages. Remote writes can be performed directly or via migrations, under programmer (compiler) control.
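The message-count argument above can be illustrated with a toy model. This is only a sketch under stated assumptions, not a description of the actual hardware: the function names, the `home` locale, and the sample trace are all hypothetical, and the model counts network messages only, ignoring message size and latency.

```python
def remote_read_messages(trace, home=0):
    """Conventional model: the thread stays on `home`, and every reference
    to another locale costs a request message plus a response message."""
    return sum(2 for locale in trace if locale != home)


def migration_messages(trace, home=0):
    """Migratory model: the thread moves to the data, so each change of
    locale costs a single one-way message and later references at that
    locale are local."""
    current = home
    moves = 0
    for locale in trace:
        if locale != current:
            moves += 1
            current = locale
    return moves


# Hypothetical trace of locale IDs touched by one thread, in order.
trace = [1, 1, 1, 2, 2, 0, 3, 3, 3, 3]

reads = remote_read_messages(trace)   # 9 off-node references, 2 messages each
moves = migration_messages(trace)     # 4 changes of locale, 1 message each
print(reads, moves)
```

In this trace several references land on the same locale in a row, so the migratory model sends far fewer messages. The gap narrows as the number of references per locale drops toward one.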

Conventional cache-based computers rely on Strong Locality for performance

Strong Locality is the situation where multiple data accesses come from a single cache line of 64 to 1024 bytes. Modern codes, especially sparse matrix and graph codes, increasingly fail to exhibit this situation.

Weak Locality is the situation where multiple data accesses come from the same locale (an entire bank of 4 GB or more). EMU gains performance from Weak Locality, and has no reliance on data adjacency.
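The difference between the two definitions can be made concrete by classifying consecutive accesses in an address trace. The sketch below assumes a 64-byte cache line and a 4 GB locale (the low ends of the ranges quoted above) and a simple contiguous mapping of addresses to locales; the real address-to-locale mapping may differ, and the trace itself is invented for illustration.

```python
LINE_BYTES = 64            # assumed cache-line size
LOCALE_BYTES = 4 * 2**30   # assumed locale (bank) size: 4 GB


def locality_hits(addresses):
    """Count consecutive access pairs that share a cache line (strong
    locality) and pairs that share a locale (weak locality)."""
    strong = weak = 0
    for prev, cur in zip(addresses, addresses[1:]):
        if prev // LINE_BYTES == cur // LINE_BYTES:
            strong += 1
        if prev // LOCALE_BYTES == cur // LOCALE_BYTES:
            weak += 1
    return strong, weak


# Hypothetical pointer-chasing trace: scattered within one bank, then a
# jump to a different bank followed by one adjacent access.
addresses = [0, 4096, 1_000_000, 72, 5 * 2**30, 5 * 2**30 + 8]

strong, weak = locality_hits(addresses)
print(strong, weak)
```

Of the five consecutive pairs in this trace, only one stays within a cache line, while four stay within a locale. A cache-based machine benefits from the first kind of pair only; a design that relies on Weak Locality benefits from both.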


EMU Architecture

Stationary Cores (SCs) execute the Operating System and File Systems, and Call or Spawn Gossamer Threadlets to access shared memory and perform migratory processing.

Gossamer Cores (GCs) execute Threadlets at Gossamer Nodelets, perform computations, migrate to other Nodelets, spawn new Threadlets and call System Services on SCs.

  • 32 nodes/motherboard
  • Each motherboard and its nodes make up a Supergroup
  • Supergroups are interconnected with a high-radix RapidIO network, configurable to as many as 64k nodes
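As a quick sizing aid, the figures in the list above can be turned into a Supergroup count. This is a rough sketch: it assumes "64k" means 65,536 nodes (it could also mean 64,000), and the helper name is hypothetical.

```python
NODES_PER_SUPERGROUP = 32   # one motherboard, per the list above
MAX_NODES = 64 * 1024       # assumption: "64k" read as 65,536


def supergroups_needed(nodes):
    """Number of Supergroups (motherboards) needed to hold `nodes` nodes,
    rounding up to a whole motherboard."""
    return -(-nodes // NODES_PER_SUPERGROUP)  # ceiling division


print(supergroups_needed(MAX_NODES))
```

Under these assumptions a maximum configuration spans 2,048 Supergroups.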