Why did Emu Choose Cilk?

An interview with Martin Deneroff, Chief Operating Officer.

Which criteria did the Emu team deem essential to delivering a highly scalable High Performance Computing / Big Data solution?

First and foremost, we needed a good match to our architecture to make the compiler implementation straightforward. We prioritized that it be easy to understand and learn, and, ideally, familiar to most programmers. Since our solution is a fine-grained, massively parallel implementation, we wanted a language which is inherently parallel without requiring that parallelism be invoked through external libraries.


Would you tell me a bit about Cilk’s history that makes it a great fit for Emu’s Migratory Memory-Side Processing?

Cilk is based on C, which makes it relatively easy to compile and familiar to most programmers. Unlike C, Cilk is inherently parallel; in C, parallelism must be bolted on through an external framework like OpenMP. The parallel concepts in Cilk match well with the underlying concepts in our hardware for invoking parallelism – our hardware has a spawn instruction that corresponds almost exactly to a cilk_spawn. The hardware implements a shared memory paradigm, which is what Cilk naturally supports.
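A minimal sketch of that correspondence, in standard Cilk syntax (illustrative only, not Emu-specific code):

```c
#include <cilk/cilk.h>

/* Classic recursive example: each cilk_spawn is the point where
 * a spawn instruction would create a new thread of work;
 * cilk_sync waits for all children spawned in this frame. */
long fib(long n) {
    if (n < 2) return n;
    long x = cilk_spawn fib(n - 1);  /* child may run in parallel */
    long y = fib(n - 2);             /* parent continues working  */
    cilk_sync;                       /* join the spawned child    */
    return x + y;
}
```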

Cilk is deliberately restrictive in the kinds of parallel constructs it lets you create, so that you can’t make foolish errors. The incidence of subtle programming errors like race conditions is much lower in Cilk programs than in programs using environments like OpenMP or MPI, and the simplified structures available in Cilk tend to make programs easier to understand. While some developers see restrictions as a disadvantage, we see them as an advantage – the restrictions let engineers focus on algorithmic development rather than the art of parallelization. While we recommend Cilk, for those who are committed to riskier approaches based on their personal expertise, we’ll certainly support them in leveraging the underlying Emu advantages of migratory threads.
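As one illustration of those guardrails, Cilk’s reducers let a parallel loop accumulate into a shared variable without a data race (a sketch using the Cilk Plus C reducer API):

```c
#include <cilk/cilk.h>
#include <cilk/reducer_opadd.h>

/* Summing in parallel: a naive shared "total += a[i]" inside a
 * parallel loop would race; a reducer makes the intent explicit
 * and race-free, with no locks in the source code. */
long sum(long *a, long n) {
    CILK_C_REDUCER_OPADD(total, long, 0);
    CILK_C_REGISTER_REDUCER(total);
    cilk_for (long i = 0; i < n; ++i)
        REDUCER_VIEW(total) += a[i];
    long result = REDUCER_VIEW(total);
    CILK_C_UNREGISTER_REDUCER(total);
    return result;
}
```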


Which other languages did the Emu team investigate?

There were a number of other languages we could have selected. We looked at X10, Chapel, Habanero and UPC – they are all less familiar than Cilk and, in our estimation, are much harder to compile. We didn’t see particular advantages to choosing them for our Migratory Memory-side Processing architecture.

We also looked at OpenMP together with C. While we are in the process of adding support for this to our platform, we find it to be both less efficient and harder to program. That said, it is more widely adopted, and as such it’s important to have in our tool chest.

The parallelism of C plus OpenMP is done through a library which makes it less feasible for the compiler to perform error checking. This opens the door to a variety of serious programming errors, including race conditions and non-determinism. Bottom line, Cilk looked like less work for both us and our customers than using C plus OpenMP.
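The difference shows up even in a trivial loop: in C plus OpenMP the parallelism is expressed in a pragma the compiler treats as an annotation, while in Cilk it is a language keyword the compiler can check (illustrative sketch; f, in, out and n are placeholders):

```c
/* C + OpenMP: the pragma is advisory; compiled without OpenMP
 * support it is silently ignored, and a missing private() clause
 * can introduce a race the compiler never sees. */
#pragma omp parallel for
for (int i = 0; i < n; i++)
    out[i] = f(in[i]);

/* Cilk: cilk_for is part of the grammar, so the compiler and the
 * race detector can reason about the program's parallel structure. */
cilk_for (int i = 0; i < n; i++)
    out[i] = f(in[i]);
```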


Which Cilk enhancements are needed to make it straightforward to port OpenMP codes?

We see that it’s important to add Eurekas. Most parallel codes implement barriers, which make every thread wait until all threads complete. A Eureka kicks off a group of threads – when one of them finds the answer, that thread calls out “eureka,” the remaining threads abandon their work, and execution proceeds past the barrier. We expect this capability to be added to Cilk in its next release.
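A sketch of how a eureka-style search might look. The syntax is hypothetical – the eureka construct is not yet part of released Cilk, and the names cilk_eureka and matches are illustrative only:

```c
/* Search a table in parallel; the first thread to find the key
 * "calls out eureka," cancelling the remaining iterations instead
 * of making every thread run to completion. */
int found = -1;
cilk_for (int i = 0; i < n; i++) {
    if (matches(key, table[i])) {   /* matches() is a placeholder */
        found = i;
        cilk_eureka();              /* hypothetical: abort siblings */
    }
}
```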

We’re also looking to add co-routines through the use of libraries. In standard Cilk, when children finish they can only return to the parent. OpenMP and Habanero have a notion of co-routines that run at the same level as the parent. This is sometimes useful, but frequently introduces bugs – it’s a religious position for those programmers who want complete control. As such, we want to support it for those who are confident in their ability to handle the risks.

We’ll build the Cilk Race Detector right into the compiler. It can analyze your program and report whether you’ve introduced a race. Using co-routines or other non-Cilk library operations makes it very difficult for the Cilk Race Detector to do its job. That’s why we advise against giving up the great functionality it provides.


Are there other keywords or enhancements you consider desirable, and what would they be used for?

We will look to implement some of the vector capabilities that Intel introduced. Sparse matrix-vector multiplication work for the Emu platform is already underway under Richard Vuduc at Georgia Tech.

We’re interested in reintroducing the Inlet. An Inlet sends data back to the parent before the child has completed. It creates a shared memory location where a child can deposit a result and the parent can poll the data, analogous to Chapel’s “future.” The initial implementation can be improved upon.
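In MIT Cilk-5 syntax, an inlet looked roughly like this (illustrative sketch; min_chunk and CHUNK are hypothetical helpers, not part of any Emu or Cilk library):

```c
/* The inlet runs atomically in the parent's frame each time a
 * spawned child returns, so partial results are visible to the
 * parent before the final sync. */
cilk int min_of(int *a, int n) {
    int best = INT_MAX;
    inlet void take(int v) {
        if (v < best) best = v;     /* merge a child's result */
    }
    for (int i = 0; i < n; i += CHUNK)
        take(spawn min_chunk(a + i, CHUNK));
    sync;
    return best;
}
```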

A paper on Cilk basics can be found here:

The Implementation of the Cilk-5 Multithreaded Language: http://supertech.csail.mit.edu/papers/cilk5.pdf

Find out more about Cilk at http://cilk.mit.edu/

Emu Technology completes $5M Series A-2 Funding Round

(Nov. 28, 2017) – Emu Technology today announced that it has completed its series A-2 funding round, raising over $5M from a mix of new and previous investors. Samsung Ventures Investment Corporation joins previous investors Blu Ventures and IrishAngels in this round focused on scaling adoption of the migratory thread technology for data intensive Big Data analytics.
“We are thrilled to support EMU Technology’s growth and innovation as it is both unique and disruptive for the data intensive computing platforms,” said an executive from Samsung Ventures Investment Corporation. “So when we find a way by which the world’s largest platform can work with one of the most disruptive technologies and these two pieces come together, we find that to be a highly transformative and powerful way to innovate.”
“Emu will use this funding to accelerate availability and tuning of critical software, including OpenMP, Python and Caffe, through investment of our own resources and by making Emu systems available to algorithm, middleware and application developers,” said Martin Deneroff, Chief Operating Officer, Emu Technology. “At the same time, we will fund our next generation hardware development initiative focused on large system scaling coupled with deployable low energy designs. Beginning immediately, we are investing in a significant ramp in software and hardware developer hiring in New York City, New York and in South Bend, Indiana.”
Previous funding was used to successfully complete the design and commercial manufacture of the Emu Chick, which is now generally available for data intensive Big Data analytics applications in government and enterprise environments. The Emu Chick is built and proudly delivered by Plexus in Neenah, WI. The Emu team has also completed software development efforts to support industry standard CentOS 7.3, Cilk, C++, CilkPlus and the GMP library.
The Emu Chick is a 256 core computer tower system that operates from 120 VAC power and requires no special computer room infrastructure. It provides sufficient memory and storage for many common Data Analytics applications such as graph analysis, cybersecurity, non-obvious relationship analysis and semi-supervised machine learning. It is completely software compatible with larger Emu1 systems.
About Emu Technology. Emu Technology is the leader for data intensive, real-time Big Data computing, combining finely grained parallelism with in-Memory computing and migration of compute context to data. The result is greater scale, greater efficiency and lower energy required to deliver high fidelity insights in less time.

Emu Technology announces DRIVE Program for Developers

Emu Technology is pleased to announce its Developer/Researcher Initiative for high-Velocity Exascale (DRIVE) program for algorithm and application developers. DRIVE is a comprehensive program that enables access to essential tools, Emu hardware and a supportive community for researchers and developers who are chartered with addressing the toughest challenges in scientific, data intensive Big Data analytics today. “The Emu solution is designed from the ground up to minimize data movement by delivering a fine grained, parallel environment with massive shared memory, coupled with automatic migration of the computing context to data, rather than the traditional von Neumann model of moving data to the CPU or GPU,” said Martin Deneroff, Chief Operating Officer, Emu Technology. “We’ve had tremendous interest from the research community to have access to Emu systems, and the DRIVE Program is designed to accelerate that access.”

Emu Technology is partnering with scientists and researchers who are focused on developing new algorithms for Processing in Memory (PIM) systems using industry standard tools including Cilk, Open MP, Python and Caffe, as well as those with interest in optimizing algorithmic operations and library calls, such as matrix-matrix multiply and GraphBLAS. “We were impressed that we could quickly run codes on the Emu system, and our students immediately jumped on to do independent development after a brief tutorial” said Professor Tom Conte, Founding Director of the Georgia Tech Center for Research into Novel Computing Hierarchies, which received an Emu Chick at the beginning of November. “The programming environment is familiar, even though the architecture is truly innovative. As part of our commitment to broad research, we have made the Emu system available to students, faculty and industry collaborators, who can sign up for access on our CRNCH center site.”

Emu has partnered with Reservoir Labs for the CilkPlus front-end and runtime environment, knowing they embrace our emphasis on ease of use. “Our CilkPlus front-end enables fine-grain asynchronous task spawning and synchronization,” said Benoit Meister, Managing Research Engineer at Reservoir Labs. “Building a high-performance, extensible compiler that presents a familiar environment enables developers to focus their efforts on algorithms and applications rather than learning the intricacies of new programming models.”

The DRIVE program provides simulator software, remote access to migratory-thread-capable hardware running standard Linux, grant writing support and a community of like-minded innovators delivering an ever-expanding range of middleware, algorithms and applications. In contrast to other novel architectures such as quantum computers, the Emu design uses familiar coding paradigms, which accelerates adoption of the technology. In addition to providing remote access to an Emu system, the Emu team will help researchers develop a business case for a grant by providing insights into the work effort and estimates of performance. Sign up to accelerate your scientific efforts today by visiting Emu booth 2101 at SC17, or by visiting Emu’s website.


Emu Chick Migratory Thread System First to Enter Georgia Tech’s Rogue’s Gallery

Emu Technology announces Georgia Tech’s receipt of an Emu Chick 256 core system with breakthrough migratory thread technology. Georgia Tech’s Center for Research into Novel Computing Hierarchies (CRNCH) is spearheading an initiative to make unique computers available to industry and academia, and the Emu Chick is the first system selected. “This memory-centric architecture employs threads to move the compute context to the data, rather than the conventional approach of moving data to processors. We anticipate exciting breakthroughs in the performance of data intensive, massive-scale graph algorithms,” said David A. Bader, Professor and Chair, School of Computational Science and Engineering, who leads the effort. “Emu’s paradigm flips the compute process on its head and makes it an ideal choice for the Rogue’s Gallery.”

The Emu team has been developing an Exascale-capable computing architecture specifically designed to tackle the data intensive Big Data applications that choke today’s most powerful computers. “Our systems put data at the heart of their design,” said Martin Deneroff, Chief Operating Officer, Emu Technology. “By recognizing that today’s toughest compute challenges are dominated by data access and data movement, unlike applications of the past, we’ve rethought how to effectively drive efficient use of today’s technologies for real-time Big Data analysis.”

The Emu architecture is well suited for real-time analysis of massive sparse data and streaming data that traditional von Neumann architecture computers struggle to address. This focus sits at the heart of Georgia Tech’s ongoing research into high performance graph analytics with their innovative STINGER framework. The combination promises greater scale, greater efficiency and the lower energy consumption required to deliver the real-time pattern matching and trend analysis that is essential in threat intelligence, personalized medicine, fraud detection and machine learning.

The Georgia Tech CRNCH system is funded with an Intelligence Advanced Research Projects Activity (IARPA) grant of $662,525.


Emu Technology delivers Emu Chick Memory Server to ORNL


(October 20, 2016) – Emu Technology today announced that it has delivered an Emu Chick Memory Server to Oak Ridge National Laboratory. The Emu Chick, which features migratory threads and a memory-side processing architecture, is a compact tower implementation of Emu’s rack-based Emu1 Memory Server and is capable of operating in a “copy room” environment using 120 VAC power.



Emu Technology completes Series A-1 Funding Round

(July 15, 2016) – Emu Technology today announced that it has completed its series A-1 funding round, raising over $3.49M from a mix of new and previous investors. Leading the round was Blu Ventures Inc., with major participation from the IrishAngels.

CEO Ken Jacobsen said “This new funding will enable Emu to complete the design, manufacture, and marketing of both our Emu Chick and Emu1 Memory Server Products. We are excited by the enthusiasm and confidence of our investors and customers, and thank the Board of Directors and management team for their efforts in completing the round.”

Emu Technology announces the Chick Right-sized Memory Server

(May 3, 2016) – Emu Technology today announced that it will make its patented Migratory Memory-Side Processing technology affordable to a much broader base of customers with the introduction of the Chick Memory Server. The Chick is a compact tower implementation of Emu’s rack-based Emu1 Memory Server, and is capable of operating in a “copy room” environment using 120 VAC power. “The Chick will be a game changer for customers with shrinking IT hardware budgets and a growing dependence on big data and high performance computing (HPC) to drive business decisions in real-time, or to achieve engineering and scientific breakthroughs,” says Marty Deneroff, Emu Technology COO.


Emu Technology provides a quantum leap in performance for “Big Data”

Austin, Texas – Press Release

Nationally awarded co-founders Dr. Peter Kogge, Dr. Jay Brockman and Dr. Ed Upchurch put together a team of world-renowned computer architects at Emu Technology. For the past five years, this team has been developing an Exascale-capable computing architecture designed specifically to tackle the ‘Big Data’ applications that are choking today’s supercomputers. Emu Technology will unveil its revolutionary Migrating Thread Computer at the SC15 supercomputing conference, November 16-19, 2015, in Austin, Texas.
