Supercomputing poised for a massive speed boost

Plans to build ‘exascale’ machines are moving forward, but still face major technological challenges.
By Katherine Bourzac
29 November 2017

At the end of July, workers at the Oak Ridge National Laboratory in Tennessee began filling up a cavernous room with the makings of a computational behemoth: row upon row of neatly stacked computing units, some 290 kilometres of fibre-optic cable and a cooling system capable of carrying a swimming pool’s worth of water. The US Department of Energy (DOE) expects that when this US$280-million machine, called Summit, becomes ready next year, it will enable the United States to regain a title it hasn’t held since 2012 — home of the fastest supercomputer in the world.

Summit is designed to run at a peak speed of 200 petaflops, able to crunch through as many as 200 million billion ‘floating-point operations’ — a type of computational arithmetic — every second. That could make Summit 60% faster than the current world-record holder, in China.

But for many computer scientists, Summit’s completion is merely one lap of a much longer race. Around the world, teams of engineers and scientists are aiming for the next leap in processing ability: ‘exascale’ computers, capable of running at a staggering 1,000 or more petaflops. Already, four national or international teams, working with the computing industries in their regions, are pushing towards this ambitious target. China plans to have its first exascale machine running by 2020. The United States, through the DOE’s Exascale Computing Project, aims to build at least one by 2021. And the European Union and Japan are expected to be close behind.
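The speed figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers in the text (200 petaflops for Summit, a 60% advantage over the current record holder, 1,000 petaflops for exascale). The record holder's peak speed is derived from the 60% figure as an assumption, not taken from a benchmark list.

```python
# Unit arithmetic for the figures quoted in the article.
PETA = 10**15  # one petaflop = 10^15 floating-point operations per second
EXA = 10**18   # one exaflop  = 10^18 floating-point operations per second

summit_peak_pflops = 200
summit_flops = summit_peak_pflops * PETA  # "200 million billion" operations per second

# Exascale threshold expressed in petaflops.
exascale_pflops = EXA // PETA

# How much faster than Summit's peak an exascale machine would need to be.
speedup_needed = exascale_pflops / summit_peak_pflops

# Assumption: "60% faster" means Summit's peak is 1.6 times the record holder's peak.
implied_record_pflops = summit_peak_pflops * 100 / (100 + 60)

print(f"Summit peak: {summit_flops:.1e} flops")
print(f"Exascale threshold: {exascale_pflops} petaflops")
print(f"Speedup needed over Summit: {speedup_needed:.1f}x")
print(f"Implied record-holder peak: {implied_record_pflops:.0f} petaflops")
```

On these numbers, an exascale machine is five times faster than Summit's design peak, and the 60% comparison implies a record-holder peak of about 125 petaflops.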

Scientists anticipate that exascale computers will enable them to solve currently intractable problems in fields as varied as climate science, renewable energy, genomics, geophysics and artificial intelligence. That could include pairing detailed models of fuel chemistry and combustion engines in order to more quickly identify improvements that could lower greenhouse-gas emissions. Or it might allow for simulations of the global climate at a spatial resolution as high as a single kilometre. With the right software in hand, “there will be a lot of science we can then do that we can’t do now”, says Ann Almgren, a computational scientist at the Lawrence Berkeley National Laboratory in California.

But reaching the exascale regime is a tremendous technological challenge. The exponential increases in computing performance and energy efficiency that once accompanied Moore’s law are no longer guaranteed, and aggressive changes to supercomputer components are needed to keep making gains. Moreover, a supercomputer that performs well on a speed test is not necessarily one that will excel at scientific applications.

The effort to push high-performance computing to the next level is forcing a transformation in how supercomputers are designed and their performance measured. “This is one of the hardest problems I’ve seen in my career,” says Thomas Brettin, a computer scientist at the Argonne National Laboratory in Illinois, who is working on medical software for exascale machines.

Accelerated hardware

Broader trends in the computing industry are shaping the path to exascale computers. For more than a decade, transistors have been so tightly packed that computing chips can’t be made to run at faster rates. To circumvent this, today’s supercomputers lean heavily on parallelism, using banks of chips to create machines with millions of processing units called ‘cores’. A supercomputer can be made more powerful by stringing together more of these chips.

But as these machines get bigger, data management becomes more of a challenge. Moving data in and out of storage, and even within cores, takes much more energy than the calculations themselves. By some estimates, as much as 90% of the power supplied to a high-performance computer is used for data transport.

That has led to some alarming predictions. In 2008, in a report for the US Defense Advanced Research Projects Agency, a team headed by computer scientist Peter Kogge concluded that an exascale computer built from foreseeable technologies would need gigawatts of power, perhaps from a dedicated nuclear plant. “Power is the number one, two, three and four problem with exascale computing,” says Kogge, a professor at the University of Notre Dame in Indiana.

In 2015, in light of technological improvements, Kogge reduced this estimate down to between 180 and 425 megawatts. But that is still substantially more power than today’s top supercomputers use; the system that leads the world rankings today — China’s Sunway TaihuLight — consumes about 15 megawatts.
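To put the power figures in proportion, here is a rough calculation using only the numbers quoted above. Two things in it are assumptions made for illustration: that the exascale machine delivers exactly one exaflop, and that the 90% upper-bound estimate for data transport applies to a 15-megawatt system.

```python
# Rough power arithmetic from the figures quoted in the article.
EXAFLOP = 10**18  # floating-point operations per second

kogge_low_mw, kogge_high_mw = 180, 425  # Kogge's 2015 estimate range, in megawatts
taihulight_mw = 15                      # quoted consumption of Sunway TaihuLight

# How many times more power the estimated exascale machine would draw.
ratio_low = kogge_low_mw / taihulight_mw
ratio_high = kogge_high_mw / taihulight_mw


def gflops_per_watt(flops: float, megawatts: float) -> float:
    """Energy efficiency in gigaflops per watt for a given speed and power draw."""
    watts = megawatts * 1e6
    return flops / watts / 1e9


# Assumption: apply the "as much as 90%" data-transport estimate to a 15 MW machine.
data_transport_mw = 0.9 * taihulight_mw

print(f"Power ratio vs TaihuLight: {ratio_low:.1f}x to {ratio_high:.1f}x")
print(f"Efficiency at 180 MW: {gflops_per_watt(EXAFLOP, kogge_low_mw):.2f} GFLOPS/W")
print(f"Efficiency at 425 MW: {gflops_per_watt(EXAFLOP, kogge_high_mw):.2f} GFLOPS/W")
print(f"Data transport share of 15 MW (upper bound): {data_transport_mw:.1f} MW")
```

Even the lower estimate works out to 12 times the power draw of today's leading system, which is why energy spent on data movement dominates the design discussion.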

Emu Technology Aims To Accelerate The Delivery Of A New Generation Of Low-Energy Supercomputers

Emu Technology is leading the way by delivering a new computing architecture for data-intensive, real-time Big Data computing that breaks with the computer designs of the past by combining finely grained parallelism with in-memory computing and migration of compute context to data. The result is greater scale, greater efficiency and lower energy required to deliver high-fidelity insights in less time. Below is our interview with Martin Deneroff, Chief Operating Officer of Emu Technology:

Q: What is Emu’s story, how did you start?

A: Our founders, Dr. Peter Kogge, Dr. Jay Brockman and Dr. Ed Upchurch, recognized a need for a new approach to computing and founded Emu Technology. While other companies are tackling the performance limits of existing computers, including CPU clock speed and transistor density, power demands, and memory, cache and GPU size, Emu envisioned a completely novel approach to achieving high performance. Their “genius of the and” approach adds a breakthrough that complements the efforts of major players in the computing industry and promises to accelerate the delivery of a new generation of deployable, low-energy supercomputers.

Q: What makes your solution unique?

A: Emu was designed from the ground up, recognizing that data movement is the key contributor to bottlenecks throughout all of today’s computer systems. Traditional computers move data to a CPU or GPU for complex calculations. It’s this data movement that throttles performance and places extreme demands on power and cooling. With Emu, data is resident in memory and there are many cores; rather than moving data to the CPU or GPU, the process context is sent to where the data is, and the calculation is done by the local core. We call this patented innovation Migratory Threads. Migratory Threads, along with our system design, which does not have buses or caches, are optimal for data-intensive Big Data analytics. For these kinds of applications, our approach is lighter-weight, faster, more efficient and uses less energy.
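The contrast described in this answer, copying data to a fixed processor versus sending the thread's context to the data, can be illustrated with a toy traffic count. This is a conceptual sketch only, not Emu's implementation: the function names, the even partitioning of records across nodes, the 4,096-byte record size and the 128-byte context size are all hypothetical values chosen for illustration.

```python
# Toy model: count bytes moved under two execution styles.
# All sizes and names here are illustrative assumptions, not Emu specifications.
from dataclasses import dataclass


@dataclass
class Traffic:
    messages: int = 0
    bytes_moved: int = 0


def owner(index: int, items_per_node: int) -> int:
    """Node that holds a given record, assuming records are split evenly."""
    return index // items_per_node


def fetch_to_cpu(accesses, items_per_node, record_bytes, home_node=0) -> Traffic:
    """Conventional style: the thread stays put and every remote record is copied to it."""
    traffic = Traffic()
    for idx in accesses:
        if owner(idx, items_per_node) != home_node:
            traffic.messages += 1
            traffic.bytes_moved += record_bytes
    return traffic


def migrate_to_data(accesses, items_per_node, context_bytes, home_node=0) -> Traffic:
    """Migratory style: the thread's context hops to whichever node holds the next record."""
    traffic = Traffic()
    current = home_node
    for idx in accesses:
        node = owner(idx, items_per_node)
        if node != current:
            traffic.messages += 1
            traffic.bytes_moved += context_bytes
            current = node
    return traffic


# Hypothetical irregular access pattern over 16 records spread across 4 nodes.
accesses = [0, 5, 6, 7, 12, 13, 1]
fetched = fetch_to_cpu(accesses, items_per_node=4, record_bytes=4096)
migrated = migrate_to_data(accesses, items_per_node=4, context_bytes=128)

print("fetch-to-CPU:", fetched)
print("migrate-to-data:", migrated)
```

With these assumed sizes, the migratory style sends fewer and much smaller messages. The advantage depends on the context being small relative to the data touched at each stop; with tiny records and no locality between consecutive accesses, this toy model would favor the conventional style instead.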

Q: What exactly is Emu Chick, how does it work?

A: The Emu Chick is a 256-core compact tower computer system. It runs industry-standard CentOS 7.3 Linux and is programmed in CilkPlus, which is a high-performance code that enables Emu’s migratory thread technology. While extremely innovative, CilkPlus presents a familiar programming environment that enables developers to focus on algorithms and applications, rather than learning the intricacies of a new language. The Chick operates from 120 VAC power and requires no special computer room infrastructure. It provides sufficient memory and storage for common data analytics applications and is completely software compatible with larger Emu1 rack systems.

Q: Who are the primary users of Emu and what are some of the key challenges you are helping them solve?

A: Emu users are the scientists, researchers and developers who are tackling the most profound computing challenges of our era. These are people who are tasked with analyzing massive data warehouses and unstructured databases containing hundreds of terabytes or more of disparate information. For applications in segments like threat intelligence, graph analysis, cybersecurity, and semi-supervised and unsupervised machine learning, a “no compromise” approach to data-intensive Big Data is needed. That means being able to handle data streaming and changes to the data set while analysis is underway. This need is common in defense, government research labs, real-time trend and pattern analysis and personalized medicine, to name a few.

Q: How did you choose to grow your business in South Bend, Indiana and New York City, New York instead of a more traditional high tech location?

A: South Bend is the home of our founders and the source for great research and innovative thinking. New York City boasts a vibrant high tech scene with industry emphasis on Big Data analytics. Our solution is designed to address data-intensive Big Data analytics – that’s a combination of streaming data, data sets with no predictable regularity and computation dominated by data movement instead of floating point operations. Having Emu engineering in technology centers outside the strictures of traditional computer design facilitates our own innovative thinking and enables us to be unchained from the limitations of the past.

Q: What can we expect from Emu in the coming months?

A: Emu will be ramping up our hiring of both software and hardware developers to accelerate the delivery of a broader set of applications and algorithms. Jobs are based in New York City and South Bend, Indiana. Emu will be making systems available for developers as part of a formal program for access to the platform, and we’ll be participating in multiple new partnerships with researchers and industry. To find out more about how Emu’s Migratory Thread technology works, to apply for a position or to join the developer community, visit Emu’s website. Or visit us at SC17 November 13-16 in Denver, Colorado – Booth 2101.

Georgia Tech CRNCH Center Summit November 3, 2017

The Georgia Institute of Technology’s CRNCH (Center for Research into Novel Computing Hierarchies) is tasked with finding novel ways to compute by rethinking every level of the computing stack. CRNCH combines experts from all disciplines, from device and materials experts, through circuits and architecture experts, to language, software and application experts. This breadth is what makes the Georgia Tech CRNCH approach to post-Moore computing unique among academic centers.

Dr. Peter Kogge is a featured speaker.

Georgia Tech Awarded IARPA Contract to Evaluate Emu Technology System

Traditionally, a rogues gallery is a collection of criminals, maybe in a police lineup or among Batman’s nemeses. However, the Georgia Institute of Technology is putting together a much different sort of collection of rogues.

The Center for Research into Novel Computing Hierarchies (CRNCH), led by Computer Science Professor Tom Conte, is spearheading the Rogues Gallery as one of its first initiatives since being founded in November 2016. It will be a collection of some of the most unusual computers in the field. Some of these machines are so rare that only a few people know how to program them; others are so new that no one has programmed them yet. The goal is simple: to collect unusual hardware and make it accessible to industry and academia.

With an Intelligence Advanced Research Projects Activity (IARPA) grant of $662,525, CRNCH researchers can finally purchase their first rogue, the Emu Chick.

This is a memory-centric architecture that employs threads — not processors — to move massive irregular data sets, effectively expediting data analysis workloads. The Chick could eventually be used to tackle fraud detection, genome sequencing for personalized medicine, real-time portfolio valuation and trading, and software development for larger Emu systems. But the machine is so new that its full capabilities aren’t known yet, making it an ideal Rogues Gallery resident.

Computational Science and Engineering senior research scientist Jason Riedy hopes to study the Chick’s performance, scalability, and programmability with massive graphs and multilinear modeling for data analysis. He won’t be the only one testing the Chick’s capabilities.

The Rogues Gallery is designed to be open to all researchers, who could access the machine remotely if needed. Students will also have the chance to work on some of the most cutting-edge machines, making them more competitive in the job market.

“The goal is to maintain Georgia Tech’s presence in novel and unusual architecture,” Riedy said.

CS research scientist Jeffrey Young and researchers at Georgia Tech Research Institute are working to set up a small cluster of Field Programmable Gate Array (FPGA) devices with 3-D stacked memory. These devices allow for programming custom, reconfigurable hardware, so that sparse linear solvers and graph analytic algorithms can be run at a faster rate. The team expects to have the Rogues Gallery open by late fall, and is exploring opportunities for other novel hardware related to embedded systems, neuromorphic, and quantum computing.

“When you have the hardware and expertise, people come to you,” Young said.

Innovative Memory Server Knocks Analytics Systems into Balance

As high performance computing has shifted toward efficient, data-intensive supercomputing in recent years, fundamental rethinks in architecture are coming to the fore. IBM, Intel, and others are striving to create more balanced systems, but in the midst of larger efforts sits something quite unique, and possibly key to the future of processing data-intensive HPC workloads very quickly and efficiently.

Read Full Article on The Next Platform