The Frontier supercomputer gets to work
After debuting last year as ORNL’s fourth No. 1 supercomputer in a row and granting access to its first users in April, Frontier settled into regular operations this summer.
Frontier is the first machine capable of exascale computing — one quintillion calculations per second — and researchers have already made breakthroughs in materials research, climate modeling and astrophysics.
“Frontier marks a sea change in computing and will allow us to tackle the world’s most pressing scientific challenges,” interim ORNL Director Jeff Smith said. “It was made possible by an intense collaborative effort among national laboratories, academic institutions and private industry to provide the hardware, applications and integration necessary to drive scientific discovery at the exascale.”
Since the system’s debut last year, ORNL staff have increased Frontier’s performance by 92 petaflops — or 92 quadrillion calculations per second. If that added performance were a separate supercomputer, it would be the eighth fastest in the world.
Frontier features a theoretical peak performance of nearly 2 exaflops, or two quintillion calculations per second, making it 10 times more powerful than ORNL’s Summit system, which was the world’s fastest as recently as November 2019 and even now ranks fifth in the world. Frontier leverages ORNL’s extensive expertise in accelerated computing and will enable scientists to develop critically needed technologies for the country’s energy, economic and national security sectors, helping researchers tackle problems of national importance that were impossible to solve just five years ago.
In addition to breaking the exascale barrier with a full-precision algorithm, Frontier’s performance in mixed-precision computing — a much less demanding standard — clocked in at roughly 6.88 exaflops, or more than 6.8 quintillion calculations per second, as measured by the High-Performance Linpack-Accelerator Introspection, or HPL-AI, test. The HPL-AI test measures calculation speeds in the computing formats typically used by the machine learning methods that drive advances in artificial intelligence.
Detailed simulations that traditional HPC users rely on to model such phenomena as cancer cells, supernovas, the coronavirus or the atomic structure of elements require 64-bit precision, a computationally demanding standard of accuracy. Artificial intelligence and machine learning algorithms typically require much less precision — sometimes as little as 32-, 24- or 16-bit accuracy — and can take advantage of special hardware in the graphics processing units, or GPUs, relied on by machines like Frontier to reach their breathtaking speeds.
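As a rough illustration of that trade-off (a minimal sketch, not code from Frontier or the benchmarks themselves), the NumPy example below runs the same matrix multiplication at 64-, 32- and 16-bit precision and reports how much memory each format needs and how much accuracy it gives up:

```python
# Illustrative sketch: the same matrix product at three floating-point
# precisions. Lower precision needs less memory per matrix and loses
# accuracy, which is why mixed-precision workloads can run so much faster.
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((1000, 1000))      # random values in [0, 1)
reference = (a @ a)[0, 0]         # 64-bit result as the baseline

for dtype in (np.float64, np.float32, np.float16):
    m = a.astype(dtype)
    value = (m @ m)[0, 0]
    print(f"{np.dtype(dtype).name:>8}: {m.nbytes / 1e6:5.1f} MB per matrix, "
          f"error vs. 64-bit result = {abs(float(value) - reference):.2e}")
```

The HPL-AI benchmark exploits the same trade-off at vastly larger scale, performing the bulk of its arithmetic in low precision and then refining the solution back to 64-bit accuracy.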
“As the world’s most powerful AI machine, Frontier’s novel architecture is also ideally suited for delivering unprecedented machine learning and data science insights and automations that could vastly improve our understanding of critical processes, from drug delivery to nuclear fusion to the global climate,” said Doug Kothe, who until June was associate laboratory director of ORNL’s Computing and Computational Sciences Directorate and director of the Exascale Computing Project.
Science on Day One
“Frontier marks the start of the exascale era for scientific computing,” said OLCF Director of Science Bronson Messer. “The science that we’re running on Frontier is going to ignite an explosion of innovation — and of new questions we haven’t even thought of before.”
ORNL scientific partners such as General Electric Aviation and GE Power have been among the first to leverage the power of Frontier.
“GE Aerospace and Research is using exascale computing, including time on the Frontier supercomputer, to revolutionize the future of flight with sustainable hydrogen propulsion and hybrid electric technologies,” said David Kepczynski, chief information officer at GE Research. “In pursuit of a net-zero carbon future, exascale supercomputing systems will be indispensable tools for GE researchers and engineers working at the cutting edge.”
Frontier will also play a key role in climate science research because modeling climate phenomena is a challenging problem that brings together researchers from across the scientific disciplines. Staff members at ORNL will work with hospitals and the National Institutes of Health to draw connections between future climate data and human health data using the new supercomputer.
“With a computer like Frontier, we can take data out of climate models at very fine scales — at the county level — and connect it to health data,” said Kate Evans, director of ORNL’s Computational Sciences and Engineering Division. “For example, if you have hotter temperatures, you’ll have more humidity, you’ll have more ozone, and that affects the way we breathe.”
Frontier will enable scientists to dive deeper into the nuances of the global climate, guiding them to new insights about how people live, move, work and adapt in a changing climate.
The Frontier system will also enable scientists to analyze health data more efficiently using artificial intelligence and machine learning techniques, leading to new understandings of disease processes and insights into possible drug targets. Researchers are working to train computers to read and understand chunks of medical records where patient information is often hidden. The CANcer Distributed Learning Environment initiative, or CANDLE, a joint endeavor led by DOE and the National Cancer Institute, focuses on developing language processing techniques to identify connections in patients’ medical data.
“Exascale computing infrastructure, such as Frontier, will be the paradigm shift, because we can bring all of the data together and analyze it in a time-efficient way,” said Gina Tourassi, director of the National Center for Computational Sciences at ORNL. “The end goal is to develop a general CANDLE library so that anyone with a deep learning code and a dataset can train their model at scale on a big HPC system without significantly modifying their code. We’re also doing this in the context of DOE’s partnership with the National Cancer Institute, so specific deep learning models that we are developing are focused on cancer research and precision medicine challenges.”
In the same vein, Frontier will enable researchers to understand how seismic activity affects the structural integrity of buildings and infrastructure, and ultimately to predict earthquake damage at the building level.
Today, building-level earthquake risks are extrapolated from the limited data of past major seismic events. Small-scale models and historical observations are helpful, but they don’t tell the whole story of the impacts of a geological event as powerful and far-reaching as a major earthquake.
“Our goal is to advance the state of computational capabilities so we can model all the way from the fault rupture to the waves propagating through the earth to the waves interacting with the structure,” said Lawrence Berkeley National Laboratory senior scientist David McCallen, principal investigator for the EQSIM application.
“We’re doing things now that we only thought about doing a decade ago, like resolving high-frequency ground motions. It is really an exciting time for those of us who are working on simulating earthquakes.”
It takes a village
The work to deliver, install and test Frontier began in the midst of the COVID-19 pandemic, as shutdowns around the world strained international supply chains. More than 100 team members worked around the clock to source millions of components, ensure timely deliveries of system parts and carefully install and test 74 HPE Cray EX cabinets that include more than 9,400 AMD-powered nodes and 90 miles of interconnect cables.
“As the world’s first exascale supercomputer and the world’s largest system for AI, Frontier has already demonstrated its powerful innovation in driving scientific breakthroughs that were previously impossible, in aerodynamics, climate, medicine and nuclear physics,” said Antonio Neri, president and CEO of Hewlett Packard Enterprise. “We are proud of our continued collaboration on Frontier with Oak Ridge National Laboratory and look forward to future discoveries the system will unlock.”
Each of Frontier’s more than 9,400 nodes is equipped with a third-generation AMD EPYC CPU and four AMD Instinct MI250X GPUs. Combining traditional CPUs with GPUs to accelerate the performance of leadership-class scientific supercomputers exemplifies the hybrid computing paradigm pioneered by ORNL and its partners.
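A back-of-the-envelope tally, assuming only the round figures quoted in this article rather than exact system counts, conveys the scale of that hybrid design:

```python
# Rough scale of Frontier's hybrid node design, based on the article's
# "more than 9,400 nodes" figure; exact system counts differ slightly.
nodes = 9_400              # approximate node count
cpus_per_node = 1          # one 3rd-generation AMD EPYC CPU per node
gpus_per_node = 4          # four AMD Instinct MI250X GPUs per node

print(f"CPUs: more than {nodes * cpus_per_node:,}")              # more than 9,400
print(f"GPU accelerators: more than {nodes * gpus_per_node:,}")  # more than 37,600
```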
“At its heart, Frontier highlights the importance of long-term public–private partnerships and the important role high-performance computing plays in advancing scientific research and national security,” said Lisa Su, chair and CEO of AMD. “I am excited to see Frontier enable large-scale science research that was previously not possible, leading to new discoveries that will transform our daily lives.”
Frontier’s deployment adds to ORNL’s nearly 20-year tradition of supercomputing excellence alongside predecessors Jaguar, Titan and Summit — each the world’s fastest computer in its time.
“This project marks the culmination of more than three years of effort by hundreds of dedicated ORNL professionals and their counterparts at HPE and AMD and across the DOE community,” said Justin Whitt, who until June 2023 was program director of the OLCF. “Their hard work will enable scientists around the world to begin their explorations on Frontier. At the OLCF, we’re proud of our legacy of world-leading computing excellence.”
Frontier also ranks second on the Green500 list, which rates a supercomputer’s energy efficiency in terms of performance per watt. Frontier clocked in at 62.68 gigaflops per watt, or nearly 63 billion calculations per second for every watt of power.
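To put that efficiency figure in more familiar terms, a quick illustrative calculation converts performance per watt into performance per megawatt of electrical power:

```python
# Illustrative arithmetic for the Green500 figure quoted above: at
# 62.68 gigaflops per watt, each megawatt of power sustains roughly
# 62.7 petaflops of computation.
gflops_per_watt = 62.68
watts_per_megawatt = 1_000_000
gigaflops_per_petaflop = 1_000_000

petaflops_per_megawatt = gflops_per_watt * watts_per_megawatt / gigaflops_per_petaflop
print(f"~{petaflops_per_megawatt:.2f} petaflops per megawatt")
```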
The Frontier team has fine-tuned the system’s performance as part of the scheduled acceptance testing process. Research teams prepared their scientific codes for the system so that they could run during the early system access period, and users selected through DOE’s Innovative and Novel Computational Impact on Theory and Experiment, or INCITE, and Advanced Scientific Computing Research Leadership Computing Challenge, or ALCC, programs now have access to Frontier.
Frontier supports many of the same compilers, programming models and tools that have been available to OLCF users on ORNL’s Titan and Summit supercomputers. Summit has been a premier development platform for Frontier and now will continue operating in tandem with the new system.
“We worked really hard to be ready for science on Day One, and we were,” said Ashley Barker, head of the OLCF’s Operations Section. “We have more than 1,200 users running on Frontier, and we’re already seeing groundbreaking science achievements.”