The Department of Energy’s Oak Ridge National Laboratory has joined a global consortium of scientists from federal laboratories, research institutes, academia and industry to address the challenges of building large-scale artificial intelligence systems and advancing trustworthy and reliable AI for scientific discovery.
The partnership, known as the Trillion Parameter Consortium, or TPC, seeks to grow and improve large-scale generative AI models aimed at tackling complex scientific challenges. These include the development of scalable model architectures and related training strategies, as well as data organization and curation for the training of models; the optimization of AI libraries for current and future exascale computing platforms; and the assessment of progress on scientific task learning, reliability and trust.
It’s a logical partnership, as ORNL’s documented mission of developing safe, trustworthy and energy-efficient AI will strengthen the consortium’s goals for responsible AI. Further, the laboratory is home to more than 300 researchers who use AI to tackle challenges of importance to DOE, and it hosts the world’s most powerful supercomputer, Frontier, which was built in part to facilitate energy-efficient and scalable AI-based algorithms and simulations.
ORNL’s AI research thrusts, when deployed alongside these resources, will be critical in assisting the consortium in tackling a number of challenges, including:
- Building an open community of researchers interested in creating state-of-the-art, large-scale generative AI models aimed broadly at advancing progress on scientific and engineering problems by sharing methods, approaches, tools, insights and workflows.
- Incubating, launching and coordinating projects voluntarily to avoid duplication of effort and to maximize the impact of the projects in the broader AI and scientific community.
- Creating a global network of resources and expertise to facilitate the next generation of AI and bring together researchers interested in developing and using large-scale AI for science and engineering.
“An integrated and community approach focusing on security, trustworthiness and energy efficiency is crucial to leverage the full potential of AI for scientific discovery and national security,” said Prasanna Balaprakash, ORNL distinguished R&D staff scientist and director of the lab’s AI Initiative. “For this reason, ORNL expects to be a critical resource for the consortium going forward, and we look forward to ensuring the future of AI across the scientific spectrum.”
Specifically, TPC aims to provide the community with a venue in which multiple large model-building initiatives can collaborate to leverage global efforts, with flexibility to accommodate the diverse goals of individual initiatives. The consortium includes teams that are working to use emerging exascale computing platforms to train large language models (LLMs), or alternative model architectures, on scientific material including research papers, scientific codes and observational and experimental data, with the aim of advancing innovation and discovery. Trillion-parameter models represent the next great milestone in large-scale scientific AI, as only the largest commercial AI tools currently approach this scale.
Training LLMs with this many parameters requires exascale-class computing resources such as Frontier. Even with such resources, however, training a state-of-the-art trillion-parameter model will require months of dedicated time, which is intractable on all but the largest systems. Consequently, such efforts will involve large multidisciplinary, multi-institutional teams, and TPC is envisioned as a vehicle to support collaboration and cooperative efforts among and within such teams.
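For a rough sense of why the timescale is months, the estimate can be sketched with a common rule of thumb: training a dense transformer costs about 6 floating-point operations per parameter per training token. The token count and the sustained throughput below are illustrative assumptions, not figures from TPC or ORNL, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400.0


def training_days(params: float, tokens: float, sustained_flops_per_s: float) -> float:
    """Estimate wall-clock training time in days.

    Uses the rule-of-thumb cost of roughly 6 FLOPs per parameter per token
    for a dense transformer. This is an approximation, not a measurement.
    """
    total_flops = 6.0 * params * tokens
    return total_flops / sustained_flops_per_s / SECONDS_PER_DAY


# Illustrative inputs only (all assumed, none taken from the article):
params = 1e12      # one trillion parameters
tokens = 2e12      # assumed training-set size in tokens
sustained = 1e18   # assumed sustained throughput of 1 exaFLOP/s

print(f"Estimated training time: {training_days(params, tokens, sustained):.0f} days")
```

Under these assumptions the estimate comes to roughly four and a half months of dedicated time on a machine sustaining one exaFLOP per second. A system ten times slower would need close to four years, which illustrates why such training runs are impractical on all but the largest systems.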
ORNL’s AI research portfolio dates back more than four decades to 1979, when the laboratory launched the Oak Ridge Applied Artificial Intelligence Project. AAIP evaluated AI’s potential to advance scientific research, particularly across four key areas: spectroscopy, environmental management, nuclear fuel reprocessing and programming assistance.
Today the laboratory’s AI Initiative focuses on the development of secure, trustworthy and energy-efficient AI across a wide range of applications at the laboratory, from biology to chemistry to national security.
Other TPC partners include Allen Institute for AI; Argonne National Laboratory; Barcelona Supercomputing Center; Brookhaven National Laboratory; Caltech; CEA; Cerebras; Cineca; CSC - IT Center for Science; Commonwealth Scientific and Industrial Research Organisation; ETH Zürich; Flinders University; Fujitsu; Intel; Juelich; Kotoba Technology; Lawrence Berkeley National Laboratory; Los Alamos National Laboratory; Microsoft; National Center for Supercomputing Applications; National Renewable Energy Laboratory; NCI Australia; New Zealand eScience Infrastructure; NVIDIA; Pacific Northwest National Laboratory; Pawsey Institute; Princeton Plasma Physics Laboratory; Rutgers University; SambaNova; SLAC National Accelerator Laboratory; Stanford University; STFC Rutherford Appleton Laboratory, UKRI; Texas Advanced Computing Center; Thomas Jefferson National Accelerator Facility; Together AI; Tokyo Institute of Technology; Université de Montréal; the University of Chicago; the University of Delaware; the University of Illinois Chicago; the University of Illinois Urbana-Champaign; the University of Tokyo; and the University of Utah.
UT-Battelle manages ORNL for the Department of Energy’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. The Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit energy.gov/science.