Jul 8 2020
Artificial intelligence (AI) is indispensable today as a tool for cutting-edge research. Alongside the algorithms, specialized hardware is becoming an increasingly important factor for its successful application, whether in energy research or in the development of new materials. Karlsruhe Institute of Technology (KIT) is the first location in Europe to put InfiniBand-connected NVIDIA DGX A100 AI systems into operation. The systems were acquired with funds from the Helmholtz Artificial Intelligence Cooperation Unit (HAICU).
Whether in the development of autonomous robot systems, novel functional materials, the optimization of energy systems or the improvement of climate models, AI and machine learning (ML) are an important part of research at KIT.
To further promote the application of these future technologies, KIT is involved in HAICU, a research-oriented platform of the Helmholtz Association for Applied AI that promotes cross-field research projects. Similarities between applications are identified and developed, and the creation of new methods is promoted.
"AI requires one thing above all else - an extreme amount of computing power," says Martin Frank, Director at the Steinbuch Centre for Computing (SCC) and Professor at the Institute for Applied and Numerical Mathematics (IANM) of KIT. "Conventional computer systems reach their limits when training AI with large data sets. Many AI algorithms can be accelerated with specialized hardware. For our researchers, access to accelerated computing systems is a decisive competitive factor today."
The SCC at KIT has used the procurement of its upcoming supercomputer, the "Hochleistungsrechner Karlsruhe" ("HoreKa" for short), to form a collaboration with NVIDIA, the leader in accelerated computing, and to become the first location in Europe to deploy NVIDIA DGX A100 systems. The three DGX A100 systems now installed at SCC are high-performance servers containing eight NVIDIA A100 Tensor Core GPUs each. Each DGX A100 provides five petaflops of AI computing power, i.e. five quadrillion computing operations per second, about five times the performance of the earlier NVIDIA DGX system based on NVIDIA V100 GPUs. The new accelerators are also equipped with significantly larger and faster main memory, and NVSwitch provides the full NVLink bandwidth of 600 GB/s between any pair of GPUs. The DGX A100 systems are connected via the high-performance NVIDIA Mellanox HDR InfiniBand interconnect technology and leverage its in-network computing engines to deliver top performance.
"Researchers are now able to train significantly larger neural networks in a much shorter time with even larger amounts of data," said Frank.
AI helps solve humanity's problems
The new NVIDIA DGX A100 systems also allow researchers to optimize their applications for KIT's future HoreKa supercomputer. HoreKa will also use NVIDIA A100 accelerators, but 740 of them, compared with eight in each DGX system. Once the full system is operational in summer 2021, HoreKa is expected to be one of the ten fastest supercomputers in Europe.
"AI and machine learning can dramatically accelerate scientific computations in the most significant areas of research, where the world's problems are being solved," says Marc Hamilton, Vice President of Solutions Architecture and Engineering at NVIDIA. "Our new DGX A100 systems with Tensor Core GPUs and NVIDIA Mellanox HDR InfiniBand interconnects support this accelerated research and will speed up scientific discovery for a broad range of important research."
The new AI systems at KIT can also be used to help fight the current pandemic, for example by accelerating drug discovery research, detecting infection hotspots, predicting propagation patterns, or relieving medical personnel in the analysis of X-ray images. Corresponding AI research initiatives have already been started at KIT and in the Helmholtz Association.