New Approach to Fault Tolerance Means More Efficient High-Performance Computers
Department of Energy, Office of Science
High-performance computing (HPC) systems are incredibly complex, with millions of cores. That scale creates many chances for small system faults that can affect HPC-based simulations and calculations. Researchers have developed a new approach to fault tolerance, called coded computing, that requires less time and less computing power than traditional fault tolerance solutions.