Title:
Optimizing the performance of streaming numerical kernels on the IBM Blue Gene/P PowerPC 450 processor
Source:
The International journal of high performance computing applications. 27(2):193-209
Publisher Information:
Thousand Oaks, CA: Sage Publications, 2013.
Publication Year:
2013
Physical Description:
print, 1 p.1/4
Original Material:
INIST-CNRS
Subject Terms:
Computer science, Exact sciences and technology, Applied sciences, Computer science; control theory; systems, Software, Programming languages, Computer systems and distributed systems. User interface, Simulation, Storage access, Distributed computing, SIMD computer, Compilation, Energy savings, Efficiency, Partial differential equation, Code generation, High performance, Detail level, Optimization, Vector processor, Regularity, Computer simulation, Supercomputer, High level synthesis, Streaming, Central unit, Instruction set, Blue Gene/P, SIMD, high-performance computing, performance optimization
Document Type:
Academic journal Article
File Description:
text
Language:
English
Author Affiliations:
King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
Argonne National Laboratory, Argonne, IL, United States
IBM T.J. Watson Research Center, Yorktown Heights, NY, United States
ISSN:
1094-3420
Rights:
Copyright 2014 INIST-CNRS
CC BY 4.0
Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS
Notes:
Computer science; theoretical automation; systems
Accession Number:
edscal.27321688
Database:
PASCAL Archive

Further Information

Several emerging petascale architectures use energy-efficient processors with vectorized computational units and in-order thread processing. On these architectures the sustained performance of streaming numerical kernels, ubiquitous in the solution of partial differential equations, represents a challenge despite the regularity of memory access. Sophisticated optimization techniques are required to fully utilize the CPU. We propose a new method for constructing streaming numerical kernels using a high-level assembly synthesis and optimization framework. We describe an implementation of this method in Python targeting the IBM® Blue Gene®/P supercomputer's PowerPC® 450 core. This paper details the high-level design, construction, simulation, verification, and analysis of these kernels utilizing a subset of the CPU's instruction set. We demonstrate the effectiveness of our approach by implementing several three-dimensional stencil kernels over a variety of cached memory scenarios and analyzing the mechanically scheduled variants, including a 27-point stencil achieving a 1.7× speedup over the best previously published results.
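For readers unfamiliar with the term, the 27-point stencil mentioned in the abstract can be illustrated with a plain reference definition. The sketch below is an assumption-laden illustration only: the function name `stencil27`, the nested-list grid layout, and the dictionary of weights are hypothetical and are not taken from the paper, whose kernels are mechanically scheduled PowerPC 450 assembly rather than Python loops. It shows only what the operation computes: each interior point becomes a weighted sum of itself and its 26 nearest neighbors.

```python
from itertools import product


def stencil27(grid, weights):
    """Apply a 27-point stencil to the interior of a 3D grid of floats.

    grid is a nested list indexed as grid[i][j][k].
    weights maps each offset (di, dj, dk) in {-1, 0, 1}^3 to a coefficient.
    Boundary points are copied through unchanged (an illustrative choice).
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    offsets = list(product((-1, 0, 1), repeat=3))  # the 27 neighbor offsets
    # Deep copy so boundary values are preserved in the output.
    out = [[row[:] for row in plane] for plane in grid]
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            # The innermost loop walks the last index, so memory is read
            # in a regular, streaming order.
            for k in range(1, nz - 1):
                acc = 0.0
                for di, dj, dk in offsets:
                    acc += weights[(di, dj, dk)] * grid[i + di][j + dj][k + dk]
                out[i][j][k] = acc
    return out


# Usage: a uniform averaging stencil applied to a constant 4x4x4 grid.
weights = {off: 1.0 / 27.0 for off in product((-1, 0, 1), repeat=3)}
grid = [[[1.0 for _ in range(4)] for _ in range(4)] for _ in range(4)]
result = stencil27(grid, weights)
```

With equal weights that sum to one, a constant grid is left unchanged up to rounding, which makes a convenient sanity check. An optimized kernel of the kind the abstract describes would compute the same values while scheduling loads and SIMD arithmetic by hand or by tool; nothing in this sketch reflects that scheduling.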