Title:

Optimizing the performance of streaming numerical kernels on the IBM Blue Gene/P PowerPC 450 processor

Authors:

MALAS, Tareq, AHMADIA, Aron J, BROWN, Jed, GUNNELS, John A, KEYES, David E

Source:

The International journal of high performance computing applications. 27(2):193-209

Publisher Information:

Thousand Oaks, CA: Sage Publications, 2013.

Publication Year:

2013

Physical Description:

print, 1 p.1/4

Original Material:

INIST-CNRS

Subject Terms:

Computer science, Exact sciences and technology, Applied sciences, Computer science; control theory; systems, Software, Programming languages, Computer systems and distributed systems. User interface, Simulation, Storage access, Distributed computing, SIMD computer, Compilation, Energy savings, Efficiency, Partial differential equation, Code generation, High performance, Detail level, Optimization, Vector processor, Regularity, Computer simulation, Supercomputer, High level synthesis, Streaming, Central unit, Instruction set, Blue Gene/P, SIMD, code generation, high-performance computing, performance optimization

Document Type:

Academic journal Article

File Description:

text

Language:

English

Author Affiliations:

King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
Argonne National Laboratory, Argonne, IL, United States
IBM T.J. Watson Research Center, Yorktown Heights, NY, United States

ISSN:

1094-3420

Access URL:

http://pascal-francis.inist.fr/vibad/index.php?action=search&terms=27321688

Rights:

Copyright 2014 INIST-CNRS
CC BY 4.0
Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS.

Notes:

Computer science; theoretical automation; systems

Accession Number:

edscal.27321688

Database:

PASCAL Archive

Further Information

Several emerging petascale architectures use energy-efficient processors with vectorized computational units and in-order thread processing. On these architectures the sustained performance of streaming numerical kernels, ubiquitous in the solution of partial differential equations, represents a challenge despite the regularity of memory access. Sophisticated optimization techniques are required to fully utilize the CPU. We propose a new method for constructing streaming numerical kernels using a high-level assembly synthesis and optimization framework. We describe an implementation of this method in Python targeting the IBM® Blue Gene®/P supercomputer's PowerPC® 450 core. This paper details the high-level design, construction, simulation, verification, and analysis of these kernels utilizing a subset of the CPU's instruction set. We demonstrate the effectiveness of our approach by implementing several three-dimensional stencil kernels over a variety of cached memory scenarios and analyzing the mechanically scheduled variants, including a 27-point stencil achieving a 1.7× speedup over the best previously published results.
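
For readers unfamiliar with the term, the following is a minimal scalar sketch of what a 27-point three-dimensional stencil computes: each interior grid point is replaced by a weighted sum of itself and its 26 nearest neighbours. It is not the authors' method, which synthesizes scheduled PowerPC 450 assembly; the function name `stencil27`, the nested-list grid layout, the uniform averaging weights, and the choice to copy boundary points unchanged are all assumptions made for illustration.

```python
def stencil27(grid, weights):
    """Apply a 27-point stencil to the interior of a 3D grid of nested lists.

    weights[di + 1][dj + 1][dk + 1] is the coefficient for offset (di, dj, dk).
    Boundary points are copied unchanged (an assumption of this sketch).
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    # Copy the grid so that boundary values carry over to the output.
    out = [[list(row) for row in plane] for plane in grid]
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                acc = 0.0
                # Visit the 3 x 3 x 3 neighbourhood centred on (i, j, k).
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        for dk in (-1, 0, 1):
                            acc += (weights[di + 1][dj + 1][dk + 1]
                                    * grid[i + di][j + dj][k + dk])
                out[i][j][k] = acc
    return out


# Hypothetical usage: a uniform averaging stencil on a constant 4 x 4 x 4 grid.
n = 4
grid = [[[1.0] * n for _ in range(n)] for _ in range(n)]
weights = [[[1.0 / 27.0] * 3 for _ in range(3)] for _ in range(3)]
result = stencil27(grid, weights)
```

Each output point reads 27 inputs that are contiguous or regularly strided in memory, which is the regular access pattern the abstract refers to; a tuned kernel would reuse loaded values across neighbouring points rather than reloading them as this reference loop does.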
