CE46 - Numerical models, simulation, applications

SOLvers for Heterogeneous Architectures over Runtime systems, Investigating Scalability – SOLHARIS

Scalability of linear algebra algorithms on large scale heterogeneous architectures

The advent of multicore technology in the early '00s brought a sharp rupture with the past for the scientific computing community. The researchers needed to revise methods and algorithms in order to take full advantage of the increasing levels of parallelism. Ever since, the number of cores per processor has continued to grow steadily. In the meantime accelerators and GPUs have enjoyed an increasing success because of their massive computing power. More recently, CPUs and GPUs started to converge. On the one side, GPUs have become easier to use and on a wider range of workloads. On the other side, CPUs have become increasingly similar to accelerators thanks to the intensive use of Thread Level Parallelism, Data Parallelism (with SIMD vectorization) and fast, stacked memories. Nonetheless, GPUs are still far from being general-purpose, and CPUs have not yet reached the same computational power as accelerators. As a consequence, supercomputing nodes now commonly include multiple multicore CPUs and multiple accelerators. Such nodes are assembled in huge numbers to achieve extreme performance in a scalable way. The large scale and heterogeneity of these architectures, equipped with processing units of different speed and capabilities, memories with different speed and capacities and interconnects with different bandwidths and latencies, bring numerous challenges to the scientific computing community, from the choice of parallel programming models to the need for new or redesigned methods and algorithms to better comply with and take advantage of such systems. SOLHARIS aims to address these challenges.

The methodology of the SOLHARIS project consists of three components.

First, it aims at producing scalable methods and algorithms for the
solution of large sparse linear systems on parallel, large-scale,
heterogeneous supercomputers. These will rely on task-based
parallelism and will take advantage of the performance and portability
of modern runtime systems. These methods will be implemented within
existing runtime-based solvers, namely PaStiX and qr_mumps, evaluated
on real-life problems provided by the project industrial partners, and
released to the community under a free license. The target for the
developed methods and tools is the solution of linear systems with
hundreds of millions of unknowns on thousands of nodes.
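
The task-based paradigm that these solvers rely on can be illustrated with a minimal sketch (hypothetical names, plain Python threads standing in for the workers a runtime system like StarPU would manage): each task declares which earlier tasks it depends on, and the runtime runs it only once those predecessors have completed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class TinyRuntime:
    """Toy task-based runtime: a task runs once all its dependencies finish."""
    def __init__(self, workers=4):
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.futures = {}

    def submit(self, name, fn, deps=()):
        def wrapper():
            for d in deps:
                self.futures[d].result()   # block until predecessor is done
            return fn()
        self.futures[name] = self.pool.submit(wrapper)

    def wait_all(self):
        for f in self.futures.values():
            f.result()
        self.pool.shutdown()

# Tiny DAG in the spirit of a tiled factorization: factor a diagonal
# block first, then update two off-diagonal blocks in parallel.
log, lock = [], threading.Lock()
def task(msg):
    with lock:
        log.append(msg)

rt = TinyRuntime()
rt.submit("potrf", lambda: task("factor A00"))
rt.submit("trsm1", lambda: task("solve A10"), deps=["potrf"])
rt.submit("trsm2", lambda: task("solve A20"), deps=["potrf"])
rt.wait_all()
```

Real runtime systems infer such dependencies automatically from the data each task reads and writes; the explicit `deps` list here only makes the ordering visible.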

Second, it aims at improving and extending a modern runtime system,
namely StarPU, with programming model features and execution
mechanisms that address the needs raised by the implementation and
scaling of complex algorithms, such as sparse direct solvers, on large
scale heterogeneous systems. These improvements concern not only the
programming interface but also the scalability of the runtime itself,
which is critical for efficiently handling large and complex workloads
over large supercomputers.

Third, it will develop scheduling methods that aim at achieving high
performance and scalability of both runtime systems and sparse direct
solvers on large scale heterogeneous supercomputers. These will be
designed with the objective of making the best possible use of the
heterogeneous resources available on the target platforms in order to
optimize not only the execution time but also the memory consumption.
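
One standard way to exploit heterogeneous resources, sketched here under simplified assumptions (independent tasks, fixed relative speeds, hypothetical numbers), is a greedy earliest-finish-time mapping in the spirit of heuristics such as HEFT: each task is placed on the resource where it would complete soonest.

```python
def schedule_eft(tasks, speeds):
    """Greedy earliest-finish-time mapping of independent tasks onto
    heterogeneous resources with different speeds (work units/second)."""
    ready = {name: 0.0 for name in speeds}   # when each resource frees up
    placement = {}
    for name, work in sorted(tasks.items(), key=lambda t: -t[1]):  # largest first
        # Pick the resource on which this task would finish earliest.
        best = min(speeds, key=lambda r: ready[r] + work / speeds[r])
        ready[best] += work / speeds[best]
        placement[name] = best
    return placement, max(ready.values())    # mapping and makespan

tasks = {"t1": 8.0, "t2": 4.0, "t3": 4.0, "t4": 2.0}
speeds = {"cpu": 1.0, "gpu": 4.0}   # the GPU is 4x faster on these kernels
placement, makespan = schedule_eft(tasks, speeds)
```

The project's actual scheduling problem is much harder: tasks have precedence constraints, data transfers cost time, and memory consumption must be bounded, which is why static heuristics like this one are only a starting point.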

For linear algebra:
- analysis of task-based parallel programming models and of the mapping, reduction, collective communications and hierarchical tasks features for the implementation of scalable linear algebra algorithms
- development within the Chameleon solver of a hybrid linear solver approach where the system matrix is partitioned into blocks of homogeneous size, where each block can be represented in the H-matrix format
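
The core operation behind such block low-rank and H-matrix representations can be sketched as follows (a minimal illustration, not Chameleon's actual code): a dense block is replaced by a truncated SVD factorization whenever its numerical rank at a given tolerance is small, as happens for smooth kernels evaluated on well-separated clusters.

```python
import numpy as np

def compress_block(block, tol=1e-8):
    """Replace a dense block by a rank-k factorization U @ V when the
    truncated SVD meets the tolerance: the basic compression step
    behind H-matrix / block low-rank formats."""
    U, s, Vt = np.linalg.svd(block, full_matrices=False)
    k = int(np.sum(s > tol * s[0]))       # numerical rank at tolerance tol
    return U[:, :k] * s[:k], Vt[:k, :]    # factors of shape (m,k) and (k,n)

# A smooth kernel on two well-separated point clusters is low-rank.
x = np.linspace(0.0, 1.0, 50)
y = np.linspace(10.0, 11.0, 50)
block = 1.0 / np.abs(x[:, None] - y[None, :])    # 50x50 dense block
Uf, Vf = compress_block(block, tol=1e-10)
err = np.linalg.norm(block - Uf @ Vf) / np.linalg.norm(block)
```

Storing the two thin factors instead of the dense block reduces both memory and the cost of subsequent matrix operations, which is precisely what makes hybrid dense/low-rank solvers attractive at scale.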

For the runtime systems:
- design and development of opportunistic optimization methods to implement collective communications within runtimes
- introduction of hierarchical tasks and their handling
- development within the StarPU runtime of scheduling methods for tasks with constraints
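
The idea behind hierarchical tasks can be sketched conceptually (hypothetical names, plain Python futures, not StarPU's actual API): a coarse parent task, when it executes, submits its own finer-grained subtasks, so the top-level graph stays small and the partitioning decision is deferred to run time.

```python
from concurrent.futures import ThreadPoolExecutor

def run_hierarchical(pool, results):
    """A parent task that, when executed, submits its own subtasks."""
    def child(i):
        results.append(f"child {i}")
    def parent():
        results.append("parent: partitioning")
        # The subtasks only come into existence when the parent runs.
        futs = [pool.submit(child, i) for i in range(3)]
        for f in futs:
            f.result()              # the parent completes after its children
        results.append("parent: done")
    pool.submit(parent).result()

results = []
with ThreadPoolExecutor(max_workers=4) as pool:
    run_hierarchical(pool, results)
```

Keeping the top-level graph coarse is what makes this feature relevant to runtime scalability: the scheduler never has to hold the fully unfolded fine-grained graph in memory at once.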

For the scheduling:
- analysis and development of scheduling methods with memory constraints
- methods for the placement and scheduling of linear algebra algorithms relying on low-rank approximation techniques
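
Why scheduling interacts with memory can be seen on a toy sequential example (hypothetical numbers): each task allocates some memory on start and frees some on completion, so the peak footprint depends on the execution order even when the total work is fixed.

```python
from itertools import permutations

def peak_memory(order, alloc, free):
    """Peak memory of a sequential schedule in which each task first
    allocates alloc[t] units and frees free[t] units when it ends."""
    current = peak = 0
    for t in order:
        current += alloc[t]
        peak = max(peak, current)
        current -= free[t]
    return peak

# Toy workload: identical total work, different peaks per ordering.
alloc = {"a": 8, "b": 2, "c": 2}
free  = {"a": 8, "b": 0, "c": 0}
best = min(permutations(alloc), key=lambda o: peak_memory(o, alloc, free))
```

Running the large temporary task "a" first (peak 8) beats interleaving it with the accumulating tasks (peak up to 12); for real task graphs with precedence constraints the problem is NP-hard, hence the need for dedicated heuristics.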

The algorithms, parallel programming approaches, runtime features and scheduling methods developed in these early stages of the project will be optimized and extended, or applied to more complex sparse linear algebra algorithms for distributed-memory parallel computers.

2 papers in international journals
7 papers in international conference proceedings
1 paper at a national conference without proceedings
2 technical reports

Although
these three components, the solvers, the runtime and the scheduler, could be developed
independently, the strength of the SOLHARIS project and the key to its success lie in the coordination
and interplay of these three axes of research. The outcome of the fundamental research activity, as
well as of the technical work on the efficient and scalable implementation of parallel algorithms,
will be of great interest and use not only for the scientific computing community, but also for
researchers in other fields, such as data analysts, who have become heavy users of linear algebra
methods and software and of large computing facilities.

Project coordinator

Mr. Alfredo Buttari (Institut de Recherche en Informatique de Toulouse)

The author of this summary is the project coordinator, who is responsible for the content of this summary. The ANR declines any responsibility for its contents.

Partners

IRIT Institut de Recherche en Informatique de Toulouse
INRIA Bordeaux Sud-Ouest Centre de Recherche Inria Bordeaux - Sud-Ouest
Airbus Central R&T
CEA Commissariat à l’énergie atomique et aux énergies alternatives

ANR grant: 653,038 euros
Beginning and duration of the scientific project: September 2019 - 48 Months
