Enabling dynamic and intelligent workflows in the future EuroHPC ecosystem – eFlows4HPC
A workflow platform for HPC with big data analytics demonstrated in manufacturing, climate computing, and urgent computing for natural hazards.
STAKES AND ISSUES, STATE OF THE ART
The methodologies currently available for developing workflows:<br />a) do not meet the demands of increasingly complex applications, which require new methodologies supporting HPC simulation or modelling, data analysis and machine learning in a holistic workflow;<br />b) are unable to exploit the complexity of the underlying infrastructure, which is distributed by nature. The infrastructure to be considered consists of a large number of nodes with new types of multi-core processors (including accelerators such as GPUs), new storage devices that have the potential to change the way applications store data, and connections to external instruments, peripheral devices and cloud storage as data sources;<br />c) do not include techniques for dynamically adapting the execution of workflows on computing platforms;<br />d) do not provide a means of easily deploying, executing and reusing workflows on HPC systems;<br />e) do not provide a fully integrated approach to managing HPC and big data requirements through data-driven frameworks that are easy to extend with additional functionality for both HPC tasks and data analysis techniques;<br />f) neglect or do not fully meet the challenge of making the data required for processing available in the expected time, format and quality.
The eFlows4HPC project has provided a workflow platform, the eFlows4HPC software stack, and a set of services for integrating HPC simulation with big data analysis and machine learning for scientific and industrial applications. The platform also includes methodologies to broaden access to HPC by different communities through the HPC Workflows as a Service (HPCWaaS) concept. Three application pillars serve as demonstrators: manufacturing industry, climate and urgent computing for natural hazards. The main objective has been broken down into specific objectives classified as:
• Scientific and technological objectives (STO), focused on a software stack of workflows and services for use by HPC centers in Europe.
• Pillar-specific scientific objectives (PSO), focused on the provision of application workflows and workflow models that can be exploited by the stakeholders involved in the project and by the corresponding communities.
• Societal and industrial objectives (SIO), focused on the evaluation and pre-commercial validation of project solutions and the exploitation of results.
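The kind of holistic workflow targeted by the platform couples parallel simulation with runtime data analysis that steers the execution. The eFlows4HPC stack itself builds on task-based programming models; the plain-Python sketch below only illustrates the pattern of a dynamic ensemble workflow (simulate, analyze at runtime, prune), and every function name in it is illustrative rather than part of the project's API.

```python
# Illustrative sketch (NOT the eFlows4HPC API): a dynamic ensemble
# workflow that runs members in parallel, analyzes results at runtime,
# and prunes the least promising members before the next step.
from concurrent.futures import ThreadPoolExecutor

def simulate(member: int, step: int) -> float:
    """Toy stand-in for an HPC simulation; returns a member score."""
    return (member * 37 + step * 11) % 100 / 100.0

def prune(scores: dict, keep: int) -> list:
    """Runtime-analytics step: keep only the `keep` best members."""
    return sorted(scores, key=scores.get, reverse=True)[:keep]

def run_workflow(n_members: int, n_steps: int, keep: int) -> list:
    """Iterate simulate -> analyze -> prune over the ensemble."""
    members = list(range(n_members))
    for step in range(n_steps):
        # In a real stack these tasks would be scheduled on HPC nodes.
        with ThreadPoolExecutor() as pool:
            scores = dict(zip(members,
                              pool.map(simulate, members,
                                       [step] * len(members))))
        members = prune(scores, keep)
    return members
```

For example, `run_workflow(8, 3, 4)` starts with eight ensemble members and returns the four retained after three analyze-and-prune steps.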
The project began by defining the requirements for the application pillars and software stack components, and a first version of the software architecture was designed on this basis. The partners defined and implemented abstractions to integrate the various stack components. An important step was the design and development of a minimal workflow based on a simple case, yet covering most of the functionalities of the workflow lifecycle. In addition, the project partners designed and developed the HPCWaaS service. Other activities focused on optimizing various aspects of the software stack, such as a machine-learning-based tool for finding the optimal block size when distributing data in parallel, or strategies for moving from distributed to centralized storage. The project designed and implemented a data catalog service, now operational with links to some of the project's data. Computing and artificial intelligence kernels likely to constitute bottlenecks in applications have been identified, and some have been optimized for GPUs, FPGAs and the EPI (European Processor Initiative).
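The block-size question mentioned above can be made concrete: when n data elements are partitioned into blocks of size b, the block size trades off the number of parallel tasks against per-task overhead. The deliberately naive cost model below is a hand-written stand-in for the project's machine-learning-based tool, which learns such a model from measurements; all names and the cost formula are illustrative.

```python
# Illustrative sketch: choosing a block size for parallel data
# distribution with a toy cost model (a stand-in for the project's
# learned performance model, not the real tool).
import math

def n_blocks(n: int, b: int) -> int:
    """Number of blocks when n elements are split into blocks of size b."""
    return math.ceil(n / b)

def naive_cost(n: int, b: int, workers: int, overhead: float = 1.0) -> float:
    """Toy cost: per-task overhead plus time of serialized task 'waves'."""
    tasks = n_blocks(n, b)
    waves = math.ceil(tasks / workers)   # rounds of tasks across workers
    return tasks * overhead + waves * b

def best_block_size(n: int, workers: int, candidates: list) -> int:
    """Pick the candidate block size minimizing the toy cost model."""
    return min(candidates, key=lambda b: naive_cost(n, b, workers))
```

With n = 1000 and 4 workers, very small blocks pay too much overhead and very large blocks leave workers idle, so an intermediate candidate wins; a learned model replaces `naive_cost` with predictions fitted to real executions.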
Main results of the project
Building on the eFlows4HPC software stack, Pillar I has developed a first “complete” version of the manufacturing workflow, running end-to-end a full reduced-order model of the cooling system of a Siemens electrical engine.
Pillar II has developed two workflows: the Dynamic (AI-assisted) Earth System Model, which prunes ensemble runs based on runtime analytics, and the Statistical Analysis and Feature Extraction workflow, aimed at predicting tropical cyclones.
Pillar III has developed a workflow for earthquakes (UCIS4EQ) and one for subsequent tsunamis (FTRT/PTF). Both have a similar structure: after an event, they generate ensemble simulations to predict the impact of the natural hazard, followed by phases that post-process and analyze the data.
The project has delivered a first version of the software stack and HPCWaaS methodology in public repositories with online documentation. The consortium has also run a series of internal training sessions on the software stack components and on HPCWaaS. eFlows4HPC has achieved good visibility through publications, conference keynotes and invited presentations, and media presence.
Extensive work was carried out to draw up an exhaustive list of eFlows4HPC project results, of which 47 were identified. For each result, the owning partners, maturity, degree of innovation, and exploitation and sustainability plans were specified. Then, 14 Key Exploitable Results (KERs) were selected from this list, based on the degree of innovation, exploitability and impact of each result, and validated with all eFlows4HPC project partners. These KERs have been published on the Horizon Results Platform. A description of these KERs, with links to the results themselves, is available on the project website (eflows4hpc.eu/key-exploitable-results/).
eFlows4HPC has led to 30 peer-reviewed articles in journals of the geophysics, computational physics and mechanics, parallel computing, software engineering, big data and artificial intelligence communities. 14 Key Exploitable Results have been uploaded to the online Horizon Results Platform. These include end-to-end workflows for specific applications, recommendations for the deployment of computing scenarios on complex systems, and applications for the optimization of workflows’ execution.
Today, developers lack tools that enable the development of complex workflows involving HPC simulation and modelling
with data analytics (DA) and machine learning (ML). eFlows4HPC aims to deliver a workflow software stack and an
additional set of services to enable the integration of HPC simulation and modelling with big data analytics and machine
learning in scientific and industrial applications. The software stack will make it possible to develop innovative adaptive
workflows that use computing resources efficiently while also taking advantage of innovative storage solutions.
To widen access to HPC for newcomers, the project will provide HPC Workflows as a Service (HPCWaaS), an
environment for sharing, reusing, deploying and executing existing workflows on HPC systems. The workflow technologies
and associated machine learning and big data libraries used in the project leverage previous open-source European initiatives.
Specific optimization tasks for the use of accelerators (FPGAs, GPUs) and the EPI will be performed in the project use
cases.
To demonstrate the workflow software stack, use cases from three thematic pillars have been selected. Pillar I focuses on
the construction of Digital Twins for the prototyping of complex manufactured objects, integrating state-of-the-art adaptive
solvers with machine learning and data-mining, contributing to the Industry 4.0 vision. Pillar II develops innovative adaptive
workflows for climate and for the study of Tropical Cyclones (TC) in the context of the CMIP6 experiment, including in-situ
analytics. Pillar III explores the modelling of natural catastrophes - in particular, earthquakes and their associated tsunamis -
shortly after such an event is recorded. Leveraging two existing workflows, the Pillar will work on integrating them with the
eFlows4HPC software stack and on producing policies for urgent access to supercomputers. The pillar results will be
demonstrated in the target community CoEs to foster adoption and get feedback.
Project coordination
Mario RICCHIUTO (Centre de Recherche Inria Bordeaux - Sud-Ouest)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines all responsibility for its content.
Partnership
Inria BSO Centre de Recherche Inria Bordeaux - Sud-Ouest
ANR funding: 151,125 euros
Beginning and duration of the scientific project:
December 2020
- 36 months