New publication: “Adaptive request scheduling for the I/O forwarding layer using reinforcement learning”

The SSEC team is proud to announce that the following work has been accepted for publication:

J. Luca Bez, F. Zanon Boito, R. Nou, A. Miranda, T. Cortes, and P. O. A. Navaux, “Adaptive Request Scheduling for the I/O Forwarding Layer using Reinforcement Learning,” Future Generation Computer Systems, 2020.

The publication describes the results of an ongoing collaboration with the Institute of Informatics at the Federal University of Rio Grande do Sul (UFRGS) in Porto Alegre, Brazil, in which we explore using reinforcement learning techniques to optimize the I/O forwarding layer.

Abstract

In this paper, we propose an approach to adapt the I/O forwarding layer of HPC systems to applications’ access patterns. I/O optimization techniques can improve performance for the access patterns they were designed to target, but they often decrease performance for others. Furthermore, these techniques usually depend on precise tuning of their parameters, a task that commonly falls to the users. Instead, we propose to tune them dynamically at runtime, based on the I/O workload observed by the system. Our approach uses a reinforcement learning technique (contextual bandits) to make the system capable of learning the best parameter value for each observed access pattern during its execution. That eliminates the need for a complicated and time-consuming prior training phase. Our case study is the TWINS scheduling algorithm, where performance improvements depend on the time window parameter, which in turn depends on the workload. We evaluate our proposal and demonstrate that it can reach a precision of 88% in parameter selection within the first hundreds of observations of an access pattern, achieving 99% of the optimal performance. We demonstrate that the system, which is expected to live for years, will be able to adapt to changes and optimize its performance after having observed an access pattern for a few (not necessarily contiguous) minutes.
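
To give a rough feel for the contextual-bandit idea described in the abstract, the sketch below keeps one running reward estimate per (access pattern, time window) pair and picks windows with an epsilon-greedy rule. It is not the implementation from the paper: the epsilon-greedy policy, the pattern names, the candidate window values, and the synthetic bandwidth function are all assumptions made purely for illustration.

```python
import math
import random
from collections import defaultdict

# Hypothetical candidate time windows (microseconds) and, for each made-up
# access pattern, the window that the synthetic reward function favors.
WINDOWS_US = [125, 250, 500, 1000]
BEST_WINDOW_US = {"contiguous-large": 1000, "strided-small": 250}


class EpsilonGreedyBandit:
    """Tabular contextual bandit: one mean-reward estimate per (context, arm)."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, arm) -> observations
        self.means = defaultdict(float)   # (context, arm) -> mean reward

    def select(self, context):
        # Try every arm once per context before trusting the estimates.
        for arm in self.arms:
            if self.counts[(context, arm)] == 0:
                return arm
        # Explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return self.best(context)

    def update(self, context, arm, reward):
        key = (context, arm)
        self.counts[key] += 1
        # Incremental mean: no reward history needs to be stored.
        self.means[key] += (reward - self.means[key]) / self.counts[key]

    def best(self, context):
        return max(self.arms, key=lambda arm: self.means[(context, arm)])


def synthetic_bandwidth(pattern, window, rng):
    """Made-up reward: peaks at the pattern's favored window, plus noise."""
    distance = abs(math.log2(window / BEST_WINDOW_US[pattern]))
    return 100.0 - 20.0 * distance + rng.gauss(0.0, 5.0)


def run_simulation(steps=1000, seed=0):
    rng = random.Random(seed)
    bandit = EpsilonGreedyBandit(WINDOWS_US, epsilon=0.1, seed=seed)
    patterns = list(BEST_WINDOW_US)
    for _ in range(steps):
        pattern = rng.choice(patterns)      # observed access pattern (context)
        window = bandit.select(pattern)     # chosen time window (arm)
        reward = synthetic_bandwidth(pattern, window, rng)
        bandit.update(pattern, window, reward)
    return bandit


if __name__ == "__main__":
    learned = run_simulation()
    for pattern in BEST_WINDOW_US:
        print(pattern, "->", learned.best(pattern), "us")
```

Because learning happens online, from rewards observed while the system runs, a sketch like this needs no separate training phase. Observations of the same pattern also do not have to be contiguous, since each update only touches that pattern's own estimates.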
