This release adds a new execution mode called GekkoFWD that allows GekkoFS to be used as a user-level I/O forwarding infrastructure. Many thanks to Jean Bez for this huge contribution. This release also brings support for Verbs and Omni-Path as communication protocols, and improves the stability of the code base by adding a Python testing harness designed to assist developers with integration tests.
New
- Both the client library and the daemon have been extended to support the `ofi+verbs` protocol.
- A new Python testing harness has been implemented to support integration tests. The end goal is to increase the robustness of the code in the mid- to long-term.
- The RPC protocol and the usage of shared memory for intra-node communication no longer need to be activated at compile time. New arguments `-P|--rpc-protocol` and `--auto-sm` have been added to the daemon to this effect. These configuration options are propagated to clients when they initialize and contact daemons.
- Native support for the Omni-Path network protocol by choosing the `ofi+psm2` RPC protocol. Note that this requires `libfabric`’s version to be greater than `1.8` as well as `psm2` to be installed in the system. Clients must set `FI_PSM2_DISCONNECT=1` so that a client can reconnect after it has been shut down once. Known limitations: Client reconnect doesn’t always work. Apparently, if clients reconnect too fast the servers won’t accept the connections. Also, currently more than 16 clients per node are not supported.
- A new execution mode called `GekkoFWD` that allows GekkoFS to run as a user-level I/O forwarding infrastructure for applications. In this mode, I/O operations from an application are intercepted and forwarded to a single GekkoFS daemon that is chosen according to a pre-defined distribution. In the daemons, the requests are scheduled using the AGIOS scheduling library before they are dispatched to the shared backend parallel file system.
- The `fsync()` system call is now fully supported.
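
As a rough illustration of the runtime protocol selection described above, the following sketch assembles the protocol-related daemon arguments and the client environment for the Omni-Path case. It rests on several assumptions that are not confirmed by these notes: the daemon binary name `gkfs_daemon` is only a placeholder, `--auto-sm` is treated as a boolean switch without a value, and all other arguments a daemon may require are omitted. The helper names `daemon_args` and `client_env` are hypothetical.

```python
import os


def daemon_args(rpc_protocol, auto_sm=False):
    """Build the protocol-related daemon arguments named in the notes above.

    Assumption: --auto-sm is a plain switch that takes no value.
    """
    args = ["--rpc-protocol", rpc_protocol]
    if auto_sm:
        args.append("--auto-sm")
    return args


def client_env(rpc_protocol, base_env=None):
    """Return a copy of the environment to use when starting a client.

    FI_PSM2_DISCONNECT=1 is added only for the ofi+psm2 protocol, since the
    notes mention it as a requirement for Omni-Path clients.
    """
    env = dict(os.environ if base_env is None else base_env)
    if rpc_protocol == "ofi+psm2":
        env["FI_PSM2_DISCONNECT"] = "1"
    return env


# Placeholder binary name (assumption); other daemon arguments are omitted.
DAEMON_BINARY = "gkfs_daemon"

cmd = [DAEMON_BINARY] + daemon_args("ofi+psm2", auto_sm=True)
print(" ".join(cmd))
```

The resulting list and environment could be passed to `subprocess.Popen` by a launch script. Because the options are propagated to clients when they contact the daemons, the sketch sets nothing protocol-related on the client side apart from the PSM2 variable.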
Improved
- Argobots tasks in the daemon are now wrapped in a dedicated class, effectively removing the dependency. This lays the groundwork for future non-Argobots I/O implementations.
- The `readdir()` implementation has been refactored and improved.
- Improved how the installation scripts manage dependencies.
Fixed
- The server sometimes crashed due to uncaught system errors in the storage backend. This has now been fixed.
- Fixed a bug that broke `ls` on some architectures.
- Fixed a bug that leaked internal errors from the interception library to client applications via `errno` propagation.
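
The `errno` fix above concerns what a client application observes after a failed call. A check in that spirit could look like the sketch below. It is not taken from the GekkoFS test harness: the function name `errno_for_missing_path` is hypothetical, and a temporary local directory stands in for a mount point. It relies only on the POSIX behaviour that `stat()` on a missing entry fails with `ENOENT`.

```python
import errno
import os
import tempfile


def errno_for_missing_path(directory):
    """Return the errno a client sees when stat() hits a missing entry."""
    missing = os.path.join(directory, "does-not-exist")
    try:
        os.stat(missing)
    except OSError as exc:
        return exc.errno
    return None


# A temporary local directory stands in for a file system mount point here.
with tempfile.TemporaryDirectory() as mount_dir:
    code = errno_for_missing_path(mount_dir)
    print(errno.errorcode.get(code))
```

Pointing the function at a directory inside an actual mount would show whether the application sees the error of the failed operation itself rather than some unrelated internal error code.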