This release adds a new execution mode called GekkoFWD that allows GekkoFS to be used as a user-level I/O forwarding infrastructure. Many thanks to Jean Bez for this huge contribution. This release also brings support for Verbs and Omni-Path as communication protocols plus improved stability to the code base through the addition of a Python testing harness designed to assist developers with integration tests.
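As the Omni-Path entry in the list below notes, clients must run with FI_PSM2_DISCONNECT=1 in their environment. The following is a minimal sketch of one way a launcher script could arrange that. The helper name `psm2_client_env` and the client command in the usage note are hypothetical and not part of GekkoFS; only the variable name and value come from these notes.

```python
import os


def psm2_client_env(base_env=None):
    """Return a copy of an environment with FI_PSM2_DISCONNECT=1 set.

    Hypothetical helper for illustration only. `base_env` defaults to
    the current process environment and is never modified in place.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Setting taken from the release notes: required for client reconnects
    # when the ofi+psm2 protocol is used.
    env["FI_PSM2_DISCONNECT"] = "1"
    return env
```

A launcher could then pass the result to `subprocess.run(client_cmd, env=psm2_client_env())`, where `client_cmd` stands in for whatever command starts the client application.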
- Both client library and daemon have been extended to support the ofi+verbs protocol.
- A new Python testing harness has been implemented to support integration tests. The end goal is to increase the robustness of the code in the mid- to long-term.
- The RPC protocol and the usage of shared memory for intra-node communication no longer need to be activated at compile time. New arguments, including --auto-sm, have been added to the daemon to this effect. These configuration options are propagated to clients when they initialize and contact daemons.
- Native support for the Omni-Path network protocol by choosing the ofi+psm2 RPC protocol. Note that this requires a libfabric version greater than 1.8 as well as psm2 to be installed on the system. Clients must set FI_PSM2_DISCONNECT=1 to be able to reconnect after they have been shut down once. Known limitations: client reconnect doesn’t always work; apparently, if clients reconnect too quickly, the servers won’t accept the connections. Also, more than 16 clients per node are currently not supported.
- A new execution mode called GekkoFWD that allows GekkoFS to run as a user-level I/O forwarding infrastructure for applications. In this mode, I/O operations from an application are intercepted and forwarded to a single GekkoFS daemon that is chosen according to a pre-defined distribution. In the daemons, the requests are scheduled using the AGIOS scheduling library before they are dispatched to the shared backend parallel file system.
- The fsync() system call is now fully supported.
- Argobots tasks in the daemon are now wrapped in a dedicated class, effectively removing the direct dependency on Argobots. This lays the groundwork for future non-Argobots I/O implementations.
- The readdir() implementation has been refactored and improved.
- Improvements to how the installation scripts manage dependencies.
- The server sometimes crashed due to uncaught system errors in the storage backend. This has now been fixed.
- Fixed a bug that broke ls on some architectures.
- Fixed a bug that leaked internal errors from the interception library to
client applications via