hpc issues
https://storage.bsc.es/gitlab/groups/hpc/-/issues
2024-03-10T19:31:53+01:00

RPCs write/read: host size no longer necessary
https://storage.bsc.es/gitlab/hpc/gekkofs/-/issues/284
2024-03-10T19:31:53+01:00, Marc Vef

Since we moved to using a bitset instead of calculating the chunks manually, we can remove the host_size of the corresponding RPC functions.

Milestone: v0.9.3

Fix CMake warnings regarding compatibility with older versions
https://storage.bsc.es/gitlab/hpc/gekkofs/-/issues/282
2024-03-08T19:05:34+01:00, Marc Vef

When running CMake, there are warnings for our external libraries which should be fixed:
```
CMake Deprecation Warning at external/fmt/CMakeLists.txt:1 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
```

Milestone: v0.9.3

Refactor path resolution
https://storage.bsc.es/gitlab/hpc/gekkofs/-/issues/281
2024-03-05T13:44:20+01:00, Marc Vef

Currently, path resolution is using `lstat()` to ensure that each path component exists. This can add unnecessary overhead, especially when the mount point is located under a long path within a parallel file system like Lustre.
The requirements for path checking:
1. Detect paths within GekkoFS and pass them (without prefix) to GekkoFS
2. No system calls for checking of path component existence.
3. When a path is not within GekkoFS, the *unmodified* path is passed to the kernel, unless a GekkoFS path is part of it, in which case that part must be removed first.
Therefore, the following features/changes are necessary:
- A prefix is defined as the absolute path to the GekkoFS mountpoint. E.g., for `/tmp/gkfs_mount/foofile` the prefix is `/tmp/gkfs_mount`.
- No system call like `lstat()` should be called.
- Path checking is done by prefix matching. Therefore, a non-absolute path first needs to be resolved:
  - If the middle of the path uses `..`, the path needs to be resolved for prefix checking (similar to now).
  - Relative paths (based on the current working directory) also need to be resolved for prefix checking.
- If a path is not within GekkoFS, the path should be passed to the kernel unmodified.
  - When part of the GekkoFS namespace is in the middle of a path and then undone via `..`, the path needs to be resolved before passing it to the kernel, as the kernel is unaware of GekkoFS.
  - This also applies to relative paths.
- If a path is within GekkoFS, the prefix is cut from the path and passed to GekkoFS.
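
The requirements above can be sketched without any system call by resolving `.` and `..` purely lexically and then matching path components against the prefix. The following is a minimal sketch under stated assumptions, not GekkoFS code: `Resolved`, `components`, `starts_with` and `resolve` are hypothetical names, the mount prefix and the working directory are assumed to be absolute and free of `..`, and symlinks are ignored because nothing is looked up on disk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Result of the lexical check (hypothetical type, for illustration only).
struct Resolved {
    bool in_gkfs;     // true: hand `path` to GekkoFS; false: hand it to the kernel
    std::string path; // prefix-stripped path, resolved path, or the raw input
};

// Splits an absolute path into its components, dropping the root and
// any empty component produced by a trailing slash.
static std::vector<std::string> components(const fs::path& p) {
    std::vector<std::string> out;
    for (const auto& c : p.relative_path()) {
        if (!c.empty()) out.push_back(c.string());
    }
    return out;
}

// Component-wise prefix test, so `/tmp/gkfs_mount2` does not match `/tmp/gkfs_mount`.
static bool starts_with(const std::vector<std::string>& v,
                        const std::vector<std::string>& prefix) {
    return v.size() >= prefix.size() &&
           std::equal(prefix.begin(), prefix.end(), v.begin());
}

Resolved resolve(const fs::path& mount_prefix, const fs::path& cwd,
                 const std::string& raw) {
    assert(mount_prefix.is_absolute() && cwd.is_absolute());
    const auto prefix = components(mount_prefix);
    const fs::path p(raw);
    const fs::path abs = p.is_absolute() ? p : cwd / p;

    std::vector<std::string> stack; // lexically resolved components so far
    bool touched = false;           // did any intermediate step lie inside the mount point?
    for (const auto& c : components(abs)) {
        if (c == ".") continue;
        if (c == "..") {
            if (!stack.empty()) stack.pop_back();
        } else {
            stack.push_back(c);
        }
        touched = touched || starts_with(stack, prefix);
    }

    // Joins components back into an absolute path string.
    auto join = [](auto first, auto last) {
        std::string s;
        for (; first != last; ++first) {
            s += '/';
            s += *first;
        }
        return s.empty() ? std::string("/") : s;
    };

    if (starts_with(stack, prefix)) {
        // Inside GekkoFS: cut the prefix.
        const auto skip = static_cast<std::ptrdiff_t>(prefix.size());
        return {true, join(stack.begin() + skip, stack.end())};
    }
    if (touched) {
        // Passed through the GekkoFS namespace and left it via `..`:
        // the kernel needs the resolved path.
        return {false, join(stack.begin(), stack.end())};
    }
    return {false, raw}; // never touched GekkoFS: pass through unmodified
}
```

With the prefix `/tmp/gkfs_mount`, this sketch maps `/tmp/gkfs_mount/foofile` to the GekkoFS path `/foofile`, leaves `/etc/hosts` untouched, and turns `/tmp/gkfs_mount/../other` into `/tmp/other` before it would reach the kernel.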
Some of these improvements are included, but unfinished, in [this branch](https://storage.bsc.es/gitlab/hpc/gekkofs/-/tree/marc/100-client-fails-when-mountdir-does-not-exist-on-underlying-fs).

Milestone: v0.9.3
Assignee: Julius Athenstaedt

Hooks narrow cast types
https://storage.bsc.es/gitlab/hpc/gekkofs/-/issues/280
2024-02-19T09:43:00+01:00, Marc Vef

Currently, our client hooks return whatever syscall intercept returns. This is usually a cast from `long` to `int`. This is a narrowing operation, and therefore we should use `narrow_cast` from GSL to make clear that this is acceptable (we do not need an exception in this case). See https://github.com/microsoft/GSL/blob/main/docs/headers.md#user-content-H-util-narrow_cast
This issue includes adding GSL to GekkoFS and adding `gsl::narrow_cast` to all hooks that narrow.

Milestone: v0.9.3
Assignee: Julius Athenstaedt

OPX support
https://storage.bsc.es/gitlab/hpc/gekkofs/-/issues/272
2023-10-23T12:40:44+02:00, Marc Vef

Add OPX support to GekkoFS and Hermes. This replaces ofi+psm2, which seems to be dysfunctional at this point.

Milestone: v0.9.3
Assignee: Marc Vef

Client: Support MessagePack for process bandwidth monitoring
https://storage.bsc.es/gitlab/hpc/gekkofs/-/issues/271
2023-10-23T10:47:00+02:00, Marc Vef

The client library should use MessagePack to capture the bandwidth information for a configurable time interval.
This will be part of the client library. In the future, this info is forwarded to the proxy and then consolidated there.

Milestone: v0.9.3
Assignee: Marc Vef

GekkoFS v0.9.3 dependency bump
https://storage.bsc.es/gitlab/hpc/gekkofs/-/issues/255
2024-02-23T16:52:08+01:00, Marc Vef

Update dependencies for next release v0.9.3
- [x] Hermes incompatible with recent Margo versions. Need to be fixed first.
- [x] Hermes CMake incompatible with Mercury v2.2.0
- [x] RocksDB compilation issues to be inspected
- [x] Update the code file headers for 2024

Milestone: v0.9.3
Assignee: Marc Vef

fadvise64 syscall not hooked
https://storage.bsc.es/gitlab/hpc/gekkofs/-/issues/245
2023-03-21T12:55:16+01:00, Marc Vef

Reported by 王荣耀 <m201972975@alumni.hust.edu.cn> via email.
Running fio with the libaio engine reports `Bad file descriptor`. Reproduced:
```bash
gkfs fio --name=write_test --directory=/tmp/gkfs_mountdir --rw=write --bs=4k --size=4k --thread=1 --numjobs=1 --iodepth=1 --loops=1 --buffered=0 --ioengine=libaio
write_test: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.27
Starting 1 thread
write_test: Laying out IO file (1 file / 0MiB)
fio: cache invalidation of /tmp/gkfs_mountdir/write_test.0.0 failed: Bad file descriptor
fio: cache invalidation of /tmp/gkfs_mountdir/write_test.0.0 failed: Bad file descriptor
fio: pid=3993920, err=9/file:ioengines.c:486, func=io commit, error=Bad file descriptor
write_test: (groupid=0, jobs=1): err= 9 (file:ioengines.c:486, func=io commit, error=Bad file descriptor): pid=3993920: Thu Feb 16 09:31:04 2023
cpu : usr=35.71%, sys=0.00%, ctx=3, majf=0, minf=3
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=100.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,1,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
```
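
The `Bad file descriptor` error in this output can be reproduced in isolation: when an fadvise call reaches the kernel with a descriptor number the kernel does not know, it fails with `EBADF`. The following is a minimal sketch, assuming Linux with glibc; `fadvise_on_unknown_fd` is a hypothetical helper, and the closed `/dev/null` descriptor only stands in for a descriptor that exists in a user-space library but not in the kernel.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Returns the error number that posix_fadvise() reports for a descriptor
// number the kernel does not know (hypothetical helper, for illustration).
int fadvise_on_unknown_fd() {
    int fd = open("/dev/null", O_RDONLY);
    assert(fd >= 0);
    close(fd); // from here on, fd is just a number with no kernel-side file
    // posix_fadvise() returns the error number directly instead of setting errno.
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```

On such a system the function is expected to return `EBADF`, the same error 9 that fio reports above.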
Looking at the logs, this is because `fadvise64()` is used but not hooked. As a result, the syscall is passed to the kernel with a file descriptor from us, which is invalid there.

Milestone: v0.9.3
Assignee: Marc Vef

Add script profiles for latest release and master
https://storage.bsc.es/gitlab/hpc/gekkofs/-/issues/242
2023-03-21T12:53:10+01:00, Marc Vef

During the release cycle, we update the latest version number so that latest is pointing to an unreleased version.
This should be clearly differentiated such that latest points to the latest release and master points to the latest unreleased version. Terminology debatable.

Milestone: v0.9.3

Client fails to shutdown Mercury
https://storage.bsc.es/gitlab/hpc/gekkofs/-/issues/237
2023-03-21T12:52:52+01:00, Marc Vef

When the client shuts down after a command has finished, Mercury is not shut down gracefully. The following log entry appears:
```
[2022-09-15 14:33:46.521226 CEST] [2167362] [hermes] [error] Failed to shut down transport layer: HG_BUSY
```

Milestone: v0.9.3

`dl_dep.sh` does not allow to checkout a specific commit from a branch
https://storage.bsc.es/gitlab/hpc/gekkofs/-/issues/230
2023-03-21T12:52:46+01:00, Ramon Nou

If the commit SHA is on another branch, `git checkout` does not work because we used `--single-branch` on the clone.
The solution is to remove the option from the scripts.

Milestone: v0.9.3

Update libdate to avoid conflict with system libtz
https://storage.bsc.es/gitlab/hpc/gekkofs/-/issues/226
2023-03-21T12:52:38+01:00, Ramon Nou

In some systems, `libtz` is installed by default, conflicting with the current version of `date`'s `libtz.so` (which is another package).
The error only appears at execution, not at compilation. It can be solved using the `CMAKE_PREFIX_PATH`, `LD_LIBRARY_PATH`, and `PKG_LIBRARY_PATH` environment variables.
The issue was solved in a newer version than the one we are using: [Issue 426](https://github.com/HowardHinnant/date/issues/426).

Milestone: v0.9.3

Search for Prometheus dependency should be done in root `CMakeLists.txt`
https://storage.bsc.es/gitlab/hpc/gekkofs/-/issues/219
2023-03-21T12:55:45+01:00, Alberto Miranda

Similarly to #218, it is currently being done in `common/CMakeLists.txt`, but it is better to do all dependency management in the root `CMakeLists.txt`. Otherwise, there might be issues with some `CMakeLists.txt` in a different subtree not seeing the `prometheus` imported targets.

Milestone: v0.9.3
Assignee: Alberto Miranda

Search for Spdlog dependency should be done in root `CMakeLists.txt`
https://storage.bsc.es/gitlab/hpc/gekkofs/-/issues/218
2023-06-26T15:06:33+02:00, Alberto Miranda

It is currently being done in `common/CMakeLists.txt`, but it is better to do all dependency management in the root `CMakeLists.txt`.
Otherwise, there might be issues with some `CMakeLists.txt` in a different subtree not seeing the `spdlog` imported targets.

Milestone: v0.9.3
Assignee: Alberto Miranda

Documentation for 0.9.1 has incorrect version numbers for dependencies.
https://storage.bsc.es/gitlab/hpc/gekkofs/-/issues/217
2024-02-19T11:52:20+01:00, Alberto Miranda

The versions for the dependencies stated in `users/building.rst` have not been updated after releasing 0.9.1. To avoid this kind of issue in the future, I believe that we could (probably) parse this information from the latest profile and integrate it programmatically in Sphinx (I have done similar things in the past with Sphinx and I believe it should be possible). This way, every time we change a version requirement, it would be updated automatically.

Milestone: v0.9.3
Assignee: Marc Vef

UCX support
https://storage.bsc.es/gitlab/hpc/gekkofs/-/issues/207
2023-03-21T12:55:05+01:00, Hector Wu

The latest Mercury RPC framework supports UCX in addition to OFI as a network abstraction.
We should support UCX to enable the deployment of GekkoFS in a wider range of networks.

Milestone: v0.9.3
Assignee: Marc Vef

Add code contribution information to documentation
https://storage.bsc.es/gitlab/hpc/gekkofs/-/issues/204
2023-03-21T12:51:00+01:00, Marc Vef

How can a user contribute, and what is the review process?

Milestone: v0.9.3

Add GekkoFS semantic expectations section to documentation
https://storage.bsc.es/gitlab/hpc/gekkofs/-/issues/203
2023-03-21T12:50:53+01:00, Marc Vef

Topics to cover:
- Application assumptions
- Consistency model
- Semantic expectations
- File system behavior
- General reliability: application failure, node failure
- Which I/O operations are not supported right now

Milestone: v0.9.3

Implement RPC tester
https://storage.bsc.es/gitlab/hpc/gekkofs/-/issues/198
2023-03-21T12:50:49+01:00, Marc Vef

Implement an RPC tester that allows us to check the network performance of the RPC layer. Later, this can be used as a baseline for GekkoFS, allowing us to understand and optimize for a given environment, combined with automatic tests to find the best configuration.
The tester should include server and client code for both Margo and Hermes.
The RPC tester server can also be added to the start scripts #52.

Milestone: v0.9.3

Dependency scripts: Less technical output
https://storage.bsc.es/gitlab/hpc/gekkofs/-/issues/196
2023-03-21T12:56:20+01:00, Marc Vef

Current download and compile dependency scripts are technical and very verbose. Another default output mode needs to be added which gives a better overview for the user while still providing the technical output in case something goes wrong.

Milestone: v0.9.3
Assignee: Marc Vef