- Jul 22, 2024
Marc Vef authored
Marc Vef authored
Marc Vef authored
Marc Vef authored
Marc Vef authored
Marc Vef authored
Resolve "Shared file metadata congestion"

During write operations, the client must update the file size on the responsible metadata daemon. The write size cache reduces the metadata load on the daemon and the number of RPCs during write operations, especially for many small I/O operations. In the past, we have observed that a daemon can become network-congested, especially for single shared files, many processes, and small I/O operations, which bottlenecks the overall I/O throughput. Beyond that scenario, the cache benefits small I/O operations in general because one RPC for updating the size is removed per write, which already improves small-file I/O on a single node.

Note that this cache may affect file size consistency: `stat` operations may not reflect the actual file size until the file is closed. The cache does not affect the consistency of the file data itself. We did not observe any issues with the cache for HPC applications and benchmarks, but it technically breaks POSIX. So, for now, I suggest keeping it experimental and opt-in.

- `LIBGKFS_WRITE_SIZE_CACHE` - Enable caching the write size of files (default: OFF).
- `LIBGKFS_WRITE_SIZE_CACHE_THRESHOLD` - Set the number of write operations after which the file size is synchronized with the corresponding daemon (default: 1000). The file size is also synchronized when the file is `close()`d or when `fsync()` is called.

Depends on https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/194

Closes #62

See merge request !193
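The threshold-based flushing described above can be sketched as follows. This is a minimal C++17 illustration under stated assumptions, not the GekkoFS implementation: `WriteSizeCache`, `record_write()`, `flush()`, and the `send_size` callback (standing in for the size-update RPC to the metadata daemon) are hypothetical names, and the client is assumed to track the largest end offset it has written per file.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical client-side write size cache: instead of one size-update RPC
// per write, keep the largest known end offset per path and only send it
// after `threshold` writes, or explicitly on close()/fsync().
class WriteSizeCache {
public:
    // Stand-in for the RPC to the responsible metadata daemon (assumption).
    using SendSizeFn = std::function<void(const std::string& path, std::int64_t size)>;

    WriteSizeCache(std::size_t threshold, SendSizeFn send_size)
        : threshold_(threshold), send_size_(std::move(send_size)) {}

    // Record a write of `count` bytes at `offset`; flush once the threshold is hit.
    void record_write(const std::string& path, std::int64_t offset, std::size_t count) {
        Entry& entry = entries_[path];
        const std::int64_t end = offset + static_cast<std::int64_t>(count);
        if (end > entry.size) entry.size = end;
        if (++entry.pending_writes >= threshold_) flush(path);
    }

    // Intended to be called from close() and fsync(): push the cached size.
    void flush(const std::string& path) {
        auto it = entries_.find(path);
        if (it == entries_.end() || it->second.pending_writes == 0) return;  // nothing new
        send_size_(path, it->second.size);
        it->second.pending_writes = 0;
    }

private:
    struct Entry {
        std::int64_t size = 0;           // largest end offset written so far
        std::size_t pending_writes = 0;  // writes since the last flush
    };
    std::size_t threshold_;
    SendSizeFn send_size_;
    std::unordered_map<std::string, Entry> entries_;
};
```

With a threshold of 1000, a sequence of small writes issues one size RPC per 1000 writes plus one on `close()`, which is where the RPC savings come from; between flushes, a concurrent `stat` on the daemon sees an older size, matching the consistency caveat above.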
Marc Vef authored
Marc Vef authored
, enable by config and disable via `LIBGKFS_WRITE_SIZE_CACHE=OFF`. Flush happens on close/fsync. The flush threshold can be changed via config or `LIBGKFS_WRITE_SIZE_CACHE_THRESHOLD=100`.
Marc Vef authored
Resolve "Tests fail when symlink support is disabled"

This MR does several things:

1. `SUPPORT_SYMLINKS` is now disabled by default. It did little in the first place and only affects incomplete code. The corresponding README entry has been removed. The only thing it does support is accessing GekkoFS from a foreign namespace via a symbolic link. However, only `stat` seems to work.
2. `GKFS_FOLLOW_EXTERNAL_SYMLINKS` was also disabled by default in this [MR](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/183). This caused `test_symlink` to fail because it tested external symlinks into GekkoFS, not actual symlinks within GekkoFS. Following external symlinks requires an `lstat()` for each component of the path, which is a performance risk under certain circumstances. Overall, this is the relevant CMake variable for the test.
3. Unify code formatting for CMake files.

Closes #298

See merge request !198
- Jul 19, 2024
- Jul 16, 2024
Marc Vef authored
Resolve "Add dentry cache"

This MR adds a directory entry cache for the client to avoid a large number of stat calls after readdir, e.g., for `ls -l` type operations. It is experimental and thus disabled by default. It can be enabled via `include/config.hpp` or with the environment variable `LIBGKFS_DENTRY_CACHE=ON/OFF`.

It works by using the `extended_dir_entry` RPC to receive some metadata along with the dentries from the daemons. This metadata is then placed into the cache and retrieved in a stat operation (on a cache miss, an RPC is sent as before). The cache is discarded upon close, but this behavior can be changed via `include/config.hpp`. Note that this may cause semantic issues (removed files will remain in the cache forever). The performance improvements are already noticeable locally for a few thousand files.

Depends on https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/195

Closes #292

See merge request !194
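A minimal sketch of the lookup flow described above, assuming a per-directory map filled from the extended readdir reply. `DentryCache`, `DentryMeta` (and its fields), and the `stat_rpc` fallback are hypothetical names, paths are assumed to be absolute, and "discarded upon close" is modelled as a `clear(dir)` call, presumably issued from `closedir()`. This is an illustration, not the GekkoFS code.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical metadata shipped alongside each dentry by the extended readdir RPC.
struct DentryMeta {
    bool is_dir = false;
    std::int64_t size = 0;
    std::int64_t ctime = 0;
};

class DentryCache {
public:
    // Stand-in for the regular stat RPC used on a cache miss (assumption).
    using StatRpc = std::function<DentryMeta(const std::string& path)>;

    explicit DentryCache(StatRpc stat_rpc) : stat_rpc_(std::move(stat_rpc)) {}

    // Fill the cache from one readdir reply for directory `dir`.
    void insert_from_readdir(const std::string& dir,
                             const std::vector<std::pair<std::string, DentryMeta>>& entries) {
        auto& bucket = cache_[dir];
        for (const auto& [name, meta] : entries) bucket[name] = meta;
    }

    // stat(): serve from the cache if possible, otherwise fall back to the RPC.
    DentryMeta stat(const std::string& path) {
        const auto slash = path.find_last_of('/');
        const std::string dir = (slash == 0) ? std::string("/") : path.substr(0, slash);
        const std::string name = path.substr(slash + 1);
        auto dit = cache_.find(dir);
        if (dit != cache_.end()) {
            auto eit = dit->second.find(name);
            if (eit != dit->second.end()) return eit->second;  // hit: no RPC
        }
        return stat_rpc_(path);  // miss: vanilla path
    }

    // Discard all cached entries of `dir`, e.g. when the directory is closed.
    void clear(const std::string& dir) { cache_.erase(dir); }

private:
    StatRpc stat_rpc_;
    std::unordered_map<std::string, std::unordered_map<std::string, DentryMeta>> cache_;
};
```

For `ls -l`, one readdir fills the map and every following `stat` on those names is a local hit. The sketch also shows the caveat mentioned above: an entry stays cached until `clear()` runs, even if the file is removed on the daemon in the meantime.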
- Jul 15, 2024
Marc Vef authored
Marc Vef authored
Marc Vef authored
Env variable overrides config.hpp
Marc Vef authored
Marc Vef authored
Marc Vef authored
This uses the extended dir RPC call.
Marc Vef authored
Marc Vef authored
Resolve "Refactor path resolution"

Refactors the path resolution mechanism and adds two new CMake options:

- `GKFS_USE_LEGACY_PATH_RESOLVE` - Use the legacy implementation of the resolve function, deprecated (default: OFF)
- `GKFS_FOLLOW_EXTERNAL_SYMLINKS` - Enable support for following external links when resolving the path (default: OFF)
  - This is automatically enabled in the deprecated version and causes an `lstat()` system call on each individual path component. This has been an issue in the past, where performance was considerably impacted when the mount path was placed within the parallel file system.
  - It is now disabled by default to improve performance. If it causes issues, we can re-enable it.

Closes #281

See merge request !183
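To make the per-component cost concrete, here is a minimal C++17 sketch of component-wise resolution that follows symlinks along the way, using `std::filesystem::symlink_status()` (an `lstat()`) for every inspected prefix. It is an illustration of the general technique under stated assumptions, not the legacy GekkoFS resolver: `ResolveResult`, `resolve_following_symlinks()`, and the 40-hop limit are hypothetical choices, and only absolute input paths are handled.

```cpp
#include <cstddef>
#include <deque>
#include <filesystem>
#include <stdexcept>
#include <system_error>

namespace fs = std::filesystem;

// Result of the hypothetical resolver below.
struct ResolveResult {
    fs::path resolved;            // path with every symlink component replaced
    std::size_t lstat_calls = 0;  // one per inspected component
};

// Walk an absolute path one component at a time and follow any symlink found.
// Every inspected prefix costs one lstat() via fs::symlink_status().
ResolveResult resolve_following_symlinks(const fs::path& input, int max_hops = 40) {
    if (!input.is_absolute()) throw std::invalid_argument("expected an absolute path");
    ResolveResult res;
    fs::path current = "/";
    const fs::path rel = input.relative_path();  // keep alive: we iterate over it
    std::deque<fs::path> todo(rel.begin(), rel.end());
    int hops = 0;
    while (!todo.empty()) {
        const fs::path comp = todo.front();
        todo.pop_front();
        if (comp.empty() || comp == ".") continue;  // trailing '/' or "."
        if (comp == "..") {
            current = current.parent_path();  // parent of "/" stays "/"
            continue;
        }
        const fs::path candidate = current / comp;
        std::error_code ec;
        const fs::file_status st = fs::symlink_status(candidate, ec);  // one lstat()
        ++res.lstat_calls;
        if (!ec && fs::is_symlink(st)) {
            if (++hops > max_hops) throw std::runtime_error("too many levels of symbolic links");
            const fs::path target = fs::read_symlink(candidate);
            // Absolute targets restart at "/"; relative ones resolve against `current`.
            if (target.is_absolute()) current = "/";
            const fs::path target_rel = target.relative_path();
            todo.insert(todo.begin(), target_rel.begin(), target_rel.end());
            continue;
        }
        current = candidate;  // plain (or nonexistent) component: keep it
    }
    res.resolved = current;
    return res;
}
```

Even a path without any symlinks pays one `lstat()` per component here, which is why this behavior becomes expensive when those components live on a shared parallel file system.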
Marc Vef authored
- Jul 12, 2024
Marc Vef authored
Resolve "File system expansion during runtime"

# Description

GekkoFS supports extending the current daemon configuration to additional compute nodes. This includes redistributing the existing data and metadata and therefore scales file system performance and capacity of existing data. Note that it is the user's responsibility not to access the GekkoFS file system during redistribution. A corresponding feature that is transparent to the user is planned. Note also that if the GekkoFS proxy is used, the proxies need to be restarted manually after expansion.

To enable this feature, the following CMake compilation flag is required to build the `gkfs_malleability` tool: `-DGKFS_BUILD_TOOLS=ON`. The `gkfs_malleability` tool is then available in the `build/tools` directory. Please consult `-h` for its arguments. While the tool can be used manually to expand the file system, the `scripts/run/gkfs` script, which invokes the `gkfs_malleability` tool, should be used instead. The only requirement for extending the file system is a hostfile containing the hostnames/IPs of the new nodes (one line per host).

Example of starting the file system. The `DAEMON_NODELIST` in `gkfs.conf` is set to a hostfile containing the initial set of file system nodes:

```bash
~/gekkofs/scripts/run/gkfs -c ~/run/gkfs_verbs_expandtest.conf start
* [gkfs] Starting GekkoFS daemons (4 nodes) ...
* [gkfs] GekkoFS daemons running
* [gkfs] Startup time: 10.853 seconds
```

... Some computation ...

Expanding the file system, using `-e <hostfile>` to specify the new nodes. Redistribution is done automatically with a progress bar. When finished, the file system is ready to use in the new configuration:

```bash
~/gekkofs/scripts/run/gkfs -c ~/run/gkfs_verbs_expandtest.conf -e ~/hostfile_expand expand
* [gkfs] Starting GekkoFS daemons (8 nodes) ...
* [gkfs] GekkoFS daemons running
* [gkfs] Startup time: 1.058 seconds
Expansion process from 4 nodes to 12 nodes launched...
* [gkfs] Expansion progress: [####################] 0/4 left
* [gkfs] Redistribution process done. Finalizing ...
* [gkfs] Expansion done.
```

Stop the file system:

```bash
~/gekkofs/scripts/run/gkfs -c ~/run/gkfs_verbs_expandtest.conf stop
* [gkfs] Stopping daemon with pid 16462
srun: sending Ctrl-C to StepId=282378.1
* [gkfs] Stopping daemon with pid 16761
srun: sending Ctrl-C to StepId=282378.2
* [gkfs] Shutdown time: 1.032 seconds
```

# Results

IOR results for writing/reading 768 GiB sequentially (192 procs) before and after expansion:

![image](/uploads/57bd8f3a07a56c496b1ae0b096da24ef/image.png)

MDTest results for create, stat, and remove operations (19200000, 192 procs) before and after expansion:

![image](/uploads/7e2f58d864789e657140ced3e9e9716e/image.png)

Closes #294

See merge request !196
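To illustrate why expansion requires redistributing existing entries, the sketch below assumes a simple placement rule in which each path is owned by daemon `hash(path) % N`. The rule and the names `owner_of()`, `Move`, and `plan_redistribution()` are assumptions for illustration only, not the actual GekkoFS distributor or the `gkfs_malleability` implementation.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical placement rule (assumption): owner = hash(path) % number of daemons.
std::size_t owner_of(const std::string& path, std::size_t num_daemons) {
    return std::hash<std::string>{}(path) % num_daemons;
}

// One entry that must be migrated from daemon `from` to daemon `to`.
struct Move {
    std::string path;
    std::size_t from;
    std::size_t to;
};

// List the entries whose owner changes when growing from old_n to new_n daemons.
std::vector<Move> plan_redistribution(const std::vector<std::string>& paths,
                                      std::size_t old_n, std::size_t new_n) {
    std::vector<Move> moves;
    for (const auto& p : paths) {
        const std::size_t from = owner_of(p, old_n);
        const std::size_t to = owner_of(p, new_n);
        if (from != to) moves.push_back({p, from, to});  // owner changed: migrate
    }
    return moves;
}
```

Under this modulo rule and a uniformly distributed hash, an entry keeps its owner when growing from 4 to 12 daemons only if `hash % 12 < 4`, so roughly two thirds of all entries move. Consistent hashing is a common alternative that moves fewer entries when nodes are added.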
Marc Vef authored
Marc Vef authored
Marc Vef authored
Proxy must be restarted to know about the file system extension.
- Jul 11, 2024