diff --git a/CHANGELOG.md b/CHANGELOG.md index 443c8f85c81b3646f64f449743a789a5e70f1767..66be593383ce44de3226ad8c62edcedc65a131aa 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,7 +7,12 @@ to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). ## [Unreleased] ### New +### Changed +### Removed +### Fixed +## [0.9.3] - 2024-07 +### New - Added a write size cache to the file system client to reduce potential metadata network bottlenecks during small I/O operations ([!193](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/193)). - The cache is experimental and thus disabled by default. Added the following environment variables. @@ -62,9 +67,7 @@ replicas ([!166](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/141) - Modified write and reads to use a bitset instead of the traditional hash per chunk in the server. - Added reattemp support in get_fs_config to other servers, when the initial server fails. - ### Changed - - Updated GekkoFS dependencies migrating to margo-shim-hg ([!165](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_request/165)). - Improves RPC stability @@ -74,10 +77,7 @@ replicas ([!166](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/141) use of syscall for following symlinks optional ([!183](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/183)). - -### Removed ### Fixed - - An issue that updated the last modified time of a file during `stat` operations was fixed([!176](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_request/176)). - Fixed a dependency conflict within the pytest dependency marshmallow ([!197](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_request/197)). 
diff --git a/README.md b/README.md index e70bceff4860c3d5ae9e89ccf85a2c03b4b0eb6e..79316322b8b3f31b55cbf25cdfc80531db8ad4a1 100644 --- a/README.md +++ b/README.md @@ -9,10 +9,42 @@ in a HPC cluster to produce a high-performance storage space that can be accesse This storage space allows HPC applications and simulations to run in isolation from each other with regards to I/O, which reduces interferences and improves performance. +# Table of contents + +- [Dependencies](#dependencies) + - [Debian/Ubuntu](#debianubuntu) + - [CentOS/Red Hat](#centosred-hat) +- [Step-by-step installation](#step-by-step-installation) +- [Run GekkoFS](#run-gekkofs) + - [The GekkoFS hostsfile](#the-gekkofs-hostsfile) + - [The GekkoFS daemon](#the-gekkofs-daemon) + - [Manual startup and shut down](#manual-startup-and-shut-down) + - [GekkoFS daemon orchestration via the gkfs script (recommended)](#gekkofs-daemon-orchestration-via-the-gkfs-script-recommended) + - [The GekkoFS client library](#the-gekkofs-client-library) + - [Interposition library via system call interception](#interposition-library-via-system-call-interception) + - [User library via linking against the application](#user-library-via-linking-against-the-application) + - [Logging](#logging) +- [Advanced and experimental features](#advanced-and-experimental-features) + - [Rename](#rename) + - [Replication](#replication) + - [Client-side metrics via MessagePack and ZeroMQ](#client-side-metrics-via-messagepack-and-zeromq) + - [Server-side statistics via Prometheus](#server-side-statistics-via-prometheus) + - [GekkoFS proxy](#gekkofs-proxy) + - [File system expansion](#file-system-expansion) +- [Miscellaneous](#miscellaneous) + - [External functions](#external-functions) + - [Data placement](#data-placement) + - [Simple Hash (Default)](#simple-hash-default) + - [Guided Distributor](#guided-distributor) + - [Metadata Backends](#metadata-backends) + - [CMake options](#cmake-options) + - [Environment variables](#environment-variables) 
+- [Acknowledgment](#acknowledgment) + # Dependencies -- \>gcc-8 (including g++) for C++11 support -- General build tools: Git, Curl, CMake >3.6 (>3.11 for GekkoFS testing), Autoconf, Automake +- \>gcc-12 (including g++) for C++17 support +- General build tools: Git, Curl, CMake >3.13, Autoconf, Automake - Miscellaneous: Libtool, Libconfig ### Debian/Ubuntu @@ -57,15 +89,13 @@ GekkoFS is now available at: - GekkoFS daemon (server): `/bin/gkfs_daemon` - GekkoFS client interception library: `/lib64/libgkfs_intercept.so` -## Use Spack to install GekkoFS (alternative) +## Spack for installing GekkoFS (alternative) The Spack tool can be used to easily install GekkoFS and its dependencies. Refer to the following [README](scripts/spack/README.md) for details. # Run GekkoFS -## General - On each node a daemon (`gkfs_daemon` binary) has to be started. Other tools can be used to execute the binary on many nodes, e.g., `srun`, `mpiexec/mpirun`, `pdsh`, or `pssh`. @@ -89,7 +119,11 @@ GPFS or Lustre. *Note: NFS is not strongly consistent and cannot be used for the hosts file!* -## GekkoFS daemon start and shut down +## The GekkoFS daemon + +The GekkoFS daemon is the server component of GekkoFS. It is responsible for managing the file system data and metadata. There are two options to run the daemons on one or several nodes: (1) manually by executing the `gkfs_daemon` binary directly or (2) by using the `gkfs` script (recommended). + +### Manual startup and shut down tl;dr example: `/bin/gkfs_daemon -r -m -H ` @@ -137,29 +171,7 @@ are part of the same file system, use the same `rootdir` with different `rootdir Shut it down by gracefully killing the process (SIGTERM). 
-## Use the GekkoFS client library - -tl;dr example: - -```bash -export LIBGKFS_ HOSTS_FILE= -LD_PRELOAD=/lib64/libgkfs_intercept.so cp ~/some_input_data /some_input_data -LD_PRELOAD=/lib64/libgkfs_intercept.so md5sum ~/some_input_data /some_input_data -``` - -Clients read the hostsfile to determine which daemons are part of the GekkoFS instance. Because the client is an -interposition library that is loaded within the context of the application, this information is passed via the -environment variable `LIBGKFS_HOSTS_FILE` pointing to the hostsfile path. The client library itself is loaded for each -application process via the `LD_PRELOAD` environment variable intercepting file system related calls. If they are -within (or hierarchically under) the GekkoFS mount directory they are processed in the library, otherwise they are -passed to the kernel. - -Note, if `LD_PRELOAD` is not pointing to the library and, hence the client is not loaded, the mounting directory appears -to be empty. - -For MPI application, the `LD_PRELOAD` variable can be passed with the `-x` argument for `mpirun/mpiexec`. - -## Run GekkoFS daemons on multiple nodes +### GekkoFS daemon orchestration via the `gkfs` script (recommended) The `scripts/run/gkfs` script can be used to simplify starting the GekkoFS daemon on one or multiple nodes. To start GekkoFS on multiple nodes, a Slurm environment that can execute `srun` is required. Users can further @@ -198,7 +210,39 @@ usage: gkfs [-h/--help] [-r/--rootdir ] [-m/--mountdir ] [-a/--args -v, --verbose Increase verbosity ``` -### Logging +## The GekkoFS client library + +### Interposition library via system call interception + +tl;dr example: + +```bash +export LIBGKFS_HOSTS_FILE= +LD_PRELOAD=/lib64/libgkfs_intercept.so cp ~/some_input_data /some_input_data +LD_PRELOAD=/lib64/libgkfs_intercept.so md5sum ~/some_input_data /some_input_data +``` + +Clients read the hostsfile to determine which daemons are part of the GekkoFS instance.
Because the client is an +interposition library that is loaded within the context of the application, this information is passed via the +environment variable `LIBGKFS_HOSTS_FILE` pointing to the hostsfile path. The client library itself is loaded for each +application process via the `LD_PRELOAD` environment variable and intercepts file system related calls. If they are +within (or hierarchically under) the GekkoFS mount directory, they are processed in the library; otherwise, they are +passed to the kernel. + +Note that if `LD_PRELOAD` does not point to the library and, hence, the client is not loaded, the mount directory appears +to be empty. + +For MPI applications, the `LD_PRELOAD` variable can be passed with the `-x` argument of `mpirun/mpiexec`. + +### User library via linking against the application + +GekkoFS also offers a user library, the `libgkfs_user_lib.so` shared library, which is built by default and can be linked +against the application. The corresponding API and developer headers are available in +`include/client/user_functions.hpp`. Please consult `examples/user_library` for details. + +In this case, `LD_PRELOAD` is not necessary. Nevertheless, `LIBGKFS_HOSTS_FILE` is still required. + +## Logging The following environment variables can be used to enable logging in the client library: `LIBGKFS_LOG=` and `LIBGKFS_LOG_OUTPUT=` to configure the output module and set the path to the log file of the client @@ -244,78 +288,9 @@ For the daemon, the `GKFS_DAEMON_LOG_PATH=` environment variable c log file, and the log module can be selected with the `GKFS_DAEMON_LOG_LEVEL={off,critical,err,warn,info,debug,trace}` environment variable. -# Miscellaneous - -## External functions - -GekkoFS allows to use external functions on your client code, via LD_PRELOAD. -Source code needs to be compiled with -fPIC.
We include a pfind io500 substitution, -`examples/gfind/gfind.cpp` and a non-mpi version `examples/gfind/sfind.cpp` - -## Data distributors - -The data distribution can be selected at compilation time, we have 2 distributors available: - -### Simple Hash (Default) - -Chunks are distributed randomly to the different GekkoFS servers. - -### Guided Distributor - -The guided distributor allows defining a specific distribution of data on a per directory or file basis. -The distribution configurations are defined within a shared file (called `guided_config.txt` henceforth) with the -following format: -` ` - -To enable the distributor, the following CMake compilation flags are required: - -* `GKFS_USE_GUIDED_DISTRIBUTION` ON -* `GKFS_USE_GUIDED_DISTRIBUTION_PATH` `` - -To use a custom distribution, a path needs to have the prefix `#` (e.g., `#/mdt-hard 0 0`), in which all the data of all -files in that directory goes to the same place as the metadata. -Note, that a chunk/host configuration is inherited to all children files automatically even if not using the prefix. -In this example, `/mdt-hard/file1` is therefore also using the same distribution as the `/mdt-hard` directory. -If no prefix is used, the Simple Hash distributor is used. - -#### Guided configuration file - -Creating a guided configuration file is based on an I/O trace file of a previous execution of the application. -For this the `trace_reads` tracing module is used (see above). - -The `trace_reads` module enables a `TRACE_READS` level log at the clients writing the I/O information of the client -which is used as the input for a script that creates the guided distributor setting. -Note that capturing the necessary trace records can involve performance degradation. 
-To capture the I/O of each client within a SLURM environment, i.e., enabling the `trace_reads` module and print its -output to a user-defined path, the following example can be used: -`srun -N 10 -n 320 --export="ALL" /bin/bash -c "export LIBGKFS_LOG=trace_reads;LIBGKFS_LOG_OUTPUT=${HOME}/test/GLOBAL.txt;LD_PRELOAD=${GKFS_PRLD} "` - -Then, the `examples/distributors/guided/generate.py` scrpt is used to create the guided distributor configuration file: - -* `python examples/distributors/guided/generate.py ~/test/GLOBAL.txt >> guided_config.txt` - -Finally, modify `guided_config.txt` to your distribution requirements. - -## Metadata Backends - -There are two different metadata backends in GekkoFS. The default one uses `rocksdb`, however an alternative based -on `PARALLAX` from `FORTH` -is available. To enable it use the `-DGKFS_ENABLE_PARALLAX:BOOL=ON` option, you can also disable `rocksdb` -with `-DGKFS_ENABLE_ROCKSDB:BOOL=OFF`. - -Once it is enabled, `--dbbackend` option will be functional. - -## Statistics - -GekkoFS daemons are able to output general operations (`--enable-collection`) and data chunk -statistics (`--enable-chunkstats`) to a specified output file via `--output-stats `. Prometheus can also be used -instead or in addition to the output file. It must be enabled at compile time via the CMake -argument `-DGKFS_ENABLE_PROMETHEUS` and the daemon argument `--enable-prometheus`. The corresponding statistics are then -pushed to the Prometheus instance. - -## Advanced and experimental features +# Advanced and experimental features -### Rename +## Rename `-DGKFS_RENAME_SUPPORT` allows the application to rename files. This is an experimental feature, and some scenarios may not work properly. @@ -323,14 +298,14 @@ Support for fstat in renamed files is included. This is disabled by default. -### Replication +## Replication The user can enable the data replication feature by setting the replication environment variable: `LIBGKFS_NUM_REPL=`. 
The number of replicas should go from `0` to the `number of servers - 1`. The replication environment variable can be set up for each client independently. -### Client metrics via MessagePack and ZeroMQ +## Client-side metrics via MessagePack and ZeroMQ GekkoFS clients support capturing the I/O traces of each individual process and periodically exporting them to a given file or ZeroMQ sink via the TCP protocol. @@ -375,7 +350,15 @@ total_bytes: 1802366 total_iops: 4 ``` -### GekkoFS proxy +## Server-side statistics via Prometheus + +GekkoFS daemons can output general operation statistics (`--enable-collection`) and data chunk +statistics (`--enable-chunkstats`) to a specified output file via `--output-stats `. Prometheus can also be used +instead of or in addition to the output file. It must be enabled at compile time via the CMake +argument `-DGKFS_ENABLE_PROMETHEUS` and the daemon argument `--enable-prometheus`. The corresponding statistics are then +pushed to the Prometheus instance. + +## GekkoFS proxy The GekkoFS proxy is an additional (alternative) component that runs on each client and acts as gateway between the client and daemons. It can improve network stability, e.g., for opa-psm2, and provides a basis for future asynchronous @@ -417,7 +400,7 @@ Press 'q' to exit Please consult `include/config.hpp` for additional configuration options. Note, GekkoFS proxy does not support replication. -### File system expansion +## File system expansion GekkoFS supports extending the current daemon configuration to additional compute nodes. This includes redistribution of the existing data and metadata and therefore scales file system performance and capacity of existing data. Note, @@ -469,7 +452,68 @@ srun: sending Ctrl-C to StepId=282378.2 * [gkfs] Shutdown time: 1.032 seconds ``` -## All CMake options +# Miscellaneous + +## External functions + +GekkoFS allows using external functions in your client code via `LD_PRELOAD`. +Source code needs to be compiled with `-fPIC`.
We include a pfind io500 substitution, +`examples/gfind/gfind.cpp`, and a non-MPI version, `examples/gfind/sfind.cpp`. + +## Data placement + +The data distribution can be selected at compilation time. Two distributors are available: + +### Simple Hash (Default) + +Chunks are distributed pseudo-randomly across the GekkoFS servers based on a hash. + +### Guided Distributor + +The guided distributor allows defining a specific distribution of data on a per-directory or per-file basis. +The distribution configurations are defined within a shared file (called `guided_config.txt` henceforth) with the +following format: +` ` + +To enable the distributor, the following CMake compilation flags are required: + +* `GKFS_USE_GUIDED_DISTRIBUTION` ON +* `GKFS_USE_GUIDED_DISTRIBUTION_PATH` `` + +To use a custom distribution, a path needs to have the prefix `#` (e.g., `#/mdt-hard 0 0`), in which case all the data of all +files in that directory is stored in the same place as the metadata. +Note that a chunk/host configuration is automatically inherited by all child files, even if the prefix is not used. +In this example, `/mdt-hard/file1` therefore uses the same distribution as the `/mdt-hard` directory. +If no prefix is used, the Simple Hash distributor is used. + +#### Guided configuration file + +Creating a guided configuration file is based on an I/O trace file of a previous execution of the application. +For this, the `trace_reads` tracing module is used (see above). + +The `trace_reads` module enables a `TRACE_READS`-level log at the clients that records each client's I/O information, +which is used as the input for a script that creates the guided distributor setting. +Note that capturing the necessary trace records can involve performance degradation.
+To capture the I/O of each client within a Slurm environment, i.e., to enable the `trace_reads` module and write its +output to a user-defined path, the following example can be used: +`srun -N 10 -n 320 --export="ALL" /bin/bash -c "export LIBGKFS_LOG=trace_reads LIBGKFS_LOG_OUTPUT=${HOME}/test/GLOBAL.txt;LD_PRELOAD=${GKFS_PRLD} "` + +Then, the `examples/distributors/guided/generate.py` script is used to create the guided distributor configuration file: + +* `python examples/distributors/guided/generate.py ~/test/GLOBAL.txt >> guided_config.txt` + +Finally, adapt `guided_config.txt` to your distribution requirements. + +## Metadata Backends + +There are two different metadata backends in GekkoFS. The default one uses `rocksdb`; however, an alternative based +on `PARALLAX` from `FORTH` +is available. To enable it, use the `-DGKFS_ENABLE_PARALLAX:BOOL=ON` option. You can also disable `rocksdb` +with `-DGKFS_ENABLE_ROCKSDB:BOOL=OFF`. + +Once it is enabled, the `--dbbackend` option becomes functional. + +## CMake options #### Core - `GKFS_BUILD_TOOLS` - Build tools (default: OFF) @@ -495,7 +539,7 @@ srun: sending Ctrl-C to StepId=282378.2 - `GKFS_ENABLE_ROCKSDB` - Enable RocksDB metadata backend (default: ON) - `GKFS_ENABLE_PARALLAX` - Enable Parallax metadata support (default: OFF) -## All environment variables +## Environment variables The GekkoFS daemon, client, and proxy support a number of environment variables to augment its functionality: ### Client @@ -545,7 +589,7 @@ until the file is closed. The cache does not impact the consistency of the file - `GKFS_PROXY_LOG_PATH` - Path to the log file of the proxy. - `GKFS_PROXY_LOG_LEVEL` - Log level of the proxy. Available levels are: `off`, `critical`, `err`, `warn`, `info`, `debug`, `trace`. -## Acknowledgment +# Acknowledgment This software was partially supported by the EC H2020 funded NEXTGenIO project (Project ID: 671951, www.nextgenio.eu).
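The Simple Hash and replication descriptions in the README above can be illustrated with a minimal Python sketch. It is not the GekkoFS implementation: the 512 KiB chunk size, the SHA-256-based hash of the path and chunk id, and the rule that replica *i* is placed on host `(primary + i) % num_hosts` are all assumptions made for illustration. Under that assumed successor rule, the documented limit of `number of servers - 1` replicas guarantees that all copies of a chunk land on distinct daemons.

```python
import hashlib

# Assumed chunk size in bytes (hypothetical value for this sketch).
CHUNK_SIZE = 512 * 1024


def chunk_ids(offset: int, count: int, chunk_size: int = CHUNK_SIZE) -> range:
    """Chunk ids touched by an I/O of `count` bytes starting at `offset`."""
    if count <= 0:
        return range(0)
    first = offset // chunk_size
    last = (offset + count - 1) // chunk_size
    return range(first, last + 1)


def primary_host(path: str, chunk_id: int, num_hosts: int) -> int:
    """Hash-based placement: a stable hash of (path, chunk id) modulo the host count.

    hashlib is used instead of the built-in hash(), whose string hashing is
    randomized per process and would give each client a different placement.
    """
    key = f"{path}\0{chunk_id}".encode()  # separator avoids "/a1"+"2" == "/a"+"12"
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "little") % num_hosts


def replica_hosts(path: str, chunk_id: int, num_hosts: int, num_repl: int = 0) -> list:
    """Primary host followed by `num_repl` replicas on successive hosts (assumed scheme)."""
    if not 0 <= num_repl <= num_hosts - 1:
        raise ValueError("num_repl must be between 0 and num_hosts - 1")
    primary = primary_host(path, chunk_id, num_hosts)
    return [(primary + i) % num_hosts for i in range(num_repl + 1)]


if __name__ == "__main__":
    # A 200,000-byte write at offset 1,000,000 touches chunks 1 and 2.
    for cid in chunk_ids(1_000_000, 200_000):
        print(cid, replica_hosts("/data/out.bin", cid, num_hosts=4, num_repl=1))
```

Because the target host depends only on the path and the chunk id, every client computes the same placement independently, without asking a central server where a chunk lives.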
diff --git a/docs/sphinx/conf.py.in b/docs/sphinx/conf.py.in index 16d706bd76c5e6a22325870ceb15cec2341f9a55..d166d04898d3346e13eead29ad3001d8c06d93bc 100644 --- a/docs/sphinx/conf.py.in +++ b/docs/sphinx/conf.py.in @@ -22,10 +22,10 @@ copyright = ['2018-2024, Barcelona Supercomputing Center, Spain', '2015-2024, Jo author = 'GekkoFS committers' # The short X.Y version -version = '0.9.2' +version = '0.9.3' # The full version, including alpha/beta/rc tags -release = '0.9.2-snapshot+96-g9cafaaa3-dirty' +release = '0.9.3-snapshot+96-g9cafaaa3-dirty' # Tell sphinx what the primary language being documented is. primary_domain = 'cpp' diff --git a/docs/sphinx/users/scripts.rst b/docs/sphinx/users/scripts.rst index 87c4038fb72abea8cbf5ed693aae1662649449cb..9506f44e85601364efc0acb619d668d29b990ab8 100644 --- a/docs/sphinx/users/scripts.rst +++ b/docs/sphinx/users/scripts.rst @@ -24,14 +24,14 @@ on the specifics of the particular GekkoFS build, both scripts rely on :code:`configuration profiles` which define a set of related software packages which should be downloaded and installed for a specific GekkoFS version and/or configuration. To illustrate this, let's take a look at the -contents of the :code:`default` profile for GekkoFS version :code:`0.9.2`: +contents of the :code:`default` profile for GekkoFS version :code:`0.9.3`: .. code-block:: console - $ dl_dep.sh -l default:0.9.2 - Configuration profiles for '0.9.2': + $ dl_dep.sh -l default:0.9.3 + Configuration profiles for '0.9.3': - * default:0.9.2 (/home/user/gekkofs/source/scripts/profiles/0.9.2/default.specs) + * default:0.9.3 (/home/user/gekkofs/source/scripts/profiles/0.9.3/default.specs) All dependencies @@ -59,10 +59,10 @@ supercomputer) followed by an optional :code:`VERSION_TAG`. .. 
code-block:: console - $ ./dl_dep.sh -p default:0.9.2 /home/user/gfks/deps + $ ./dl_dep.sh -p default:0.9.3 /home/user/gfks/deps Destination path is set to "/tmp/foo" Profile name: default - Profile version: 0.9.2 + Profile version: 0.9.3 ------------------------------------ Downloaded 'https://github.com/lz4/lz4/archive/v1.9.3.tar.gz' to 'lz4' Downloaded 'https://github.com/json-c/json-c/archive/json-c-0.15-20200726.tar.gz' to 'json-c' @@ -90,10 +90,10 @@ option. In this case, dependency names follow the .. code-block:: console - $ ./dl_dep.sh -d mercury@default:0.9.2 /home/user/gfks/deps + $ ./dl_dep.sh -d mercury@default:0.9.3 /home/user/gfks/deps Destination path is set to "/tmp/foo" Profile name: default - Profile version: 0.9.2 + Profile version: 0.9.3 ------------------------------------ Cloned 'https://github.com/mercury-hpc/mercury' to 'mercury' with commit '[v2.1.0]' and flags '--recurse-submodules' Done @@ -115,12 +115,12 @@ certain directory (e.g. :code:`/home/user/gkfs/deps`), the .. code-block:: console - $ ./compile_dep.sh -p default:0.9.2 /home/user/gkfs/deps /home/user/gkfs/install -j8 + $ ./compile_dep.sh -p default:0.9.3 /home/user/gkfs/deps /home/user/gkfs/install -j8 CORES = 8 (default) Sources download path = /tmp/foo Installation path = /tmp/bar Profile name: default - Profile version: 0.9.2 + Profile version: 0.9.3 ------------------------------------ ######## Installing: lz4 ############################### ... diff --git a/scripts/compile_dep.sh b/scripts/compile_dep.sh index d455bfa94e3c15f83f22190bddccb57723e68c69..66a454591c5dccde149f3c6ad53dde58a8ee702a 100755 --- a/scripts/compile_dep.sh +++ b/scripts/compile_dep.sh @@ -80,7 +80,7 @@ optional arguments: deploy specific library versions and/or configurations, using a recognizable name. Optionally, PROFILE_NAME may include a specific version for the profile, e.g. 
'mogon2:latest' or - 'ngio:0.9.2', which will download the dependencies defined for + 'ngio:0.9.3', which will download the dependencies defined for that specific version. If unspecified, the 'default:latest' profile will be used, which should include all the possible dependencies. -d, --dependency DEPENDENCY_NAME[[@PROFILE_NAME][:PROFILE_VERSION]] diff --git a/scripts/dl_dep.sh b/scripts/dl_dep.sh index 0509510e1e3bfe039eb513e7bc68cda2b90b1c53..051c10eecce8fa1d1c2c6560789fff8d61820746 100755 --- a/scripts/dl_dep.sh +++ b/scripts/dl_dep.sh @@ -336,7 +336,7 @@ optional arguments: deploy specific library versions and/or configurations, using a recognizable name. Optionally, PROFILE_NAME may include a specific version for the profile, e.g. 'mogon2:latest' or - 'ngio:0.9.2', which will download the dependencies defined for + 'ngio:0.9.3', which will download the dependencies defined for that specific version. If unspecified, the 'default:latest' profile will be used, which should include all the possible dependencies. -d, --dependency DEPENDENCY_NAME[[@PROFILE_NAME][:PROFILE_VERSION]] diff --git a/scripts/gkfs_dep.sh b/scripts/gkfs_dep.sh index a0990555474226a84c68f869feabc3e355559eaa..2fc0369e00edfa1c8a7967ce1b4f24f48687bf20 100755 --- a/scripts/gkfs_dep.sh +++ b/scripts/gkfs_dep.sh @@ -62,7 +62,7 @@ optional arguments: deploy specific library versions and/or configurations, using a recognizable name. Optionally, PROFILE_NAME may include a specific version for the profile, e.g. 'mogon2:latest' or - 'ngio:0.9.2', which will download the dependencies defined for + 'ngio:0.9.3', which will download the dependencies defined for that specific version. If unspecified, the 'default:latest' profile will be used, which should include all the possible dependencies. -d, --dependency DEPENDENCY_NAME[[@PROFILE_NAME][:PROFILE_VERSION]]