Commit 05874f69 authored by Marc Vef

Merge branch 'release-0.9.3' into 'master'

Release 0.9.3


See merge request !199
parents c300a359 c9e64c5e

## [Unreleased]
### New
### Changed
### Removed
### Fixed

## [0.9.3] - 2024-07
### New
- Added a write size cache to the file system client to reduce potential metadata network bottlenecks during small I/O
  operations ([!193](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/193)).
  - The cache is experimental and thus disabled by default. Added the following environment variables.
- Added replication support with a configurable number of
  replicas ([!166](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/141)).
  - Modified writes and reads to use a bitset instead of the traditional hash per chunk in the server.
  - Added reattempt support in `get_fs_config`: other servers are contacted when the initial server fails.


### Changed

- Updated GekkoFS dependencies, migrating to
  margo-shim-hg ([!165](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/165)).
  - Improves RPC stability
- Made the use of a syscall for following symlinks optional
  ([!183](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/183)).


### Removed
### Fixed

- Fixed an issue where the last modified time of a file was updated during `stat`
  operations ([!176](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/176)).
- Fixed a dependency conflict with the pytest dependency `marshmallow` ([!197](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/197)).
This storage space allows HPC applications and simulations to run in isolation from each other with regard
to I/O, which reduces interference and improves performance.

# Table of contents

- [Dependencies](#dependencies)
  - [Debian/Ubuntu](#debianubuntu)
  - [CentOS/Red Hat](#centosred-hat)
- [Step-by-step installation](#step-by-step-installation)
- [Run GekkoFS](#run-gekkofs)
  - [The GekkoFS hostsfile](#the-gekkofs-hostsfile)
  - [The GekkoFS daemon](#the-gekkofs-daemon)
    - [Manual startup and shut down](#manual-startup-and-shut-down)
    - [GekkoFS daemon orchestration via the gkfs script (recommended)](#gekkofs-daemon-orchestration-via-the-gkfs-script-recommended)
  - [The GekkoFS client library](#the-gekkofs-client-library)
    - [Interposition library via system call interception](#interposition-library-via-system-call-interception)
    - [User library via linking against the application](#user-library-via-linking-against-the-application)
  - [Logging](#logging)
- [Advanced and experimental features](#advanced-and-experimental-features)
  - [Rename](#rename)
  - [Replication](#replication)
  - [Client-side metrics via MessagePack and ZeroMQ](#client-side-metrics-via-messagepack-and-zeromq)
  - [Server-side statistics via Prometheus](#server-side-statistics-via-prometheus)
  - [GekkoFS proxy](#gekkofs-proxy)
  - [File system expansion](#file-system-expansion)
- [Miscellaneous](#miscellaneous)
  - [External functions](#external-functions)
  - [Data placement](#data-placement)
    - [Simple Hash (Default)](#simple-hash-default)
    - [Guided Distributor](#guided-distributor)
  - [Metadata Backends](#metadata-backends)
  - [CMake options](#cmake-options)
  - [Environment variables](#environment-variables)
- [Acknowledgment](#acknowledgment)

# Dependencies

- \>gcc-12 (including g++) for C++17 support
- General build tools: Git, Curl, CMake >3.13, Autoconf, Automake
- Miscellaneous: Libtool, Libconfig

### Debian/Ubuntu
GekkoFS is now available at:
- GekkoFS daemon (server): `<install_path>/bin/gkfs_daemon`
- GekkoFS client interception library: `<install_path>/lib64/libgkfs_intercept.so`

## Spack for installing GekkoFS (alternative)

The Spack tool can be used to easily install GekkoFS and its dependencies. Refer to the
following [README](scripts/spack/README.md) for details.

# Run GekkoFS

A daemon (the `gkfs_daemon` binary) must be started on each node. Tools such as `srun`, `mpiexec/mpirun`, `pdsh`,
or `pssh` can be used to launch the binary on many nodes.


*Note: NFS is not strongly consistent and cannot be used for the hosts file!*
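As a minimal sketch of how a client session is wired up (the two-column hostsfile layout below is an assumption for illustration; in practice the daemons write the real entries themselves):

```bash
# Hypothetical hostsfile contents (format assumed for illustration;
# the daemons register themselves in this file at startup).
HOSTSFILE=$(mktemp)
cat > "$HOSTSFILE" <<'EOF'
node01 ofi+sockets://10.0.0.1:4433
node02 ofi+sockets://10.0.0.2:4433
node03 ofi+sockets://10.0.0.3:4433
EOF

# Clients only ever need the path to this shared file:
export LIBGKFS_HOSTS_FILE="$HOSTSFILE"

# Sanity check: one line per registered daemon.
NUM_DAEMONS=$(wc -l < "$HOSTSFILE")
echo "registered daemons: $NUM_DAEMONS"
```

The address scheme (`ofi+sockets://...`) is purely illustrative and depends on the transport GekkoFS was built with.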

## The GekkoFS daemon

The GekkoFS daemon is the server component of GekkoFS. It is responsible for managing the file system's data and metadata. There are two options for running the daemons on one or several nodes: (1) manually, by executing the `gkfs_daemon` binary directly, or (2) via the `gkfs` script (recommended).

### Manual startup and shut down

tl;dr example: `<install_path>/bin/gkfs_daemon -r <fs_data_path> -m <pseudo_gkfs_mount_dir_path> -H <hostsfile_path>`


Shut it down by gracefully killing the process (SIGTERM).
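The manual lifecycle can be scripted per node; the following sketch only assembles the command line from the placeholders used in this section and sends SIGTERM for shutdown (all paths are assumptions about your installation):

```bash
# Placeholders: adjust to your installation.
GKFS_DAEMON=/opt/gekkofs/bin/gkfs_daemon
ROOTDIR=/dev/shm/gkfs_data        # node-local data location
MOUNTDIR=/tmp/gkfs_mount          # pseudo mount directory
HOSTSFILE=$HOME/gkfs_hosts.txt    # shared across nodes (not on NFS)

CMD="$GKFS_DAEMON -r $ROOTDIR -m $MOUNTDIR -H $HOSTSFILE"
echo "daemon command: $CMD"

# Only start the daemon if the binary is actually present:
if [ -x "$GKFS_DAEMON" ]; then
    $CMD &
    DAEMON_PID=$!
    # ... run applications against $MOUNTDIR ...
    kill -TERM "$DAEMON_PID"      # graceful shutdown (SIGTERM)
fi
```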

### GekkoFS daemon orchestration via the `gkfs` script (recommended)

The `scripts/run/gkfs` script can be used to simplify starting the GekkoFS daemon on one or multiple nodes. To start
GekkoFS on multiple nodes, a Slurm environment that can execute `srun` is required. Users can further
```bash
usage: gkfs [-h/--help] [-r/--rootdir <path>] [-m/--mountdir <path>] [-a/--args
            -v, --verbose           Increase verbosity
```

## The GekkoFS client library

### Interposition library via system call interception

tl;dr example:

```bash
export LIBGKFS_HOSTS_FILE=<hostfile_path>
LD_PRELOAD=<install_path>/lib64/libgkfs_intercept.so cp ~/some_input_data <pseudo_gkfs_mount_dir_path>/some_input_data
LD_PRELOAD=<install_path>/lib64/libgkfs_intercept.so md5sum ~/some_input_data <pseudo_gkfs_mount_dir_path>/some_input_data
```

Clients read the hostsfile to determine which daemons are part of the GekkoFS instance. Because the client is an
interposition library loaded within the context of the application, this information is passed via the
environment variable `LIBGKFS_HOSTS_FILE`, which points to the hostsfile path. The client library itself is loaded for each
application process via the `LD_PRELOAD` environment variable and intercepts file system related calls. Calls to paths
within (or hierarchically under) the GekkoFS mount directory are processed in the library; all others are
passed to the kernel.
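The routing decision can be pictured with a small shell sketch (a simplification of what the preloaded library does internally on normalized, absolute paths):

```bash
MOUNTDIR=/tmp/gkfs_mount

# Succeeds (returns 0) if the path is the mount dir itself or below it;
# fails otherwise. Note that /tmp/gkfs_mount2 must NOT match, which is
# why the pattern requires either an exact match or a trailing slash.
is_gkfs_path() {
    case "$1" in
        "$MOUNTDIR"|"$MOUNTDIR"/*) return 0 ;;
        *) return 1 ;;
    esac
}

is_gkfs_path /tmp/gkfs_mount/data/file1 && echo "handled by libgkfs"
is_gkfs_path /home/user/file2 || echo "forwarded to the kernel"
```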

Note: if `LD_PRELOAD` does not point to the library and the client is therefore not loaded, the mount directory appears
to be empty.

For MPI applications, the `LD_PRELOAD` variable can be passed with the `-x` argument of `mpirun/mpiexec`.

### User library via linking against the application

GekkoFS offers a user library that can be linked against the application and is built by default as the
`libgkfs_user_lib.so` shared library. The corresponding API and developer headers are available in
`include/client/user_functions.hpp`. Please consult `examples/user_library` for details.

In this case, `LD_PRELOAD` is not necessary. Nevertheless, `LIBGKFS_HOSTS_FILE` is still required.

## Logging

The following environment variables can be used to enable logging in the client library: `LIBGKFS_LOG=<module>`
and `LIBGKFS_LOG_OUTPUT=<path/to/file>` to configure the output module and set the path to the log file of the client
For the daemon, the `GKFS_DAEMON_LOG_PATH=<path/to/file>` environment variable can be used to specify the path to the
log file, and the log module can be selected with the `GKFS_DAEMON_LOG_LEVEL={off,critical,err,warn,info,debug,trace}`
environment variable.
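Putting the daemon-side variables together (the log paths are placeholders; the levels are exactly those listed above):

```bash
# Daemon logging: path and level, as described above.
export GKFS_DAEMON_LOG_PATH=/tmp/gkfs_daemon.log
export GKFS_DAEMON_LOG_LEVEL=info   # off|critical|err|warn|info|debug|trace

# Client logging works analogously via LIBGKFS_LOG / LIBGKFS_LOG_OUTPUT:
export LIBGKFS_LOG_OUTPUT=/tmp/gkfs_client.log
```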

# Advanced and experimental features

## Rename

`-DGKFS_RENAME_SUPPORT` allows the application to rename files.
This is an experimental feature, and some scenarios may not work properly.
Support for `fstat` on renamed files is included.

This is disabled by default.

## Replication

The user can enable the data replication feature by setting the replication environment variable
`LIBGKFS_NUM_REPL=<num repl>`.
The number of replicas can range from `0` to the number of servers minus one. The replication environment variable can
be set independently for each client.
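A small sketch of choosing a valid replica count from the hostsfile (the one-line-per-daemon layout is an assumption used for illustration):

```bash
# Hypothetical hostsfile with four daemons (one line each):
HOSTSFILE=$(mktemp)
printf '%s\n' 'node01 addr1' 'node02 addr2' 'node03 addr3' 'node04 addr4' > "$HOSTSFILE"

NUM_SERVERS=$(wc -l < "$HOSTSFILE")
MAX_REPL=$((NUM_SERVERS - 1))   # valid range: 0 .. number of servers - 1

# Request one replica per chunk, capped at the maximum:
REPL=1
if [ "$REPL" -gt "$MAX_REPL" ]; then REPL=$MAX_REPL; fi
export LIBGKFS_NUM_REPL=$REPL
rm -f "$HOSTSFILE"
```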

## Client-side metrics via MessagePack and ZeroMQ

GekkoFS clients support capturing the I/O traces of each individual process and periodically exporting them to a given
file or ZeroMQ sink via the TCP protocol.
```
total_bytes: 1802366
total_iops: 4
```

## Server-side statistics via Prometheus

GekkoFS daemons are able to output general operations (`--enable-collection`) and data chunk
statistics (`--enable-chunkstats`) to a specified output file via `--output-stats <FILE>`. Prometheus can also be used
instead or in addition to the output file. It must be enabled at compile time via the CMake
argument `-DGKFS_ENABLE_PROMETHEUS` and the daemon argument `--enable-prometheus`. The corresponding statistics are then
pushed to the Prometheus instance.

## GekkoFS proxy

The GekkoFS proxy is an additional (alternative) component that runs on each client and acts as a gateway between the
client and the daemons. It can improve network stability, e.g., for opa-psm2, and provides a basis for future asynchronous
Please consult `include/config.hpp` for additional configuration options. Note, GekkoFS proxy does not support
replication.

## File system expansion

GekkoFS supports extending the current daemon configuration to additional compute nodes. This includes redistributing
the existing data and metadata, thereby scaling the performance and capacity of the existing file system. Note,
```
srun: sending Ctrl-C to StepId=282378.2
* [gkfs] Shutdown time: 1.032 seconds
```

# Miscellaneous

## External functions

GekkoFS allows to use external functions on your client code, via LD_PRELOAD.
Source code needs to be compiled with -fPIC. We include a pfind io500 substitution,
`examples/gfind/gfind.cpp` and a non-mpi version `examples/gfind/sfind.cpp`

## Data placement

The data distribution can be selected at compile time; two distributors are available:

### Simple Hash (Default)

Chunks are distributed randomly to the different GekkoFS servers.
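The effect can be imitated with a checksum-based sketch ("random" here means pseudo-random but deterministic; the hash below is only illustrative and is not GekkoFS's actual hash function):

```bash
NSERVERS=4

# Maps a (path, chunk index) pair to a server id in 0..NSERVERS-1.
# cksum stands in for the real hash; the mapping is deterministic,
# so every client computes the same placement without coordination.
target_server() {
    key="$1:$2"    # e.g., "/data/file1:7"
    sum=$(printf '%s' "$key" | cksum | cut -d' ' -f1)
    echo $((sum % NSERVERS))
}

for chunk in 0 1 2 3; do
    echo "chunk $chunk -> server $(target_server /data/file1 "$chunk")"
done
```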

### Guided Distributor

The guided distributor allows defining a specific distribution of data on a per-directory or per-file basis.
The distribution configurations are defined within a shared file (called `guided_config.txt` henceforth) with the
following format:
`<path> <chunk_number> <host>`

To enable the distributor, the following CMake compilation flags are required:

* `GKFS_USE_GUIDED_DISTRIBUTION` ON
* `GKFS_USE_GUIDED_DISTRIBUTION_PATH` `<path_guided_config.txt>`

To use a custom distribution, a path needs the prefix `#` (e.g., `#/mdt-hard 0 0`), in which case all data of all
files in that directory is placed with the metadata.
Note that a chunk/host configuration is automatically inherited by all child files, even without the prefix.
In this example, `/mdt-hard/file1` therefore uses the same distribution as the `/mdt-hard` directory.
If no prefix is used, the Simple Hash distributor is used.
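A hypothetical `guided_config.txt` following the `<path> <chunk_number> <host>` format above might look like this (paths and host ids are made up for illustration):

```bash
cat > guided_config.txt <<'EOF'
#/mdt-hard 0 0
/scratch/output.dat 0 2
/scratch/output.dat 1 3
EOF

# The file is then baked in at compile time, e.g. (flag names from above):
#   cmake -DGKFS_USE_GUIDED_DISTRIBUTION:BOOL=ON \
#         -DGKFS_USE_GUIDED_DISTRIBUTION_PATH=$PWD/guided_config.txt ...
ENTRIES=$(wc -l < guided_config.txt)
echo "config entries: $ENTRIES"
rm -f guided_config.txt
```

Here `#/mdt-hard 0 0` pins data of everything under `/mdt-hard` to its metadata location, while the two `/scratch/output.dat` lines send chunks 0 and 1 to the (hypothetical) hosts 2 and 3.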

#### Guided configuration file

Creating a guided configuration file is based on an I/O trace file of a previous execution of the application.
For this, the `trace_reads` tracing module is used (see above).

The `trace_reads` module enables a `TRACE_READS` level log at the clients, writing the I/O information of each client,
which is used as the input for a script that creates the guided distributor setting.
Note that capturing the necessary trace records can involve performance degradation.
To capture the I/O of each client within a Slurm environment, i.e., enabling the `trace_reads` module and printing its
output to a user-defined path, the following example can be used:
`srun -N 10 -n 320 --export="ALL" /bin/bash -c "export LIBGKFS_LOG=trace_reads; export LIBGKFS_LOG_OUTPUT=${HOME}/test/GLOBAL.txt; LD_PRELOAD=${GKFS_PRLD} <app>"`

Then, the `examples/distributors/guided/generate.py` script is used to create the guided distributor configuration file:

* `python examples/distributors/guided/generate.py ~/test/GLOBAL.txt >> guided_config.txt`

Finally, modify `guided_config.txt` to your distribution requirements.

## Metadata Backends

There are two different metadata backends in GekkoFS. The default one uses `rocksdb`; however, an alternative based
on `PARALLAX` from `FORTH` is available. To enable it, use the `-DGKFS_ENABLE_PARALLAX:BOOL=ON` option; you can also
disable `rocksdb` with `-DGKFS_ENABLE_ROCKSDB:BOOL=OFF`.

Once it is enabled, the `--dbbackend` option will be functional.

## CMake options

#### Core
- `GKFS_BUILD_TOOLS` - Build tools (default: OFF)
- `GKFS_ENABLE_ROCKSDB` - Enable RocksDB metadata backend (default: ON)
- `GKFS_ENABLE_PARALLAX` - Enable Parallax metadata support (default: OFF)

## Environment variables
The GekkoFS daemon, client, and proxy support a number of environment variables to augment their functionality:

### Client
### Proxy
- `GKFS_PROXY_LOG_PATH` - Path to the log file of the proxy.
- `GKFS_PROXY_LOG_LEVEL` - Log level of the proxy. Available levels are: `off`, `critical`, `err`, `warn`, `info`, `debug`, `trace`.

# Acknowledgment

This software was partially supported by the EC H2020 funded NEXTGenIO project (Project ID: 671951, www.nextgenio.eu).

author = 'GekkoFS committers'

# The short X.Y version
version = '0.9.3'

# The full version, including alpha/beta/rc tags
release = '0.9.3-snapshot+96-g9cafaaa3-dirty'

# Tell sphinx what the primary language being documented is.
primary_domain = 'cpp'
Depending on the specifics of the particular GekkoFS build, both scripts rely on
:code:`configuration profiles` which define a set of related software
packages which should be downloaded and installed for a specific GekkoFS
version and/or configuration. To illustrate this, let's take a look at the
contents of the :code:`default` profile for GekkoFS version :code:`0.9.3`:

.. code-block:: console

    $ dl_dep.sh -l default:0.9.3
    Configuration profiles for '0.9.3':

    * default:0.9.3 (/home/user/gekkofs/source/scripts/profiles/0.9.3/default.specs)

      All dependencies


.. code-block:: console

    $ ./dl_dep.sh -p default:0.9.3 /home/user/gfks/deps
    Destination path is set to  "/tmp/foo"
    Profile name: default
    Profile version: 0.9.3
    ------------------------------------
    Downloaded 'https://github.com/lz4/lz4/archive/v1.9.3.tar.gz' to 'lz4'
    Downloaded 'https://github.com/json-c/json-c/archive/json-c-0.15-20200726.tar.gz' to 'json-c'

.. code-block:: console

   $ ./dl_dep.sh -d mercury@default:0.9.3 /home/user/gfks/deps
   Destination path is set to  "/tmp/foo"
   Profile name: default
   Profile version: 0.9.3
   ------------------------------------
   Cloned 'https://github.com/mercury-hpc/mercury' to 'mercury' with commit '[v2.1.0]' and flags '--recurse-submodules'
   Done

.. code-block:: console

    $ ./compile_dep.sh -p default:0.9.3 /home/user/gkfs/deps /home/user/gkfs/install -j8
    CORES = 8 (default)
    Sources download path = /tmp/foo
    Installation path = /tmp/bar
    Profile name: default
    Profile version: 0.9.3
    ------------------------------------
    ######## Installing:  lz4 ###############################
    ...
optional arguments:
                deploy specific library versions and/or configurations,
                using a recognizable name. Optionally, PROFILE_NAME may include
                a specific version for the profile, e.g. 'mogon2:latest' or
                'ngio:0.9.3', which will download the dependencies defined for
                that specific version. If unspecified, the 'default:latest' profile
                will be used, which should include all the possible dependencies.
    -d, --dependency DEPENDENCY_NAME[[@PROFILE_NAME][:PROFILE_VERSION]]