Verified Commit 49263be8 authored by Marc Vef's avatar Marc Vef

Cleanup, Readme, changelog.

parent 0f42da53
+6 −0
@@ -8,6 +8,12 @@ to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### New

- Added file system expansion support ([!196](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/196)).
  - Added the tool `gkfs_malleability` to steer start, status, and finalize requests for expansion operations.
  - `-DGKFS_BUILD_TOOLS=ON` must be set for CMake to build the tool.
  - Overhauled the `gkfs` run script to accommodate the new tool.
  - During expansion, redistribution of data is performed by the daemons. Therefore, an RPC client for daemons was added.
  - See Readme for usage details.
- Propagate PKG_CONFIG_PATH to dependency scripts ([!185](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/185)).
- Added syscall support for the listxattr family ([!186](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/186)).
- Removed one RPC per operation as an optimization ([!195](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/195)).
+63 −9
@@ -159,7 +159,7 @@ to be empty.

For MPI applications, the `LD_PRELOAD` variable can be passed with the `-x` argument for `mpirun/mpiexec`.

## Run GekkoFS daemons on multiple nodes (beta version!)
## Run GekkoFS daemons on multiple nodes

The `scripts/run/gkfs` script can be used to simplify starting the GekkoFS daemon on one or multiple nodes. To start
GekkoFS on multiple nodes, a Slurm environment that can execute `srun` is required. Users can further
@@ -168,9 +168,9 @@ modify `scripts/run/gkfs.conf` to mold default configurations to their environme
The following options are available for `scripts/run/gkfs`:

```bash
usage: gkfs [-h/--help] [-r/--rootdir <path>] [-m/--mountdir <path>] [-a/--args <daemon_args>] [-f/--foreground <false>]
        [--srun <false>] [-n/--numnodes <jobsize>] [--cpuspertask <64>] [--numactl <false>] [-v/--verbose <false>]
        {start,stop}
usage: gkfs [-h/--help] [-r/--rootdir <path>] [-m/--mountdir <path>] [-a/--args <daemon_args>] [--proxy <false>] [-f/--foreground <false>]
        [--srun <false>] [-n/--numnodes <jobsize>] [--cpuspertask <64>] [-v/--verbose <false>]
        {start,expand,stop}


    This script simplifies starting and stopping GekkoFS daemons. If daemons are started on multiple nodes,
@@ -178,21 +178,23 @@ usage: gkfs [-h/--help] [-r/--rootdir <path>] [-m/--mountdir <path>] [-a/--args
    additional permanent configurations can be set.

    positional arguments:
            command                 Command to execute: 'start' and 'stop'
            COMMAND                 Command to execute: 'start', 'stop', 'expand'

    optional arguments:
            -h, --help              Shows this help message and exits
            -r, --rootdir <path>    Providing the rootdir path for GekkoFS daemons.
            -m, --mountdir <path>   Providing the mountdir path for GekkoFS daemons.
            -a, --args <daemon_arguments>
            -r, --rootdir <path>    The rootdir path for GekkoFS daemons.
            -m, --mountdir <path>   The mountdir path for GekkoFS daemons.
            -d, --daemon_args <daemon_arguments>
                                    Add various additional daemon arguments, e.g., "-l ib0 -P ofi+psm2".
            --proxy                 Start the proxy after the daemons are running.
            -p, --proxy_args <proxy_arguments>
            -f, --foreground        Starts the script in the foreground. Daemons are stopped by pressing 'q'.
            --srun                  Use srun to start daemons on multiple nodes.
            -n, --numnodes <n>      GekkoFS daemons are started on n nodes.
                                    Nodelist is extracted from Slurm via the SLURM_JOB_ID env variable.
            --cpuspertask <#cores>  Set the number of cores the daemons can use. Must use '--srun'.
            --numactl               Use numactl for the daemon. Modify gkfs.conf for further numactl configurations.
            -c, --config            Path to the configuration file. By default, looks for a 'gkfs.conf' in this directory.
            -e, --expand_hostfile   Path to the hostfile listing the new nodes to which GekkoFS should be extended (one node per line).
            -v, --verbose           Increase verbosity
```

@@ -415,6 +417,58 @@ Press 'q' to exit
Please consult `include/config.hpp` for additional configuration options. Note that the GekkoFS proxy does not support
replication.

### File system expansion

GekkoFS supports extending the current daemon configuration to additional compute nodes. This includes redistributing
the existing data and metadata, thereby scaling both the performance and the capacity of the file system. Note that it
is the user's responsibility not to access the GekkoFS file system during redistribution; a corresponding feature that
is transparent to the user is planned. Note also that if GekkoFS proxies are used, they must be restarted manually
after expansion.

To enable this feature, the CMake flag `-DGKFS_BUILD_TOOLS=ON` is required to build the `gkfs_malleability` tool. The
tool is then available in the `build/tools` directory; consult its `-h` output for the available arguments. While the
tool can be used manually to expand the file system, the `scripts/run/gkfs` script, which invokes `gkfs_malleability`
internally, should be used instead.

The only requirement for extending the file system is a hostfile containing the hostnames/IPs of the new nodes (one
line per host). The following example starts the file system; `DAEMON_NODELIST` in `gkfs.conf` is set to a hostfile
containing the initial set of file system nodes:

```bash
~/gekkofs/scripts/run/gkfs -c ~/run/gkfs_verbs_expandtest.conf start
* [gkfs] Starting GekkoFS daemons (4 nodes) ...
* [gkfs] GekkoFS daemons running
* [gkfs] Startup time: 10.853 seconds
```

... Some computation ...

Next, expand the file system, using `-e <hostfile>` to specify the new nodes. Redistribution is performed automatically
with a progress bar. When finished, the file system is ready to use in the new configuration:

```bash
~/gekkofs/scripts/run/gkfs -c ~/run/gkfs_verbs_expandtest.conf -e ~/hostfile_expand expand
* [gkfs] Starting GekkoFS daemons (8 nodes) ...
* [gkfs] GekkoFS daemons running
* [gkfs] Startup time: 1.058 seconds
Expansion process from 4 nodes to 12 nodes launched...
* [gkfs] Expansion progress:
[####################] 0/4 left
* [gkfs] Redistribution process done. Finalizing ...
* [gkfs] Expansion done.
```

Stop the file system:

```bash
~/gekkofs/scripts/run/gkfs -c ~/run/gkfs_verbs_expandtest.conf stop
* [gkfs] Stopping daemon with pid 16462
srun: sending Ctrl-C to StepId=282378.1
* [gkfs] Stopping daemon with pid 16761
srun: sending Ctrl-C to StepId=282378.2
* [gkfs] Shutdown time: 1.032 seconds
```

## Acknowledgment

This software was partially supported by the EC H2020 funded NEXTGenIO project (Project ID: 671951, www.nextgenio.eu).
+0 −2
@@ -449,8 +449,6 @@ add_daemons() {
    NODE_CNT_EXPAND=$((${node_cnt_initial}+$(cat ${EXPAND_NODELIST} | wc -l)))
    # start new set of daemons
    start_daemons
    # TODO REMOVE
#    sed -i '0,/evie/! s/evie/evie2/' ${HOSTSFILE}
    export LIBGKFS_HOSTS_FILE=${HOSTSFILE}
    # start expansion which redistributes metadata and data
    ${GKFS_MALLEABILITY_BIN_} expand start
+4 −1
@@ -6,7 +6,10 @@ DAEMON_BIN=../../build/src/daemon/gkfs_daemon
PROXY_BIN=../../build/src/proxy/gkfs_proxy

# client configuration (needs to be set for all clients)
LIBGKFS_HOSTS_FILE=/home/evie/workdir/gkfs_hosts.txt
LIBGKFS_HOSTS_FILE=/home/XXX/workdir/gkfs_hosts.txt

# tools (if built)
GKFS_MALLEABILITY_BIN=../../build/tools/gkfs_malleability

## daemon configuration
#DAEMON_ROOTDIR=/dev/shm/vef_gkfs_rootdir
+7 −3
@@ -51,6 +51,8 @@ namespace fs = std::filesystem;

namespace gkfs::malleable {

// TODO The following three functions are almost identical to the proxy code.
// They should be moved to a common location shared between the proxy and the daemon.
vector<pair<string, string>>
MalleableManager::load_hostfile(const std::string& path) {

@@ -198,7 +200,7 @@ int
MalleableManager::redistribute_metadata() {
    uint64_t count = 0;
    auto estimate_db_size = GKFS_DATA->mdb()->db_size();
    auto percent_interval = estimate_db_size / 1000;
    auto percent_interval = estimate_db_size / 100;
    GKFS_DATA->spdlogger()->info(
            "{}() Starting metadata redistribution for '{}' estimated number of KV pairs...",
            __func__, estimate_db_size);
@@ -206,6 +208,7 @@ MalleableManager::redistribute_metadata() {
    string key, value;
    auto iter =
            static_cast<rocksdb::Iterator*>(GKFS_DATA->mdb()->iterate_all());
    // TODO parallelize
    for(iter->SeekToFirst(); iter->Valid(); iter->Next()) {
        key = iter->key().ToString();
        value = iter->value().ToString();
@@ -213,11 +216,11 @@ MalleableManager::redistribute_metadata() {
            continue;
        }
        auto dest_id = RPC_DATA->distributor()->locate_file_metadata(key, 0);
        GKFS_DATA->spdlogger()->info(
        GKFS_DATA->spdlogger()->trace(
                "{}() Migration: key {} and value {}. From host {} to host {}",
                __func__, key, value, RPC_DATA->local_host_id(), dest_id);
        if(dest_id == RPC_DATA->local_host_id()) {
            GKFS_DATA->spdlogger()->info("{}() SKIPPERS", __func__);
            GKFS_DATA->spdlogger()->trace("{}() SKIP", __func__);
            continue;
        }
        auto err = gkfs::malleable::rpc::forward_metadata(key, value, dest_id);
@@ -248,6 +251,7 @@ MalleableManager::redistribute_data() {
    auto chunk_dir = fs::path(GKFS_DATA->storage()->get_chunk_directory());
    auto dir_iterator = GKFS_DATA->storage()->get_all_chunk_files();

    // TODO this can be parallelized, e.g., async chunk I/O
    for(const auto& entry : dir_iterator) {
        if(!entry.is_regular_file()) {
            continue;