Commit 32b81b93 authored by Marc Vef, committed by Ramon Nou

GekkoFS daemon can now be restarted without losing its data

A subdirectory is no longer created by default. Therefore, a restarted daemon reuses the same directory for both data and metadata. Note that the rootdir can be cleaned with the -c argument. For multiple daemons on one machine, the new --rootdir-suffix argument creates a subdirectory with a user-defined name within the rootdir. This also allows restarting multiple daemons on one node without losing data.
parent 8fe9033e
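The on-disk layout described in the commit message can be sketched in C++ as follows. This is a minimal illustration, not GekkoFS code: `make_paths` and `DaemonPaths` are hypothetical names. The directory names `chunks` and `rocksdb` are taken from the `gkfs::config` hunk, and the sketch assumes `--metadir` is unset, so metadata falls back to the rootdir.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Directory names as defined in gkfs::config by this commit.
constexpr auto chunk_dir = "chunks";    // gkfs::config::data::chunk_dir
constexpr auto rocksdb_dir = "rocksdb"; // gkfs::config::rocksdb::data_dir

// Hypothetical result type, for illustration only.
struct DaemonPaths {
    fs::path root;   // rootdir, extended by the suffix if one is given
    fs::path chunks; // file data (chunks)
    fs::path meta;   // RocksDB metadata (assumes --metadir is unset)
};

// Hypothetical helper: no subdirectory is added unless a suffix is given,
// so restarting with the same arguments reopens the same directories.
DaemonPaths
make_paths(const std::string& rootdir, const std::string& suffix) {
    fs::path root{rootdir};
    if(!suffix.empty())
        root /= suffix;
    return {root, root / chunk_dir, root / rocksdb_dir};
}
```

With an empty suffix, `make_paths("/tmp/gkfs_root", "")` yields `/tmp/gkfs_root/chunks` and `/tmp/gkfs_root/rocksdb` on every start. With the suffix `daemon1`, both directories move under `/tmp/gkfs_root/daemon1`, so two daemons sharing one rootdir never collide.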
+16 −8
@@ -83,12 +83,13 @@ Further options are available:

```bash
Allowed options
-Usage: bin/gkfs_daemon [OPTIONS]
+Usage: src/daemon/gkfs_daemon [OPTIONS]

Options:
  -h,--help                   Print this help message and exit
  -m,--mountdir TEXT REQUIRED Virtual mounting directory where GekkoFS is available.
  -r,--rootdir TEXT REQUIRED  Local data directory where GekkoFS data for this daemon is stored.
+  -s,--rootdir-suffix TEXT    Creates an additional directory within the rootdir, allowing multiple daemons on one node.
  -i,--metadir TEXT           Metadata directory where GekkoFS RocksDB data directory is located. If not set, rootdir is used.
  -l,--listen TEXT            Address or interface to bind the daemon to. Default: local hostname.
                              When used with ofi+verbs the FI_VERBS_IFACE environment variable is set accordingly which associates the verbs device with the network interface. In case FI_VERBS_IFACE is already defined, the argument is ignored. Default 'ib'.
@@ -97,27 +98,34 @@ Options:
                              Available: {ofi+sockets, ofi+verbs, ofi+psm2} for TCP, Infiniband, and Omni-Path, respectively. (Default ofi+sockets)
                              Libfabric must have enabled support verbs or psm2.
  --auto-sm                   Enables intra-node communication (IPCs) via the `na+sm` (shared memory) protocol, instead of using the RPC protocol. (Default off)
-  --clean-rootdir             Cleans Rootdir >before< launching the deamon
+  -c,--clean-rootdir          Cleans Rootdir >before< launching the deamon
  --version                   Print version and exit.
```

+It is possible to run multiple independent GekkoFS instances on the same node. Note, that when these GekkoFS instances
+are part of the same file system, use the same `rootdir` with different `rootdir-suffixe`s.

Shut it down by gracefully killing the process (SIGTERM).

## Use the GekkoFS client library

tl;dr example:

```bash
export LIBGKFS_ HOSTS_FILE=<hostfile_path>
LD_PRELOAD=<install_path>/lib64/libgkfs_intercept.so cp ~/some_input_data <pseudo_gkfs_mount_dir_path>/some_input_data
LD_PRELOAD=<install_path>/lib64/libgkfs_intercept.so md5sum ~/some_input_data <pseudo_gkfs_mount_dir_path>/some_input_data
```

-Clients read the hostsfile to determine which daemons are part of the GekkoFS instance.
-Because the client is an interposition library that is loaded within the context of the application, this information is passed via the environment variable `LIBGKFS_HOSTS_FILE` pointing to the hostsfile path.
-The client library itself is loaded for each application process via the `LD_PRELOAD` environment variable intercepting file system related calls.
-If they are within (or hierarchically under) the GekkoFS mount directory they are processed in the library, otherwise they are passed to the kernel.
+Clients read the hostsfile to determine which daemons are part of the GekkoFS instance. Because the client is an
+interposition library that is loaded within the context of the application, this information is passed via the
+environment variable `LIBGKFS_HOSTS_FILE` pointing to the hostsfile path. The client library itself is loaded for each
+application process via the `LD_PRELOAD` environment variable intercepting file system related calls. If they are
+within (or hierarchically under) the GekkoFS mount directory they are processed in the library, otherwise they are
+passed to the kernel.

-Note, if `LD_PRELOAD` is not pointing to the library and, hence the client is not loaded, the mounting directory appear to be empty.
+Note, if `LD_PRELOAD` is not pointing to the library and, hence the client is not loaded, the mounting directory appears
+to be empty.

For MPI application, the `LD_PRELOAD` variable can be passed with the `-x` argument for `mpirun/mpiexec`.
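The interception rule described in the README hunk (paths within or under the mount directory are handled by the library, everything else is passed to the kernel) boils down to a path-prefix test. The following is a hypothetical `is_under_mountdir` helper, not the GekkoFS client's actual code. It assumes absolute paths and compares path components rather than raw strings, so `/mnt/gkfsX` is not mistaken for a child of `/mnt/gkfs`. Symlinks are not resolved.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: true if `path` is the mount directory itself or lies
// hierarchically under it. Both paths are lexically normalized first, so
// "/mnt/gkfs/a/../b" becomes "/mnt/gkfs/b". Symlinks are NOT resolved.
bool
is_under_mountdir(const std::string& path, const std::string& mountdir) {
    const fs::path p = fs::path{path}.lexically_normal();
    fs::path m = fs::path{mountdir}.lexically_normal();
    // Drop a trailing separator: "/mnt/gkfs/" -> "/mnt/gkfs"
    if(!m.has_filename())
        m = m.parent_path();
    // Every component of the mount dir must match the path's leading components.
    auto pi = p.begin();
    for(auto mi = m.begin(); mi != m.end(); ++mi, ++pi) {
        if(pi == p.end() || *pi != *mi)
            return false;
    }
    return true;
}
```

Comparing components is the main design choice here. A naive `path.rfind(mountdir, 0) == 0` string check would wrongly accept sibling directories that share a name prefix.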

+6 −0
@@ -76,6 +76,10 @@ constexpr auto implicit_data_removal = true;
// level
constexpr auto create_exist_check = true;
} // namespace metadata
+namespace data {
+// directory name below rootdir where chunks are placed
+constexpr auto chunk_dir = "chunks";
+} // namespace data

namespace rpc {
constexpr auto chunksize = 524288; // in bytes (e.g., 524288 == 512KB)
@@ -94,6 +98,8 @@ constexpr auto daemon_handler_xstreams = 4;
namespace rocksdb {
// Write-ahead logging of rocksdb
constexpr auto use_write_ahead_log = false;
+// directory name where the rocksdb instance is placed
+constexpr auto data_dir = "rocksdb";
} // namespace rocksdb

} // namespace gkfs::config
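The `chunksize` constant in this configuration file (524288 bytes, i.e. 512 KiB) determines how a byte offset maps to a chunk stored under `chunk_dir`. This is a hedged sketch that assumes fixed-size chunks numbered from zero, in which case the chunk index is plain integer division. `chunk_id` and `chunk_range` are hypothetical names, not the GekkoFS API.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors gkfs::config::rpc::chunksize (in bytes).
constexpr std::uint64_t chunksize = 524288;
static_assert(chunksize == 512 * 1024, "524288 bytes is 512 KiB");

// Hypothetical helper: index of the chunk containing byte `offset`.
constexpr std::uint64_t
chunk_id(std::uint64_t offset) {
    return offset / chunksize;
}

// Inclusive range of chunks touched by a request of `count` bytes.
struct ChunkRange {
    std::uint64_t first;
    std::uint64_t last;
};

// Hypothetical helper; requires count > 0.
constexpr ChunkRange
chunk_range(std::uint64_t offset, std::uint64_t count) {
    return {chunk_id(offset), chunk_id(offset + count - 1)};
}
```

For example, a 1 MiB write starting at offset 256 KiB ends at byte 1310719, so it touches chunks 0 through 2.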
+14 −7
@@ -50,20 +50,21 @@ namespace daemon {
class FsData {

private:
-    FsData() {}
+    FsData() = default;

    // logger
    std::shared_ptr<spdlog::logger> spdlogger_;

    // paths
-    std::string rootdir_;
-    std::string mountdir_;
-    std::string metadir_;
+    std::string rootdir_{};
+    std::string rootdir_suffix_{};
+    std::string mountdir_{};
+    std::string metadir_{};

    // RPC management
-    std::string rpc_protocol_;
-    std::string bind_addr_;
-    std::string hosts_file_;
+    std::string rpc_protocol_{};
+    std::string bind_addr_{};
+    std::string hosts_file_{};
    bool use_auto_sm_;

    // Database
@@ -104,6 +105,12 @@ public:
    void
    rootdir(const std::string& rootdir_);

+    const std::string&
+    rootdir_suffix() const;
+
+    void
+    rootdir_suffix(const std::string& rootdir_suffix_);
+
    const std::string&
    mountdir() const;

+8 −1
@@ -174,6 +174,14 @@ load_hostfile(const std::string& path) {
                "Hosts file found but no suitable addresses could be extracted");
    }
    extract_protocol(hosts[0].second);
+    // sort hosts so that data always hashes to the same place during restart
+    std::sort(hosts.begin(), hosts.end());
+    // remove rootdir suffix from host after sorting as no longer required
+    for(auto& h : hosts) {
+        auto idx = h.first.rfind("#");
+        if(idx != string::npos)
+            h.first.erase(idx, h.first.length());
+    }
    return hosts;
}

@@ -362,7 +370,6 @@ read_hosts_file() {
    }

    LOG(INFO, "Hosts pool size: {}", hosts.size());
-    sort(hosts.begin(), hosts.end()); // Sort hosts by alphanumerical value.
    return hosts;
}
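This sketch illustrates why `load_hostfile()` now sorts the hosts before stripping the `#suffix`. While the suffix is still attached, daemons that share a hostname have distinct keys, so the order is fixed by hostname and suffix. Without it, ties would be broken by the URI, which (as an assumption here) may differ between runs. `normalize_hosts` reproduces the two steps from the hunk. `target_host` is a hypothetical, simplified stand-in for the real distributor, included only to show that a stable host order keeps a path on the same daemon after a restart.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// (hostname, optionally followed by "#suffix"; uri)
using Host = std::pair<std::string, std::string>;

// Same two steps as the load_hostfile() hunk: sort while the "#suffix" still
// distinguishes daemons on the same node, then strip it from the hostname.
std::vector<Host>
normalize_hosts(std::vector<Host> hosts) {
    std::sort(hosts.begin(), hosts.end());
    for(auto& h : hosts) {
        auto idx = h.first.rfind('#');
        if(idx != std::string::npos)
            h.first.erase(idx); // same effect as erase(idx, length())
    }
    return hosts;
}

// Hypothetical, simplified distributor: maps a path to a host index.
// std::hash is only guaranteed consistent within one program execution, so a
// real distributor needs a hash that is also stable across daemon restarts.
std::size_t
target_host(const std::string& path, std::size_t n_hosts) {
    return std::hash<std::string>{}(path) % n_hosts;
}
```

Two hosts files listing the same daemons in different orders normalize to the same vector, so host index `i` refers to the same daemon after every restart.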

+10 −0
@@ -80,6 +80,16 @@ FsData::rootdir(const std::string& rootdir) {
    FsData::rootdir_ = rootdir;
}

+const std::string&
+FsData::rootdir_suffix() const {
+    return rootdir_suffix_;
+}
+
+void
+FsData::rootdir_suffix(const std::string& rootdir_suffix) {
+    FsData::rootdir_suffix_ = rootdir_suffix;
+}
+
const std::string&
FsData::mountdir() const {
    return mountdir_;
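The new `rootdir_suffix` accessors follow the overloaded getter/setter style used throughout `FsData`. The private, defaulted constructor indicates the class is not meant to be instantiated freely. Below is a minimal sketch of that pattern, assuming a function-local-static ("Meyers") singleton. `FsDataSketch` and `instance()` are hypothetical names, because the real accessor is not part of this diff.

```cpp
#include <cassert>
#include <string>

// Hypothetical reduced version of FsData, for illustration only.
class FsDataSketch {
private:
    FsDataSketch() = default; // only instance() may construct it
    std::string rootdir_suffix_{};

public:
    FsDataSketch(const FsDataSketch&) = delete;
    FsDataSketch&
    operator=(const FsDataSketch&) = delete;

    // Assumed accessor: constructed on first use; initialization of a
    // function-local static is thread-safe since C++11.
    static FsDataSketch&
    instance() {
        static FsDataSketch inst;
        return inst;
    }

    // Getter and setter share one name and differ only in their parameters.
    const std::string&
    rootdir_suffix() const {
        return rootdir_suffix_;
    }

    void
    rootdir_suffix(const std::string& rootdir_suffix) {
        rootdir_suffix_ = rootdir_suffix;
    }
};
```

The daemon would set the suffix once while parsing `--rootdir-suffix` and read it wherever paths are built.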