Commit b71f6229 authored by Marc Vef

Cleanup, Readme, and Changelog

parent dfc810b8
+17 −0
@@ -8,6 +8,20 @@ to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### New

- Added client-side metrics including the periodic export to a file or ZeroMQ sink via the TCP
  protocol ([!176](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_request/176)).
  - CMake option added to enable this optional feature `-DGKFS_ENABLE_CLIENT_METRICS=ON`
  - A new CMake option `-DGKFS_BUILD_TOOLS=ON` was added which includes a ZeroMQ server to capture client-side metrics
  - The `libzmq` and `cppzmq` dependencies are required and can be found in the `default_zmq` profile.
  - Added new environment variables for the GekkoFS client:
    - `LIBGKFS_ENABLE_METRICS=ON` enables capturing client-side metrics
    - `LIBGKFS_METRICS_FLUSH_INTERVAL=10` sets the flush interval to 10 seconds (defaults to 5). All client metrics are
      flushed when the process ends.
    - `LIBGKFS_METRICS_PATH=<path>` sets the path to flush client-metrics (defaults to `/tmp/gkfs_client_metrics`).
    - `LIBGKFS_METRICS_IP_PORT=127.0.0.1:5555` enables flushing to a set ZeroMQ server. This option disables flushing to
      a file.
- Added the dependency profile for MOGON-NHR ([!176](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_request/176)).
- Added UCX and libfabric tcp support ([!176](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_request/176)).
- Added interception of `fadvise64()` and
  `fallocate()` ([!161](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_request/161)).
- Added user library `gkfs_user_lib` that can be used to directly link to an
@@ -31,6 +45,9 @@ replicas ([!166](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/141)
### Removed
### Fixed

- An issue that updated the last modified time of a file during `stat` operations was
  fixed ([!176](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_request/176)).

## [0.9.2] - 2024-02

### New
+46 −1
@@ -310,7 +310,7 @@ instead or in addition to the output file. It must be enabled at compile time vi
argument `-DGKFS_ENABLE_PROMETHEUS` and the daemon argument `--enable-prometheus`. The corresponding statistics are then
pushed to the Prometheus instance.

## Advanced experimental features
## Advanced and experimental features

### Rename

@@ -327,6 +327,51 @@ The user can enable the data replication feature by setting the replication envi
The number of replicas should go from `0` to the `number of servers - 1`. The replication environment variable can be
set up for each client independently.

### Client metrics via MessagePack and ZeroMQ

GekkoFS clients support capturing the I/O traces of each individual process and periodically exporting them to a given
file or ZeroMQ sink via the TCP protocol.
This feature requires the ZeroMQ dependencies (`libzmq` and `cppzmq`), which can be found in
the `default_zmq` dependency profile.
In addition, GekkoFS must be compiled with client metrics enabled (disabled by default) via the CMake argument
`-DGKFS_ENABLE_CLIENT_METRICS=ON`.

Client metrics are individually enabled per GekkoFS client process via the following environment variables:

- `LIBGKFS_ENABLE_METRICS=ON` enables capturing client-side metrics.
- `LIBGKFS_METRICS_FLUSH_INTERVAL=10` sets the flush interval to 10 seconds (defaults to 5). All outstanding client
  metrics are flushed when the process ends.
- `LIBGKFS_METRICS_PATH=<path>` sets the path to flush client-metrics (defaults to `/tmp/gkfs_client_metrics`).
- `LIBGKFS_METRICS_IP_PORT=127.0.0.1:5555` enables flushing to a set ZeroMQ server. This option disables flushing to a
  file.
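
As a rough illustration of how these variables combine, the following C++ sketch models the documented defaults and the
rule that setting `LIBGKFS_METRICS_IP_PORT` disables flushing to a file. `MetricsSettings`, `env_or`, and
`read_metrics_settings` are hypothetical names chosen for this example and are not part of the GekkoFS code base.
Treating any value other than `ON` as disabled is likewise an assumption.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical settings holder (illustrative only, not a GekkoFS type).
struct MetricsSettings {
    bool enabled = false;                          // LIBGKFS_ENABLE_METRICS
    int flush_interval_s = 5;                      // documented default
    std::string path = "/tmp/gkfs_client_metrics"; // documented default
    std::string zmq_endpoint;                      // empty: no ZeroMQ sink

    // A configured ZeroMQ endpoint disables flushing to a file.
    bool flushes_to_file() const { return zmq_endpoint.empty(); }
};

// Returns the value of an environment variable, or `fallback` if it is unset.
inline std::string
env_or(const char* name, const std::string& fallback) {
    const char* value = std::getenv(name);
    return value ? std::string(value) : fallback;
}

// Reads the four documented variables. std::stoi throws if the interval is
// not a number; a real client would likely handle that more gracefully.
inline MetricsSettings
read_metrics_settings() {
    MetricsSettings s;
    s.enabled = env_or("LIBGKFS_ENABLE_METRICS", "OFF") == "ON";
    s.flush_interval_s =
            std::stoi(env_or("LIBGKFS_METRICS_FLUSH_INTERVAL", "5"));
    s.path = env_or("LIBGKFS_METRICS_PATH", s.path);
    s.zmq_endpoint = env_or("LIBGKFS_METRICS_IP_PORT", "");
    return s;
}
```

With only `LIBGKFS_ENABLE_METRICS=ON` set, this sketch yields a 5-second interval and file output under
`/tmp/gkfs_client_metrics`. Adding `LIBGKFS_METRICS_IP_PORT=127.0.0.1:5555` makes `flushes_to_file()` return `false`.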

The ZeroMQ export can be tested via the `gkfs_clientmetrics2json` application which is built when enabling the CMake
option `-DGKFS_BUILD_TOOLS=ON`:

- Starting the ZeroMQ server: `gkfs_clientmetrics2json tcp://127.0.0.1:5555`
- `gkfs_clientmetrics2json <path>` can also be used to unpack the MessagePack export from a file.

Example output with the ZeroMQ sink enabled when running
`LD_PRELOAD=libgkfs_intercept.so LIBGKFS_ENABLE_METRICS=ON LIBGKFS_METRICS_IP_PORT=127.0.0.1:5555 gkfs cp testfile /tmp/gkfs_mountdir/testfile`:

```bash
~ $ gkfs_clientmetrics2json tcp://127.0.0.1:5555
Binding to: tcp://127.0.0.1:5555
Waiting for message...

Received message with size 68
Generated JSON:
[extra]avg_thruput_mib: [221.93,175.87,266.81,135.69]
end_t_micro: [8008,12396,16006,18454]
flush_t: 18564
hostname: "evie"
io_type: "w"
pid: 1259304
req_size: [524288,524288,524288,229502]
start_t_micro: [5755,9553,14132,16841]
total_bytes: 1802366
total_iops: 4
```

## Acknowledgment

This software was partially supported by the EC H2020 funded NEXTGenIO project (Project ID: 671951, www.nextgenio.eu).
+0 −2
@@ -300,12 +300,10 @@ init_preload() {

    gkfs::preload::start_interception();
    errno = oerrno;
#ifdef GKFS_ENABLE_CLIENT_METRICS
    if(!CTX->init_metrics()) {
        exit_error_msg(EXIT_FAILURE,
                       "Unable to initialize client metrics. Exiting...");
    }
#endif
}

/**
+2 −0
@@ -128,5 +128,7 @@ if (GKFS_ENABLE_CLIENT_METRICS)
        Msgpack::Msgpack
        cppzmq
        rpc_utils
        PRIVATE
        fmt::fmt
    )
endif ()
+18 −13
@@ -33,11 +33,16 @@
#include <cstring>
#include <iostream>
#include <iomanip>
#include <memory>
#include <mutex>
#include <ratio>
#include <sstream>
#include <thread>
#include <chrono>

#include <config.hpp>
#include <zmq.hpp>
#include <fmt/format.h>

extern "C" {
#include <unistd.h>
@@ -94,11 +99,7 @@ ClientMetrics::add_event(
    auto start_offset =
            std::chrono::duration<double, std::micro>(start - init_t_);
    auto end_offset = std::chrono::duration<double, std::micro>(end - init_t_);
    auto duration = std::chrono::duration<double, std::micro>(end_offset -
                                                              start_offset);
    msgpack_data_.total_bytes_ += size;
    //    auto size_mib = size / (1024 * 1024);      // in MiB
    //    auto duration_s = duration.count() / 1000; // in seconds
    // throw away decimals
    msgpack_data_.start_t_.emplace_back(
            static_cast<size_t>(start_offset.count()));
@@ -133,12 +134,16 @@ ClientMetrics::flush_msgpack() {
        auto fd =
                open(flush_path_.c_str(), O_CREAT | O_WRONLY | O_APPEND, 0666);
        if(fd < 0) {
            //        cout << "error open" << endl;
            exit(1);
            std::cerr << "Error opening file to flush client metrics\n";
            return;
        }
        write(fd, data.data(), data.size());
        //    auto written = write(fd, data.data(), data.size());
        //    cout << "written: " << written << endl;
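        size_t written_total = 0;
        auto size = data.size();
        auto buf = data.data();
        while(written_total != size) {
            auto written =
                    write(fd, buf + written_total, size - written_total);
            // write() returns -1 on error; stop instead of wrapping the
            // unsigned counter and looping forever
            if(written < 0) {
                std::cerr << "Error writing client metrics to file\n";
                break;
            }
            written_total += static_cast<size_t>(written);
        }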
        close(fd);
    } else {
        zmq::message_t message(data.size());
@@ -146,7 +151,7 @@ ClientMetrics::flush_msgpack() {
        memcpy(message.data(), data.data(), data.size());
        // non-blocking zmq send
        if(zmq_flush_socket_->send(message, zmq::send_flags::none) == -1) {
            std::cerr << "Failed to send zmq message" << std::endl;
            std::cerr << "Failed to send zmq message\n";
        }
    }
    reset_metrics();
@@ -195,9 +200,9 @@ ClientMetrics::path(const string& path, const string prefix) {
    const std::time_t t = std::chrono::system_clock::to_time_t(init_t_);
    std::stringstream init_t_stream;
    init_t_stream << std::put_time(std::localtime(&t), "%F_%T");
    flush_path_ = path + "/" + prefix + "_" + init_t_stream.str() + "_" +
                  msgpack_data_.hostname_ + "_" +
                  to_string(msgpack_data_.pid_) + ".msgpack";
    flush_path_ = fmt::format("{}/{}_{}_{}_{}.msgpack", path, prefix,
                              init_t_stream.str(), msgpack_data_.hostname_,
                              msgpack_data_.pid_);
}
int
ClientMetrics::flush_count() const {