CHANGELOG.md +17 −0

    @@ -8,6 +8,20 @@ to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
     ## [Unreleased]
     ### New
    +- Added client-side metrics including the periodic export to a file or ZeroMQ sink via the TCP protocol
    +  ([!176](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_request/176)).
    +  - CMake option added to enable this optional feature `-DGKFS_ENABLE_CLIENT_METRICS=ON`
    +  - A new CMake option `-DGKFS_BUILD_TOOLS=ON` was added which includes a ZeroMQ server to capture client-side metrics
    +  - The `libzmq` and `cppzmq` dependencies are required and can be found in the `default_zmq` profile.
    +  - Added new environment variables for the GekkoFS client:
    +    - `LIBGKFS_ENABLE_METRICS=ON` enables capturing client-side metrics
    +    - `LIBGKFS_METRICS_FLUSH_INTERVAL=10` sets the flush interval to 10 seconds (defaults to 5). All client metrics are flushed when the process ends.
    +    - `LIBGKFS_METRICS_PATH=<path>` sets the path to flush client-metrics (defaults to `/tmp/gkfs_client_metrics`).
    +    - `LIBGKFS_METRICS_IP_PORT=127.0.0.1:5555` enables flushing to a set ZeroMQ server. This option disables flushing to a file.
    +- Added the dependency profile for MOGON-NHR ([!176](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_request/176)).
    +- Added UCX and libfabric tcp support ([!176](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_request/176)).
     - Added interception of `fadvise64()` and `fallocate()` ([!161](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_request/161)).
     - Added user library `gkfs_user_lib` that can be used to directly link to an
    @@ -31,6 +45,9 @@ replicas ([!166](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/141)
     ### Removed
     ### Fixed
    +- An issue that updated the last modified time of a file during `stat` operations was
    +  fixed ([!176](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_request/176)).

     ## [0.9.2] - 2024-02
     ### New
README.md +46 −1

    @@ -310,7 +310,7 @@ instead or in addition to the output file. It must be enabled at compile time vi
     argument `-DGKFS_ENABLE_PROMETHEUS` and the daemon argument `--enable-prometheus`. The corresponding statistics are then
     pushed to the Prometheus instance.

    -## Advanced experimental features
    +## Advanced and experimental features

     ### Rename

    @@ -327,6 +327,51 @@ The user can enable the data replication feature by setting the replication envi
     The number of replicas should go from `0` to the `number of servers - 1`. The replication environment variable can be
     set up for each client independently.

    +### Client metrics via MessagePack and ZeroMQ
    +
    +GekkoFS clients support capturing the I/O traces of each individual process and periodically exporting them to a given
    +file or ZeroMQ sink via the TCP protocol. To use this feature, the corresponding ZeroMQ (`libzmq` and `cppzmq`)
    +dependencies are required which can be found in the `default_zmq` dependency profile. In addition, GekkoFS must be
    +compiled with client metrics enabled (disabled by default) via the CMake argument `-DGKFS_ENABLE_CLIENT_METRICS=ON`.
    +
    +Client metrics are individually enabled per GekkoFS client process via the following environment variables:
    +
    +- `LIBGKFS_ENABLE_METRICS=ON` enables capturing client-side metrics.
    +- `LIBGKFS_METRICS_FLUSH_INTERVAL=10` sets the flush interval to 10 seconds (defaults to 5). All outstanding client
    +  metrics are flushed when the process ends.
    +- `LIBGKFS_METRICS_PATH=<path>` sets the path to flush client-metrics (defaults to `/tmp/gkfs_client_metrics`).
    +- `LIBGKFS_METRICS_IP_PORT=127.0.0.1:5555` enables flushing to a set ZeroMQ server. This option disables flushing to a
    +  file.
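Note on the environment variables listed above: the diff does not show how the client reads them, so the following is only a minimal sketch of one way such settings could be collected, using the documented defaults (5 seconds, `/tmp/gkfs_client_metrics`). `MetricsSettings`, `env_value`, and `read_metrics_settings` are hypothetical names, and two behaviors are assumptions rather than documented facts: that only the literal value `ON` enables metrics, and that the endpoint is formed by prefixing `tcp://` to the `ip:port` value.

```cpp
#include <cassert>
#include <cstdlib>
#include <exception>
#include <optional>
#include <string>

// Hypothetical settings holder; not a GekkoFS type.
struct MetricsSettings {
    bool enabled = false;                          // metrics are opt-in
    int flush_interval_s = 5;                      // documented default
    std::string path = "/tmp/gkfs_client_metrics"; // documented default
    std::optional<std::string> zmq_endpoint;       // set => flush to ZeroMQ, not to a file
};

// Returns the value of an environment variable, or nullopt if unset or empty.
static std::optional<std::string>
env_value(const char* name) {
    const char* v = std::getenv(name);
    if(v == nullptr || *v == '\0')
        return std::nullopt;
    return std::string(v);
}

// Reads the four LIBGKFS_* metrics variables, keeping defaults on bad input.
MetricsSettings
read_metrics_settings() {
    MetricsSettings s;
    if(auto v = env_value("LIBGKFS_ENABLE_METRICS"))
        s.enabled = (*v == "ON"); // assumption: only "ON" enables metrics
    if(auto v = env_value("LIBGKFS_METRICS_FLUSH_INTERVAL")) {
        try {
            const int n = std::stoi(*v);
            if(n > 0)
                s.flush_interval_s = n;
        } catch(const std::exception&) {
            // malformed or out-of-range value: keep the default
        }
    }
    if(auto v = env_value("LIBGKFS_METRICS_PATH"))
        s.path = *v;
    if(auto v = env_value("LIBGKFS_METRICS_IP_PORT"))
        s.zmq_endpoint = "tcp://" + *v; // assumption: TCP endpoint from ip:port
    assert(s.flush_interval_s > 0);
    return s;
}
```

With `LIBGKFS_METRICS_IP_PORT=127.0.0.1:5555` set, `zmq_endpoint` would hold `tcp://127.0.0.1:5555`, the same endpoint string that `gkfs_clientmetrics2json` is started with in the README.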
    +
    +The ZeroMQ export can be tested via the `gkfs_clientmetrics2json` application which is built when enabling the CMake
    +option `-DGKFS_BUILD_TOOLS=ON`:
    +
    +- Starting the ZeroMQ server: `gkfs_clientmetrics2json tcp://127.0.0.1:5555`
    +- `gkfs_clientmetrics2json <path>` can also be used to unpack the MessagePack export from a file.
    +
    +Example output with the ZeroMQ sink enabled when running
    +`LD_PRELOAD=libgkfs_intercept.so LIBGKFS_ENABLE_METRICS=ON LIBGKFS_METRICS_IP_PORT=127.0.0.1:5555 gkfs cp testfile /tmp/gkfs_mountdir/testfile`:
    +
    +```bash
    +~ $ gkfs_clientmetrics2json tcp://127.0.0.1:5555
    +Binding to: tcp://127.0.0.1:5555
    +Waiting for message...
    +Received message with size 68
    +Generated JSON:
    +[extra]avg_thruput_mib: [221.93,175.87,266.81,135.69]
    +end_t_micro: [8008,12396,16006,18454]
    +flush_t: 18564
    +hostname: "evie"
    +io_type: "w"
    +pid: 1259304
    +req_size: [524288,524288,524288,229502]
    +start_t_micro: [5755,9553,14132,16841]
    +total_bytes: 1802366
    +total_iops: 4
    +```
    +
     ## Acknowledgment

     This software was partially supported by the EC H2020 funded NEXTGenIO project (Project ID: 671951, www.nextgenio.eu).
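Note on the sample output in the README hunk: the `avg_thruput_mib` values are consistent with dividing each request size (in MiB) by its elapsed time (`end_t_micro - start_t_micro`, in seconds), and `total_bytes` equals the sum of `req_size`. The diff does not show how `gkfs_clientmetrics2json` derives the field, so the sketch below only reproduces the arithmetic; `thruput_mib_s` and `per_request_thruput` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Throughput of one request in MiB/s: bytes moved divided by elapsed time.
// Timestamps are microsecond offsets, as in start_t_micro / end_t_micro.
double
thruput_mib_s(std::uint64_t req_size_bytes, std::uint64_t start_us,
              std::uint64_t end_us) {
    assert(end_us > start_us);
    const double mib = static_cast<double>(req_size_bytes) / (1024.0 * 1024.0);
    const double seconds = static_cast<double>(end_us - start_us) / 1e6;
    return mib / seconds;
}

// Applies thruput_mib_s element-wise to three equally sized vectors.
std::vector<double>
per_request_thruput(const std::vector<std::uint64_t>& req_size,
                    const std::vector<std::uint64_t>& start_us,
                    const std::vector<std::uint64_t>& end_us) {
    assert(req_size.size() == start_us.size());
    assert(req_size.size() == end_us.size());
    std::vector<double> out;
    out.reserve(req_size.size());
    for(std::size_t i = 0; i < req_size.size(); ++i)
        out.push_back(thruput_mib_s(req_size[i], start_us[i], end_us[i]));
    return out;
}
```

For the first request in the sample, 524288 bytes (0.5 MiB) over 8008 - 5755 = 2253 microseconds gives about 221.93 MiB/s, which matches the first entry of `avg_thruput_mib`.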
src/client/preload.cpp +0 −2

    @@ -300,12 +300,10 @@ init_preload() {
         gkfs::preload::start_interception();
         errno = oerrno;
    -#ifdef GKFS_ENABLE_CLIENT_METRICS
         if(!CTX->init_metrics()) {
             exit_error_msg(EXIT_FAILURE,
                            "Unable to initialize client metrics. Exiting...");
         }
    -#endif
     }

     /**
src/common/CMakeLists.txt +2 −0

    @@ -128,5 +128,7 @@ if (GKFS_ENABLE_CLIENT_METRICS)
             Msgpack::Msgpack
             cppzmq
             rpc_utils
    +        PRIVATE
    +        fmt::fmt
             )
     endif ()
src/common/msgpack_util.cpp +18 −13

    @@ -33,11 +33,16 @@
     #include <cstring>
     #include <iostream>
     #include <iomanip>
    +#include <memory>
    +#include <mutex>
    +#include <ratio>
     #include <sstream>
    +#include <thread>
     #include <chrono>

     #include <config.hpp>
     #include <zmq.hpp>
    +#include <fmt/format.h>

     extern "C" {
     #include <unistd.h>
    @@ -94,11 +99,7 @@ ClientMetrics::add_event(
         auto start_offset =
                 std::chrono::duration<double, std::micro>(start - init_t_);
         auto end_offset = std::chrono::duration<double, std::micro>(end - init_t_);
    -    auto duration = std::chrono::duration<double, std::micro>(
    -            end_offset - start_offset);
         msgpack_data_.total_bytes_ += size;
    -    // auto size_mib = size / (1024 * 1024); // in MiB
    -    // auto duration_s = duration.count() / 1000; // in seconds
         // throw away decimals
         msgpack_data_.start_t_.emplace_back(
                 static_cast<size_t>(start_offset.count()));
    @@ -133,12 +134,16 @@ ClientMetrics::flush_msgpack() {
             auto fd =
                     open(flush_path_.c_str(), O_CREAT | O_WRONLY | O_APPEND, 0666);
             if(fd < 0) {
    -            // cout << "error open" << endl;
    -            exit(1);
    +            std::cerr << "Error opening file to flush client metrics\n";
    +            return;
             }
    -        write(fd, data.data(), data.size());
    -        // auto written = write(fd, data.data(), data.size());
    -        // cout << "written: " << written << endl;
    +        size_t written_total = 0;
    +        auto size = data.size();
    +        auto buf = data.data();
    +        do {
    +            written_total +=
    +                    write(fd, buf + written_total, size - written_total);
    +        } while(written_total != size);
             close(fd);
         } else {
             zmq::message_t message(data.size());
    @@ -146,7 +151,7 @@ ClientMetrics::flush_msgpack() {
             memcpy(message.data(), data.data(), data.size());
             // non-blocking zmq send
             if(zmq_flush_socket_->send(message, zmq::send_flags::none) == -1) {
    -            std::cerr << "Failed to send zmq message" << std::endl;
    +            std::cerr << "Failed to send zmq message\n";
             }
         }
         reset_metrics();
    @@ -195,9 +200,9 @@ ClientMetrics::path(const string& path, const string prefix) {
         const std::time_t t = std::chrono::system_clock::to_time_t(init_t_);
         std::stringstream init_t_stream;
         init_t_stream << std::put_time(std::localtime(&t), "%F_%T");
    -    flush_path_ = path + "/" + prefix + "_" + init_t_stream.str() + "_" +
    -                  msgpack_data_.hostname_ + "_" + to_string(msgpack_data_.pid_) +
    -                  ".msgpack";
    +    flush_path_ = fmt::format("{}/{}_{}_{}_{}.msgpack", path, prefix,
    +                              init_t_stream.str(), msgpack_data_.hostname_,
    +                              msgpack_data_.pid_);
     }

     int
     ClientMetrics::flush_count() const {
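Note on the write loop in `flush_msgpack()`: the new loop handles short writes, but it adds the return value of `write()` directly to a `size_t`. POSIX `write()` returns `-1` on error, which would wrap around in that addition instead of ending the loop. A minimal sketch of a variant that checks the return value is given below; `write_all` is a hypothetical helper name, not a function from the GekkoFS code base, and retrying on `EINTR` is a design choice assumed here rather than taken from the diff.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Writes all `size` bytes of `buf` to `fd`, retrying on short writes and on
// EINTR. Returns true on success; on failure returns false and leaves errno
// as set by write().
bool
write_all(int fd, const char* buf, std::size_t size) {
    assert(buf != nullptr || size == 0);
    std::size_t written_total = 0;
    while(written_total < size) {
        const ssize_t n = ::write(fd, buf + written_total, size - written_total);
        if(n < 0) {
            if(errno == EINTR)
                continue; // interrupted before any data was written; retry
            return false; // real error: report it instead of looping
        }
        written_total += static_cast<std::size_t>(n);
    }
    return true;
}
```

In the hunk above, a call such as `write_all(fd, data.data(), data.size())` could stand in for the `do { ... } while` loop, with an error message printed to `std::cerr` when it returns `false`.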