Commit d87a4df1 authored by Marc Vef

Merge branch 'rnou/stats_prometheus' into 'master'

Push stats to prometheus

See merge request !132
parents 126a171c 6d30ca4e
Pipeline #2476 canceled with stages
......@@ -61,6 +61,8 @@ gkfs:
-DGKFS_USE_GUIDED_DISTRIBUTION:BOOL=ON
-DGKFS_ENABLE_PARALLAX:BOOL=ON
-DGKFS_ENABLE_ROCKSDB:BOOL=ON
-DGKFS_CHUNK_STATS:BOOL=ON
-DGKFS_ENABLE_PROMETHEUS:BOOL=ON
${CI_PROJECT_DIR}
- make -j$(nproc) install
# reduce artifacts size
......
......@@ -7,8 +7,19 @@ to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### New
- Added statistics gathering on daemons ([!132](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/132)).
- Stats output can be enabled with:
- `--enable-collection` collects general statistics.
- `--enable-chunkstats` collects extended chunk statistics.
- Statistics output to a file is controlled by `--output-stats <filename>`.
- Added Prometheus support for outputting
statistics ([!132](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/132)):
- The Prometheus dependency is optional and enabled at compile time with the CMake argument `GKFS_ENABLE_PROMETHEUS`.
- `--enable-prometheus` enables statistics pushing to Prometheus if statistics are enabled.
- `--prometheus-gateway` sets an IP and port for the Prometheus connection.
- Added new experimental metadata backend:
Parallax ([!110](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/110)).
- Added support to use multiple metadata backends.
......
......@@ -195,6 +195,12 @@ if(GKFS_USE_GUIDED_DISTRIBUTION)
message(STATUS "[gekkofs] Guided data distributor input file path: ${GKFS_USE_GUIDED_DISTRIBUTION_PATH}")
endif()
option(GKFS_ENABLE_PROMETHEUS "Enable Prometheus push" OFF)
if(GKFS_ENABLE_PROMETHEUS)
  add_definitions(-DGKFS_ENABLE_PROMETHEUS)
endif()
message(STATUS "[gekkofs] Prometheus Output: ${GKFS_ENABLE_PROMETHEUS}")
configure_file(include/common/cmake_configure.hpp.in include/common/cmake_configure.hpp)
......
......@@ -109,6 +109,11 @@ Options:
RocksDB is default if not set. Parallax support is experimental.
Note that parallaxdb creates an 8GB file called rocksdbx in metadir.
--parallaxsize TEXT parallaxdb - metadata file size in GB (default 8GB), used only with new files
--enable-collection Enables collection of general statistics. Output requires either the --output-stats or --enable-prometheus argument.
--enable-chunkstats Enables collection of data chunk statistics in I/O operations. Output requires either the --output-stats or --enable-prometheus argument.
--output-stats TEXT Creates a thread that outputs the server stats every 10 seconds to the specified file.
--enable-prometheus Enables Prometheus output and a corresponding thread.
--prometheus-gateway TEXT Defines the Prometheus gateway <ip:port> (default 127.0.0.1:9091).
--version Print version and exit.
```
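Statistics gathering to a file can be sketched with the options above; note that all paths below are placeholders, not defaults:

```shell
# Sketch: collect general and chunk statistics and write them to a file
# every 10 seconds. Rootdir, mountdir, and stats file are placeholders.
./gkfs_daemon -r /dev/shm/gkfs_rootdir -m /tmp/gkfs_mountdir \
    --enable-collection --enable-chunkstats \
    --output-stats /tmp/gkfs_stats.txt
```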
......@@ -231,19 +236,30 @@ Then, the `examples/distributors/guided/generate.py` script is used to create the
Finally, modify `guided_config.txt` to your distribution requirements.
### Metadata Backends
There are two different metadata backends in GekkoFS. The default one uses `rocksdb`, but an alternative based
on `PARALLAX` from `FORTH` is available. To enable it, use the `-DGKFS_ENABLE_PARALLAX:BOOL=ON` option; you can also
disable `rocksdb` with `-DGKFS_ENABLE_ROCKSDB:BOOL=OFF`.
Once it is enabled, the `--dbbackend` option becomes functional.
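A possible configure invocation for the Parallax backend could look like the following sketch; the build-directory layout and the `--dbbackend` value are assumptions based on the option descriptions above:

```shell
# Sketch: build GekkoFS with the experimental Parallax backend
# and without RocksDB (run from a build directory next to the sources).
cmake -DGKFS_ENABLE_PARALLAX:BOOL=ON \
      -DGKFS_ENABLE_ROCKSDB:BOOL=OFF \
      ..
make -j"$(nproc)" install
# At runtime, select the backend (value taken from the help text):
# ./gkfs_daemon ... --dbbackend parallaxdb
```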
### Statistics
GekkoFS daemons can output general operation statistics (`--enable-collection`) and data chunk
statistics (`--enable-chunkstats`) to a specified output file via `--output-stats <FILE>`. Prometheus can also be used
instead of, or in addition to, the output file. It must be enabled at compile time via the CMake
argument `-DGKFS_ENABLE_PROMETHEUS` and at runtime with the daemon argument `--enable-prometheus`. The corresponding
statistics are then pushed to the Prometheus instance.
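Putting the compile-time and runtime pieces together, a Prometheus-enabled run might look like this sketch; the gateway address is the documented default, while all paths are placeholders:

```shell
# Sketch: enable Prometheus support at compile time ...
cmake -DGKFS_ENABLE_PROMETHEUS:BOOL=ON ..
make -j"$(nproc)" install
# ... and push statistics to a Prometheus Pushgateway at runtime.
./gkfs_daemon -r /dev/shm/gkfs_rootdir -m /tmp/gkfs_mountdir \
    --enable-collection --enable-chunkstats \
    --enable-prometheus --prometheus-gateway 127.0.0.1:9091
```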
### Acknowledgment
This software was partially supported by the EC H2020 funded NEXTGenIO project (Project ID: 671951, www.nextgenio.eu).
This software was partially supported by the ADA-FS project under the SPPEXA project (http://www.sppexa.de/) funded by the DFG.
This software is partially supported by the FIDIUM project funded by the DFG.
This software is partially supported by the ADMIRE project (https://www.admire-eurohpc.eu/) funded by the European Union’s Horizon 2020 JTI-EuroHPC Research and Innovation Programme (Grant 956748).
......@@ -21,7 +21,7 @@ RUN apt-get update && \
python3-dev \
python3-venv \
python3-setuptools \
libnuma-dev libyaml-dev \
libnuma-dev libyaml-dev libcurl4-openssl-dev \
procps && \
python3 -m pip install --upgrade pip && \
rm -rf /var/lib/apt/lists/* && \
......
......@@ -79,6 +79,11 @@ Options:
RocksDB is default if not set. Parallax support is experimental.
Note that parallaxdb creates an 8GB file called rocksdbx in metadir.
--parallaxsize TEXT parallaxdb - metadata file size in GB (default 8GB), used only with new files
--enable-collection Enables collection of general statistics. Output requires either the --output-stats or --enable-prometheus argument.
--enable-chunkstats Enables collection of data chunk statistics in I/O operations. Output requires either the --output-stats or --enable-prometheus argument.
--output-stats TEXT Creates a thread that outputs the server stats every 10 seconds to the specified file.
--enable-prometheus Enables Prometheus output and a corresponding thread.
--prometheus-gateway TEXT Defines the Prometheus gateway <ip:port> (default 127.0.0.1:9091).
--version Print version and exit.
````
......
/*
Copyright 2018-2022, Barcelona Supercomputing Center (BSC), Spain
Copyright 2015-2022, Johannes Gutenberg Universitaet Mainz, Germany
This software was partially supported by the
EC H2020 funded project NEXTGenIO (Project ID: 671951, www.nextgenio.eu).
This software was partially supported by the
ADA-FS project under the SPPEXA project funded by the DFG.
This file is part of GekkoFS.
GekkoFS is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
GekkoFS is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with GekkoFS. If not, see <https://www.gnu.org/licenses/>.
SPDX-License-Identifier: GPL-3.0-or-later
*/
#ifndef GKFS_COMMON_STATS_HPP
#define GKFS_COMMON_STATS_HPP
#include <cstdint>
#include <unistd.h>
#include <cassert>
#include <map>
#include <set>
#include <vector>
#include <deque>
#include <chrono>
#include <optional>
#include <initializer_list>
#include <thread>
#include <iostream>
#include <iomanip>
#include <fstream>
#include <atomic>
#include <mutex>
#include <config.hpp>
// PROMETHEUS includes
#ifdef GKFS_ENABLE_PROMETHEUS
#include <prometheus/counter.h>
#include <prometheus/summary.h>
#include <prometheus/exposer.h>
#include <prometheus/registry.h>
#include <prometheus/gateway.h>
using namespace prometheus;
#endif
/**
* Provides storage for runtime statistics about GekkoFS.
* The information is kept per server.
* The 1-, 5-, and 10-minute statistics are approximate rather than exact.
*
*/
namespace gkfs::utils {
/**
*
* Tracked information:
* - Number of operations (create, write, read, remove, mkdir, ...)
* - Size of the database (metadata keys; may not be needed)
* - Size of data (+write, -delete)
* - Server bandwidth (write/read operations)
*
* Means are provided for:
* - the lifetime of the server (global mean)
* - the last 1 minute
* - the last 5 minutes
* - the last 10 minutes
*
* To provide these stats, we store each data point together with its
* timestamp so that the windowed means can be calculated. A bounded
* container with a maximum number of elements works.
*/
class Stats {
public:
enum class IopsOp {
iops_create,
iops_write,
iops_read,
iops_stats,
iops_dirent,
iops_remove,
}; ///< enum storing IOPS Stats
enum class SizeOp { write_size, read_size }; ///< enum storing Size Stats
private:
constexpr static const std::initializer_list<Stats::IopsOp> all_IopsOp = {
IopsOp::iops_create, IopsOp::iops_write,
IopsOp::iops_read, IopsOp::iops_stats,
IopsOp::iops_dirent, IopsOp::iops_remove}; ///< Enum IOPS iterator
constexpr static const std::initializer_list<Stats::SizeOp> all_SizeOp = {
SizeOp::write_size, SizeOp::read_size}; ///< Enum SIZE iterator
const std::vector<std::string> IopsOp_s = {
"IOPS_CREATE", "IOPS_WRITE", "IOPS_READ",
"IOPS_STATS", "IOPS_DIRENTS", "IOPS_REMOVE"}; ///< Stats Labels
const std::vector<std::string> SizeOp_s = {"WRITE_SIZE",
"READ_SIZE"}; ///< Stats Labels
std::chrono::time_point<std::chrono::steady_clock>
start; ///< When we started the server
std::map<IopsOp, std::atomic<unsigned long>>
iops_mean; ///< Stores total value for global mean
std::map<SizeOp, std::atomic<unsigned long>>
size_mean; ///< Stores total value for global mean
std::mutex time_iops_mutex;
std::mutex size_iops_mutex;
std::map<IopsOp,
std::deque<std::chrono::time_point<std::chrono::steady_clock>>>
time_iops; ///< Stores a timestamp for each incoming operation;
///< entries older than 10 minutes are removed. The different
///< window means are computed and cached every minute.
std::map<SizeOp, std::deque<std::pair<
std::chrono::time_point<std::chrono::steady_clock>,
unsigned long long>>>
time_size; ///< For size operations we need to store the timestamp
///< and the size
std::thread t_output; ///< Thread that outputs stats info
bool output_thread_; ///< Enables or disables the output thread
bool enable_prometheus_; ///< Enables or disables the prometheus output
bool enable_chunkstats_; ///< Enables or disables the chunk stats output
bool running =
true; ///< Controls the destruction of the class/stops the thread
/**
* @brief Periodically writes all the stats to the output file
* (debug function).
*
* @param d time between outputs
* @param file_output path of the output file
*/
void
output(std::chrono::seconds d, std::string file_output);
std::map<std::pair<std::string, unsigned long long>,
std::atomic<unsigned int>>
chunk_reads; ///< Stores the number of times a chunk/file is read
std::map<std::pair<std::string, unsigned long long>,
std::atomic<unsigned int>>
chunk_writes; ///< Stores the number of times a chunk/file is written
/**
* @brief Called by output to generate CHUNK map
*
* @param output is the output stream
*/
void
output_map(std::ofstream& output);
/**
* @brief Dumps all the means from the stats
* @param of Output stream
*/
void
dump(std::ofstream& of);
// Prometheus Push structs
#ifdef GKFS_ENABLE_PROMETHEUS
std::shared_ptr<Gateway> gateway; ///< Prometheus Gateway
std::shared_ptr<Registry> registry; ///< Prometheus Counters Registry
Family<Counter>* family_counter; ///< Prometheus IOPS counter (managed by
///< Prometheus cpp)
Family<Summary>* family_summary; ///< Prometheus SIZE counter (managed by
///< Prometheus cpp)
std::map<IopsOp, Counter*> iops_prometheus; ///< Prometheus IOPS metrics
std::map<SizeOp, Summary*> size_prometheus; ///< Prometheus SIZE metrics
#endif
public:
/**
* @brief Starts the Stats module and initializes structures
* @param enable_chunkstats Enables or disables the chunk stats
* @param enable_prometheus Enables or disables the prometheus output
* @param filename file where to write the output
* @param prometheus_gateway ip:port to expose the metrics
*/
Stats(bool enable_chunkstats, bool enable_prometheus,
const std::string& filename, const std::string& prometheus_gateway);
/**
* @brief Destroys the class, and any associated thread
*
*/
~Stats();
/**
* @brief Sets up the Prometheus gateway and structures
*
* @param gateway_ip ip of the prometheus gateway
* @param gateway_port port of the prometheus gateway
*/
void
setup_Prometheus(const std::string& gateway_ip,
const std::string& gateway_port);
/**
* @brief Adds a new read access to the chunk/path specified
*
* @param path path of the chunk
* @param chunk chunk number
*/
void
add_read(const std::string& path, unsigned long long chunk);
/**
* @brief Adds a new write access to the chunk/path specified
*
* @param path path of the chunk
* @param chunk chunk number
*/
void
add_write(const std::string& path, unsigned long long chunk);
/**
* Adds a new value for an IOPS stat that does not involve any size.
* No value is needed as these are simple counts (1 create, 1 read, ...).
* Size operations (read, write) internally call this operation.
*
* @param IopsOp Which operation to add
*/
void add_value_iops(enum IopsOp);
/**
* @brief Stores a new stat point with a size value.
* If it involves an I/O operation, the corresponding IOPS
* operation is called as well.
*
* @param SizeOp Which operation we refer
* @param value to store (SizeOp)
*/
void
add_value_size(enum SizeOp, unsigned long long value);
/**
* @brief Get the total mean value of the asked stat
* This can be provided immediately without cost.
* @param IopsOp Which operation to get
* @return mean value
*/
double get_mean(enum IopsOp);
/**
* @brief Get the total mean value of the asked stat
* This can be provided immediately without cost.
* @param SizeOp Which operation to get
* @return mean value
*/
double get_mean(enum SizeOp);
/**
* @brief Gets all the means (total, 1, 5, and 10 minutes) for a SIZE_OP.
* Returns precalculated values if they were computed less than a minute ago.
* @param SizeOp Which operation to get
*
* @return std::vector< double > with 4 means
*/
std::vector<double> get_four_means(enum SizeOp);
/**
* @brief Gets all the means (total, 1, 5, and 10 minutes) for an IOPS_OP.
* Returns precalculated values if they were computed less than a minute ago.
* @param IopsOp Which operation to get
*
* @return std::vector< double > with 4 means
*/
std::vector<double> get_four_means(enum IopsOp);
};
} // namespace gkfs::utils
#endif // GKFS_COMMON_STATS_HPP
\ No newline at end of file
......@@ -103,6 +103,11 @@ namespace rocksdb {
constexpr auto use_write_ahead_log = false;
} // namespace rocksdb
namespace stats {
constexpr auto max_stats = 1000000; ///< How many stats will be stored
constexpr auto prometheus_gateway = "127.0.0.1:9091";
} // namespace stats
} // namespace gkfs::config
#endif // GEKKOFS_CONFIG_HPP
......@@ -46,6 +46,11 @@ namespace data {
class ChunkStorage;
}
/* Forward declarations */
namespace utils {
class Stats;
}
namespace daemon {
class FsData {
......@@ -85,6 +90,16 @@ private:
bool link_cnt_state_;
bool blocks_state_;
// Statistics
std::shared_ptr<gkfs::utils::Stats> stats_;
bool enable_stats_ = false;
bool enable_chunkstats_ = false;
bool enable_prometheus_ = false;
std::string stats_file_;
// Prometheus
std::string prometheus_gateway_ = gkfs::config::stats::prometheus_gateway;
public:
static FsData*
getInstance() {
......@@ -209,8 +224,48 @@ public:
void
parallax_size_md(unsigned int size_md);
const std::shared_ptr<gkfs::utils::Stats>&
stats() const;
void
stats(const std::shared_ptr<gkfs::utils::Stats>& stats);
void
close_stats();
bool
enable_stats() const;
void
enable_stats(bool enable_stats);
bool
enable_chunkstats() const;
void
enable_chunkstats(bool enable_chunkstats);
bool
enable_prometheus() const;
void
enable_prometheus(bool enable_prometheus);
const std::string&
stats_file() const;
void
stats_file(const std::string& stats_file);
const std::string&
prometheus_gateway() const;
void
prometheus_gateway(const std::string& prometheus_gateway_);
};
} // namespace daemon
} // namespace gkfs
......
......@@ -43,6 +43,8 @@ wgetdeps=(
["rocksdb"]="6.26.1"
["psm2"]="11.2.185"
["json-c"]="0.15-20200726"
["curl"]="7.82.0"
["prometheus-cpp"]="v1.0.0"
)
# Dependencies that must be cloned
......@@ -69,7 +71,7 @@ clonedeps_patches=(
# Ordering that MUST be followed when downloading
order=(
"lz4" "capstone" "json-c" "psm2" "libfabric" "mercury" "argobots" "margo" "rocksdb" "syscall_intercept" "date"
"agios" "parallax"
"agios" "curl" "prometheus-cpp" "parallax"
)
# Extra arguments passed to the installation script. As such, they can
......
......@@ -39,6 +39,7 @@ comment="Dependencies required by the CI"
wgetdeps=(
["argobots"]="1.1"
["rocksdb"]="6.26.1"
["prometheus-cpp"]="v1.0.0"
)
# Dependencies that must be cloned
......@@ -65,7 +66,7 @@ clonedeps_patches=(
# Ordering that MUST be followed when downloading
order=(
"libfabric" "mercury" "argobots" "margo" "rocksdb" "syscall_intercept"
"date" "agios" "parallax"
"date" "agios" "parallax" "prometheus-cpp"
)
# Extra arguments passed to the installation script. As such, they can
......
################################################################################
# Copyright 2018-2022, Barcelona Supercomputing Center (BSC), Spain #
# Copyright 2015-2022, Johannes Gutenberg Universitaet Mainz, Germany #
# #
# This software was partially supported by the #
# EC H2020 funded project NEXTGenIO (Project ID: 671951, www.nextgenio.eu). #
# #
# This software was partially supported by the #
# ADA-FS project under the SPPEXA project funded by the DFG. #
# #
# This file is part of GekkoFS. #
# #
# GekkoFS is free software: you can redistribute it and/or modify #
# it under the terms of the GNU General Public License as published by #
# the Free Software Foundation, either version 3 of the License, or #
# (at your option) any later version. #
# #
# GekkoFS is distributed in the hope that it will be useful, #
# but WITHOUT ANY WARRANTY; without even the implied warranty of #
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the #
# GNU General Public License for more details. #
# #
# You should have received a copy of the GNU General Public License #
# along with GekkoFS. If not, see <https://www.gnu.org/licenses/>. #
# #
# SPDX-License-Identifier: GPL-3.0-or-later #
################################################################################
# vi: ft=bash
################################################################################
## The installation script must define both a pkg_install function and
## pkg_check function that, as their name implies, must specify how
## a dependency package should be installed and tested.
##
## The following variables can be used in the installation script:
## - CMAKE: a variable that expands to the cmake binary
## - SOURCE_DIR: the directory where the sources for the package were
## downloaded
## - INSTALL_DIR: the directory where the package should be installed
## - CORES: the number of cores to use when building
## - COMPILER_NAME: the name of the compiler being used (e.g. g++, clang, etc.)
## - COMPILER_FULL_VERSION: the compiler's full version (e.g. 9.3.0)
## - COMPILER_MAJOR_VERSION: the compiler's major version (e.g. 9)
## - PERFORM_TEST: whether tests for the package should be executed
################################################################################
pkg_install() {
ID="curl"
CURR="${SOURCE_DIR}/${ID}"
cd "${CURR}"
autoreconf -fi
./configure --prefix="${INSTALL_DIR}" --without-ssl
make -j"${CORES}"
make install
}
pkg_check() {
:
}
################################################################################
# Copyright 2018-2022, Barcelona Supercomputing Center (BSC), Spain #
# Copyright 2015-2022, Johannes Gutenberg Universitaet Mainz, Germany #
# #
# This software was partially supported by the #
# EC H2020 funded project NEXTGenIO (Project ID: 671951, www.nextgenio.eu). #
# #
# This software was partially supported by the #
# ADA-FS project under the SPPEXA project funded by the DFG. #
# #
# This file is part of GekkoFS. #
# #
# GekkoFS is free software: you can redistribute it and/or modify #