Commit 944a50e1 authored by Ramon Nou's avatar Ramon Nou

Merge remote-tracking branch 'origin/88-alya-create-a-new-data-distributor' into 88-alya-create-a-new-data-distributor
parents f225ff54 b702aa4d
+42 −0
@@ -5,6 +5,48 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]
- Created a Guided Distributor using a mapping file to map chunks to specific
  nodes.

## [0.8.0] - 2020-09-15
### New
- Both client library and daemon have been extended to support the ofi+verbs
  protocol.
- A new Python testing harness has been implemented to support integration
  tests. The end goal is to increase the robustness of the code in the mid- to
  long-term.
- The RPC protocol and the usage of shared memory for intra-node communication
  no longer need to be activated at compile time. New arguments
  `-P|--rpc-protocol` and `--auto-sm` have been added to the daemon to this
  effect. These configuration options are propagated to clients when they
  initialize and contact daemons.
- Native support for the Omni-Path network protocol by choosing the `ofi+psm2`
  RPC protocol. Note that this requires `libfabric`'s version to be greater
  than `1.8` as well as `psm2` to be installed in the system. Clients must set
  `FI_PSM2_DISCONNECT=1` to be able to reconnect after they have been shut
  down.
  *Known limitations:* Client reconnect doesn't always work. Apparently, if
  clients reconnect too quickly, the servers won't accept the connections.
  Also, more than 16 clients per node are currently not supported.
- A new execution mode called `GekkoFWD` that allows GekkoFS to run as
  a user-level I/O forwarding infrastructure for applications. In this mode,
  I/O operations from an application are intercepted and forwarded to a single
  GekkoFS daemon that is chosen according to a pre-defined distribution. In the
  daemons, the requests are scheduled using the AGIOS scheduling library before
  they are dispatched to the shared backend parallel file system.
- The `fsync()` system call is now fully supported.
### Improved
- Argobots tasks in the daemon are now wrapped in a dedicated class,
  effectively removing the dependency. This lays the groundwork for future
  non-Argobots I/O implementations.
- The `readdir()` implementation has been refactored and improved.
- Improvements to how the installation scripts manage dependencies.
### Fixed
- The server sometimes crashed due to uncaught system errors in the storage
  backend. This has now been fixed.
- Fixed a bug that broke `ls` on some architectures.
- Fixed a bug that leaked internal errors from the interception library to
  client applications via `errno` propagation.

## [0.7.0] - 2020-02-05
### Added
+6 −9
@@ -2,7 +2,7 @@ cmake_minimum_required(VERSION 3.6)

project(
    GekkoFS
-    VERSION 0.7.0
+    VERSION 0.8.0
)

enable_testing()
@@ -141,15 +141,12 @@ add_definitions(-DLIBGKFS_LOG_MESSAGE_SIZE=${CLIENT_LOG_MESSAGE_SIZE})
message(STATUS "[gekkofs] Maximum log message size in the client library: ${CLIENT_LOG_MESSAGE_SIZE}")
mark_as_advanced(CLIENT_LOG_MESSAGE_SIZE)

-option(USE_GUIDED "Use guided data distributor " OFF)
-message(STATUS "[gekkofs] Guided data distributor: ${USE_GUIDED}")
+option(GKFS_USE_GUIDED_DISTRIBUTION "Use guided data distributor " OFF)
+message(STATUS "[gekkofs] Guided data distributor: ${GKFS_USE_GUIDED_DISTRIBUTION}")

-set(USE_GUIDED_PATH "~/guided.txt" CACHE STRING "File Path for guided distributor")
-set_property(CACHE USE_GUIDED_PATH PROPERTY STRINGS)
-message(STATUS "[gekkofs] Guided data distributor input file path: ${USE_GUIDED_PATH}")
-
-option(TRACE_GUIDED "Output at INFO level information for guided distributor generation: " OFF)
-message(STATUS "[gekkofs] Generate log line at INFO level for guided distributor: ${TRACE_GUIDED}")
+set(GKFS_USE_GUIDED_DISTRIBUTION_PATH "guided.txt" CACHE STRING "File Path for guided distributor")
+set_property(CACHE GKFS_USE_GUIDED_DISTRIBUTION_PATH PROPERTY STRINGS)
+message(STATUS "[gekkofs] Guided data distributor input file path: ${GKFS_USE_GUIDED_DISTRIBUTION_PATH}")

configure_file(include/global/cmake_configure.hpp.in include/global/cmake_configure.hpp)

+33 −31
# GekkoFS

[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![pipeline status](https://storage.bsc.es/gitlab/hpc/gekkofs/badges/master/pipeline.svg)](https://storage.bsc.es/gitlab/hpc/gekkofs/commits/master)

GekkoFS is a file system capable of aggregating the local I/O capacity and performance of each compute node
in an HPC cluster to produce a high-performance storage space that can be accessed in a distributed manner.
This storage space allows HPC applications and simulations to run in isolation from each other with regards
@@ -161,7 +165,7 @@ Allowed options:
                            available.
  -r [ --rootdir ] arg      Local data directory where GekkoFS data for this
                            daemon is stored.
-  -i [ --metadir ] arg      Metadata directory where GekkoFS' RocksDB data
+  -i [ --metadir ] arg      Metadata directory where GekkoFS RocksDB data
                            directory is located. If not set, rootdir is used.
  -l [ --listen ] arg       Address or interface to bind the daemon to.
                            Default: local hostname.
@@ -223,6 +227,7 @@ The following modules are available:
   module will only be available if the client library is built in `Debug`
   mode.
 - `all`: All previous options combined.
 - `trace_reads`: Generates log lines with extra information on read operations
   for the guided distributor.
 - `help`: Print a help message and exit.
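The log modules listed above are selected at run time. As a hedged sketch (the `LIBGKFS_LOG` variable name is assumed here; `LIBGKFS_LOG_OUTPUT` appears later in this README), enabling the read-tracing module for a client could look like this:

```shell
# Assumed client environment variables: LIBGKFS_LOG selects log modules,
# LIBGKFS_LOG_OUTPUT redirects the log to a file.
export LIBGKFS_LOG=trace_reads
export LIBGKFS_LOG_OUTPUT=${HOME}/test/GLOBAL.txt
echo "$LIBGKFS_LOG"   # sanity check of the configured modules
```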

When tracing system calls, specific syscalls can be removed from log messages by
@@ -251,24 +256,21 @@ The guided distributor distributes chunks using a shared file with the following format:

Chunks that are not specified are distributed using the Simple Hash distributor.
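The exact mapping-file layout is defined by what `generate.py` emits: one space-separated `<path> <chunk> <host-index>` line per chunk. As a minimal sketch (the parser below is illustrative, not part of GekkoFS, and the field layout is inferred from the generator script):

```python
# Hypothetical sketch: parse guided-distributor mapping lines of the
# assumed form "<path> <chunk> <host-index>", as emitted by generate.py.
def parse_mapping(lines):
    mapping = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip malformed lines; unlisted chunks fall back to hashing
        path, chunk, host = parts[0], int(parts[1]), int(parts[2])
        mapping[(path, chunk)] = host
    return mapping

example = [
    "/scratch/input.dat 0 0",
    "/scratch/input.dat 1 2",
]
print(parse_mapping(example))
```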

-To generate such file we need to follow a first execution, using the next compilation options:
-* `TRACE_GUIDED` ON
-* `USE_GUIDED` OFF
+To generate such a file, a first execution with the `trace_reads` log option enabled is needed.

-This will enable a `INFO` level log at the clients offering several lines that can be used to generate the input file.
+This will enable a `TRACE_READS` level log at the clients, producing lines that can be used to generate the input file.
At this stage, each node should generate a separate file; in SLURM this can be done with the following line:
`srun -N 10 -n 320 --export="ALL" /bin/bash -c "export LIBGKFS_LOG_OUTPUT=${HOME}/test/GLOBAL.txt;LD_PRELOAD=${GKFS_PRLD} <app>"`

-Then, use the `utils/generate.py` to create the output file.
-* `python utils/generate.py ~/test/GLOBAL.txt >> guided.txt`
+Then, use `examples/distributors/guided/generate.py` to create the output file:
+* `python examples/distributors/guided/generate.py ~/test/GLOBAL.txt >> guided.txt`

-This should work if the nodes are sorted in alphabetical order, which is the usual scenario.
+This should work if the nodes are sorted in alphabetical order, which is the usual scenario. Users should take special care with multi-server configurations.

Finally, enable the distributor using the following compilation flags:
-* `TRACE_GUIDED` OFF
-* `USE_GUIDED` ON
-* `USE_GUIDED_PATH` `<path to guided.txt>`
+* `GKFS_USE_GUIDED_DISTRIBUTION` ON
+* `GKFS_USE_GUIDED_DISTRIBUTION_PATH` `<full path to guided.txt>`
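Putting the two flags together, a configure step might look like the sketch below (the source and output paths are placeholders; only the two `-D` options come from the list above):

```
cmake -DGKFS_USE_GUIDED_DISTRIBUTION=ON \
      -DGKFS_USE_GUIDED_DISTRIBUTION_PATH=/home/user/guided.txt \
      /path/to/gekkofs
```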



+40 −0
###
#  Copyright 2018-2020, Barcelona Supercomputing Center (BSC), Spain
#  Copyright 2015-2020, Johannes Gutenberg Universitaet Mainz, Germany

#  This software was partially supported by the
#  EC H2020 funded project NEXTGenIO (Project ID: 671951, www.nextgenio.eu).

#  This software was partially supported by the
#  ADA-FS project under the SPPEXA project funded by the DFG.

#  SPDX-License-Identifier: MIT
###

import re
import sys
import collections

# Log file produced by the clients (see LIBGKFS_LOG_OUTPUT in the README)
file = sys.argv[1]

# Matches read-operation trace lines, capturing the issuing client
# identifier (group 2), the host (group 4), the file path (group 6) and
# the chunk range (groups 8 and 10).
pattern = re.compile(r".+(read )(.*)( host: )(\d+).+(path: )(.+),.+(chunk_start: )(\d+).+(chunk_end: )(\d+)")

# First pass: collect the client identifiers, then assign each one an
# index according to its alphabetical position.
d = collections.OrderedDict()

with open(file) as f:
    for line in f:
        result = pattern.match(line)
        if result:
            d[result[2]] = 1

keys = sorted(d.keys())
i = 0
for key in keys:
    d[key] = i
    i = i + 1

# Second pass: emit one "<path> <chunk> <host-index>" line per chunk read.
with open(file) as f:
    for line in f:
        result = pattern.match(line)
        if result:
            for i in range(int(result[8]), int(result[10])+1):
                print(result[6], i, d[result[2]])
+64 −53
@@ -32,7 +32,7 @@
namespace gkfs {
namespace log {

-enum class log_level : short {
+enum class log_level : unsigned int {
    print_syscalls       = 1 << 0,
    print_syscalls_entry = 1 << 1,
    print_info           = 1 << 2,
@@ -42,6 +42,7 @@ enum class log_level : short {
    print_hermes         = 1 << 6,
    print_mercury        = 1 << 7,
    print_debug          = 1 << 8,
    print_trace_reads    = 1 << 9,

    // for internal use
    print_none           = 0,
@@ -105,6 +106,7 @@ static const auto constexpr warning = log_level::print_warnings;
static const auto constexpr hermes           = log_level::print_hermes;
static const auto constexpr mercury          = log_level::print_mercury;
static const auto constexpr debug            = log_level::print_debug;
static const auto constexpr trace_reads      = log_level::print_trace_reads;
static const auto constexpr none             = log_level::print_none;
static const auto constexpr most             = log_level::print_most;
static const auto constexpr all              = log_level::print_all;
@@ -120,7 +122,8 @@ static const auto constexpr level_names =
        "warning",
        "hermes",
        "mercury",
-        "debug"
+        "debug",
+        "trace_reads"
);

inline constexpr auto
@@ -478,6 +481,7 @@ static_buffer::grow(std::size_t size) {
#define LOG_MERCURY(...) do {} while(0);
#define LOG_SYSCALL(...) do {} while(0);
#define LOG_DEBUG(...) do {} while(0);
#define LOG_TRACE_READS(...) do {} while(0);

#else // !GKFS_ENABLE_LOGGING

@@ -523,6 +527,13 @@ static_buffer::grow(std::size_t size) {
    }                                                                   \
} while(0);

#define LOG_TRACE_READS(...) do {                                             \
    if(gkfs::log::get_global_logger()) {                                \
        gkfs::log::get_global_logger()->log(                            \
                gkfs::log::trace_reads, __func__, __LINE__, __VA_ARGS__);     \
    }                                                                   \
} while(0);
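As a cross-check of the bitmask arithmetic in the `log_level` enum above, here is a small Python sketch (names mirror the C++ flags; this is an illustration, not part of the client):

```python
# Mirror of the client's log_level bit flags (illustrative only).
PRINT_SYSCALLS       = 1 << 0
PRINT_SYSCALLS_ENTRY = 1 << 1
PRINT_INFO           = 1 << 2
PRINT_HERMES         = 1 << 6
PRINT_MERCURY        = 1 << 7
PRINT_DEBUG          = 1 << 8
PRINT_TRACE_READS    = 1 << 9  # the flag added in this commit
PRINT_NONE           = 0

# Enabling two modules combines their bits; a message is emitted when
# its level's bit is present in the active mask.
mask = PRINT_DEBUG | PRINT_TRACE_READS
assert mask & PRINT_TRACE_READS
assert not (mask & PRINT_MERCURY)
print(hex(mask))  # → 0x300
```

Widening the enum's underlying type from `short` to `unsigned int` makes room for flags beyond bit 9 without signed-shift concerns.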

#ifdef GKFS_DEBUG_BUILD

#define LOG_SYSCALL(...) do {                                           \