# GekkoFS

[![License: GPL3](https://img.shields.io/badge/License-GPL3-blue.svg)](https://opensource.org/licenses/GPL-3.0)
[![pipeline status](https://storage.bsc.es/gitlab/hpc/gekkofs/badges/master/pipeline.svg)](https://storage.bsc.es/gitlab/hpc/gekkofs/commits/master)
[![coverage report](https://storage.bsc.es/gitlab/hpc/gekkofs/badges/master/coverage.svg)](https://storage.bsc.es/gitlab/hpc/gekkofs/-/commits/master)

GekkoFS is a file system capable of aggregating the local I/O capacity and performance of each compute node
in an HPC cluster to produce a high-performance storage space that can be accessed in a distributed manner.
This storage space allows HPC applications and simulations to run in isolation from each other with regard
to I/O, which reduces interference and improves performance.

# Dependencies

- GCC version 4.8 or later (for C++11 support)
- CMake >3.6 (>3.11 for GekkoFS testing)
## Debian/Ubuntu - Dependencies
- snappy: `sudo apt-get install libsnappy-dev`
- zlib: `sudo apt-get install zlib1g-dev`
- bzip2: `sudo apt-get install libbz2-dev`
- zstandard: `sudo apt-get install libzstd-dev`
- lz4: `sudo apt-get install liblz4-dev`
- uuid: `sudo apt-get install uuid-dev`
- capstone: `sudo apt-get install libcapstone-dev`
## CentOS/Red Hat - Dependencies

- snappy: `sudo yum install snappy snappy-devel`
- zlib: `sudo yum install zlib zlib-devel`
- bzip2: `sudo yum install bzip2 bzip2-devel`
- zstandard: build and install from source:
```bash
   wget https://github.com/facebook/zstd/archive/v1.1.3.tar.gz
   mv v1.1.3.tar.gz zstd-1.1.3.tar.gz
   tar zxvf zstd-1.1.3.tar.gz
   cd zstd-1.1.3
   make && sudo make install
```
- lz4: `sudo yum install lz4 lz4-devel`
- uuid: `sudo yum install libuuid-devel`
- capstone: `sudo yum install capstone capstone-devel`

## Clone and compile direct GekkoFS dependencies
- Go to the `scripts` folder and first download the sources of all dependencies. You can choose the network (NA) plugin to use
(run the script with `-h` for help):
```bash
usage: dl_dep.sh [-h] [-l] [-n <NAPLUGIN>] [-c <CONFIG>] [-d <DEPENDENCY>]

This script gets all GekkoFS dependency sources (excluding the fs itself)

positional arguments:
        source_path              path where the dependency downloads are put

optional arguments:
        -h, --help              shows this help message and exits
        -l, --list-dependencies
                                list dependencies available for download with descriptions
        -n <NAPLUGIN>, --na <NAPLUGIN>
                                network layer that is used for communication. Valid: {bmi,ofi,all}
                                defaults to 'ofi'
        -c <CONFIG>, --config <CONFIG>
                                allows additional configurations, e.g., for specific clusters
                                supported values: {mogon2, mogon1, ngio, direct, all}
                                defaults to 'direct'
        -d <DEPENDENCY>, --dependency <DEPENDENCY>
                                download a specific dependency and ignore --config setting. If unspecified
                                all dependencies are built and installed based on set --config setting.
                                Multiple dependencies are supported: Pass a space-separated string (e.g., "ofi mercury"
        -v, --verbose           Increase download verbosity
```

- Now use the install script to compile the dependencies and install them to the desired directory. You can again choose the
network (NA) plugin (run the script with `-h` for help):
```bash
usage: compile_dep.sh [-h] [-l] [-n <NAPLUGIN>] [-c <CONFIG>] [-d <DEPENDENCY>] [-j <COMPILE_CORES>]
                      source_path install_path

This script compiles all GekkoFS dependencies (excluding the fs itself)

positional arguments:
    source_path         path to the cloned dependencies path from clone_dep.sh
    install_path    path to the install path of the compiled dependencies

optional arguments:
    -h, --help  shows this help message and exits
    -l, --list-dependencies
                list dependencies available for building and installation
    -n <NAPLUGIN>, --na <NAPLUGIN>
                network layer that is used for communication. Valid: {bmi,ofi,all}
    -c <CONFIG>, --config <CONFIG>
                allows additional configurations, e.g., for specific clusters
                supported values: {mogon1, mogon2, ngio, direct, all}
                defaults to 'direct'
    -d <DEPENDENCY>, --dependency <DEPENDENCY>
                download a specific dependency and ignore --config setting. If unspecified
                all dependencies are built and installed based on set --config setting.
                Multiple dependencies are supported: Pass a space-separated string (e.g., "ofi mercury"
    -j <COMPILE_CORES>, --compilecores <COMPILE_CORES>
                number of cores that are used to compile the dependencies
                defaults to number of available cores
    -t, --test  Perform libraries tests.
```

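
The two steps above can be chained in a small wrapper. The following is only a sketch: the `DEPS_SRC` and `DEPS_INSTALL` paths, the `run` helper, and the `DRY_RUN` switch are illustrative names that are not part of GekkoFS, and the value `4` for `-j` is arbitrary. Only options listed in the usage texts above are used. By default the script just prints the commands it would execute.

```bash
# Sketch only: run from the `scripts` folder of the GekkoFS source tree.
# DEPS_SRC and DEPS_INSTALL are hypothetical paths; adjust them to your system.
set -e

DEPS_SRC="${DEPS_SRC:-$HOME/gekkofs_deps/git}"              # download location
DEPS_INSTALL="${DEPS_INSTALL:-$HOME/gekkofs_deps/install}"  # install location
NA_PLUGIN="${NA_PLUGIN:-ofi}"                               # bmi, ofi, or all
DRY_RUN="${DRY_RUN:-1}"                                     # 1: print only, 0: execute

run() {
    # Always show the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

run ./dl_dep.sh -n "$NA_PLUGIN" "$DEPS_SRC"
run ./compile_dep.sh -n "$NA_PLUGIN" -j 4 "$DEPS_SRC" "$DEPS_INSTALL"
```

Review the printed commands first, then rerun with `DRY_RUN=0` to perform the download and build.
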
## Compile GekkoFS

If the dependencies above were installed outside the usual system paths, use CMake's `-DCMAKE_PREFIX_PATH` option to
make that location known to CMake.
```bash
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make
```

To build the self-tests, enable the *optional* `GKFS_BUILD_TESTS` CMake option when configuring. Once the build
has finished, run the tests with `make test` in the `build` directory:

```bash
mkdir build && cd build
cmake -DGKFS_BUILD_TESTS=ON -DCMAKE_BUILD_TYPE=Release ..
make
make test
```

**IMPORTANT:** The testing framework requires Python 3.6 and CMake >3.11 as
additional dependencies.
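
As an illustration of combining `-DCMAKE_PREFIX_PATH` with the options above, the sketch below wraps the configure step in a shell function. `DEPS_INSTALL`, `BUILD_TESTS`, and `configure_gkfs` are hypothetical names chosen for this example; `DEPS_INSTALL` is assumed to be the `install_path` that was given to `compile_dep.sh`.

```bash
# Sketch only. DEPS_INSTALL is a hypothetical path: the install_path that was
# passed to compile_dep.sh.
DEPS_INSTALL="${DEPS_INSTALL:-$HOME/gekkofs_deps/install}"
BUILD_TESTS="${BUILD_TESTS:-OFF}"   # ON also builds the self-tests

# configure_gkfs <build_dir>
# Call from the top of the GekkoFS source tree. <build_dir> must be a direct
# subdirectory of it (e.g. "build"), because CMake is pointed at "..".
configure_gkfs() {
    mkdir -p "$1"
    (
        cd "$1" &&
        cmake -DCMAKE_BUILD_TYPE=Release \
              "-DCMAKE_PREFIX_PATH=$DEPS_INSTALL" \
              "-DGKFS_BUILD_TESTS=$BUILD_TESTS" \
              ..
    )
}

# Example:
#   configure_gkfs build && make -C build
```

The subshell keeps the caller's working directory unchanged, so the function can be called repeatedly from the source root.
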
## Run GekkoFS
On each node a daemon (`gkfs_daemon` binary) has to be started. Other tools can be used to execute
the binary on many nodes, e.g., `srun`, `mpiexec/mpirun`, `pdsh`, or `pssh`.

Decide which Mercury NA plugin to use for network communication; `ofi+sockets` is the default.
Use the `-P` argument to select another RPC protocol from the list below:

 - `ofi+sockets` for using the libfabric plugin with TCP (stable)
 - `ofi+tcp` for using the libfabric plugin with TCP (slower than sockets)
 - `ofi+verbs` for using the libfabric plugin with InfiniBand verbs (reasonably stable)
 - `ofi+psm2` for using the libfabric plugin with Intel Omni-Path (unstable)
 - `bmi+tcp` for using the bmi plugin with TCP (alternative to libfabric)
### Start and shut down daemon directly
`./build/src/daemon/gkfs_daemon -r <fs_data_path> -m <pseudo_mount_dir_path>`

Further options:
```bash
Allowed options:
  -h [ --help ]             Help message
  -m [ --mountdir ] arg     Virtual mounting directory where GekkoFS is
                            available.
  -r [ --rootdir ] arg      Local data directory where GekkoFS data for this
                            daemon is stored.
  -i [ --metadir ] arg      Metadata directory where GekkoFS RocksDB data
                            directory is located. If not set, rootdir is used.
  -l [ --listen ] arg       Address or interface to bind the daemon to.
                            Default: local hostname.
                            When used with ofi+verbs the FI_VERBS_IFACE
                            environment variable is set accordingly which
                            associates the verbs device with the network
                            interface. In case FI_VERBS_IFACE is already
                            defined, the argument is ignored. Default 'ib'.
  -H [ --hosts-file ] arg   Shared file used by daemons to register their
                            endpoints. (default './gkfs_hosts.txt')
  -P [ --rpc-protocol ] arg Used RPC protocol for inter-node communication.
                            Available: {ofi+sockets, ofi+verbs, ofi+psm2} for
                            TCP, InfiniBand, and Omni-Path, respectively.
                            (Default ofi+sockets)
                            Libfabric must have verbs or psm2 support enabled.
  --auto-sm                 Enables intra-node communication (IPCs) via the
                            `na+sm` (shared memory) protocol, instead of using
                            the RPC protocol. (Default off)
  --version                 Print version and exit.
```
Shut it down by gracefully killing the process (SIGTERM).
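
A minimal sketch of this start and stop cycle for a single node follows. The helper names and the three paths are hypothetical placeholders; only the `-r`, `-m`, and `-P` options listed above are used, and creating the directories beforehand is a precaution rather than a documented requirement.

```bash
# Sketch only. The three paths are hypothetical placeholders.
GKFS_DAEMON="${GKFS_DAEMON:-./build/src/daemon/gkfs_daemon}"
DATA_DIR="${DATA_DIR:-/tmp/gkfs_rootdir}"      # passed as -r
MOUNT_DIR="${MOUNT_DIR:-/tmp/gkfs_mountdir}"   # passed as -m

start_daemon() {
    # Creating the directories up front is a precaution, not a documented requirement.
    mkdir -p "$DATA_DIR" "$MOUNT_DIR"
    "$GKFS_DAEMON" -r "$DATA_DIR" -m "$MOUNT_DIR" -P ofi+sockets &
    DAEMON_PID=$!
}

stop_daemon() {
    # SIGTERM requests a graceful shutdown; wait then reaps the background job.
    kill -TERM "$1"
    wait "$1" 2>/dev/null || true
}

# Example:
#   start_daemon
#   ... run the application ...
#   stop_daemon "$DAEMON_PID"
```

`stop_daemon` must be called from the same shell that started the daemon, since `wait` only works on that shell's own child processes.
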

## Miscellaneous

Metadata and actual data are stored in `<fs_data_path>`. The path that the application works on is set with
`<pseudo_mount_dir_path>`.

Run the application with the preload library: `LD_PRELOAD=<path>/build/lib/libgkfs_intercept.so ./application`. For
an MPI application, use the `{mpirun, mpiexec} -x` argument.

The following environment variables can be used to enable logging in the client
library: `LIBGKFS_LOG=<module>` selects the output module and
`LIBGKFS_LOG_OUTPUT=<path/to/file>` sets the path to the log file of the client
library. If no path is specified in `LIBGKFS_LOG_OUTPUT`, the client library
sends log messages to `/tmp/gkfs_client.log`.

The following modules are available:

 - `none`: don't print any messages
 - `syscalls`: Trace system calls: print the name of each system call, its
   arguments, and its return value. All system calls are printed after being
   executed save for those that may not return, such as `execve()`,
   `execveat()`, `exit()`, and `exit_group()`. This module will only be
   available if the client library is built in `Debug` mode.
 - `syscalls_at_entry`: Trace system calls: print the name of each system call
   and its arguments. All system calls are printed before being executed and
   therefore their return values are not available in the log. This module will
   only be available if the client library is built in `Debug` mode.
 - `info`: Print information messages.
 - `critical`: Print critical errors.
 - `errors`: Print errors.
 - `warnings`: Print warnings.
 - `mercury`: Print Mercury messages.
 - `debug`: Print debug messages.  This module will only be available if the
   client library is built in `Debug` mode.
 - `most`: All previous options combined except `syscalls_at_entry`. This
   module will only be available if the client library is built in `Debug`
   mode.
 - `all`: All previous options combined.
 - `trace_reads`: Generate log lines with extra information about read operations, for use with the guided distributor.
 - `help`: Print a help message and exit.

When tracing system calls, specific syscalls can be removed from log messages by
setting the `LIBGKFS_LOG_SYSCALL_FILTER` environment variable. For instance,
setting it to `LIBGKFS_LOG_SYSCALL_FILTER=epoll_wait,epoll_create` will filter
out any log entries from the `epoll_wait()` and `epoll_create()` system calls.

Additionally, setting the `LIBGKFS_LOG_OUTPUT_TRUNC` environment variable with
a value different from `0` will instruct the logging subsystem to truncate
the file used for logging, rather than append to it.

For the daemon, the `GKFS_DAEMON_LOG_PATH=<path/to/file>` environment variable
can be provided to set the path to the log file, and the log level can be
selected with the `GKFS_LOG_LEVEL={off,critical,err,warn,info,debug,trace}`
environment variable.
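
The client-side variables can be set for a single command without changing the calling shell. The wrapper below is a sketch under stated assumptions: `run_with_gkfs`, `GKFS_PRELOAD`, and the log file name are hypothetical, and the `errors` module is just one of the modules listed above.

```bash
# Sketch only. GKFS_PRELOAD is a hypothetical location of the client library.
GKFS_PRELOAD="${GKFS_PRELOAD:-$HOME/gekkofs/build/lib/libgkfs_intercept.so}"

# run_with_gkfs <command> [args...]
# Runs the command with the client library preloaded and error logging enabled.
run_with_gkfs() {
    # Without a command, the assignments below would leak into this shell.
    [ "$#" -gt 0 ] || return 1
    LIBGKFS_LOG=errors \
    LIBGKFS_LOG_OUTPUT=/tmp/my_app_gkfs.log \
    LIBGKFS_LOG_OUTPUT_TRUNC=1 \
    LD_PRELOAD="$GKFS_PRELOAD" \
    "$@"
}

# Example:
#   run_with_gkfs ./application
```

Because the assignments prefix the command itself, they apply only to the launched process, so other programs started later from the same shell are not intercepted.
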

### External functions

GekkoFS allows external functions to be used in client code via `LD_PRELOAD`.
The source code needs to be compiled with `-fPIC`. We include a substitute for the IO500 pfind,
`examples/gfind/gfind.cpp`, and a non-MPI version, `examples/gfind/sfind.cpp`.

### Data distributors
The data distributor is selected at compile time. Two distributors are available:

#### Simple Hash (Default)
Chunks are distributed randomly to the different GekkoFS servers.

#### Guided Distributor
The guided distributor places chunks according to a shared file with the following format:
`<path> <chunk_number> <host>`

Moreover, if you prefix a path with `#`, all the data from that path goes to the same place as its metadata.
Specifically defined paths (without `#`) take priority.

For example, given the entry `#/mdt-hard 0 0`, GekkoFS stores the data and metadata of `/mdt-hard` on the same server.
The server is still chosen randomly (`0 0` has no meaning yet).
 
Chunks that are not specified are distributed using the Simple Hash distributor.
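
To make the file format concrete, the sketch below writes a small file by hand and checks that every entry has three fields with a numeric chunk number. The paths in the entries are invented, `check_guided_file` is a hypothetical helper, and writing `<host>` as a numeric index is an assumption; compare with the output of `generate.py` before relying on it.

```bash
# Sketch only: entries are invented, and a numeric <host> index is an assumption.
GUIDED_FILE="${GUIDED_FILE:-guided.txt}"

# One entry per line: <path> <chunk_number> <host>
cat > "$GUIDED_FILE" <<'EOF'
/data/input.bin 0 0
/data/input.bin 1 1
/data/input.bin 2 0
#/mdt-hard 0 0
EOF

# check_guided_file <file>
# Reports lines that do not have three fields or whose chunk number is not
# a non-negative integer. Returns non-zero if any such line is found.
check_guided_file() {
    awk 'NF != 3 || $2 !~ /^[0-9]+$/ {
             printf "line %d: bad entry: %s\n", NR, $0
             bad = 1
         }
         END { exit (bad ? 1 : 0) }' "$1"
}

check_guided_file "$GUIDED_FILE" && echo "$GUIDED_FILE: all entries well-formed"
```

Chunks of `/data/input.bin` beyond the three listed here would fall back to the Simple Hash distributor.
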

To generate such a file, first run the application once with the `trace_reads` log option.

This enables a `TRACE_READS` level log at the clients, which produces lines that can be used to generate the input file.
At this stage, each node should generate a separate file. With SLURM, this can be done as follows:
`srun -N 10 -n 320 --export="ALL" /bin/bash -c "export LIBGKFS_LOG_OUTPUT=${HOME}/test/GLOBAL.txt;LD_PRELOAD=${GKFS_PRLD} <app>"`

Then use `examples/distributors/guided/generate.py` to create the output file:
* `python examples/distributors/guided/generate.py ~/test/GLOBAL.txt >> guided.txt`

This should work if the nodes are sorted in alphabetical order, which is the usual scenario. Users should take care with multi-server configurations.

Finally, enable the distributor using the following compilation flags:
* `GKFS_USE_GUIDED_DISTRIBUTION` ON
* `GKFS_USE_GUIDED_DISTRIBUTION_PATH` `<full path to guided.txt>`
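
Passed to CMake, the two flags could look like the sketch below. `GUIDED_FILE` and `configure_guided` are hypothetical names for this example; the function refuses relative paths because the flag expects the full path to `guided.txt`.

```bash
# Sketch only. GUIDED_FILE is a hypothetical location of the generated file.
GUIDED_FILE="${GUIDED_FILE:-$HOME/test/guided.txt}"

# configure_guided <build_dir>
# Call from the top of the GekkoFS source tree. <build_dir> must be a direct
# subdirectory of it (e.g. "build"), because CMake is pointed at "..".
configure_guided() {
    case "$GUIDED_FILE" in
        /*) ;;   # full path, as the flag requires
        *)  echo "GUIDED_FILE must be a full path: $GUIDED_FILE" >&2
            return 1 ;;
    esac
    mkdir -p "$1"
    (
        cd "$1" &&
        cmake -DCMAKE_BUILD_TYPE=Release \
              -DGKFS_USE_GUIDED_DISTRIBUTION=ON \
              "-DGKFS_USE_GUIDED_DISTRIBUTION_PATH=$GUIDED_FILE" \
              ..
    )
}

# Example:
#   configure_guided build && make -C build
```

Since the distributor is chosen at compile time, GekkoFS has to be rebuilt after these flags change.
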

### Acknowledgment

This software was partially supported by the EC H2020 funded project NEXTGenIO (Project ID: 671951, www.nextgenio.eu).

This software was partially supported by the ADA-FS project under the SPPEXA project funded by the DFG.