Commit 2730a9ce authored by Marc Vef

Merge branch 'marc/58-support-spack-and-others' into 'master'

Resolve "Support Spack and others"

### Usage information

Download Spack and set up the environment:
```bash
git clone -c feature.manyFiles=true https://github.com/spack/spack.git
. spack/share/spack/setup-env.sh
```
Add GekkoFS Spack repository to Spack:
```bash
spack repo add gekkofs/scripts/spack
```
Check that Spack can find GekkoFS:
```bash
spack info gekkofs
```
Install GekkoFS (and run optional tests). Check `spack info gekkofs` for available options and versions:
```bash
spack install gekkofs
# for installing tests dependencies and running tests
spack install -v --test=root gekkofs +tests
```
Load GekkoFS into environment:
```bash
spack load gekkofs
```
If you want to use the latest developer branch of GekkoFS:
```bash
spack install gekkofs@latest
```
The default is version 0.9.1, the latest stable release.

### TODO

- [x] Base Spack functionality, versions, and configuration support
- [x] Documentation
- [x] Advanced functionality, more detailed configuration support, e.g., Parallax and Prometheus
- [x] Easier way to get the path to the client library
- [x] Add GekkoFS client wrapper for `LD_PRELOAD`
- [ ] Add final version to main Spack repository if possible. (Not possible right now, as it is not clear how 3rd party libraries should be treated.)

Closes #58

See merge request !137
parents ba81942d 3f59f7ad
+2 −0
@@ -38,6 +38,8 @@ to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
    when building to precisely see how a GekkoFS instance has been configured.
- Added (parallel) append support for consecutive writes with file descriptor opened
  with `O_APPEND` ([!164](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/164)).
- Added support for Spack so that it can be used to install
  GekkoFS ([!137](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/137)).

### Changed

+68 −46
@@ -16,12 +16,15 @@ to I/O, which reduces interferences and improves performance.
- Miscellaneous: Libtool, Libconfig

### Debian/Ubuntu

GekkoFS base dependencies: `apt install git curl cmake autoconf automake libtool libconfig-dev`

GekkoFS testing support: `apt install python3-dev python3 python3-venv`

With testing

### CentOS/Red Hat

GekkoFS base dependencies: `yum install gcc-c++ git curl cmake autoconf automake libtool libconfig`

GekkoFS testing support: `python38-devel` (**>Python-3.6 required**)
@@ -42,15 +45,22 @@ GekkoFS testing support: `python38-devel` (**>Python-3.6 required**)
5. Compile GekkoFS and run optional tests
    - Create build directory: `mkdir gekkofs/build && cd gekkofs/build`
    - Configure GekkoFS: `cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH=/home/foo/gekkofs_deps/install ..`
       - add `-DCMAKE_INSTALL_PREFIX=<install_path>` where the GekkoFS client library and server executable should be available 
        - add `-DCMAKE_INSTALL_PREFIX=<install_path>` where the GekkoFS client library and server executable should be
          available
        - add `-DGKFS_BUILD_TESTS=ON` if tests should be built
    - Build and install GekkoFS: `make -j8 install`
    - Run tests: `make test`

GekkoFS is now available at:

- GekkoFS daemon (server): `<install_path>/bin/gkfs_daemon`
- GekkoFS client interception library: `<install_path>/lib64/libgkfs_intercept.so`

## Use Spack to install GekkoFS (alternative)

The Spack tool can be used to easily install GekkoFS and its dependencies. Refer to the
following [README](scripts/spack/README.md) for details.

# Run GekkoFS

## General
@@ -70,9 +80,11 @@ The `-P` argument is used for setting another RPC protocol. See below.

## The GekkoFS hostsfile

Each GekkoFS daemon needs to register itself in a shared file (*hostsfile*) which needs to be accessible to _all_ GekkoFS clients and daemons.
Each GekkoFS daemon needs to register itself in a shared file (*hostsfile*) which needs to be accessible to _all_
GekkoFS clients and daemons.
Therefore, the hostsfile describes a file system and which node is part of that specific GekkoFS file system instance.
In a typical cluster environment this hostsfile should be placed within a POSIX-compliant parallel file system, such as GPFS or Lustre.
In a typical cluster environment this hostsfile should be placed within a POSIX-compliant parallel file system, such as
GPFS or Lustre.

*Note: NFS is not strongly consistent and cannot be used for the hosts file!*

@@ -80,7 +92,9 @@ In a typical cluster environment this hostsfile should be placed within a POSIX-

tl;dr example: `<install_path>/bin/gkfs_daemon -r <fs_data_path> -m <pseudo_gkfs_mount_dir_path> -H <hostsfile_path>`

Run the GekkoFS daemon on each node specifying its locally used directory where the file system data and metadata is stored (`-r/--rootdir <fs_data_path>`), e.g., the node-local SSD;
Run the GekkoFS daemon on each node specifying its locally used directory where the file system data and metadata is
stored (`-r/--rootdir <fs_data_path>`), e.g., the node-local SSD;

2. the pseudo mount directory used by clients to access GekkoFS (`-m/--mountdir <pseudo_gkfs_mount_dir_path>`); and
3. the hostsfile path (`-H/--hostsfile <hostfile_path>`).

@@ -235,22 +249,27 @@ Source code needs to be compiled with -fPIC. We include a pfind io500 substituti
`examples/gfind/gfind.cpp` and a non-mpi version `examples/gfind/sfind.cpp`

## Data distributors

The data distribution can be selected at compilation time. Two distributors are available:

### Simple Hash (Default)

Chunks are distributed randomly to the different GekkoFS servers.

### Guided Distributor

The guided distributor allows defining a specific distribution of data on a per directory or file basis.
The distribution configurations are defined within a shared file (called `guided_config.txt` henceforth) with the following format:
The distribution configurations are defined within a shared file (called `guided_config.txt` henceforth) with the
following format:
`<path> <chunk_number> <host>`

To enable the distributor, the following CMake compilation flags are required:

* `GKFS_USE_GUIDED_DISTRIBUTION` ON
* `GKFS_USE_GUIDED_DISTRIBUTION_PATH` `<path_guided_config.txt>`

To use a custom distribution, a path needs to have the prefix `#` (e.g., `#/mdt-hard 0 0`), in which all the data of all files in that directory goes to the same place as the metadata.
To use a custom distribution, a path needs to have the prefix `#` (e.g., `#/mdt-hard 0 0`), in which all the data of all
files in that directory goes to the same place as the metadata.
Note that a chunk/host configuration is inherited by all child files automatically, even if they do not use the prefix.
In this example, `/mdt-hard/file1` is therefore also using the same distribution as the `/mdt-hard` directory.
If no prefix is used, the Simple Hash distributor is used.
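
As a minimal sketch of the format described above, the following snippet writes a single entry to a guided configuration file. The `guided_entry` helper and the file location are hypothetical and only serve as illustration; the entry itself is the `#/mdt-hard 0 0` example from the text.

```bash
# Hypothetical helper (not part of GekkoFS): print one entry in the
# "<path> <chunk_number> <host>" format of the guided distributor.
guided_entry() {
    printf '%s %s %s\n' "$1" "$2" "$3"
}

# Assumed location for the configuration file; adjust as needed.
config="${TMPDIR:-/tmp}/guided_config.txt"

# The '#' prefix marks a path that uses a custom distribution.
guided_entry '#/mdt-hard' 0 0 > "$config"
cat "$config"
```

The resulting file path would then be passed at configure time via `-DGKFS_USE_GUIDED_DISTRIBUTION=ON` and `-DGKFS_USE_GUIDED_DISTRIBUTION_PATH=<path_guided_config.txt>`.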
@@ -260,12 +279,15 @@ If no prefix is used, the Simple Hash distributor is used.
Creating a guided configuration file is based on an I/O trace file of a previous execution of the application.
For this the `trace_reads` tracing module is used (see above).

The `trace_reads` module enables a `TRACE_READS` level log at the clients writing the I/O information of the client which is used as the input for a script that creates the guided distributor setting.
The `trace_reads` module enables a `TRACE_READS` level log at the clients writing the I/O information of the client
which is used as the input for a script that creates the guided distributor setting.
Note that capturing the necessary trace records can involve performance degradation.
To capture the I/O of each client within a SLURM environment, i.e., enabling the `trace_reads` module and print its output to a user-defined path, the following example can be used:
To capture the I/O of each client within a SLURM environment, i.e., enabling the `trace_reads` module and print its
output to a user-defined path, the following example can be used:
`srun -N 10 -n 320 --export="ALL" /bin/bash -c "export LIBGKFS_LOG=trace_reads;LIBGKFS_LOG_OUTPUT=${HOME}/test/GLOBAL.txt;LD_PRELOAD=${GKFS_PRLD} <app>"`

Then, the `examples/distributors/guided/generate.py` script is used to create the guided distributor configuration file:

* `python examples/distributors/guided/generate.py ~/test/GLOBAL.txt >> guided_config.txt`

Finally, modify `guided_config.txt` to your distribution requirements.
+93 −3
@@ -108,9 +108,10 @@ dependencies:
- `AGIOS <https://github.com/francielizanon/agios>`_ (commit c26a654 or
  newer) to enable the :code:`GekkoFWD` I/O forwarding mode.

- `PARALLAX` There are two different metadata backends in GekkoFS. The default one uses `rocksdb`, however an alternative based on `PARALLAX` from `FORTH` 
is available. To enable it, use the `-DGKFS_ENABLE_PARALLAX:BOOL=ON` option, you can also disable `rocksdb` with `-DGKFS_ENABLE_ROCKSDB:BOOL=OFF`.
  Once it is enabled, `--dbbackend` option will be functional.
- :code:`PARALLAX` There are two different metadata backends in GekkoFS. The default one uses :code:`rocksdb`, however
an alternative based on :code:`PARALLAX` from :code:`FORTH` is available. To enable it, use the
:code:`-DGKFS_ENABLE_PARALLAX:BOOL=ON` option, you can also disable :code:`rocksdb` with
:code:`-DGKFS_ENABLE_ROCKSDB:BOOL=OFF`. Once it is enabled, :code:`--dbbackend` option will be functional.


.. _step_by_step_installation:
@@ -213,3 +214,92 @@ appropriate subdirectories of :code:`GKFS_INSTALL_PATH`:

- GekkoFS daemon (server): :code:`${GKFS_INSTALL_PATH}/bin/gkfs_daemon`
- GekkoFS client interception library: :code:`${GKFS_INSTALL_PATH}/lib/libgkfs_intercept.so`

Install GekkoFS via Spack (alternative)
---------------------------------------

Spack is a package manager for supercomputers and Linux. It makes it easy for regular users to install scientific
software. Spack is another method to install GekkoFS: it handles all dependencies and sets up the environment.

Install Spack
=========================

First, install Spack. You can find the instructions here: https://spack.readthedocs.io/en/latest/getting_started.html

    .. code-block:: console

        git clone https://github.com/spack/spack.git
        . spack/share/spack/setup-env.sh

.. attention::
    Note that the second line needs to be executed every time you open a new terminal. It sets up the environment for Spack
    and the corresponding environment variables, e.g., :code:`$PATH`.

Install GekkoFS with Spack
==========================

To install GekkoFS with Spack, the GekkoFS repository needs to be added to Spack as it is not part of the official Spack
repository.

    .. code-block:: console

        spack repo add gekkofs/scripts/spack

When added, the GekkoFS package is available. Its installation variants and options can be checked via:

    .. code-block:: console

        spack info gekkofs

Then install GekkoFS with Spack:

    .. code-block:: console

        spack install gekkofs
        # for installing tests dependencies and running tests
        spack install -v --test=root gekkofs

Finally, GekkoFS is loaded into the currently used environment:

    .. code-block:: console

        spack load gekkofs

This installs the latest release version including its required Git submodules. The installation directory is
:code:`$SPACK_ROOT/opt/spack/linux-<arch>/<compiler>/<version>/gekkofs-<version>`. The GekkoFS daemon (:code:`gkfs_daemon`) is
located in the :code:`bin` directory and the GekkoFS client (:code:`libgkfs_intercept.so`) is located in the :code:`lib` directory.

Note that loading the environment adds the GekkoFS daemon to the :code:`$PATH` environment variable. Therefore, the GekkoFS
daemon is started by running :code:`gkfs_daemon`. Loading GekkoFS in Spack further provides the :code:`$GKFS_CLIENT` environment
variable pointing to the interception library.

Therefore, the following commands can be run to use GekkoFS:

        .. code-block:: console

            # Consult `-h` or the Readme for further options
            gkfs_daemon -r /tmp/gkfs_rootdir -m /tmp/gkfs_mountdir &
            LD_PRELOAD=$GKFS_CLIENT ls -l /tmp/gkfs_mountdir
            LD_PRELOAD=$GKFS_CLIENT touch /tmp/gkfs_mountdir/foo
            LD_PRELOAD=$GKFS_CLIENT ls -l /tmp/gkfs_mountdir

When done using GekkoFS, unload it from the environment:

    .. code-block:: console

        spack unload gekkofs

Miscellaneous
=========================

Use GekkoFS's latest version (master branch) with Spack:

        .. code-block:: console

            spack install gekkofs@latest

Use a specific compiler on your system, e.g., gcc-11.2.0:

        .. code-block:: console

            spack install gekkofs@latest%gcc@11.2.0
+117 −0
## Spack

Spack is a package manager for supercomputers and Linux. It makes it easy for regular users to install scientific
software.
Spack is another method to install GekkoFS: it handles all dependencies and sets up the environment.

### Install Spack

First, install Spack. You can find the instructions [here](https://spack.readthedocs.io/en/latest/getting_started.html).

```bash
git clone https://github.com/spack/spack.git
. spack/share/spack/setup-env.sh
```

Note that the second line needs to be executed every time you open a new terminal. It sets up the environment for Spack
and the corresponding environment variables, e.g., $PATH.

### Install GekkoFS with Spack

To install GekkoFS with Spack, the GekkoFS repository needs to be added to Spack as it is not part of the official Spack
repository.

```bash
spack repo add gekkofs/scripts/spack
```

When added, the GekkoFS package is available. Its installation variants and options can be checked via:

```bash
spack info gekkofs
```

Then install GekkoFS with Spack:

```bash
spack install gekkofs
# for installing tests dependencies and running tests
spack install -v --test=root gekkofs
```

Finally, GekkoFS is loaded into the currently used environment:

```bash
spack load gekkofs
```

This installs the latest release version including its required Git submodules. The installation directory is
`$SPACK_ROOT/opt/spack/linux-<arch>/<compiler>/<version>/gekkofs-<version>`. The GekkoFS daemon (`gkfs_daemon`) is
located in the `bin` directory and the GekkoFS client (`libgkfs_intercept.so`) is located in the `lib` directory.

Note that loading the environment adds the GekkoFS daemon to the `$PATH` environment variable. Therefore, the GekkoFS
daemon is started by running `gkfs_daemon`. Loading GekkoFS in Spack further provides the `$GKFS_CLIENT` environment
variable pointing to the interception library.

Therefore, the following commands can be run to use GekkoFS:

```bash
# Consult `-h` or the Readme for further options
gkfs_daemon -r /tmp/gkfs_rootdir -m /tmp/gkfs_mountdir &
LD_PRELOAD=$GKFS_CLIENT ls -l /tmp/gkfs_mountdir
LD_PRELOAD=$GKFS_CLIENT touch /tmp/gkfs_mountdir/foo
LD_PRELOAD=$GKFS_CLIENT ls -l /tmp/gkfs_mountdir
```
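
Because every client command needs `LD_PRELOAD=$GKFS_CLIENT`, a small shell function can reduce typing and catch the case where GekkoFS has not been loaded yet. The following is a minimal sketch only: `gkfs_run` is a hypothetical name chosen for illustration and is not the client wrapper shipped with GekkoFS.

```bash
# Hypothetical wrapper (not part of GekkoFS): run a command with the
# GekkoFS interception library preloaded. It fails early if the
# GKFS_CLIENT variable provided by `spack load gekkofs` is unset or
# does not point to an existing file.
gkfs_run() {
    if [ -z "${GKFS_CLIENT:-}" ] || [ ! -f "$GKFS_CLIENT" ]; then
        echo "GKFS_CLIENT is unset or not a file; run 'spack load gekkofs' first" >&2
        return 1
    fi
    LD_PRELOAD="$GKFS_CLIENT" "$@"
}

# Example usage, assuming a daemon is already running:
#   gkfs_run ls -l /tmp/gkfs_mountdir
```

Setting `LD_PRELOAD` only for the wrapped command, rather than exporting it, keeps the interception library out of unrelated processes in the same shell.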

When done using GekkoFS, unload it from the environment:

```bash
spack unload gekkofs
```

### Alternative deployment (on many nodes)

`gekkofs/scripts/run/gkfs` provides a script to deploy GekkoFS in a single command on several nodes by using `srun`.
Consult the main [README](../../README.md) or GekkoFS documentation for details.

### Miscellaneous

Use GekkoFS's latest version (master branch) with Spack:

```bash
spack install gekkofs@latest
```

Use a specific compiler on your system, e.g., gcc-11.2.0:

```bash
spack install gekkofs@latest%gcc@11.2.0
```

#### FAQ

I cannot run the tests because Python is missing? For Spack and GCC, we rely on the system installed versions. If you
are working on a supercomputer, you may need to load the corresponding Python module first:

```bash
# GekkoFS tests require at least Python version 3.6.
module load python/3.9.10
```

Everything is failing during the compilation process? See the question above: either GCC is not loaded, or it is too old
and does not support C++17, which we require. In any case, when using Spack it is good practice to use the system
compiler if possible:

```bash
# GekkoFS requires at least GCC version 8
module load gcc/11.2.0
```

This may not be enough for Spack to recognize it (depending on when Spack was installed). Therefore, you need to add
the compiler to Spack via:

```bash
spack compiler find
```

`spack compiler list` should then list the loaded compiler.
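
To check up front whether the currently loaded GCC meets the version requirement mentioned above, a quick test along the following lines may help. This is a sketch under assumptions: `version_at_least` is a hypothetical helper, and only the major version number is compared.

```bash
# Hypothetical helper: succeed if the major component of the version
# string $1 is at least $2 (e.g., "11.2.0" compared against 8).
version_at_least() {
    local major="${1%%.*}"
    [ "$major" -ge "$2" ]
}

# `gcc -dumpversion` prints the compiler version; fall back to 0 if
# no gcc is found in the current environment.
gcc_version="$(gcc -dumpversion 2>/dev/null || echo 0)"

if version_at_least "$gcc_version" 8; then
    echo "GCC $gcc_version should be recent enough for GekkoFS"
else
    echo "GCC is missing or older than version 8; load a newer compiler module"
fi
```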
+20 −0
# Copyright 2013-2021 Lawrence Livermore National Security, LLC and other
# Spack Project Developers. See the top-level COPYRIGHT file for details.
#
# SPDX-License-Identifier: (Apache-2.0 OR MIT)

from spack import *


class Agios(CMakePackage):
    """AGIOS: an I/O request scheduling library at file level."""

    homepage = "https://github.com/francielizanon/agios"
    url      = "https://github.com/jeanbez/agios/archive/refs/tags/v1.0.tar.gz"
    git      = "https://github.com/francielizanon/agios.git"

    version('latest', branch='development')
    version('1.0', sha256='e8383a6ab0180ae8ba9bb2deb1c65d90c00583c3d6e77c70c415de8a98534efd')

    depends_on('libconfig')