Run the GekkoFS daemon on each node specifying its locally used directory where the file system data and metadata is stored (`-r/--rootdir <fs_data_path>`), e.g., the node-local SSD;
Run the GekkoFS daemon on each node specifying its locally used directory where the file system data and metadata is
stored (`-r/--rootdir <fs_data_path>`), e.g., the node-local SSD;
2. the pseudo mount directory used by clients to access GekkoFS (`-m/--mountdir <pseudo_gkfs_mount_dir_path>`); and
3. the hostsfile path (`-H/--hostsfile <hostfile_path>`).
@@ -235,22 +249,27 @@ Source code needs to be compiled with -fPIC. We include a pfind io500 substituti
`examples/gfind/gfind.cpp` and a non-mpi version `examples/gfind/sfind.cpp`
## Data distributors
The data distribution can be selected at compilation time, we have 2 distributors available:
### Simple Hash (Default)
Chunks are distributed randomly to the different GekkoFS servers.
### Guided Distributor
The guided distributor allows defining a specific distribution of data on a per directory or file basis.
The distribution configurations are defined within a shared file (called `guided_config.txt` henceforth) with the following format:
The distribution configurations are defined within a shared file (called `guided_config.txt` henceforth) with the
following format:
`<path> <chunk_number> <host>`
To enable the distributor, the following CMake compilation flags are required:
To use a custom distribution, a path needs to have the prefix `#` (e.g., `#/mdt-hard 0 0`), in which all the data of all files in that directory goes to the same place as the metadata.
To use a custom distribution, a path needs to have the prefix `#` (e.g., `#/mdt-hard 0 0`), in which all the data of all
files in that directory goes to the same place as the metadata.
Note, that a chunk/host configuration is inherited to all children files automatically even if not using the prefix.
In this example, `/mdt-hard/file1` is therefore also using the same distribution as the `/mdt-hard` directory.
If no prefix is used, the Simple Hash distributor is used.
@@ -260,12 +279,15 @@ If no prefix is used, the Simple Hash distributor is used.
Creating a guided configuration file is based on an I/O trace file of a previous execution of the application.
For this the `trace_reads` tracing module is used (see above).
The `trace_reads` module enables a `TRACE_READS` level log at the clients writing the I/O information of the client which is used as the input for a script that creates the guided distributor setting.
The `trace_reads` module enables a `TRACE_READS` level log at the clients writing the I/O information of the client
which is used as the input for a script that creates the guided distributor setting.
Note that capturing the necessary trace records can involve performance degradation.
To capture the I/O of each client within a SLURM environment, i.e., enabling the `trace_reads` module and print its output to a user-defined path, the following example can be used:
To capture the I/O of each client within a SLURM environment, i.e., enabling the `trace_reads` module and print its
output to a user-defined path, the following example can be used: