Commit d2d2671b authored by Marc Vef

Adding documentation for gkfs script

parent 642cdcb3
......@@ -144,16 +144,52 @@ to be empty.
For MPI applications, the `LD_PRELOAD` variable can be passed with the `-x` argument for `mpirun/mpiexec`.
## Run GekkoFS daemons on multiple nodes (beta version!)
The `scripts/run/gkfs` script simplifies starting and stopping GekkoFS daemons on one or multiple nodes. Starting
GekkoFS on multiple nodes requires a Slurm environment that can execute `srun`. Users can further modify
`scripts/run/gkfs.conf` to adapt the default configuration to their environment.
The following options are available for `scripts/run/gkfs`:
```bash
usage: gkfs [-h/--help] [-r/--rootdir <path>] [-m/--mountdir <path>] [-a/--args <daemon_args>] [-f/--foreground <false>]
            [--srun <false>] [-n/--numnodes <jobsize>] [--cpuspertask <64>] [--numactl <false>] [-v/--verbose <false>]
            {start,stop}

This script simplifies starting and stopping GekkoFS daemons. If daemons are started on multiple nodes,
a Slurm environment is required. The script looks for the 'gkfs.conf' file in its own directory, where
additional permanent configurations can be set.

positional arguments:
    command                 Command to execute: 'start' or 'stop'

optional arguments:
    -h, --help              Shows this help message and exits
    -r, --rootdir <path>    Sets the rootdir path for GekkoFS daemons.
    -m, --mountdir <path>   Sets the mountdir path for GekkoFS daemons.
    -a, --args <daemon_arguments>
                            Adds additional daemon arguments, e.g., "-l ib0 -P ofi+psm2".
    -f, --foreground        Starts the script in the foreground. Daemons are stopped by pressing 'q'.
    --srun                  Uses srun to start daemons on multiple nodes.
    -n, --numnodes <n>      Starts GekkoFS daemons on n nodes.
                            The nodelist is extracted from Slurm via the SLURM_JOB_ID env variable.
    --cpuspertask <#cores>  Sets the number of cores the daemons can use. Requires '--srun'.
    --numactl               Uses numactl for the daemon. Modify gkfs.conf for further numactl configurations.
    -v, --verbose           Increases verbosity
```
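Because daemons keep running after `start`, it can help to pair every `start` with a `stop`, even when the workload fails. The sketch below is a hypothetical helper that is not part of the repository: `run_with_gkfs` and `GKFS_SCRIPT` are illustrative names, the rootdir and mountdir values are the `DAEMON_ROOTDIR`/`DAEMON_MOUNTDIR` defaults from `gkfs.conf`, and it assumes that `start` returns once the daemons are up.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around scripts/run/gkfs: start daemons, run one command, stop daemons.
# GKFS_SCRIPT is an assumption; point it at your checkout's scripts/run/gkfs.
GKFS_SCRIPT="${GKFS_SCRIPT:-scripts/run/gkfs}"

run_with_gkfs() {
    local nodes="$1"
    shift
    # Defaults taken from DAEMON_ROOTDIR/DAEMON_MOUNTDIR in gkfs.conf
    local -a start_opts=(-r /dev/shm/gkfs_rootdir -m /dev/shm/gkfs_mountdir)
    if (( nodes > 1 )); then
        # Multi-node starts require a Slurm allocation (SLURM_JOB_ID must be set)
        start_opts+=(--srun -n "${nodes}")
    fi
    "${GKFS_SCRIPT}" "${start_opts[@]}" start || return 1
    local rc=0
    "$@" || rc=$?
    # Stop the daemons regardless of the workload's exit status
    "${GKFS_SCRIPT}" stop
    return "${rc}"
}
```

The workload passed to `run_with_gkfs` still needs `LD_PRELOAD` set (for MPI, forwarded with `-x`) as described above; the helper only manages the daemon lifecycle and returns the workload's exit status.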
### Logging
The following environment variables can be used to enable logging in the client library: `LIBGKFS_LOG=<module>`
selects the logging module, and `LIBGKFS_LOG_OUTPUT=<path/to/file>` sets the path to the client library's log file.
If no path is specified in `LIBGKFS_LOG_OUTPUT`, the client library sends log messages to `/tmp/gkfs_client.log`.
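A minimal sketch of enabling client logging for a single command, for example with the `syscalls` module. The helper `run_with_client_log` is hypothetical, and `PRELOAD_LIB` defaults to the relative library path from `gkfs.conf`, which only resolves when run from `scripts/run/`; adjust it to your installation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run one command with the client library preloaded and logging on.
# PRELOAD_LIB defaults to the relative path from gkfs.conf (assumes scripts/run/ as cwd).
PRELOAD_LIB="${PRELOAD_LIB:-../../build/src/client/libgkfs_intercept.so}"

run_with_client_log() {
    local module="$1" log_file="$2"
    shift 2
    local -a env_vars=("LIBGKFS_LOG=${module}")
    # An empty log_file does not set LIBGKFS_LOG_OUTPUT; unless it is already
    # exported, the client library falls back to /tmp/gkfs_client.log.
    if [[ -n "${log_file}" ]]; then
        env_vars+=("LIBGKFS_LOG_OUTPUT=${log_file}")
    fi
    env "${env_vars[@]}" LD_PRELOAD="${PRELOAD_LIB}" "$@"
}
```

Assuming daemons are running with the default mountdir, `run_with_client_log syscalls /tmp/gkfs_syscalls.log ls /dev/shm/gkfs_mountdir` would trace the system calls of that `ls` into `/tmp/gkfs_syscalls.log`.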
The following modules are available:
- `none`: don't print any messages
- `syscalls`: Trace system calls: print the name of each system call, its
arguments, and its return value. All system calls are printed after being
executed save for those that may not return, such as `execve()`,
......
......@@ -136,6 +136,42 @@ to be empty.
For MPI applications, the `LD_PRELOAD` and `LIBGKFS_HOSTS_FILE` variables can be passed with the `-x` argument
for `mpirun/mpiexec`.
## Run GekkoFS daemons on multiple nodes (beta version!)
The `scripts/run/gkfs` script simplifies starting and stopping GekkoFS daemons on one or multiple nodes. Starting
GekkoFS on multiple nodes requires a Slurm environment that can execute `srun`. Users can further modify
`scripts/run/gkfs.conf` to adapt the default configuration to their environment.
The following options are available for `scripts/run/gkfs`:
```bash
usage: gkfs [-h/--help] [-r/--rootdir <path>] [-m/--mountdir <path>] [-a/--args <daemon_args>] [-f/--foreground <false>]
            [--srun <false>] [-n/--numnodes <jobsize>] [--cpuspertask <64>] [--numactl <false>] [-v/--verbose <false>]
            {start,stop}

This script simplifies starting and stopping GekkoFS daemons. If daemons are started on multiple nodes,
a Slurm environment is required. The script looks for the 'gkfs.conf' file in its own directory, where
additional permanent configurations can be set.

positional arguments:
    command                 Command to execute: 'start' or 'stop'

optional arguments:
    -h, --help              Shows this help message and exits
    -r, --rootdir <path>    Sets the rootdir path for GekkoFS daemons.
    -m, --mountdir <path>   Sets the mountdir path for GekkoFS daemons.
    -a, --args <daemon_arguments>
                            Adds additional daemon arguments, e.g., "-l ib0 -P ofi+psm2".
    -f, --foreground        Starts the script in the foreground. Daemons are stopped by pressing 'q'.
    --srun                  Uses srun to start daemons on multiple nodes.
    -n, --numnodes <n>      Starts GekkoFS daemons on n nodes.
                            The nodelist is extracted from Slurm via the SLURM_JOB_ID env variable.
    --cpuspertask <#cores>  Sets the number of cores the daemons can use. Requires '--srun'.
    --numactl               Uses numactl for the daemon. Modify gkfs.conf for further numactl configurations.
    -v, --verbose           Increases verbosity
```
### Logging
#### Client logging
......
#!/bin/bash
#######################################
# Poll GekkoFS hostsfile until all daemons are started.
# Exits with 1 if daemons cannot be started.
# Globals:
# HOSTSFILE
# NODE_NUM
# Arguments:
# None
# Outputs:
# Writes error to stdout
#######################################
wait_for_gkfs_daemons() {
sleep 2
local server_wait_cnt=0
......@@ -35,11 +28,21 @@ wait_for_gkfs_daemons() {
fi
done
}
#######################################
# Creates a pid file for the given pid. If the pid file already exists, its pids are checked for validity.
# If they are still valid, the new pid is added as an additional line. Otherwise, the stale pid in the file is deleted.
# Globals:
# DAEMON_PID_FILE
# VERBOSE
# Arguments:
# pid to write to pid file
# Outputs:
# Writes status to stdout if VERBOSE is true
#######################################
create_pid_file() {
local pid_file=${DAEMON_PID_FILE}
local pid=${1}
if [[ ${VERBOSE} == true ]]; then
echo "Creating pid file at ${pid_file} with pid ${pid} ..."
fi
# if PID file exists another daemon could run
......@@ -59,7 +62,25 @@ create_pid_file() {
fi
echo "${pid}" >> "${pid_file}"
}
#######################################
# Starts GekkoFS daemons.
# Globals:
# SLURM_JOB_ID
# NODE_NUM
# MOUNTDIR
# ROOTDIR
# HOSTSFILE
# DAEMON_BIN
# ARGS
# CPUS_PER_TASK
# VERBOSE
# USE_NUMACTL
# DAEMON_CPUNODEBIND
# DAEMON_MEMBIND
# GKFS_DAEMON_LOG_PATH
# GKFS_DAEMON_LOG_LEVEL
# RUN_FOREGROUND
# Outputs:
# Writes status to stdout
#######################################
start_daemon() {
local node_list
local srun_cmd
......@@ -74,21 +95,21 @@ start_daemon() {
srun_cmd="srun --disable-status -N ${NODE_NUM} --ntasks=${NODE_NUM} --ntasks-per-node=1 --overcommit --contiguous --cpus-per-task=${CPUS_PER_TASK} --oversubscribe --mem=0 "
fi
if [[ ${VERBOSE} == true ]]; then
echo "### mountdir: ${MOUNTDIR}"
echo "### rootdir: ${ROOTDIR}"
echo "### node_num: ${NODE_NUM}"
echo "### args: ${ARGS}"
echo "### cpus_per_task: ${CPUS_PER_TASK}"
fi
if [[ ${VERBOSE} == true ]]; then
echo "# Cleaning host file ..."
fi
rm "${HOSTSFILE}" 2> /dev/null
# Setting up base daemon cmd
local daemon_cmd="${DAEMON_BIN} -r ${ROOTDIR} -m ${MOUNTDIR} -H ${HOSTSFILE} ${ARGS}"
# Setting up numactl
if [[ ${USE_NUMACTL} == true ]]; then
daemon_cmd="numactl --cpunodebind=${DAEMON_CPUNODEBIND} --membind=${DAEMON_MEMBIND} ${daemon_cmd}"
fi
# final daemon execute command
......@@ -128,19 +149,26 @@ start_daemon() {
create_pid_file ${daemon_pid}
fi
}
#######################################
# Stops the GekkoFS daemons listed in the configured pid file
# Globals:
# DAEMON_PID_FILE
# VERBOSE
# Outputs:
# Writes status to stdout
#######################################
stop_daemons() {
local pid_file=${DAEMON_PID_FILE}
if [[ -e ${pid_file} ]]; then
while IFS= read -r line
do
if ps -p "${line}" > /dev/null; then
if [[ ${VERBOSE} == true ]]; then
echo "Stopping daemon with pid ${line}"
fi
kill -s SIGINT "${line}" &
# poll pid until it stopped
if [[ ${VERBOSE} == true ]]; then
echo "Waiting for daemons to exit ..."
fi
timeout 1 tail --pid=${line} -f /dev/null
......@@ -151,19 +179,68 @@ stop_daemons() {
echo "No pid file found -> no daemon running. Exiting ..."
fi
}
#######################################
# Print short usage information
# Outputs:
# Writes help to stdout
#######################################
usage_short() {
echo "
usage: gkfs [-h/--help] [-r/--rootdir <path>] [-m/--mountdir <path>] [-a/--args <daemon_args>] [-f/--foreground <false>]
            [--srun <false>] [-n/--numnodes <jobsize>] [--cpuspertask <64>] [--numactl <false>] [-v/--verbose <false>]
            {start,stop}
"
}
#######################################
# Print detailed usage information
# Outputs:
# Writes help to stdout
#######################################
help_msg() {
usage_short
echo "
This script simplifies starting and stopping GekkoFS daemons. If daemons are started on multiple nodes,
a Slurm environment is required. The script looks for the 'gkfs.conf' file in its own directory, where
additional permanent configurations can be set.

positional arguments:
    command                 Command to execute: 'start' or 'stop'

optional arguments:
    -h, --help              Shows this help message and exits
    -r, --rootdir <path>    Sets the rootdir path for GekkoFS daemons.
    -m, --mountdir <path>   Sets the mountdir path for GekkoFS daemons.
    -a, --args <daemon_arguments>
                            Adds additional daemon arguments, e.g., \"-l ib0 -P ofi+psm2\".
    -f, --foreground        Starts the script in the foreground. Daemons are stopped by pressing 'q'.
    --srun                  Uses srun to start daemons on multiple nodes.
    -n, --numnodes <n>      Starts GekkoFS daemons on n nodes.
                            The nodelist is extracted from Slurm via the SLURM_JOB_ID env variable.
    --cpuspertask <#cores>  Sets the number of cores the daemons can use. Requires '--srun'.
    --numactl               Uses numactl for the daemon. Modify gkfs.conf for further numactl configurations.
    -v, --verbose           Increases verbosity
"
}
# global variables
export FI_PSM2_DISCONNECT=1
export PSM2_MULTI_EP=1
SCRIPTDIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"
CONFIGPATH="${SCRIPTDIR}/gkfs.conf"
source "$CONFIGPATH"
# more global variables which may be overridden by command-line options
VERBOSE=false
NODE_NUM=1
MOUNTDIR=${DAEMON_MOUNTDIR}
ROOTDIR=${DAEMON_ROOTDIR}
HOSTSFILE=${LIBGKFS_HOSTS_FILE}
CPUS_PER_TASK=$(grep -c ^processor /proc/cpuinfo)
ARGS=${DAEMON_ARGS}
USE_SRUN=${USE_SRUN}
RUN_FOREGROUND=false
USE_NUMACTL=${DAEMON_NUMACTL}
# parse input
POSITIONAL=()
while [[ $# -gt 0 ]]; do
......@@ -186,7 +263,7 @@ while [[ $# -gt 0 ]]; do
shift # past value
;;
-a | --args)
ARGS="${ARGS} $2"
shift # past argument
shift # past value
;;
......@@ -198,7 +275,11 @@ while [[ $# -gt 0 ]]; do
RUN_FOREGROUND=true
shift # past argument
;;
--numactl)
USE_NUMACTL=true
shift # past argument
;;
--cpuspertask)
CPUS_PER_TASK=$2
shift # past argument
shift # past value
......@@ -226,18 +307,18 @@ if [[ -z ${1+x} ]]; then
exit 1
fi
command="${1}"
# checking input
if [[ ${command} != "start" ]] && [[ ${command} != "stop" ]]; then
echo "ERROR: command ${command} not supported"
usage_short
exit 1
fi
# Run script
if [[ ${command} == "start" ]]; then
start_daemon
elif [[ ${command} == "stop" ]]; then
stop_daemons
fi
if [[ ${VERBOSE} == true ]]; then
echo "Nothing left to do. Exiting :)"
fi
\ No newline at end of file
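`wait_for_gkfs_daemons` polls the hosts file until all daemons have registered; its loop body is elided in this diff. A sketch of the same idea, assuming each started daemon appends exactly one line to the hosts file (the name `wait_for_hosts_file` and its retry and interval parameters are hypothetical):

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a hosts file until it lists `expected` daemons.
# Assumption: every daemon that has started appends exactly one line to the file.
wait_for_hosts_file() {
    local hosts_file="$1" expected="$2" max_tries="${3:-10}" interval="${4:-1}"
    local tries=0 registered=0
    while (( tries < max_tries )); do
        if [[ -f "${hosts_file}" ]]; then
            # $(( )) strips the padding that some wc implementations add
            registered=$(( $(wc -l < "${hosts_file}") ))
            if (( registered >= expected )); then
                return 0
            fi
        fi
        tries=$(( tries + 1 ))
        sleep "${interval}"
    done
    echo "ERROR: ${registered}/${expected} daemons registered in ${hosts_file}" >&2
    return 1
}
```

The script's own function exits with status 1 on failure; returning instead, as here, lets a caller decide what to do, for example stop the daemons that did start.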
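`create_pid_file` maintains a pid file that may list several daemons. A sketch of the behavior described in its header comment, reusing the script's `ps -p` liveness check; `add_pid` is a hypothetical name, and rewriting the whole file instead of editing it in place is a choice of this sketch:

```bash
#!/usr/bin/env bash
# Hypothetical helper: append a pid to a pid file, dropping pids whose process is gone.
add_pid() {
    local pid_file="$1" pid="$2" line
    local -a alive=()
    if [[ -e "${pid_file}" ]]; then
        while IFS= read -r line; do
            # Same liveness check as the gkfs script: ps -p fails for unknown pids
            if [[ -n "${line}" ]] && ps -p "${line}" > /dev/null 2>&1; then
                alive+=("${line}")
            fi
        done < "${pid_file}"
    fi
    alive+=("${pid}")
    # Rewrite the file with the surviving pids plus the new one, one per line
    printf '%s\n' "${alive[@]}" > "${pid_file}"
}
```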
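`start_daemon` composes its command from an optional `srun` prefix, an optional `numactl` prefix, and the daemon binary with `-r`, `-m`, `-H` and the extra `ARGS`. The final concatenation is elided in this diff, so the nesting below (srun wrapping numactl wrapping the daemon) is an assumption. This sketch builds the command as a bash array rather than a string so that paths containing spaces stay intact; `build_daemon_cmd` is a hypothetical name and only prints the result:

```bash
#!/usr/bin/env bash
# Hypothetical dry-run builder for the daemon command line used by start_daemon.
# Reads the same globals as the script: DAEMON_BIN, ROOTDIR, MOUNTDIR, HOSTSFILE,
# ARGS, DAEMON_CPUNODEBIND and DAEMON_MEMBIND.
build_daemon_cmd() {
    local use_srun="$1" use_numactl="$2" node_num="$3" cpus="$4"
    local -a cmd=()
    if [[ "${use_srun}" == true ]]; then
        # Same srun options as srun_cmd in start_daemon
        cmd+=(srun --disable-status -N "${node_num}" --ntasks="${node_num}"
              --ntasks-per-node=1 --overcommit --contiguous
              --cpus-per-task="${cpus}" --oversubscribe --mem=0)
    fi
    if [[ "${use_numactl}" == true ]]; then
        cmd+=(numactl --cpunodebind="${DAEMON_CPUNODEBIND}" --membind="${DAEMON_MEMBIND}")
    fi
    cmd+=("${DAEMON_BIN}" -r "${ROOTDIR}" -m "${MOUNTDIR}" -H "${HOSTSFILE}")
    # ARGS is one string of extra daemon options; split it on whitespace deliberately
    local -a extra=()
    read -r -a extra <<< "${ARGS}"
    cmd+=("${extra[@]}")
    printf '%s\n' "${cmd[*]}"
}
```

To actually launch instead of printing, the array would be executed as `"${cmd[@]}"`, which avoids the word-splitting pitfalls of running an unquoted command string.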
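`stop_daemons` sends `SIGINT` to each live pid from the pid file and then waits briefly with `timeout 1 tail --pid=... -f /dev/null`. A self-contained sketch of that pattern follows; `stop_pids`, its wait and signal parameters, and removing the pid file at the end are assumptions of this sketch, since the original cleanup code is elided. `timeout` and `tail --pid` are GNU coreutils features.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring stop_daemons: signal every live pid in a pid file
# and wait up to wait_secs for each one to exit.
stop_pids() {
    local pid_file="$1" wait_secs="${2:-1}" signal="${3:-SIGINT}" pid
    if [[ ! -e "${pid_file}" ]]; then
        echo "No pid file found -> no daemon running."
        return 0
    fi
    while IFS= read -r pid; do
        [[ -n "${pid}" ]] || continue
        if ps -p "${pid}" > /dev/null 2>&1; then
            # The process may exit on its own between the check and the signal
            kill -s "${signal}" "${pid}" 2> /dev/null || continue
            # tail --pid returns once the process has exited; timeout caps the wait
            timeout "${wait_secs}" tail --pid="${pid}" -f /dev/null || true
        fi
    done < "${pid_file}"
    # Assumption: the pid file is removed once all listed daemons were signalled
    rm -f "${pid_file}"
}
```

When trying this with a plain `sleep 30 &` from a non-interactive shell, pass `SIGTERM`: bash starts asynchronous commands with `SIGINT` ignored when job control is off.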
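With this change, every `-a/--args` value is appended to the `DAEMON_ARGS` default instead of replacing it, and `--numactl` turns on `USE_NUMACTL`. The sketch below reproduces those semantics in a self-contained function; it is not the script's parser (which collects positional arguments in `POSITIONAL`), and `parse_gkfs_args` and `COMMAND` are hypothetical names. It matches the command with an exact `case` pattern, so inputs such as `restart` are rejected.

```bash
#!/usr/bin/env bash
# Hypothetical, self-contained parser showing the new option semantics:
# -a/--args appends to DAEMON_ARGS, --numactl enables numactl.
parse_gkfs_args() {
    ARGS="${DAEMON_ARGS}"
    USE_NUMACTL="${DAEMON_NUMACTL:-false}"
    COMMAND=""
    while [[ $# -gt 0 ]]; do
        case "$1" in
            -a | --args)
                if [[ $# -lt 2 ]]; then
                    echo "ERROR: $1 requires a value" >&2
                    return 1
                fi
                # Appending keeps the gkfs.conf defaults; with an empty DAEMON_ARGS this
                # leaves a leading space, which is harmless once the string is split.
                ARGS="${ARGS} $2"
                shift 2
                ;;
            --numactl)
                USE_NUMACTL=true
                shift
                ;;
            start | stop)
                # Exact match: inputs such as "restart" fall through to the error branch
                COMMAND="$1"
                shift
                ;;
            *)
                echo "ERROR: unsupported argument '$1'" >&2
                return 1
                ;;
        esac
    done
    if [[ -z "${COMMAND}" ]]; then
        echo "ERROR: no command given, expected 'start' or 'stop'" >&2
        return 1
    fi
}
```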
......@@ -3,10 +3,9 @@
# binaries (default for project_dir/build)
PRELOAD_LIB=../../build/src/client/libgkfs_intercept.so
DAEMON_BIN=../../build/src/daemon/gkfs_daemon
PROXY_BIN=../../build/src/proxy/gkfs_proxy
# client configuration
LIBGKFS_HOSTS_FILE=./gkfs_hostfile
# daemon configuration
DAEMON_ROOTDIR=/dev/shm/gkfs_rootdir
......@@ -14,8 +13,9 @@ DAEMON_MOUNTDIR=/dev/shm/gkfs_mountdir
DAEMON_NUMACTL=false
DAEMON_CPUNODEBIND="1"
DAEMON_MEMBIND="1"
DAEMON_PID_FILE=./gkfs_daemon.pid
DAEMON_ARGS=""
USE_SRUN=false
# logging
GKFS_DAEMON_LOG_LEVEL=info
......
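The relative defaults in `gkfs.conf` (`../../build/...`, `./gkfs_hostfile`, `./gkfs_daemon.pid`) are plain strings after `source`, so, unless the elided parts of the script change directory, they resolve against the caller's working directory rather than `scripts/run/`. A hypothetical loader that anchors them to the directory containing the config file (`resolve_conf_path` and `load_gkfs_conf` are illustrative names, not part of the repository):

```bash
#!/usr/bin/env bash
# Hypothetical loader: source gkfs.conf and anchor its relative paths to the
# directory that contains the config file instead of the caller's working directory.
resolve_conf_path() {
    local base="$1" path="$2"
    if [[ -z "${path}" || "${path}" == /* ]]; then
        printf '%s\n' "${path}"    # empty or already absolute: keep as-is
    else
        printf '%s\n' "${base}/${path#./}"
    fi
}

load_gkfs_conf() {
    local conf="$1" conf_dir
    conf_dir="$(cd "$(dirname "${conf}")" && pwd -P)" || return 1
    # shellcheck source=/dev/null
    source "${conf}" || return 1
    PRELOAD_LIB="$(resolve_conf_path "${conf_dir}" "${PRELOAD_LIB}")"
    DAEMON_BIN="$(resolve_conf_path "${conf_dir}" "${DAEMON_BIN}")"
    LIBGKFS_HOSTS_FILE="$(resolve_conf_path "${conf_dir}" "${LIBGKFS_HOSTS_FILE}")"
    DAEMON_PID_FILE="$(resolve_conf_path "${conf_dir}" "${DAEMON_PID_FILE}")"
}
```

Absolute defaults such as `DAEMON_ROOTDIR=/dev/shm/gkfs_rootdir` pass through unchanged; only the relative entries are rewritten.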