Commit cddedd6f authored by Marc Vef

Merge branch 'marc/294-file-system-expansion-during-runtime' into 'master'

Resolve "File system expansion during runtime"

# Description

GekkoFS supports extending the current daemon configuration to additional compute nodes. This includes redistributing
the existing data and metadata, so that file system performance and capacity also scale for existing data. Note
that it is the user's responsibility not to access the GekkoFS file system during redistribution. A corresponding feature
that is transparent to the user is planned. Note also that if GekkoFS proxies are used, they must be restarted manually after expansion.

To enable this feature, build the `gkfs_malleability` tool by passing the following CMake flag: `-DGKFS_BUILD_TOOLS=ON`.
The `gkfs_malleability` tool is then available in the `build/tools` directory. Please consult `-h` for its arguments.
While the tool can be used manually to expand the file system, the `scripts/run/gkfs` script, which invokes the `gkfs_malleability` tool, should be used instead.

The only requirement for extending the file system is a hostfile containing the hostnames/IPs of the new nodes (one line per host).
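
As an illustration of the format, a hostfile for four additional nodes could be written as sketched here. The hostnames `node05` to `node08` and the temporary file location are placeholders, not values from this merge request; the expansion example in this description passes `~/hostfile_expand` instead.

```bash
# Hypothetical hostnames; replace them with the nodes that should join the file system.
hostfile="$(mktemp)"
cat > "$hostfile" <<'EOF'
node05
node06
node07
node08
EOF

# Sanity check: count the non-empty lines, i.e. the number of new hosts.
new_hosts="$(grep -c . "$hostfile")"
echo "hostfile lists ${new_hosts} new nodes"
```
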
Example of starting the file system, where `DAEMON_NODELIST` in `gkfs.conf` is set to a hostfile containing the initial set of file system nodes:
```bash
~/gekkofs/scripts/run/gkfs -c ~/run/gkfs_verbs_expandtest.conf start
* [gkfs] Starting GekkoFS daemons (4 nodes) ...
* [gkfs] GekkoFS daemons running
* [gkfs] Startup time: 10.853 seconds
```
... Some computation ...

Expanding the file system: use `-e <hostfile>` to specify the new nodes. Redistribution is performed automatically and its progress is shown with a progress bar.
When finished, the file system is ready to use in the new configuration:
```bash
~/gekkofs/scripts/run/gkfs -c ~/run/gkfs_verbs_expandtest.conf -e ~/hostfile_expand expand
* [gkfs] Starting GekkoFS daemons (8 nodes) ...
* [gkfs] GekkoFS daemons running
* [gkfs] Startup time: 1.058 seconds
Expansion process from 4 nodes to 12 nodes launched...
* [gkfs] Expansion progress:
[####################] 0/4 left
* [gkfs] Redistribution process done. Finalizing ...
* [gkfs] Expansion done.
```
Stop the file system:
```bash
~/gekkofs/scripts/run/gkfs -c ~/run/gkfs_verbs_expandtest.conf stop
* [gkfs] Stopping daemon with pid 16462
srun: sending Ctrl-C to StepId=282378.1
* [gkfs] Stopping daemon with pid 16761
srun: sending Ctrl-C to StepId=282378.2
* [gkfs] Shutdown time: 1.032 seconds
```
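
The three steps above could be chained in a single job script, roughly as sketched below. This is only an illustration: the `run` helper and the `DRY_RUN`, `GKFS_SCRIPT`, `GKFS_CONF`, and `EXPAND_HOSTFILE` variables are hypothetical names introduced here, the paths are copied from the examples, and the sketch assumes that `start` returns once the daemons are running, as the example output suggests.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Paths taken from the examples above; adjust them to the local setup.
GKFS_SCRIPT="${GKFS_SCRIPT:-$HOME/gekkofs/scripts/run/gkfs}"
GKFS_CONF="${GKFS_CONF:-$HOME/run/gkfs_verbs_expandtest.conf}"
EXPAND_HOSTFILE="${EXPAND_HOSTFILE:-$HOME/hostfile_expand}"
# With DRY_RUN=1 (the default in this sketch) commands are only printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run "$GKFS_SCRIPT" -c "$GKFS_CONF" start
# ... first phase of the computation goes here ...

# No process may access the file system while data is being redistributed.
run "$GKFS_SCRIPT" -c "$GKFS_CONF" -e "$EXPAND_HOSTFILE" expand
# If GekkoFS proxies are used, restart them here before continuing.
# ... second phase of the computation goes here ...

run "$GKFS_SCRIPT" -c "$GKFS_CONF" stop
```

Running the script as is only prints the three commands; setting `DRY_RUN=0` executes them.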

# Results
IOR results for writing/reading 768 GiB sequentially (192 procs) before and after expansion

![image](/uploads/57bd8f3a07a56c496b1ae0b096da24ef/image.png)

MDTest results for creating, stating, and removing 19200000 files (192 procs) before and after expansion

![image](/uploads/7e2f58d864789e657140ced3e9e9716e/image.png)

Closes #294


See merge request !196
parents 318e5c76 49263be8
Pipeline #4753 passed with stages in 20 minutes and 50 seconds