Commit ef197ba9 authored by Ramon Nou

first shrink attempt

parent 1c6936bd
+73 −0
@@ -33,6 +33,7 @@ to I/O, which reduces interferences and improves performance.
  - [Server-side statistics via Prometheus](#server-side-statistics-via-prometheus)
  - [GekkoFS proxy](#gekkofs-proxy)
  - [File system expansion](#file-system-expansion)
  - [File system shrinking](#file-system-shrinking)
- [Miscellaneous](#miscellaneous)
  - [External functions](#external-functions)
  - [Data placement](#data-placement)
@@ -508,6 +509,78 @@ srun: sending Ctrl-C to StepId=282378.2
* [gkfs] Shutdown time: 1.032 seconds
```

## File system shrinking

GekkoFS supports **shrinking** the current daemon configuration, removing one or more nodes from the cluster while
safely redistributing all existing data and metadata to the remaining nodes. As with expansion, it is the user's
responsibility not to access the file system during redistribution.

Shrinking uses the same `gkfs_malleability` tool as expansion (built with `-DGKFS_BUILD_TOOLS=ON`). It requires two hostfiles:

| File | Description |
|---|---|
| `gkfs_hosts.txt` | Current (old) hostfile — set via `LIBGKFS_HOSTS_FILE` |
| `gkfs_hosts_new.txt` | New hostfile listing **only** the surviving nodes |

### Step-by-step

**1. Create the new hostfile** containing only the nodes that should remain after the shrink.
The format is identical to `gkfs_hosts.txt`. The order of entries does not matter: any node present in the old file
but absent from the new file will be removed.

```bash
# Example: remove node4, keep node1–node3
grep -v node4 gkfs_hosts.txt > gkfs_hosts_new.txt
```
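
Before starting the shrink, it can help to confirm that the new hostfile really is a subset of the old one. The following is a minimal sketch, assuming each daemon occupies one line and that surviving entries are copied verbatim from the old file; `hosts_subset` is a hypothetical helper and not part of GekkoFS.

```bash
# hosts_subset OLD NEW
# Succeeds only if every non-empty line of NEW also appears in OLD.
# Assumption: entries are compared as whole lines, byte for byte.
hosts_subset() {
    local old="$1" new="$2" extra
    # -v invert, -x whole-line match, -F fixed strings, -f patterns from file:
    # prints the lines of NEW that have no identical line in OLD.
    extra=$(grep -vxFf "$old" "$new" || true)
    [ -z "$extra" ]
}

# Demo with throwaway files (hypothetical node names).
demo_dir=$(mktemp -d)
printf 'node1\nnode2\nnode3\nnode4\nnode40\n' > "$demo_dir/gkfs_hosts.txt"
# -w matches whole words, so node40 is kept while node4 is dropped.
grep -vw node4 "$demo_dir/gkfs_hosts.txt" > "$demo_dir/gkfs_hosts_new.txt"
if hosts_subset "$demo_dir/gkfs_hosts.txt" "$demo_dir/gkfs_hosts_new.txt"; then
    echo "new hostfile is a subset of the old one"
fi
rm -rf "$demo_dir"
```

Note that a plain `grep -v node4` would also drop any line containing `node40` or `node41`; the `-w` flag in the demo restricts the match to the whole word.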

**2. Start the shrink** process. Each surviving node redistributes the data that was owned by the removed nodes,
and the removed nodes forward all their data but keep running until they are shut down in step 5:

```bash
LIBGKFS_HOSTS_FILE=gkfs_hosts.txt \
  gkfs_malleability shrink --new-hosts-file gkfs_hosts_new.txt start
Shrink process from 4 nodes to 3 nodes launched...
```

The old and new node counts are **auto-detected** from the respective hostfiles. They can be overridden with
`--old-nodes <N>` and `--new-nodes <N>` if needed.
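
To cross-check the values you might pass to `--old-nodes` and `--new-nodes`, the hostfiles can be counted by hand. This is only a sketch: it assumes one daemon per non-blank line, which may not match how the tool itself detects the counts, and `count_nodes` is a hypothetical helper.

```bash
# count_nodes FILE
# Prints the number of non-blank lines in FILE.
# Assumption: each non-blank line describes exactly one daemon.
count_nodes() {
    # grep -c exits with status 1 when the count is 0, so mask that status.
    grep -c '[^[:space:]]' "$1" || true
}

# Demo with throwaway files (hypothetical node names).
demo_dir=$(mktemp -d)
printf 'node1\nnode2\nnode3\nnode4\n' > "$demo_dir/gkfs_hosts.txt"
printf 'node1\nnode2\nnode3\n' > "$demo_dir/gkfs_hosts_new.txt"
old_nodes=$(count_nodes "$demo_dir/gkfs_hosts.txt")
new_nodes=$(count_nodes "$demo_dir/gkfs_hosts_new.txt")
echo "old nodes: $old_nodes, new nodes: $new_nodes"
rm -rf "$demo_dir"
```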

**3. Poll status** until all nodes have finished:

```bash
LIBGKFS_HOSTS_FILE=gkfs_hosts.txt gkfs_malleability shrink status
No shrink running/finished.
```

While a shrink is still running, the command instead reports how many nodes have not finished, for example `Shrink in progress: 2 nodes not finished.`
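
Polling can be scripted. The sketch below repeats a status command until its output no longer contains `Shrink in progress`, relying only on the two messages quoted above. `wait_for_shrink` is a hypothetical helper, and because `No shrink running/finished.` covers both "never started" and "finished", the loop cannot tell those two cases apart.

```bash
# wait_for_shrink STATUS_CMD [INTERVAL_SECONDS]
# Runs STATUS_CMD repeatedly, echoing each report, and returns once the
# report no longer contains "Shrink in progress".
# Assumption: the status messages match the examples in this section.
wait_for_shrink() {
    local status_cmd="$1" interval="${2:-5}" out
    while true; do
        # Intentionally unquoted so the command string splits into words.
        out=$($status_cmd 2>&1) || true
        echo "$out"
        case "$out" in
            *"Shrink in progress"*) sleep "$interval" ;;
            *) return 0 ;;
        esac
    done
}

# Possible usage against a real deployment (not executed here):
#   export LIBGKFS_HOSTS_FILE=gkfs_hosts.txt
#   wait_for_shrink "gkfs_malleability shrink status" 10
```

Passing the status command as an argument keeps the loop independent of the tool, so it can be exercised with a stub before being pointed at a live file system.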

**4. Finalize** the shrink. This disables maintenance mode on all remaining daemons and atomically replaces
`gkfs_hosts.txt` with `gkfs_hosts_new.txt`:

```bash
LIBGKFS_HOSTS_FILE=gkfs_hosts.txt \
  gkfs_malleability shrink --new-hosts-file gkfs_hosts_new.txt finalize
Shrink finalize 0
Hosts file updated: gkfs_hosts_new.txt -> gkfs_hosts.txt
```

After finalize, `gkfs_hosts.txt` contains only the surviving nodes and all clients automatically use the
updated configuration on their next initialization.

**5. Shut down the removed daemons** (they have already forwarded all data but are still running):

```bash
# Send SIGTERM to each daemon on the removed nodes
pdsh -w node4 'kill $(cat /tmp/gkfs_daemon.pid)'
```

### Environment variables

| Variable | Description |
|---|---|
| `LIBGKFS_HOSTS_FILE` | Path to the **current** (old) hosts file |
| `LIBGKFS_HOSTS_FILE_NEW` | Alternative to `--new-hosts-file` for the new hosts file |

# Miscellaneous

## External functions
+12 −0
@@ -37,6 +37,8 @@
  SPDX-License-Identifier: LGPL-3.0-or-later
*/

#include <string>

#ifndef GEKKOFS_CLIENT_FORWARD_MALLEABILITY_HPP
#define GEKKOFS_CLIENT_FORWARD_MALLEABILITY_HPP

@@ -50,6 +52,16 @@ forward_expand_status();

int
forward_expand_finalize();

int
forward_shrink_start(int old_server_conf, int new_server_conf,
                     const std::string& new_hosts_file);

int
forward_shrink_status();

int
forward_shrink_finalize();
} // namespace gkfs::malleable::rpc

#endif // GEKKOFS_CLIENT_FORWARD_MALLEABILITY_HPP
+26 −0
@@ -179,6 +179,32 @@ expand_status();
 */
int
expand_finalize();

/**
 * @brief Start a shrinking of the file system
 * @param old_server_conf old number of nodes
 * @param new_server_conf new number of nodes
 * @param new_hosts_file path to hostfile containing only the surviving nodes
 * @return error code
 */
int
shrink_start(int old_server_conf, int new_server_conf,
             const std::string& new_hosts_file);

/**
 * @brief Check for the current status of the shrinking process
 * @return 0 when finished, positive numbers indicate how many daemons
 * are still redistributing data
 */
int
shrink_status();

/**
 * @brief Finalize the shrinking process
 * @return error code
 */
int
shrink_finalize();
} // namespace malleable
} // namespace gkfs

+3 −0
@@ -129,6 +129,9 @@ namespace malleable::rpc::tag {
constexpr auto expand_start = "rpc_srv_expand_start";
constexpr auto expand_status = "rpc_srv_expand_status";
constexpr auto expand_finalize = "rpc_srv_expand_finalize";
constexpr auto shrink_start = "rpc_srv_shrink_start";
constexpr auto shrink_status = "rpc_srv_shrink_status";
constexpr auto shrink_finalize = "rpc_srv_shrink_finalize";
// migrate data uses the write rpc
constexpr auto migrate_metadata = "rpc_srv_migrate_metadata";
} // namespace malleable::rpc::tag
+11 −0
@@ -433,6 +433,17 @@ struct rpc_expand_start_in_t {
    }
};

struct rpc_shrink_start_in_t {
    uint32_t old_server_conf;
    uint32_t new_server_conf;
    std::string new_hosts_file;
    template <class Archive>
    void
    serialize(Archive& ar) {
        ar(old_server_conf, new_server_conf, new_hosts_file);
    }
};

struct rpc_migrate_metadata_in_t {
    std::string key;
    std::string value;