# Slurm Docker Cluster

This is a multi-container Slurm cluster using `docker-compose`, with `sshd` and `systemd` enabled.
The compose file creates named volumes for persistent storage of MySQL data files as well as
Slurm state and log directories. It is heavily based on work by
[giovtorres/slurm-docker-cluster](https://github.com/giovtorres/slurm-docker-cluster).

## Containers, Networks, and Volumes

The compose file will run the following containers:

* `mysql`
* `slurmdbd`
* `slurmctld`
* `login` (slurmd)
* `c1`, `c2`, `c3`, `c4` (slurmd)

The compose file will create the following named volumes:

* `etc_munge`         ( -> `/etc/munge`     )
* `slurm_jobdir`      ( -> `/data`          )
* `var_lib_mysql`     ( -> `/var/lib/mysql` )

The compose file will create the `slurm_cluster` network for all containers and will assign the
following IPv4 static addresses:

* `slurmctld`: 192.18.0.129
* `c1`: 192.18.0.10
* `c2`: 192.18.0.11
* `c3`: 192.18.0.12
* `c4`: 192.18.0.13
* `login`: 192.18.0.128
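
Once the cluster is up, the address assignments and volumes can be verified
with the standard Docker tooling (note that `docker compose` may prefix the
network name with the compose project name):

```bash
# Show the address assigned to each container on the cluster network
docker network inspect slurm_cluster \
    --format '{{range .Containers}}{{.Name}} {{.IPv4Address}}{{"\n"}}{{end}}'

# Confirm the named volumes exist
docker volume ls
```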

## Package contents

- `docker-compose.yml`: docker-compose file for running the cluster.
- `slurm-docker-cluster/Dockerfile`: Dockerfile for building the main cluster services.
- `slurm-docker-cluster-node/Dockerfile`: Dockerfile adding the software specific to
  the compute nodes (tailored to scord). **This image needs to be built before
  running the cluster.**
- `scripts/register_cluster.sh`: script for registering the cluster with the `slurmdbd` daemon.
- `scripts/refresh.sh`: script for refreshing the scord installation in the cluster.
  This script uses the `slurm-docker-cluster-node` image to generate the
  binaries so that there are no compatibility issues with dependencies.

  The script relies on the following variables (an example invocation is shown
  after the preset below):
    - `REPO`: The repository where the `scord` source code is located.
    - `VOLUMES`: The host directory where the output of the build process will
      be placed.
    - `USER`: The container user that should be used to run the build
      process (so that file ownership matches between the host and the
      container). The `scord` build process relies on a CMake Preset for Rocky
      Linux that has been configured to match the container environment:

        ```json
          {
              "name": "rocky",
              "displayName": "Rocky Linux",
              "description": "Build options for Rocky Linux",
              "inherits": "base",
              "environment" : {
                "PKG_CONFIG_PATH": "/usr/lib/pkgconfig;/usr/lib64/pkgconfig"
              },
              "generator": "Unix Makefiles",
              "cacheVariables": {
                "CMAKE_CXX_COMPILER_LAUNCHER": "",
                "CMAKE_C_COMPILER_LAUNCHER": "",
                "CMAKE_CXX_FLAGS": "-fdiagnostics-color=always",
                "CMAKE_C_FLAGS": "-fdiagnostics-color=always",
                "CMAKE_PREFIX_PATH": "/usr/lib;/usr/lib64",
                "CMAKE_INSTALL_PREFIX": "/scord_prefix",
                "SCORD_BUILD_EXAMPLES": true,
                "SCORD_BUILD_TESTS": true,
                "SCORD_BIND_ADDRESS": "192.18.0.128"
              }
          }
        ```
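
  As an illustration, these variables can be set in the environment before
  running the script. A minimal sketch (the paths and user shown are
  placeholders for your setup):

  ```bash
  # Placeholder values: adapt REPO, VOLUMES and USER to your checkout and host user
  export REPO=$HOME/src/scord      # scord source repository
  export VOLUMES=$PWD/volumes      # host directory receiving the build output
  export USER=example-user         # container user running the build
  scripts/refresh.sh
  ```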

- `volumes`: directory for the volumes used by the cluster:
    - `etc_munge`: munge configuration files. A shared `munge.key` needs to be
      generated and placed here (a generation sketch follows this list).
    - `etc_slurm`: slurm configuration files. At least a `slurm.conf` file needs
      to be placed here. The `slurm.conf` file should be configured with the
      compute node and partition information. For example:
        ```conf
          # COMPUTE NODES
          NodeName=c[1-4] RealMemory=1000 State=UNKNOWN

          # PARTITIONS
          PartitionName=normal Default=yes Nodes=c[1-4] Priority=50 DefMemPerCPU=500 Shared=NO MaxNodes=4 MaxTime=5-00:00:00 DefaultTime=5-00:00:00 State=UP
        ```
    - `etc_ssh`: ssh configuration files. Server keys and configuration files
      should be placed here (see the sketch after this list).
    - `ld.so.conf.d`: ld.so configuration files.
    - `scord_prefix`: scord installation directory. The scord installation should
      be placed here, and it should match the host directory where the binaries
      are generated.
    - `user_home`: user home directory. Any files and directories that we want to
      have available on all compute nodes (e.g. `.ssh`) should be added here.
    - `docker-entrypoint.sh`: overridden entry point script.
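
As a sketch, the `etc_munge` and `etc_ssh` volumes can be seeded as follows (this
assumes `dd` and `ssh-keygen` are available on the host; ownership must afterwards
be adjusted to match the listing in the Configuration section):

```bash
# Generate a shared munge key (1 KiB of random data, owner-readable only)
dd if=/dev/urandom bs=1 count=1024 of=volumes/etc_munge/munge.key
chmod 400 volumes/etc_munge/munge.key

# Generate SSH host keys; when combined with -A, -f prepends its argument
# to /etc/ssh, so the keys land in tmp/etc/ssh before moving them over
mkdir -p tmp/etc/ssh
ssh-keygen -A -f tmp
mv tmp/etc/ssh/ssh_host_* volumes/etc_ssh/
```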

## Build arguments

The following build arguments are available:

* `SLURM_TAG`: The Slurm Git tag to build. Defaults to `slurm-21-08-6-1`.
* `GOSU_VERSION`: The gosu version to install. Defaults to `1.11`.
* `SHARED_USER_NAME`: The name of the user that will be shared with the cluster. Defaults to `user`.
* `SHARED_USER_UID`: The UID of the user that will be shared with the cluster. Defaults to `1000`.
* `SHARED_GROUP_NAME`: The name of the group that will be shared with the cluster. Defaults to `user`.
* `SHARED_GROUP_GID`: The GID of the group that will be shared with the cluster. Defaults to `1000`.
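
For example, to build against a different Slurm release, the defaults can be
overridden at build time (the versions below are illustrative):

```bash
docker compose build \
    --build-arg SLURM_TAG=slurm-22-05-8-1 \
    --build-arg GOSU_VERSION=1.14
```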


## Configuration

To run, the cluster services expect some files to be present in the host system. The simpler way to do this is to
provide the files in the `volumes` directory with the correct ownership and permissions so that they can be
mounted in the containers. The `volumes` directory should be placed in the same directory as the `docker-compose.yml`
file. The `volumes` directory should have the following structure:

```bash
volumes/
├── docker-entrypoint.sh         -> /usr/local/bin/docker-entrypoint.sh
├── etc_munge                    -> /etc/munge
├── etc_slurm                    -> /etc/slurm
├── etc_ssh                      -> /etc/ssh
├── ld.so.conf.d                 -> /etc/ld.so.conf.d
├── scord_prefix                 -> /scord_prefix
└── user_home                    -> /home/$SHARED_USER_NAME
```
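
The directory skeleton can be created in one step (the entrypoint script itself
must still be provided):

```bash
# Create the expected layout next to docker-compose.yml
mkdir -p volumes/{etc_munge,etc_slurm,etc_ssh,ld.so.conf.d,scord_prefix,user_home}
```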

The following ownership and permissions should be set for the cluster to work properly.
The `slurm` and `munge` users are not strictly required to exist on the host system,
as they are created automatically while building the images; it nevertheless helps to
create them so that `ls` shows proper user names instead of raw numeric IDs (a sketch
follows the listing below). Note, however, that if they are created on the host, the
`slurm` and `munge` users/groups need to have the same UIDs/GIDs on the host and in
the containers.

```bash
volumes
├── [-rwxrwxr-x example-user example-user 1.9K Jun 29 16:30]  docker-entrypoint.sh
├── [drwxrwxr-x munge        munge    4.0K Jun 17 09:11]  etc_munge
│   └── [-r-------- munge    munge    1.0K Jun 17 09:11]  munge.key
├── [drwxrwxr-x slurm    slurm    4.0K Jul  4 09:49]  etc_slurm
│   ├── [-rw-r--r-- slurm    slurm     216 Jun 16 15:48]  cgroup.conf.example
│   ├── [-rw-r--r-- slurm    slurm     213 Jun 30 14:28]  plugstack.conf
│   ├── [drwxrwxr-x slurm    slurm    4.0K Jun 16 16:13]  plugstack.conf.d
│   ├── [-rw-r--r-- slurm    slurm    2.2K Jun 23 15:24]  slurm.conf
│   ├── [-rw-r--r-- slurm    slurm    3.0K Jun 16 15:48]  slurm.conf.example
│   ├── [-rw------- slurm    slurm     722 Jun 16 15:48]  slurmdbd.conf
│   └── [-rw-r--r-- slurm    slurm     745 Jun 16 15:48]  slurmdbd.conf.example
├── [drwxrwxr-x example-user example-user 4.0K Jun 29 12:46]  etc_ssh
│   ├── [-rw------- root     root     3.6K May  9 19:14]  sshd_config
│   ├── [drwx------ root     root     4.0K Jun 29 12:46]  sshd_config.d [error opening dir]
│   ├── [-rw------- root     root     1.4K Jun 29 11:17]  ssh_host_dsa_key
│   ├── [-rw-r--r-- root     root      600 Jun 29 11:17]  ssh_host_dsa_key.pub
│   ├── [-rw------- root     root      505 Jun 29 11:26]  ssh_host_ecdsa_key
│   ├── [-rw-r--r-- root     root      172 Jun 29 11:26]  ssh_host_ecdsa_key.pub
│   ├── [-rw------- root     root      399 Jun 29 11:26]  ssh_host_ed25519_key
│   ├── [-rw-r--r-- root     root       92 Jun 29 11:26]  ssh_host_ed25519_key.pub
│   ├── [-rw------- root     root     2.5K Jun 29 11:26]  ssh_host_rsa_key
│   └── [-rw-r--r-- root     root      564 Jun 29 11:26]  ssh_host_rsa_key.pub
├── [drwxrwxr-x example-user example-user 4.0K Jun 19 10:46]  ld.so.conf.d
├── [drwxrwxr-x example-user example-user 4.0K Jun 20 11:20]  scord_prefix
└── [drwxr-xr-x example-user example-user 4.0K Jul  7 08:27]  user_home

42 directories, 149 files
```
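
A minimal sketch for creating matching users on the host follows. The UIDs/GIDs
below are placeholders; they must match the values used inside the images, which
can be checked on a running cluster:

```bash
# Placeholder IDs: check the real values inside a container first, e.g.:
#   docker compose exec slurmctld id slurm
#   docker compose exec slurmctld id munge
sudo groupadd --gid 990 slurm
sudo useradd --uid 990 --gid 990 --system --no-create-home \
    --shell /usr/sbin/nologin slurm
sudo groupadd --gid 991 munge
sudo useradd --uid 991 --gid 991 --system --no-create-home \
    --shell /usr/sbin/nologin munge
```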

## Optional configurations

### Cluster registration

Though it's not required for the cluster to work properly, the newly created
cluster can be registered with the internal `slurmdbd` daemon. To do so, run the
`scripts/register_cluster.sh` script:

```console
scripts/register_cluster.sh
```
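
If the registration succeeded, the cluster should appear in the accounting
database. A quick check, assuming `sacctmgr` is available inside the
`slurmctld` container:

```console
docker compose exec slurmctld sacctmgr show cluster
```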

### Enabling name resolution

While the cluster containers can internally resolve the names of each service, the host
will be unable to do so. The simplest solution is to edit the `/etc/hosts` file
and add entries for the services that have static IPv4 addresses assigned:

```
192.18.0.128	login
192.18.0.129	slurmctld
192.18.0.10	c1
192.18.0.11	c2
192.18.0.12	c3
192.18.0.13	c4
```
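
These entries can be appended in one step (requires root privileges):

```bash
sudo tee -a /etc/hosts <<'EOF'
192.18.0.128	login
192.18.0.129	slurmctld
192.18.0.10	c1
192.18.0.11	c2
192.18.0.12	c3
192.18.0.13	c4
EOF
```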

## Usage

1. Find out the UID and GID of the host user that will be shared with the
   cluster. This can be done by running `id` in the host machine.
2. Build the services, making sure to set the `SHARED_USER_NAME`,
   `SHARED_USER_UID`, `SHARED_GROUP_NAME`, and `SHARED_GROUP_GID` build arguments to the
   values obtained in Step 1:
   ```shell
   $ docker compose build \
       --build-arg SHARED_USER_NAME=example-user \
       --build-arg SHARED_USER_UID=1000 \
       --build-arg SHARED_GROUP_NAME=example-user \
       --build-arg SHARED_GROUP_GID=1000
   ```
3. Start the cluster with `docker compose up -d`.
4. You can log into the cluster containers as root with
   `docker compose exec <container> bash`.
5. Alternatively, if SSH keys for the shared user have been configured in the `user_home` volume (a
   key-setup sketch follows this list) and the host's `/etc/hosts` file has been updated to include the
   cluster's IP addresses and hostnames, you can log into the cluster login or compute nodes as
   `$SHARED_USER_NAME` with ssh. For example, if the shared user is `example-user`:

      ```bash
      [example-user@host]$ ssh example-user@login
      ```
6. Jobs can be submitted to the cluster by ssh-ing into the `login` container and
   using the typical Slurm commands:
    ```bash
    [example-user@host]$ ssh example-user@login
    [example-user@login]$ sinfo
    PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
    normal*      up 5-00:00:00      4   idle c[1-4]
    [example-user@login]$ srun -N 4 hostname
    c2
    c3
    c1
    c4
    ```
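
As referenced in step 5, a minimal sketch for authorizing an SSH key for the
shared user, following the `volumes` layout described above (the key path and
comment are placeholders):

```bash
# Generate a dedicated keypair on the host and authorize it for the shared user
ssh-keygen -t ed25519 -f ~/.ssh/slurm_cluster -N '' -C example-user@slurm-cluster
mkdir -p volumes/user_home/.ssh
cat ~/.ssh/slurm_cluster.pub >> volumes/user_home/.ssh/authorized_keys
chmod 700 volumes/user_home/.ssh
chmod 600 volumes/user_home/.ssh/authorized_keys
```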