# Slurm Docker Cluster
This is a multi-container Slurm cluster using `docker-compose` with `sshd` and `systemd` enabled.
The compose file creates named volumes for persistent storage of MySQL data files as well as
Slurm state and log directories. It is heavily based on work by [giovtorres/slurm-docker-cluster](
https://github.com/giovtorres/slurm-docker-cluster).
## Containers, Networks, and Volumes
The compose file will run the following containers:
* `mysql`
* `slurmdbd`
* `slurmctld`
* `login` (slurmd)
* `c1`, `c2`, `c3`, `c4` (slurmd)
The compose file will create the following named volumes:
* `etc_munge` ( -> `/etc/munge` )
* `slurm_jobdir` ( -> `/data` )
* `var_lib_mysql` ( -> `/var/lib/mysql` )
The compose file will create the `slurm_cluster` network for all containers and will assign the
following IPv4 static addresses:
* `slurmctld`: 192.18.0.129
* `c1`: 192.18.0.10
* `c2`: 192.18.0.11
* `c3`: 192.18.0.12
* `c4`: 192.18.0.13
* `login`: 192.18.0.128
## Package contents
- `docker-compose.yml`: docker-compose file for running the cluster
- `slurm-docker-cluster/Dockerfile`: dockerfile for building the main cluster services.
- `slurm-docker-cluster-node/Dockerfile`: dockerfile with the software specific to
the compute nodes (tailored to `scord`). NEEDS TO BE BUILT BEFORE RUNNING THE
CLUSTER.
- `scripts/register_cluster.sh`: script for registering the cluster with the `slurmdbd` daemon.
- `scripts/refresh.sh`: script for refreshing the scord installation in the cluster.
The `scripts/refresh.sh` script uses the `slurm-docker-cluster-node` image to generate the
binaries so that there are no compatibility issues with dependencies.
The script relies on the following variables:
- `REPO`: The repository where the `scord` source code is located.
- `VOLUMES`: The host directory where the output of the build process will
be placed.
- `USER`: The container user that should be used to run the build
process (so that file ownership matches between the host and the container).
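A minimal sketch of a refresh run, assuming these variables are taken from the
environment (they may instead need to be edited inside the script itself); the paths
and user name below are placeholders:
```bash
# Placeholder values: adjust to your scord checkout, volumes location and user.
export REPO="$HOME/src/scord"
export VOLUMES="$PWD/volumes"
export USER="example-user"
scripts/refresh.sh
```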
The `scord` build process relies on a CMake Preset for Rocky Linux that
has been configured to match the container environment:
```json
{
  "name": "rocky",
  "displayName": "Rocky Linux",
  "description": "Build options for Rocky Linux",
  "inherits": "base",
  "environment": {
    "PKG_CONFIG_PATH": "/usr/lib/pkgconfig;/usr/lib64/pkgconfig"
  },
  "generator": "Unix Makefiles",
  "cacheVariables": {
    "CMAKE_CXX_COMPILER_LAUNCHER": "",
    "CMAKE_C_COMPILER_LAUNCHER": "",
    "CMAKE_CXX_FLAGS": "-fdiagnostics-color=always",
    "CMAKE_C_FLAGS": "-fdiagnostics-color=always",
    "CMAKE_PREFIX_PATH": "/usr/lib;/usr/lib64",
    "CMAKE_INSTALL_PREFIX": "/scord_prefix",
    "SCORD_BUILD_EXAMPLES": true,
    "SCORD_BUILD_TESTS": true,
    "SCORD_BIND_ADDRESS": "192.18.0.128"
  }
}
```
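Assuming this preset is added to the `CMakePresets.json` of the `scord` source tree
(note the `inherits: base` entry), configuring and building inside the node container
might look like the following sketch:
```bash
# Configure with the 'rocky' preset; the build directory comes from the inherited 'base' preset.
cmake --preset rocky
# Build (use 'cmake --build <build-dir>' instead if no matching build preset is defined).
cmake --build --preset rocky
```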
- `volumes`: directory for the volumes used by the cluster:
  - `etc_munge`: munge configuration files. A shared `munge.key` needs to be
    generated and placed here (see the sketch after this list).
  - `etc_slurm`: slurm configuration files. At least a `slurm.conf` file needs
    to be placed here. The `slurm.conf` file should be configured with the
    compute node and partition information. For example:
    ```conf
    # COMPUTE NODES
    NodeName=c[1-4] RealMemory=1000 State=UNKNOWN
    # PARTITIONS
    PartitionName=normal Default=yes Nodes=c[1-4] Priority=50 DefMemPerCPU=500 Shared=NO MaxNodes=4 MaxTime=5-00:00:00 DefaultTime=5-00:00:00 State=UP
    ```
  - `etc_ssh`: ssh configuration files. Server keys and configuration files
    should be placed here.
  - `ld.so.conf.d`: ld.so configuration files.
  - `scord_prefix`: scord installation directory. The scord installation should
    be placed here. This should match the directory outside the container where
    the binaries are generated.
  - `user_home`: user home directory. Any files and directories that we want to
    have available in all compute nodes (e.g. `.ssh`) should be added here.
- `docker-entrypoint.sh`: overridden entry point.
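As noted above for `etc_munge`, a shared `munge.key` must be provided. A minimal sketch
using the `dd` approach from the MUNGE documentation (the `chown` step only applies if a
matching `munge` user exists on the host):
```bash
# Create a 1 KiB random key and restrict its permissions.
dd if=/dev/urandom bs=1 count=1024 of=volumes/etc_munge/munge.key
chmod 400 volumes/etc_munge/munge.key
# Only if a host 'munge' user exists with the containers' UID/GID:
sudo chown munge:munge volumes/etc_munge/munge.key
```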
## Build arguments
The following build arguments are available:
* `SLURM_TAG`: The Slurm Git tag to build. Defaults to `slurm-21-08-6-1`.
* `GOSU_VERSION`: The gosu version to install. Defaults to `1.11`.
* `SHARED_USER_NAME`: The name of the user that will be shared with the cluster. Defaults to `user`.
* `SHARED_USER_UID`: The UID of the user that will be shared with the cluster. Defaults to `1000`.
* `SHARED_GROUP_NAME`: The name of the group that will be shared with the cluster. Defaults to `user`.
* `SHARED_GROUP_GID`: The GID of the group that will be shared with the cluster. Defaults to `1000`.
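For example, a different Slurm release or gosu version can be selected at build time
(the values below are illustrative; the tag must exist in the upstream Slurm repository):
```shell
$ docker compose build \
    --build-arg SLURM_TAG=slurm-22-05-8-1 \
    --build-arg GOSU_VERSION=1.14
```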
## Configuration
To run, the cluster services expect some files to be present in the host system. The simplest way to do this is to
provide the files in the `volumes` directory with the correct ownership and permissions so that they can be
mounted into the containers. The `volumes` directory should be placed in the same directory as the `docker-compose.yml`
file and should have the following structure:
```bash
volumes/
├── docker-entrypoint.sh -> /usr/local/bin/docker-entrypoint.sh
├── etc_munge -> /etc/munge
├── etc_slurm -> /etc/slurm
├── etc_ssh -> /etc/ssh
├── ld.so.conf.d -> /etc/ld.so.conf.d
└── user_home -> /home/$SHARED_USER_NAME
```
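A minimal sketch for creating this skeleton, assuming the `docker-entrypoint.sh` listed in
the package contents sits in the repository root (the `scord_prefix` directory from the
detailed listing below can be created in the same way):
```bash
# Create the expected volume directories next to docker-compose.yml.
mkdir -p volumes/{etc_munge,etc_slurm,etc_ssh,ld.so.conf.d,scord_prefix,user_home}
# Reuse the packaged entry point script.
cp docker-entrypoint.sh volumes/
chmod +x volumes/docker-entrypoint.sh
```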
The following ownership and permissions should be set for the cluster to work properly.
The `slurm` and `munge` users are not actually required to exist in the host system,
as they are created automatically while building the images, though it helps to
create them rather than having a bare numeric UID show up each time `ls` is called
(see the sketch after the listing below). Note, however, that if they are created in the host,
the `slurm` and `munge` users/groups need to have the same UIDs/GIDs in the host and
container systems.
```bash
volumes
├── [-rwxrwxr-x example-user example-user 1.9K Jun 29 16:30] docker-entrypoint.sh
├── [drwxrwxr-x munge munge 4.0K Jun 17 09:11] etc_munge
│ └── [-r-------- munge munge 1.0K Jun 17 09:11] munge.key
├── [drwxrwxr-x slurm slurm 4.0K Jul 4 09:49] etc_slurm
│ ├── [-rw-r--r-- slurm slurm 216 Jun 16 15:48] cgroup.conf.example
│ ├── [-rw-r--r-- slurm slurm 213 Jun 30 14:28] plugstack.conf
│ ├── [drwxrwxr-x slurm slurm 4.0K Jun 16 16:13] plugstack.conf.d
│ ├── [-rw-r--r-- slurm slurm 2.2K Jun 23 15:24] slurm.conf
│ ├── [-rw-r--r-- slurm slurm 3.0K Jun 16 15:48] slurm.conf.example
│ ├── [-rw------- slurm slurm 722 Jun 16 15:48] slurmdbd.conf
│ └── [-rw-r--r-- slurm slurm 745 Jun 16 15:48] slurmdbd.conf.example
├── [drwxrwxr-x example-user example-user 4.0K Jun 29 12:46] etc_ssh
│ ├── [-rw------- root root 3.6K May 9 19:14] sshd_config
│ ├── [drwx------ root root 4.0K Jun 29 12:46] sshd_config.d [error opening dir]
│ ├── [-rw------- root root 1.4K Jun 29 11:17] ssh_host_dsa_key
│ ├── [-rw-r--r-- root root 600 Jun 29 11:17] ssh_host_dsa_key.pub
│ ├── [-rw------- root root 505 Jun 29 11:26] ssh_host_ecdsa_key
│ ├── [-rw-r--r-- root root 172 Jun 29 11:26] ssh_host_ecdsa_key.pub
│ ├── [-rw------- root root 399 Jun 29 11:26] ssh_host_ed25519_key
│ ├── [-rw-r--r-- root root 92 Jun 29 11:26] ssh_host_ed25519_key.pub
│ ├── [-rw------- root root 2.5K Jun 29 11:26] ssh_host_rsa_key
│ └── [-rw-r--r-- root root 564 Jun 29 11:26] ssh_host_rsa_key.pub
├── [drwxrwxr-x example-user example-user 4.0K Jun 19 10:46] ld.so.conf.d
├── [drwxrwxr-x example-user example-user 4.0K Jun 20 11:20] scord_prefix
└── [drwxr-xr-x example-user example-user 4.0K Jul 7 08:27] user_home
42 directories, 149 files
```
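If you do create the host-side users, a sketch could look like this (the UIDs/GIDs are
placeholders; use the values actually defined inside the container images, e.g. as reported
by `id slurm` and `id munge` in a running container):
```bash
# Placeholder UIDs/GIDs: replace with the values used inside the images.
sudo groupadd -g 990 slurm && sudo useradd -u 990 -g slurm -M -s /usr/sbin/nologin slurm
sudo groupadd -g 991 munge && sudo useradd -u 991 -g munge -M -s /usr/sbin/nologin munge
```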
## Optional configurations
### Cluster registration
Though it's not required for the cluster to work properly, the newly created
cluster can be registered with the internal `slurmdbd` daemon. To do so, run the
`scripts/register_cluster.sh` script:
```console
scripts/register_cluster.sh
```
### Enabling name resolution
Though the cluster will internally be able to resolve the names of each service,
the host will be unable to do so. The simplest solution is to edit the host's `/etc/hosts` file
and add entries for the services which have static IPv4 addresses assigned:
```
192.18.0.128 login
192.18.0.129 slurmctld
192.18.0.10 c1
192.18.0.11 c2
192.18.0.12 c3
192.18.0.13 c4
```
## Running the cluster
1. Find out the UID and GID of the host user that will be shared with the
cluster. This can be done by running `id` in the host machine.
2. Build the services, making sure to set the `SHARED_USER_NAME`,
`SHARED_USER_UID`, `SHARED_GROUP_NAME`, and `SHARED_GROUP_GID` build arguments to the
values obtained in Step 1:
```shell
$ docker compose build \
--build-arg SHARED_USER_NAME=example-user \
--build-arg SHARED_USER_UID=1000 \
--build-arg SHARED_GROUP_NAME=example-user \
--build-arg SHARED_GROUP_GID=1000
```
3. Start the cluster in the background with `docker compose up -d`.
4. You can log into the cluster containers as root with
`docker compose exec <container> bash`.
5. Alternatively, if ssh keys for the shared user have been configured in the `user_home` volume and the host's
`/etc/hosts` file has been updated to include the cluster's IP addresses and hostnames, you can log into
the cluster login or compute nodes as `$SHARED_USER_NAME` with ssh. For example, if the shared user is `example-user`:
```bash
[example-user@host]$ ssh example-user@login
```
6. Jobs can be submitted to the cluster by ssh-ing into the `login` container and
using the typical Slurm commands:
```bash
[example-user@host]$ ssh example-user@login
[example-user@login]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
normal* up 5-00:00:00 4 idle c[1-4]
[example-user@login]$ srun -N 4 hostname
c2
c3
c1
c4
```
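Batch submission works the same way. As a small sketch (the script name, options, and
reported job id are illustrative):
```bash
[example-user@login]$ cat > hello.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --nodes=2
#SBATCH --output=hello-%j.out
srun hostname
EOF
[example-user@login]$ sbatch hello.sbatch
Submitted batch job 1
```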