Mogon-nhr: srun not working
Problem
Command:
srun -N 1 --export=ALL,LD_PRELOAD=/lustre/project/nhr-admire/vef/gekkofs/build/src/client/libgkfs_intercept.so,LIBGKFS_HOSTS_FILE=/lustre/project/nhr-admire/vef/run/io500/gkfs_hostfile,LIBGKFS_LOG_OUTPUT=/dev/shm/vef_gkfs_client.log,LIBGKFS_LOG=debug hostname
Note: --export=ALL passes all environment variables of the calling shell, and these take precedence over variables of the same name given explicitly on the command line!
Result:
terminate called after throwing an instance of 'std::runtime_error'
what(): Unable to open cmdline file
srun: error: cpu0237: task 0: Aborted (core dumped)
With srun, the application crashes with a runtime error. When executed with a debug build, the error is:
hostname: /lustre/project/nhr-admire/jathenst/gekkofs/src/client/preload_context.cpp:465: int gkfs::preload::PreloadContext::register_internal_fd(int): Assertion `::syscall_error_code(ifd) == 0' failed.
srun: error: cpu0548: task 0: Aborted (core dumped)
Analysis (commit 318e5c76)
No log file is created because, as soon as the log file is opened, GekkoFS tries to move the file descriptor into its internal range. The range is defined by MAX_USER_FDS = GKFS_MAX_OPEN_FDS - GKFS_MAX_INTERNAL_FDS (preload_context.hpp:81). In a fresh checkout of GekkoFS, the build option GKFS_MAX_OPEN_FDS is set to 262144.
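The range arithmetic can be sketched as follows. This is only an illustration in Python, not the GekkoFS code itself: the value 256 for GKFS_MAX_INTERNAL_FDS is an assumption (it is not stated in this report), and the helper names are hypothetical.

```python
# Sketch of the fd range split described above.
GKFS_MAX_OPEN_FDS = 262144      # default build option named in this report
GKFS_MAX_INTERNAL_FDS = 256     # ASSUMPTION: value not given in this report


def max_user_fds(max_open=GKFS_MAX_OPEN_FDS, max_internal=GKFS_MAX_INTERNAL_FDS):
    """MAX_USER_FDS = GKFS_MAX_OPEN_FDS - GKFS_MAX_INTERNAL_FDS."""
    return max_open - max_internal


def is_internal_fd(fd, max_open=GKFS_MAX_OPEN_FDS, max_internal=GKFS_MAX_INTERNAL_FDS):
    """True if fd lies in the assumed internal range [MAX_USER_FDS, MAX_OPEN_FDS)."""
    return max_user_fds(max_open, max_internal) <= fd < max_open


if __name__ == "__main__":
    for max_open in (262144, 131072):
        print(max_open, "-> internal range starts at", max_user_fds(max_open))
```

Under the assumed 256 internal descriptors, the default build would place its internal range at 261888 and above, and a build with GKFS_MAX_OPEN_FDS = 131072 at 130816 and above.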
Printing the file descriptor at which the assertion fails shows that the crash occurs at fd=131072.
When we do not use a file but the standard output for logging, the program crashes at the assertion in preload_context.cpp:429.
It crashes at the same file descriptor as before, so the problem does not depend on where the log is written.
With additional fstat and getrlimit syscalls, I checked that the file descriptor is valid and that the file descriptor limits of the environment are valid. Both checks indicate no problem.
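A check of this kind can be sketched as below. This is a Python analogue for illustration, not the code that was added to GekkoFS; the function name check_fd is hypothetical. It uses os.fstat to test whether the descriptor is open and resource.getrlimit to read the RLIMIT_NOFILE limits of the process.

```python
import os
import resource


def check_fd(fd):
    """Report whether fd is open and how it relates to RLIMIT_NOFILE."""
    try:
        os.fstat(fd)            # raises OSError (EBADF) if fd is not open
        valid = True
    except OSError:
        valid = False
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unlimited soft limit is reported as resource.RLIM_INFINITY.
    below_soft = soft == resource.RLIM_INFINITY or fd < soft
    return {
        "fd": fd,
        "valid": valid,
        "soft_limit": soft,
        "hard_limit": hard,
        "below_soft_limit": below_soft,
    }


if __name__ == "__main__":
    print(check_fd(0))
```

Running it once under srun and once under mpiexec would show whether the two launchers hand the process different limits.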
Setting the build variable GKFS_MAX_OPEN_FDS = 131072 makes everything work as expected (workaround).
Comparison to MPI
mpiexec -np 1 -x LIBGKFS_LOG=debug -x LIBGKFS_LOG_OUTPUT=/dev/shm/jathenst_gkfs_client2.log -x LD_PRELOAD=/lustre/project/nhr-admire/jathenst/gekkofs/build/src/client/libgkfs_intercept.so -x LIBGKFS_HOSTS_FILE=/lustre/project/nhr-admire/jathenst/run/io500/gkfs_hostfile hostname
Comparing the logs of the MPI and srun executions shows that the srun execution issues far more syscalls. These syscalls are mainly epoll_wait calls. The epfd values lie mostly in the range of the threshold at which srun works (131072), e.g. epfd=130826.
It is possible that the Slurm daemon handles file descriptors differently internally and uses the Linux epoll API to manage files more efficiently. These epfds could clash with the range of file descriptors that GekkoFS reserves for private use.
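One way to test this hypothesis would be to list which descriptors are already open in a given range when the process starts. The following is a minimal Linux-only sketch that reads /proc/self/fd; the function name and the range bounds are hypothetical and not part of GekkoFS or Slurm.

```python
import os


def open_fds_in_range(lo, hi, fd_dir="/proc/self/fd"):
    """Return the sorted open fds n of this process with lo <= n < hi.

    Linux only: relies on the /proc/self/fd directory. os.listdir itself
    briefly opens a descriptor for the directory, which may show up in the
    listing as the lowest free fd at that moment.
    """
    fds = []
    for name in os.listdir(fd_dir):
        if name.isdigit():
            fd = int(name)
            if lo <= fd < hi:
                fds.append(fd)
    return sorted(fds)


if __name__ == "__main__":
    # Example bounds around the fd values mentioned above.
    print(open_fds_in_range(130000, 131073))
```

Run under srun and under mpiexec, a difference in the output would indicate that the launcher leaves descriptors open in the range GekkoFS wants to use; identical output would speak against the clash.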