Hi, in #78 (closed) we noticed that there was no way to choose the type of the adhoc storage instance being registered, so we have implemented it in !53 (merged). Please @clement, @rnou, update your working copies and any code using ADM_register_adhoc_storage() to add the extra field.
ADM_storage_create() is an internal function whose only purpose is to create the type itself. We should have made it impossible to call from client code, but we haven't gotten to that yet. It just allocates space and fills it with the provided values; it does not contact scord to report the creation of the storage tier.
ADM_register_adhoc_storage() is the official way of registering an adhoc_storage instance with scord. As such, it contacts the daemon and returns a valid ADM_storage_t with a valid internal id that scord can use to track subsequent calls on the same storage tier.
I will add it to job::resources since that is simpler than updating the API itself. It also makes sense if in the future we add something like a slurm_context to job::resources.
Actually, it makes more sense to add it directly to ADM_register_job, otherwise it gives the impression that ADM_update_job could be used to change the slurm id of an already registered job :/
Added in !56 (merged). @clement if @rnou approves and integrates the MR into main, you'll need to update your calls to ADM_register_job() with an extra argument for the SLURM_JOB_ID.
I have a partial issue (though I understand it's because this scenario is not covered yet).
When we launch the gkfs script, it issues some srun calls to launch the daemon instances (on the same nodes or not, using the job id). These instances go through the plugin, so in scord we get register_job and register_adhoc_storage / deploy_adhoc_storage again.
As a quick fix, I managed to avoid it by using adhoc_nodes as a shortcut so that we don't enter the loop again.
However, I still don't know whether we should register that job in scord or not.
You mean that a simple srun without the scord options, but run in the context of a job where those options were passed, will behave as if the same options had been passed?
If so, I think this is expected behaviour for Slurm. There are some environment variables you can unset to avoid it. Alternatively, I can add an option to override argument processing.
Yes, in the absence of the option on the command line, Slurm uses the SLURM_SPANK__SLURM_SPANK_OPTION_admire_cli_* environment variables. This is "useful" because it means you can do an salloc that registers all the options, and a subsequent naked srun will pick them up.
To handle the problem, either:

- you could unset all the environment variables that start with SLURM_SPANK__SLURM_SPANK_OPTION_admire_cli_ before calling srun for the gkfs deployment, or
- I can add an option (something like --adm-adhoc-ignore) to disable processing.
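The first option can be scripted in the gkfs deployment wrapper. A minimal sketch, assuming bash and taking the variable prefix from the discussion above (the two exported variables only simulate what an enclosing salloc would have set):

```shell
#!/usr/bin/env bash
# Simulate SPANK option variables inherited from an enclosing salloc.
export SLURM_SPANK__SLURM_SPANK_OPTION_admire_cli_adhoc_nodes=2
export SLURM_SPANK__SLURM_SPANK_OPTION_admire_cli_adhoc_context=gekkofs

# Unset every inherited ADMIRE SPANK option so the nested srun that
# launches the gkfs daemons does not go through the plugin again.
while IFS= read -r var; do
  unset "$var"
done < <(env | grep -o '^SLURM_SPANK__SLURM_SPANK_OPTION_admire_cli_[^=]*')

# Now a naked srun launched from here no longer re-triggers
# register_job / register_adhoc_storage in scord, e.g.:
# srun -N "$NNODES" gkfs_daemon ...
```

This only affects the child srun's environment; the options registered by the original salloc/srun remain in effect for the outer job.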