Native Omni-Path support
This involves multiple points:
- Potentially modify Hermes for the updated Mercury psm2 address strings.
- Use libfabric to convert an IP address to a native psm2 address (check whether Mercury can do this again, as it did in the past). Should this be done in Hermes or GKFS?
- Confirm which libfabric, opa-psm2, and Mercury versions work well together for the following points:
  - Client reconnection. The `FI_PSM2_DISCONNECT=1` environment variable must be set on the client for reconnection to have a chance of working.
  - MPI must be able to use native psm2 as well. So far we have always disabled it with the following MPI parameter: `--mca mtl ^psm2,ofi`. (At one point we got this to work on Mogon2 without that parameter. It seemed to be unrelated to any of our dependencies.)
  - Some more info on various versions is attached in the PDF omni-path-investigation.pdf.
- Small I/O requests, i.e., 4 KiB, didn't work in the CLUSTER version. Re-evaluate.
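
A minimal sketch of how the settings and the small-I/O check listed above could be scripted when re-testing. Only `FI_PSM2_DISCONNECT=1`, `--mca mtl ^psm2,ofi`, and the 4 KiB request size come from this issue; the helper name `check_small_io`, the file name `small_io_test`, and the application name `./my_app` are hypothetical placeholders, and the target directory is assumed to be on the file system under test.

```shell
#!/usr/bin/env bash
# Sketch only: helper name, file name, and paths are illustrative assumptions.
set -euo pipefail

# Set on the client so that reconnection has a chance of working.
export FI_PSM2_DISCONNECT=1

# Current workaround: exclude the psm2 and ofi MTLs so MPI does not use native psm2.
# Kept as a comment so the script runs without an MPI installation;
# ./my_app is a hypothetical application name.
#   mpirun --mca mtl ^psm2,ofi ./my_app

# Write a single 4 KiB block into the given directory, read it back,
# and compare it byte for byte with the data that was written.
check_small_io() {
    local dir="$1"
    local src dst rc=0
    src="$(mktemp)"
    dst="$(mktemp)"
    head -c 4096 /dev/urandom > "$src"
    dd if="$src" of="$dir/small_io_test" bs=4096 count=1 status=none
    dd if="$dir/small_io_test" of="$dst" bs=4096 count=1 status=none
    cmp -s "$src" "$dst" || rc=1
    rm -f "$src" "$dst" "$dir/small_io_test"
    return "$rc"
}
```

Calling `check_small_io <dir>` returns 0 if the 4 KiB block round-trips unchanged and non-zero otherwise, so it can be run once per libfabric, opa-psm2, and Mercury combination being compared.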
Edited by Marc Vef