Native Omni-Path support

This involves several open points:

  • Hermes may need changes to handle the updated Mercury psm2 address strings
  • Use libfabric to convert an IP address to a native psm2 address (check whether Mercury can do this again, as it did in the past). Open question: should this be done in Hermes or in GKFS?
  • Confirm which libfabric, opa-psm2, and Mercury versions work well together for the following points:
    • Client reconnection. The FI_PSM2_DISCONNECT=1 environment variable must be set on the client for reconnection to have any chance of working
    • MPI must be able to use native psm2 as well. So far we have always disabled it with the following MPI parameter: --mca mtl ^psm2,ofi. (At some point we got this to work on Mogon2 without the parameter. It seemed to be unrelated to any of our dependencies.)
    • More information on the various version combinations is in the attached PDF omni-path-investigation.pdf
  • Small I/O requests (e.g., 4 KiB) did not work in the CLUSTER version. Re-evaluate
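
For testing the version combinations above, the two runtime settings mentioned (FI_PSM2_DISCONNECT=1 on the client, and the current --mca mtl ^psm2,ofi workaround) could be captured in a small launcher helper. This is only a sketch: the helper names client_env and mpirun_argv are hypothetical, and the mpirun/-np invocation assumes an Open MPI-style launcher, which the --mca flag suggests but the issue does not state.

```python
import os


def client_env(base=None):
    """Return a copy of the environment with psm2 disconnect handling enabled.

    Hypothetical helper; only the variable name and value come from the issue.
    """
    env = dict(os.environ if base is None else base)
    # Must be set on the client for reconnection to have a chance of working.
    env["FI_PSM2_DISCONNECT"] = "1"
    return env


def mpirun_argv(app_argv, nprocs, disable_native_psm2=True):
    """Build an mpirun command line (assumes an Open MPI-style launcher).

    With disable_native_psm2=True this reproduces the current workaround;
    with False it leaves MTL selection to MPI, i.e. the native psm2 case
    that still needs to be verified.
    """
    argv = ["mpirun", "-np", str(nprocs)]
    if disable_native_psm2:
        # Current workaround: exclude the psm2 and ofi MTL components.
        argv += ["--mca", "mtl", "^psm2,ofi"]
    return argv + list(app_argv)
```

A test run could then pass both results to subprocess.run(mpirun_argv([...], nprocs), env=client_env()) and toggle disable_native_psm2 to compare the workaround against native psm2.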
Edited by Marc Vef