
Resolve "Shared file metadata congestion"

Marc Vef requested to merge marc/62-shared-file-metadata-congestion-2 into master

During write operations, the client must update the file size on the responsible metadata daemon. A write size cache can reduce the metadata load on that daemon and the number of RPCs during write operations, especially for many small I/O operations. In the past, we have observed that a daemon can become network-congested with single shared files, many processes, and small I/O operations, which bottlenecks the overall I/O throughput. Beyond the shared-file case, the cache also benefits small I/O in general: because the size-update RPC is skipped for most writes, small-file I/O already improves on a single node.

Note that this cache weakens file size consistency: stat operations may not reflect the actual file size until the file is closed. The cache does not affect the consistency of the file data itself. We did not observe any issues with the cache for HPC applications and benchmarks, but it technically breaks POSIX semantics. So, for now, I suggest making it experimental and opt-in.

  • LIBGKFS_WRITE_SIZE_CACHE - Enable caching the write size of files (default: OFF).
  • LIBGKFS_WRITE_SIZE_CACHE_THRESHOLD - Set the number of write operations after which the file size is synchronized with the corresponding daemon (default: 1000). The file size is also synchronized when the file is closed with close() or when fsync() is called.
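To illustrate the mechanism described above, here is a minimal C++ sketch of a client-side write size cache. It is not the GekkoFS implementation: the names `WriteSizeCache`, `WriteSizeCacheConfig`, `config_from_env`, and the `FlushFn` callback (a stand-in for the size-update RPC to the responsible daemon) are hypothetical. The sketch also assumes that `ON`, `on`, or `1` enables `LIBGKFS_WRITE_SIZE_CACHE`, and that the daemon keeps the maximum of the sizes it receives.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical configuration mirroring the two environment variables.
struct WriteSizeCacheConfig {
    bool enabled = false;       // LIBGKFS_WRITE_SIZE_CACHE (default: OFF)
    uint64_t threshold = 1000;  // LIBGKFS_WRITE_SIZE_CACHE_THRESHOLD (default: 1000)
};

// Assumption: "ON", "on", or "1" enables the cache; other values leave it off.
WriteSizeCacheConfig config_from_env() {
    WriteSizeCacheConfig cfg;
    if (const char* v = std::getenv("LIBGKFS_WRITE_SIZE_CACHE")) {
        std::string s(v);
        cfg.enabled = (s == "ON" || s == "on" || s == "1");
    }
    if (const char* v = std::getenv("LIBGKFS_WRITE_SIZE_CACHE_THRESHOLD")) {
        cfg.threshold = std::strtoull(v, nullptr, 10);  // 0 => sync on every write
    }
    return cfg;
}

// Hypothetical client-side cache: buffers size updates per file and only
// sends one to the daemon every `threshold` writes, or on sync().
class WriteSizeCache {
public:
    // Stand-in for the size-update RPC to the responsible metadata daemon.
    using FlushFn = std::function<void(const std::string& path, uint64_t size)>;

    WriteSizeCache(WriteSizeCacheConfig cfg, FlushFn flush)
        : cfg_(cfg), flush_(std::move(flush)) {
        assert(flush_ && "a flush callback is required");
    }

    // Called after each successful write of `count` bytes at `offset`.
    void record_write(const std::string& path, uint64_t offset, uint64_t count) {
        const uint64_t end = offset + count;
        if (!cfg_.enabled) {  // cache off: one size-update RPC per write
            flush_(path, end);
            return;
        }
        bool do_flush = false;
        uint64_t size = 0;
        {
            std::lock_guard<std::mutex> lock(mtx_);
            Entry& e = entries_[path];
            e.size = std::max(e.size, end);
            if (++e.pending >= cfg_.threshold) {
                do_flush = true;
                size = e.size;
                e.pending = 0;
            }
        }
        if (do_flush) flush_(path, size);  // RPC issued outside the lock
    }

    // Called on close() and fsync(): push any pending size and drop the entry.
    void sync(const std::string& path) {
        bool do_flush = false;
        uint64_t size = 0;
        {
            std::lock_guard<std::mutex> lock(mtx_);
            auto it = entries_.find(path);
            if (it == entries_.end()) return;
            do_flush = it->second.pending > 0;
            size = it->second.size;
            entries_.erase(it);
        }
        if (do_flush) flush_(path, size);
    }

private:
    struct Entry {
        uint64_t size = 0;     // largest end offset written so far
        uint64_t pending = 0;  // writes since the last size update
    };
    WriteSizeCacheConfig cfg_;
    FlushFn flush_;
    std::mutex mtx_;
    std::unordered_map<std::string, Entry> entries_;
};
```

With a threshold of 3, three consecutive writes to the same file produce one size update instead of three. Until that update (or a close()/fsync() that calls `sync()`), the daemon holds an outdated size, which is exactly the stat inconsistency noted above.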

Depends on !194 (merged)

Closes #62 (closed)

Edited by Marc Vef
