Commit 01a86b08 authored by Ramon Nou

Merge branch 'faster' into 'master'

Faster IO500

Add some optimizations:
1. With inlined writes, mdtest inefficiently gets the full 4k of data for every request. A string view optimizes this.
2. Batch metadata requests at the client (`LIBGKFS_METADATA_BATCH=ON/OFF` and `LIBGKFS_METADATA_BATCH_THRESHOLD=64`).
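
The first point can be illustrated with a minimal sketch. It is not the code from this merge request: `copy_payload` and `view_payload` are hypothetical names, and the assumption is that the cost being avoided is a per-request copy of the inline data buffer.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: building a std::string allocates storage and copies
// `count` bytes out of the caller's buffer on every call.
inline std::string
copy_payload(const char* buf, std::size_t count) {
    return std::string(buf, count);
}

// Hypothetical helper: a std::string_view stores only a pointer and a length,
// so the bytes stay in the caller's buffer and nothing is copied.
// The view is valid only while that buffer is alive and unmodified.
inline std::string_view
view_payload(const char* buf, std::size_t count) {
    return std::string_view(buf, count);
}
```

The view variant trades the copy for a lifetime constraint: the caller has to keep the buffer alive until the request has been sent.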

See merge request !305
parents 0af7bda8 2de9afc0
+3 −0
@@ -7,6 +7,9 @@ to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]
### New
  - Metadata batching ([!305](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/305))
    - Added client-side metadata batching for file/node creation to reduce metadata RPC bottlenecks.
    - Introduced new environment variables: `LIBGKFS_METADATA_BATCH` and `LIBGKFS_METADATA_BATCH_THRESHOLD`.
  - directory optimization with compression and reattempt ([!270](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/270))
    - Refactor sfind so it can use SLURM_ environment variables to query different servers.
    - Create a sample bash script to gather all the info (map->reduce)
+8 −0
@@ -738,6 +738,14 @@ Using two environment variables
- `LIBGKFS_PROTECT_FILES_GENERATOR=1` enables the application as generator, so an open will create (and increase) a file .lockgekko and close will remove it (or decrease its value). The behaviour only uses metadata at server side.
- `LIBGKFS_PROTECT_FILES_CONSUMER=1` enables the application as consumer, so an open will wait until the .lockgekko disappears. The wait is limited to ~40 seconds.

##### Metadata batching
By default, the client sends a separate RPC for each file/node creation (e.g., `open` with `O_CREAT`). The metadata batching feature buffers these creation operations on the client side and sends them to the daemons in batches, reducing network RPC overhead during massive file creations.

Remaining buffered creation requests are automatically flushed when the application exits.

- `LIBGKFS_METADATA_BATCH=ON` - Enable client-side metadata batching for file creation (default: OFF).
- `LIBGKFS_METADATA_BATCH_THRESHOLD` - Set the number of file creation operations per host after which the batch is flushed (default: 64).
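
As a minimal sketch of how a client could read these two variables, assuming that only the exact value `ON` enables batching and that an invalid or non-positive threshold falls back to the documented default of 64. `BatchConfig` and `read_batch_config` are hypothetical names, not part of GekkoFS:

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical holder for the two settings, initialised to the documented
// defaults (batching OFF, threshold 64).
struct BatchConfig {
    bool enabled = false;
    std::size_t threshold = 64;
};

// Read both variables from the process environment.
// Assumption: any value other than "ON" leaves batching disabled.
inline BatchConfig
read_batch_config() {
    BatchConfig cfg;
    if(const char* v = std::getenv("LIBGKFS_METADATA_BATCH")) {
        cfg.enabled = (std::string(v) == "ON");
    }
    if(const char* v = std::getenv("LIBGKFS_METADATA_BATCH_THRESHOLD")) {
        try {
            const long long n = std::stoll(v);
            if(n > 0) {
                cfg.threshold = static_cast<std::size_t>(n);
            }
        } catch(const std::exception&) {
            // Unparsable or out-of-range value: keep the default threshold.
        }
    }
    return cfg;
}
```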

### Daemon
#### Core
- `GKFS_DAEMON_CREATE_CHECK_PARENTS` - Enable checking parent directory for existence before creating children.
+3 −0
@@ -95,6 +95,9 @@ static constexpr auto CREATE_WRITE_OPTIMIZATION =
        ADD_PREFIX("CREATE_WRITE_OPTIMIZATION");
static constexpr auto READ_INLINE_PREFETCH = ADD_PREFIX("READ_INLINE_PREFETCH");
static constexpr auto ENABLE_FORK = ADD_PREFIX("ENABLE_FORK");
static constexpr auto METADATA_BATCH = ADD_PREFIX("METADATA_BATCH");
static constexpr auto METADATA_BATCH_THRESHOLD =
        ADD_PREFIX("METADATA_BATCH_THRESHOLD");

} // namespace gkfs::env

+29 −0
@@ -41,6 +41,7 @@
#define GEKKOFS_PRELOAD_CTX_HPP

#include <map>
#include <unordered_map>
#include <thallium.hpp>
#include <memory>
#include <vector>
@@ -156,6 +157,12 @@ private:
    std::shared_ptr<thallium::engine> rpc_engine_;
    std::shared_ptr<thallium::engine> ipc_engine_;

    bool use_metadata_batch_{false};
    size_t metadata_batch_threshold_{64};
    std::unordered_map<uint64_t, std::vector<std::pair<std::string, mode_t>>>
            metadata_batch_buffer_;
    mutable std::mutex metadata_batch_mutex_;


public:
    static PreloadContext*
@@ -375,6 +382,28 @@ public:

    void
    ipc_engine(std::shared_ptr<thallium::engine> engine);

    bool
    use_metadata_batch() const;

    void
    use_metadata_batch(bool use_metadata_batch);

    size_t
    metadata_batch_threshold() const;

    void
    metadata_batch_threshold(size_t threshold);

    void
    flush_metadata_batches();

    void
    flush_metadata_batch(uint64_t host_id);

    void
    add_metadata_batch_entry(uint64_t host_id, const std::string& path,
                             mode_t mode);
};

} // namespace preload
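
The declarations above only show the interface. The following is a minimal sketch of one way a per-host buffer with a flush threshold could behave, not the implementation from this merge request. `MetadataBatcher` and its `Sender` callback are hypothetical stand-ins for the `PreloadContext` members and for a batch-create RPC; splitting the buffered pairs into separate path and mode vectors is likewise an assumption, based on the `forward_batch_create` signature.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <string>
#include <sys/types.h> // mode_t
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the batching state kept by the client context.
class MetadataBatcher {
public:
    using Entry = std::pair<std::string, mode_t>;
    // Hypothetical callback that would wrap the batch-create RPC.
    using Sender = std::function<void(uint64_t,
                                      const std::vector<std::string>&,
                                      const std::vector<uint32_t>&)>;

    MetadataBatcher(std::size_t threshold, Sender sender)
        : threshold_(threshold), sender_(std::move(sender)) {}

    // Buffer one creation; once the host's buffer reaches the threshold,
    // detach it under the lock and send it after the lock is released.
    void
    add_entry(uint64_t host_id, const std::string& path, mode_t mode) {
        std::vector<Entry> ready;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            auto& pending = buffer_[host_id];
            pending.emplace_back(path, mode);
            if(pending.size() < threshold_) {
                return;
            }
            ready.swap(pending);
        }
        send(host_id, ready);
    }

    // Send whatever is still buffered for every host (e.g. at exit).
    void
    flush_all() {
        std::unordered_map<uint64_t, std::vector<Entry>> ready;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ready.swap(buffer_);
        }
        for(const auto& [host_id, entries] : ready) {
            send(host_id, entries);
        }
    }

private:
    // Split the buffered pairs into parallel vectors and hand them over.
    void
    send(uint64_t host_id, const std::vector<Entry>& entries) {
        if(entries.empty()) {
            return;
        }
        std::vector<std::string> paths;
        std::vector<uint32_t> modes;
        paths.reserve(entries.size());
        modes.reserve(entries.size());
        for(const auto& [path, mode] : entries) {
            paths.push_back(path);
            modes.push_back(static_cast<uint32_t>(mode));
        }
        sender_(host_id, paths, modes);
    }

    std::size_t threshold_;
    Sender sender_;
    std::unordered_map<uint64_t, std::vector<Entry>> buffer_;
    std::mutex mutex_;
};
```

In this sketch the callback runs outside the lock, so a slow send for one host does not block threads that are buffering entries for other hosts.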
+4 −0
@@ -63,6 +63,10 @@ namespace rpc {
int
forward_create(const std::string& path, mode_t mode, const int copy);

int
forward_batch_create(uint64_t host_id, const std::vector<std::string>& paths,
                     const std::vector<uint32_t>& modes);

int
forward_create_write_inline(const std::string& path, mode_t mode,
                            const std::string& data, uint64_t count,