Commit 8812ccdf authored by Marc Vef

Merge branch 'marc/292-add-dentry-cache' into 'master'

Resolve "Add dentry cache"

This MR adds a directory entry cache for the client to avoid a huge number of stat calls after readdir, e.g., for `ls -l` type operations. It is experimental and thus disabled by default. It can be enabled via `include/config.hpp` or with the env variable `LIBGKFS_DENTRY_CACHE=ON/OFF`.

It works by using the `extended_dir_entry` RPC to receive some metadata along with the dentries from the daemons. This metadata is then placed into the cache and retrieved in a stat operation (on a cache miss, an RPC is sent as in the vanilla code path). By default, the cache is discarded when the directory is closed; this behavior can be changed via `include/config.hpp`. Note that changing it may cause semantic issues (removed files will remain in the cache forever).
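
The flow described above can be illustrated with a minimal, standalone sketch. It is not the GekkoFS implementation: `EntryMeta`, `DirCache`, `fill_from_readdir`, and `stat_with_cache` are hypothetical names, and the regular stat RPC is represented by a callable passed in by the caller.

```cpp
#include <cstdint>
#include <ctime>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the metadata that arrives with each dentry.
struct EntryMeta {
    bool is_dir;
    std::uint64_t size;
    std::time_t ctime;
};

// Hypothetical per-directory cache: entry name -> metadata.
using DirCache = std::unordered_map<std::string, EntryMeta>;

// Assumed stand-in for the regular (uncached) stat RPC.
using StatRpc = std::function<std::optional<EntryMeta>(const std::string&)>;

// Called once per entry while a readdir reply is processed.
inline void
fill_from_readdir(DirCache& cache, std::string name, EntryMeta meta) {
    cache.insert_or_assign(std::move(name), meta);
}

// Stat path: answer from the cache on a hit, otherwise fall back to the RPC.
inline std::optional<EntryMeta>
stat_with_cache(const DirCache& cache, const std::string& name,
                const StatRpc& stat_rpc) {
    if(auto it = cache.find(name); it != cache.end()) {
        return it->second; // cache hit: no RPC is issued
    }
    return stat_rpc(name); // cache miss: vanilla behavior
}
```

In this sketch a hit never contacts a daemon, which is where the savings for `ls -l` type workloads would come from; it also shows why a stale entry is returned unchanged until the cache is discarded.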

The performance improvements are already noticeable locally for a couple of thousand files.

Depends on !195

Closes #292

See merge request !194
parents e4245996 ab3e3c0f
+3 −0
@@ -8,6 +8,9 @@ to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### New

- Added a directory cache for the file system client to improve `ls -l` type operations by avoiding consecutive stat calls
  ([!194](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/194)).
  - The cache is experimental and thus disabled by default. It can be enabled by setting the env variable `LIBGKFS_DENTRY_CACHE` to `ON`.
- Added file system expansion support ([!196](https://storage.bsc.es/gitlab/hpc/gekkofs/-/merge_requests/196)).
  - Added the tool `gkfs_malleability` to steer start, status, and finalize requests for expansion operations.
  - `-DGKFS_BUILD_TOOLS=ON` must be set for CMake to build the tool.
+3 −0
@@ -517,6 +517,9 @@ Client-metrics require the CMake argument `-DGKFS_ENABLE_CLIENT_METRICS=ON` (see
- `LIBGKFS_METRICS_IP_PORT` - Enable flushing to a set ZeroMQ server (replaces `LIBGKFS_METRICS_PATH`).
- `LIBGKFS_PROXY_PID_FILE` - Path to the proxy pid file (when using the GekkoFS proxy).
- `LIBGKFS_NUM_REPL` - Number of replicas for data.
#### Caching
- `LIBGKFS_DENTRY_CACHE` - Enable caching directory entries until closing the directory (default: OFF).
Improves performance for `ls -l` type operations. Further compile-time settings are available in `include/config.hpp`.

### Daemon
#### Logging
+50 −48
@@ -42,6 +42,7 @@ target_sources(
    preload.hpp
    preload_context.hpp
    preload_util.hpp
    cache.hpp
    rpc/rpc_types.hpp
    rpc/forward_management.hpp
    rpc/forward_metadata.hpp
@@ -68,6 +69,7 @@ target_sources(
    preload.hpp
    preload_context.hpp
    preload_util.hpp
    cache.hpp
    rpc/rpc_types.hpp
    rpc/forward_management.hpp
    rpc/forward_metadata.hpp
+137 −0
/*
  Copyright 2018-2024, Barcelona Supercomputing Center (BSC), Spain
  Copyright 2015-2024, Johannes Gutenberg Universitaet Mainz, Germany

  This software was partially supported by the
  EC H2020 funded project NEXTGenIO (Project ID: 671951, www.nextgenio.eu).

  This software was partially supported by the
  ADA-FS project under the SPPEXA project funded by the DFG.

  This file is part of GekkoFS' POSIX interface.

  GekkoFS' POSIX interface is free software: you can redistribute it and/or
  modify it under the terms of the GNU Lesser General Public License as
  published by the Free Software Foundation, either version 3 of the License,
  or (at your option) any later version.

  GekkoFS' POSIX interface is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  GNU Lesser General Public License for more details.

  You should have received a copy of the GNU Lesser General Public License
  along with GekkoFS' POSIX interface.  If not, see
  <https://www.gnu.org/licenses/>.

  SPDX-License-Identifier: LGPL-3.0-or-later
*/

#ifndef GKFS_CLIENT_CACHE
#define GKFS_CLIENT_CACHE

#include <client/open_file_map.hpp>

#include <ctime>
#include <functional>
#include <string>
#include <unordered_map>
#include <mutex>
#include <optional>
#include <cstdint>

namespace gkfs::cache {

namespace dir {

/**
 * @brief Cache entry metadata.
 * The entries are limited to the get_dir_extended RPC.
 */
struct cache_entry {
    gkfs::filemap::FileType file_type;
    uint64_t size;
    time_t ctime;
};

/**
 * @brief Cache for directory entries to accelerate ls -l type operations
 */
class DentryCache {
private:
    // <dir_id, <name, cache_entry>>: Associate a directory id with its entries,
    // each mapping an entry name to its cache entry metadata
    std::unordered_map<uint32_t, std::unordered_map<std::string, cache_entry>>
            entries_;
    // <dir_path, dir_id>: Associate a directory path with a unique id
    std::unordered_map<std::string, uint32_t> entry_dir_id_;
    std::mutex mtx_;                 // Mutex to protect the cache
    std::hash<std::string> str_hash; // hash to generate ids

    /**
     * @brief Generate a unique id for caching a directory
     * @param dir_path
     * @return id
     */
    uint32_t
    gen_dir_id(const std::string& dir_path);

    /**
     * @brief Get the unique id for a directory to retrieve its entries. Creates
     * an id if it does not exist.
     * @param dir_path
     * @return id
     */
    uint32_t
    get_dir_id(const std::string& dir_path);

public:
    DentryCache() = default;

    virtual ~DentryCache() = default;

    /**
     * @brief Insert a new entry in the cache
     * @param parent_dir
     * @param name
     * @param value
     */
    void
    insert(const std::string& parent_dir, std::string name, cache_entry value);

    /**
     * @brief Get an entry from the cache for a given directory
     * @param parent_dir
     * @param name
     * @return std::optional<cache_entry>
     */
    std::optional<cache_entry>
    get(const std::string& parent_dir, const std::string& name);

    /**
     * @brief Clear the cache for a given directory. Called when a directory is
     * closed
     * @param dir_path
     */
    void
    clear_dir(const std::string& dir_path);

    /**
     * @brief Dump the cache to the log for debugging purposes. Not used in
     * production.
     * @param dir_path
     */
    void
    dump_cache_to_log(const std::string& dir_path);

    /**
     * @brief Clear the entire cache
     */
    void
    clear();
};
} // namespace dir

} // namespace gkfs::cache

#endif // GKFS_CLIENT_CACHE
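
The header above only declares the cache interface. As a rough illustration of how such an interface could be backed by the two maps and the mutex it declares, here is a standalone sketch. It is an assumption, not the code from this MR: `cache_entry_sketch` and `DentryCacheSketch` are hypothetical names, the file type is reduced to a `bool` so that no GekkoFS headers are needed, and deriving the id from a truncated path hash is a guess based on the `str_hash` member.

```cpp
#include <cstdint>
#include <ctime>
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Simplified stand-in for cache_entry (file type reduced to a bool).
struct cache_entry_sketch {
    bool is_dir;
    std::uint64_t size;
    std::time_t ctime;
};

class DentryCacheSketch {
    using dir_entries = std::unordered_map<std::string, cache_entry_sketch>;
    std::unordered_map<std::uint32_t, dir_entries> entries_;
    std::unordered_map<std::string, std::uint32_t> entry_dir_id_;
    std::mutex mtx_;
    std::hash<std::string> str_hash_;

    // Assumption: the id is the path hash truncated to 32 bits.
    // The caller must already hold mtx_.
    std::uint32_t
    get_dir_id(const std::string& dir_path) {
        if(auto it = entry_dir_id_.find(dir_path); it != entry_dir_id_.end())
            return it->second;
        const auto id = static_cast<std::uint32_t>(str_hash_(dir_path));
        entry_dir_id_.emplace(dir_path, id);
        return id;
    }

public:
    void
    insert(const std::string& parent_dir, std::string name,
           cache_entry_sketch value) {
        std::lock_guard<std::mutex> lock(mtx_);
        entries_[get_dir_id(parent_dir)].insert_or_assign(std::move(name),
                                                          value);
    }

    std::optional<cache_entry_sketch>
    get(const std::string& parent_dir, const std::string& name) {
        std::lock_guard<std::mutex> lock(mtx_);
        // Look the id up without creating one for unknown directories.
        const auto id_it = entry_dir_id_.find(parent_dir);
        if(id_it == entry_dir_id_.end())
            return std::nullopt;
        const auto dir_it = entries_.find(id_it->second);
        if(dir_it == entries_.end())
            return std::nullopt;
        const auto entry_it = dir_it->second.find(name);
        if(entry_it == dir_it->second.end())
            return std::nullopt;
        return entry_it->second;
    }

    // Drop everything cached for one directory, e.g. when it is closed.
    void
    clear_dir(const std::string& dir_path) {
        std::lock_guard<std::mutex> lock(mtx_);
        const auto id_it = entry_dir_id_.find(dir_path);
        if(id_it == entry_dir_id_.end())
            return;
        entries_.erase(id_it->second);
        entry_dir_id_.erase(id_it);
    }

    void
    clear() {
        std::lock_guard<std::mutex> lock(mtx_);
        entries_.clear();
        entry_dir_id_.clear();
    }
};
```

One limitation of this sketch is that a truncated hash can collide, in which case two directories would share one entry map; a real implementation would need to detect or avoid that.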
+1 −0
@@ -60,6 +60,7 @@ static constexpr auto METRICS_IP_PORT = ADD_PREFIX("METRICS_IP_PORT");

static constexpr auto NUM_REPL = ADD_PREFIX("NUM_REPL");
static constexpr auto PROXY_PID_FILE = ADD_PREFIX("PROXY_PID_FILE");
static constexpr auto DENTRY_CACHE = ADD_PREFIX("DENTRY_CACHE");

} // namespace gkfs::env
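
The constant above only names the environment variable. A minimal sketch of how an `ON`/`OFF` value might be interpreted at client startup follows; `env_flag_enabled` is a hypothetical helper, and treating unset or unrecognized values as the default is an assumption rather than documented GekkoFS behavior.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: interpret an ON/OFF environment variable.
// Unset or unrecognized values fall back to default_value (an assumption).
inline bool
env_flag_enabled(const char* name, bool default_value) {
    const char* raw = std::getenv(name);
    if(raw == nullptr)
        return default_value;
    const std::string value{raw};
    if(value == "ON")
        return true;
    if(value == "OFF")
        return false;
    return default_value;
}
```

With this helper, `env_flag_enabled("LIBGKFS_DENTRY_CACHE", false)` would keep the cache disabled unless the variable is explicitly set to `ON`, matching the stated default.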
