Commit 8930cfc7 authored by Marc Vef

Merge branch 'marc/254-support-for-parallel-append-operations' into 'master'

Resolve "Support for (parallel) append operations"

This MR adds (parallel) append support for write operations. Some append code already existed and ran on every `pwrite` when the file size was updated; as a result, parts of strings were serialized and deserialized within RocksDB's merge operation even when not needed. Previously, `open()` returned `ENOTSUP` when `O_APPEND` was set. Removing that check alone did not make append functional, due to how size updates interacted with the append case. Overall, `gkfs_pwrite()`, which first updates the file size and then writes the data, was quite messy, with unused return values and arguments. Furthermore, the server calculated the updated size without regard to what had occurred in the KV store. Therefore, as part of this MR, the entire size-update process within `pwrite()` was refactored.

Parallel appends are achieved by hooking into RocksDB's `Merge Operator`, which is triggered at some point (e.g., during `Get()`). When append is not used, the client already knows the offset, so `gkfs_pwrite()` updates the file size to `offset + count`. No further coordination is required, since overlapping offsets are the user's responsibility. The code path for non-append operations was slightly optimized but largely remains the same.
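The size-update logic folded by the merge operator can be sketched as follows. This is a simplified standalone model with hypothetical names (`SizeOperand`, `merge_size`), not the actual GekkoFS merge operator: it shows how a batch of pending size operands could be reduced into the stored file size in one pass.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical operand describing one pending size update.
struct SizeOperand {
    uint64_t offset; // write offset (ignored for append)
    uint64_t count;  // number of bytes written
    bool append;     // true if the write is an append
};

// Fold a batch of operands into the current file size, as a merge
// operator would when RocksDB eventually runs the Merge operation.
uint64_t merge_size(uint64_t current_size,
                    const std::vector<SizeOperand>& ops) {
    for (const auto& op : ops) {
        if (op.append)
            // Append: grow the file by count bytes past the current EOF.
            current_size += op.count;
        else
            // Regular write: the size only grows if the write extends
            // past the current EOF; overlapping writes leave it unchanged.
            current_size = std::max(current_size, op.offset + op.count);
    }
    return current_size;
}
```

For example, a regular write fully inside the file leaves the size untouched, while two batched appends each extend it by their count.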

Append operations are treated differently because, during a write operation, it is not clear where a process calling `write()` should start writing. The EOF information loaded during open may be outdated when multiple processes append at the same time, causing a race condition. Since the size update on the daemon is atomic, a process (which updates the size before performing a write) can be reserved a corresponding byte interval `[EOF, EOF + count]`. However, calling `Merge()` on RocksDB does not immediately trigger a Merge operation, since multiple Merges are batched before the operation runs. For append, the Merge operation is therefore forced by running `Get()` on RocksDB. The corresponding Merge operation then returns the starting write offset to the process updating the size. As a result, appends are more expensive than non-appends.
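The reservation idea can be illustrated with a minimal sketch, assuming the daemon's atomic size update behaves like a fetch-and-add on the EOF (the names `FileSize` and `reserve_append` are illustrative, not GekkoFS APIs): each appender atomically claims a disjoint interval, so concurrent appends never overlap.

```cpp
#include <atomic>
#include <cstdint>

// Illustrative stand-in for the daemon-side size metadata.
struct FileSize {
    std::atomic<uint64_t> eof{0};

    // Atomically reserve `count` bytes at the current EOF and return the
    // starting write offset. fetch_add returns the previous EOF, so two
    // concurrent appenders receive disjoint intervals
    // [old_eof, old_eof + count).
    uint64_t reserve_append(uint64_t count) {
        return eof.fetch_add(count);
    }
};
```

Sequentially, the first reservation of 10 bytes starts at offset 0 and the next reservation of 5 bytes starts at offset 10; under concurrency the atomicity of `fetch_add` gives the same disjointness guarantee.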

Lastly, some missing documentation was added.

As reported, this MR adds support for the DASI application, used in IO-SEA.

Note: This MR does not consider failing writes, which would require collapsing a reserved interval and filling the resulting hole in the file.

Closes #254
Closes #12


See merge request !164
parents a2b88702 6c264285
Pipeline #3802 passed with stages in 38 minutes and 48 seconds