Commit 7778f598 authored by Alberto Miranda

Merge branch '75-daemon-chunk-storage-backend-crashes-the-server-on-error' into 'master'

Resolve "[daemon] chunk storage backend crashes the server on error"

Closes #75

The `ChunkStorage` backend class on the daemon was throwing `system_errors` that were never caught, which crashed the server. `ChunkStorage` now uses a designated error class for the errors that can occur. In addition, the dependency on Argobots, which was only used to trigger `ABT_eventuals`, was removed from the class, laying the groundwork for future non-Argobots IO implementations. Further, the whole class was refactored for consistency and failure resistance.
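As an illustration of the pattern described above, here is a minimal sketch of a designated error class that is caught at the handler boundary and turned into an error code. The names `ChunkStorageError`, `remove_chunk_dir` and `handle_remove` are hypothetical and chosen for this example only; they are not taken from the actual change.

```cpp
#include <cerrno>
#include <string>
#include <system_error>

// Hypothetical stand-in for a designated storage error type.
class ChunkStorageError : public std::system_error {
public:
    ChunkStorageError(int err_code, const std::string& what)
        : std::system_error(err_code, std::generic_category(), what) {}
};

// Hypothetical backend call that reports failure via the designated error.
void remove_chunk_dir(const std::string& path) {
    if (path.empty())
        throw ChunkStorageError(EINVAL, "empty chunk directory path");
    // ... real filesystem work would go here ...
}

// Hypothetical handler-side wrapper: the exception never escapes, so a
// storage failure becomes an error code instead of terminating the server.
int handle_remove(const std::string& path) noexcept {
    try {
        remove_chunk_dir(path);
        return 0;
    } catch (const ChunkStorageError& e) {
        return e.code().value();
    } catch (...) {
        return EIO; // anything unexpected is reported as a generic IO error
    }
}
```

Deriving from `std::system_error` keeps the errno value available through `code()`, so the caller can forward it in a response unchanged.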

A new class, `ChunkOperation`, wraps Argobots' IO task operations. This allows IO queue specific code to be removed from the RPC handlers, i.e., the read and write handlers. The idea is to separate eventuals, tasks and their arguments from handler logic into a designated class. A handler therefore instantiates an object of a class derived from `ChunkOperation`, and that object drives all IO tasks. The corresponding code was added to the read and write RPC handlers. Note that `ChunkOperation` is not thread-safe and is supposed to be called by a single thread.
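The separation described above could look roughly like the following sketch. It is not the actual class: the tasks here are plain callables that run when the caller waits, standing in for Argobots tasks and eventuals, and the names `ChunkWriteOperation`, `write_nonblock`, `wait_for_tasks` and the `fail_with` parameter are assumptions made for the example.

```cpp
#include <cerrno>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch of a base class that owns tasks and their results.
// Like the class described above, it is meant for a single calling thread.
class ChunkOperation {
public:
    virtual ~ChunkOperation() = default;

protected:
    struct TaskResult {
        int err;          // 0 on success, errno-style code otherwise
        std::size_t size; // bytes processed by this task
    };
    std::vector<std::function<TaskResult()>> tasks_;
};

class ChunkWriteOperation : public ChunkOperation {
public:
    // Queue one chunk write; `fail_with` simulates a backend error.
    void write_nonblock(std::size_t size, int fail_with = 0) {
        tasks_.push_back([size, fail_with]() -> TaskResult {
            if (fail_with != 0)
                return {fail_with, 0};
            return {0, size};
        });
    }

    // Run all queued tasks; report the first error and the bytes written.
    std::pair<int, std::size_t> wait_for_tasks() {
        int err = 0;
        std::size_t total = 0;
        for (auto& task : tasks_) {
            TaskResult r = task();
            if (r.err != 0 && err == 0)
                err = r.err;
            total += r.size;
        }
        tasks_.clear();
        return {err, total};
    }
};
```

With this shape, a write handler only creates the operation, queues one task per chunk, and waits once for an error code and a byte count.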

In addition, truncate was reworked for error handling (it crashed the server on error). It now also goes through the IO queue, since truncate causes a write operation and should not overtake IO tasks that are already queued.
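The ordering argument can be shown with a small sketch, assuming a single FIFO queue. `IoQueue` and `write_then_truncate` are hypothetical names for this example and a plain `std::string` stands in for a chunk file; the daemon's real IO queue is not modelled here.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical single FIFO queue; tasks run strictly in submission order.
class IoQueue {
public:
    void push(std::function<void()> task) { tasks_.push_back(std::move(task)); }

    void drain() {
        while (!tasks_.empty()) {
            auto task = std::move(tasks_.front());
            tasks_.pop_front();
            task();
        }
    }

private:
    std::deque<std::function<void()>> tasks_;
};

// Queue a write followed by a truncate and return the resulting contents.
std::string write_then_truncate(const std::string& data, std::size_t new_size) {
    std::string file; // stands in for the chunk file
    IoQueue q;
    q.push([&file, data] { file = data; });
    q.push([&file, new_size] {
        if (new_size < file.size())
            file.resize(new_size);
    });
    q.drain();
    return file;
}
```

Because both operations pass through the same queue, the truncate is applied only after the earlier write has completed.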

The chunk stat RPC handler was also refactored for error handling and now uses error codes as well.

Further minor changes:
- dead chunk stat code has been removed
- some namespaces were missing: `gkfs::rpc`
- more flexible handler cleanup and response code
- fixed a bug where the chunk dir wasn't removed when the metadata didn't exist on the same node

Misc:
There was some discussion about putting the removal of the chunk directory into the IO queue as well, with the same argument as for truncate, but I refrained from doing so as it would likely notably degrade remove performance. I think we can put this under *eventual consistency* and call it a day for now. Truncate was another story, as glibc makes heavy use of truncate in various operations.

See merge request !32
parents 2a236e33 e1054ae7
Pipeline #1199 passed with stages
in 10 minutes and 54 seconds