Data replication (client side, synchronous)

Merged Ramon Nou requested to merge rnou/replication into master

This MR adds support for data replication using one environment variable:

LIBGKFS_NUM_REPL=<num repl>

The number of replicas must be between 0 and the number of servers minus 1. Replication is driven by the client, so it reduces write performance, but we maintain the same level of consistency. On the other hand, it may improve read performance in some corner-case scenarios. Metadata replication is also implemented. The replication environment variable can be set independently for each client.
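As an illustration of the valid range, here is a minimal sketch of how a client could read LIBGKFS_NUM_REPL and keep it within 0 to servers-1. The helper names parse_num_repl and num_repl_from_env are hypothetical and are not taken from this MR. Clamping out-of-range values and treating unparsable values as 0 are assumptions; the MR does not say how such values are handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

// Hypothetical helper (not part of the GekkoFS API): parse a replica count
// and clamp it to the range [0, num_servers - 1] described in the MR.
int parse_num_repl(const char* value, int num_servers) {
    assert(num_servers >= 1);
    if (value == nullptr) {
        return 0; // variable unset: assume no replication
    }
    char* end = nullptr;
    const long parsed = std::strtol(value, &end, 10);
    if (end == value || *end != '\0') {
        return 0; // not a plain integer: assume no replication
    }
    const long max_repl = static_cast<long>(num_servers) - 1;
    return static_cast<int>(std::clamp(parsed, 0L, max_repl));
}

// Hypothetical wrapper: read the per-client setting from the environment.
int num_repl_from_env(int num_servers) {
    return parse_num_repl(std::getenv("LIBGKFS_NUM_REPL"), num_servers);
}
```

Because the value is read from the environment of each process, two clients on the same deployment can run with different replica counts, as the description notes.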

If a server is down, the data will be read from another replica. The metadata management is also done from another replica.
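The fallback described above can be pictured with the following sketch. It is not the code from this MR: read_with_failover and the read_copy callback are hypothetical names, and trying the copies in index order (0 for the original, then each replica) is an assumption made only for illustration.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Hypothetical callback: read_copy(i) returns the data held by copy i
// (0 = original, 1..num_repl = replicas), or std::nullopt if the server
// holding that copy is unreachable.
using ReadCopyFn = std::function<std::optional<std::string>(int)>;

// Try each copy in turn and return the first one that answers.
std::optional<std::string> read_with_failover(int num_repl,
                                              const ReadCopyFn& read_copy) {
    assert(num_repl >= 0);
    for (int copy = 0; copy <= num_repl; ++copy) {
        if (auto data = read_copy(copy)) {
            return data; // first reachable copy wins
        }
    }
    return std::nullopt; // every copy was unavailable
}
```

With num_repl set to 0 there is only the original copy, so a failed server means the read fails, which matches the behaviour without replication.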

Replication is done synchronously. A new function, forward_write, is used to send the data to the different replicas. Reads are distributed across the replicas, but this should not produce a performance improvement, as the distribution is similar to the original.

For writes, the original is sent to the target servers first, and then the replicas are processed. This avoids issues when a server that should host a replica is not available.
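The ordering can be sketched as follows. This is an illustration under stated assumptions, not the forward_write implementation: write_with_replicas, WriteResult and the send_copy callback are hypothetical names, and the choice to count failed replicas rather than fail the whole write is an assumption about the intent of the paragraph above.

```cpp
#include <cassert>
#include <functional>

// Outcome of one replicated write (hypothetical type, for illustration).
struct WriteResult {
    bool original_ok; // did the original copy reach its target servers?
    int replicas_ok;  // how many replicas were written successfully
};

// Hypothetical callback: send_copy(i) synchronously sends copy i
// (0 = original, 1..num_repl = replicas) and returns true on success.
WriteResult write_with_replicas(int num_repl,
                                const std::function<bool(int)>& send_copy) {
    assert(num_repl >= 0);
    WriteResult result{false, 0};
    // Step 1: the original always goes first.
    result.original_ok = send_copy(0);
    if (!result.original_ok) {
        return result; // assumption: no replicas without an original
    }
    // Step 2: replicas follow; an unavailable replica host is tolerated.
    for (int copy = 1; copy <= num_repl; ++copy) {
        if (send_copy(copy)) {
            ++result.replicas_ok;
        }
    }
    return result;
}
```

Since every send is synchronous, the write cost grows with the number of replicas, which is the write-performance penalty mentioned in the description.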

To process the replicas, a new method is included to check whether a chunk needs to be processed by a given server: a bitset of 1024 bits is sent, encoded in base64 as a string. This covers up to 1024 chunks per write or read operation. If that is exceeded, the normal per-chunk hash check is done in the server; exceeding this value disables the replica capabilities and produces undefined behaviour.

This limit could potentially be increased.
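To make the bitset idea concrete, here is a minimal sketch of packing a 1024-bit chunk mask into a base64 string. The names kMaxChunks, mark_chunk and encode_chunk_bitset are hypothetical, and the bit order within each byte (least significant bit first) and the use of the standard base64 alphabet with padding are assumptions; the MR does not specify the wire format.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

constexpr std::size_t kMaxChunks = 1024; // limit stated in the MR

// Mark chunk `idx` of the current operation as belonging to the target server.
void mark_chunk(std::bitset<kMaxChunks>& bits, std::size_t idx) {
    assert(idx < kMaxChunks); // beyond this the bitset cannot describe the chunk
    bits.set(idx);
}

// Pack the bitset into bytes (LSB-first, an assumption) and base64-encode it.
std::string encode_chunk_bitset(const std::bitset<kMaxChunks>& bits) {
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::vector<std::uint8_t> bytes(kMaxChunks / 8, 0);
    for (std::size_t i = 0; i < kMaxChunks; ++i) {
        if (bits[i]) {
            bytes[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
        }
    }
    std::string out;
    out.reserve(((bytes.size() + 2) / 3) * 4);
    for (std::size_t i = 0; i < bytes.size(); i += 3) {
        const bool has1 = i + 1 < bytes.size();
        const bool has2 = i + 2 < bytes.size();
        const std::uint32_t triple =
            (static_cast<std::uint32_t>(bytes[i]) << 16) |
            (static_cast<std::uint32_t>(has1 ? bytes[i + 1] : 0) << 8) |
            static_cast<std::uint32_t>(has2 ? bytes[i + 2] : 0);
        out += table[(triple >> 18) & 0x3F];
        out += table[(triple >> 12) & 0x3F];
        out += has1 ? table[(triple >> 6) & 0x3F] : '=';
        out += has2 ? table[triple & 0x3F] : '=';
    }
    return out;
}
```

With this encoding, 1024 bits become 128 bytes and a 172-character string, so the mask adds a fixed, small overhead to each request regardless of how many chunks are marked.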

Finally, most operations are replica-aware, but some are still missing, e.g., dirent.

Edited by Ramon Nou
