Shared file metadata congestion
When a large number of processes write to a single shared file, the file size is updated after each write. All of these updates go to the metadata node for this file, which causes congestion there and heavily limits throughput. This is discussed in both the CLUSTER and JCST papers.
Instead of sending the size on every write, it could be sent in batches to avoid congestion. It is not yet clear what the best approach is.
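
One possible shape of the batching idea is sketched below. It is only an illustration under stated assumptions, not a settled design: the class name `SizeUpdateBatcher`, the `send_size` callback (a stand-in for whatever RPC carries the size to the metadata node) and the `flush_every` threshold are all hypothetical. The sketch also assumes the metadata node keeps the maximum of its stored size and any size it receives, so that late or reordered batches from different clients cannot shrink the file.

```python
class SizeUpdateBatcher:
    """Client-side coalescing of file-size updates (hypothetical sketch)."""

    def __init__(self, send_size, flush_every=64):
        self._send_size = send_size      # stand-in for the metadata RPC (assumption)
        self._flush_every = flush_every  # writes per batch (assumed tuning knob)
        self._pending_size = 0           # largest end offset seen since last flush
        self._pending_writes = 0         # writes recorded since last flush

    def record_write(self, offset, length):
        # Only the largest end offset matters for the file size,
        # so many writes collapse into a single value.
        self._pending_size = max(self._pending_size, offset + length)
        self._pending_writes += 1
        if self._pending_writes >= self._flush_every:
            self.flush()

    def flush(self):
        # Send one size update for the whole batch; do nothing if idle.
        if self._pending_writes == 0:
            return
        self._send_size(self._pending_size)
        self._pending_size = 0
        self._pending_writes = 0


if __name__ == "__main__":
    sent = []
    batcher = SizeUpdateBatcher(sent.append, flush_every=3)
    for offset in (0, 4096, 8192, 12288):
        batcher.record_write(offset, 4096)
    batcher.flush()  # e.g. on close, so the final size is not lost
    print(sent)
```

With a scheme like this the metadata node sees one message per batch instead of one per write. The cost is that the size visible to other processes can lag behind until the next flush, so a real implementation would probably also need to flush on close or sync, and possibly on a timer. Which trigger works best is exactly the open question noted above.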