Small files optimization
Small files, or a large number of files, have some performance drawbacks:
- Every worker issues a "file size" operation on each file. Reducing these calls is important on ad hoc file systems.
- Every worker opens the file, closes it and sends a message for each file, even when it processes nothing from it because the block size is larger than the file size. This biases the processing toward the first worker.
We solve the first by getting the file sizes in the master.
We solve the second by filtering, using "single" when only one worker is needed, and
distributing the small files nearly round-robin over all the workers.
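
The idea can be sketched as follows. This is only an illustration under stated assumptions, not the actual implementation: the function name plan_files, its parameters and the returned structures are hypothetical, a file counts as "small" when the block size is larger than its size, and the assignment shown is a plain round-robin rather than the "nearly" round-robin mentioned above.

```python
import os
from typing import Callable, Dict, List, Sequence, Tuple

FileInfo = Tuple[str, int]  # (path, size in bytes)


def plan_files(
    paths: Sequence[str],
    num_workers: int,
    block_size: int,
    get_size: Callable[[str], int] = os.path.getsize,
) -> Tuple[Dict[int, List[FileInfo]], List[FileInfo]]:
    """Hypothetical master-side planning step.

    The master queries each file size once, so the workers never have to
    issue their own "file size" operation. Small files are handed to a
    single worker each, in round-robin order; larger files are returned
    separately so that all the workers can share them.
    """
    small: Dict[int, List[FileInfo]] = {w: [] for w in range(num_workers)}
    large: List[FileInfo] = []
    next_worker = 0
    for path in paths:
        size = get_size(path)  # one size query per file, done only here
        if block_size > size:
            # Assumption: one worker is enough for this file, so only
            # that worker will open, close and report on it.
            small[next_worker].append((path, size))
            next_worker = (next_worker + 1) % num_workers
        else:
            large.append((path, size))
    return small, large
```

With this split, a worker only touches the small files assigned to it, so the first worker no longer ends up with all of them. The get_size parameter is there only so the sketch can be tried without touching a real file system.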