pycompss gkfs integration: python file read is not complete
In the pycompss sort example the file is read with:
dataset = []
for line in f:
    print("line", len(line))
    dataset.append(asarray([int(x) for x in line.strip().split(" ")]))
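For context, this line-by-line iteration goes through io.BufferedReader (below the text layer that a plain open() adds), which refills its 8 KiB buffer with fixed-size raw reads; that matches the 8192-byte requests in the trace further down. A minimal sketch with a fake in-memory raw file (not GekkoFS) that records the request sizes:

```python
import io

class LoggingRaw(io.RawIOBase):
    """Fake raw file that records the size of every raw read() request."""
    def __init__(self, data):
        self._data, self._pos, self.requests = data, 0, []
    def readable(self):
        return True
    def readinto(self, b):
        self.requests.append(len(b))          # size the buffered layer asked for
        chunk = self._data[self._pos:self._pos + len(b)]
        b[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)                     # 0 once exhausted -> EOF

raw = LoggingRaw(b"1 2 3\n" * 5000)           # ~30 kB of fake dataset lines
buf = io.BufferedReader(raw)                  # layer that open(..., 'rb') adds
nlines = sum(1 for _ in buf)                  # same pattern as `for line in f`
print(nlines, raw.requests[0])                # requests come in 8192-byte units
```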
This triggers an fstat64 syscall, which returns the size of the file, in my case 659995 B.
Then it starts to read the file, but stops after two chunks:
<forward_read():454> host: 0, path: /dataset.txt, chunk_start: 0, chunk_end: 0, chunks: 1, size: 8192, offset: 0
<forward_read():454> host: 0, path: /dataset.txt, chunk_start: 0, chunk_end: 0, chunks: 1, size: 8192, offset: 8192
With this original implementation of sort.py, the program fails because it gets stuck in a loop, repeating:
<log_arguments():117> [BYPASS] Calling read with arguments: [i] 60 [Pv] 0x7a50dd5fc6b0 [m] 8192
<log_arguments():117> [BYPASS] Calling read with arguments: [i] 77 [Pv] 0x7a50dcffc6b0 [m] 8192
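For reference, the reader only stops issuing these 8192-byte requests once an underlying read() returns 0 bytes: that zero return is the sole EOF signal in the POSIX read contract. A minimal sketch of that contract against a plain local temp file (hypothetical file, not the GekkoFS mount); if the file system never answers a request with 0, a loop like this never terminates:

```python
import os, tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 10000)
os.lseek(fd, 0, os.SEEK_SET)

total = 0
while True:
    chunk = os.read(fd, 8192)   # same request size as in the trace above
    if not chunk:               # EOF: read() returned 0 bytes
        break
    total += len(chunk)

os.close(fd)
os.unlink(path)
print(total)
```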
If I read the whole file at once:
f = open(nums_file, 'rb') # Open in binary mode
data = f.read() # Read entire file at once
f.close()
It reads until:
pos 659781 ret 0 newpos 659781
So it still misses some bytes compared to the fstat64 file size: 659995 - 659781 = 214 B. Why?
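One way to narrow this down is to compare st_size from fstat with the number of bytes read() actually delivers; on a POSIX-compliant local file system the two must match, so running the same check against the GekkoFS mount should expose the 214 B gap. A sketch with a hypothetical local temp file, not the actual dataset:

```python
import os, tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"0 1 2\n" * 1000)   # stand-in for the dataset file
os.close(fd)

with open(path, "rb") as f:
    reported = os.fstat(f.fileno()).st_size   # what fstat64 claims
    data = f.read()                           # what read() delivers

os.unlink(path)
print(reported, len(data))   # equal on a correct file system
```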
Python's file implementation is here: read, read_all
The execution works, though.