
PyCOMPSs GekkoFS integration: Python file read is incomplete

In the PyCOMPSs sort example the file is read line by line:

    from numpy import asarray

    dataset = []
    for line in f:  # f is the dataset file, opened earlier in sort.py
        print("line", len(line))
        dataset.append(asarray([int(x) for x in line.strip().split(" ")]))
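For context on why the reads below come in 8192-byte pieces: iterating a file goes through Python's buffered I/O layer, which fetches the underlying file in `io.DEFAULT_BUFFER_SIZE` chunks (8192 bytes on typical CPython builds). A minimal sketch of my own (not from sort.py), using a fake raw file to count the reads the buffered layer issues:

```python
import io

class CountingRaw(io.RawIOBase):
    """Fake raw file: serves `size` bytes of newline-terminated data
    and records the length of every read the buffered layer issues."""
    def __init__(self, size):
        self.data = (b"1 2 3\n" * (size // 6 + 1))[:size]
        self.pos = 0
        self.calls = []

    def readable(self):
        return True

    def readinto(self, b):
        chunk = self.data[self.pos:self.pos + len(b)]
        b[:len(chunk)] = chunk
        self.pos += len(chunk)
        self.calls.append(len(chunk))  # 0 signals EOF to the caller
        return len(chunk)

raw = CountingRaw(20000)
buffered = io.BufferedReader(raw)  # buffer: io.DEFAULT_BUFFER_SIZE
for line in io.TextIOWrapper(buffered):
    pass

print(io.DEFAULT_BUFFER_SIZE)  # 8192 on typical CPython builds
print(raw.calls)               # chunked reads, none larger than the buffer
```

This matches the 8192-byte reads visible in the forward_read trace below.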

This triggers an fstat64 syscall, which returns the size of the file, in my case 659995 B.

Then it starts reading the file, but it stops after two 8192-byte reads:
<forward_read():454> host: 0, path: /dataset.txt, chunk_start: 0, chunk_end: 0, chunks: 1, size: 8192, offset: 0
<forward_read():454> host: 0, path: /dataset.txt, chunk_start: 0, chunk_end: 0, chunks: 1, size: 8192, offset: 8192
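The `chunk_start`/`chunk_end` values in these lines are consistent with simple chunk arithmetic over the file offset. A sketch of that arithmetic, where `CHUNK_SIZE` is an assumption on my part (GekkoFS's default is 512 KiB, but the real value depends on the deployment configuration):

```python
# Assumed chunk size; GekkoFS defaults to 512 KiB per chunk.
CHUNK_SIZE = 512 * 1024

def chunk_range(offset, size):
    """Return (chunk_start, chunk_end, number of chunks) for a read."""
    chunk_start = offset // CHUNK_SIZE
    chunk_end = (offset + size - 1) // CHUNK_SIZE
    return chunk_start, chunk_end, chunk_end - chunk_start + 1

print(chunk_range(0, 8192))     # -> (0, 0, 1), matching the first trace line
print(chunk_range(8192, 8192))  # -> (0, 0, 1), matching the second
```

Both 8192-byte reads fall inside chunk 0, which is why the trace shows `chunk_start: 0, chunk_end: 0, chunks: 1` twice.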

With this original implementation of sort.py, the program fails because it gets stuck in a loop, repeating:
<log_arguments():117> [BYPASS] Calling read with arguments: [i] 60 [Pv] 0x7a50dd5fc6b0 [m] 8192
<log_arguments():117> [BYPASS] Calling read with arguments: [i] 77 [Pv] 0x7a50dcffc6b0 [m] 8192

If I read the whole file at once:

    f = open(nums_file, 'rb')  # Open in binary mode
    data = f.read()  # Read entire file at once
    f.close()
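To quantify the shortfall, the bytes actually returned by `read()` can be compared against the size reported by fstat (the same value the fstat64 syscall gives). A self-contained sketch using a temporary file; on a local filesystem the two values match, while under GekkoFS the 214 B gap would show up here:

```python
import os
import tempfile

# Create a stand-in dataset file so the example is runnable anywhere.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"1 2 3\n" * 1000)
    nums_file = tmp.name

with open(nums_file, "rb") as f:
    expected = os.fstat(f.fileno()).st_size  # what fstat64 reports
    data = f.read()                          # what read() delivers

os.unlink(nums_file)
print(len(data), expected)
if len(data) != expected:
    print("short read: missing", expected - len(data), "bytes")
```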

It reads until:

pos 659781 ret 0 newpos 659781

So it still misses some bytes compared to the fstat64 file size: 659995 - 659781 = 214 B. Why?

Python's file implementation is here: read, read_all
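A simplified sketch of the loop CPython's `FileIO.readall` performs (the real code lives in `Modules/_io/fileio.c` and also pre-sizes its buffer from fstat, but the termination condition is the same): it keeps issuing `read()` until the call returns 0 and treats that as EOF. So if the filesystem returns 0 at pos 659781, Python stops there and has no way to know that bytes are still missing relative to the fstat size.

```python
import os

def readall(fd, bufsize=8192):
    """Sketch of FileIO.readall's loop: read until ret == 0 (EOF)."""
    chunks = []
    while True:
        chunk = os.read(fd, bufsize)  # wraps the read() syscall
        if not chunk:                 # ret == 0 is treated as EOF
            break
        chunks.append(chunk)
    return b"".join(chunks)

# Demo on a temporary file.
import tempfile
with tempfile.NamedTemporaryFile(delete=False) as t:
    t.write(b"x" * 20000)
fd = os.open(t.name, os.O_RDONLY)
data = readall(fd)
os.close(fd)
os.unlink(t.name)
print(len(data))  # 20000
```

This would explain the `pos 659781 ret 0` line above: the interception layer reported EOF at that offset, and readall stopped regardless of what fstat64 had promised.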

The execution works though.

Edited by Julius Athenstaedt