Testing
This page describes the testing infrastructure of the GekkoFS file system. It is intended to provide a general idea of how the different infrastructure components are organized and how they interact, so that developers can use them to write new tests (and/or extend the framework) with as little prototyping overhead as possible.
Integration and functionality tests
GekkoFS provides an automated testing harness to simplify writing and running
integration and functional tests for the file system. For simplicity and ease of
development, tests are written in Python (3.6+ required). The harness itself
relies on the pytest framework to simplify testing the GekkoFS infrastructure.
Among others, pytest offers the following features:
Detailed info on failing assert statements;
Auto-discovery of test modules and functions;
Modular fixtures for managing small or parametrized long-lived test resources;
Rich plugin architecture, in case further functionality is required.
The code below shows an example of a simple test that creates a directory on the
file system’s root by invoking mkdir() and checks the returned value and the
errno error code.
import harness, errno, stat
from pathlib import Path

def test_mkdir(gkfs_daemon, gkfs_client):
    """Create a new directory in the FS's root"""

    topdir = gkfs_daemon.mountdir / "top"

    ret = gkfs_client.mkdir(
        topdir,
        stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO)

    assert ret.retval == 0
    assert ret.errno == 0
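The mode passed to mkdir() in the test above is a bitwise OR of permission constants from Python’s standard stat module. The following minimal sketch only illustrates how those constants combine; the variable name mode is chosen here for illustration and is not part of the harness.

```python
import stat

# Read/write/execute permission for user, group, and others,
# combined exactly as in test_mkdir above.
mode = stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO

print(oct(mode))  # octal form, as it would be written for chmod
print(mode)       # decimal form
```

The combined value is 0o777, i.e. 511 in decimal, which is the number that appears at the end of the gkfs.io command line in the harness log shown further down this page.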
Executing functional tests
In order for tests to be built, the GKFS_BUILD_TESTS option needs to be
enabled when configuring the package:
$ mkdir build && cd build
$ cmake -DGKFS_BUILD_TESTS=ON ..
$ make -j8
Once built, tests can be run in several ways:
# 1. run all tests registered with CTest
$ make test
# 2. build and run all tests registered with CTest
# (equivalent to ctest --force-new-ctest-process --verbose --output-on-failure)
$ make check
# 3. configure the run (see `ctest --help` for available options)
$ ctest [ options ]
It is also possible to invoke pytest directly by taking advantage of the
Python virtualenv that CMake creates to run the tests. This has the advantage
that the user can pass arguments directly to the pytest framework and that
pretty-printed output is sent directly to the console.
Note
py.test must be run from the build/tests/integration directory to work.
# option 1: run the 'mkdir' test directly from the 'build/tests' directory
# in verbose mode, without console capture and using the 'docker0' interface
$ pytest-venv/bin/py.test -s -v --interface=docker0 -k test_mkdir
# option 2: same example but activating the virtualenv
$ source pytest-venv/bin/activate
(pytest-venv) $ py.test -s -v --interface=docker0 -k test_mkdir
(pytest-venv) $ deactivate
In both cases, the test output should be similar to this:
===================================================================================== test session starts ======================================================================================
platform linux -- Python 3.7.3, pytest-5.3.5, py-1.8.1, pluggy-0.13.1 -- /opt/gekkofs/build/tests/pytest-venv/bin/python3.7
cachedir: .pytest_cache
rootdir: /opt/gekkofs/build/tests, inifile: pytest.ini, testpaths: /opt/gekkofs/tests
plugins: xdist-1.31.0, dependency-0.4.0, forked-1.1.3
collected 20 items / 19 deselected / 1 selected
test_directories.py::test_mkdir 2020-02-25 14:42:08.923 | INFO | harness.workspace:__init__:43 - workspace created at /tmp/pytest-of-testuser/pytest-190/test_mkdir0
2020-02-25 14:42:08.924 | INFO | harness.gkfs:run:122 - spawning daemon
2020-02-25 14:42:08.924 | INFO | harness.gkfs:run:123 - cmdline: /opt/gekkofs/build/src/daemon/gkfs_daemon --mountdir /tmp/pytest-of-testuser/pytest-190/test_mkdir0/mnt --rootdir /tmp/pytest-of-testuser/pytest-190/test_mkdir0/root -l docker0:12769
2020-02-25 14:42:08.924 | INFO | harness.gkfs:run:124 - env: {'LD_LIBRARY_PATH': ':/opt/gekkofs/build/tests:/opt/gekkofs/prefix/lib', 'GKFS_HOSTS_FILE': PosixPath('/tmp/pytest-of-testuser/pytest-190/test_mkdir0/gkfs_hosts.txt'), 'GKFS_DAEMON_LOG_PATH': PosixPath('/tmp/pytest-of-testuser/pytest-190/test_mkdir0/logs/gkfs_daemon.log'), 'GKFS_LOG_LEVEL': '100'}
2020-02-25 14:42:08.927 | INFO | harness.gkfs:run:134 - daemon process spawned (PID=14946)
2020-02-25 14:42:09.040 | INFO | harness.gkfs:run:251 - running client
2020-02-25 14:42:09.041 | INFO | harness.gkfs:run:252 - cmdline: /opt/gekkofs/build/tests/harness/gkfs.io /tmp/pytest-of-testuser/pytest-190/test_mkdir0/mnt/top 511
2020-02-25 14:42:09.042 | INFO | harness.gkfs:run:253 - env: {'LD_LIBRARY_PATH': ':/opt/gekkofs/build/tests:/opt/gekkofs/prefix/lib', 'LD_PRELOAD': PosixPath('libgkfs_intercept.so'), 'LIBGKFS_HOSTS_FILE': PosixPath('/tmp/pytest-of-testuser/pytest-190/test_mkdir0/gkfs_hosts.txt'), 'LIBGKFS_LOG': 'all', 'LIBGKFS_LOG_OUTPUT': PosixPath('/tmp/pytest-of-testuser/pytest-190/test_mkdir0/logs/gkfs_client.log')}
2020-02-25 14:42:09.390 | DEBUG | harness.gkfs:run:262 - command output: b'{\n "errnum": 0,\n "retval": 0\n}\n'
PASSED
2020-02-25 14:42:15.219 | INFO | harness.gkfs:shutdown:197 - terminating daemon
=============================================================================== 1 passed, 19 deselected in 1.54s ================================================================================
Warning
Be careful not to run make with the virtualenv activated. When that
happens, CMake considers the virtualenv’s Python interpreter as the
system-wide one and caches this information, thus failing to run tests when
the virtualenv is deactivated.
Integration with CMake
As shown in the examples above, the functional testing harness is integrated
with CMake’s built-in testing tool. The CMake software suite includes the CTest
tool, which can be used to automate the testing phase, or even the entire
process of configuring, building, testing and submitting results to a
dashboard. Thus, integrating the pytest testing harness with it allows
end-users to easily execute tests and to integrate them with expected CMake
workflows. Nevertheless, CTest requires all tests to be defined by hand in any
involved CMakeLists.txt, whereas pytest is capable of automatically finding
all Python test functions whose names follow certain patterns. For this reason,
tests are semantically organized into directories, and test groups (rather than
individual tests) are added to CTest as described in the next section.
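To give an idea of what name-based discovery means in practice, the sketch below scans a directory tree for files matching pytest’s default file name patterns (test_*.py and *_test.py). It is a simplified model written for this page, not pytest’s actual collection code, and the function name discover_test_files as well as the file names in the example are hypothetical.

```python
import fnmatch
import tempfile
from pathlib import Path

# pytest's default file name patterns (the `python_files` ini option).
DEFAULT_FILE_PATTERNS = ("test_*.py", "*_test.py")


def discover_test_files(root, patterns=DEFAULT_FILE_PATTERNS):
    """Return the Python files under `root` whose names match any pattern."""
    return sorted(
        path
        for path in Path(root).rglob("*.py")
        if any(fnmatch.fnmatch(path.name, pat) for pat in patterns)
    )


# Build a throwaway tree with hypothetical file names and scan it.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("io/test_write.py", "io/helpers.py", "directories/test_mkdir.py"):
        target = Path(tmp) / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    found = [p.relative_to(tmp).as_posix() for p in discover_test_files(tmp)]

print(found)
```

Only the two test_*.py files are reported; helpers.py is skipped because its name matches neither pattern. This is why a single CTest entry per directory is enough: pytest finds the individual tests inside it.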
Adding new tests to CMake
The GekkoFS testing framework provides the gkfs_add_python_test CMake function
to simplify creating test groups. Thus, if a ${GKFS_ROOTDIR}/tests/io
subdirectory exists that should contain all the tests that exercise and verify
that I/O works as expected, the test group could be added to CMake by adding the
following code to ${GKFS_ROOTDIR}/tests/CMakeLists.txt:
gkfs_add_python_test(
    NAME test_io
    PYTHON_VERSION 3.6
    WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}
    SOURCE tests/io/
)
The function creates a new CTest test called test_io that will internally
call pytest and instruct it to auto-discover tests starting on the file or
directory defined by SOURCE. It is also possible to define the
WORKING_DIRECTORY required for the test, though for now only a value
of ${PROJECT_SOURCE_DIR} is supported.
Testing harness organization
The testing harness resides in the ${GKFS_ROOTDIR}/tests subdirectory and
consists of the following files, directories, and helper programs:
pytest.ini.in: pytest determines a rootdir for each test run, which depends on the command line arguments (specified test files, paths) and on the existence of ini-files. Ini-files allow pre-configuring settings for a test run without having to rely on command line arguments (see pytest’s documentation on rootdir and configuration files). Unfortunately, pytest’s rootdir finding algorithm always selects the first directory that contains a pytest.ini file as the rootdir. Since we want to be able to run the tests by invoking make test from ${CMAKE_BINARY_DIR}, but we don’t want to copy the test source files to the binary directory, we use this template to generate a pytest.ini file that instructs pytest to run the tests in ${CMAKE_CURRENT_SOURCE_DIR}, i.e. our main tests subdirectory.
conftest.py: This is the main pytest configuration file, which should contain the definition of all fixtures that are shared by multiple tests (refer to pytest’s documentation for more information). Its purpose is to add extra options to pytest’s CLI, set up logging, and define the fixtures for the Workspace, Daemon, and Client classes (see below).
conftest.py.in: This template file allows adding CLI arguments to pytest regardless of whether it is run from ${CMAKE_CURRENT_SOURCE_DIR} or ${CMAKE_BINARY_DIR}. As mentioned, the pytest framework auto-discovers tests, ini-files, and conftest.py files and uses this information to determine the rootdir of the tests. Unfortunately, when the first ini-file is found, pytest considers its parent directory as the rootdir and expects to find the main conftest.py in the same directory. Thus, in order to successfully add extra arguments to pytest’s CLI while keeping the test sources in ${CMAKE_CURRENT_SOURCE_DIR}, we must have a valid conftest.py file in ${CMAKE_BINARY_DIR}.
cli.py: This module exports the function add_cli_options() that adds the --interface, --bin-dir, and --lib-dir CLI arguments to pytest. This file is the only Python source file that is copied to ${CMAKE_BINARY_DIR} so that it can be called from ${CMAKE_BINARY_DIR}/conftest.py.
gkfs.io/: This directory contains the sources for the gkfs.io helper program. This program acts as a proxy for the Client class described below to execute I/O-related system calls and library functions from an LD_PRELOAD context. Results from the function execution are returned in JSON format so that they can be easily parsed by the Client class (see the io module below).
gkfs.py: This module exports the Daemon, Client, and ShellClient classes, which allow tests to interface easily with the GekkoFS daemon, client library, and shell, respectively (see below for more details).
io.py: This module exports an IOParser class that is used internally by the Client class to deserialize any JSON output produced by the gkfs.io helper program. JSON deserialization relies on marshmallow fields and schemas to convert JSON strings to native Python datatypes.
cmd.py: This module exports a CommandParser class that is used internally by the ShellClient class to deserialize output strings generated by shell commands into native Python datatypes.
logger.py: This module exports the harness.logger alias, which hides the implementation details of the actual logging framework used. Any test that needs to produce log messages only needs to import harness.logger and call the appropriate logger.LEVEL(msg) function. All standard Python logging levels are supported.
workspace.py: This module exports the Workspace class, which allows tests to set up and interact with their workspace.
Useful fixtures available to tests
The project’s testing harness relies on pytest’s internal fixture mechanism to
set up and tear down resources for the test (e.g. daemon and client processes).
By leveraging pytest’s mechanism for automatic fixture injection, any test can
activate the harness’ automatic resource management by simply declaring the use
of a harness fixture from test functions, modules, classes or whole projects.
In order to simplify writing tests and to ensure that they can run in parallel,
the harness currently provides the following fixtures, which are defined
in ${GKFS_ROOTDIR}/tests/conftest.py:
workspace
The workspace fixture returns an instance of the Workspace class, which
implements a self-contained subdirectory where a test can run and generate
artifacts without interfering with other tests. Given a test working
directory twd, a workspace for a test is initialized by creating the
following directories under twd:
twd # base directory for test (typically under the system's temporary directory)
├── logs # directory for logs
│ ├── gkfs_client.log
│ └── gkfs_daemon.log
├── mnt # directory for GekkoFS' virtual mount point
├── root # directory for GekkoFS' internal data
│ └── 14935
│ ├── data
│ │ └── chunks
│ └── rocksdb
│ ├── 000003.log
│ ├── 000006.sst
│ ├── CURRENT
│ ├── IDENTITY
│ ├── LOCK
│ ├── LOG
│ ├── MANIFEST-000007
│ └── OPTIONS-000005
└── tmp # temporary directory for test
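The layout above can be reproduced with a few lines of standard-library Python. The sketch below is only an illustration of the idea, not the Workspace class itself: the function name create_workspace is hypothetical, and it creates just the four top-level directories (the files shown inside logs and root are presumably produced at run time by the daemon and the client).

```python
import tempfile
from pathlib import Path

# Top-level subdirectories shown in the tree above.
WORKSPACE_SUBDIRS = ("logs", "mnt", "root", "tmp")


def create_workspace(twd):
    """Create the workspace subdirectories under `twd` and return their paths."""
    twd = Path(twd)
    layout = {}
    for name in WORKSPACE_SUBDIRS:
        path = twd / name
        path.mkdir(parents=True, exist_ok=True)
        layout[name] = path
    return layout


# Use a temporary directory as the test working directory.
with tempfile.TemporaryDirectory() as twd:
    layout = create_workspace(twd)
    created = sorted(p.name for p in Path(twd).iterdir() if p.is_dir())

print(created)
```

Because every test gets its own working directory, two tests never share a mount point, root directory, or log file, which is what makes running them in parallel safe.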
For convenience, the Workspace class accepts the bindirs and libdirs
arguments, which respectively influence the PATH and LD_LIBRARY_PATH
environment variables that will be used when executing internal shell commands
under it.
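A minimal sketch of how such arguments could be folded into an environment is shown below. The helper names extend_search_path and patched_env are hypothetical, and appending (rather than prepending) the extra directories is an assumption; for LD_LIBRARY_PATH it is at least consistent with the harness log shown earlier, where an empty original value is followed by the additional library directories.

```python
def extend_search_path(current, extra_dirs):
    """Append `extra_dirs` to a colon-separated search path string."""
    return ":".join([current, *(str(d) for d in extra_dirs)])


def patched_env(base_env, bindirs=(), libdirs=()):
    """Return a copy of `base_env` with PATH and LD_LIBRARY_PATH extended."""
    env = dict(base_env)
    env["PATH"] = extend_search_path(env.get("PATH", ""), bindirs)
    env["LD_LIBRARY_PATH"] = extend_search_path(
        env.get("LD_LIBRARY_PATH", ""), libdirs)
    return env


env = patched_env(
    {"PATH": "/usr/bin"},  # LD_LIBRARY_PATH deliberately unset
    bindirs=["/opt/gekkofs/build/src/daemon"],
    libdirs=["/opt/gekkofs/build/tests", "/opt/gekkofs/prefix/lib"],
)

print(env["PATH"])
print(env["LD_LIBRARY_PATH"])
```

With LD_LIBRARY_PATH initially unset, the result starts with a colon, just like the value printed in the daemon’s env line of the log.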
IMPORTANT
The workspace fixture is a direct dependency of the gkfs_daemon and
gkfs_client test fixtures described below, which means that it’s
transitively added to any tests depending on these fixtures. There is seldom any
need to directly declare it or use it when writing a test.
file_factory
The file_factory fixture returns an instance of the
harness.workspace.FileCreator class that allows test developers to create
custom files with random binary contents in the test workspace. To do so, the
FileCreator class provides a create(pathname, size, unit) function to generate
the desired file, which in turn returns a workspace.File object that represents
the newly created file. This File object offers convenience methods and
properties to interact with the created file in the context of a test
(e.g. md5sum()):
file01 = 'file01'  # file name used by the examples on this page (assumed)

def test_foobar(gkfs_daemon, gkfs_shell, file_factory):
    # create a 4MB file and compute its md5sum
    lf01 = file_factory.create(file01, size=4.0, unit='MB')
    digest = lf01.md5sum()  # digest => e.g. '7f45c62700402ce5f9abe5b8d70d2844'
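For readers who want to see what creating such a file involves, here is a standard-library sketch. It is not the FileCreator implementation: the function names and the UNITS table are assumptions made for this example. Treating 'MB' as 10**6 bytes matches the 4000000-byte size that stat reports for a 4 MB file later on this page; the other units are guesses.

```python
import hashlib
import os
import tempfile
from pathlib import Path

# Decimal units. 'MB' == 10**6 bytes is consistent with the stat output shown
# later on this page; the remaining entries are assumptions.
UNITS = {"B": 1, "kB": 10**3, "MB": 10**6}


def create_random_file(pathname, size, unit="B"):
    """Write `size` units of random bytes to `pathname` and return its path."""
    path = Path(pathname)
    path.write_bytes(os.urandom(int(size * UNITS[unit])))
    return path


def md5sum(path, chunk_size=64 * 1024):
    """Return the hexadecimal MD5 digest of the file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    lf01 = create_random_file(Path(tmp) / "file01", size=4.0, unit="MB")
    nbytes = lf01.stat().st_size
    digest = md5sum(lf01)

print(nbytes, digest)
```

Since the contents are random, the digest differs on every run; a test would typically compare the digest of the source file with the digest of the copy read back from GekkoFS.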
gkfs_daemon
The gkfs_daemon fixture returns an instance of the harness.gkfs.Daemon class,
which represents a daemon running locally in an isolated test workspace. The
Daemon class provides methods to control the daemon process, which are
automatically invoked by pytest when a test starts/finishes to start up/shut
down the daemon required by the test.
The class relies on Python’s sh module to configure, spawn and monitor a GekkoFS
daemon process. The process is executed independently of the test process, and
the class sends a SIGTERM signal when the test finishes to properly shut down
the daemon.
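The spawn/terminate life cycle described above can be sketched as follows. Note that this example swaps in the standard subprocess module instead of sh so that it is self-contained, and it uses a sleeping Python child as a stand-in for the daemon. The class name ManagedProcess and the "ready" handshake are assumptions made for this illustration.

```python
import signal
import subprocess
import sys


class ManagedProcess:
    """Start a child process and terminate it with SIGTERM on shutdown."""

    def __init__(self, argv):
        self.argv = list(argv)
        self.proc = None

    def run(self):
        self.proc = subprocess.Popen(self.argv, stdout=subprocess.PIPE)
        # Block until the child reports that it is up and running.
        self.proc.stdout.readline()
        return self

    def shutdown(self, timeout=10):
        self.proc.send_signal(signal.SIGTERM)
        returncode = self.proc.wait(timeout=timeout)
        self.proc.stdout.close()
        return returncode


# Stand-in for the daemon: announce readiness, then sleep.
child_code = "import time; print('ready', flush=True); time.sleep(60)"
daemon = ManagedProcess([sys.executable, "-c", child_code]).run()
returncode = daemon.shutdown()

print(returncode)
```

On POSIX systems, a child terminated by a signal reports the negated signal number as its return code. In a pytest fixture, run() would typically be called before the yield statement and shutdown() after it, so that the teardown happens even if the test fails.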
Currently, the daemon is executed with the following configuration:
LD_LIBRARY_PATH: The LD_LIBRARY_PATH variable is set to the contents of $LD_LIBRARY_PATH when the test is invoked plus any additional paths defined in libdirs.
GKFS_HOSTS_FILE: The GKFS_HOSTS_FILE variable is set to <twd>/gkfs_hosts.txt.
GKFS_DAEMON_LOG_PATH: The GKFS_DAEMON_LOG_PATH variable is set to <twd>/logs/gkfs_daemon.log.
GKFS_LOG_LEVEL: The GKFS_LOG_LEVEL variable is set to 100.
--mountdir,-m: The --mountdir CLI argument is set to <twd>/mnt.
--rootdir,-r: The --rootdir CLI argument is set to <twd>/root.
--listen,-l: The --listen CLI argument is set to an ephemeral network address (IPv4:port) generated from the interface provided by the test’s Workspace (lo by default) and a randomly selected unused port in the range [1024, 32768).
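Selecting a random unused port, as done for the --listen argument above, can be approximated by trying to bind candidate ports until one succeeds. The sketch below is an assumption about how this could be done, not the harness code; the function name find_unused_port is hypothetical, and the loopback address is hard-coded because resolving an interface name such as lo to an address is not portable with the standard library alone.

```python
import random
import socket


def find_unused_port(host="127.0.0.1", low=1024, high=32768, attempts=100):
    """Return a port in [low, high) that could be bound on `host`."""
    for _ in range(attempts):
        port = random.randrange(low, high)  # upper bound is exclusive
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port in use or not permitted; try another one
            return port
    raise RuntimeError("no unused port found")


port = find_unused_port()
address = f"127.0.0.1:{port}"

print(address)
```

The probe socket is closed before the port is handed out, so another process could in principle grab the port in between. Randomizing the choice keeps such collisions unlikely when several tests run in parallel.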
The Daemon class exposes the following properties from a running daemon to
tests:
cwd: the daemon’s current working directory.
rootdir: the daemon’s root directory in the host file system.
mountdir: the daemon’s mount directory in the host file system.
logdir: the directory used by the daemon to store logs.
interface: the daemon’s address used for communication.
gkfs_client
The gkfs_client fixture returns an instance of the harness.gkfs.Client class.
The Client class allows users to execute I/O-related glibc functions (i.e.
system calls and library functions) in their own separate processes directly
from Python code. The harness again relies on the sh module to spawn the
processes with an appropriately patched environment so that they can
communicate successfully with a running daemon.
To simplify usage and reduce coding overhead, the class provides a special
__getattr__() method that tries to transform any method called on a Client
instance to a gkfs.io command. Thus, the following code:
gkfs_client.mkdir("/foobar/", stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO)
transforms into:
$ gkfs.io mkdir /foobar/ 0777
{
  "errnum": 0,
  "retval": 0
}
As shown in the excerpt above, gkfs.io returns function call results as JSON
records through stdout, which are deserialized by the IOParser in io.py
and transformed into a Python namedtuple. Thus, the previous function call
would return the following:
In[1]: gkfs_client.mkdir("/foobar/", stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO)
MkdirReturn(retval=0, errno=0)
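The JSON-to-namedtuple step can be illustrated with the standard library alone. The harness uses marshmallow schemas for this; the sketch below deliberately replaces them with plain json.loads() to stay dependency-free, and the function name parse_mkdir_output is hypothetical. The raw bytes are the command output printed in the harness log earlier on this page.

```python
import json
from collections import namedtuple

# Result type mirroring the MkdirReturn tuple shown above.
MkdirReturn = namedtuple("MkdirReturn", ["retval", "errno"])


def parse_mkdir_output(raw):
    """Convert the JSON record printed by `gkfs.io mkdir` into a namedtuple."""
    record = json.loads(raw)
    # The JSON key is "errnum"; the Python-side field is called "errno".
    return MkdirReturn(retval=record["retval"], errno=record["errnum"])


raw_output = b'{\n "errnum": 0,\n "retval": 0\n}\n'
ret = parse_mkdir_output(raw_output)

print(ret)
```

Returning a namedtuple rather than a dict is what lets tests write ret.retval and ret.errno, as in the test_mkdir example at the top of this page.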
Warning
Though providing a __getattr__() method allows for more expressive
tests, it also means that any function name called from an instance of
the Client class automatically gets transformed into a gkfs.io
argument. This may cause unexpected errors if the corresponding subcommand
has not yet been implemented in gkfs.io.
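The __getattr__() technique itself is easy to demonstrate in isolation. The following sketch is not the Client class: CommandProxy is a hypothetical name, it only builds the argument list instead of spawning a process, and checking the subcommand name against a known list is a variation added here to show one way of turning the problem described in the warning into an early, explicit error.

```python
import stat


class CommandProxy:
    """Turn method calls into command lines for a helper program."""

    def __init__(self, program, subcommands):
        self.program = program
        self.subcommands = set(subcommands)

    def __getattr__(self, name):
        # Only invoked for attributes that are not found the normal way.
        if name not in self.subcommands:
            raise AttributeError(
                f"{self.program}: unknown subcommand {name!r}")

        def build(*args):
            # Every argument is passed to the helper as a string.
            return [self.program, name, *(str(a) for a in args)]

        return build


client = CommandProxy("gkfs.io", subcommands=["mkdir"])
argv = client.mkdir("/foobar/", stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO)

print(argv)
```

In this sketch the mode is rendered in decimal (511) because str() is applied to the integer; calling a name that is not in the list, e.g. client.rmdir, raises AttributeError instead of producing a command line that the helper would reject.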
gkfs_shell
The gkfs_shell fixture returns an instance of the harness.gkfs.ShellClient
class. The ShellClient class allows users to execute shell commands and
scripts in their own separate processes directly from Python test code. The
harness again relies on the sh module to spawn the processes with an
appropriately patched environment so that they can communicate successfully with
a running daemon.
Single commands
To simplify usage and reduce coding overhead, the class provides a special
__getattr__() method that tries to transform any method called on a ShellClient
instance to a bash -c command. Thus, in the following code:
def test_cp(gkfs_daemon, gkfs_shell, file_factory):
    """Copy a file into gkfs using the shell"""

    lf01 = file_factory.create(file01, size=4.0, unit='MB')

    # lf01.pathname: '${TWD}/tmp/file01'
    # gkfs_daemon.mountdir: '${TWD}/mnt/'
    cmd = gkfs_shell.cp(lf01.pathname, gkfs_daemon.mountdir)

    assert cmd.exit_code == 0
    assert cmd.stdout.decode() == ''
    assert cmd.stderr.decode() == ''
gkfs_shell.cp(lf01.pathname, gkfs_daemon.mountdir)
transforms into:
$ LD_LIBRARY_PATH=<XXX> LD_PRELOAD=<YYY> bash -c 'cp ${TWD}/tmp/file01 ${TWD}/mnt/'
where XXX and YYY are respectively substituted by the appropriate library
paths and the libgkfs_intercept.so library required for the test.
Similarly to the sh module, the raw output generated by a shell command
executed in isolation (e.g. cp, stat, md5sum, etc.) can be accessed using
the stdout and stderr properties as exemplified in the code above. Note,
however, that the output for such commands can be accessed more conveniently as
a Python object using the parsed_stdout property, provided that an
appropriate CommandParser has been implemented for the command:
In[1]: gkfs_shell.stat("--terse /tmp/foobar")
statOutput(filename='/tmp/foobar', size=4000000, blocks=0, raw_mode='81b4', uid=1000, gid=1000, device='0', inode=10075480095715127217, hard_links=1, major='0', minor='0', last_access=0, last_modification=0, last_status_change=0, creation=0, transfer_size=524288)
If no command parser is available for the command, a NotImplementedError will
be raised by the harness.cmd.CommandParser class.
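As an illustration of what a command parser does, the sketch below converts one line of stat --terse output into a namedtuple with the same fields as the statOutput example above. It is an independent sketch, not the CommandParser implementation: the names StatOutput, parse_stat_terse, PARSERS and get_parser are hypothetical, and the sample line was reconstructed from the values shown in that example.

```python
from collections import namedtuple

StatOutput = namedtuple("StatOutput", [
    "filename", "size", "blocks", "raw_mode", "uid", "gid", "device",
    "inode", "hard_links", "major", "minor", "last_access",
    "last_modification", "last_status_change", "creation", "transfer_size",
])

# Fields kept as strings: the name, plus values that stat prints in hex.
_STRING_FIELDS = {"filename", "raw_mode", "device", "major", "minor"}


def parse_stat_terse(line):
    """Parse one line of `stat --terse` output into a StatOutput tuple."""
    fields = StatOutput._fields
    # Split from the right so a file name containing spaces stays in one piece.
    parts = line.strip().rsplit(" ", len(fields) - 1)
    if len(parts) != len(fields):
        raise ValueError(f"unexpected stat output: {line!r}")
    values = [
        part if name in _STRING_FIELDS else int(part)
        for name, part in zip(fields, parts)
    ]
    return StatOutput(*values)


# Registry of available parsers, keyed by command name.
PARSERS = {"stat": parse_stat_terse}


def get_parser(command):
    """Return the parser for `command` or raise NotImplementedError."""
    try:
        return PARSERS[command]
    except KeyError:
        raise NotImplementedError(f"no parser for {command!r}") from None


sample = ("/tmp/foobar 4000000 0 81b4 1000 1000 0 "
          "10075480095715127217 1 0 0 0 0 0 0 524288")
parsed = get_parser("stat")(sample)

print(parsed.size, parsed.raw_mode)
```

Keeping the parsers in a registry means that supporting a new command only requires adding one function and one dictionary entry, while commands without an entry fail loudly.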
Complex scripts
For convenience, the ShellClient class also provides a script() method that
allows executing complex chains of commands:
def test_shell_if_e(gkfs_daemon, gkfs_shell, file_factory):
    """
    Copy a file into gkfs using the shell and check that it exists using `if [[ -e <file> ]]`.
    """

    logger.debug("creating input file")
    lf01 = file_factory.create(file01, size=4.0, unit='MB')

    logger.debug("copying into gkfs")
    cmd = gkfs_shell.cp(lf01.pathname, gkfs_daemon.mountdir)
    assert cmd.exit_code == 0

    logger.debug("checking if file exists")
    cmd = gkfs_shell.script(
        f"""
            expected_pathname={gkfs_daemon.mountdir / file01}
            if [[ -e ${{expected_pathname}} ]];
            then
                exit 0
            fi
            exit 1
        """)

    assert cmd.exit_code == 0
Thanks to Python 3’s f-strings, it is really simple to expand Python
expressions in the shell script, though care must be taken to escape braces
when referencing shell variables in the script code
(e.g. ${{expected_pathname}} in the excerpt above).
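The brace-escaping rule can be checked in isolation. In the following sketch, the mount directory is a hypothetical path chosen for the example; single braces are expanded by Python, while doubled braces survive as literal single braces for the shell.

```python
from pathlib import PurePosixPath

mountdir = PurePosixPath("/tmp/twd/mnt")  # hypothetical mount directory
file01 = "file01"

script = f"""
expected_pathname={mountdir / file01}
if [[ -e ${{expected_pathname}} ]]; then
    exit 0
fi
exit 1
"""

print(script)
```

After formatting, the first line of the script contains the fully expanded path, whereas the test expression still reads ${expected_pathname} and is therefore resolved by the shell at run time.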
Note that by default, gkfs_shell.script() will patch the LD_LIBRARY_PATH
and LD_PRELOAD of the shell executing the script so that it gets intercepted
by the GekkoFS client library. If such intrusive interception is not desired
(e.g. because a test requires that only a single command of a complex script be
intercepted), it is possible to disable it by setting the intercept_shell
argument to False as shown in the code below:
def test_shell_stat_script(gkfs_daemon, gkfs_shell, file_factory):
    """
    Copy a file into gkfs using the shell and check that `stat <file>` succeeds
    """

    logger.debug("creating input file")
    lf01 = file_factory.create(file01, size=4.0, unit='MB')

    logger.debug("copying into gkfs")
    cmd = gkfs_shell.cp(lf01.pathname, gkfs_daemon.mountdir)
    assert cmd.exit_code == 0

    logger.debug("checking metadata")
    cmd = gkfs_shell.script(
        f"""
            expected_pathname={gkfs_daemon.mountdir / file01}
            {gkfs_shell.patched_environ} stat ${{expected_pathname}}
            exit $?
        """,
        intercept_shell=False)

    assert cmd.exit_code == 0
As demonstrated by the code above, the gkfs_shell fixture provides
the patched_environ property to make it simple to provide the appropriate
environment variables to any commands that should be intercepted. Thus, the
stat line in the script above:
    f"""
        expected_pathname={gkfs_daemon.mountdir / file01}
        {gkfs_shell.patched_environ} stat ${{expected_pathname}}
        exit $?
    """,
will be transformed by the harness into:
LD_LIBRARY_PATH="/opt/gekkofs/build/tests:/opt/gekkofs/prefix/lib" \
LD_PRELOAD="/opt/gekkofs/build/src/client/libgkfs_intercept.so" \
LIBGKFS_HOSTS_FILE="/tmp/pytest-of-user/pytest-45/test_stat_script0/gkfs_hosts.txt" \
LIBGKFS_LOG="all" LIBGKFS_LOG_OUTPUT="/tmp/pytest-of-user/pytest-45/test_stat_script0/logs/gkfs_client.log" \
stat ${expected_pathname}
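A property such as patched_environ essentially renders a mapping of environment variables as a prefix for a shell command. The sketch below shows one way to do that with the standard library; format_environ is a hypothetical helper, not the harness implementation, and it quotes values with shlex.quote(), which only adds quotes where the shell needs them (unlike the output above, where every value is wrapped in double quotes).

```python
import shlex


def format_environ(env):
    """Render a mapping as a `KEY=value ...` prefix for a shell command."""
    return " ".join(
        f"{key}={shlex.quote(str(value))}" for key, value in env.items())


env = {
    "LD_LIBRARY_PATH": "/opt/gekkofs/build/tests:/opt/gekkofs/prefix/lib",
    "LD_PRELOAD": "/opt/gekkofs/build/src/client/libgkfs_intercept.so",
    "LIBGKFS_LOG": "all",
}

# Doubled braces keep ${expected_pathname} for the shell to expand.
command = f"{format_environ(env)} stat ${{expected_pathname}}"

print(command)
```

Because the variables are placed in front of a single command, only that command runs with the interception library preloaded; the rest of the script executes in the unmodified shell environment.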
Warning
Please note that, unlike for single shell commands, the parsed_stdout
property is not available for complex shell scripts run
using gkfs_shell.script(). The reason is that it is not possible to
provide a generic parser for all possible scripts. The raw output generated
by a script, if any, can however be accessed using the stdout and
stderr properties.
Unit tests
TBD