Skipping Tests in Rust

Handling environment changes for code under test

I’ve been coding in Rust for various projects since around June 2016. I helped develop and maintain one project that has been in production since December 2016 and, thanks to Rust’s ability to help make robust software, has only had a handful of issues since then. Most of them have been network services adding enum variants to JSON payloads that then failed to deserialize on our side (easy to fix) and authentication mechanism updates (not so easy). I attribute much of this project’s success to Rust’s explicit error handling and the ability to build robust test suites. However, one thing that is missing from Rust’s built-in test harness is a “skip” result.

Feedback

I’ve posted this on Reddit. Please keep discussion there rather than ganging up on the pre-RFC thread (feel free to subscribe though). I’ll summarize any discussion from there before posting it to the pre-RFC thread again.

Rust test functions

Currently, tests can only have three results:

  • pass
  • fail
  • timeout (which is pretty useless in practice as there’s no way to cancel such a test)

This works for the most part; however, there is another result a test can have. Certain environments might be set up in such a way that no useful result can be derived from the code. In that case, none of these results are suitable. Passing indicates that everything is OK even though the test could not actually perform its purpose. Failing isn’t right either: the code under test isn’t actually broken, and leaving known failures in CI is not a good way to do development. A timeout isn’t ideal in any circumstance.

Solutions with stable today

There are a few ways of handling this situation in stable Rust today. None of them are really all that great, however.

Build configuration

Rust does provide ways of doing this if the condition can be accurately determined at build time: build.rs can inject features (or cfg values) that are then used to compile the test out completely via a #[cfg] attribute. However, not every condition is decidable at build time. If a test depends on external executables, data files, network access, or the like, their availability can change without rebuilding the project. Indeed, even adding these things to rerun-if-changed directives to force a rebuild when they change is problematic, because nothing is actually going to change in the compiled binary if, for example, /usr/bin/git is updated.
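As a rough sketch of that pattern (the have_git cfg name and the /usr/bin/git probe here are purely illustrative, not from any real project), a build.rs could look something like this:

// build.rs: build-time detection of an external tool.
use std::path::Path;

fn main() {
    // Ask Cargo to re-run this script if /usr/bin/git changes on disk.
    println!("cargo:rerun-if-changed=/usr/bin/git");
    if Path::new("/usr/bin/git").exists() {
        println!("cargo:rustc-cfg=have_git");
    }
}

// In the test module, the test is compiled out entirely when the cfg is absent.
#[cfg(have_git)]
#[test]
fn test_git_clone() {
    // test code
}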

Passing with a message

The best solution today is to just pass the test in such cases and print out a message that its passing is not actually indicative of what was intended to be tested. Since cargo test suppresses output from passing tests by default, these messages are effectively invisible. This leaves the suite in a situation where the result shows as “all good” when, in reality, there’s a hidden message saying “yeah…that won’t work here”.
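Concretely, the pattern is something like the following sketch (the getuid check matches the file permission use case described later):

#[test]
fn test_no_permissions() {
    if unsafe { libc::getuid() } == 0 {
        // Only visible with `cargo test -- --nocapture` (or if the test fails).
        println!("note: passing vacuously; `root` cannot have permissions removed");
        return;
    }

    // test code
}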

Prior art

Rust is not breaking new ground here. Here is a list of test harness runners which support skipping tests (many of which are courtesy of others in the pre-RFC thread):

  • CMake supports it via return code or matching the test’s output. CMake does not support in-target testing, so it only supports runtime detection of skipping tests.
  • pytest supports it via decorators with predicates performed at runtime or by throwing specific exceptions.
  • Ruby’s RSpec is similar to pytest with analogous features (just spelled differently).
  • Xcode’s XCTest detects test functions which throw XCTSkip to let them skip.
  • The TAP (Test Anything Protocol) supports skipping via skip directives. Of potential interest is a way to support skipping entire modules at once.
  • Automake and Autoconf detect tests which return the special return code 77 as indicating a “skip”.

Use cases

Here’s a selection of use cases that I have personally run into before where being able to dynamically skip a test is just about the only way to do things reliably. Note that not all of these use cases are from Rust-using projects, but nothing here is specific to their implementation language or test harness.

OpenGL driver support

Some of the projects I work on depend on OpenGL, and some of their tests require better driver support in order to actually do what they need. One case which comes to mind is a small selection of tests that need an extremely large amount of texture memory. Detecting this at build time is not possible because one would need to build and run some code to perform the check before knowing whether the test should be added or not. It also means that if someone upgrades (or downgrades) the video card driver in use, they would need to rebuild the project in order to make the test “active” again.

Kernel API support

One crate I work on is intimately tied to the Linux kernel API (as it is Rust bindings for the API). The problem is that any given development machine may or may not actually have specific parts of the API available. Currently, the workaround is a “canary” test that fails up front if the rest of the tests in the module are going to fail due to an API not being available.
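Roughly, the workaround looks like this sketch, where kernel_api_available() stands in for whatever probe the crate actually performs:

// One "canary" test fails loudly when the kernel API is missing, so the pile
// of failures that follows it has an obvious cause.
#[test]
fn canary_kernel_api_available() {
    assert!(
        kernel_api_available(),
        "this kernel does not expose the API; the rest of this module will fail"
    );
}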

Network connectivity

Another project I work on does lots of Git operations during its test suite. Some of these need to pull submodules in order to ensure that the submodule-wrangling logic is working. This requires an active network connection to work reliably. It would be nice if the test could detect that the submodule cloning is going to fail for such reasons.

File permissions

Another project I work on has file permission tests as part of its test suite. Basically, a test sets up a directory as it should work, removes read permissions from a file, and then ensures that the code handles it as expected (given Rust’s explicit error handling, the test was added in the interest of poking as many error code paths as possible). When we started using GitLab-CI to run the test suite, this test started failing for no apparent reason. Eventually, we noticed that the CI was running as root. One interesting quirk of root is that file permissions basically don’t exist for that user (missing write permissions can cause programs to balk at writing blindly, but nothing fundamentally prevents it). The test started failing because the mechanism for triggering the problem wasn’t effective.
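A trimmed-down sketch of such a test (paths and assertions are illustrative) shows why running as root breaks it:

use std::fs;
use std::os::unix::fs::PermissionsExt;

#[test]
fn test_unreadable_file() {
    let path = std::env::temp_dir().join("perm-test-file");
    fs::write(&path, "data").unwrap();

    // Drop all permissions on the file. When running as root this is
    // effectively a no-op for reads, which is why the test broke on CI.
    fs::set_permissions(&path, fs::Permissions::from_mode(0o000)).unwrap();

    // The code under test should surface a permission error rather than panic.
    assert!(fs::read(&path).is_err());
    let _ = fs::remove_file(&path);
}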

Possible solutions

I’ve written a pre-RFC describing this problem and there are a few possible solutions. It seems that at most one of them is going to get in, so I would like input from others in the community on whether to continue with it or not. My main concern is that if this ends up with the test crate exposing the only mechanism for skipping, it won’t be very useful, since the test crate has no known path to stabilization and I don’t know that I have the motivation to see such a useful feature forever locked behind a nightly feature gate. However, if I’m wrong about that, or there’s enough interest in a non-test-crate mechanism, I’m much more interested in pushing this forward.

#[skip_if]

The first, and primary, solution I’ve considered is to have a new #[skip_if(predicate = func)] attribute on #[test] (and #[bench]) functions.

fn is_root() -> Option<String> { // The signature for "skip predicates"
    if unsafe { libc::getuid() } == 0 {
        Some("`root` cannot have permissions removed".into())
    } else {
        None
    }
}

#[test]
#[skip_if(predicate = "is_root")]
fn test_no_permissions() {
    // test code
}

Pros:

  • can be made stable independent of the test crate becoming stable
  • it’s declarative and up front on the test function
  • I have a proof-of-concept implementation which still needs some tweaking, but is at least headed in a good direction
  • stacking is possible (though not done in the PoC)
  • test code is unchanged

Cons:

  • each test must be decorated with the skip predicates
  • skipping must be known at the test level rather than “far away” (violating separation of concerns)
  • complicated implementation (parsing the attribute, adding it to the internal test type structure, etc.)

panic_any(Skip)

The test crate could also export a Skip type which, if given to panic_any(), would indicate that the test should be skipped rather than fail (or pass, in the case of should_panic).

#[test]
fn test_no_permissions() {
    if unsafe { libc::getuid() } == 0 {
        std::panic::panic_any(test::Skip("`root` cannot have permissions removed"));
        // possible alternative: `test::skip(reason);`
    }

    // test code
}

Pros:

  • any code the test calls can skip
  • test code is otherwise unchanged
  • fairly easy to implement

Cons:

  • requires stabilization of the test crate
  • tests performing panic::catch_unwind will need to handle this type specifically and (most likely) re-panic with it after whatever cleanup they need to do is complete (a sketch follows this list)
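A minimal sketch of what that re-panic would look like, assuming the proposed test::Skip payload exists:

use std::panic;

fn run_with_cleanup<F: FnOnce() + panic::UnwindSafe>(body: F) {
    let result = panic::catch_unwind(body);
    // ... cleanup that must happen even if the body panicked ...
    if let Err(payload) = result {
        // If `payload` is the (proposed) `test::Skip` value, swallowing it here
        // would turn the skip into a pass, so it must be re-thrown.
        panic::resume_unwind(payload);
    }
}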

Thread-local storage

In the interest of completeness, I’ve added this mechanism to the list, though I think it is definitely the least viable of the solutions. Basically, there would be a thread-local variable which would indicate that the test should be skipped. It would be reset before each test and then checked at the end to see whether the test skipped or not.

#[test]
fn test_no_permissions() {
    if unsafe { libc::getuid() } == 0 {
        test::skip("`root` cannot have permissions removed");
        return;
    }

    // test code
}
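On the harness side, the plumbing might look something like this sketch (none of these names exist today):

use std::cell::RefCell;

thread_local! {
    // Hypothetical slot holding the skip reason for the current test, if any.
    static SKIP_REASON: RefCell<Option<String>> = RefCell::new(None);
}

// What `test::skip` could do: record the reason.
pub fn skip(reason: &str) {
    SKIP_REASON.with(|slot| *slot.borrow_mut() = Some(reason.into()));
}

// After the test function returns, the harness would take (and thereby reset)
// the slot to decide whether the test skipped.
pub fn take_skip_reason() -> Option<String> {
    SKIP_REASON.with(|slot| slot.borrow_mut().take())
}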

Pros:

  • very simple, probably as easy to implement as the panic_any() mechanism
  • test signatures don’t have to change

Cons:

  • requires test crate stabilization
  • this brings back shadows of the “tests which spawn threads leak output” problem that was recently solved: thread-local state would be lost across that boundary in threads the test creates
  • have to remember to return; manually after this happens (or the rest of the test body will run anyway)
  • no way to “revoke” the state (without another test::unskip() API)
  • what happens if called multiple times with different reasons?
  • requires thread-local storage which (IIRC) is not available on every platform, namely embedded

impl Termination

Another tactic could be to steal a page from the Automake, Autoconf, and CMake prior art and treat a particular return code from the test function as “special”. Historically, I’ve used 125 in CMake code due to it being git bisect run’s way to indicate that a commit should be skipped. However, 77 is probably a better choice if there’s going to be just one, based on the prior art of Automake and Autoconf. CMake allows any return code to be interpreted as a “skip”, and there’s no default, so any return code is valid there.

struct Skip {}

impl Termination for Skip {
    fn report(self) -> i32 {
        77
    }
}

#[test]
fn test_no_permissions() -> Result<(), Skip> {
    if unsafe { libc::getuid() } == 0 {
        return Err(Skip {});
    }

    // test code

    Ok(()) // Needed to satisfy the return type
}

Pros:

  • custom types can be provided to do this (with extra context about why it is skipping)
  • cuts a middle ground between “the test function knows it can be skipped” and separation of concerns about determining that skip status
  • since the test function knows which parts of it can skip, it can attempt other mitigations before giving up and skipping
  • probably the simplest implementation of any of these solutions (though extracting a “reason” from it is more complicated)

Cons:

  • impl Termination would need to be stabilized (though this doesn’t seem nearly as big a blocker as the test crate)
  • no easy way to communicate why a test was skipped, though perhaps some subset of the output could be used for this in the cargo test output
  • does not compose well with tests already using Result<_, E> return types (see the sketch after this list)
  • that Ok(()) is a bit unfortunate, but I believe there is an RFC which would allow this to be elided (though I can’t find it right now; will update if found or someone links it to me)
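To illustrate the composition issue mentioned above, a test that already returns Result<(), E> would need a custom type in order to also be able to skip, since the return type can only report one way. A purely hypothetical sketch, using the same assumed report() -> i32 signature as above:

// None of this exists; note that `?` no longer works directly in the test
// body, since every error has to be mapped onto `Fail` by hand.
enum Outcome<E> {
    Pass,
    Skip,
    Fail(E),
}

impl<E: std::fmt::Debug> Termination for Outcome<E> {
    fn report(self) -> i32 {
        match self {
            Outcome::Pass => 0,
            Outcome::Skip => 77,
            Outcome::Fail(err) => {
                eprintln!("error: {:?}", err);
                1
            }
        }
    }
}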

After writing more about it, this one actually seems the most Rust-like because it feels like error handling. Functions can either handle the error and change the logic or bubble it up to the caller to deal with. Since the type which will trigger the skip can be defined locally, any context that might be useful can be passed around as well. For example, if root is detected, maybe (if the platform supports it) a read-only tmpfs mount could be made and given to the rest of the test to actually enforce the “no permissions” state. If this fails, it is then up to the test to decide whether this is fatal or to instead say “nope, this isn’t testable right now” and skip.
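As a sketch of that flow (mount_readonly_tmpfs and setup_unreadable_dir are hypothetical helpers), the test itself gets to choose between mitigating and skipping:

#[test]
fn test_no_permissions() -> Result<(), Skip> {
    let dir = if unsafe { libc::getuid() } == 0 {
        // Hypothetical mitigation: fabricate a read-only location another way.
        // If even that fails, the behavior genuinely cannot be tested here.
        match mount_readonly_tmpfs() {
            Ok(dir) => dir,
            Err(_) => return Err(Skip {}),
        }
    } else {
        setup_unreadable_dir()
    };

    // test code using `dir`

    Ok(())
}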

Conclusion

After writing this post, I think I’ve moved from the attribute approach to impl Termination as my preferred solution. The only thing missing is the “reason”, but we are already missing that in failure states anyway (other than a backtrace or other error messages), so maybe it’s not that big of a deal.