Item 30: Write more than unit tests
"All companies have test environments.
The lucky ones have production environments separate from the test environment." – @FearlessSon
Like most other modern languages, Rust includes features that make it easy to write tests that live alongside your code and that give you confidence that the code is working correctly.
This isn't the place to expound on the importance of tests; suffice it to say that if code isn't tested, it probably doesn't work the way you think it does. So this Item assumes that you're already signed up to write tests for your code.
Unit tests and integration tests, described in the next two sections, are the key forms of tests. However, the Rust toolchain, and extensions to the toolchain, allow for various other types of tests. This Item describes their distinct logistics and rationales.
Unit Tests
The most common form of test for Rust code is a unit test, which might look something like this:
// ... (code defining `nat_subtract*` functions for natural
// number subtraction)

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_nat_subtract() {
        assert_eq!(nat_subtract(4, 3).unwrap(), 1);
        assert_eq!(nat_subtract(4, 5), None);
    }

    #[should_panic]
    #[test]
    fn test_something_that_panics() {
        nat_subtract_unchecked(4, 5);
    }
}
Some aspects of this example will appear in every unit test:
- A collection of unit test functions.
- Each test function is marked with the #[test] attribute.
- The module holding the test functions is annotated with a #[cfg(test)] attribute, so the code gets built only in test configurations.
Other aspects of this example illustrate things that are optional and may be relevant only for particular tests:
- The test code here is held in a separate module, conventionally called tests or test. This module may be inline (as here) or held in a separate tests.rs file. Using a separate file for the test module has the advantage that it's easier to spot whether code that uses a function is test code or "real" code.
- The test module might have a wildcard use super::* to pull in everything from the parent module under test. This makes it more convenient to add tests (and is an exception to the general advice in Item 23 to avoid wildcard imports).
- The normal visibility rules for modules mean that a unit test has the ability to use anything from the parent module, whether it is pub or not. This allows for "open-box" testing of the code, where the unit tests exercise internal features that aren't visible to normal users.
- The test code makes use of expect() or unwrap() for its expected results. The advice in Item 18 isn't really relevant for test-only code, where panic! is used to signal a failing test. Similarly, the test code also checks expected results with assert_eq!, which will panic on failure.
- The code under test includes a function that panics on some kinds of invalid input; to exercise that, there's a unit test function that's marked with the #[should_panic] attribute. This might be needed when testing an internal function that normally expects the rest of the code to respect its invariants and preconditions, or it might be a public function that has some reason to ignore the advice in Item 18. (Such a function should have a "Panics" section in its doc comment, as described in Item 27.)
Item 27 suggests not documenting things that are already expressed by the type system. Similarly, there's no need to test things that are guaranteed by the type system. If your enum types start holding values that aren't in the list of allowed variants, you've got bigger problems than a failing unit test!
However, if your code relies on specific functionality from your dependencies, it can be helpful to include basic tests of that functionality. The aim here is not to repeat testing that's already done by the dependency itself but instead to have an early warning system that indicates whether the behavior that you need from the dependency has changed—separately from whether the public API signature has changed, as indicated by the semantic version number (Item 21).
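For instance, a minimal sketch of such a test, assuming a hypothetical reliance on serde_json emitting compact output with no extra whitespace, might look like this:

#[cfg(test)]
mod dependency_tests {
    // Hypothetical: the surrounding code assumes that serde_json
    // serializes collections compactly, with no whitespace.
    #[test]
    fn test_serde_json_is_compact() {
        let encoded = serde_json::to_string(&vec![1, 2, 3]).unwrap();
        assert_eq!(encoded, "[1,2,3]");
    }
}

If a future version of the dependency changed this behavior, the test would flag it before it surprised your users.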
Integration Tests
The other common form of test included with a Rust project is integration tests, held under tests/. Each file in that directory is run as a separate test program that executes all of the functions marked with #[test].
Integration tests do not have access to crate internals and so act as behavior tests that can exercise only the public API of the crate.
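For example, an integration test that exercises the earlier subtraction code might look something like this (a sketch, assuming the crate is named somecrate and that nat_subtract is part of its public API):

// tests/subtract.rs file
use somecrate::nat_subtract;

#[test]
fn test_nat_subtract_via_public_api() {
    assert_eq!(nat_subtract(4, 3), Some(1));
    assert_eq!(nat_subtract(4, 5), None);
}

Note that no #[cfg(test)] marker is needed: everything under tests/ is built only when tests are run.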
Doc Tests
Item 27 described the inclusion of short code samples in documentation comments, to illustrate the use of a particular public API item. Each such chunk of code is enclosed in an implicit fn main() { ... } and run as part of cargo test, effectively making it an additional test case for your code, known as a doc test. Individual tests can also be executed selectively by running cargo test --doc <item-name>.
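For example, a doc test for the subtraction function might look something like this (a sketch, again assuming a crate named somecrate; the signature and implementation shown are hypothetical):

/// Subtract `right` from `left`, returning `None` if the result
/// would be negative.
///
/// ```
/// assert_eq!(somecrate::nat_subtract(4, 3), Some(1));
/// assert_eq!(somecrate::nat_subtract(4, 5), None);
/// ```
pub fn nat_subtract(left: u64, right: u64) -> Option<u64> {
    left.checked_sub(right)
}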
Regularly running tests as part of your CI environment (Item 32) ensures that your code samples don't drift too far from the current reality of your API.
Examples
Item 27 also described the ability to provide example programs that exercise your public API. Each Rust file under examples/ (or each subdirectory under examples/ that includes a main.rs) can be run as a standalone binary with cargo run --example <name> or cargo test --example <name>.
These programs have access to only the public API of your crate and are intended to illustrate the use of your API as a whole. Examples are not specifically designated as test code (no #[test], no #[cfg(test)]), and they're a poor place to put code that exercises obscure nooks and crannies of your crate—particularly as examples are not run by cargo test by default.
Nevertheless, it's a good idea to ensure that your CI system (Item 32) builds and runs all the associated examples for a crate (with cargo test --examples), because it can act as a good early warning system for regressions that are likely to affect lots of users. As noted, if your examples demonstrate mainline use of your API, then a failure in the examples implies that something significant is wrong:
- If it's a genuine bug, then it's likely to affect lots of users—the very nature of example code means that users are likely to have copied, pasted, and adapted the example.
- If it's an intended change to the API, then the examples need to be updated to match. A change to the API also implies a backward incompatibility, so if the crate is published, then the semantic version number needs a corresponding update to indicate this (Item 21).
The likelihood of users copying and pasting example code means that it should have a different style than test code. In line with Item 18, you should set a good example for your users by avoiding unwrap() calls for Results. Instead, make each example's main() function return something like Result<(), Box<dyn Error>>, and then use the question mark operator throughout (Item 3).
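For example, a sketch of an example program in this style (again assuming a crate named somecrate with a public nat_subtract function):

// examples/subtract.rs file
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    // Parse two numbers from the command line; any failure propagates
    // via `?` rather than being unwrap()'ed.
    let mut args = std::env::args().skip(1);
    let left: u64 = args.next().ok_or("missing first argument")?.parse()?;
    let right: u64 = args.next().ok_or("missing second argument")?.parse()?;
    match somecrate::nat_subtract(left, right) {
        Some(diff) => println!("{left} - {right} = {diff}"),
        None => println!("{left} - {right} would be negative"),
    }
    Ok(())
}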
Benchmarks
Item 20 attempts to persuade you that fully optimizing the performance of your code isn't always necessary. Nevertheless, there are definitely times when performance is critical, and if that's the case, then it's a good idea to measure and track that performance. Having benchmarks that are run regularly (e.g., as part of CI; Item 32) allows you to detect when changes to the code or the toolchains adversely affect that performance.
The cargo bench command runs special test cases that repeatedly perform an operation and emits average timing information for the operation. At the time of writing, support for benchmarks is not stable, so the precise command may need to be cargo +nightly bench. (Rust's unstable features, including the test feature used here, are described in the Rust Unstable Book.)
However, there's a danger that compiler optimizations may give misleading results, particularly if you restrict the operation that's being performed to a small subset of the real code. Consider a simple arithmetic function:
pub fn factorial(n: u128) -> u128 {
    match n {
        0 => 1,
        n => n * factorial(n - 1),
    }
}
A naive benchmark for this code:
#![feature(test)]
extern crate test;

#[bench]
fn bench_factorial(b: &mut test::Bencher) {
    b.iter(|| {
        let result = factorial(15);
        assert_eq!(result, 1_307_674_368_000);
    });
}
gives incredibly positive results:
test bench_factorial ... bench: 0 ns/iter (+/- 0)
With fixed inputs and a small amount of code under test, the compiler is able to optimize away the iteration and directly emit the result, leading to an unrealistically optimistic result.
The std::hint::black_box function can help with this; it's an identity function whose implementation the compiler is "encouraged, but not required" (their italics) to pessimize.
Moving the benchmark code to use this hint:
#[bench]
fn bench_factorial(b: &mut test::Bencher) {
b.iter(|| {
let result = factorial(std::hint::black_box(15));
assert_eq!(result, 1_307_674_368_000);
});
}
gives more realistic results:
test blackboxed::bench_factorial ... bench: 16 ns/iter (+/- 3)
The Godbolt compiler explorer can also help by showing the actual machine code emitted by the compiler, which may make it obvious when the compiler has performed optimizations that would be unrealistic for code running a real scenario.
Finally, if you are including benchmarks for your Rust code, the criterion crate may provide an alternative to the standard test::bench::Bencher functionality that is more convenient (it runs with stable Rust) and more fully featured (it has support for statistics and graphs).
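For example, a criterion version of the factorial benchmark might look something like this (a sketch; it assumes a benches/factorial.rs target configured with harness = false in Cargo.toml, as described in the criterion documentation, and a public somecrate::factorial function):

// benches/factorial.rs file
use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

fn bench_factorial(c: &mut Criterion) {
    // Same hint as before: `black_box` stops the compiler from
    // constant-folding the whole calculation away.
    c.bench_function("factorial 15", |b| {
        b.iter(|| somecrate::factorial(black_box(15)))
    });
}

criterion_group!(benches, bench_factorial);
criterion_main!(benches);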
Fuzz Testing
Fuzz testing is the process of exposing code to randomized inputs in the hope of finding bugs, particularly crashes that result from those inputs. Although this can be a useful technique in general, it becomes much more important when your code is exposed to inputs that may be controlled by someone who is deliberately trying to attack the code—so you should run fuzz tests if your code is exposed to potential attackers.
Historically, the majority of defects in C/C++ code that have been exposed by fuzzers have been memory safety problems, typically found by combining fuzz testing with runtime instrumentation (e.g., AddressSanitizer or ThreadSanitizer) of memory access patterns.
Rust is immune to some (but not all) of these memory safety problems, particularly when there is no unsafe code involved (Item 16). However, Rust does not prevent bugs in general, and a code path that triggers a panic! (see Item 18) can still result in a denial-of-service (DoS) attack on the codebase as a whole.
The most effective forms of fuzz testing are coverage-guided: the test infrastructure monitors which parts of the code are executed and favors random mutations of the inputs that explore new code paths. "American fuzzy lop" (AFL) was the original heavyweight champion of this technique, but in more recent years equivalent functionality has been included in the LLVM toolchain as libFuzzer.
The Rust compiler is built on LLVM, and so the cargo-fuzz subcommand exposes libFuzzer functionality for Rust (albeit for only a limited number of platforms).
The primary requirement for a fuzz test is to identify an entrypoint of your code that takes (or can be adapted to take) arbitrary bytes of data as input:
/// Determine if the input starts with "FUZZ".
pub fn is_fuzz(data: &[u8]) -> bool {
    if data.len() >= 3 /* oops */
        && data[0] == b'F'
        && data[1] == b'U'
        && data[2] == b'Z'
        && data[3] == b'Z'
    {
        true
    } else {
        false
    }
}
With a target entrypoint identified, the Rust Fuzz Book gives instructions on how to arrange the fuzzing subproject. At its core is a small driver that connects the target entrypoint to the fuzzing infrastructure:
// fuzz/fuzz_targets/target1.rs file
#![no_main]
use libfuzzer_sys::fuzz_target;
fuzz_target!(|data: &[u8]| {
let _ = somecrate::is_fuzz(data);
});
Running cargo +nightly fuzz run target1 continuously executes the fuzz target with random data, stopping only if a crash is found. In this case, a failure is found almost immediately:
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 1607525774
INFO: Loaded 1 modules: 1624 [0x108219fa0, 0x10821a5f8),
INFO: Loaded 1 PC tables (1624 PCs): 1624 [0x10821a5f8,0x108220b78),
INFO: 9 files found in fuzz/corpus/target1
INFO: seed corpus: files: 9 min: 1b max: 8b total: 46b rss: 38Mb
#10 INITED cov: 26 ft: 26 corp: 6/22b exec/s: 0 rss: 39Mb
thread panicked at 'index out of bounds: the len is 3 but the index is 3',
testing/src/lib.rs:77:12
stack backtrace:
0: rust_begin_unwind
at /rustc/f77bfb7336f2/library/std/src/panicking.rs:579:5
1: core::panicking::panic_fmt
at /rustc/f77bfb7336f2/library/core/src/panicking.rs:64:14
2: core::panicking::panic_bounds_check
at /rustc/f77bfb7336f2/library/core/src/panicking.rs:159:5
3: somecrate::is_fuzz
4: _rust_fuzzer_test_input
5: ___rust_try
6: _LLVMFuzzerTestOneInput
7: __ZN6fuzzer6Fuzzer15ExecuteCallbackEPKhm
8: __ZN6fuzzer6Fuzzer6RunOneEPKhmbPNS_9InputInfoEbPb
9: __ZN6fuzzer6Fuzzer16MutateAndTestOneEv
10: __ZN6fuzzer6Fuzzer4LoopERNSt3__16vectorINS_9SizedFileENS_
16fuzzer_allocatorIS3_EEEE
11: __ZN6fuzzer12FuzzerDriverEPiPPPcPFiPKhmE
12: _main
and the input that triggered the failure is emitted.
Normally, fuzz testing does not find failures so quickly, and so it does not make sense to run fuzz tests as part of your CI. The open-ended nature of the testing, and the consequent compute costs, mean that you need to consider how and when to run fuzz tests—perhaps only for new releases or major changes, or perhaps for a limited period of time.[1]
You can also make subsequent runs of the fuzzing infrastructure more efficient by storing and reusing a corpus of inputs that the fuzzer previously found to explore new code paths; this lets later runs explore new ground rather than retesting code paths already visited.
Testing Advice
An Item about testing wouldn't be complete without repeating some common advice (which is mostly not Rust-specific):
- As this Item has endlessly repeated, run all your tests in CI on every change (with the exception of fuzz tests).
- When you're fixing a bug, write a test that exhibits the bug before fixing the bug. That way you can be sure that the bug is fixed and that it won't be accidentally reintroduced in the future.
- If your crate has features (Item 26), run tests over every possible combination of available features.
- More generally, if your crate includes any config-specific code (e.g., #[cfg(target_os = "windows")]), run tests for every platform that has distinct code (see the sketch after this list).
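For instance, a hypothetical sketch of this pattern, where both the code and its tests are gated on the same configuration:

#[cfg(target_os = "windows")]
pub fn config_dir_name() -> &'static str {
    "AppData"
}

#[cfg(not(target_os = "windows"))]
pub fn config_dir_name() -> &'static str {
    ".config"
}

#[cfg(test)]
mod platform_tests {
    use super::*;

    #[cfg(target_os = "windows")]
    #[test]
    fn test_windows_config_dir() {
        assert_eq!(config_dir_name(), "AppData");
    }

    #[cfg(not(target_os = "windows"))]
    #[test]
    fn test_other_config_dir() {
        assert_eq!(config_dir_name(), ".config");
    }
}

Running the test suite on a single platform exercises only one of these branches, which is why CI coverage of each distinct platform matters.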
This Item has covered a lot of different types of tests, so it's up to you to decide how much each of them is relevant and worthwhile for your project.
If you have a lot of test code and you are publishing your crate to crates.io, then you might need to consider which of the tests make sense to include in the published crate. By default, cargo will include unit tests, integration tests, benchmarks, and examples (but not fuzz tests, because the cargo-fuzz tools store these as a separate crate in a subdirectory), which may be more than end users need. If that's the case, you can either exclude some of the files or (for behavior tests) move the tests out of the crate and into a separate test crate.
Things to Remember
- Write unit tests for comprehensive testing that includes testing of internal-only code. Run them with cargo test.
- Write integration tests to exercise your public API. Run them with cargo test.
- Write doc tests that exemplify how to use individual items in your public API. Run them with cargo test.
- Write example programs that show how to use your public API as a whole. Run them with cargo test --examples or cargo run --example <name>.
- Write benchmarks if your code has significant performance requirements. Run them with cargo bench.
- Write fuzz tests if your code is exposed to untrusted inputs. Run them (continuously) with cargo fuzz.
[1] If your code is a widely used open source crate, the Google OSS-Fuzz program may be willing to run fuzzing on your behalf.