Item 32: Set up a continuous integration (CI) system

A CI system is a mechanism for automatically running tools over your codebase, triggered whenever there's a change—or a proposed change—to the codebase.

The recommendation to set up a CI system is not at all Rust-specific, so this Item is a mélange of general advice mixed with Rust-specific tool suggestions.

CI Steps

Moving to specifics, what kinds of steps should be included in your CI system? The obvious initial candidates are the following:

  • Build the code.
  • Run the tests for the code.

In each case, a CI step should run cleanly, quickly, and deterministically, with a zero false-positive rate; more on this in the next section.

The "deterministic" requirement also leads to advice for the build step: use rust-toolchain.toml to specify a fixed version of the toolchain in your CI build.

The rust-toolchain.toml file indicates which version of Rust should be used to build the code—either a specific version (e.g., 1.70), or a channel (stable, beta, or nightly) possibly with an optional date (e.g., nightly-2023-09-19).1 Choosing a floating channel value here would make the CI results vary as new toolchain versions are released; a fixed value is more deterministic and allows you to deal with toolchain upgrades separately.
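For example, a minimal rust-toolchain.toml pinning a fixed version might look like this (the version number and component list here are illustrative; pick your own):

```toml
# rust-toolchain.toml
[toolchain]
channel = "1.70"                    # a fixed version, not a floating "stable"
components = ["rustfmt", "clippy"]  # extra components the CI steps rely on
```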

Throughout this book, various Items have suggested tools and techniques that can help improve your codebase; wherever possible, these should be included with the CI system. For example, the two fundamental parts of a CI system previously mentioned can be enhanced:

  • Build the code.
    • Item 26 describes the use of features to conditionally include different chunks of code. If your crate has features, build every valid combination of features in CI (and realize that this may involve 2^N different variants—hence the advice to avoid feature creep).
    • Item 33 suggests that you consider making library code no_std compatible where possible. You can be confident that your code is genuinely no_std compatible only if you test no_std compatibility in CI. One option is to make use of the Rust compiler's cross-compilation abilities and build for an explicitly no_std target (e.g., thumbv6m-none-eabi).
    • Item 21 includes a discussion around declaring a minimum supported Rust version (MSRV) for your code. If you have this, check your MSRV in CI by including a step that tests with that specific Rust version.
  • Run the tests for the code.
    • Item 30 describes the various different styles of test; run all test types in CI. Some test types are automatically included in cargo test (unit tests, integration tests, and doc tests), but other test types (e.g., example programs) may need to be explicitly triggered.
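The enhanced build and test steps above might be sketched as a single CI job like the following, assuming GitHub Actions syntax; the third-party cargo-hack tool, the thumbv6m-none-eabi target, and the 1.70 MSRV are illustrative choices, not requirements:

```yaml
# Sketch of a build-and-test CI job (GitHub Actions syntax).
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Build every valid feature combination (the 2^N powerset);
      # cargo-hack is a third-party helper for exactly this.
      - run: cargo install cargo-hack
      - run: cargo hack build --feature-powerset
      # Cross-compile for a no_std target to confirm no_std compatibility.
      - run: rustup target add thumbv6m-none-eabi
      - run: cargo build --target thumbv6m-none-eabi
      # Check against the declared MSRV (assumes that toolchain is
      # installed via rustup).
      - run: cargo +1.70 check
      # Unit, integration, and doc tests. Example programs are compiled
      # but not run by `cargo test`; run them explicitly if needed
      # (e.g., `cargo run --example <name>`).
      - run: cargo test
```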

However, there are other tools and suggestions that can help improve the quality of your codebase:

  • Item 29 waxes lyrical about the advantages of running Clippy over your code; run Clippy in CI. To ensure that failures are flagged, set the -Dwarnings option (for example, via cargo clippy -- -Dwarnings).
  • Item 27 suggests documenting your public API; use the cargo doc tool to check that the documentation generates correctly and that any hyperlinks in it resolve correctly.
  • Item 25 mentions tools such as cargo-udeps and cargo-deny that can help manage your dependency graph; running these as a CI step prevents regressions.
  • Item 31 discusses the Rust tool ecosystem; consider which of these tools are worth regularly running over your codebase. For example, running rustfmt / cargo fmt in CI allows detection of code that doesn't comply with your project's style guidelines. To ensure that failures are flagged, set the --check option.
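Collected together as a CI job (again a sketch in GitHub Actions syntax; installation steps for the third-party tools are assumed), these checks might look like:

```yaml
# Sketch of a lint/documentation CI job (GitHub Actions syntax).
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Promote Clippy warnings to errors so CI flags them.
      - run: cargo clippy --all-targets -- -Dwarnings
      # Build docs with warnings (including broken intra-doc links)
      # treated as errors.
      - run: cargo doc --no-deps
        env:
          RUSTDOCFLAGS: -Dwarnings
      # Formatting drift: --check reports differences without rewriting.
      - run: cargo fmt -- --check
      # Dependency hygiene via third-party tools (cargo-udeps needs a
      # nightly toolchain; both tools must be installed beforehand).
      - run: cargo +nightly udeps
      - run: cargo deny check
```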

You can also include CI steps that measure particular aspects of your code:

  • Generate code coverage statistics (e.g., with cargo-tarpaulin) to show what proportion of your codebase is exercised by your tests.
  • Run benchmarks (e.g., with cargo bench; Item 30) to measure the performance of your code on key scenarios. However, note that most CI systems run in shared environments where external factors can affect the results; getting more reliable benchmark data is likely to require a more dedicated environment.
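As a sketch (GitHub Actions syntax again; the cargo-tarpaulin output format shown is illustrative), a measurement job might look like:

```yaml
# Sketch of a measurement CI job.
jobs:
  measure:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: cargo install cargo-tarpaulin
      # Coverage statistics, in a machine-readable format suitable for
      # upload to whatever tracking system you compare against.
      - run: cargo tarpaulin --out Xml
      # Benchmarks; timings from shared runners are noisy, so treat
      # the results as indicative rather than authoritative.
      - run: cargo bench
```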

These measurement suggestions are a bit more complicated to set up, because the output of a measurement step is more useful when it's compared to previous results. In an ideal world, the CI system would detect when a code change is not fully tested or has an adverse effect on performance; this typically involves integration with some external tracking system.

Here are other suggestions for CI steps that may or may not be relevant for your codebase:

  • If your project is a library, recall (from Item 25) that any checked-in Cargo.lock file will be ignored by the users of your library. In theory, the semantic version constraints (Item 21) in Cargo.toml should mean that everything works correctly anyway; in practice, consider including a CI step that builds without any local Cargo.lock, to detect whether the current versions of dependencies still work correctly.
  • If your project includes any kind of machine-generated resources that are version-controlled (e.g., code generated from protocol buffer messages by prost), then include a CI step that regenerates the resources and checks that there are no differences compared to the checked-in version.
  • If your codebase includes platform-specific (e.g., #[cfg(target_arch = "arm")]) code, run CI steps that confirm that the code builds and (ideally) works on that platform. (The former is easier than the latter because the Rust toolchain includes support for cross-compilation.)
  • If your project manipulates secret values such as access tokens or cryptographic keys, consider including a CI step that searches the codebase for secrets that have been inadvertently checked in. This is particularly important if your project is public (in which case it may be worth moving the check from CI to a version-control presubmit check).
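The machine-generated-resources check can be sketched as a small shell function. Here `regen.sh` is a hypothetical project-specific regeneration script that writes its output into a given directory; adapt the names and paths to your own layout:

```shell
# Sketch: regenerate machine-generated resources into a scratch
# directory and fail if they differ from the checked-in copies.
check_generated() {
  src_dir="$1"     # directory holding the checked-in generated files
  regen="$2"       # hypothetical regeneration script
  out_dir=$(mktemp -d)
  "$regen" "$out_dir" || return 1
  # `diff -r` exits nonzero if the two trees differ anywhere.
  if ! diff -r "$src_dir" "$out_dir"; then
    echo "Generated resources are out of sync; regenerate and commit."
    return 1
  fi
}
```

In a CI job that has the repository checked out, an equivalent (and common) idiom is to regenerate in place and then run `git diff --exit-code`, which fails if any tracked file changed.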

CI checks don't always need to be integrated with Cargo and the Rust toolchains; sometimes a simple shell script can give more bang for the buck, particularly when a codebase has a local convention that's not universally followed. For example, a codebase might include a convention that any panic-inducing method invocation (Item 18) has a special marker comment or that every TODO: comment has an owner (a person or a tracking ID), and a shell script is ideal for checking this.
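For example, the TODO-owner convention can be enforced with a few lines of shell. This is a minimal sketch: the `TODO(owner):` comment style and the function name are assumed local conventions, not universal rules:

```shell
# Sketch of a local-convention check: every TODO comment must name an
# owner, so "TODO(alice): ..." is allowed but a bare "TODO: ..." fails.
check_todos() {
  dir="$1"
  # Match any TODO not immediately followed by "(owner)".
  bad=$(grep -rnE --include='*.rs' 'TODO($|[^(])' "$dir" || true)
  if [ -n "$bad" ]; then
    echo "TODO comments without an owner:"
    echo "$bad"
    return 1
  fi
  return 0
}
```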

Finally, consider examining the CI systems of public Rust projects to get ideas for additional CI steps that might be useful for your project. For example, Cargo has a CI system that includes many steps that may provide inspiration.

CI Principles

Moving from the specific to the general, there are some overall principles that should guide the details of your CI system.

The most fundamental principle is don't waste the time of humans. If a CI system unnecessarily wastes people's time, they will start looking for ways to avoid it.

The most annoying waste of an engineer's time is a flaky test: sometimes it passes and sometimes it fails, even when the setup and codebase are identical. Whenever possible, be ruthless with flaky tests: hunt them down, and put in the time up front to investigate and fix the cause of the flakiness—it will pay for itself in the long run.

Another common waste of engineering time is a CI system that takes a long time to run and that runs only after a request for a code review has been triggered. In this situation, there's the potential to waste two people's time: both the author and the code reviewer, who may spend time spotting and pointing out issues with the code that the CI bots could have flagged.

To help with this, try to make it easy to run the CI checks manually, independent from the automated system. This allows engineers to get into the habit of triggering them regularly so that code reviewers never even see problems that the CI would have flagged. Better still, make the integration even more continuous by incorporating some of the tools into your editor or IDE setup so that (for example) poorly formatted code never even makes it to disk.

This may also require splitting up the checks if there are time-consuming tests that rarely find problems but are there as a backstop to prevent obscure scenarios from breaking.

More generally, a large project may need to divide up its CI checks according to the cadence at which they are run:

  • Checks that are integrated into each engineer's development environment (e.g., rustfmt)
  • Checks that run on every code review request (e.g., cargo build, cargo clippy) and are easy to run manually
  • Checks that run on every change that makes it to the main branch of the project (e.g., full cargo test in all supported environments)
  • Checks that run at scheduled intervals (e.g., daily or weekly), which can catch rare regressions after the fact (e.g., long-running integration tests and benchmark comparison tests)
  • Checks that run on the current code at all times (e.g., fuzz tests)

It's important that the CI system be integrated with whatever code review system is used for your project, so that a reviewer can clearly see a green set of checks and be confident that the review can focus on the important meaning of the code, not on trivial details.

This need for a green build also means that there can be no exceptions to whatever checks your CI system has put in place. This is worthwhile even if you have to work around an occasional false positive from a tool; once your CI system has an accepted failure ("Oh, everyone knows that test never passes"), then it's vastly harder to spot new regressions.

Item 30 included the common advice of adding a test to reproduce a bug before fixing the bug. The same principle applies to your CI system: when you discover a process problem, add a CI step that detects the problem before fixing it. For example, if you discover that some auto-generated code has gotten out of sync with its source, add a check for this to the CI system. The check will initially fail but then turn green once the problem is solved—giving you confidence that this category of process error will not occur again in the future.

Public CI Systems

If your codebase is open source and visible to the public, there are a few extra things to think about with your CI system.

First, the good news: there are lots of free, reliable options for building a CI system for open source code. At the time of writing, GitHub Actions is probably the best choice, but it's far from the only option, and more systems appear all the time.

Second, for open source code it's worth bearing in mind that your CI system can act as a guide for how to set up any prerequisites needed for the codebase. This isn't a concern for pure Rust crates, but if your codebase requires additional dependencies—databases, alternative toolchains for FFI code, configuration, etc.—then your CI scripts will be an existence proof of how to get all of that working on a fresh system. Encoding these setup steps in reusable scripts allows both the humans and the bots to get a working system in a straightforward way.

Finally, there's bad news for publicly visible crates: the possibility of abuse and attacks. This can range from attempts to perform cryptocurrency mining in your CI system to theft of codebase access tokens, supply chain attacks, and worse. To mitigate these risks, consider these guidelines:

  • Restrict access so that CI scripts run automatically only for known collaborators and have to be triggered manually for new contributors.
  • Pin any external scripts or actions to particular versions, or (better yet) to specific known hashes.
  • Closely monitor any integration steps that need more than just read access to the codebase.
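For example, under GitHub Actions an external action can be pinned to a full commit hash rather than a mutable tag; the hash below is a placeholder, not a real commit:

```yaml
steps:
  # A tag like `@v4` can be re-pointed by the action's maintainers,
  # but a full commit SHA cannot silently change underneath you.
  # Placeholder hash shown; substitute the real commit you audited.
  - uses: actions/checkout@0123456789abcdef0123456789abcdef01234567
```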

1. If your code relies on particular features that are available only in the nightly compiler, a rust-toolchain.toml file also makes that toolchain dependency clear.