Item 28: Use macros judiciously

"In some cases it's easy to decide to write a macro instead of a function, because only a macro can do what's needed" – Paul Graham, "On Lisp"

Rust's macro systems allow you to perform metaprogramming: to write code that emits code into your project. This is most valuable when there are chunks of "boilerplate" code that are deterministic and repetitive, and which would otherwise need to be kept in sync manually.

The macros that programmers coming to Rust are most likely to have previously encountered are those provided by C/C++'s preprocessor. However, the Rust approach is a completely different beast – where the C preprocessor performs textual substitution on the tokens of the input text, Rust macros instead operate on the abstract syntax tree (AST) of the program.

This means that Rust macros can be aware of code structure and can consequently avoid entire classes of macro-related footguns. In particular, Rust macros are hygienic – they cannot accidentally refer to ("capture") local variables in the surrounding code.
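
As a minimal sketch of what hygiene means for a declarative macro (the `add_ten!` macro and `hygiene_demo` function are hypothetical names used only for illustration), the local `x` introduced inside the macro body does not interfere with the `x` that the caller passes in:

```rust
// Hypothetical macro: it declares its own local `x` in the expansion.
macro_rules! add_ten {
    ($e:expr) => {{
        let x = 10; // macro-internal `x`, invisible to the caller's expression
        $e + x
    }};
}

pub fn hygiene_demo() -> i32 {
    let x = 1; // the caller's `x`
    // The `x` inside the argument still refers to the caller's variable,
    // so this evaluates (1 * 2) + 10.
    add_ten!(x * 2)
}
```

With textual substitution in the style of the C preprocessor, the argument's `x` would instead have picked up the macro's inner variable.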

One way to think about macros is to see them as a different level of abstraction in the code. A simple form of abstraction is a function: it abstracts away the differences between different values of the same type, with implementation code that can use any of the features and methods of that type, regardless of the current value being operated on. A generic is a different level of abstraction: it abstracts away the difference between different types that satisfy a trait bound, with implementation code that can use any of the methods provided by the trait bounds, regardless of the current type being operated on.

A macro abstracts away the difference between different chunks of the AST that play the same role (type, identifier, expression, etc.); the implementation can then include any code that makes use of those chunks in the same AST role.
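
The three levels can be put side by side in a small sketch. All of the names here (`describe_u8`, `describe`, `make_wrapper!`, `Meters`, `Label`) are hypothetical and exist only to illustrate the idea:

```rust
use std::fmt::Display;

// Function: abstracts over different *values* of a single type.
pub fn describe_u8(v: u8) -> String {
    format!("value {}", v)
}

// Generic: abstracts over different *types* that satisfy a trait bound.
pub fn describe<T: Display>(v: T) -> String {
    format!("value {}", v)
}

// Macro: abstracts over chunks of syntax that play the same role,
// here an identifier (the new type's name) and a type (its contents).
macro_rules! make_wrapper {
    ($name:ident, $inner:ty) => {
        #[derive(Debug, PartialEq)]
        pub struct $name(pub $inner);

        impl $name {
            pub fn get(&self) -> &$inner {
                &self.0
            }
        }
    };
}

// Each invocation emits a distinct struct plus its `impl` block.
make_wrapper!(Meters, f64);
make_wrapper!(Label, String);
```

Neither a function nor a generic could introduce the new type names `Meters` and `Label`; only the macro operates at the level where an identifier is itself a parameter.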

Macro Basics

Although this Item isn't the place to reproduce the documentation for macros, a few reminders of details to watch out for are in order.

First, be aware that the scoping rules for using a macro differ from those for other Rust items. If a declarative macro is defined in a source code file, only the code after the macro definition can make use of it:

fn before() {
    println!("double {} is {}", 2, double!(2));
}

macro_rules! double {
    { $e:expr } => { $e * $e }
}

fn after() {
    println!("double {} is {}", 2, double!(2));
}

error: cannot find macro `double` in this scope
 --> macros/src/main.rs:4:36
  |
4 |     println!("double {} is {}", 2, double!(2));
  |                                    ^^^^^^
  |
  = help: have you added the `#[macro_use]` on the module/import?
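
The `#[macro_use]` attribute mentioned in the compiler's hint can be sketched as follows. The module `helpers` and the macro `twice!` are hypothetical names; the point is that marking the module with `#[macro_use]` keeps its macros in scope after the module ends, although the module must still appear earlier in the file than the code that uses the macro:

```rust
// Without `#[macro_use]`, `twice!` would go out of scope at the end of `helpers`.
#[macro_use]
mod helpers {
    macro_rules! twice {
        ($e:expr) => {
            $e + $e
        };
    }
}

// This function comes textually after the module, so the macro is visible here.
pub fn uses_twice() -> i32 {
    twice!(21)
}
```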

The #[macro_export] attribute makes a macro more widely visible, but this also has an oddity: the macro appears at the top level of a crate, even if it is defined in a module:

mod defn {
    #[macro_export]
    macro_rules! treble {
        { $e:expr } => { $e * $e * $e }
    }
}

mod user {
    pub fn use_macro() {
        // Note: *not* `crate::defn::treble!`
        let cubed = crate::treble!(3);
        println!("treble {} is {}", 3, cubed);
    }
}

Procedural macros (which are macros that get access to the program's syntax tree at compile time) also have a limitation around code location: they must be defined in a separate crate (one with the `proc-macro` crate type) from where they are used.

Even though Rust's macros are safer than C preprocessor macros, there are still a couple of minor gotchas to be aware of in their use.

The first is to realize that even if a macro invocation looks like a function invocation, it's not. In particular, the normal intuition about whether parameters are moved or &referred to doesn't apply:

    let mut x = Item { contents: 42 }; // type is not `Copy`
    inc!(x); // Item is *not* moved, despite the (x) syntax, and *can* be modified
    println!("x is {:?}", x);

The exclamation mark serves as a warning: the expanded code for the macro may do arbitrary things to/with its arguments.
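
The snippet above relies on definitions that are not shown. A self-contained sketch, assuming a plausible shape for `Item` and `inc!` (both definitions here are guesses made for illustration, as is the `inc_demo` function), might look like this:

```rust
// Assumed definition: a simple non-`Copy` struct.
#[derive(Debug)]
pub struct Item {
    pub contents: u32,
}

// Assumed definition: the macro takes an identifier and mutates it in place.
macro_rules! inc {
    ($target:ident) => {
        $target.contents += 1;
    };
}

pub fn inc_demo() -> Item {
    let mut x = Item { contents: 42 };
    // Expands to `x.contents += 1;` so `x` is neither moved nor explicitly borrowed.
    inc!(x);
    // `x` is still owned here and reflects the modification.
    x
}
```

A function with the signature `fn inc(x: Item)` could not behave this way: it would take ownership of `x`, and the caller would be unable to use the value afterwards.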

The expanded code can also include control flow operations that aren't visible in the calling code, whether they be loops, conditionals, return statements, or use of the ? operator. Obviously, this is likely to violate the principle of least astonishment, so prefer macros whose behaviour aligns with normal Rust where possible.

For example, a macro that silently includes a return in its body:

/// Check that an HTTP status is successful; exit function if not.
macro_rules! check_successful {
    { $e:expr } => {
        if $e.group() != Group::Successful {
            return Err(MyError("HTTP operation failed"));
        }
    }
}

makes the control flow of the calling code somewhat obscure:

    let rc = perform_http_operation();
    check_successful!(rc); // may silently exit the function

    // ...

An alternative version of the macro that generates code which emits a Result:

/// Convert an HTTP status into a `Result<(), MyError>` indicating success.
macro_rules! check_success {
    { $e:expr } => {
        match $e.group() {
            Group::Successful => Ok(()),
            _ => Err(MyError("HTTP operation failed")),
        }
    }
}

gives calling code that's easier to follow:

    let rc = perform_http_operation();
    check_success!(rc)?; // error flow is visible via `?`

    // ...
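
To make the second version concrete, here is a self-contained sketch. The `Group`, `MyError` and `Status` types are simplified stand-ins (assumptions, not the definitions used elsewhere in this Item), and `handle` is a hypothetical caller:

```rust
// Simplified stand-in types, assumed for illustration only.
#[derive(Debug, PartialEq)]
pub enum Group {
    Successful,
    ClientError,
}

#[derive(Debug, PartialEq)]
pub struct MyError(pub &'static str);

pub struct Status(pub u16);

impl Status {
    pub fn group(&self) -> Group {
        if (200..300).contains(&self.0) {
            Group::Successful
        } else {
            Group::ClientError
        }
    }
}

/// Convert an HTTP status into a `Result<(), MyError>` indicating success.
macro_rules! check_success {
    { $e:expr } => {
        match $e.group() {
            Group::Successful => Ok(()),
            _ => Err(MyError("HTTP operation failed")),
        }
    }
}

pub fn handle(rc: Status) -> Result<&'static str, MyError> {
    // The early exit is visible at the call site because of the `?`.
    check_success!(rc)?;
    Ok("done")
}
```

Here the macro only computes a value; the decision to leave the function stays with the caller.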

The second thing to watch out for with declarative macros is a problem shared with the C preprocessor: if the argument to a macro is an expression with side effects, every use of the argument in the macro body repeats those side effects:

    let mut x = 1;
    let y = double!({
        x += 1;
        x
    });
    println!("x = {}, y = {}", x, y);
    // output: x = 3, y = 6

Assuming that this behaviour isn't intended, the solution is simply to evaluate the expression once, and assign the result to a local variable:

macro_rules! double_once {
    { $e:expr } => { { let x = $e; x*x } }
}
// output now: x = 2, y = 4
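
The two behaviours can be compared directly in one small sketch. It reuses the macro definitions shown above (with the inner variable renamed to `v`), while the `side_effects` function is a hypothetical harness added for illustration:

```rust
// Evaluates its argument twice.
macro_rules! double {
    { $e:expr } => { $e * $e }
}

// Evaluates its argument once and reuses the stored result.
macro_rules! double_once {
    { $e:expr } => { { let v = $e; v * v } }
}

pub fn side_effects() -> (i32, i32, i32, i32) {
    let mut x = 1;
    // The block runs twice: x becomes 2, then 3, and the results are multiplied.
    let y = double!({
        x += 1;
        x
    });

    let mut a = 1;
    // The block runs once: a becomes 2, and that single result is reused.
    let b = double_once!({
        a += 1;
        a
    });

    (x, y, a, b) // (3, 6, 2, 4)
}
```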

When to use Macros

The primary reason to use macros is to avoid repetitive code. In this respect, writing a macro is just an extension of the same kind of generalization process that normally forms part of programming:

  • If you repeat exactly the same code for multiple instances of a specific type, encapsulate that code into a common function and call the function from all of the repeated places.
  • If you repeat exactly the same code for multiple different types, encapsulate that code into a generic and trait bound, and use the generic from all of the repeated places.
  • If you repeat the same structure of code in multiple different places, encapsulate that code into a macro, and use the macro from all of the repeated places.

For example, avoiding repetition for code that works on different enum variants can only be done by a macro:

enum Multi {
    Byte(u8),
    Int(i32),
    Str(String),
}

#[macro_export]
macro_rules! values_of_type {
    { $values:expr, $variant:ident } => {
        {
            let mut result = Vec::new();  // explicit use of Vec allows type deduction
            for val in $values {
                if let Multi::$variant(v) = val {
                    result.push(v.clone());
                }
            }
            result
        }
    }
}

fn use_multi() {
    let values = vec![
        Multi::Byte(1),
        Multi::Int(1000),
        Multi::Str("a string".to_string()),
        Multi::Byte(2),
    ];
    let ints = values_of_type!(&values, Int);
    assert_eq!(ints, vec![1000]);
    let bytes = values_of_type!(&values, Byte);
    assert_eq!(bytes, vec![1u8, 2u8]);
}

Macros also allow for grouping all of the key information about a collection of data values together:

http_codes! {
    Continue => (100, Informational, "Continue"),
    SwitchingProtocols => (101, Informational, "Switching Protocols"),
    // ...
    Ok => (200, Successful, "Ok"),
    Created => (201, Successful, "Created"),
    // ...
}

Only the information that changes between different values is encoded, in a compact form that acts as a kind of domain-specific language (DSL) holding the source-of-truth for the data. The macro definition then takes care of generating all of the code that derives from these values:

macro_rules! http_codes {
    { $( $name:ident => ($val:literal, $group:ident, $text:literal), )+ } => {
        #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
        #[repr(i32)]
        enum Status {
            $( $name = $val, )+
        }
        impl Status {
            fn group(&self) -> Group {
                match self {
                    $( Self::$name => Group::$group, )+
                }
            }
            fn text(&self) -> &'static str {
                match self {
                    $( Self::$name => $text, )+
                }
            }
        }
        impl TryFrom<i32> for Status {
            type Error = ();
            fn try_from(v: i32) -> Result<Self, Self::Error> {
                match v {
                    $( $val => Ok(Self::$name), )+
                    _ => Err(())
                }
            }
        }
    }
}

If an extra value needs to be added later, rather than having to manually adjust four different places, all that's needed is a single additional line:

    ImATeapot => (418, ClientError, "I'm a teapot"),
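
To show how the generated code might be used, the following sketch repeats the macro in condensed form and invokes it with just three entries. The `Group` enum is not shown in the text above, so the definition here is an assumption, and the generated items are marked `pub` purely for the sake of the example:

```rust
use std::convert::TryFrom;

// Assumed definition of the grouping enum referenced by the macro.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Group {
    Informational,
    Successful,
    ClientError,
}

macro_rules! http_codes {
    { $( $name:ident => ($val:literal, $group:ident, $text:literal), )+ } => {
        #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
        #[repr(i32)]
        pub enum Status {
            $( $name = $val, )+
        }
        impl Status {
            pub fn group(&self) -> Group {
                match self { $( Self::$name => Group::$group, )+ }
            }
            pub fn text(&self) -> &'static str {
                match self { $( Self::$name => $text, )+ }
            }
        }
        impl TryFrom<i32> for Status {
            type Error = ();
            fn try_from(v: i32) -> Result<Self, Self::Error> {
                match v {
                    $( $val => Ok(Self::$name), )+
                    _ => Err(()),
                }
            }
        }
    }
}

// A shortened table: one line per status code is the single source of truth.
http_codes! {
    Continue => (100, Informational, "Continue"),
    Ok => (200, Successful, "Ok"),
    ImATeapot => (418, ClientError, "I'm a teapot"),
}
```

After this invocation, `Status::try_from(418)` yields `Ok(Status::ImATeapot)`, and `Status::ImATeapot.group()` and `.text()` return the values from the same table row, without any of those three pieces of generated code being edited by hand.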

Because macros are expanded in-place in the invoking code, they can also be used to automatically emit additional diagnostic information – in particular, by using the file!() and line!() macros from the standard library that emit source code information:

macro_rules! diags {
    { $e:expr } => {
        {
            let result = $e;
            if let Err(err) = &result {
                log::error!("{}:{}: operation '{}' failed: {:?}",
                            file!(),
                            line!(),
                            stringify!($e),
                            err);
            }
            result
        }
    }
}

When failures occur, the log file then automatically includes details of what failed and where:

    let x: Result<u8, _> = diags!(512.try_into());
    let y = diags!(std::str::from_utf8(b"\xc3\x28")); // invalid UTF-8

[2023-04-16T08:54:14Z ERROR macros] macros/src/main.rs:239: operation '512.try_into()' failed: TryFromIntError(())
[2023-04-16T08:54:14Z ERROR macros] macros/src/main.rs:240: operation 'std::str::from_utf8(b"\xc3\x28")' failed: Utf8Error { valid_up_to: 0, error_len: Some(1) }
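
The version above depends on the external `log` crate. For experimenting without any dependencies, a variant that writes to standard error with `eprintln!` behaves the same way as far as the source location is concerned; this substitution, and the `parse_port` helper, are assumptions made for the sketch:

```rust
// Same structure as the macro above, but using `eprintln!` from the
// standard library in place of `log::error!`.
macro_rules! diags {
    { $e:expr } => {
        {
            let result = $e;
            if let Err(err) = &result {
                // `file!()` and `line!()` report the location of the
                // `diags!` invocation, not of this macro definition.
                eprintln!("{}:{}: operation '{}' failed: {:?}",
                          file!(),
                          line!(),
                          stringify!($e),
                          err);
            }
            result
        }
    }
}

// Hypothetical caller: the result passes through unchanged, with a
// diagnostic line emitted only on failure.
pub fn parse_port(s: &str) -> Result<u16, std::num::ParseIntError> {
    diags!(s.parse::<u16>())
}
```

A helper function could log the error just as easily, but it would need the caller to pass in the location and the text of the expression; the macro captures both at the invocation site.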

Disadvantages of Macros

The primary disadvantage of using a macro is the impact that it has on code readability and maintainability. The previous section explained that macros allow you to create a domain-specific language to concisely express key features of your code and data. However, this means that anyone reading or maintaining the code now has to understand this DSL – and its implementation in macro definitions – in addition to understanding Rust.

This potential impenetrability of macro-based code extends beyond other engineers: many of the tools that analyze and interact with Rust code may treat the code as opaque, because it no longer necessarily follows the syntactical conventions of Rust code. Even the compiler itself is less helpful: its error messages don't always follow the chain of macro use and definition.

Another possible downside for macro use is the possibility of code bloat – a single line of macro invocation can result in hundreds of lines of generated code, which will be invisible to a cursory survey of the code. This is rarely a problem when the code is first written, because at that point the code is needed and saves the humans involved from having to write it themselves; however, if the code subsequently stops being necessary, it's not so obvious that there are large amounts of code that could be deleted.

Advice

Although the previous section listed some downsides of macros, they are still fundamentally the right tool for the job when there are different chunks of code that need to be kept consistent, but which cannot be coalesced any other way: use a macro whenever it's the only way to ensure that disparate code stays in sync.

Macros are also the tool to reach for when there's boilerplate code to be squashed: use a macro for repeated boilerplate code that can't be coalesced into a function or a generic.

To reduce the impact on readability, try to avoid syntax in your macros that clashes with Rust's normal syntax rules; either make the macro invocation look like normal code, or make it look sufficiently different that no-one could confuse the two. In particular:

  • Avoid macro expansions that insert references where possible – a macro invocation like my_macro!(&list) aligns better with normal Rust code than my_macro!(list) would.
  • Prefer to avoid non-local control flow operations in macros, so that anyone reading the code is able to follow the flow without needing to know the details of the macro.
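
The first of these points can be illustrated with two hypothetical macros that compute the same total (the names `total_hidden_ref!`, `total!` and `totals` are invented for this sketch):

```rust
// Less clear: the expansion silently takes a reference to its argument,
// so the call site looks as if `list` is being passed by value.
macro_rules! total_hidden_ref {
    ($v:expr) => {{
        let mut t = 0;
        for n in &$v {
            t += *n;
        }
        t
    }};
}

// Clearer: the caller writes the `&`, just as for a function call.
macro_rules! total {
    ($v:expr) => {{
        let mut t = 0;
        for n in $v {
            t += *n;
        }
        t
    }};
}

pub fn totals() -> (i32, i32) {
    let list = vec![1, 2, 3];
    let a = total_hidden_ref!(list); // looks like a move, but is not one
    let b = total!(&list); // the borrow is visible to the reader
    (a, b)
}
```

Both invocations leave `list` usable afterwards, but only the second makes that obvious without reading the macro definition.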

This preference for Rust-like readability sometimes affects the choice between declarative macros and procedural macros. If you need to emit code for each field of a structure, or each variant of an enum, prefer a derive macro to a macro that emits the whole type (despite the example shown in an earlier section); it's more idiomatic and makes the code easier to read.

However, if you're adding a derive macro with functionality that's not specific to your project, check whether an external crate already provides what you need (cf. Item 25). For example, the problem of converting integer values into the appropriate variant of a C-like enum is well-covered: all of enumn::N, num_enum::TryFromPrimitive, num_derive::FromPrimitive, and strum::FromRepr cover some aspect of this problem.