Item 2: Use the type system to express common behaviour

Item 1 discussed how to express data structures in the type system; this Item moves on to discuss the encoding of behaviour in Rust's type system.

The first stage of this is to add methods to data structures: functions that act on an item of that type, identified by self. Methods can be added to struct types, but can also be added to enum types, in keeping with the pervasive nature of Rust's enum (Item 1). The name of a method gives a label for the behaviour that it encodes, and the method signature gives type information for how to invoke it.
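
As a minimal sketch of the point about enums, the hypothetical `Shape` type below (an illustration, not something defined elsewhere in this Item) gets an `area` method in exactly the way a struct would:

```rust
/// Hypothetical shape type, used only for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Shape {
    Rectangle { width: f64, height: f64 },
    Square(f64),
}

impl Shape {
    /// A method on an enum: `self` identifies the item being acted on.
    pub fn area(&self) -> f64 {
        match self {
            Shape::Rectangle { width, height } => width * height,
            Shape::Square(side) => side * side,
        }
    }
}

fn main() {
    let shapes = [
        Shape::Rectangle { width: 2.0, height: 3.0 },
        Shape::Square(4.0),
    ];
    for shape in &shapes {
        println!("{:?} has area {}", shape, shape.area());
    }
}
```

The method name (`area`) labels the behaviour, and the signature (`&self -> f64`) tells callers how to invoke it.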

Code that needs to make use of behaviour associated with a type can accept an item of that type (or a reference to it), and invoke the methods needed. However, this tightly couples the two parts of the code; the code that invokes the method only accepts exactly one input type.

If greater flexibility is needed, the desired behaviour can be abstracted into the type system. The simplest such abstraction is the function pointer: a pointer to (just) some code, with a type that reflects the signature of the function. The type is checked at compile time, so by the time the program runs the value is just the size of a pointer.

    fn sum(x: i32, y: i32) -> i32 {
        x + y
    }
    // Explicit coercion to `fn` type is required...
    let op: fn(i32, i32) -> i32 = sum;

Function pointers have no other data associated with them, so they can be treated as values in various ways:

    // `fn` types implement `Copy`
    let op1 = op;
    let op2 = op;
    // `fn` types implement `Eq`
    assert!(op1 == op2);
    // `fn` implements `std::fmt::Pointer`, used by the {:p} format specifier.
    println!("op = {:p}", op);
    // Example output: "op = 0x101e9aeb0"
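
Being plain values, function pointers can also be stored in collections and passed as arguments. The following is a sketch under assumed names: `product` and `apply` are hypothetical helpers introduced only for this example.

```rust
fn sum(x: i32, y: i32) -> i32 {
    x + y
}

fn product(x: i32, y: i32) -> i32 {
    x * y
}

/// Invoke whichever operation the caller passes in.
fn apply(op: fn(i32, i32) -> i32, x: i32, y: i32) -> i32 {
    op(x, y)
}

fn main() {
    // The array's element type forces each function to coerce to `fn`.
    let ops: [fn(i32, i32) -> i32; 2] = [sum, product];
    for op in ops {
        println!("result = {}", apply(op, 3, 4));
    }
}
```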

One technical detail to watch out for: the explicit coercion to a fn type is needed, because just using the name of a function doesn't give you something of fn type:

        let op1 = sum;
        let op2 = sum;
        // Both op1 and op2 are of a type that cannot be named in user code,
        // and this internal type does not implement `Eq`.
        assert!(op1 == op2);
error[E0369]: binary operation `==` cannot be applied to type `fn(i32, i32) -> i32 {main::sum}`
  --> use-types-behaviour/src/main.rs:53:21
   |
53 |         assert!(op1 == op2);
   |                 --- ^^ --- fn(i32, i32) -> i32 {main::sum}
   |                 |
   |                 fn(i32, i32) -> i32 {main::sum}
   |
help: you might have forgotten to call this function
   |
53 |         assert!(op1( /* arguments */ ) == op2);
   |                 ^^^^^^^^^^^^^^^^^^^^^^
help: you might have forgotten to call this function
   |
53 |         assert!(op1 == op2( /* arguments */ ));
   |                        ^^^^^^^^^^^^^^^^^^^^^^

Instead, the compiler error indicates that the type is something like fn(i32, i32) -> i32 {main::sum}, a type that's entirely internal to the compiler (i.e. could not be written in user code), and which identifies the specific function as well as its signature. To put it another way, the type of sum encodes both the function's signature and its location (for optimization reasons); this type can be automatically coerced (Item 6) to a fn type.
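
One way to observe the difference between the two types is to compare their sizes. The sketch below assumes a hypothetical helper `as_pointer`, whose return type forces the coercion. A value of the function's own type needs no storage, because the type already says which function it is, whereas the coerced value is an ordinary pointer.

```rust
use std::mem::{size_of, size_of_val};

fn sum(x: i32, y: i32) -> i32 {
    x + y
}

/// Coerce the function item `sum` to a plain function pointer.
fn as_pointer() -> fn(i32, i32) -> i32 {
    // The declared return type triggers the coercion.
    sum
}

fn main() {
    // The function's own type identifies `sum` completely, so a value of
    // that type occupies no space.
    let item = sum;
    println!("size of function item    = {}", size_of_val(&item));
    // After coercion the value is pointer-sized.
    let op = as_pointer();
    println!("size of function pointer = {}", size_of_val(&op));
    println!("size of usize            = {}", size_of::<usize>());
    // An `as` cast performs the same coercion.
    let op2 = sum as fn(i32, i32) -> i32;
    println!("op(1, 2) = {}, op2(1, 2) = {}", op(1, 2), op2(1, 2));
}
```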

Bare function pointers are very limiting, in two ways:

  • The data provided when invoking a function pointer is limited to just what's held in its arguments (along with any global data).
  • The only information encoded in the function pointer type is the signature of this particular function.

For the first of these, Rust supports closures: chunks of code defined by lambda expressions which can capture parts of their environment. At runtime, Rust automatically converts a lambda together with its captured environment into a closure that implements one of Rust's Fn* traits, and this closure can in turn be invoked.

    let amount_to_add = 2;
    let closure = |y| y + amount_to_add;
    assert_eq!(closure(5), 7);

The three different Fn* traits express some fine distinctions that are needed because of this environment-capturing behaviour; the compiler automatically implements the appropriate subset of these Fn* traits for any lambda expression in the code (and it's not possible to manually implement any of these traits[1], unlike C++'s operator() overload).

  • FnOnce describes a closure that can only be called once. If some part of its environment is moved into the closure, then that move can only happen once – there's no other copy of the source item to move from – and so the closure can only be invoked once.
  • FnMut describes a closure that can be called repeatedly, and which can make changes to its environment because it mutably borrows from the environment.
  • Fn describes a closure that can be called repeatedly, and which only borrows values from the environment immutably.
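
The three cases can be sketched as follows. The helper functions `call_once`, `call_twice_mut` and `call_twice` are hypothetical names chosen for this illustration; each one asks for a different Fn* trait, and each closure in `main` captures its environment in a different way.

```rust
/// Accepts any closure, but may invoke it only once.
fn call_once<F: FnOnce() -> String>(f: F) -> String {
    f()
}

/// May invoke the closure repeatedly; the closure may mutate captured state.
fn call_twice_mut<F: FnMut() -> u32>(mut f: F) -> u32 {
    f();
    f()
}

/// May invoke the closure repeatedly; the closure only reads captured state.
fn call_twice<F: Fn(u32) -> u32>(f: F) -> u32 {
    f(1) + f(2)
}

fn main() {
    // Moves `greeting` out of the closure when called: `FnOnce` only.
    let greeting = String::from("hello");
    let consume = move || greeting;
    println!("{}", call_once(consume));

    // Mutably borrows `count`: `FnMut` (and so also `FnOnce`), but not `Fn`.
    let mut count = 0u32;
    let increment = || {
        count += 1;
        count
    };
    println!("{}", call_twice_mut(increment));

    // Immutably borrows `offset`: implements all three traits.
    let offset = 10u32;
    let add_offset = |x: u32| x + offset;
    println!("{}", call_twice(add_offset));
}
```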

The latter two traits in this list each have a trait bound of the preceding trait: a closure that can be repeatedly called with immutable references (Fn) is also safe to call with mutable references (FnMut), and a closure that can be called repeatedly with mutable references (FnMut) is also safe to call once, with moved items rather than mutable references (FnOnce). The bare function pointer type fn also notionally belongs at the end of this list; any (not-unsafe) fn type automatically implements all of the Fn* traits, because it borrows nothing from the environment.

As a result, when writing code that accepts closures, use the most general Fn* trait that works, to allow the greatest flexibility for callers – for example, accept FnOnce for closures that are only used once. The same reasoning also leads to advice to prefer Fn* trait bounds to bare function pointers (fn).
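
A small sketch of why the trait bound is the more flexible choice; `apply_pointer`, `apply_generic` and `double` are hypothetical names used only here. The generic version asks for `FnOnce` because it invokes its argument a single time.

```rust
/// Restrictive: only accepts values that coerce to a bare function pointer.
fn apply_pointer(op: fn(i32) -> i32, value: i32) -> i32 {
    op(value)
}

/// Flexible: accepts functions, function pointers and capturing closures.
/// `FnOnce` is sufficient because `op` is invoked exactly once.
fn apply_generic<F>(op: F, value: i32) -> i32
where
    F: FnOnce(i32) -> i32,
{
    op(value)
}

fn double(x: i32) -> i32 {
    x * 2
}

fn main() {
    let amount_to_add = 2;
    // Both forms accept a plain function.
    println!("{}", apply_pointer(double, 5));
    println!("{}", apply_generic(double, 5));
    // Only the generic form accepts a closure that captures its environment.
    println!("{}", apply_generic(|y| y + amount_to_add, 5));
    // apply_pointer(|y| y + amount_to_add, 5); // does not compile
}
```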

The Fn* traits are more flexible than a bare function pointer, but they can still only describe the behaviour of a single function, and that only in terms of the function's signature. Continuing to generalize, collections of related operations are described in the type system by a trait: a collection of related methods that some underlying item makes publicly available. Each method in a trait also has a name, providing a label which allows the compiler to disambiguate methods with the same signature, and more importantly which allows programmers to deduce the intent of the method.

A Rust trait is roughly analogous to an "interface" in Go and Java, or to an "abstract class" (all virtual methods, no data members) in C++. Implementations of the trait must provide all the methods (but note that the trait definition can include a default implementation, Item 12), and can also have associated data that those implementations make use of. This means that code and data get encapsulated together, in a somewhat object-oriented manner.

Returning to the original situation, code that accepts a struct and calls methods on it is more flexible if instead the struct implements some trait, so that the calling code invokes trait methods rather than struct methods. This leads to the same kind of advice that turns up for other OO-influenced languages[2]: prefer accepting trait types to concrete types if future flexibility is anticipated.
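
To make the contrast concrete, here is a sketch with an assumed trait `Named` and two assumed types, `Dog` and `Robot` (all hypothetical). `greet_dog` is tied to a single type, whereas `greet` works for anything that implements the trait.

```rust
/// Hypothetical trait: the behaviour that the calling code relies on.
pub trait Named {
    fn name(&self) -> String;
}

pub struct Dog;

pub struct Robot {
    pub serial: u32,
}

impl Named for Dog {
    fn name(&self) -> String {
        String::from("Rex")
    }
}

impl Named for Robot {
    fn name(&self) -> String {
        format!("unit-{}", self.serial)
    }
}

/// Tightly coupled: accepts exactly one concrete type.
pub fn greet_dog(dog: &Dog) -> String {
    format!("Hello, {}!", dog.name())
}

/// Flexible: accepts any type that implements `Named`.
pub fn greet<T: Named>(item: &T) -> String {
    format!("Hello, {}!", item.name())
}

fn main() {
    println!("{}", greet_dog(&Dog));
    println!("{}", greet(&Dog));
    println!("{}", greet(&Robot { serial: 7 }));
}
```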

Sometimes, there is some behaviour that you want to distinguish in the type system, but which cannot be expressed as some specific method signature in a trait definition. For example, consider a trait for sorting collections; an implementation might be stable (elements that compare the same will appear in the same order before and after the sort) but there's no way to express this in the sort method arguments.

In this case, it's still worth using the type system to track this requirement, using a marker trait.

pub trait Sort {
    /// Re-arrange contents into sorted order.
    fn sort(&mut self);
}

/// Marker trait to indicate that a [`Sort`] implementation sorts stably.
pub trait StableSort: Sort {}

A marker trait has no methods, but an implementation still has to declare that it is implementing the trait – which acts as a promise from the implementer: "I solemnly swear that my implementation sorts stably". Code that relies on a stable sort can then specify the StableSort trait bound, relying on the honour system to preserve its invariants. Use marker traits to distinguish behaviours that cannot be expressed in the trait method signatures.
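
A sketch of how the promise and the bound fit together, using a hypothetical `Pairs` collection and a hypothetical `sort_preserving_ties` function. The `Pairs` implementation relies on the standard library's `sort_by_key`, which is documented as a stable sort.

```rust
pub trait Sort {
    /// Re-arrange contents into sorted order.
    fn sort(&mut self);
}

/// Marker trait to indicate that a [`Sort`] implementation sorts stably.
pub trait StableSort: Sort {}

/// Hypothetical collection of (key, label) pairs, ordered by key only.
pub struct Pairs(pub Vec<(u32, char)>);

impl Sort for Pairs {
    fn sort(&mut self) {
        // The standard slice `sort_by_key` is a stable sort.
        self.0.sort_by_key(|pair| pair.0);
    }
}

// The promise: the compiler checks nothing here, the implementer vouches.
impl StableSort for Pairs {}

/// Only accepts collections whose sort is promised to be stable.
pub fn sort_preserving_ties<T: StableSort>(mut collection: T) -> T {
    collection.sort();
    collection
}

fn main() {
    let pairs = Pairs(vec![(2, 'a'), (1, 'b'), (2, 'c'), (1, 'd')]);
    let sorted = sort_preserving_ties(pairs);
    println!("{:?}", sorted.0);
}
```

A type that implements only `Sort` would be rejected by `sort_preserving_ties` at compile time, which is the whole value of the marker.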

Once behaviour has been encapsulated into Rust's type system as a trait, there are two ways it can be used:

  • as a trait bound, which constrains what types are acceptable for a generic data type or method at compile-time, or
  • as a trait object, which constrains what types can be stored or passed to a method at run-time.

(Item 11 discusses which of the two you should prefer, where possible[3].)

A trait bound indicates that generic code which is parameterized by some type T can only be used when that type T implements some specific trait. The presence of the trait bound means that the implementation of the generic can use the methods from that trait, secure in the knowledge that the compiler will ensure that any T that compiles does indeed have those methods. This check happens at compile-time, when the generic is monomorphized (Rust's term for what C++ would call "template instantiation").

This restriction on the target type T is explicit, encoded in the trait bounds: the generic can only be used with types that satisfy the trait bounds. This is in contrast to the equivalent situation in C++, where the constraints on the type T used in a template<typename T> are implicit[4]: C++ template code still only compiles if all of the referenced methods are available at compile-time, but the checks are purely based on method name and signature. (This "duck typing" leads to the chance of confusion; a C++ template that uses t.pop() might compile for a T type parameter of either Stack or Balloon – which is unlikely to be desired behaviour.)

The need for explicit trait bounds also means that a large fraction of generics use trait bounds. To see why this is, turn the observation around and consider what can be done with a struct Thing<T> where there are no trait bounds on T. Without a trait bound, the Thing can only perform operations that apply to any type T; this allows for containers, collections and smart pointers, but not much else. Anything that uses the type T is going to need a trait bound.

use std::fmt::Debug;

pub fn dump_sorted<T>(mut collection: T)
where
    T: Sort + IntoIterator,
    T::Item: Debug,
{
    // Next line requires `T: Sort` trait bound.
    collection.sort();
    // Next line requires `T: IntoIterator` trait bound.
    for item in collection {
        // Next line requires `T::Item : Debug` trait bound
        println!("{:?}", item);
    }
}
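
For completeness, here is a sketch of calling such a function. It assumes a `Sort` trait with a single `sort` method and adds a hypothetical implementation of it for `Vec<T>`; that implementation is an illustration only, not something the text above defines.

```rust
use std::fmt::Debug;

pub trait Sort {
    /// Re-arrange contents into sorted order.
    fn sort(&mut self);
}

// Assumed implementation for illustration: delegate to the standard sort.
impl<T: Ord> Sort for Vec<T> {
    fn sort(&mut self) {
        // Call the slice method explicitly, so that there is no confusion
        // with the trait method being defined here.
        self.as_mut_slice().sort();
    }
}

pub fn dump_sorted<T>(mut collection: T)
where
    T: Sort + IntoIterator,
    T::Item: Debug,
{
    collection.sort();
    for item in collection {
        println!("{:?}", item);
    }
}

fn main() {
    // `Vec<i32>` satisfies every bound: `Sort` (via the impl above),
    // `IntoIterator`, and an `Item` type that implements `Debug`.
    dump_sorted(vec![3, 1, 2]);
    dump_sorted(vec!["pear", "apple"]);
}
```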

So the advice here is to use trait bounds to express requirements on the types used in generics, but it's easy advice to follow – the compiler will force you to comply with it regardless.

A trait object is the other way of making use of the encapsulation defined by a trait, but here different possible implementations of the trait are chosen at run-time rather than compile-time. This dynamic dispatch is analogous to the use of virtual functions in C++, and under the covers Rust has 'vtable' objects that are roughly analogous to those in C++.

This dynamic aspect of trait objects also means that they always have to be handled indirectly, via a reference (&dyn Trait) or a pointer (Box<dyn Trait>). This is because the size of the object implementing the trait isn't known at compile time – it could be a giant struct or a tiny enum – so there's no way to allocate the right amount of space for a bare trait object.
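
Both forms of indirection are sketched below with a hypothetical `Shape` trait and two hypothetical implementing types. The size comparison in `main` shows that a reference to a trait object is wider than a reference to a concrete type, since it carries a pointer to the vtable as well as a pointer to the data.

```rust
use std::mem::size_of;

/// Hypothetical trait, used only for illustration.
pub trait Shape {
    fn area(&self) -> f64;
}

pub struct Square(pub f64);
pub struct Rect(pub f64, pub f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

/// Trait object behind a reference.
pub fn area_of(shape: &dyn Shape) -> f64 {
    shape.area()
}

/// Trait objects behind owning pointers, so that one collection can hold
/// a mixture of implementing types.
pub fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|shape| shape.area()).sum()
}

fn main() {
    let shapes: Vec<Box<dyn Shape>> =
        vec![Box::new(Square(2.0)), Box::new(Rect(2.0, 3.0))];
    println!("total area = {}", total_area(&shapes));
    println!("single area = {}", area_of(&Square(3.0)));
    println!(
        "&dyn Shape is {} bytes, &Square is {} bytes",
        size_of::<&dyn Shape>(),
        size_of::<&Square>()
    );
    // let bare: dyn Shape = Square(1.0); // does not compile: unknown size
}
```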

A similar concern means that traits used as trait objects cannot have methods that return the Self type, because the compiled-in-advance code that uses the trait object would have no idea how big that Self might be.

A trait that has a generic method fn method<T>(t: T) allows for the possibility of an infinite number of implemented methods, for all the different types T that might exist. This is fine for a trait used as a trait bound, because the infinite set of possibly invoked generic methods becomes a finite set of actually invoked generic methods at compile time. The same is not true for a trait object: the code available at compile time has to cope with all possible Ts that might arrive at run-time.

These two restrictions – no returning Self and no generic methods – are combined into the concept of object safety. Only object safe traits can be used as trait objects.
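
The two restrictions can be sketched with three hypothetical traits: `Describe` is usable as a trait object, while `Duplicate` (which returns `Self`) and `Absorb` (which has a generic method) are only usable as trait bounds. The lines that would be rejected by the compiler are left commented out.

```rust
/// Usable as a trait object: no generic methods, nothing returns `Self`.
pub trait Describe {
    fn describe(&self) -> String;
}

/// Not usable as a trait object: `duplicate` returns `Self`.
pub trait Duplicate {
    fn duplicate(&self) -> Self;
}

/// Not usable as a trait object: `absorb` is generic over `T`.
pub trait Absorb {
    fn absorb<T: ToString>(&mut self, item: T);
}

pub struct Label(pub String);

impl Describe for Label {
    fn describe(&self) -> String {
        self.0.clone()
    }
}

impl Duplicate for Label {
    fn duplicate(&self) -> Self {
        Label(self.0.clone())
    }
}

impl Absorb for Label {
    fn absorb<T: ToString>(&mut self, item: T) {
        self.0.push_str(&item.to_string());
    }
}

/// Fine: `Describe` works behind `&dyn`.
pub fn describe_dyn(item: &dyn Describe) -> String {
    item.describe()
}

/// Also fine: the other two traits still work as trait bounds.
pub fn duplicate_and_absorb<T: Duplicate + Absorb>(item: &T) -> T {
    let mut copy = item.duplicate();
    copy.absorb(42);
    copy
}

// Neither of these signatures compiles:
// pub fn duplicate_dyn(item: &dyn Duplicate) {}
// pub fn absorb_dyn(item: &mut dyn Absorb) {}

fn main() {
    let label = Label(String::from("n="));
    println!("{}", describe_dyn(&label));
    println!("{}", duplicate_and_absorb(&label).0);
}
```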


1: At least, not in stable Rust at the time of writing. The unboxed_closures and fn_traits experimental features may change this in future.

2: For example, Effective Java Item 64: Refer to objects by their interfaces

3: Spoiler: trait bounds.

4: The addition of concepts in C++20 allows explicit specification of constraints on template types, but the checks are still only performed when the template is instantiated, not when it is declared.