Item 15: Understand the borrow checker

"The power to destroy a thing is the absolute control over it." – Frank Herbert

Values in Rust have an owner, but that owner can lend the values out to other places in the code. This borrowing mechanism involves the creation and use of references, subject to rules policed by the borrow checker.

Under the covers this uses the same kind of pointer values (Item 9) that are so prevalent in C or C++ code, but girded with rules and restrictions to make sure that the sins of C/C++ are avoided. As a quick comparison:

  • Like a C/C++ pointer, a Rust reference is created with an ampersand: &value.
  • Like a C++ reference, a Rust reference can never be nullptr.
  • Like a C/C++ pointer, and unlike a C++ reference, a Rust reference can be modified after creation to refer to something different.
  • Unlike C++, producing a reference from a value always involves an explicit (&) conversion – if you see code like f(value), you know that f is receiving ownership of the value[1].
  • Unlike C/C++, the mutability of a newly-created reference is always explicit (&mut); if you see code like f(&value), you know that value won't be modified (i.e. is const in C/C++ terminology). Only expressions[2] like f(&mut value) have the possibility of changing the contents of value.
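
The points in this list can be seen together in a small sketch. The Counter type and the peek, bump and reseat_demo functions are names invented here for illustration; they are not part of the original example code.

```rust
struct Counter {
    count: u32,
}

// Takes a shared reference: the function can read but not modify.
fn peek(c: &Counter) -> u32 {
    c.count
}

// Takes a mutable reference: the `&mut` at the call site signals
// that the argument may be changed.
fn bump(c: &mut Counter) {
    c.count += 1;
}

// Returns the counts seen through a reference that is re-pointed
// part of the way through.
fn reseat_demo() -> (u32, u32) {
    let mut a = Counter { count: 1 };
    let b = Counter { count: 10 };

    bump(&mut a); // explicit `&mut`: `a` may change
    let mut r = &a; // `r` refers to `a`...
    let first = peek(r);
    r = &b; // ...and can later be made to refer to `b` instead
    let second = peek(r);
    (first, second)
}
```

Calling reseat_demo() gives (2, 10): the first value is read via a reference to a (after the bump), the second via the same variable once it refers to b.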

The most important difference between a C/C++ pointer and a Rust reference is indicated by the term borrow: you can take a reference (pointer) to an item, but you have to give it back. In particular, you have to give it back before the lifetime of the underlying item expires, as tracked by the compiler and explored in Item 14.

These restrictions on the use of references are at the heart of Rust's memory safety guarantees, but they do mean you have to accept the cognitive costs of the borrow rules – accept that it will change how you design your software, particularly its data structures.

Access Control

There are three different ways that a Rust item can be accessed: via the item's owner (item), via a reference (&item), or via a mutable reference (&mut item).

Each of these different ways of accessing the item comes with different powers over the item. Putting things in CRUDe terms:

  • The owner of an item gets to create it, read from it, update it, and drop it (CRUD).
  • A mutable reference can be used to read from the underlying item, and to update it (_RU_).
  • A (normal) reference can only be used to read from the underlying item (_R__).

There's an important Rust-specific extension to these data access rules: only the item's owner can move the item. This makes sense if you think of a move as being some combination of creating (in the new location) and dropping the item's memory (at the old location).

This can lead to some oddities for code that has a mutable reference to an item. For example, it's OK to overwrite an Option:

fn overwrite(item: &mut Option<Item>, val: Item) {
    *item = Some(val);
}

but a modification to return the previous value falls foul of the move restriction:

    pub fn replace(item: &mut Option<Item>, val: Item) -> Option<Item> {
        let previous = *item; // move out
        *item = Some(val); // replace
        previous
    }
error[E0507]: cannot move out of `*item` which is behind a mutable reference
  --> borrows/src/main.rs:27:24
   |
27 |         let previous = *item; // move out
   |                        ^^^^^ move occurs because `*item` has type `Option<Item>`, which does not implement the `Copy` trait
   |
help: consider borrowing the `Option`'s content
   |
27 |         let previous = *item.as_ref(); // move out
   |                             +++++++++
help: consider borrowing here
   |
27 |         let previous = &*item; // move out
   |                        ~~~~~~

It's valid to read from a mutable reference, and it's valid to write to a mutable reference, and so the ability to do both at once is provided by the std::mem::replace function in the standard library. This uses unsafe code under the covers (as per Item 16) to perform the swap in one go:

    pub fn replace(item: &mut Option<Item>, val: Item) -> Option<Item> {
        std::mem::replace(item, Some(val)) // returns previous value
    }

For Option types in particular, this is a sufficiently common pattern that there is also a replace method on Option itself:

    pub fn replace(item: &mut Option<Item>, val: Item) -> Option<Item> {
        item.replace(val)
    }
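
For experimentation, here is a self-contained sketch of the same idea. The Item definition is an assumption (a single i64 field, matching how the type is used elsewhere in this Item), and the remove helper is a hypothetical addition that uses Option::take, a standard library method that moves the value out and leaves None behind.

```rust
#[derive(Debug, PartialEq)]
struct Item {
    contents: i64,
}

// Swap in a new value, handing the previous one back to the caller.
fn replace(item: &mut Option<Item>, val: Item) -> Option<Item> {
    std::mem::replace(item, Some(val))
}

// Move the current value out through the mutable reference,
// leaving `None` in its place.
fn remove(item: &mut Option<Item>) -> Option<Item> {
    item.take() // same effect as `std::mem::replace(item, None)`
}
```

In both cases the referenced location always holds a valid Option<Item>, which is why these operations are allowed through a mutable reference even though a plain move out is not.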

Borrow Rules

The first rule for borrowing references in Rust is that the scope of any reference must be smaller than the lifetime of the item that it refers to. However, the compiler is smarter than just assuming that a reference lasts until it is dropped – the non-lexical lifetimes feature allows reference lifetimes to be shrunk so they end at the point of last use, rather than the enclosing block (as described in Item 14).
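
A minimal sketch of the effect of non-lexical lifetimes, using a hypothetical nll_demo function: the shared borrow ends at its last use, so a later modification is accepted even though the reference variable is still in scope.

```rust
fn nll_demo() -> Vec<i32> {
    let mut values = vec![1, 2, 3];

    let first = &values[0]; // shared borrow of `values` starts here
    let copy = *first; // last use of `first`: the borrow ends here

    // Accepted, even though `first` is still in lexical scope,
    // because `first` is never used again.
    values.push(copy + 10);
    values
}
```

Adding a use of first after the push would extend the borrow past the modification, and the compiler would then reject the code.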

The second rule for borrowing references is that, in addition to the owner of an item, there can be

  • any number of immutable references to the item, or
  • a single mutable reference to the item

but not both.

So a method that takes multiple immutable references can be fed references to the same item:

    fn both_zero(left: &Item, right: &Item) -> bool {
        left.contents == 0 && right.contents == 0
    }

    let item = Item { contents: 0 };
    assert!(both_zero(&item, &item));

but one that takes mutable references cannot:

    fn zero_both(left: &mut Item, right: &mut Item) {
        left.contents = 0;
        right.contents = 0;
    }

    let mut item = Item { contents: 42 };
    zero_both(&mut item, &mut item);
error[E0499]: cannot borrow `item` as mutable more than once at a time
   --> borrows/src/main.rs:115:26
    |
115 |     zero_both(&mut item, &mut item);
    |     --------- ---------  ^^^^^^^^^ second mutable borrow occurs here
    |     |         |
    |     |         first mutable borrow occurs here
    |     first borrow later used by call

and similarly for a mixture of the two:

    fn copy_contents(left: &mut Item, right: &Item) {
        left.contents = right.contents;
    }

    let mut item = Item { contents: 42 };
    copy_contents(&mut item, &item);
error[E0502]: cannot borrow `item` as immutable because it is also borrowed as mutable
   --> borrows/src/main.rs:140:30
    |
140 |     copy_contents(&mut item, &item);
    |     ------------- ---------  ^^^^^ immutable borrow occurs here
    |     |             |
    |     |             mutable borrow occurs here
    |     mutable borrow later used by call
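
The same functions are accepted when the references involve distinct items. The sketch below assumes the simple Item definition used above; distinct_items and zero_first_and_last are hypothetical helpers, the latter using the standard split_at_mut method to obtain two non-overlapping mutable references into one slice.

```rust
struct Item {
    contents: i64,
}

fn zero_both(left: &mut Item, right: &mut Item) {
    left.contents = 0;
    right.contents = 0;
}

fn copy_contents(left: &mut Item, right: &Item) {
    left.contents = right.contents;
}

// Distinct items: each one is borrowed at most once per call.
fn distinct_items() -> (i64, i64) {
    let mut a = Item { contents: 42 };
    let mut b = Item { contents: 7 };
    copy_contents(&mut a, &b);
    let after_copy = a.contents;
    zero_both(&mut a, &mut b);
    (after_copy, a.contents + b.contents)
}

// Two elements of the same slice: `split_at_mut` produces two
// sub-slices that cannot overlap, so both can be borrowed mutably.
fn zero_first_and_last(items: &mut [Item]) {
    if items.len() < 2 {
        return;
    }
    let (head, tail) = items.split_at_mut(1);
    let last = tail.len() - 1;
    zero_both(&mut head[0], &mut tail[last]);
}
```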

The borrowing rules allow the compiler to make better decisions around aliasing: tracking when two different pointers may or may not refer to the same underlying item in memory. If the compiler can be sure (as in Rust) that the memory location pointed to by a collection of immutable references cannot be altered via an aliased mutable reference, then it can generate code that is:

  • better optimized: values can be (e.g.) cached in registers, secure in the knowledge that the underlying memory contents will not change in the meanwhile
  • safer: data races arising from unsynchronized access to memory between threads (Item 17) are not possible.

Owner Operations

One important consequence of the rules around the existence of references is that they also affect what operations can be performed by the owner of the item. To help understand this, consider operations involving the owner as though they make use of references along the way.

For example, an attempt to update the item via its owner while a reference exists fails, because the update needs a transient mutable reference that would coexist with the existing reference:

    let mut item = Item { contents: 42 };
    let r = &item;
    item.contents = 0;
    // ^^^ Changing the item is roughly equivalent to:
    //   (&mut item).contents = 0;
    println!("reference to item is {:?}", r);
error[E0506]: cannot assign to `item.contents` because it is borrowed
   --> borrows/src/main.rs:164:5
    |
163 |     let r = &item;
    |             ----- borrow of `item.contents` occurs here
164 |     item.contents = 0;
    |     ^^^^^^^^^^^^^^^^^ assignment to borrowed `item.contents` occurs here
...
167 |     println!("reference to item is {:?}", r);
    |                                           - borrow later used here

On the other hand, because multiple immutable references are allowed, it's OK for the owner to read from the item while there are immutable references in existence:

    let item = Item { contents: 42 };
    let r = &item;
    let contents = item.contents;
    // ^^^ Reading from the item is roughly equivalent to:
    //   let contents = (&item).contents;
    println!("reference to item is {:?}", r);

but not if there is a mutable reference:

    let mut item = Item { contents: 42 };
    let r = &mut item;
    let contents = item.contents; // i64 is `Copy`
    r.contents = 0;
error[E0503]: cannot use `item.contents` because it was mutably borrowed
   --> borrows/src/main.rs:194:20
    |
193 |     let r = &mut item;
    |             --------- borrow of `item` occurs here
194 |     let contents = item.contents; // i64 is `Copy`
    |                    ^^^^^^^^^^^^^ use of borrowed `item`
195 |     r.contents = 0;
    |     -------------- borrow later used here

Finally, the existence of any sort of reference prevents the owner of the item from moving or dropping the item, exactly because this would mean that the reference now refers to an invalid item:

    let item = Item { contents: 42 };
    let r = &item;
    let new_item = item; // move
    println!("reference to item is {:?}", r);
error[E0505]: cannot move out of `item` because it is borrowed
   --> borrows/src/main.rs:151:20
    |
150 |     let r = &item;
    |             ----- borrow of `item` occurs here
151 |     let new_item = item; // move
    |                    ^^^^ move out of `item` occurs here
152 |     println!("reference to item is {:?}", r);
    |                                           - borrow later used here
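
All of these owner operations become legal again once the reference is no longer in use. A minimal sketch, assuming the same Item type with a derived Debug implementation, and using a hypothetical owner_after_borrow function:

```rust
#[derive(Debug)]
struct Item {
    contents: i64,
}

fn owner_after_borrow() -> (String, i64) {
    let mut item = Item { contents: 42 };
    let r = &item;
    let seen = format!("{:?}", r); // last use of `r`

    // No reference is live beyond this point, so the owner
    // regains its full powers over the item.
    item.contents = 0; // update via the owner
    let moved = item; // move out of the owner
    (seen, moved.contents)
}
```

Reordering the code so that r is used after the update or after the move brings back the errors shown above.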

Winning Fights against the Borrow Checker

Newcomers to Rust (and even more experienced folk!) can often feel that they are spending time fighting against the borrow checker. What kinds of things can help you win these battles?

Local Code Refactoring

The first tactic is to pay attention to the compiler's error messages, because the Rust developers have put a lot of effort into making them as helpful as possible.

/// If `needle` is present in `haystack`, return a slice containing it.
pub fn find<'a, 'b>(haystack: &'a str, needle: &'b str) -> Option<&'a str> {
    haystack.find(needle).map(|i| &haystack[i..i + needle.len()])
}

// ...

    let found = find(&format!("{} to search", "Text"), "ex");
    if let Some(text) = found {
        println!("Found '{}'!", text);
    }
error[E0716]: temporary value dropped while borrowed
   --> borrows/src/main.rs:312:23
    |
312 |     let found = find(&format!("{} to search", "Text"), "ex");
    |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^       - temporary value is freed at the end of this statement
    |                       |
    |                       creates a temporary which is freed while still in use
313 |     if let Some(text) = found {
    |                         ----- borrow later used here
    |
    = note: consider using a `let` binding to create a longer lived value
    = note: this error originates in the macro `format` (in Nightly builds, run with -Z macro-backtrace for more info)

The first part of the error message is the important part, because it describes what borrowing rule the compiler thinks you have broken, and why. As you encounter enough of these errors – which you will – you can build up an intuition about the borrow checker that matches the more theoretical version encapsulated in the rules above.

The second part of the error message includes the compiler's suggestions for how to fix the problem, which in this case is simple:

    let haystack = format!("{} to search", "Text");
    let found = find(&haystack, "ex");
    if let Some(text) = found {
        println!("Found '{}'!", text);
    }
    // `found` now references `haystack`, which out-lives it

This is an instance of one of the two simple code tweaks that can help mollify the borrow checker:

  • Lifetime extension: convert a temporary (whose lifetime only extends to the end of the expression) to be a new named local variable (whose lifetime extends to the end of the block) with a let binding.
  • Lifetime reduction: add an additional block { ... } around the use of a reference so that its lifetime ends at the end of the new block.

The latter is less common, because of the existence of non-lexical lifetimes: the compiler can often figure out that a reference is no longer used, ahead of its official drop point at the end of the block. However, if you do find yourself repeatedly introducing an artificial block around similar small chunks of code, consider whether that code should be encapsulated into a method of its own.
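
One situation where an explicit block still matters involves values that implement Drop, such as the guard objects returned by RefCell: these stay alive until the end of their scope rather than ending at their last use. The following is a sketch under that assumption, with push_then_len as a hypothetical helper.

```rust
use std::cell::RefCell;

fn push_then_len(cell: &RefCell<Vec<i32>>, value: i32) -> usize {
    {
        // The `RefMut` guard holds a mutable borrow of the contents.
        let mut guard = cell.borrow_mut();
        guard.push(value);
    } // `guard` is dropped here, releasing the mutable borrow.

    // Without the inner block, `guard` would still be alive here and
    // this `borrow()` would panic at run-time.
    cell.borrow().len()
}
```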

The compiler's suggested fixes are helpful for simpler problems, but as you write more sophisticated code you're likely to find that the suggestions are no longer useful, and that the explanation of the broken borrowing rule is harder to follow.

    let x = Some(Rc::new(RefCell::new(Item { contents: 42 })));

    // Call function with signature: `check_item(item: Option<&Item>)`
    check_item(x.as_ref().map(|r| r.borrow().deref()));
error[E0515]: cannot return reference to temporary value
   --> borrows/src/main.rs:257:35
    |
257 |     check_item(x.as_ref().map(|r| r.borrow().deref()));
    |                                   ----------^^^^^^^^
    |                                   |
    |                                   returns a reference to data owned by the current function
    |                                   temporary value created here

In this situation it can be helpful to temporarily introduce a sequence of local variables, one for each step of a complicated transformation, and each with an explicit type annotation.

    let x: Option<Rc<RefCell<Item>>> =
        Some(Rc::new(RefCell::new(Item { contents: 42 })));

    let x1: Option<&Rc<RefCell<Item>>> = x.as_ref();
    let x2: Option<std::cell::Ref<Item>> = x1.map(|r| r.borrow());
    let x3: Option<&Item> = x2.map(|r| r.deref());
    check_item(x3);
error[E0515]: cannot return reference to function parameter `r`
   --> borrows/src/main.rs:269:40
    |
269 |     let x3: Option<&Item> = x2.map(|r| r.deref());
    |                                        ^^^^^^^^^ returns a reference to data owned by the current function

This narrows down the precise conversion that the compiler is complaining about, which in turn allows the code to be restructured:

    let x: Option<Rc<RefCell<Item>>> =
        Some(Rc::new(RefCell::new(Item { contents: 42 })));

    let x1: Option<&Rc<RefCell<Item>>> = x.as_ref();
    let x2: Option<std::cell::Ref<Item>> = x1.map(|r| r.borrow());
    match x2 {
        None => check_item(None),
        Some(r) => {
            let x3: &Item = r.deref();
            check_item(Some(x3));
        }
    }

Once the underlying problem is clear and has been fixed, you're then free to re-coalesce the local variables back together, so that you can pretend that you got it right all along:

    let x = Some(Rc::new(RefCell::new(Item { contents: 42 })));

    match x.as_ref().map(|r| r.borrow()) {
        None => check_item(None),
        Some(r) => check_item(Some(r.deref())),
    };

Data Structure Design

The next tactic that helps for battles against the borrow checker is to design your data structures with the borrow checker in mind. The ideal is for your data structures to own all of the data that they use, avoiding any use of references and the consequent propagation of lifetime annotations described in Item 14.

However, that's not always possible for real-world data structures; any time the internal connections of the data structure form a graph that's more inter-connected than a tree pattern (a Root that owns multiple Branches, each of which owns multiple Leafs etc.), then simple single-ownership isn't possible.

To take a simple example, imagine a simple register of guest details recorded in the order in which they arrive.

#[derive(Clone, Debug)]
struct Guest {
    name: String,
    phone: PhoneNumber,
    address: String,
    // ... many other fields
}

#[derive(Default, Debug)]
struct GuestRegister(Vec<Guest>);

impl GuestRegister {
    fn register(&mut self, guest: Guest) {
        self.0.push(guest)
    }
    fn nth(&self, idx: usize) -> Option<&Guest> {
        // `get` returns `None` for an out-of-range index.
        self.0.get(idx)
    }
}

If this code also needs to be able to efficiently look up guests by arrival and alphabetically by name, then there are fundamentally two distinct data structures involved, and only one of them can own the data.

If the data involved is both small and immutable, then just taking a copy can give a quick solution.

#[derive(Default, Debug)]
struct ClonedGuestRegister {
    by_arrival: Vec<Guest>,
    by_name: BTreeMap<String, Guest>,
}

impl ClonedGuestRegister {
    fn register(&mut self, guest: Guest) {
        self.by_arrival.push(guest.clone()); // requires `Guest` to be `Clone`
        self.by_name.insert(guest.name.clone(), guest);
    }
    fn named(&self, name: &str) -> Option<&Guest> {
        self.by_name.get(name)
    }
    fn nth(&self, idx: usize) -> Option<&Guest> {
        // snip
    }
}

This approach of taking copies copes poorly if the data can be modified – if the telephone number for a Guest needs to be updated, you have to find both versions and ensure they stay in sync.

Another possible approach is to add another layer of indirection, treating the Vec<Guest> as the owner and using an index into that vector for the name lookups.
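
A minimal sketch of what such an index-based register might look like is below. The IndexGuestRegister name, the simplified Guest (with a plain u64 standing in for PhoneNumber) and the method set are assumptions made for illustration.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Debug)]
struct Guest {
    name: String,
    phone: u64, // stand-in for the `PhoneNumber` type
}

#[derive(Default, Debug)]
struct IndexGuestRegister {
    by_arrival: Vec<Guest>,
    // Map from guest name to index into `by_arrival`.
    by_name: BTreeMap<String, usize>,
}

impl IndexGuestRegister {
    fn register(&mut self, guest: Guest) {
        // Duplicate names are not checked, to keep the sketch short.
        self.by_name.insert(guest.name.clone(), self.by_arrival.len());
        self.by_arrival.push(guest);
    }
    fn named(&self, name: &str) -> Option<&Guest> {
        let idx = *self.by_name.get(name)?;
        self.nth(idx)
    }
    fn named_mut(&mut self, name: &str) -> Option<&mut Guest> {
        let idx = *self.by_name.get(name)?;
        self.nth_mut(idx)
    }
    fn nth(&self, idx: usize) -> Option<&Guest> {
        self.by_arrival.get(idx)
    }
    fn nth_mut(&mut self, idx: usize) -> Option<&mut Guest> {
        self.by_arrival.get_mut(idx)
    }
}
```

Every lookup by name goes through the index and then through the Vec, so there is only ever one copy of each Guest.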

This approach copes fine with a changing phone number – the (single) Guest is owned by the Vec, and will always be reached that way under the covers:

    let new_number = PhoneNumber::new(123456);
    ledger.named_mut("Bob").unwrap().phone = new_number;
    assert_eq!(ledger.named("Bob").unwrap().phone, new_number);

However, it copes less well with a different kind of modification: what happens if guests can de-register?

    // Deregister the `Guest` at position `idx`, moving up all subsequent guests.
    fn deregister(&mut self, idx: usize) -> Result<(), Error> {
        if idx >= self.by_arrival.len() {
            return Err(Error::new("out of bounds"));
        }
        self.by_arrival.remove(idx);
        // Oops, forgot to update `by_name`.
        Ok(())
    }

Now that the Vec can be shuffled, the by_name indexes into it are effectively acting like pointers, and we've re-introduced a world where those "pointers" can point to nothing (beyond the Vec bounds) or can point to incorrect data.

    ledger.register(alice);
    ledger.register(bob);
    ledger.register(charlie);
    println!("Register starts as: {:?}", ledger);

    ledger.deregister(0).unwrap();
    println!("Register after deregister(0): {:?}", ledger);

    let also_alice = ledger.named("Alice");
    // Alice still has index 0, which is now Bob
    println!("Alice is {:?}", also_alice);

    let also_bob = ledger.named("Bob");
    // Bob still has index 1, which is now Charlie
    println!("Bob is {:?}", also_bob);

    let also_charlie = ledger.named("Charlie");
    // Charlie still has index 2, which is now beyond the Vec
    println!("Charlie is {:?}", also_charlie);
Register starts as: {
  by_arrival: [{n: 'Alice', ...}, {n: 'Bob', ...}, {n: 'Charlie', ...}]
  by_name: {"Alice": 0, "Bob": 1, "Charlie": 2}
}
Register after deregister(0): {
  by_arrival: [{n: 'Bob', ...}, {n: 'Charlie', ...}]
  by_name: {"Alice": 0, "Bob": 1, "Charlie": 2}
}
Alice is Some({n: 'Bob', ...})
Bob is Some({n: 'Charlie', ...})
Charlie is None

Regardless of approach, the code needs to be fixed to ensure the data structures stay in sync. However, a better approach to the underlying data structure would be to use Rust's smart pointers instead (Item 9). Shifting to a combination of Rc and RefCell avoids the invalidation problems of using indices as pseudo-pointers:

#[derive(Default)]
struct RcGuestRegister {
    by_arrival: Vec<Rc<RefCell<Guest>>>,
    by_name: BTreeMap<String, Rc<RefCell<Guest>>>,
}

impl RcGuestRegister {
    fn register(&mut self, guest: Guest) {
        let name = guest.name.clone();
        let guest = Rc::new(RefCell::new(guest));
        self.by_arrival.push(guest.clone());
        self.by_name.insert(name, guest);
    }
    fn deregister(&mut self, idx: usize) -> Result<(), Error> {
        if idx >= self.by_arrival.len() {
            return Err(Error::new("out of bounds"));
        }
        self.by_arrival.remove(idx);
        // Oops, still forgot to update `by_name`.
        Ok(())
    }
    // snip
}
Register starts as: {
  by_arrival: [{n: 'Alice', ...}, {n: 'Bob', ...}, {n: 'Charlie', ...}]
  by_name: [("Alice", {n: 'Alice', ...}), ("Bob", {n: 'Bob', ...}), ("Charlie", {n: 'Charlie', ...})]
}
Register after deregister(0): {
  by_arrival: [{n: 'Bob', ...}, {n: 'Charlie', ...}]
  by_name: [("Alice", {n: 'Alice', ...}), ("Bob", {n: 'Bob', ...}), ("Charlie", {n: 'Charlie', ...})]
}
Alice is Some(RefCell { value: {n: 'Alice', ...} })
Bob is Some(RefCell { value: {n: 'Bob', ...} })
Charlie is Some(RefCell { value: {n: 'Charlie', ...} })

The output is now valid, but there's a lingering entry for Alice that remains until we ensure that the two collections stay in sync:

    fn deregister_fixed(&mut self, idx: usize) -> Result<(), Error> {
        if idx >= self.by_arrival.len() {
            return Err(Error::new("out of bounds"));
        }
        let guest: Rc<RefCell<Guest>> = self.by_arrival.remove(idx);
        self.by_name.remove(&guest.borrow().name);
        Ok(())
    }
Register after deregister(0): {
  by_arrival: [{n: 'Bob', ...}, {n: 'Charlie', ...}]
  by_name: [("Bob", {n: 'Bob', ...}), ("Charlie", {n: 'Charlie', ...})]
}
Alice is None
Bob is Some(RefCell { value: {n: 'Bob', ...} })
Charlie is Some(RefCell { value: {n: 'Charlie', ...} })

Smart Pointers

The final variation of the previous section is an example of a more general approach: use Rust's smart pointers for interconnected data structures.

Item 9 described the most common smart pointer types provided by Rust's standard library.

  • Rc allows shared ownership, with multiple things referring to the same item. Often combined with…
  • RefCell allows interior mutability, so that internal state can be modified without needing a mutable reference. This comes at the cost of moving borrow checks from compile-time to run-time.
  • Arc is the multi-threading equivalent to Rc.
  • Mutex (and RwLock) allows interior mutability in a multi-threading environment, roughly equivalent to RefCell.
  • Cell allows interior mutability for Copy types.

For programmers and designs that are adapting from C++ to Rust, the most common tool to reach for is Rc<T> (and its thread-safe cousin Arc<T>), often combined with RefCell (or the thread-safe alternative Mutex). A naïve translation of shared pointers (or even std::shared_ptrs) to Rc<RefCell<T>> instances will generally give something that works in Rust without too much complaint from the borrow checker. However, this approach means that you miss out on some of the protections that Rust gives you; in particular, situations where the same item is mutably borrowed (via borrow_mut()) while another reference exists result in a run-time panic! rather than a compile-time error.
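
The move from compile-time to run-time checking can be observed without triggering a panic by using RefCell::try_borrow_mut, which reports a conflicting borrow as an Err instead. This is a sketch only; double_borrow_is_rejected is a hypothetical name.

```rust
use std::cell::RefCell;
use std::rc::Rc;

fn double_borrow_is_rejected() -> bool {
    let shared = Rc::new(RefCell::new(42));
    let alias = Rc::clone(&shared); // second handle to the same item

    // Hold a shared borrow via one handle...
    let _reader = shared.borrow();

    // ...and attempt a mutable borrow via the other. This compiles,
    // but the conflict is detected at run-time: `borrow_mut()` would
    // panic here, whereas `try_borrow_mut()` returns an `Err`.
    let rejected = alias.try_borrow_mut().is_err();
    rejected
}
```

With plain references the equivalent code would be rejected by the compiler, as in the E0502 example earlier in this Item.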

For example, one pattern that breaks the one-way flow of ownership in tree-like data structures is when there's an "owner" pointer back from an item to the thing that owns it:

// C++ code (with lackadaisical pointer use)
struct Tree {
  std::string id() const;

  std::string tree_id_;
  std::vector<Branch*> branches_; // `Tree` owns `Branch` objects
};

struct Branch {
  std::string id() const;  // hierarchical identifier for `Branch`

  std::string branch_id_;
  std::vector<Leaf*> leaves_; // `Branch` owns `Leaf` objects
  Tree* owner_; // back-reference to owning `Tree`
};

struct Leaf {
  std::string id() const;  // hierarchical identifier for `Leaf`

  std::string leaf_id_;
  Branch* owner_; // back-reference to owning `Branch`
};

std::string Branch::id() const {
  if (owner_ == nullptr) {
    return "<unowned>." + branch_id_;
  } else {
    return owner_->id()+ "." + branch_id_;
  }
}

Implementing the equivalent pattern in Rust can make use of Rc<T>'s more tentative partner, Weak<T>:

struct Tree {
    tree_id: String,
    branches: Vec<Rc<RefCell<Branch>>>,
}

struct Branch {
    branch_id: String,
    leaves: Vec<Rc<RefCell<Leaf>>>,
    owner: Option<Weak<RefCell<Tree>>>,
}

struct Leaf {
    leaf_id: String,
    owner: Option<Weak<RefCell<Branch>>>,
}

The Weak reference doesn't increment the strong reference count that keeps the item alive, and so code using it has to explicitly check whether the underlying item has gone away:

impl Branch {
    fn add_leaf(branch: Rc<RefCell<Branch>>, mut leaf: Leaf) {
        leaf.owner = Some(Rc::downgrade(&branch));
        branch.borrow_mut().leaves.push(Rc::new(RefCell::new(leaf)));
    }
    fn id(&self) -> String {
        match &self.owner {
            None => format!("<unowned>.{}", self.branch_id),
            Some(t) => {
                let tree = t.upgrade().expect("internal error: owner gone!");
                format!("{}.{}", tree.borrow().id(), self.branch_id)
            }
        }
    }
}
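
The behaviour of upgrade() can be seen in isolation with a smaller sketch. The Owner and Child types and the weak_demo function are hypothetical simplifications of the tree structure above.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Owner {
    name: String,
}

struct Child {
    owner: Weak<RefCell<Owner>>, // back-reference that does not keep `Owner` alive
}

impl Child {
    // Returns `None` if the owner has already been dropped.
    fn owner_name(&self) -> Option<String> {
        let owner = self.owner.upgrade()?;
        let name = owner.borrow().name.clone();
        Some(name)
    }
}

fn weak_demo() -> (Option<String>, Option<String>) {
    let owner = Rc::new(RefCell::new(Owner { name: "tree".to_string() }));
    let child = Child { owner: Rc::downgrade(&owner) };

    let before = child.owner_name();
    drop(owner); // last strong reference goes away
    let after = child.owner_name();
    (before, after)
}
```

Here the lookup succeeds while the owner is alive and yields None afterwards; whether a missing owner deserves an expect() or a graceful fallback depends on the invariants of the data structure.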

If Rust's smart pointers don't seem to cover what's needed for your data structures, there's always the final fallback of writing unsafe code that uses raw (and decidedly un-smart) pointers. However, as per Item 16 this should very much be a last resort – someone else might already have implemented the semantics you want, inside a safe interface, and if you search the standard library and crates.io you might find it.

For example, imagine that you have a function that sometimes returns a reference to one of its inputs, but sometimes needs to return some freshly allocated data. In line with Item 1, an enum that encodes these two possibilities is the natural way to express this in the type system, and you could then implement various of the pointer traits described in Item 9. But you don't have to: the standard library already includes the std::borrow::Cow type[3], which covers exactly this scenario (once you know it exists).
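
As an illustration of the scenario just described, here is a sketch of a function returning Cow. The normalize_tabs function is a made-up example: it hands back its input untouched when no change is needed and only allocates when a replacement has to be made.

```rust
use std::borrow::Cow;

// Replace tab characters with spaces, allocating only if needed.
fn normalize_tabs(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        // Fresh data: an owned `String`.
        Cow::Owned(input.replace('\t', " "))
    } else {
        // No change needed: borrow the input.
        Cow::Borrowed(input)
    }
}
```

Callers can treat the result as a &str via dereferencing in either case, and call into_owned() if they need a String.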

Self-Referential Data Structures

One particular style of data structure always stymies programmers arriving at Rust from other languages: attempting to create self-referential data structures, which contain a mixture of owned data together with references to within that owned data.

struct SelfRef {
    text: String,
    // The slice of `text` that holds the title text.
    title: Option<&str>,
}

At a syntactic level, this code won't compile because it doesn't comply with the lifetime rules described in Item 14: the reference needs a lifetime annotation, but we wouldn't want that lifetime annotation to be propagated to the containing data structure, because the intent is not to refer to anything external.

It's worth thinking about the reason for this restriction at a more semantic level. Data structures in Rust can move: from the stack to the heap, from the heap to the stack, and from one place to another. If that happens, the "interior" title pointer would no longer be valid, and there's no way to keep it in sync.

A simple alternative for this case is to use the indexing approach explored earlier; a range of offsets into the text is not invalidated by a move, and is invisible to the borrow checker because it doesn't involve references:

struct SelfRefIdx {
    text: String,
    // Indices into `text` where the title text is.
    title: Option<Range<usize>>,
}
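
A sketch of how the index-based version might be used follows. The constructor (which treats the first line of the text as the title) and the title accessor are assumptions added for illustration.

```rust
use std::ops::Range;

struct SelfRefIdx {
    text: String,
    // Indices into `text` where the title text is.
    title: Option<Range<usize>>,
}

impl SelfRefIdx {
    // Hypothetical constructor: the title is everything before the
    // first newline, if there is one.
    fn new(text: String) -> Self {
        let title = text.find('\n').map(|end| 0..end);
        SelfRefIdx { text, title }
    }

    // Re-create the slice on demand; the returned reference borrows
    // from `self`, so the borrow checker can track it as normal.
    fn title(&self) -> Option<&str> {
        let range = self.title.clone()?;
        self.text.get(range)
    }
}
```

Because the stored range is just a pair of numbers, the structure can be moved (for example into a Box) and title() still returns the right slice.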

However, this indexing approach only works for simple examples. A more general version of the self-reference problem turns up when the compiler deals with async code[4]. Roughly speaking, the compiler bundles up a pending chunk of async code into a closure, and the data for that closure can include both values and references to those values.

That's inherently a self-referential data structure, and so async support was a prime motivation for the Pin type in the standard library. This pointer type "pins" its value in place, forcing the value to remain at the same location in memory, thus ensuring that internal self-references remain valid.

So Pin is available as a possibility for self-referential types, but it's tricky to use correctly (as its official docs make clear):

  • The internal reference fields need to use raw pointers, or near relatives (e.g. NonNull) thereof.
  • The type being pinned needs to not implement the Unpin marker trait. This trait is automatically implemented for almost every type, so this typically involves adding a (zero-sized) field of type PhantomPinned to the struct definition[5].
  • The item is only pinned once it's on the heap and held via Pin; in other words, only the contents of something like Pin<Box<MyType>> is pinned. This means that the internal reference fields can only be safely filled in after this point, but as they are raw pointers the compiler will give you no warning if you incorrectly set them before calling Box::pin.
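
To make these points concrete, here is a sketch modelled on the pattern shown in the standard library's pin documentation. The Pinned type and its methods are hypothetical, and the unsafe blocks rely on the invariant that the value never moves once it has been pinned; treat it as an illustration of the moving parts rather than a recommended design.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;
use std::ptr::NonNull;

struct Pinned {
    text: String,
    // Raw self-reference to the `text` field; dangling until `new`
    // has pinned the value and filled it in.
    text_ptr: NonNull<String>,
    // Zero-sized marker that opts the type out of `Unpin`.
    _pin: PhantomPinned,
}

impl Pinned {
    fn new(text: String) -> Pin<Box<Self>> {
        let unpinned = Pinned {
            text,
            text_ptr: NonNull::dangling(),
            _pin: PhantomPinned,
        };
        // Pin first: the heap location is now fixed.
        let mut boxed = Box::pin(unpinned);
        let ptr = NonNull::from(&boxed.text);
        // SAFETY: only a field is modified; the value is not moved.
        unsafe {
            let mut_ref: Pin<&mut Self> = Pin::as_mut(&mut boxed);
            Pin::get_unchecked_mut(mut_ref).text_ptr = ptr;
        }
        boxed
    }

    fn text_via_ptr(&self) -> &str {
        // SAFETY: `text_ptr` was set after pinning and the value
        // cannot have moved since then.
        unsafe { self.text_ptr.as_ref().as_str() }
    }
}
```

Moving the Pin<Box<Pinned>> handle around only moves the pointer to the heap allocation, not the pinned contents, so the internal pointer stays valid.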

Where possible, avoid self-referential data structures or try to find library crates that encapsulate the difficulties for you (e.g. ouroboros).


1: However, it may be ownership of a copy of the item, if the value's type is Copy; see Item 5.

2: Note that all bets are off with expressions like m!(value) that involve a macro (Item 28), because it can expand to arbitrary code.

3: Cow stands for copy-on-write; a copy of the underlying data is only made if a change (write) needs to be made to it.

4: Dealing with async code is beyond the scope of this book; to understand more about its need for self-referential data structures, see chapter 8 of Rust for Rustaceans by Jon Gjengset.

5: In future it may be possible to just declare impl !Unpin for MyType {}.