Item 15: Understand the borrow checker

Values in Rust have an owner, but that owner can lend the values out to other places in the code. This borrowing mechanism involves the creation and use of references, subject to rules policed by the borrow checker—the subject of this Item.

Under the covers, Rust's references use the same kind of pointer values (Item 8) that are so prevalent in C or C++ code but are girded with rules and restrictions to make sure that the sins of C/C++ are avoided. As a quick comparison:

  • Like a C/C++ pointer, a Rust reference is created with an ampersand: &value.
  • Like a C++ reference, a Rust reference can never be nullptr.
  • Like a C/C++ pointer or reference, a Rust reference can be modified after creation to refer to something different.
  • Unlike C++, producing a reference from a value always involves an explicit (&) conversion—if you see code like f(value), you know that f is receiving ownership of the value. (However, it may be ownership of a copy of the item, if the value's type implements Copy—see Item 10.)
  • Unlike C/C++, the mutability of a newly created reference is always explicit (&mut). If you see code like f(&value), you know that value won't be modified (i.e., is const in C/C++ terminology). Only expressions like f(&mut value) have the potential to change the contents of value.1

The most important difference between a C/C++ pointer and a Rust reference is indicated by the term borrow: you can take a reference (pointer) to an item, but you can't keep that reference forever. In particular, you can't keep it longer than the lifetime of the underlying item, as tracked by the compiler and explored in Item 14.

These restrictions on the use of references enable Rust to make its memory safety guarantees, but they also mean that you have to accept the cognitive costs of the borrow rules, and accept that it will change how you design your software—particularly its data structures.

This Item starts by describing what Rust references can do, and the borrow checker's rules for using them. The rest of the Item focuses on dealing with the consequences of those rules: how to refactor, rework, and redesign your code so that you can win fights against the borrow checker.

Access Control

There are three ways to access the contents of a Rust item: via the item's owner (item), a reference (&item), or a mutable reference (&mut item). Each of these ways of accessing the item comes with different powers over the item. Putting things roughly in terms of the CRUD (create/read/update/delete) model for storage (using Rust's drop terminology in place of delete):

  • The owner of an item gets to create it, read from it, update it, and drop it.
  • A mutable reference can be used to read from the underlying item and update it.
  • A (normal) reference can be used only to read from the underlying item.

There's an important Rust-specific aspect to these data access rules: only the item's owner can move the item. This makes sense if you think of a move as being some combination of creating (in the new location) and dropping the item's memory (at the old location).

This can lead to some oddities for code that has a mutable reference to an item. For example, it's OK to overwrite an Option:

#![allow(unused)]
fn main() {
/// Some data structure used by the code.
#[derive(Debug)]
pub struct Item {
    pub contents: i64,
}

/// Replace the content of `item` with `val`.
pub fn replace(item: &mut Option<Item>, val: Item) {
    *item = Some(val);
}
}

but a modification to also return the previous value falls foul of the move restriction:2

/// Replace the content of `item` with `val`, returning the previous
/// contents.
pub fn replace(item: &mut Option<Item>, val: Item) -> Option<Item> {
    let previous = *item; // move out
    *item = Some(val); // replace
    previous
}
error[E0507]: cannot move out of `*item` which is behind a mutable reference
  --> src/main.rs:34:24
   |
34 |         let previous = *item; // move out
   |                        ^^^^^ move occurs because `*item` has type
   |                              `Option<inner::Item>`, which does not
   |                              implement the `Copy` trait
   |
help: consider removing the dereference here
   |
34 -         let previous = *item; // move out
34 +         let previous = item; // move out
   |

Although it's valid to read from a mutable reference, this code is attempting to move the value out, just prior to replacing the moved value with a new value—in an attempt to avoid making a copy of the original value. The borrow checker has to be conservative and notices that there's a moment between the two lines when the mutable reference isn't referring to a valid value.

As humans, we can see that this combined operation—extracting the old value and replacing it with a new value—is both safe and useful, so the standard library provides the std::mem::replace function to perform it. Under the covers, replace uses unsafe (as per Item 16) to perform the swap in one go:

/// Replace the content of `item` with `val`, returning the previous
/// contents.
pub fn replace(item: &mut Option<Item>, val: Item) -> Option<Item> {
    std::mem::replace(item, Some(val)) // returns previous value
}

For Option types in particular, this is a sufficiently common pattern that there is also a replace method on Option itself:

/// Replace the content of `item` with `val`, returning the previous
/// contents.
pub fn replace(item: &mut Option<Item>, val: Item) -> Option<Item> {
    item.replace(val) // returns previous value
}

Borrow Rules

There are two key rules to remember when borrowing references in Rust.

The first rule is that the scope of any reference must be smaller than the lifetime of the item that it refers to. Lifetimes are explored in detail in Item 14, but it's worth noting that the compiler has special behavior for reference lifetimes; the non-lexical lifetimes feature allows reference lifetimes to be shrunk so they end at the point of last use, rather than the enclosing block.

The second rule for borrowing references is that, in addition to the owner of an item, there can be either of the following:

  • Any number of immutable references to the item
  • A single mutable reference to the item

However, there can't be both (at the same point in the code).

So a function that takes multiple immutable references can be fed references to the same item:

/// Indicate whether both arguments are zero.
fn both_zero(left: &Item, right: &Item) -> bool {
    left.contents == 0 && right.contents == 0
}

let item = Item { contents: 0 };
assert!(both_zero(&item, &item));

but one that takes mutable references cannot:

/// Zero out the contents of both arguments.
fn zero_both(left: &mut Item, right: &mut Item) {
    left.contents = 0;
    right.contents = 0;
}

let mut item = Item { contents: 42 };
zero_both(&mut item, &mut item);
error[E0499]: cannot borrow `item` as mutable more than once at a time
   --> src/main.rs:131:26
    |
131 |     zero_both(&mut item, &mut item);
    |     --------- ---------  ^^^^^^^^^ second mutable borrow occurs here
    |     |         |
    |     |         first mutable borrow occurs here
    |     first borrow later used by call

The same restriction is true for a function that uses a mixture of mutable and immutable references:

/// Set the contents of `left` to the contents of `right`.
fn copy_contents(left: &mut Item, right: &Item) {
    left.contents = right.contents;
}

let mut item = Item { contents: 42 };
copy_contents(&mut item, &item);
error[E0502]: cannot borrow `item` as immutable because it is also borrowed
              as mutable
   --> src/main.rs:159:30
    |
159 |     copy_contents(&mut item, &item);
    |     ------------- ---------  ^^^^^ immutable borrow occurs here
    |     |             |
    |     |             mutable borrow occurs here
    |     mutable borrow later used by call

The borrowing rules allow the compiler to make better decisions around aliasing: tracking when two different pointers may or may not refer to the same underlying item in memory. If the compiler can be sure (as in Rust) that the memory location pointed to by a collection of immutable references cannot be altered via an aliased mutable reference, then it can generate code that has the following advantages:

  • It's better optimized: Values can be, for example, cached in registers, secure in the knowledge that the underlying memory contents will not change in the meantime.
  • It's safer: Data races arising from unsynchronized access to memory between threads (Item 17) are not possible.

Owner Operations

One important consequence of the rules around the existence of references is that they also affect what operations can be performed by the owner of the item. One way to help understand this is to imagine that operations involving the owner are performed by creating and using references under the covers.

For example, an attempt to update the item via its owner is equivalent to making an ephemeral mutable reference and then updating the item via that reference. If another reference already exists, this notional second mutable reference can't be created:

let mut item = Item { contents: 42 };
let r = &item;
item.contents = 0;
// ^^^ Changing the item is roughly equivalent to:
//   (&mut item).contents = 0;
println!("reference to item is {:?}", r);
error[E0506]: cannot assign to `item.contents` because it is borrowed
   --> src/main.rs:200:5
    |
199 |     let r = &item;
    |             ----- `item.contents` is borrowed here
200 |     item.contents = 0;
    |     ^^^^^^^^^^^^^^^^^ `item.contents` is assigned to here but it was
    |                       already borrowed
...
203 |     println!("reference to item is {:?}", r);
    |                                           - borrow later used here

On the other hand, because multiple immutable references are allowed, it's OK for the owner to read from the item while there are immutable references in existence:

let item = Item { contents: 42 };
let r = &item;
let contents = item.contents;
// ^^^ Reading from the item is roughly equivalent to:
//   let contents = (&item).contents;
println!("reference to item is {:?}", r);

but not if there is a mutable reference:

let mut item = Item { contents: 42 };
let r = &mut item;
let contents = item.contents; // i64 implements `Copy`
r.contents = 0;
error[E0503]: cannot use `item.contents` because it was mutably borrowed
   --> src/main.rs:231:20
    |
230 |     let r = &mut item;
    |             --------- `item` is borrowed here
231 |     let contents = item.contents; // i64 implements `Copy`
    |                    ^^^^^^^^^^^^^ use of borrowed `item`
232 |     r.contents = 0;
    |     -------------- borrow later used here

Finally, the existence of any sort of active reference prevents the owner of the item from moving or dropping the item, exactly because this would mean that the reference now refers to an invalid item:

let item = Item { contents: 42 };
let r = &item;
let new_item = item; // move
println!("reference to item is {:?}", r);
error[E0505]: cannot move out of `item` because it is borrowed
   --> src/main.rs:170:20
    |
168 |     let item = Item { contents: 42 };
    |         ---- binding `item` declared here
169 |     let r = &item;
    |             ----- borrow of `item` occurs here
170 |     let new_item = item; // move
    |                    ^^^^ move out of `item` occurs here
171 |     println!("reference to item is {:?}", r);
    |                                           - borrow later used here

This is a scenario where the non-lexical lifetime feature described in Item 14 is particularly helpful, because (roughly speaking) it terminates the lifetime of a reference at the point where the reference is last used, rather than at the end of the enclosing scope. Moving the final use of the reference up before the move happens means that the compilation error evaporates:

let item = Item { contents: 42 };
let r = &item;
println!("reference to item is {:?}", r);

// Reference `r` is still in scope but has no further use, so it's
// as if the reference has already been dropped.
let new_item = item; // move works OK

Winning Fights Against the Borrow Checker

Newcomers to Rust (and even more experienced folk!) can often feel that they are spending time fighting against the borrow checker. What kinds of things can help you win these battles?

Local code refactoring

The first tactic is to pay attention to the compiler's error messages, because the Rust developers have put a lot of effort into making them as helpful as possible:

/// If `needle` is present in `haystack`, return a slice containing it.
pub fn find<'a, 'b>(haystack: &'a str, needle: &'b str) -> Option<&'a str> {
    haystack
        .find(needle)
        .map(|i| &haystack[i..i + needle.len()])
}

// ...

let found = find(&format!("{} to search", "Text"), "ex");
if let Some(text) = found {
    println!("Found '{text}'!");
}
error[E0716]: temporary value dropped while borrowed
   --> src/main.rs:353:23
    |
353 | let found = find(&format!("{} to search", "Text"), "ex");
    |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^       - temporary value
    |                   |                 is freed at the end of this statement
    |                   |
    |                   creates a temporary value which is freed while still in
    |                   use
354 | if let Some(text) = found {
    |                     ----- borrow later used here
    |
    = note: consider using a `let` binding to create a longer lived value

The first part of the error message is the important part, because it describes what borrowing rule the compiler thinks you have broken and why. As you encounter enough of these errors—which you will—you can build up an intuition about the borrow checker that matches the more theoretical version encapsulated in the previously stated rules.

The second part of the error message includes the compiler's suggestions for how to fix the problem, which in this case is simple:

let haystack = format!("{} to search", "Text");
let found = find(&haystack, "ex");
if let Some(text) = found {
    println!("Found '{text}'!");
}
// `found` now references `haystack`, which outlives it

This is an instance of one of the two simple code tweaks that can help mollify the borrow checker:

  • Lifetime extension: Convert a temporary (whose lifetime extends only to the end of the expression) into a new named local variable (whose lifetime extends to the end of the block) with a let binding.
  • Lifetime reduction: Add an additional block { ... } around the use of a reference so that its lifetime ends at the end of the new block.

The latter is less common, because of the existence of non-lexical lifetimes: the compiler can often figure out that a reference is no longer used, ahead of its official drop point at the end of the block. However, if you do find yourself repeatedly introducing an artificial block around similar small chunks of code, consider whether that code should be encapsulated into a method of its own.

The compiler's suggested fixes are helpful for simpler problems, but as you write more sophisticated code, you're likely to find that the suggestions are no longer useful and that the explanation of the broken borrowing rule is harder to follow:

let x = Some(Rc::new(RefCell::new(Item { contents: 42 })));

// Call function with signature: `check_item(item: Option<&Item>)`
check_item(x.as_ref().map(|r| r.borrow().deref()));
error[E0515]: cannot return reference to temporary value
   --> src/main.rs:293:35
    |
293 |     check_item(x.as_ref().map(|r| r.borrow().deref()));
    |                                   ----------^^^^^^^^
    |                                   |
    |                                   returns a reference to data owned by the
    |                                       current function
    |                                   temporary value created here

In this situation, it can be helpful to temporarily introduce a sequence of local variables, one for each step of a complicated transformation, and each with an explicit type annotation:

let x: Option<Rc<RefCell<Item>>> =
    Some(Rc::new(RefCell::new(Item { contents: 42 })));

let x1: Option<&Rc<RefCell<Item>>> = x.as_ref();
let x2: Option<std::cell::Ref<Item>> = x1.map(|r| r.borrow());
let x3: Option<&Item> = x2.map(|r| r.deref());
check_item(x3);
error[E0515]: cannot return reference to function parameter `r`
   --> src/main.rs:305:40
    |
305 |     let x3: Option<&Item> = x2.map(|r| r.deref());
    |                                        ^^^^^^^^^ returns a reference to
    |                                      data owned by the current function

This narrows down the precise conversion that the compiler is complaining about, which in turn allows the code to be restructured:

let x: Option<Rc<RefCell<Item>>> =
    Some(Rc::new(RefCell::new(Item { contents: 42 })));

let x1: Option<&Rc<RefCell<Item>>> = x.as_ref();
let x2: Option<std::cell::Ref<Item>> = x1.map(|r| r.borrow());
match x2 {
    None => check_item(None),
    Some(r) => {
        let x3: &Item = r.deref();
        check_item(Some(x3));
    }
}

Once the underlying problem is clear and has been fixed, you're then free to recoalesce the local variables back together so that you can pretend you got it right all along:

let x = Some(Rc::new(RefCell::new(Item { contents: 42 })));

match x.as_ref().map(|r| r.borrow()) {
    None => check_item(None),
    Some(r) => check_item(Some(r.deref())),
};

Data structure design

The next tactic that helps for battles against the borrow checker is to design your data structures with the borrow checker in mind. The panacea is your data structures owning all of the data that they use, avoiding any use of references and the consequent propagation of lifetime annotations described in Item 14.

However, that's not always possible for real-world data structures; any time the internal connections of the data structure form a graph that's more interconnected than a tree pattern (a Root that owns multiple Branches, each of which owns multiple Leafs, etc.), then simple single-ownership isn't possible.

To take a simple example, imagine a simple register of guest details recorded in the order in which they arrive:

#![allow(unused)]
fn main() {
#[derive(Clone, Debug)]
pub struct Guest {
    name: String,
    address: String,
    // ... many other fields
}

/// Local error type, used later.
#[derive(Clone, Debug)]
pub struct Error(String);

/// Register of guests recorded in order of arrival.
#[derive(Default, Debug)]
pub struct GuestRegister(Vec<Guest>);

impl GuestRegister {
    pub fn register(&mut self, guest: Guest) {
        self.0.push(guest)
    }
    pub fn nth(&self, idx: usize) -> Option<&Guest> {
        self.0.get(idx)
    }
}
}

If this code also needs to be able to efficiently look up guests by arrival and alphabetically by name, then there are fundamentally two distinct data structures involved, and only one of them can own the data.

If the data involved is both small and immutable, then just cloning the data can be a quick solution:

mod cloned {
    use super::Guest;

    #[derive(Default, Debug)]
    pub struct GuestRegister {
        by_arrival: Vec<Guest>,
        by_name: std::collections::BTreeMap<String, Guest>,
    }

    impl GuestRegister {
        pub fn register(&mut self, guest: Guest) {
            // Requires `Guest` to be `Clone`
            self.by_arrival.push(guest.clone());
            // Not checking for duplicate names to keep this
            // example shorter.
            self.by_name.insert(guest.name.clone(), guest);
        }
        pub fn named(&self, name: &str) -> Option<&Guest> {
            self.by_name.get(name)
        }
        pub fn nth(&self, idx: usize) -> Option<&Guest> {
            self.by_arrival.get(idx)
        }
    }
}

However, this approach of cloning copes poorly if the data can be modified. For example, if the address for a Guest needs to be updated, you have to find both versions and ensure they stay in sync.

Another possible approach is to add another layer of indirection, treating the Vec<Guest> as the owner and using an index into that vector for the name lookups:

mod indexed {
    use super::Guest;

    #[derive(Default)]
    pub struct GuestRegister {
        by_arrival: Vec<Guest>,
        // Map from guest name to index into `by_arrival`.
        by_name: std::collections::BTreeMap<String, usize>,
    }

    impl GuestRegister {
        pub fn register(&mut self, guest: Guest) {
            // Not checking for duplicate names to keep this
            // example shorter.
            self.by_name
                .insert(guest.name.clone(), self.by_arrival.len());
            self.by_arrival.push(guest);
        }
        pub fn named(&self, name: &str) -> Option<&Guest> {
            let idx = *self.by_name.get(name)?;
            self.nth(idx)
        }
        pub fn named_mut(&mut self, name: &str) -> Option<&mut Guest> {
            let idx = *self.by_name.get(name)?;
            self.nth_mut(idx)
        }
        pub fn nth(&self, idx: usize) -> Option<&Guest> {
            self.by_arrival.get(idx)
        }
        pub fn nth_mut(&mut self, idx: usize) -> Option<&mut Guest> {
            self.by_arrival.get_mut(idx)
        }
    }
}

In this approach, each guest is represented by a single Guest item, which allows the named_mut() method to return a mutable reference to that item. That in turn means that changing a guest's address works fine—the (single) Guest is owned by the Vec and will always be reached that way under the covers:

let new_address = "123 Bigger House St";
// Real code wouldn't assume that "Bob" exists...
ledger.named_mut("Bob").unwrap().address = new_address.to_string();

assert_eq!(ledger.named("Bob").unwrap().address, new_address);

However, if guests can deregister, it's easy to inadvertently introduce a bug:


// Deregister the `Guest` at position `idx`, moving up all
// subsequent guests.
pub fn deregister(&mut self, idx: usize) -> Result<(), super::Error> {
    if idx >= self.by_arrival.len() {
        return Err(super::Error::new("out of bounds"));
    }
    self.by_arrival.remove(idx);

    // Oops, forgot to update `by_name`.

    Ok(())
}

Now that the Vec can be shuffled, the by_name indexes into it are effectively acting like pointers, and we've reintroduced a world where a bug can lead those "pointers" to point to nothing (beyond the Vec bounds) or to point to incorrect data:

ledger.register(alice);
ledger.register(bob);
ledger.register(charlie);
println!("Register starts as: {ledger:?}");

ledger.deregister(0).unwrap();
println!("Register after deregister(0): {ledger:?}");

let also_alice = ledger.named("Alice");
// Alice still has index 0, which is now Bob
println!("Alice is {also_alice:?}");

let also_bob = ledger.named("Bob");
// Bob still has index 1, which is now Charlie
println!("Bob is {also_bob:?}");

let also_charlie = ledger.named("Charlie");
// Charlie still has index 2, which is now beyond the Vec
println!("Charlie is {also_charlie:?}");

The code here uses a custom Debug implementation (not shown), in order to reduce the size of the output; this truncated output is as follows:

Register starts as: {
  by_arrival: [{n: 'Alice', ...}, {n: 'Bob', ...}, {n: 'Charlie', ...}]
  by_name: {"Alice": 0, "Bob": 1, "Charlie": 2}
}
Register after deregister(0): {
  by_arrival: [{n: 'Bob', ...}, {n: 'Charlie', ...}]
  by_name: {"Alice": 0, "Bob": 1, "Charlie": 2}
}
Alice is Some(Guest { name: "Bob", address: "234 Bobton" })
Bob is Some(Guest { name: "Charlie", address: "345 Charlieland" })
Charlie is None

The preceding example showed a bug in the deregister code, but even after that bug is fixed, there's nothing to prevent a caller from hanging onto an index value and using it with nth()—getting unexpected or invalid results.

The core problem is that the two data structures need to be kept in sync. A better approach for handling this is to use Rust's smart pointers instead (Item 8). Shifting to a combination of Rc and RefCell avoids the invalidation problems of using indices as pseudo-pointers. Updating the example—but keeping the bug in it—gives the following:

mod rc {
    use super::{Error, Guest};
    use std::{cell::RefCell, rc::Rc};

    #[derive(Default)]
    pub struct GuestRegister {
        by_arrival: Vec<Rc<RefCell<Guest>>>,
        by_name: std::collections::BTreeMap<String, Rc<RefCell<Guest>>>,
    }

    impl GuestRegister {
        pub fn register(&mut self, guest: Guest) {
            let name = guest.name.clone();
            let guest = Rc::new(RefCell::new(guest));
            self.by_arrival.push(guest.clone());
            self.by_name.insert(name, guest);
        }
        pub fn deregister(&mut self, idx: usize) -> Result<(), Error> {
            if idx >= self.by_arrival.len() {
                return Err(Error::new("out of bounds"));
            }
            self.by_arrival.remove(idx);

            // Oops, still forgot to update `by_name`.

            Ok(())
        }
        // ...
    }
}
Register starts as: {
  by_arrival: [{n: 'Alice', ...}, {n: 'Bob', ...}, {n: 'Charlie', ...}]
  by_name: [("Alice", {n: 'Alice', ...}), ("Bob", {n: 'Bob', ...}),
            ("Charlie", {n: 'Charlie', ...})]
}
Register after deregister(0): {
  by_arrival: [{n: 'Bob', ...}, {n: 'Charlie', ...}]
  by_name: [("Alice", {n: 'Alice', ...}), ("Bob", {n: 'Bob', ...}),
            ("Charlie", {n: 'Charlie', ...})]
}
Alice is Some(RefCell { value: Guest { name: "Alice",
                                       address: "123 Aliceville" } })
Bob is Some(RefCell { value: Guest { name: "Bob",
                                     address: "234 Bobton" } })
Charlie is Some(RefCell { value: Guest { name: "Charlie",
                                         address: "345 Charlieland" } })

The output no longer has mismatched names, but a lingering entry for Alice remains until we fix the bug by ensuring that the two collections stay in sync:

pub fn deregister(&mut self, idx: usize) -> Result<(), Error> {
    if idx >= self.by_arrival.len() {
        return Err(Error::new("out of bounds"));
    }
    let guest: Rc<RefCell<Guest>> = self.by_arrival.remove(idx);
    self.by_name.remove(&guest.borrow().name);
    Ok(())
}
Register after deregister(0): {
  by_arrival: [{n: 'Bob', ...}, {n: 'Charlie', ...}]
  by_name: [("Bob", {n: 'Bob', ...}), ("Charlie", {n: 'Charlie', ...})]
}
Alice is None
Bob is Some(RefCell { value: Guest { name: "Bob",
                                     address: "234 Bobton" } })
Charlie is Some(RefCell { value: Guest { name: "Charlie",
                                         address: "345 Charlieland" } })

Smart pointers

The final variation of the previous section is an example of a more general approach: use Rust's smart pointers for interconnected data structures.

Item 8 described the most common smart pointer types provided by Rust's standard library:

  • Rc allows shared ownership, with multiple things referring to the same item. Rc is often combined with RefCell.
  • RefCell allows interior mutability so that internal state can be modified without needing a mutable reference. This comes at the cost of moving borrow checks from compile time to runtime.
  • Arc is the multithreading equivalent to Rc.
  • Mutex (and RwLock) allows interior mutability in a multithreading environment, roughly equivalent to RefCell.
  • Cell allows interior mutability for Copy types.

For programmers who are adapting from C++ to Rust, the most common tool to reach for is Rc<T> (and its thread-safe cousin Arc<T>), often combined with RefCell (or the thread-safe alternative Mutex). A naive translation of shared pointers (or even std::shared_ptrs) to Rc<RefCell<T>> instances will generally give something that works in Rust without too much complaint from the borrow checker.

However, this approach means that you miss out on some of the protections that Rust gives you. In particular, situations where the same item is mutably borrowed (via borrow_mut()) while another reference exists result in a runtime panic! rather than a compile-time error.

For example, one pattern that breaks the one-way flow of ownership in tree-like data structures is when there's an "owner" pointer back from an item to the thing that owns it, as shown in Figure 3-3. These owner links are useful for moving around the data structure; for example, adding a new sibling to a Leaf needs to involve the owning Branch.

Representation of a tree data structure. At the top left is a rectangle representing a Tree struct. It
has boxes for the id and branches fields, an arrow leads from the branches field to a stack of rectangles
representing the Branch struct. The Branch struct has 3 boxes for fields: id, leaves and owner.  The owner field has an
arrow that points back to the Tree struct in the top left.  The leaves field has an arrow that leads to a stack of
rectangles representing the Leaf struct in the bottom right.  The Leaf struct has id and owner fields, and the owner
field has an arrow pointing back to the Branch struct.

Figure 3-3. Tree data structure layout

Implementing this pattern in Rust can make use of Rc<T>'s more tentative partner, Weak<T>:

#![allow(unused)]
fn main() {
use std::{
    cell::RefCell,
    rc::{Rc, Weak},
};

// Use a newtype for each identifier type.
struct TreeId(String);
struct BranchId(String);
struct LeafId(String);

struct Tree {
    id: TreeId,
    branches: Vec<Rc<RefCell<Branch>>>,
}

struct Branch {
    id: BranchId,
    leaves: Vec<Rc<RefCell<Leaf>>>,
    owner: Option<Weak<RefCell<Tree>>>,
}

struct Leaf {
    id: LeafId,
    owner: Option<Weak<RefCell<Branch>>>,
}
}

The Weak reference doesn't increment the main refcount and so has to explicitly check whether the underlying item has gone away:

impl Branch {
    fn add_leaf(branch: Rc<RefCell<Branch>>, mut leaf: Leaf) {
        leaf.owner = Some(Rc::downgrade(&branch));
        branch.borrow_mut().leaves.push(Rc::new(RefCell::new(leaf)));
    }

    fn location(&self) -> String {
        match &self.owner {
            None => format!("<unowned>.{}", self.id.0),
            Some(owner) => {
                // Upgrade weak owner pointer.
                let tree = owner.upgrade().expect("owner gone!");
                format!("{}.{}", tree.borrow().id.0, self.id.0)
            }
        }
    }
}

If Rust's smart pointers don't seem to cover what's needed for your data structures, there's always the final fallback of writing unsafe code that uses raw (and decidedly un-smart) pointers. However, as per Item 16, this should very much be a last resort—someone else might have already implemented the semantics you want, inside a safe interface, and if you search the standard library and crates.io, you might find just the tool for the job.

For example, imagine that you have a function that sometimes returns a reference to one of its inputs but sometimes needs to return some freshly allocated data. In line with Item 1, an enum that encodes these two possibilities is the natural way to express this in the type system, and you could then implement various pointer traits described in Item 8. But you don't have to: the standard library already includes the std::borrow::Cow type that covers exactly this scenario once you know it exists.3

Self-referential data structures

One particular battle with the borrow checker always stymies programmers arriving at Rust from other languages: attempting to create self-referential data structures, which contain a mixture of owned data together with references to within that owned data:

struct SelfRef {
    text: String,
    // The slice of `text` that holds the title text.
    title: Option<&str>,
}

At a syntactic level, this code won't compile because it doesn't comply with the lifetime rules described in Item 14: the reference needs a lifetime annotation, and that means the containing data structure would also need a lifetime parameter. But a lifetime would be for something external to this SelfRef struct, which is not the intent: the data being referenced is internal to the struct.

It's worth thinking about the reason for this restriction at a more semantic level. Data structures in Rust can move: from the stack to the heap, from the heap to the stack, and from one place to another. If that happens, the "interior" title pointer would no longer be valid, and there's no way to keep it in sync.

A simple alternative for this case is to use the indexing approach explored earlier: a range of offsets into the text is not invalidated by a move and is invisible to the borrow checker because it doesn't involve references:

#![allow(unused)]
fn main() {
struct SelfRefIdx {
    text: String,
    // Indices into `text` where the title text is.
    title: Option<std::ops::Range<usize>>,
}
}

However, this indexing approach works only for simple examples and has the same drawbacks as noted previously: the index itself becomes a pseudo-pointer that can become out of sync or even refer to ranges of the text that no longer exist.

A more general version of the self-reference problem turns up when the compiler deals with async code.4 Roughly speaking, the compiler bundles up a pending chunk of async code into a closure, which holds both the code and any captured parts of the environment that the code works with (as described in Item 2). This captured environment can include both values and references to those values. That's inherently a self-referential data structure, and so async support was a prime motivation for the Pin type in the standard library. This pointer type "pins" its value in place, forcing the value to remain at the same location in memory, thus ensuring that internal self-references remain valid.

So Pin is available as a possibility for self-referential types, but it's tricky to use correctly—be sure to read the official docs.

Where possible, avoid self-referential data structures, or try to find library crates that encapsulate the difficulties for you (e.g., ouroborous).

Things to Remember

  • Rust's references are borrowed, indicating that they cannot be held forever.
  • The borrow checker allows multiple immutable references or a single mutable reference to an item but not both. The lifetime of a reference stops at the point of last use, rather than at the end of the enclosing scope, due to non-lexical lifetimes.
  • Errors from the borrow checker can be dealt with in various ways:
    • Adding an additional { ... } scope can reduce the extent of a value's lifetime.
    • Adding a named local variable for a value extends the value's lifetime to the end of the scope.
    • Temporarily adding multiple local variables can help narrow down what the borrow checker is complaining about.
  • Rust's smart pointer types provide ways around the borrow checker's rules and so are useful for interconnected data structures.
  • However, self-referential data structures remain awkward to deal with in Rust.

1

Note that all bets are off with expressions like m!(value) that involve a macro (Item 28), because that can expand to arbitrary code.

2

The compiler's suggestion doesn't help here, because item is needed on the subsequent line.

3

Cow stands for clone-on-write; a copy of the underlying data is made only if a change (write) needs to be made to it.

4

Dealing with async code is beyond the scope of this book; to understand more about its need for self-referential data structures, see Chapter 8 of Rust for Rustaceans by Jon Gjengset (No Starch Press).