Item 20: Avoid the temptation to over-optimize

"Just because Rust allows you to write super cool non-allocating zero-copy algorithms safely, doesn’t mean every algorithm you write should be super cool, zero-copy and non-allocating." – trentj

Most of the Items in this book are designed to help existing programmers become familiar with Rust and its idioms. This Item, however, is all about a problem that can arise when programmers stray too far in the other direction and become obsessed with exploiting Rust's potential for efficiency—at the expense of usability and maintainability.

Data Structures and Allocation

Like pointers in other languages, Rust's references allow you to reuse data without making copies. Unlike other languages, Rust's rules around reference lifetimes and borrows allow you to reuse data safely. However, complying with the borrow checking rules (Item 15) that make this possible can lead to code that's harder to use.

This is particularly relevant for data structures, where you can choose between allocating a fresh copy of something that's stored in the data structure and including a reference to an existing copy of it.

As an example, consider some code that parses a data stream of bytes, extracting data encoded as type-length-value (TLV) structures where data is transferred in the following format:

  • One byte describing the type of the value (stored in the type_code field here)1
  • One byte describing the length of the value in bytes (used here to create a slice of the specified length)
  • Followed by the specified number of bytes for the value (stored in the value field):
/// A type-length-value (TLV) from a data stream.
#[derive(Clone, Debug)]
pub struct Tlv<'a> {
    pub type_code: u8,
    pub value: &'a [u8],
}

pub type Error = &'static str; // Some local error type.

/// Extract the next TLV from the `input`, also returning the remaining
/// unprocessed data.
pub fn get_next_tlv(input: &[u8]) -> Result<(Tlv, &[u8]), Error> {
    if input.len() < 2 {
        return Err("too short for a TLV");
    }
    // The TL parts of the TLV are one byte each.
    let type_code = input[0];
    let len = input[1] as usize;
    if 2 + len > input.len() {
        return Err("TLV longer than remaining data");
    }
    let tlv = Tlv {
        type_code,
        // Reference the relevant chunk of input data
        value: &input[2..2 + len],
    };
    Ok((tlv, &input[2 + len..]))
}
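
As a quick usage sketch (the byte values here are invented for illustration), parsing a buffer that holds two TLVs looks like this:

let input = [0x01, 0x03, b'a', b'b', b'c', 0x02, 0x01, 0xff];

// First TLV: type 0x01, length 3, value "abc".
let (tlv, rest) = get_next_tlv(&input).unwrap();
assert_eq!(tlv.type_code, 0x01);
assert_eq!(tlv.value, b"abc"); // borrows from `input`; nothing is copied

// Second TLV: type 0x02, length 1, value 0xff.
let (tlv2, rest) = get_next_tlv(rest).unwrap();
assert_eq!(tlv2.type_code, 0x02);
assert_eq!(tlv2.value, &[0xff]);
assert!(rest.is_empty());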

This Tlv data structure is efficient because it holds a reference to the relevant chunk of the input data, without copying any of the data, and Rust's memory safety ensures that the reference is always valid. That's perfect for some scenarios, but things become more awkward if something needs to hang onto an instance of the data structure (as discussed in Item 15).

For example, consider a network server that is receiving messages in the form of TLVs. The received data can be parsed into Tlv instances, but the lifetime of those instances will match that of the incoming message—which might be a transient Vec<u8> on the heap or might be a buffer somewhere that gets reused for multiple messages.

That causes a problem if the server code ever wants to store an incoming message so that it can be consulted later:

pub struct NetworkServer<'a> {
    // ...
    /// Most recent max-size message.
    max_size: Option<Tlv<'a>>,
}

/// Message type code for a set-maximum-size message.
const SET_MAX_SIZE: u8 = 0x01;

impl<'a> NetworkServer<'a> {
    pub fn process(&mut self, mut data: &'a [u8]) -> Result<(), Error> {
        while !data.is_empty() {
            let (tlv, rest) = get_next_tlv(data)?;
            match tlv.type_code {
                SET_MAX_SIZE => {
                    // Save off the most recent `SET_MAX_SIZE` message.
                    self.max_size = Some(tlv);
                }
                // (Deal with other message types)
                // ...
                _ => return Err("unknown message type"),
            }
            data = rest; // Process remaining data on next iteration.
        }
        Ok(())
    }
}

This code compiles as is but is effectively impossible to use: the lifetime of the NetworkServer has to be smaller than the lifetime of any data that gets fed into its process() method. That means that a straightforward processing loop:

let mut server = NetworkServer::default();
while !server.done() {
    // Read data into a fresh vector.
    let data: Vec<u8> = read_data_from_socket();
    if let Err(e) = server.process(&data) {
        log::error!("Failed to process data: {:?}", e);
    }
}

fails to compile because the lifetime of the ephemeral data gets attached to the longer-lived server:

error[E0597]: `data` does not live long enough
   --> src/main.rs:375:40
    |
372 |     while !server.done() {
    |            ------------- borrow later used here
373 |         // Read data into a fresh vector.
374 |         let data: Vec<u8> = read_data_from_socket();
    |             ---- binding `data` declared here
375 |         if let Err(e) = server.process(&data) {
    |                                        ^^^^^ borrowed value does not live
    |                                              long enough
...
378 |     }
    |     - `data` dropped here while still borrowed

Switching the code so it reuses a longer-lived buffer doesn't help either:

let mut perma_buffer = [0u8; 256];
let mut server = NetworkServer::default(); // lifetime within `perma_buffer`

while !server.done() {
    // Reuse the same buffer for the next load of data.
    read_data_into_buffer(&mut perma_buffer);
    if let Err(e) = server.process(&perma_buffer) {
        log::error!("Failed to process data: {:?}", e);
    }
}

This time, the compiler complains that the code is trying to hang on to a reference while also handing out a mutable reference to the same buffer:

error[E0502]: cannot borrow `perma_buffer` as mutable because it is also
              borrowed as immutable
   --> src/main.rs:353:31
    |
353 |         read_data_into_buffer(&mut perma_buffer);
    |                               ^^^^^^^^^^^^^^^^^ mutable borrow occurs here
354 |         if let Err(e) = server.process(&perma_buffer) {
    |                         -----------------------------
    |                         |              |
    |                         |              immutable borrow occurs here
    |                         immutable borrow later used here

The core problem is that the Tlv structure references transient data—which is fine for transient processing but is fundamentally incompatible with storing state for later. However, if the Tlv data structure is converted to own its contents:

#[derive(Clone, Debug)]
pub struct Tlv {
    pub type_code: u8,
    pub value: Vec<u8>, // owned heap data
}

and the get_next_tlv() code is correspondingly tweaked to include an additional call to .to_vec():

// ...
let tlv = Tlv {
    type_code,
    // Copy the relevant chunk of data to the heap.
    // The length field in the TLV is a single `u8`,
    // so this copies at most 255 bytes.
    value: input[2..2 + len].to_vec(),
};

then the server code has a much easier job. The data-owning Tlv structure has no lifetime parameter, so the server data structure doesn't need one either, and both variants of the processing loop work fine.
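
For concreteness, here's a sketch of the reworked server; it mirrors the earlier code, minus the lifetime parameter:

pub struct NetworkServer {
    // ...
    /// Most recent max-size message.
    max_size: Option<Tlv>,
}

impl NetworkServer {
    pub fn process(&mut self, mut data: &[u8]) -> Result<(), Error> {
        while !data.is_empty() {
            let (tlv, rest) = get_next_tlv(data)?;
            match tlv.type_code {
                SET_MAX_SIZE => {
                    // The `Tlv` now owns its value, so storing it
                    // doesn't borrow from `data`.
                    self.max_size = Some(tlv);
                }
                // (Deal with other message types)
                // ...
                _ => return Err("unknown message type"),
            }
            data = rest; // Process remaining data on next iteration.
        }
        Ok(())
    }
}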

Who's Afraid of the Big Bad Copy?

One reason why programmers can become overly obsessed with reducing copies is that Rust generally makes copies and allocations explicit. A visible call to a method like .to_vec() or .clone(), or to a function like Box::new(), makes it clear that copying and allocation are occurring. This is in contrast to C++, where it's easy to inadvertently write code that blithely performs allocation under the covers, particularly in a copy-constructor or assignment operator.

Making an allocation or copy operation visible rather than hidden isn't a good reason to optimize it away, especially if that happens at the expense of usability. In many situations, it makes more sense to focus on usability first, and fine-tune for optimal efficiency only if performance is genuinely a concern—and if benchmarking (see Item 30) indicates that reducing copies will have a significant impact.

Also, the efficiency of your code is usually important only if it needs to scale up for extensive use. If it turns out that the trade-offs in the code are wrong, and it doesn't cope well when millions of users start to use it—well, that's a nice problem to have.

However, there are a couple of specific points to remember. The first was hidden behind the weasel word generally when pointing out that copies are generally visible. The big exception to this is Copy types, where the compiler silently makes copies willy-nilly, shifting from move semantics to copy semantics. As such, the advice in Item 10 bears repeating here: don't implement Copy unless a bitwise copy is valid and fast. But the converse is true too: do consider implementing Copy if a bitwise copy is valid and fast. For example, enum types that don't carry additional data are usually easier to use if they derive Copy.
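
As a minimal sketch (the enum here is invented for illustration), a payload-free enum that derives Copy stays usable after being passed by value:

/// Hypothetical status code: no payload, so a bitwise copy is valid and fast.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Status {
    Ok,
    Retry,
    Failed,
}

fn log_status(status: Status) {
    println!("status: {:?}", status);
}

let s = Status::Retry;
log_status(s); // copied, not moved...
log_status(s); // ...so `s` is still usable afterwards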

The second point that might be relevant is the potential trade-off with no_std use. Item 33 suggests that it's often possible to write code that's no_std-compatible with only minor modifications, and code that avoids allocation altogether makes this more straightforward. However, targeting a no_std environment that supports heap allocation (via the alloc library, also described in Item 33) may give the best balance of usability and no_std support.
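
For example, the standard pattern for a library crate that needs heap types but not the full standard library looks roughly like this (a sketch, independent of the TLV example):

#![no_std]

// Pull in the `alloc` crate to get heap types without the rest of `std`.
extern crate alloc;

use alloc::vec::Vec;

/// The owning version of `Tlv` works unchanged in this environment.
pub struct Tlv {
    pub type_code: u8,
    pub value: Vec<u8>,
}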

References and Smart Pointers

"So very recently, I've consciously tried the experiment of not worrying about the hypothetical perfect code. Instead, I call .clone() when I need to, and use Arc to get local objects into threads and futures more smoothly.

And it feels glorious." – Josh Triplett

Designing a data structure so that it owns its contents can certainly make for better ergonomics, but there are still potential problems if multiple data structures need to make use of the same information. If the data is immutable, then each place having its own copy works fine, but if the information might change (which is very commonly the case), then multiple copies means multiple places that need to be updated, in sync with each other.

Using Rust's smart pointer types helps solve this problem, by allowing the design to shift from a single-owner model to a shared-owner model. The Rc (for single-threaded code) and Arc (for multithreaded code) smart pointers provide reference counting that supports this shared-ownership model. Continuing with the assumption that mutability is needed, they are typically paired with an inner type that allows interior mutability, enforcing Rust's borrowing rules at runtime rather than at compile time:

  • RefCell: For interior mutability in single-threaded code, giving the common Rc<RefCell<T>> combination
  • Mutex: For interior mutability in multithreaded code (as per Item 17), giving the common Arc<Mutex<T>> combination
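
As a minimal single-threaded sketch (the Config type here is invented for illustration), two owners can observe each other's updates through the Rc<RefCell<T>> combination:

use std::cell::RefCell;
use std::rc::Rc;

#[derive(Debug, Default)]
struct Config {
    max_size: usize,
}

let shared = Rc::new(RefCell::new(Config::default()));
let also_shared = Rc::clone(&shared); // a second owner, not a deep copy

also_shared.borrow_mut().max_size = 4096; // mutate via one handle...
assert_eq!(shared.borrow().max_size, 4096); // ...and observe via the other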

This transition is covered in more detail in the GuestRegister example in Item 15, but the point here is that you don't have to treat Rust's smart pointers as a last resort. It's not an admission of defeat if your design uses smart pointers instead of a complex web of interconnected reference lifetimes—smart pointers can lead to a simpler, more maintainable, and more usable design.


1. The field can't be named type because that's a reserved keyword in Rust. It's possible to work around this restriction by using the raw identifier prefix r# (giving a field r#type: u8), but it's normally easier just to rename the field.