Item 20: Avoid the temptation to over-optimize
"Just because Rust allows you to write super cool non-allocating zero-copy algorithms safely, doesn’t mean every algorithm you write should be super cool, zero-copy and non-allocating." – trentj
Most of the Items in this book are designed to help existing programmers become familiar with Rust and its idioms. This Item, however, is all about a problem that can arise when programmers stray too far in the other direction and become obsessed with exploiting Rust's potential for efficiency—at the expense of usability and maintainability.
Data Structures and Allocation
Like pointers in other languages, Rust's references allow you to reuse data without making copies. Unlike other languages, Rust's rules around reference lifetimes and borrows allow you to reuse data safely. However, complying with the borrow checking rules (Item 15) that make this possible can lead to code that's harder to use.
This is particularly relevant for data structures, where you can choose between allocating a fresh copy of something that's stored in the data structure or including a reference to an existing copy of it.
As an example, consider some code that parses a data stream of bytes, extracting data encoded as type-length-value (TLV) structures where data is transferred in the following format:
- One byte describing the type of the value (stored in the `type_code` field here)¹
- One byte describing the length of the value in bytes (used here to create a slice of the specified length)
- Followed by the specified number of bytes for the value (stored in the `value` field):
```rust
/// A type-length-value (TLV) from a data stream.
#[derive(Clone, Debug)]
pub struct Tlv<'a> {
    pub type_code: u8,
    pub value: &'a [u8],
}

pub type Error = &'static str; // Some local error type.

/// Extract the next TLV from the `input`, also returning the remaining
/// unprocessed data.
pub fn get_next_tlv(input: &[u8]) -> Result<(Tlv, &[u8]), Error> {
    if input.len() < 2 {
        return Err("too short for a TLV");
    }
    // The TL parts of the TLV are one byte each.
    let type_code = input[0];
    let len = input[1] as usize;
    if 2 + len > input.len() {
        return Err("TLV longer than remaining data");
    }
    let tlv = Tlv {
        type_code,
        // Reference the relevant chunk of input data.
        value: &input[2..2 + len],
    };
    Ok((tlv, &input[2 + len..]))
}
```
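To see how the parser behaves in practice, here's a quick sketch that feeds it two back-to-back TLVs (the byte values are made up for illustration; the definitions are repeated so the snippet stands alone):

```rust
/// Definitions repeated from the listing above so this snippet is self-contained.
#[derive(Clone, Debug)]
pub struct Tlv<'a> {
    pub type_code: u8,
    pub value: &'a [u8],
}
pub type Error = &'static str;

pub fn get_next_tlv(input: &[u8]) -> Result<(Tlv, &[u8]), Error> {
    if input.len() < 2 {
        return Err("too short for a TLV");
    }
    let (type_code, len) = (input[0], input[1] as usize);
    if 2 + len > input.len() {
        return Err("TLV longer than remaining data");
    }
    Ok((Tlv { type_code, value: &input[2..2 + len] }, &input[2 + len..]))
}

fn main() {
    // Two TLVs back to back: (type 0x01, len 2) then (type 0x07, len 1).
    let input = [0x01, 0x02, 0xAA, 0xBB, 0x07, 0x01, 0xCC];

    let (tlv, rest) = get_next_tlv(&input).unwrap();
    assert_eq!(tlv.type_code, 0x01);
    assert_eq!(tlv.value, &[0xAA, 0xBB]); // borrows from `input`, no copy

    let (tlv2, rest2) = get_next_tlv(rest).unwrap();
    assert_eq!(tlv2.type_code, 0x07);
    assert_eq!(tlv2.value, &[0xCC]);
    assert!(rest2.is_empty());
}
```

Note that each `Tlv` here is just a view into `input`; nothing is allocated or copied during parsing.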
This `Tlv` data structure is efficient because it holds a reference to the relevant chunk of the input data without copying any of it, and Rust's memory safety guarantees that the reference is always valid. That's perfect for some scenarios, but things become more awkward if something needs to hang onto an instance of the data structure (as discussed in Item 15).

For example, consider a network server that receives messages in the form of TLVs. The received data can be parsed into `Tlv` instances, but the lifetime of those instances will match that of the incoming message, which might be a transient `Vec<u8>` on the heap or might be a buffer somewhere that gets reused for multiple messages.
That induces a problem if the server code ever wants to store an incoming message so that it can be consulted later:
```rust
pub struct NetworkServer<'a> {
    // ...
    /// Most recent max-size message.
    max_size: Option<Tlv<'a>>,
}

/// Message type code for a set-maximum-size message.
const SET_MAX_SIZE: u8 = 0x01;

impl<'a> NetworkServer<'a> {
    pub fn process(&mut self, mut data: &'a [u8]) -> Result<(), Error> {
        while !data.is_empty() {
            let (tlv, rest) = get_next_tlv(data)?;
            match tlv.type_code {
                SET_MAX_SIZE => {
                    // Save off the most recent `SET_MAX_SIZE` message.
                    self.max_size = Some(tlv);
                }
                // (Deal with other message types)
                // ...
                _ => return Err("unknown message type"),
            }
            data = rest; // Process remaining data on next iteration.
        }
        Ok(())
    }
}
```
This code compiles as is but is effectively impossible to use: the lifetime of the `NetworkServer` has to be smaller than the lifetime of any data that gets fed into its `process()` method. That means that a straightforward processing loop:
```rust
let mut server = NetworkServer::default();
while !server.done() {
    // Read data into a fresh vector.
    let data: Vec<u8> = read_data_from_socket();
    if let Err(e) = server.process(&data) {
        log::error!("Failed to process data: {:?}", e);
    }
}
```
fails to compile because the lifetime of the ephemeral data gets attached to the longer-lived server:
```text
error[E0597]: `data` does not live long enough
   --> src/main.rs:375:40
    |
372 |     while !server.done() {
    |            ------------- borrow later used here
373 |         // Read data into a fresh vector.
374 |         let data: Vec<u8> = read_data_from_socket();
    |             ---- binding `data` declared here
375 |         if let Err(e) = server.process(&data) {
    |                                        ^^^^^ borrowed value does not live
    |                                              long enough
...
378 |     }
    |     - `data` dropped here while still borrowed
```
Switching the code so it reuses a longer-lived buffer doesn't help either:
```rust
let mut perma_buffer = [0u8; 256];
let mut server = NetworkServer::default(); // lifetime within `perma_buffer`
while !server.done() {
    // Reuse the same buffer for the next load of data.
    read_data_into_buffer(&mut perma_buffer);
    if let Err(e) = server.process(&perma_buffer) {
        log::error!("Failed to process data: {:?}", e);
    }
}
```
This time, the compiler complains that the code is trying to hang on to a reference while also handing out a mutable reference to the same buffer:
```text
error[E0502]: cannot borrow `perma_buffer` as mutable because it is also
              borrowed as immutable
   --> src/main.rs:353:31
    |
353 |         read_data_into_buffer(&mut perma_buffer);
    |                               ^^^^^^^^^^^^^^^^^ mutable borrow occurs here
354 |         if let Err(e) = server.process(&perma_buffer) {
    |                         -----------------------------
    |                         |              |
    |                         |              immutable borrow occurs here
    |                         immutable borrow later used here
```
The core problem is that the `Tlv` structure references transient data, which is fine for transient processing but is fundamentally incompatible with storing state for later. However, if the `Tlv` data structure is converted to own its contents:
```rust
#[derive(Clone, Debug)]
pub struct Tlv {
    pub type_code: u8,
    pub value: Vec<u8>, // owned heap data
}
```
and the `get_next_tlv()` code is correspondingly tweaked to include an additional call to `.to_vec()`:
```rust
// ...
let tlv = Tlv {
    type_code,
    // Copy the relevant chunk of data to the heap.
    // The length field in the TLV is a single `u8`,
    // so this copies at most 255 bytes.
    value: input[2..2 + len].to_vec(),
};
```
then the server code has a much easier job. The data-owning `Tlv` structure has no lifetime parameter, so the server data structure doesn't need one either, and both variants of the processing loop work fine.
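The key property of the owned variant can be sketched in isolation: a parsed `Tlv` now outlives the buffer it was parsed from, because `.to_vec()` copied the value bytes onto the heap (the byte values here are made up for illustration):

```rust
/// Data-owning variant of the TLV structure, as above.
#[derive(Clone, Debug)]
pub struct Tlv {
    pub type_code: u8,
    pub value: Vec<u8>, // owned heap data
}

fn main() {
    let stored: Tlv;
    {
        // A transient buffer, e.g. freshly read from a socket.
        let data = vec![0x01, 0x02, 0xAA, 0xBB];
        stored = Tlv {
            type_code: data[0],
            // Copy the value bytes onto the heap.
            value: data[2..4].to_vec(),
        };
    } // `data` is dropped here, but `stored` still owns its bytes.
    assert_eq!(stored.type_code, 0x01);
    assert_eq!(stored.value, vec![0xAA, 0xBB]);
}
```

With the borrowing version of `Tlv`, the assignment to `stored` above would fail to compile for exactly the reasons shown in the earlier error messages.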
Who's Afraid of the Big Bad Copy?
One reason why programmers can become overly obsessed with reducing copies is that Rust generally makes copies and allocations explicit. A visible call to a method like `.to_vec()` or `.clone()`, or to a function like `Box::new()`, makes it clear that copying and allocation are occurring. This is in contrast to C++, where it's easy to inadvertently write code that blithely performs allocation under the covers, particularly in a copy constructor or assignment operator.
Making an allocation or copy operation visible rather than hidden isn't a good reason to optimize it away, especially if that happens at the expense of usability. In many situations, it makes more sense to focus on usability first, and fine-tune for optimal efficiency only if performance is genuinely a concern—and if benchmarking (see Item 30) indicates that reducing copies will have a significant impact.
Also, the efficiency of your code is usually important only if it needs to scale up for extensive use. If it turns out that the trade-offs in the code are wrong, and it doesn't cope well when millions of users start to use it—well, that's a nice problem to have.
However, there are a couple of specific points to remember. The first was hidden behind the weasel word *generally* when pointing out that copies are generally visible. The big exception to this is `Copy` types, where the compiler silently makes copies willy-nilly, shifting from move semantics to copy semantics. As such, the advice in Item 10 bears repeating here: don't implement `Copy` unless a bitwise copy is valid and fast. But the converse is true too: do consider implementing `Copy` if a bitwise copy is valid and fast. For example, `enum` types that don't carry additional data are usually easier to use if they derive `Copy`.
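As a small illustration of that last point (the enum name here is invented for the example), deriving `Copy` means values can be passed around freely without moves invalidating the original binding:

```rust
// A plain data-free enum: a bitwise copy is trivially valid and fast,
// so deriving `Copy` makes it much more ergonomic to use.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
enum Direction {
    North,
    South,
    East,
    West,
}

fn opposite(d: Direction) -> Direction {
    match d {
        Direction::North => Direction::South,
        Direction::South => Direction::North,
        Direction::East => Direction::West,
        Direction::West => Direction::East,
    }
}

fn main() {
    let d = Direction::North;
    let flipped = opposite(d); // `d` is copied into the call, not moved...
    assert_eq!(d, Direction::North); // ...so it's still usable here.
    assert_eq!(flipped, Direction::South);
}
```

Without the `Copy` derive, the call to `opposite(d)` would move `d`, and the subsequent use of `d` would be a compile error.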
The second point that might be relevant is the potential trade-off with `no_std` use. Item 33 suggests that it's often possible to write code that's `no_std`-compatible with only minor modifications, and code that avoids allocation altogether makes this more straightforward. However, targeting a `no_std` environment that supports heap allocation (via the `alloc` library, also described in Item 33) may give the best balance of usability and `no_std` support.
References and Smart Pointers
"So very recently, I've consciously tried the experiment of not worrying about the hypothetical perfect code. Instead, I call `.clone()` when I need to, and use `Arc` to get local objects into threads and futures more smoothly. And it feels glorious." – Josh Triplett
Designing a data structure so that it owns its contents can certainly make for better ergonomics, but there are still potential problems if multiple data structures need to make use of the same information. If the data is immutable, then each place having its own copy works fine, but if the information might change (which is very commonly the case), then multiple copies means multiple places that need to be updated, in sync with each other.
Using Rust's smart pointer types helps solve this problem by allowing the design to shift from a single-owner model to a shared-owner model. The `Rc` (for single-threaded code) and `Arc` (for multithreaded code) smart pointers provide the reference counting that supports this shared-ownership model. Continuing with the assumption that mutability is needed, they are typically paired with an inner type that allows interior mutability, independently of Rust's borrow checking rules:

- `RefCell`: for interior mutability in single-threaded code, giving the common `Rc<RefCell<T>>` combination
- `Mutex`: for interior mutability in multithreaded code (as per Item 17), giving the common `Arc<Mutex<T>>` combination
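A minimal sketch of the single-threaded combination shows the shared-ownership model in action: two handles refer to the same mutable data, and a change made through one is visible through the other:

```rust
use std::cell::RefCell;
use std::rc::Rc;

fn main() {
    // Two owners of the same mutable data, in single-threaded code.
    let shared = Rc::new(RefCell::new(vec![1, 2, 3]));
    let alias = Rc::clone(&shared);

    // Mutate through one handle...
    alias.borrow_mut().push(4);

    // ...and the change is visible through the other.
    assert_eq!(*shared.borrow(), vec![1, 2, 3, 4]);
    assert_eq!(Rc::strong_count(&shared), 2);
}
```

The `Arc<Mutex<T>>` variant works the same way, with `lock().unwrap()` in place of `borrow_mut()` and the added ability to cross thread boundaries.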
This transition is covered in more detail in the `GuestRegister` example in Item 15, but the point here is that you don't have to treat Rust's smart pointers as a last resort. It's not an admission of defeat if your design uses smart pointers instead of a complex web of interconnected reference lifetimes; smart pointers can lead to a simpler, more maintainable, and more usable design.
¹ The field can't be named `type` because that's a reserved keyword in Rust. It's possible to work around this restriction by using the raw identifier prefix `r#` (giving a field `r#type: u8`), but it's normally easier just to rename the field.