Item 14: Understand lifetimes
This Item describes Rust's lifetimes, which are a more precise formulation of a concept that existed in previous compiled languages like C and C++ – in practice if not in theory. Lifetimes are a required input for the borrow checker described in Item 15; taken together these features form the heart of Rust's memory safety guarantees.
Introduction to the Stack
Lifetimes are fundamentally related to the stack, so a quick introduction/reminder is in order.
While a program is running, the memory that it uses is divided up into different chunks, sometimes called segments. Some of these chunks are a fixed size, such as the ones that hold the program code or the program's global data, but two of the chunks – the heap and the stack – change size as the program runs. To allow for this, they are typically arranged at opposite ends of the program's virtual memory space, so one can grow downwards and the other can grow upwards (at least until your program runs out of memory and crashes).
Of these two dynamically sized chunks, the stack is used to hold state related to the currently executing function, specifically its parameters, local variables and temporary values, held in a stack frame. When a function f() is called, a new stack frame is added to the stack, beyond where the stack frame for the calling function ends, and the CPU normally updates a register – the stack pointer – to point to the new stack frame.
When the inner function f() returns, the stack pointer is reset to where it was before the call, which will be the caller's stack frame, intact and unmodified.
When the caller invokes a different function g(), the process happens again, which means that the stack frame for g() will re-use the same area of memory that f() previously used.
fn caller() -> u64 {
    let x = 42u64;
    let y = 19u64;
    f(x) + g(y)
}

fn f(f_param: u64) -> u64 {
    let two = 2;
    f_param + two
}

fn g(g_param: u64) -> u64 {
    let arr = [2, 3];
    g_param + arr[1]
}
Of course, this is a dramatically simplified version of what really goes on; putting things on and off the stack takes time and so there are many optimizations for real processors. However, the simplified conceptual picture is enough for understanding the subject of this Item.
Evolution of Lifetimes
The previous section explained how parameters and local variables are stored on the stack, but only ephemerally. Historically, this allowed for some dangerous footguns: what happens if you hold onto a pointer to one of these ephemeral stack values?
Starting back with C, it was perfectly OK to return a pointer to a local variable (although modern compilers will emit a warning for it):
/* C code. */
struct File* open_bugged() {
    struct File f = { open("README.md", O_RDONLY) };
    return &f; // return address of stack object
}
You might get away with this, if you're unlucky and the calling code uses the returned value immediately:
struct File* f = open_bugged();
printf("in caller: file at %p has fd=%d\n", f, f->fd);
in caller: file at 0x7ff7b5ca9408 has fd=3
This is unlucky because it only appears to work. As soon as any other function calls happen, the stack area will be re-used and the memory that used to hold the object will be overwritten:
investigate_file(f);
/* C code. */
void investigate_file(struct File* f) {
    long array[4] = {1, 2, 3, 4}; // put things on the stack
    printf("in function: file at %p has fd=%d\n", f, f->fd);
}
in function: file at 0x7ff7b5ca9408 has fd=1842872565
Trashing the contents of the object has an additional bad effect for this example: the file descriptor corresponding to the open file is lost, and so the program leaks the resource that was held in the data structure.
Moving forwards in time to C++, this problem of losing access to resources was solved by the inclusion of destructors, enabling RAII (cf. Item 11). Now, the things on the stack have the ability to tidy themselves up: if the object holds some kind of resource, the destructor can tidy it up and the C++ compiler guarantees that the destructor of an object on the stack gets called as part of tidying up the stack frame.
// C++ code.
~File() {
    std::cout << "~File(): close fd " << fd << "\n";
    close(fd);
    fd = -1;
}
The caller now gets an (invalid) pointer to an object that's been destroyed and its resources reclaimed:
File* f = open_bugged();
printf("in caller: file at %p has fd=%d\n", f, f->fd);
~File(): close fd 3
in caller: file at 0x7ff7b57ef408 has fd=-1
However, C++ did nothing to help with the problem of dangling pointers: it's still possible to hold on to a pointer to an object that's gone (and whose destructor has been called).
// C++ code.
void investigate_file(File* f) {
    long array[4] = {1, 2, 3, 4}; // put things on the stack
    std::cout << "in function: file at " << f << " has fd=" << f->fd << "\n";
}
in function: file at 0x7ff7b57ef408 has fd=1711145032
As a C/C++ programmer, it's up to you to notice this, and make sure that you don't dereference a pointer that points to something that's gone. Alternatively, if you're an attacker and you find one of these dangling pointers, you're more likely to cackle maniacally and gleefully dereference the pointer on your way to an exploit.
Enter Rust. One of Rust's core attractions is that it fundamentally solves the problem of dangling pointers, immediately solving a large fraction[1] of security problems.
Doing so requires that the concept of lifetimes move from the background (where C/C++ programmers have to just know to watch out for them, without any language support) to the foreground: every type that includes an ampersand & has an associated lifetime ('a), even if the compiler lets you omit mention of it much of the time.
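As a rough illustration of how Rust sidesteps the open_bugged() problem, the sketch below returns the value itself instead of a pointer to a local. The File type, the pretend descriptor number and the name open_fixed are all assumptions made for this example; no real file is opened.

```rust
/// Hypothetical stand-in for the C `struct File`: it only wraps a descriptor number.
struct File {
    fd: i32,
}

impl Drop for File {
    fn drop(&mut self) {
        // A real implementation would close the descriptor here.
        println!("dropping File with fd={}", self.fd);
    }
}

// Returning `&File` to a local would be rejected at compile time, so
// ownership of the value is moved out to the caller instead.
fn open_fixed() -> File {
    File { fd: 3 } // pretend descriptor for illustration only
}

fn main() {
    let f = open_fixed();
    println!("in caller: file has fd={}", f.fd);
} // `f` is dropped here, exactly once
```

The caller ends up owning the File, so there is no window in which a pointer to a dead stack slot can be observed.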
Scope of a Lifetime
The lifetime of an item on the stack is the period where that item is guaranteed to stay in the same place; in other words, this is exactly the period where a reference (pointer) to the item is guaranteed not to become invalid.
This starts at the point where the item is created, and extends to where it is either:
- dropped (Rust's equivalent of object destruction in C++), or
- moved.
(The ubiquity of the latter is sometimes surprising for programmers coming from C/C++: Rust moves items from one place on the stack to another, or from the stack to the heap, or from the heap to the stack, in lots of situations.)
Precisely where an item gets automatically dropped depends on whether an item has a name or not.
Local variables and function parameters have names, and the corresponding item's lifetime starts when the item is created and the name is populated:
- For a local variable: at the let var = ... declaration.
- For a function parameter: as part of setting up the execution frame for the function invocation.
The lifetime for a named item ends when the item is either moved somewhere else, or when the name goes out of scope:
{
    let item1 = Item { contents: 1 }; // `item1` created here
    let item2 = Item { contents: 2 }; // `item2` created here
    println!("item1 = {:?}, item2 = {:?}", item1, item2);
    consuming_fn(item2); // `item2` moved here
} // `item1` dropped here
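One way to observe these endpoints at runtime is to record each drop in a shared log. The following is a minimal sketch: the Tracked type, the Log alias and the drop_order function are hypothetical names introduced for this example, and consuming_fn is given an assumed body.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Shared, single-threaded event log (an assumption of this sketch).
type Log = Rc<RefCell<Vec<String>>>;

struct Tracked {
    name: &'static str,
    log: Log,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn consuming_fn(item: Tracked) {
    item.log.borrow_mut().push(format!("consume {}", item.name));
} // `item` dropped here, inside the callee

fn drop_order() -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        let item1 = Tracked { name: "item1", log: Rc::clone(&log) };
        let item2 = Tracked { name: "item2", log: Rc::clone(&log) };
        consuming_fn(item2); // `item2` moved, so its lifetime in this scope ends
        log.borrow_mut().push(format!("{} still alive", item1.name));
    } // `item1` dropped here
    let events: Vec<String> = log.borrow().clone();
    events
}

fn main() {
    for event in drop_order() {
        println!("{event}");
    }
}
```

The moved item is dropped inside the function that consumed it, while the item that was never moved survives until the closing brace.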
It's also possible to build an item "on the fly", as part of an expression that's then fed into something else. These unnamed temporary items are then dropped when they're no longer needed. One over-simplified but helpful way to think about this is to imagine that each branch of the expression's syntax tree gets expanded to its own block, with temporary variables being inserted by the compiler. For example, an expression like:
let x = f((a + b) * 2);
would be roughly equivalent to:
let x = {
    let temp1 = a + b;
    {
        let temp2 = temp1 * 2;
        f(temp2)
    } // `temp2` dropped here
}; // `temp1` dropped here
By the time execution reaches the semicolon at the end of the line, the temporaries have all been dropped.
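The same logging trick can make the endpoint for a temporary visible. In this sketch (Noisy, label_len and temporary_drop_order are hypothetical names), an unnamed value is built, used for one method call, and dropped before the next statement runs.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<String>>>;

struct Noisy {
    label: &'static str,
    log: Log,
}

impl Noisy {
    fn label_len(&self) -> usize {
        self.label.len()
    }
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.label));
    }
}

fn temporary_drop_order() -> (usize, Vec<String>) {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    // The `Noisy` value never gets a name, so it is a temporary that is
    // dropped at the end of this statement.
    let len = Noisy { label: "temp", log: Rc::clone(&log) }.label_len();
    log.borrow_mut().push("next statement".to_string());
    let events: Vec<String> = log.borrow().clone();
    (len, events)
}

fn main() {
    let (len, events) = temporary_drop_order();
    println!("len = {len}, events = {events:?}");
}
```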
One way to see what the compiler calculates as an item's lifetime is to insert a deliberate error for the borrow checker (Item 15) to detect. For example, hold onto a reference beyond the lifetime's scope:
let r: &Item;
{
    let item = Item { contents: 42 };
    r = &item;
}
println!("r.contents = {}", r.contents);
The error message indicates the exact endpoint of item's lifetime:
error[E0597]: `item` does not live long enough
--> lifetimes/src/main.rs:206:17
|
206 | r = &item;
| ^^^^^ borrowed value does not live long enough
207 | }
| - `item` dropped here while still borrowed
208 | println!("r.contents = {}", r.contents);
| ---------- borrow later used here
Similarly, for an unnamed temporary:
let r: &Item = fn_returning_ref(&mut Item { contents: 42 });
println!("r.contents = {}", r.contents);
the error message shows the endpoint at the end of the expression:
error[E0716]: temporary value dropped while borrowed
--> lifetimes/src/main.rs:236:46
|
236 | let r: &Item = fn_returning_ref(&mut Item { contents: 42 });
| ^^^^^^^^^^^^^^^^^^^^^ - temporary value is freed at the end of this statement
| |
| creates a temporary which is freed while still in use
237 | println!("r.contents = {}", r.contents);
| ---------- borrow later used here
|
= note: consider using a `let` binding to create a longer lived value
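Following the compiler's note, giving the value a name with a let binding extends its lifetime to the end of the enclosing scope. The sketch below assumes a simple body for fn_returning_ref, whose real implementation is not shown in this Item.

```rust
#[derive(Debug)]
struct Item {
    contents: u32,
}

// Assumed body for illustration: hand back a shared reference to the input.
fn fn_returning_ref(item: &mut Item) -> &Item {
    item
}

fn main() {
    // The named binding keeps the `Item` alive until the end of `main`,
    // so the returned reference stays valid for the `println!`.
    let mut item = Item { contents: 42 };
    let r: &Item = fn_returning_ref(&mut item);
    println!("r.contents = {}", r.contents);
}
```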
One final point about the lifetimes of references: if the compiler can prove to itself that there is no use of a reference beyond a certain point in the code, then it treats the endpoint of the reference's lifetime as the last place it's used, rather than the end of the enclosing scope. This feature (known as non-lexical lifetimes) allows the borrow checker to be a little bit more generous:
{
    let mut s: String = "Hello, world".to_string(); // `s` owns the `String`
    let greeting = &mut s[..5]; // mutable reference to `String`
    greeting.make_ascii_uppercase();
    // .. no use of `greeting` after this point
    let r: &str = &s; // immutable reference to `String`
    println!("s = '{}'", r); // s = 'HELLO, world'
} // where the mutable reference `greeting` would naively be dropped
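A second small sketch of the same idea, this time with the shared borrow coming first (nll_demo is a hypothetical name): the shared reference is last used before the vector is modified, so the later mutable use is accepted.

```rust
fn nll_demo() -> Vec<u32> {
    let mut values = vec![1, 2, 3];
    let first = &values[0]; // shared borrow of `values` starts here
    let doubled = *first * 2; // last use of `first`, so the borrow ends here
    values.push(doubled); // mutable use is fine: no live shared borrow remains
    values
}

fn main() {
    println!("{:?}", nll_demo());
}
```

Moving the use of first below the push() call would make the borrow checker reject the code.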
Algebra of Lifetimes
Although lifetimes are ubiquitous when dealing with references in Rust, you don't get to specify them in any detail – there's no way to say "I'm dealing with a lifetime that extends from line 17 to line 32 of ref.rs". (There's one partial exception to this, covered below: 'static.)
Instead, your code refers to lifetimes with arbitrary labels, conventionally 'a, 'b, 'c, …, and the compiler has its own internal, inaccessible representation of what that equates to in the source code.
You don't get to do much with these lifetime labels; the main thing that's possible is to compare one label with another, repeating a label to indicate that two lifetimes are the "same". (Later, we'll see that it's also possible to specify that one lifetime must be bigger than another, when expressed as the lifetime bounds for a generic.)
This algebra of lifetimes is easiest to illustrate with function signatures: if the inputs and outputs of a function deal with references, what's the relationship between their lifetimes?
The most common case is a function that receives a single reference as input, and emits a reference as output. The output reference must have a lifetime, but what can it be? There's only one possibility to choose from: the lifetime of the input, which means that they both share the same label, say 'a:
pub fn first<'a>(data: &'a [Item]) -> Option<&'a Item> {
Because this variant is so common, and because there's (almost) no choice about what the output lifetime can be, Rust has lifetime elision rules that mean you don't have to explicitly write the lifetimes for this case. A more idiomatic version of the same function signature would be:
pub fn first(data: &[Item]) -> Option<&Item> {
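To make the equivalence concrete, here is a minimal sketch with both spellings given an assumed body (delegating to the standard slice method first()); the name first_explicit is introduced only to let the two versions coexist.

```rust
#[derive(Debug, PartialEq)]
pub struct Item {
    pub contents: u32,
}

// Explicit lifetime: the output borrows from `data`.
pub fn first_explicit<'a>(data: &'a [Item]) -> Option<&'a Item> {
    data.first()
}

// Elided form: the compiler fills in the same lifetime relationship.
pub fn first(data: &[Item]) -> Option<&Item> {
    data.first()
}

fn main() {
    let items = vec![Item { contents: 1 }, Item { contents: 2 }];
    assert_eq!(first(&items), first_explicit(&items));
    println!("first = {:?}", first(&items));
}
```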
What if there's more than one choice of input lifetimes to map to an output lifetime? In this case, the compiler can't figure out what to do:
pub fn find(haystack: &[u8], needle: &[u8]) -> Option<&[u8]> {
error[E0106]: missing lifetime specifier
--> lifetimes/src/main.rs:399:59
|
399 | pub fn find(haystack: &[u8], needle: &[u8]) -> Option<&[u8]> {
| ----- ----- ^ expected named lifetime parameter
|
= help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from `haystack` or `needle`
help: consider introducing a named lifetime parameter
|
399 | pub fn find<'a>(haystack: &'a [u8], needle: &'a [u8]) -> Option<&'a [u8]> {
| ++++ ++ ++ ++
A shrewd guess based on the function and parameter names is that the intended lifetime for the output here is expected to match the haystack input:
pub fn find<'a, 'b>(
    haystack: &'a [u8],
    needle: &'b [u8],
) -> Option<&'a [u8]> {
Interestingly, the compiler suggested a different alternative: having both inputs to the function use the same lifetime 'a. For example, a function where this combination of lifetimes might make sense is:
pub fn smaller<'a>(left: &'a Item, right: &'a Item) -> &'a Item {
This appears to imply that the two input lifetimes are the "same", but the scare quotes (here and above) are included to signify that that's not quite what's going on.
The raison d'être of lifetimes is to ensure that references to items don't out-live the items themselves; with this in mind, an output lifetime 'a that's the "same" as an input lifetime 'a just means that the input has to live at least as long as the output.
When there are two input lifetimes 'a that are the "same", that just means that the output lifetime has to be contained within the lifetimes of both of the inputs:
{
    let outer = Item { contents: 7 };
    {
        let inner = Item { contents: 8 };
        {
            let min = smaller(&inner, &outer);
            println!("smaller of {:?} and {:?} is {:?}", inner, outer, min);
        } // `min` dropped
    } // `inner` dropped
} // `outer` dropped
To put it another way, the output lifetime has to be subsumed within the smaller of the lifetimes of the two inputs.
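For completeness, a self-contained version of smaller() might look like the following; the comparison on the contents field is an assumed body, since only the signature is given above.

```rust
#[derive(Debug, PartialEq)]
pub struct Item {
    pub contents: u32,
}

// Either input may be returned, so the output lifetime must fit inside both.
pub fn smaller<'a>(left: &'a Item, right: &'a Item) -> &'a Item {
    if left.contents <= right.contents {
        left
    } else {
        right
    }
}

fn main() {
    let outer = Item { contents: 7 };
    let min_contents;
    {
        let inner = Item { contents: 8 };
        let min = smaller(&inner, &outer);
        // `min` may not escape this block, but a copy of its data can.
        min_contents = min.contents;
    } // `inner` dropped, which ends the usable lifetime of `min`
    println!("smaller contents = {min_contents}");
}
```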
In contrast, if the output lifetime is unrelated to the lifetime of one of the inputs, then there's no requirement for those lifetimes to nest:
{
    let haystack = b"123456789"; // start of lifetime 'a
    let found = {
        let needle = b"234"; // start of lifetime 'b
        find(haystack, needle)
    }; // end of lifetime 'b
    println!("found = {:?}", found); // `found` use within 'a, outside of 'b
} // end of lifetime 'a
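A runnable sketch of find() is shown below. The body is an assumption (a simple window search that treats an empty needle as matching at the start), and the needle is deliberately heap-allocated so that it really is gone before the result is used.

```rust
// The output borrows only from `haystack`; `needle` has an unrelated lifetime.
pub fn find<'a, 'b>(haystack: &'a [u8], needle: &'b [u8]) -> Option<&'a [u8]> {
    if needle.is_empty() {
        // `windows(0)` would panic, so handle the empty needle separately.
        return Some(&haystack[..0]);
    }
    haystack
        .windows(needle.len())
        .find(|window| *window == needle)
}

fn main() {
    let haystack = b"123456789";
    let found = {
        let needle = b"234".to_vec(); // owned buffer, dropped at the end of this block
        find(haystack, &needle)
    };
    // Still valid: the result points into `haystack`, not into `needle`.
    assert_eq!(found, Some(&b"234"[..]));
    println!("found = {:?}", found);
}
```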
Lifetime Elision Rules
In addition to the "one in, one out" elision rule described above, there are two other elision rules that mean that lifetimes can be omitted.
The first occurs when there are no references in the outputs from a function; in this case, each of the input references automatically gets its own lifetime, different from any of the other input parameters.
The second occurs for methods that use a reference to self (either &self or &mut self); in this case, the compiler assumes that any output references take the lifetime of self, as this turns out to be (by far) the most common situation.
Summarizing the elision rules for functions:
- One input, one or more outputs: assume outputs have the "same" lifetime as the input.
  fn f(x: &Item) -> (&Item, &Item)
  // ... is equivalent to ...
  fn f<'a>(x: &'a Item) -> (&'a Item, &'a Item)
- Multiple inputs, no output: assume all the inputs have different lifetimes.
  fn f(x: &Item, y: &Item, z: &Item) -> i32
  // ... is equivalent to ...
  fn f<'a, 'b, 'c>(x: &'a Item, y: &'b Item, z: &'c Item) -> i32
- Multiple inputs including &self, one or more outputs: assume output lifetime(s) are the "same" as &self's lifetime.
  fn f(&self, y: &Item, z: &Item) -> &Thing
  // ... is equivalent to ...
  fn f<'a, 'b, 'c>(&'a self, y: &'b Item, z: &'c Item) -> &'a Thing
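The &self rule can be seen in action with a small sketch. Catalog and first_at_least are hypothetical names invented for this example; the point is that the elided output lifetime is tied to self, so the other reference argument can go away before the result is used.

```rust
pub struct Item {
    pub contents: u32,
}

pub struct Catalog {
    pub items: Vec<Item>,
}

impl Catalog {
    // Elided signature: the output takes the lifetime of `&self`,
    // not the lifetime of `threshold`.
    pub fn first_at_least(&self, threshold: &Item) -> Option<&Item> {
        self.items
            .iter()
            .find(|item| item.contents >= threshold.contents)
    }
}

fn main() {
    let catalog = Catalog {
        items: vec![Item { contents: 3 }, Item { contents: 9 }],
    };
    let found = {
        let threshold = Item { contents: 5 };
        let result = catalog.first_at_least(&threshold);
        result
    }; // `threshold` dropped here; `found` still borrows from `catalog`
    println!("found contents = {:?}", found.map(|item| item.contents));
}
```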
The 'static Lifetime
The previous section described various different possible mappings between the input and output reference lifetimes for a function, but it neglected to cover one special case. What happens if there are no input lifetimes, but the output return value includes a reference anyway?
pub fn the_answer() -> &Item {
error[E0106]: missing lifetime specifier
--> lifetimes/src/main.rs:411:28
|
411 | pub fn the_answer() -> &Item {
| ^ expected named lifetime parameter
|
= help: this function's return type contains a borrowed value, but there is no value for it to be borrowed from
help: consider using the `'static` lifetime
|
411 | pub fn the_answer() -> &'static Item {
| ~~~~~~~~
The only allowed possibility is for the returned reference to have a lifetime that's guaranteed to never go out of scope. This is indicated by the special lifetime 'static, which is also the only lifetime that has a specific name rather than a placeholder label.
pub fn the_answer() -> &'static Item {
The simplest way to get something with the 'static lifetime is to take a reference to a global variable that's been marked as static:
static ANSWER: Item = Item { contents: 42 };
pub fn the_answer() -> &'static Item {
    &ANSWER
}
The Rust compiler guarantees that a static item always has the same address for the entire duration of the program, and never moves. This means that a reference to a static item has a 'static lifetime, logically enough.
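Note that a const global variable does not have the same guarantee: only the value is guaranteed to be the same everywhere, and the compiler is allowed to make as many copies as it likes, wherever the variable is used. These potential copies may be ephemeral, and so won't satisfy the 'static requirements. The compiler can promote some simple constant expressions to static storage, so the following error appears only when that promotion does not apply (for example, when Item implements Drop):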
const ANSWER: Item = Item { contents: 42 };
pub fn the_answer() -> &'static Item {
    &ANSWER
}
error[E0515]: cannot return reference to temporary value
--> lifetimes/src/main.rs:424:9
|
424 | &ANSWER
| ^------
| ||
| |temporary value created here
| returns a reference to data owned by the current function
There's one more possible way to get something with a 'static lifetime. The key promise of 'static is that the lifetime should outlive any other lifetime in the program; a value that's allocated on the heap but never freed also satisfies this constraint.
A normal heap-allocated Box<T> doesn't work for this, because there's no guarantee (as described in the next section) that the item won't get dropped along the way:
{
    let boxed = Box::new(Item { contents: 12 });
    let r: &'static Item = &boxed;
    println!("'static item is {:?}", r);
}
error[E0597]: `boxed` does not live long enough
--> lifetimes/src/main.rs:318:32
|
318 | let r: &'static Item = &boxed;
| ------------- ^^^^^^ borrowed value does not live long enough
| |
| type annotation requires that `boxed` is borrowed for `'static`
319 | println!("'static item is {:?}", r);
320 | }
| - `boxed` dropped here while still borrowed
However, the Box::leak function converts an owned Box<T> to a mutable reference to T. There's no longer an owner for the value, so it can never be dropped – which satisfies the requirements for the 'static lifetime:
{
    let boxed = Box::new(Item { contents: 12 });
    // `leak()` consumes the `Box<T>` and returns `&mut T`.
    let r: &'static Item = Box::leak(boxed);
    println!("'static item is {:?}", r);
} // `boxed` not dropped here, because it was already moved
The inability to drop the item also means that the memory that holds the item can never be reclaimed using safe Rust, possibly leading to a permanent memory leak. Recovering the memory requires unsafe code, which makes this a technique to reserve for special circumstances.
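As a sketch of what that unsafe recovery could look like, the example below rebuilds the Box from the leaked reference with Box::from_raw. The function name leak_and_reclaim is hypothetical, and the safety argument holds only because no other copy of the leaked reference exists in this small example.

```rust
#[derive(Debug)]
struct Item {
    contents: u32,
}

fn leak_and_reclaim() -> u32 {
    let boxed = Box::new(Item { contents: 12 });
    // `leak()` gives up ownership and hands back a `'static` mutable reference.
    let r: &'static mut Item = Box::leak(boxed);
    r.contents += 1;

    let raw: *mut Item = r;
    // SAFETY: `raw` came from `Box::leak` on a `Box<Item>`, it is converted
    // back exactly once, and no other reference to the item is used afterwards.
    let reclaimed: Box<Item> = unsafe { Box::from_raw(raw) };
    reclaimed.contents
} // `reclaimed` dropped here, so the heap memory is freed after all

fn main() {
    println!("contents = {}", leak_and_reclaim());
}
```

In larger programs it is much harder to be sure that no other 'static reference to the leaked value is still in circulation, which is why this is best left to special circumstances.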
Lifetimes and the Heap
The discussion so far has concentrated on the lifetimes of items on the stack, whether function parameters, local variables or temporaries. But what about items on the heap?
The key thing to realize about heap values is that every item has an owner (excepting special cases like the deliberate leaks described in the previous section). For example, a simple Box<T> puts the T value on the heap, with the owner being the variable holding the Box<T>:
{
    let b: Box<Item> = Box::new(Item { contents: 42 });
} // `b` dropped here, so `Item` dropped too.
The owning Box<Item> drops its contents when it goes out of scope, so the lifetime of the Item on the heap is the same as the lifetime of the Box<Item> variable on the stack.
The owner of a value on the heap may itself be on the heap rather than the stack, but then who owns the owner?
{
    let b: Box<Item> = Box::new(Item { contents: 42 });
    let bb: Box<Box<Item>> = Box::new(b); // `b` moved onto heap here
} // `bb` dropped here, so `Box<Item>` dropped too, so `Item` dropped too
The chain of ownership has to end somewhere, and there are only two possibilities:
- The chain ends at a local variable or function parameter – in which case the lifetime of everything in the chain is just the lifetime 'a of that stack variable. When the stack variable goes out of scope, everything in the chain is dropped too.
- The chain ends at a global variable marked as static – in which case the lifetime of everything in the chain is 'static. The static variable never goes out of scope, so nothing in the chain ever gets automatically dropped.
As a result, the lifetimes of items on the heap are fundamentally tied to stack lifetimes.
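The first kind of chain can be checked with a drop counter. This is only a sketch: the DROPS counter and the function name are assumptions, and the Item here is given a Drop implementation purely so that its destruction can be counted.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many `Item`s have been dropped so far (illustration only).
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Item {
    contents: u32,
}

impl Drop for Item {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Returns the number of drops seen inside the scope and after it,
// relative to the count when the function started.
fn drops_inside_and_after_scope() -> (usize, usize) {
    let start = DROPS.load(Ordering::SeqCst);
    let inside;
    {
        let b: Box<Item> = Box::new(Item { contents: 42 });
        let bb: Box<Box<Item>> = Box::new(b); // `b` moved onto the heap
        assert_eq!(bb.contents, 42); // field access auto-derefs through both boxes
        inside = DROPS.load(Ordering::SeqCst) - start;
    } // `bb` goes out of scope: the whole ownership chain is dropped
    let after = DROPS.load(Ordering::SeqCst) - start;
    (inside, after)
}

fn main() {
    println!("{:?}", drops_inside_and_after_scope());
}
```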
Lifetimes in Data Structures
The earlier section on the algebra of lifetimes concentrated on inputs and outputs for functions, but there are similar concerns when references are stored in data structures.
If we try to sneak a reference into a data structure without mentioning an associated lifetime, the compiler brings us up sharply:
pub struct ReferenceHolder {
    pub index: usize,
    pub item: &Item,
}
error[E0106]: missing lifetime specifier
--> lifetimes/src/main.rs:452:19
|
452 | pub item: &Item,
| ^ expected named lifetime parameter
|
help: consider introducing a named lifetime parameter
|
450 ~ pub struct ReferenceHolder<'a> {
451 | pub index: usize,
452 ~ pub item: &'a Item,
|
As usual, the compiler error message tells us what to do. The first part is simple enough: give the reference type an explicit lifetime 'a, because there are no lifetime elision rules when using references in data structures.
The second part is less obvious and has deeper consequences: the data structure itself has to have a lifetime annotation <'a> that matches the lifetime of the reference contained within it:
pub struct ReferenceHolder<'a> {
    pub index: usize,
    pub item: &'a Item,
}
The lifetime annotation for the data structure is infectious: any containing data structure that uses the type also has to acquire a lifetime annotation:
// Annotation includes lifetimes of all fields
pub struct MultiRefHolder<'a, 'b> {
    pub left: ReferenceHolder<'a>,
    pub right: ReferenceHolder<'b>, // Could choose 'a instead here
}
This makes sense: anything that contains a reference, no matter how deeply nested, is only valid for the lifetime of the item referred to. If that item is moved or dropped, then the whole chain of data structures is no longer valid.
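The sketch below puts the two structures to work and shows one practical effect of keeping 'a and 'b separate: a reference pulled out of the left field can outlive the item behind the right field. The total() and left_item() methods and the left_outlives_right function are hypothetical additions for this example.

```rust
#[derive(Debug)]
pub struct Item {
    pub contents: u32,
}

pub struct ReferenceHolder<'a> {
    pub index: usize,
    pub item: &'a Item,
}

pub struct MultiRefHolder<'a, 'b> {
    pub left: ReferenceHolder<'a>,
    pub right: ReferenceHolder<'b>,
}

impl<'a, 'b> MultiRefHolder<'a, 'b> {
    pub fn total(&self) -> u32 {
        self.left.item.contents + self.right.item.contents
    }

    // The result carries lifetime 'a only; it does not depend on 'b.
    pub fn left_item(&self) -> &'a Item {
        self.left.item
    }
}

fn left_outlives_right() -> (u32, u32) {
    let long_lived = Item { contents: 1 };
    let (total, left) = {
        let short_lived = Item { contents: 2 };
        let holder = MultiRefHolder {
            left: ReferenceHolder { index: 0, item: &long_lived },
            right: ReferenceHolder { index: 1, item: &short_lived },
        };
        let total = holder.total();
        let left = holder.left_item();
        (total, left)
    }; // `holder` and `short_lived` dropped here
    (total, left.contents) // `left` is still valid: it refers to `long_lived`
}

fn main() {
    println!("{:?}", left_outlives_right());
}
```

Had both fields shared a single lifetime 'a, that lifetime would have been limited by short_lived, and left could not have escaped the inner block.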
However, this does also mean that data structures involving references are harder to use – the owner of the data structure has to ensure that the lifetimes all line up. As a result, prefer data structures that own their contents where possible, particularly if the code doesn't need to be highly optimized (Item 20). Where that's not possible, the various smart pointer types (e.g. Rc) described in Item 9 can help untangle the lifetime constraints.
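As one possible shape for the smart pointer alternative, the sketch below swaps the reference for an Rc. SharedHolder and make_holder are hypothetical names; the structure needs no lifetime annotation and can be returned from the function that created the item.

```rust
use std::rc::Rc;

#[derive(Debug)]
pub struct Item {
    pub contents: u32,
}

// No `<'a>` needed: the holder shares ownership instead of borrowing.
pub struct SharedHolder {
    pub index: usize,
    pub item: Rc<Item>,
}

fn make_holder() -> SharedHolder {
    let item = Rc::new(Item { contents: 42 });
    SharedHolder {
        index: 0,
        item: Rc::clone(&item),
    }
} // the local `item` handle is dropped, but the holder's `Rc` keeps the `Item` alive

fn main() {
    let holder = make_holder();
    println!("holder {} has {:?}", holder.index, holder.item);
}
```

The trade-off is a heap allocation and a reference count, which is usually acceptable when the code is not performance-critical.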
Lifetime Bounds
Generic code (Item 12) involves some unknown type T, often constrained by a trait bound T: SomeTrait. But what happens if the code that's built around T also involves references?
There are various ways that references can make their way into the generics mix:
- A generic might take a <T>, but include code that deals with references-to-T, for example &T.
- The type T might itself be a reference type, for example &Thing, or some bigger data structure MultiRefHolder<'a, 'b> that includes references.
Regardless of the route by which references arise, any generic data structure needs to propagate their associated lifetimes, as in the previous section.
The main way to allow for this is to specify lifetime bounds, which indicate that either a type (T: 'b) or a specific lifetime ('a: 'b) has to outlive some other lifetime 'b.
For example, consider a type that wraps a reference (somewhat akin to the Ref type returned by RefCell::borrow()):
pub struct Ref<'a, T: 'a>(&'a T);
This generic data structure holds an explicit reference &'a T, as per the first bullet above. But the type T might itself contain references with some lifetime 'b, as per the second bullet above. If T's inherent lifetime 'b were smaller than the exterior lifetime 'a we'd have a potential disaster: the Ref would be holding a reference to a data structure whose own references have gone bad.
To prevent this, we need 'b to be larger than 'a; for named lifetimes this would be written as 'b: 'a, but we need to say that slightly differently, as T: 'a. Roughly translated into words, that says "any references in the type T must have a lifetime that outlives 'a", and that makes Ref safe: if its own lifetime ('a) is still valid, then so are any references hidden inside T.
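A minimal usage sketch for this wrapper follows, with T itself being a reference type (&str). The get() accessor and the wrapped_len function are hypothetical additions. Note that current compilers infer the T: 'a requirement from the &'a T field, so the explicit bound in the struct definition is optional, although writing it out is still accepted.

```rust
pub struct Ref<'a, T: 'a>(&'a T);

impl<'a, T: 'a> Ref<'a, T> {
    pub fn get(&self) -> &'a T {
        self.0
    }
}

// Here `T` is `&str`, a type that itself contains a reference (lifetime 'b).
// The borrow of the local `text` ('a) is shorter than 'b, so `T: 'a` holds.
fn wrapped_len(text: &str) -> usize {
    let wrapped: Ref<'_, &str> = Ref(&text);
    wrapped.get().len()
}

fn main() {
    println!("len = {}", wrapped_len("hello"));
}
```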
Translating lifetime bounds into words also helps with a 'static lifetime bound like T: 'static. This says that "any references in the type T must have a 'static lifetime", which means that T can't have any dynamic references. Any non-reference type T that owns its contents – String, Vec etc. – satisfies this bound, but any type that has an <'a> creeping in does not.
One common place this shows up is when you try to move values between threads with std::thread::spawn. The moved values need to be of types that implement Send (see Item 17), indicating that they're safe to move between threads, but they also need to not contain any dynamic references (the 'static lifetime bound). This makes sense when you realize that a reference to something on the stack now raises the question: which stack? Each thread's stack is independent, and so lifetimes can't be tracked between them.
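A short sketch of satisfying that bound: the closure below takes ownership of a String with move, so it holds no borrowed references and is accepted by std::thread::spawn. The function name spawn_with_owned is hypothetical.

```rust
use std::thread;

fn spawn_with_owned(name: String) -> usize {
    // `move` transfers ownership of `name` into the closure, so the closure
    // contains only owned data and meets the `'static` bound.
    let handle = thread::spawn(move || name.len());
    handle.join().expect("spawned thread panicked")
}

fn main() {
    println!("len = {}", spawn_with_owned("hello".to_string()));
}
```

Capturing a reference to a local String instead (dropping the move keyword and borrowing name from a caller's stack frame) would be rejected, because the new thread might outlive that frame.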
1: For example, the Chromium project estimates that around 70% of its serious security bugs are memory safety problems.