Item 9: Consider using iterator transforms instead of explicit loops

The humble loop has had a long journey of increasing convenience and increasing abstraction. The B language (the precursor to C) had only while (condition) { ... }, but with the arrival of C, the common scenario of iterating through indexes of an array became more convenient with the addition of the for loop:

// C code
int i;
for (i = 0; i < len; i++) {
  Item item = collection[i];
  // body
}

The early versions of C++ further improved convenience and scoping by allowing the loop variable declaration to be embedded in the for statement (this was also adopted by C in C99):

// C++98 code
for (int i = 0; i < len; i++) {
  Item item = collection[i];
  // ...
}

Most modern languages abstract the idea of the loop further: the core function of a loop is often to move to the next item of some container. Tracking the logistics that are required to reach that item (index++ or ++it) is mostly an irrelevant detail. This realization produced two core concepts:

  • Iterators: A type whose purpose is to repeatedly emit the next item of a container, until exhausted1
  • For-each loops: A compact loop expression for iterating over all of the items in a container, binding a loop variable to the item rather than to the details of reaching that item

These concepts allow for loop code that's shorter and (more importantly) clearer about what's intended:

// C++11 code
for (Item& item : collection) {
  // ...
}

Once these concepts were available, they were so obviously powerful that they were quickly retrofitted to those languages that didn't already have them (e.g., for-each loops were added to Java 1.5 and C++11).

Rust includes iterators and for-each–style loops, but it also includes the next step in abstraction: allowing the whole loop to be expressed as an iterator transform (sometimes also referred to as an iterator adaptor). As with Item 3's discussion of Option and Result, this Item will attempt to show how these iterator transforms can be used instead of explicit loops, and will give guidance as to when it's a good idea. In particular, iterator transforms can be more efficient than an explicit loop, because the compiler can skip the bounds checks it might otherwise need to perform.

By the end of this Item, a C-like explicit loop to sum the squares of the first five even items of a vector:

#![allow(unused)]
fn main() {
let values: Vec<u64> = vec![1, 1, 2, 3, 5 /* ... */];

let mut even_sum_squares = 0;
let mut even_count = 0;
for i in 0..values.len() {
    if values[i] % 2 != 0 {
        continue;
    }
    even_sum_squares += values[i] * values[i];
    even_count += 1;
    if even_count == 5 {
        break;
    }
}
}

should start to feel more natural expressed as a functional-style expression:

let even_sum_squares: u64 = values
    .iter()
    .filter(|x| *x % 2 == 0)
    .take(5)
    .map(|x| x * x)
    .sum();

Iterator transformation expressions like this can roughly be broken down into three parts:

  • An initial source iterator, from an instance of a type that implements one of Rust's iterator traits
  • A sequence of iterator transforms
  • A final consumer method to combine the results of the iteration into a final value

The first two of these parts effectively move functionality out of the loop body and into the for expression; the last removes the need for the for statement altogether.

Iterator Traits

The core Iterator trait has a very simple interface: a single method next that yields Some items until it doesn't (None). The type of the emitted items is given by the trait's associated Item type.

Collections that allow iteration over their contents—what would be called iterables in other languages—implement the IntoIterator trait; the into_iter method of this trait consumes Self and emits an Iterator in its stead. The compiler will automatically use this trait for expressions of the form:

for item in collection {
    // body
}

effectively converting them to code roughly like:

let mut iter = collection.into_iter();
loop {
    let item: Thing = match iter.next() {
        Some(item) => item,
        None => break,
    };
    // body
}

or more succinctly and more idiomatically:

let mut iter = collection.into_iter();
while let Some(item) = iter.next() {
    // body
}

To keep things running smoothly, there's also an implementation of IntoIterator for any Iterator, which just returns self; after all, it's easy to convert an Iterator into an Iterator!

This initial form is a consuming iterator, using up the collection as it's created:

let collection = vec![Thing(0), Thing(1), Thing(2), Thing(3)];
for item in collection {
    println!("Consumed item {item:?}");
}

Any attempt to use the collection after it's been iterated over fails:

println!("Collection = {collection:?}");
error[E0382]: borrow of moved value: `collection`
   --> src/main.rs:171:28
    |
163 |   let collection = vec![Thing(0), Thing(1), Thing(2), Thing(3)];
    |       ---------- move occurs because `collection` has type `Vec<Thing>`,
    |                  which does not implement the `Copy` trait
164 |   for item in collection {
    |               ---------- `collection` moved due to this implicit call to
    |                           `.into_iter()`
...
171 |   println!("Collection = {collection:?}");
    |                          ^^^^^^^^^^^^^^ value borrowed here after move
    |
note: `into_iter` takes ownership of the receiver `self`, which moves
      `collection`

While simple to understand, this all-consuming behavior is often undesired; some kind of borrow of the iterated items is needed.

To ensure that behavior is clear, the examples here use a Thing type that does not implement Copy (Item 10), as that would hide questions of ownership (Item 15)—the compiler would silently make copies everywhere:

#![allow(unused)]
fn main() {
// Deliberately not `Copy`
#[derive(Clone, Debug, Eq, PartialEq)]
struct Thing(u64);

let collection = vec![Thing(0), Thing(1), Thing(2), Thing(3)];
}

If the collection being iterated over is prefixed with &:

for item in &collection {
    println!("{}", item.0);
}
println!("collection still around {collection:?}");

then the Rust compiler will look for an implementation of IntoIterator for the type &Collection. Properly designed collection types will provide such an implementation; this implementation will still consume Self, but now Self is &Collection rather than Collection, and the associated Item type will be a reference &Thing.

This leaves the collection intact after iteration, and the equivalent expanded code is as follows:

let mut iter = (&collection).into_iter();
while let Some(item) = iter.next() {
    println!("{}", item.0);
}

If it makes sense to provide iteration over mutable references,2 then a similar pattern applies for for item in &mut collection: the compiler looks for and uses an implementation of IntoIterator for &mut Collection, with each Item being of type &mut Thing.

By convention, standard containers also provide an iter() method that returns an iterator over references to the underlying item, and an equivalent iter_mut() method, if appropriate, with the same behavior as just described. These methods can be used in for loops but have a more obvious benefit when used as the start of an iterator transformation:

let result: u64 = (&collection).into_iter().map(|thing| thing.0).sum();

becomes:

let result: u64 = collection.iter().map(|thing| thing.0).sum();

Iterator Transforms

The Iterator trait has a single required method (next) but also provides default implementations (Item 13) of a large number of other methods that perform transformations on an iterator.

Some of these transformations affect the overall iteration process:

  • take(n): Restricts an iterator to emitting at most n items.
  • skip(n): Skips over the first n elements of the iterator.
  • step_by(n): Converts an iterator so it emits only every nth item.
  • chain(other): Glues together two iterators, to build a combined iterator that moves through one then the other.
  • cycle(): Converts an iterator that terminates into one that repeats forever, starting at the beginning again whenever it reaches the end. (The iterator must support Clone to allow this.)
  • rev(): Reverses the direction of an iterator. (The iterator must implement the DoubleEndedIterator trait, which has an additional next_back required method.)

Other transformations affect the nature of the Item that's the subject of the Iterator:

  • map(|item| {...}): Repeatedly applies a closure to transform each item in turn. This is the most general version; several of the following entries in this list are convenience variants that could be equivalently implemented as a map.
  • cloned(): Produces a clone of all of the items in the original iterator; this is particularly useful with iterators over &Item references. (This obviously requires the underlying Item type to implement Clone.)
  • copied(): Produces a copy of all of the items in the original iterator; this is particularly useful with iterators over &Item references. (This obviously requires the underlying Item type to implement Copy, but it is likely to be faster than cloned(), if that's the case.)
  • enumerate(): Converts an iterator over items to be an iterator over (usize, Item) pairs, providing an index to the items in the iterator.
  • zip(it): Joins an iterator with a second iterator, to produce a combined iterator that emits pairs of items, one from each of the original iterators, until the shorter of the two iterators is finished.

Yet other transformations perform filtering on the Items being emitted by the Iterator:

  • filter(|item| {...}): Applies a bool-returning closure to each item reference to determine whether it should be passed through.
  • take_while(): Emits an initial subrange of the iterator, based on a predicate. Mirror image of skip_while.
  • skip_while(): Emits a final subrange of the iterator, based on a predicate. Mirror image of take_while.

The flatten() method deals with an iterator whose items are themselves iterators, flattening the result. On its own, this doesn't seem that helpful, but it becomes much more useful when combined with the observation that both Option and Result act as iterators: they produce either zero (for None, Err(e)) or one (for Some(v), Ok(v)) items. This means that flattening a stream of Option/Result values is a simple way to extract just the valid values, ignoring the rest.

Taken as a whole, these methods allow iterators to be transformed so that they produce exactly the sequence of elements that are needed for most situations.

Iterator Consumers

The previous two sections described how to obtain an iterator and how to transform it into exactly the right shape for precise iteration. This precisely targeted iteration could happen as an explicit for-each loop:

let mut even_sum_squares = 0;
for value in values.iter().filter(|x| *x % 2 == 0).take(5) {
    even_sum_squares += value * value;
}

However, the large collection of Iterator methods includes many that allow an iteration to be consumed in a single method call, removing the need for an explicit for loop.

The most general of these methods is for_each(|item| {...}), which runs a closure for each item produced by the Iterator. This can do most of the things that an explicit for loop can do (the exceptions are described in a later section), but its generality also makes it a little awkward to use—the closure needs to use mutable references to external state in order to emit anything:

let mut even_sum_squares = 0;
values
    .iter()
    .filter(|x| *x % 2 == 0)
    .take(5)
    .for_each(|value| {
        // closure needs a mutable reference to state elsewhere
        even_sum_squares += value * value;
    });

However, if the body of the for loop matches one of a number of common patterns, there are more specific iterator-consuming methods that are clearer, shorter, and more idiomatic.

These patterns include shortcuts for building a single value out of the collection:

  • sum(): Sums a collection of numeric values (integers or floats).
  • product(): Multiplies a collection of numeric values.
  • min(): Finds the minimum value of a collection, relative to the Item's Ord implementation (see Item 10).
  • max(): Finds the maximum value of a collection, relative to the Item's Ord implementation (see Item 10).
  • min_by(f): Finds the minimum value of a collection, relative to a user-specified comparison function f.
  • max_by(f): Finds the maximum value of a collection, relative to a user-specified comparison function f.
  • reduce(f): Builds an accumulated value of the Item type by running a closure at each step that takes the value accumulated so far and the current item. This is a more general operation that encompasses the previous methods.
  • fold(f): Builds an accumulated value of an arbitrary type (not just the Iterator::Item type) by running a closure at each step that takes the value accumulated so far and the current item. This is a generalization of reduce.
  • scan(init, f): Builds an accumulated value of an arbitrary type by running a closure at each step that takes a mutable reference to some internal state and the current item. This is a slightly different generalization of reduce.

There are also methods for selecting a single value out of the collection:

  • find(p): Finds the first item that satisfies a predicate.
  • position(p): Also finds the first item satisfying a predicate, but this time it returns the index of the item.
  • nth(n): Returns the nth element of the iterator, if available.

There are methods for testing against every item in the collection:

  • any(p): Indicates whether a predicate is true for any item in the collection.
  • all(p): Indicates whether a predicate is true for all items in the collection.

In either case, iteration will terminate early if the relevant counterexample is found.

There are methods that allow for the possibility of failure in the closures used with each item. In each case, if a closure returns a failure for an item, the iteration is terminated and the operation as a whole returns the first failure:

Finally, there are methods that accumulate all of the iterated items into a new collection. The most important of these is collect(), which can be used to build a new instance of any collection type that implements the FromIterator trait.

The FromIterator trait is implemented for all of the standard library collection types (Vec, HashMap, BTreeSet, etc.), but this ubiquity also means that you often have to use explicit types, because otherwise the compiler can't figure out whether you're trying to assemble (say) a Vec<i32> or HashSet<i32>:

#![allow(unused)]
fn main() {
use std::collections::HashSet;

// Build collections of even numbers.  Type must be specified, because
// the expression is the same for either type.
let myvec: Vec<i32> = (0..10).into_iter().filter(|x| x % 2 == 0).collect();
let h: HashSet<i32> = (0..10).into_iter().filter(|x| x % 2 == 0).collect();
}

This example also illustrates the use of range expressions to generate the initial data to be iterated over.

Other (more obscure) collection-producing methods include the following:

  • unzip(): Divides an iterator of pairs into two collections
  • partition(p): Splits an iterator into two collections based on a predicate that is applied to each item

This Item has touched on a wide selection of Iterator methods, but this is only a subset of the methods available; for more information, consult the iterator documentation or read Chapter 15 of Programming Rust, 2nd edition (O'Reilly), which has extensive coverage of the possibilities.

This rich collection of iterator transformations is there to be used. It produces code that is more idiomatic, more compact, and has clearer intent.

Expressing loops as iterator transformations can also produce code that is more efficient. In the interests of safety, Rust performs bounds checking on access to contiguous containers such as vectors and slices; an attempt to access a value beyond the bounds of the collection triggers a panic rather than an access to invalid data. An old-style loop that accesses container values (e.g., values[i]) might be subject to these runtime checks, whereas an iterator that produces one value after another is already known to be within range.

However, it's also the case that an old-style loop might not be subject to additional bounds checks compared to the equivalent iterator transformation. The Rust compiler and optimizer is very good at analyzing the code surrounding a slice access to determine whether it's safe to skip the bounds checks; Sergey "Shnatsel" Davidoff's 2023 article explores the subtleties involved.

Building Collections from Result Values

The previous section described the use of collect() to build collections from iterators, but collect() also has a particularly helpful feature when dealing with Result values.

Consider an attempt to convert a vector of i64 values into bytes (u8), with the optimistic expectation that they will all fit:

#![allow(unused)]
fn main() {
// In the 2021 edition of Rust, `TryFrom` is in the prelude, so this
// `use` statement is no longer needed.
use std::convert::TryFrom;

let inputs: Vec<i64> = vec![0, 1, 2, 3, 4];
let result: Vec<u8> = inputs
    .into_iter()
    .map(|v| <u8>::try_from(v).unwrap())
    .collect();
}

This works until some unexpected input comes along:

#![allow(unused)]
fn main() {
let inputs: Vec<i64> = vec![0, 1, 2, 3, 4, 512];
}

and causes a runtime failure:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value:
TryFromIntError(())', iterators/src/main.rs:266:36
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Following the advice given in Item 3, we want to keep the Result type in play and use the ? operator to make any failure the problem of the calling code. The obvious modification to emit the Result doesn't really help:

let result: Vec<Result<u8, _>> =
    inputs.into_iter().map(|v| <u8>::try_from(v)).collect();
// Now what?  Still need to iterate to extract results and detect errors.

However, there's an alternative version of collect(), which can assemble a Result holding a Vec, instead of a Vec holding Results.

Forcing use of this version requires the turbofish (::<Result<Vec<_>, _>>):

let result: Vec<u8> = inputs
    .into_iter()
    .map(|v| <u8>::try_from(v))
    .collect::<Result<Vec<_>, _>>()?;

Combining this with the question mark operator gives useful behavior:

  • If the iteration encounters an error value, that error value is emitted to the caller and iteration stops.
  • If no errors are encountered, the remainder of the code can deal with a sensible collection of values of the right type.

Loop Transformation

The aim of this Item is to convince you that many explicit loops can be regarded as something to be converted to iterator transformations. This can feel somewhat unnatural for programmers who aren't used to it, so let's walk through a transformation step by step.

Starting with a very C-like explicit loop to sum the squares of the first five even items of a vector:

let mut even_sum_squares = 0;
let mut even_count = 0;
for i in 0..values.len() {
    if values[i] % 2 != 0 {
        continue;
    }
    even_sum_squares += values[i] * values[i];
    even_count += 1;
    if even_count == 5 {
        break;
    }
}

The first step is to replace vector indexing with direct use of an iterator in a for-each loop:

let mut even_sum_squares = 0;
let mut even_count = 0;
for value in values.iter() {
    if value % 2 != 0 {
        continue;
    }
    even_sum_squares += value * value;
    even_count += 1;
    if even_count == 5 {
        break;
    }
}

An initial arm of the loop that uses continue to skip over some items is naturally expressed as a filter():

let mut even_sum_squares = 0;
let mut even_count = 0;
for value in values.iter().filter(|x| *x % 2 == 0) {
    even_sum_squares += value * value;
    even_count += 1;
    if even_count == 5 {
        break;
    }
}

Next, the early exit from the loop once five even items have been spotted maps to a take(5):

let mut even_sum_squares = 0;
for value in values.iter().filter(|x| *x % 2 == 0).take(5) {
    even_sum_squares += value * value;
}

Every iteration of the loop uses only the item squared, in the value * value combination, which makes it an ideal target for a map():

let mut even_sum_squares = 0;
for val_sqr in values.iter().filter(|x| *x % 2 == 0).take(5).map(|x| x * x)
{
    even_sum_squares += val_sqr;
}

These refactorings of the original loop result in a loop body that's the perfect nail to fit under the hammer of the sum() method:

let even_sum_squares: u64 = values
    .iter()
    .filter(|x| *x % 2 == 0)
    .take(5)
    .map(|x| x * x)
    .sum();

When Explicit Is Better

This Item has highlighted the advantages of iterator transformations, particularly with respect to concision and clarity. So when are iterator transformations not appropriate or idiomatic?

  • If the loop body is large and/or multifunctional, it makes sense to keep it as an explicit body rather than squeezing it into a closure.
  • If the loop body involves error conditions that result in early termination of the surrounding function, these are often best kept explicit—the try_..() methods help only a little. However, collect()'s ability to convert a collection of Result values into a Result holding a collection of values often allows error conditions to still be handled with the ? operator.
  • If performance is vital, an iterator transform that involves a closure should get optimized so that it is just as fast as the equivalent explicit code. But if performance of a core loop is that important, measure different variants and tune appropriately:
    • Be careful to ensure that your measurements reflect real-world performance—the compiler's optimizer can give overoptimistic results on test data (as described in Item 30).
    • The Godbolt compiler explorer is an amazing tool for exploring what the compiler spits out.

Most importantly, don't convert a loop into an iteration transformation if the conversion is forced or awkward. This is a matter of taste to be sure—but be aware that your taste is likely to change as you become more familiar with the functional style.


1

In fact, the iterator can be more general—the idea of emitting next items until completion need not be associated with a container.

2

This method can't be provided if a mutation to the item might invalidate the container's internal guarantees. For example, changing the item's contents in a way that alters its Hash value would invalidate the internal data structures of a HashMap.