Item 10: Consider using iterator transforms instead of explicit loops

The humble loop has had a long journey of increasing convenience and increasing abstraction. The B language (the precursor to C) only had while (condition) { ... }, but with the arrival of C the common scenario of iterating through indexes of an array was made more convenient with the addition of the for loop:

  // C code
  int i;
  for (i = 0; i < len; i++) {
    Item item = collection[i];
    // body
  }

The early versions of C++ improved convenience and scoping further by allowing the loop variable declaration to be embedded in the for statement (and this was also adopted by C in C99):

  // C++98 code
  for (int i = 0; i < len; i++) {
    Item item = collection[i];
    // ...
  }

Most modern languages abstract the idea of the loop further: the core function of a loop is often to move to the next item of some container, and tracking the logistics that are required to reach that item (index++ or ++it) is mostly an irrelevant detail. This realization produced two core concepts:

  • Iterators: a type whose purpose is to repeatedly emit the next item of a container1, until exhausted.
  • For-Each Loops: a compact loop expression for iterating over all of the items in a container, binding a loop variable to the item rather than to the details of reaching that item.

These concepts allow for loop code that's shorter, and (more importantly) clearer about what's intended:

  // C++11 code
  for (Item& item : collection) {
    // ...
  }

Once these concepts were available, they were so obviously powerful that they were quickly retrofitted to those languages that didn't already have them (e.g. for-each loops were added to Java 1.5 and C++11).

Rust includes iterators and for-each style loops, but it also includes the next step in abstraction: allowing the whole loop to be expressed as an iterator transform. As with Item 3's discussion of Option and Result, this Item will attempt to show how these iterator transforms can be used instead of explicit loops, and to give guidance as to when it's a good idea.

By the end of this Item, a C-like explicit loop to sum the squares of the first five even items of a vector:

    let mut even_sum_squares = 0;
    let mut even_count = 0;
    for i in 0..values.len() {
        if values[i] % 2 != 0 {
            continue;
        }
        even_sum_squares += values[i] * values[i];
        even_count += 1;
        if even_count == 5 {
            break;
        }
    }

should start to feel more natural expressed as a functional style expression:

    let even_sum_squares: u64 = values
        .iter()
        .filter(|x| *x % 2 == 0)
        .take(5)
        .map(|x| x * x)
        .sum();

Iterator transformation expressions like this can roughly be broken down into three parts:

  • An initial source iterator, from one of Rust's iterator traits.
  • A sequence of iterator transforms.
  • A final consumer method to combine the results of the iteration into a final value.

The first two of these effectively move functionality out of the loop body and into the for expression; the last removes the need for the for statement altogether.

Iterator Traits

The core Iterator trait has a very simple interface: a single method next that yields Some items until it doesn't (None).

Collections that allow iteration over their contents – called iterables – implement the IntoIterator trait; the into_iter method of this trait consumes Self and emits an Iterator in its stead. The compiler will automatically use this trait for expressions of the form

    for item in collection {
        // body
    }

effectively converting them to code roughly like:

    let mut iter = collection.into_iter();
    loop {
        let item: Thing = match iter.next() {
            Some(item) => item,
            None => break,
        };
        // body
    }

or more succinctly and more idiomatically:

    let mut iter = collection.into_iter();
    while let Some(item) = iter.next() {
        // body
    }

(To keep things running smoothly, there's also an implementation of IntoIterator for any Iterator, which just returns self; after all, it's easy to convert an Iterator into an Iterator!)

This initial form is a consuming iterator, using up the collection as it's created:

    let collection = vec![Thing(0), Thing(1), Thing(2), Thing(3)];
    for item in collection {
        println!("Consumed item {:?}", item);
    }

Any attempt to use the collection after it's been iterated over fails:

    println!("Collection = {:?}", collection);
error[E0382]: borrow of moved value: `collection`
   --> iterators/src/main.rs:156:35
    |
148 |     let collection = vec![Thing(0), Thing(1), Thing(2), Thing(3)];
    |         ---------- move occurs because `collection` has type `Vec<Thing>`, which does not implement the `Copy` trait
149 |     for item in collection {
    |                 ----------
    |                 |
    |                 `collection` moved due to this implicit call to `.into_iter()`
    |                 help: consider borrowing to avoid moving into the for loop: `&collection`
...
156 |     println!("Collection = {:?}", collection);
    |                                   ^^^^^^^^^^ value borrowed here after move
    |
note: this function takes ownership of the receiver `self`, which moves `collection`

While simple to understand, this all-consuming behaviour is often undesired; some kind of borrow of the iterated items is needed.

To ensure that behaviour is clear, the examples here use an Item type that is not Copy (Item 5), as that would hide questions of ownership (Item 15) – the compiler would silently make copies everywhere.

#![allow(unused)]
fn main() {
    // Deliberately not `Copy`
    #[derive(Clone, Debug, Eq, PartialEq)]
    struct Thing(u64);

    let collection = vec![Thing(0), Thing(1), Thing(2), Thing(3)];
}

If the collection being iterated over is prefixed with &:

    for item in &collection {
        println!("{}", item.0);
    }
    println!("collection still around {:?}", collection);

then the Rust compiler will look for an implementation of IntoIterator for the type &Collection. Properly designed collection types will provide such an implementation; this implementation will still consume Self, but now Self is &Collection rather than Collection, and the associated Item type will be a reference &Thing.

This leaves the collection intact after iteration, and the equivalent expanded code is:

    let mut iter = (&collection).into_iter();
    while let Some(item) = iter.next() {
        println!("{}", item.0);
    }

If it makes sense to provide iteration over mutable references2, then a similar pattern applies for for item in &mut collection: the compiler looks for and uses an implementation of IntoIterator for &mut Collection, with each Item being of type &mut Thing.

By convention, standard containers also provide an iter() method that returns an iterator over references to the underlying item, and equivalent an iter_mut() method if appropriate, with the same behaviour as just described. These methods can be used in for loops, but have a more obvious benefit when used as the start of an iterator transformation:

    let result: u64 = (&collection).into_iter().map(|thing| thing.0).sum();

becomes:

    let result: u64 = collection.iter().map(|thing| thing.0).sum();

Iterator Transforms

The Iterator trait has a single required method (next), but also provides default implementations (Item 13) of a large number of other methods that perform transformations on an iterator.

Some of these tranformations affect the overall iteration process:

  • take(n) restricts an iterator to emitting at most n items.
  • skip(n) skips over the first n elements of the iterator.
  • step_by(n) converts an iterator so it only emits every n-th item.
  • chain(other) glues together two iterators, to build a combined iterator that moves through one then the other.
  • cycle() converts an iterator that terminates into one that repeats forever, starting at the beginning again whenever it reaches the end. (The iterator must support Clone to allow this.)
  • rev() reverses the direction of an iterator. (The iterator must implement the DoubleEndedIterator trait, which has an additional next_back required method.)

Other transformations affect the nature of the Item that's the subject of the Iterator:

  • map(|item| {...}) is the most general version, repeatedly applying a closure to transform each item in turn. Several of the following entries in this list are convenience variants that could be equivalently implemented as a map.
  • cloned() produces a clone of all of the items in the original iterator; this is particularly useful with iterators over &Item references. (This obviously requires the underlying Item type to implement Clone).
  • copied() produces a copy of all of the items in the original iterator; this is particularly useful with iterators over &Item references. (This obviously requires the underlying Item type to implement Copy).
  • enumerate() converts an iterator over items to be an iterator over (usize, Item) pairs, providing an index to the items in the iterator.
  • zip(it) joins an iterator with a second iterator, to produce a combined iterator that emits pairs of items, one from each of the original iterators, until the shorter of the two iterators is finished.

Yet other transformations perform filtering on the Items being emitted by the Iterator:

  • filter(|item| {...}) is the most general version, applying a bool-returning closure to each item reference to determine whether it should be passed through.
  • take_while() and skip_while() are mirror images of each other, emitting either an initial subrange or a final subrange of the iterator, based on a predicate.

The flatten() method deals with an iterator whose items are themselves iterators, flattening the result. On its own, this doesn't seem that helpful, but it becomes much more useful when combined with the observation that both Option and Result act as iterators: they produce either zero (for None, Err(e)) or one (for Some(v), Ok(v)) items. This means that flattening a stream of Option / Result values is a simple way to extract just the valid values, ignoring the rest.

Taken as a whole, these methods allow iterators to be transformed so that they produce exactly the sequence of elements that are needed for most situations.

Iterator Consumers

The previous two sections described how to obtain an iterator, and how to transmute it into exactly the right form for precise iteration. This precisely-targetted iteration could happen as an explicit for-each loop:

    let mut even_sum_squares = 0;
    for value in values.iter().filter(|x| *x % 2 == 0).take(5) {
        even_sum_squares += value * value;
    }

However, the large collection of Iterator methods includes many that allow an iteration to be consumed in a single method call, removing the need for an explicit for loop.

The most general of these methods is for_each(|item| {...}), which runs a closure for each item produced by the Iterator. This can do most of the things that an explicit for loop could do (the exceptions are described below), but its generality also makes it a little awkward to use – the closure needs to use mutable references to external state in order to emit anything:

    let mut even_sum_squares = 0;
    values
        .iter()
        .filter(|x| *x % 2 == 0)
        .take(5)
        .for_each(|value| {
            // closure needs a mutable reference to state elsewhere
            even_sum_squares += value * value;
        });

However, if the body of the for loop matches one of a number of common patterns, there are more specific iterator-consuming methods that are clearer, shorter and more idiomatic.

These patterns include shortcuts for building a single value out of the collection:

  • sum(), for summing a collection of numeric values (integers or floats).
  • product(), for multiplying together a collection of numeric values.
  • min() and max(), for finding the extreme values of a collection, relative to the Item's Ord implementation (see Item 5).
  • min_by(f) and max_by(f), for finding the extreme values of a collection, relative to a user-specified comparison function f.
  • reduce(f) is a more general operation that encompasses the previous methods, building an accumulated value of the Item type by running a closure at each step that takes the value accumulated so far and the current item.
  • fold(f) is a generalization of reduce, allowing the "accumulated value" to be of an arbitrary type (not just the Iterator::Item type).
  • scan(f) generalizes in a slightly different way, giving the closure a mutable reference to some internal state at each step.

There are also methods for selecting a single value out of the collection:

  • find(p) finds the first item that satisfies a predicate.
  • position(p) also finds the first item satisfying a predicate, but this time it returns the index of the item.
  • nth(n) returns the n-th element of the iterator, if available.

There are methods for testing against every item in the collection:

  • any(p) indicates whether a predicate is true for any item in the collection.
  • all(p) indicates whether a predicate is true for all items in the collection.

(In either case, iteration will terminate early if the relevant counter-example is found.)

There are methods that allow for the possibility of failure in the closures used with each item; in each case, if a closure returns a failure for an item, the iteration is terminated and the operation as a whole returns the first failure.

Finally, there are methods that accumulate all of the iterated items into a new collection. The most important of these is collect(), which can be used to build a new instance of any collection type that implements the FromIterator trait.

The FromIterator trait is implemented for all of the standard library collection types (Vec, HashMap, BTreeSet etc.) but this ubiquity also means that you often have to use explicit types, because otherwise the compiler can't figure out whether you're trying to assemble (say) a Vec<i32> or HashSet<i32>.

#![allow(unused)]
fn main() {
    use std::collections::HashSet;

    // Build collections of even numbers.  Type must be specified, because
    // the expression is the same for either type.
    let myvec: Vec<i32> = (0..10).into_iter().filter(|x| x % 2 == 0).collect();
    let h: HashSet<i32> = (0..10).into_iter().filter(|x| x % 2 == 0).collect();
}

(As an aside, this example also illustrates the use of range expressions to generate the initial data to be iterated over.)

Other (more obscure) collection-producing methods include:

  • unzip(), which divides an iterator of pairs into two collections.
  • partition(p), which splits an iterator into two collections based on a predicate that is applied to each item.

This Item has touched on a wide selection of Iterator methods, but this is only a subset of the methods available; for more information, consult the iterator documentation or read Chapter 15 of Programming Rust (2nd edition), which has extensive coverage of the possibilities.

This rich collection of iterator transformations is meant to be used – to produce code that is more idiomatic, more compact, and with clearer intent.

Building Collections From Result Values

The previous section described the use of collect() to build collections from iterators, but collect() also has a particularly helpful feature when dealing with Result values.

Consider an attempt to convert a vector of i64 values into bytes (u8), with the optimistic expectation that they will all fit:

    use std::convert::TryFrom;
    let inputs: Vec<i64> = vec![0, 1, 2, 3, 4];
    let result: Vec<u8> = inputs
        .into_iter()
        .map(|v| <u8>::try_from(v).unwrap())
        .collect();

This works until some unexpected input comes along:

    let inputs: Vec<i64> = vec![0, 1, 2, 3, 4, 512];

and causes a run-time failure:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: TryFromIntError(())', iterators/src/main.rs:249:36
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Following the advice of Item 3, we want to keep the Result type in play, and use the ? operator to make any failure the problem of the calling code. The obvious modification to emit the Result doesn't really help:

    let result: Vec<Result<u8, _>> =
        inputs.into_iter().map(|v| <u8>::try_from(v)).collect();
    // Now what?  Still need to iterate to extract results and detect errors.

However, there's an alternative version of collect(), which can assemble a Result holding a Vec, instead of a Vec holding Results.

Forcing use of this version requires the turbofish (::<Result<Vec<_>, _>>):

    let result: Vec<u8> = inputs
        .into_iter()
        .map(|v| <u8>::try_from(v))
        .collect::<Result<Vec<_>, _>>()?;

Combining this with the question mark operator gives useful behaviour:

  • If the iteration encounters an error value, that error value is emitted to the caller and iteration stops.
  • If no errors are encountered, the remainder of the code can deal with a sensible collection of values of the right type.

Loop Transformation

The aim of this Item is to convince you that many explicit loops can be regarded as something to be converted to iterator transformations. This can feel somewhat unnatural for programmers who aren't used to it, so let's walk through a transformation step by step.

Starting with a very C-like explicit loop to sum the squares of the first five even items of a vector:

    let mut even_sum_squares = 0;
    let mut even_count = 0;
    for i in 0..values.len() {
        if values[i] % 2 != 0 {
            continue;
        }
        even_sum_squares += values[i] * values[i];
        even_count += 1;
        if even_count == 5 {
            break;
        }
    }

The first step is to replace vector indexing with direct use of an iterator in a for-each loop:

    let mut even_sum_squares = 0;
    let mut even_count = 0;
    for value in values.iter() {
        if value % 2 != 0 {
            continue;
        }
        even_sum_squares += value * value;
        even_count += 1;
        if even_count == 5 {
            break;
        }
    }

An initial arm of the loop that uses continue to skip over some items is naturally expressed as a filter():

    let mut even_sum_squares = 0;
    let mut even_count = 0;
    for value in values.iter().filter(|x| *x % 2 == 0) {
        even_sum_squares += value * value;
        even_count += 1;
        if even_count == 5 {
            break;
        }
    }

Next, the early exit from the loop once 5 even items have been spotted maps to a take(5):

    let mut even_sum_squares = 0;
    for value in values.iter().filter(|x| *x % 2 == 0).take(5) {
        even_sum_squares += value * value;
    }

The value of the item is never used directly, only in the value * value combination, which makes it an ideal target for a map():

    let mut even_sum_squares = 0;
    for val_sqr in values.iter().filter(|x| *x % 2 == 0).take(5).map(|x| x * x)
    {
        even_sum_squares += val_sqr;
    }

These refactorings of the original loop have resulting in a loop body that's the perfect nail to fit under the hammer of the sum() method:

    let even_sum_squares: u64 = values
        .iter()
        .filter(|x| *x % 2 == 0)
        .take(5)
        .map(|x| x * x)
        .sum();

When Explicit is Better

This Item has highlighted the advantages of iterator transformations, particularly with respect to concision and clarity. So when are iterator transformations not appropriate or idiomatic?

  • If the loop body is large and/or multi-functional, it makes sense to keep it as an explicit body rather than squeezing it into a closure.
  • If the loop body involves error conditions that result in early termination of the surrounding function, these are often best kept explicit – the try_..() methods only help a little. However, collect()'s ability to convert a collection of Result values into a Result holding a collection of values often allows error conditions to still be handled with the ? operator.
  • If performance is vital, an iterator transform that involves a closure should get optimized so that it is just as fast as the equivalent explicit code. But if performance of a core loop is that important, measure different variants and tune appropriately.
    • Be careful to ensure that your measurements reflect real-world performance – the compiler's optimizer can give over-optimistic results on test data (as described in Item 30).
    • The Godbolt compiler explorer is an amazing tool for exploring what the compiler spits out.

Most importantly, don't convert a loop into an iteration transformation if the conversion is forced or awkward. This is a matter of taste to be sure – but be aware that your taste is likely to change as you become more familiar with the functional style.


1: In fact, the iterator can be more general – the idea of emitting next items until done need not be associated with a container.

2: This method can't be provided if a mutation to the item might invalidate the container's internal guarantees. For example, changing the item's contents in a way that alters its Hash value would invalidate the internal data structures of a HashMap.