Item 1: Use the type system to express your data structures

"who called them programers and not type writers" – @thingskatedid

This Item provides a quick tour of Rust's type system, starting with the fundamental types that the compiler makes available, then moving on to the various ways that values can be combined into data structures.

Rust's enum type then takes a starring role. Although the basic version is equivalent to what other languages provide, the ability to combine enum variants with data fields allows for enhanced flexibility and expressivity.

Fundamental Types

The basics of Rust's type system are pretty familiar to anyone coming from another statically typed programming language (such as C++, Go, or Java). There's a collection of integer types with specific sizes, both signed (i8, i16, i32, i64, i128) and unsigned (u8, u16, u32, u64, u128).

There are also signed (isize) and unsigned (usize) integers whose sizes match the pointer size on the target system. You won't be doing much in the way of converting between pointers and integers with Rust, so that size equivalence isn't really relevant. However, standard collections return their size as a usize (from .len()), so collection indexing means that usize values are quite common. This is obviously fine from a capacity perspective, as there can't be more items in an in-memory collection than there are memory addresses on the system.
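As a minimal sketch of where usize shows up in practice, the following uses a hypothetical helper, last_item, that combines .len() with indexing; both operate on usize values.

```rust
/// Hypothetical helper: returns the last element of a slice, if any.
fn last_item(items: &[u32]) -> Option<u32> {
    let n: usize = items.len(); // `len()` always yields a `usize`
    if n == 0 {
        None
    } else {
        Some(items[n - 1]) // slice indexing takes a `usize`
    }
}

fn main() {
    let v = vec![10, 20, 30];
    assert_eq!(v.len(), 3_usize);
    assert_eq!(last_item(&v), Some(30));
    assert_eq!(last_item(&[]), None);
}
```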

The integral types do give us the first hint that Rust is a stricter world than C++. In Rust, attempting to put a value of a larger integer type (i32) into a variable of a smaller integer type (i16) generates a compile-time error:

let x: i32 = 42;
let y: i16 = x;

error[E0308]: mismatched types
  --> src/main.rs:18:18
   |
18 |     let y: i16 = x;
   |            ---   ^ expected `i16`, found `i32`
   |            |
   |            expected due to this
   |
help: you can convert an `i32` to an `i16` and panic if the converted value
      doesn't fit
   |
18 |     let y: i16 = x.try_into().unwrap();
   |                   ++++++++++++++++++++

This is reassuring: Rust is not going to sit there quietly while the programmer does things that are risky. Although we can see that the values involved in this particular conversion would be just fine, the compiler has to allow for the possibility of values where the conversion is not fine:

let x: i32 = 66_000;
let y: i16 = x; // What would this value be?

The error output also gives an early indication that while Rust has stronger rules, it also has helpful compiler messages that point the way to how to comply with the rules. The suggested solution raises the question of how to handle situations where the conversion would have to alter the value to fit, and we'll have more to say on both error handling (Item 4) and using panic! (Item 18) later.
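One way to handle the narrowing conversion without panicking is to inspect the Result returned by try_from. The sketch below assumes a hypothetical helper, narrow_or, that substitutes a caller-chosen fallback; whether a fallback is appropriate depends on the application.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: narrows an `i32` to an `i16`, returning `fallback`
/// when the value does not fit.
fn narrow_or(x: i32, fallback: i16) -> i16 {
    i16::try_from(x).unwrap_or(fallback)
}

fn main() {
    // A value that fits converts successfully.
    assert_eq!(i16::try_from(42_i32), Ok(42_i16));
    // A value that does not fit yields an error rather than a silently altered number.
    assert!(i16::try_from(66_000_i32).is_err());
    assert_eq!(narrow_or(66_000, i16::MAX), i16::MAX);
    // An explicit `as` cast truncates instead: 66_000 - 65_536 = 464.
    assert_eq!(66_000_i32 as i16, 464);
}
```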

Rust also doesn't allow some things that might appear "safe", such as putting a value from a smaller integer type into a larger integer type:

let x = 42i32; // Integer literal with type suffix
let y: i64 = x;

error[E0308]: mismatched types
  --> src/main.rs:36:18
   |
36 |     let y: i64 = x;
   |            ---   ^ expected `i64`, found `i32`
   |            |
   |            expected due to this
   |
help: you can convert an `i32` to an `i64`
   |
36 |     let y: i64 = x.into();
   |                   +++++++

Here, the suggested solution doesn't raise the specter of error handling, but the conversion does still need to be explicit. We'll discuss type conversions in more detail later (Item 5).
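A short sketch of the explicit but infallible widening conversion follows; add_delta is a hypothetical function used only for illustration.

```rust
/// Hypothetical helper: adds a 32-bit delta to a 64-bit total.
/// The widening from `i32` to `i64` cannot fail, but must still be written out.
fn add_delta(total: i64, delta: i32) -> i64 {
    total + i64::from(delta)
}

fn main() {
    let x = 42_i32;
    let y: i64 = x.into(); // equivalent to `i64::from(x)`
    assert_eq!(y, 42_i64);
    assert_eq!(add_delta(1_000_000_000_000, -1), 999_999_999_999);
}
```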

Continuing with the unsurprising primitive types, Rust has a bool type, floating point types (f32, f64), and a unit type () (like C's void).

More interesting is the char character type, which holds a Unicode value (similar to Go's rune type). Although this is stored as four bytes internally, there are again no silent conversions to or from a 32-bit integer.

This precision in the type system forces you to be explicit about what you're trying to express: a u32 value is different from a char, which in turn is different from a sequence of UTF-8 bytes, which in turn is different from a sequence of arbitrary bytes, and it's up to you to specify exactly which you mean.1 Joel Spolsky's famous blog post on Unicode and character sets can help you understand which you need.
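To make the distinction concrete, here is a small sketch comparing the in-memory size of a char with its UTF-8 encoding; the sizes helper is hypothetical.

```rust
/// Hypothetical helper: (bytes used by a `char` in memory, bytes used in UTF-8).
fn sizes(c: char) -> (usize, usize) {
    (std::mem::size_of::<char>(), c.len_utf8())
}

fn main() {
    let c = '\u{e9}'; // LATIN SMALL LETTER E WITH ACUTE
    assert_eq!(sizes(c), (4, 2));
    assert_eq!(c as u32, 0xE9); // the numeric value, via an explicit cast

    let s = "\u{e9}"; // the same character as a UTF-8 string
    assert_eq!(s.as_bytes(), &[0xC3_u8, 0xA9_u8]); // two bytes of UTF-8
    assert_eq!(s.chars().count(), 1); // but only one `char`
}
```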

Of course, there are helper methods that allow you to convert between these different types, but their signatures force you to handle (or explicitly ignore) the possibility of failure. For example, a Unicode code point can always be represented in 32 bits,2 so 'a' as u32 is allowed, but the other direction is trickier (as there are some u32 values that are not valid Unicode code points):

  • char::from_u32: Returns an Option<char>, forcing the caller to handle the failure case.
  • char::from_u32_unchecked: Makes the assumption of validity but has the potential to result in undefined behavior if that assumption turns out not to be true. The function is marked unsafe as a result, forcing the caller to use unsafe too (Item 16).
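The checked conversion can be sketched as follows. The to_char_lossy helper is hypothetical, and substituting the Unicode replacement character is just one possible way to handle the failure case.

```rust
/// Hypothetical helper: converts a raw `u32` into a `char`, substituting
/// U+FFFD (the replacement character) for values that are not valid.
fn to_char_lossy(raw: u32) -> char {
    char::from_u32(raw).unwrap_or(char::REPLACEMENT_CHARACTER)
}

fn main() {
    assert_eq!('a' as u32, 0x61); // char to u32 always succeeds
    assert_eq!(char::from_u32(0x61), Some('a'));
    assert_eq!(char::from_u32(0xD800), None); // surrogates are not valid `char`s
    assert_eq!(char::from_u32(0x11_0000), None); // beyond the Unicode range
    assert_eq!(to_char_lossy(0xD800), '\u{FFFD}');
}
```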

Aggregate Types

Moving on to aggregate types, Rust has a variety of ways to combine related values. Most of these are familiar equivalents to the aggregation mechanisms available in other languages:

  • Arrays: Hold multiple instances of a single type, where the number of instances is known at compile time. For example, [u32; 4] is four 4-byte integers in a row.
  • Tuples: Hold instances of multiple heterogeneous types, where the number of elements and their types are known at compile time, for example, (WidgetOffset, WidgetSize, WidgetColor). If the types in the tuple aren't distinctive—for example, (i32, i32, &'static str, bool)—it's better to give each element a name and use a struct.
  • Structs: Also hold instances of heterogeneous types known at compile time but allow both the overall type and the individual fields to be referred to by name.
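The three aggregate forms can be sketched side by side. The Widget struct and describe function below are hypothetical, standing in for an indistinct tuple such as (i32, i32, &'static str, bool).

```rust
/// Hypothetical struct that names the fields of an otherwise indistinct tuple.
struct Widget {
    x: i32,
    y: i32,
    label: &'static str,
    visible: bool,
}

/// Named fields make the access sites self-describing.
fn describe(w: &Widget) -> String {
    format!("{} at ({}, {}), visible: {}", w.label, w.x, w.y, w.visible)
}

fn main() {
    // Array: four 4-byte integers in a row.
    let sizes: [u32; 4] = [1, 2, 3, 4];
    assert_eq!(std::mem::size_of_val(&sizes), 16);

    // Tuple: heterogeneous types, accessed by position.
    let pair: (u8, &str) = (7, "seven");
    assert_eq!(pair.0, 7);
    assert_eq!(pair.1, "seven");

    // Struct: heterogeneous types, accessed by name.
    let w = Widget { x: 3, y: 4, label: "ok", visible: true };
    assert_eq!(describe(&w), "ok at (3, 4), visible: true");
}
```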

Rust also includes the tuple struct, which is a crossbreed of a struct and a tuple: there's a name for the overall type but no names for the individual fields—they are referred to by number instead: s.0, s.1, and so on:

/// Struct with two unnamed fields.
struct TextMatch(usize, String);

fn main() {
    // Construct by providing the contents in order.
    let m = TextMatch(12, "needle".to_owned());

    // Access by field number.
    assert_eq!(m.0, 12);
}

enums

This brings us to the jewel in the crown of Rust's type system, the enum. With the basic form of an enum, it's hard to see what there is to get excited about. As with other languages, the enum allows you to specify a set of mutually exclusive values, possibly with a numeric value attached:

#![allow(unused)]
enum HttpResultCode {
    Ok = 200,
    NotFound = 404,
    Teapot = 418,
}

fn main() {
    let code = HttpResultCode::NotFound;
    assert_eq!(code as i32, 404);
}

Because each enum definition creates a distinct type, this can be used to improve readability and maintainability of functions that take bool arguments. Instead of:

print_page(/* both_sides= */ true, /* color= */ false);

a version that uses a pair of enums:

pub enum Sides {
    Both,
    Single,
}

pub enum Output {
    BlackAndWhite,
    Color,
}

pub fn print_page(sides: Sides, color: Output) {
    // ...
}

is more type-safe and easier to read at the point of invocation:

print_page(Sides::Both, Output::BlackAndWhite);

Unlike the bool version, if a library user were to accidentally flip the order of the arguments, the compiler would immediately complain:

error[E0308]: arguments to this function are incorrect
   --> src/main.rs:104:9
    |
104 | print_page(Output::BlackAndWhite, Sides::Single);
    | ^^^^^^^^^^ ---------------------  ------------- expected `enums::Output`,
    |            |                                    found `enums::Sides`
    |            |
    |            expected `enums::Sides`, found `enums::Output`
    |
note: function defined here
   --> src/main.rs:145:12
    |
145 |     pub fn print_page(sides: Sides, color: Output) {
    |            ^^^^^^^^^^ ------------  -------------
help: swap these arguments
    |
104 | print_page(Sides::Single, Output::BlackAndWhite);
    |             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Using the newtype pattern—see Item 6—to wrap a bool also achieves type safety and maintainability; it's generally best to use the newtype pattern if the semantics will always be Boolean, and to use an enum if there's a chance that a new alternative—e.g., Sides::BothAlternateOrientation—could arise in the future.
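For comparison, a newtype-based version might look something like the sketch below. The DoubleSided and ColorOutput wrapper names are hypothetical, and this print_page stand-in returns a description rather than printing so that the result can be checked.

```rust
/// Hypothetical newtype wrappers around `bool`.
pub struct DoubleSided(pub bool);
pub struct ColorOutput(pub bool);

/// Stand-in for `print_page`: returns a description instead of printing.
pub fn print_page(sides: DoubleSided, color: ColorOutput) -> String {
    format!("double-sided: {}, color: {}", sides.0, color.0)
}

fn main() {
    let summary = print_page(DoubleSided(true), ColorOutput(false));
    assert_eq!(summary, "double-sided: true, color: false");
    // Swapping the arguments is rejected at compile time (mismatched types):
    // print_page(ColorOutput(false), DoubleSided(true));
}
```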

The type safety of Rust's enums continues with the match expression:

let msg = match code {
    HttpResultCode::Ok => "Ok",
    HttpResultCode::NotFound => "Not found",
    // forgot to deal with the all-important "I'm a teapot" code
};

error[E0004]: non-exhaustive patterns: `HttpResultCode::Teapot` not covered
  --> src/main.rs:44:21
   |
44 |     let msg = match code {
   |                     ^^^^ pattern `HttpResultCode::Teapot` not covered
   |
note: `HttpResultCode` defined here
  --> src/main.rs:10:5
   |
7  | enum HttpResultCode {
   |      --------------
...
10 |     Teapot = 418,
   |     ^^^^^^ not covered
   = note: the matched value is of type `HttpResultCode`
help: ensure that all possible cases are being handled by adding a match arm
      with a wildcard pattern or an explicit pattern as shown
   |
46 ~         HttpResultCode::NotFound => "Not found",
47 ~         HttpResultCode::Teapot => todo!(),
   |

The compiler forces the programmer to consider all of the possibilities that are represented by the enum,3 even if the result is just to add a default arm _ => {}. (Note that modern C++ compilers can and do warn about missing switch arms for enums as well.)
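A version of the match that satisfies the compiler might look like the following sketch, which handles every variant explicitly rather than falling back to a wildcard arm; the describe function name is hypothetical.

```rust
enum HttpResultCode {
    Ok = 200,
    NotFound = 404,
    Teapot = 418,
}

/// Hypothetical helper: every variant has its own arm, so adding a new
/// variant later makes this function fail to compile until it is handled.
fn describe(code: HttpResultCode) -> &'static str {
    match code {
        HttpResultCode::Ok => "Ok",
        HttpResultCode::NotFound => "Not found",
        HttpResultCode::Teapot => "I'm a teapot",
    }
}

fn main() {
    assert_eq!(describe(HttpResultCode::Ok), "Ok");
    assert_eq!(describe(HttpResultCode::NotFound), "Not found");
    assert_eq!(describe(HttpResultCode::Teapot), "I'm a teapot");
}
```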

enums with Fields

The true power of Rust's enum feature comes from the fact that each variant can have data that comes along with it, making it an aggregate type that acts as an algebraic data type (ADT). This is less familiar to programmers of mainstream languages; in C/C++ terms, it's like a combination of an enum with a union—only type-safe.

This means that the invariants of the program's data structures can be encoded into Rust's type system; states that don't comply with those invariants won't even compile. A well-designed enum makes the creator's intent clear to humans as well as to the compiler:

use std::collections::{HashMap, HashSet};

pub enum SchedulerState {
    Inert,
    Pending(HashSet<Job>),
    Running(HashMap<CpuId, Vec<Job>>),
}

Just from the type definition, it's reasonable to guess that Jobs get queued up in the Pending state until the scheduler is fully active, at which point they're assigned to some per-CPU pool.
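To show how code consuming such an enum might look, here is a sketch with hypothetical stand-in definitions for Job and CpuId (their real definitions are not shown here) and a hypothetical job_count function.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-ins; the real `Job` and `CpuId` types are not shown.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Job(pub u32);
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct CpuId(pub u8);

pub enum SchedulerState {
    Inert,
    Pending(HashSet<Job>),
    Running(HashMap<CpuId, Vec<Job>>),
}

/// Hypothetical helper: counts the jobs held in the current state.
/// Each arm can only see the data that exists in that state.
pub fn job_count(state: &SchedulerState) -> usize {
    match state {
        SchedulerState::Inert => 0,
        SchedulerState::Pending(jobs) => jobs.len(),
        SchedulerState::Running(pools) => pools.values().map(Vec::len).sum(),
    }
}

fn main() {
    let pending: HashSet<Job> = vec![Job(1), Job(2)].into_iter().collect();
    assert_eq!(job_count(&SchedulerState::Pending(pending)), 2);

    let mut pools = HashMap::new();
    pools.insert(CpuId(0), vec![Job(1)]);
    pools.insert(CpuId(1), vec![Job(2), Job(3)]);
    assert_eq!(job_count(&SchedulerState::Running(pools)), 3);

    assert_eq!(job_count(&SchedulerState::Inert), 0);
}
```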

This highlights the central theme of this Item, which is to use Rust's type system to express the concepts that are associated with the design of your software.

A dead giveaway for when this is not happening is a comment that explains when some field or parameter is valid:

pub struct DisplayProps {
    pub x: u32,
    pub y: u32,
    pub monochrome: bool,
    // `fg_color` must be (0, 0, 0) if `monochrome` is true.
    pub fg_color: RgbColor,
}

This is a prime candidate for replacement with an enum holding data:

pub enum Color {
    Monochrome,
    Foreground(RgbColor),
}

pub struct DisplayProps {
    pub x: u32,
    pub y: u32,
    pub color: Color,
}

This small example illustrates a key piece of advice: make invalid states inexpressible in your types. Types that support only valid combinations of values mean that whole classes of errors are rejected by the compiler, leading to smaller and safer code.
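A sketch of how the enum-based DisplayProps might be consumed follows. The RgbColor definition and the effective_fg function are assumptions made for illustration.

```rust
/// Hypothetical definition; the real `RgbColor` type is not shown.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RgbColor(pub u8, pub u8, pub u8);

pub enum Color {
    Monochrome,
    Foreground(RgbColor),
}

pub struct DisplayProps {
    pub x: u32,
    pub y: u32,
    pub color: Color,
}

/// Hypothetical helper: the color to draw with. There is no way to build a
/// monochrome `DisplayProps` that also carries a non-black foreground color.
pub fn effective_fg(props: &DisplayProps) -> RgbColor {
    match props.color {
        Color::Monochrome => RgbColor(0, 0, 0),
        Color::Foreground(rgb) => rgb,
    }
}

fn main() {
    let mono = DisplayProps { x: 0, y: 0, color: Color::Monochrome };
    assert_eq!(effective_fg(&mono), RgbColor(0, 0, 0));

    let red = DisplayProps { x: 1, y: 2, color: Color::Foreground(RgbColor(255, 0, 0)) };
    assert_eq!(effective_fg(&red), RgbColor(255, 0, 0));
}
```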

Ubiquitous enum Types

Returning to the power of the enum, there are two concepts that are so common that Rust's standard library includes built-in enum types to express them; these types are ubiquitous in Rust code.

Option<T>

The first concept is that of an Option: either there's a value of a particular type (Some(T)) or there isn't (None). Always use Option for values that can be absent; never fall back to using sentinel values (-1, nullptr, …) to try to express the same concept in-band.
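As a small illustration, a search function can return Option<usize> instead of a sentinel such as -1; position_of is a hypothetical helper.

```rust
/// Hypothetical helper: index of `needle` in `haystack`, or `None` if absent.
fn position_of(haystack: &[&str], needle: &str) -> Option<usize> {
    haystack.iter().position(|item| *item == needle)
}

fn main() {
    let words = ["alpha", "beta", "gamma"];
    assert_eq!(position_of(&words, "beta"), Some(1));
    assert_eq!(position_of(&words, "delta"), None);

    // The caller has to deal with absence before using the index.
    match position_of(&words, "gamma") {
        Some(idx) => assert_eq!(words[idx], "gamma"),
        None => unreachable!("gamma is in the list"),
    }
}
```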

There is one subtle point to consider, though. If you're dealing with a collection of things, you need to decide whether having zero things in the collection is the same as not having a collection. For most situations, the distinction doesn't arise and you can go ahead and use (say) Vec<Thing>: a count of zero things implies an absence of things.

However, there are rare scenarios where the two cases need to be distinguished with Option<Vec<Thing>>. For example, a cryptographic system might need to distinguish between "payload transported separately" and "empty payload provided". (This is related to the debates around the NULL marker for columns in SQL.)
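The distinction can be sketched with a hypothetical Message type whose payload is an Option<Vec<u8>>; the names here are illustrative only.

```rust
/// Hypothetical message type: `None` means the payload travels separately,
/// while `Some(vec![])` means an empty payload was provided.
struct Message {
    payload: Option<Vec<u8>>,
}

/// Hypothetical helper that tells the three situations apart.
fn describe(msg: &Message) -> &'static str {
    match &msg.payload {
        None => "payload transported separately",
        Some(bytes) if bytes.is_empty() => "empty payload provided",
        Some(_) => "payload provided",
    }
}

fn main() {
    assert_eq!(describe(&Message { payload: None }), "payload transported separately");
    assert_eq!(describe(&Message { payload: Some(vec![]) }), "empty payload provided");
    assert_eq!(describe(&Message { payload: Some(vec![1, 2]) }), "payload provided");
}
```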

Similarly, what's the best choice for a String that might be absent? Does "" or None make more sense to indicate the absence of a value? Either way works, but Option<String> clearly communicates the possibility that this value may be absent.

Result<T, E>

The second common concept arises from error processing: if a function fails, how should that failure be reported? Historically, special sentinel values (e.g., -errno return values from Linux system calls) or global variables (errno for POSIX systems) were used. More recently, languages that support multiple or tuple return values from functions (such as Go) may have a convention of returning a (result, error) pair, assuming the existence of some suitable "zero" value for the result when the error is non-"zero".

In Rust, there's an enum for just this purpose: always encode the result of an operation that might fail as a Result<T, E>. The T type holds the successful result (in the Ok variant), and the E type holds error details (in the Err variant) on failure.

Using the standard type makes the intent of the design clear. It also allows the use of standard transformations (Item 3) and error processing (Item 4), which in turn makes it possible to streamline error handling with the ? operator.
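A minimal sketch of a fallible function that returns a Result and uses the ? operator follows; the area function is hypothetical.

```rust
use std::num::ParseIntError;

/// Hypothetical helper: parses two decimal strings and returns their product.
/// Either parse can fail; `?` returns the error to the caller immediately.
fn area(width: &str, height: &str) -> Result<u32, ParseIntError> {
    let w: u32 = width.trim().parse()?;
    let h: u32 = height.trim().parse()?;
    Ok(w * h)
}

fn main() {
    assert_eq!(area("3", "4"), Ok(12));
    assert!(area("3", "four").is_err());

    // The caller decides what to do in each case.
    match area(" 5 ", "6") {
        Ok(a) => assert_eq!(a, 30),
        Err(e) => panic!("unexpected parse failure: {}", e),
    }
}
```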


1. The situation gets muddier still if the filesystem is involved, since filenames on popular platforms are somewhere in between arbitrary bytes and UTF-8 sequences: see the std::ffi::OsString documentation.

2. Technically, a Unicode scalar value rather than a code point.

3. The need to consider all possibilities also means that adding a new variant to an existing enum in a library is a breaking change (Item 21): library clients will need to change their code to cope with the new variant. If an enum is really just a C-like list of related numerical values, this behavior can be avoided by marking it as a non_exhaustive enum; see Item 21.