Propagating Errors - Federico's Blog

Lately, I have been converting the code in librsvg that handles XML from C to Rust. For many technical reasons, the library still uses libxml2, GNOME's historic XML parsing library, but some of the callbacks to handle XML events like start_element, end_element, characters, are now implemented in Rust. This has meant that I'm running into all the cases where the original C code in librsvg failed to handle errors properly; Rust really makes it obvious when that happens.

In this post I want to talk a bit about propagating errors. You call a function, it returns an error, and then what?

What can fail?

It turns out that this question is highly context-dependent. Let's say a program is starting up and tries to read a configuration file. What could go wrong?

The file doesn't exist. Maybe it is the very first time the program is run, and so there isn't a configuration file at all? Can the program provide a default configuration in this case? Or does it absolutely need a pre-written configuration file to be somewhere?
The file can't be parsed. Should the program warn the user and exit, or should it revert to a default configuration (should it overwrite the file with valid, default values)? Can the program warn the user, or is it a user-less program that at best can just shout into the void of a server-side log file?
The file can be parsed, but the values are invalid. Same questions as the case above.
Etcetera.

At each stage, the code will probably see very low-level errors ("file not found", "I/O error", "parsing failed", "value is out of range"). What the code decides to do, or what it is able to do at any particular stage, depends both on the semantics you want from the program and on the code structure itself.
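As a minimal sketch of one such decision, the code below treats "file not found" as a normal first-run case and passes every other I/O error up unchanged. The ConfigSource type and the read_config_text() function are hypothetical names made up for this illustration; whether a missing file should really mean "use defaults" depends on the program.

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;

/// Outcome of trying to read the raw configuration text.
/// (Hypothetical type, only for this sketch.)
#[derive(Debug, PartialEq)]
enum ConfigSource {
    /// The file existed and its contents were read.
    File(String),
    /// There was no file; the caller may fall back to defaults.
    Missing,
}

/// Reads the file, mapping "not found" to `ConfigSource::Missing`
/// and returning any other I/O error to the caller as-is.
fn read_config_text(path: &Path) -> Result<ConfigSource, std::io::Error> {
    match fs::read_to_string(path) {
        Ok(text) => Ok(ConfigSource::File(text)),
        Err(e) if e.kind() == ErrorKind::NotFound => Ok(ConfigSource::Missing),
        Err(e) => Err(e),
    }
}
```

The caller still has to decide what Missing means for it: write out a default file, run with built-in defaults, or refuse to start.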

Structuring the problem

This is an easy, but very coarse way of handling things:

gboolean
read_configuration (const char *config_file_name)
{
    /* open the file */

    /* parse it */

    /* set global variables to the configuration values */

    /* return true if success, or false if failure */
}

What is bad about this? Let's see:

The calling code just gets a success/failure condition. In the case of failure, it doesn't get to know why things failed.
If the function sets global variables with configuration values as they get read... and something goes wrong and the function returns an error... the caller ends up possibly in an inconsistent state, with a set of configuration variables that are only halfway-set.
If the function finds parse errors, well, do you really want to call UI code from inside it? The caller might be a better place to make that decision.

A slightly better structure

Let's add an enumeration to indicate the possible errors, and a structure of configuration values.

enum ConfigError {
    ConfigFileDoesntExist,
    ParseError, // config file has bad syntax or something
    ValueError, // config file has an invalid value
}

struct ConfigValues {
    // a bunch of fields here with the program's configuration
}

fn read_configuration(filename: &Path) -> Result<ConfigValues, ConfigError> {
    // open the file, or return Err(ConfigError::ConfigFileDoesntExist)

    // parse the file; or return Err(ConfigError::ParseError)

    // validate the values, or return Err(ConfigError::ValueError)

    // if everything succeeds, return Ok(ConfigValues)
}

This is better, in that the caller decides what to do with the validated ConfigValues: maybe it can just copy them to the program's global variables for configuration.

However, this scheme doesn't give the caller all the information it would like to present a really good error message. For example, the caller will get to know if there is a parse error, but it doesn't know specifically what failed during parsing. Similarly, it will just get to know if there was an invalid value, but not which one.
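To make the limitation concrete, here is a sketch of what a caller could do with the flat ConfigError. The handle() function, the window_width field, and the policy it encodes (defaults for a missing file, a message otherwise) are assumptions for illustration only.

```rust
#[derive(Debug, PartialEq)]
enum ConfigError {
    ConfigFileDoesntExist,
    ParseError,
    ValueError,
}

#[derive(Debug, Default, PartialEq)]
struct ConfigValues {
    // Hypothetical field, just to have something to look at.
    window_width: u32,
}

/// One possible caller policy: use defaults when the file is missing,
/// and turn the other errors into a message for the user.
fn handle(result: Result<ConfigValues, ConfigError>) -> Result<ConfigValues, String> {
    match result {
        Ok(values) => Ok(values),
        Err(ConfigError::ConfigFileDoesntExist) => Ok(ConfigValues::default()),
        // The caller knows *that* parsing failed, but not where or why.
        Err(ConfigError::ParseError) => {
            Err("the configuration file has a syntax error".to_string())
        }
        // Likewise, it cannot say which value was invalid.
        Err(ConfigError::ValueError) => {
            Err("the configuration file has an invalid value".to_string())
        }
    }
}
```

The two error messages are as specific as this caller can get, since the variants carry no data.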

Ah, so the problem is fractal

We could have new structs to represent the little errors, and then make them part of the original error enum:

struct ParseError {
    line: usize,
    column: usize,
    error_reason: String,
}

struct ValueError {
    config_key: String,
    error_reason: String,
}

enum ConfigError {
    ConfigFileDoesntExist,
    ParseError(ParseError), // we put those structs in here
    ValueError(ValueError),
}

Is that enough? It depends.

The ParseError and ValueError structs have individual error_reason fields, which are strings. Presumably, one could have a ParseError with error_reason = "unexpected token", or a ValueError with error_reason = "cannot be a negative number".

One problem with this is that if the low-level errors come with error messages in English, then the caller has to know how to localize them to the user's language. Also, if they don't have a machine-readable error code, then the calling code may not have enough information to decide what to do with the error.

Let's say we had a ParseErrorKind enum with variants like UnexpectedToken, EndOfFile, etc. This is fine; it lets the calling code know the reason for the error. Also, there can be a gimme_localized_error_message() method for that particular type of error.

enum ParseErrorKind {
    UnexpectedToken,
    EndOfFile,
    MissingComma,
    // ... etc.
}

struct ParseError {
    line: usize,
    column: usize,
    kind: ParseErrorKind,
}

How can we expand this? Maybe the ParseErrorKind::UnexpectedToken variant wants to contain data that indicates which token it got that was wrong, so it would be UnexpectedToken(String) or something similar.
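A sketch of that variant, together with a Display implementation that turns the machine-readable kind into text, could look like the following. The message wording is made up for this example, and it is English-only; a localized version would need a translation lookup that is not shown here.

```rust
use std::fmt;

#[derive(Debug, PartialEq)]
enum ParseErrorKind {
    UnexpectedToken(String), // carries the offending token
    EndOfFile,
    MissingComma,
}

#[derive(Debug, PartialEq)]
struct ParseError {
    line: usize,
    column: usize,
    kind: ParseErrorKind,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Position first, then a message chosen from the error kind.
        write!(f, "{}:{}: ", self.line, self.column)?;
        match &self.kind {
            ParseErrorKind::UnexpectedToken(tok) => write!(f, "unexpected token '{}'", tok),
            ParseErrorKind::EndOfFile => write!(f, "unexpected end of file"),
            ParseErrorKind::MissingComma => write!(f, "missing comma"),
        }
    }
}
```

Calling code can still match on kind to make decisions, and only falls back to to_string() when it needs something to show to a person.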

But is that useful to the calling code? For our example program, which is reading a configuration file... it probably only needs to know if it could parse the file, but maybe it doesn't really need any additional details on the reason for the parse error, other than having something useful to present to the user. Whether it is appropriate to burden the user with the actual details... does the app expect to make it the user's job to fix broken configuration files? Yes for a web server, where the user is a sysadmin; probably not for a random end-user graphical app, where people shouldn't need to write configuration files by hand in the first place (should those have a "Details" section in the error message window? I don't know!).

Maybe the low-level parsing/validation code can emit those detailed errors. But how can we propagate them to something more useful to the upper layers of the code?

Translation and propagation

Maybe our original read_configuration() function can translate the low-level errors into high-level ones:

fn read_configuration(filename: &Path) -> Result<ConfigValues, ConfigError> {
    // open file

    if cannot_open_file {
        return Err(ConfigError::ConfigFileDoesntExist);
    }

    let contents = read_the_file().map_err(|e| ... oops, maybe we need an IoError case, too)?;

    // parse file

    let parsed = parse(contents).map_err(|e| ... translate to a higher-level error)?;

    // validate

    let validated = validate(parsed).map_err(|e| ... translate to a higher-level error)?;

    // yay!
    Ok(ConfigValues::from(validated))
}

Etcetera. It is up to each part of the code to decide what to do with lower-level errors. Can it recover from them? Should it fail the whole operation and return a higher-level error? Should it warn the user right there?
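For reference, here is one way the outline of read_configuration() could be filled in so that it compiles. Everything beyond the outline is an assumption for the sake of the sketch: the one-line "width = <number>" file format, the toy parse() and validate() functions, the width field, and the extra Io variant for I/O errors other than a missing file.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Low-level errors, as the parsing and validation layers report them.
#[derive(Debug, PartialEq)]
struct ParseError { line: usize, column: usize, error_reason: String }

#[derive(Debug, PartialEq)]
struct ValueError { config_key: String, error_reason: String }

// High-level error, as the caller of read_configuration() sees it.
#[derive(Debug)]
enum ConfigError {
    ConfigFileDoesntExist,
    Io(io::Error), // any other I/O failure
    ParseError(ParseError),
    ValueError(ValueError),
}

#[derive(Debug, PartialEq)]
struct ConfigValues { width: i64 }

// Toy parser for a hypothetical format: a single line "width = <number>".
fn parse(contents: &str) -> Result<i64, ParseError> {
    let value = contents.trim().strip_prefix("width =").ok_or_else(|| ParseError {
        line: 1,
        column: 1,
        error_reason: "expected 'width ='".to_string(),
    })?;
    value.trim().parse::<i64>().map_err(|e| ParseError {
        line: 1,
        column: 9,
        error_reason: e.to_string(),
    })
}

// Toy validation: the width must not be negative.
fn validate(width: i64) -> Result<ConfigValues, ValueError> {
    if width < 0 {
        return Err(ValueError {
            config_key: "width".to_string(),
            error_reason: "cannot be a negative number".to_string(),
        });
    }
    Ok(ConfigValues { width })
}

fn read_configuration(filename: &Path) -> Result<ConfigValues, ConfigError> {
    // Translate the I/O error: "not found" gets its own variant.
    let contents = fs::read_to_string(filename).map_err(|e| match e.kind() {
        io::ErrorKind::NotFound => ConfigError::ConfigFileDoesntExist,
        _ => ConfigError::Io(e),
    })?;

    // Wrap each low-level error in the matching high-level variant.
    let parsed = parse(&contents).map_err(ConfigError::ParseError)?;
    let validated = validate(parsed).map_err(ConfigError::ValueError)?;

    Ok(validated)
}
```

Here the translation is just wrapping, so each map_err() takes the enum variant itself as the conversion function. A layer that wants to hide details from its callers would build a coarser error at this point instead.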

Language facilities

C makes it really easy to ignore errors, and pretty hard to present detailed errors like the above. One could mimic what Rust is actually doing with a combination of union, struct, and enum types, but this gets very awkward very fast.

Rust provides these facilities at the language level, and the idioms around Result and error handling are very nice to use. There are even crates like failure that go a long way towards automating error translation, propagation, and conversion to strings for presenting to users.
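Even without extra crates, the standard library covers the basics. The sketch below uses only std: a From implementation lets the ? operator convert a low-level error automatically, Display gives a string for users, and Error::source() keeps the underlying cause reachable. The ConfigError variants and parse_percentage() are hypothetical names for this example.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

#[derive(Debug)]
enum ConfigError {
    // Wraps the standard library's integer-parsing error.
    BadNumber(ParseIntError),
    // Hypothetical validation failure for this sketch.
    OutOfRange(i64),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::BadNumber(e) => write!(f, "not a number: {}", e),
            ConfigError::OutOfRange(n) => write!(f, "{} is out of range", n),
        }
    }
}

impl Error for ConfigError {
    // Expose the lower-level error, if there is one, as the cause.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            ConfigError::BadNumber(e) => Some(e),
            ConfigError::OutOfRange(_) => None,
        }
    }
}

// This conversion is what lets `?` translate the error automatically.
impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::BadNumber(e)
    }
}

fn parse_percentage(text: &str) -> Result<i64, ConfigError> {
    let n: i64 = text.trim().parse()?; // ParseIntError -> ConfigError via From
    if (0..=100).contains(&n) {
        Ok(n)
    } else {
        Err(ConfigError::OutOfRange(n))
    }
}
```

Compared with calling map_err() at every step, the From implementation moves the translation into one place, at the cost of allowing only one conversion per source error type.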

Infinite details

I've been recommending The Error Model to anyone who comes into a discussion of error handling in programming languages. It's a long, detailed, but very enlightening read on recoverable vs. unrecoverable errors, simple error codes vs. exceptions vs. monadic results, the performance/reliability/ease of use of each model... Definitely worth a read.