Writing a command-line program in Rust

- Tags: gnome, librsvg, rust

As a library writer, it feels a bit strange, but refreshing, to write a program that actually has a main() function.

My experience with Rust so far has been threefold:

  • Porting chunks of C to Rust for librsvg - this is all work on librsvg's internals and no users are exposed to it directly.

  • Working on gnome-class, the procedural macro ("a little compiler") to generate GObject boilerplate from Rust. This feels like working on the edge of the exotic; it is something that runs in the Rust compiler and spits out code on behalf of the programmer.

  • A few patches to the gtk-rs ecosystem. Again, work on the internals, or something that feels library-like.

But other than toy programs to test things, I haven't written a stand-alone tool until rsvg-bench. It's quite a thrill to be able to just run the thing instead of waiting for other people to write code to use it!

Parsing command-line arguments

There are quite a few Rust crates ("libraries") to parse command-line arguments. I read about structopt via Robert O'Callahan's blog; structopt lets you define a struct to hold the values of your command-line options, and then you annotate the fields in that struct to indicate how they should be parsed from the command line. It works via Rust's procedural macros. Internally it generates stuff for the clap crate, a well-established mechanism for dealing with command-line options.

And it is quite pleasant! This is basically all I needed to do:

use std::path::PathBuf;
use std::process;

use structopt::StructOpt;

#[derive(StructOpt, Debug)]
#[structopt(name = "rsvg-bench", about = "Benchmarking utility for librsvg.")]
struct Opt {
    #[structopt(short = "s",
                long  = "sleep",
                help  = "Number of seconds to sleep before starting to process SVGs",
                default_value = "0")]
    sleep_secs: usize,

    #[structopt(short = "p",
                long  = "num-parse",
                help  = "Number of times to parse each file",
                default_value = "100")]
    num_parse: usize,

    #[structopt(short = "r",
                long  = "num-render",
                help  = "Number of times to render each file",
                default_value = "100")]
    num_render: usize,

    #[structopt(long = "pixbuf",
                help = "Render to a GdkPixbuf instead of a Cairo image surface")]
    render_to_pixbuf: bool,

    #[structopt(help = "Input files or directories",
                parse(from_os_str))]
    inputs: Vec<PathBuf>
}

fn main() {
    let opt = Opt::from_args();

    if opt.inputs.is_empty() {
        eprintln!("No input files or directories specified\n");
        process::exit(1);
    }

    ...
}

Each field in the Opt struct above corresponds to one command-line argument; each field has annotations for structopt to generate the appropriate code to parse each option. For example, the render_to_pixbuf field has a long option name called "pixbuf"; that field will be set to true if the --pixbuf option gets passed to rsvg-bench.
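
To get a feel for what those annotations save, here is a rough sketch of the same options parsed by hand with only the standard library. This is not what structopt or clap actually generate; ManualOpt, parse_count() and parse_args() are hypothetical names made up for this illustration, and there is no --help output or combined short flags.

```rust
use std::path::PathBuf;

// Hypothetical hand-written counterpart of the Opt struct above.
#[derive(Debug, PartialEq)]
struct ManualOpt {
    sleep_secs: usize,
    num_parse: usize,
    num_render: usize,
    render_to_pixbuf: bool,
    inputs: Vec<PathBuf>,
}

// Parse the numeric value that must follow an option such as `-s 5`.
fn parse_count(name: &str, value: Option<String>) -> Result<usize, String> {
    let value = value.ok_or_else(|| format!("missing value for {}", name))?;
    value
        .parse::<usize>()
        .map_err(|_| format!("invalid number for {}: {}", name, value))
}

// Takes the arguments after the program name, e.g. `std::env::args().skip(1)`.
fn parse_args<I>(args: I) -> Result<ManualOpt, String>
where
    I: IntoIterator<Item = String>,
{
    // Same defaults as the `default_value` annotations above.
    let mut opt = ManualOpt {
        sleep_secs: 0,
        num_parse: 100,
        num_render: 100,
        render_to_pixbuf: false,
        inputs: Vec::new(),
    };

    let mut args = args.into_iter();
    while let Some(arg) = args.next() {
        match arg.as_str() {
            "-s" | "--sleep" => opt.sleep_secs = parse_count(&arg, args.next())?,
            "-p" | "--num-parse" => opt.num_parse = parse_count(&arg, args.next())?,
            "-r" | "--num-render" => opt.num_render = parse_count(&arg, args.next())?,
            "--pixbuf" => opt.render_to_pixbuf = true,
            other if other.starts_with('-') => {
                return Err(format!("unknown option {}", other));
            }
            // Anything else is an input file or directory.
            other => opt.inputs.push(PathBuf::from(other)),
        }
    }

    Ok(opt)
}
```

Even this stripped-down version has to repeat every option name and default by hand, which is exactly the bookkeeping that the annotated struct keeps in one place.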

Handling errors

Command-line programs generally have the luxury of being able to just exit as soon as they encounter an error.

In C this is a bit cumbersome since you need to deal with every place that may return an error, find out what to print, and call exit(1) by hand or something. If you miss a single place where an error is returned, your program will keep running with an inconsistent state.

In languages with exception handling, it's a bit easier - a small script can just let exceptions be thrown wherever, and if it catches them at the toplevel, it can just print the exception and abort gracefully. However, these nonlocal jumps make me uncomfortable; I think exceptions are hard to reason about.

Rust makes this easy: it forces you to handle every call that may return an error, but it lets you bubble errors up easily, or handle them in-place, or translate them to a higher-level error.

In the Rust world the failure crate is getting a lot of traction as a convenient, modern way to handle errors.

In rsvg-bench, errors can come from several places:

  • I/O errors when reading files and directories.

  • Errors from librsvg's parsing stage; you get a GError.

  • Errors from the rendering stage. This can be a Cairo error (a cairo_status_t), or a simple "something bad happened; can't render" from librsvg's old convenience API in C. Don't you hate it when C code just gives up and returns NULL or a boolean false, without any further details on what went wrong?

For rsvg-bench, I just needed to be able to represent Cairo errors and generic rendering errors. Everything else, like an io::Error, is automatically wrapped by the failure crate's mechanism. I just needed to do this:

extern crate failure;
#[macro_use]
extern crate failure_derive;

use failure::Error;

#[derive(Debug, Fail)]
enum ProcessingError {
    #[fail(display = "Cairo error: {:?}", status)]
    CairoError {
        status: cairo::Status
    },

    #[fail(display = "Rendering error")]
    RenderingError
}

Whenever the code gets a Cairo error, I can translate it to a ProcessingError::CairoError and bubble it up:

fn render_to_cairo(handle: &rsvg::Handle) -> Result<(), Error> {
    let dim = handle.get_dimensions();
    let surface = cairo::ImageSurface::create(cairo::Format::ARgb32,
                                              dim.width,
                                              dim.height)
        .map_err(|e| ProcessingError::CairoError { status: e })?;

    ...
}

And when librsvg returns a "couldn't render" error, I translate that to a ProcessingError::RenderingError:

fn render_to_cairo(handle: &rsvg::Handle) -> Result<(), Error> {
    ...

    let cr = cairo::Context::new(&surface);

    if handle.render_cairo(&cr) {
        Ok(())
    } else {
        Err(Error::from(ProcessingError::RenderingError))
    }
}

Here, the Ok() case of the Result does not contain any value — it's just (), as the generated images are not stored anywhere: they are just rendered to get some timings, not to be saved or anything.
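
The same shape can be sketched with nothing but the standard library, which may help to see what the failure crate takes care of. This is only an illustration under a few assumptions: Box<dyn Error> stands in for failure::Error, the Cairo status is a plain i32 so the sketch does not depend on the cairo crate, and create_surface(), render() and render_file() are hypothetical stand-ins rather than real librsvg or Cairo calls.

```rust
use std::error::Error;
use std::fmt;
use std::fs;
use std::path::Path;

// Stand-in for the ProcessingError above; `status` is an i32 in this sketch.
#[derive(Debug)]
enum ProcessingError {
    CairoError { status: i32 },
    RenderingError,
}

// Written by hand here; the #[fail(display = ...)] annotations derive this.
impl fmt::Display for ProcessingError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            ProcessingError::CairoError { status } => write!(f, "Cairo error: {:?}", status),
            ProcessingError::RenderingError => write!(f, "Rendering error"),
        }
    }
}

impl Error for ProcessingError {}

// Hypothetical low-level call that fails with a raw status code.
fn create_surface(width: i32, height: i32) -> Result<(), i32> {
    if width > 0 && height > 0 {
        Ok(())
    } else {
        Err(-1)
    }
}

// Hypothetical old-style call that only reports success or failure.
fn render(data: &[u8]) -> bool {
    !data.is_empty()
}

fn render_file(path: &Path, width: i32, height: i32) -> Result<(), Box<dyn Error>> {
    // An io::Error is boxed automatically by `?`.
    let data = fs::read(path)?;

    // A raw status code is translated to a higher-level error, then bubbled up.
    create_surface(width, height).map_err(|status| ProcessingError::CairoError { status })?;

    // A bare boolean failure becomes a RenderingError.
    if render(&data) {
        Ok(())
    } else {
        Err(ProcessingError::RenderingError.into())
    }
}
```

All three kinds of failure end up in the same Result type, so the caller only has one error path to deal with.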

Up to where do errors bubble?

This is the "do everything" function:

fn run(opt: &Opt) -> Result<(), Error> {
    ...

    for path in &opt.inputs {
        process_path(opt, &path)?;
    }

    Ok(())
}

For each path passed in the command line, process it. If the path is a directory, the program scans it recursively; if it is an SVG file, the program loads the file and renders it.
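
The directory-or-file logic can be sketched with the standard library alone. This is not the actual process_path() from rsvg-bench; count_svgs() is a hypothetical stand-in that only counts the SVG files it would have processed, and it assumes files are recognized by an "svg" extension.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical stand-in for process_path(): walks `path` and returns the
// number of files with an "svg" extension that it found.
fn count_svgs(path: &Path) -> io::Result<usize> {
    if path.is_dir() {
        let mut count = 0;
        // Both read_dir() and each directory entry can fail with an io::Error.
        for entry in fs::read_dir(path)? {
            count += count_svgs(&entry?.path())?;
        }
        Ok(count)
    } else if path.extension().and_then(|ext| ext.to_str()) == Some("svg") {
        // The real program would load and render the file here.
        Ok(1)
    } else {
        // Anything else is skipped.
        Ok(0)
    }
}
```

Because every level of the recursion uses ?, an I/O error deep inside a directory tree travels all the way back to the first caller without any extra plumbing.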

Finally, main() just has this:

fn main() {
    let opt = Opt::from_args();

    ...

    match run(&opt) {
        Ok(_) => (),
        Err(e) => {
            eprintln!("{}", e);
            process::exit(1);
        }
    }
}

I.e. process command line arguments, run the whole thing, and print an error if there was one.

I really appreciate that most places that can return an error can just put a ? for the error to bubble up. This is much more legible than in C, where every call must have an if (something_bad_happened) { deal_with_it; } after it... and Rust won't let me get away with ignoring an error, but it makes it easy to actually deal with it properly.

Reading an SVG file quickly

Why, just mmap() it and feed it to librsvg, to avoid buffer copies. This is easy in Rust:

fn process_file<P: AsRef<Path>>(opt: &Opt, path: P) -> Result<(), Error> {
    let file = File::open(path)?;
    let mmap = unsafe { MmapOptions::new().map(&file)? };

    let bytes = &mmap;

    let handle = rsvg::Handle::new_from_data(bytes)?;
    ...
}

Many things can go wrong here:

  • File::open() can return an io::Error.
  • MmapOptions::map() can return an io::Error from the mmap(2) system call, or from the fstat(2) to read the file's size to map it.
  • rsvg::Handle::new_from_data() can return a GError from parsing the file.

The little ? characters after each call that can return an error mean: just give me back the result, or convert the error to a failure::Error and return it to the caller, where it can be examined later. This is beautifully legible to me.
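
To make the ? less magical, here is a small sketch of the same function written twice: once with ?, and once with each ? spelled out as a match. It uses only the standard library; LoadError is a hypothetical error type that plays the role of failure::Error, and the expansion shown is an approximation of what the compiler does, not its exact output.

```rust
use std::fs::File;
use std::io::{self, Read};

// Hypothetical error type for this sketch.
#[derive(Debug)]
struct LoadError(io::Error);

// `?` relies on this conversion to turn an io::Error into a LoadError.
impl From<io::Error> for LoadError {
    fn from(e: io::Error) -> Self {
        LoadError(e)
    }
}

// With `?`: each fallible call either yields its value or returns early.
fn load_short(path: &str) -> Result<Vec<u8>, LoadError> {
    let mut file = File::open(path)?;
    let mut bytes = Vec::new();
    file.read_to_end(&mut bytes)?;
    Ok(bytes)
}

// Roughly the same function with every `?` written out by hand.
fn load_long(path: &str) -> Result<Vec<u8>, LoadError> {
    let mut file = match File::open(path) {
        Ok(file) => file,
        Err(e) => return Err(From::from(e)),
    };

    let mut bytes = Vec::new();
    match file.read_to_end(&mut bytes) {
        Ok(_) => Ok(bytes),
        Err(e) => Err(From::from(e)),
    }
}
```

The two functions behave the same; the second one is just what I would have to type after every call if the operator did not exist.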

Summary

Writing command-line programs in Rust is fun! It's nice to have neurotically-safe scripts that one can trust in the future.

Rsvg-bench is available here.