Surviving a rust-cssparser API break

- Tags: gnome, librsvg, rust

Yesterday I looked into updating librsvg's Rust dependencies. There have been some API breaks (!!!) in the unstable libraries that it uses since the last time I locked them. This post is about an interesting case of API breakage.

rust-cssparser is the crate that Servo uses for parsing CSS. Well, more like tokenizing CSS: you give it a string, it gives you back tokens, and you are supposed to compose CSS selector information or other CSS values from the tokens.

Librsvg uses rust-cssparser now for most of the micro-languages in SVG's attribute values, instead of its old, fragile C parsers. I hope to be able to use it in conjunction with Servo's rust-selectors crate to fully parse CSS data and replace libcroco.

A few months ago, rust-cssparser's API looked more or less like the following. This is the old representation of a Token:

pub enum Token<'a> {
    // an identifier
    Ident(Cow<'a, str>),

    // a plain number
    Number(NumericValue),

    // a percentage value normalized to [0.0, 1.0]
    Percentage(PercentageValue),

    WhiteSpace(&'a str),
    Comma,

    ...
}

That is, a Token can be an Identifier with a string name, or a Number, a Percentage, whitespace, a comma, and many others.

On top of that is the old API for a Parser, which you construct with a string and then it gives you back tokens:

impl<'i> Parser<'i> {
    pub fn new(input: &'i str) -> Parser<'i, 'i> {

    pub fn next(&mut self) -> Result<Token<'i>, ()> { ... }

    ...
}

This means the following. You create the parser out of a string slice with new(). You can then extract a Result with a Token sucessfully, or with an empty error value. The parser uses a lifetime 'i on the string from which it is constructed: the Tokens that return identifiers, for example, could return sub-string slices that come from the original string, and the parser has to be marked with a lifetime so that it does not outlive its underlying string.

A few commits later, rust-cssparser got changed to return detailed error values, so that instead of () you get a a BasicParseError with sub-cases like UnexpectedToken or EndOfInput.

After the changes to the error values for results, I didn't pay much attention to rust-cssparser for while. Yesterday I wanted to update librsvg to use the newest rust-cssparser, and had some interesting problems.

First, Parser::new() was changed from taking just a &str slice to taking a ParserInput struct. This is an implementation detail which lets the parser cache the last token it saw. Not a big deal:

// instead of constructing a parser like
let mut parser = Parser::new (my_string);

// you now construct it like
let mut input = ParserInput::new (my_string);
let mut parser = Parser::new (&mut input);

I am not completely sure why this is exposed to the public API, since Rust won't allow you to have two mutable references to a ParserInput, and the only consumer of a (mutable) ParserInput is the Parser, anyway.

However, the parser.next() function changed:

// old version
pub fn next(&mut self) -> Result<Token<'i>, ()> { ... }

// new version
pub fn next(&mut self) -> Result<&Token<'i>, BasicParseError<'i>> {... }
// note this bad boy here -------^

The successful Result from next() is now a reference to a Token, not a plain Token value which you now own. The parser is giving you a borrowed reference to its internally-cached token.

My parsing functions for the old API looked similar to the following. This is a function that parses a string into an angle; it can look like "45deg" or "1.5rad", for example.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
pub fn parse_angle_degrees (s: &str) -> Result <f64, ParseError> {
    let mut parser = Parser::new (s);

    let token = parser.next ()
        .map_err (|_| ParseError::new ("expected angle"))?;

    match token {
        Token::Number (NumericValue { value, .. }) => Ok (value as f64),

        Token::Dimension (NumericValue { value, .. }, unit) => {
            let value = value as f64;

            match unit.as_ref () {
                "deg"  => Ok (value),
                "grad" => Ok (value * 360.0 / 400.0),
                "rad"  => Ok (value * 180.0 / PI),
                _      => Err (ParseError::new ("expected angle"))
            }
        },

        _ => Err (ParseError::new ("expected angle"))
    }.and_then (|r|
                parser.expect_exhausted ()
                .map (|_| r)
                .map_err (|_| ParseError::new ("expected angle")))
}

This is a bit ugly, but it was the first version that passed the tests. Lines 4 and 5 mean, "get the first token or return an error". Line 17 means, "anything except deg, grad, or rad for the units causes the match expression to generate an error". Finally, I was feeling very proud of using and_then() in line 22, with parser.expect_exhausted(), to ensure that the parser would not find any more tokens after the angle/units.

However, in the new version of rust-cssparser, Parser.next() gives back a Result with a &Token success value — a reference to a token —, while the old version returned a plain Token. No problem, I thought, I'm just going to de-reference the value in the match and be done with it:

    let token = parser.next ()
        .map_err (|_| ParseError::new ("expected angle"))?;

    match *token {
    //    ^ dereference here...
        Token::Number { value, .. } => value as f64,

        Token::Dimension { value, ref unit, .. } => {
    //                            ^ avoid moving the unit value

The compiler complained elsewhere. The whole function now looked like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
pub fn parse_angle_degrees (s: &str) -> Result <f64, ParseError> {
    let mut parser = Parser::new (s);

    let token = parser.next ()
        .map_err (|_| ParseError::new ("expected angle"))?;

    match token {
        // ...
    }.and_then (|r|
                parser.expect_exhausted ()
                .map (|_| r)
                .map_err (|_| ParseError::new ("expected angle")))
}

But in line 4, token is now a reference to something that lives inside parser, and parser is therefore borrowed mutably. The compiler didn't like that line 10 (the call to parser.expect_exhausted()) was trying to borrow parser mutably again.

I played a bit with creating a temporary scope around the assignment to token so that it would only borrow parser mutably inside that scope. Things ended up like this, without the call to and_then() after the match:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
pub fn angle_degrees (s: &str) -> Result <f64, ParseError> {
    let mut input = ParserInput::new (s);
    let mut parser = Parser::new (&mut input);

    let angle = {
        let token = parser.next ()
            .map_err (|_| ParseError::new ("expected angle"))?;

        match *token {
            Token::Number { value, .. } => value as f64,

            Token::Dimension { value, ref unit, .. } => {
                let value = value as f64;

                match unit.as_ref () {
                    "deg"  => value,
                    "grad" => value * 360.0 / 400.0,
                    "rad"  => value * 180.0 / PI,
                    _      => return Err (ParseError::new ("expected 'deg' | 'grad' | 'rad'"))
                }
            },

            _ => return Err (ParseError::new ("expected angle"))
        }
    };

    parser.expect_exhausted ().map_err (|_| ParseError::new ("expected angle"))?;

    Ok (angle)
}

Lines 5 through 25 are basically

    let angle = {
        // parse out the angle; return if error
    };

And after that is done, I test for parser.expect_exhausted(). There is no chaining of results with helper functions; instead it's just going through each token linearly.

The API break was annoying to deal with, but fortunately the calling code ended up cleaner, and I didn't have to change anything in the tests. I hope rust-cssparser can stabilize its API for consumers that are not Servo.