1. Surviving a rust-cssparser API break

    - Tags: gnome, librsvg, rust

    Yesterday I looked into updating librsvg's Rust dependencies. There have been some API breaks (!!!) in the unstable libraries that it uses since the last time I locked them. This post is about an interesting case of API breakage.

    rust-cssparser is the crate that Servo uses for parsing CSS. Well, more like tokenizing CSS: you give it a string, it gives you back tokens, and you are supposed to compose CSS selector information or other CSS values from the tokens.

    Librsvg uses rust-cssparser now for most of the micro-languages in SVG's attribute values, instead of its old, fragile C parsers. I hope to be able to use it in conjunction with Servo's rust-selectors crate to fully parse CSS data and replace libcroco.

    A few months ago, rust-cssparser's API looked more or less like the following. This is the old representation of a Token:

    pub enum Token<'a> {
        // an identifier
        Ident(Cow<'a, str>),
    
        // a plain number
        Number(NumericValue),
    
        // a percentage value normalized to [0.0, 1.0]
        Percentage(PercentageValue),
    
        WhiteSpace(&'a str),
        Comma,
    
        ...
    }
    

    That is, a Token can be an Identifier with a string name, or a Number, a Percentage, whitespace, a comma, and many others.

    On top of that is the old API for a Parser, which you construct with a string and then it gives you back tokens:

    impl<'i> Parser<'i> {
        pub fn new(input: &'i str) -> Parser<'i, 'i> {
    
        pub fn next(&mut self) -> Result<Token<'i>, ()> { ... }
    
        ...
    }
    

    This means the following. You create the parser out of a string slice with new(). You can then extract a Result with a Token sucessfully, or with an empty error value. The parser uses a lifetime 'i on the string from which it is constructed: the Tokens that return identifiers, for example, could return sub-string slices that come from the original string, and the parser has to be marked with a lifetime so that it does not outlive its underlying string.

    A few commits later, rust-cssparser got changed to return detailed error values, so that instead of () you get a a BasicParseError with sub-cases like UnexpectedToken or EndOfInput.

    After the changes to the error values for results, I didn't pay much attention to rust-cssparser for while. Yesterday I wanted to update librsvg to use the newest rust-cssparser, and had some interesting problems.

    First, Parser::new() was changed from taking just a &str slice to taking a ParserInput struct. This is an implementation detail which lets the parser cache the last token it saw. Not a big deal:

    // instead of constructing a parser like
    let mut parser = Parser::new (my_string);
    
    // you now construct it like
    let mut input = ParserInput::new (my_string);
    let mut parser = Parser::new (&mut input);
    

    I am not completely sure why this is exposed to the public API, since Rust won't allow you to have two mutable references to a ParserInput, and the only consumer of a (mutable) ParserInput is the Parser, anyway.

    However, the parser.next() function changed:

    // old version
    pub fn next(&mut self) -> Result<Token<'i>, ()> { ... }
    
    // new version
    pub fn next(&mut self) -> Result<&Token<'i>, BasicParseError<'i>> {... }
    // note this bad boy here -------^
    

    The successful Result from next() is now a reference to a Token, not a plain Token value which you now own. The parser is giving you a borrowed reference to its internally-cached token.

    My parsing functions for the old API looked similar to the following. This is a function that parses a string into an angle; it can look like "45deg" or "1.5rad", for example.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    pub fn parse_angle_degrees (s: &str) -> Result <f64, ParseError> {
        let mut parser = Parser::new (s);
    
        let token = parser.next ()
            .map_err (|_| ParseError::new ("expected angle"))?;
    
        match token {
            Token::Number (NumericValue { value, .. }) => Ok (value as f64),
    
            Token::Dimension (NumericValue { value, .. }, unit) => {
                let value = value as f64;
    
                match unit.as_ref () {
                    "deg"  => Ok (value),
                    "grad" => Ok (value * 360.0 / 400.0),
                    "rad"  => Ok (value * 180.0 / PI),
                    _      => Err (ParseError::new ("expected angle"))
                }
            },
    
            _ => Err (ParseError::new ("expected angle"))
        }.and_then (|r|
                    parser.expect_exhausted ()
                    .map (|_| r)
                    .map_err (|_| ParseError::new ("expected angle")))
    }
    

    This is a bit ugly, but it was the first version that passed the tests. Lines 4 and 5 mean, "get the first token or return an error". Line 17 means, "anything except deg, grad, or rad for the units causes the match expression to generate an error". Finally, I was feeling very proud of using and_then() in line 22, with parser.expect_exhausted(), to ensure that the parser would not find any more tokens after the angle/units.

    However, in the new version of rust-cssparser, Parser.next() gives back a Result with a &Token success value — a reference to a token —, while the old version returned a plain Token. No problem, I thought, I'm just going to de-reference the value in the match and be done with it:

        let token = parser.next ()
            .map_err (|_| ParseError::new ("expected angle"))?;
    
        match *token {
        //    ^ dereference here...
            Token::Number { value, .. } => value as f64,
    
            Token::Dimension { value, ref unit, .. } => {
        //                            ^ avoid moving the unit value
    

    The compiler complained elsewhere. The whole function now looked like this:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    pub fn parse_angle_degrees (s: &str) -> Result <f64, ParseError> {
        let mut parser = Parser::new (s);
    
        let token = parser.next ()
            .map_err (|_| ParseError::new ("expected angle"))?;
    
        match token {
            // ...
        }.and_then (|r|
                    parser.expect_exhausted ()
                    .map (|_| r)
                    .map_err (|_| ParseError::new ("expected angle")))
    }
    

    But in line 4, token is now a reference to something that lives inside parser, and parser is therefore borrowed mutably. The compiler didn't like that line 10 (the call to parser.expect_exhausted()) was trying to borrow parser mutably again.

    I played a bit with creating a temporary scope around the assignment to token so that it would only borrow parser mutably inside that scope. Things ended up like this, without the call to and_then() after the match:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    pub fn angle_degrees (s: &str) -> Result <f64, ParseError> {
        let mut input = ParserInput::new (s);
        let mut parser = Parser::new (&mut input);
    
        let angle = {
            let token = parser.next ()
                .map_err (|_| ParseError::new ("expected angle"))?;
    
            match *token {
                Token::Number { value, .. } => value as f64,
    
                Token::Dimension { value, ref unit, .. } => {
                    let value = value as f64;
    
                    match unit.as_ref () {
                        "deg"  => value,
                        "grad" => value * 360.0 / 400.0,
                        "rad"  => value * 180.0 / PI,
                        _      => return Err (ParseError::new ("expected 'deg' | 'grad' | 'rad'"))
                    }
                },
    
                _ => return Err (ParseError::new ("expected angle"))
            }
        };
    
        parser.expect_exhausted ().map_err (|_| ParseError::new ("expected angle"))?;
    
        Ok (angle)
    }
    

    Lines 5 through 25 are basically

        let angle = {
            // parse out the angle; return if error
        };
    

    And after that is done, I test for parser.expect_exhausted(). There is no chaining of results with helper functions; instead it's just going through each token linearly.

    The API break was annoying to deal with, but fortunately the calling code ended up cleaner, and I didn't have to change anything in the tests. I hope rust-cssparser can stabilize its API for consumers that are not Servo.

Page 99 of 105 (previous) (next)