Yesterday I looked into updating librsvg's Rust dependencies. There have been some API breaks (!!!) in the unstable libraries that it uses since the last time I locked them. This post is about an interesting case of API breakage.
rust-cssparser is the crate that Servo uses for parsing CSS. Well, more like tokenizing CSS: you give it a string, it gives you back tokens, and you are supposed to compose CSS selector information or other CSS values from the tokens.
Librsvg uses rust-cssparser now for most of the micro-languages in SVG's attribute values, instead of its old, fragile C parsers. I hope to be able to use it in conjunction with Servo's rust-selectors crate to fully parse CSS data and replace libcroco.
A few months ago, rust-cssparser's API looked more or less like the
following. This is the old representation of a Token
:
pub enum Token<'a> {
// an identifier
Ident(Cow<'a, str>),
// a plain number
Number(NumericValue),
// a percentage value normalized to [0.0, 1.0]
Percentage(PercentageValue),
WhiteSpace(&'a str),
Comma,
...
}
That is, a Token
can be an Ident
ifier with a string name, or a
Number
, a Percentage
, whitespace, a comma, and many others.
On top of that is the old API for a Parser
, which you construct with
a string and then it gives you back tokens:
impl<'i> Parser<'i> {
pub fn new(input: &'i str) -> Parser<'i, 'i> {
pub fn next(&mut self) -> Result<Token<'i>, ()> { ... }
...
}
This means the following. You create the parser out of a string slice
with new()
. You can then extract a Result
with a Token
sucessfully, or with an empty error value. The parser uses a lifetime
'i
on the string from which it is constructed: the Token
s that
return identifiers, for example, could return sub-string slices that
come from the original string, and the parser has to be marked with a
lifetime so that it does not outlive its underlying string.
A few commits later, rust-cssparser got changed to return detailed
error values, so that instead of ()
you get a a BasicParseError
with sub-cases like UnexpectedToken
or EndOfInput
.
After the changes to the error values for results, I didn't pay much attention to rust-cssparser for while. Yesterday I wanted to update librsvg to use the newest rust-cssparser, and had some interesting problems.
First, Parser::new()
was changed from taking just a &str
slice to
taking a ParserInput
struct. This is an implementation detail which
lets the parser cache the last token it saw. Not a big deal:
// instead of constructing a parser like
let mut parser = Parser::new (my_string);
// you now construct it like
let mut input = ParserInput::new (my_string);
let mut parser = Parser::new (&mut input);
I am not completely sure why this is exposed to the public API, since
Rust won't allow you to have two mutable references to a
ParserInput
, and the only consumer of a (mutable) ParserInput
is
the Parser
, anyway.
However, the parser.next()
function changed:
// old version
pub fn next(&mut self) -> Result<Token<'i>, ()> { ... }
// new version
pub fn next(&mut self) -> Result<&Token<'i>, BasicParseError<'i>> {... }
// note this bad boy here -------^
The successful Result
from next()
is now a reference to a
Token
, not a plain Token
value which you now own. The parser is
giving you a borrowed reference to its internally-cached token.
My parsing functions for the old API looked similar to the
following. This is a function that parses a string into an angle; it
can look like "45deg"
or "1.5rad"
, for example.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
|
This is a bit ugly, but it was the first version that passed the
tests. Lines 4 and 5 mean, "get the first token or return an error".
Line 17 means, "anything except deg
, grad
, or rad
for the units
causes the match
expression to generate an error". Finally, I was
feeling very proud of using and_then()
in line 22, with
parser.expect_exhausted()
, to ensure that the parser would not find
any more tokens after the angle/units.
However, in the new version of rust-cssparser, Parser.next() gives
back a Result
with a &Token
success value — a reference to a
token —, while the old version returned a plain Token
. No problem,
I thought, I'm just going to de-reference the value in the match
and
be done with it:
let token = parser.next ()
.map_err (|_| ParseError::new ("expected angle"))?;
match *token {
// ^ dereference here...
Token::Number { value, .. } => value as f64,
Token::Dimension { value, ref unit, .. } => {
// ^ avoid moving the unit value
The compiler complained elsewhere. The whole function now looked like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
But in line 4, token
is now a reference to something that lives
inside parser
, and parser
is therefore borrowed mutably. The
compiler didn't like that line 10 (the call to
parser.expect_exhausted()
) was trying to borrow parser
mutably
again.
I played a bit with creating a temporary scope around the assignment
to token
so that it would only borrow parser
mutably inside that
scope. Things ended up like this, without the call to and_then()
after the match
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |
|
Lines 5 through 25 are basically
let angle = {
// parse out the angle; return if error
};
And after that is done, I test for parser.expect_exhausted()
.
There is no chaining of results with helper functions; instead it's
just going through each token linearly.
The API break was annoying to deal with, but fortunately the calling code ended up cleaner, and I didn't have to change anything in the tests. I hope rust-cssparser can stabilize its API for consumers that are not Servo.