Refactoring the Length type

CSS length values have a number and a unit, e.g. 5cm or 6px. Sometimes the unit is a percentage, like 50%, and SVG says that lengths with percentage units should be resolved with respect to a certain rectangle. For example, consider this circle element:

<circle cx="50%" cy="75%" r="4px" fill="black"/>

This means, draw a solid black circle whose center is at 50% of the width and 75% of the height of the current viewport. The circle should have a 4-pixel radius.

The process of converting that kind of units into absolute pixels for the final drawing is called normalization. In SVG, percentage units sometimes need to be normalized with respect to the current viewport (a local coordinate system), or with respect to the size of another object (e.g. when a clipping path is used to cut the current shape in half).

One detail about normalization is that it can be with respect to the horizontal dimension of the current viewport, the vertical dimension, or both. Keep this in mind: at normalization time, we need to be able to distinguish between those three modes.

The original C version

I have talked about the original C code for lengths before; the following is a small summary.

The original C code had this struct to represent lengths:

typedef struct {
    double length;
    char factor;
} RsvgLength;

The parsing code would set the factor field to a character depending on the length's unit: 'p' for percentages, 'i' for inches, etc., and '\0' for the default unit, which is pixels.

Along with that, the normalization code needed to know the direction (horizontal, vertical, both) to which the length in question refers. It did this by taking another character as an argument to the normalization function:

double
_rsvg_css_normalize_length (const RsvgLength * in, RsvgDrawingCtx * ctx, char dir)
{
    if (in->factor == '\0')            /* pixels, no need to normalize */
        return in->length;
    else if (in->factor == 'p') {      /* percentages; need to consider direction */
        if (dir == 'h')                                     /* horizontal */
            return in->length * ctx->vb.rect.width;
        if (dir == 'v')                                     /* vertical */
            return in->length * ctx->vb.rect.height;
        if (dir == 'o')                                     /* both */
            return in->length * rsvg_viewport_percentage (ctx->vb.rect.width,
                                                          ctx->vb.rect.height);
    } else { ... }
}

The original post talks about how I found a couple of bugs with how the directions are identified at normalization time. The function above expects one of 'h'/'v'/'o' for horizontal/vertical/both, and one or two places in the code passed the wrong character.

Making the C version cleaner

Before converting that code to Rust, I removed the pesky characters and made the code use proper enums to identify a length's units.

+typedef enum {
+    LENGTH_UNIT_DEFAULT,
+    LENGTH_UNIT_PERCENT,
+    LENGTH_UNIT_FONT_EM,
+    LENGTH_UNIT_FONT_EX,
+    LENGTH_UNIT_INCH,
+    LENGTH_UNIT_RELATIVE_LARGER,
+    LENGTH_UNIT_RELATIVE_SMALLER
+} LengthUnit;
+
 typedef struct {
     double length;
-    char factor;
+    LengthUnit unit;
 } RsvgLength;

Then, do the same for the normalization function, so it will get the direction in which to normalize as an enum instead of a char.

+typedef enum {
+    LENGTH_DIR_HORIZONTAL,
+    LENGTH_DIR_VERTICAL,
+    LENGTH_DIR_BOTH
+} LengthDir;

 double
-_rsvg_css_normalize_length (const RsvgLength * in, RsvgDrawingCtx * ctx, char dir)
+_rsvg_css_normalize_length (const RsvgLength * in, RsvgDrawingCtx * ctx, LengthDir dir)

Making the C version easier to get right

While doing the last change above, I found a place in the code that used the wrong direction by mistake, probably due to a cut&paste error. Part of the problem here is that the code was specifying the direction at normalization time.

I decided to change it so that each direction value carried its own direction since initialization, so that subsequent code wouldn't have to worry about that. Hopefully, initializing a width field should make it obvious that it needed LENGTH_DIR_HORIZONTAL.

 typedef struct {
     double length;
     LengthUnit unit;
+    LengthDir dir;
 } RsvgLength;

That is, so that instead of

  /* at initialization time */
  foo.width = _rsvg_css_parse_length (str);

  ...

  /* at rendering time */
  double final_width = _rsvg_css_normalize_length (&foo.width, ctx, LENGTH_DIR_HORIZONTAL);

we would instead do this:

  /* at initialization time */
  foo.width = _rsvg_css_parse_length (str, LENGTH_DIR_HORIZONTAL);

  ...

  /* at rendering time */
  double final_width = _rsvg_css_normalize_length (&foo.width, ctx);

This made the drawing code, which deals with a lot of coordinates at the same time, a lot less noisy.

Initial port to Rust

To recap, this was the state of the structs after the initial refactoring in C:

typedef enum {
    LENGTH_UNIT_DEFAULT,
    LENGTH_UNIT_PERCENT,
    LENGTH_UNIT_FONT_EM,
    LENGTH_UNIT_FONT_EX,
    LENGTH_UNIT_INCH,
    LENGTH_UNIT_RELATIVE_LARGER,
    LENGTH_UNIT_RELATIVE_SMALLER
} LengthUnit;

typedef enum {
    LENGTH_DIR_HORIZONTAL,
    LENGTH_DIR_VERTICAL,
    LENGTH_DIR_BOTH
} LengthDir;

typedef struct {
    double length;
    LengthUnit unit;
    LengthDir dir;
} RsvgLength;

This ported to Rust in a straightforward fashion:

pub enum LengthUnit {
    Default,
    Percent,
    FontEm,
    FontEx,
    Inch,
    RelativeLarger,
    RelativeSmaller
}

pub enum LengthDir {
    Horizontal,
    Vertical,
    Both
}

pub struct RsvgLength {
    length: f64,
    unit: LengthUnit,
    dir: LengthDir
}

It got a similar constructor that took the direction and produced an RsvgLength:

impl RsvgLength {
    pub fn parse (string: &str, dir: LengthDir) -> RsvgLength { ... }
}

(This was before using Result; remember that the original C code did very little error checking!)

The initial Parse trait

It was at that point that it seemed convenient to introduce a Parse trait, which all CSS value types would implement to parse themselves from string.

However, parsing an RsvgLength also needed an extra piece of data, the LengthDir. My initial version of the Parse trait had an associated called Data, through which one could pass an extra piece of data during parsing/initialization:

pub trait Parse: Sized {
    type Data;
    type Err;

    fn parse (s: &str, data: Self::Data) -> Result<Self, Self::Err>;
}

This was explicitly to be able to pass a LengthDir to the parser for RsvgLength:

impl Parse for RsvgLength {
    type Data = LengthDir;
    type Err = AttributeError;

    fn parse (string: &str, dir: LengthDir) -> Result <RsvgLength, AttributeError> { ... }
}

This was okay for lengths, but very noisy for everything else that didn't require an extra bit of data. In the rest of the code, the helper type was Data = () and there was a pair of extra parentheses () in every place that parse() was called.

Removing the helper Data type

Introducing one type per direction

Over a year later, that () bit of data everywhere was driving me nuts. I started refactoring the Length module to remove it.

First, I introduced three newtypes to wrap Length, and indicate their direction at the same time:

pub struct LengthHorizontal(Length);
pub struct LengthVertical(Length);
pub struct LengthBoth(Length);

This was done with a macro because now each wrapper type needed to know the relevant LengthDir.

Now, for example, the declaration for the <circle> element looked like this:

pub struct NodeCircle {
    cx: Cell<LengthHorizontal>,
    cy: Cell<LengthVertical>,
    r: Cell<LengthBoth>,
}

(Ignore the Cell everywhere; we got rid of that later.)

Removing the `dir` field

Since now the information about the length's direction is embodied in the LengthHorizontal/LengthVertical/LengthBoth types, this made it possible to remove the dir field from the inner Length struct.

 pub struct RsvgLength {
     length: f64,
     unit: LengthUnit,
-    dir: LengthDir
 }

+pub struct LengthHorizontal(Length);
+pub struct LengthVertical(Length);
+pub struct LengthBoth(Length);
+
+define_length_type!(LengthHorizontal, LengthDir::Horizontal);
+define_length_type!(LengthVertical, LengthDir::Vertical);
+define_length_type!(LengthBoth, LengthDir::Both);

Note the use of a define_length_type! macro to generate code for those three newtypes.

Removing the `Data` associated type

And finally, this made it possible to remove the Data associated type from the Parse trait.

 pub trait Parse: Sized {
-    type Data;
     type Err;

-    fn parse(parser: &mut Parser<'_, '_>, data: Self::Data) -> Result<Self, Self::Err>;
+    fn parse(parser: &mut Parser<'_, '_>) -> Result<Self, Self::Err>;
 }

The resulting mega-commit removed a bunch of stray parentheses () from all calls to parse(), and the code ended up a lot easier to read.

Removing the newtypes

This was fine for a while. Recently, however, I figured out that it would be possible to embody the information for a length's direction in a different way.

But to get there, I first needed a temporary refactor.

Replacing the macro with a trait with a default implementation

Deep in the guts of length.rs, the key function that does something different based on LengthDir is its scaling_factor method:

enum LengthDir {
    Horizontal,
    Vertical,
    Both,
}

impl LengthDir {
    fn scaling_factor(self, x: f64, y: f64) -> f64 {
        match self {
            LengthDir::Horizontal => x,
            LengthDir::Vertical => y,
            LengthDir::Both => viewport_percentage(x, y),
        }
    }
}

That method gets passed, for example, the width/height of the current viewport for the x/y arguments. The method decides whether to use the width, height, or a combination of both.

And of course, the interesting part of the define_length_type! macro was to generate code for calling LengthDir::Horizontal::scaling_factor()/etc. as appropriate depending on the LengthDir in question.

First I made a trait called Orientation with a scaling_factor method, and three zero-sized types that implement that trait. Note how each of these three implementations corresponds to one of the match arms above:

pub trait Orientation {
    fn scaling_factor(x: f64, y: f64) -> f64;
}

pub struct Horizontal;
pub struct Vertical;
pub struct Both;

impl Orientation for Horizontal {
    fn scaling_factor(x: f64, _y: f64) -> f64 {
        x
    }
}

impl Orientation for Vertical {
    fn scaling_factor(_x: f64, y: f64) -> f64 {
        y
    }
}

impl Orientation for Both {
    fn scaling_factor(x: f64, y: f64) -> f64 {
        viewport_percentage(x, y)
    }
}

Now most of the contents of the define_length_type! macro can go in the default implementation of a new trait LengthTrait. Crucially, this trait has an Orientation associated type, which it uses to call into the Orientation trait:

pub trait LengthTrait: Sized {
    type Orientation: Orientation;

    ...

    fn normalize(&self, values: &ComputedValues, params: &ViewParams) -> f64 {
        match self.unit() {
            LengthUnit::Px => self.length(),

            LengthUnit::Percent => {
                self.length() *
                <Self::Orientation>::scaling_factor(params.view_box_width, params.view_box_height)
            }

            ...
    }
}

Note that the incantation is <Self::Orientation>::scaling_factor(...) to call that method on the associated type.

Now the define_length_type! macro is shrunk a lot, with the interesting part being just this:

macro_rules! define_length_type {
    {$name:ident, $orient:ty} => {
        pub struct $name(Length);

        impl LengthTrait for $name {
            type Orientation = $orient;
        }
    }
}

define_length_type! { LengthHorizontal, Horizontal }
define_length_type! { LengthVertical, Vertical }
define_length_type! { LengthBoth, Both }

We moved from having three newtypes of length-with-LengthDir to three newtypes with dir-as-associated-type.

Removing the newtypes and the macro

After that temporary refactoring, we had the Orientation trait and the three zero-sized types Horizontal, Vertical, Both.

I figured out that one can use PhantomData as a way to carry around the type that Length needs to normalize itself, instead of using an associated type in an extra LengthTrait. Behold!

pub struct Length<O: Orientation> {
    pub length: f64,
    pub unit: LengthUnit,
    orientation: PhantomData<O>,
}

impl<O: Orientation> Length<O> {
    pub fn normalize(&self, values: &ComputedValues, params: &ViewParams) -> f64 {
        match self.unit {
            LengthUnit::Px => self.length,

            LengthUnit::Percent => {
                self.length 
                    * <O as Orientation>::scaling_factor(params.view_box_width, params.view_box_height)
            }

            ...
        }
    }
}

Now the incantation is <O as Orientation>::scaling_factor() to call the method on the generic type; it is no longer an associated type in a trait.

With that, users of lengths look like this; here, our <circle> element from before:

pub struct Circle {
    cx: Length<Horizontal>,
    cy: Length<Vertical>,
    r: Length<Both>,
}

I'm very happy with the readability of all the code now. I used to think of PhantomData as a way to deal with wrapping pointers from C, but it turns out that it is also useful to keep a generic type around should one need it.

The final Length struct is this:

pub struct Length<O: Orientation> {
    pub length: f64,
    pub unit: LengthUnit,
    orientation: PhantomData<O>,
}

And it only takes up as much space as its length and unit fields; PhantomData is zero-sized after all.

(Later, we renamed Orientation to Normalize, but the code's structure remained the same.)

Summary

Over a couple of years, librsvg's type that represents CSS lengths went from a C representation along the lines of "all data in the world is an int", to a Rust representation that uses some interesting type trickery:

C struct with char for units.
C struct with a LengthUnits enum.
C struct without an embodied direction; each place that needs to normalize needs to get the orientation right.
C struct with a built-in direction as an extra field, done at initialization time.
Same struct but in Rust.
An ugly but workable Parse trait so that the direction can be set at parse/initialization time.
Three newtypes LengthHorizontal, LengthVertical, LengthBoth with a common core. A cleaned-up Parse trait. A macro to generate those newtypes.
Replace the LengthDir enum with an Orientation trait, and three zero-sized types Horizontal/Vertical/Both that implement the trait.
Replace most of the macro with a helper trait LengthTrait that has an Orientation associated type.
Replace the helper trait with a single Length<T: Orientation> type, which puts the orientation as a generic parameter. The macro disappears and there is a single implementation for everything.

Refactoring never ends!