Boring news about Federico - November 2016


Wed 2016/Nov/16

Debugging Rust code inside a C library

An application that uses librsvg is in fact using a C library that has some Rust code inside it. We can debug C code with gdb as usual, but what about the Rust code?

Fortunately, Rust generates object code! From the application's viewpoint, there is no difference between the C parts and the Rust parts: they are just part of the librsvg.so to which it linked, and that's it.

Let's try this. I'll use the rsvg-view-3 program that ships inside librsvg — this is a very simple program that opens a window and displays an SVG image in it. If you build the rustification branch of librsvg (clone instructions at the bottom of that page), you can then run this in the toplevel directory of the librsvg source tree:
```
tlacoyo:~/src/librsvg-latest (rustification)$ libtool --mode=execute gdb ./rsvg-view-3
```
Since rsvg-view-3 is an executable built with libtool, we can't plainly run gdb on it. We need to invoke libtool with the incantation for "do your magic for shared library paths and run gdb on this binary".

Gdb starts up, but no shared libraries are loaded yet. I like to set up a breakpoint in main() and run the program with its command-line arguments, so its shared libs will load, and then I can start setting breakpoints:
```
(gdb) break main
Breakpoint 1 at 0x40476c: file rsvg-view.c, line 583.

(gdb) run tests/fixtures/reftests/bugs/340047.svg
Starting program: /home/federico/src/librsvg-latest/.libs/rsvg-view-3 tests/fixtures/reftests/bugs/340047.svg

...

Breakpoint 1, main (argc=2, argv=0x7fffffffdd48) at rsvg-view.c:583
583         int retval = 1;
(gdb)
```
Okay! Now the rsvg-view-3 binary is fully loaded, with all its initial shared libraries. We can set breakpoints.

But what does Rust call the functions we defined? The functions we exported to C code with the #[no_mangle] attribute of course get the name we expect, but what about internal, Rust-only functions? Let's ask gdb!

Finding mangled names

I have a length.rs file which defines an RsvgLength structure with a "parse" constructor: it takes a string which is a CSS length specifier, and returns an RsvgLength structure. I'd like to debug that RsvgLength::parse(), but what is it called in the object code?

The gdb command to list all the functions it knows about is "info functions". You can pass a regexp to it to narrow down your search. I want a regexp that will match something-something-length-something-parse, so I'll use "ength.*parse". I skip the L in "Length" because I don't know how Rust mangles CamelCase struct names.
```
(gdb) info functions ength.*parse
All functions matching regular expression "ength.*parse":

File src/length.rs:
struct RsvgLength rsvg_internals::length::rsvg_length_parse(i8 *, enum class LengthDir);
static struct RsvgLength rsvg_internals::length::{{impl}}::parse(struct &str, enum class LengthDir);
```
All right! The first one, rsvg_length_parse(), is a function I exported from Rust so that C code can call it. The second one is the mangled name for the RsvgLength::parse() that I am looking for.

Printing values

Let's cut and paste the mangled name, set a breakpoint in it, and continue the execution:
```
(gdb) break rsvg_internals::length::{{impl}}::parse
Breakpoint 2 at 0x7ffff7ac6297: file src/length.rs, line 89.

(gdb) cont
Continuing.
[New Thread 0x7fffe992c700 (LWP 26360)]
[New Thread 0x7fffe912b700 (LWP 26361)]

Thread 1 "rsvg-view-3" hit Breakpoint 2, rsvg_internals::length::{{impl}}::parse (string=..., dir=Both) at src/length.rs:89
89              let (mut value, rest) = strtod (string);
(gdb)
```
Can we print values? Sure we can. I'm interested in the case where the incoming string argument contains "100%" — this will be parse()d into an RsvgLength value with length.length=1.0 and length.unit=Percent. Let's print the string argument:
```
89              let (mut value, rest) = strtod (string);
(gdb) print string
$2 = {data_ptr = 0x8bd8e0 "12.0\377\177", length = 4}
```
Rust strings are different from null-terminated C strings; they have a pointer to the char data, and a length value. Here, gdb is showing us a string that contains the four characters "12.0". I'll make this a conditional breakpoint so I can continue the execution until string comes in with a value of "100%", but I'll cheat: I'll use the C function strncmp() to test those four characters in string.data_ptr; I can't use strcmp() as the data_ptr is not null-terminated.
```
(gdb) cond 2 strncmp (string.data_ptr, "100%", 4) == 0
(gdb) cont
Continuing.

Thread 1 "rsvg-view-3" hit Breakpoint 2, rsvg_internals::length::{{impl}}::parse (string=..., dir=Vertical) at src/length.rs:89
89              let (mut value, rest) = strtod (string);
(gdb) p string
$8 = {data_ptr = 0x8bd8e0 "100%", length = 4}
```
All right! We got to the case we wanted. Let's execute this next line that has "let (mut value, rest) = strtod (string);" in it, and print out the results:
```
(gdb) next
91              match rest.as_ref () {
(gdb) print value
$9 = 100
(gdb) print rest
$10 = {data_ptr = 0x8bd8e3 "%", length = 1}
```
What type did "value" get assigned?
```
(gdb) ptype value
type = f64 
```
A floating point value, as expected.

You can see that the value of rest indicates that it is a string with "%" in it. The rest of the parse() function will decide that in fact it is a CSS length specified as a percentage, and will translate our value of 100 into a normalized value of 1.0 and a length.unit of LengthUnit::Percent.
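To make that last step concrete, here is a minimal, self-contained sketch of a parser of this kind. It is not the actual code in length.rs: the names Length, LengthUnit, split_number and parse_length are made up for illustration, and the number scanner is a deliberately naive stand-in for strtod().

```rust
// Hypothetical stand-ins for the types in length.rs.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LengthUnit {
    Default,
    Percent,
    Inch,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Length {
    pub length: f64,
    pub unit: LengthUnit,
}

// Naive replacement for strtod(): split a leading decimal number from
// whatever follows it.  No exponents, no error reporting.
pub fn split_number(s: &str) -> (f64, &str) {
    let end = s
        .find(|c: char| !(c.is_ascii_digit() || c == '.' || c == '-' || c == '+'))
        .unwrap_or(s.len());
    let value = s[..end].parse::<f64>().unwrap_or(0.0);

    (value, &s[end..])
}

// Decide the unit from the text that follows the number.
pub fn parse_length(s: &str) -> Length {
    let (value, rest) = split_number(s);

    match rest {
        // Percentages are normalized so that 100% becomes 1.0.
        "%" => Length { length: value / 100.0, unit: LengthUnit::Percent },
        "in" => Length { length: value, unit: LengthUnit::Inch },
        // Points are stored as inches, 72 points to the inch.
        "pt" => Length { length: value / 72.0, unit: LengthUnit::Inch },
        _ => Length { length: value, unit: LengthUnit::Default },
    }
}
```

With this sketch, parse_length ("100%") gives a length of 1.0 with a Percent unit, which is the same shape of result that the gdb session is heading towards.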

Summary

Rust generates object code with debugging information, which gets linked into your C code as usual. You can therefore use gdb on it.

Rust creates mangled names for methods. Inside gdb, you can find the mangled names with "info functions"; pass it a regexp that is close enough to the method name you are looking for, unless you want tons of function names from the whole binary and all its libraries.

You can print Rust values in gdb. Strings are special because they are not null-terminated C strings.
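The pointer-plus-length layout that gdb printed can also be poked at from Rust itself. This is only a small sketch; raw_parts and offset_within are hypothetical helper names, not something from librsvg.

```rust
// The two fields gdb shows for a &str: a pointer to the bytes and a length.
pub fn raw_parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

// Byte offset of `inner` inside `outer`.  Only meaningful when `inner`
// really is a sub-slice of `outer`.
pub fn offset_within(outer: &str, inner: &str) -> usize {
    inner.as_ptr() as usize - outer.as_ptr() as usize
}
```

Taking s = "100%" and rest = &s[3..], offset_within (s, rest) is 3 and rest has a length of 1. That matches what gdb showed: the data_ptr of rest was three bytes past the data_ptr of string, and no terminating null byte is involved anywhere.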

You can set breakpoints, conditional breakpoints, and pretty much do all the gdb magic that you expect.

I didn't have to do anything for gdb to work with Rust. The version that comes in openSUSE Tumbleweed works fine. Maybe it's because Rust generates standard object code with debugging information, which gdb readily accepts. In any case, it works out of the box and that's just as it should be.

Mon 2016/Nov/14

Exposing Rust objects to C code

When librsvg parses an SVG file, it will encounter elements that generate path-like objects: lines, rectangles, polylines, circles, and actual path definitions. Internally, librsvg translates all of these into path definitions. For example, librsvg will read an element from the SVG that defines a rectangle like
```
<rect x="20" y="30" width="40" height="50" style="..."></rect> 
```
and translate it into a path definition with the following commands:
```
move_to (20, 30)
line_to (60, 30)
line_to (60, 80)
line_to (20, 80)
line_to (20, 30)
close_path ()
```
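The translation above can be sketched in a few lines of Rust. The PathCommand enum and the rect_to_path function are made-up names for this illustration, and the sketch ignores rounded corners, which real SVG rectangles can have.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PathCommand {
    MoveTo(f64, f64),
    LineTo(f64, f64),
    ClosePath,
}

// Turn a plain rectangle into the equivalent list of path commands,
// walking the corners in the same order as the listing above.
pub fn rect_to_path(x: f64, y: f64, width: f64, height: f64) -> Vec<PathCommand> {
    vec![
        PathCommand::MoveTo(x, y),
        PathCommand::LineTo(x + width, y),
        PathCommand::LineTo(x + width, y + height),
        PathCommand::LineTo(x, y + height),
        PathCommand::LineTo(x, y),
        PathCommand::ClosePath,
    ]
}
```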
But where do those commands live? How are they fed into Cairo to actually draw a rectangle?

Get your Cairo right here

One of librsvg's public API entry points is rsvg_handle_render_cairo():
```
gboolean rsvg_handle_render_cairo (RsvgHandle * handle, cairo_t * cr);
```
Your program creates an appropriate Cairo surface (a window, an off-screen image, a PDF surface, whatever), obtains a cairo_t drawing context for the surface, and passes the cairo_t to librsvg using that rsvg_handle_render_cairo() function. It means, "take this parsed SVG (the handle), and render it to this cairo_t drawing context".

SVG files may look like an XML-ization of a tree of graphical objects: here is a group which contains a blue rectangle and a green circle, and here is a closed Bézier curve with a black outline and a red fill. However, SVG is more complicated than that; it allows you to define objects once and recall them later many times, it allows you to use CSS cascading rules for applying styles to objects ("all the objects in this group are green unless they define another color on their own"), to reference other SVG files, etc. The magic of librsvg is that it resolves all of that into drawing commands for Cairo.

Feeding a path into Cairo

This is easy enough: Cairo provides an API for its drawing context with functions like
```
void cairo_move_to (cairo_t *cr, double x, double y);

void cairo_line_to (cairo_t *cr, double x, double y);

void cairo_close_path (cairo_t *cr);

/* Other commands omitted */
```
Librsvg doesn't feed paths to Cairo as soon as it parses them from the XML; that is not done until rendering time. In the meantime, librsvg has to keep an intermediate representation of path data.

Librsvg uses an RsvgPathBuilder object to hold on to this path data for as long as needed. The API is simple enough:
```
pub struct RsvgPathBuilder {
   ...
}

impl RsvgPathBuilder {
    pub fn new () -> RsvgPathBuilder { ... }

    pub fn move_to (&mut self, x: f64, y: f64) { ... }

    pub fn line_to (&mut self, x: f64, y: f64) { ... }

    pub fn curve_to (&mut self, x2: f64, y2: f64, x3: f64, y3: f64, x4: f64, y4: f64) { ... }

    pub fn close_path (&mut self) { ... }
}
```
This mimics the sub-API of cairo_t to build paths, except that instead of feeding them immediately into the Cairo drawing context, RsvgPathBuilder builds an array of path commands that it will later replay to a given cairo_t. Let's look at the methods of RsvgPathBuilder.
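The "record now, replay later" idea can be shown without Cairo at all. In this sketch the PathSink trait stands in for the path-building part of cairo_t, and Segment, PathBuilder, replay and Recorder are hypothetical names; the real RsvgPathBuilder stores cairo::PathSegment values instead.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Segment {
    MoveTo(f64, f64),
    LineTo(f64, f64),
    CurveTo(f64, f64, f64, f64, f64, f64),
    ClosePath,
}

// Stand-in for the path-building functions of a cairo_t.
pub trait PathSink {
    fn move_to(&mut self, x: f64, y: f64);
    fn line_to(&mut self, x: f64, y: f64);
    fn curve_to(&mut self, x2: f64, y2: f64, x3: f64, y3: f64, x4: f64, y4: f64);
    fn close_path(&mut self);
}

pub struct PathBuilder {
    segments: Vec<Segment>,
}

impl PathBuilder {
    pub fn new() -> PathBuilder {
        PathBuilder { segments: Vec::new() }
    }

    pub fn move_to(&mut self, x: f64, y: f64) {
        self.segments.push(Segment::MoveTo(x, y));
    }

    pub fn line_to(&mut self, x: f64, y: f64) {
        self.segments.push(Segment::LineTo(x, y));
    }

    pub fn curve_to(&mut self, x2: f64, y2: f64, x3: f64, y3: f64, x4: f64, y4: f64) {
        self.segments.push(Segment::CurveTo(x2, y2, x3, y3, x4, y4));
    }

    pub fn close_path(&mut self) {
        self.segments.push(Segment::ClosePath);
    }

    // Replay the stored commands, in order, into any sink.
    pub fn replay<S: PathSink>(&self, sink: &mut S) {
        for segment in &self.segments {
            match *segment {
                Segment::MoveTo(x, y) => sink.move_to(x, y),
                Segment::LineTo(x, y) => sink.line_to(x, y),
                Segment::CurveTo(x2, y2, x3, y3, x4, y4) => {
                    sink.curve_to(x2, y2, x3, y3, x4, y4)
                }
                Segment::ClosePath => sink.close_path(),
            }
        }
    }
}

// A sink that only records what it was asked to do.
pub struct Recorder {
    pub calls: Vec<Segment>,
}

impl PathSink for Recorder {
    fn move_to(&mut self, x: f64, y: f64) {
        self.calls.push(Segment::MoveTo(x, y));
    }

    fn line_to(&mut self, x: f64, y: f64) {
        self.calls.push(Segment::LineTo(x, y));
    }

    fn curve_to(&mut self, x2: f64, y2: f64, x3: f64, y3: f64, x4: f64, y4: f64) {
        self.calls.push(Segment::CurveTo(x2, y2, x3, y3, x4, y4));
    }

    fn close_path(&mut self) {
        self.calls.push(Segment::ClosePath);
    }
}
```

Nothing reaches the sink while the builder is being filled; the sink only sees the commands when replay() is called, which is the property librsvg needs between parsing and rendering.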

"pub fn new () -> RsvgPathBuilder" - this doesn't take a self parameter; you could call it a static method in languages that support classes. It is just a constructor.

"pub fn move_to (&mut self, x: f64, y: f64)" - This one is a normal method, as it takes a self parameter. It also takes (x, y) double-precision floating point values for the move_to command. Note the "&mut self": this means that you must pass a mutable reference to an RsvgPathBuilder, since the method will change the builder's contents by adding a move_to command. It is a method that changes the state of the object, so it must take a mutable object.

The other methods for path commands are similar to move_to. None of them have return values; if they did, they would have a "-> ReturnType" after the argument list.

But that RsvgPathBuilder is a Rust object! And it still needs to be called from the C code in librsvg that hasn't been ported over to Rust yet. How do we do that?

Exporting an API from Rust to C

C doesn't know about objects with methods, even though you can fake them pretty well with structs and pointers to functions. Rust doesn't try to export structs with methods in a fancy way; you have to do that by hand. This is no harder than writing a GObject implementation in C, fortunately.

Let's look at the C header file for the RsvgPathBuilder object, which is entirely implemented in Rust. The C header file is rsvg-path-builder.h. Here is part of that file:
```
typedef struct _RsvgPathBuilder RsvgPathBuilder;

G_GNUC_INTERNAL
void rsvg_path_builder_move_to (RsvgPathBuilder *builder,
                                double x,
                                double y);
G_GNUC_INTERNAL
void rsvg_path_builder_line_to (RsvgPathBuilder *builder,
                                double x,
                                double y);
```
Nothing special here. RsvgPathBuilder is an opaque struct; we declare it like that just so we can take a pointer to it as in the rsvg_path_builder_move_to() and rsvg_path_builder_line_to() functions.

How about the Rust side of things? This is where it gets more interesting. This is part of path-builder.rs:
```
extern crate cairo;                                                         // 1

pub struct RsvgPathBuilder {                                                // 2
    path_segments: Vec<cairo::PathSegment>,
}

impl RsvgPathBuilder {                                                      // 3
    pub fn move_to (&mut self, x: f64, y: f64) {                            // 4
        self.path_segments.push (cairo::PathSegment::MoveTo ((x, y)));      // 5
    }
}

#[no_mangle]                                                                    // 6
pub extern fn rsvg_path_builder_move_to (raw_builder: *mut RsvgPathBuilder,     // 7
                                         x: f64,
                                         y: f64) {
    assert! (!raw_builder.is_null ());                                          // 8

    let builder: &mut RsvgPathBuilder = unsafe { &mut (*raw_builder) };         // 9

    builder.move_to (x, y);                                                     // 10
}
```
Let's look at the numbered lines:

1. We use the cairo crate from the excellent gtk-rs, the Rust binding for GTK+ and Cairo.

2. This is our Rust structure. Its fields are not important for this discussion; they are just what the struct uses to store Cairo path commands.

3. Now we begin implementing methods for that structure. These are Rust-side methods, not visible from C. In 4 and 5 we see the implementation of ::move_to(); it just creates a new cairo::PathSegment and pushes it to the vector of segments.

6. The "#[no_mangle]" line instructs the Rust compiler to put the following function name in the .a library just as it is, without any name mangling. The function name without name mangling looks just like rsvg_path_builder_move_to to the linker, as we expect. A name-mangled Rust function looks like _ZN14rsvg_internals12path_builder15RsvgPathBuilder8curve_to17h1b8f49042ff19daaE — you can explore these with "objdump -x rust/target/debug/librsvg_internals.a"

7. "pub extern fn rsvg_path_builder_move_to (raw_builder: *mut RsvgPathBuilder". This is a public function with an exported symbol in the .a file, not an internal one, as it will be called from the C code. And the "raw_builder: *mut RsvgPathBuilder" is Rust-ese for "a pointer to an RsvgPathBuilder with mutable contents". If this were only an accessor function, we would use a "*const RsvgPathBuilder" argument type.

8. "assert! (!raw_builder.is_null ());". You can read this as "g_assert (raw_builder != NULL);" if you come from GObject land.

9. "let builder: &mut RsvgPathBuilder = unsafe { &mut (*raw_builder) }". This declares a builder variable, of type &mut RsvgPathBuilder, which is a reference to a mutable path builder. The variable gets initialized with the result of "&mut (*raw_builder)": first we de-reference the raw_builder pointer with the asterisk, and convert that to a mutable reference with the &mut. De-referencing pointers that come from who-knows-where is an unsafe operation in Rust, as the compiler cannot guarantee their validity, and so we must wrap that operation with an unsafe{} block. This is like telling the compiler, "I acknowledge that this is potentially unsafe". Already this is better than life in C, where *every* de-reference is potentially dangerous; in Rust, only those that "bring in" pointers from the outside are potentially dangerous.

10. Now we have a Rust-side reference to an RsvgPathBuilder object, and we can call the builder.move_to() method as in regular Rust code.
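Point 7 mentions that an accessor would take a "*const" pointer instead of a "*mut" one. Here is a small sketch of both flavors side by side. The PathBuilder type and the path_builder_segment_count() accessor are invented for this example (librsvg does not necessarily have such a function), and I left out #[no_mangle] so that the snippet can be exercised from plain Rust.

```rust
pub struct PathBuilder {
    segments: Vec<(f64, f64)>,
}

impl PathBuilder {
    pub fn new() -> PathBuilder {
        PathBuilder { segments: Vec::new() }
    }

    pub fn move_to(&mut self, x: f64, y: f64) {
        self.segments.push((x, y));
    }

    pub fn segment_count(&self) -> usize {
        self.segments.len()
    }
}

// Mutating wrapper: it changes the builder, so it takes a *mut pointer
// and turns it into a &mut reference.
pub extern "C" fn path_builder_move_to(raw_builder: *mut PathBuilder, x: f64, y: f64) {
    assert!(!raw_builder.is_null());

    let builder: &mut PathBuilder = unsafe { &mut *raw_builder };

    builder.move_to(x, y);
}

// Read-only wrapper: a *const pointer and a shared & reference are enough.
pub extern "C" fn path_builder_segment_count(raw_builder: *const PathBuilder) -> usize {
    assert!(!raw_builder.is_null());

    let builder: &PathBuilder = unsafe { &*raw_builder };

    builder.segment_count()
}
```

In both wrappers the only unsafe step is the conversion from raw pointer to reference; everything after that is ordinary, checked Rust.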

Those are methods. And the constructor/destructor?

Excellent question! We defined an absolutely conventional method, but we haven't created a Rust object and sent it over to the C world yet. And we haven't taken a Rust object from the C world and destroyed it when we are done with it.

Construction

Here is the C prototype for the constructor, exactly as you would expect from a GObject library:
```
G_GNUC_INTERNAL
RsvgPathBuilder *rsvg_path_builder_new (void);
```
And here is the corresponding implementation in Rust:
```
#[no_mangle]
pub unsafe extern fn rsvg_path_builder_new () -> *mut RsvgPathBuilder {    // 1
    let builder = RsvgPathBuilder::new ();                                 // 2

    let boxed_builder = Box::new (builder);                                // 3

    Box::into_raw (boxed_builder)                                          // 4
}
```
1. Again, this is a public function with an exported symbol. However, this whole function is marked as unsafe since it returns a pointer, a *mut RsvgPathBuilder. To Rust this declaration means, "this pointer will be out of your control", hence the unsafe. With that we acknowledge our responsibility in handling the memory to which the pointer refers.

2. We instantiate an RsvgPathBuilder with normal Rust code...

3. ... and ensure that that object is put in the heap by Boxing it. This is a common operation in garbage-collected languages. Boxing is Rust's primitive for putting data in the program's heap; it allows the object in question to outlive the scope where it got created, i.e. the duration of the rsvg_path_builder_new() function.

4. Finally, we call Box::into_raw() to ask Rust to give us a pointer to the contents of the box, i.e. the actual RsvgPathBuilder struct that lives there. This statement doesn't end in a semicolon, so it is the return value for the function.

You could read this as "builder = g_new (...); initialize (builder); return builder;". Allocate something in the heap and initialize it, and return a pointer to it. This is exactly what the Rust code is doing.

Destruction

This is the C prototype for the destructor. This is not a reference-counted GObject; it is just an internal thing in librsvg, which does not need reference counting.
```
G_GNUC_INTERNAL
void rsvg_path_builder_destroy (RsvgPathBuilder *builder);
```
And this is the implementation in Rust:
```
#[no_mangle]
pub unsafe extern fn rsvg_path_builder_destroy (raw_builder: *mut RsvgPathBuilder) {    // 1
    assert! (!raw_builder.is_null ());                                                  // 2

    let _ = Box::from_raw (raw_builder);                                                // 3
}
```
1. Same as before; we declare the whole function as public, exported, and unsafe since it takes a pointer from who-knows-where.

2. Same as in the implementation for move_to(), we assert that we got passed a non-null pointer.

3. Let's take this bit by bit. "Box::from_raw (raw_builder)" is the counterpart to Box::into_raw() from above; it takes a pointer and wraps it back in a Box, which owns the object and which Rust knows how to de-reference into it. "let _ =" says that we don't care about keeping the result: the _ pattern does not bind the value to a name, so the Box is dropped right at the end of that statement. Dropping the Box is what makes Rust free the memory for the RsvgPathBuilder. You can read this idiom as "g_free (builder)".

Recapitulating

Make your object. Box it. Take a pointer to it with Box::into_raw(), and send it off into the wild west. Bring back a pointer to your object. Unbox it with Box::from_raw(). Let it go out of scope if you want the object to be freed. Acknowledge your responsibilities with unsafe and that's all!
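The whole round trip fits in one small, std-only sketch. The names here (PathBuilder, the path_builder_* functions, the DROPPED counter) are hypothetical; the counter exists only so that the effect of Box::from_raw() can be observed, and #[no_mangle] is left out so the code can be driven from Rust instead of C.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many builders have been dropped.  Purely for this sketch.
static DROPPED: AtomicUsize = AtomicUsize::new(0);

pub fn dropped_count() -> usize {
    DROPPED.load(Ordering::SeqCst)
}

pub struct PathBuilder {
    points: Vec<(f64, f64)>,
}

impl Drop for PathBuilder {
    fn drop(&mut self) {
        DROPPED.fetch_add(1, Ordering::SeqCst);
    }
}

// Make the object, box it, and hand out a raw pointer.
pub unsafe extern "C" fn path_builder_new() -> *mut PathBuilder {
    Box::into_raw(Box::new(PathBuilder { points: Vec::new() }))
}

// Borrow the object through the raw pointer for the duration of the call.
pub unsafe extern "C" fn path_builder_move_to(raw_builder: *mut PathBuilder, x: f64, y: f64) {
    assert!(!raw_builder.is_null());

    let builder = unsafe { &mut *raw_builder };
    builder.points.push((x, y));
}

pub unsafe extern "C" fn path_builder_point_count(raw_builder: *const PathBuilder) -> usize {
    assert!(!raw_builder.is_null());

    let builder = unsafe { &*raw_builder };
    builder.points.len()
}

// Take ownership back and let the Box be dropped, which frees the object.
pub unsafe extern "C" fn path_builder_destroy(raw_builder: *mut PathBuilder) {
    assert!(!raw_builder.is_null());

    drop(unsafe { Box::from_raw(raw_builder) });
}
```

The caller's side of the contract is the same as in C: every pointer that comes out of path_builder_new() must go into path_builder_destroy() exactly once, and must not be used afterwards.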

Making the functions visible to C

The code we just saw lives in path-builder.rs. By convention, the place where one actually exports the visible API from a Rust library is a file called lib.rs, and here is part of that file's contents in librsvg:
```
pub use path_builder::{
    rsvg_path_builder_new,
    rsvg_path_builder_destroy,
    rsvg_path_builder_move_to,
    rsvg_path_builder_line_to,
    rsvg_path_builder_curve_to,
    rsvg_path_builder_close_path,
    rsvg_path_builder_arc,
    rsvg_path_builder_add_to_cairo_context
};

mod path_builder; 
```
The mod path_builder indicates that lib.rs will use the path_builder sub-module. The pub use block exports the functions listed in it to the outside world. They will be visible as symbols in the .a file.

The Cargo.toml (akin to a toplevel Makefile.am) for my librsvg's little sub-library has this bit:
```
[lib]
name = "rsvg_internals"
crate-type = ["staticlib"]
```
This means that the sub-library will be called librsvg_internals.a, and it is a static library. I will link that into my master librsvg.so. If this were a stand-alone shared library entirely implemented in Rust, I would use the "cdylib" crate type instead.

Linking into the main .so

In librsvg/Makefile.am I have a very simplistic scheme for building the librsvg_internals.a library with Rust's tools, and linking the result into the main librsvg.so:
```
RUST_LIB = rust/target/debug/librsvg_internals.a

.PHONY: rust/target/debug/librsvg_internals.a
rust/target/debug/librsvg_internals.a:
	cd rust && \
	cargo build --verbose

librsvg_@RSVG_API_MAJOR_VERSION@_la_CPPFLAGS = ...

librsvg_@RSVG_API_MAJOR_VERSION@_la_CFLAGS = ...

librsvg_@RSVG_API_MAJOR_VERSION@_la_LDFLAGS = ...

librsvg_@RSVG_API_MAJOR_VERSION@_la_LIBADD = \
	$(LIBRSVG_LIBS) 	\
	$(LIBM)			\
	$(RUST_LIB)
```
This uses a .PHONY target for librsvg_internals.a, so "cargo build" will always be called on it. Cargo already takes care of dependency tracking; there is no need for make/automake to do that.

I put the filename of my library in a RUST_LIB variable, which I then reference from LIBADD. This gets librsvg_internals.a linked into the final librsvg.so.

When you run "cargo build" just like that, it creates a debug build in a target/debug subdirectory. I haven't looked for a way to make it play together with Automake when one calls "cargo build --release": that one puts things in a different directory, called target/release. Rust's tooling is more integrated that way, while in the Autotools world I'm expected to pass any CFLAGS for compilation by hand, depending on whether I'm doing a debug build or a release build. Any ideas for how to do this cleanly are appreciated.

I don't have any code in configure.ac to actually detect if Rust is present. I'm just assuming that it is for now; fixes are appreciated :)

Using the Rust functions from C

There is no difference from what we had before! This comes from rsvg-shapes.c:
```
static RsvgPathBuilder *
_rsvg_node_poly_create_builder (const char *value,
                                gboolean close_path)
{
    RsvgPathBuilder *builder;

    ...

    builder = rsvg_path_builder_new ();

    rsvg_path_builder_move_to (builder, pointlist[0], pointlist[1]);

    ...

    return builder;
}
```
Note that we are calling rsvg_path_builder_new() and rsvg_path_builder_move_to(), and returning a pointer to an RsvgPathBuilder structure as usual. However, all of those are implemented in the Rust code. The C code has no idea!

This is the magic of Rust: it allows you to move your C code bit by bit into a safe language. You don't have to do a whole rewrite in a single step. I don't know any other languages that let you do that.

Thu 2016/Nov/03

Refactoring C to make Rustification easier

In SVG, the sizes and positions of objects are not just numeric values or pixel coordinates. You can actually specify physical units ("this rectangle is 5 cm wide"), or units relative to the page ("this circle's X position is at 50% of the page's width, i.e. centered"). Librsvg's machinery for dealing with this is in two parts: parsing a length string from an SVG file into an RsvgLength structure, and normalizing those lengths to final units for rendering.

How RsvgLength is represented

The RsvgLength structure used to look like this:
```
typedef struct {
    double length;
    char factor;
} RsvgLength;
```
The parsing code would then do things like
```
RsvgLength
_rsvg_css_parse_length (const char *str)
{
    RsvgLength out;

    out.length = ...; /* parse a number with strtod() and friends */

    if (next_token_is ("pt")) { /* points */
        out.length /= 72;
        out.factor = 'i';
    } else if (next_token_is ("in")) { /* inches */
        out.factor = 'i';
    } else if (next_token_is ("em")) { /* current font's Em size */
        out.factor = 'm';
    } else if (next_token_is ("%")) { /* percent */
        out.factor = 'p';
    } else {
        out.factor = '\0';
    }

    return out;
}
```
That is, it uses a char for the length.factor field, and then uses actual characters to indicate each different type. This is pretty horrible, so I changed it to use an enum:
```
typedef enum {
    LENGTH_UNIT_DEFAULT,
    LENGTH_UNIT_PERCENT,
    LENGTH_UNIT_FONT_EM,
    LENGTH_UNIT_FONT_EX,
    LENGTH_UNIT_INCH,
    LENGTH_UNIT_RELATIVE_LARGER,
    LENGTH_UNIT_RELATIVE_SMALLER
} LengthUnit;

typedef struct {
    double length;
    LengthUnit unit;
} RsvgLength;
```
We have a nice enum instead of chars, but also, the factor field is now renamed to unit. This ensures that code like
```
if (length.factor == 'p')
    ...
```
will no longer compile, and I can catch all the uses of "factor" easily. I replace them with unit as appropriate, and check in each place that simply swapping the chars for enum values is the right thing.

When would it not be the right thing? I'm just replacing 'p' for LENGTH_UNIT_PERCENT, right? Well, it turns out that in a couple of hacky places in the rsvg-filters code, that code put an 'n' by hand in foo.factor to really mean, "this foo length value was not specified in the SVG data".

That pattern seemed highly specific to the filters code, so instead of adding an extra LENGTH_UNIT_UNSPECIFIED, I added an extra field to the FilterPrimitive structures: when they used 'n' for primitive.foo.factor, instead they now have a primitive.foo_specified boolean flag, and the code checks for that instead of essentially monkey-patching the RsvgLength structure.
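Looking ahead to the Rust port, the "was this value specified at all?" question is the kind of thing an Option can answer directly. This is only a sketch of how that might look; the C code described here uses the boolean flag, and FilterPrimitive, Length and width_or_default are cut-down, made-up stand-ins.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LengthUnit {
    Default,
    Percent,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Length {
    pub length: f64,
    pub unit: LengthUnit,
}

// Hypothetical, cut-down filter primitive: None means "not specified in
// the SVG data", so no magic value has to be smuggled into Length itself.
pub struct FilterPrimitive {
    pub x: Option<Length>,
    pub width: Option<Length>,
}

// Use the specified width if there is one, or fall back to a default.
pub fn width_or_default(primitive: &FilterPrimitive, default: Length) -> Length {
    primitive.width.unwrap_or(default)
}
```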

Normalizing lengths for rendering

At rendering time, these RsvgLength with their SVG-specific units need to be normalized to units that are relative to the current transformation matrix. There is a function used all over the code, called _rsvg_css_normalize_length(). This function gets called in an interesting way: one has to specify whether the length in question refers to a horizontal measure, or vertical, or both. For example, an RsvgNodeRect represents a rectangle shape, and it has x/y/w/h fields that are of type RsvgLength. When librsvg is rendering such an RsvgNodeRect, it does this:
```
static void
_rsvg_node_rect_draw (RsvgNodeRect *self, RsvgDrawingCtx *ctx)
{
    double x, y, w, h;

    x = _rsvg_css_normalize_length (&self->x, ctx, 'h');
    y = _rsvg_css_normalize_length (&self->y, ctx, 'v');

    w = fabs (_rsvg_css_normalize_length (&self->w, ctx, 'h'));
    h = fabs (_rsvg_css_normalize_length (&self->h, ctx, 'v'));

    ...
}
```
Again with the fucking chars. Those 'h' and 'v' parameters are because lengths in SVG need to be resolved relative to the width or the height (or both) of something. Sometimes that "something" is the size of the current object's parent group; sometimes it is the size of the whole page; sometimes it is the current font size. The _rsvg_css_normalize_length() function sees if it is dealing with a LENGTH_UNIT_PERCENT, for example, and will pick up page_size->width if the requested value is 'h'orizontal, or page_size->height if it is 'v'ertical. Of course I replaced all of those with an enum.

This time I didn't find hacky code like the one that would stick an 'n' in the length.factor field. Instead, I found an actual bug; a horizontal unit was using 'w' for "width", instead of 'h' for "horizontal". If these had been enums since the beginning, this bug would probably not be there.
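To show why an enum closes off that kind of bug, here is a sketch of a normalizing function with the direction as a proper type. It is not the real _rsvg_css_normalize_length(): Viewport is a made-up stand-in for the relevant bits of the drawing context, a single dpi value is assumed, and the reference size for the "both" case is my assumption, taken from the diagonal-based rule in the SVG specification.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LengthUnit {
    Default,
    Percent,
    Inch,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LengthDir {
    Horizontal,
    Vertical,
    Both,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Length {
    pub length: f64,
    pub unit: LengthUnit,
}

// Hypothetical stand-in for what the drawing context knows.
pub struct Viewport {
    pub width: f64,
    pub height: f64,
    pub dpi: f64,
}

pub fn normalize(length: &Length, viewport: &Viewport, dir: LengthDir) -> f64 {
    match length.unit {
        LengthUnit::Default => length.length,
        LengthUnit::Inch => length.length * viewport.dpi,
        LengthUnit::Percent => {
            // The match must cover every LengthDir variant, and there is
            // no way to pass a stray 'w' here.
            let reference = match dir {
                LengthDir::Horizontal => viewport.width,
                LengthDir::Vertical => viewport.height,
                LengthDir::Both => {
                    let w = viewport.width;
                    let h = viewport.height;
                    ((w * w + h * h) / 2.0).sqrt()
                }
            };

            length.length * reference
        }
    }
}
```

A call like normalize (&len, &viewport, 'w') simply does not type-check, so the mistake gets caught at compile time instead of lurking in the rendering code.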

While I appreciate the terseness of 'h' instead of LENGTH_DIR_HORIZONTAL, maybe we can later refactor groups of coordinates into commonly-used patterns. For example, instead of
```
patternx = _rsvg_css_normalize_length (&rsvg_pattern->x, ctx, LENGTH_DIR_HORIZONTAL);
patterny = _rsvg_css_normalize_length (&rsvg_pattern->y, ctx, LENGTH_DIR_VERTICAL);
patternw = _rsvg_css_normalize_length (&rsvg_pattern->width, ctx, LENGTH_DIR_HORIZONTAL);
patternh = _rsvg_css_normalize_length (&rsvg_pattern->height, ctx, LENGTH_DIR_VERTICAL);
```
perhaps we can have
```
normalize_lengths_for_x_y_w_h (ctx,
                               &rsvg_pattern->x,
                               &rsvg_pattern->y,
                               &rsvg_pattern->width,
                               &rsvg_pattern->height);
```
since those x/y/width/height groups get used all over the place.

And in Rust?

This is all so that when that code gets ported to Rust, it will be easier. Librsvg is old code, and it has a bunch of C-isms that either don't translate well to Rust, or are kind of horrible by themselves and could be turned into more robust C — to make the corresponding rustification obvious.

Tue 2016/Nov/01

Bézier curves, markers, and SVG's concept of directionality

In the first post in this series I introduced SVG markers, which let you put symbols along the nodes of a path. You can use them to draw arrows (arrowhead as an end marker on a line), points in a chart, and other visual effects.

In that post and in the second one, I started porting some of the code in librsvg that renders SVG markers from C to Rust. So far I've focused on the code and how it looks in Rust vs. C, and on some initial refactorings to make it feel more Rusty. I have casually mentioned Bézier segments and their tangents, and you may have an idea that SVG paths are composed of Bézier curves and straight lines, but I haven't explained what this code is really about. Why not simply walk over all the nodes in the path, and slap a marker at each one?

(Sorry. Couldn't resist.)

SVG paths

If you open an illustration program like Inkscape, you can draw paths based on Bézier curves.

Each segment is a cubic Bézier curve and can be considered independently. Let's focus on the middle segment there.

At each endpoint, the tangent direction of the curve is determined by the corresponding control point. For example, at endpoint 1 the curve goes out in the direction of control point 2, and at endpoint 4 the curve comes in from the direction of control point 3. The further away the control points are from the endpoints, the larger "pull" they will have on the curve.

Tangents at the endpoints

Let's consider the tangent direction of the curve at the endpoints. What cases do we have, especially when some of the control points are in the same place as the endpoints?

When the endpoints and the control points are all in different places (upper-left case), the tangents are easy to compute. We just subtract the vectors P2-P1 and P4-P3, respectively.

When just one of the control points coincides with one of the endpoints (second and third cases, upper row), the "missing" tangent just goes to the other control point.

In the middle row, we have the cases where both endpoints are coincident. If the control points are both in different places, we just have a curve that loops back. If just one of the control points coincides with the endpoints, the "curve" turns into a line that loops back, and its direction is towards the stray control point.

Finally, if both endpoints and both control points are in the same place, the curve is just a degenerate point, and it has no tangent directions.
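Those cases boil down to "use the nearest point that is not on top of the endpoint". Here is a sketch of that rule; it is my own formulation for illustration and not necessarily how librsvg's code is organized. The names Point, direction and endpoint_tangents are made up, and points are compared with exact equality, which is a simplification.

```rust
pub type Point = (f64, f64);

// Vector from `from` to `to`, or None if the two points coincide.
fn direction(from: Point, to: Point) -> Option<Point> {
    if from == to {
        None
    } else {
        Some((to.0 - from.0, to.1 - from.1))
    }
}

// Tangent directions at the start (P1) and end (P4) of a cubic Bézier
// segment.  Only the direction matters, so the vectors are not normalized.
pub fn endpoint_tangents(
    p1: Point,
    p2: Point,
    p3: Point,
    p4: Point,
) -> (Option<Point>, Option<Point>) {
    // Leaving P1: aim at P2, or at P3 if P2 sits on P1, or at P4 if both do.
    let start = direction(p1, p2)
        .or_else(|| direction(p1, p3))
        .or_else(|| direction(p1, p4));

    // Arriving at P4: come from P3, or from P2 if P3 sits on P4, or from P1.
    let end = direction(p3, p4)
        .or_else(|| direction(p2, p4))
        .or_else(|| direction(p1, p4));

    (start, end)
}
```

When all four points coincide, both tangents come out as None, which corresponds to the degenerate point with no direction.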

Here we only care about the direction of the curve at the endpoints; we don't care about the magnitude of the tangent vectors. As a side note, Bézier curves have the nice property that they fit completely inside the convex hull of their control points: if you draw a non-crossing quadrilateral using the control points, then the curve fits completely inside that quadrilateral.

How SVG represents paths

SVG uses a representation for paths that is similar to that of PDF and its precursor, the PostScript language for printers. There is a pen with a current point. The pen can move in a line or in a curve to another point while drawing, or it can lift up and move to another point without drawing.

To create a path, you specify commands. These are the four basic commands:
- move_to (x, y) - Change the pen's current point without drawing, and begin a new subpath.
- line_to (x, y) - Draw a straight line from the current point to another point.
- curve_to (x2, y2, x3, y3, x4, y4) - Draw a Bézier curve from the current point to (x4, y4), with the control points (x2, y2) and (x3, y3).
- close_path - Draw a line from the current point back to the beginning of the current subpath (i.e. the position of the last move_to command).
For example, this sequence of commands draws a closed square path:

move_to (0, 0) line_to (10, 0) line_to (10, 10) line_to (0, 10) close_path

If we had omitted the close_path, we would have an open C shape.
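To make the pen model concrete, here is a minimal sketch of the four basic commands as a Rust enum, plus a function that replays a path and reports where the pen ends up. The names `PathCommand`, `closed_square()` and `final_point()` are made up for this example; they are not librsvg's types.

```rust
/// The four basic path commands. Hypothetical type for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PathCommand {
    MoveTo(f64, f64),
    LineTo(f64, f64),
    /// Control points (x2, y2) and (x3, y3), then the endpoint (x4, y4).
    CurveTo(f64, f64, f64, f64, f64, f64),
    ClosePath,
}

/// The closed square from the example above.
pub fn closed_square() -> Vec<PathCommand> {
    vec![
        PathCommand::MoveTo(0.0, 0.0),
        PathCommand::LineTo(10.0, 0.0),
        PathCommand::LineTo(10.0, 10.0),
        PathCommand::LineTo(0.0, 10.0),
        PathCommand::ClosePath,
    ]
}

/// Replays the commands and returns the pen's final current point,
/// or `None` if the path never set one.
pub fn final_point(commands: &[PathCommand]) -> Option<(f64, f64)> {
    let mut current = None;
    let mut subpath_start = None;

    for cmd in commands {
        match *cmd {
            PathCommand::MoveTo(x, y) => {
                // Lifting the pen also begins a new subpath.
                current = Some((x, y));
                subpath_start = current;
            }
            PathCommand::LineTo(x, y) => current = Some((x, y)),
            PathCommand::CurveTo(_, _, _, _, x4, y4) => current = Some((x4, y4)),
            // Closing goes back to where the current subpath started.
            PathCommand::ClosePath => current = subpath_start,
        }
    }

    current
}
```

With the close_path in place, `final_point(&closed_square())` is back at (0, 0). Drop the last command and the pen stays at (0, 10).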

SVG paths provide secondary commands that are built upon those basic ones: commands to draw horizontal or vertical lines without specifying both coordinates, commands to draw quadratic curves instead of cubic ones, and commands to draw elliptical or circular arcs. All of these can be built from, or approximated from, straight lines or cubic Bézier curves.

Let's say you have a path with two disconnected sections: move_to (0, 0), line_to (10, 0), line_to (10, 10), move_to (20, 20), line_to (30, 20).

These two sections are called subpaths. A subpath begins with a move_to command. If there were a close_path command somewhere, it would draw a line from the current point back to where the current subpath started, i.e. to the location of the last move_to command.
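Splitting a command list into subpaths is just a matter of starting a new group at every move_to. A small sketch, with a cut-down hypothetical `Cmd` enum (no curves, to keep it short) and a made-up `split_subpaths()` function:

```rust
/// A reduced command set, enough to show subpath splitting.
/// Hypothetical type for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Cmd {
    MoveTo(f64, f64),
    LineTo(f64, f64),
    ClosePath,
}

/// Groups commands into subpaths; each subpath begins with its MoveTo.
/// Commands that appear before any MoveTo are dropped in this sketch.
pub fn split_subpaths(commands: &[Cmd]) -> Vec<Vec<Cmd>> {
    let mut subpaths: Vec<Vec<Cmd>> = Vec::new();

    for &cmd in commands {
        match cmd {
            // Every move_to opens a new subpath.
            Cmd::MoveTo(..) => subpaths.push(vec![cmd]),
            // Everything else belongs to the subpath in progress.
            _ => {
                if let Some(current) = subpaths.last_mut() {
                    current.push(cmd);
                }
            }
        }
    }

    subpaths
}
```

For the path above, this yields two groups: one with the first move_to and its two line_to commands, and one with the second move_to and its single line_to.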

Markers at nodes

Repeating ourselves a bit: for each path, SVG lets you define markers. A marker is a symbol that can be automatically placed at each node along a path. For example, here is a path composed of line_to segments, with an arrow-shaped marker at each node:

Here, the arrow-shaped marker is defined to be orientable. Its anchor point is at the V-shaped concavity of the arrow. SVG specifies the angle at which orientable markers should be placed: given a node, the angle of its marker is the average of the incoming and outgoing angles of the path segments that meet at that node. For example, at node 5 above, the incoming line comes in at 0° (Eastwards) and the outgoing line goes out at 90° (Southwards), so the arrow marker at 5 is rotated to point at 45° (South-East).

In the following picture we see the angle of each marker as the bisection of the incoming and outgoing angles of the respective nodes:

The nodes at the beginning and end of subpaths only have one segment that meets that node. So, the marker uses that segment's angle. For example, at node 6 the only incoming segment goes Southward, so the marker points South.
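Averaging two angles needs a bit of care, because angles wrap around: the average of 350° and 10° should be 0°, not 180°. Here is a sketch of one way to do it, using the same convention as above (0° is East, 90° is South). The function names are hypothetical, and this is not necessarily how librsvg computes it.

```rust
/// Bisects two directions given in degrees, taking the short way around.
/// The result is normalized to the range [0, 360).
pub fn bisect_angle_degrees(incoming: f64, outgoing: f64) -> f64 {
    // Signed turn from incoming to outgoing, in [-180, 180).
    // For exactly opposite directions this picks -180, which is an
    // arbitrary choice: there is no single sensible bisector then.
    let turn = (outgoing - incoming + 180.0).rem_euclid(360.0) - 180.0;

    (incoming + turn / 2.0).rem_euclid(360.0)
}

/// Angle for a marker at a node. Interior nodes have both an incoming and
/// an outgoing angle; the first and last node of a subpath have only one.
pub fn marker_angle(incoming: Option<f64>, outgoing: Option<f64>) -> Option<f64> {
    match (incoming, outgoing) {
        (Some(i), Some(o)) => Some(bisect_angle_degrees(i, o)),
        (Some(a), None) | (None, Some(a)) => Some(a),
        (None, None) => None,
    }
}
```

For node 5 in the example, `marker_angle(Some(0.0), Some(90.0))` gives 45°, and for node 6, which only has an incoming segment going South, `marker_angle(Some(90.0), None)` gives 90°.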

Converting paths into Segments

The path above is simple to define. The path definition is

move_to (1) line_to (2) line_to (3) line_to (4) line_to (5) line_to (6)

(Imagine that instead of those numbers, which are just for illustration purposes, we include actual x/y coordinates.)

When librsvg turns that path into Segments, they more or less look like
```
line from 1, outgoing angle East,       to 2, incoming angle East
line from 2, outgoing angle South-East, to 3, incoming angle South-East
line from 3, outgoing angle North-East, to 4, incoming angle North-East
line from 4, outgoing angle East,       to 5, incoming angle East
line from 5, outgoing angle South,      to 6, incoming angle South
```
Obviously, straight line segments (i.e. from a line_to) have the same angles at the start and the end of each segment. In contrast, curve_to segments can have different tangent angles at each end. For example, if we had a single curved segment like this:

move_to (1) curve_to (2, 3, 4)

Then the corresponding single Segment would look like this:
```
curve from 1, outgoing angle North, to 4, incoming angle South-East
```
Now you know what librsvg's function path_to_segments() does! It turns a sequence of move_to / line_to / curve_to commands into a sequence of segments, each one with angles at the start/end nodes of the segment.
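To give a feel for the idea, here is a much simplified sketch that only handles straight lines between consecutive nodes. It is not the real path_to_segments(): the `LineSegment` type and `polyline_to_segments()` are hypothetical, and since a straight line has the same angle at both ends, the sketch stores a single angle per segment (a curve would need two).

```rust
/// A straight segment between two nodes. Hypothetical type for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct LineSegment {
    pub from: (f64, f64),
    pub to: (f64, f64),
    /// Direction in degrees (0 = East, 90 = South, since y grows downwards).
    /// `None` when the two nodes coincide and there is no direction.
    pub angle: Option<f64>,
}

/// Turns a list of nodes, joined by straight lines, into segments with angles.
pub fn polyline_to_segments(nodes: &[(f64, f64)]) -> Vec<LineSegment> {
    nodes
        .windows(2)
        .map(|pair| {
            let (from, to) = (pair[0], pair[1]);
            let (dx, dy) = (to.0 - from.0, to.1 - from.1);

            // Exact comparison is an assumption of this sketch.
            let angle = if dx == 0.0 && dy == 0.0 {
                None
            } else {
                Some(dy.atan2(dx).to_degrees())
            };

            LineSegment { from, to, angle }
        })
        .collect()
}
```

Feeding it the nodes (0, 0), (10, 0), (10, 10) produces two segments, the first pointing East and the second pointing South.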

Paths with zero-length segments

Let's go back to our path made up of line segments, the one that looks like this:

However, imagine that for some reason the path contains duplicated, contiguous nodes. If we specified the path as

move_to (1) line_to (2) line_to (3) line_to (3) line_to (3) line_to (3) line_to (4) line_to (5) line_to (6)

Then our rendered path would look the same, with duplicated nodes at 3:

But now when librsvg turns that into Segments, they would look like
```
  line from 1, outgoing angle East,       to 2, incoming angle East
  line from 2, outgoing angle South-East, to 3, incoming angle South-East
  line from 3, to 3, no angles since this is a zero-length segment
* line from 3, to 3, no angles since this is a zero-length segment
  line from 3, outgoing angle North-East, to 4, incoming angle North-East
  line from 4, outgoing angle East,       to 5, incoming angle East
  line from 5, outgoing angle South,      to 6, incoming angle South
```
When librsvg has to draw the markers for this path, it has to compute the marker's angle at each node. However, at the starting node of the segment marked with a (*) above, there is no angle! In this case, the SVG spec says that you have to walk the path backwards until you find a segment which has an angle, then forwards until you find another segment with an angle, and then use the average of those two angles for the (*) node. Visually this makes sense: you don't see where there are contiguous duplicated nodes, but you certainly see lines coming out of that vertex. The algorithm finds those lines and averages their angles for the marker.
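The backwards and forwards walk can be sketched like this. It is an illustration under simplifying assumptions, not librsvg's code: the `Segment` type and function names are hypothetical, each segment carries a single optional angle, and the sketch ignores subpath boundaries, which a real implementation would have to respect.

```rust
/// A segment reduced to what matters here: its direction in degrees,
/// or `None` for a zero-length segment. Hypothetical type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Segment {
    pub angle: Option<f64>,
}

/// Walks backwards from segment `start` (inclusive) to the first
/// segment that has an angle.
pub fn incoming_angle_backwards(segments: &[Segment], start: usize) -> Option<f64> {
    segments.get(..=start)?.iter().rev().find_map(|s| s.angle)
}

/// Walks forwards from segment `start` (inclusive) to the first
/// segment that has an angle.
pub fn outgoing_angle_forwards(segments: &[Segment], start: usize) -> Option<f64> {
    segments.get(start..)?.iter().find_map(|s| s.angle)
}

/// The (incoming, outgoing) angles to use at the starting node of
/// segment `i`. The caller would then average the two.
pub fn angles_at_start_node(segments: &[Segment], i: usize) -> (Option<f64>, Option<f64>) {
    // The incoming side starts at the previous segment, if there is one.
    let incoming = if i == 0 {
        None
    } else {
        incoming_angle_backwards(segments, i - 1)
    };

    (incoming, outgoing_angle_forwards(segments, i))
}
```

For the seven segments listed above, with South-East as 45° and North-East as -45°, asking for the starting node of the (*) segment skips the zero-length ones and returns the South-East angle as incoming and the North-East angle as outgoing.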

Now you know where our exotic names find_incoming_directionality_backwards() and find_outgoing_directionality_forwards() come from!

Next up: refactoring C to make Rustification easier.