Containing mutability in GObjects

- Tags: gnome, librsvg, refactoring, rust

Traditionally, GObject implementations in C are mutable: you instantiate a GObject and then change its state via method calls. Sometimes this is expected and desired; a GtkCheckButton widget certainly can change its internal state from pressed to not pressed, for example.

Other times, objects are mutable while they are being "assembled" or "configured", and only yield a final immutable result later. This is the case for RsvgHandle from librsvg.

Please bear with me while I write about the history of the RsvgHandle API and why it ended up with different ways of doing the same thing.

The traditional RsvgHandle API

The final purpose of an RsvgHandle is to represent an SVG document loaded in memory. Once it is loaded, the SVG document does not change, as librsvg does not support animation or creating/removing SVG elements; it is a static renderer.

However, before an RsvgHandle achieves its immutable state, it has to be loaded first. Loading can be done in two ways:

  • The historical/deprecated way, using the rsvg_handle_write() and rsvg_handle_close() APIs. Plenty of code in GNOME used this write/close idiom before GLib got a good abstraction for streams; you can see another example in GdkPixbufLoader. The idea is that applications do this:
file = open a file...;
handle = rsvg_handle_new ();

while (file has more data) {
   rsvg_handle_write(handle, a bit of data);
}

rsvg_handle_close (handle);

// now the handle is fully loaded and immutable

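rsvg_handle_render (handle, ...);

  • The GIO way, using a GInputStream with rsvg_handle_read_stream_sync(). This is the preferred way nowadays:
file = g_file_new_for_path ("/foo/bar.svg");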
stream = g_file_read (file, ...);
handle = rsvg_handle_new ();

rsvg_handle_read_stream_sync (handle, stream, ...);

// now the handle is fully loaded and immutable

rsvg_handle_render (handle, ...);

A bit of history

Let's consider a few of RsvgHandle's functions.

Constructors:

  • rsvg_handle_new()
  • rsvg_handle_new_with_flags()

Configure the handle for loading:

  • rsvg_handle_set_base_uri()
  • rsvg_handle_set_base_gfile()

Deprecated loading API:

  • rsvg_handle_write()
  • rsvg_handle_close()

Streaming API:

  • rsvg_handle_read_stream_sync()

When librsvg first acquired the concept of an RsvgHandle, it just had rsvg_handle_new() with no arguments. About 9 years later, it got rsvg_handle_new_with_flags() to allow more options, but it took another 2 years to actually add some usable flags — the first one was to configure the parsing limits in the underlying calls to libxml2.

About 3 years after RsvgHandle appeared, it got rsvg_handle_set_base_uri() to configure the "base URI" against which relative references in the SVG document get resolved. For example, if you are reading /foo/bar.svg and it contains an element like <image xlink:href="smiley.png"/>, then librsvg needs to be able to produce the path /foo/smiley.png, and that is done relative to the base URI. (The base URI is implicit when reading from a specific SVG file, but it needs to be provided when reading from an arbitrary stream that may not even come from a file.)
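The resolution step in that example can be approximated with just the Rust standard library. This is a minimal sketch, assuming plain local file paths; librsvg itself works with URLs (its CHandle stores a Url), and `resolve_relative` is a hypothetical helper, not a librsvg function.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolve a relative reference such as "smiley.png"
/// against the path of the SVG document that contains it.
/// Only handles file paths, not general URLs.
fn resolve_relative(base: &Path, reference: &str) -> PathBuf {
    // The base is the document itself (/foo/bar.svg), so references
    // are resolved against its parent directory (/foo).
    match base.parent() {
        Some(dir) => dir.join(reference),
        None => PathBuf::from(reference),
    }
}

fn main() {
    let base = Path::new("/foo/bar.svg");
    let resolved = resolve_relative(base, "smiley.png");
    println!("{}", resolved.display()); // prints "/foo/smiley.png"
}
```

When the SVG arrives from an arbitrary stream there is no document path to take the parent of, which is why the caller has to supply a base explicitly.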

Initially RsvgHandle had the write/close APIs, and 8 years later it got the streaming functions once GIO appeared. Eventually the streaming API would be the preferred one, instead of just being a convenience for those brave new apps that started using GIO.

A summary of how librsvg's API evolved may look something like this:

  • librsvg gets written initially; it doesn't even have an RsvgHandle, and just provides a single function which takes a FILE * and renders it to a GdkPixbuf.

  • That gets replaced with RsvgHandle, its single rsvg_handle_new() constructor, and the write/close API to feed it data progressively.

  • GIO appears, we get the first widespread streaming APIs in GNOME, and RsvgHandle gets the ability to read from streams.

  • RsvgHandle gets rsvg_handle_new_with_flags() because now apps may want to configure extra stuff for libxml2.

  • When Cairo appears and librsvg is ported to it, RsvgHandle gets an extra flag so that SVGs rendered to PDF can embed image data efficiently.

It's a convoluted history, but git log -- rsvg.h makes it accessible.

Where is the mutability?

An RsvgHandle gets created, with flags or without. It's empty, and doesn't know if it will be given data with the write/close API or with the streaming API. Also, someone may call set_base_uri() on it. So, the handle must remain mutable while it is being populated with data. After that, it can say, "no more changes, I'm done".

In C, this doesn't even have a name. Everything is mutable by default all the time. This monster was the private data of RsvgHandle before it got ported to Rust:

struct RsvgHandlePrivate {
    // set during construction
    RsvgHandleFlags flags;

    // GObject-ism
    gboolean is_disposed;

    // Extra crap for a deprecated API
    RsvgSizeFunc size_func;
    gpointer user_data;
    GDestroyNotify user_data_destroy;

    // Data only used while parsing an SVG
    RsvgHandleState state;
    RsvgDefs *defs;
    guint nest_level;
    RsvgNode *currentnode;
    RsvgNode *treebase;
    GHashTable *css_props;
    RsvgSaxHandler *handler;
    int handler_nest;
    GHashTable *entities;
    xmlParserCtxtPtr ctxt;
    GError **error;
    GCancellable *cancellable;
    GInputStream *compressed_input_stream;

    // Data only used while rendering
    double dpi_x;
    double dpi_y;

    // The famous base URI, set before loading
    gchar *base_uri;
    GFile *base_gfile;

    // Some internal stuff
    gboolean in_loop;
    gboolean is_testing;
};

"Single responsibility principle"? This is a horror show. That RsvgHandlePrivate struct has all of these:

  • Data only settable during construction (flags)
  • Data set after construction, but which may only be set before loading (base URI)
  • Highly mutable data used only during the loading stage: state machines, XML parsers, a stack of XML elements, CSS properties...
  • The DPI (dots per inch) values only used during rendering.
  • Assorted fields used at various stages of the handle's life.

It took a lot of refactoring to get the code to a point where it was clear that an RsvgHandle in fact has distinct stages during its lifetime, and that some of that data should only live during a particular stage. Before, everything seemed a jumble of fields, used at various unclear points in the code (for the struct listing above, I've grouped related fields together — they were somewhat shuffled in the original code!).

What would a better separation look like?

In the master branch, librsvg now has this:

/// Contains all the interior mutability for a RsvgHandle to be called
/// from the C API.
pub struct CHandle {
    dpi: Cell<Dpi>,
    load_flags: Cell<LoadFlags>,

    base_url: RefCell<Option<Url>>,
    // needed because the C api returns *const char
    base_url_cstring: RefCell<Option<CString>>,

    size_callback: RefCell<SizeCallback>,
    is_testing: Cell<bool>,
    load_state: RefCell<LoadState>,
}

Internally, that CHandle struct is now the private data of the public RsvgHandle object. Note that all of CHandle's fields are wrapped in a Cell<> or RefCell<>. In Rust terms, this means those fields allow for "interior mutability" in the CHandle struct: they can be modified after initialization, even through a shared reference to the CHandle.
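To illustrate what interior mutability buys here, the following sketch uses a cut-down, hypothetical `MiniHandle` with just two of CHandle's fields, simplified to plain types (an `f64` instead of `Dpi`, a `String` instead of `Url`). Both setters take `&self` rather than `&mut self`, and the handle is not even declared `mut`; the cells are what make mutation possible anyway.

```rust
use std::cell::{Cell, RefCell};

/// Hypothetical, simplified stand-in for CHandle: only two fields,
/// with plain types instead of librsvg's Dpi and Url.
struct MiniHandle {
    dpi: Cell<f64>,                    // small Copy value: Cell is enough
    base_url: RefCell<Option<String>>, // non-Copy value: needs RefCell
}

impl MiniHandle {
    fn new() -> Self {
        MiniHandle {
            dpi: Cell::new(90.0), // arbitrary starting value for this sketch
            base_url: RefCell::new(None),
        }
    }

    // Note: &self, not &mut self.
    fn set_dpi(&self, dpi: f64) {
        self.dpi.set(dpi);
    }

    fn set_base_url(&self, url: &str) {
        *self.base_url.borrow_mut() = Some(url.to_string());
    }

    fn base_url(&self) -> Option<String> {
        self.base_url.borrow().clone()
    }
}

fn main() {
    let handle = MiniHandle::new(); // not declared `mut`
    handle.set_dpi(96.0);
    handle.set_base_url("file:///foo/bar.svg");
    println!("{} {:?}", handle.dpi.get(), handle.base_url());
}
```

Cell suits small Copy values that are replaced wholesale; RefCell is needed for values like Option<Url> that must be borrowed, and it checks at runtime that a mutable borrow never overlaps with any other borrow.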

The last field's cell, load_state, contains this type:

enum LoadState {
    Start,

    // Being loaded using the legacy write()/close() API
    Loading { buffer: Vec<u8> },

    // Fully loaded, with a Handle to an SVG document
    ClosedOk { handle: Handle },

    ClosedError,
}

A CHandle starts in the Start state, where it doesn't know if it will be loaded with a stream, or with the legacy write/close API.

If the caller uses the write/close API, the handle moves to the Loading state, which has a buffer where it accumulates the data being fed to it.

But if the caller uses the stream API, the handle tries to parse an SVG document from the stream, and it moves either to the ClosedOk state, or to the ClosedError state if there is a parse error.

Correspondingly, when using the write/close API, when the caller finally calls rsvg_handle_close(), the handle creates a stream for the buffer, parses it, and also gets either into the ClosedOk or ClosedError state.
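The transitions just described can be sketched as a small state machine. This is a simplified illustration under several assumptions: `Handle` is reduced to a hypothetical struct holding the document text, `parse` is a hypothetical stand-in for the real SVG parser (it only checks for `<svg`), the stream is modeled as a byte slice instead of a GInputStream, and errors are plain `String`s rather than GErrors.

```rust
use std::mem;

/// Hypothetical stand-in for a fully loaded SVG document.
#[derive(Debug)]
struct Handle {
    source: String,
}

/// The four states described above, with the same variant names.
#[derive(Debug)]
enum LoadState {
    Start,
    Loading { buffer: Vec<u8> },
    ClosedOk { handle: Handle },
    ClosedError,
}

/// Hypothetical parser: accepts any UTF-8 data that contains "<svg".
fn parse(data: &[u8]) -> Result<Handle, String> {
    match std::str::from_utf8(data) {
        Ok(s) if s.contains("<svg") => Ok(Handle { source: s.to_string() }),
        _ => Err("not an SVG document".to_string()),
    }
}

impl LoadState {
    /// Legacy write(): Start -> Loading, then keep appending to the buffer.
    fn write(&mut self, data: &[u8]) -> Result<(), String> {
        if matches!(self, LoadState::Start) {
            *self = LoadState::Loading { buffer: Vec::new() };
        }
        match self {
            LoadState::Loading { buffer } => {
                buffer.extend_from_slice(data);
                Ok(())
            }
            _ => Err("write() called on a closed handle".to_string()),
        }
    }

    /// Legacy close(): parse the accumulated buffer into a final state.
    fn close(&mut self) -> Result<(), String> {
        // Move the current state out so the buffer can be taken by value.
        match mem::replace(self, LoadState::Start) {
            LoadState::Loading { buffer } => self.finish(parse(&buffer)),
            other => {
                *self = other; // not loading: put the state back untouched
                Err("close() called while not loading".to_string())
            }
        }
    }

    /// Stream API: only valid from Start; goes straight to a Closed state.
    fn read_stream(&mut self, data: &[u8]) -> Result<(), String> {
        if matches!(self, LoadState::Start) {
            self.finish(parse(data))
        } else {
            Err("read_stream() called after loading began".to_string())
        }
    }

    fn finish(&mut self, result: Result<Handle, String>) -> Result<(), String> {
        match result {
            Ok(handle) => {
                *self = LoadState::ClosedOk { handle };
                Ok(())
            }
            Err(e) => {
                *self = LoadState::ClosedError;
                Err(e)
            }
        }
    }
}

fn main() {
    // Legacy path: write() in pieces, then close().
    let mut legacy = LoadState::Start;
    legacy.write(b"<svg xmlns=\"http://www.w3.org/2000/svg\"").unwrap();
    legacy.write(b"/>").unwrap();
    legacy.close().unwrap();
    if let LoadState::ClosedOk { handle } = &legacy {
        println!("loaded {} bytes", handle.source.len());
    }

    // Stream path with bad data ends up in ClosedError.
    let mut streamed = LoadState::Start;
    assert!(streamed.read_stream(b"not an svg").is_err());
    println!("{:?}", streamed);
}
```

Because every transition replaces the whole enum value, there is no way to end up with, say, a half-filled buffer next to a finished document: each state carries only the data that makes sense in it.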

If you look at the variant ClosedOk { handle: Handle }, it contains a fully loaded Handle inside, which right now is just a wrapper around a reference-counted Svg object:

pub struct Handle {
    svg: Rc<Svg>,
}

The reason why LoadState::ClosedOk does not contain an Rc<Svg> directly, and instead wraps it with a Handle, is that this is just the first pass at refactoring. Also, Handle contains some API-level logic which I'm not completely sure makes sense as a lower-level Svg object. We'll see.
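Because the Svg sits behind an Rc, a Handle can be duplicated cheaply: each copy is just another pointer to the same immutable document. A minimal sketch, assuming a hypothetical `Svg` with a single field; the `#[derive(Clone)]` on `Handle` and the `width()` method are assumptions of this sketch, not something stated about librsvg.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for librsvg's internal Svg document.
struct Svg {
    width: f64,
}

/// Same shape as above: a Handle wraps a reference-counted Svg.
#[derive(Clone)]
struct Handle {
    svg: Rc<Svg>,
}

impl Handle {
    fn new(svg: Svg) -> Self {
        Handle { svg: Rc::new(svg) }
    }

    // API-level logic can live on Handle while Svg stays a plain document.
    fn width(&self) -> f64 {
        self.svg.width
    }
}

fn main() {
    let a = Handle::new(Svg { width: 100.0 });
    let b = a.clone(); // copies a pointer, not the document
    println!("{} {}", b.width(), Rc::strong_count(&a.svg)); // prints "100 2"
}
```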

Couldn't you move more of CHandle's fields into LoadState?

Sort of, kind of, but the public API still lets one do things like call rsvg_handle_get_base_uri() after the handle is fully loaded, even though its result will be of little value. So, the fields that hold the base_uri information are kept in the longer-lived CHandle, not in the individual variants of LoadState.
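A sketch of why the base URI lives beside the load state rather than inside one of its variants: setting it is only meaningful before loading starts, but reading it back has to keep working in every state. All names here are hypothetical, the state enum is reduced to three variants without payloads, and returning an `Err` from the setter is this sketch's choice of how to reject a late call.

```rust
use std::cell::{Cell, RefCell};

/// Hypothetical, payload-free version of the load states.
#[derive(Clone, Copy, Debug, PartialEq)]
enum State {
    Start,
    Loading,
    Closed,
}

/// Hypothetical handle: the base URL sits next to the state, not inside
/// any one variant, so it survives every state transition.
struct MiniCHandle {
    base_url: RefCell<Option<String>>,
    state: Cell<State>,
}

impl MiniCHandle {
    fn new() -> Self {
        MiniCHandle {
            base_url: RefCell::new(None),
            state: Cell::new(State::Start),
        }
    }

    /// Setting is only accepted before loading begins.
    fn set_base_url(&self, url: &str) -> Result<(), String> {
        if self.state.get() != State::Start {
            return Err("the base URL must be set before loading".to_string());
        }
        *self.base_url.borrow_mut() = Some(url.to_string());
        Ok(())
    }

    /// Reading works in any state, even after loading has finished.
    fn base_url(&self) -> Option<String> {
        self.base_url.borrow().clone()
    }
}

fn main() {
    let h = MiniCHandle::new();
    h.set_base_url("file:///foo/bar.svg").unwrap();

    h.state.set(State::Loading); // pretend loading has started
    assert!(h.set_base_url("file:///elsewhere.svg").is_err());

    h.state.set(State::Closed); // pretend loading has finished
    println!("{:?}", h.base_url()); // prints Some("file:///foo/bar.svg")
}
```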

How does this look from the Rust API?

CHandle implements the public C API of librsvg. Internally, Handle implements the basic "load from stream", "get the geometry of an SVG element", and "render to a Cairo context" functionality.

This basic functionality gets exported in a cleaner way through the Rust API, discussed previously. There is no interior mutability in there at all; that API uses a builder pattern to gradually configure an SVG loader, which returns a fully loaded SvgHandle, out of which you can create a CairoRenderer.
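The shape of such an API can be sketched with a by-value builder. This is not librsvg's actual Rust API: `Loader`, `SvgHandle` and `Renderer` below are hypothetical, simplified types that only mirror the roles described above (configure a loader, get an immutable loaded handle, create a renderer from it), and the size limit is invented just to give a configuration option something to do.

```rust
/// Hypothetical loader configuration; every option consumes and returns it.
#[derive(Default)]
struct Loader {
    base_url: Option<String>,
    unlimited_size: bool,
}

/// Hypothetical loaded document: it has no setters, so it stays immutable.
struct SvgHandle {
    base_url: Option<String>,
    data: Vec<u8>,
}

/// Hypothetical renderer that only borrows the loaded document.
struct Renderer<'a> {
    handle: &'a SvgHandle,
    dpi: f64,
}

impl Loader {
    fn new() -> Self {
        Loader::default()
    }

    fn with_base_url(self, url: &str) -> Self {
        Loader { base_url: Some(url.to_string()), ..self }
    }

    fn with_unlimited_size(self, unlimited: bool) -> Self {
        Loader { unlimited_size: unlimited, ..self }
    }

    /// Consumes the loader: once loading starts it cannot be reconfigured.
    fn read_bytes(self, data: &[u8]) -> Result<SvgHandle, String> {
        // Arbitrary limit, only here to give the option something to do.
        if !self.unlimited_size && data.len() > 1_000_000 {
            return Err("document too large".to_string());
        }
        if data.is_empty() {
            return Err("empty document".to_string());
        }
        Ok(SvgHandle { base_url: self.base_url, data: data.to_vec() })
    }
}

impl SvgHandle {
    fn base_url(&self) -> Option<&str> {
        self.base_url.as_deref()
    }
}

impl<'a> Renderer<'a> {
    fn new(handle: &'a SvgHandle) -> Self {
        Renderer { handle, dpi: 96.0 }
    }

    fn with_dpi(self, dpi: f64) -> Self {
        Renderer { dpi, ..self }
    }

    /// Stand-in for drawing: describes what would be rendered.
    fn describe(&self) -> String {
        format!("{} bytes at {} dpi", self.handle.data.len(), self.dpi)
    }
}

fn main() {
    let handle = Loader::new()
        .with_base_url("file:///foo/bar.svg")
        .with_unlimited_size(false)
        .read_bytes(b"<svg/>")
        .unwrap();

    let renderer = Renderer::new(&handle).with_dpi(192.0);
    println!("{}", renderer.describe()); // prints "6 bytes at 192 dpi"
    println!("{:?}", handle.base_url());
}
```

Compared with CHandle, nothing here needs Cell or RefCell: configuration happens by moving values from one step to the next, and the loaded document is never modified afterwards.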

In fact, it may be possible to refactor all of this a bit and implement CHandle directly in terms of the new Rust API: in effect, use CHandle as the "holding space" while the SVG loader gets configured, and later turned into a fully loaded SvgHandle internally.

Conclusion

The C version of RsvgHandle's private structure used to have a bunch of fields. Without knowing the code, it was hard to know that they belonged in groups, and that each group corresponded roughly to a stage in the handle's lifetime.

It took plenty of refactoring to get the fields split up cleanly in librsvg's internals. The process of refactoring RsvgHandle's fields, and ensuring that the various states of a handle are consistent, in fact exposed a few bugs where the state was not being checked appropriately. The public C API remains the same as always, but has better internal checks now.

GObject APIs tend to allow for a lot of mutability via methods that change the internal state of objects. For RsvgHandle, it was possible to change this into a single CHandle that maintains the mutable data in a contained fashion, and later translates it internally into an immutable Handle that represents a fully-loaded SVG document. This scheme ties in well with the new Rust API for librsvg, which keeps everything immutable after creation.