Traditionally, GObject implementations in C are mutable: you
instantiate a GObject and then change its state via method calls.
Sometimes this is expected and desired; a GtkCheckButton widget
certainly can change its internal state from pressed to not pressed,
for example.
Other times, objects are mutable while they are being "assembled" or
"configured", and only later yield a final immutable result.
This is the case for RsvgHandle from librsvg.
Please bear with me while I write about the history of the
RsvgHandle API and why it ended up with different ways of doing the
same thing.
The traditional RsvgHandle API
The final purpose of an RsvgHandle is to represent an SVG document
loaded in memory. Once it is loaded, the SVG document does not
change, as librsvg does not support animation or creating/removing SVG
elements; it is a static renderer.
However, before an RsvgHandle achieves its immutable state, it has
to be loaded first. Loading can be done in two ways:
- The historical/deprecated way, using the
rsvg_handle_write() and rsvg_handle_close() APIs. Plenty of code in GNOME used this write/close idiom before GLib got a good abstraction for streams; you can see another example in GdkPixbufLoader. The idea is that applications do this:
file = open a file...;
handle = rsvg_handle_new ();
while (file has more data) {
    rsvg_handle_write(handle, a bit of data);
}
rsvg_handle_close (handle);
// now the handle is fully loaded and immutable
rsvg_handle_render (handle, ...);
- The streaming way, with
rsvg_handle_read_stream_sync(), which takes a GInputStream, or one of the convenience functions which take a GFile and produce a stream from it.
file = g_file_new_for_path ("/foo/bar.svg");
stream = g_file_read (file, ...);
handle = rsvg_handle_new ();
rsvg_handle_read_stream_sync (handle, stream, ...);
// now the handle is fully loaded and immutable
rsvg_handle_render (handle, ...);
A bit of history
Let's consider a few of RsvgHandle's functions.
Constructors:
rsvg_handle_new(), rsvg_handle_new_with_flags()
Configure the handle for loading:
rsvg_handle_set_base_uri(), rsvg_handle_set_base_gfile()
Deprecated loading API:
rsvg_handle_write(), rsvg_handle_close()
Streaming API:
rsvg_handle_read_stream_sync()
When librsvg first acquired the concept of an RsvgHandle, it just
had rsvg_handle_new() with no arguments. About 9 years later, it
got rsvg_handle_new_with_flags() to allow more options, but it took
another 2 years to actually add some usable flags — the first one was
to configure the parsing limits in the underlying calls to libxml2.
About 3 years after RsvgHandle appeared, it got
rsvg_handle_set_base_uri() to configure the "base URI" against which
relative references in the SVG document get resolved. For example, if
you are reading /foo/bar.svg and it contains an element like
<image xlink:href="smiley.png"/>, then librsvg needs to be able to produce
the path /foo/smiley.png and that is done relative to the base URI.
(The base URI is implicit when reading from a specific SVG file, but
it needs to be provided when reading from an arbitrary stream that may
not even come from a file.)
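To make the base URI idea concrete, here is a minimal sketch in Rust. It is a simplification: it treats the base and the reference as plain file paths, whereas a real implementation would resolve URLs. The function name resolve_relative is made up for this example and is not part of librsvg.

```rust
use std::path::{Path, PathBuf};

/// Resolves `reference` against the directory that contains `base`.
/// Assumption: both are plain file paths; real code would resolve URLs.
fn resolve_relative(base: Option<&Path>, reference: &str) -> Option<PathBuf> {
    let reference = Path::new(reference);

    // An absolute reference does not need a base at all.
    if reference.is_absolute() {
        return Some(reference.to_path_buf());
    }

    // A relative reference cannot be resolved without a base;
    // this is the "arbitrary stream" case from the text.
    let dir = base?.parent()?;
    Some(dir.join(reference))
}

fn main() {
    let base = Path::new("/foo/bar.svg");
    println!("{:?}", resolve_relative(Some(base), "smiley.png"));
    println!("{:?}", resolve_relative(None, "smiley.png"));
}
```

With a base of /foo/bar.svg the relative reference lands next to the SVG file; with no base, the sketch has nothing to resolve against and gives up.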
Initially RsvgHandle had the write/close APIs, and 8 years later
it got the streaming functions once GIO appeared. Eventually the
streaming API would be the preferred one, instead of just being a
convenience for those brave new apps that started using GIO.
A summary of the history of librsvg's API may be something like:
- librsvg gets written initially; it doesn't even have an RsvgHandle, and just provides a single function which takes a FILE * and renders it to a GdkPixbuf.
- That gets replaced with RsvgHandle, its single rsvg_handle_new() constructor, and the write/close API to feed it data progressively.
- GIO appears, we get the first widespread streaming APIs in GNOME, and RsvgHandle gets the ability to read from streams.
- RsvgHandle gets rsvg_handle_new_with_flags() because now apps may want to configure extra stuff for libxml2.
- When Cairo appears and librsvg is ported to it, RsvgHandle gets an extra flag so that SVGs rendered to PDF can embed image data efficiently.
It's a convoluted history, but git log -- rsvg.h makes it accessible.
Where is the mutability?
An RsvgHandle gets created, with flags or without. It's empty, and
doesn't know if it will be given data with the write/close API or
with the streaming API. Also, someone may call set_base_uri() on
it. So, the handle must remain mutable while it is being populated
with data. After that, it can say, "no more changes, I'm done".
In C, this doesn't even have a name. Everything is mutable by default
all the time. This monster was the private data of RsvgHandle
before it got ported to Rust:
struct RsvgHandlePrivate {
    // set during construction
    RsvgHandleFlags flags;
    // GObject-ism
    gboolean is_disposed;
    // Extra crap for a deprecated API
    RsvgSizeFunc size_func;
    gpointer user_data;
    GDestroyNotify user_data_destroy;
    // Data only used while parsing an SVG
    RsvgHandleState state;
    RsvgDefs *defs;
    guint nest_level;
    RsvgNode *currentnode;
    RsvgNode *treebase;
    GHashTable *css_props;
    RsvgSaxHandler *handler;
    int handler_nest;
    GHashTable *entities;
    xmlParserCtxtPtr ctxt;
    GError **error;
    GCancellable *cancellable;
    GInputStream *compressed_input_stream;
    // Data only used while rendering
    double dpi_x;
    double dpi_y;
    // The famous base URI, set before loading
    gchar *base_uri;
    GFile *base_gfile;
    // Some internal stuff
    gboolean in_loop;
    gboolean is_testing;
};
"Single responsibility principle"? This is a horror show. That
RsvgHandlePrivate struct has all of these:
- Data only settable during construction (flags)
- Data set after construction, but which may only be set before loading (base URI)
- Highly mutable data used only during the loading stage: state machines, XML parsers, a stack of XML elements, CSS properties...
- The DPI (dots per inch) values only used during rendering.
- Assorted fields used at various stages of the handle's life.
It took a lot of refactoring to get the code to a point where it was
clear that an RsvgHandle in fact has distinct stages during its
lifetime, and that some of that data should only live during a
particular stage. Before, everything seemed a jumble of fields, used
at various unclear points in the code (for the struct listing above,
I've grouped related fields together — they were somewhat shuffled in
the original code!).
What would a better separation look like?
In the master branch, librsvg now has this:
/// Contains all the interior mutability for a RsvgHandle to be called
/// from the C API.
pub struct CHandle {
    dpi: Cell<Dpi>,
    load_flags: Cell<LoadFlags>,
    base_url: RefCell<Option<Url>>,
    // needed because the C api returns *const char
    base_url_cstring: RefCell<Option<CString>>,
    size_callback: RefCell<SizeCallback>,
    is_testing: Cell<bool>,
    load_state: RefCell<LoadState>,
}
Internally, that CHandle struct is now the private data of the
public RsvgHandle object. Note that all of CHandle's fields are a
Cell<> or RefCell<>: in Rust terms, this means that those fields
allow for "interior mutability" in the CHandle struct: they can be
modified after initialization, even through a shared reference.
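As a small illustration of what Cell and RefCell buy you, here is a sketch with a pared-down stand-in for CHandle. MiniHandle, its two fields, and the initial DPI value are invented for the example; they are not librsvg's real definitions.

```rust
use std::cell::{Cell, RefCell};

// Hypothetical, pared-down stand-in for CHandle (not librsvg's real type).
struct MiniHandle {
    // Cell works well for small Copy values: get() and set() copy them.
    dpi: Cell<f64>,
    // RefCell hands out borrows that are checked at run time.
    base_url: RefCell<Option<String>>,
}

impl MiniHandle {
    fn new() -> Self {
        MiniHandle {
            dpi: Cell::new(90.0), // arbitrary starting value for the sketch
            base_url: RefCell::new(None),
        }
    }

    // Note `&self`, not `&mut self`: the mutation happens inside the cell.
    fn set_dpi(&self, dpi: f64) {
        self.dpi.set(dpi);
    }

    fn set_base_url(&self, url: &str) {
        *self.base_url.borrow_mut() = Some(url.to_string());
    }

    fn base_url(&self) -> Option<String> {
        self.base_url.borrow().clone()
    }
}

fn main() {
    let handle = MiniHandle::new(); // not declared `mut`
    handle.set_dpi(300.0);
    handle.set_base_url("file:///foo/bar.svg");
    println!("{} {:?}", handle.dpi.get(), handle.base_url());
}
```

This matches how a C caller sees a GObject: it only ever holds a pointer to the object, and every setter mutates through that pointer.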
The last field's cell, load_state, contains this type:
enum LoadState {
    Start,
    // Being loaded using the legacy write()/close() API
    Loading { buffer: Vec<u8> },
    // Fully loaded, with a Handle to an SVG document
    ClosedOk { handle: Handle },
    ClosedError,
}
A CHandle starts in the Start state, where it doesn't know if it
will be loaded with a stream, or with the legacy write/close API.
If the caller uses the write/close API, the handle moves to the
Loading state, which has a buffer where it accumulates the data
being fed to it.
But if the caller uses the stream API, the handle tries to parse an
SVG document from the stream, and it moves either to the ClosedOk
state, or to the ClosedError state if there is a parse error.
Correspondingly, with the write/close API, when the caller
finally calls rsvg_handle_close(), the handle creates a stream for
the buffer, parses it, and also ends up in either the ClosedOk or
the ClosedError state.
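The transitions described above can be sketched as a small state machine. This is not librsvg's code: Document, MiniHandle, the toy parse() function, and the error strings are all invented for illustration, and the "parser" accepts anything that contains "<svg".

```rust
use std::cell::RefCell;
use std::io::Read;

// Stand-in for a parsed SVG document (hypothetical; not librsvg's Handle).
#[derive(Debug, PartialEq)]
struct Document { byte_len: usize }

#[derive(Debug, PartialEq)]
enum LoadState {
    Start,
    Loading { buffer: Vec<u8> },
    ClosedOk { document: Document },
    ClosedError,
}

// Toy "parser" (assumption): any input containing "<svg" counts as valid.
fn parse(data: &[u8]) -> Result<Document, String> {
    if data.windows(4).any(|w| w == b"<svg") {
        Ok(Document { byte_len: data.len() })
    } else {
        Err("not an SVG document".to_string())
    }
}

// Both loading paths end here: the parse result decides the final state.
fn closed_state(result: Result<Document, String>) -> (LoadState, Result<(), String>) {
    match result {
        Ok(document) => (LoadState::ClosedOk { document }, Ok(())),
        Err(e) => (LoadState::ClosedError, Err(e)),
    }
}

struct MiniHandle { load_state: RefCell<LoadState> }

impl MiniHandle {
    fn new() -> Self {
        MiniHandle { load_state: RefCell::new(LoadState::Start) }
    }

    // Legacy path: Start -> Loading, then keep accumulating bytes.
    fn write(&self, data: &[u8]) -> Result<(), String> {
        let mut state = self.load_state.borrow_mut();
        match *state {
            LoadState::Start => {
                *state = LoadState::Loading { buffer: data.to_vec() };
                Ok(())
            }
            LoadState::Loading { ref mut buffer } => {
                buffer.extend_from_slice(data);
                Ok(())
            }
            _ => Err("write() called on a closed handle".to_string()),
        }
    }

    // Legacy path: Loading -> ClosedOk or ClosedError.
    fn close(&self) -> Result<(), String> {
        let mut state = self.load_state.borrow_mut();
        // Take the old state out; ClosedError is only a placeholder here.
        let previous = std::mem::replace(&mut *state, LoadState::ClosedError);
        let (next, result) = match previous {
            LoadState::Start => (LoadState::ClosedError, Err("no data was written".to_string())),
            LoadState::Loading { buffer } => closed_state(parse(&buffer)),
            // In this sketch, closing an already-closed handle changes nothing.
            closed => (closed, Ok(())),
        };
        *state = next;
        result
    }

    // Stream path: Start -> ClosedOk or ClosedError in a single call.
    fn read_stream(&self, mut stream: impl Read) -> Result<(), String> {
        let mut state = self.load_state.borrow_mut();
        if !matches!(*state, LoadState::Start) {
            return Err("read_stream() needs a fresh handle".to_string());
        }
        let mut data = Vec::new();
        let parsed = stream
            .read_to_end(&mut data)
            .map_err(|e| e.to_string())
            .and_then(|_| parse(&data));
        let (next, result) = closed_state(parsed);
        *state = next;
        result
    }
}

fn main() {
    let handle = MiniHandle::new();
    handle.write(b"<svg>").unwrap();
    handle.write(b"</svg>").unwrap();
    println!("{:?}", handle.close());
    println!("{:?}", handle.load_state.borrow());
}
```

The point of the enum is that the accumulation buffer exists only in the Loading variant and the parsed document only in ClosedOk, so code cannot reach for data that belongs to a different stage.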
If you look at the variant ClosedOk { handle: Handle }, it contains
a fully loaded Handle inside, which right now is just a wrapper
around a reference-counted Svg object:
pub struct Handle {
    svg: Rc<Svg>,
}
The reason why LoadState::ClosedOk does not contain an Rc<Svg>
directly, and instead wraps it with a Handle, is that this is just
the first pass at refactoring. Also, Handle contains some
API-level logic which I'm not completely sure makes sense as a
lower-level Svg object. We'll see.
Couldn't you move more of CHandle's fields into LoadState?
Sort of, kind of, but the public API still lets one do things like
call rsvg_handle_get_base_uri() after the handle is fully loaded,
even though its result will be of little value. So, the fields that
hold the base_uri information are kept in the longer-lived
CHandle, not in the individual variants of LoadState.
How does this look from the Rust API?
CHandle implements the public C API of librsvg. Internally,
Handle implements the basic "load from stream", "get the geometry of
an SVG element", and "render to a Cairo context" functionality.
This basic functionality gets exported in a cleaner way through the
Rust API, discussed previously. There is no
interior mutability in there at all; that API uses a builder pattern
to gradually configure an SVG loader, which returns a fully loaded
SvgHandle, out of which you can create a CairoRenderer.
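For readers who have not seen the builder pattern used this way, here is a self-contained sketch of its general shape. Every name in it (Loader, LoadedSvg, Renderer, and their methods) is a hypothetical stand-in chosen for the example; it is not the actual librsvg Rust API, and the "loading" step only copies bytes.

```rust
// All names here are hypothetical stand-ins, not librsvg's actual Rust API.

// The fully loaded result: it has no setters, so it cannot change.
#[derive(Debug, Clone, PartialEq)]
struct LoadedSvg {
    data: Vec<u8>,
    base_url: Option<String>,
    unlimited_size: bool,
}

// The builder: the only place where configuration can be changed.
#[derive(Default)]
struct Loader {
    base_url: Option<String>,
    unlimited_size: bool,
}

impl Loader {
    fn new() -> Self {
        Loader::default()
    }

    // Each configuration step consumes the builder and returns it.
    fn with_base_url(mut self, url: &str) -> Self {
        self.base_url = Some(url.to_string());
        self
    }

    fn with_unlimited_size(mut self, unlimited: bool) -> Self {
        self.unlimited_size = unlimited;
        self
    }

    // Consumes the loader, so it cannot be reconfigured after loading.
    fn read(self, data: &[u8]) -> Result<LoadedSvg, String> {
        if data.is_empty() {
            return Err("empty document".to_string());
        }
        Ok(LoadedSvg {
            data: data.to_vec(),
            base_url: self.base_url,
            unlimited_size: self.unlimited_size,
        })
    }
}

// A renderer only borrows the loaded document, so it cannot modify it.
struct Renderer<'a> {
    svg: &'a LoadedSvg,
}

impl<'a> Renderer<'a> {
    fn new(svg: &'a LoadedSvg) -> Self {
        Renderer { svg }
    }

    fn describe(&self) -> String {
        format!("{} bytes, base URL {:?}", self.svg.data.len(), self.svg.base_url)
    }
}

fn main() {
    let svg = Loader::new()
        .with_base_url("file:///foo/bar.svg")
        .read(b"<svg/>")
        .expect("loading failed");
    let renderer = Renderer::new(&svg);
    println!("{}", renderer.describe());
}
```

No Cell or RefCell is needed here: mutation is confined to the builder, which is passed by value, and the loaded document is only ever read through shared references.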
In fact, it may be possible to refactor all of this a bit and
implement CHandle directly in terms of the new Rust API: in effect,
use CHandle as the "holding space" while the SVG loader gets
configured, and later turned into a fully loaded SvgHandle
internally.
Conclusion
The C version of RsvgHandle's private structure used to have a bunch
of fields. Without knowing the code, it was hard to know that they
belonged in groups, and each group corresponded roughly to a stage in
the handle's lifetime.
It took plenty of refactoring to get the fields split up cleanly in
librsvg's internals. The process of refactoring RsvgHandle's fields,
and ensuring that the various states of a handle are consistent, in
fact exposed a few bugs where the state was not being checked
appropriately. The public C API remains the same as always, but has
better internal checks now.
GObject APIs tend to allow for a lot of mutability via methods that
change the internal state of objects. For RsvgHandle, it was possible
to change this into a single CHandle that maintains the mutable data
in a contained fashion, and later translates it internally into an
immutable Handle that represents a fully-loaded SVG document. This
scheme ties in well with the new Rust API for librsvg, which keeps
everything immutable after creation.