Traditionally, GObject implementations in C are mutable: you
instantiate a GObject and then change its state via method calls.
Sometimes this is expected and desired; a GtkCheckButton
widget
certainly can change its internal state from pressed to not pressed,
for example.
Other times, objects are mutable while they are being "assembled" or
"configured", and only yield a final immutable result later.
This is the case for RsvgHandle
from librsvg.
Please bear with me while I write about the history of the
RsvgHandle
API and why it ended up with different ways of doing the
same thing.
The traditional RsvgHandle API
The final purpose of an RsvgHandle
is to represent an SVG document
loaded in memory. Once it is loaded, the SVG document does not
change, as librsvg does not support animation or creating/removing SVG
elements; it is a static renderer.
However, before an RsvgHandle
achieves its immutable state, it has
to be loaded first. Loading can be done in two ways:
- The historical/deprecated way, using the rsvg_handle_write() and
rsvg_handle_close() APIs. Plenty of code in GNOME used this write/close
idiom before GLib got a good abstraction for streams; you can see another
example in GdkPixbufLoader. The idea is that applications do this:
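GError *error = NULL;
FILE *file = fopen ("/foo/bar.svg", "rb");
RsvgHandle *handle = rsvg_handle_new ();
guchar buf[4096];
size_t n;
gboolean ok = (file != NULL);
// feed the file to the handle one chunk at a time
while (ok && (n = fread (buf, 1, sizeof (buf), file)) > 0) {
    ok = rsvg_handle_write (handle, buf, n, &error);
}
if (file != NULL) fclose (file);
if (ok && rsvg_handle_close (handle, &error)) {
    // now the handle is fully loaded and immutable
    rsvg_handle_render_cairo (handle, cr); // cr: a cairo_t set up elsewhere
}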
- The streaming way, with rsvg_handle_read_stream_sync(), which takes a
GInputStream, or one of the convenience functions which take a GFile
and produce a stream from it.
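GError *error = NULL;
GFile *file = g_file_new_for_path ("/foo/bar.svg");
GFileInputStream *stream = g_file_read (file, NULL, &error);
RsvgHandle *handle = rsvg_handle_new ();
if (stream != NULL &&
    rsvg_handle_read_stream_sync (handle, G_INPUT_STREAM (stream), NULL, &error)) {
    // now the handle is fully loaded and immutable
    rsvg_handle_render_cairo (handle, cr); // cr: a cairo_t set up elsewhere
}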
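To make the difference between the two styles concrete, here is a minimal Rust sketch, not librsvg code. Document, load_from_stream() and PushLoader are hypothetical names, and the "parser" merely reads the bytes as UTF-8 text. It shows a pull-style loader that reads from any std::io::Read until EOF, and a push-style loader whose write() accumulates chunks and whose close() wraps them in an in-memory stream so that it can reuse the pull-style path.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical stand-in for a parsed SVG document: it just keeps the text.
#[derive(Debug, PartialEq)]
pub struct Document {
    pub source: String,
}

/// Pull style: the loader reads from any stream until EOF
/// (analogous to rsvg_handle_read_stream_sync()).
pub fn load_from_stream<R: Read>(mut stream: R) -> io::Result<Document> {
    let mut source = String::new();
    // Fails with InvalidData if the bytes are not valid UTF-8.
    stream.read_to_string(&mut source)?;
    Ok(Document { source })
}

/// Push style: the caller feeds chunks and then says "done"
/// (analogous to rsvg_handle_write() / rsvg_handle_close()).
#[derive(Default)]
pub struct PushLoader {
    buffer: Vec<u8>,
}

impl PushLoader {
    pub fn write(&mut self, chunk: &[u8]) {
        self.buffer.extend_from_slice(chunk);
    }

    /// Consumes the loader, so no more data can be written afterwards.
    pub fn close(self) -> io::Result<Document> {
        // Wrap the accumulated bytes in a stream and reuse the pull-style path.
        load_from_stream(Cursor::new(self.buffer))
    }
}
```

Making close() take self by value lets the compiler reject any write() after close(); a C API cannot express that, so it has to check at run time instead.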
A bit of history
Let's consider a few of RsvgHandle's functions.
Constructors:
- rsvg_handle_new()
- rsvg_handle_new_with_flags()
Configure the handle for loading:
- rsvg_handle_set_base_uri()
- rsvg_handle_set_base_gfile()
Deprecated loading API:
- rsvg_handle_write()
- rsvg_handle_close()
Streaming API:
- rsvg_handle_read_stream_sync()
When librsvg first acquired the concept of an RsvgHandle, it just
had rsvg_handle_new() with no arguments. About 9 years later, it
got rsvg_handle_new_with_flags() to allow more options, but it took
another 2 years to actually add some usable flags; the first one was
to configure the parsing limits in the underlying calls to libxml2.
About 3 years after RsvgHandle
appeared, it got
rsvg_handle_set_base_uri()
to configure the "base URI" against which
relative references in the SVG document get resolved. For example, if
you are reading /foo/bar.svg and it contains an element like
<image xlink:href="smiley.png"/>, then librsvg needs to be able to
produce the path /foo/smiley.png, and that is done relative to the
base URI.
(The base URI is implicit when reading from a specific SVG file, but
it needs to be provided when reading from an arbitrary stream that may
not even come from a file.)
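As a rough illustration of that resolution step, this Rust sketch joins a relative reference onto the directory of a base path. It is a simplification: it works on filesystem paths instead of URLs and does not implement the full RFC 3986 algorithm (no schemes, no queries, no ".." normalization), and resolve_relative() is a hypothetical helper. A missing base models the case of an arbitrary stream that came with no base URI.

```rust
use std::path::{Path, PathBuf};

/// Resolves `reference` against the directory containing `base`.
/// Simplified sketch: filesystem paths only, no URL handling.
pub fn resolve_relative(base: Option<&Path>, reference: &str) -> Option<PathBuf> {
    let reference = Path::new(reference);
    if reference.is_absolute() {
        // Absolute references do not need a base at all.
        return Some(reference.to_path_buf());
    }
    // A relative reference needs a base; a bare stream has none.
    let base_dir = base?.parent()?;
    Some(base_dir.join(reference))
}
```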
Initially RsvgHandle
had the write/close
APIs, and 8 years later
it got the streaming functions once GIO appeared. Eventually the
streaming API would be the preferred one, instead of just being a
convenience for those brave new apps that started using GIO.
A summary of the history of librsvg's API may be something like:
- librsvg gets written initially; it doesn't even have an RsvgHandle,
and just provides a single function which takes a FILE * and renders
it to a GdkPixbuf.
- That gets replaced with RsvgHandle, its single rsvg_handle_new()
constructor, and the write/close API to feed it data progressively.
- GIO appears, we get the first widespread streaming APIs in GNOME, and
RsvgHandle gets the ability to read from streams.
- RsvgHandle gets rsvg_handle_new_with_flags() because now apps may
want to configure extra stuff for libxml2.
- When Cairo appears and librsvg is ported to it, RsvgHandle gets an
extra flag so that SVGs rendered to PDF can embed image data efficiently.
It's a convoluted history, but git log -- rsvg.h
makes it accessible.
Where is the mutability?
An RsvgHandle gets created, with flags or without. It's empty, and
doesn't know if it will be given data with the write/close API or
with the streaming API. Also, someone may call set_base_uri() on
it. So, the handle must remain mutable while it is being populated
with data. After that, it can say, "no more changes, I'm done".
In C, this doesn't even have a name. Everything is mutable by default
all the time. This monster was the private data of RsvgHandle
before it got ported to Rust:
struct RsvgHandlePrivate {
    // set during construction
    RsvgHandleFlags flags;
    // GObject-ism
    gboolean is_disposed;
    // Extra crap for a deprecated API
    RsvgSizeFunc size_func;
    gpointer user_data;
    GDestroyNotify user_data_destroy;
    // Data only used while parsing an SVG
    RsvgHandleState state;
    RsvgDefs *defs;
    guint nest_level;
    RsvgNode *currentnode;
    RsvgNode *treebase;
    GHashTable *css_props;
    RsvgSaxHandler *handler;
    int handler_nest;
    GHashTable *entities;
    xmlParserCtxtPtr ctxt;
    GError **error;
    GCancellable *cancellable;
    GInputStream *compressed_input_stream;
    // Data only used while rendering
    double dpi_x;
    double dpi_y;
    // The famous base URI, set before loading
    gchar *base_uri;
    GFile *base_gfile;
    // Some internal stuff
    gboolean in_loop;
    gboolean is_testing;
};
"Single responsibility principle"? This is a horror show. That
RsvgHandlePrivate
struct has all of these:
- Data only settable during construction (flags)
- Data set after construction, but which may only be set before loading (base URI)
- Highly mutable data used only during the loading stage: state machines, XML parsers, a stack of XML elements, CSS properties...
- The DPI (dots per inch) values only used during rendering.
- Assorted fields used at various stages of the handle's life.
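One way to untangle such a struct, sketched here in Rust with hypothetical types that only loosely echo the list above, is to give each stage its own type: options fixed before loading, scratch parser state that is consumed by load(), an immutable LoadedDocument, and DPI values passed in only at render time. The "parser" is a stand-in that takes the first element name as the root.

```rust
/// Fixed at construction time.
#[derive(Clone, Copy, Debug, Default)]
pub struct Flags {
    pub unlimited: bool,
}

/// May only be changed before loading starts.
#[derive(Debug, Default)]
pub struct LoadOptions {
    pub flags: Flags,
    pub base_uri: Option<String>,
}

/// Scratch data that exists only while parsing.
#[derive(Debug, Default)]
struct ParserState {
    elements_seen: Vec<String>,
}

/// The immutable result of loading.
#[derive(Debug)]
pub struct LoadedDocument {
    pub options: LoadOptions,
    pub root: String,
}

/// Only needed while rendering, so it is passed per call.
#[derive(Clone, Copy, Debug)]
pub struct Dpi {
    pub x: f64,
    pub y: f64,
}

/// Stand-in parser: the first element name becomes the root.
pub fn load(options: LoadOptions, elements: &[&str]) -> Option<LoadedDocument> {
    let mut parser = ParserState::default();
    for name in elements {
        parser.elements_seen.push((*name).to_string());
    }
    // `parser` is consumed here; only the loaded result survives.
    let root = parser.elements_seen.into_iter().next()?;
    Some(LoadedDocument { options, root })
}

pub fn render(doc: &LoadedDocument, dpi: Dpi) -> String {
    format!("render <{}> at {}x{} dpi", doc.root, dpi.x, dpi.y)
}
```

With this split, data that belongs to one stage cannot be reached from another: after load() returns, the parser state no longer exists.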
It took a lot of refactoring to get the code to a point where it was
clear that an RsvgHandle
in fact has distinct stages during its
lifetime, and that some of that data should only live during a
particular stage. Before, everything seemed a jumble of fields, used
at various unclear points in the code (for the struct listing above,
I've grouped related fields together — they were somewhat shuffled in
the original code!).
What would a better separation look like?
In the master branch, librsvg now has this:
/// Contains all the interior mutability for a RsvgHandle to be called
/// from the C API.
pub struct CHandle {
    dpi: Cell<Dpi>,
    load_flags: Cell<LoadFlags>,
    base_url: RefCell<Option<Url>>,
    // needed because the C api returns *const char
    base_url_cstring: RefCell<Option<CString>>,
    size_callback: RefCell<SizeCallback>,
    is_testing: Cell<bool>,
    load_state: RefCell<LoadState>,
}
Internally, that CHandle struct is now the private data of the
public RsvgHandle object. Note that all of CHandle's fields are a
Cell<> or RefCell<>: in Rust terms, this means that those fields
allow for "interior mutability" in the CHandle struct: they can be
modified after initialization, even through a shared reference.
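For readers new to interior mutability, here is a minimal sketch with a hypothetical MiniHandle type, not librsvg's CHandle. Every method takes &self, which is roughly what a GObject method receives, yet the fields can still change: Cell swaps Copy values in and out, and RefCell hands out borrows that are checked at run time.

```rust
use std::cell::{Cell, RefCell};

/// Hypothetical miniature of a C-facing handle.
#[derive(Default)]
pub struct MiniHandle {
    dpi: Cell<f64>,                    // Copy value: Cell swaps it in and out
    base_url: RefCell<Option<String>>, // non-Copy value: RefCell lends it out
}

impl MiniHandle {
    pub fn set_dpi(&self, dpi: f64) {
        self.dpi.set(dpi);
    }

    pub fn dpi(&self) -> f64 {
        self.dpi.get()
    }

    pub fn set_base_url(&self, url: &str) {
        // borrow_mut() panics if another borrow is active, so the
        // aliasing rules are checked at run time instead of compile time.
        *self.base_url.borrow_mut() = Some(url.to_string());
    }

    pub fn base_url(&self) -> Option<String> {
        self.base_url.borrow().clone()
    }
}
```

Note that a MiniHandle does not need to be declared mut for its setters to work; that is exactly what makes this shape usable behind a C pointer that is shared freely.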
The last field's cell, load_state, contains this type:
enum LoadState {
    Start,
    // Being loaded using the legacy write()/close() API
    Loading { buffer: Vec<u8> },
    // Fully loaded, with a Handle to an SVG document
    ClosedOk { handle: Handle },
    ClosedError,
}
A CHandle starts in the Start state, where it doesn't know if it
will be loaded with a stream, or with the legacy write/close API.
If the caller uses the write/close API, the handle moves to the
Loading state, which has a buffer where it accumulates the data
being fed to it.
But if the caller uses the stream API, the handle tries to parse an
SVG document from the stream, and it moves either to the ClosedOk
state, or to the ClosedError state if there is a parse error.
Correspondingly, when a caller using the write/close API finally calls
rsvg_handle_close(), the handle creates a stream for the buffer,
parses it, and likewise ends up in either the ClosedOk or the
ClosedError state.
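The transitions just described can be sketched as methods on a simplified copy of the LoadState enum. This is an illustration, not librsvg's implementation: the Handle here just stores the text, parse() is a placeholder that accepts anything containing "<svg", and the error messages are made up. Calls that make no sense in the current state, such as write() after the handle is closed, return an error instead of corrupting the state.

```rust
use std::io::{Cursor, Read};

/// Hypothetical stand-in for a loaded document.
#[derive(Debug, PartialEq)]
pub struct Handle {
    pub text: String,
}

#[derive(Debug, PartialEq)]
pub enum LoadState {
    Start,
    Loading { buffer: Vec<u8> },
    ClosedOk { handle: Handle },
    ClosedError,
}

/// Placeholder parser: accepts any UTF-8 text containing "<svg".
fn parse<R: Read>(mut stream: R) -> Result<Handle, ()> {
    let mut text = String::new();
    stream.read_to_string(&mut text).map_err(|_| ())?;
    if text.contains("<svg") {
        Ok(Handle { text })
    } else {
        Err(())
    }
}

impl LoadState {
    /// write(): Start or Loading -> Loading; any other state is misuse.
    pub fn write(&mut self, data: &[u8]) -> Result<(), &'static str> {
        match self {
            LoadState::Start => {
                *self = LoadState::Loading { buffer: data.to_vec() };
                Ok(())
            }
            LoadState::Loading { buffer } => {
                buffer.extend_from_slice(data);
                Ok(())
            }
            _ => Err("write() called on a closed handle"),
        }
    }

    /// close(): Loading -> ClosedOk or ClosedError, parsing the buffer as a stream.
    pub fn close(&mut self) -> Result<(), &'static str> {
        match std::mem::replace(self, LoadState::ClosedError) {
            LoadState::Loading { buffer } => match parse(Cursor::new(buffer)) {
                Ok(handle) => {
                    *self = LoadState::ClosedOk { handle };
                    Ok(())
                }
                // The state was already set to ClosedError above.
                Err(()) => Err("could not parse the SVG data"),
            },
            other => {
                // Not loading via write(): restore the previous state.
                *self = other;
                Err("close() called without a preceding write()")
            }
        }
    }

    /// read_stream(): Start -> ClosedOk or ClosedError in one step.
    pub fn read_stream<R: Read>(&mut self, stream: R) -> Result<(), &'static str> {
        if !matches!(self, LoadState::Start) {
            return Err("the handle is already loading or loaded");
        }
        match parse(stream) {
            Ok(handle) => {
                *self = LoadState::ClosedOk { handle };
                Ok(())
            }
            Err(()) => {
                *self = LoadState::ClosedError;
                Err("could not parse the SVG data")
            }
        }
    }
}
```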
If you look at the variant ClosedOk { handle: Handle }, it contains
a fully loaded Handle inside, which right now is just a wrapper
around a reference-counted Svg object:
pub struct Handle {
    svg: Rc<Svg>,
}
The reason why LoadState::ClosedOk does not contain an Rc<Svg>
directly, and instead wraps it with a Handle, is that this is just
the first pass at refactoring. Also, Handle contains some
API-level logic which I'm not completely sure belongs in a
lower-level Svg object. We'll see.
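A sketch of why the extra wrapper is cheap: cloning a Handle that holds an Rc<Svg> copies a pointer and bumps a reference count, so every clone shares one immutable document. The Svg fields and the aspect_ratio() and same_document() methods below are hypothetical, standing in for the kind of "API-level logic" a wrapper like this might carry.

```rust
use std::rc::Rc;

/// Hypothetical immutable document tree.
#[derive(Debug)]
pub struct Svg {
    pub width: f64,
    pub height: f64,
}

/// A thin wrapper: cloning it does not duplicate the document.
#[derive(Clone, Debug)]
pub struct Handle {
    svg: Rc<Svg>,
}

impl Handle {
    pub fn new(svg: Svg) -> Handle {
        Handle { svg: Rc::new(svg) }
    }

    /// True if both handles point at the very same document.
    pub fn same_document(&self, other: &Handle) -> bool {
        Rc::ptr_eq(&self.svg, &other.svg)
    }

    /// API-level logic can live here without touching Svg itself.
    pub fn aspect_ratio(&self) -> f64 {
        self.svg.width / self.svg.height
    }
}
```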
Couldn't you move more of CHandle's fields into LoadState?
Sort of, kind of, but the public API still lets one do things like
call rsvg_handle_get_base_uri() after the handle is fully loaded,
even though its result will be of little value. So, the fields that
hold the base URI (base_url and base_url_cstring) are kept in the
longer-lived CHandle, not in the individual variants of LoadState.
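A minimal Rust sketch of that decision, with hypothetical types: the base URI sits next to the load state in the longer-lived wrapper, so reading it keeps working after loading. Refusing to set the base URI once loading has started is this sketch's own assumption about reasonable behavior, not a claim about what the real C API does.

```rust
use std::cell::RefCell;

/// Simplified load states; the document is a plain String here.
#[derive(Debug)]
pub enum LoadState {
    Start,
    ClosedOk { document: String },
}

/// Hypothetical C-facing wrapper: the base URI lives next to the state,
/// not inside one of its variants, so it outlives every transition.
pub struct CHandle {
    base_uri: RefCell<Option<String>>,
    load_state: RefCell<LoadState>,
}

impl CHandle {
    pub fn new() -> CHandle {
        CHandle {
            base_uri: RefCell::new(None),
            load_state: RefCell::new(LoadState::Start),
        }
    }

    /// Assumption for this sketch: only allowed before loading starts.
    pub fn set_base_uri(&self, uri: &str) -> bool {
        if !matches!(*self.load_state.borrow(), LoadState::Start) {
            return false;
        }
        *self.base_uri.borrow_mut() = Some(uri.to_string());
        true
    }

    /// Works in every state, since the field is outside LoadState.
    pub fn base_uri(&self) -> Option<String> {
        self.base_uri.borrow().clone()
    }

    pub fn load(&self, document: &str) {
        *self.load_state.borrow_mut() = LoadState::ClosedOk {
            document: document.to_string(),
        };
    }
}
```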
How does this look from the Rust API?
CHandle implements the public C API of librsvg. Internally,
Handle implements the basic "load from stream", "get the geometry of
an SVG element", and "render to a Cairo context" functionality.
This basic functionality gets exported in a cleaner way through the
Rust API, discussed previously. There is no interior mutability in
there at all; that API uses a builder pattern to gradually configure
an SVG loader, which returns a fully loaded SvgHandle, out of which
you can create a CairoRenderer.
In fact, it may be possible to refactor all of this a bit and
implement CHandle directly in terms of the new Rust API: in effect,
use CHandle as the "holding space" while the SVG loader gets
configured, and later turned into a fully loaded SvgHandle
internally.
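The builder approach and the "holding space" idea can be sketched together. Loader, SvgHandle and HoldingSpace below are hypothetical and do not reproduce the signatures of the real librsvg Rust API. Each Loader setter consumes and returns the builder, read() consumes it and yields an SvgHandle that has no setters at all, and HoldingSpace plays the role CHandle could play: it keeps a Loader while being configured and swaps it for the loaded document afterwards. It uses &mut self instead of RefCell for brevity.

```rust
/// Hypothetical immutable, fully loaded document.
#[derive(Debug)]
pub struct SvgHandle {
    base_url: Option<String>,
    source: String,
}

impl SvgHandle {
    pub fn base_url(&self) -> Option<&str> {
        self.base_url.as_deref()
    }

    pub fn source(&self) -> &str {
        &self.source
    }
}

/// Hypothetical builder: configuration happens only before reading.
#[derive(Debug, Default)]
pub struct Loader {
    base_url: Option<String>,
}

impl Loader {
    pub fn new() -> Loader {
        Loader::default()
    }

    pub fn with_base_url(mut self, url: &str) -> Loader {
        self.base_url = Some(url.to_string());
        self
    }

    /// Consumes the loader; placeholder check instead of a real parser.
    pub fn read(self, source: &str) -> Result<SvgHandle, String> {
        if !source.contains("<svg") {
            return Err("not an SVG document".to_string());
        }
        Ok(SvgHandle { base_url: self.base_url, source: source.to_string() })
    }
}

/// Mutable holding space that is later turned into an SvgHandle.
#[derive(Debug)]
pub enum HoldingSpace {
    Configuring(Loader),
    Loaded(SvgHandle),
    Failed,
}

impl HoldingSpace {
    pub fn set_base_url(&mut self, url: &str) {
        // Only meaningful while configuring; ignored afterwards here.
        if let HoldingSpace::Configuring(loader) = self {
            *loader = std::mem::take(loader).with_base_url(url);
        }
    }

    pub fn load(&mut self, source: &str) {
        let next = match std::mem::replace(self, HoldingSpace::Failed) {
            HoldingSpace::Configuring(loader) => match loader.read(source) {
                Ok(handle) => HoldingSpace::Loaded(handle),
                Err(_) => HoldingSpace::Failed,
            },
            // Loading twice is ignored: keep whatever state we had.
            other => other,
        };
        *self = next;
    }
}
```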
Conclusion
The C version of RsvgHandle's private structure used to have a bunch
of fields. Without knowing the code, it was hard to tell that they
belonged in groups, and that each group corresponded roughly to a
stage in the handle's lifetime.
It took plenty of refactoring to get the fields split up cleanly in
librsvg's internals. The process of refactoring RsvgHandle's fields,
and ensuring that the various states of a handle are consistent, in
fact exposed a few bugs where the state was not being checked
appropriately. The public C API remains the same as always, but has
better internal checks now.
GObject APIs tend to allow for a lot of mutability via methods that
change the internal state of objects. For RsvgHandle, it was possible
to change this into a single CHandle that maintains the mutable data
in a contained fashion, and later translates it internally into an
immutable Handle that represents a fully loaded SVG document. This
scheme ties in well with the new Rust API for librsvg, which keeps
everything immutable after creation.