Exposing C and Rust APIs: some thoughts from librsvg

Librsvg exports two public APIs: the C API that is in turn available to other languages through GObject Introspection, and the Rust API.

You could call this a use of the facade pattern on top of the rsvg_internals crate. That crate is the actual implementation of librsvg, and exports an interface with many knobs that are not exposed from the public APIs. The knobs are to allow for the variations in each of those APIs.

This post is about some interesting things that have come up during the creation/separation of those public APIs, and the implications of having an internals library that implements both.

Initial code organization

When librsvg was being ported to Rust, it just had an rsvg_internals crate that compiled as a staticlib to a .a library, which was later linked into the final librsvg.so.

Eventually the code got to the point where it was feasible to port the toplevel C API to Rust. This was relatively easy to do, since everything else underneath was already in Rust. At that point I became interested in also having a Rust API for librsvg — first to port the test suite to Rust and be able to run tests in parallel, and then to actually have a public API in Rust with more modern idioms than the historical, GObject-based API in C.

Version 2.45.5, from February 2019, is the last release that only had a C API.

Most of the C API of librsvg is in the RsvgHandle class. An RsvgHandle gets loaded with SVG data from a file or a stream, and then gets rendered to a Cairo context. The naming of Rust source files more or less matched the C source files, so where there was rsvg-handle.c initially, later we had handle.rs with the Rustified part of that code.

So, handle.rs had the Rust internals of the RsvgHandle class, and a bunch of extern "C" functions callable from C. For example, for this function in the public C API:

void rsvg_handle_set_base_gfile (RsvgHandle *handle,
                                 GFile      *base_file);

The corresponding Rust implementation was this:

#[no_mangle]
pub unsafe extern "C" fn rsvg_handle_rust_set_base_gfile(
    raw_handle: *mut RsvgHandle,
    raw_gfile: *mut gio_sys::GFile,
) {
    let rhandle = get_rust_handle(raw_handle);        // 1

    assert!(!raw_gfile.is_null());                    // 2
    let file: gio::File = from_glib_none(raw_gfile);  // 3

    rhandle.set_base_gfile(&file);                    // 4
}

1. Get the Rust struct corresponding to the C GObject.
2. Check the arguments.
3. Convert from C GObject reference to Rust reference.
4. Call the actual implementation of set_base_gfile in the Rust struct.

You can see that this function takes in arguments with C types, and converts them to Rust types. It's basically just glue between the C code and the actual implementation.
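The same four steps can be sketched with only the standard library. This is a hypothetical stand-in and not librsvg's code: the Handle struct, the handle_set_base_url function, and the use of a plain C string in place of a GFile are assumptions made so that the example is self-contained. A real exported symbol would also carry the no_mangle attribute, which is left out here.

```rust
use std::cell::RefCell;
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical stand-in for the Rust side of RsvgHandle.
pub struct Handle {
    base_url: RefCell<Option<String>>,
}

impl Handle {
    pub fn new() -> Handle {
        Handle { base_url: RefCell::new(None) }
    }

    /// Safe implementation: only Rust types, no raw pointers.
    pub fn set_base_url(&self, url: &str) {
        *self.base_url.borrow_mut() = Some(url.to_string());
    }

    pub fn base_url(&self) -> Option<String> {
        self.base_url.borrow().clone()
    }
}

/// Glue: takes C types, converts them, and calls the safe method.
///
/// # Safety
/// `raw_handle` must point to a live `Handle`, and `raw_url` must point to
/// a NUL-terminated string.
pub unsafe extern "C" fn handle_set_base_url(
    raw_handle: *const Handle,
    raw_url: *const c_char,
) {
    assert!(!raw_handle.is_null());
    let handle: &Handle = unsafe { &*raw_handle };                   // 1

    assert!(!raw_url.is_null());                                     // 2
    let url = unsafe { CStr::from_ptr(raw_url) }.to_string_lossy();  // 3

    handle.set_base_url(&url);                                       // 4
}
```

All the unsafe operations sit in the glue function; the method it calls never sees a raw pointer.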

Then, the actual implementation of set_base_gfile looked like this:

impl Handle {
    fn set_base_gfile(&self, file: &gio::File) {
        if let Some(uri) = file.get_uri() {
            self.set_base_url(&uri);
        } else {
            rsvg_g_warning("file has no URI; will not set the base URI");
        }
    }
}

This is an actual method for a Rust Handle struct, and takes Rust types as arguments — no conversions are necessary here. However, there is a pesky call to rsvg_g_warning, about which I'll talk later.

I found it cleanest, although not the shortest code, to structure things like this:

C code: bunch of stub functions where rsvg_blah just calls a corresponding rsvg_rust_blah.
Toplevel Rust code: bunch of #[no_mangle] unsafe extern "C" fn rust_blah() that convert from C argument types to Rust types, and call safe Rust functions — for librsvg, these happened to be methods for a struct. Before returning, the toplevel functions convert Rust return values to C return values, and do things like converting the Err(E) of a Result<> into a GError or a boolean or whatever the traditional C API required.

In the very first versions of the code where the public API was implemented in Rust, the extern "C" functions actually contained their implementation. However, after some refactoring, it turned out to be cleaner to leave those functions just with the task of converting C to Rust types and vice-versa, and put the actual implementation in very Rust-y code. This made it easier to keep the unsafe conversion code (unsafe because it deals with raw pointers coming from C) only in the toplevel functions.
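The return-value half of that glue can be sketched the same way. The example below is an assumption-laden illustration, not librsvg's API: LoadError, check_svg, and c_check_svg are hypothetical names, and an integer error code written through an out-parameter stands in for a GError.

```rust
use std::os::raw::c_int;
use std::slice;

/// Hypothetical error type for the safe Rust layer.
#[derive(Debug, PartialEq)]
pub enum LoadError {
    Empty,
    NotSvg,
}

/// Safe, idiomatic Rust: failures come back as a Result.
pub fn check_svg(data: &[u8]) -> Result<usize, LoadError> {
    if data.is_empty() {
        return Err(LoadError::Empty);
    }
    if !data.starts_with(b"<svg") {
        return Err(LoadError::NotSvg);
    }
    Ok(data.len())
}

/// C-facing wrapper: returns 1 on success and 0 on failure. On failure it
/// writes a numeric code to `error_code`, if that pointer is not NULL.
///
/// # Safety
/// `data` must point to `len` readable bytes (or be NULL), and
/// `error_code` must be NULL or point to a writable int.
pub unsafe extern "C" fn c_check_svg(
    data: *const u8,
    len: usize,
    error_code: *mut c_int,
) -> c_int {
    // Convert the C pointer/length pair into a Rust slice.
    let bytes: &[u8] = if data.is_null() || len == 0 {
        &[]
    } else {
        unsafe { slice::from_raw_parts(data, len) }
    };

    // Convert the Rust Result into what a traditional C API expects.
    match check_svg(bytes) {
        Ok(_) => 1,
        Err(e) => {
            if !error_code.is_null() {
                let code = match e {
                    LoadError::Empty => 1,
                    LoadError::NotSvg => 2,
                };
                unsafe { *error_code = code };
            }
            0
        }
    }
}
```

Rust callers use check_svg directly and match on the error; only C callers go through the lossy boolean-plus-code convention.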

Growing out a Rust API

This commit is where the new, public Rust API started. That commit just created a Cargo workspace with two crates; the rsvg_internals crate that we already had, and a librsvg_crate with the public Rust API.

The commits over the subsequent couple of months are of intense refactoring:

This commit moves the unsafe extern "C" functions to a separate c_api.rs source file. This leaves handle.rs with only the safe Rust implementation of the toplevel API, and c_api.rs with the unsafe entry points that mostly just convert argument types, return values, and errors.
The API primitives get expanded to allow for a public Rust API that is "hard to misuse", unlike the C API, which needs to be called in a certain order.
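One common way to get a "hard to misuse" API in Rust is to give each stage its own type, so that calling things out of order does not compile. The sketch below only illustrates that idea; Loader, LoadedSvg, LoadError, and describe are hypothetical names, and the "rendering" step is a placeholder.

```rust
/// Hypothetical error returned when the data is not an SVG.
#[derive(Debug, PartialEq)]
pub struct LoadError;

/// Stage 1: configuration. There is nothing to render yet.
pub struct Loader {
    base_url: Option<String>,
}

/// Stage 2: a fully loaded document. The only way to get one is
/// `Loader::read`, so "render before load" cannot be written.
pub struct LoadedSvg {
    base_url: Option<String>,
    byte_len: usize,
}

impl Loader {
    pub fn new() -> Loader {
        Loader { base_url: None }
    }

    pub fn with_base_url(mut self, url: &str) -> Loader {
        self.base_url = Some(url.to_string());
        self
    }

    /// Consumes the loader, so it cannot be reconfigured after loading.
    pub fn read(self, data: &[u8]) -> Result<LoadedSvg, LoadError> {
        if data.starts_with(b"<svg") {
            Ok(LoadedSvg { base_url: self.base_url, byte_len: data.len() })
        } else {
            Err(LoadError)
        }
    }
}

impl LoadedSvg {
    /// Placeholder for rendering; it exists only on a loaded document.
    pub fn describe(&self) -> String {
        format!(
            "{} bytes, base {}",
            self.byte_len,
            self.base_url.as_deref().unwrap_or("(none)")
        )
    }
}
```

With this shape, a call such as Loader::new().describe() is a compile error rather than a runtime warning, which is the property the C API cannot offer.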

Needing to call a C macro

However, there was a little problem. The Rust code cannot call g_warning, a C macro in GLib that prints a message to stderr or uses structured logging. Librsvg used it to signal conditions where something went (recoverably) wrong but there was no way to return a proper error code to the caller; it is mainly a debugging aid.

This is what rsvg_internals used to do in order to call that C macro:

First, the C code exports a function that just calls the macro:

/* This function exists just so that we can effectively call g_warning() from Rust,
 * since glib-rs doesn't bind the g_log functions yet.
 */
void
rsvg_g_warning_from_c(const char *msg)
{
    g_warning ("%s", msg);
}

Second, the Rust code binds that function to be callable from Rust:

pub fn rsvg_g_warning(msg: &str) {
    extern "C" {
        fn rsvg_g_warning_from_c(msg: *const libc::c_char);
    }

    unsafe {
        rsvg_g_warning_from_c(msg.to_glib_none().0);
    }
}

However! Since the standalone librsvg_crate does not link to the C code from the public librsvg.so, the helper rsvg_g_warning_from_c is not available!

A configuration feature for the internals library

And yet! Those warnings are only meaningful for the C API, which is not able to return error codes from all situations. However, the Rust API is able to do that, and so doesn't need the warnings printed to stderr. My first solution was to add a build-time option for whether the rsvg_internals library is being built for the C library, or for the Rust one.

In case we are building for the C library, the code calls rsvg_g_warning_from_c as usual.

But in case we are building for the Rust library, that code is a no-op.

This is the bit in rsvg_internals/Cargo.toml to declare the feature:

[features]
# Enables calling g_warning() when built as part of librsvg.so
c-library = []

And this is the corresponding code:

#[cfg(feature = "c-library")]
pub fn rsvg_g_warning(msg: &str) {
    unsafe {
        extern "C" {
            fn rsvg_g_warning_from_c(msg: *const libc::c_char);
        }

        rsvg_g_warning_from_c(msg.to_glib_none().0);
    }
}

#[cfg(not(feature = "c-library"))]
pub fn rsvg_g_warning(_msg: &str) {
    // The only callers of this are in handle.rs. When those functions
    // are called from the Rust API, they are able to return a
    // meaningful error code, but the C API isn't - so they issue a
    // g_warning() instead.
}

The first function is the one that is compiled when the c-library feature is enabled; this happens when building rsvg_internals to link into librsvg.so.

The second function does nothing; it is what is compiled when rsvg_internals is being used just from the librsvg_crate crate with the Rust API.

While this worked well, it meant that the internals library was built twice on each compilation run of the whole librsvg module: once for librsvg.so, and once for librsvg_crate.

Making programming errors a g_critical

While g_warning() means "something went wrong, but the program will continue", g_critical() means "there is a programming error". For historical reasons GLib does not abort when g_critical() is called, unless you set G_DEBUG=fatal-criticals or run a development version of GLib.

This commit turned warnings into critical errors when the C API was called out of order, by using a similar rsvg_g_critical_from_c() wrapper for a C macro.
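As a rough illustration of what "called out of order" means, here is a sketch of a handle that tracks its loading state and reports a critical error on misuse. The state names, the ordering rules, and the critical() helper are simplified assumptions; the helper prints to stderr as a stand-in for a real call into GLib.

```rust
use std::cell::Cell;

/// Hypothetical loading states for a C-style handle.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum LoadState {
    Start,
    Loading,
    Closed,
}

pub struct CHandle {
    state: Cell<LoadState>,
}

/// Stand-in for a g_critical() wrapper; it only prints to stderr.
fn critical(msg: &str) {
    eprintln!("CRITICAL: {}", msg);
}

impl CHandle {
    pub fn new() -> CHandle {
        CHandle { state: Cell::new(LoadState::Start) }
    }

    pub fn state(&self) -> LoadState {
        self.state.get()
    }

    /// Feeding data is only valid before close().
    pub fn write(&self, _data: &[u8]) -> bool {
        match self.state.get() {
            LoadState::Start | LoadState::Loading => {
                self.state.set(LoadState::Loading);
                true
            }
            LoadState::Closed => {
                critical("write() called after close()");
                false
            }
        }
    }

    pub fn close(&self) {
        self.state.set(LoadState::Closed);
    }

    /// Rendering is only valid once loading has finished.
    pub fn render(&self) -> bool {
        if self.state.get() == LoadState::Closed {
            true
        } else {
            critical("render() called before close()");
            false
        }
    }
}
```

The misuse is a bug in the caller, not a recoverable runtime condition, which is why a critical error fits better than a warning.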

Separating the C-callable code into yet another crate

To recapitulate, at that point we had this:

librsvg/
|  Cargo.toml - declares the Cargo workspace
|
+- rsvg_internals/
|  |  Cargo.toml
|  +- src/
|       c_api.rs - convert types and return values, call into implementation
|       handle.rs - actual implementation
|       *.rs - all the other internals
|
+- librsvg/
|    *.c - stub functions that call into Rust
|    rsvg-base.c - contains rsvg_g_warning_from_c() among others
|
+- librsvg_crate/
   |  Cargo.toml
   +- src/
   |    lib.rs - public Rust API
   +- tests/ - tests for the public Rust API
        *.rs

At this point c_api.rs with all the unsafe functions looked out of place. That code is only relevant to librsvg.so (the public C API), not to the Rust API in librsvg_crate.

I started moving the C API glue to a separate librsvg_c_api crate that lives along with the C stubs:

+- librsvg/
|    *.c - stub functions that call into Rust
|    rsvg-base.c - contains rsvg_g_warning_from_c() among others
|    Cargo.toml
|    c_api.rs - what we had before

This made the dependencies look like the following:

      rsvg_internals
       ^           ^
       |             \
       |               \
librsvg_crate     librsvg_c_api
  (Rust API)             ^
                         |
                    librsvg.so
                      (C API)

And also, this made it possible to remove the configuration feature for rsvg_internals, since the code that calls rsvg_g_warning_from_c now lives in librsvg_c_api.

With that, rsvg_internals is compiled only once, as it should be.
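One way such a split can work, sketched here with two modules standing in for the two crates: the internals report the problem as a Result, and only the C-facing layer decides to turn it into a warning. Everything in this example is hypothetical (BaseUrlError, set_base_uri, set_base_uri_for_c, and the warning() helper that prints to stderr instead of calling rsvg_g_warning_from_c).

```rust
/// Stand-in for the rsvg_internals crate: no knowledge of C or g_warning().
mod internals {
    #[derive(Debug, PartialEq)]
    pub enum BaseUrlError {
        NoUri,
    }

    #[derive(Default)]
    pub struct Handle {
        pub base_url: Option<String>,
    }

    impl Handle {
        /// Reports failure to the caller instead of printing anything.
        pub fn set_base_uri(&mut self, uri: Option<&str>) -> Result<(), BaseUrlError> {
            match uri {
                Some(u) => {
                    self.base_url = Some(u.to_string());
                    Ok(())
                }
                None => Err(BaseUrlError::NoUri),
            }
        }
    }
}

/// Stand-in for the librsvg_c_api crate: the only place that warns.
mod c_api {
    use super::internals::{BaseUrlError, Handle};

    /// Placeholder for the call into the C helper; prints to stderr.
    fn warning(msg: &str) {
        eprintln!("WARNING: {}", msg);
    }

    /// The C API has no way to return this error, so it warns instead.
    pub fn set_base_uri_for_c(handle: &mut Handle, uri: Option<&str>) {
        if let Err(BaseUrlError::NoUri) = handle.set_base_uri(uri) {
            warning("file has no URI; will not set the base URI");
        }
    }
}
```

Because the internals module has a single behavior, there is nothing left to switch on at build time, and a Rust-facing facade can simply pass the Result through to its caller.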

This also helped clean up some code in the internals library. Deprecated functions that render SVGs directly to GdkPixbuf are now in librsvg_c_api and don't clutter the rsvg_internals library. All the GObject boilerplate is there as well now; rsvg_internals is mostly safe code except for the glue to libxml2.

Summary

It was useful to move all the code that dealt with incoming C types, and with our outgoing C return values and errors, into the same place, and separate it from the "pure Rust" code.

This took gradual refactoring and was not done in a single step, but it left the resulting Rust code rather nice and clean.

When we added a new public Rust API, we had to shuffle some code around that could only be linked in the context of a C library.

Compile-time configuration features are useful (like #ifdef in the C world), but they do cause double compilation if you need a C-internals and a Rust-internals library from the same code.

Having proper error reporting throughout the Rust code is a lot of work, but pretty much invaluable. The glue code to C can then convert and expose those errors as needed.

If you need both C and Rust APIs for the same code base, you may end up naturally using a facade pattern for each. It helps to gradually refactor the internals to be as "pure idiomatic Rust" as possible, while letting API idiosyncrasies bubble up to each individual facade.