Debugging an Rc<T> reference leak in Rust

- Tags: gnome, librsvg, rust

The bug that caused two brown-paper-bag released in librsvg — because it was leaking all the SVG nodes — has been interesting.

Memory leaks in Rust? Isn't it supposed to prevent that?

Well, yeah, but the leaks were caused by the C side of things, and by unsafe code in Rust, which does not prevent leaks.

The first part of the bug was easy: C code started calling a function implemented in Rust, which returns a newly-acquired reference to an SVG node. The old code simply got a pointer to the node, without acquiring a reference. The new code was forgetting to rsvg_node_unref(). No biggie.

The second part of the bug was trickier to find. The C code was apparently calling all the functions to unref nodes as appropriate, and even calling the rsvg_tree_free() function in the end; this is the "free the whole SVG tree" function.

There are these types:

// We take a pointer to this and expose it as an opaque pointer to C
pub enum RsvgTree {}

// This is the real structure we care about
pub struct Tree {
    // This is the Rc that was getting leaked
    pub root: Rc<Node>,
    ...
}

Tree is the real struct that holds the root of the SVG tree and some other data. Each node is an Rc<Node>; the root node was getting leaked (... and all the children, recursively) because its reference count never went down from 1.

RsvgTree is just an empty type. The code does an unsafe cast of *const Tree as *const RsvgTree in order to expose a raw pointer to the C code.

The rsvg_tree_free() function, callable from C, looked like this:

#[no_mangle]
pub extern "C" fn rsvg_tree_free(tree: *mut RsvgTree) {
    if !tree.is_null() {
        let _ = unsafe { Box::from_raw(tree) };
                         // ^ this returns a Box<RsvgTree> which is an empty type!
    }
}

When we call Box::from_raw() on a *mut RsvgTree, it gives us back a Box<RsvgTree>... which is a box of a zero-sized type. So, the program frees zero memory when the box gets dropped.

The code was missing this cast:

    let tree = unsafe { &mut *(tree as *mut Tree) };
                                 // ^ this cast to the actual type inside the Box
    let _ = unsafe { Box::from_raw(tree) };

So, tree as *mut Tree gives us a value which will cause Box::from_raw() to return a Box<Tree>, which is what we intended. Dropping the box will drop the Tree, reduce the last reference count on the root node, and free all the nodes recursively.

Monitoring an Rc<T>'s reference count in gdb

So, how does one set a gdb watchpoint on the reference count?

First I set a breakpoint on a function which I knew would get passed the Rc<Node> I care about:

(gdb) b <rsvg_internals::structure::NodeSvg as rsvg_internals::node::NodeTrait>::set_atts
Breakpoint 3 at 0x7ffff71f3aaa: file rsvg_internals/src/structure.rs, line 131.

(gdb) c
Continuing.

Thread 1 "rsvg-convert" hit Breakpoint 3, <rsvg_internals::structure::NodeSvg as rsvg_internals::node::NodeTrait>::set_atts (self=0x646c60, node=0x64c890, pbag=0x64c820) at rsvg_internals/src/structure.rs:131

(gdb) p node
$5 = (alloc::rc::Rc<rsvg_internals::node::Node> *) 0x64c890

Okay, node is a reference to an Rc<Node>. What's inside?

(gdb) p *node
$6 = {ptr = {pointer = {__0 = 0x625800}}, phantom = {<No data fields>}}

Why, a pointer to the actual contents of the Rc. Look inside again:

(gdb) p *node.ptr.pointer.__0
$9 = {strong = {value = {value = 3}}, weak = {value = {value = 1}},  ... and lots of extra crap ...

Aha! There are the strong and weak reference counts. So, set a watchpoint on the strong reference count:

(gdb) set $ptr = &node.ptr.pointer.__0.strong.value.value
(gdb) watch *$ptr
Hardware watchpoint 4: *$ptr

Continue running the program until the reference count changes:

(gdb) continue
Thread 1 "rsvg-convert" hit Hardware watchpoint 4: *$ptr

Old value = 3
New value = 2

At this point I can print a stack trace and see if it makes sense, check that the refs/unrefs are matched, etc.

TL;DR: dig into the Rc<T> until you find the reference count, and watch it. It's wrapped in several layers of Rust-y types; NonNull pointers, an RcBox for the actual container of the refcount plus the object it's wrapping, and Cells for the refcount values. Just dig until you reach the refcount values and they are there.

So, how did I find the missing cast?

Using that gdb recipe, I watched the reference count of the toplevel SVG node change until the program exited. When the program terminated, the reference count was 1 — it should have dropped to 0 if there was no memory leak.

The last place where the toplevel node loses a reference is in rsvg_tree_free(). I ran the program again and checked if that function was being called; it was being called correctly. So I knew that the problem must lie in that function. After a little head-scratching, I found the missing cast. Other functions of the form rsvg_tree_whatever() had that cast, but rsvg_tree_free() was missing it.

I think Rust now has better facilities to tag structs that are exposed as raw pointers to extern code, to avoid this kind of perilous casting. We'll see.

In the meantime, apologies for the buggy releases!