The bug that caused two brown-paper-bag released in librsvg — because it was leaking all the SVG nodes — has been interesting.
Memory leaks in Rust? Isn't it supposed to prevent that?
Well, yeah, but the leaks were caused by the C side of things, and by
unsafe
code in Rust, which does not prevent leaks.
The first part of the bug was easy: C code started calling a
function implemented in Rust, which returns a newly-acquired reference
to an SVG node. The old code simply got a pointer to the node,
without acquiring a reference. The new code was forgetting to
rsvg_node_unref()
. No biggie.
The second part of the bug was trickier to find. The C code
was apparently calling all the functions to unref nodes as
appropriate, and even calling the rsvg_tree_free()
function in the
end; this is the "free the whole SVG tree" function.
There are these types:
// We take a pointer to this and expose it as an opaque pointer to C
pub enum RsvgTree {}
// This is the real structure we care about
pub struct Tree {
// This is the Rc that was getting leaked
pub root: Rc<Node>,
...
}
Tree
is the real struct that holds the root of the SVG tree and some
other data. Each node is an Rc<Node>
; the root node was getting
leaked (... and all the children, recursively) because its reference
count never went down from 1.
RsvgTree
is just an empty type. The code does an unsafe cast of
*const Tree
as *const RsvgTree
in order to expose a raw pointer to
the C code.
The rsvg_tree_free()
function, callable from C, looked like this:
#[no_mangle]
pub extern "C" fn rsvg_tree_free(tree: *mut RsvgTree) {
if !tree.is_null() {
let _ = unsafe { Box::from_raw(tree) };
// ^ this returns a Box<RsvgTree> which is an empty type!
}
}
When we call Box::from_raw()
on a *mut RsvgTree
, it gives us back
a Box<RsvgTree>
... which is a box of a zero-sized type. So, the program
frees zero memory when the box gets dropped.
The code was missing this cast:
let tree = unsafe { &mut *(tree as *mut Tree) };
// ^ this cast to the actual type inside the Box
let _ = unsafe { Box::from_raw(tree) };
So, tree as *mut Tree
gives us a value which will cause
Box::from_raw()
to return a Box<Tree>
, which is what we intended.
Dropping the box will drop the Tree
, reduce the last reference count
on the root node, and free all the nodes recursively.
Monitoring an Rc<T>
's reference count in gdb
So, how does one set a gdb watchpoint on the reference count?
First I set a breakpoint on a function which I knew would get passed
the Rc<Node>
I care about:
(gdb) b <rsvg_internals::structure::NodeSvg as rsvg_internals::node::NodeTrait>::set_atts
Breakpoint 3 at 0x7ffff71f3aaa: file rsvg_internals/src/structure.rs, line 131.
(gdb) c
Continuing.
Thread 1 "rsvg-convert" hit Breakpoint 3, <rsvg_internals::structure::NodeSvg as rsvg_internals::node::NodeTrait>::set_atts (self=0x646c60, node=0x64c890, pbag=0x64c820) at rsvg_internals/src/structure.rs:131
(gdb) p node
$5 = (alloc::rc::Rc<rsvg_internals::node::Node> *) 0x64c890
Okay, node
is a reference to an Rc<Node>
. What's inside?
(gdb) p *node
$6 = {ptr = {pointer = {__0 = 0x625800}}, phantom = {<No data fields>}}
Why, a pointer
to the actual contents of the Rc
. Look inside
again:
(gdb) p *node.ptr.pointer.__0
$9 = {strong = {value = {value = 3}}, weak = {value = {value = 1}}, ... and lots of extra crap ...
Aha! There are the strong
and weak
reference counts. So, set a
watchpoint on the strong reference count:
(gdb) set $ptr = &node.ptr.pointer.__0.strong.value.value
(gdb) watch *$ptr
Hardware watchpoint 4: *$ptr
Continue running the program until the reference count changes:
(gdb) continue
Thread 1 "rsvg-convert" hit Hardware watchpoint 4: *$ptr
Old value = 3
New value = 2
At this point I can print a stack trace and see if it makes sense, check that the refs/unrefs are matched, etc.
TL;DR: dig into the Rc<T>
until you find the reference count, and
watch it. It's wrapped in several layers of Rust-y types; NonNull
pointers, an RcBox
for the actual container of the refcount plus the
object it's wrapping, and Cell
s for the refcount values. Just dig
until you reach the refcount values and they are there.
So, how did I find the missing cast?
Using that gdb recipe, I watched the reference count of the toplevel SVG node change until the program exited. When the program terminated, the reference count was 1 — it should have dropped to 0 if there was no memory leak.
The last place where the toplevel node loses a reference is in
rsvg_tree_free()
. I ran the program again and checked if that
function was being called; it was being called correctly. So I knew
that the problem must lie in that function. After a little
head-scratching, I found the missing cast. Other functions of the
form rsvg_tree_whatever()
had that cast, but rsvg_tree_free()
was
missing it.
I think Rust now has better facilities to tag structs that are exposed
as raw pointers to extern
code, to avoid this kind of perilous
casting. We'll see.
In the meantime, apologies for the buggy releases!