To continue with last time's topic, let's see how to make librsvg's DOM nodes smaller in memory. Since that time, there have been some changes to the code; that is why in this post some of the type names are different from last time's.
Every SVG element is represented with this struct:
pub struct Element {
element_type: ElementType,
element_name: QualName,
id: Option<String>,
class: Option<String>,
specified_values: SpecifiedValues,
important_styles: HashSet<QualName>,
result: ElementResult,
transform: Transform,
values: ComputedValues,
cond: bool,
style_attr: String,
element_impl: Box<dyn ElementTrait>,
}
The two biggest fields are the ones with types SpecifiedValues
and
ComputedValues
. These are the sizes of the whole Element
struct
and those two types:
sizeof Element: 1808
sizeof SpecifiedValues: 824
sizeof ComputedValues: 704
In this post, we'll reduce the size of SpecifiedValues
.
What is SpecifiedValues?
If we have an element like this:
<circle cx="10" cy="10" r="10" stroke-width="4" stroke="blue"/>
The values of the style properties stroke-width
and stroke
get
stored in a SpecifiedValues
struct. This struct has a bunch of
fields, one for each possible style property:
pub struct SpecifiedValues {
baseline_shift: SpecifiedValue<BaselineShift>,
clip_path: SpecifiedValue<ClipPath>,
clip_rule: SpecifiedValue<ClipRule>,
/// ...
stroke: SpecifiedValue<Stroke>,
stroke_width: SpecifiedValue<StrokeWidth>,
/// ...
}
Each field is a SpecifiedValue<T>
for the following reason. In
CSS/SVG, a style property can be unspecified, or it can have an
inherit
value to force the property to be copied from the element's
parent, or it can actually have a specified value. Librsvg represents
these as follows:
pub enum SpecifiedValue<T>
where
T: // some trait bounds here
{
Unspecified,
Inherit,
Specified(T),
}
Now, SpecifiedValues
has a bunch of fields, 47 of them to be exact —
one for each of the style properties that librsvg supports. That is
why SpecifiedValues
has a size of 824 bytes; it is the largest
sub-structure within Element
, and it would be good to reduce its
size.
Not all properties are specified
Let's go back to the chunk of SVG from above:
<circle cx="10" cy="10" r="10" stroke-width="4" stroke="blue"/>
Here we only have two specified properties, so the stroke_width
and
stroke
fields of SpecifiedValues
will be set as
SpecifiedValue::Specified(something)
and all the other fields will
be left as SpecifiedValue::Unspecified
.
It would be good to store only complete values for the properties that are specified, and just a small flag for unset properties.
Another way to represent the set of properties
Since there is a maximum of 47 properties per element (or more if librsvg adds support for extra ones), we can have a small array of 47 bytes. Each byte contains the index within another array that contains only the values of specified properties, or a sentinel value for properties that are unset.
First, I made an enum that fits in a u8
for all the properties, plus
the sentinel value, which also gives us the total number of
properties. The #[repr(u8)]
guarantees that this enum fits in a
byte.
#[repr(u8)]
enum PropertyId {
BaselineShift,
ClipPath,
ClipRule,
Color,
// ...
WritingMode,
XmlLang,
XmlSpace,
UnsetProperty, // the number of properties and also the sentinel value
}
Also, since before these changes there was the following monster to represent "which property is this" plus the property's value:
pub enum ParsedProperty {
BaselineShift(SpecifiedValue<BaselineShift>),
ClipPath(SpecifiedValue<ClipPath>),
ClipRule(SpecifiedValue<ClipRule>),
Color(SpecifiedValue<Color>),
// ...
}
I changed the definition of SpecifiedValues
to have two arrays, one
to store which properties are specified, and another only with the
values for the properties that are actually specified:
pub struct SpecifiedValues {
indices: [u8; PropertyId::UnsetProperty as usize],
props: Vec<ParsedProperty>,
}
There is a thing that is awkward in Rust, or which I haven't found how
to solve in a nicer way: given a ParsedProperty
, find the
corresponding PropertyId
for its discriminant. I did the obvious
thing:
impl ParsedProperty {
fn get_property_id(&self) -> PropertyId {
use ParsedProperty::*;
match *self {
BaselineShift(_) => PropertyId::BaselineShift,
ClipPath(_) => PropertyId::ClipPath,
ClipRule(_) => PropertyId::ClipRule,
Color(_) => PropertyId::Color,
// ...
}
}
}
Initialization
First, we want to initialize an empty SpecifiedValues
, where every
element of the the indices
array is set to the sentinel value that
means that the corresponding property is not set:
impl Default for SpecifiedValues {
fn default() -> Self {
SpecifiedValues {
indices: [PropertyId::UnsetProperty.as_u8(); PropertyId::UnsetProperty as usize],
props: Vec::new(),
}
}
}
That sets the indices
field to an array full of the same
PropertyId::UnsetProperty
sentinel value. Also, the props
array
is empty; it hasn't even had a block of memory allocated for it yet.
That way, SVG elements without style properties don't use any extra
memory.
Which properties are specified and what are their indices?
Second, we want a function that will give us the index in props
for
some property, or that will tell us if the property has not been set
yet:
impl SpecifiedValues {
fn property_index(&self, id: PropertyId) -> Option<usize> {
let v = self.indices[id.as_usize()];
if v == PropertyId::UnsetProperty.as_u8() {
None
} else {
Some(v as usize)
}
}
}
(If someone passes id = PropertyId::UnsetProperty
, the array access
to indices
will panic, which is what we want, since that is not a
valid property id.)
Change a property's value
Third, we want to set the value of a property that has not been set, or change the value of one that was already specified:
impl SpecifiedValues {
fn replace_property(&mut self, prop: &ParsedProperty) {
let id = prop.get_property_id();
if let Some(index) = self.property_index(id) {
self.props[index] = prop.clone();
} else {
self.props.push(prop.clone());
let pos = self.props.len() - 1;
self.indices[id.as_usize()] = pos as u8;
}
}
}
In the first case in the if
, the property was already set and we
just replace its value. In the second case, the property was not set;
we add it to the props
array and store its resulting index in
indices
.
Results
Before:
sizeof Element: 1808
sizeof SpecifiedValues: 824
After:
sizeof Element: 1056
sizeof SpecifiedValues: 72
The pathological file from the last time used 463,412,720 bytes in memory before these changes. After the changes, it uses 314,526,136 bytes.
I also measured memory consumption for a normal file, in this case one with a bunch of GNOME's symbolic icons. The old version uses 17 MB; the new version only 13 MB.
How to keep fine-tuning this
For now, I am satisfied with SpecifiedValues
, although it could
still be made smaller:
-
The crate tagged-box converts an enum like
ParsedProperty
into an enum-of-boxes, and codifies the enum's discriminant into the box's pointer. This way each variant occupies the minimum possible memory, although in a separately-allocated block, and the container itself uses only a pointer. I am not sure if this is worth it; eachParsedProperty
is 64 bytes, but the flat arrayprops: Vec<ParsedProperty>
is very appealing in a single block of memory. I have not checked the sizes of each individual property to see if they vary a lot among them. -
Look for a crate that lets us have the properties in a single memory block, a kind of arena with variable types. This can be implemented with a bit of
unsafe
, but one has to be careful with the alignment of different types. -
The crate enum_set2 represents an array of field-less enums as a compact bit array. If we changed the representation of
SpecifiedValue
, this would reduce theindices
array to a minimum.
If someone wants to dedicate some time to implement and measure this, I would be very grateful.
Next steps
According to Massif, the next thing is to keep making Element
smaller. The next thing to shrink is ComputedValues
. The obvious
route is to do exactly the same as I did for SpecifiedValues
. I am
not sure if it would be better to try to share the style
structs between elements.