While in the middle of converting librsvg's code that processes XML from C to Rust, I went into a digression that has to do with the way librsvg decides which files are allowed to be referenced from within an SVG.
Resource references in SVG
SVG files can reference other files, i.e. they are not
self-contained. For example, there can be an element like <image
xlink:href="foo.png">
, or one can request that a sub-element of
another SVG be included with <use xlink:href="secondary.svg#foo">
.
Finally, there is the xi:include
mechanism to include chunks of text
or XML into another XML file.
Since librsvg is sometimes used to render untrusted files that come from
the internet, it needs to be careful not to allow those files to
reference any random resource on the filesystem. We don't want
something like
<text><xi:include href="/etc/passwd" parse="text"/></text>
or something equally nefarious that would exfiltrate a random file
into the rendered output.
Also, want to catch malicious SVGs that want to "phone home" by
referencing a network resource like
<image xlink:href="http://evil.com/pingback.jpg">
.
So, librsvg is careful to have a single place where it can load secondary resources, and first it validates the resource's URL to see if it is allowed.
The actual validation rules are not very important for this
discussion; they are something like "no absolute URLs allowed" (so you
can't request /etc/passwd
, "only siblings or (grand)children of
siblings allowed" (so foo.svg
can request bar.svg
and
subdir/bar.svg
, but not ../../bar.svg
).
The code
There was a central function rsvg_io_acquire_stream()
which took a
URL as a string. The code assumed that that URL had been first
validated with a function called allow_load(url)
. While the code's
structure guaranteed that all the places that may acquire a stream
would actually go through allow_load()
first, the structure of the
code in Rust made it possible to actually make it impossible to
acquire a disallowed URL.
Before:
pub fn allow_load(url: &str) -> bool;
pub fn acquire_stream(url: &str, ...) -> Result<gio::InputStream, glib::Error>;
pub fn rsvg_acquire_stream(url: &str, ...) -> Result<gio::InputStream, LoadingError> {
if allow_load(url) {
acquire_stream(url, ...)?
} else {
Err(LoadingError::NotAllowed)
}
}
The refactored code now has an AllowedUrl
type that encapsulates a
URL, plus the promise that it has gone through these steps:
- The URL has been run through a URL well-formedness parser.
- The resource is allowed to be loaded following librsvg's rules.
pub struct AllowedUrl(Url); // from the Url parsing crate
impl AllowedUrl {
pub fn from_href(href: &str) -> Result<AllowedUrl, ...> {
let parsed = Url::parse(href)?; // may return LoadingError::InvalidUrl
if allow_load(parsed) {
Ok(AllowedUrl(parsed))
} else {
Err(LoadingError::NotAllowed)
}
}
}
// new prototype
pub fn acquire_stream(url: &AllowedUrl, ...) -> Result<gio::InputStream, glib::Error>;
This forces callers to validate the URLs as soon as possible, right after they get them from the SVG file. Now it is not possible to request a stream unless the URL has been validated first.
Plain URIs vs. fragment identifiers
Some of the elements in SVG that reference other data require full files:
<image xlink:href="foo.png" ...> <!-- no fragments allowed -->
And some others, that reference particular elements in secondary SVGs, require a fragment ID:
<use xlink:href="icons.svg#app_name" ...> <!-- fragment id required -->
And finally, the feImage
element, used to paste an image as part of
a filter effects pipeline, allows either:
<!-- will use that image -->
<feImage xlink:href="foo.png" ...>
<!-- will render just this element from an SVG and use it as an image -->
<feImage xlink:href="foo.svg#element">
So, I introduced a general Href
parser :
pub enum Href {
PlainUri(String),
WithFragment(Fragment),
}
/// Optional URI, mandatory fragment id
pub struct Fragment(Option<String>, String);
The parts of the code that absolutely require a fragment id now take a
Fragment
. Parts which require a PlainUri
can unwrap that case.
The next step is making those structs contain an AllowedUrl
directly, instead of just strings, so that for callers, obtaining a
fully validated name is a one-step operation.
In general, the code is moving towards a scheme where all file I/O is
done at loading time. Right now, some of those external references
get resolved at rendering time, which is somewhat awkward (for
example, at rendering time the caller has no chance to use a
GCancellable
to cancel loading). This refactoring to do early
validation is leaving the code in a very nice state.