Refactoring allowed URLs in librsvg

- Tags: gnome, librsvg, rust

While in the middle of converting librsvg's code that processes XML from C to Rust, I went into a digression that has to do with the way librsvg decides which files are allowed to be referenced from within an SVG.

Resource references in SVG

SVG files can reference other files, i.e. they are not self-contained. For example, there can be an element like <image xlink:href="foo.png">, or one can request that a sub-element of another SVG be included with <use xlink:href="secondary.svg#foo">. Finally, there is the xi:include mechanism to include chunks of text or XML into another XML file.

Since librsvg is sometimes used to render untrusted files that come from the internet, it needs to be careful not to allow those files to reference any random resource on the filesystem. We don't want something like <text><xi:include href="/etc/passwd" parse="text"/></text> or something equally nefarious that would exfiltrate a random file into the rendered output.

Also, want to catch malicious SVGs that want to "phone home" by referencing a network resource like <image xlink:href="http://evil.com/pingback.jpg">.

So, librsvg is careful to have a single place where it can load secondary resources, and first it validates the resource's URL to see if it is allowed.

The actual validation rules are not very important for this discussion; they are something like "no absolute URLs allowed" (so you can't request /etc/passwd, "only siblings or (grand)children of siblings allowed" (so foo.svg can request bar.svg and subdir/bar.svg, but not ../../bar.svg).

The code

There was a central function rsvg_io_acquire_stream() which took a URL as a string. The code assumed that that URL had been first validated with a function called allow_load(url). While the code's structure guaranteed that all the places that may acquire a stream would actually go through allow_load() first, the structure of the code in Rust made it possible to actually make it impossible to acquire a disallowed URL.

Before:

pub fn allow_load(url: &str) -> bool;

pub fn acquire_stream(url: &str, ...) -> Result<gio::InputStream, glib::Error>;

pub fn rsvg_acquire_stream(url: &str, ...) -> Result<gio::InputStream, LoadingError> {
    if allow_load(url) {
        acquire_stream(url, ...)?
    } else {
        Err(LoadingError::NotAllowed)
    }
}

The refactored code now has an AllowedUrl type that encapsulates a URL, plus the promise that it has gone through these steps:

  • The URL has been run through a URL well-formedness parser.
  • The resource is allowed to be loaded following librsvg's rules.
pub struct AllowedUrl(Url);  // from the Url parsing crate

impl AllowedUrl {
    pub fn from_href(href: &str) -> Result<AllowedUrl, ...> {
        let parsed = Url::parse(href)?; // may return LoadingError::InvalidUrl

        if allow_load(parsed) {
            Ok(AllowedUrl(parsed))
        } else {
            Err(LoadingError::NotAllowed)
        }
    }
}

// new prototype
pub fn acquire_stream(url: &AllowedUrl, ...) -> Result<gio::InputStream, glib::Error>;

This forces callers to validate the URLs as soon as possible, right after they get them from the SVG file. Now it is not possible to request a stream unless the URL has been validated first.

Plain URIs vs. fragment identifiers

Some of the elements in SVG that reference other data require full files:

<image xlink:href="foo.png" ...>      <!-- no fragments allowed -->

And some others, that reference particular elements in secondary SVGs, require a fragment ID:

<use xlink:href="icons.svg#app_name" ...>   <!-- fragment id required -->

And finally, the feImage element, used to paste an image as part of a filter effects pipeline, allows either:

<!-- will use that image -->
<feImage xlink:href="foo.png" ...>

<!-- will render just this element from an SVG and use it as an image -->
<feImage xlink:href="foo.svg#element">

So, I introduced a general Href parser :

pub enum Href {
    PlainUri(String),
    WithFragment(Fragment),
}

/// Optional URI, mandatory fragment id
pub struct Fragment(Option<String>, String);

The parts of the code that absolutely require a fragment id now take a Fragment. Parts which require a PlainUri can unwrap that case.

The next step is making those structs contain an AllowedUrl directly, instead of just strings, so that for callers, obtaining a fully validated name is a one-step operation.

In general, the code is moving towards a scheme where all file I/O is done at loading time. Right now, some of those external references get resolved at rendering time, which is somewhat awkward (for example, at rendering time the caller has no chance to use a GCancellable to cancel loading). This refactoring to do early validation is leaving the code in a very nice state.