The part of our graphics stack that deals with images has evolved a lot over the years.

In ye olden days

In the context of GIMP/GNOME, the only thing that knew how to draw RGB images to X11 windows (doing palette mapping for 256-color graphics cards and dithering if necessary) was the GIMP. Later, when GTK+ was written, it exported a GtkPreview widget, which could take an RGB image buffer supplied by the application and render it to an X window; this was what GIMP plug-ins could use in their user interface to show, well, previews of what they were about to do with the user's images. Then we got some obscure magic in a GdkColorContext object, which helped allocate X11 colors for the X drawing primitives. In turn, GdkColorContext came from the port that Miguel and I did of XmHTML's color context object (and for those who remember, XmHTML became the first version of GtkHtml; later it was rewritten as a port of KDE's HTML widget). Thankfully all that stuff is gone now; we can assume that video cards are 24-bit RGB or better everywhere, and there is no need to worry about limited color palettes and color allocation.

Later, we started using the Imlib library, from the Enlightenment project, as an easy API to load images — the APIs from libungif, libjpeg, libpng, etc. were not something one really wanted to use directly — and also to keep images in memory with a uniform representation. Unfortunately, Imlib's memory management was peculiar, as it was tied to Enlightenment's model for caching and rendering loaded/scaled images.

A bunch of people worked to write GdkPixbuf: it kept Imlib's concepts of a unified representation for image data, and an easy API to load various image formats. It added support for an alpha channel (we only had 1-bit masks before), and it put memory management in the hands of the calling application, in the form of reference counting. GdkPixbuf obtained some high-quality scaling functions, mainly for use by Eye of GNOME (our image viewer) and by applications that just needed scaling instead of arbitrary transformations.

Later, we got libart, the first library in GNOME to do antialiased vector rendering and affine transformations. Libart was more or less compatible with GdkPixbuf: they both had the same internal representation for pixel data, but one had to pass the pixels/width/height/rowstride around by hand.

Mea culpa

Back then I didn't understand premultiplied alpha, which is now ubiquitous. The GIMP made the decision to use non-premultiplied alpha when it introduced layers with transparency, probably to "avoid losing data" from transparent pixels. GdkPixbuf follows the same scheme.
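
For readers who have not run into the distinction, here is a minimal sketch of what premultiplication does to a single 8-bit channel, and why the "losing data" worry exists. The function names premultiply() and unpremultiply() are made up for this illustration; they are not part of GdkPixbuf, Cairo or the GIMP, and the rounding shown is just one reasonable choice.

```c
#include <stdint.h>

/* Scale a color channel by alpha (both 0..255), rounding to nearest.
 * This is the "premultiplied" value stored alongside the alpha. */
static uint8_t premultiply(uint8_t channel, uint8_t alpha)
{
    return (uint8_t) ((channel * alpha + 127) / 255);
}

/* Best-effort inverse.  When alpha is 0 the original color is gone,
 * so we can only return 0; for small alpha values precision is lost. */
static uint8_t unpremultiply(uint8_t channel, uint8_t alpha)
{
    unsigned v;

    if (alpha == 0)
        return 0;

    v = (channel * 255u + alpha / 2) / alpha;
    return (uint8_t) (v > 255 ? 255 : v);
}
```

With alpha 128, a channel value of 200 premultiplies to 100 and comes back as 199. With alpha 0, whatever color was stored comes back as 0, which is exactly the information a non-premultiplied format keeps around.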

(Now that the GIMP uses GEGL for its internal representation of images... I have no idea what it does with respect to alpha.)

Cairo and afterwards

Some time after the libart days, we got Cairo and pixman. Cairo had a different representation of images than GdkPixbuf's, and it supported more pixel formats and color models.

GTK2 got patched to use Cairo in the toplevel API. We still had a dichotomy between Cairo's image surfaces, which are ARGB premultiplied data in memory, and GdkPixbufs, which are RGBA non-premultiplied. There are utilities in GTK+ to do these translations, but they are inconvenient: every time a program loads an image with GdkPixbuf's easy API, a translation has to happen from non-premultiplied RGBA to premultiplied ARGB.
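
To give an idea of the per-pixel work that translation involves, here is a rough sketch of converting a run of pixels. It is not the code GTK+ uses: the function name rgba_to_argb32_premul() is hypothetical, and the sketch assumes tightly packed pixels (no rowstride padding) and a source that always carries an alpha byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Scale a channel by alpha, both 0..255, rounding to nearest. */
static uint8_t mul_div_255(uint8_t channel, uint8_t alpha)
{
    return (uint8_t) ((channel * alpha + 127) / 255);
}

/* src: n_pixels * 4 bytes in R, G, B, A order, non-premultiplied
 *      (the layout GdkPixbuf uses when it has an alpha channel).
 * dst: n_pixels 32-bit words holding 0xAARRGGBB, premultiplied,
 *      in native endianness (the layout of Cairo's ARGB32 format). */
static void rgba_to_argb32_premul(const uint8_t *src, uint32_t *dst,
                                  size_t n_pixels)
{
    size_t i;

    for (i = 0; i < n_pixels; i++) {
        uint8_t r = src[4 * i + 0];
        uint8_t g = src[4 * i + 1];
        uint8_t b = src[4 * i + 2];
        uint8_t a = src[4 * i + 3];

        dst[i] = ((uint32_t) a << 24)
               | ((uint32_t) mul_div_255(r, a) << 16)
               | ((uint32_t) mul_div_255(g, a) << 8)
               |  (uint32_t) mul_div_255(b, a);
    }
}
```

Every pixel gets read, multiplied and rewritten. None of it is expensive on its own, but it happens for every image that crosses the boundary between the two representations.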

Having two formats means that we inevitably do translations back and forth of practically the same data. For example, when one embeds a JPEG inside an SVG, librsvg will read that JPEG using GdkPixbuf, translate it to Cairo's representation, composite it with Cairo onto the final result, and finally translate the whole thing back to a GdkPixbuf... if someone uses librsvg's legacy APIs to output pixbufs instead of rendering directly to a Cairo surface.

Who uses that legacy API? GTK+, of course! GTK+ loads scalable SVG icons with GdkPixbuf's loader API, which dynamically links librsvg at runtime: in effect, GTK+ doesn't use librsvg directly. And the SVG pixbuf loader uses the "gimme a pixbuf" API in librsvg.

GPUs

Then, we got GPUs everywhere. Each GPU has its own preferred pixel format. Image data has to be copied to the GPU at some point. Cairo's ARGB needs to be translated to the GPU's preferred format and alignment.

Summary so far

Libraries that load images from standard formats have different output formats. Generally they can be coaxed into spitting out ARGB or RGBA, but we don't expect them to support any random representation that a GPU may want.
GdkPixbuf uses non-premultiplied RGBA data, always in that order.
Cairo uses premultiplied ARGB in platform-endian 32-bit chunks: if each pixel is 0xaarrggbb, then the bytes are shuffled around depending on whether the platform is little-endian or big-endian.
Cairo internally uses a subset of the formats supported by pixman.
GPUs use whatever they damn well please.
Hilarity ensues.
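
The "platform-endian" point in the summary above is easy to get wrong, so here is a small sketch of it. The helper names are invented for this example and are not Cairo or GdkPixbuf API. The first two functions show how a 0xaarrggbb word actually sits in memory; the third writes the same channels in a fixed R, G, B, A byte order, which is how GdkPixbuf lays them out (this one only reorders channels and does not undo premultiplication).

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian platform, 0 on a big-endian one. */
static int is_little_endian(void)
{
    const uint32_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Copy out the four bytes of a 32-bit pixel exactly as they are
 * stored in memory on this platform. */
static void argb32_bytes_in_memory(uint32_t pixel, uint8_t out[4])
{
    memcpy(out, &pixel, 4);
}

/* Write the channels of a 0xAARRGGBB pixel in fixed R, G, B, A byte
 * order, independent of the platform's endianness. */
static void argb32_to_rgba_byte_order(uint32_t pixel, uint8_t out[4])
{
    out[0] = (uint8_t) ((pixel >> 16) & 0xff); /* R */
    out[1] = (uint8_t) ((pixel >> 8) & 0xff);  /* G */
    out[2] = (uint8_t) (pixel & 0xff);         /* B */
    out[3] = (uint8_t) ((pixel >> 24) & 0xff); /* A */
}
```

On a little-endian machine the in-memory bytes of 0xaarrggbb come out as bb, gg, rr, aa; on a big-endian one they come out as aa, rr, gg, bb. Code that walks a Cairo image surface byte by byte has to account for that, while code that walks a GdkPixbuf does not.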

What would we like to do?

We would like to reduce the number of translations between image formats along the loading-processing-display pipeline. Here is a plan:

Make sure Cairo/pixman support the image formats that GPUs generally prefer. Have them do the necessary conversions if the rest of the program passes an unsupported format. Ensure that a Cairo image surface can be created with the GPU's preferred format.
Make GdkPixbuf just be a wrapper around a Cairo image surface. GdkPixbuf is already an opaque structure, and it already knows how to copy pixel data in case the calling code requests it, or wants to turn a pixbuf from immutable to mutable.
Provide GdkPixbuf APIs that deal with Cairo image surfaces. For example, deprecate gdk_pixbuf_new() and gdk_pixbuf_new_from_data(), in favor of a new gdk_pixbuf_new_from_cairo_image_surface(). Instead of gdk_pixbuf_get_pixels() and related functions, have gdk_pixbuf_get_cairo_image_surface(). Mark the "give me the pixel data" functions as highly discouraged, and really only for use by applications that want to use GdkPixbuf as an image loader and little else.
Remove calls in GTK+ that cause image conversions; make them use Cairo image surfaces directly, from GdkTexture up.
Audit applications to remove calls that cause image conversions. Generally, look for where they use GdkPixbuf's deprecated APIs and update them.

Is this really a performance problem?

This is in the "excess work" category of performance issues. All those conversions are not really slow (they don't make up the biggest part of profiles), but they are nevertheless things that we could avoid doing. We may get some speedups, but it's probably more interesting to look at things like power consumption.

Right now I see this less as a cool, minor optimization and more as a way to gradually modernize our image API.

We seem to change imaging models every N years (X11 -> libart -> Cairo -> render trees in GPUs -> ???). It is very hard to change applications to use different APIs. In the meantime, we can provide a more linear path for image data, instead of doing unnecessary conversions everywhere.

Code

I have a use-cairo-surface-internally branch in gdk-pixbuf, which I'll be working on this week. Meanwhile, you may be interested in the ongoing Performance Hackfest in Cambridge!