I want to write a braindump on the stuff that I remember from gdk-pixbuf's history. There is some talk about replacing it with something newer; hopefully this history will show some things that worked, some that didn't, and why.
The beginnings
Gdk-pixbuf started as a replacement for Imlib, the image loading and rendering library that GNOME used in its earliest versions. Imlib came from the Enlightenment project; it provided an easy API around the idiosyncratic libungif, libjpeg, libpng, etc., and it maintained decoded images in memory with a uniform representation. Imlib also worked as an image cache for the Enlightenment window manager, which made memory management very inconvenient for GNOME.
Imlib worked well as a "just load me an image" library. It showed that a small, uniform API to load various image formats into a common representation was desirable. And in those days, hiding all the complexities of displaying images in X was very important indeed.
The initial API
Gdk-pixbuf replaced Imlib, and added two important features: reference counting for image data, and support for an alpha channel.
Gdk-pixbuf appeared with support for RGB(A) images. And although in theory it was possible to grow the API to support other representations, GdkColorspace never acquired anything other than GDK_COLORSPACE_RGB, and the bits_per_sample argument to some functions only ever supported being 8. The presence or absence of an alpha channel was done with a gboolean argument in conjunction with that single GDK_COLORSPACE_RGB value; we didn't have something like cairo_format_t which actually specifies the pixel format in single enum values.
While all the code in gdk-pixbuf carefully checks that those conditions are met (RGBA at 8 bits per channel), some applications inadvertently assume that that is the only possible case, and would get into trouble really fast if gdk-pixbuf ever started returning pixbufs with different color spaces or depths.
One can still see the battle between bilevel-alpha vs. continuous-alpha in this enum:
typedef enum
{
  GDK_PIXBUF_ALPHA_BILEVEL,
  GDK_PIXBUF_ALPHA_FULL
} GdkPixbufAlphaMode;
Fortunately, only the "render this pixbuf with alpha to an Xlib drawable" functions take values of this type: before the Xrender days, it was a Big Deal to draw an image with alpha to an X window, and applications often opted to use a bitmask instead, even if they had jagged edges as a result.
Pixel formats
The only pixel format that ever got implemented was unpremultiplied RGBA on all platforms. Back then I didn't understand premultiplied alpha! Also, the GIMP followed that scheme, and copying it seemed like the easiest thing.
After gdk-pixbuf, libart also copied that pixel format, I think.
But later we got Cairo, Pixman, and all the Xrender stack. These prefer premultiplied ARGB. Moreover, Cairo prefers it if each pixel is actually a 32-bit value, with the ARGB values inside it in platform-endian order. So if you look at a memory dump, a Cairo pixel looks like BGRA on a little-endian box, while it looks like ARGB on a big-endian box.
Every time we paint a GdkPixbuf to a cairo_t, there is a conversion from unpremultiplied RGBA to premultiplied, platform-endian ARGB. I talked a bit about this in "Reducing the number of image copies in GNOME".
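To make that conversion concrete, here is a minimal sketch in plain C. It is not the code that GTK or gdk-pixbuf actually runs: the function names are made up for illustration, and it assumes tightly packed pixels with no row padding (real pixbufs have a rowstride).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply a color channel by alpha (both in 0..255), rounding to nearest. */
static uint8_t
premultiply_channel (uint8_t c, uint8_t a)
{
  return (uint8_t) ((c * a + 127) / 255);
}

/* Hypothetical helper, not a gdk-pixbuf or Cairo function.
 *
 * src:  n_pixels of unpremultiplied pixels, stored as the bytes R, G, B, A.
 * dest: n_pixels of premultiplied ARGB, each packed into one native-endian
 *       32-bit word with alpha in the top byte.
 */
void
rgba_to_premultiplied_argb32 (const uint8_t *src, uint32_t *dest, size_t n_pixels)
{
  assert (src != NULL && dest != NULL);

  for (size_t i = 0; i < n_pixels; i++) {
    uint8_t r = src[4 * i + 0];
    uint8_t g = src[4 * i + 1];
    uint8_t b = src[4 * i + 2];
    uint8_t a = src[4 * i + 3];

    dest[i] = ((uint32_t) a << 24)
            | ((uint32_t) premultiply_channel (r, a) << 16)
            | ((uint32_t) premultiply_channel (g, a) << 8)
            | (uint32_t) premultiply_channel (b, a);
  }
}
```

Because each pixel is written as a whole 32-bit word, the bytes of a pixel land in memory as B, G, R, A on a little-endian machine and as A, R, G, B on a big-endian one, which is the memory-dump difference described above.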
The loading API
The public loading API in gdk-pixbuf, and its relationship to loader plug-ins, evolved in interesting ways.
At first the public API and loaders only implemented load_from_file: you gave the library a FILE * and it gave you back a GdkPixbuf.
Back then we didn't have a robust MIME sniffing framework in the form of a library, so gdk-pixbuf got its own. This lives in the mostly-obsolete GdkPixbufFormat machinery; it even has its own little language for sniffing file headers! Nowadays we do most MIME sniffing with GIO.
After the initial load_from_file API... I think we got progressive loading first, and animation support afterwards.
Progressive loading
This is where the calling program feeds chunks of bytes to the library, and at the end a fully-formed GdkPixbuf comes out, instead of having a single "read a whole file" operation.
We conflated this with a way to get updates on how the image area gets modified as the data gets parsed. I think we wanted to support the case of a web browser, which downloads images slowly over the network, and gradually displays them as they are downloaded. In 1998, images downloading slowly over the network was a real concern!
It took a lot of very careful work to convert the image loaders, which parsed a whole file at a time, into loaders that could maintain some state between each time that they got handed an extra bit of buffer.
It also sounded easy to implement the progressive updating API by simply emitting a signal that said, "this rectangular area got updated from the last read". It could handle the case of reading whole scanlines, or a few pixels, or even area-based updates for progressive JPEGs and PNGs.
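As a rough illustration of what "maintaining state between chunks" and "notify about updated areas" mean, here is a toy push-style loader in plain C. Everything in it is hypothetical: the format (a 2-byte big-endian width, a 2-byte big-endian height, then one byte per pixel), the ToyLoader type, and the function names are invented for this sketch and have nothing to do with gdk-pixbuf's real loader modules. It only counts pixels instead of storing them, and the "updated area" is simply one completed row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Called each time a full row (row index y) has been received. */
typedef void (*RowUpdatedFunc) (int y, void *user_data);

/* All the state that must survive between calls to toy_loader_write(). */
typedef struct {
  uint8_t        header[4];      /* header bytes collected so far */
  size_t         header_filled;  /* how many of them are valid */
  int            width, height;  /* known once header_filled == 4 */
  size_t         pixels_seen;    /* pixel bytes consumed so far */
  RowUpdatedFunc row_updated;    /* may be NULL */
  void          *user_data;
} ToyLoader;

void
toy_loader_init (ToyLoader *l, RowUpdatedFunc row_updated, void *user_data)
{
  assert (l != NULL);
  l->header_filled = 0;
  l->width = 0;
  l->height = 0;
  l->pixels_seen = 0;
  l->row_updated = row_updated;
  l->user_data = user_data;
}

/* Feed a chunk of any size; it may even split the header in the middle. */
void
toy_loader_write (ToyLoader *l, const uint8_t *buf, size_t len)
{
  assert (l != NULL && (buf != NULL || len == 0));

  for (size_t i = 0; i < len; i++) {
    if (l->header_filled < 4) {
      l->header[l->header_filled++] = buf[i];
      if (l->header_filled == 4) {
        l->width  = (l->header[0] << 8) | l->header[1];
        l->height = (l->header[2] << 8) | l->header[3];
      }
      continue;
    }

    size_t total = (size_t) l->width * (size_t) l->height;
    if (l->pixels_seen >= total)
      break;  /* ignore trailing bytes after the image data */

    l->pixels_seen++;
    if (l->pixels_seen % (size_t) l->width == 0 && l->row_updated != NULL)
      l->row_updated ((int) (l->pixels_seen / (size_t) l->width) - 1, l->user_data);
  }
}

int
toy_loader_is_complete (const ToyLoader *l)
{
  return l->header_filled == 4
      && l->pixels_seen == (size_t) l->width * (size_t) l->height;
}

/* Example callback: counts completed rows in an int passed as user_data. */
void
count_row (int y, void *user_data)
{
  (void) y;
  (*(int *) user_data)++;
}
```

A caller would initialize the loader with toy_loader_init(), pass in whatever bytes arrive from the network with toy_loader_write(), and repaint in the callback. Even in this toy, the parser has to remember a half-read header between calls; real decoders for compressed formats have far more state to carry around.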
The internal API for the image format loaders still keeps a distinction between the "load a whole file" API and the "load an image in chunks" one. Not all loaders got redone to simply use the second one: io-jpeg.c still implements loading whole files by calling the corresponding libjpeg functions. I think it could remove that code and use the progressive loading functions instead.
Animations
We followed the GIF model for animations, in which each frame overlays the previous one, and there's a delay set between each frame. This is not a video file; it's a hacky flipbook.
However, animations presented the problem that the whole gdk-pixbuf API was meant for static images, and now we needed to support multi-frame images as well.
We defined the "correct" way to use the gdk-pixbuf library as follows: first try to load an animation, and then see if it is a single-frame image, in which case you can just get a GdkPixbuf for the only frame and use it.
Or, if you got an animation, that would be a GdkPixbufAnimation object, from which you could ask for an iterator to get each frame as a separate GdkPixbuf.
However, the progressive updating API never got extended to really support animations. So, we have awkward functions like gdk_pixbuf_animation_iter_on_currently_loading_frame() instead.
Necessary accretion
Gdk-pixbuf got support for saving just a few formats: JPEG, PNG, TIFF, ICO, and some of the formats that are implemented with the Windows-native loaders.
Over time gdk-pixbuf got support for preserving some metadata-ish chunks from formats that provide it: DPI, color profiles, image comments, hotspots for cursors/icons...
While an image is being loaded with the progressive loaders, there is a clunky way to specify that one doesn't want the actual size of the image, but another size instead. The loader can handle that situation itself, ideally when the image format actually embeds different sizes in it. If not, the main loading code will rescale the full loaded image into the size specified by the application.
Historical cruft
GdkPixdata - a way to embed binary image data in executables, with a funky encoding. Nowadays it's just easier to directly store a PNG or JPEG or whatever in a GResource.
contrib/gdk-pixbuf-xlib - to deal with old-style X drawables. Hopefully mostly unused now, but there's a good amount of mostly old, third-party software that still uses gdk-pixbuf as an image loader and renderer to X drawables.
gdk-pixbuf-transform.h - Gdk-pixbuf had some very high-quality scaling functions, which the original versions of EOG used for the core of the image viewer. Nowadays Cairo is the preferred way of doing this, since it not only does scaling, but general affine transformations as well. Did you know that gdk_pixbuf_composite_color takes 17 arguments, and it can composite an image with alpha on top of a checkerboard? Yes, that used to be the core of EOG.
Debatable historical cruft
gdk_pixbuf_get_pixels(). This lets the program look into the actual pixels of a loaded pixbuf, and modify them. Gdk-pixbuf just did not have a concept of immutability.
Back in GNOME 1.x / 2.x, when it was fashionable to put icons beside menu items, or in toolbar buttons, applications would load their icon images, and modify them in various ways before setting them onto the corresponding widgets. Some things they did: load a colorful icon, desaturate it for "insensitive" command buttons or menu items, or simulate desaturation by compositing a 1x1-pixel checkerboard on the icon image. Or lighten the icon and set it as the "prelight" one onto widgets.
The concept of "decode an image and just give me the pixels" is of course useful. Image viewers, image processing programs, and the like all need this functionality.
However, these days GTK would prefer to have a way to decode an image, and ship it as fast as possible to the GPU, without intermediaries. There is all sorts of awkward machinery in the GTK widgets that can consume either an icon from an icon theme, or a user-supplied image, or one of the various schemes for providing icons that GTK has acquired over the years.
It is interesting to note that gdk_pixbuf_get_pixels() was available pretty much since the beginning, but it was not until much later that we got gdk_pixbuf_get_pixels_with_length(), the "give me the guchar * buffer and also its length" function, so that calling code has a chance of actually checking for buffer overruns. (... and it is one of the broken "give me a length" functions that returns a guint rather than a gsize. There is a better gdk_pixbuf_get_byte_length() which actually returns a gsize, though.)
Problems with mutable pixbufs
The main problem is that as things are right now, we have no flexibility in changing the internal representation of image data to make it better for current idioms: GPU-specific pixel formats may not be unpremultiplied RGBA data.
We have no API to say, "this pixbuf has been modified", akin to cairo_surface_mark_dirty(): once an application calls gdk_pixbuf_get_pixels(), gdk-pixbuf or GTK have to assume that the data will be changed and they have to re-run the pipeline to send the image to the GPU (format conversions? caching? creating a texture?).
Also, ever since the beginnings of the gdk-pixbuf API, we had a way to create pixbufs from arbitrary user-supplied RGBA buffers: the gdk_pixbuf_new_from_data functions. One problem with this scheme is that memory management of the buffer is up to the calling application, so the resulting pixbuf isn't free to handle those resources as it pleases.
A relatively recent addition is gdk_pixbuf_new_from_bytes(), which takes a GBytes buffer instead of a random guchar *. When a pixbuf is created that way, it is assumed to be immutable, since a GBytes is basically a shared reference into a byte buffer, and it's just easier to think of it as immutable. (Nothing in C actually enforces immutability, but the API indicates that convention.)
Internally, GdkPixbuf actually prefers to be created from a GBytes. It will downgrade itself to a guchar * buffer if something calls the old gdk_pixbuf_get_pixels(); in the best case, that will just take ownership of the internal buffer from the GBytes (if the GBytes has a single reference count); in the worst case, it will copy the buffer from the GBytes and retain ownership of that copy. In either case, when the pixbuf downgrades itself to pixels, it is assumed that the calling application will modify the pixel data.
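The downgrade logic can be sketched in plain C, without GLib. This is only a model of the idea, not gdk-pixbuf's implementation: SharedBytes stands in for GBytes, ToyPixbuf stands in for GdkPixbuf, and all the function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for GBytes: a reference-counted buffer treated as read-only. */
typedef struct {
  int      refcount;
  size_t   len;
  uint8_t *data;
} SharedBytes;

/* Stand-in for GdkPixbuf: exactly one of bytes/pixels is non-NULL. */
typedef struct {
  SharedBytes *bytes;   /* set while the pixbuf is still immutable */
  uint8_t     *pixels;  /* set once it has been downgraded to mutable */
  size_t       len;
} ToyPixbuf;

SharedBytes *
shared_bytes_new (const uint8_t *data, size_t len)
{
  assert (data != NULL && len > 0);
  SharedBytes *b = malloc (sizeof *b);
  assert (b != NULL);
  b->refcount = 1;
  b->len = len;
  b->data = malloc (len);
  assert (b->data != NULL);
  memcpy (b->data, data, len);
  return b;
}

void
shared_bytes_unref (SharedBytes *b)
{
  if (--b->refcount == 0) {
    free (b->data);  /* data is NULL if the buffer was stolen; that is fine */
    free (b);
  }
}

/* The pixbuf takes its own reference on the shared buffer. */
ToyPixbuf
toy_pixbuf_new_from_bytes (SharedBytes *b)
{
  b->refcount++;
  ToyPixbuf p = { b, NULL, b->len };
  return p;
}

/* First call downgrades the pixbuf; later calls return the same buffer. */
uint8_t *
toy_pixbuf_get_pixels (ToyPixbuf *p)
{
  if (p->pixels == NULL) {
    if (p->bytes->refcount == 1) {
      /* Best case: we hold the only reference, so steal the buffer. */
      p->pixels = p->bytes->data;
      p->bytes->data = NULL;
    } else {
      /* Worst case: someone else can still see the bytes, so copy them. */
      p->pixels = malloc (p->len);
      assert (p->pixels != NULL);
      memcpy (p->pixels, p->bytes->data, p->len);
    }
    shared_bytes_unref (p->bytes);
    p->bytes = NULL;
  }
  return p->pixels;
}
```

Once toy_pixbuf_get_pixels() has been called, the toy pixbuf owns a private, writable buffer and can no longer share data with anyone, which mirrors why a mutable pixbuf has to be treated as "possibly modified" from then on.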
What would immutable pixbufs look like?
I mentioned this a bit in "Reducing Copies". The loaders in gdk-pixbuf would create immutable pixbufs, with an internal representation that is friendly to GPUs. In the proposed scheme, that internal representation would be a Cairo image surface; it can be something else if GTK/GDK eventually prefer a different way of shipping image data into the toolkit.
Those pixbufs would be immutable. In true C fashion we can call it undefined behavior to change the pixel data (say, an app could request gimme_the_cairo_surface and tweak it, but that would not be supported).
I think we could also have a "just give me the pixels" API, and a "create a pixbuf from these pixels" one, but those would be one-time conversions at the edge of the API. Internally, the pixel data that actually lives inside a GdkPixbuf would remain immutable, in some preferred representation, which is not necessarily what the application sees.
What worked well
A small API to load multiple image formats, and paint the images easily to the screen, while handling most of the X awkwardness semi-automatically, was very useful!
A way to get and modify pixel data: applications clearly like doing this. We can formalize it as an application-side thing only, and keep the internal representation immutable and in a format that can evolve according to the needs of the internal API.
Pluggable loaders, up to a point. Gdk-pixbuf doesn't support all the image formats in the world out of the box, but it is relatively easy for third parties to provide loaders that, once installed, are automatically usable for all applications.
What didn't work well
Having effectively two pixel formats supported, and nothing else: gdk-pixbuf does packed RGB and unpremultiplied RGBA, and that's it. This isn't completely terrible: applications which really want to know about indexed or grayscale images, or high bit-depth ones, are probably specialized enough that they can afford to have their own custom loaders with all the functionality they need.
Pluggable loaders, up to a point. While it is relatively easy to create third-party loaders, installation is awkward from a system's perspective: one has to run the script to regenerate the loader cache, there are more shared libraries running around, and the loaders are not sandboxed by default.
I'm not sure if it's worthwhile to let any application read "any" image format if gdk-pixbuf supports it. If your word processor lets you paste an image into the document... do you want it to use gdk-pixbuf's limited view of things and include a high bit-depth image with its probably inadequate conversions? Or would you rather do some processing by hand to ensure that the image looks as good as it can, in the format that your word processor actually supports? I don't know.
The API for animations is very awkward. We don't even support APNG... but honestly I don't recall actually seeing one of those in the wild.
The progressive loading API is awkward. The "feed some bytes into the loader" part is mostly okay; the "notify me about changes to the pixel data" is questionable nowadays. Web browsers don't use it; they implement their own loaders. Even EOG doesn't use it.
I think most code that actually connects to GdkPixbufLoader's signals only uses the size-prepared signal: the one that gets emitted soon after reading the image headers, when the loader gets to know the dimensions of the image. Apps sometimes use this to say, "this image is W*H pixels in size", but don't actually decode the rest of the image.
The gdk-pixbuf model of static images, or GIF animations, doesn't work well for multi-page TIFFs. I'm not sure if this is actually a problem. Again, applications with actual needs for multi-page TIFFs are probably specialized enough that they will want a full-featured TIFF loader of their own.
Awkward architectures
Thumbnailers
The thumbnailing system has slowly been moving towards a model where we actually have thumbnailers specific to each file format, instead of just assuming that we can dump any image into a gdk-pixbuf loader.
If we take this all the way, we would be able to remove some weird code in, for example, the JPEG pixbuf loader. Right now it supports loading images at a size that the calling code requests, not only at the "natural" size of the JPEG. The thumbnailer can say, "I want to load this JPEG at 128x128 pixels" or whatever, and in theory the JPEG loader will do the minimal amount of work required to do that. It's not 100% clear to me if this is actually working as intended, or if we downscale the whole image anyway.
We had a distinction between in-process and out-of-process thumbnailers, and it had to do with the way pixbuf loaders are used; I'm not sure if they are all out-of-process and sandboxed now.
Non-raster data
There is a gdk-pixbuf loader for SVG images which uses librsvg
internally, but only in a very basic way: it simply loads the SVG at
its preferred size. Librsvg jumps through some hoops to compute a
"preferred size" for SVGs, as not all of them actually indicate one.
The SVG model would rather have the renderer say that the SVG is to be inserted into a rectangle of certain width/height, and scaled/positioned inside the rectangle according to some other parameters (i.e. like one would put it inside an HTML document, with a preserveAspectRatio attribute and all that). GNOME applications historically operated with a different model, one of "load me an image, I'll scale it to whatever size, and paint it".
This gdk-pixbuf loader for SVG files gets used for the SVG thumbnailer, or more accurately, the "throw random images into a gdk-pixbuf loader" thumbnailer. It may be better/cleaner to have a specific thumbnailer for SVGs instead.
Even EOG, our by-default image viewer, doesn't use the gdk-pixbuf loader for SVGs: it actually special-cases them and uses librsvg directly, to be able to load an SVG once and re-render it at different sizes if one changes the zoom factor, for example.
GTK reads its SVG icons... without using librsvg... by assuming that librsvg installed its gdk-pixbuf loader, so it loads them as any normal raster image. This is kind of dirty, but I can't quite pinpoint why. I'm sure it would be convenient for icon themes to ship a single SVG with tons of icons, and some metadata on their ids, so that GTK could pick them out of the SVG file with rsvg_render_cairo_sub() or something. Right now icon theme authors are responsible for splitting out those huge SVGs into many little ones, one for each icon, and I don't think that's their favorite thing in the world to do :)
Exotic raster data
High bit-depth images... would you expect EOG to be able to load them? Certainly; maybe not with all the fancy conversions from a real RAW photo editor. But maybe this can be done as EOG-specific plugins, rather than as low in the platform as the gdk-pixbuf loaders?
(Same thing for thumbnailing high bit-depth images: the loading code should just provide its own thumbnailer program for those.)
Non-image metadata
The gdk_pixbuf_set_option / gdk_pixbuf_get_option family of functions is so that pixbuf loaders can set key/value pairs of strings onto a pixbuf. Loaders use this for comment blocks, or ICC profiles for color calibration, or DPI information for images that have it, or EXIF data from photos. It is up to applications to actually use this information.
It's a bit uncomfortable that gdk-pixbuf makes no promises about the kind of raster data it gives to the caller: right now it is raw RGB(A) data that is not gamma-corrected nor in any particular color space. It is up to the caller to see if the pixbuf has an ICC profile attached to it as an option. Effectively, this means that applications don't know if they are getting sRGB, or linear RGB, or what... unless they specifically care to look.
The gdk-pixbuf API could probably make promises: if you call this function you will get sRGB data; if you call this other function, you'll get the raw RGBA data and we'll tell you its colorspace/gamma/etc.
The various set_option / get_option pairs are also usable by the gdk-pixbuf saving code (up to now we have just talked about loaders). I don't know enough about how applications use the saving code in gdk-pixbuf... the thumbnailers use it to save PNGs or JPEGs, but other apps? No idea.
What I would like to see
Immutable pixbufs in a useful format. I've started work on this in a merge request; the internal code is now ready to take in different internal representations of pixel data. My goal is to make Cairo image surfaces the preferred, immutable, internal representation. This would give us a gdk_pixbuf_get_cairo_surface(), which pretty much everything that needs one reimplements by hand.
Find places that assume mutable pixbufs. To gradually deprecate mutable pixbufs, I think we would need to audit applications and libraries to find places that cause GdkPixbuf structures to degrade into mutable ones: basically, find callers of gdk_pixbuf_get_pixels() and related functions, see what they do, and reimplement them differently. Maybe they don't need to tint icons by hand anymore? Maybe they don't need icons anymore, given our changing UI paradigms? Maybe they are using gdk-pixbuf as an image loader only?
Reconsider the loading-updates API. Do we need the GdkPixbufLoader::area-updated signal at all? Does anything break if we just... not emit it, or just emit it once at the end of the loading process? (Caveat: keeping it unchanged more or less means that "immutable pixbufs" as loaded by gdk-pixbuf actually mutate while being loaded, and this mutation is exposed to applications.)
Sandboxed loaders. While these days gdk-pixbuf loaders prefer the progressive feed-it-bytes API, sandboxed loaders would maybe prefer a read-a-whole-file approach. I don't know enough about memfd or how sandboxes pass data around to know how either would work.
Move loaders to Rust. Yes, really. Loaders are security-sensitive, and while we do need to sandbox them, it would certainly be better to do them in a memory-safe language. There are already pure Rust-based image loaders: JPEG, PNG, TIFF, GIF, ICO. I have no idea how featureful they are. We can certainly try them with gdk-pixbuf's own suite of test images. We can modify them to add hooks for things like a size-prepared notification, if they don't already have a way to read "just the image headers".
Rust makes it very easy to plug in micro-benchmarks, fuzz testing, and other modern amenities. These would be perfect for improving the loaders.
I started sketching a Rust backend for gdk-pixbuf loaders some months ago, but there's nothing useful yet. One mismatch between gdk-pixbuf's model for loaders, and the existing Rust codecs, is that Rust codecs generally take something that implements the Read trait: a blocking API to read bytes from abstract sources; it's a pull API. The gdk-pixbuf model is a push API: the calling code creates a loader object, and then pushes bytes into it. The gdk-pixbuf convenience functions that take a GInputStream basically do this:
loader = gdk_pixbuf_loader_new (...);
while (more_bytes) {
    n_read = g_input_stream_read (stream, buffer, ...);
    gdk_pixbuf_loader_write (loader, buffer, n_read, ...);
}
gdk_pixbuf_loader_close (loader);
However, this cannot be flipped around easily. We could probably use a second thread (easy, safe to do in Rust) to make the reader/decoder thread block while the main thread pushes bytes into it.
Also, I don't know how the Rust bindings for GIO present things like GInputStream and friends, with our nice async cancellables and all that.
Deprecate animations? Move that code to EOG, just so one can look at memes in it? Do any "real apps" actually use GIF animations for their UI?
Formalize promises around returned color profiles, gamma, etc. As mentioned above: have an "easy API" that returns sRGB, and a "raw API" that returns the ARGB data from the image, plus info on its ICC profile, gamma, or any other info needed to turn this into a "good enough to be universal" representation. (I think all the Apple APIs that pass colors around do so with an ICC profile attached, which seems... pretty much necessary for correctness.)
Remove the internal MIME-sniffing machinery. And just use GIO's.
Deprecate the crufty/old APIs in gdk-pixbuf.
Scaling/transformation, compositing, GdkPixdata, gdk-pixbuf-csource, all those. Pixel crunching can be done by Cairo; the others are better done with GResource these days.
Figure out if we want blessed codecs; fix thumbnailers. Link those loaders statically, unconditionally. Exotic formats can go in their own custom thumbnailers. Figure out if we want sandboxed loaders for everything, or just for user-side images (not ones read from the trusted system installation).
Have GTK4 communicate clearly about its drawing model. I think we are having a disconnect between the GUI chrome, which is CSS/GPU friendly, and graphical content generated by applications, which by default right now is done via Cairo. And having Cairo as a to-screen and to-printer API is certainly very convenient! You Wouldn't Print a GUI, but certainly you would print a displayed document.
It would also be useful for GTK4 to actually define what its preferred image format is if it wants to ship it to the GPU with as little work as possible. Maybe it's a Cairo image surface? Maybe something else?
Conclusion
We seem to change imaging models every ten years or so. Xlib, then Xrender with Cairo, then GPUs and CSS-based drawing for widgets. We've gone from trusted data on your local machine, to potentially malicious data that rains from the Internet. Gdk-pixbuf has spanned all of these periods so far, and it is due for a big change.