Porting of librsvg to Rust goes on. Yesterday I started porting the
C code that implements SVG's <text> family of elements. I have also
been replacing the little parsers in librsvg with Rust code.
And these days, the lack of string slices in C is bothering me a lot.
What if...
It feels like it should be easy to just write something like
typedef struct {
    const char *ptr;
    size_t len;
} StringSlice;
And then a whole family of functions. The starting point, where you slice a whole string:
StringSlice
make_slice_from_string (const char *s)
{
    StringSlice slice;

    assert (s != NULL);
    slice.ptr = s;
    slice.len = strlen (s);
    return slice;
}
But that wouldn't keep track of the lifetime of the original string. Okay, this is C, so you are used to keeping track of that yourself.
Onwards. Substrings?
StringSlice
make_sub_slice (StringSlice slice, size_t start, size_t len)
{
    StringSlice sub;

    assert (len <= slice.len);
    assert (start <= slice.len - len); /* Not "start + len <= slice.len" or it can overflow. */
    /* The subtraction can't underflow because of the previous assert */
    sub.ptr = slice.ptr + start;
    sub.len = len;
    return sub;
}
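To see how these two functions would fit together, here is a minimal, self-contained sketch. It repeats the definitions above so that it compiles on its own. slice_equals_cstr is a hypothetical extra member of the "family of functions", made up for illustration; it is not something from librsvg or Glib.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *ptr;
    size_t len;
} StringSlice;

StringSlice
make_slice_from_string (const char *s)
{
    StringSlice slice;

    assert (s != NULL);
    slice.ptr = s;
    slice.len = strlen (s);
    return slice;
}

StringSlice
make_sub_slice (StringSlice slice, size_t start, size_t len)
{
    StringSlice sub;

    assert (len <= slice.len);
    assert (start <= slice.len - len);
    sub.ptr = slice.ptr + start;
    sub.len = len;
    return sub;
}

/* Hypothetical helper: compare a slice against a nul-terminated string
 * without ever reading past the end of the slice. */
bool
slice_equals_cstr (StringSlice slice, const char *s)
{
    assert (s != NULL);
    return strlen (s) == slice.len && memcmp (slice.ptr, s, slice.len) == 0;
}
```

With those, make_sub_slice (make_slice_from_string ("hello world"), 6, 5) gives a slice that compares equal to "world", and no memory gets copied along the way.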
Then you could write a million wrappers for g_strsplit() and
friends, or equivalents to them, to give you slices instead of C
strings. But then:
- You have to keep track of lifetimes yourself.
- You have to wrap every function that returns a plain "char *"...
- ... and every function that takes a plain "char *" as an argument, without a length parameter, because...
- You CANNOT take slice.ptr and pass it to a function that just expects a plain "char *", because your slice does not include a nul terminator (the '\0' byte at the end of a C string). This is what kills the whole plan.
Even if you had a helper library that implements C string slices
like that, you would have a mismatch every time you needed to call a C
function that expects a conventional C string in the form of a
"char *". You need to put a nul terminator somewhere, and if you
only have a slice, you need to allocate memory, copy the slice into
it, and slap a 0 byte at the end. Then you can pass that to a
function that expects a normal C string.
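That allocate-copy-terminate dance could look roughly like this. It is only a sketch: slice_to_cstring is a hypothetical name, and StringSlice is the struct from earlier in this post, repeated here so the snippet compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *ptr;
    size_t len;
} StringSlice;

/* Hypothetical helper: copy a slice into freshly allocated memory and
 * append the nul terminator.  Returns NULL if the allocation fails.
 * The caller must free() the result. */
char *
slice_to_cstring (StringSlice slice)
{
    char *copy;

    assert (slice.ptr != NULL);
    assert (slice.len < SIZE_MAX); /* leave room for the terminator */

    copy = malloc (slice.len + 1);
    if (copy == NULL)
        return NULL;

    memcpy (copy, slice.ptr, slice.len);
    copy[slice.len] = '\0';
    return copy;
}
```

Every call that crosses from slices into plain C strings pays for one allocation and one copy, plus a free() that somebody has to remember.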
There is hacky C code that needs to pass a substring to another function, so it overwrites the byte after the substring with a 0, passes the substring, and overwrites the byte back. This is horrible, and doesn't work with strings that live in read-only memory. But that's the best that C lets you do.
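For the record, the trick looks roughly like this. It is a sketch to show the shape of the hack, not a recommendation; call_with_temporary_nul is a hypothetical name, and the callback signature is just an assumption picked to keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: temporarily terminate the substring
 * buf[start .. start + len) in place, call a function that wants a
 * plain C string, then put the original byte back.
 *
 * buf must be writable and nul-terminated, and the caller must
 * guarantee that start + len <= strlen (buf); otherwise
 * buf[start + len] is out of bounds. */
size_t
call_with_temporary_nul (char *buf, size_t start, size_t len,
                         size_t (*func) (const char *))
{
    char saved;
    size_t result;

    assert (buf != NULL && func != NULL);

    saved = buf[start + len];
    buf[start + len] = '\0';      /* clobber the byte after the substring */
    result = func (buf + start);  /* func sees a normal C string */
    buf[start + len] = saved;     /* restore the original byte */

    return result;
}
```

Passing a string literal as buf is undefined behavior and will typically crash, since literals usually live in read-only memory. The buffer is also briefly wrong for anyone else who reads it while func runs.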
I'm very happy with string slices in Rust, which work exactly like the
StringSlice above, but &str is actually at the language level and
everything knows how to handle it.
The glib-rs crate has conversion traits to go from Rust strings or
slices into C, and vice-versa. We already saw some of those in the
blog post about conversions in Glib-rs.
Sizes of things
Rust uses usize to specify the size of things; it's an unsigned
integer; 32 bits on 32-bit machines, and 64 bits on 64-bit machines;
it's like C's size_t.
In the Glib/C world, we have an assortment of types to represent the sizes of things:
- gsize, the same as size_t. This is an unsigned integer; it's okay.
- gssize, a signed integer of the same size as gsize. This is okay if used to represent a negative offset, and really funky in the Glib functions like g_string_new_len (const char *str, gssize len), where len == -1 means "call strlen(str) for me because I'm too lazy to compute the length myself".
- int - broken, as in libxml2, but we can't change the API. On 64-bit machines, an int to specify a length means you can't pass objects bigger than 2 GB.
- long - marginally better than int, since it has a better chance of actually being the same size as size_t, but still funky. Probably okay for negative offsets; problematic for sizes which should really be unsigned.
- etc.
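When a size_t length has to go through an API that takes an int, as in the libxml2 case above, about the only safe thing to do is to check the range before converting. A minimal sketch; length_to_int is a hypothetical helper, not part of Glib or libxml2.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: store len in *out and return true if it is
 * representable as an int; return false and leave *out alone otherwise. */
bool
length_to_int (size_t len, int *out)
{
    assert (out != NULL);

    if (len > (size_t) INT_MAX)
        return false;

    *out = (int) len;
    return true;
}
```

The caller then has to decide what to do with the "too big" case: fail, or feed the data in chunks if the API allows it.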
I'm not sure how old size_t is in the C standard library, but it
can't have been there since the beginning of time; otherwise
people wouldn't have been using int to specify the sizes of things.