Fixing test coverage reports in at-spi2-core

Over the past weeks I have been fixing the test coverage report for at-spi2-core. It has been a bit of an adventure, in which I had to do the following:

Replace one code coverage tool with another...
... which was easier to modify to produce more accessible HTML reports.
Figure out why some of at-spi2-core's modules got 0% coverage.
Learn to mock DBus services.

What is a code coverage report?

In short — you run your program, or its test suite. You generate a coverage report, and it tells you which lines of code were executed in your program, and which lines weren't.

A coverage report is very useful! It lets one answer some large-scale questions:

Which code in my project does not get exercised by the tests?
If there is code that is conditionally-compiled depending on build-time options, am I forgetting to test a particular build configuration?

And small-scale questions:

Did the test I just added actually cause the code I am interested in to be run?
Are there tests for all the error paths?

You can also use a coverage report as an exploration tool:

Run the program by hand and do some things with it. Which code was run through your actions?

I want to be able to do all those things for the accessibility infrastructure: use the report as an exploration tool while I learn how the code works, and use it as a tool to ensure that the tests I add actually test what I want to test.

A snippet of a coverage report

This is a screenshot of the report for at-spi2-core/atspi/atspi-accessible.c:

Coverage report for the atspi_accessible_get_child_count() function

The leftmost column is the line number in the source file. The second column has the execution count and color-coding for each line: green lines were executed one or more times; red lines were not executed; white lines are not executable.

By looking at that bit of the report, we can start asking questions:

There is a return -1 for an error condition, which is not executed. Would the calling code actually handle this correctly, since we have no tests for it?
The last few lines in the function are not executed, since the check before them works as a cache. How can we test those lines, and cause them to be executed? Are they necessary, or is everything handled by the cache above them? How can we test different cache behavior?
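
For readers who cannot see the screenshot, here is a rough sketch of the shape of such a function. It is not the real atspi_accessible_get_child_count(); the FakeAccessible struct and fake_get_child_count() are made up for illustration, and the comments only indicate which lines a report would likely mark red if the tests always hit a warm cache.

```c
#include <stddef.h>

/* Hypothetical stand-in for an accessible object; not the real AtspiAccessible. */
typedef struct {
    int cache_valid;   /* nonzero when cached_count can be trusted */
    int cached_count;  /* child count remembered from an earlier query */
    int remote_count;  /* pretend answer from the slow, remote side */
} FakeAccessible;

/* Returns the number of children, or -1 on error. */
int fake_get_child_count(FakeAccessible *obj)
{
    if (obj == NULL)
        return -1;                 /* error path: red unless a test passes NULL */

    if (obj->cache_valid)
        return obj->cached_count;  /* fast path: green when the cache is warm */

    /* Slow path: red unless some test runs with a cold cache. */
    obj->cached_count = obj->remote_count;
    obj->cache_valid = 1;
    return obj->cached_count;
}
```

A test suite that wants all of these lines green needs at least three cases: a NULL object, a warm cache, and a cold cache.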

First pass: lcov

When I initially added continuous integration infrastructure to at-spi2-core, I copied most of it from libgweather, where Emmanuele Bassi had set up some nice things like static code analysis, address-sanitizer, and a code coverage report via lcov.

The initial runs of lcov painted a rather grim picture: test coverage was only at about 30% of the code. However, some modules which are definitely run by the test suite showed up with 0% coverage. This is wrong; those modules definitely have code that gets executed; why isn't it showing up?

Zero coverage

At-spi2-core has some peculiar modules. It does not provide a single program or library that one can just run by hand. Instead, it provides a couple of libraries and a couple of daemons that get used through those libraries, or through raw DBus calls.

In particular, at-spi2-registryd is the registry daemon for accessibility, which multiplexes requests from assistive technologies (ATs) like screen readers into applications. It doesn't even use the session DBus; it registers itself in a separate DBus daemon specific to accessibility, to avoid too much traffic in the main session bus.

at-spi2-registryd gets started up as soon as something requires the accessibility APIs, and remains running until the user's session ends.

However, in the test runner, there is no session. The daemon runs, and gets a SIGTERM from its parent dbus-daemon when that dbus-daemon terminates. So, while at-spi2-registryd has no persistent state that it may care about saving, it doesn't exit "cleanly".

And it turns out that gcc's coverage data gets written out only if the program exits cleanly. When you compile with the --coverage option, gcc emits code that turns on the flag in libgcc to write out coverage information when the program ends (libgcc is the compiler-specific runtime helper that gets linked into normal programs compiled with gcc).

It's as if main() had a wrapper:

/* Conceptual sketch only; this is not what gcc literally emits. */
void main_wrapper(int argc, char **argv)
{
    int r = main(argc, argv);

    write_out_coverage_info();

    exit(r);
}

int main(int argc, char **argv)
{
    /* your program goes here */
}

Of course, if your program terminates prematurely through SIGTERM, the wrapper will not finish running and it will not write out the coverage info.
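
The effect can be seen in isolation with a small sketch in plain C. It uses atexit() as a stand-in for the coverage writer, which is an assumption for illustration, since the real mechanism lives in gcc's runtime. The names fake_write_coverage() and exit_handler_ran() are made up. A child process either exits normally or is terminated by SIGTERM, and the parent checks through a pipe whether the exit handler ran.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;

/* Stand-in for the coverage writer: leaves one byte in the pipe. */
static void fake_write_coverage(void)
{
    ssize_t n = write(report_fd, "c", 1);
    (void) n;
}

/* Runs a child that either exits normally or is terminated by SIGTERM.
 * Returns 1 if the child's exit handler ran, 0 if not, -1 on error. */
int exit_handler_ran(int kill_with_sigterm)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: register the exit handler, then leave one way or the other. */
        close(fds[0]);
        report_fd = fds[1];
        atexit(fake_write_coverage);
        signal(SIGTERM, SIG_DFL);   /* make sure the default action applies */
        if (kill_with_sigterm)
            raise(SIGTERM);         /* default action: terminate right away */
        exit(0);                    /* clean exit: atexit handlers run */
    }

    /* Parent: wait for the child, then see whether anything was written. */
    close(fds[1]);
    waitpid(pid, NULL, 0);
    char byte;
    ssize_t n = read(fds[0], &byte, 1);   /* 0 means end of file: nothing written */
    close(fds[0]);
    return n == 1 ? 1 : 0;
}
```

With this sketch, exit_handler_ran(0) reports that the handler ran, and exit_handler_ran(1) reports that it did not, which mirrors what happened to the coverage data of at-spi2-registryd.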

So, how do we simulate a session in the test runner?

Mocking gnome-session

I recently learned of a fantastic tool, python-dbusmock, which makes it really easy to create mock implementations of DBus interfaces.

There are a couple of places in at-spi2-core that depend on watching the user session's lifetime, and fortunately they only need two things from the gnome-session interfaces:

Register themselves as a session client.
Get notified when the session ends so the daemon can exit.

I wrote a mock of these DBus interfaces so that the daemons can register against the fake session manager. Then I made the test runner ask the mock session to tell the daemons to exit when the tests are done.

With that, at-spi2-registryd gets coverage information written out properly.

Obtaining coverage for atk-adaptor

atk-adaptor is a bunch of glue code between atk, the GObject-based library that GTK3 uses to expose accessible interfaces, and libatspi, the hand-written DBus binding to the accessibility interfaces.

The tests for this are very interesting. We want to simulate an application that uses atk to make itself accessible, and to test that e.g. the screen reader can actually interface with it. Instead of creating ATK implementations by hand, there is a helper program that reads XML descriptions of accessible objects, and exposes them via ATK. Each individual test uses a different XML file, and each test spawns the helper program with the XML it needs.

Again, it turns out that the test runner just sent a SIGTERM to the helper program when each test was done. This is fine for running the tests normally, but it prevents code coverage from being written out when the helper program terminates.

So, I installed a gmain signal handler in the helper program, to make it exit cleanly when it gets that SIGTERM. Problem solved!

Missing coverage info for GTK2

The only part of at-spi2-core that doesn't have coverage information yet is the glue code for GTK2. I think this would require running a test program under xvfb so that its libgtk2 can load the module that provides the glue code. I am not sure if this should be tested by at-spi2-core itself, or if that should be the responsibility of GTK2.

Are the coverage reports accessible?

For a sighted person, it is easy to look at a coverage report like the example above and just look for red lines — those that were not executed.

For people who use screen readers, it is not so convenient. I asked around a bit, and Eitan Isaacson gave me some excellent tips on improving the accessibility of lcov and grcov's HTML output.

Lcov is an old tool, and I started using it for at-spi2-core because it is what libgweather already used for its CI. Grcov is a newer tool, mostly by Mozilla people, which they use for Firefox's coverage reports. Grcov is also the tool that librsvg already uses. Since I'd rather baby-sit one tool instead of two, I decided to switch at-spi2-core to use grcov as well and to improve the accessibility of its reports.

The extract from the screenshot above looks like a table with three columns (line number, execution count, source code), but it is not a real HTML <table> — it is done with div elements and styling. Something like this:

<div>
  <div class="columns">
    <div class="column">
      line number
    </div>
    <div class="column color-coding-executed">
      execution count
    </div>
    <div class="column color-coding-executed">
      <pre>source code</pre>
    </div>
  </div>
  <!-- repeat the above for each source line -->
</div>

Eitan showed me how to use ARIA attributes to actually expose those divs as something that can be navigated as a table:

Add role="table" aria-label="Coverage report" to the main <div>. This tells web browsers to go into whatever interaction model they use for navigating tables via accessible interfaces. It also gives a label to the table, so that it is easy to find by assistive tools; for example, a screen reader may let you easily navigate to the next table in a document, and you'd like to know what the table is about.
Add role="row" to each row's div.
Add role="cell" to an individual cell's div.
Add an aria-label to cells with the execution count: while the sighted version shows nothing (non-executable lines), or just a red background (lines not executed), or a number with a green background, the screen reader version cannot depend on color coding alone. That aria-label will say "no coverage", or "0", or the actual execution count, respectively.
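
The labeling rule in the last item can be sketched as a small function. This is not grcov's code (grcov is not written in C); it is a stand-alone illustration of the mapping. The convention that a negative count marks a non-executable line, the function names, and the class name color-coding-not-executed are assumptions made for this sketch.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical convention for this sketch: a negative count marks a
 * non-executable line; zero or more is the execution count. */
int coverage_aria_label(long count, char *buf, size_t size)
{
    if (count < 0)
        return snprintf(buf, size, "no coverage");
    return snprintf(buf, size, "%ld", count);
}

/* Writes the execution-count cell for one source line into buf. */
int format_count_cell(long count, char *buf, size_t size)
{
    char label[32];
    coverage_aria_label(count, label, sizeof label);

    if (count < 0)   /* non-executable: nothing visible, label says so */
        return snprintf(buf, size,
            "<div class=\"column\" role=\"cell\" aria-label=\"%s\"></div>",
            label);

    if (count == 0)  /* not executed: color only, the label carries the 0 */
        return snprintf(buf, size,
            "<div class=\"column color-coding-not-executed\" role=\"cell\" "
            "aria-label=\"%s\"></div>",
            label);

    /* executed: visible number, and the same number as the label */
    return snprintf(buf, size,
        "<div class=\"column color-coding-executed\" role=\"cell\" "
        "aria-label=\"%s\">%ld</div>",
        label, count);
}
```

The design point is that the label never depends on the color class: a screen reader hears "no coverage", "0", or the count, whatever the styling does.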

Time will tell whether this makes reports easier to peruse. I was mainly worried about being able to scan down the source quickly to find lines that were not executed. By using a screen reader's commands for tabular navigation, one can move down the second column until reaching a "zero". Maybe there is a faster way? Advice is appreciated!

Grcov now includes that patch, yay!

Next steps

I am starting to sanitize the XML interfaces in at-spi2-core, at least in terms of how they are used in the build. Expect an update soon!