Testing testing migration

If you have been following Lintian's development closely, you will probably have noticed that I have not really done anything there for the past week. Instead I have turned my focus to our testing migration script, britney2. First, I have created a minimal test suite[1]. It started as 4 simple tests and by now it contains about 30 tests.

The size of each test is rather small; the largest tests are about 1600 binary packages in total[2], but most are 2-20 binary packages in total. Thus the test suite is rather fast compared to a "live data sample", which easily takes more than 10 minutes for a single run. Unfortunately, hand-crafting the test data is somewhat annoying and easy to get wrong.

The test suite has a somewhat unfair focus on "auto-hint"[3] cases, so the current britney2 fails up to 14 tests. Some of these appear to fail because the auto-hinter (for some reason) receives incomplete information about the situation. To my knowledge we have not been able to debug the situation, but Adam has a refactor branch that does not seem to have this issue. Personally I am hoping it will soon be merged into the master branch, especially because it seems to simplify a lot of common operations.

Joachim Breitner (who has been working on a SAT-solver based britney) also contributed a couple of test cases[4]. Allegedly, SAT-britney does rather well on the test suite, failing only 2 tests as far as I can tell[5]. On the other hand, it does solve some of the more interesting cases britney2 does not solve.

On a more mathematical note, the britney2 implementation behaves like a function[6] with an attractive fixed point[7]. This is interesting, because for some cases it may take britney2 a couple of iterations to reach the right solution. This fixed point is somewhat simple to find by using the following steps (pseudo-code):

// Runtime complexity O(n * (br + diff)), where "n" is the number of iterations until
// a fixed point is reached, "br" is the complexity of "run_britney" and "diff" is
// the runtime of the "last != current" comparison.
function find_fixed_point(initial)
    last = run_britney(initial)
    current = run_britney(last)
    while last != current do
        last = current
        current = run_britney(last)
    od
    return current
end

This gives us a simple way to test if britney will eventually solve the issue herself (and when she will do it). Currently britney2 is automatically run twice a day, so every 2 iterations (beyond the first) roughly translate to a 24-hour delay. So far the test suite does not have a lot of problems that require more than one iteration. Personally I would be pleased if it turned out to stay that way as the test suite coverage grows.
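For those who prefer something runnable, here is a minimal Python sketch of the same loop. The `toy_britney` function and the `UNSTABLE` list are made up for illustration (one package migrates per run); they are not britney2's real interface. The `max_iterations` guard is also my own addition, in case the function turns out not to converge.

```python
def find_fixed_point(run_britney, initial, max_iterations=100):
    """Apply run_britney until two consecutive results are equal.

    Returns (state, runs), where runs is the number of run_britney calls.
    """
    last = run_britney(initial)
    current = run_britney(last)
    runs = 2
    while last != current:
        if runs >= max_iterations:
            raise RuntimeError("no fixed point within %d runs" % max_iterations)
        last = current
        current = run_britney(last)
        runs += 1
    return current, runs


# Hypothetical stand-in for britney: at most one package migrates per run.
UNSTABLE = ["a", "b", "c"]


def toy_britney(testing):
    for pkg in UNSTABLE:
        if pkg not in testing:
            return testing | {pkg}
    return testing


state, runs = find_fixed_point(toy_britney, frozenset())
print(sorted(state), runs)
```

In this toy setup the last run only confirms that nothing changes anymore, so the number of runs that actually did something is `runs - 1`.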

If you are interested in playing around with this, you can get sources from:

britney2
- Currently only works in stable (i.e. requires python2.5 and python-apt < 0.8 or so)
  - There are some patches for this filed against release.debian.org.
- See the INSTALL file for instructions
- Adam's branch
  - use the "p-u" branch.
SAT-britney
- I haven't tested this one and I do not know the requirements here
britney-tests
- See the README file for instructions

Footnotes:

[1] http://lists.debian.org/debian-release/2011/10/msg00178.html

[2] These tests are auto-generated, so it is merely an "up-scaled pattern".

[3] Basically if two (or more) packages need to migrate into testing at the exact same time, they need to be hinted in.

[4] Not to mention all the copy-waste errors he pointed out in mine. Apparently, SAT-britney has stricter requirements to the data than britney2. :P

[5] I assume the test called "sat-britney-death" (created by Joachim) was named that way for a reason. The second failure is caused by SAT britney not reading hints (yet?), so the "approve tpu package" test case should fail.

[6] A function that maps an "archive" into another "archive"... erh, I mean, it maps a set of packages into another set of packages... :P

[7] http://en.wikipedia.org/wiki/Fixed_point_%28mathematics%29

Assuming my claim to be true, the function will have more than one fixed point. The obtained fixed point depends on the initial state of testing.

As an example:
- y depends on x
- x in testing has RC bugs

If x is not in testing, it cannot migrate to testing (due to its RC bugs). If x is not in testing, then y cannot migrate into testing. But if x starts in testing, then y may be able to migrate. This can happen if x migrated to testing before an RC bug was filed against it.
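The example can be played through with a toy model in Python. This is only a sketch under my own assumptions: `migrate`, `DEPENDS` and `RC_BUGGY` are hypothetical names, and the rule "a package migrates when it has no RC bugs and its dependencies are already in testing" is a heavy simplification of what britney does.

```python
# Toy model: y depends on x, and x has an RC bug, so x can never migrate.
DEPENDS = {"x": set(), "y": {"x"}}
RC_BUGGY = {"x"}


def migrate(testing):
    """One run: add each non-buggy package whose dependencies are in testing."""
    result = set(testing)
    for pkg, deps in DEPENDS.items():
        if pkg not in RC_BUGGY and deps <= testing:
            result.add(pkg)
    return frozenset(result)


empty = frozenset()
with_x = frozenset({"x"})

print(sorted(migrate(empty)))    # x and y both stay out
print(sorted(migrate(with_x)))   # y can follow the x that is already there
```

Starting from an empty testing the function returns its input unchanged, and starting with x in testing it settles on x plus y, so this toy function has (at least) two different fixed points.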

(Dis-)Proving my claim is an exercise left for the reader.

Numbers and lintian

Lintian uses some small scripts to "collect" data from packages. In daily talk, they are usually referred to as "collection" scripts. Lintian uses files to track the status of "collection" scripts between runs. Consider the following directory listing on lintian.d.o:

$ ls -al laboratory/binary/eclipse/.*-*
-rw-r--r-- 1 user group  45 Jul 18 01:21 laboratory/binary/eclipse/.ar-info-1
-rw-r--r-- 1 user group  45 Jul 18 01:21 laboratory/binary/eclipse/.bin-pkg-control-1
-rw-r--r-- 1 user group  45 Jul 18 01:21 laboratory/binary/eclipse/.changelog-file-1
-rw-r--r-- 1 user group  45 Jul 18 01:21 laboratory/binary/eclipse/.copyright-file-1
-rw-r--r-- 1 user group  45 Jul 18 01:21 laboratory/binary/eclipse/.debian-readme-1
-rw-r--r-- 1 user group  45 Jul 18 01:21 laboratory/binary/eclipse/.doc-base-files-1
-rw-r--r-- 1 user group  45 Jul 18 01:21 laboratory/binary/eclipse/.fields-1
[...]

This example shows that a bunch of "collection" scripts have been run for the eclipse binary package. Each of these files contains the version of Lintian that created it (or last wrote it) and a timestamp.

Why the interest in these files? Let us go a bit back in time to the 15th of June 2011. That was the day when Lintian 2.5.1 was uploaded to unstable. That version of Lintian had 17 collection scripts[1] for binary packages. So every binary package would have 17 files and there are... over 35 000 binary packages in the Debian archive.

Ouch, so that makes about 595 000 files if we use this on all binary packages in the archive. The size of each of those files is about 45 bytes, so that is a total of 25.5 MB for all of these files[2]. So other than the "inode abuse", this is not too bad. A little du -h should confirm this...

$ du -h laboratory/binary/eclipse/.*-*
4.0K    laboratory/binary/eclipse/.ar-info-1
4.0K    laboratory/binary/eclipse/.bin-pkg-control-1
4.0K    laboratory/binary/eclipse/.changelog-file-1
4.0K    laboratory/binary/eclipse/.copyright-file-1
4.0K    laboratory/binary/eclipse/.debian-readme-1
4.0K    laboratory/binary/eclipse/.doc-base-files-1
4.0K    laboratory/binary/eclipse/.fields-1
[...]

Whaaa... oh - the file system uses a block-size of 4K bytes, so I guess we have to pay a full block for these files. Let's see what that gives, 595 000 times 4 kB is ... 2.27 GB...

Oops!
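The back-of-the-envelope numbers can be reproduced with a few lines of Python. The constants are the ones quoted in the text (35 000 is a lower bound); the block size of 4096 bytes is my reading of the 4.0K that du reports.

```python
import math

PACKAGES = 35_000     # "over 35 000 binary packages"
COLLECTIONS = 17      # marker files per binary package
FILE_SIZE = 45        # bytes of content in each marker file
BLOCK_SIZE = 4096     # assumed file system block size in bytes

files = PACKAGES * COLLECTIONS
content_bytes = files * FILE_SIZE
# A non-empty file occupies a whole number of blocks on disk.
disk_bytes = files * math.ceil(FILE_SIZE / BLOCK_SIZE) * BLOCK_SIZE

print(files)                             # 595000
print(round(content_bytes / 2**20, 1))   # 25.5 (MiB of content)
print(round(disk_bytes / 2**30, 2))      # 2.27 (GiB on disk)
```

So the overhead is roughly a factor of 90: the content is tiny, but every file still pays for a full block.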

That was in the (not too distant) past. About a month later (12th of July), the code creating these files was refactored into:

sub _mark_coll_finished {
    my ($self, $collname, $collver) = @_;
    # In the "old days" we would also write the Lintian version and the time
    # stamp in these files, but since we never read them it seems like overkill.
    #  - for the timestamp we could use the mtime of the file anyway
    return touch_file "$self->{base_dir}/.$collname-$collver";
}

This turns out to be a very space-saving change if we ask du -h:

$ du -h laboratory/binary/lintian/.*-*
0       laboratory/binary/lintian/.ar-info-1
0       laboratory/binary/lintian/.bin-pkg-control-1
0       laboratory/binary/lintian/.changelog-file-1
0       laboratory/binary/lintian/.copyright-file-1
0       laboratory/binary/lintian/.debian-readme-1
0       laboratory/binary/lintian/.doc-base-files-1
[...]

Existing files are not emptied, so we still have some old non-empty files left on lintian.d.o. Nevertheless that was the "nice story" about "the side effect of being lazy sometimes reduces space waste"[3].

But we have another "number" problem. If you grep for "Too many links" in the lintian.log from lintian.d.o you should see:

$ grep -i "too many links" logs/lintian.log
mkdir: cannot create directory `/srv/lintian.debian.org/laboratory/binary/aces3': Too many links
mkdir: cannot create directory `/srv/lintian.debian.org/laboratory/binary/boinc-cgi-stripchart': Too many links
mkdir: cannot create directory `/srv/lintian.debian.org/laboratory/binary/libbrailleutils-java': Too many links
mkdir: cannot create directory `/srv/lintian.debian.org/laboratory/binary/libbrailleutils-java-doc': Too many links
mkdir: cannot create directory `/srv/lintian.debian.org/laboratory/binary/libclippoly0': Too many links
mkdir: cannot create directory `/srv/lintian.debian.org/laboratory/binary/libclippoly-dev': Too many links
mkdir: cannot create directory `/srv/lintian.debian.org/laboratory/binary/xul-ext-cookie-monster': Too many links
mkdir: cannot create directory `/srv/lintian.debian.org/laboratory/binary/cucumber': Too many links
[...]
$ grep -i "too many links" logs/lintian.log | wc -l
3274

In the Lintian Laboratory every package is unpacked in a directory based on its type and name (as hinted in the output above). The problem is that ext3 has a limit on the number of sub-directories a directory can have[4]. This limit is just shy of 32 000 and (as a reminder) Debian has over 35 000 binary packages. So a lot of packages are currently not checked on lintian.d.o (#641468).

The current feature branch for #641468 already solves the directory limit issue (by using a "mirror-like" pool layout). It also removes the need for "1 file per completed collection" by writing this information in the existing "per entry" status file.
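To illustrate what a "mirror-like" pool layout buys us, here is a small Python sketch. It assumes the prefix rule used by Debian mirrors (first letter of the name, or "lib" plus the following letter); the function name `pool_dir` and the exact directory order are hypothetical and not necessarily what the feature branch does.

```python
def pool_dir(lab_root, pkg_type, name):
    """Return a per-package directory that keeps each parent directory small."""
    if name.startswith("lib") and len(name) > 3:
        prefix = name[:4]      # e.g. "libc" for libclippoly0
    else:
        prefix = name[:1]      # e.g. "a" for aces3
    return "/".join([lab_root, "pool", prefix, name, pkg_type])


print(pool_dir("laboratory", "binary", "aces3"))
print(pool_dir("laboratory", "binary", "libclippoly0"))
```

With the extra prefix level no single directory has to hold all 35 000 package directories, which keeps each of them well below the ext3 limit.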

I hope we can get the branch merged into master within 2-3 weeks, though there are still a few issues that need to be worked out before then.

Footnotes:

[1] Actually it had 18 collections, but one of them is "auto-removed" so it is better not to include it in this case.

[2] Content alone - metadata like permissions and such is ignored.

[3] I admit it was more luck than intention!

[4] The limit comes from the "32 000 hardlinks per inode" limitation.

DEP-5, JW, dev docs and other goodies

Are you using DEP-5 in your copyright file? Are you doing it right? Thanks to the work of Jakub Wilk, Lintian 2.5.3 will spot some of the regular mistakes (including syntax errors) and check if you are using the newest revision.

This is far from the only thing Jakub Wilk has added to Lintian in this development cycle. Most of his contributions in 2.5.3 are listed next to a little "[JW]" in the changelog and I hope to see a lot more of those. :)

Together with Jeremiah Foster, we have created a small POD document that hopefully will help aspiring developers. The file currently explains how the source code is divided, some core concepts and how to run specific tests in the test suite. The plan is to make it available on lintian.d.o along with a link to the git source code (#639974).

Last time, I mentioned we have gotten rid of collection/fields by a simple improvement to our Lintian::Collect API. Turns out the same improvement could also be used on collection/source-control-file. This means we no longer create a file per field in the d/control of the source package.

To avoid (any more) confusion, Lintian will now inform you if the package carries an override for a non-overridable tag (except if you use --quiet). On a related note, Lintian will now error out if it sees an unknown field in a vendor profile. Previously it would just silently ignore the field.

There is one more change I would like to highlight and this one is probably best served with an example:

$ lintian --show-overrides ../lintian_2.5.2.dsc
N: We build-depend on cdbs for the test suite
O: lintian source: unused-build-dependency-on-cdbs
N: We build-depend on quilt for the test suite
O: lintian source: quilt-build-dep-but-no-series-file
N: We don't have a patch system for lintian itself
O: lintian source: patch-system-but-no-source-readme

Lintian will now attempt to extract and print the comment related to the override from the overrides file. The comment will be printed above each tag the override applies to (so you may see some repeated comments). The full documentation for how Lintian finds the comments (or why it does not find the comment) is available in the Lintian User Manual.

The exact format of the comments being outputted may change in the near future. Particularly, we may add a marker to assist the tools generating lintian.d.o to find the comments. Hopefully we will be able to close #512901 soon.

Beyond the changes above, I am very happy to see the bug count drop below 180 unfixed bugs[1]. We have not been this low there since the 2.5.0~rc1 upload (in February) and I hope we can keep it this time. :)

Finally, if you ever wanted to run your own little "lintian.d.o", then you may find the "Lintian harness" proposal interesting. The basic idea is to clean up the current tools and make a proper frontend out of them. Should you be interested, feel free to come with suggestions or requests.

[1] Lintian bug page

I use the number next to "Outstanding" under "Status" near the bottom. So "pending" bugs are considered "fixed" in my numbering.

Updates in the Lintian World

Documentation!

That is what the next version of Lintian will have. In 2.5.1, Lintian
had support for adding options in the lintianrc. Unfortunately, the
user had to guess the syntax, which makes the feature less useful.
With the next release, the manpage and the example config file[1]
will contain documentation and examples.

The User Manual[2] also saw some improvements. Particularly, it will
mention that lintian can process a .changes file and that it does
cross-package checks. Of course, it also documents the format of the
new Vendor Profiles. Though even with these improvements, the current
documentation still leaves much to be wanted.

The last "unpack" script (unpack-srcpkg-l1) was finally removed and
was replaced by a smarter version of index collection. This means
that Lintian can now schedule index in parallel with a few other
collections (namely unpacked, which is the current bottleneck).

The speed improvement is probably not noticeable, but it was also a
good time to upgrade our index collection to properly handle
multi-tarball source packages. Previously Lintian only extracted the
index from the "base" tarball.

The fields collection also saw some attention. Previously it
extracted fields from the control file (in the .deb, etc.) into a
directory. Each field became a file and the value of the field would
be written in that file. Today, the collection basically looks like
this:

[...]
if (-d 'fields'){
    system('rm', '-rf', 'fields') == 0
        or die 'Could not remove old fields directory';
}
[...]

It was deprecated the minute the Lintian::Collect API saw a little
upgrade to fetch these fields itself. This change should remove some
unnecessary file I/O in a normal run[3].

If you have a friend with his/her own little home-grown Lintian check
(or collection)[4], they will have to update it to A) not depend on
fields, B) use the Lintian::Collect API for accessing fields and C)
depend on index for source packages if it uses the index information.
For the latter, Lintian::Collect has grown support for source package
index.

Lintian has also seen a lot of decrufting in the 2.5.2 development
cycle. The new test suite now features skeletons ("standard base
environments") and template support for all package test suites[5].

This means it takes a lot less writing (or copy/waste) to generate
a simple specialized test in the low-level test suites. As a part
of cleaning up the test suites, I got to run:

$ git rm t/{source,debs}/*/copyright

without breaking a single test. I have to admit that commit 8584996
is among my favorite ones in the 2.5.2 development cycle.

Finally, if you are one of the few that have ever used the
--packages-file option, then I am pleased to inform you that it will
be deprecated in 2.5.2 and replaced by --packages-from-file. The
old option had some specialized format, whereas the new option parses
the file (or stdin) as "one package per line". The latter being far
more "find -name '*.deb' | "-friendly.

In case you are wondering why --packages-file is deprecated, here is
the packages-file parser from lintian 2.5.1:

while (my $line = <$pkgin>) {
    chomp($line);
    my (undef, undef, undef, $file) = split(/\s+/, $line, 4);
    $pool->add_file($file);
}
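To make the difference between the two options concrete, here is a rough Python sketch of both readers. It is not Lintian code: the function names are hypothetical, the first three fields in the old-format sample are invented (the parser above only shows that they are ignored), and skipping blank lines in the new format is my own assumption.

```python
import io


def read_old_format(stream):
    """Roughly mirror the 2.5.1 parser: keep only the fourth field of each line."""
    files = []
    for line in stream:
        fields = line.rstrip("\n").split(None, 3)
        if len(fields) == 4:
            files.append(fields[3])
    return files


def read_new_format(stream):
    """One package file per line, e.g. the output of find -name '*.deb'."""
    return [line.rstrip("\n") for line in stream if line.strip()]


old = io.StringIO("b foo 1.0 ./foo_1.0_all.deb\n")       # invented sample line
new = io.StringIO("./foo_1.0_all.deb\n./bar_2.0_all.deb\n")
print(read_old_format(old))
print(read_new_format(new))
```

The old reader silently throws away three fields per line, which is a good hint that the format was only ever meant for some internal tool.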

[1] Installed as /etc/lintianrc

[2] Installed as /usr/share/doc/lintian/lintian.{html/,txt.gz}

[3] Namely, 2 opens + 2 closes + 1 read and one write for each field
in every package (the fields were cached, once they were read). Not
to mention a directory listing and having to remove the files again
later.

[4] Of course, you do not have such a thing... Because we all know
that no one (except this "not too close" friend of yours) messes with
Internal APIs of other projects...

[5] The exception being that the "changes" suite does not have a
standard base environment, since it would be completely unused.

Status on Lintian and Vendor Profiles

During the past week I finally got around to giving the Vendor Profiles branch some attention. I suspect the most exciting change is that I got support for non-overridable tags.

I also made --ftp-master-rejects an alias for the ftp-master profile. This means that you will get the same behavior from Lintian and from the ftp-master auto-reject mechanism. The only issue is that the overrides are silently ignored.

We also dropped the "default" profile symlink in favor of using dpkg-vendor to find the "best" profile. This causes a slight conflict with solving #626476. I have asked the dpkg maintainers if that part of the libdpkg-perl API is stable enough for us to use.

Finally I fixed an issue where profiles would ignore all display settings (--display-info etc.) and just show all tags (except experimental tags).

Beyond the vendor profile branch, there are also a number of interesting changes applied in the master branch.

Together with some people from #debian-python (ScottK and jwilk as I recall), I prepared some python related goodies. Particularly we fixed a false-positive missing-python-build-dependency with python3 and marked dh_pycentral and dh_python as deprecated. On a related note, we have also removed the tag uses-dh-python-with-no-pycompat.

Based on feedback from Eric Lavarde, we redefined the classpath-contains-relative-path tag. It now only emits the "bad" parts of the Classpath, and relative entries are allowed if they exist or point to /usr/share/java/$file.jar and there is a strong libX-java dependency. While the change breaks overrides, according to lintian.d.o we do not have any yet.

The next release will feature a check for non-empty dependency_libs fields in la files and a fix to a flaw in logic for architecture dependent overrides.

We will be reducing our dependency on dpkg-dev to a Suggests (#626476). Currently xz-utils is in the pseudo-essential set, but it may not stay that way. So in the future you may need to explicitly install xz-utils for Lintian to support .xz source packages.

Last, but not least, if you have felt that the lintianrc file has been rather useless to you, then you are not alone. I have made a prototype patch to improve the situation and hopefully soon you get to specify default options in your lintianrc.

Lintian 2.5.0 - Overrides and other changes

In the new version of Lintian there have been some more changes to overrides as well as a couple of other changes.

First off, we fixed a false-positive with the embedded-library tag. Lintian would incorrectly use the source-field of a binary package, when figuring out if a package was the "real library" and not an offender embedding the library. However, this field is allowed to contain the version of the source package (if you remember Policy 5.6.1) and Lintian did not correctly cope with that.

Speaking of embedded libraries, we have accepted a (series of) patch(es) from Marcelo Jorge Vieira to detect a number of embedded versions of the jQuery javascript library (libjs-jquery-*).

Antonio Terceiro also gave a couple of patches to make Lintian more accurate with the recent changes for Ruby packages. Furthermore we also added a new experimental tag to catch duplicate files in /usr/share/doc.

Lintian 2.5.0 also modifies the syntax and semantics of the overrides file. In 2.5.0 and newer all "*" after the tag name are wildcards; previously they only acted as wildcards at the beginning or the end of the text after the tag.
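As an illustration of the new semantics (every "*" in the text after the tag name matches anything), here is a small Python sketch. It is not Lintian's implementation, which is written in Perl; `override_matches` is a hypothetical helper and the sample paths are made up.

```python
import re


def override_matches(pattern, extra):
    """Return True if extra matches pattern, where each "*" matches any text."""
    # Escape everything literally, then let ".*" stand in for each "*".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, extra, flags=re.DOTALL) is not None


print(override_matches("usr/lib/*/foo.so*", "usr/lib/x86_64-linux-gnu/foo.so.1"))
print(override_matches("usr/lib/*/foo.so", "usr/bin/foo.so"))
```

Note that a "*" in the middle of the text is what makes the first pattern match; under the old semantics only the leading and trailing positions would have worked.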

The "Multi-Arch"-aware reader might have noticed that this is not enough for packages marked "Multi-Arch: same". Some packages might emit different tags on different architectures and all files in the package (incl. the override file) must be byte-for-byte identical if the path is the same on all architectures.

Andreas Beckmann proposed a solution that we have accepted, namely architecture dependent overrides. With 2.5.0 and newer you can specify that a certain override is only for certain architectures. The parser is currently somewhat naive and forgiving, so it does not support architecture wildcards and it will not check that the architectures are valid. Now you get to do overrides like this:

# We like our code without pic on x86, thank you
[i386]: shlib-with-non-pic-code

If you had not guessed it, we use the same format as is used in the Build-Depends field (except for the lack of wildcard support). So you should be familiar with it. :)
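For the curious, a deliberately simplified Python sketch of how such a line could be parsed and applied follows. It only handles the "[archs]: tag extra" shape shown above (no package name or package type prefix), and the handling of negated architectures ("!i386") is an assumption borrowed from the Build-Depends syntax rather than something I have verified against Lintian. All names are hypothetical.

```python
import re

# Optional "[arch ...]:" prefix, then the tag, then optional extra text.
OVERRIDE_RE = re.compile(
    r"^(?:\[(?P<archs>[^\]]*)\]\s*:\s*)?(?P<tag>\S+)(?:\s+(?P<extra>.*))?$"
)


def parse_override(line):
    """Parse one override line; return None for blank lines and comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = OVERRIDE_RE.match(line)
    if match is None:
        return None
    archs = match.group("archs")
    return {
        "archs": archs.split() if archs is not None else None,
        "tag": match.group("tag"),
        "extra": match.group("extra") or "",
    }


def applies(override, arch):
    """Return True if the override is active on the given architecture."""
    archs = override["archs"]
    if archs is None:
        return True                      # no architecture list: applies everywhere
    negated = [a[1:] for a in archs if a.startswith("!")]
    if negated:
        return arch not in negated       # assumed Build-Depends-style negation
    return arch in archs


override = parse_override("[i386]: shlib-with-non-pic-code")
print(override["tag"], applies(override, "i386"), applies(override, "amd64"))
```

Like the real parser, this sketch does not check that the architecture names are valid.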

Note that the syntax of the overrides should be backwards compatible, so unlike the 2.5.0~rc1 upload, your overrides should still work!

A little heads up for people going to DebConf11: We will do a Lintian BoF again this year, yay!

Update on Lintian 2.5.0~rc3

While we are waiting for Lintian 2.5.0~rc3, here is a little update on what you can look forward to.

About a week ago I had the pleasure of merging my branch into the Lintian master branch. While the IRC bot DOS'ed the #debian-qa channel[SRY], cross-package checks became part of the official Lintian.

Beyond the reduced amount of binary-without-manpage false-positives, circular-dependency checking and a partial broken symlink detector, the 2.5.0~rc3 release will also feature java and multiarch checks (kudos to Vincent Fourmond and Steve R. Langasek, respectively).

The new java checks are based on the Java Policy and currently mostly cover checking for possible build misconfigurations (jars without class files etc.), weird classpaths and missing Main-Class/depends on jarwrapper for executable jars. Note that #539315 has not been fixed, so executable jars will still trigger a warning.

For people wanting to upgrade their libraries to multiarch, Lintian will have a check to see if you remember the Pre-Depends on multiarch-support. Personally I have not followed the progress of multiarch too closely, but last I checked multiarch-support was not in unstable. I guess we are ahead on this one. :)

Other highlights include "dont install stuff in /run", armhf support, fewer false positives on missing B-D on dh_python{2,3} and recognition of 3.9.2 as the latest version of the Policy Manual.

I am also planning on starting a new branch and this time I intend to work on Vendor-based customization of Lintian. If you have any ideas/suggestions, by all means follow up on the thread on the Lintian list.

[SRY] Sorry to anyone present in the channel!

Lintian 2.5.0~rc1 (and newer) breaks overrides

This is actually old news, but I suspect some people have forgotten or maybe do not understand to what extent. We decided to break overrides in 2.5.0~rc1[1], as explained in Raphael's email to d-d-a@l.d.o.

As the email mentions there were two breakages - the tag merges and the file prefix change. Especially the latter seems to be causing confusion (or at least pings/questions/emails for me). The long story short:

All overrides listing files with a prefix of "./" or "/" will not work as of 2.5.0~rc1

If you are curious as to why, please have a look at #534940.

Fixing the override is simply a matter of removing the "./" or "/" prefix(es). If you discover a tag that still uses "./" or "/" as file name prefix, please file a bug (assuming someone did not beat you to it).

I am not entirely sure if the FTP-masters have updated the copy of Lintian that handles auto-rejects[2], so you may have to keep the old/broken override if it is for a non-fatal auto-reject tag. This is especially true for the embedded-zlib tag that was merged into embedded-library.

[1] On a related note, this is why this is 2.5.0~rc1 and not 2.4.4.

[2] I also do not know when they plan to do it if they have not already done so, but Tolimar has informed me that 2.5.0~rc2 is in squeeze-backports. >.>

More fun with cross package checks

About a week ago, I wrote about doing cross package checks in Lintian. At that time I had only played with looking up manpages in direct dependencies, which I hope will greatly improve the current binary-without-manpage stats (around 677 packages and 2750ish tags incl. overrides).

Today, I decided to take on another challenge: detection of circular dependencies between binary packages (created from the same source). We already have a little library for building dependency relationships; however I ended up not using it, since it has an important limitation. It cannot tell how the circular dependencies relate to each other.

Let me demonstrate what I mean with an example. Suppose we have a couple packages that depend on each other like this:

pkg-core => pkg-A, pkg-D

# circle 1: A, B, C
pkg-A => pkg-B
pkg-B => pkg-C
pkg-C => pkg-A

# circle 2: D, E, F
pkg-D => pkg-E
pkg-E => pkg-F
pkg-F => pkg-D

In this case, our current tool will be able to tell that all packages (possibly except root - I did not check) are in a circular relationship. However, we can find better results by pretending we are trying to find Strongly Connected Components (actually that is exactly what we are trying to do here).

So today, I implemented Tarjan's algorithm in Perl to solve this particular issue and committed it to my Lintian branch. If this branch is merged into the Lintian master branch, we can close #316283.
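The Lintian implementation is in Perl, but for illustration here is a compact Python sketch of Tarjan's algorithm applied to the example graph above. The function and variable names are my own, and the recursion is only suitable for small graphs like this one.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm; graph maps each node to a list of its successors."""
    index = {}          # discovery order of each visited node
    lowlink = {}        # smallest index reachable from the node's subtree
    stack = []
    on_stack = set()
    components = []

    def visit(node):
        index[node] = lowlink[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for succ in graph.get(node, ()):
            if succ not in index:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index[succ])
        if lowlink[node] == index[node]:
            # node is the root of a component: pop its members off the stack
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(component)

    for node in graph:
        if node not in index:
            visit(node)
    return components


GRAPH = {
    "pkg-core": ["pkg-A", "pkg-D"],
    "pkg-A": ["pkg-B"], "pkg-B": ["pkg-C"], "pkg-C": ["pkg-A"],
    "pkg-D": ["pkg-E"], "pkg-E": ["pkg-F"], "pkg-F": ["pkg-D"],
}

# Components with more than one member are the circular dependencies.
circles = [sorted(c) for c in strongly_connected_components(GRAPH) if len(c) > 1]
print(circles)
```

Unlike a plain "is this package part of some circle" check, this reports the two circles separately and leaves pkg-core out, since nothing depends back on it.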

Cross package checks in Lintian

If you have been using Lintian you may also know that it has some limitations. One of these is that Lintian always checks each package in isolation, unlike other tools such as piuparts. This has its advantages (such as not requiring all dependencies to be present) but it also has its disadvantages.

Currently Lintian emits a "binary-without-manpage"-tag if the package contains a binary without a manpage. At first glance this appears reasonable, but it really is too simple (#512645).

Basically Lintian complains about the missing manpage even if the manpage will always be present. A simple case is package foo depends on foo-data, the binary is in foo and the manpage is in foo-data. A live example of this is gedit and gedit-common[1].

The problem is that Lintian does not know if a given package is going to be processed or not. Lintian does have a few cross package checks; namely binary checks can see the source package if it is available. This can be done with a bit of clever sorting (quote from Lintian source code):

# [...] (the sort is to make sure that source packages are
# before the corresponding binary packages--this has the advantage that binary
# can use information from the source packages if these are unpacked)

This works rather well, but its scope is very limited. Particularly it is no help, if we need to check if the manpage missing in foo might be in foo-data. This is what #513663 is all about.

In January I branched the Lintian code and started working on grouping packages from the same source together. Yesterday I finished refactoring Lintian to do just that.

However, this in itself is not very useful if the checks cannot take advantage of this. So today I sat down and extended our manpages check a bit and posted the results to the Lintian mailing list.
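The idea behind the extended check can be sketched in a few lines of Python. This is not the actual Lintian check (which is Perl and works on unpacked packages); the data structure, the function name and the sample group are all hypothetical, and only direct dependencies within the same group are considered.

```python
def binaries_without_manpage(group):
    """group maps package name -> dict with 'binaries', 'manpages', 'depends' sets.

    Returns {package: [binaries lacking a manpage]} after also looking in the
    direct dependencies that are part of the same group.
    """
    missing = {}
    for name, pkg in group.items():
        available = set(pkg["manpages"])
        for dep in pkg["depends"]:
            if dep in group:                 # only packages processed together
                available |= group[dep]["manpages"]
        lacking = sorted(b for b in pkg["binaries"] if b not in available)
        if lacking:
            missing[name] = lacking
    return missing


GROUP = {
    "foo": {"binaries": {"foo", "foo-helper"}, "manpages": set(),
            "depends": {"foo-data"}},
    "foo-data": {"binaries": set(), "manpages": {"foo"}, "depends": set()},
}
print(binaries_without_manpage(GROUP))
```

Here the foo binary is no longer reported, because its manpage ships in foo-data, while foo-helper still is.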

[1]http://packages.debian.org/sid/gedit

http://packages.debian.org/sid/gedit-common
