Numbers and Lintian
Lintian uses some small scripts to "collect" data from packages. In everyday talk, they are usually called "collection" scripts. Lintian uses files to track the status of these collection scripts between runs. Consider the following directory listing on lintian.debian.org (lintian.d.o):
$ ls -al laboratory/binary/eclipse/.*-*
-rw-r--r-- 1 user group 45 Jul 18 01:21 laboratory/binary/eclipse/.ar-info-1
-rw-r--r-- 1 user group 45 Jul 18 01:21 laboratory/binary/eclipse/.bin-pkg-control-1
-rw-r--r-- 1 user group 45 Jul 18 01:21 laboratory/binary/eclipse/.changelog-file-1
-rw-r--r-- 1 user group 45 Jul 18 01:21 laboratory/binary/eclipse/.copyright-file-1
-rw-r--r-- 1 user group 45 Jul 18 01:21 laboratory/binary/eclipse/.debian-readme-1
-rw-r--r-- 1 user group 45 Jul 18 01:21 laboratory/binary/eclipse/.doc-base-files-1
-rw-r--r-- 1 user group 45 Jul 18 01:21 laboratory/binary/eclipse/.fields-1
[...]
This example shows that a bunch of collection scripts have been run for the eclipse binary package. Each of these files contains the version of Lintian that created it (or last wrote it) and a timestamp.
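To make the naming scheme concrete, here is a minimal Python sketch that reads such a package directory and reports which collections have finished. It assumes the file names follow the pattern ".<collection>-<version>" seen in the listing; the function name finished_collections is hypothetical and is not part of Lintian.

```python
import re
import tempfile
from pathlib import Path

# Assumed naming scheme, inferred from the listing: ".<collection>-<version>".
# The greedy ".+" backtracks so that only the final "-<digits>" is the version.
STATUS_FILE = re.compile(r"^\.(?P<coll>.+)-(?P<ver>\d+)$")


def finished_collections(pkg_dir):
    """Map collection name -> collection version for each status file found."""
    found = {}
    for entry in Path(pkg_dir).iterdir():
        match = STATUS_FILE.match(entry.name)
        if match and entry.is_file():
            found[match.group("coll")] = int(match.group("ver"))
    return found


# Usage: a throwaway directory standing in for laboratory/binary/eclipse.
with tempfile.TemporaryDirectory() as lab:
    for name in (".ar-info-1", ".bin-pkg-control-1", "control"):
        (Path(lab) / name).touch()
    done = finished_collections(lab)

print(done)
```

Files without the leading dot (such as "control" here) are ignored, so unpacked package data living next to the status files does not confuse the scan.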
Why the interest in these files? Let us go back in time to the 15th of June 2011, the day Lintian 2.5.1 was uploaded to unstable. That version of Lintian had 17 collection scripts[1] for binary packages. So every binary package would have 17 of these files, and there are... over 35 000 binary packages in the Debian archive.
Ouch, that makes about 595 000 files (17 times 35 000) if we do this for all binary packages in the archive. Each of those files is about 45 bytes, so together they hold about 25.5 MiB[2]. So apart from the "inode abuse", this is not too bad. A little du -h should confirm this...
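The back-of-the-envelope numbers can be checked with a few lines of Python. This sketch uses 35 000 as a round lower bound for "over 35 000" packages, and the 25.5 figure comes out when the megabytes are binary ones (MiB, 2^20 bytes).

```python
PACKAGES = 35_000   # "over 35 000" binary packages, used as a lower bound
COLLECTIONS = 17    # binary collection scripts in Lintian 2.5.1
FILE_SIZE = 45      # bytes of content in each status file

files = PACKAGES * COLLECTIONS          # one status file per package per collection
content_bytes = files * FILE_SIZE       # content only, no file-system overhead
content_mib = content_bytes / 2**20     # binary megabytes

print(f"{files} files, {content_mib:.1f} MiB of content")
```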
$ du -h laboratory/binary/eclipse/.*-*
4.0K laboratory/binary/eclipse/.ar-info-1
4.0K laboratory/binary/eclipse/.bin-pkg-control-1
4.0K laboratory/binary/eclipse/.changelog-file-1
4.0K laboratory/binary/eclipse/.copyright-file-1
4.0K laboratory/binary/eclipse/.debian-readme-1
4.0K laboratory/binary/eclipse/.doc-base-files-1
4.0K laboratory/binary/eclipse/.fields-1
[...]
Whaaa... oh, the file system uses a block size of 4 KiB, so each of these files occupies a full block. Let's see what that gives: 595 000 times 4 KiB is... 2.27 GiB...
Oops!
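The jump from megabytes to gigabytes is just block rounding: a file's data is stored in whole blocks, so a 45-byte file costs as much as a 4096-byte one. Below is a simplified Python model of that rounding, assuming the 4 KiB block size from the text and ignoring details such as metadata blocks or sparse files; allocated_bytes is an illustrative helper, not a real API.

```python
BLOCK_SIZE = 4096  # block size of the lintian.d.o file system, per the text


def allocated_bytes(size, block_size=BLOCK_SIZE):
    """Bytes of data blocks needed for a file of `size` bytes.

    Simplified model: every started block is paid for in full.
    """
    blocks = -(-size // block_size)  # ceiling division without floats
    return blocks * block_size


files = 595_000
per_file = allocated_bytes(45)         # a 45-byte file still takes a whole block
total_gib = files * per_file / 2**30   # binary gigabytes

print(f"{per_file} bytes per file, {total_gib:.2f} GiB in total")
```

Compared with the 45 bytes of actual content, roughly 99% of each block is wasted.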
That was in the (not too distant) past. About a month later (12th of July), the code creating these files was refactored into:
sub _mark_coll_finished {
    my ($self, $collname, $collver) = @_;
    # In the "old days" we would also write the Lintian version and the time
    # stamp in these files, but since we never read them it seems like overkill.
    # - for the timestamp we could use the mtime of the file anyway
    return touch_file "$self->{base_dir}/.$collname-$collver";
}
This turns out to be a very space-saving change if we ask du -h:
$ du -h laboratory/binary/lintian/.*-*
0 laboratory/binary/lintian/.ar-info-1
0 laboratory/binary/lintian/.bin-pkg-control-1
0 laboratory/binary/lintian/.changelog-file-1
0 laboratory/binary/lintian/.copyright-file-1
0 laboratory/binary/lintian/.debian-readme-1
0 laboratory/binary/lintian/.doc-base-files-1
[...]
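For readers less at home in Perl, here is a rough Python analogue of the same idea: an empty marker file records that a collection ran, and its mtime doubles as the timestamp. Because an empty file has no content, ext3 does not allocate a data block for it, which is why du reports 0. The names mark_coll_finished and coll_finished_at are hypothetical stand-ins, not Lintian's actual API.

```python
import tempfile
from pathlib import Path


def _marker(base_dir, collname, collver):
    # Same naming scheme as the Perl code: "<base_dir>/.<collname>-<collver>".
    return Path(base_dir) / f".{collname}-{collver}"


def mark_coll_finished(base_dir, collname, collver):
    """Record a finished collection by creating (or touching) an empty file."""
    marker = _marker(base_dir, collname, collver)
    marker.touch()  # creates an empty file, or only updates mtime if it exists
    return marker


def coll_finished_at(base_dir, collname, collver):
    """Return the marker's mtime (the 'timestamp'), or None if it is missing."""
    try:
        return _marker(base_dir, collname, collver).stat().st_mtime
    except FileNotFoundError:
        return None


# Usage: mark one collection as done and query two.
with tempfile.TemporaryDirectory() as lab:
    marker = mark_coll_finished(lab, "ar-info", 1)
    size = marker.stat().st_size              # no content is ever written
    stamp = coll_finished_at(lab, "ar-info", 1)
    missing = coll_finished_at(lab, "fields", 1)
```

Note that touching an existing file leaves its content alone, which matches the observation that old non-empty files are not emptied.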
Existing files were not emptied, so some old non-empty files are still left on lintian.d.o. Nevertheless, that was the "nice story" about "the side effect of being lazy sometimes reduces space waste"[3].
But we have another "number" problem. If you grep for "Too many links" in lintian.log on lintian.d.o, you should see:
$ grep -i "too many links" logs/lintian.log
mkdir: cannot create directory `/srv/lintian.debian.org/laboratory/binary/aces3': Too many links
mkdir: cannot create directory `/srv/lintian.debian.org/laboratory/binary/boinc-cgi-stripchart': Too many links
mkdir: cannot create directory `/srv/lintian.debian.org/laboratory/binary/libbrailleutils-java': Too many links
mkdir: cannot create directory `/srv/lintian.debian.org/laboratory/binary/libbrailleutils-java-doc': Too many links
mkdir: cannot create directory `/srv/lintian.debian.org/laboratory/binary/libclippoly0': Too many links
mkdir: cannot create directory `/srv/lintian.debian.org/laboratory/binary/libclippoly-dev': Too many links
mkdir: cannot create directory `/srv/lintian.debian.org/laboratory/binary/xul-ext-cookie-monster': Too many links
mkdir: cannot create directory `/srv/lintian.debian.org/laboratory/binary/cucumber': Too many links
[...]
$ grep -i "too many links" logs/lintian.log | wc -l
3274
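Counting the lines is easy with grep; getting the list of affected packages takes a little parsing. The following Python sketch pulls the package names out of log lines in the format shown above. It assumes the quoting style of that output (a backtick or straight quote before the path, a straight quote after it); other coreutils versions may quote differently. packages_hit_by_link_limit is a hypothetical helper.

```python
import re

# Matches the mkdir error format in the log excerpt; case-insensitive like grep -i.
FAILED_MKDIR = re.compile(
    r"mkdir: cannot create directory [`'](?P<path>[^']+)': Too many links",
    re.IGNORECASE,
)


def packages_hit_by_link_limit(log_lines):
    """Return the package names whose lab directory could not be created."""
    names = []
    for line in log_lines:
        match = FAILED_MKDIR.search(line)
        if match:
            # The package name is the last component of the laboratory path.
            names.append(match.group("path").rstrip("/").rsplit("/", 1)[-1])
    return names


sample = [
    "mkdir: cannot create directory "
    "`/srv/lintian.debian.org/laboratory/binary/aces3': Too many links",
    "an unrelated log line",
    "mkdir: cannot create directory "
    "`/srv/lintian.debian.org/laboratory/binary/cucumber': Too many links",
]
affected = packages_hit_by_link_limit(sample)
```

On the real log, this would be fed the lines of logs/lintian.log, for example via open(...) in a with block.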
In the Lintian Laboratory, every package is unpacked into a directory based on its type and name (as hinted in the output above). The problem is that ext3 limits the number of subdirectories a directory can have[4]. This limit is just shy of 32 000 and (as a reminder) Debian has over 35 000 binary packages. So a lot of packages are currently not checked on lintian.d.o (#641468).
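Where does "just shy of 32 000" come from? A directory's link count is 2 (its name in its parent plus its own ".") plus one for the ".." entry of every subdirectory, and ext3 caps the link count at 32 000. The Python sketch below turns that into numbers, again assuming exactly 35 000 packages as a lower bound; the constant and helper names are illustrative, and the st_nlink relationship holds on ext3 but not on every file system.

```python
import os

LINK_MAX = 32_000  # ext3's hard-link limit per inode, as described in the text


def max_subdirs(link_max=LINK_MAX):
    # Two links are taken by the directory's own name and its "." entry;
    # every subdirectory adds one more through its ".." entry.
    return link_max - 2


def subdir_headroom(path, link_max=LINK_MAX):
    """How many more subdirectories `path` can take before mkdir fails.

    Assumes st_nlink == 2 + number of subdirectories, as on ext3.
    """
    return link_max - os.stat(path).st_nlink


packages = 35_000
unplaced = max(0, packages - max_subdirs())  # packages that cannot get a directory

print(f"at most {max_subdirs()} subdirectories, {unplaced} packages left out")
```

With "over 35 000" packages the real shortfall is larger than this lower bound, which fits the 3274 failures in the log.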
Footnotes:
[1] Actually it had 18 collections, but one of them is "auto-removed", so it is better not to include it here.
[2] Content alone; metadata such as permissions is ignored.
[3] I admit it was more luck than intention!
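[4] The limit comes from ext3's limit of 32 000 hard links per inode. Every subdirectory's ".." entry is a hard link to its parent, and the parent's own "." entry and its name in its own parent count as well, so a directory can hold at most 31 998 subdirectories.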