Unfortunately I could not go on stage at the 31st Chaos Communication Congress to present reproducible builds in Debian alongside Mike Perry from the Tor Project and Seth Schoen from the Electronic Frontier Foundation. I've tried to make it up for it, though… and we have made amazing progress.

Wiki reorganization

What was a massive and frightening wiki page now looks really more welcoming:

Screenshot of ReproducibleBuilds on Debian wiki

Depending on what one is looking for, it should be much easier to find. There's now a high-level status overview given on the landing page, maintainers can learn how to make their packages reproducible, enthusiasts can more easily find what can help the project, and we have even started writing some history.

.buildinfo for all packages

New year's eve saw me hacking Perl to write dpkg-genbuildinfo. Similar to dpkg-genchanges, it's run by dpkg-buildpackage to produce .buildinfo control files. This is where the build environment, and hash of source and binary packages are recorded. This script, integrated with dpkg, replace the previous debhelper interim solution written by Niko Tyni.

We used to fix mtimes in control.tar and data.tar using a specific addition to debhelper named dh_fixmtimes. To better support the ALWAYS_EXCLUDE environment variable and for pragramtic reasons, we moved the process in dh_builddeb.

Both changes were quickly pushed to our continuous integration platform. Before, only packages using dh would create a .buildinfo and thus eventually be considered reproducible. With these modifications, many more packages had their chance… and this shows:

Growing amount of packages considered reproducible

Yes, with our experimental toolchain we are now at more than eighty percent! That's more than 17200 source packages!

srebuild

Another big item on the todo-list was crossed over by Johannes Schauer. srebuild is a wrapper around sbuild:

Given a .buildinfo file, it first finds a timestamp of Debian Sid from snapshot.debian.org which contains the requested packages in their exact versions. It then runs sbuild with the right architecture as given by the .buildinfo file and the right base system to upgrade from, as given by the version of the base-files package version in the .buildinfo file. Using two hooks it will install the right package versions and verify that the installed packages are in the right version before the build starts.

Understanding problems

Over 1700 packages have now been reviewed to understand why build results could not be reproduced on our experimental platform. The variations between the two builds are currently limited to time and file ordering, but this still has uncovered many problems. There are still toolchain fixes to be made (more than 180 packages for the PHP registry) which can make many packages reproducible at once, but others like C pre-processor macros will require many individual changes.

debbindiff, the main tool used to understand differences, has gained support for .udeb, TrueType and OpenType fonts, PNG and PDF files. It's less likely to crash on problems with encoding or external tool. But most importantly for large package, it has been made a lot faster, thanks to Reiner Herrmann and Helmut Grohne. Helmut has also been able to spot cross-compilation issues by using debbindiff!

Targeting our efforts

It gives warm fuzzy feelings to hit the 80% mark, but it would be a bit irrelevant if this would not concern packages that matter. Thankfully, Holger worked on producing statistics for more specific package sets. Mattia Rizzolo has also done great work to improve the scripts generating the various pages visible on reproducible.debian.net.

All essential and build-esential packages, except gcc and bash, are considered reproducible or have patches ready. After some lengthy builds, I also managed to come up with a patch to make linux build reproducibly.

Miscellaneous

After my initial attempt to modify r-base to remove a timestamp in R packages, Dirk Eddelbuettel discussed the issue with upstream and came up with a better patch. The latter has already been merged upstream!

Dirk's solution is to allow timestamps to be set using an external environment variable. This is also how I modified FontForge to make it possible to reproduce fonts.

Identifiers generated by xsltproc have also been an issue. After reviewing my initial patch, Andrew Awyer came up with a much nicer solution. Its potential performance implications need to be evaluated before submission, though.

Chris West has been working on packages built with Maven amongst other things.

PDF generated by GhostScript, another painful source of troubles, is being worked on by Peter De Wachter.

Holger got X.509 certificates signed by the CA cartel for jenkins.debian.net and reproducible.debian.net. No more scary security messages now. Let's hope next year we will be able to get certificates through Let's Encrypt!

Let's make a difference together

As you can imagine with all that happened in the past weeks, the #debian-reproducible IRC channel has been a cool place to hang out. It's very energizing to get together and share contributions, exchange tips and discuss hardest points. Mandatory quote:

* h01ger is very happy to see again and again how this is a nice
         learning circle...! i've learned a whole lot here too... in
         just 3 months... and its going on...!

Reproducible builds are not going to change anything for most of our users. They simply don't care how they get software on their computer. But they care to get the right software without having to worry about it. That's our responsibility, as developers. Enabling users to trust their software is important and a major contribution, we as Debian, can make to the wider free software movement. Once Jessie is released, we should make a collective effort to make reproducible builds an highlight of our next release.