About the Debian GNU/Linux port for OpenRISC or1k

In my previous post I mentioned my involvement with the OpenRISC or1k port. It was the technical activity on which I spent the most time during 2014 (within Debian and otherwise, day job aside).

I thought that it would be nice to talk a bit about the port for people who don't know about it, and to give an update for those who do know and care. So this post explains a bit how it came to be, gives details about its development, and finally describes the current status. It is written as a rather personal account, since I did not get involved deeply enough in the OpenRISC community at large to learn much about its internal workings or about aspects that I was not directly involved with.

There is not much information about all of this elsewhere, only bits and pieces scattered here and there, and in particular there is very little public information about the development of the Debian port. There is an OpenRISC entry in the Debian wiki, but it does not contain much information yet. Hopefully, this piece will help a bit to preserve the history and provide some insight for future porters.

First Things First

I imagine that most people reading this post will be familiar with the terminology, but just in case: to create a new Debian port means to get a Debian system (the GNU/Linux variant, in this case) to run on the OpenRISC or1k computer architecture.

Setting aside all the differences between hardware and software, and as described on their site:

“The aim of the OpenRISC project is to create free and open source computing platforms”

It is therefore a good match for the purposes of Debian and the Free Software world in general.

The processor has not been produced in silicon, or at least it is not available to the masses. People with the necessary know-how can download the hardware description (Verilog) and synthesise it on an FPGA, or otherwise use simulators. It is not a piece of hardware that people can purchase yet, and there are no plans to mass-produce it in the near future either.

The two people (including me) involved in this Debian port did not have the hardware, so we created the port entirely by cross-compiling from other architectures and then compiling inside Qemu. In a sense, we were creating a Debian port for hardware that "does not [physically] exist". The software that we built was tested by people who had the hardware available on FPGAs, though, so it was at least usable. I understand that people working on the arm64 port had to work similarly in the initial phases, working in the dark without access to real hardware to compile or test.

The Spark

The first time that I heard about the initiative to create the port was in late February of 2014, in a post which appeared on Linux Weekly News (sent by Paul Wise) and on Slashdot. The original post announcing it was actually from late January, by Christian Svensson (blueCmd):

“Some people know that I've been working on porting Glibc and doing some toolchain work. My evil master plan was to make a Debian port, and today I'm a happy hacker indeed!

Below is a link to a screencast of me installing Debian for OpenRISC, installing python2.7 via apt-get (which you shouldn't do in or1ksim, it takes ages! (but it works!)) and running a small Python script. http://asciinema.org/a/7362”

So, now, what can a Debian Hacker do when reading this? (Even if one's Hackery Level is not that high, as is my case.) And well, How Hard Can It Be? I mean, Really?

Well, in my own defence, I knew that the answer to the last two questions would be a resounding Very. But for some reason the idea grabbed me and I couldn't help but think that it would be a Really Exciting Project, and that somehow I would like to get involved. So, after considering it for a few days, I wrote to Christian around mid-March offering my help, and he welcomed me aboard.

The Ball Was Already Rolling

Christian had already been in contact with the people behind DebianBootstrap, and he had already created the repository http://openrisc.debian.net/ with many packages of the base system and beyond (read: packages name_version_or1k.deb available to download and install). Even today the packages are not signed with proper keys, though, so use your judgement if you want to try them.

After a few weeks, I got up to speed with the status of the project and got my system working with the necessary tools. This basically meant sbuild/schroot to compile new packages, the base system that Christian had already got working installed in a chroot (probably with the help of debootstrap), and qemu-system-or1k to emulate the system.
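For people unfamiliar with this kind of setup, creating a chroot for an architecture with no user-mode emulation goes roughly like this. This is only a sketch; the paths, suite name and mirror layout here are my assumptions rather than what we actually ran:

# first stage on the host: unpack the or1k base packages without running
# any or1k binaries (exact mirror path assumed)
sudo debootstrap --arch=or1k --foreign sid /srv/chroot/or1k-sid http://openrisc.debian.net/
# the second stage has to run on or1k itself, i.e. with the unpacked tree
# placed inside the disk image booted by qemu-system-or1k:
/debootstrap/debootstrap --second-stage

With something like that in place, schroot and sbuild can be pointed at the resulting system.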

Only a few of the packages differed from the versions in Debian, like gcc, binutils or glibc -- their or1k support had not been upstreamed yet. sbuild ran through qemu-system-or1k, so new packages could be compiled "natively" (running inside Qemu) rather than cross-compiled, pulling _or1k.deb packages for dependencies from the repository that Christian had prepared, and _all.deb packages from snapshots.debian.org.

I started by trying to get the packages that I [co-]maintain in Debian compiled for this architecture, creating the corresponding _or1k.deb. For most of them, though, I needed many dependencies compiled before I could even compile my packages.

The GNU autotools / autoreconf Problem

From very early on, many of the packages failed to build with messages such as:

Invalid configuration 'or1k-linux-gnu': machine 'or1k' not recognized
configure: error: /bin/bash ../config.sub or1k-linux-gnu failed

What this means is that software packages based on GNU autotools and using configure scripts ship copies of the files config.sub and config.guess in their root directory, and those copies need to be recent enough to recognise the target architecture and generate the build accordingly.

This is counter-intuitive, taking into account that GNU autotools were designed to help with portability. Yet, in the case of creating new Debian ports, it meant that unless upstream shipped very recent versions of config.{guess,sub}, the package would not compile straight away on the new architectures -- even though simply invoking gcc would have worked without problems in most cases for native compilation.
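To see the contrast with the error quoted above: an up-to-date copy of config.sub, such as the one the autotools-dev package installs under /usr/share/misc/, recognises the triplet just fine (the output below is what a recent enough copy prints, assuming it already knows about or1k):

$ /usr/share/misc/config.sub or1k-linux-gnu
or1k-unknown-linux-gnu

whereas the years-old copy shipped inside many source packages simply bails out with the error shown earlier.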

Of course this did not only affect or1k, and there was already the autoreconf effort underway as a way to update these files automatically when building Debian packages, pushed by the people porting Debian to the new architectures added in 2013/2014 (mips64el, arm64, ppc64el), who had encountered the same roadblock. This affected around a thousand source packages in unstable. A Royal Pain. Also, none of their reverse dependencies (packages that needed those in order to be built) could be compiled straight away.

The bugs were not Release Critical, though (none of these architectures were officially accepted at the time), so for people not concerned with the new ports there was no big incentive to get them fixed. This problem, which conceptually is easily solvable, prevented the new ports from even attempting to compile vast portions of the archive straight away (cleanly, without modifications to the package or to the host system) for weeks or months.

The GNU autotools / autoreconf Solution

We tackled this problem mainly in two ways.

The first, more useful for Debian in general, was to do as the other porters were doing and submit bug reports and patches asking Debian packages to use autoreconf, and to NMU packages (upload changes to the archive without the official maintainers' intervention). A few NMUs were made for packages which had had bug reports with patches available for a while, which were in the critical path to get many other packages compiled, and which were orphaned or had almost no maintainer activity.
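For a package already using the dh sequencer, the change requested in those bug reports was usually tiny; a minimal sketch (assuming the debhelper and dh-autoreconf of that era) is to add dh-autoreconf to Build-Depends in debian/control and enable it in debian/rules:

#!/usr/bin/make -f
%:
	dh $@ --with autoreconf

With that, the autotools files are regenerated at build time and, for most autotools-based packages, current copies of config.{guess,sub} are pulled in from the build system, so the package no longer relies on upstream shipping recent versions.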

The people working on the other new ports, and mainly the Ubuntu people who helped with some of those ports and wanted to support them, had submitted a large number of requests since late 2013, so there was no shortage of NMUs to be made. Some porters, not being Debian Developers, could not easily get their changes applied; so I also helped the porters of the other architectures a bit, especially later on before the Jessie freeze, to get as many packages compiled on those architectures as possible.

The second way was to create dpkg-buildpackage hooks that unconditionally updated config.{guess,sub} before attempting to build the package on the local build system. This local and temporary solution allowed us in the or1k port to get many _or1k.deb packages into the experimental repository, which in turn allowed many more packages to compile. After a few weeks, I set up several sbuild instances on a multi-core machine, continuously attempting to build packages that had not been built before and whose dependencies were available. Every now and then (typically several times per day at peak times) I pushed the resulting _or1k.deb files to the repository, so that more packages would have the necessary dependencies available to attempt a build.
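The config.{guess,sub} hack itself did not need to be anything sophisticated; a minimal sketch of that kind of local, temporary hook (not the exact script we used) is simply to overwrite the files in the unpacked source with the current copies shipped by autotools-dev:

#!/bin/sh
# run from the root of the unpacked source, before the build starts:
# replace every config.sub and config.guess with the up-to-date copies
# that the autotools-dev package installs under /usr/share/misc/
set -e
find . -name config.sub -o -name config.guess | while read -r f; do
    cp -v "/usr/share/misc/$(basename "$f")" "$f"
done

Crude, but it was only ever meant as a stopgap until the packages themselves switched to autoreconf.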

Christian was doing something similar, and by April, at peak times, the two of us were compiling more than a hundred packages on some days -- a huge number of packages did not need any change other than up-to-date config.{guess,sub} files. At some point in late April, Christian set up wanna-build on a few hosts to do this more elegantly and smartly than my method, and more effectively as well.

Ugly Hacks, Bugs and Shortcomings in the Toolchain and Qemu

Some packages are extremely important because many other packages need them to compile (like cmake, Qt or GTK+), and they are themselves very complex and have dependency loops. They had deeper problems than the autoreconf issue and needed some seriously dirty hacking to get them built.

To try to get as many packages compiled as possible, we sometimes built these important packages with some functionality disabled, disabling some binary packages (e.g. Java bindings) or, especially, disabling documentation (using DEB_BUILD_OPTIONS=nodoc when possible, and more aggressively when needed by removing chunks of debian/rules). I tried to use the more aggressive methods in as few packages as possible, though -- about a dozen in total. We also used DEB_BUILD_OPTIONS=nocheck to speed up compilation and avoid build failures -- many packages' test suites failed because qemu-system-or1k did not support multi-threading, which we could do nothing about at the time, but otherwise the packages mostly passed their tests fine.
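For reference, DEB_BUILD_OPTIONS is just an environment variable that debian/rules honours, so using it amounts to something like this when building by hand from within the unpacked source:

DEB_BUILD_OPTIONS="nodoc nocheck" dpkg-buildpackage -us -uc -b

and a rules file that supports nocheck typically guards its test suite with a check along these lines:

ifeq (,$(findstring nocheck,$(DEB_BUILD_OPTIONS)))
	$(MAKE) check
endif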

Due to bugs and shortcomings in Qemu and the toolchain -- like the compiler lacking atomics, missing functionality in glibc, Qemu entering endless loops, or programs segfaulting (especially gettext, which is used by many packages and caused them to fail to build) -- we had to resort to some very creative workarounds, or to time-consuming, dull work editing debian/rules, or to creating wrappers around the real programs to avoid or force certain options (like forcing gcc -O0, since -O2 produced buggy binaries too often).
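As an illustration of the kind of wrapper involved (a sketch, not the actual script, and the path of the "real" compiler is made up here), forcing -O0 can be done with a small shell script placed ahead of the real gcc in $PATH:

#!/bin/sh
# drop any optimisation level requested by the package and force -O0,
# because -O2 produced broken binaries too often at the time
for arg; do
    shift
    case "$arg" in
        -O|-O0|-O1|-O2|-O3|-Os) ;;     # drop optimisation flags
        *) set -- "$@" "$arg" ;;       # keep everything else
    esac
done
exec /usr/bin/gcc.real "$@" -O0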

To avoid having a mix of cleanly compiled and hacked packages in the same repository, Christian set up a two-tiered repository system -- the clean one and the dirty one. In the dirty one we dumped all of the packages that we got built, no matter how. The packages in the clean one could use packages from the dirty one to build, but they themselves were compiled without any hackery. Of course this was not a completely airtight solution, since they could contain code injected at build time from the "dirty repository" (e.g. by static linking), and perhaps other quirks. We hoped to get rid of these problems later by rebuilding all packages against clean builds of all their dependencies.

Christian also spent significant amounts of time working within the OpenRISC community, debugging problems, and testing and recompiling special versions of the toolchain that we could use to advance in our compilation of the whole archive. There were other people in the OpenRISC community implementing the necessary bits in the toolchain, but I don't know the details.

Good Progress

Olof Kindgren wrote the OpenRISC health report April 2014 (actually posted in May), explaining the status at the time of the projects in the broader OpenRISC community, and talking about the software side, Debian port included. Sadly, I think that there have been no more "health reports" since then. There was also a new post on Slashdot entitled OpenRISC Gains Atomic Operations and Multicore Support shortly thereafter.

On the side of the Debian port, from time to time new versions of packages entered unstable and we started to use those newer versions. Some of them had nice fixes, like the autoreconf updates, so they no longer required local modifications. In other cases, the new versions failed to build where the old ones had worked (e.g. because the newer versions added support for, and dependencies on, new versions of gnutls, systemd or other packages not yet available for or1k), and we had to repeat or create more nasty hacks to get the packages built again.

But in general, progress was very good. There were about 10k arch-dependent packages in Debian at the time, and we got about half of them compiled by the beginning of May, counting clean and dirty. And, if I recall correctly, there were around the same number of arch=all packages (which can be installed on any architecture, once the package has been built on one of them). Counting both, it meant that systems using or1k had about 15k packages available, or 75% of the whole Debian archive (at least of "main"; we excluded "contrib" and "non-free"). Not bad.

By the middle to end of May, we had about 6k arch-dependent packages compiled, and 4k to go. The count eventually reached ~6.6k at its peak (in June/July, I think). Many had been built with hacks and not yet rebuilt cleanly, but everything was going well until the number of packages built plateaued.

Plateauing

There were multiple reasons for that. One of them is that, after we had fixed the autoreconf issue locally in some packages, new versions were uploaded to Debian without that problem being fixed (in many cases there was no bug report or patch yet, so it was understandable; in other cases the requests were ignored). The wanna-build for the clean repository set up by Christian rightly considered the package out of date and prepared to build the more recent version, which then failed. Then, other packages entering the unstable archive and build-depending on newer versions of those could not be built ("BD-Uninstallable") until we had built the newer versions of the dependencies in the dirty repository with local hacks. Consequently, the count of cleanly built packages went back and forth, when not backwards.

More challenging was the fact that, when creating a new port, language compilers which are written in that same language need to be bootstrapped for that architecture first. Sometimes it is not the compiler itself, but the compile-time or run-time support for a language's modules that has not been ported yet. Obviously, as with other dependencies, large amounts of packages written in those languages are bound to remain uncompiled for a long time. As Colin Watson explained when porting Haskell's GHC to arm64 and ppc64el, untangling some of the chicken-and-egg problems of language compilers for new ports is extremely challenging.

Perl and Python are pretty much prerequisites of the base Debian system, and Christian got them working early on. But, for example, in May 247 packages depended on r-base-dev (GNU R) for building and 736 on ghc, and we did not have these dependencies compiled. Just counting those two, about 1k of the remaining 4k to 5k source packages to be compiled for the new architecture would have to wait for a long time. Then there was Java, Mono, etc...

Even more worrying were the pending issues with the toolchain, like atomics in glibc, or make check failing for some packages in the clean repository built with wanna-build. Christian continued to work on the toolchain and on liaising with the rest of the OpenRISC community; I continued to request more changes to the Debian archive through a few more requests to use autoreconf, and by pushing a few more NMUs. Though many requests were attended to, I soon got some negative replies and reactions, and backed off a bit. In the meantime, the porters of the other new architectures at the time were mostly submitting requests to support them and not NMUing much either.

Upstreaming

Things continued more or less in the same state until the end of the summer.

The new ports needed as many packages built as possible before the evaluation of which ports to accept officially (in early September, I think, with the final decision around the time of the freeze). Porters of the other new architectures (and maintainers, and other helpful Debian Developers) were by then more active in pushing for changes, especially for the remaining autoreconf issues, many of which benefited or1k. As I said before, I also kept pushing NMUs now and then, especially during the summer, for packages which were not of immediate benefit for our port but helped the others (e.g. ppc64el needed updates to libtool's ltmain.sh which were not necessary for or1k, in addition to config.{guess,sub}).

In parallel, in the or1k camp, there were patches that needed to be sent upstream, like the one for Python's NumPy, which I submitted in May to the Debian package and upstream, and which was uploaded to Debian in September with a new upstream release. Similar paths were followed between May and September for packages such as jemalloc, ocaml, gstreamer0.10, libgc, mesa, X.org's cf module and cmake (patch created by Christian).

In April, Christian had reached the amazing milestone of tracking down all of the contributors of the GNU binutils port and getting them to assign copyright to the Free Software Foundation (FSF); all of that work was refreshed and upstreamed. In July or August, he started to gather information about the contributors of the GCC port, which had been started more than a decade earlier.

After that, nothing much happened (from the outside) until the end of the year, when Christian sent a message to the OpenRISC community about the status of upstreaming GCC. Only one person remained who had yet to assign copyright to the FSF, but it was a blocker. In addition, there was the need to find one or more maintainers to liaise with upstream, review the patches, fix the remaining failures in the test suite and keep the port in good shape. A few months later, from what I can gather, the status remains the same.

Current Status, and The Future?

In terms of the Debian port, there have not been huge visible changes since the end of the summer, and not only because of the Jessie freeze.

It seems that for this effort to keep moving forward and be sustainable, sorting out the issues with GCC and Glibc is essential. That means having the toolchain completely pushed upstream and in good shape, and in particular completing the copyright assignment. Debian will not accept private forks of those essential packages without a very good reason, even for unofficially supported ports; and from the point of view of porters, working on the remaining not-yet-built packages while there are continuing problems in the toolchain is very frustrating and time-consuming.

Other than that, there is already a significant amount of software available that can run on an or1k system, so I think that overall the project has achieved a significant degree of success. Granted, KDE and LibreOffice are not available yet, and neither are the tools depending on Haskell or Java. But a lot of software is available (including things high in the stack, like Xfce), and in many respects it should provide a much more functional system than the ones available in Linux (or other free software) systems in the late 1990s. If the blocking issues are sorted out in the near future, the effort needed to get a very functional port, on par with the other unofficial Debian ports, should not be that big.

In my opinion, and looking at the big picture, that is not bad at all for an architecture whose hardware implementation is not easy to come by, and for which the port was created almost solely with simulators. That it has been possible to get this far with such meagre resources is an amazing feat of Free Software, and of Debian in particular.

As for the future, time will tell, as usual. I will try to keep you posted if there are any significant changes.