Python optimization and profiling
For the past 5 days, I have been working on replacing a part of Britney (our beloved testing migration tool). The part I am trying to replace is her "installability tester", which is probably one of her most heavily used parts.
I know premature optimization is the root of all evil, but I felt compelled to be a bit "evil" this time. The reason is that a naïve algorithm could easily spend decades computing the result of our most complete tests (each consisting of roughly 2 * 35 000 packages).
Within a couple of days, I felt I had a reasonable candidate to try on the large tests. There, it ran with a noticeable speed regression and produced one differing result; the difference amounted to about 20-40 source packages and their binaries.
You may have noticed I wrote "difference" and not "regression": in the given test, our current implementation gives up at some point with an "AIEEE" error, so it is not obvious which result is the correct one. I therefore decided to turn my attention to the speed regression first.
Like any "good" programmer, I decided to use a profiler (cProfile) to find my bottlenecks. This turned out to be a good idea, as the bottleneck was not quite where I thought it was. I wish I could say I just patched the issue out, but... unfortunately not.
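For reference, the way I drive cProfile looks roughly like the sketch below; the package "universe" and the is_installable/run_test functions are dummy stand-ins for illustration, not Britney's actual code.

    import cProfile
    import pstats

    def is_installable(pkg, universe):
        # Dummy stand-in for the real installability check.
        return pkg in universe

    def run_test(universe):
        # Check every package once, roughly what a full test run does.
        return sum(is_installable(pkg, universe) for pkg in universe)

    universe = set(range(70_000))  # roughly 2 * 35 000 packages, as in the largest test

    profiler = cProfile.Profile()
    profiler.runcall(run_test, universe)

    # Print the ten most expensive call sites by cumulative time.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)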
I played around with various changes, such as using "implied" for loops rather than explicit ones (sketched below). Sometimes I would manage to cut the runtime in half, only to be unable to reproduce it later. Yesterday I finally realized what caused this: I was working in 3 different chroots, two 64-bit sid chroots[1] and a 32-bit stable chroot.
It turns out that my replacement is only slow in the stable chroot. As soon as I moved to a sid chroot, the runtime was more or less cut in half. I have not bothered to check whether it is the 32- vs 64-bit difference, the stable vs sid difference, or whether the Python profiler in stable is just slow[2]. I simply moved to my sid chroots for now and continued working.
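For the record, the kind of "implied" loop change I was playing with looks roughly like this toy sketch; the function names are made up for illustration and this is not Britney's actual code:

    def count_broken_explicit(packages, is_installable):
        # Explicit loop: iterate and count by hand.
        broken = 0
        for pkg in packages:
            if not is_installable(pkg):
                broken += 1
        return broken

    def count_broken_implied(packages, is_installable):
        # "Implied" loop: let sum() drive the iteration over a generator
        # expression; often a bit faster, though the gain varies.
        return sum(1 for pkg in packages if not is_installable(pkg))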
With that taken care of, I figured I would remind myself of what I was up against. On the largest test I have available, the profiler put my implementation at around 4:30 - 5:00 minutes. Regardless of my changes, I always got times in that range. I did manage to patch a bug in my code that reduced my diff to only 10 source packages (plus their binaries), at the price of 30 seconds.
So I was back to 5 minutes according to the profiler, but I noticed that my test runner disagreed. According to my test runner, my implementation had a runtime of approximately 3 minutes and 40 seconds. For comparison, the test runner puts the original implementation at 3 minutes and 20 seconds for that test.
In short, I have spent most of my day trying to optimize my code to compensate for the +20% overhead the profiler introduced.
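To see what the profiler itself costs, one can time the same workload with and without cProfile; here is a rough sketch, with a dummy workload standing in for the real test run:

    import cProfile
    import time

    def workload():
        # Dummy stand-in for the real installability run.
        return sum(i * i for i in range(2_000_000))

    # Plain wall-clock run, comparable to what my test runner measures.
    start = time.perf_counter()
    workload()
    plain = time.perf_counter() - start

    # The same workload under cProfile; the instrumentation is not free.
    start = time.perf_counter()
    cProfile.Profile().runcall(workload)
    profiled = time.perf_counter() - start

    print("plain:    %.2fs" % plain)
    print("profiled: %.2fs (overhead: %.0f%%)" % (profiled, 100 * (profiled / plain - 1)))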
Admittedly, my replacement still needs some speed improvements on a few other tests, where it has a factor-of-2 runtime regression. I also need to fix the last diff, which I now suspect is caused by my replacement.
[1] In case you were wondering, yes those chroots are on two different machines.
[2] Of course, it could be a combination of those as well... Anyhow, I am too lazy to research it at the moment.