Michael Stapelberg’s Debian Blog

Announcing the new Debian Code Search Instant (2014-12-03)

For the last few months, I have been working on a new version of Debian Code Search, and today it’s going live! I call it Debian Code Search Instant, for several reasons explained below.

A lot faster

The new Debian Code Search is still hosted by Rackspace (thank you!), but our new architecture finally allows us to use their “Performance Cloud Servers” hardware generation.

The old Code Search architecture was split across 5 different servers; however, the search itself was performed on only a single machine, using a network-attached block volume backed by SSDs. The new Code Search spreads both the trigram index and the source code across 6 different servers, and thanks to the new hardware generation, each of them has a locally connected SSD. As a result, we now get more than 6 times the IOPS we had before, at much lower latency :).
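A minimal sketch of the fan-out pattern this implies, in Go: every shard searches its own part of the index in parallel, and the partial results are merged by score. The `Shard` interface, the `Match` type, the scoring and the file paths are hypothetical stand-ins for illustration, not Code Search’s actual code.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Match is one search hit. This type is hypothetical; it only illustrates
// the fan-out pattern.
type Match struct {
	Path  string
	Line  int
	Score float64
}

// Shard is a hypothetical stand-in for one backend server that holds part of
// the trigram index and the source code it covers.
type Shard interface {
	Search(query string) []Match
}

// staticShard returns a fixed result set; it exists only for the example.
type staticShard []Match

func (s staticShard) Search(query string) []Match { return s }

// searchAll sends the query to every shard in parallel (scatter) and merges
// the partial results into one list sorted by descending score (gather).
func searchAll(shards []Shard, query string) []Match {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		all []Match
	)
	for _, s := range shards {
		wg.Add(1)
		go func(s Shard) {
			defer wg.Done()
			partial := s.Search(query)
			mu.Lock()
			all = append(all, partial...)
			mu.Unlock()
		}(s)
	}
	wg.Wait()
	sort.SliceStable(all, func(i, j int) bool { return all[i].Score > all[j].Score })
	return all
}

func main() {
	shards := []Shard{
		staticShard{{"foo/main.c", 10, 0.9}},
		staticShard{{"bar/init.c", 42, 0.7}, {"baz/main.c", 7, 0.95}},
	}
	for _, m := range searchAll(shards, "main") {
		fmt.Printf("%.2f %s:%d\n", m.Score, m.Path, m.Line)
	}
}
```

Because each shard reads from its own local SSD, the shards’ I/O happens in parallel instead of competing for a single network-attached volume.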

Grouping by Debian source package

The first feature request we ever got for Code Search was to group results by Debian source package. I’ve tried tackling that before, but it’s a pretty hard problem. With the new architecture and capacity, this finally becomes feasible.

After your query completes, there is a checkbox called “Group search results by Debian source package”. Enable it, and you’ll get neatly grouped results. For each package, there is a link at the bottom to refine the search to only consider that package.
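One way to group results is to do it after ranking, without running a second search. The sketch below assumes, hypothetically, that each result path starts with `<package>_<version>/`; it orders the groups by each package’s best-ranked hit and keeps the rank order within each group. The `Result` and `PackageGroup` types are illustrative, not Code Search’s own.

```go
package main

import (
	"fmt"
	"strings"
)

// Result is a single hit. The path layout "<package>_<version>/..." is an
// assumption made for this sketch.
type Result struct {
	Path string
	Line int
}

// PackageGroup holds all results that belong to one source package.
type PackageGroup struct {
	Package string
	Results []Result
}

// packageOf extracts the source package name from a result path, assuming the
// first path component has the form "<package>_<version>".
func packageOf(path string) string {
	dir, _, _ := strings.Cut(path, "/")
	name, _, _ := strings.Cut(dir, "_")
	return name
}

// groupByPackage groups ranked results by source package. Groups appear in
// the order of each package's best (first) result, and results keep their
// rank order within a group.
func groupByPackage(ranked []Result) []PackageGroup {
	index := make(map[string]int) // package name -> position in groups
	var groups []PackageGroup
	for _, r := range ranked {
		pkg := packageOf(r.Path)
		i, ok := index[pkg]
		if !ok {
			i = len(groups)
			index[pkg] = i
			groups = append(groups, PackageGroup{Package: pkg})
		}
		groups[i].Results = append(groups[i].Results, r)
	}
	return groups
}

func main() {
	ranked := []Result{
		{"foo_1.0-1/src/a.c", 3},
		{"bar_2.1-2/lib/b.c", 9},
		{"foo_1.0-1/src/c.c", 14},
	}
	for _, g := range groupByPackage(ranked) {
		fmt.Printf("%s: %d result(s)\n", g.Package, len(g.Results))
	}
}
```

Ordering the groups by their best hit keeps the most relevant packages at the top, so grouping does not bury a strong match behind a package with many weak ones.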

If you are more interested in the full list of packages, for example because you are doing large-scale changes across Debian, that’s also possible: click the ▾ symbol right next to the “Filter by package” list to reveal a curl command with which you can download the full list of packages.

Expensive queries (with lots of results) now possible

Previously, queries had to complete within a 60-second timeout. This timeout has been lifted completely, so long-running queries with tons and tons of results are now possible. We kindly ask you not to abuse this feature: deliberately provoking complexity explosions in the regular expression engine ties up the servers and prevents others from doing their work. If you’re interested in that sort of analysis, grab the source code and play with it on your own machine :).

Instant results

While your query is running, you will almost immediately see the top 10 results, even though the search may still take a while. This lets you check whether your regular expression matches what you expected before you wait for a 5-minute query to finish.
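One way to show the best 10 hits while the search is still running is a bounded min-heap: each new hit is compared against the weakest of the current top 10 and replaces it only if it scores higher. The `Hit` type, the scores and the `TopK` helper are hypothetical; this is a sketch of the technique, not necessarily how Code Search implements it.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// Hit is a scored search result; the type and the scores are hypothetical.
type Hit struct {
	Path  string
	Score float64
}

// minHeap keeps the lowest-scoring hit at the root, so the weakest of the
// current top k can be evicted in O(log k).
type minHeap []Hit

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i].Score < h[j].Score }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(Hit)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// TopK tracks the k best hits seen so far while results are still streaming in.
type TopK struct {
	k int
	h minHeap
}

// Add offers one hit; it is kept only if it beats the weakest current top hit.
func (t *TopK) Add(hit Hit) {
	if t.k <= 0 {
		return
	}
	if t.h.Len() < t.k {
		heap.Push(&t.h, hit)
		return
	}
	if hit.Score > t.h[0].Score {
		t.h[0] = hit
		heap.Fix(&t.h, 0)
	}
}

// Snapshot returns the current top hits, best first, without modifying the
// heap. A server could send this to the browser whenever it changes.
func (t *TopK) Snapshot() []Hit {
	out := append([]Hit(nil), t.h...)
	sort.Slice(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

func main() {
	top := &TopK{k: 10}
	for i := 0; i < 25; i++ {
		top.Add(Hit{Path: fmt.Sprintf("file%02d.c", i), Score: float64(i)})
	}
	best := top.Snapshot()
	fmt.Println(len(best), best[0].Path) // 10 file24.c
}
```

Each `Add` costs O(log k), so maintaining the top 10 stays cheap even when a query produces a very large number of hits.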

Also, the new progress bar shows how far along your search is.
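A progress bar across several servers needs some way to combine per-server status. A minimal sketch, assuming each shard reports how many of its candidate files it has searched so far (the `ShardProgress` report is hypothetical; the post does not say what the real progress bar measures):

```go
package main

import "fmt"

// ShardProgress is a hypothetical status report from one backend: how many of
// its candidate files it has searched so far, out of how many in total.
type ShardProgress struct {
	FilesSearched int
	FilesTotal    int
}

// overallProgress combines per-shard reports into one percentage. Weighting by
// file count (rather than averaging per-shard percentages) keeps a small,
// finished shard from overstating the overall progress.
func overallProgress(reports []ShardProgress) float64 {
	searched, total := 0, 0
	for _, r := range reports {
		searched += r.FilesSearched
		total += r.FilesTotal
	}
	if total == 0 {
		return 100 // nothing to search counts as done
	}
	return 100 * float64(searched) / float64(total)
}

func main() {
	reports := []ShardProgress{{FilesSearched: 50, FilesTotal: 100}, {FilesSearched: 10, FilesTotal: 300}}
	fmt.Printf("%.0f%%\n", overallProgress(reports)) // 15%
}
```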

Instant indexing

Previously, Code Search was redeployed from scratch to a new set of machines every week in order to update the index. This was necessary because performance degraded severely while the index was being built, so we temporarily ran twice the number of machines until the new version was up.

In the new architecture, we store an index for each source package and then merge these into one big index shard. This currently takes about 4 minutes with the code I wrote, but I’m sure this can be made even faster if necessary. So, whenever new packages are uploaded to the Debian archive, we can just index the new version and trigger a merge. We get notifications about new package uploads from FedMsg. Packages that are not seen on FedMsg for some reason are backfilled every hour.
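The post does not describe the index format, so the following is a simplified, hypothetical model: each package index maps trigrams to sorted file IDs, and merging shifts each package’s IDs by the number of files merged before it. The `PackageIndex` type, `buildIndex` and `mergeIndexes` are illustrative names, not Code Search’s code. The sketch shows why a merge can be cheap: no source file has to be re-read or re-tokenized.

```go
package main

import "fmt"

// Trigram is three consecutive bytes of source text.
type Trigram [3]byte

// PackageIndex is a simplified, hypothetical per-package index: a list of file
// names plus, for every trigram, the sorted IDs of the files containing it.
type PackageIndex struct {
	Files    []string
	Postings map[Trigram][]uint32
}

// buildIndex indexes the given files (contents looked up by name) in order.
func buildIndex(names []string, contents map[string]string) PackageIndex {
	idx := PackageIndex{Files: names, Postings: make(map[Trigram][]uint32)}
	for id, name := range names {
		seen := make(map[Trigram]bool)
		text := contents[name]
		for i := 0; i+3 <= len(text); i++ {
			var t Trigram
			copy(t[:], text[i:i+3])
			if !seen[t] {
				seen[t] = true
				idx.Postings[t] = append(idx.Postings[t], uint32(id))
			}
		}
	}
	return idx
}

// mergeIndexes combines per-package indexes into one shard. Each package's
// file IDs are shifted by the number of files merged before it, so IDs stay
// unique and every merged posting list remains sorted.
func mergeIndexes(parts []PackageIndex) PackageIndex {
	merged := PackageIndex{Postings: make(map[Trigram][]uint32)}
	for _, p := range parts {
		offset := uint32(len(merged.Files))
		merged.Files = append(merged.Files, p.Files...)
		for t, ids := range p.Postings {
			for _, id := range ids {
				merged.Postings[t] = append(merged.Postings[t], id+offset)
			}
		}
	}
	return merged
}

func main() {
	a := buildIndex([]string{"a/x.c"}, map[string]string{"a/x.c": "int main"})
	b := buildIndex([]string{"b/y.c", "b/z.c"}, map[string]string{"b/y.c": "main()", "b/z.c": "void"})
	shard := mergeIndexes([]PackageIndex{a, b})
	fmt.Println(shard.Postings[Trigram{'m', 'a', 'i'}]) // [0 1]
}
```

With this layout, a new upload only requires indexing that one package and rerunning the merge.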

The time between uploading a package and being able to find it in Debian Code Search therefore now ranges from a couple of minutes to about an hour, instead of about a week!
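The hourly backfill is what caps the delay at about an hour. A sketch of its core step, assuming hypothetical maps from source package name to version for the archive and for the current index: any package whose archive version differs from the indexed one, or that is not indexed yet, gets reindexed. How those maps are obtained is left out, and `missingUploads` is an illustrative name.

```go
package main

import (
	"fmt"
	"sort"
)

// missingUploads returns the source packages whose archive version differs
// from the indexed version (including packages not indexed at all), sorted by
// name. Both maps go from source package name to version.
func missingUploads(archive, indexed map[string]string) []string {
	var todo []string
	for pkg, version := range archive {
		if indexed[pkg] != version {
			todo = append(todo, pkg)
		}
	}
	sort.Strings(todo)
	return todo
}

func main() {
	archive := map[string]string{"foo": "1.0-2", "bar": "2.3-1", "baz": "0.9-1"}
	indexed := map[string]string{"foo": "1.0-1", "bar": "2.3-1"}
	fmt.Println(missingUploads(archive, indexed)) // [baz foo]
}
```

Packages removed from the archive would need a separate check; the sketch ignores them.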

New, beautiful User Interface

Since we needed to rewrite the user interface anyway because of the new architecture, we also spent some effort on making it modern, polished and beautiful.

Smooth CSS animations make you aware of what’s going on, and the search results look cleaner than ever.

Conclusion

At least to me, the new Debian Code Search seems like a significant improvement over the old one. I hope you enjoy the new features, into which I put a lot of work. If you have any feedback, I’m happy to hear it (see the “contact” link at the bottom of every Code Search page).