Michael Stapelberg’s Debian Blog

How many packages does Debian Code Search contain? (2013-07-14)

The German computer magazine c't covered Debsources in its most recent issue (c't 16/2013). The article also states:

Debsources integriert auch eine Code-Suche, allerdings werden lediglich die
Quellen des Unstable-Zweigs durchsucht, der zirka ein Drittel des Quellcodes
von Debsources ausmacht.

This loosely translates to:

Debsources also integrates a code search engine, but it only searches the
sources of the unstable tree, which makes up roughly one third of the
sources that Debsources covers.

I suspect the author of the article arrived at this conclusion because Debsources reports about 400 GiB of source code, whereas Debian Code Search reports about 130 GiB, and 130 of 400 is roughly one third.

However, the statement still struck me as odd and as giving the wrong impression. I expected that packages don’t differ that much between stable, testing and unstable. So I fired up psql and pounded UDD until it revealed how many packages differ between stable, testing and unstable.

Conclusion

Debian Code Search covers 74% of wheezy, testing and unstable. Debsources also covers contrib and non-free, which explains the higher disk usage. In particular, there might be big blobs in non-free that account for a lot of disk space. Also, Debsources keeps sources around for a few weeks, whereas Debian Code Search only keeps the most recent snapshot.

Query


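-- Count the wheezy source packages whose version in stable matches the
-- version in testing once the Debian revision is ignored.
SELECT
  COUNT(*)
FROM
  (SELECT
     migrations.source,
     sources.version AS stable_version,
     testing_version,
     unstable_version
   FROM sources
   LEFT JOIN migrations
   USING (source)
   WHERE sources.release = 'wheezy') AS x
WHERE
  -- Strip a trailing "-" followed by digits and dots (the Debian revision)
  -- from both versions before comparing them.
  regexp_replace(stable_version::text, E'-([0-9.]+)$', '') = regexp_replace(testing_version::text, E'-([0-9.]+)$', '');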
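To illustrate what the comparison in the WHERE clause does, here is a minimal Python sketch that applies the same regular expression to a few version strings. The function names and the sample versions are made up for illustration and do not come from UDD. Python's re module stands in for PostgreSQL's regexp_replace here, on the assumption that both treat this simple pattern the same way.

```python
import re

# Same pattern as in the query: a hyphen followed by digits and dots,
# anchored at the end of the version string (the Debian revision).
DEBIAN_REVISION = re.compile(r"-([0-9.]+)$")


def strip_debian_revision(version: str) -> str:
    """Return the version without a trailing purely numeric Debian revision."""
    return DEBIAN_REVISION.sub("", version)


def same_without_revision(stable: str, testing: str) -> bool:
    """Mirror the WHERE clause: compare both versions after stripping."""
    return strip_debian_revision(stable) == strip_debian_revision(testing)


if __name__ == "__main__":
    # Hypothetical (stable, testing) version pairs, not real package data.
    pairs = [
        ("1.2.3-4", "1.2.3-7"),  # differ only in the Debian revision
        ("1.2.3-4", "1.2.4-1"),  # differ in the part before the revision
        ("1.5", "1.5"),          # no revision at all, left untouched
    ]
    for stable, testing in pairs:
        print(stable, testing, same_without_revision(stable, testing))
```

Note that the pattern only removes revisions consisting of digits and dots, so a revision such as "1ubuntu2" would be left in place and the versions compared as-is.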