Biological databases

The Debian community has not yet addressed the problem of incorporating biological databases into Debian, and it seems unlikely that this will happen in the Debian main distribution. Maintaining copies of, e.g., the EMBL DNA sequence database on multiple Debian mirrors would place a heavy burden on the mirrors for little gain. Moreover, not everybody requires regular updates, which bring additional network traffic and can cause instability when files are exchanged while scripts are still reading the data. A set of tools that provides updates on demand seems a more likely scenario. Such tools would also need to manage updates to the indices used by, e.g., sequence similarity tools.
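One way an on-demand update tool might avoid the instability mentioned above is to write the new data to a temporary file and then rename it over the old one. The following is only a minimal sketch of that idea, not an existing Debian tool: the function name `install_atomically` and the file layout are assumptions made for illustration. On POSIX systems the rename is atomic, so a script that already has the old file open keeps reading the old contents, while scripts started later see the new file.

```python
import os
import tempfile


def install_atomically(new_bytes: bytes, target_path: str) -> None:
    """Replace target_path with new_bytes without exposing a partial file.

    Hypothetical helper for illustration; not part of any Debian package.
    """
    directory = os.path.dirname(os.path.abspath(target_path))
    # Create the temporary file in the same directory so that the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".update-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_bytes)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data is on disk first
        # mkstemp creates the file with mode 0600; a real tool would
        # probably adjust the permissions here before publishing it.
        os.replace(tmp_path, target_path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A real update tool would apply the same pattern to the derived index files, rebuilding them next to the old ones and swapping them in only once they are complete.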

For now, Debian offers libraries such as BioPerl, whose facilities for accessing online repositories circumvent the problem of keeping local data up to date. Debian is nevertheless well suited to address the issue because of its means of introspection: a programmer can tell which databases are installed and which files are available.
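The kind of introspection meant here can be illustrated with the package manager itself. The sketch below assumes that a database has been installed as an ordinary Debian package and asks `dpkg-query -L` for the files that belong to it. The helper names and the package name `example-biodb` are hypothetical; no such database package is implied to exist.

```python
import subprocess
from typing import List


def parse_file_listing(listing: str) -> List[str]:
    """Keep only the path lines from `dpkg-query -L` output.

    Lines that do not start with "/" (for example diversion notes)
    are skipped.
    """
    paths = []
    for line in listing.splitlines():
        line = line.strip()
        if line.startswith("/"):
            paths.append(line)
    return paths


def files_of_package(package: str) -> List[str]:
    """Return the paths owned by an installed package, or [] if unknown."""
    try:
        result = subprocess.run(
            ["dpkg-query", "-L", package],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # dpkg-query is not available, so this is not a Debian-like system.
        return []
    if result.returncode != 0:
        # The package is not installed.
        return []
    return parse_file_listing(result.stdout)


if __name__ == "__main__":
    # "example-biodb" is a made-up package name used only for illustration.
    for path in files_of_package("example-biodb"):
        print(path)
```

An empty result tells a program that the database is not installed locally, in which case it could fall back to an online repository.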

Andreas Tille 2005-05-13