Best practices for database applications

version 20050711-1

copyright © 2004 sean finney <seanius@debian.org>

This document is licensed under the Academic Free License, Version 2.1

Abstract

This draft describes a set of guidelines and best practices to be implemented by the maintainers of database application packages. Pending a final draft, desire, and acceptance by the developer community at large, this may serve as the foundation for an official policy--or it may simply remain as it is.

Database applications

In this document, a "database application" is any program that relies on some form of data storage outside the scope of the program's execution. This is primarily intended to encompass applications that rely on a relational database server or on their own persistent storage mechanism, though in effect the definition covers a much larger set of applications. In the future this scope may have to be narrowed to avoid ambiguity and to be more effective as a policy.

Database types and placement

For the purposes of this document, there are two types of databases: "persistent" and "cached".

Persistent databases contain data that cannot be entirely reconstituted if the database is removed. Also included are databases whose removal would cause serious denial of service (making a system unstable or unusable) or security concerns. Applications using this category of databases are the primary focus of this document. Examples:

Cached databases are databases that could be sufficiently regenerated after their removal, and that could be removed without causing serious denial of service or security concerns. Examples include:

Both of these database types fall under already defined guidelines in the FHS: persistent data must be placed under /var/lib/packagename, and cached data under /var/cache/packagename. The remainder of this document primarily addresses the former.
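The placement rule above can be sketched in a few lines of Python. This is only an illustration: the function name data_dir and the package name "examplepkg" are hypothetical and not part of any existing tool.

```python
from pathlib import PurePosixPath

# Root directories named in the text: /var/lib for persistent data,
# /var/cache for cached data.
_ROOTS = {
    "persistent": PurePosixPath("/var/lib"),
    "cached": PurePosixPath("/var/cache"),
}


def data_dir(package: str, kind: str) -> PurePosixPath:
    """Return the directory a package should use for the given database type.

    `kind` must be "persistent" or "cached" (the two types defined in
    this document); anything else is rejected.
    """
    if kind not in _ROOTS:
        raise ValueError(f"unknown database type: {kind!r}")
    return _ROOTS[kind] / package
```

For example, data_dir("examplepkg", "persistent") yields /var/lib/examplepkg, while the "cached" type yields /var/cache/examplepkg.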

Configuration

It must always be assumed that the local admin knows more than any automated system. The admin must be given the ability to opt out of any "assistance" on the part of the package maintainer. Packages providing any such automated assistance may do so by default if and only if the opt-out debconf prompt is priority high. With this in mind, directions for manually installing (and upgrading, if relevant) the database must be included in the documentation for the package.

Overview of package installation/upgrade/removal processes

The following descriptions are divided into different parts, based on the action being performed. For each process, the acceptable behavior of database application packages is outlined.

Database installation

For packages providing automated assistance, database installation/configuration should be considered as part of the package installation process. A failure to install a database should be considered a failure to install the package and should result in an error value returned by the relevant maintainer script. Packages may provide a "try again" option to re-attempt configuration. Any such "try again" features here or elsewhere mentioned in this document must have a default negative response value, otherwise infinite loops could occur for noninteractive installs.
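The interaction between "try again" and noninteractive installs can be illustrated with a small Python sketch. It is not how any particular package implements the prompt; install_with_retry, noninteractive_ask_retry and the two callables are hypothetical names used only for illustration. The point is that an unanswered prompt must evaluate to "no", so that a failing install terminates with an error instead of looping.

```python
from typing import Callable


def noninteractive_ask_retry() -> bool:
    """Stand-in for a "try again" prompt that nobody answers.

    The default response is negative, as required above.
    """
    return False


def install_with_retry(
    install: Callable[[], bool],
    ask_retry: Callable[[], bool] = noninteractive_ask_retry,
) -> int:
    """Run `install` until it succeeds or the admin declines to retry.

    Returns the exit status the maintainer script should report:
    0 on success, 1 if the database could not be installed.
    """
    while True:
        if install():
            return 0
        if not ask_retry():
            # A failure to install the database is a failure to
            # install the package.
            return 1
```

With the default negative answer, a failing install is attempted exactly once and the function returns a non-zero status.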

To properly handle package reinstallation and reconfiguration, any automated assistance must allow for a package to be reinstalled at the same version without removing or overwriting existing application data. Package reconfiguration, by contrast, may remove or overwrite existing data.
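A minimal sketch of this rule, assuming for simplicity that the database is a single file: the function install_database and its parameters are hypothetical, and a real package would create its database through whatever mechanism the database server provides.

```python
from pathlib import Path


def install_database(
    db_path: Path, initial_contents: bytes, reconfigure: bool = False
) -> bool:
    """Create the database unless one already exists.

    On a plain (re)install, existing data is left untouched. Only an
    explicit reconfiguration is allowed to overwrite it. Returns True
    if the database was written, False if existing data was kept.
    """
    if db_path.exists() and not reconfigure:
        return False
    db_path.parent.mkdir(parents=True, exist_ok=True)
    db_path.write_bytes(initial_contents)
    return True
```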

Database upgrading

Occasionally a new upstream version of an application will require modifications to be made to the application's underlying database. If an automated system is to assist in such an upgrade, it should be considered as a part of the package upgrade process; failure to upgrade the database should be considered a failure to upgrade the package, which is the only way to safely guarantee the chance to reattempt the upgrade.

Furthermore, any automated system that makes modifications to a database during upgrade must provide the ability to back up the database before proceeding. Packages may perform such backups automatically, or prompt the admin via debconf. Failure to back up the database should also be considered a failure in the upgrade process of the whole package. As in the case of installation, automated assistance may provide a "try again" feature to re-attempt the upgrade, but ultimately in the case of failure should cause a non-zero exit value to be returned to dpkg.
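The ordering described above (back up first, then upgrade, and treat a failure of either step as a failure of the whole upgrade) might look like the following Python sketch. It assumes a file-based database that can be backed up by copying; upgrade_with_backup and its parameters are hypothetical names.

```python
import shutil
from pathlib import Path
from typing import Callable


def upgrade_with_backup(
    db_path: Path, backup_path: Path, upgrade: Callable[[Path], None]
) -> int:
    """Back up the database, then run the upgrade.

    Returns 0 on success and 1 on any failure, so the caller can
    pass the value back to dpkg as the script's exit status.
    """
    try:
        shutil.copy2(db_path, backup_path)
    except OSError:
        # No backup means no upgrade: the database is not touched.
        return 1
    try:
        upgrade(db_path)
    except Exception:
        # The backup remains available for manual recovery.
        return 1
    return 0
```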

Note: if the database in question supports transactions, placing each upgrade script within a transaction will make upgrades much more robust.
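As an illustration of the transaction advice, here is a sketch using SQLite through Python's standard sqlite3 module. SQLite is chosen only because it is self-contained; the function apply_upgrade and the table names in the usage note are hypothetical. The connection is assumed to have been opened with isolation_level=None so that the transaction is controlled explicitly.

```python
import sqlite3
from typing import Iterable


def apply_upgrade(conn: sqlite3.Connection, statements: Iterable[str]) -> bool:
    """Run all upgrade statements inside a single transaction.

    `conn` is assumed to be opened with isolation_level=None, so no
    transaction is started or committed behind our back. Returns True
    if every statement succeeded and the transaction was committed,
    False if an error occurred and everything was rolled back.
    """
    cur = conn.cursor()
    cur.execute("BEGIN")
    try:
        for statement in statements:
            cur.execute(statement)
    except sqlite3.Error:
        # Undo every change made by this upgrade script.
        cur.execute("ROLLBACK")
        return False
    cur.execute("COMMIT")
    return True
```

If a script consisting of "ALTER TABLE users ADD COLUMN email TEXT" followed by a failing statement is applied this way, the new column is rolled back along with everything else, and the upgrade can be re-attempted from a known state.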

Database removal

A package should consider databases in a spirit similar to configuration files or log files: they are something the administrator may still need even when the software that created them is no longer present. However, when a package is removed, there should be an avenue for the data to disappear. Exactly how both of these needs should be achieved is still up for some debate, and will probably evolve.

Build-time and run-time tools

While not essential, a set of common tools for packaging and configuring these applications can make the life of the maintainer as well as the administrator much easier.

dbconfig-common: a common framework for packaging database applications

A work-in-progress project of this nature can be found at the dbconfig-common page.

dh_installdb

Is there a need for a build-time helper? Clumps of common code could be inserted into maintainer scripts via debhelper.

Related threads/discussions