Multi-Version/Multi-Cluster PostgreSQL architecture

Solving a problem

When a new major version of PostgreSQL is released, it is necessary to dump and reload the database. The old software must be used for the dump, and the new software for the reload.

This was a major problem for Red Hat and Debian, because not every upgrade required a dump and reload, and by the time the need for a dump was realised, the old software might already have been deleted. Debian had procedures to save the old software and use it to do a dump, but they were unreliable and often went wrong. Red Hat's installation environment is so rigid that it is not practicable for the Red Hat packages to attempt an automatic upgrade. Debian offered a debconf choice for whether to attempt automatic upgrading; if it failed or was not allowed, a manual upgrade had to be done, either from a pre-existing dump or by manual invocation of the postgresql-dump script.

There was once an upstream program called pg_upgrade which could be used for in-place upgrading. This does not currently work and does not seem to be a high priority with upstream developers.

It is possible to run different versions of PostgreSQL simultaneously, and indeed to run the same version on separate database clusters simultaneously. To do so, each postmaster must listen on a different port, so each client must specify the correct port. With two separate versions of the PostgreSQL packages installed simultaneously, a database upgrade is simple: dump from the old version and reload into the new one. The PostgreSQL client wrapper is designed to permit this.
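As a rough illustration of the dump-and-reload path between two servers running side by side, the sketch below builds the two command lines involved. The helper names, the version numbers, the ports, and the bin subdirectory under the per-version directory are assumptions made for the example; pg_dumpall and psql both accept -p to select the server port.

```python
import subprocess


def migration_commands(old_version, old_port, new_version, new_port):
    """Return (dump_argv, restore_argv) for a dump/reload between two servers.

    Hypothetical helper: assumes binaries live in
    /usr/lib/postgresql/<version>/bin.
    """
    # The old software performs the dump ...
    dump = [f"/usr/lib/postgresql/{old_version}/bin/pg_dumpall",
            "-p", str(old_port)]
    # ... and the new software performs the reload (psql reads SQL from stdin).
    restore = [f"/usr/lib/postgresql/{new_version}/bin/psql",
               "-p", str(new_port), "-d", "postgres"]
    return dump, restore


def run_migration(dump, restore):
    """Pipe the dump straight into the new server; True if both succeed."""
    dumper = subprocess.Popen(dump, stdout=subprocess.PIPE)
    loader = subprocess.run(restore, stdin=dumper.stdout)
    dumper.stdout.close()
    return dumper.wait() == 0 and loader.returncode == 0
```

In practice the pg_upgradecluster program described later in this document is the intended way to do this; the sketch only shows why two distinct ports and two distinct sets of binaries are needed.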

General Architecture idea

The Debian packaging has been changed to create a new package for each major version. The criterion for creating a new package is that initdb is required when upgrading from the previous version. Thus, there are now source packages postgresql-8.1 and postgresql-8.3 (and similarly for all the binary packages).

The legacy postgresql and the other existing binary package names have become dummy packages depending on one of the versioned equivalents. Their only purpose is now to ensure a smooth upgrade and to register the existing database cluster to the new architecture. These packages will be removed from the archive as soon as the next Debian release after Sarge (Etch) is released.

Each versioned package installs into /usr/lib/postgresql/version. To let users easily select the right version and cluster when working, the postgresql-common package provides the pg_wrapper program, which reads the per-user and system-wide configuration files and forks the correct executable with the correct library versions according to those preferences. /usr/bin provides executables that are symbolic links to pg_wrapper.
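The dispatch idea can be sketched as follows. This is not the actual pg_wrapper code: the function names and the bin subdirectory are assumptions, and the sketch replaces the current process with os.execv, which is only one possible way to start the real program.

```python
import os

LIB_ROOT = "/usr/lib/postgresql"  # per-version install root named in the text


def target_executable(invoked_as, version):
    """Map the name the wrapper was called by to the versioned binary.

    Assumes binaries live in <LIB_ROOT>/<version>/bin (an assumption).
    """
    program = os.path.basename(invoked_as)
    return os.path.join(LIB_ROOT, version, "bin", program)


def wrapper_exec(argv, version):
    """Run the versioned program in place of the wrapper process."""
    target = target_executable(argv[0], version)
    os.execv(target, [target] + argv[1:])
```

Because every client program in /usr/bin is a link to the same wrapper, the name it was invoked under (argv[0]) is all that is needed to pick the program; the version comes from the configuration described below.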

This architecture also allows separate database clusters to be maintained for the use of different groups of users; these clusters need not all be of the same major version. This allows much greater flexibility for those people who need to make application software changes consequent on a PostgreSQL upgrade.

Detailed structure

Configuration hierarchy

/etc/postgresql-common/user_clusters maps users to clusters and default databases
$HOME/.postgresqlrc per-user preferences for default version/cluster and database; overrides /etc/postgresql-common/user_clusters
/etc/postgresql/version/clustername Cluster-specific configuration files:
  • postgresql.conf, pg_hba.conf, pg_ident.conf
  • optionally start.conf: startup mode of the cluster: auto (start/stop in init script), manual (do not start/stop in init script, but manual control with pg_ctlcluster is possible), disabled (pg_ctlcluster is not allowed).
  • optionally pg_ctl.conf: options to be passed to pg_ctl.
  • optionally a symbolic link log which points to the postmaster log file. Defaults to /var/log/postgresql/postgresql-version-cluster.log. Explicitly setting log_directory and/or log_filename in postgresql.conf overrides this.
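The three start.conf modes listed above amount to a small permission table. The following is a minimal sketch of that table, with hypothetical caller labels ("init" for the init script, "pg_ctlcluster" for manual control); it does not cover what happens when start.conf is absent.

```python
def action_allowed(mode, caller):
    """Decide whether `caller` may start or stop a cluster in `mode`.

    `caller` is "init" or "pg_ctlcluster" (labels invented for this sketch).
    """
    if mode == "auto":
        return True                       # init script and manual control
    if mode == "manual":
        return caller == "pg_ctlcluster"  # init script leaves it alone
    if mode == "disabled":
        return False                      # not even pg_ctlcluster
    raise ValueError(f"unknown start.conf mode: {mode!r}")
```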

Per-version files and programs

/usr/lib/postgresql/version version-specific program and data files
/usr/share/postgresql/version
/usr/share/doc/postgresql/postgresql-doc-version

Common programs

/usr/share/postgresql-common/pg_wrapper environment chooser and program selector
/usr/bin/program symbolic links to pg_wrapper, for all client programs
/usr/bin/pg_lsclusters list all available clusters with their status and configuration
/usr/bin/pg_createcluster wrapper for initdb, sets up the necessary configuration structure
/usr/bin/pg_ctlcluster wrapper for pg_ctl, controls the cluster postmaster server
/usr/bin/pg_upgradecluster upgrade a cluster to a newer major version
/usr/bin/pg_dropcluster remove a cluster and its configuration
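To make the /etc/postgresql/version/clustername layout concrete, here is a small sketch that enumerates clusters by walking such a tree. It is not how pg_lsclusters is implemented; the function name is hypothetical, and treating the presence of postgresql.conf as the marker of a cluster is an assumption of the example.

```python
import os


def list_clusters(config_root):
    """Return sorted (version, clustername) pairs found under config_root.

    A directory counts as a cluster if it contains postgresql.conf
    (an assumption made for this sketch).
    """
    clusters = []
    for version in sorted(os.listdir(config_root)):
        version_dir = os.path.join(config_root, version)
        if not os.path.isdir(version_dir):
            continue
        for name in sorted(os.listdir(version_dir)):
            conf = os.path.join(version_dir, name, "postgresql.conf")
            if os.path.isfile(conf):
                clusters.append((version, name))
    return clusters
```

Calling list_clusters("/etc/postgresql") on a system with an 8.1 and an 8.3 cluster, both named main, would yield those two pairs.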

psql

We have abandoned the old, non-standard behaviour of aborting with an error when no connection database is specified; psql is not expected to be run directly, and all connection parameters should be provided by pg_wrapper as described above. In addition, if no explicit default database is specified in user_clusters, the default database is the one named after the user, which restores the default upstream behaviour.
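The precedence rules can be summarised in a few lines. The sketch below works on already-parsed entries rather than on the files themselves, since the file formats are not described here; the function name and the dictionary keys are hypothetical, and it assumes a per-user entry replaces the system entry as a whole.

```python
def resolve_target(user, system_entry=None, user_entry=None):
    """Pick (version, cluster, database) for `user`.

    Each entry is a dict with keys "version", "cluster" and optionally
    "database" (key names invented for this sketch). user_entry stands
    for $HOME/.postgresqlrc, system_entry for user_clusters.
    """
    # Per-user preferences override the system-wide mapping.
    entry = user_entry or system_entry
    if entry is None:
        raise LookupError(f"no cluster configured for user {user!r}")
    # With no explicit database, fall back to the user name.
    database = entry.get("database") or user
    return entry["version"], entry["cluster"], database
```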

/etc/init.d/postgresql-version

This script now handles the postmaster server processes for each cluster. However, most of the actual work is done by the new pg_ctlcluster program.

pg_upgradecluster

This new program replaces postgresql-dump (a Debian-specific program).

It is used to migrate a cluster from one major version to another.

Usage: pg_upgradecluster [-v newversion] version name [data_dir]

-v newversion specifies the version to upgrade to; defaults to the newest available version.
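For scripted use, the usage line above translates into an argument list as in this sketch. The helper name and the sample values are assumptions; only the argument order is taken from the usage line.

```python
def upgrade_command(version, name, new_version=None, data_dir=None):
    """Build the argv for pg_upgradecluster following its usage line."""
    argv = ["pg_upgradecluster"]
    if new_version is not None:
        argv += ["-v", new_version]  # omit to get the newest available version
    argv += [version, name]
    if data_dir is not None:
        argv.append(data_dir)        # optional trailing data directory
    return argv
```

For example, upgrade_command("8.1", "main", new_version="8.3") corresponds to the command line pg_upgradecluster -v 8.3 8.1 main.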

The Debian PostgreSQL developers