best practices draft for web applications

version 20050407

copyright © 2004 sean finney <seanius@debian.org>

this document is licensed under the Academic Free License, Version 2.1

note

progress on this draft and related work is currently sidelined while i'm focusing on the best practices for database applications

abstract

as it stands now, there is no common set of rules and guidelines for the maintainers of packages that interact with databases, web applications, and web server modules. this results in a large amount of code duplication, inconsistent package behavior, possible unexpected data loss, and security concerns. a common infrastructure and set of guidelines are needed to raise the overall quality of such packages.

terms and conventions

httpd
any server providing http
apache
most popular web server for apps and add-ons
web application
suite of static/dynamic pages for common purpose
httpd modules/add-ons/extensions
httpd server enhancements (libapache_mod_*)
static page
pages requiring no server interpretation/execution
dynamically interpreted pages
pages requiring server interpretation to generate web content. most PHP applications are built on this type of content.
dynamically executed pages
similar to interpreted pages, but instead of interpretation the content is obtained by execution of a script or binary. most packages using a "cgi-bin" would fall under this category.
database
any server-side persistent data storage (relational or non-relational)
database application
any application interacting with a database

goals of this document

web applications

file system layout

different file types

FHS - The Filesystem Hierarchy Standard

web applications should follow the same guidelines as any other software. most specifically, they should not assume anything about how the administrator has arranged the file hierarchy beyond what the FHS specifies, and so should not place files in locations such as /var/www or /usr/local.

the following table should serve as a guideline for the location of files:

type of file                          location
static web pages                      /usr/share/PACKAGE*
dynamically interpreted web pages     /usr/share/PACKAGE*
dynamically executed web pages        /usr/lib/cgi-bin/PACKAGE**
locally modified/overridden content   /usr/local/share/PACKAGE*
site configuration                    /etc/PACKAGE/
*: it is strongly advised to use a subdirectory of this directory, such as "htdocs", "www", or "site". this will allow other non-web-accessible data to also be stored in the parent directory. this may be re-worded to be a requirement.
**: what to do with non-architecture-specific executable files is up for debate. another possible location would be /usr/share/PACKAGE/cgi-bin
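
as a rough illustration of the table above, the mapping can be written as a small shell helper. this is only a sketch: the function name, the content kind names, and the choice of "htdocs" as the subdirectory are assumptions made for the example, not part of this draft.

```shell
#!/bin/sh
# sketch only: map a kind of web content to the directory suggested in the
# table above. the kind names and the "htdocs" subdirectory are assumptions
# made for this example.
webapp_dir() {
    kind="$1"      # one of: static, interpreted, executed, local, config
    package="$2"   # package name, e.g. "mywebapp" (illustrative)
    case "$kind" in
        static|interpreted) echo "/usr/share/$package/htdocs" ;;
        executed)           echo "/usr/lib/cgi-bin/$package" ;;
        local)              echo "/usr/local/share/$package/htdocs" ;;
        config)             echo "/etc/$package" ;;
        *)                  echo "unknown content kind: $kind" >&2; return 1 ;;
    esac
}
```

for example, "webapp_dir static mywebapp" prints /usr/share/mywebapp/htdocs, while an unknown kind prints an error and returns non-zero.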

smart handling of configuration files

in line with the FHS, any file that the local administrator needs to edit (for information such as "themes" or credentials to a database) must be located under /etc, in a directory specific to the package in question.

often the upstream authors of software will ship a configuration file that is located underneath the web root of the application. the admin must not be required to edit these files because package upgrades could lead to loss of this configuration.

the best way to work around such a problem is to use the appropriate "include" construct for the language in question (in PHP this would be "require_once", in perl "require") to include a smaller, trimmed-down configuration file from underneath /etc.
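
one possible way to arrange this for a PHP application is sketched below: the package ships (or generates) a stub at the path where upstream expects its configuration, and the stub does nothing but include the real file from /etc. the helper name and the file name "config.php" are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# sketch only: write a stub where upstream expects its configuration file.
# the stub just pulls in the administrator's file from /etc/PACKAGE/.
# "config.php" is an assumed file name, not one mandated by this draft.
write_config_stub() {
    package="$1"   # package name, e.g. "mywebapp" (illustrative)
    stub="$2"      # path where upstream expects its configuration file
    printf '<?php\nrequire_once("/etc/%s/config.php");\n' "$package" > "$stub"
}
```

with this in place, the administrator edits only the file under /etc, and an upgrade that replaces the stub does not touch the local settings.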

interacting with web servers and other packages

again in step with debian policy, no package should modify the contents of any other package's configuration files without first at least prompting the administrator. any such prompt must default to a negative response.
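
the "default to no" rule can be sketched as follows. a real debian package would normally ask such a question through debconf; this example uses a plain "read" from standard input only to show the logic, and the function name is hypothetical.

```shell
#!/bin/sh
# sketch only: ask a yes/no question and treat anything other than an
# explicit "y" or "yes" (including an empty answer or end of input) as "no".
confirm_default_no() {
    printf '%s [y/N] ' "$1" >&2
    # if read fails (for example at end of input), fall back to an empty answer
    read -r answer || answer=""
    case "$answer" in
        [yY]|[yY][eE][sS]) return 0 ;;
        *)                 return 1 ;;
    esac
}
```

a maintainer script could then guard the change with something like: if confirm_default_no "modify the web server configuration?"; then ...; fi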

if a package needs to restart a web server to enable itself, it should not assume that the server is running. most web servers provide a "reload" action which does not start the server if it is not running, and using it is the preferred approach. for example:


	invoke-rc.d apache reload || true

is a safe way to do this with apache: the "|| true" keeps a failed reload from aborting the calling script.
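
the same idea can be wrapped in a small helper that also copes with systems where invoke-rc.d is not installed. this is a sketch under stated assumptions: the function name is hypothetical, and the fallback to calling the init script directly is a common maintainer-script idiom rather than something this draft prescribes.

```shell
#!/bin/sh
# sketch only: ask a web server to reload without assuming it is running,
# and without letting a failed reload abort the calling script.
reload_httpd() {
    server="$1"   # init script name, e.g. "apache"
    if command -v invoke-rc.d >/dev/null 2>&1; then
        invoke-rc.d "$server" reload || true
    elif [ -x "/etc/init.d/$server" ]; then
        # fallback when invoke-rc.d is unavailable
        "/etc/init.d/$server" reload || true
    fi
}
```

calling "reload_httpd apache" always returns success, so it is safe to use from a script that runs with "set -e".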

database applications

if a web application requires the installation and use of a database, please see the policy draft for database applications.

security concerns

packages that automatically enable themselves as publicly accessible should not have a default username or password, or otherwise provide administrative access without requiring credentials.

still to come