Chapter 2. I18n/L10n projects in Debian

Debian runs several ongoing processes to facilitate the internationalisation of the software it develops, and to assist translators working on the localisation of content produced for Debian. This chapter describes the ongoing i18n and l10n projects in Debian and explains how they work.

2.1. Translation of the Debian website

A project's website is often the primary source of information, both for users of a program and for people who want to learn about it but do not use it yet. Information on a project's website usually complements the documentation available through other means. It is also typically more up to date than the documentation installed with the software, or supplied on the distribution media on which the software was purchased.

This gives the website some priority in the internationalisation and localisation process of a universal project. It's out there, it's current, and it's being viewed, so it is translated to make it accessible to many more people. A website is a moving target for translation, requiring frequent updates, but it's a key access point for users (both existing and future).

To present translated information to the user, the Debian project website uses content negotiation (more specifically, Apache's content negotiation), which relies on the user's web browser stating its preferred languages in the Accept-Language HTTP header. The user may set this language preference individually in the browser, or simply let the browser follow the system preferences. Content negotiation, in this case, means that the web server can present a copy of the content translated into the user's preferred language (if a translated copy is available). More information is available in the Debian website's description of content negotiation. Translated websites typically include a "language bar" that shows the languages in which the site is available.

In this way, server administrators manage both the original document (typically in English) and multiple translations of that document. Content negotiation takes care of looking up the document for a given URL in the requested language, either showing it if it is available, or falling back to the original page.
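The selection step described above can be sketched as follows. This is a simplified illustration and not Apache's actual negotiation algorithm: the function names, the fallback from a regional tag such as fr-CH to its primary language, and the use of "en" as the default are all assumptions made for the example.

```python
def parse_accept_language(header):
    """Return the language tags of an Accept-Language header, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # a tag without an explicit q-value counts as 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not wanted"
        if quality > 0:
            # Sort by descending quality, then by order of appearance.
            weighted.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(weighted)]


def choose_variant(header, available, default="en"):
    """Pick the language variant to serve for one URL (hypothetical helper)."""
    for tag in parse_accept_language(header):
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # assumed fallback: "fr-ch" -> "fr"
        if primary in available:
            return primary
    return default  # no translation matches: serve the original page


if __name__ == "__main__":
    translations = {"en", "de", "fr"}
    print(choose_variant("fr-CH, fr;q=0.9, en;q=0.8", translations))
    print(choose_variant("ja", translations))
```

A browser that asks only for a language without a translation receives the default, which corresponds to the fallback to the original page described above.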

Translations for the Debian website are being created for 33 world languages, although only 10 language teams have more than 500 pages translated out of over 3000 available pages, and only 5 teams have translated more than 50% of the site. It is a very large and demanding task, but one to which we give priority, so there is a great deal of translation activity around the Debian website. New languages will continue to be added, and further pages translated.

The web server is based on content managed by the wml program, which allows separation of templates and content, and also provides a mechanism to generate content within the pages at compile time. Wml is to HTML what C source is to its object code.

Most of the information on the web server is available in a directory tree, handled through CVS (at cvs.debian.org). In that tree, there is a directory for every language into which the website is translated. In principle, all the languages follow the directory hierarchy defined in the original (English) pages although, depending on the actual content translated, the contents of each subdirectory within a language might vary. There is also a specific location for content that translation teams write directly in their own language, rather than translate, and which is usually specific to that language or culture. Page generation is driven by Makefiles, in such a way that all the wml files in a given directory are compiled and published together. The use of included files means translators do not need to concern themselves with the content of the Makefiles.
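Because each language directory mirrors the hierarchy of the original pages, a translation team can find untranslated pages by comparing the two trees. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and the directory names ("english", "spanish") and the layout of one subdirectory per language under a common root are assumed for illustration rather than taken from the repository itself.

```python
from pathlib import Path


def translation_coverage(root, language, original="english"):
    """Compare a language tree with the original tree.

    Returns (translated, total, missing), where missing lists the
    original .wml pages with no counterpart in the language tree.
    """
    original_dir = Path(root) / original
    language_dir = Path(root) / language
    # Every .wml file in the original tree, as a path relative to it.
    pages = sorted(p.relative_to(original_dir)
                   for p in original_dir.rglob("*.wml"))
    # A page counts as translated if the same relative path exists.
    missing = [p for p in pages if not (language_dir / p).is_file()]
    return len(pages) - len(missing), len(pages), missing


if __name__ == "__main__":
    done, total, missing = translation_coverage(".", "spanish")
    print(f"{done} of {total} pages translated")
    for page in missing:
        print("missing:", page)
```

This only reports whether a file exists in the language tree; it says nothing about whether an existing translation is still current.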

Thanks to the independence of real content and aesthetic information, translators can simply take the original (English) wml files, move them to their own directories and translate them. The templates that generate the website itself need not be translated, unless specifically required. The templates use wml's internationalisation tools and gettext to extract and present the translatable information (including headers, footers, menus, buttons, both interactive and passive text) in all the pages for translation.

One of the main issues in ongoing translation is the need to monitor changes in the original content. In particular, translators need to know which changes require translations to be updated. The web server has the same problem. Content negotiation only checks whether a translation exists in the user's language and can be displayed; it is not aware of the status of the translated content. So it's possible that users might be presented with out-of-date information. This issue would not exist if a translation were removed as soon as it was no longer current, since the web server would then simply provide the default language (English) to the user. However, many changes to the website are not fundamental changes, or are merely typo fixes to the original document. Serving the original document in these situations would make little sense, since the translation (even if out of date) is still useful to readers who can't understand the original content.

To deal with this currency issue proactively, the website development team implemented a mechanism to detect out-of-date translations, based on translation headers. These headers are included in translated documents, and state which original file was translated and which CVS revision of it was used. This header makes it possible to build several additional tools that facilitate effective translation.

More information is available in the website development reference.

There are further tools, also based on this mechanism, that provide statistics related to the translation effort, including statistics on out-of-date translations[1].
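The revision check at the heart of this mechanism can be sketched as follows. This is an illustration only and not the website team's actual tooling: the header syntax matched by the pattern is an assumption about how the translation header is written in a wml file, the function names are hypothetical, and the current revision of the original file is passed in by the caller rather than read from CVS.

```python
import re

# Assumed header form, for illustration:
#   #use wml::debian::translation-check translation="1.9"
HEADER = re.compile(
    r'#use\s+wml::debian::translation-check\s+'
    r'translation="(\d+(?:\.\d+)*)"'
)


def translated_revision(wml_text):
    """Return the CVS revision recorded in a translation, or None."""
    match = HEADER.search(wml_text)
    return match.group(1) if match else None


def revision_key(revision):
    """Turn "1.12" into (1, 12) so that 1.9 sorts before 1.12."""
    return tuple(int(part) for part in revision.split("."))


def translation_status(wml_text, original_revision):
    """Classify a translation against the original's current revision."""
    recorded = translated_revision(wml_text)
    if recorded is None:
        return "untracked"  # no header, so the status cannot be determined
    if revision_key(recorded) < revision_key(original_revision):
        return "outdated"
    return "current"


if __name__ == "__main__":
    page = '#use wml::debian::translation-check translation="1.9"\n<p>Hola</p>\n'
    print(translation_status(page, "1.12"))
```

Comparing revisions as tuples of integers rather than as strings matters here, because a plain string comparison would wrongly place "1.12" before "1.9".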

Since translations have to follow the creation of the original content, they will typically lag slightly behind the original content in an active website. This mechanism provides a way for translators to work on translations at their own pace, while ensuring that users reading the content know how current it is, and have the option to switch to the original content.

Notes

[1]

Translations that have been out of date for more than six months are automatically removed, regardless of the number of revisions they are behind. For more information, read http://lists.debian.org/debian-www/2004/01/msg00323.html