To be able to use an operating system, users need both online and offline documentation.
On UNIX systems, online documentation is typically provided as manual pages, which are read with the man command[1]. In desktop environments, both KDE and GNOME provide their own online help systems.
Offline documentation includes documents such as the Debian Installation Manual, the Debian Reference Guide, or the Securing Debian Manual. These manuals, developed by the Debian Documentation Project (DDP), are typically printed by users or, sometimes, read on the project's website from other (non-Debian) systems. Some of the manuals are included in the official CD set, and most of them are also available as Debian packages.
Users who are not proficient in English will naturally prefer documentation in a language they are familiar with, which is why documentation needs to be translated.
The DDP develops Debian documentation written primarily in SGML. The first documents produced by the project were written in debiandoc-sgml, an SGML variant for which tools were developed to convert documents into several formats (HTML for online viewing, PDF and PostScript for printing, and plain text). The most recent documentation written by project members, as well as revisions of existing documents (such as the Debian Installation Manual), uses XML, and more specifically DocBook XML, which is widely used by many free software projects[2].
Translating documentation written in SGML or XML, however, raises some initial issues:
SGML documents are usually written as a single large file. Since translation teams are typically organized to handle small documents, the translation coordinator has to break the original file up into chunks;
Translators cannot directly use the tools they are used to, such as those designed to work on PO files;
Automatic publication of documentation in HTML needs to be adapted so that it can publish both the original document and its translations without them conflicting with each other;
Automatic publication of documentation in PS/PDF needs to handle non-European character sets properly, which requires specific fonts;
Changes and updates to SGML/XML documents have to be tracked through source diffs (which requires access to a revision control system), and tracking differences is harder when the whole document is a single file;
Diff tools cope poorly with text being rewrapped to a different column width, or with paragraphs being "moved" around without their content actually changing.
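The rewrapping problem can be illustrated with a short Python sketch built on the standard difflib module. It compares the same text once line by line, as a source diff would, and once as whitespace-normalized paragraphs: a rewrapped paragraph shows up as changed lines in the first case but not in the second. The helper names are hypothetical, and splitting paragraphs on blank lines is a simplification, since in SGML/XML paragraphs are delimited by markup.

```python
import difflib
import re


def paragraphs(text: str) -> list[str]:
    """Split text on blank lines and collapse the whitespace inside each
    paragraph, so that the column a line was wrapped at no longer matters."""
    return [" ".join(block.split())
            for block in re.split(r"\n\s*\n", text) if block.strip()]


def _count_changes(old: list[str], new: list[str]) -> int:
    """Count the removed ('- ') and added ('+ ') items reported by ndiff."""
    return sum(1 for item in difflib.ndiff(old, new) if item[:2] in ("- ", "+ "))


def changed_lines(old: str, new: str) -> int:
    """Plain line-based comparison, as a source diff would do."""
    return _count_changes(old.splitlines(), new.splitlines())


def changed_paragraphs(old: str, new: str) -> int:
    """Comparison of whitespace-normalized paragraphs."""
    return _count_changes(paragraphs(old), paragraphs(new))


if __name__ == "__main__":
    old = "Debian is a free\noperating system.\n"
    new = "Debian is a free operating\nsystem.\n"   # same words, rewrapped
    print(changed_lines(old, new), changed_paragraphs(old, new))
```

Running it prints a non-zero count for the line-based comparison and zero for the paragraph-based one. Comparing whole paragraphs rather than lines is also why paragraph-oriented PO workflows tend to be less sensitive to rewrapping than raw source diffs.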
To overcome these issues, document writers, working together with translators, have introduced significant changes to how documentation is written:
SGML documents are broken up into smaller files (typically one file per chapter);
New tools have been introduced to convert SGML/XML documents into PO files: po-debiandoc (now deprecated), po4a (see Section 4.3.2) and poxml;
New tools have been introduced to detect when files within a document have changed, such as doc-check (see Section 4.5), and to track the specific changes through the revision control system;
The publishing toolset (based around Makefiles) has been adapted to build and publish both the original document and the available translations.
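The idea behind converting SGML/XML documents into PO files can be sketched in Python with the standard xml.etree.ElementTree module: each paragraph becomes a msgid with an empty msgstr for the translator to fill in. This is not how po4a or poxml work internally; xml_to_pot and po_escape are hypothetical helpers that only handle DocBook-style <para> elements, flatten inline markup (real tools keep inline tags in the msgid) and omit the PO header entry.

```python
import xml.etree.ElementTree as ET


def po_escape(text: str) -> str:
    """Escape a string for use inside a PO msgid/msgstr literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\t", "\\t"))


def xml_to_pot(xml_source: str, source_name: str) -> str:
    """Emit one PO entry per <para> element (hypothetical helper).

    Whitespace is normalized so that rewrapping the XML source does not
    change the msgid; inline markup is flattened to plain text."""
    root = ET.fromstring(xml_source)
    entries: list[str] = []
    seen: set[str] = set()
    for para in root.iter("para"):
        text = " ".join("".join(para.itertext()).split())
        if not text or text in seen:
            continue  # a PO file must not contain the same msgid twice
        seen.add(text)
        entries.append(f'#: {source_name}\nmsgid "{po_escape(text)}"\nmsgstr ""\n')
    return "\n".join(entries)
```

Because whitespace is normalized, rewrapping a paragraph in the original leaves its msgid unchanged, so only paragraphs whose wording actually changed would need to be re-translated.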
Currently, the Debian Documentation Project uses the CVS server at cvs.debian.org[3]. Documentation is built with a set of Makefiles and published on the official project website, which updates its copy of the CVS repository and rebuilds all the documentation daily. The CVS server at cvs.debian.org holds most (but not all) of the available documentation. The most notable exception is the Debian Installer Manual, which is kept in the SVN repository of the Debian Installer project on Alioth (the repository can also be browsed over the web). The development version of the Debian Installation Guide is available on the Debian Installer project's web pages, also on Alioth.
Translators either get access to the CVS repository or use the original document's maintainer as a proxy to commit their work to it. When a translation is added to an existing document, the maintainer typically needs to update the LANGS (or LANGUAGES) variable in the document's Makefile so that the publication system also builds copies for that language. If the added translation builds, it should be available on the Debian website after the next daily build.
To track changes to the documentation and determine when a translation needs to be updated, document maintainers and translators use the doc-check tool (see Section 4.5).
There are two main types of manpages provided in the Debian operating system: those packaged with upstream software (such as binutils or gcc) and those written for Debian tools shipped in Debian-only packages. Translations of upstream manpages are typically provided either within the package itself or, as is the case for the Linux manpages, in a manpages-XX package (where XX is a language code).
At the time of writing, the following translated manpage packages are available:
manpages-de and manpages-de-dev - German manpages
manpages-es and manpages-es-extra - Spanish man pages
manpages-fi - Finnish man pages
manpages-fr - French version of the manual pages
manpages-hu - Hungarian manpages
manpages-it - Italian man pages
manpages-ja and manpages-ja-dev - Japanese version of the manual pages
manpages-ko - Korean version of the manual pages
manpages-nl - Dutch manpages
manpages-pl - Polish man pages
manpages-pt and manpages-pt-dev - Portuguese Versions of the Manual Pages
manpages-ru - Russian translations of Linux manpages
manpages-tr - Turkish version of the manual pages
manpages-zh - Chinese manual pages
For the most part, these packages contain the same manpages as the manpages and manpages-dev packages, although some of them include extra manpages.
Users see translated manpages through the internationalisation support of the man command: when asked to display a manpage, man checks whether a translation is available under /usr/share/man/XX (where XX is the language code of the user's environment[4]). For this reason, a default Debian installation installs both manpages and manpages-XX if the user selects a language task for which translated manpages are available.
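The lookup described above can be modelled with a short Python sketch. It is a simplified, hypothetical model rather than man's actual implementation: it reads only LANG, tries progressively shorter forms of the locale name (for example de_DE.UTF-8, then de_DE, then de), assumes gzip-compressed pages named name.section.gz, and falls back to the untranslated page. Real man implementations also consult other locale variables and may differ in detail.

```python
from pathlib import Path
from typing import Optional


def locale_candidates(lang: str) -> list[str]:
    """Expand a LANG value such as 'de_DE.UTF-8' into progressively more
    generic directory names: 'de_DE.UTF-8', 'de_DE', 'de'."""
    base = lang.split("@", 1)[0].split(".", 1)[0]   # drop @modifier and .codeset
    if not base or base in ("C", "POSIX"):
        return []  # the C/POSIX locale means "show untranslated pages"
    candidates = [lang]
    for name in (base, base.split("_", 1)[0]):       # e.g. 'de_DE', then 'de'
        if name not in candidates:
            candidates.append(name)
    return candidates


def find_manpage(name: str, section: str, lang: str,
                 root: Path = Path("/usr/share/man")) -> Optional[Path]:
    """Return the first existing page, preferring a translation and falling
    back to the untranslated page in root/man<section>/ (assumes .gz pages)."""
    filename = f"{name}.{section}.gz"
    for subdir in locale_candidates(lang) + [""]:    # "" = untranslated fallback
        candidate = root / subdir / f"man{section}" / filename
        if candidate.exists():
            return candidate
    return None
```

Under these assumptions, with LANG=es_ES.UTF-8 and a Spanish translation of a page installed, find_manpage returns the page under /usr/share/man/es/; otherwise it falls back to the English page. Note that nothing in this lookup checks whether the translation is still current.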
One of the main issues with manpages, however, is that man has no provision for detecting when a translation is out of date. As a consequence, users reading translated manpages may be reading outdated content that no longer matches the current version of the program's original manpage.
The translation project also lacks a central web page where teams can see at a glance which manpages are available for translation, which are translated, untranslated or out of date, and who last translated each manpage.
Translating the manpages of programs developed within the Debian project (such as the dpkg or apt tools) falls within the scope of the Debian translation teams.
Translated manpages for Debian programs are included in the Debian package itself, which means that translators have to ask the Debian maintainer to include a translation once it is finished. Since Debian programs are typically managed in shared revision control repositories available to Debian developers and contributors (either at cvs.debian.org or at alioth.debian.org), active translators of a Debian program usually have access to those repositories and can commit directly to the area where the manpages are kept. It is worth noting that people who translate a program often also translate its manpage, so that the translated program messages and options are consistent with the manual page itself.
To coordinate the translation of manpages and make it possible to track when translations need to change, the translation teams set up a manpages module in the DDP CVS area. This module includes several scripts to track manpage translations: check_trans.pl and compare_files.pl[5].
This module was introduced because translators did not have access to the CVS repositories of the programs whose translations they were producing. Consequently, the original manpages themselves could not be modified to include a translation control header recording when they were changed. To keep track of translation status, the CVS module holds INFO metadata for the translated documents, including:
Document's name in the CVS repository, which may differ from its name in the source package. This is used as the document ID.
Document's encoding.
Location of this document in the source package (set only when the source package actually contains this document).
Original ID in the english/ directory.
CVS revision number of the original document on which a translation is based.
Translator's name, used to send automatic notifications when a translated manpage becomes outdated.
The translation teams copy the original manpages into the english/ directory of that CVS module, and these copies have to be updated manually whenever the original file changes. Using the metadata and the CVS revisions of the English manpages, the scripts can detect when a translation is outdated and notify the translator in charge of it.
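The outdated-translation check can be sketched in Python. The record layout below simply mirrors the metadata fields listed above (the location field is left out because the check does not need it); the actual INFO syntax and the logic of check_trans.pl and compare_files.pl are not described here, so TranslationInfo, parse_revision and outdated are hypothetical names. One detail worth getting right is that CVS revision numbers must be compared numerically: compared as strings, "1.10" sorts before "1.9".

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class TranslationInfo:
    """One translated manpage; fields mirror the metadata listed above
    (hypothetical names, not the real INFO syntax)."""
    doc_id: str        # document's name in the CVS module (the document ID)
    encoding: str      # document's encoding
    original_id: str   # original document in the english/ directory
    based_on_rev: str  # CVS revision of the original the translation is based on
    translator: str    # who to notify when the translation becomes outdated


def parse_revision(rev: str) -> tuple[int, ...]:
    """Turn a CVS revision such as '1.10' into (1, 10) for numeric comparison."""
    return tuple(int(part) for part in rev.split("."))


def outdated(infos: Iterable[TranslationInfo],
             current_revs: dict[str, str]) -> Iterator[tuple[str, str, str, str]]:
    """Yield (translator, doc_id, translated_from, current) for every
    translation based on an older revision than the current english/ copy."""
    for info in infos:
        current = current_revs.get(info.original_id)
        if current is None:
            continue  # original not in english/: nothing to compare against
        if parse_revision(current) > parse_revision(info.based_on_rev):
            yield info.translator, info.doc_id, info.based_on_rev, current
```

Each tuple produced by outdated() carries the translator field, which a notification step would use; actually sending the notification is left out of the sketch.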
Unfortunately, this mechanism, initially developed by the French translation team and also used by other teams, is no longer maintained: the English manpages in the module have not been updated for two years, and neither have the translations.
The DDP CVS module is no longer used much, since most manpage translations are now maintained in the packages themselves.
Initially, some of the translation teams (French and Spanish) introduced their own documentation translation management systems[6] to coordinate the translation of the manuals published by the Debian Documentation Project. These systems were based on a flat database listing the documents available in the DDTP system and the status of their translations. Perl scripts converted this database into HTML pages published on the website, so that the translation team could see which documents were being worked on and who was coordinating each translation.
This system was not integrated with the document database provided by the DDP itself, and for the past few years the translation teams have used the translation robots (see Section 3.4) to coordinate the translation of documents.
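The conversion of the flat status database into HTML pages, described above, can be sketched in Python. The original system used Perl scripts and its database format is not documented here, so the one-record-per-line document|status|coordinator layout and the helper names below are purely assumptions for illustration.

```python
import html


def parse_status_db(text: str) -> list[tuple[str, str, str]]:
    """Parse a hypothetical flat database with one
    'document|status|coordinator' record per line.
    Blank lines and lines starting with '#' are ignored."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # maxsplit=2 keeps any '|' inside the coordinator field;
        # a line with fewer than three fields raises ValueError.
        document, status, coordinator = (f.strip() for f in line.split("|", 2))
        records.append((document, status, coordinator))
    return records


def render_status_page(records: list[tuple[str, str, str]]) -> str:
    """Render the records as an HTML table, escaping every field."""
    rows = "\n".join(
        "<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
            *(html.escape(field) for field in record))
        for record in records)
    return ("<table>\n"
            "<tr><th>Document</th><th>Status</th><th>Coordinator</th></tr>\n"
            + rows + "\n</table>")
```

Escaping every field with html.escape matters because coordinator entries may contain e-mail addresses in angle brackets, which would otherwise be read as HTML tags.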
[1] The GNU project prefers Info documentation, but most upstream developers provide only manpages.
[2] Including distributions such as Red Hat GNU/Linux, and the Linux Documentation Project.
[3] The web interface can be accessed at http://cvs.debian.org/ddp/manuals.sgml/?root=debian-doc.
[4] As defined by the LANG environment variable.
[5] An additional script, gen_db.pl, generates the WML files used for the translation coordination database in the web pages area.
[6] The Spanish management system (whose status database has not been updated since January 2003) is available at http://www.debian.org/international/spanish/ltcp/.