Internationalisation and localisation in Debian
Chapter 3. i18n/l10n infrastructure in Debian
One of the most time-consuming tasks when coordinating translation projects is keeping track of what each member of the team is working on and of the precise status of each translation.
The best-known translation robot is the one used by the Translation Project. It is an e-mail service that handles PO file submissions for translations registered within the project. It checks whether the files sent to it are receivable, that is, whether the translator has filed the translation disclaimer [1]. The robot also calls msgfmt to check that the PO file is healthy, and it sends notices of updated PO files to the translation teams whenever translations need to be updated.
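The msgfmt health check mentioned above can be sketched as follows. This is a minimal illustration, not the Translation Project robot's actual code: the function names are hypothetical, and the choice of options (`--check`, with the compiled output discarded) is an assumption about what a reasonable check would use.

```python
import os
import subprocess


def msgfmt_command(po_path):
    """Build a msgfmt invocation that validates a PO file.

    --check asks msgfmt to run its format, header and domain checks;
    the compiled catalogue is written to the null device because only
    the exit status is of interest here.
    """
    return ["msgfmt", "--check", "--output-file=" + os.devnull, po_path]


def po_file_is_healthy(po_path):
    """Return True if msgfmt exits with status 0 for the given file."""
    result = subprocess.run(
        msgfmt_command(po_path), capture_output=True, text=True
    )
    return result.returncode == 0
```

Calling `po_file_is_healthy("es.po")` requires gettext's msgfmt to be installed; a robot would typically also keep `result.stderr` so that it can report the errors back to the translator.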
This translation robot, however, tracks only PO files, and only those of the GNU projects registered with it. Because the requirements of the translation teams in Debian were different, those teams started writing translation robots to handle their own translation process. This work was started by members of the French team and then reused by other translation teams, including the Spanish, Dutch, Brazilian Portuguese and German teams.
The translation coordination robot is an e-mail robot that "listens" to the mail sent to the translation teams' mailing lists (debian-l10n-XXXX) and looks for a set of pseudo-urls in the message subjects. These pseudo-urls are composed of a translation status and a translation item (see Section 3.4.2). The robot uses this information to compile a list of the items being translated, the status of each translation and the translator in charge of it.
This is a useful tool for both translation coordinators and members of the translation teams. Any member can see the status of a translation at a glance and help where help is needed (for translations or reviews). It also helps people detect translations that have stalled (a translator stated that they were going to work on an item, but didn't actually finish it).
Currently, there are several translation robots in place. First, there is a generic robot that handles different mailing lists by crawling the web site information with a set of scripts. Most translation teams use this robot, although some teams run their own: the Spanish, Dutch and Catalan translation coordination robots are currently active. Instead of crawling the web site, these robots use real e-mail addresses that are subscribed to the teams' mailing lists, and they handle messages in real time through procmail filters that feed the information into the robot's status database.
Translations evolve through the following stages:
First, a translation needs to be done when there is a new (untranslated) item or document. Typically, the translation team coordinator asks somebody in the team to work on it;
Then, a member of the translation team answers by saying that (s)he will work on that item;
Once the translator finishes the translation, (s)he sends the item to the translation mailing list for review;
After several reviews by peers, the translator revises the translation and sends a new version for a final review;
The translator then picks the final version and submits it to the upstream maintainer. This is sometimes a package maintainer, but it can also be somebody with access to the CVS where the translations are maintained. Typically, for some translations, this step involves translators using the Debian Bug Tracking System to contact the maintainers (see Section 4.6);
After some time, upstream maintainers incorporate the provided translation in the package (or CVS) and the translation is considered "published". This is the final step for a translation, until the original translated item changes.
When the original item (document, wml or PO file) is modified, the cycle starts again. In that case, however, the last translator who worked on it is considered upstream's point of contact and should be contacted whenever the translation needs to be updated. The last translator is also typically considered the maintainer of upstream's translation, so there is no need to tell the translation team that (s)he will be updating it: (s)he is expected to do so, being in charge. Moreover, when the changes to the original item are few, there is often no real need for the translation team to do a full review of the new translation.
A pseudo-url is made of the following:
These pseudo-urls are used in the mail Subject: to help the robot distinguish which mails need to be handled by it and which mails are part of the mailing list discussion.
The contents of the pseudo-url are:
state: the state the translation is in; for more information see Section 3.4.2.1.
type: the type of item being translated. The translation coordination robot accepts the following types in the pseudo-url to indicate a translatable item: po-debconf, debian-installer, po, man or wml (webwml is deprecated, wml should be used instead).
package: the name of the package the document comes from. www.debian.org is used for the wml files of the Debian web site CVS.
name: the filename of the document. It can contain other information, such as the path to the file or the section of a manpage, so that no other document in the same package is referred to by the same name.
The structure of name depends on the chosen type. In principle it is just an identifier, but it is strongly recommended to follow these rules:
po-debconf://package-name/language.po
po://package-name/path-in-sourcepackage/filename.po
debian-installer://package-name/path-in-sourcepackage
wml://www.debian.org/address_of_page
man://package-name/section/subject
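The naming rules above lend themselves to a small parser. The following is a minimal sketch and not the robots' actual code: it assumes that the state appears in square brackets before the pseudo-url (a subject layout of the form `[STATE] type://package/name`), and the names `PseudoUrl`, `parse_subject` and the sample package `coreutils` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Item types listed in the text; "webwml" is deprecated in favour of "wml".
VALID_TYPES = {"po-debconf", "debian-installer", "po", "man", "wml"}


class PseudoUrl(NamedTuple):
    state: str
    type: str
    package: str
    name: str


# Assumed layout: "[STATE] type://package/name" anywhere in the subject.
PSEUDO_URL_RE = re.compile(
    r"\[(?P<state>[^\]\s]+)\]\s+"
    r"(?P<type>[a-z-]+)://(?P<package>[^/\s]+)(?:/(?P<name>\S+))?"
)


def parse_subject(subject: str) -> Optional[PseudoUrl]:
    """Return the pseudo-url found in a subject line, or None."""
    match = PSEUDO_URL_RE.search(subject)
    if match is None:
        return None  # ordinary list discussion, not meant for the robot
    item_type = match.group("type")
    if item_type == "webwml":
        item_type = "wml"  # map the deprecated spelling to the current one
    if item_type not in VALID_TYPES:
        return None
    return PseudoUrl(
        state=match.group("state").upper(),
        type=item_type,
        package=match.group("package"),
        name=match.group("name") or "",
    )
```

For example, `parse_subject("[RFR] man://coreutils/1/ls")` yields the package `coreutils` and the name `1/ls`, while a subject without a pseudo-url yields `None` and is left to the human readers of the list.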
The translation coordination robot can track which stage the translation of any item is in through the use of the following keys in the "state" part of the pseudo-url:
TAF ("Travail À Faire", French for "work to do"; "taf" also means "work" in French slang) is sent to indicate that there is a document that needs to be worked on;
ITT (Intent To Translate) indicates that a translator is planning to work on a given translation. This helps prevent translators from duplicating each other's work;
RFR (Request For Review) states that an initial translation is finished. The translator attaches the translation itself to the e-mail, for peer review. This key might be used more than once for the same item[2] if substantial changes have been made to the initial translation based on others' comments. Just like with CVS commit e-mails, translators expect a reply to these requests even if it's just to say "The translation is OK.";
ITR (Intent To Review) is sent by a peer in the translation team to note that (s)he is working on a review of the translation and might take some time (typically because the translation is large, or because the reviewer will not have time available until a given point in time). This is used to prevent the original translator from considering the translation finished (and sending an LCFC, see below);
LCFC (Last Chance For Comments) tells the team that the translator considers the translation to be finished and has included the comments from the review process. (S)he is giving peers a last chance to review it before the translation is submitted upstream. Typically, it is sent when there are no ITRs, the discussion following the RFR has ended and it has been three days since the RFR was sent. Most translation teams don't allow translators to do this unless at least one member of the team has reviewed the translator's work;
BTS (Bug Tracking System) tells the team that a bug has been opened to submit the translation to the maintainer. This is useful, since the translation robot can then automatically check whether the bug report is still open and update the translation status accordingly;
FIX (bug FIXed) notes that an open bug has already been fixed (useful if the translation robot missed it being closed);
DONE states that the translation has been finished and is now included upstream. This should be used in cases where no bug report is involved, such as web site translations. Otherwise, the robot will handle DONE automatically by crawling the Bug Tracking System;
HOLD puts a translation on hold when the original version has changed but the translation should not be updated yet, e.g. because the translator knows that further modifications are coming soon and does not want someone else to update it too quickly.
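The typical conditions for sending an LCFC described above (no pending ITRs, discussion ended, three days since the RFR, at least one review) can be written down as a small check. This is only a sketch of the convention as stated in the text. The function and parameter names are hypothetical, and individual teams may apply different rules.

```python
from datetime import datetime, timedelta

# "Three days since the RFR was sent", as stated in the text.
REVIEW_PERIOD = timedelta(days=3)


def lcfc_is_due(rfr_sent, now, open_itrs, discussion_ended, reviews_received=1):
    """Return True if a translator may reasonably send an LCFC.

    rfr_sent          -- datetime at which the RFR mail was sent
    now               -- current datetime
    open_itrs         -- number of ITRs whose review has not arrived yet
    discussion_ended  -- whether the thread following the RFR has gone quiet
    reviews_received  -- number of peers who have reviewed the translation
    """
    return (
        open_itrs == 0
        and discussion_ended
        and reviews_received >= 1
        and now - rfr_sent >= REVIEW_PERIOD
    )
```

Whether the discussion has ended is a human judgement, so in practice it would be supplied by the translator rather than computed by a robot.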
This is a typical example of the way the translation robots are currently used.
A translator (T) wants to work on the PO-debconf translation of the exim4 package. Instead of just saying so in natural language on the mailing list, (s)he sends this:
When the translation robot sees that format in the mail's subject, it processes the mail, retrieves the data (it is an ITT, of the po-debconf type, for the exim4 package, sent by translator T on this date and time) and stores it in a database. This information is presented whenever the translation status page is viewed (since the page is driven by the database). Only the subject of the mail is processed, not the body.
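The processing step described above can be sketched with an in-memory SQLite database. This is an illustration under stated assumptions, not the schema of any real robot: the subject layout `[STATE] type://package/name`, the table and column names, the file name `es.po`, the sender address and the timestamps are all hypothetical.

```python
import re
import sqlite3

# Assumed subject layout: "[STATE] type://package/name".
SUBJECT_RE = re.compile(
    r"\[(?P<state>[^\]\s]+)\]\s+"
    r"(?P<type>[a-z-]+)://(?P<package>[^/\s]+)(?:/(?P<name>\S+))?"
)


def open_status_db():
    """Create an in-memory status database with one row per translated item."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE status (
               item_type  TEXT,
               package    TEXT,
               item_name  TEXT,
               state      TEXT,
               translator TEXT,
               sent       TEXT,
               PRIMARY KEY (item_type, package, item_name)
           )"""
    )
    return db


def record_mail(db, subject, sender, sent):
    """Store the state announced in a mail subject; the body is ignored.

    Returns False for mails that carry no pseudo-url, i.e. ordinary
    list discussion that the robot should not handle.
    """
    m = SUBJECT_RE.search(subject)
    if m is None:
        return False
    # A later mail about the same item replaces the earlier state.
    db.execute(
        "INSERT OR REPLACE INTO status VALUES (?, ?, ?, ?, ?, ?)",
        (m["type"], m["package"], m["name"] or "", m["state"], sender, sent),
    )
    db.commit()
    return True
```

A status page would then simply be rendered from `SELECT * FROM status`. Keeping one row per item reflects the text's point that the page shows the current state; a robot that also exposes history would append rows instead of replacing them.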
Translator T completes the translation a few days later and wants people to review his/her work, so (s)he sends the following mail:
The file to review should be attached to the mail[3]. The robot processes this mail, notes the status change in the database and extracts the file from the mail. The web pages can be used to retrieve the file itself.
After some reviews, translator T sends a mail with an LCFC, attaching the file (which is, again, parsed by the robot) and after a few days sends the document to the BTS and sends this mail to the list:
Once this is done, the translator's work is considered finished, and the translation robot will check, through a periodic job, whether the bug has been closed in the bug tracking system. Once the bug is closed, the translation is marked as DONE (and is hidden from the status page after a month).
Throughout this process, the members of the team subscribed to the list, new members who were not subscribed when the process started, and the translation team coordinator all have full access to the translation status (and its history) through the web application of the translation team robot.
In the future, the different robots of the translation teams should be merged into one common database as part of Debian's infrastructure for translators. This would avoid having robots coded in different ways and would remove the need to implement new features (such as handling compressed translations or testing PO files with msgfmt) separately in each of the independent robots.
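The periodic job can be sketched as below. This is a hedged illustration only: the dictionary fields, the function names and the `bug_is_closed` callback are hypothetical, "a month" is approximated as 30 days, and how a real robot queries the Bug Tracking System is not covered here.

```python
from datetime import datetime, timedelta

# "Hidden after a month", approximated here as 30 days.
HIDE_AFTER = timedelta(days=30)


def refresh_bug_states(entries, bug_is_closed, now):
    """Mark entries waiting in the BTS as DONE once their bug is closed.

    entries       -- list of dicts with at least "state" and "bug" keys
    bug_is_closed -- callable taking a bug number and returning a bool;
                     a stand-in for whatever query the robot really uses
    now           -- datetime of this run, stored as the completion time
    """
    for entry in entries:
        if entry["state"] == "BTS" and bug_is_closed(entry["bug"]):
            entry["state"] = "DONE"
            entry["done_at"] = now


def visible_entries(entries, now):
    """Return the entries that should still appear on the status page."""
    def is_hidden(entry):
        done_at = entry.get("done_at")
        return (
            entry["state"] == "DONE"
            and done_at is not None
            and now - done_at > HIDE_AFTER
        )
    return [entry for entry in entries if not is_hidden(entry)]
```

Passing the bug lookup in as a callable keeps the sketch independent of any particular way of reaching the bug tracking system and makes the job easy to exercise with canned data.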
[1] Translators of the Translation Project have to send a written disclaimer form by postal mail before their translations are accepted for inclusion in the distribution. For more information, read http://translation.sourceforge.net/HTML/disclaim.html
[2] Some translation robots use RFR2 for subsequent reviews.
[3] Some translation robots don't handle compressed files, but most will handle MIME attachments.