Future Translation Workbenches
Some Essential Requirements
This paper proposes a number of essential requirements intended to provide direction for translation workbenches of the future. The points made arise from a consideration of the problems and frustrations encountered during several years' experience in the use of proprietary and in-house translation tools. The paper will also suggest innovations which may considerably improve the productivity and flexibility of future translation workbenches. It is only by keeping translation software attractive to the increasingly demanding business world that computerised translation will move forward.
Approaching the issue from the point of view of industry, this paper will attempt to summarise, albeit briefly, a few essential aspects of translation workbench design which I feel provide important directions for the future. The points made are based on several years' experience in the use of proprietary and in-house translation tools at Rank Xerox. The specific recommendations in this paper arose during a detailed in-house study of requirements for future translation workbenches, conducted in 1992. I have chosen only a small number of key requirements for this paper, and have grouped them under four topic areas: openness, flexibility, productivity and information management.
Why use translation tools?
It is worth briefly reminding ourselves, first, why we use computers for translation in industry, and what we use them for. The hope is that computerisation will eliminate slow or complex manual tasks in order to reduce translation costs and time to market. At the same time, translation software should maintain or improve translation quality so as to increase customer satisfaction. An important point, to which we will return later, is that, in order to achieve faster time to market, companies are increasingly looking at ways of performing most of the translation on early versions of the text, and then translating small incremental updates nearer the product launch date.
Translation tools have an important role to play here in reducing the bulk of update translations.
Definitions
The term 'source matching' is used in this paper to cover what is often referred to elsewhere as 'translation memory' or 'repetitions processing'. A foreign-language translation retrieved by matching a string of source text during source matching is referred to as a 'recall'. The term 'change analysis' refers to a suggested application of 'text alignment'; this will be explained in detail below.
Overview of translation process
Let me set the scene for what I will say later by drawing a simple model of the translation process. Figure 1 shows, very schematically, the basic elements of most translation processes. At this level, these stages are largely the same for translation of both documents and software-based text. Note that translation software is required to assist in all the activities outlined in Figure 1. The translation workbench is far more than just an editing environment: it is needed for administrative activities long before and after the translator actually sits down to edit text.
Project setup involves the creation of directory structures and the completion of other necessary administrative activities prior to the importing of data. Dictionary building normally takes place in advance of translation, and involves running software which can help identify unknown words or phrases in a corpus, obtaining validated translations for those terms, and preparing dictionaries for access during translation. During the import stage, the data is transferred from the outside world into the translation environment. At this point the data may be converted and tested for conformance to agreed formats. Segmentation of data into manageable chunks may take place at this stage or during data preparation. During data preparation the data is prepared for release to the translator.
Processing at this point may include such things as source matching [1], machine translation and change analysis [2], as well as the preparation of reference information such as dictionary items and notes from designers. Once the material has been prepared, it is handed over to the translator, who will edit or supply the translation. During translation, the translator will usually need to ask questions about the meaning of the source text, ask about appropriate translations for specific items in the source, request more expansion space, and so on. These queries need to be dealt with as quickly and efficiently as possible; this is the function of the query management subsystem. The export stage removes the data from the translation environment and puts it into the form required for return to the customer. Material must be proofread and then validated on behalf of the customer before being prepared for publication or loaded into the software. During the housekeeping stage, backups, archives and other administrative activities take place. The main dictionary database also needs to be updated in the light of changes or additions made during translation or validation.
Let us now consider some essential requirements for developing future translation software.
Openness
Aim for a generic translation environment
Any large translation department has to deal with translation text from a wide and continually growing set of product environments. Each of these product environments organises and stores its data in different ways. If tools are written specifically to translate products from a particular product environment, it is usually difficult, and sometimes impossible, to adapt them successfully for later use with completely new product environments with which the translation department may have to deal.
Developing one-off solutions for the translation of differing environments, having to significantly adapt tools each time, or using a number of different translation tools can soak up a great deal of time, money and resources:
Here are some suggested solutions to these problems:
Design truly multinational software
The ALPS and Systran systems currently used by Xerox are unable to cope with requirements to translate into languages beyond the basic Western European set. At CBS, we already translate into East European languages, including languages written in non-Latin scripts, such as Russian and Greek. Future requirements include Middle Eastern and Far Eastern languages, where character shaping and text direction are far more complex than in Western European languages. Indeed, there seems to be significant general growth in the importance of non-European markets for the software industry. When designing translation environments, care must be taken to avoid locking the user into a restricted set of languages. This means allowing for factors such as the following:
Other factors also need to be borne in mind. For example, algorithms which parse segments for dictionary lookup must not be hindered by the fact that languages such as Japanese and Thai do not separate words with spaces, as English does.
Flexibility
Create user-defined systems
The translation tools will need to be flexible enough to allow a slightly different approach where required, be it for technical or organisational reasons or simply user preference. It should be possible to configure the translation workbench to meet the needs of the organisation's ideal work process; the workbench should not dictate or restrict the choice of work process. As much information as possible about how the system works should be user-defined, rather than hard-coded. It should be possible to modify the user's or organisation's preferences through definition files or property sheets which are easy to understand and modify. For example, users should be able to tell the system how to segment text, sift out non-translatables, assign statuses, present items in the reference area, skip through the source text, and so on. The translation tools should include a workflow management system which is easily adapted for different types of organisation.
Allow flexibility between batch and interactive translation
Machine translation is best suited to simple, sublanguage texts where information is explicit. For such texts, machine translation can give important productivity gains. As your source text moves from simple text, such as parts lists or service manuals, to such things as training manuals or marketing brochures, where stylistic variation, reliance on context and language complexity become relatively more important, the productivity of machine translation is rapidly eroded by the increasing need to post-edit. A large number of companies deal with both simple and complex texts.
For this reason an ideal translation workbench should allow an organisation to translate using machine translation and/or interactive translation, wherever each is most appropriate. This need not mean designing your own machine translation system as part of your workbench; you may be able simply to provide filters and links to access an existing system.
Enable decentralised translation
Especially where new markets are opening quickly, translation departments are likely to need to carry out the translation itself in remote locations, while administering and processing the data from a central location. The translation system of the future must enable this interaction between the remote translator and the central hub. The platform on which the editing software is based should be easily obtainable, cheap and portable. It must also be robust, and allow for simple installation and remote troubleshooting, while providing as much useful functionality as possible to the translator.
Allow for future enhancements
Software must be written in such a way that future enhancements can be easily added without major rework. The aim is to avoid having to develop or find a new translation environment again, and simply to refine what we have.
Productivity
Enable fast and simple administration
With a number of existing systems there can be high costs in terms of time and resources for administrative activities such as importing and exporting files, data preparation, dictionary management and dictionary building. In fact, this is a key problem for the use of computerised translation tools in general. Only companies which can afford to carry these overheads can invest in the translation tools which will bring them greater productivity. Additionally, there is a cutoff point in the size of translation jobs, below which it is deemed more cost-effective to translate manually because of the administration costs.
(Manual translation of a small update to a product can then become problematic, because all the information needed for source matching and change analysis at the time of the next revision is lost. Manual translation also makes it harder to maintain translation quality in terms of consistency and validated terminology.) Translators' workbenches need to reduce administrative activities to a minimum in all aspects of the translation work. It must be possible to import, prepare, export and manage jobs quickly enough that the size of the job makes no difference. One way to achieve this is to reduce as far as possible the number of points at which the user has to intervene in order to run the software. I also wish to suggest a potential solution here based on a concept we shall call the 'job profile'. The job profile is a collection of definition files and parameters which define how the translation processes work, and the workflow for a given job or collection of jobs. There should be a single, simple interface for the user which groups together all the information for viewing or modification. Amongst other things, the job profile should define:
Restricted access rights should be associated with certain types of information, and certain defaults should also be available. If product-specific changes have to be made, only the appropriate parts of the profile should be modified. It should be easy to access and change such information. One of the key benefits of the job profile is that it can easily be copied from project to project with minimal changes (often none). When dealing with similar types of text, the use of job profiles can thus reduce the administrative setup time considerably: once the data has been imported, the whole process of data preparation for any number of target languages could be initiated by a single click on a button, since all the information needed by the system is already contained in the job profile.
Enhance user friendliness and efficiency
For a variety of reasons, it is not hard to find translation systems whose user friendliness could be improved. For example, systems may have been put together quickly to meet schedule requirements, they may have been based on tools originally intended for other purposes, or they may simply have been developed without due attention to users' needs. All the systems I have used so far leave a good deal of scope for human factors improvements. There are those who would say that real productivity gains are made by looking at ways of preprocessing the data, rather than by paying attention to the user interface. I agree that one must seriously look at preprocessing in order to gain productivity (in fact, I will be making some recommendations along these very lines a little later). Nevertheless, I feel that productivity can be significantly improved by attention to the user interface in the following ways:
One potentially very useful tool for translators would be a 'morph modifier'. This would allow translators to post-edit word endings, agreements and the like quickly and easily. Automatic adjustment of endings when pulling items from the dictionary into the target text would also improve productivity.
Minimise the work of the translator
If we are to achieve simultaneous multinational launch of products into the worldwide marketplace, we need to ensure that translation is as efficient and productive as possible. One of the key opportunities for improving translation productivity arises before the translator actually sits down to edit text. The objective of the data preparation stage should be twofold:
It seems to me that there is potential in the future to greatly improve the productivity of update translation through an integrated approach which includes an application of the concept of text alignment. I will refer to this approach as 'change analysis'. Figure 2 illustrates the composition of an example document or software text extraction which has been updated. (The percentage figures for each component part are chosen so as to illustrate clearly the points which will be made.) The figure shows a document or software update in which 70% of the source has not changed since the previous version. A further 5% of the text consists of segments which will not need to change in translation. Of the remaining 25%, 15% of segments have source text which will exactly or approximately match that of previously translated text held in databases; it is therefore labelled 'familiar text'. The remaining 10% is labelled 'unfamiliar text'.
If the text in our example is simply sent to a machine translation system, there is no way of carrying forward into the new document the lessons learned from post-editing the previous mistakes. All the mistakes made last time will be repeated and will have to be post-edited again. In addition, machine translation systems are still only of limited use for complex text.
The predominant way of tackling this problem at present is through the use of source matching. Source matches are obtained by matching the current source text, in one way or another, against the source text held in databases of previous translations. Where an exact or approximate match is found, the corresponding target segment is presented to the translator in a reference area or in the target text itself. Many existing systems already rely on source matching (i.e., repetitions processing or translation memory) to supply potential translations for all of the unchanged and familiar text indicated in the diagram.
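Source matching of the kind just described can be sketched with Python's standard `difflib` module. This is a minimal illustration under our own assumptions, not the matching algorithm of any system mentioned in this paper: the "database" is a plain list of (source, target) pairs, similarity is `difflib`'s character-based `ratio()`, and the 0.75 threshold for an approximate match is an arbitrary illustrative value. The names `Recall` and `source_match` are hypothetical. Recalls come back ranked, exact matches (score 1.0) first.

```python
from difflib import SequenceMatcher
from typing import List, NamedTuple, Tuple


class Recall(NamedTuple):
    """A target segment retrieved by source matching."""
    score: float   # 1.0 means the stored source is identical
    source: str    # previously translated source segment
    target: str    # its stored translation


def source_match(segment: str,
                 memory: List[Tuple[str, str]],
                 threshold: float = 0.75) -> List[Recall]:
    """Return recalls for `segment`, best first.

    `memory` holds (source, target) pairs from earlier jobs; `threshold`
    is an illustrative cut-off below which a match is not offered.
    """
    recalls = []
    for source, target in memory:
        # Character-level similarity between the new and stored source text.
        score = SequenceMatcher(None, segment, source).ratio()
        if score >= threshold:
            recalls.append(Recall(round(score, 2), source, target))
    # Rank the recalls so exact matches come first, then the closest fuzzy ones.
    recalls.sort(key=lambda r: r.score, reverse=True)
    return recalls


memory = [
    ("Press the Start button.", "Appuyez sur la touche Marche."),
    ("Press the Stop button.", "Appuyez sur la touche Arrêt."),
    ("Close the front cover.", "Fermez le capot avant."),
]
for recall in source_match("Press the Start button.", memory):
    print(recall.score, recall.target)
```

Here the exact entry is returned with score 1.0, the "Stop" entry follows as an approximate recall (score 0.89) that the translator would still have to check and post-edit, and the "front cover" entry falls below the threshold. Note that nothing in this lookup looks at the segment's surrounding text, which is exactly the weakness discussed next.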
Source matching, however, provides only guesses at possible translations, usually taken in isolation from the context of the segment, so they must all be checked carefully. Where the recalled text is not exactly what is required, post-editing must take place or a new target segment must be typed in. This is far from ideal when you consider that 70% of the source text had not changed anyway.
I propose that future systems can gain significant productivity benefits for update translations by introducing the concept of change analysis. If a program is run on the data to ascertain which parts of the text are unchanged from the previous version, and to label those parts appropriately, the translation tools should be able to insert the previous translations from the database into the target document automatically. It is important to bear in mind that approaching the task in this way relies on a high degree of certainty that the translation supplied is exactly the same as that associated with this very piece of source text the last time around. If the translation system labels these items of text, the translator should be able to skip unchanged text automatically if desired. The translator should still be able to edit unchanged text if they wish (for example, to ensure that unchanged text preceding or following changed text flows correctly), but if there are, say, ten pages of uninterrupted text which have not been changed, the translator should be able to skip right past them. (Again, this is usually acceptable since, unlike source matching, the system has used contextual information to find the exact same translations as were used previously for these source segments.) Anything the translator misses should be captured during the validation stage.
Implementation
How would one implement such a system? It may seem that this is not a trivial problem for documentation text, but programs and methods already exist which may achieve reasonable results.
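As one illustration, and assuming documentation text that has already been segmented into sentences, change analysis can be approximated by aligning the old and new sequences of source segments with `difflib.SequenceMatcher` from the Python standard library. This is a substitute technique chosen for the sketch, not the text-alignment method the paper has in mind, and `change_analysis` is a hypothetical name. Because whole segments are aligned in document order, reuse depends on a segment's place in the sequence as well as on identical wording, unlike a free-standing source-matching lookup; a segment repeated several times could still be paired with the wrong occurrence, so a production tool would need stricter rules.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

Row = Tuple[str, str, Optional[str]]   # (label, new source, target or None)


def change_analysis(old_source: List[str], old_target: List[str],
                    new_source: List[str]) -> List[Row]:
    """Align old and new source segments and pre-fill unchanged ones.

    old_source and old_target are parallel lists from the previous
    version; new_source is the updated list of source segments.
    """
    rows: List[Row] = []
    matcher = SequenceMatcher(None, old_source, new_source, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # A run of segments that aligns unchanged between the versions:
            # carry over the translations stored for them last time.
            for old_i, new_j in zip(range(i1, i2), range(j1, j2)):
                rows.append(("unchanged", new_source[new_j], old_target[old_i]))
        else:
            # 'replace' or 'insert': new or edited text for the translator.
            # ('delete' has an empty j1:j2 range, so it adds no rows.)
            for new_j in range(j1, j2):
                rows.append(("changed", new_source[new_j], None))
    return rows


old_src = ["Open the cover.", "Remove the toner cartridge.", "Close the cover."]
old_tgt = ["Ouvrez le capot.", "Retirez la cartouche de toner.", "Fermez le capot."]
new_src = ["Open the cover.", "Remove the used toner cartridge.",
           "Close the cover.", "Press Start."]
for label, source, target in change_analysis(old_src, old_tgt, new_src):
    print(f"{label:9} {source} -> {target}")
```

In this example the first and third segments come back labelled "unchanged" with their old translations already filled in, so an editor could let the translator skip them, while the edited second segment and the new fourth one are left for source matching and translation.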
Rank Xerox has already implemented such a screening system at the level of complete software messages, via the use of message identifiers (IDs). We ask the product teams supplying software to associate each message with a unique ID which remains unchanged throughout the product's life. We use this ID to locate the previous translation of the message or icon in our database, and compare the text and properties against the new version. If there is no significant change, we can automatically insert the previous translation into the new database and the translator does not have to see the message.
Additional productivity benefits
Change analysis can have additional productivity benefits. Sometimes, for example, only a small change may have been made to a long message. In some cases it might only be a letter which has been put in upper case, or an article which has been added to the source: things which usually don't affect the translation. In other cases, it may only be a comma which has changed, but this may affect the translation significantly. In these cases, the change analysis program should immediately draw the translator's attention to the actual changes made to the source, so that the impact on the previous translation can be quickly assessed. Otherwise the translator may spend a lot of unnecessary time scrutinising the text to find the changes.
An integrated approach
Change analysis does not, by any means, do away with the need for source matching. It should be seen as only one of a series of operations on the text during the data preparation stage. Having identified the unchanged text in our example above and copied the old translations to the target file, we have 30% of segments remaining. Our example shows that 5% of segments contain text which will remain unchanged in translation (for example, numbers).
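Screening out such segments can be sketched as a small rule function. The rules below are hypothetical and deliberately simple: a segment made up only of digits, spaces and common punctuation is treated as non-translatable unless the target language rewrites it. The language tables echo the paper's own examples (decimal numbers in some European languages, all numbers for Saudi or Thai markets); the language codes chosen and the names `is_non_translatable`, `DECIMAL_COMMA_LANGUAGES` and `ALL_NUMBERS_TRANSLATED` are assumptions for illustration, and in a real workbench such tables would live in user-defined definition files rather than in code.

```python
import re

# Hypothetical per-language policy tables. In a real workbench these would
# come from user-defined definition files rather than being hard-coded.
DECIMAL_COMMA_LANGUAGES = {"de", "fr"}   # 3.5 is written 3,5
ALL_NUMBERS_TRANSLATED = {"ar", "th"}    # assumed to need localised digits

# Digits, whitespace and a few common punctuation marks only.
NUMERIC_ONLY = re.compile(r"[\d\s.,:;()%+\-/]+")
# A digit, a decimal point and another digit, e.g. "3.5".
DECIMAL_POINT = re.compile(r"\d\.\d")


def is_non_translatable(segment: str, target_language: str) -> bool:
    """Return True if `segment` can be copied unchanged to the target."""
    if not NUMERIC_ONLY.fullmatch(segment):
        return False   # contains words, so it goes to the translator
    if target_language in ALL_NUMBERS_TRANSLATED:
        return False   # every number must be localised for this market
    if target_language in DECIMAL_COMMA_LANGUAGES and DECIMAL_POINT.search(segment):
        return False   # the decimal point must become a comma
    return True


for segment, language in [("1200", "fr"), ("3.5", "fr"), ("3.5", "ja"), ("1200", "th")]:
    print(segment, language, is_non_translatable(segment, language))
```

With these tables, "1200" is copied straight through for French, "3.5" is sent on to the translator for French but not for Japanese, and every number is sent on for Thai. The sketch ignores thousands separators, which also differ between languages.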
Determining what constitutes a 'non-translatable segment', and how to deal with it, should actually be done on a language-by-language basis. For example, decimal numbers need translation in some European languages but not in others. All numbers may need to be translated for Saudi or Thai markets. The key point is that, if the tools contain the appropriate information, non-translatables can be transferred automatically to the target document, leaving the translator with more time to address the real translation work.
Continuing with our example, we now have 25% of segments left. This is all text which has been changed in some way. Of this, three fifths (15% of the whole) will match against the source of other databases during source matching. Alternative recalls [3] should be ranked and presented either in a reference area or in a mixture of target file and reference window, according to preference. Whereas with most current systems the translator has to check that recalls are valid in their context for 85% of our example (the unchanged and familiar text), the use of change analysis has reduced this activity to 15% of the text. The remaining 10% of segments in the example could be translated in one of two ways: interactively, with the assistance of dictionary information in the reference window, or by sending them to a machine translation system and post-editing the output, whichever is most appropriate for the material and circumstances in question.
Information management
Improve translation context
The translator needs to understand the context of a sentence or word they are translating for two fundamental reasons:
The translator needs to see text with as much contextual information as possible. Presenting text in paragraphs, rather than segment by segment, is a good first step towards improving contextual information. For the translation of software messages, Rank Xerox has for several years used a system which shows the text in the editor inside the display box into which it must fit. Given the lack of expansion space typically provided for foreign languages by English-speaking UI designers, this is an invaluable tool. It means that the translator can immediately tell if the translation doesn't fit, and can try an alternative translation or abbreviation. This avoids an extremely long-winded, cyclical process of translating, loading up the software, retranslating, loading up again, and so on. Rank Xerox also has a simulator of the user interface, linked on-line into the translation editing environment. The simulator automatically keeps in step with the message shown in the editing environment. This is invaluable for dealing with things such as adjectival agreement (e.g., the word 'enabled' appearing on its own must be translated differently in some languages depending upon the gender or number of the text it qualifies).
Facilitate the sharing of data
In most systems, translators currently work in relative isolation. There is no simple mechanism for immediate sharing of useful information. For example, if a translator makes a change to a dictionary, that change should be communicated immediately and automatically to other translators dealing with the same target language (but not to those dealing with other languages). There should also be an automatic way of providing documentation translators with the appropriate translation for a screen-based icon which was translated previously. Similarly, query management systems are often unwieldy and labour-intensive, and comprehension queries tend to be raised many times over.
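The dictionary-sharing requirement above can be sketched as a simple publish/subscribe hub. Everything here is hypothetical: the `DictionaryHub` class, its method names, and the in-memory lists standing in for translators' inboxes (a real system would deliver updates across a network). What it illustrates is the routing rule from the text: a dictionary change reaches every other translator working into the same target language, and no one working into a different one.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Update = Tuple[str, str, str]   # (source term, translation, author)


class DictionaryHub:
    """Hypothetical central hub that routes dictionary changes by language."""

    def __init__(self) -> None:
        # target language -> {translator name: that translator's inbox}
        self._by_language: Dict[str, Dict[str, List[Update]]] = defaultdict(dict)

    def register(self, translator: str, target_language: str) -> List[Update]:
        """Register a translator for one target language; return their inbox."""
        inbox: List[Update] = []
        self._by_language[target_language][translator] = inbox
        return inbox

    def publish(self, author: str, target_language: str,
                term: str, translation: str) -> None:
        """Deliver a dictionary change to everyone else on the same language."""
        update = (term, translation, author)
        for name, inbox in self._by_language[target_language].items():
            if name != author:
                inbox.append(update)


hub = DictionaryHub()
anna = hub.register("anna", "de")
bernd = hub.register("bernd", "de")
claire = hub.register("claire", "fr")
hub.publish("anna", "de", "paper tray", "Papierfach")
print(bernd)    # bernd, also translating into German, receives the change
print(claire)   # claire, translating into French, does not
```

The same routing idea could carry comprehension queries, with translators subscribing to an individual query rather than to a target language.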
In an ideal world, translators would immediately know whether a comprehension query had already been raised about a particular piece of source text, and would be able to subscribe to that query for as long as they felt they needed to know the answer. Queries should travel quickly and intelligently across networks or modems to and from defined addressees.
Summary
Suggestions for innovation
Recommendations
Openness
Flexibility
Productivity
Information management
Endnotes
[1] Source matching is a Xerox term corresponding to 'repetitions processing', 'translation memory' or 'example-based translation'. A translated string retrieved via source matching is referred to as a recall.
[2] Change analysis is a means of detecting unchanged text during updates and automatically transferring the appropriate translation into the target text. It is a practical application of 'text alignment', and is dealt with in more detail in the main text.
[3] A recall is the foreign text retrieved from the previous database during source matching which corresponds to the source segment matched.