EAMT Graduates to the Conference Circuit: User Orientation Comes into Focus
"Practical Applications of Machine Translation" was the theme for the 10th annual EAMT Conference, recently held in Budapest, Hungary. The common theme? MT developers must learn to view users as investors in MT systems, rather than as end users with constant quality concerns. István Lengyel, Head of Business Operations for Kilgray Translation Technologies and a member of the local organizing committee, provides conference highlights below.
The conference theme, Practical Applications of Machine Translation, suggested that MT is a mature technology homing in on practical implementation. At the same time, it implied that all translation technologies (including translation memory) are slowly beginning to converge.

The conference opened with a keynote speech by Dominique Estival of the Defence Science and Technology Organisation (DSTO, Australia) on the Language Translation Interface (LTI). Drawing on DSTO’s extensive experience with language technology, Estival explained the concept of LTI as a single interface to a variety of translation tools, including MT systems and the translation memory tool Wordfast.

Viggo Hansen delivered the closing speech in his capacity as Secretary of EAMT. He drew attention to the fact that most developers came across as unaware of, or unconcerned about, who the real MT users may be. He reminded developers to view users as investors in MT systems who seek to overcome certain linguistic barriers, rather than as end users with constant quality concerns. Hansen’s closing remarks underlined the common theme running through most of the presentations.

Industry and Research

In addition to European delegates, EAMT also attracted a significant number of researchers and industry representatives from the U.S. and Canada, as well as a presentation from Japan. In front of an audience of over 90 people, 14 oral presentations and 20 poster presentations were delivered. Among the industry presenters were Jörg Porsiel (Volkswagen), András Bocsák (Comprendium), Hans-Udo Stadler (CLS Communication), Terence Lewis (Hook & Hatton) and Rudolf M. Meier (Siemens). Siemens Netherlands saved EUR 231,000 in 2004 through the use of the Trasy Dutch-English MT system, and pre-edited machine translation generated 24% of the total revenue of the Siemens translation department.
Philippe Langlais from the University of Montréal focused on evaluating English-to-French machine translation of weather forecasts in Canada. Ariadna Font Llitjós, Jaime G. Carbonell and Alon Lavie from Carnegie Mellon University presented a paper on interactive and automatic refinement of transfer-based MT. They pointed out that most systems do not substantially improve after human feedback because they cannot update their grammars accordingly, and they suggested a method for automatically refining MT rules. Naoki Asanoma of NTT Cyber Solutions Laboratories presented a method for cost-effectively creating a conversation corpus for speech-to-speech translation by deriving text from very basic ‘germ’ dialogs.

Automatic Evaluation for Identifying Particularly Bad Machine Translations

Translation evaluation is a thorny issue, both in translation studies and in machine translation. While the evaluation of human translation remains subjective, with no agreement on how to quantify the factors involved, MT evaluation is quantitative. The question regarding the latter is whether it correlates with human judgment. A good BLEU score, as Christian Boitet (GETA, CLIPS, IMAG) argued, even if calculated against human translations, may not always reflect good MT quality. But as Anthony Aue (Microsoft Research) pointed out, this is not the only problem: most existing metrics do not help in production environments. Aue, Michael Gamon and Martine Smets from Microsoft Research discussed automatic evaluation of MT quality. They pointed out that evaluation is objective-driven, and that current metrics only work well if the task is a system comparison or longitudinal tracking of system performance, i.e., in research and decision-making scenarios.
Almost all evaluation methods, whether at the multi-sentence level (the BLEU metric and related ones) or at the single-sentence level (Kulesza and Shieber), require one or more human reference translations for each sentence to be evaluated. However, in a production system, it is unrealistic to expect every sentence to have a prior translation. In a production system, like the example-based MT (EBMT) system Microsoft Research is working on, massive amounts of text need to be evaluated automatically, so that the human translator can identify systematic translation errors and post-edit low-quality output. If the system "knows" that its results are not very reliable, it saves the post-editor a lot of work. The goal, therefore, is a sentence-level translation quality metric that does not require human reference translations for each sentence.

In MT evaluation, both form and content need to be checked. However, if there is a strong correlation between the fluency of a sentence and its information content, it is enough to verify form only. The authors propose a system that relies on a combination of a four-gram language model and a classifier (of the support vector machine type). The method requires a linguistic parser and the following types of linguistic data: (1) a set of machine-translated sentences, some of which are annotated by humans for quality, and (2) a target language corpus.

The four-gram language model can be trained on a domain-specific target language corpus to "learn the patterns" of the language, i.e., the probabilities of which words will follow which words. A perplexity score is then calculated, indicating how expected the observed word sequence is in light of the word sequences in the training corpus. The authors propose collecting more information from the same corpus by using a linguistic parser for the target language, so that the resulting scores correlate better with human judgment than perplexity alone.
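To make the perplexity idea concrete, here is a minimal sketch of a four-gram language model scored on two sentences. It is not the Microsoft Research implementation: the add-one smoothing, the toy weather corpus, and the function names `ngrams`, `train` and `perplexity` are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Pad the sentence and return its n-grams as tuples."""
    padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]

def train(corpus, n=4):
    """Count n-grams and their (n-1)-word contexts in a target-language corpus."""
    counts, contexts, vocab = Counter(), Counter(), {"</s>"}
    for sentence in corpus:
        tokens = sentence.split()
        vocab.update(tokens)
        for gram in ngrams(tokens, n):
            counts[gram] += 1
            contexts[gram[:-1]] += 1
    # +1 reserves some probability mass for words never seen in training
    return counts, contexts, len(vocab) + 1

def perplexity(sentence, model, n=4):
    """Lower perplexity means the word sequence is more expected by the model."""
    counts, contexts, vocab_size = model
    grams = ngrams(sentence.split(), n)
    log_prob = 0.0
    for gram in grams:
        # Add-one smoothing (an assumption; any smoothing scheme could be used)
        p = (counts[gram] + 1) / (contexts[gram[:-1]] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(grams))

# Hypothetical domain-specific corpus, for illustration only
corpus = [
    "rain is expected in the north tonight",
    "snow is expected in the south tonight",
    "rain is expected in the south tomorrow",
]
model = train(corpus)
fluent = perplexity("snow is expected in the north tonight", model)
scrambled = perplexity("tonight north the in expected is snow", model)
print(f"fluent: {fluent:.2f}, scrambled: {scrambled:.2f}")
```

The scrambled sentence uses exactly the same words but receives a higher perplexity, which is the signal a fluency-based quality estimate would rely on.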
As Kulesza and Shieber (2004) and Corston-Oliver et al. (2001) point out, there is an assumption that MT output is known a priori to be of worse quality than translations produced by humans. Therefore, a classifier should be created that can establish with high confidence whether a sentence is machine-translated or human-translated.

A linguistic parser is a language-sensitive natural language processing application. In the experiment cited by the Microsoft Research team, English-French machine translation was evaluated, using Microsoft's own NLPWin parser for French. The authors extracted part-of-speech tags, context-free grammar productions (e.g., a noun phrase composed of a determiner, a noun and a prepositional phrase), and some semantic information. Sentences were represented as binary-encoded vectors of such features. The classifier achieved an average accuracy of about 77%.

The authors evaluated BLEU (BiLingual Evaluation Understudy) against the LM (language model) and SVM (support vector machine) metrics. BLEU took advantage of human reference translations, and thus achieved a relatively high (but not high enough) correlation with translation quality (0.58) and a lower correlation with fluency (0.41) at the sentence level. The LM's correlation values were lower: 0.29 for overall quality and 0.34 for fluency. However, it is accepted that BLEU by itself is not enough at the sentence level. A combined LM and SVM score yielded a fluency correlation comparable to that of BLEU (0.42), with the quality correlation also increasing to 0.42.

The practical use of this approach is to identify badly translated sentences, thus decreasing the time necessary to create an acceptable translation using MT. It was shown that the combined LM-SVM metric outperformed all other approaches in finding bad translations. The authors were looking for the worst 5% of sentences in the test set.
These sentences were all annotated for quality by six people, so translation quality was a collective judgment. If we aim to find 60% of the bad translations in a text (i.e., the worst 3% of all translations) and ignore the other 40% (a recall of 0.6), then 65% of the flagged sentences will really be very bad, whereas 35% will not belong to the worst sentences, and we will spend time checking them in vain (a precision of 0.65). If we want a fully acceptable post-edited translation (supposing that only 5% is so bad that it cannot be accepted), i.e., a recall of 1, the precision drops to 0.4. In other words, if you want to correct the 100 very bad sentences, you will have to go through 250 sentences. Promising, isn't it?
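The arithmetic behind these figures can be checked with a few lines of code. This is only a sketch: the function name `review_workload` and the assumption of 100 truly bad sentences are illustrative, while the recall and precision pairs are the ones quoted above.

```python
def review_workload(bad_sentences, recall, precision):
    """Return (bad sentences caught, sentences flagged for review).

    recall    = caught / bad_sentences
    precision = caught / flagged
    """
    caught = bad_sentences * recall
    flagged = caught / precision
    return caught, flagged

# The two operating points mentioned in the text, for 100 truly bad sentences
for recall, precision in [(0.6, 0.65), (1.0, 0.4)]:
    caught, flagged = review_workload(100, recall, precision)
    print(f"recall {recall}: review about {flagged:.0f} flagged sentences "
          f"to catch {caught:.0f} bad ones")
```

At a recall of 1 and a precision of 0.4, catching all 100 bad sentences means reviewing 100 / 0.4 = 250 flagged sentences, which matches the figure given in the text.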
A Blend of Rule-based and Example-based Translation?

Theoretical approaches were also introduced during the conference. MorphoLogic's Gábor Prószéky explained that rule-based (RBMT) and example-based (EBMT) systems are just two extremes of a generalized model. He then proposed a model in which there is an arbitrary number of possible transitions between RBMT and EBMT. The underlying idea behind the system is the rule-to-rule hypothesis of Bach (1976), in which a tight correspondence is imposed between syntax and semantics, so that every syntax rule is also a semantics rule, as applied to translation. Therefore, if a structure can be described syntactically in the source language, then it can also be described by structures within the target language. However, one-to-one matching is not necessary. What is surprising is that this notion is the basis for a quite efficient implementation marketed as an English-Hungarian MT product.

In MetaMorpho, there are no separate rules or lexical entries, and no dictionary as distinct from the grammar. All are stored in the same way. The grammar, which also contains the lexical entries, operates with pairs of patterns consisting of one source pattern used during bottom-up parsing, along with one or more target patterns applied during top-down generation. Every symbol has a well-defined set of features, numbering between zero and a few dozen, each of which can either take its value from a finite set of symbolic items (like case information) or be a string. Patterns are very explicit, and embedded feature structures are not allowed. Some patterns are productive, with little or no lexical information. The core grammar for English, for example, is contained in about a thousand rules. Specific patterns can override general patterns.

The analysis is performed in three steps, and the generation of the target language equivalent is actually a 'by-product' of parsing. First, the sentence is segmented into terminal symbols and tokens.
Then the morphological analyzer determines the morpho-syntactic attributes of these symbols. The system uses MorphoLogic's HUMOR morphological analyzer and generator, which is available for a variety of languages. Moose, the bottom-up parser, then analyzes the input sequence. If it is recognized as correct, the system produces root symbols and a parse tree. When the whole input is processed and no applicable patterns remain, the target equivalent is read top-down from the root symbols by firing the target patterns. Pattern pairs have conditions, and in the case of multiple target patterns, the first satisfactory result is fired. To handle complicated word-order changes, a subtree can be interpreted in different ways.

There is no need to transfer an abstract structure. Since there is no independent transfer after syntactic analysis, the source is analyzed with regard to the final output. There is no interlingual representation either. It may be called a direct method, yet it is too sophisticated to be termed 'simple.' It shows some resemblance to the Rosetta MT system (Landsbergen 1984), which uses logical semantic representations. However, while Rosetta was an interlingua-approach system, MetaMorpho is not. Could MetaMorpho represent a fourth paradigm to rule-based systems? Its patterns are also related to the "translationally equivalent patterns" used in the English-Japanese MT system of Kawasaki et al.

Patterns are created using a grammar writer's workbench, RuleBuilder. Work on integration with a translation memory system has also started. Both the technology and its English-Hungarian implementation are available on the market.

Integration of MT and TM

As statistical machine translation is “halfway” between MT and TM technology, these two approaches are likely to be integrated soon.
Sanjika Hewavitharana, Stephan Vogel and Alex Waibel suggested an interesting hybrid approach: augmenting a statistical translation system with a translation memory, and evaluating the system on a Chinese-English dialog corpus. R.M.K. Sinha from the Indian Institute of Technology, though unable to attend the conference, sent two interesting articles on translation divergence in English-Hindi MT and on the integration of CAT (computer-aided translation) and MT in the AnglaBharti-II architecture. The latter includes TM, raw and generalized example-bases, interactive and automated pre-editing, paraphrasing, failure analysis and a number of heuristics that attempt to deal with a variety of constructs frequently encountered in real-life English text.

It is becoming obvious that a little user interaction can increase the quality of results drastically. Thus, computer-aided human translation and human-aided machine translation may soon become synonyms. Statistical methods for phrase alignment, i.e., sub-segment-level alignment, are able to produce surprisingly good results. This was demonstrated by Chris Callison-Burch from the University of Edinburgh/Linear B. However, in order to incorporate this technology into a translation memory, an intuitive user interface will be required. That being said, it was surprising to me that, for most languages, statistical methods can yield very good results that go beyond even what is possible through TM technology.

NOTE: The proceedings from the conference are still available. Please contact info@morphologic.hu for details. And don’t forget: the most important event in MT this year, the Machine Translation Summit, is being organized by the Asia-Pacific Association for Machine Translation (AAMT) on Phuket Island, Thailand, in September 2005.

István Lengyel, economist and translator, is head of business operations for Kilgray Translation Technologies.
He is currently working on his PhD in translation studies, and lecturing in team translation at the University of Szeged, Hungary. He is also one of the designers of Kilgray’s translation environment, MemoQ. Lengyel can be reached at istvan.lengyel@kilgray.com.