LISA Home page [© 2010 • ISSN 1420-3693 • www.localization.org]
© 2010 SMP Marketing • ISSN 1420-3693 • www.localization.org
Integrating Content Management and Language Technology
An Alternative to MT

Dietmar Boie, Director of Operations, Language Intelligence, Ltd.

Each and every language technology project is a challenge because the requirements are always different. In this article, one of LISA’s newest members, Dietmar Boie of Language Intelligence, Ltd., describes why his company chose to implement an integrated solution (XML-based CMS with translation memory and glossary management), rather than an MT-based solution, for a client whose primary concerns were quality and cost. Preliminary data indicate a very high rate of return with a comparatively small upfront investment.


In the beginning, it truly was an adventure. Being used to working with off-the-shelf language technology products, such as translation memory systems, we felt a little like Robinson, Adam and Eve in one person, as we had to cut back the jungle and bite into a number of poisonous apples. Many things were not as easy as we had expected. With time and experience, we have been able to cut down the adventure part and now work with a system we can safely and proudly offer to our customers in exchange for payment. This article describes CLS Machine Translation today and takes a look at some of the issues we have had to overcome during the “adventure phase.”

The Debate Around Machine Translation

Over the past few decades, major advances in the development of machine translation tools have been announced at ever shorter intervals. In 2003/04 alone, several conferences and workshops focused on the promise and reality of this Holy Grail of language technology.

The proponents of the technology argue that global development and the exponential increase in the availability and distribution of information in all languages, and the resulting need for translation of this information in real-time or in very high volume, create a business imperative for the use of machine translation. The opposition is equally vocal and principled in its judgment of the technology, pointing out that the problems with today’s machine translation technology are the same ones that have plagued it from the very beginning.

First Things First: Understanding the Requirements

The issue will likely remain hotly contested between the pragmatists who insist that global business really has no choice but to implement this admittedly immature technology, and the idealists who argue that quality and the intelligibility of information should not be sacrificed for expediency. Thus, when one of our clients came to us for advice on which combination of the available language technology tools to deploy for their publications department, Language Intelligence developed a solution which offered significant benefits in terms of cost and time savings, without requiring a large-scale, upfront investment in a technology with an uncertain ROI.

First, we determined the basic requirements for the right solution. The system would be a language technology solution, would incorporate available mature technology, and would minimize redundancies and overlaps. Up until this point, the client had only done sporadic localization. Many of the existing localized documents had been translated by in-country offices, while some locales used the English documents. No language tools were used, and any terminology management being done was on a document basis by in-country offices. Quality control, such as it was, was done only on individual documents by in-country reviewers.

Figure 1. Original Translation Process.

The client needed to support eight languages (Chinese, Dutch, English, French, German, Italian, Portuguese, Spanish) initially, with an additional eight languages possible within the next few years. The expected volume was between 200K and 500K words per language annually.

The required language technology solution had to be integrated into a content management system (CMS), which was being launched simultaneously. The purpose of the CMS was to control authoring and input into the system, as well as to repurpose as many of the assets as possible in both the source and target languages. The output of the system was to be TMX (Translation Memory eXchange) or a similar format, so that assets would be stored in an open format. This would allow them to be integrated into other tools and to be shared with future participants in the localization process. Terminology and quality would be managed via a robust glossary tool that would integrate with both the CMS and the language technology solution.
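To illustrate what storing assets in an open format can look like, the following is a minimal sketch in Python that writes one translation unit as TMX 1.4. It is not the client's actual export; the function name build_tmx, the tool name in the header, and the sample segment are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the reserved xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def build_tmx(units, srclang="en"):
    """Build a minimal TMX 1.4 tree from a list of {language: text} dicts."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-exporter",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "unknown",                  # original TM format, not known here
        "adminlang": "en",
        "srclang": srclang,
        "datatype": "xml",
    })
    body = ET.SubElement(tmx, "body")
    for unit in units:
        tu = ET.SubElement(body, "tu")       # one translation unit
        for lang, text in unit.items():
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx

# Illustrative segment pair, not taken from the client's data.
tmx = build_tmx([{"en": "Power switch", "de": "Netzschalter"}])
tmx_text = ET.tostring(tmx, encoding="unicode")
```

Because the result is plain XML, any tool that reads TMX could in principle import these units, which is the portability the requirement was after.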

The language technology solution also had to be compatible with the company’s main publication applications (Adobe FrameMaker and InDesign) and to be supported by the company’s existing publication team. It could not be a proprietary solution, and it had to offer significant, quantifiable financial and time savings across the documentation and software publication processes.

Figure 2. Integrated CMS/TM workflow

Machine Translations – User Reactions

The challenge was to find the right solution to meet all of the process requirements with the best possible ROI (return on investment) for our client. The options and combinations investigated included process controls and tools, such as (1) Controlled English and other forms of controlled authoring, (2) full-scale automated machine translation, (3) human post-editing, (4) translation memory (TM) and (5) the control of assets for publication in a CMS. Ultimately, any combination of these components would have met most technical requirements with some customization. The machine translation option with human post-editing would have been able to process the largest number of assets for localization in the shortest possible time frame, but would likely have also required the most significant investment in both personnel and hardware. Although the price of machine translation systems has come down, the required maintenance and preparation are still relatively expensive.

Our client’s primary concerns were quality and cost, with time-to-market a distant third. There is little hard data about the current quality of automated machine translation, and only a few attempts have been made to determine an objective measure of quality. Samples of machine translation results from today and ten years ago by John Beaven, in “MT: 10 Years of Development”, show improvement in certain areas. However, the machine-generated translations are still not elegant or even good. Guerra attempts to objectively evaluate the speed advantage of machine translation over human translation, assuming that translation quality is the same, but neglects to define what constitutes quality (Guerra, 2004).

This rather undifferentiated assumption about quality is countered by Jeff Allen’s observation (Allen, 2004) that “[t]he concept of the word quality is relative to customer requirements and expectations, the time constraints of the project, the target register of language and style, technical accuracy and so on.” Morland’s study about NCR’s experience with deploying a fully automated, on-demand machine translation solution for non-native users of their Global Learning Division of Human Resources underscores the fluidity of the quality concept in translation. He concludes that “[t]hose who speak English well will prefer to read English rather than a clumsy and occasionally inaccurate version of their native language.” (Morland, 2002). Kamprath and Adolphson detail the investment for deploying, updating, and maintaining a fully automated machine translation environment with tightly controlled input through Controlled English as highly capital- and resource-intensive (Kamprath and Adolphson, 1998). A complex solution like this would clearly have exceeded the limitations imposed on budget and staff levels for our client.

An Integrated Solution – The Best of Both Worlds

The solution we envisioned would enhance our client’s current and future publication environment. It was centered on an XML-based Content Management System, which contains all assets for publication and allows for repurposing of existing text in any language in a variety of document formats and types. A detailed style guide, combined with an effort to replace text with symbols wherever possible, was designed to significantly reduce the amount of source text, which in turn would reduce the volume of assets requiring localization.

We chose Localization Suite with MultiTerm from TRADOS, one of the most robust glossary tools on the market, to centrally manage the terminology and the translation memory tools for the localization process. This language technology solution would integrate with and utilize the strengths of all other components and aspects of the workflow, while providing the highest translation quality with a measurable and significant reduction in cost.

The current process calls for identifying assets for localization from within the XML-based CMS and preparing the output for localization. The localization partner then analyzes the XML output for text and segments that can be repurposed. The ultimate goal is to store all localized assets in the CMS and assemble localized documents on demand from existing assets. The localization partner receives XML output in multiple languages, with the text requiring localization being identified through pre-determined mark-up. Creating complete documents with previously localized segments from the CMS in a file format that can be directly imported into the TM tool provides the translators and editors with all necessary, client-approved context and reference text they require. The edited documents are then returned to the client in the familiar source-target format for client review. After review, the localization partner evaluates all changes vis-à-vis the existing TM, incorporates all required changes, and then produces final XML files, which are sent back to the client for incorporation into the CMS and final production.
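The analysis step in this process can be sketched roughly as follows. The example assumes a hypothetical localize="yes" attribute as the pre-determined mark-up and treats the translation memory as a simple source-to-target lookup with exact matches only; the element names, the attribute, and the function split_by_tm are illustrative and are not the client's DTD or the TRADOS interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical CMS output; the localize attribute stands in for the real mark-up.
SAMPLE = """<datasheet>
  <para localize="yes">Press the power switch.</para>
  <para localize="no">Model XJ-200</para>
  <para localize="yes">Disconnect the cable before servicing.</para>
</datasheet>"""

def split_by_tm(xml_text, tm):
    """Split flagged segments into those the TM already covers and the rest."""
    root = ET.fromstring(xml_text)
    reusable, to_translate = {}, []
    for elem in root.iter():
        if elem.get("localize") != "yes":
            continue                          # not marked for localization
        source = " ".join((elem.text or "").split())  # normalize whitespace
        if source in tm:
            reusable[source] = tm[source]     # exact match: repurpose as-is
        else:
            to_translate.append(source)       # goes to the translator
    return reusable, to_translate

# Illustrative German TM with a single entry.
tm_de = {"Press the power switch.": "Drücken Sie den Netzschalter."}
reusable, to_translate = split_by_tm(SAMPLE, tm_de)
```

In this sketch only the second flagged paragraph would be sent out for translation; the first is filled from existing assets, and the unflagged model number is left untouched.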

The Bottom Line – A High Return With a Comparatively Small Investment

This integrated solution has required little customization beyond (1) establishing DTDs (Document Type Definitions) and a content management schema for the XML system that can be used by TRADOS, (2) setting up a workflow for all project participants, and (3) creating and maintaining a MultiTerm glossary and translation memory files. Most of the training required has been done on the customer’s end on the CMS, to help users identify and manage the amount of text for publication.

The implementation was accomplished in a relatively short time frame with the existing personnel in the Publications group. The process environment is currently in its first year of real-time application, so there is not yet enough data available to create an accurate long-term ROI analysis. However, early data indicate significant time and cost savings. For example, Version II of a set of datasheets, which were created from the CMS as XML documents, shows a repetition rate of 75% when analyzed against the existing TM (translation memory). In addition, initial estimates indicate that authoring in a controlled environment and using symbols and graphics will reduce the source text in most document types by up to 50%. As this integrated system becomes more robust and capable, it will further reduce the cost of creating high-quality localized documents.
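One simple way to arrive at a repetition rate is to weight each segment by its word count and measure the share of words that fall in segments already present in the TM. The sketch below makes that assumption and counts exact matches only; commercial TM tools also report fuzzy-match bands, which are not modeled here. The function name and the sample segments are illustrative and do not come from the client's datasheets.

```python
def repetition_rate(segments, tm_sources):
    """Share of source words that sit in segments exactly matching the TM."""
    def normalize(text):
        # Ignore case and whitespace differences when comparing segments.
        return " ".join(text.lower().split())

    known = {normalize(s) for s in tm_sources}
    total = matched = 0
    for segment in segments:
        words = len(segment.split())
        total += words
        if normalize(segment) in known:
            matched += words
    return matched / total if total else 0.0

# Illustrative data: four segments of four words each, three already in the TM.
new_version = [
    "Press the power switch.",
    "Close the front cover.",
    "Replace the air filter.",
    "Check the oil level.",
]
existing_tm = new_version[:3]
rate = repetition_rate(new_version, existing_tm)
```

With these made-up segments, 12 of 16 words are covered by the TM, so rate comes out at 0.75.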

Unlike a machine translation system, there is no need to buy or own anything other than the CMS, which has the added benefit of managing all text and graphic assets in source and target languages. There is no upfront investment for deploying, customizing or maintaining a machine translation system that would still require human post-editing for publication and yet not offer the benefits of managing content. The actual localization tool is owned and maintained by the localization partner at a negligible cost, while the CMS provides the client with the benefit of ownership of the localized assets in XML format.

Compared to a more complex machine translation solution, initial ROI projections indicate that, with a minimal upfront investment, the client gains approximately a 30% reduction in the localization component cost of a given project (translation, edit, review, update), while reducing the text to be localized by about 50%. There is the added benefit of being able to re-purpose all localized assets at an ever-increasing rate. On a hypothetical $50,000 localization project, the initial reduction of text through controlled authoring, style guide and re-purposing of assets reduces the cost by half to $25,000. A 75%, asset-based repetition rate will shave off approximately another 50% of the remaining project cost, so that the total cost for the original $50,000 project will be reduced to only $12,500, or 25% of the original projected cost!
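The arithmetic behind this hypothetical project can be written out as a short calculation. The function and parameter names are illustrative; the two 50% factors are the approximations stated above (text reduction through controlled authoring, then the saving attributed to a 75% repetition rate), not measured values.

```python
def projected_cost(base_cost, text_reduction, reuse_saving):
    """Apply the two savings in sequence and return both intermediate costs."""
    after_authoring = base_cost * (1 - text_reduction)   # less text to localize
    after_reuse = after_authoring * (1 - reuse_saving)   # TM-based repetition
    return after_authoring, after_reuse

base = 50_000
after_authoring, final_cost = projected_cost(base, text_reduction=0.50, reuse_saving=0.50)
share_of_original = final_cost / base

print(after_authoring)    # 25000.0
print(final_cost)         # 12500.0
print(share_of_original)  # 0.25
```

The savings multiply rather than add: two successive 50% reductions leave 0.5 × 0.5 = 25% of the original cost, which is how the $12,500 figure is reached.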

In addition, the combination of XML assets from the CMS with TM and human translation/editing also significantly reduces time-to-market. In this integrated scenario, our client can produce high-quality localized documents and realize significant cost and time savings without a costly upfront investment in an immature technology, such as MT, that currently requires complex and expensive support.

Lessons Learned

The result of our research and evaluation underscores a few important points. When investigating language technology options for any scenario, it is important to understand that no two situations are alike. Requirements may cover very different environments:

  • translation for gisting, i.e. abstracting in imperfect language the “gist” of any given text
  • translation for publication
  • language pairs that may be more or less difficult to handle in automated environments
  • the amount of financial and personnel resources available to support a given scenario
  • time-to-market versus quality of translation
  • translation reviews by the client
  • glossary maintenance
  • and last but not least, any corporation’s pain threshold for initial investment in language technology.

The debate around machine translation is likely to remain with us for years to come. The proponents of MT argue that the implied time and cost imperative is sufficient to justify the adoption of a technology that is not perfect and still requires human intervention in the form of post-editing to produce publishable output (see “Electric Word: Threat or Thrill?” by Jaap van der Meer). They usually favor machine translation technology under tightly controlled circumstances and certain process exigencies. The opposition claims that, even after more than four decades of development and advances, the technology is far from perfect. Most automated solutions and high-end machine translation tools cannot produce translations for publication without human post-editing. Hutchins notes, for example, that “it does seem strange that after nearly 50 years of MT research some commercial systems still produce incorrect morphological forms…, incorrect noun-verb agreements, incorrect adjective-noun orders, and incorrect placements of verbs at the beginnings or at the ends of sentences and clauses” (Hutchins, 2002), a criticism echoed by Reinhard Schäler in his article “A New Dawn for Machine Translation?” (Schäler, 2004).

Whether one favors the technology because of a perceived exigency, or rejects the tools as too immature, the fact remains that companies are employing the technology with varying levels of success. Much of the available data focuses not so much on the ROI, but rather on other issues related to MT, e.g., the human response to documents and texts generated by machine translation (Morland, 2002), or automated and human assessments of machine translation quality (Coughlin, 2003). It is curious that a technology that has garnered so much attention in the debate on globalization has produced so few hard, quantifiable results on ROI or bottom-line savings. If these systems met the expectations of their users, one would expect to find multiple studies and ROI analyses supporting a technology that surely would have to be one of the most sought-after tools in the global market place. The fact that almost no information on ROI is available makes this either one of the most closely-guarded corporate secrets, or, more likely, a case where the costs associated with deploying and maintaining a capable large-scale, real-time system may equal or surpass those of the traditional approach combining process controls, human translation and editing with machine-assisted translation tools like translation memory, glossaries and content management components.

Whichever side of the debate one happens to find oneself on, the absence of hard data in support of either argument compounds the difficulty in evaluating the best language technology solution for a given scenario. The most important guidelines in any implementation are to understand the current requirements of the users, to choose non-proprietary solutions and to keep your eyes on the ball. The solution must, at the very least, meet the users’ current requirements and should be flexible and scalable enough to grow with the organization. Ultimately, a well-designed implementation of a machine translation tool can also produce significant benefits to the end user, especially where gisting of largely diverse, independent corpora is concerned. For more structured environments, however, other more mature solutions exist for producing high-quality localized assets with significant cost savings and a documented ROI.

References

Adolphson, Kamprath, Mitamura and Nyberg. “Controlled Language for Multilingual Document Production: Experience with Caterpillar Technical English.” Proceedings of the Second International Workshop on Controlled Language Applications (CLAW '98).

Allen, Jeff. “Thinking about Machine Translation.” Perspectives on Machine Translation: MLC&T #62 supplement, p.3. 9 Mar 2004.

Beaven, John. “MT: 10 Years of Development.”

Capussotti, Carlo. “A Translator Protests Poor Use of Machine Translation.” Perspectives on Machine Translation: MLC&T #62 supplement, pp. 11-12. 9 Mar 2004.

Coughlin, Deborah. “Correlating Automated and Human Assessments of Machine Translation Quality.” Proceedings of MT Summit IX, pp. 63-70. 2003.

Guerra, Lorena. “Machine Translation: An Imperfect But Evolving Technology.” Perspectives on Machine Translation: MLC&T #62 supplement, pp. 5-6. 9 Mar 2004.

Hutchins, John. “Has machine translation improved? Some historical comparisons.” Proceedings of MT Summit IX.

Morland, D. Verne. “Getting the Message In: A Global Company’s Experience with the New Generation of Low-Cost, High Performance Machine Translation Systems.” Machine Translation: From Research to Real Users. Ed. S.D. Richardson. 5th Conference of the Association for Machine Translation in the Americas (AMTA), October 6-12, 2002.

Schäler, Reinhard. “A New Dawn for Machine Translation?” Perspectives on Machine Translation: MLC&T #62 supplement, p.4. 9 Mar 2004.

van der Meer, Jaap. “Electric Word: Threat or Thrill?” Globalization Insider, Vol. XII, No. 4.5, 3 December 2004.


Dietmar Boie is Director of Operations at Language Intelligence, Ltd. He has more than ten years’ experience in the localization industry and has managed implementations of integrated Content Management and Workflow systems. Language Intelligence specializes in customizing and integrating language technology and GILT services for its global clients. Dietmar can be reached at dboie@languageintelligence.com.



