Making Money with Machine Translation
Every Cash Cow Starts Out as a Calf!
Several of our customers told us, “Our people keep sending confidential texts out to free machine translation systems on the internet. This is clearly a security issue; we must offer them a secure alternative. I don’t want to see any more of our documents on unidentified web servers. Do something about it!” If a customer actively asks for such a service, there must be some money in it, we thought, and that’s how our machine translation (MT) adventure began at CLS Communication.
In the beginning, it truly was an adventure. Being used to working with off-the-shelf language technology products, such as translation memory systems, we felt a little like Robinson, Adam and Eve in one person, as we had to cut back the jungle and bite into a number of poisonous apples. Many things were not as easy as we had expected. With time and experience, we have been able to cut down the adventure part and now work with a system we can safely and proudly offer to our customers in exchange for payment. This article describes CLS Machine Translation today and takes a look at some of the issues we have had to overcome during the “adventure phase.”
CLS Machine Translation Today
To our users, CLS MT looks very similar to any free translation service on the web. Via a web interface, it supports translation of text fragments, as well as entire documents. The interface allows the user to specify translation parameters such as source and target language or subject. Since we developed our own interface, we can easily adapt it to a customer’s corporate identity. Users have the choice between fifteen language versions (German-English, English-German, German-French, French-German, English-French, French-English, English-Spanish, Spanish-English, Spanish-German, German-Spanish, English-Italian, Russian-English, English-Russian, Russian-German, and German-Russian). The system now accepts documents in several formats, including .doc, .rtf, .htm and .txt.
Figure: Web Interface to CLS Machine Translation
Terminology
CLS Machine Translation integrates 30,000 terms in four languages (German, English, French and Spanish) from our terminology database, CLSTerm, and 1,250,000 translation units of translation memory data. The terminology mainly covers the financial, insurance, legal and telecoms sectors. Our MT team has coded an additional 30,000 entries, including the names of companies and people, as well as unknown words identified in customer texts. As our MT team is permanently available for dictionary coding, it is able to quickly integrate a new customer’s terminology into the system to serve that particular customer.
Security
Security concerns were the mother of our MT project, as I explained earlier, so security is still a top priority: Data transfer between customers and our MT server is encrypted, based on SSL technology. Some customers even prefer to have their own dedicated MT server, accessed via a direct line, so that their data never travels over the internet. The system boasts a current uptime of 99.9%, with the MT team offering technical and linguistic support during office hours.
Uses
Most customers use CLS MT directly through its web interface, sending entire texts or just looking up single words and expressions in other languages. Currently, this type of use is what generates most of the money made with the service. For some of our customers, texts translated by the machine are post-edited by human translators at reduced rates compared to human translation. Such texts are typically internal documents that customers use purely for information purposes.
Turning the Prototype Into a Product
When we first installed our MT engine and performed a couple of tests, everything ran smoothly. This, of course, was because we were all very “kind” users, in the sense that we only used the system as described in the documentation.
However, when we made CLS MT available to our first test customers, system stability was a big issue. The system would crash when a customer sent a document that was too big, a Word document(!), or just because it did not feel like working that day, it seemed. We could not possibly offer an MT service on the market with considerable downtime, nor were we able to teach all of our customers how to convert a Word document to RTF, so this was a very urgent issue. In other words, we needed to turn the prototype into a product. First, we had to closely monitor the system “by hand,” restarting the servers whenever necessary (nice job looking at uptime controls all day…). Later, architectural changes reduced the number of crashes, and monitoring was automated.
Terminology Database Data Is Not MT Terminology Data
As CLS already had a terminology database (TDB) of more than 60,000 entries, we expected to be able to simply transfer them from one system to the other, exporting them from the terminology management system and importing them into the MT terminology module. However, in many respects, TDB data is not MT terminology data. TDB data’s target audience is human, while MT terminology data’s target audience is a machine. Even though our TDB is very elaborately structured, including the definitions, usage and context sources for each entry, etc., the only fields that were of real use to the MT module were the actual terms along with their gender. As the two systems used very different classification systems, the subjects could not be transferred and recycled either. Moreover, our TDB system is concept-centered, i.e. synonyms such as investment fund, unit trust and FCP (and their French equivalents fonds de placement, fonds d’investissement and FCP) appear in the same entry. However, the MT’s terminology module is word-centered, i.e. the terms investment fund, unit trust and FCP all have their own entries and transfer definitions to the target terms.
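The gap between the two data models can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the function names `all_candidate_pairs` and `first_synonym_pair` are assumptions made for this example, not the actual CLSTerm or MT dictionary formats. It shows how one concept-centered entry expands into many possible word-centered term pairs, and what the simplest automatic mapping (first synonym per language) keeps.

```python
from itertools import product

# Concept-centered entry (TDB style): one entry holds all synonyms per language.
# The layout is hypothetical; the terms are the ones quoted in the article.
concept_entry = {
    "en": ["investment fund", "unit trust", "FCP"],
    "fr": ["fonds de placement", "fonds d'investissement", "FCP"],
}


def all_candidate_pairs(entry, src, tgt):
    """Return every source/target combination a word-centered dictionary could hold."""
    return list(product(entry[src], entry[tgt]))


def first_synonym_pair(entry, src, tgt):
    """Keep only the first synonym in each language and drop the rest."""
    return (entry[src][0], entry[tgt][0])


pairs = all_candidate_pairs(concept_entry, "en", "fr")
print(len(pairs))  # 3 English terms x 3 French terms = 9 candidate pairs
print(first_synonym_pair(concept_entry, "en", "fr"))
```

Nothing in the concept-centered entry says which of the nine candidate pairs is right in which context, so an automatic export either guesses or discards most of them.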
Consequently, all TDB entries including synonyms had to be hand-coded if no term pairs were to be lost. To be transferred automatically, these entries would also have needed to contain information about when each term pair is to be used. To deal with this in a pragmatic way, we decided to automatically import the first of the synonyms in each language, thus sacrificing all of the other possible term pairs in each entry. In a time-consuming process, we then had to comb through all the imported entries to check the imported term pairs. In the meantime, the CLS terminology team is working away, and of course, we need to include their current work in the MT system, too. As the issues described above cannot be solved quickly, the transfer of the new TDB entries created by the terminologists still involves a large amount of hand-coding.
TM Data for Humans Is Not TM Data for Machines
Translation Memory (TM) data was surprisingly easy to export from the TM system and import into the MT system. Here, the issue was more with the actual TM content. Again, TM data for human translators is not necessarily TM data for machines. Oftentimes, source and matching target segments depend on a specific context and are not of a generic nature. In one of the problematic segments, for example, the German title Zweck ‘purpose’ was translated as What you can use your bank card for, which was fine in the particular letter the segment stemmed from. In most other contexts, however, this would be a rather awkward translation. For a human translator, this is easy to spot, and she will not make any mistakes because of this. The machine, however, will take any match for granted… In order to deal with this problem, we have had to comb through the translation memories (partly using automated scripts, partly through a manual process) and delete such non-generic and context-specific sentence pairs.
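The article does not say what the automated scripts checked. As one plausible heuristic, a script could flag segment pairs whose source and target differ sharply in length and hand them to a human reviewer. The sketch below assumes exactly that; the function names `is_suspect` and `split_memory` and the threshold `max_ratio` are hypothetical and would need tuning per language pair, since German compounds alone can skew word counts.

```python
def is_suspect(source: str, target: str, max_ratio: float = 3.0) -> bool:
    """Flag a segment pair whose word counts differ by more than max_ratio."""
    src_words = len(source.split())
    tgt_words = len(target.split())
    if src_words == 0 or tgt_words == 0:
        return True  # an empty side is never a usable translation unit
    ratio = max(src_words, tgt_words) / min(src_words, tgt_words)
    return ratio > max_ratio


def split_memory(pairs, max_ratio: float = 3.0):
    """Split TM pairs into those kept for MT and those sent to manual review."""
    keep, review = [], []
    for source, target in pairs:
        if is_suspect(source, target, max_ratio):
            review.append((source, target))
        else:
            keep.append((source, target))
    return keep, review


# Example pairs: the first is the context-specific segment from the article,
# the second is the generic translation of the same title.
memory = [
    ("Zweck", "What you can use your bank card for"),
    ("Zweck", "Purpose"),
]
keep, review = split_memory(memory)
print(review)
```

A length check like this only narrows the search. Pairs of similar length can still be context-specific, which is why a manual pass remains necessary.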
As this is quite labor-intensive, we cannot possibly integrate newly produced TM content into the MT system without the time delay required for cleaning the data. In summary, this is the advice that we learned the hard way while bringing CLS Machine Translation from the jungle to the market place:
Making Money?
If you’ve been reading carefully, you may have noticed that the term money appears more often next to the term invest than next to the term make. This clearly mirrors the situation during the first phase, the “adventure phase,” when the actual product is being defined and set up. There is money in MT. However, if you want to make some, you need to be very patient and plan for the long term. Every cash cow starts out as a calf!
Monika Röthlisberger joined CLS Communication six years ago and is now responsible for translation memory and machine translation support for translators. She is a trained translator and terminologist and began her career in the language industry eight years ago with TRADOS Switzerland. Röthlisberger can be reached at monika.roethlisberger@cls.ch.