Linguistic Tools of the Trade
XEROX Ushers in the Future
Monica Beltrametti has been Director of the Xerox Research Centre’s laboratory in Grenoble, France, since 1995, and has played a key role in its development. The Grenoble Laboratory concentrates mainly on the study of language itself, with the aim of eliminating the barriers that hinder the distribution and manipulation of electronic and paper documents in different languages. As well as enriching the company’s intellectual capital through her research, she is responsible for “guiding Xerox into the future”, that is, for increasing the company’s business value by transferring technology to the market. Michael Anobile and Deborah Fry spoke with her about her work.
Michael Anobile: You joined the Xerox research organization four and a half years ago. What made you take that step at the time?
Monica Beltrametti: Before joining Xerox, I managed the computing and network organization at the University of Alberta in Canada. Part of our dream was to be able to print from anywhere to anywhere, to have a “library without walls”, and Xerox was one of the companies I approached to see if they could help us realize that plan. The Centre was created around the same time, and it seemed a natural transition to go from being a Xerox customer to being a Xerox person myself, researching and marketing these systems.
Deborah Fry: Why have you chosen to develop language processing tools in Europe?
Monica Beltrametti: Xerox started research in linguistics in the United States almost 20 years ago, with Palo Alto as its first center. The emphasis was on pure research rather than on transferring it to the market. When the Grenoble Centre was created, linguistics was felt to be a natural research topic for Europe, because of its cultural and linguistic boundaries. If you want to do business in Europe, you have to eliminate some of these barriers. So the idea was to do linguistics research aimed at abolishing them.
Deborah Fry: So how have you gone about developing your particular tools?
Monica Beltrametti: I think we have a very systematic way of developing our tools. We have our underlying technology, finite-state technology, and we have various linguistic technologies. First of all, we implemented the basic components that we regard as our building blocks, such as morphological analyzers, language testers, disambiguators, etc. These can be sold alone for incorporation into other people’s products, e.g. to manage multilingual documents, for indexing, etc. However, they can also be combined with each other to construct higher-level applications.
Michael Anobile: What markets and business sectors is Xerox targeting?
Monica Beltrametti: One market is the OEM information retrieval market. It’s really a horizontal market, with customers like Oracle, America Online and Verity, who have incorporated our linguistic tools into their products. The other market we are pursuing is a vertical one: authoring and translation of product documentation.
Deborah Fry: How do these two fit together? On the one hand, you’re targeting end users and putting multilinguality into the machine; on the other, you’re empowering specialists. Is there a tension here or a synergy?
Monica Beltrametti: The two markets are totally different, although they have the tools in common. The authoring and translation tools are built on the components that we sell for the information retrieval market, but they target another market. So from the architecture point of view there is a lot of commonality, but the markets are totally different. There are no conflicts, because we have two different companies within Xerox targeting the two markets.
Inxight, Xerox’s start-up company, handles the OEM information retrieval market, and a new business unit, Multilingual and Knowledge Management Services (MKMS), has been created here in Grenoble for the authoring and translation market. The unit reports to one of Xerox’s business groups, the Document Services Group (DSG).
Michael Anobile: How are people going to use your products in each case?
Monica Beltrametti: MKMS’s clients are professional authors and translators at global corporations that primarily produce complex machinery, e.g. in aerospace, automotive, earth-moving equipment, etc. These corporations generate lots of user and maintenance documentation, and we help their authors and translators work more effectively. Above all, they need to use terminology consistently. You can’t talk about car “boots” and “trunks” in the same document. So when the authors write the documentation, the tools check whether the terms used are consistent and whether they are part of the allowed vocabulary. Often only about 30% of the documentation changes between product versions, so the tools also recognize which parts of the document have already been translated in the past.
Deborah Fry: So you offer memory tools?
Monica Beltrametti: Yes. Translation memory tools ensure that only the new parts of the documentation are translated. MKMS’s three major products are terminology management tools for authors and translators, translation memory tools for translators, and a translation aid for end users. The translation aid translates words in context and is primarily aimed at field maintenance support staff. These people often receive documentation in English because it’s too expensive to produce in other languages, but their English is not good enough to understand all of it. Now they can just click on any word they don’t understand, and it will be translated.
Michael Anobile: How much of this are you using in-house?
Monica Beltrametti: Most Xerox documentation is translated at Welwyn Garden City, at the Xerox Limited Technology Centre. They also use our tools to translate into various languages, and their feedback has made us much more efficient. So far we have given them tools for 7 languages, and we are going to transfer 5 more languages within the next year.
Michael Anobile: And are any of them double-byte languages?
Monica Beltrametti: So far, no. There are 7 European languages, and next year we are going to add some Eastern European languages: Hungarian, Polish, Czech, and Russian.
Michael Anobile: Have you been looking at portable platforms for the field maintenance market? I understood your products were client-server based.
Monica Beltrametti: The translation aid tool runs standalone on a portable computer.
Deborah Fry: So in fact you now have a very powerful linguistic engine on a very small machine?
Monica Beltrametti: Yes, part of the strength of our technology is that it’s very compact.
Michael Anobile: This is a good lead-in to the question of how the Web will facilitate online translation.
Monica Beltrametti: The major Xerox customers are big global corporations; Xerox has not yet pursued the classic Internet systems market. That is not to say that we will not pursue it in the future.
Deborah Fry: If you work for one of these large companies, you often have the benefit of homogeneous processes and standardized vocabularies and texts. When you move out onto the Web, the process is much more fragmented, which makes consistency in translation and authoring much more difficult. What are your thoughts on that?
Monica Beltrametti: That’s certainly true, but if you look at product documentation, you have restricted texts and specific terminology. With general texts, you need a very large dictionary to cover all the possibilities, and it’s often unclear which terminology is needed (legal, scientific, etc.).
There are ways around this, though: you can always ask the user to enter the text category first if you don’t want to keep enormous dictionaries online. This is not a business we have pursued yet, and it’s unclear whether we will. Also, if we do pursue Internet business, it may not be translation business. There are many other things linguistic tools can do, such as extracting terminology from texts.
Michael Anobile: Do you think morphological analysis is an application people would use?
Monica Beltrametti: I don’t think end users are interested in morphological analysis, but they might be interested in submitting a document, having it indexed correctly and assigned keywords, so that they can store it and later retrieve it.
Deborah Fry: How do you think these applications are going to impact the localization and translation business?
Monica Beltrametti: Well, people want to produce documentation faster and more cheaply. Speed is essential, because you don’t want a product that is ready but can’t be launched because the documentation isn’t. This is becoming more of a problem with increasing globalization: companies now have to translate documentation into 30 languages, whereas in the past it was perhaps 10. For these firms, it’s no longer a question of producing the documentation more cheaply, but of having it done quickly so that they don’t waste millions of dollars on products waiting around to be sold. Faster and more efficient tools mean that time-to-market will improve and translation into more languages will be more efficient.
Michael Anobile: Will these developments cover the entire localization lifecycle?
Monica Beltrametti: That’s the goal. If the whole lifecycle doesn’t get faster, you cannot improve your time to market.
Deborah Fry: Can you see another paradigm emerging: keep the documentation in English, and translate or add keywords on demand?
Monica Beltrametti: I don’t think the end user will want this, or only for certain applications. If you have a user manual in your car, you want to read it in your own language right away. It has to be there when you sell the car. However, more and more people will want customized documentation. If you have a car without a sunroof, then you don’t want a user manual that reminds you, “Oh damn, I should have bought that option.” I don’t see a real use for translation on demand. If you decide you are going to sell cars in 30 different countries, then the documentation has to be there in 30 different languages. Traditionally, manufacturers have translated documentation into a few languages themselves and let the importing countries handle the others. This presents a lot of difficulties: it’s slow, the quality isn’t good, and the companies lose revenue. The new paradigm will be to centralize product documentation translation.
Deborah Fry: Does that mean in-house, or working with freelancers and external suppliers?
Monica Beltrametti: All these models are open. It basically means that one organization will manage it all, instead of the importing countries. Companies that generally outsource other business will outsource their translation as well. Other companies want to have some of the tools in-house.
Michael Anobile: Can you give me an indication of how much of Xerox’s translation business is outsourced?
Monica Beltrametti: Welwyn Garden City does translation for Xerox. Full stop. MKMS has two branches: one is responsible for the Centre’s tools, which are integrated into the customer’s environment. The other handles outsourcing, so if people don’t want to have the tools in-house, that part of MKMS can offer them the service they need.
Michael Anobile: So you are actually a service supplier on the open market as well?
Monica Beltrametti: Yes!
Deborah Fry: So how big is the MKMS operation?
Monica Beltrametti: The MKMS tools branch was created in January, and it now employs nearly 10 people here in Grenoble. It will expand considerably next year, but the official numbers aren’t public yet. The outsourcing branch has around 40 employees, but it’s hard to get an exact headcount, because they hire people on demand, depending on the jobs they have. They already have very substantial revenues.
Michael Anobile: Moving back to the technology side of things, what is your perspective on translation memory technology versus machine translation technology?
Monica Beltrametti: If you talk to professional translators of product documentation, most of them prefer to work with translation memory. It’s partial, but it is also precise. There are machine translation systems being pushed heavily at the moment, and over the years they will certainly improve. MT applications for technical product documentation will evolve most quickly, because the terminology is known, and the English is not necessarily Shakespeare. The debate will continue, and I think the balance will shift as the technology improves.
Deborah Fry: Do you think users preparing terminology would be well advised to bear high-level machine translation in mind for the future?
Monica Beltrametti: I think the one doesn’t exclude the other. You could have translation memory set up your document to take advantage of what you translated in the past and know to be correct. Then, instead of doing the rest of the documentation by hand, you could use a machine translation system…
Michael Anobile: For a first pass…
Monica Beltrametti: Yes.
Michael Anobile: But this functionality hasn’t been integrated with your tools yet?
Monica Beltrametti: No, we have not pursued the machine translation market at all, although we have done research in that area.
Michael Anobile: What was the rationale behind that decision?
Monica Beltrametti: We didn’t want to produce a system that needs a lot of post-processing.
We wanted one that does it correctly the first time round. That is our goal, and it remains to be seen whether we’ll reach it.
Deborah Fry: How does your approach differ from those of your main competitors?
Monica Beltrametti: First, I think one has to define competitors. If you look at MKMS, for example, Xerox can cover the entire value chain: from research, to components, to applications such as the tools we just described, to integration services and outsourcing. No other vendor can do that. There are people in tools and people in integration, but nobody has covered the whole value chain yet.
Michael Anobile: So you’re offering customized one-stop shopping?
Monica Beltrametti: That’s right.
Deborah Fry: Single-tool vendors would probably say that a suite of products from a single vendor would not be as strong as different products from different vendors. How would you answer them?
Monica Beltrametti: I think customers want to have tools that are interoperable. You certainly could buy from separate vendors, but you might end up with a nightmare. You have to ensure that all the applications are interoperable, that they all offer the same languages, etc. Some vendors cover one language and not another, so it’s not an ideal solution.
Michael Anobile: What are your views on the standards available to help users capitalize on today’s language processing technology, such as OSCAR, TMX (translation memory exchange), and TBX (term base exchange)?
Monica Beltrametti: Standardization always helps streamline what you do. We welcome standards, because they make systems more interoperable, but these standards are just emerging, and it will take some time before they are accepted by everyone.
Deborah Fry: Which standards for language processing tools do you think are the most interesting for clients?
Monica Beltrametti: I know that some vendors are struggling with the fact that they get documents in many different formats.
They have to filter them to create a format for translation and then filter them back afterwards. That’s certainly one aspect.
Michael Anobile: Does Xerox have any plans for the speech-to-text area?
Monica Beltrametti: We haven’t done any research in that. We think it’s an important interface, and we will look into how our tools interface with voice, but we won’t do any research there. We’ll partner in this area.
Michael Anobile: You’ve worked in both North America and Europe. Where do you see the greatest receptivity to language processing tools?
Monica Beltrametti: As companies become global, regardless of whether they’re from Europe or the States, they become very receptive. After all, they know that they have to produce documentation quickly and efficiently. I think the United States perhaps lagged a little behind Europe, but now it’s really very similar.
Deborah Fry: Projecting all this ten years ahead, how will these linguistic tools and technologies change business, and document processing within companies, for example?
Monica Beltrametti: It depends a lot on the Web, actually. Multilingual documents will be online, because people feel more comfortable in their own language. They might know a bit of English, but they prefer their own language when they have to write a document. There is always a difference between active and passive vocabulary. I think these tools will allow people to work in their own language, but also to communicate with people who speak other languages. As a result, knowledge that is not flowing at the moment will be able to flow in the future. For example, if a salesperson writes a contract for a customer in German, a colleague in the United States will not know how to capitalize on it, because s/he can’t read it. Translation is expensive, so s/he won’t even try to retrieve a multilingual document. All this will change. I think there will be a huge impact on knowledge management.
Michael Anobile: And in filtering, which goes along with that...
Monica Beltrametti: Absolutely.
Michael Anobile: Many people are just starting in this area. They’re not using the tools yet, but they understand that this is going to impact their business in the coming years. How can they best prepare their companies for this global outlook?
Monica Beltrametti: I think they should start shopping around for multilingual tools and for packages that have multilingual functions built in. They should also form alliances with companies that can help them integrate multilingual tools into their working environment.
Deborah Fry: How important is terminology development?
Monica Beltrametti: It’s extremely important. These tools will have to read terminology themselves, because each company will have its own. So it’s important to have terminology tools to assist in creating databases that can be fed into other tools, like translation aids or machine translation. That’s almost the starting point.
Michael Anobile: By extension, any standardization efforts in terminology would benefit the global market, wouldn’t they?
Monica Beltrametti: Yes, that’s right.
Michael Anobile: Thank you very much.
Monica Beltrametti