In this issue…
OSCAR Advances
Since its foundation in Washington DC in June 1997, LISA’s OSCAR Special Interest Group has made rapid progress with its translation memory exchange format activities. In this article, OSCAR’s Technical Secretary Alan Melby offers both an update on OSCAR’s current and future activities for those who have been following the discussion, and a more general introduction for those new to the subject.
OSCAR: The story so far
LISA is taking its name seriously and is active in developing standards to allow localization tools to interoperate more efficiently. Yves Savourel of localization service vendors ILE, Technical Chair of LISA's OSCAR group, sees the future of translation tools as components that users can plug into custom processes. Software development is clearly moving in the direction of components, but a key to the success of the component approach in translation is the passing of complex data among the components. The need for data exchange gave birth to OSCAR.
The LISA Special Interest Group on Open Standards for Container/Content Allowing Re-use (OSCAR) met most recently in San Diego in October 1997 in conjunction with the Machine Translation Summit VI. At the Catamaran Resort Hotel, representatives of major translation tools developers stayed off the beach for a whole day to hammer out final details of the Container portion of TMX, the first OSCAR standard. The October decisions were further discussed by e-mail, and the Container portion of TMX was first made public in December 1997 on the LISA Website. In addition, a proposal for the Content portion of TMX was presented and is currently being debated on the OSCAR electronic discussion list.
What is TMX and why is it important to the localization industry?
TMX, which stands for Translation Memory [database] eXchange, is a format that will allow users to re-use information in a translation memory database much more easily than they can now.
A large translation memory database is a valuable corporate asset and is essential to the operation of advanced translation tools, including translation memory lookup and terminology research. To get the most out of a translation memory database, users must be able to pass its information between various tools. These tools may come from multiple vendors, and some may have been developed in-house. Users want to be able to feed one translation memory database into several tools with a minimum of fuss and, if possible, without having to develop custom filters. This is where TMX comes in. If translation memory data is converted to TMX format, it can be imported into any tool that supports TMX.
An exchange format may be elegant but useless if no one implements it. TMX is practically guaranteed to be implemented, since OSCAR includes major tools developers such as Logos, Systran, Star, and Trados, who have contributed to the definition of TMX. This is a much wiser strategy than developing a standard in a vacuum and then presenting it to those who must implement it.
The next meeting of OSCAR is scheduled to be held in Salt Lake City on February 26, 1998, in conjunction with the 1998 LISA Forum - USA. Franz Rau of Microsoft Inc., chair of the OSCAR group, has asked Alan K. Melby (Brigham Young University), Technical Secretary of OSCAR, to conduct the Salt Lake City meeting, which will focus on discussion and approval of the full TMX standard, including both Container and Content portions. Rau's charge is as follows: "After publishing the standard and having it endorsed by LISA, it is now time to incorporate the last details and make it available to all." Once the complete TMX standard is officially adopted by OSCAR, the tools developers in OSCAR will develop import and export filters for TMX and begin the testing and refinement phase.
From TMX to TBX - TermBase eXchange for humans and machines…
Overlapping with the development of the TMX standard, next month's OSCAR meeting will continue preliminary discussion of the TBX standard. TBX is OSCAR's format for TermBase eXchange. A termbase (terminology database) is the other major data resource. It complements translation memory databases and is also used in multiple translation tools.
At the October 1997 OSCAR meeting, a brief discussion of terminology interchange made it clear that one important need is to synchronize the terminology in a termbase designed for human consumption and the terminology in a machine translation lexicon. More and more users are realizing that there is often a place for both machine translation and human translation in the same organization. Of course, the terminology that appears in documents translated by both methods must be consistent. But such synchronization does not happen by magic. What is needed is a database that is fed by both concept-oriented termbases and machine translation dictionaries and used to determine whether the same source terms are found and linked to the same target terms in all tools. Missing terms and conflicting term pairs can be detected and a terminologist can make necessary repairs.
OSCAR will first study a format called OLIF from the OTELO project. OLIF is an exchange format designed to facilitate this synchronization process. We welcome to OSCAR several representatives of the OTELO project as new members, including Daniel Grasmick of SAP, Peter Quartier of Lotus, and Gregor Thurmair of GMS. They bring not only their significant individual experience in translation technology but the group experience of working on the OTELO project, a large European Community project that uses data exchange standards to integrate various translation technologies. Equally, we would like to welcome the international localization service providers SDL aboard.
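The synchronization check described above, in which missing terms and conflicting term pairs are detected for a terminologist to repair, can be sketched roughly as follows. This is only an illustration under stated assumptions: each resource is reduced to a plain mapping from source term to target term, the function name `check_term_consistency` and the sample entries are hypothetical, and nothing here reflects the actual OLIF format.

```python
from collections import defaultdict


def check_term_consistency(sources):
    """Compare term pairs across resources (hypothetical helper).

    sources maps a resource name (e.g. a termbase or an MT lexicon)
    to a dict of {source_term: target_term}.
    Returns (missing, conflicts).
    """
    # For each source term, record the target term each resource uses.
    targets_by_term = defaultdict(dict)
    for name, pairs in sources.items():
        for src, tgt in pairs.items():
            targets_by_term[src][name] = tgt

    missing = {}    # source term -> resources that lack it
    conflicts = {}  # source term -> {resource: target} when targets differ
    for src, by_source in targets_by_term.items():
        absent = sorted(set(sources) - set(by_source))
        if absent:
            missing[src] = absent
        if len(set(by_source.values())) > 1:
            conflicts[src] = dict(by_source)
    return missing, conflicts


# Illustrative data only; not taken from any real termbase.
termbase = {"hard disk": "disque dur", "file": "fichier"}
mt_lexicon = {"hard disk": "disque rigide", "printer": "imprimante"}

missing, conflicts = check_term_consistency(
    {"termbase": termbase, "mt_lexicon": mt_lexicon}
)
print(missing)
print(conflicts)
```

A real implementation would also have to reconcile the concept-oriented structure of a termbase with the flatter entries of a machine translation dictionary, which this sketch ignores.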
…and for concept-oriented termbases
Another need is terminology interchange among concept-oriented termbases. Here the emphasis is not synchronization but passing whole termbases from one tool to another with minimum information loss. For this task, OSCAR will first study a format called MARTIF that evolved from the Text Encoding Initiative and is currently a Final Draft International Standard under ISO Technical Committee 37. MARTIF is designed to be tailored to the needs of a particular user group, such as LISA members. In addition to the definition of a LISA subset of MARTIF, a connection between MARTIF and OLIF will be developed; however, the flat nature of OLIF may prevent retrieving some of the relationships between elements when going back to MARTIF.
MARTIF and OLIF complement each other. MARTIF is for relatively lossless interchange among concept-oriented termbases. OLIF is for synchronization of terminology from a number of sources. OSCAR will eventually adopt a TBX that fills both needs. If it turns out to be feasible to avoid inventing new wheels, then TBX will consist of OLIF and a LISA subset of MARTIF.
The LISA subset of MARTIF is not yet defined. Judging from the subset of MARTIF (then called E-TIF) adopted by LISA several years ago, the LISA subset will be somewhat larger than MARTIF Lite, but significantly smaller than the full set of data categories allowed in MARTIF. The most likely procedure for defining the LISA subset will be to gather sample entries from termbases currently maintained by LISA members and analyze them by mapping them to the data categories in MARTIF (which is based on the inventory in ISO 12620). Then the data categories that are used by most or all of the LISA member companies will be proposed as the LISA subset of MARTIF. Since all subsets of MARTIF share the same overall structure (technically, they conform to the same SGML DTD), MARTIF Lite and the LISA subset will be highly compatible.
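The selection step in the procedure above, keeping the data categories used by most or all member companies, might look something like the sketch below. The member names, the category labels, and the 75% threshold standing in for "most" are all assumptions made for illustration; the actual analysis and cut-off have not been decided.

```python
from collections import Counter


def propose_subset(categories_by_member, min_share=0.75):
    """Return the data categories used by at least min_share of members.

    categories_by_member maps a member name to the set of data
    categories found in that member's sample entries (hypothetical input).
    """
    counts = Counter()
    for categories in categories_by_member.values():
        counts.update(set(categories))  # count each category once per member
    total = len(categories_by_member)
    return sorted(c for c, k in counts.items() if k / total >= min_share)


# Illustrative samples only; category names are placeholders.
samples = {
    "member_a": {"term", "subjectField", "definition", "partOfSpeech"},
    "member_b": {"term", "subjectField", "definition"},
    "member_c": {"term", "subjectField", "context"},
    "member_d": {"term", "definition", "partOfSpeech"},
}

subset = propose_subset(samples)
print(subset)
```

Lowering `min_share` widens the proposed subset, so the threshold is effectively the policy decision that the group would need to make.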
The likely result will be that MARTIF Lite will be a subset of the LISA subset, so that no conversion from MARTIF Lite to LISA MARTIF will be needed. Conversion from LISA MARTIF to MARTIF Lite could involve some information loss if the data categories used are not available in MARTIF Lite, but the central information about the terms that designate a concept, an indication of subject field, and, of course, the terms themselves will be preserved.
The activity within OSCAR is an encouraging sign in the Localization Industry: competitors working together for the good of the users. OSCAR standards will allow users to mix and match components according to needs. No one is locked in to any particular tool, and decisions can be made on the basis of performance. The playing field is more level. Who could ask for anything more?
Professor Alan Melby