MARTIF Lite: User-driven Terminology Interchange
As an implementation of the MARTIF terminology interchange format, MARTIF Lite has been designed to enable end users in all sectors to manage their own terminology exchange activities on the basis of a simple, generic format. Following an overview of the recent history of terminology interchange formats, this article describes the new MARTIF Lite proposal and the reasoning behind it.

The long and winding road: From MATER to MARTIF

Many readers will undoubtedly be familiar with the name MARTIF (Machine-Readable Terminology Interchange Format). Modern developments in this field can be traced back to an earlier terminology exchange format called MATER [1], which was only published when magnetic tape was already becoming obsolete. MATER was subsequently reworked into a more open format known as MicroMATER [2], designed in particular to meet the needs of the then-new PC architectures. The rapid spread of information technology demanded a fresh approach to the problem of terminology interchange, which resulted in the TIF (Terminology Interchange Format) proposal supported by the Text Encoding Initiative (TEI), LISA, ISO and other organizations. The motivation for developing TIF was described as follows: "The TEI/LISA/ISO description of terminological data was originally designed primarily as a terminology interchange format (TIF) to allow users of terminology databases (TDBs) to exchange database records. In this guise it is called the Electronic Terminology Interchange Format (E-TIF) … A universal interchange format is a crucial element in making interchange easier because it eliminates the need for system developers to create an interchange for each potential pair of interchange partners." [3]

After many years of debate and dispute (too many years, most users would argue), TIF, which had in the meantime been renamed MARTIF [4], has finally reached the stage at which MARTIF Part 1 ("Negotiated Interchange", in which both parties agree on the fields to be included in the exchange operation) has been submitted to ISO as FDIS (Final Draft International Standard) 12200. The last round of balloting will take place in the first half of 1998, and Sue Ellen Wright of Kent State University, Convenor of the ISO TC 37 Sub-Committee 3 Working Group on MARTIF, told the TAMA ’98 symposium [5] that she expected this part of MARTIF to be published during summer 1998. Work is now progressing on the specifications for the second part of MARTIF, known as "Blind Interchange", which aims to define a subset of MARTIF that will allow automatic exchange without the receiving party being notified in advance of the fields to be included in the exchange operation, other than through the name of the registered subset. Also in 1998, the OSCAR Special Interest Group is expected to announce its intention to define a subset of MARTIF tailored to the requirements of the localization industry. This was confirmed at TAMA ’98 by Alan Melby of Brigham Young University/Provo, who acts as Technical Secretary to OSCAR, and is described in Alan Melby’s own article in this issue of the LISA Newsletter. However, this progress should not obscure the fact that numerous members of the academic terminology community, particularly in Europe, appear reluctant to accept the compelling need to tailor terminology interchange standardization and related activities, at least in part, to actual user requirements and workflows rather than to theoretical notions of excellence or perfection that have little or no relevance outside the protected realm of academic pursuits.
The return of MARTIF Lite

At times, this apparent reluctance to heed user demands (an attitude that users often interpret, perhaps mistakenly, as a sign of academic arrogance) has dogged the history of "MARTIF Lite". The concept first appeared in the Final Report of the EU-sponsored POINTER Project [6]: "Another area where a solution is urgently required is that of user-controlled terminology interchange processes. As it stands, the specification of the MARTIF standard is all-embracing, but the level of detail available could make its application less attractive to users of less complex terminology management systems wishing to populate and exchange termbanks with a relatively simple structure. ISO 12200 is addressed to software developers or system engineers, and database managers with advanced computer know-how, and ISO 12620 is designed for end users wishing to structure their own database with categories of terminological information, in the hope that others will use the same, or at least compatible categories. There is now growing pressure in the marketplace for the implementation of a less complex tool, more readily understandable to average users without comprehensive knowledge of computing and encoding, to allow the exchange of terminology data across systems and platforms. A less comprehensive protocol ("MARTIF Light") with a limited number of standardized entry fields, and with integrated compatibility with HTML, would overcome these problems. As a subset of the MARTIF standard, it would not in itself be a standard, but rather a MARTIF-compliant set of tools available to end users. In particular, it would encourage greater interchange of terminological resources between users (in particular translators) employing various low-level terminology management and glossary systems.
Such a tool — or set of tools — could be developed quickly and at very little cost by an ad-hoc working party, including users and application developers, which would guarantee MARTIF conformity in its application."

Despite discussions with users and a number of members of the TC 37 MARTIF working group, progress was slow. The idea surfaced again in a 1996 article by the present author [7]: "Where machine-processable terminology is required, past efforts at achieving a common standard have not been particularly successful. The MARTIF standard (ISO DIS 12200) goes some way towards achieving this goal, but it appears to have become somewhat bogged-down in increasingly intricate detail. There seems little point in spending years developing an ISO standard (a process which in itself is hardly market-oriented) unless it gains widespread acceptance in an industry, and MARTIF will certainly require re-engineering before it reaches this stage. What could happen is that one industry leader will adopt a particular set of protocols and the rest will follow suit. Again, time-to-market will be the driver." Representatives of the academic community were quick to respond to these comments [8], demonstrating a keen understanding of the problems involved. The real breakthrough came at TAMA ’98 when, following an informal session attended by a number of users as well as representatives of the academic community, Alan Melby and the author agreed, with the support of other users and researchers [9], to develop draft specifications for "MARTIF Lite" and to work towards their finalization and publication in the first half of 1998.

Why MARTIF Lite?

The name "MARTIF Lite" is actually somewhat misleading. Other names for this protocol were mooted in the past, for example "Simplified Universal Terminology Interchange" (SUTI, pronounced "sooty") or "Simplified Universal Interchange Protocol" (SUIP, pronounced "sweep"), but SUTI and SUIP, although no doubt more familiar to UK users above a certain age, were eventually upstaged by MARTIF Lite, a name that seems to have stuck despite grumbling in some quarters. An added attraction of this name is that it will undoubtedly increase user and market awareness of MARTIF itself. MARTIF Lite is not itself a terminology interchange standard but rather an implementation of MARTIF. The basic idea is to specify, and make freely available, an inherently open, heavily truncated subset of MARTIF for generic use by end users. Examples of potential applications for MARTIF Lite are:
MARTIF Lite addresses a number of issues. One of the most serious problems facing most holders and users of terminology today, in particular translators, authors and SMEs active in the terminology services sector, is the extreme difficulty of exchanging, selling and buying terminology in the absence of a common format. Although many systems can, for example, import and export simple csv files (delimited text files that use commas, semi-colons or other characters to separate the individual field entries), this is not true of all commercial terminology management systems (TMSs). A further problem is the difficulty of identifying, and reaching agreement on, the record fields to be imported or exported. This unfortunate situation has in turn been a major factor in hindering the development of the terminology market and preventing the profitable exploitation of a substantial number of terminological resources. Many observers and language industry analysts agree that the potential market for terminology is vast, but that a number of factors, including seller/buyer uncertainty about interchange formats, represent a barrier to growth. As a simplified information framework for terminology interchange, MARTIF Lite is intended to be an "entry threshold" interchange format for those individuals and organizations unwilling or unable to apply the full MARTIF format. Another issue relates to terminology input workflows. In the case of the TMSs in widespread use today, considerable effort appears to have gone into technical considerations, and there is no doubt that such systems are very powerful. But do they actually give users in the real world what they want? Are they flexible and adaptable enough to mirror and support user workflows?
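The delimited-text route mentioned above can be sketched in a few lines. The example below is a minimal illustration only, assuming Python's standard csv module and a hypothetical four-column, two-language glossary layout: the column names term_en, term_de, definition_en and source, and the sample record, are invented for this sketch and are not the MARTIF Lite field set. It shows why the choice of separator matters: with a semi-colon as delimiter, commas inside definitions pass through untouched, whereas a comma-delimited export has to wrap those fields in quotation marks, which a tool that simply splits each line on commas would then tear apart.

```python
import csv
import io

# Hypothetical column names for a two-language glossary export; they are
# illustrative only and are NOT the MARTIF Lite field set.
FIELDS = ["term_en", "term_de", "definition_en", "source"]

# Hypothetical sample record: note the commas inside the definition and source.
SAMPLE_ROWS = [
    {
        "term_en": "terminology management system",
        "term_de": "Terminologieverwaltungssystem",
        "definition_en": "software for storing, retrieving and exchanging terms, "
                         "definitions and related data",
        "source": "internal glossary, 1997",
    },
]


def export_rows(rows, delimiter=";"):
    """Serialize glossary rows as delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter=delimiter)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def import_rows(text, delimiter=";"):
    """Parse delimited text (with a header line) back into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


if __name__ == "__main__":
    semicolon_text = export_rows(SAMPLE_ROWS, ";")
    comma_text = export_rows(SAMPLE_ROWS, ",")
    # With ";" the commas need no quoting; with "," the csv module has to
    # wrap the affected fields in double quotes.
    print(semicolon_text)
    print(comma_text)
    # Round trip: importing the semicolon export recovers the original record.
    assert import_rows(semicolon_text) == SAMPLE_ROWS
```

A proper csv parser reads either export back correctly, but many simple import filters do not honour quotation marks, which is one practical reason why the semi-colon is often the safer separator for terminology data.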
Where are the real productivity gains to be made with such systems, and where are the objective audits of such gains in the real world, as opposed to public-sector organizations and academic institutions that are not subject to the same productivity and time-to-market pressures? Are TMSs designed to enable rapid entry of large amounts of terminological data (not the 500 terms per semester demanded of students, but the 500 terms per day that may be necessary in a commercial environment)? It is surely ironic that one of the simplest and quickest ways to import large volumes of terminology data is still to enter it into a word processor table or spreadsheet (using cut and paste, drag and drop and similar functions wherever possible), convert the table into a csv or similar file (semi-colons often prove to be the most useful separators, as commas frequently appear in definition and source fields) and then import it into the TMS. For profit-driven independent language service providers, life is simply too short to allow an expensive translator to spend days on end entering terminology, and few corporate language services departments today have the financial resources to employ a full-time terminologist. Some researchers have also recognized this situation: "…. the highly labor-intensive documentation of sometimes encyclopedic terminographical information associated traditionally with, for instance, the creation of terminology theses in university environments or government funded term banks is more often than not simply not feasible under commercial conditions …" [10]

The target MARTIF Lite user

MARTIF Lite is designed to be easy to use, pragmatic and to work "straight out of the box".
It restricts the exchangeable terminology data to the absolute minimum necessary to ensure that the core elements of a terminology record can be transferred "blindly" from one computer to another with the lowest possible workload and a minimum of user intervention, while remaining under the user’s control at all times. The structure of the current draft version of MARTIF Lite is shown in the accompanying box. It has deliberately been kept to the smallest feasible number of interchange fields; nevertheless, experience shows that these will probably be more than sufficient for relatively small-scale terminology collections. The current draft is restricted to a maximum of two languages. If there is sufficient interest and response from users, the number of languages could be increased. However, it is unlikely that the number of fields per language will be increased: any requirements for a larger amount of information will be met by MARTIF itself, or perhaps by the OSCAR subset. Not only will individuals and organizations then be able to exchange data simply and quickly, they will also find it much easier to publish this data commercially, for example for sale over the Internet. Nobody is claiming that MARTIF Lite is a revolutionary breakthrough, but it should certainly help increase the commercial (and non-commercial) availability of terminology data across borders and languages.

The conversion routine

Alan Melby has agreed to develop a simple filter/conversion tool to enable conversion between MARTIF Lite and csv format. The intention is to make this tool freely available, although a final decision has still to be taken. Several tool manufacturers have already lent their support to the MARTIF Lite initiative, and it is to be hoped that all others will follow suit. In addition, as mentioned above, any other application that can import or export delimited text files, including spreadsheets (e.g. MS Excel, Lotus 1-2-3) and databases (e.g. MS Access), can be used for conversion into and out of MARTIF Lite.

Next steps

The next stage in the definition of MARTIF Lite is to ask users to comment on the draft shown and described above. This draft will also be made available to users through other channels, including CompuServe FLEFO, newsgroups and professional association publications. Based on the feedback received, a finalized version can then be published and disseminated as widely as possible throughout the user community, together with the filter/conversion tool. The current target date is early summer 1998. In particular, we would ask users to respond to the following questions:
Please respond to the author at the e-mail address shown below and/or to Alan Melby at:

The author wishes to express his gratitude to Alan Melby and Sue Ellen Wright for their participation, encouragement and comprehensive technical advice during the development of MARTIF Lite.

[1] ISO 6156: Magnetic tape exchange for terminological/lexicographical records (1987).
[2] Melby A. (1988): MicroMater: A Proposed Standard Format for Exchanging Lexical/Terminological Data Files. META, Vol. 36, No. 1, p. 135-160.
[3] Melby A., Budin G., Wright S.E. (1993): Terminology Interchange Format (TIF). A Tutorial. TermNet News No. 40, Vienna. This comprehensive article provides a full description of the TIF format, together with DTDs and implementation examples. For a discussion of some of the problems involved in terminology interchange, see: Wright S.E. (1993): Special Problems in the Exchange of Terminological Data. Proceedings of TKE ’93, Indeks Verlag, Frankfurt/Main, 1993.
[4] An introduction is provided by Klaus-Dirk Schmitz in: MARTIF. A New ISO Standard for the Interchange of Terminological Data. TermNet News No. 50/51 1995, Vienna, 1996. A fuller description of MARTIF is to be found in: Melby A., Schmitz K.D., Wright S.E.: The Machine Readable Terminology Interchange Format (MARTIF). Putting Complexity in Perspective. TermNet News No. 54/55 1996, Vienna, 1997.
[5] TAMA ’98: 4th TermNet Symposium on Terminology in Advanced Microcomputer Applications: “Tools for Multilingual Communication”, Vienna, 15-16 January 1998.
[6] The POINTER Final Report is available from http://www.mcs.surrey.uk/AI/pointer/index.html
[7] Bonthrone R.: The Business of Terminology. A European Perspective. ELRA Newsletter, Issue 3/1996, Paris. Reprinted in Issue 54/55 1996 of TermNet News.
[8] Melby A., Schmitz K.D., Wright S.E.: The Machine Readable Terminology Interchange Format (MARTIF). Putting Complexity in Perspective. TermNet News No. 54/55 1996, Vienna, 1997. The latest information on MARTIF is available at:
[9] The following individuals deserve particular mention for their support and advice at this stage: Khurshid Ahmad and Lee Gillam, University of Surrey; Sue Ellen Wright, Kent State University; Olaf Michael Stefanov, UNOV Reference and Terminology Unit; Alexis Crespel, CL Servicios Lingüisticos; Thorsten Mehnert, Wordnet.
[10] Wright S.E.: Economic Issues of Terminology Management. TermNet News No. 54/55 1996, Vienna, 1997.
[11] ISO FDIS 12620 “Terminology – Computer applications – Data categories” defines a large number of possible data categories which can be used in computer-based terminology collections. It has been developed in parallel with the MARTIF standard. More information about ISO FDIS 12620 and its applications can be found at: http://www.ttt.org/clsframe/index.html
Notes:
Robin Bonthrone