© 2008 SMP Marketing • ISSN 1420-3693 • www.localization.org
TMX Special Issue

Arle Lommel, LISA

TMX - a standard ahead of its time

In 1997, the year LISA's OSCAR special interest group was started, I attended the LISA Workshop on Integrating Advanced Translation Technology (IATT) in Washington, D.C. Most of the attendees had little previous experience with translation technologies and had come to learn about them. A good deal of time, both in the IATT workshop and in the vendor sessions at the LISA Forum, was spent simply explaining what Translation Memory (TM) is and why it would be useful. Just five years ago many in the localization industry still weren't fully aware of TM and its importance, or needed to be convinced that it would affect them.

Times certainly have changed, but it is useful to remember how recently the things we now think we've always known were still new and untested. I would like to suggest that this is one reason for what some perceive to be slow implementation of TMX - in even addressing the problem of translation memory exchange, OSCAR was ahead of its time, and in two ways:

First, if the majority of attendees at the IATT workshop had to be convinced that TM was something that they needed to use, it is no wonder that most of them were not looking at exchange issues at that point.

Second, TMX was the first publicly defined and available standard based on XML (itself not finalized when work on TMX began!). TMX anticipated a need that was only then beginning to be perceived as an issue, and it used the latest means to address it. This put TMX so far ahead of the crowd that there has been a lag between TMX and the progress of translation tool vendors and customers. TMX had to sit, so to speak, until implementations caught up with it and could finally begin to demonstrate where it was successful and where changes were needed.

Since 1997 adoption of TM and other technologies has been fairly rapid - by 1999 most attendees at LISA events were using TM and knew the technology in (perhaps painful) detail, or were in the process of implementing TM. However, there is a lag between first adoption of TM and the point at which TM repositories are large enough that exchange becomes worthwhile - a company with only 12,000 segments in its TM repository has a much smaller investment to protect than one with 6 million segments, a figure that is not unreasonable today. So TMX has also had to wait for the maturation of TM assets, something that takes years to happen.

Since 2000 the need for TMX has been more and more acutely felt, and industry adoption of TMX has been quite high (approximately 25% of companies surveyed now use it in some form), but often without any real direction. Some of the issues that have surfaced recently (such as the effects of different segmentation methods on TMX usability) simply could not have been anticipated in 1997 and have only now become apparent. While complaints about TMX's limitations are real, I would say that TMX was remarkably well designed, considering that few had any experience with translation memory exchange issues at the time.

TMX has been remarkably stable since it was first issued to the public. Although it is now on version 1.4, the changes have been fairly minimal, and credit should be given to the original OSCAR team, which succeeded in creating a standard with enough foresight to allow for its use today.

Is TMX perfect? No, but OSCAR is working to make it so. Is it usable at present? Yes, and it seems that some companies may already be using it as their main format for Translation Memories (see the TM Survey results below). Given current work in OSCAR's Segmentation Working Group, it seems that TMX will become even more useful and that the time when TMs can be freely shared between tools with minimal difficulty is on the not-too-distant horizon.

Preliminary results of the LISA TM Survey

As many of our readers are no doubt aware, LISA has been conducting a survey regarding TM usage. Although results are preliminary, a few interesting points have already surfaced:

  • The majority of respondents use TM for the bulk of their translation work and are generally quite sophisticated in their understanding of the limitations of TM. The most common reason cited for not using TM on some projects was that work was received in formats that TM tools don't support, or was received in hard copy. But other respondents indicated that they didn't use TM for all their work because they dealt with text types not amenable to TM (marketing, literary text, extremely short texts, etc.). Awareness of TM and what it can be used for has certainly gone up in recent years.
  • Roughly 2/3 of those who responded to the survey plan on increasing their use of TM. In most cases they plan to do this by using TM for formats or languages they do not currently use it for. Of the remaining 1/3, who answered that they did not plan to increase usage, the majority were already using TM for everything they could use it for. In total some 90% of TM users are either already using TM for everything they can or soon will be.
  • The bulk of TM users (approximately 70%) have been using it for less than 5 years, with roughly 10% having started in the last year. Not surprisingly, given the cumulative nature of TM data, length of use correlates very strongly with the size of a company's TM assets. The average number of TM segments held by companies using TM for over five years is in excess of 4 million, but drops to roughly half a million segments for those using TM for one to five years and only about 15,000 segments for those using TM for less than a year.
  • The larger the translation volume a company handles, the longer it is likely to have used TM. No company among those surveyed with more than 20 million source words per annum has been using TM tools for less than a year, and the majority of these high-volume companies have been using them for more than 5 years. This is not surprising, given the potential savings of TM on high volumes.

Figure 1. Years of TM use versus translation volume. Columns show relative numbers of survey respondents in each category.

  • When asked if they would be interested in buying, selling, or trading Translation Memory assets (in transactions with other companies), 65% of respondents indicated willingness to participate in a market for TM assets in some form, with 50% interested in exchanging with other companies. Although such a market does not presently exist and certain legal issues may cause problems, the rise of web services and web-based TMs may make this sort of market feasible in the foreseeable future. This is one trend that will need to be explored further. At this point it is impossible to make any generalizations about the companies interested in a TM asset marketplace, but the number would be considerable.
  • Approximately 27% of those surveyed are currently using TMX in some capacity. Surprisingly, companies with large TM repositories do not at present seem any more likely to use TMX than businesses with small repositories. If anything, the largest TM repositories tend to be in proprietary in-house formats. Intriguingly, 6% of those responding indicated that they use TMX alone (with neither in-house nor commercial tools) - this result needs to be investigated further to determine how companies can be using TMX alone for their needs.

Those who adopted Translation Memory tools around five years ago now have large and mature TMs. When TMX was first proposed the number of companies with multi-million segment TM repositories was quite small, but their numbers are growing - and the real need for TMX is growing with them. It is possible to discern in the data from this survey a coming swell of companies with large and valuable TMs who may find TMX useful, if not vital.


Arle Lommel

Arle Lommel is LISA Publications Manager. A native of Alaska, he currently resides in Indiana. In addition to working for LISA, he is an emeritus member of the Brigham Young University Translation Research Group (TRG), a Provo, Utah-based translation, theory and technology think-tank directed by Dr. Alan Melby, and has edited a number of books on linguistics.



