© 2010 SMP Marketing • ISSN 1420-3693 • www.localization.org



Standards

IP vs. Customer Satisfaction: EuroTermBank and the Business Case for Terminology Sharing

Signe Rirdance, EuroTermBank Director, Tilde

A lot of attention has recently been duly given to the benefits and ROI of terminology management within a company. The benefits of having everybody in a company using and contributing to the same terms must be obvious to the readers of the Globalization Insider. However, should terminology management stop there? It might be worthwhile to take note of developments in the wider field of terminology.


With no claim to being a comprehensive overview, this article attempts to provide a few glimpses into terminology sharing, a concept that has roots in terminology standardization efforts by national and supranational organizations, as well as in the philosophy of open-source and wiki-type knowledge management. In particular, this article will focus on the multilingual terminology portal, EuroTermBank, a European undertaking that provides some important lessons learned and a future vision for terminology sharing.


Terminology Sharing

At first thought, sharing terminology within and across vertical industries may not seem a plausible idea for businesses that have (eventually) invested in acquiring and implementing a terminology management system, hiring and training the required staff members, and making sure that all of the internal and external stakeholders are included in the terminology management lifecycle at the right time and in the right way. Why share such valuable and costly assets?

The answer is simple: sharing is going to be increasingly to one’s own business benefit. As Multilingual Computing predicts in its April/May editorial, sharing language resources, including terminology, is set to become a common practice, driven by advances in translation technology and customer requirements. The editorial gives the example of a comparison of 200 core technical terms in the field of CAD/CAM across three leading CAD/CAM companies, which shows that 50% of their German translations differ for the same English terms. And any experienced translator can quote multiple instances of unsubstantiated terminological variants used by their clients for terminology in the same or related fields, variants that cannot possibly have anything to do with competitiveness or unique branding.

What about customer satisfaction: “interfeiss” vs. “saskarne”?

Translators get paid for managing lists or databases of terminology to be used for each of their clients, but what about users? How does it increase my satisfaction as a user that “file” is called “fails” by half of the software I use in Latvian, my native language, but “datne” by the other half? Or that half of them translate “interface” as “interfeiss,” while the other half go for “saskarne”? Knowing how much software producers have probably invested to create and manage this widely differing terminology for each of their products in multiple languages, it seems unfortunate that the step of streamlining and harmonizing terminology within the industries represented on my laptop (and on related gadgets, mobile applications, and the like) has not been taken – yet. The challenge is that this step cannot be taken unless companies are ready to share the non-differentiating, non-competing part of their terminology.


The same benefits that make a compelling case for a company to start managing its terminology apply just as well to sharing and unifying terminology:


  • increased customer satisfaction

  • reduced time-to-market

  • easier quality control, etc.

Specifically, terminology sharing provides the advantage of promoting one’s well-developed terminology as the industry standard terminology, and of improving the inconsistent or under-developed areas by leveraging expertise from others. Thus, sharing involves both giving and taking, contributing and benefiting from a return.

Materializing this vision creates a number of new challenges, such as the need for (1) an appropriate platform, (2) a set of tools, methodologies and best practices that support sharing, and (3) a sustainable business case. It would be premature to announce solutions to these challenges. On the contrary, we find ourselves at the starting point for defining the area of terminology sharing and its list of challenges.

At the same time, there are a few implementations to learn from. One example certainly is the sharing and harmonization of terminology taking place across the multiple institutions of the European Union, with its 23 official languages as of this year, a translation budget of € 800 million in 2006, and well over 2,000 in-house staff in translation services, along with numerous contractors. Their shared terminology repository (IATE), which went public in June 2007, includes EU-specific terminology from a number of legacy databases created by multiple EU agencies and is intended, within EU institutions, to serve as the platform for terminology harmonization. (For more information, read The Translation Challenge at the European Commission: Multilingualism as a Democratic Right.)

EuroTermBank in Brief


Another initiative of a similar nature, the multilingual terminology portal EuroTermBank (www.eurotermbank.com), has been publicly available since January 2007. Developed within the framework of the European Commission eContent programme, it addresses the growing requirement and readiness to share, consolidate, and harmonize multilingual terminology resources. EuroTermBank Consortium members are universities, research institutions and private companies that each provide their specific expertise to the consortium (for the full list of members, see http://www.eurotermbank.com/About.aspx).

EuroTermBank provides a one-stop access point to multilingual terminology, through the consolidation of terminology collections in its own database and the federation of interlinked terminology databases maintained within external termbanks. The EuroTermBank system accesses interlinked, external terminology bases and displays consolidated results of querying collections in its internal database and resources from the interlinked external databases. It interacts with both human users, through a web browser, and machine users, through its API.
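The consolidated query described above can be illustrated with a minimal Python sketch. It is not the EuroTermBank implementation or its API: the names Hit, make_local_source and federated_search are hypothetical, and each external termbank is simply modeled as a function that takes a query and returns hits.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Hit:
    term: str
    language: str
    source: str  # name of the collection or external termbank (hypothetical field)


# Assumption: every source, internal or external, is a callable from query to hits.
Source = Callable[[str], Iterable[Hit]]


def make_local_source(name: str, entries: list[tuple[str, str]]) -> Source:
    """Wrap an in-memory list of (term, language) pairs as a searchable source."""
    def search(query: str) -> list[Hit]:
        q = query.casefold()
        return [Hit(term, lang, name) for term, lang in entries if q in term.casefold()]
    return search


def federated_search(query: str, sources: dict[str, Source]) -> tuple[list[Hit], list[str]]:
    """Query every source and return (consolidated hits, names of failed sources)."""
    hits: list[Hit] = []
    unavailable: list[str] = []
    for name, source in sources.items():
        try:
            hits.extend(source(query))
        except Exception:
            # A federated resource may be temporarily down; report it and carry on.
            unavailable.append(name)
    return hits, unavailable
```

In this sketch a failing external source does not break the query; it is reported alongside the results, so the caller can tell the user that the consolidated view is incomplete.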

Currently, EuroTermBank enables searching within approximately 600,000 terminology entries, containing over 1.5 million terms in various languages originating from approximately 100 terminology collections, along with 300,000 terms in external, interlinked termbanks.

The initial focus of EuroTermBank has been on terminology collections from the “new Europe,” including Estonian, Hungarian, Latvian, Lithuanian, and Polish terms and their equivalents in English, German, French, Russian and other languages (overall, almost 30 languages). A wide variety of subject fields are covered, including a large number of EU terms, as well as terminology used in technology, sciences and economics.

Figure 1. Coverage of subject fields in the EuroTermBank terminology collections.

Best Practice Methodology

To create a terminology bank that supports handling of diverse terminology collections, the EuroTermBank Consortium partners worked to identify and evaluate current terminology processes, actors, standards and best practices in participating countries and throughout the world. As a result, a best practice methodology for multilingual terminology management was created that focuses on a number of important terminology processes, such as terminology workflow, concept analysis, data structure and exchange formats. These processes were then analyzed and recommendations were provided on three levels: local, national and international. This structured approach facilitates the identification of best practice that is specific to typical actors and scenarios for terminology management.

Key results of this research are available in the monograph, Towards Consolidation of European Terminology Resources. Experience and Recommendations from the EuroTermBank Project (please contact the author of this article for your complimentary copy).

Application of Standards

It was realized early in the project that the only way to manage the enormous heterogeneity of formats and data structures present in diverse terminology collections is by rigorous implementation of applicable international standards.

To describe the multitude of various terminology resources across participating countries, the TeDIF (Terminology Documentation Interchange Format) standard was used, which establishes a common format for bibliographical and factual terminology data.

For its data model, EuroTermBank developed a comprehensive data structure that complies with the following three ISO standards:

  • ISO 12200, specifying the machine-readable terminology interchange format (MARTIF)
  • ISO 12620, specifying the data categories used in computer applications in terminology
  • ISO 16642, defining the terminological markup framework (TMF) for computer applications in terminology

To enable easy term import, export and exchange with other terminology databases, the LISA TBX (TermBase eXchange) standard is used. As each terminology collection is structured differently, a number of converters have been developed or adapted for various types of resources. One can expect a proliferation of TBX adopters as it becomes an ISO standard, which should in the future provide for the seamless exchange of terminology resources.
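To give an idea of what a converter has to deal with, here is a minimal Python sketch that reads a simplified TBX-style fragment with the standard library. The sample is an assumption made for illustration: it keeps only the termEntry, langSet, tig and term elements and omits the header that a complete TBX file would carry. The function name read_tbx is hypothetical.

```python
import xml.etree.ElementTree as ET

# The xml:lang attribute lives in the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Simplified sample for illustration only; a complete TBX file also has a header.
SAMPLE_TBX = """<martif type="TBX" xml:lang="en">
  <text><body>
    <termEntry id="c1">
      <langSet xml:lang="en"><tig><term>computer</term></tig></langSet>
      <langSet xml:lang="fr"><tig><term>ordinateur</term></tig></langSet>
    </termEntry>
  </body></text>
</martif>"""


def read_tbx(xml_text):
    """Return {entry id: {language: [terms]}} for every termEntry in the input."""
    root = ET.fromstring(xml_text)
    entries = {}
    for entry in root.iter("termEntry"):
        terms = {}
        for lang_set in entry.iter("langSet"):
            lang = lang_set.get(XML_LANG)
            found = [t.text for t in lang_set.iter("term") if t.text]
            terms.setdefault(lang, []).extend(found)
        entries[entry.get("id")] = terms
    return entries
```

Calling read_tbx(SAMPLE_TBX) on the sample yields one concept entry, "c1", with one English and one French term. Real collections carry many more data categories per entry, which is where the converters mentioned above do most of their work.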

Federated Approach for the Consolidation of Distributed Resources

In addition to the inclusion of terminology content in its database, EuroTermBank proposes terminology consolidation at the level of uniting dispersed terminology databases in a federated system of interlinked resources. To ensure the viability of this system, inclusion of a termbank in this federated model requires it to be independently supported and maintained at both the institutional and technical levels.


The federated approach to terminology consolidation provides a solution to at least one inherent challenge of all terminology banks – the maintenance of terminology is performed at the local or national level, with the changes then instantaneously available for integration with other federated resources.

An important by-product of this approach is the promotion of a unified methodology for terminology work and the proliferation of know-how for industry standards.

The federation of terminology is a new phenomenon, so there are a number of challenges yet to be faced. For example, (1) ensuring the reliability of the sources or of the source data in case an important resource of the federation becomes unavailable (temporarily or permanently), and (2) ensuring a unified approach to change management on all levels, from data structure to the changing terminology content and preservation of legacy data.

Entry Compounding

Another common challenge to termbanks that becomes more visible in a federated model is how to map the diverse subject field classification systems upon which each one of them is based. Any attempt at consolidation of terminology resources from various sources that belong to the same subject field will necessarily have to deal with the challenge of identifying, verifying and merging (collating) matching terminology entries.

If the terminology bank contains entries coming from different collections that designate the same concept in various languages, there is an obvious interest in merging them into one unified multilingual entry, thus reducing the number of identical entries. For example, one merged entry could be displayed for a term pair such as EN computer – FR ordinateur, instead of five identical entries for this term pair, each from a different collection. Furthermore, if the database contains an entry for EN computer – LV dators, in addition to the above EN-FR term pair, a new candidate term pair can be established: FR ordinateur – LV dators, for which the English term has served as the intermediary element.
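The merging idea in the computer / ordinateur / dators example can be sketched in a few lines of Python. This is only an illustration with hypothetical names (merge_on_pivot, candidate_pairs, and the collection labels), and it matches entries naively on the English term alone.

```python
from collections import defaultdict
from itertools import combinations

# Each entry: (collection name, {language: term}). Collection names are made up.
entries = [
    ("coll_a", {"en": "computer", "fr": "ordinateur"}),
    ("coll_b", {"en": "computer", "fr": "ordinateur"}),
    ("coll_c", {"en": "computer", "lv": "dators"}),
]


def merge_on_pivot(entries, pivot="en"):
    """Group entries that share the same pivot-language term into one merged entry."""
    merged = defaultdict(lambda: defaultdict(set))
    for _collection, terms in entries:
        key = terms.get(pivot)
        if key is None:
            continue  # no pivot term, nothing to match on
        for lang, term in terms.items():
            merged[key][lang].add(term)
    return merged


def candidate_pairs(merged_entry, pivot="en"):
    """List term pairs between non-pivot languages, linked only through the pivot."""
    langs = sorted(lang for lang in merged_entry if lang != pivot)
    pairs = []
    for a, b in combinations(langs, 2):
        for term_a in sorted(merged_entry[a]):
            for term_b in sorted(merged_entry[b]):
                pairs.append((a, term_a, b, term_b))
    return pairs


merged = merge_on_pivot(entries)
print(candidate_pairs(merged["computer"]))
# [('fr', 'ordinateur', 'lv', 'dators')]
```

The two identical EN-FR entries collapse into one, and the FR-LV pair appears as a candidate. The pairs are candidates only: nothing in this sketch checks that the entries really denote the same concept.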

In reality, merging entries automatically on the basis of a matching term in one common language, as in the example above, will lead to many erroneous term correspondences. The only error-free method for merging entries is the manual evaluation by a terminologist or an expert to confirm whether or not these entries indeed denote the same concept. However, this may not be an option due to costs or the unavailability of qualified experts, especially for consolidation of huge terminology collections spanning multiple languages and subject fields.

In view of these problems, EuroTermBank proposes a practical solution by introducing the automated terminology entry compounding approach for matching terminology entries based on available data. As in a machine translation environment, it prompts the user about potential incompatibilities and errors. Also, its application is limited to a data representation method that does not propose creation of new permanent term entries. Rather, it is a visualization aid that displays entries with certain matching parameters across selected collections.
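A rough Python sketch of such a display-only compounded entry follows. The matching parameters used here (a shared English term, plus a warning when subject fields differ) are assumptions chosen for illustration, not the parameters EuroTermBank actually uses, and the names Entry, CompoundView and compound are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entry:
    collection: str
    subject: str
    terms: tuple  # tuple of (language, term) pairs


@dataclass
class CompoundView:
    pivot_term: str
    terms: dict = field(default_factory=dict)     # language -> sorted list of terms
    sources: list = field(default_factory=list)   # collections that contributed
    warnings: list = field(default_factory=list)  # hints for the user, not errors


def compound(entries, pivot_term, pivot_lang="en"):
    """Build a temporary view over matching entries; the entries are not modified."""
    view = CompoundView(pivot_term)
    subjects = set()
    collected = {}
    for entry in entries:
        if (pivot_lang, pivot_term) not in entry.terms:
            continue
        view.sources.append(entry.collection)
        subjects.add(entry.subject)
        for lang, term in entry.terms:
            collected.setdefault(lang, set()).add(term)
    view.terms = {lang: sorted(terms) for lang, terms in collected.items()}
    if len(subjects) > 1:
        view.warnings.append("subject fields differ: " + ", ".join(sorted(subjects)))
    for lang, terms in view.terms.items():
        if lang != pivot_lang and len(terms) > 1:
            view.warnings.append(f"multiple {lang} terms: " + ", ".join(terms))
    return view


entries = [
    Entry("software_a", "IT", (("en", "file"), ("lv", "fails"))),
    Entry("software_b", "IT", (("en", "file"), ("lv", "datne"))),
    Entry("tools", "mechanical engineering", (("en", "file"), ("fr", "lime"))),
]
view = compound(entries, "file")
print(view.warnings)
# ['subject fields differ: IT, mechanical engineering', 'multiple lv terms: datne, fails']
```

The view is rebuilt on request and never written back, which mirrors the point that compounding does not create new permanent entries. The warnings flag the English homonym (a computer file versus the hand tool) and the competing Latvian variants, leaving the judgment to the user.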

Figure 2. A compounded entry with terms in 7 languages, based on 3 original entries from different collections for the term hydraulic turbine.


However, further development of entry compounding methodologies and practices will be of utmost importance for (1) displaying shared terminology, (2) identifying and dealing with matches and inconsistencies across shared collections, and (3) automated uses of shared terminology resources.

To Be Continued …

Many questions still need to be asked and solutions found before terminology sharing becomes widely accepted. Not least among those are concerns over confidentiality and how to secure one’s intellectual capital. How can companies make sure that they share only that part of their terminology that they are ready to move to the public domain? And what strategies should companies adopt to fully utilize the feedback received from comparing their terminology to that of others? Some of the challenges have been identified and addressed in this article. All types of feedback, thoughts, criticisms and insights are welcome, as they will help to create a more comprehensive framework for moving forward in this very important area.

For a related presentation, see Globalization of Terminology: Challenges and Solutions, by Signe Rirdance and Christian Galinski (Director, Infoterm). (You must be a LISA Member and logged into the LISA web site to be able to download this presentation.)


Signe Rirdance is the EuroTermBank Director and can be reached at signe.rirdance@tilde.lv.


