© 2010 SMP Marketing • ISSN 1420-3693 • www.localization.org
Translation Quality Assessment: An Overview

Geoffrey Kingscott

Geoffrey Kingscott, previously of Praetorius Limited and now a consultant with a particular interest in translation quality assessment (TQA), reviews the history of TQA and provides an assessment of the current state of play.



As far as the average industrial customer is concerned, there is no problem with quality assurance in translation. In his or her eyes, a translation is either right or wrong, since language transfer is simply a matter of one-for-one substitution. Such a customer does not understand how you can give the same source text to ten different professional translators and get back ten target texts showing a considerable measure of individual variation. Nor does the customer understand that translation can only ever be an approximation: every word in a language is so loaded with nuance and cultural variation that exact equivalence is rarely achieved, even between closely related languages.

It is sometimes argued that when it comes to technical translations, exact equivalences will be found, because here we are dealing with specific objects. But this is not so. Even here a language’s cultural way of looking at the world, its Weltanschauung, will come into play. Technical translation (and indeed all translation) often requires supra-textual knowledge, that is, knowledge of how things operate which goes far beyond straightforward linguistic knowledge. But if translation can only be an approximation on the one hand, and requires supra-textual knowledge on the other, will it ever be possible to establish parameters for objective evaluation of translation quality?

A Bit of History

The discussion of what constitutes a ‘good’ translation has been going on for some two thousand years, and arguments have raged, usually between those who wanted as close a rendering of the original as possible and those who believed in a ‘natural style’ in the target text. Eventually the ‘natural style’ came to prevail, and its case is best summed up by Alexander Fraser Tytler in a celebrated book published in 1790, Essay on the Principles of Translation. Tytler formulated (page 16) three rules of translation which have been so influential that it is worth quoting them in full:

“I. That the Translation should give a complete transcript of the ideas of the original work.

II. That the style and manner of writing should be of the same character with that of the original.

III. That the Translation should have all the ease of original composition.”

Rule I is reflected today in most systems for measuring translation quality: there is usually a category termed “Omission” or “Completeness.” Rule II covers what linguists today call “Register”: you do not render the plain text of an instruction manual intended for dumper truck drivers in high-flown literary language. Rule III finally laid down the ground rule for “Style”: the original language must not stick out like a sore thumb in the translation, as happens, for example, when German-type embedded clauses survive into the target English text.

But this is all very general.

So from time to time a practicing translator or commissioner of translations pauses in his or her work and looks round for some scientific or objective way of measuring what he or she is doing. Dr. Eugene Nida (now in his late 80s), a Bible translator, was surprised to find there was no book setting out a theory of translating, so he sat down and wrote one. That book, Toward a Science of Translating, with special reference to principles and procedures involved in Bible translating, set in motion what is now a huge international academic discipline: translation studies. There are hundreds of publications on the subject, as well as conferences and associations. For the purposes of this article, however, I shall focus on one approach that I think is relevant to TQA.

The Importance of the Skopos Theory

This is the so-called skopos theory, developed in Germany by two translation scholars, Katharina Reiss and Hans Vermeer. In contrast to the more traditional approach derived from literary translation (“the source text is sacred”), skopos theory argues that the target text has to be shaped according to the purpose, function or skopos it is intended to fulfill. This is important for evaluating the quality of technical translations: the effectiveness of a translation for its intended purpose has to be the guiding principle in any assessment system. It has also introduced a concept largely overlooked in earlier academic discussion, namely that the potential consequences of an error matter more than linguistic solecisms. A phrase such as “Now we eats” might infuriate a university teacher of translation, but it is of little importance compared with putting 15 mg instead of 1.5 mg in dosage instructions.

In October 1999, Professor Peter Schmitt of Leipzig University told delegates at the first major international conference specifically on TQA, “There are no generally accepted objective criteria for evaluating the quality both of translations and interpreting performance. Even the latest national and international standards for this area - DIN 2345 and the ISO 9000 series - do not regulate the evaluation of translation quality in a particular context. Professional translators rightly expect help from translation theorists, who in turn point to the complexity of the subject matter and who so far have failed to come up with any answers or have suggested criteria and procedures which translators and interpreters find impossible to apply in practice…An industry that is over 2,000 years old and which has an annual turnover of more than 2,000 million marks in Germany alone ought, on the eve of the new millennium, to be developing a clear idea of how to determine the quality of its product, thus creating the basis for practicable quality management systems.”

Quality assurance and TQA are related fields with some overlap, but they must not be confused with one another. Quality assurance in translation had been examined earlier that year (March 1999) at a conference in Luxembourg held to celebrate the launch of the German standard DIN 2345. I was the keynote speaker at the Luxembourg conference, and there I identified four basic types of quality assurance procedure:

  • bottom-up revision
  • top-down revision
  • qualifications of performer
  • constrained procedures

Briefly, bottom-up revision is where a reviser who is more senior and/or more experienced than the original translator checks the translation. Top-down revision is where the translator is seen as the expert, but a less-qualified linguist comes along as a sort of sweeper-up and makes sure nothing has been omitted, no figures have been wrongly transcribed, and so on. Qualifications of performer is dear to the heart of translator associations: translations should only be done by qualified professionals. Constrained procedures is my shorthand term for procedures such as the ISO 9000 series, which prescribe the way in which a translation is to be carried out.

The DIN 2345 standard, though it included elements of qualifications of performer and constrained procedures, also introduced a new element, by placing emphasis on the contract between work-giver (Arbeitsgeber in German) and translator/translation company (Arbeitsnehmer).

The PEXIS Approach

To my mind, this marked an important step forward, and in a paper I gave to the Leipzig conference six months later, I drew on both skopos theory and DIN 2345 to outline an approach which I called PEXIS (Purpose-oriented Explicit or Implicit Specification). My argument was that the quality of a translation cannot be determined unless the specification is known. Often, there is no explicit specification (though DIN 2345 suggests there should be), but in many cases the text type does suggest an implicit specification. For example, if you are translating a patent application for filing purposes, there is a particular set of parameters that must be followed.
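The PEXIS argument amounts to a simple decision procedure: use the explicit specification if there is one, otherwise fall back on the specification implied by the text type, and if neither exists, quality cannot be judged at all. The Python sketch below illustrates that logic under stated assumptions; the `Specification` class, the `IMPLICIT_SPECS` table and its entries are hypothetical illustrations that only paraphrase examples from this article, not part of PEXIS or DIN 2345.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Specification:
    purpose: str
    requirements: tuple = ()


# Hypothetical table of implicit specifications keyed by text type.
# The entries paraphrase examples in the article; a real system would
# need a properly worked-out text typology.
IMPLICIT_SPECS = {
    "patent application for filing": Specification(
        purpose="filing with a patent office",
        requirements=("follow the filing parameters for this text type",),
    ),
    "automotive service literature": Specification(
        purpose="servicing vehicles",
        requirements=("style is not assessed",),
    ),
}


def resolve_specification(text_type: str,
                          explicit: Optional[Specification] = None) -> Specification:
    """Return the specification against which quality can be judged."""
    # An explicit specification (which DIN 2345 suggests there should be) wins.
    if explicit is not None:
        return explicit
    # Otherwise fall back on the specification implied by the text type.
    implicit = IMPLICIT_SPECS.get(text_type)
    if implicit is None:
        # No specification known: under PEXIS, quality cannot be determined.
        raise LookupError(f"no specification for {text_type!r}; quality cannot be assessed")
    return implicit
```

Raising an error, rather than silently defaulting, mirrors the argument that an assessment without a known specification is meaningless.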

Another paper presented at the Leipzig conference was Dr. Kurt Godden’s description of the J2450 metric of the Society of Automotive Engineers (SAE). So the dreaded word ‘metric’ was mentioned at last. In 1996 Kurt Godden, then Translations Manager for General Motors, had looked round for a system and not found anything satisfactory, so he set out to write one with help from other automotive companies and suppliers.

The J2450 System

The concept of metrics is not new. Metrics have been used in technical writing for some time, and also in quality assurance systems such as Six Sigma. But as far as I know, J2450 was the first major attempt to apply them to translation. The J2450 system has already been discussed in the LISA Newsletter, so I will only recall the salient points. It is intended only for automotive service literature, which is one reason why ‘style’ is ignored. Basically there are seven error categories, and every error is evaluated as either serious or minor. The categories, with the number of points given for each error, are as follows.

Error Categories

There are two guidance rules: when an error is ambiguous, always choose the earlier category; and when in doubt, always choose ‘serious’ over ‘minor.’

One surprising feature of the J2450 metric is that it has no category for mistranslation, and this has been worrying the European J2450 committee, which was set up in parallel with the original U.S.-based committee. The European committee would like a new category, “Misinterpretation of Source.” The U.S. committee considers that mistranslation is covered by the category “Wrong Term.” The discussion on this point is ongoing.

Another feature that sometimes raises eyebrows is that translation errors caused by errors in the source text itself are not forgiven: they still count as errors.

The lack of any room for maneuver in how many points are awarded for an error, once the serious/minor decision has been made, is not felt to be a problem in practice. “While sometimes this assignment of numeric weights will over-value the severity of an error, it will under-value it at other times. The underlying assumption of SAE J2450 is that these deviations will tend to cancel each other…”
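The two guidance rules and the fixed weights lend themselves to a short scoring routine. The Python sketch below is an illustration, not the SAE implementation: because the points table is not reproduced in this article, the category order and the weights in `CATEGORIES` are assumptions to be checked against the J2450 standard itself. The order matters, since the "earlier category" rule depends on it. The function names are hypothetical, and the sketch simply totals points.

```python
# ASSUMPTION: category order and (serious, minor) weights are illustrative;
# verify them against SAE J2450 before relying on them.
CATEGORIES = [
    ("Wrong Term", 5, 2),
    ("Syntactic Error", 4, 2),
    ("Omission", 4, 2),
    ("Word Structure or Agreement Error", 4, 2),
    ("Misspelling", 3, 1),
    ("Punctuation Error", 2, 1),
    ("Miscellaneous Error", 3, 1),
]
ORDER = {name: i for i, (name, _, _) in enumerate(CATEGORIES)}
WEIGHTS = {name: (serious, minor) for name, serious, minor in CATEGORIES}


def classify(candidate_categories, severity=None):
    """Apply the two guidance rules to a single error.

    candidate_categories: categories the evaluator thinks might apply.
    severity: "serious", "minor", or None when the evaluator is unsure.
    """
    # Rule 1: when an error is ambiguous, choose the earlier category.
    category = min(candidate_categories, key=ORDER.__getitem__)
    # Rule 2: when in doubt, choose "serious" over "minor".
    if severity is None:
        severity = "serious"
    return category, severity


def points(category, severity):
    # Fixed weights: no discretion once serious/minor has been decided.
    serious, minor = WEIGHTS[category]
    return serious if severity == "serious" else minor


def total_score(errors):
    """errors: iterable of (candidate_categories, severity_or_None) pairs."""
    return sum(points(*classify(cands, sev)) for cands, sev in errors)


errors = [
    (["Omission", "Wrong Term"], "minor"),  # ambiguous -> Wrong Term, minor: 2
    (["Misspelling"], None),                # unsure -> serious: 3
    (["Punctuation Error"], "minor"),       # 1
]
print(total_score(errors))  # 6
```

Because the weights are fixed, two evaluators who agree on category and severity must arrive at the same score; any disagreement is pushed back into the classification step, which is where the guidance rules apply.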

Because J2450 is a measurement tool rather than a revision tool, it does not address what to do in the event of a “Disaster Error,” an error so serious that it could be life-threatening or otherwise highly injurious to the customer’s interests if allowed to go uncorrected. Here it is interesting to look at the approach of Siemens in Germany, which uses a sampling quality system that divides errors into “minor,” “serious” and “critical” and applies a metric weighting system with a failure point of 30. However, a critical error always results in a “poor” rating and a demand to rework the text. There are also category thresholds: failure to understand the source text, for example, has a limit of two points, which will fail the translation even if the overall total is under the 30-point threshold.
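The three Siemens rules (a critical error overrides everything, per-category limits, and an overall failure point) interact in a fixed order, which a short Python sketch makes explicit. Only the failure point of 30 and the two-point limit for failure to understand the source come from the description above; the point values for minor and serious errors, the "pass" label, the category names, and the assumption that reaching a threshold (rather than exceeding it) fails the text are all hypothetical.

```python
# From the article: failure point of 30, and a 2-point limit for failure
# to understand the source text. HYPOTHETICAL: the per-severity points.
POINTS = {"minor": 1, "serious": 5}
FAILURE_POINT = 30
CATEGORY_LIMITS = {"source not understood": 2}


def rate(errors):
    """Rate a sampled translation.

    errors: list of (category, severity) pairs, with severity "minor",
    "serious" or "critical". Returns "poor" (rework required) or "pass"
    (a hypothetical label for anything better than poor).
    """
    # A single critical error always means "poor" and a demand for rework.
    if any(severity == "critical" for _, severity in errors):
        return "poor"

    total = 0
    per_category = {}
    for category, severity in errors:
        pts = POINTS[severity]
        total += pts
        per_category[category] = per_category.get(category, 0) + pts

    # Category thresholds (assumed to fail once the limit is reached),
    # applied even when the overall total is still under the failure point.
    for category, limit in CATEGORY_LIMITS.items():
        if per_category.get(category, 0) >= limit:
            return "poor"

    # Overall failure point: assumed here that a total of 30 or more fails.
    return "poor" if total >= FAILURE_POINT else "pass"


print(rate([("terminology", "serious")] * 5))           # pass (25 points)
print(rate([("source not understood", "minor")] * 2))  # poor (category limit)
print(rate([("terminology", "critical")]))             # poor (critical error)
```

Checking critical errors first means no amount of otherwise clean text can compensate for a potential disaster, which is exactly the gap the article identifies in a pure measurement tool.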

Other TQA Approaches

LISA, of course, has its own metric for assessing the quality of localization work, and this can be accessed on the LISA website. It certainly gives a useful set of guidelines for anyone laying down the parameters of localization TQA.

The ITR Company in the UK has released Blackjack, a software program that can be used as a semi-automated tool for translation quality assessment. It implements 21 error categories and a scoring system of 0 to 6 for each error.

Blackjack screen shot

With 21 error categories, Blackjack is obviously a more complex tool to implement than J2450, but it does allow for feedback on errors with a view to remedial action.

The largest translation organization in the world, the Translation Service of the European Commission, has been steadily increasing the amount of work it puts out to freelance suppliers. It therefore needs a reliable TQA system. Its current system looks like this:

European Commission TQA system

Style, it will be noticed, is not included. “We are not interested in the personal likes or dislikes of the evaluator,” states the European Commission’s Andrew Evans. Some evaluators can, he says, be “over-zealous” (in the translation world, such over-zealousness is known, somewhat indelicately, as the IDTISIC syndrome - “I didn’t Translate It So It’s Crap!”).

For any translation of five pages or less, the EC evaluates the whole translation. For longer translations, it evaluates five pages selected at random.
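The EC sampling rule is simple enough to state as code. The Python sketch below is an illustrative version under the assumption that pages are numbered from 1 and that the five sampled pages must be distinct; the function name and the optional seeded generator (useful for reproducible audits) are hypothetical additions, not part of the Commission's system.

```python
import random

WHOLE_TEXT_LIMIT = 5  # texts of five pages or less are read in full
SAMPLE_SIZE = 5       # longer texts: five pages taken at random


def pages_to_evaluate(page_count, rng=None):
    """Return the 1-based page numbers an evaluator should read."""
    if page_count <= WHOLE_TEXT_LIMIT:
        return list(range(1, page_count + 1))
    if rng is None:
        rng = random.Random()
    # Draw five distinct pages, then list them in reading order.
    return sorted(rng.sample(range(1, page_count + 1), SAMPLE_SIZE))


print(pages_to_evaluate(4))                        # [1, 2, 3, 4]
print(pages_to_evaluate(40, random.Random(2024)))  # five distinct pages
```

Passing a seeded `random.Random` makes a given sample repeatable, so a second evaluator can be asked to review exactly the same pages.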

The Current State of Play

The first conclusion must be that metrics are here to stay. Customers are not (normally) going to revise translations themselves, so they need some assurance that what they are getting is of satisfactory quality. “Before-the-event” quality assurance procedures, such as practitioner qualification and constrained procedures, can go some way towards providing that assurance, but they need to be supplemented by a reasonably objective “after-the-event” way of measuring translation quality.

The second conclusion to emerge from experience is that the choice of error categories and the weighting they are given depend very much on the type of text involved. My own opinion is that more work needs to be done on text typology, but that most large-scale translation projects do fall into a reasonably small number of easily definable text types.

Experience, particularly with J2450, shows that the establishment of TQA processes is lengthy and time-consuming. A third conclusion therefore is that there should be no unnecessary re-inventing of the wheel. Experiences need to be pooled and shared, and this issue of the LISA Newsletter will make a major contribution in this direction.

A fourth conclusion, emerging from J2450 experience, is the importance of training (and evaluating) the evaluators, so that everyone is singing from the same hymn sheet. The maverick evaluator who gives even the tiniest punctuation error the severest possible mark (IDTISIC syndrome), or the over-indulgent linguist who fails to imagine the consequences of an error, can wreck confidence in the procedure. We can learn a lot here from language testing, which has become something of an exact science. One day there could be an Institute of Translation Quality Assessment, perhaps located at one of the interested universities (Saarbrücken, Leeds, Surrey, Leipzig and the Monterey Institute of International Studies all come to mind).

A fifth conclusion is the need to recognize that translations are not produced in isolation, under unreal laboratory conditions. The European Commission’s inclusion of reliability of delivery as a factor in its TQA must surely be taken up in other systems (Siemens also evaluates delivery). As Andrew Evans told a recent TQA seminar at the University of Aston in the UK, “A translation which is perfect is still no use if it comes in after the meeting has already taken place.”


For nearly 20 years, Geoffrey Kingscott ran translation companies in the UK (Praetorius Limited) and the U.S. (Praetorius North America Inc.), before selling these in October 2001 to the Logos Group. He is now a freelance consultant with a particular interest in translation quality assessment. Kingscott is the author of many articles on various aspects of multilingual documentation and can be reached at geoffrey.kingscott@btopenworld.com.



