© 2010 SMP Marketing • ISSN 1420-3693 • www.localization.org


Change Management in Language Engineering - ELRA

Robin Bonthrone [1]

The European Language Resources Association, ELRA, elected its first permanent Board of Directors at its General Assembly on 25 September 1995. The date marks a further milestone in the life of the Association, which was founded in the spring of this year and has since attracted c. 70 members from all over Europe. ELRA aims to promote the creation, distribution and usage of language resources in all their machine-usable forms, and in particular spoken resources, written resources (both lexica and corpora) and terminology. This article gives a brief description of the Association, its aims and activities, and the reasons for its foundation.


A multilingual Europe

To start with the last point first, let us examine briefly the reasons behind the expansion of the language industry. It has become almost a cliché to say that we live in a global village (or as Peter Drucker has put it perhaps even more tellingly, in a "global shopping center" [2]). For the first time since the spread of the human race, the world has a truly multinational economy, global political institutions and global competition. At the same time, its peoples still speak an estimated 4,000 to 6,000 different languages. Even though many of these would seem to be endangered [3], a good 600 or so are still spoken by upwards of 100,000 people, and can therefore be classed as "reasonably safe" [4]. Multilinguality is therefore a fact of life, and of business life as well - something that LISA members will need no reminding of.

What is true of the world as a whole is equally true of one of its major political and economic blocs, the European Union. The most recent extensions have given it 11 official languages and a host of minority and lesser-used ones. In addition, the opening up of Central and Eastern Europe has created new markets on the fringes of the European Union - ones for which further integration is very much an objective. In both these areas, multilinguality is a political, social and economic factor of note. One reaction to this is the "Bangemann Report", which places strong emphasis on the need for rapid availability of tools and resources to enhance communication in a plurilingual society.

A further point is that Europe also has a growing need for communication with other parts of the world. Although English is firmly established as the lingua franca of both business and politics, it is by no means universally sufficient, and translation or localization of communications, products, and marketing collateral can often provide a decisive competitive advantage. Indeed, some sources suggest that the amount of translation into English is stagnating or declining [5]. At the same time, however, traditional resources such as dictionaries and glossaries, and more modern ones such as term banks and text corpora, are often not (freely) available, or of questionable quality if they are.

The rise of language engineering

This leads on to the next point, computer-based language applications. The backbone of the new global economy - in both senses of the word - is made of hardware and software. Computers (and especially microprocessors and PCs) have revolutionized our lives in the past fifty years, giving rise to whole new industries and applications, including localization and internationalization. The integration of this technology into mechanical processes and manufacturing is well advanced, and a comparable - but as yet less pronounced - development is now visible in white-collar and intellectual work, including language-related issues. Space is too limited in this article to give an overview of the emerging language engineering industry and the trends within it, but two examples that LISA members will readily recognize can perhaps be given: machine translation and voice recognition systems.

Of course, these examples are also proof that computers have some way to go before they can cope fully with multilinguality, or indeed monolingual natural language. This is only natural given the size and complexity of the problems involved, and considerable progress has been made in the past few years both in terms of research results and in terms of a basic reorientation towards user-friendliness, or at least consultation [6]. However, much remains to be done, especially as the industry as a whole is still small, underfunded and, in many cases, highly fragmented (the last fact is perhaps especially true of terminology, with its potentially vast numbers of users, vendors and applications). In all areas, though, considerable investments are needed to build up future infrastructures and promote activities designed to provide effective solutions within the medium to long term.

Give us the tools...

When it comes to promoting language research and solutions development, the lack of machine-readable resources for all sectors of language engineering is even more of a problem than it is for conventional media. For example, statistical approaches to MT, terminology extraction, lexicon generation, etc. are only possible if the corresponding data and text bases - both mono- and multilingual - are available. At present, though, resources are lacking for even the major European languages and subject fields, and this problem is even more acute in the case of lesser-used languages and more specialized domains. Another point is accuracy: the faster-moving the industry, the greater the need for new and updatable resources.

A further problem is the dissemination of those high-quality resources which do exist - all too often, the lack of a distribution mechanism means that they are not reutilized outside their original sphere of application. Publishing high-quality dictionaries is in most cases a fine and relatively unlucrative art, which means that only a small number of titles appear, and these are mainly confined to those sectors in which an ROI is more certain. In addition, the dictionary production process means that many works are no longer leading edge by the time of publication. Another problem is the invisibility of so-called "grey literature" (i.e. literature which is not published and therefore has no ISBN or other reference number, or which appears as part of another work, such as a glossary in a textbook). Similarly, there is a lack of commercial awareness amongst the holders of resources in both the public and the private sectors, which often leads to either over-protectiveness (expressed as a fear of giving away a competitive advantage), or extreme generosity (and sometimes lack of discrimination) in disseminating results.

This is not to deny that there are a number of real problems currently hindering full professional dissemination. These include copyright and data security issues (obviously particularly acute with online media), and the numerous different aspects of validation and quality assurance (process quality, fitness for original purpose, customer expectations, etc.). Resource distribution facilities stand and fall with the quality of their contents, and users must know what they are buying (or being given). Therefore, contents validation and quality control procedures need to be developed and implemented - a task which, like the copyright issues, requires both conceptual and practical work to be successful.

Last but not least, a need was felt by the European Commission, national governments and other potential funders for an expert body to advise them on matters of policy and technology in the language engineering sector. This would help to ensure that the considerable funding needed to erect such a language infrastructure is spent wisely.

...and we will finish the job

All these reasons were behind the decision to create a European body to promote the creation, distribution and usage of language resources. These are normally divided into three areas: speech, written resources (i.e. both corpora and lexica) and terminology - a classification which was also adopted in the structure of the new European Language Resources Association with its three colleges, or subject fields. A further starting point was the model provided by the LDC (Linguistic Data Consortium) in the United States, which was set up to facilitate the distribution of resources in that country and beyond.

A number of working meetings attended by representatives of the RELATOR, PAROLE, SPEECHDAT and POINTER projects (all working in the area and co-funded by the European Commission), plus appropriate national bodies and other major players, led to the appointment of an Interim Steering Committee chaired by Brian Oakley. This Committee was responsible for conducting business, including ensuring a financial basis for ELRA's work and appointing its Chief Executive Officer, in the period up to the September General Assembly. The new Association was formally registered as a non-profit body in Luxembourg towards the end of February 1995. Almost half of its 16 founder members (drawn from nine European countries) were commercial organizations, while the rest were public and academic bodies. In the six months since registration, membership has climbed to nearly 70.

Aims and objectives [7]

ELRA aims to validate and distribute European language resources that are offered to it for that purpose. In addition, it will serve as a clearinghouse for information on language engineering, gathering data on market needs and providing high-quality advice to potential and actual funders, including the European Commission and national governments. Equally, it will promote the development and application of standards and quality control measures and methodologies for developing electronic resources in the European languages. In time ELRA aims, in its own words, "to become the focal point for pressure in the creation of high-quality and innovative language resources in Europe".

To this end, the four European projects mentioned earlier have already started work on collecting and classifying information on existing resources in the European languages. This nucleus of information will be taken over and extended over time to produce a catalogue and library of resources, which will be made available via a variety of different media. In this way, potential users can either obtain direct access to the resources they require, or at the very least be presented with a mechanism for obtaining further information from (and negotiating licensing rights with) their owners. In the case of direct access, ELRA will naturally have to conclude distribution agreements with the developers or owners of the language resources in question. The Association's distribution team will be active both within Europe and throughout the rest of the world, working either directly or through agents, as the local situation dictates.

Similarly, ELRA aims to conclude cross-licensing agreements with similar bodies in other continents, in order to maximize the market for European resources, and to provide an additional service to European users. It welcomes cooperation on the basis of mutual access with organizations throughout the world. Thus the LDC in the United States has already formally welcomed ELRA's foundation, and relations will be expanded in future. Profits from distribution activities (and indeed any other income) will be reinvested in the work of the Association, and used to encourage the creation of resources where there are gaps which are unlikely to be filled by normal economic forces within a reasonable time-scale.

The pursuit of quality

As regards validation, a sub-committee of the ELRA Board will be set up to oversee the development and implementation of the relevant models and procedures. This work - and the results obtained - will probably differ to some extent in each of the three language engineering areas, since the precise problems are also different. Thus, for example, speech is heavily application-oriented, while corpora are often more broadly focused. Terminology, on the other hand, can either take the form of input for a specific application or be a standardized (or standardizable) resource.

In addition, the validation schema developed must take account of the widely differing purposes - ranging from in-house experiments to for-profit publication - for which existing resources were developed. Such variation obviously requires a flexible but clear and user-friendly system of judgment if misunderstandings - and disappointment - are to be avoided. Over time, methods of automating the work will be developed where possible, but to start with validation will be performed using a mixture of factual statements classifying material in accordance with a standard framework, and material of a more qualitative nature, provided by independent experts in the field.

As well as creating its own validation and quality control methods and procedures, ELRA will, of course, encourage the use of international standards in the products that it distributes. The Association's aim is not to reinvent the wheel but, on the contrary, to facilitate and promote existing initiatives wherever possible, while filling, or helping to fill, the gaps. Thus it will also stimulate and support the creation of appropriate standards where there is a need, working through bodies like EAGLES and the ISO committees.

The organization of organizations

Membership of ELRA is open to all interested organizations in the public or private sector in Europe, be they language resource providers or users. (The definition of "Europe" follows that of the International Postal Union - i.e. it is rather wider than the list of member states of the European Union or the European Economic Area.) Individual members can be admitted in exceptional cases at the discretion of the Board. Membership fees are currently 1,000 ECU a year, a figure consciously set as being low enough to encourage widespread participation, but high enough to discourage frivolous applications. In addition, it is planned to introduce associate membership status at some point to allow the participation of bodies from outside Europe.

In between General Assemblies, the Association's governing body consists of a Board comprising 12 members of the Association elected by a General Assembly of all full members. Voting is in two parts: a general list and a college list. The general list covers six places (i.e. half) of the Board, with candidates being nominated and selected at large (i.e. by all members irrespective of their affiliation to a particular college or colleges). The remaining six Board places are filled as the result of elections within the individual colleges, with two places being available for each. In this way, a balance can be achieved between the different interests, stages of commercial development and numbers of players in the three areas, avoiding the dominance of any particular one. Day-to-day operations concerned with the gathering and distribution of resources will be performed by the Chief Executive Officer and a small back-up team skilled in marketing and business negotiation, and in the technology of language engineering.

This team will also provide support and advisory services to the Board and Association members, including information on matters such as standards, IPR and commercial ownership rights, and the use of resources (where necessary and sensible, external professional advice will also be sought). In addition, the Association will issue a regular newsletter to keep its members informed of activities and developments in the field, both in Europe and worldwide (in the early days it may rely on the good offices of ELSNET to distribute information). Last but not least, ELRA will always act as a source of information for the journals and media in the field, including the use of bulletin boards on the Internet as appropriate.

Footnotes

[1] Robin Bonthrone, a managing partner in Fry & Bonthrone Language Consultancy and Services, represents the DIT (German Terminology Institute) in ELRA. He was elected Secretary to the Board in September 1995.

[2] Peter F. Drucker "Management - Tasks, Responsibilities, Practices", USA 1973,74

[3] Steven Pinker, "The Language Instinct", UK 1994

[4] Pinker, ibid, citing research by Michael Krauss

[5] cf. Lawrence Venuti, "Translation, Authorship, Copyright" in "The Translator: Studies in Intercultural Communication", Vol. 1, No. 1

[6] This was, for example, a common theme at the MT Summit V in Luxembourg in July 1995

[7] I am obliged to Sarah Houston, Assistant to the ELRA Board, for providing additional background material and the logo at extremely short notice


Robin Bonthrone
Fry and Bonthrone Language
Consultancy and Services
Rochusplatz 10
D-55252 Mainz-Kastel
Tel: (49) 6134 22504
Fax: (49) 6134 22860
Email: Compuserve 100277,3467


The ELRA Board

President: Antonio Zampolli (I)
Treasurer: Tom Schneider (D)
Vice Presidents: Norbert Kalfon (E), Joseph Mariani (F), Angel Martín-Municio (E)
Secretary: Robin Bonthrone (D)
Board Members: Lou Boves (NL), Georges Carayannis (GR), Guiseppe Castagneri (I), Christian Galinski (A), Harald Höge (D), Benthe Maegard (DK)
Chief Executive: Khalid Choukri (F)
Assistant: Sarah Houston (L)
Contact address: ELRA - Attn. Sarah Houston, c/o CL International, 46 Grand Rue, L-1660 Luxembourg
Tel.: +352 46 91 60, Fax: +352 46 91 61
E-mail: 100126.1262@compuserve.com


