© 2010 SMP Marketing • ISSN 1420-3693 • www.localization.org

DOCWARE - The Document Localisation Workbench

DOCWARE CONSORTIUM (BULL, CNRS, LINGA, The OPEN UNIVERSITY)

The consortium and various industry representatives met in Paris two weeks ago to review the DOCWARE prototype. The final product is intended to be a commercially viable system. The following overview of the project outlines its concepts, technologies and partner contributions.


Product Internationalisation and Localisation

This project aims to create methods and tools to facilitate the fast, accurate and economical production of high-quality documentation in several languages. Internationalisation is the process of developing products that can easily be tailored to different cultures and different markets. This process of adaptation is called Localisation. Localisation includes the translation of data in text form, and the incorporation of procedures and customs specific to the culture of the target users.

The Document Localisation process involves:

  • Receiving the Document: a set of descriptive information is recorded for each new document that arrives at the Localisation Centre;
  • Conversion of the document into a canonical format, and saving of:
    1. the original word-processing format
    2. the textual content
    3. any graphics or other non-textual content
  • Logical Document Indexing and Delta Extraction: this allows differences between two successive versions of the document to be extracted and easily localised;
  • Control of the Source Language, if appropriate, and Pre-Editing where Machine Translation is to be used;
  • Machine Translation (if any);
  • Proof-reading;
  • DTP (publishing);
  • Distribution and Archiving.
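
The Delta Extraction step above can be illustrated with a minimal sketch. It assumes each document version has already been converted to a list of paragraphs; the function name `extract_delta` and the sample texts are hypothetical and are not taken from the DOCWARE prototype. It uses Python's standard `difflib` module to find the paragraphs that are new or changed, so that only those are sent for localisation.

```python
import difflib


def extract_delta(old_paragraphs, new_paragraphs):
    """Return (index, text) for paragraphs of the new version that were added or changed."""
    matcher = difflib.SequenceMatcher(a=old_paragraphs, b=new_paragraphs, autojunk=False)
    delta = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "replace" and "insert" are the opcodes that contribute new-version text
        if tag in ("replace", "insert"):
            for j in range(j1, j2):
                delta.append((j, new_paragraphs[j]))
    return delta


# Hypothetical sample data: two successive versions of a short manual
old_version = [
    "Switch on the printer.",
    "Load paper into tray 1.",
    "Press the green button.",
]
new_version = [
    "Switch on the printer.",
    "Load paper into tray 2.",
    "Press the green button.",
    "Wait for the status light.",
]

for index, text in extract_delta(old_version, new_version):
    print(index, text)
```

Unchanged paragraphs are skipped, so translations already produced for the earlier version could be reused for them.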

The aim of this project is to implement a pilot application for localising documents into many languages, based on Work Group organisation and dedicated workstations. The pilot application will cover the Information Technology domain and will be limited to English and French.

It will be organised as follows. An autonomous entity, called the Localisation Centre, will perform mass localisation for a number of internal or external customers. The Centre's personnel will be technical writers, translators, terminologists and indexers. These people are not software engineers and will require a workstation specially designed to meet their needs, called the Localisation Work Station (LWS). The Localisation Centre will comprise a number of LWSs connected via a local area network (LAN) to a file server.

This Pilot Application will:

  • use existing products, such as Bull Controlled English and the Systran MT System, without modification. The use of Controlled English not only improves translatability, it also encourages standard, concise text in the source language. Although many grammatical constructions are removed from the source language, the supported set of constructions is expressive enough for writing documentation in a particular domain. The great advantage of this approach is that the person who eliminates ambiguities from the source text does not need to know the target languages.

    Although Systran has been chosen for this project, any MT system can be used, provided that (a) it supports the required source-to-target language pairs, and (b) a good multilingual terminology for the domain is available to the system.

    The performance of MT is measured by its speed relative to manual translation. If the source text complies fully with the Controlled English rules and all the words used are in the dictionary, using Systran for English-to-French translation cuts the time to localise a document by a factor of three compared with manual translation. Speed is one of the major advantages of MT: for many applications a translation must be available immediately, and any significant delay makes it useless.

  • develop and enhance existing prototypes,
  • integrate all the preceding tools and prototypes into a comprehensive Localisation Work Bench.
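
The two conditions mentioned above (compliance with the controlled-language rules, and every word being in the dictionary) could be checked before a text is sent to MT. The sketch below is only an illustration: the actual Bull Controlled English rules are not given in this article, so the sentence-length limit, the tiny dictionary and the function name `check_sentence` are all hypothetical stand-ins.

```python
import re

# Hypothetical stand-ins: the real Controlled English rules and the
# domain terminology are not described in the article.
MAX_WORDS_PER_SENTENCE = 20
DICTIONARY = {"press", "the", "green", "button", "to", "start", "printer"}


def check_sentence(sentence, dictionary=DICTIONARY, max_words=MAX_WORDS_PER_SENTENCE):
    """Return a list of problems found in one sentence (empty if it passes)."""
    words = re.findall(r"[A-Za-z]+", sentence.lower())
    problems = []
    if len(words) > max_words:
        problems.append(f"too long: {len(words)} words")
    unknown = sorted(set(words) - dictionary)
    if unknown:
        problems.append("unknown words: " + ", ".join(unknown))
    return problems


print(check_sentence("Press the green button to start the printer."))
print(check_sentence("Depress the verdant button."))
```

A sentence that returns an empty list would go straight to MT; anything else would be returned to the technical writer for pre-editing.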

The Localisation Work Station (LWS) Concept

A single multipurpose workstation has been designed to satisfy the needs of the different types of users who play a part in the document Localisation Process: translators, terminologists, technical writers, etc. Defining a separate workstation for each profession would lead to an uncontrolled multiplication of configurations, given how rapidly the professions evolve and how quickly their needs change.

Given the diversity of the LWS users, a software layer is needed to:

  • improve ergonomics,
  • eliminate parameters,
  • automate tasks,
  • control data access,
  • supervise operations.

The main functionalities of the LWS are:

  • Automation - to relieve users of all computing tasks, allowing them to concentrate exclusively on their translation and documentation skills. Functions that do not require human intervention, such as telecommunications control between the network components or automatic rebooting after a failure, are handled automatically.
  • Saving Context - for each selected document, the user is given the status of the work: remaining to be treated, in progress, or completed. This allows an operation to be continued or recovered on another station by the same user.
  • Multipurpose Usage - Each LWS is a non-dedicated station: it can be used by several users, on the same document or on different documents. A user can start a job on one station and continue it on another. The work is saved and updated at each work session.
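
Context saving as described above can be pictured as a small per-document record that is written to the file server and reloaded on another station. The following is a minimal sketch under that assumption; the class `DocumentContext`, its field names and the JSON format are hypothetical and do not describe the actual LWS software.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DocumentContext:
    """Status of the work on one document for one user (hypothetical layout)."""
    document_id: str
    user: str
    remaining: list = field(default_factory=list)
    in_progress: list = field(default_factory=list)
    completed: list = field(default_factory=list)

    def start(self, segment):
        # Move a segment from "remaining" to "in progress"
        self.remaining.remove(segment)
        self.in_progress.append(segment)

    def finish(self, segment):
        # Move a segment from "in progress" to "completed"
        self.in_progress.remove(segment)
        self.completed.append(segment)


def save_context(ctx):
    """Serialise the context, e.g. for storage on the shared file server."""
    return json.dumps(asdict(ctx))


def load_context(payload):
    """Rebuild the context on another station from the saved form."""
    return DocumentContext(**json.loads(payload))


# One work session on a first station
ctx = DocumentContext("manual-42", "translator1", remaining=["ch1", "ch2", "ch3"])
ctx.start("ch1")
ctx.finish("ch1")
ctx.start("ch2")
saved = save_context(ctx)

# The same user resumes on a second station
resumed = load_context(saved)
print(resumed.remaining, resumed.in_progress, resumed.completed)
```

Because the record carries everything needed to resume, no station has to be dedicated to a particular user or document.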

The Indexing Workstation and the IMEM Concept

Full-text indexing, search and retrieval, coupled with multilingual localisation will enable production and delivery of Intelligent Multilingual Electronic Manuals (IMEM). Compared to existing products, we introduce a functional and ergonomic "richness", which can be summarised as follows:

  • interactive indexing; through this facility we are trying to solve the problems posed by the chronic limitations of morphosyntactic analysis algorithms, which are the foundation of tools for indexing and interrogation in natural language;
  • multilingualism; by allowing interrogation in a language other than that of the manual;
  • cumulative indexing; for technical manuals, cumulative indexing would allow the subsequent versions of the index to be easily upgraded/updated;
  • incremental indexing; a technical manual can change, and we propose that only the additions be analysed;
  • selective indexing; it is not always useful to index an entire technical manual;
  • dynamic composition of response units: We propose that the responses to a given query be defined automatically, depending on the question. We would avoid the time-consuming, repetitive and error-prone task of dividing the manual into homogeneous units which correspond to questions envisaged in advance.
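
Incremental indexing, as proposed in the list above, can be sketched with a simple inverted index that skips any section whose text has not changed since it was last indexed. This is an illustration only: the class `IncrementalIndex`, the use of a content hash to detect changes, and the plain word tokeniser are assumptions, and the sketch does none of the morphosyntactic analysis the project relies on. It also re-analyses changed sections, which goes slightly beyond the additions-only case described in the text.

```python
import hashlib
import re
from collections import defaultdict


class IncrementalIndex:
    """Toy inverted index that re-analyses only new or changed sections."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of sections containing it
        self.digests = {}                 # section id -> hash of the text last indexed

    def update(self, sections):
        """Index new or changed sections; return the ids that were analysed."""
        analysed = []
        for section_id, text in sections.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if self.digests.get(section_id) == digest:
                continue  # unchanged since the last run: skip analysis
            # Drop stale postings for this section before re-indexing it
            for ids in self.postings.values():
                ids.discard(section_id)
            for term in set(re.findall(r"\w+", text.lower())):
                self.postings[term].add(section_id)
            self.digests[section_id] = digest
            analysed.append(section_id)
        return analysed

    def search(self, term):
        """Return the sorted ids of sections containing the term."""
        return sorted(self.postings.get(term.lower(), set()))


index = IncrementalIndex()
first = index.update({"1": "Install the printer driver.", "2": "Load paper."})
second = index.update({
    "1": "Install the printer driver.",
    "2": "Load paper.",
    "3": "Replace the toner cartridge.",
})
print(first, second, index.search("toner"))
```

Selective indexing fits the same sketch: passing only the chosen sections to `update` leaves the rest of the manual unindexed.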

The IMEM concept means delivery of the following elements on diskette (or CD-ROM): the text of the manual, its presentation attributes, its indexes, the associated analysis dictionaries, and a usable Machine Aided Document Reading tool.

Building Marketable Products

Because it combines linguistic research with real-life professional requirements, the LWS project has clear potential for practical use. The groups performing localisation are distributed as follows (Future Technology Surveys, Inc., 1992 estimates):

Group performing localisation          Share
Large translation companies            4.3%
Small translation companies            8.6%
Individual researchers/professionals   5.6%
In-house by Corporations               0.0%
In-house by publishers                 1.6%

LWS's first focus will be on Corporations and large translation companies with a heavy demand for translation. Together they represent 54% of the translation market, or about 135 million pages. Because more customers are switching to standard hardware and software platforms such as UNIX and the PC, LWS will have a large potential commercial base.

The Consortium Partners

Bull SA Group

Bull is one of the world's largest suppliers of information systems, with a revenue of around $6 billion, and is active in more than 100 countries. Bull has valuable experience in the following NLP fields:

  • Voice recognition & dictation
  • Automatic indexing of textual databases
  • Natural Language querying of relational databases
  • Voice integration in hypermedia office automation.

LINGA s.a.r.l.

LINGA is a start-up company, founded in 1991, that specialises in computational linguistics and in particular in advanced language-processing technology. The company participates as co-ordinator in several broad-ranging projects in the field of terminological workstations. These projects have led to the testing of technologies for controlled language and for the re-use of multilingual dictionaries.

The OPEN UNIVERSITY

The Open University has some 6000 employees of whom 2000 are located at the headquarters in Milton Keynes. The Open University has an interest in the transnational provision of educational services, and while a lot of these are in English, the importance of providing versions in other languages is recognised.

Centre National de la Recherche Scientifique (CNRS)

The French National Centre for Scientific Research is Europe's largest basic-research agency. CNRS has numerous partnerships with higher education institutions, research agencies and corporations in Europe and the rest of the world. Its unit "Informatique Droit Linguistique", which is working on DOCWARE, has special skills and knowledge in computational linguistics and Arabic languages.


Contact Information

Bull
Mr. Rafik BELHADJ, ILO Technologies
Tel: (33-1) 30 80 34 01
Fax: (33-1) 30 80 70 78
E-mail: R.Belhadj@frcl.bull.fr

CNRS
Mr. Fathi DEBILI
Tel: (33-1) 43 50 54 01
E-mail: Zribi@idl.msh_paris.fr

LINGA
Mr. Girard DIMANCHE
Tel: (33-1) 34 87 22 99
Fax: (33-1) 34 87 38 13
E-mail: LING10@calvacom.fr

The OPEN UNIVERSITY
Mr. Ray HUDSON
Tel: (44) 1908 653096
Fax: (44) 1908 652140
E-mail: R.Hudson@open.ac.uk



