In this issue…
Document Alignment Techniques and Consistency Checking
(An ATPT by any other name...)
Earlier this year at the LISA Annual meeting in Dublin, members got a preview of a running prototype that automatically aligns texts and checks for terminology consistency across them. A number of LISA members signed up for the ATPT project (Automated Translation and Proofreading Tool), offering consulting, testing and sample data in exchange for licensed access to the prototype and potential exploitation rights. A new version of the tool has attracted the attention of the Swiss Government. A final proposal is being submitted with a projected starting date in April '94.

The revised ATPT project aims at the design and development of a program to align parallel (multilingual) texts and to check for translation consistency across documents. Translation consistency and reliability are a growing problem for all those involved in multilingual document production. Applying advanced NLP techniques to build new tools that identify inconsistencies and aid in the subsequent editing and proofreading tasks can help alleviate this problem. What is needed are means to recognize corresponding parts of texts and their translations (alignment) and ways to evaluate a variety of translation quality criteria in the corresponding parts.

Though simple alignment techniques are now well known in the academic sector, they are not yet generally available in commercial products. Current translation tools align texts as they are translated; they do not take previously translated texts in a variety of formats to feed the memory automatically. Moreover, current alignment algorithms view texts as a basically flat stream of characters (or a series of words in segments such as sentences), though texts are actually complex, structured objects consisting of chapters, sections, paragraphs, sentences, phrases, etc., or screens, command lines, etc., with complex links between the parts.
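To make the idea of consistency checking over aligned text concrete, here is a minimal sketch in Python. It is not the ATPT prototype's method: the function name, the data layout (a list of already aligned source/target segment pairs) and the tiny termbase are all assumptions made for illustration. It uses plain case-insensitive substring matching, so it ignores inflection and the harder NLP questions of mapping termbank entries to their textual instances.

```python
from collections import defaultdict


def find_inconsistent_terms(aligned_pairs, termbase):
    """Flag aligned segments where an approved term translation is missing.

    aligned_pairs: list of (source_segment, target_segment) strings,
                   assumed to be aligned already (hypothetical input format).
    termbase:      dict mapping a source term to its approved target term
                   (hypothetical, simplified termbank).
    Returns a dict: source term -> list of segment indices in which the
    source term occurs but the approved target term does not.
    """
    issues = defaultdict(list)
    for index, (source, target) in enumerate(aligned_pairs):
        source_lower = source.lower()
        target_lower = target.lower()
        for source_term, target_term in termbase.items():
            # Naive substring test; a real tool would need morphological analysis.
            term_in_source = source_term.lower() in source_lower
            term_in_target = target_term.lower() in target_lower
            if term_in_source and not term_in_target:
                issues[source_term].append(index)
    return dict(issues)


# Illustrative data only; not taken from any real project.
example_pairs = [
    ("Save the file.", "Enregistrez le fichier."),
    ("Open the file.", "Ouvrez le document."),
    ("Close the window.", "Fermez la fenêtre."),
]
example_termbase = {"file": "fichier", "window": "fenêtre"}

if __name__ == "__main__":
    print(find_inconsistent_terms(example_pairs, example_termbase))
```

In this toy data the second segment renders "file" as "document" rather than the approved "fichier", so it is the only one reported. A check of this kind depends entirely on the alignment being correct.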
Current tools sold as 'memory managers' do little more than simple pattern matching over sequences of characters to retrieve previously aligned sentences. They provide no diagnostics on the quality of the translations, neither for old translations stored in the memory nor for new ones as they are created.

The new project will extend known techniques to allow for sophisticated alignment mechanisms that take into account a more complex view of a document as a richly structured object. In addition, with the elaboration of an extended set of linguistic objects and their translation properties, a range of translation quality control criteria can be established. The successful incorporation of document structure in alignment techniques will pave the way for a prototype that can lead to a range of useful and sophisticated applications.

ISSCO again encourages LISA members to participate, either actively or in a consulting role. We are committed to sharing this technology either under the umbrella of a LISA organized Special Interest Group (e.g., SIG - TOOLS) or as a project in its own right. INFOTERM has agreed to participate to help ensure that the terminology interface receives adequate treatment, i.e. in issues such as format, data associated with the terms, and NLP questions regarding the mapping from termbank entries to textual instances of the phrases. A short presentation and discussion at the January Forum will allow us to finalize details with participants.