“Unto Every Nation, Kindred, Tongue and People”
Supporting Multilingual Publications at Headquarters for The Church of Jesus Christ of Latter-day Saints
While most localizers deal with a relatively small number of languages (a few dozen at most), some large organizations are beginning to deal with many more. The Church of Jesus Christ of Latter-day Saints currently deals with over 170 languages to some extent. In this article, John Hopkins describes the challenges his department has faced and some of the solutions it has arrived at to support this large number of languages.

Having worked for The Church of Jesus Christ of Latter-day Saints for over 10 years, I have used desktop publishing technology to produce publications in many languages. Along with co-workers, I have also developed custom software to help automate various aspects of the Church’s worldwide multilingual publishing. The Graphics Division currently supports publishing in over 170 languages to one degree or another, and this number will increase.

In all of this, we have found that the Church tends to push the limits of multilingual publishing technology. We must therefore often do our own development work, chiefly in fonts, text conversion, hyphenation, and production automation. In these efforts, we have found that standard computer codes are not defined for all the languages. Standard fonts lack characters needed to write them, especially on the Internet. Menus in commercial software won’t accommodate all the languages we could publish in. The list goes on. Clearly, software support for multilingual publishing needs to improve, not just for us, but for all organizations seeking to communicate worldwide.

The Church’s publishing production system uses three different computer platforms. Text input is normally done on PCs using WordPerfect or Microsoft Word. Most production work is done on Apple Macintosh computers using QuarkXPress, Adobe Illustrator, Adobe Photoshop, and other desktop publishing software. Some production work is done on a UNIX-based publishing system called Paragon (formerly known as Bedford).
Although no longer commercially supported, this system usually handles publications and languages that the software on the Macintosh or PC cannot yet handle well.

Fonts

The Church has for some time produced many of its own fonts for all three platforms of its production system, using Fontographer on the Macintosh to create them. Graphics maintains fonts for over 40 different character encodings, totaling over 2000 fonts. Each character encoding supports one or more languages. Placing all these fonts on production computers strains system resources. Font management software helps, but software developers should seriously look at Unicode and OpenType font technologies. They have tremendous potential to simplify font creation, text conversion, and production in general for a large multilingual publishing operation.

The Church created many of its fonts before any standards were in place for their character encodings. In order to follow current trends in standard 1-byte character encodings, Graphics Support has needed to make more encodings to accommodate the same number of languages, greatly increasing the associated number of fonts. This, however, has helped us better facilitate data conversion and Internet support. Nevertheless, the Church still publishes in languages for which no industry standard character encodings exist yet.

Since the Church has periodically been required to change character encodings for languages for various reasons, one of the first XTensions we made was called Font Converter. It converts text in old fonts within QuarkXPress documents to corresponding new fonts, doing any needed character conversion.

An important change that came with version 4.1 of QuarkXPress Passport was the ability to make an XTension, called LDS Correct Case, that makes the Church’s fonts work with all caps and small caps. Without it, QuarkXPress capitalization only works for industry standard character encodings.
Any font using our custom encodings brought up incorrect capitals when these case-changing type styles were applied.

Text Conversion

A separate program must be written for every character encoding to convert text written on PCs for both the Mac and UNIX systems. The Church has used the Shaftstall programming environment (no longer commercially supported) on the PC to make these programs for several years. Again, fully integrating Unicode into desktop publishing programs would greatly reduce this work. So in preparation, Graphics Support now converts text to Unicode 2-byte format from various versions of WordPerfect and Microsoft Word. Then we make and use programs to convert the text from Unicode to each 1-byte character encoding for the Mac. Making this intermediate change to Unicode has cut text conversion programming by more than half. It also helps prepare the way for using XML.

Additionally, Graphics made routines for cleaning up text for production, such as removing double paragraph returns or converting double hyphens to en-dashes. On the Mac the target text format is XPress Tags. These cleanup routines also place a normal style sheet tag at the beginning of the text, preserving only the bold and italic styles from the word processor’s text formatting. The Paragon publishing system also uses a coding structure unique to its requirements.

So why aren’t there more style tags for headers and body text? The reason has to do with the size and scope of the Church’s multilingual production process. Yes, XPress Tags could be generated from the original English document, but the cryptic and confusing tag text gets in the translators’ way. Since scores of translators are located all over the world, using all kinds of PCs and word processing programs, keeping anything but the most basic formatting is useless. So the English text for translation is normalized to one point size, one font, one color, etc., using only essential formatting.
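The two-step conversion and the cleanup routines described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the standard codecs `cp1252` and `mac_roman` stand in for the Church's custom 1-byte encodings, which are not available here, and all function names are hypothetical. The last function shows the arithmetic behind the savings: with Unicode as a pivot, one decoder per source format plus one encoder per target encoding replaces one converter per source/target pair.

```python
import re


def to_unicode(raw: bytes, source_encoding: str) -> str:
    """Step 1: bring word-processor text into Unicode (the pivot format)."""
    return raw.decode(source_encoding)


def clean_for_production(text: str) -> str:
    """Cleanup of the kind the article mentions (a simplified assumption)."""
    text = text.replace("\r\n", "\n")     # normalize PC line endings
    text = re.sub(r"\n{2,}", "\n", text)  # remove double paragraph returns
    text = text.replace("--", "\u2013")   # double hyphen -> en dash
    return text


def from_unicode(text: str, target_encoding: str) -> bytes:
    """Step 2: encode Unicode into one 1-byte production encoding."""
    return text.encode(target_encoding)


def convert(raw: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Full pipeline: source bytes -> Unicode -> cleanup -> target bytes."""
    unicode_text = to_unicode(raw, source_encoding)
    return from_unicode(clean_for_production(unicode_text), target_encoding)


def converters_needed(sources: int, targets: int) -> tuple:
    """Return (direct converters, converters when pivoting through Unicode)."""
    return sources * targets, sources + targets
```

For example, `convert(pc_bytes, "cp1252", "mac_roman")` turns Windows-encoded text into cleaned Mac Roman bytes. With hypothetical counts of 3 source formats and 40 target encodings, `converters_needed(3, 40)` gives 120 direct converters against 43 through the pivot.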
The Church’s Translation Department also inserts tags into the text called Versification codes, or Word Cruncher codes, along with localization notes. These codes provide a guide to translators and production artists, clearly marking corresponding paragraphs and document structure. Unfortunately, these codes are currently used for visual cues only. Graphics Support may build a program to process them if a good XML solution does not come along first. This would improve the importing and formatting of translated text.

Graphics is looking at XML as a next step, but good tools using it either don’t exist or have not matured enough yet. Since XML uses Unicode, it will greatly simplify text conversion when the XML originates from the PC word processor and when Unicode is fully supported in the page layout program. XML tags are also more informative to the translator and graphic designer, not to mention the computer itself, opening opportunities for better automation.

Hyphenators

Graphics Support makes hyphenators from hyphenated word lists. These word lists are typically derived from existing publications. Hard hyphens are inserted into the words using search and replace routines; corrections are made by hand. After a translator approves a hyphenated word list, Graphics Support creates a hyphenation pattern file for the Macintosh to be used by a multihyphenator XTension for QuarkXPress Passport, which we developed in-house and called LDSHyph.

Quark added some XTension hooks in version 4.1 of Passport, which allowed us to fully integrate our hyphenators with theirs. Now Graphics can have up to 100 languages in Passport’s menus, and we use every slot available. Our languages sit right next to Quark’s as equals when it comes to hyphenation. In order to show our languages in Passport’s menus, we currently must alter a resource within Passport and its dictionary files, using an AppleScript program.
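To make the idea of a hyphenated word list concrete, here is a minimal Python sketch that reads entries such as `pub-li-ca-tion` and uses them to hyphenate matching words. It is a plain lookup table, not the pattern-file format used by LDSHyph, which the article does not describe; the function names and the one-entry-per-word list format are assumptions for illustration.

```python
def parse_hyphenated_word(entry: str, hyphen: str = "-"):
    """Split one list entry into the bare word and its break positions."""
    parts = [p for p in entry.strip().split(hyphen) if p]
    positions = []
    offset = 0
    for part in parts[:-1]:
        offset += len(part)
        positions.append(offset)  # a break is allowed before this index
    return "".join(parts), positions


def build_hyphenation_table(entries):
    """Map each lowercased word to its allowed break positions."""
    table = {}
    for entry in entries:
        if not entry.strip():
            continue  # skip blank lines in the word list
        word, positions = parse_hyphenated_word(entry)
        table[word.lower()] = positions
    return table


def hyphenate(word: str, table, soft_hyphen: str = "\u00ad") -> str:
    """Insert discretionary hyphens into a word found in the table."""
    positions = table.get(word.lower())
    if not positions:
        return word  # unknown words are left untouched
    pieces = []
    start = 0
    for pos in positions:
        pieces.append(word[start:pos])
        start = pos
    pieces.append(word[start:])
    return soft_hyphen.join(pieces)
```

A lookup like this only covers words that appear in the list. A pattern file generalizes from the approved list so that unseen words can be hyphenated as well, which is presumably why one is generated for production use.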
Although spell checkers could be made from the hyphenated word lists, this is not a concern for us in Graphics because spell checking is the responsibility of the translators and their tools.

Production Automation

Production starts with the creation of a publication in English. That publication is then translated into the various languages. Completed translations are sent to the appropriate location for production. Staff at these different locations may handle things differently. Since non-English work accounts for about 70–90% of our work in Graphics and is fairly repetitive, it makes sense to focus automation development efforts here.

Special paginating programs were made for the Paragon publishing system to paginate scriptures and curriculum publications for the various languages. On the Macintosh, custom automation started with QuicKeys, progressed to AppleScript, and then moved to application extensions and standalone utilities, all working together in various ways.

Since most text formatting could not be preserved in the translated text, text formatting became one of the first and main automation concerns. Graphics has employed a couple of techniques using AppleScript to tackle this problem and has netted some time savings. These techniques are based on the fact that the number of paragraphs in the English text and in the translated text remains about the same. Scripts “mirroring” the format of the English text run fast enough that if inconsistent paragraph counts occur, dummy paragraphs may be added temporarily and the script rerun until the formatting is correct. We also use AppleScript to automatically change fonts and hyphenation in style sheets or paragraphs to those needed for the given language.

However, text with tags that can somehow last throughout the entire process shows the greatest potential for optimum performance. Since XML uses Unicode and is more compatible with the Internet, it is a better solution in the long run than other tagging systems.
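The paragraph "mirroring" technique described above can be sketched as follows. The production scripts are written in AppleScript and drive QuarkXPress directly; this Python version only illustrates the logic, and the `Paragraph` type, style names, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Paragraph:
    text: str
    style: str  # name of the style sheet applied to the paragraph


def mirror_styles(english, translated_texts):
    """Apply each English paragraph's style to the translated paragraph
    in the same position. Requires equal paragraph counts."""
    if len(english) != len(translated_texts):
        raise ValueError(
            f"paragraph counts differ: {len(english)} English, "
            f"{len(translated_texts)} translated"
        )
    return [
        Paragraph(text, source.style)
        for source, text in zip(english, translated_texts)
    ]


def pad_with_dummies(translated_texts, index, count=1, dummy=""):
    """Temporarily insert dummy paragraphs so the counts line up again."""
    return translated_texts[:index] + [dummy] * count + translated_texts[index:]
```

If the translation merged or dropped a paragraph, `mirror_styles` refuses to run. The operator then pads the translated text at the point of mismatch with `pad_with_dummies`, reruns the mirroring, and removes the dummy paragraphs once the formatting is correct.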
Production artists could swap translated text into QuarkXPress documents or HTML templates, perhaps using application extensions or plug-ins, with the help of XML.

To support Graphics’ scripts and XTensions, we found that a central database relating languages, fonts, characters, etc., was very helpful. Lately, we have seen a need to expand this database to service other areas of support as well. The database is kept in FileMaker Pro and TableServer, chosen for their excellent AppleScript support, among other reasons.

Automating the pagination of long documents on the Mac is the next focus. Graphics has started using a third-party paginating XTension called Autopage from KyTek to handle this in QuarkXPress. Incidentally, the developer of Autopage, Keith Erf, also developed most of the paginator in the Paragon publishing system.

The Church’s Translation Department has started using translation memory tools for the more established languages. These tools should improve as time goes on with the incorporation of Unicode, OpenType, and XML.

Conclusions

When an organization publishes in scores of languages today, some important points become clear: Unicode, OpenType fonts, and XML need to be fully implemented in the operating system and all the software tools to get the best efficiency. Better standards for identifying languages need to be in place, with identifying tags for many more languages than are currently established (see http://www.sil.org/silewp/2000/001/SILEWP2000-001.html). Automating as many repetitive tasks relating to multilingual production as is possible and practical is a must.

Literally thousands of languages are spoken in the world today, but the vast majority of people in the world can be reached, at least to some extent, with only a few hundred. Software developers could consider going beyond just supporting tens of languages to supporting, or preparing to support, hundreds, leaving room for thousands.
Software businesses may not find it economically feasible to fully support many languages. In that case, they could make it possible for third parties to easily extend their software to cover additional languages. Allowing a third party to incrementally expand each language’s support is important too. Some companies may only need a language slot and hyphenation, not spell-checking, localized menus, or dialog windows. Software that makes all this possible would be more valuable. Companies making such software would be better prepared to keep clients and gain more.

Making these improvements would help many publishing organizations afford to communicate in more languages to meet their growing global communication needs, making it easier for them and us to speak “unto every nation, and kindred, and tongue, and people.”1

We would ask the help of other multilingual publishers in encouraging and helping software developers and standards organizations to make ongoing enhancements to their products so that multilingual publishing will be as efficient and powerful as it can be and needs to be. Hopefully, there is a large enough demand to make it feasible for them to do so. Though not the only important factors for good publishing software, these multilingual publishing issues represent some very significant opportunities.

John D. Hopkins