OpenType and Unicode
Has Unicode Finally Come of Age?
This article explores OpenType, a relatively new cross-platform font technology that is emerging as the primary way to handle Unicode text data. While Unicode has not yet lived up to its promise, OpenType is an important step towards seeing Unicode become mainstream and useful in the localization industry.
Unicode: where is it?
We have been hearing for years now about how Unicode will revolutionize text processing and the localization industry, but how much of a difference has Unicode made to the average translator or localizer so far? Even speaking charitably, the answer would have to be “about none”. Complaints about the lack of Unicode-capable tools abound, as do criticisms of the lack of interoperability between those tools that do support Unicode and those that do not, problems that can make existing Unicode support useless in many cases. So far Unicode has been all talk and no action for the vast majority of those in the language industries.
Sure, Windows 95/98 supports Unicode to some extent, but how many clients want to use Arial and Microsoft Word for all their projects? Windows 2000 may be an improvement, but use of Windows 2000 is not yet ubiquitous among localizers (some of the major localization and DTP tools have serious bugs under Windows 2000), and even then you need applications that support Unicode before the OS-level support can do any good. My experience is that most localization firms are still using the same font and typographic resources that they have been using since the early 1990s. For these firms Unicode is at best a distant blip on the radar. There is downright skepticism and disillusionment on the part of many of those who have waited so long for the Unicode feast but still haven’t even seen the menu.
Why is it that Unicode has failed to make an impact for most users so far? It is not as if nobody wants what Unicode has to offer.
Unicode represents a solution not just to today’s problems, but to the problems of the last twenty years. So why haven’t we seen the promised era of multilingual possibilities? The answer is simple. With few exceptions, affordable, easy-to-use, and powerful end-user products for working with Unicode simply did not exist. Until late 1999 none of the major software packages used to provide final output in the localization industry supported Unicode to any real extent except for Microsoft Word for Windows, hardly a tool most “power users” would consider using for all their work. Even where tools did support Unicode, they often had to interface with non-Unicode tools, rendering the support in any one program useless.
The present situation
So what do most companies now do for projects that are not supported by easily available font resources? What do they do for clients that specify certain fonts must be used when there are no versions of those fonts for the needed language(s)? Most localization companies have a copy of FontLab or Macromedia Fontographer sitting around somewhere so that they can modify fonts for clients when they want to enter into a new market. Even for relatively “simple” markets such as Poland or Hungary, commonly available fonts are completely inadequate since they leave out important characters used in the national languages. The selection of CE (Central European) fonts that include the needed characters available to localizers is small and often inadequate, necessitating the building of custom versions of each font a company uses. (This can quickly add up to dozens of fonts, when it is considered that these changes have to be made to each weight and style in a font family!)
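The gap described above for Polish and Hungarian can be made concrete with a small sketch. It assumes that Python's `cp1252` (Western European) and `cp1250` (Central European) codecs are a fair stand-in for the character sets of a typical Western and a typical CE single-byte font; the sample string and the helper name `missing_chars` are illustrative choices, not anything taken from a particular font or tool.

```python
# Letters used in Polish and Hungarian (illustrative sample, not exhaustive).
SAMPLE = "ąćęłńóśźżőű"


def missing_chars(text, encoding):
    """Return the characters of `text` that `encoding` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    # cp1252 stands in for a Western single-byte font (an assumption).
    print("Not in cp1252:", "".join(missing_chars(SAMPLE, "cp1252")))
    # cp1250 stands in for a CE single-byte font (an assumption).
    print("Not in cp1250:", "".join(missing_chars(SAMPLE, "cp1250")))
    # A Unicode encoding can represent every character in the sample.
    print("Not in UTF-8: ", "".join(missing_chars(SAMPLE, "utf-8")))
```

A code page is not a font, so this only approximates the situation: a font may lack glyphs even for characters that its encoding can represent.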
Some large companies and organizations have spent (and are spending) considerable resources in developing fonts, collation sequences, hyphenators, case-conversion algorithms, and other linguistic resources needed for effective localization into languages throughout the world (see the article by John Hopkins for one example). These expenses are not trivial and in certain instances organizations have decided not to localize for certain target audiences simply because the development costs for entering the target locale are too high to justify on the basis of even the most optimistic estimates of potential sales. What this means is that basic linguistic issues still stand in the way of effective localization for a sizable portion of the world’s population. In addition to the costs of developing resources for new locales there is a fundamental inefficiency present when companies have to develop the resources for a language before they can begin a project—these linguistic resources must often be supplied to translators and engineers before their work can begin, leading to inefficient use and scheduling of human resources. In some cases this can delay the start of the “real” work by weeks or months. Many of these problems would, of course, be taken care of with Unicode. While Unicode cannot provide hyphenators or collation sequences by itself, it can make linguistic processing much easier by providing a standard way for dealing with the textual data itself, meaning that portable and standard ways of dealing with linguistic issues can be created more easily. The limitations of Unicode as it is presently defined, such as incomplete CJK representation, may make Unicode unusable in some industries, but the general momentum in the language industries is such that most text-processing will move to Unicode sooner or later, especially as Unicode is being actively expanded and there is no real competition to Unicode for the honor of being a universal character set. 
Since Unicode is clearly needed and there are no serious alternatives, where are the applications and fonts needed to use Unicode? How can we enter into the promised era of easy multilingual text processing?
OpenType: Unicode at last!
There is finally a real font solution for Unicode that has the potential to break into the mainstream and make a difference for those in the multilingual trenches. This solution is OpenType, a joint development of Adobe and Microsoft. OpenType uses Unicode encoding to access typographically-rich font data from one or more scripts in one font file (see Figure 1).
Figure 1. Some of the more than 1300 glyphs in Adobe MinionPro, an OpenType font, for scripts not found in standard single-byte Roman fonts: 1. extended Roman; 2. Roman ligatures, archaic letter forms, and decorative letter variants; 3. polytonic Greek (suitable for Classical as well as modern Greek); 4. Cyrillic.
Before focusing on OpenType, however, it is only fair to mention Apple’s pioneering work in this same field with QuickDraw GX. QuickDraw GX debuted in 1994 (although parts of it were released in 1991) and was intended to become the default imaging and font-handling technology for Macintosh computers (and to compete with PostScript at many levels, not just in font imaging). Although not Unicode-based, fonts in QuickDraw GX format promised much the same set of features as present OpenType fonts have. Lack of cross-platform support doomed QuickDraw GX, and those few vendors that did implement it ended up losing on their ventures. QuickDraw GX slowly withered until Apple finally dropped support with Mac OS 8.
OpenType was first made public in July 1995 by Microsoft, and Adobe agreed to use Microsoft’s definitions for OpenType. The biggest differences between OpenType and QuickDraw GX are that QuickDraw GX was not Unicode-based and that OpenType relies on applications for font rendering, whereas QuickDraw GX built font rendering and other linguistic tasks into the operating system. This difference in approach may have implications for how powerful and widespread OpenType is likely to become, a subject to which I will return later.
OpenType font files differ from font files in earlier “standard” single-byte formats (TrueType or PostScript) in a number of ways. They are double-byte and can therefore contain up to 65,536 glyphs per font file. (There are even ways to get around this upper limit in the future!)
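The 65,536 figure follows directly from the width of the index, as the short calculation below shows. It is only a sketch of the arithmetic: the count of 1300 glyphs is the floor given in the Figure 1 caption ("more than 1300"), and the assumption that a single-byte font offers all 256 slots is generous, since real single-byte fonts reserve some positions.

```python
import math


def slots(bits_per_index):
    """Number of distinct glyph indices addressable with the given width."""
    return 2 ** bits_per_index


SINGLE_BYTE_SLOTS = slots(8)    # one byte per glyph index
DOUBLE_BYTE_SLOTS = slots(16)   # two bytes per glyph index

# Figure 1 cites "more than 1300" glyphs; 1300 is used here as a floor.
MINIONPRO_GLYPHS = 1300


def single_byte_fonts_needed(glyph_count, slots_per_font=SINGLE_BYTE_SLOTS):
    """Lower bound on the single-byte fonts needed to hold `glyph_count` glyphs."""
    return math.ceil(glyph_count / slots_per_font)


if __name__ == "__main__":
    print("Single-byte slots:", SINGLE_BYTE_SLOTS)
    print("Double-byte slots:", DOUBLE_BYTE_SLOTS)
    print("Single-byte fonts needed for 1300 glyphs (at least):",
          single_byte_fonts_needed(MINIONPRO_GLYPHS))
```

Under these assumptions, a glyph set of that size would have to be split across at least six single-byte fonts, which is the situation a single double-byte font file avoids.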
While OpenType fonts are technically TrueType (that is, they exist in a TrueType “wrapper”), they can include either TrueType or PostScript font data (but apparently not both at the same time), an important point as most RIPs (Raster Image Processors) in present-generation printers handle TrueType with difficulty. OpenType fonts are cross-platform, working on both Windows and Macintosh without any modification, making cross-platform font issues considerably less troublesome than they have been to this point. Presumably, since OpenType fonts work under Mac OS X, there is no reason why OpenType support could not be added to standard Unix flavors as well, so OpenType portability will probably extend into the Unix world at some point.
One (fairly major) advantage that OpenType fonts have over their PostScript Type 1 predecessors is that, because the fonts are technically TrueType, all data is contained in one file, versus the two (or more, if metrics files are included) files needed for traditional PostScript fonts. What this will add up to, when OpenType is supported by the various operating systems and applications, is a much simpler and more uniform approach to font handling across platforms and operating systems.
Because OpenType fonts are Unicode-based they can handle characters in most of the world’s languages (if the font developer makes glyphs for those languages, that is) and can unambiguously represent this data in a way that allows changing of fonts and scripts without loss of character semantics, and without worries about different, potentially incompatible encodings.
One additional advantage OpenType fonts have over traditional font formats is that they are built with certain character information encoded in the font, enabling applications to automatically select complex ligatures, positional variants, and typographically correct small caps, among other features, without switching fonts.
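The idea of automatic ligature selection can be illustrated with a deliberately simplified sketch. This is a toy model, not the way OpenType fonts actually store or apply their substitution rules: the table `LIGATURES`, the function `apply_ligatures`, and the use of Unicode presentation-form characters as stand-ins for glyphs are all assumptions made for illustration.

```python
# Toy ligature table: character sequences mapped to a single "glyph".
# Unicode presentation forms stand in for glyphs here (an assumption);
# a real font keeps such rules in its own layout data, not as text.
LIGATURES = {
    "ffi": "\uFB03",
    "ffl": "\uFB04",
    "ff": "\uFB00",
    "fi": "\uFB01",
    "fl": "\uFB02",
}


def apply_ligatures(text, table=LIGATURES):
    """Replace sequences found in `table`, preferring the longest match."""
    longest = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so "ffi" wins over "ff" or "fi".
        for size in range(longest, 1, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += len(chunk)
                break
        else:
            # No ligature starts here; keep the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for word in ("office", "flag", "fjord"):
        print(word, "->", apply_ligatures(word))
```

The point of the sketch is that the underlying text stays the same while the displayed form changes, which is why the user does not have to switch to a separate "expert" font to get the ligature.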
While some of these features are relatively minor in Roman-script languages, the importance of intelligent glyph selection in Arabic and various Devanagari-script (Indic) languages cannot be overstated.
Availability and application and OS support
While OpenType is being aggressively pursued by Adobe and Microsoft, how important will it be to localizers? The answer is that it is too early to tell how widespread OpenType will become, but early indications are good: Adobe is actively porting its flagship fonts to OpenType (some 21 fonts are available as of this writing) and the cost of these fonts is comparable to, or even lower than, that of the Type 1 equivalents available from Adobe. Adobe MinionPro, for example, retails for $169 U.S. and includes the equivalents of the old Minion, Minion Expert, and Minion Cyrillic typefaces, which together retail for over $400 U.S. In addition MinionPro contains polytonic Greek and Central/Eastern European Roman characters that are not available at all except in OpenType format (the Greek selection alone has more than 330 glyphs, outstripping the slots available in a single-byte font by about 100). All things considered the OpenType font is probably less than a third the cost of its equivalent in PostScript Type 1 fonts.
So, while pricing is not a problem, OS and application support are still inadequate for most multilingual power users. As of this writing only Windows 2000 really supports OpenType fully in most scripts. While OpenType fonts can be loaded and used in Windows 9x, the support is spotty and IMEs may not work with OpenType fonts. (I have been informed that Asian OpenType fonts do work much better under Windows 9x than do Roman OpenType fonts. This may be because Asian-language support under Windows 9x already assumes double-byte fonts, making the leap to OpenType smaller for Asian fonts than it is for Roman.)
Windows 2000’s implementation of OpenType is really quite good, allowing users access to most of the characters simply by switching keyboard layouts to appropriate ones for the scripts they are typing in. This support is, however, hampered by a severe lack of support at the application level. For those using versions of Windows prior to Windows 2000 or versions of the Mac OS before OS X, non-Asian OpenType fonts function almost identically to old single-byte fonts: characters not in the single-byte range are not available to users unless a specific application supports them, but most applications can at least access the standard single-byte range. (Note that this is also true in Windows 2000: applications must be built to support OpenType even though the OS provides input methods and some of the needed resources. OpenType-enabled applications work better under Windows 2000, but non-OpenType applications don’t become OpenType capable simply by being run under Windows 2000.) In addition some OpenType fonts require the use of Adobe Type Manager to display correctly in these older systems.
Mac OS X, commercially released on March 24, 2001, uses OpenType and Unicode for some of its multilingual capabilities: all system resources are stored in Unicode, and Apple has released very high-quality Japanese OpenType fonts with OS X that include, according to Apple, the largest Japanese character set available on any commercially available platform. Surprisingly, script support is an area where Microsoft is at present further along than Apple: the current version of OS X includes input methods for only Western European languages, Chinese (Traditional and Simplified), Japanese, and Korean. OS X does not come with Cyrillic, Greek, or CE Roman scripts, a curious step backward for Apple, since OS 9 came with a number of scripts which cannot be directly accessed by OS X applications at all.
OS X does, however, support input of Unicode characters via their hex references (at least if an OS X application supports Unicode). This works well in TextEdit, Apple’s basic OS X text editor, but few commercial applications support even this crude input method yet. An additional problem in my tests was that certain OpenType features, such as automatic ligatures, do not seem to work in TextEdit, despite being in its menus. A verdict on Mac OS X’s OpenType support will have to wait until input methods and keyboard layouts are finally released for various scripts. The only major applications I know of that support OpenType cross-platform are from Adobe—InDesign, Illustrator 9.0, Photoshop 6.0 and Acrobat 4.0 and higher (although making OpenType PDFs in Acrobat relies on support from other applications, for obvious reasons). These applications support OpenType quite well, although the usability of OpenType fonts is hampered by the fact that, except under Windows 2000 and OS X, there is no way to directly input most characters, forcing users to make use of relatively awkward “insert character” dialogs that do not allow for easy input of characters, such as the one shown in Figure 2 from InDesign, or to import Unicode text from a Unicode-capable text editor. There are other applications that support OpenType, such as Microsoft Office, but these are often hampered by a lack of cross-platform implementation—even Microsoft Office 2001 for Macintosh fails to implement any OpenType features at all, despite having been released well after Adobe began shipping some of its OpenType-capable programs for both platforms.
Figure 2. An “insert character” dialog box from Adobe InDesign allows the input of OpenType/Unicode characters.
As of this writing, none of the commercially available font editors in common use can work with OpenType fonts. Fontographer, which has not been updated since 1996, does not even recognize OpenType fonts and current versions of FontLab will open them but ignore all character data (basically treating them as blank fonts!). Microsoft has a freely available tool called Visual OpenType Layout Tool (VOLT) that is used by font developers and which is the only easily available tool for manipulating OpenType fonts at present, but reactions to it are mixed. Many type designers are looking forward to the upcoming release of FontLab (due in May or June for PC, later for Mac OS 9; $549 full retail, $199 upgrade, http://www.fontlab.com), which will support OpenType fonts with either TrueType or PostScript data.
It is impossible to forecast when the majority of applications commonly used in the localization industry will support OpenType, but the OS-level support found in Windows 2000 and Mac OS X will help ensure implementation. It is likely that the process will proceed slowly but surely until OpenType is the default font format for most applications, but the process may take some years.
Localization for OpenType
I mentioned earlier on that OpenType rendering features are implemented in applications rather than in the OS (unlike in QuickDraw GX). What this means is that it is up to applications developers (or localizers) to determine how to implement OpenType and which of its features to support for each application.
Microsoft cites this application-dependence as a distinct advantage of OpenType, and it does have certain benefits, such as the ability to carefully tailor font-rendering abilities to the needs of users of specific applications, and to offer various levels of OpenType support (e.g., a developer of a DTP package is very likely going to want to offer different features than the developer of an e-mail client, even though both may see OpenType as the best way to handle multilingual issues in their software). The disadvantage of this method is that it forces applications developers to spend resources in the design and implementation of language-specific features. Unless robust language-specific text- and font-handling libraries are made widely available and affordable in equivalent versions for both Mac and PC by Microsoft or some other company, developers of cross-platform applications are highly unlikely to localize for smaller markets. Microsoft is developing these sorts of system-level resources for Windows through its font-rendering APIs (Uniscribe) and utilities such as RichEdit, which are implemented fully in Windows 2000. Using Uniscribe takes much of the burden in dealing with complex scripts and Unicode off the application developer, but it is not cross-platform, and so is not usable by developers looking for cross-platform consistency in font rendering. In addition, although Uniscribe is intended to provide basic language support, its level of support may not be robust enough for all developers. What this means is that applications developers are still unlikely to localize complex applications (such as DTP packages) for smaller markets since they would have to develop complex and expensive language-specific features for both Mac and PC at a cost out of proportion to the market potential. 
An additional drawback to leaving font-handling to the application is that implementation of OpenType features may be quite idiosyncratic as different developers implement different methods for accessing the features. Whether or not it will actually prove to be a problem remains to be seen.
Conclusion
The development and implementation of OpenType holds open the promise of widespread Unicode use and the advent of a new era in electronic typography. While OpenType has its problems (particularly in the present lack of application support), it is the only implementation of Unicode on the market at present that has any real chance of acceptance by the IT industries as a whole, and its development should be watched with interest and anticipation by those in the language industries.
Despite its limitations, current OpenType implementation is good enough that the LISA Newsletter is set largely in Adobe MinionPro, an OpenType font, and I am currently in the process of setting a linguistics text in the same face. While I may have specific complaints about OpenType, such complaints are normal for young technologies and there is no real alternative to OpenType in the work I do, unless I want to spend large amounts of time with my trusty copy of Fontographer and still not have the job done right. Perhaps the real beginning of the Unicode era is here.
For further information
The following web sites and articles provide useful information on OpenType and advanced typography: