CJKV Information Processing: A review
CJKV Information Processing
The first thing most readers will notice about CJKV Information Processing (1999, O’Reilly & Associates) by Ken Lunde of Adobe Systems is its sheer mass. At just over 1100 pages, the volume is as large as many desk dictionaries, and its size is almost guaranteed to confirm the worst fears of those predisposed to consider information processing in Chinese, Japanese, Korean and Vietnamese (CJKV) an impossible subject. CJKV Information Processing is not, however, an impossible, or even particularly difficult, book; it is well enough written that readers might consider curling up with it in a comfortable chair, which is high praise indeed for any computer-oriented book, and especially for one on a topic as potentially mind-numbing as Asian text processing.

In 1994 LISA published two reviews of Understanding Japanese Information Processing, also by Ken Lunde. The present volume is in one sense a much-revised and expanded edition of that book, but it goes much further than its predecessor by taking on four languages instead of one, and it deals with them admirably. It is this broad scope that requires the encyclopædic length of the volume and that makes CJKV Information Processing indispensable for those dealing with Chinese, Japanese, Korean or Vietnamese on a regular basis.

At $64.95 (US) the book is not cheap, but the sheer difficulty of finding even a fraction of the information contained in it elsewhere makes it good value. Especially useful are the code samples (mostly in Perl, although they could easily enough be adapted to other languages) dealing with many of the thornier problems of CJKV processing, such as repairing corrupted files or converting between encoding methods; when compared to the hundreds of dollars that could be spent on commercial utilities to perform some of these tasks, the price of the book begins to look like a real bargain.

The real strength of the book is that it is the only comprehensive source of its kind.
Looking through Lunde’s comprehensive and well-organized bibliography confirms this: only one of the volumes listed even claims to approach the same scope, and it is now twelve years out of date (meaning it was published before many of Lunde’s primary sources even existed). Most of the information available from other sources comes from technical specifications, RFCs, character set standards, and technical volumes published in Asian languages, which are inaccessible to English readers who do not also know the relevant Asian languages quite well. Of course, much of the knowledge contained in the book also comes directly from Ken Lunde, who in his position with Adobe Systems is well placed to know almost everything there is to know about the subject.

The book addresses almost every topic of potential interest to those needing to deal with CJKV text processing. These topics are all discussed in a clear manner, with less salient information reserved for later in the chapters or relegated to footnotes, a pleasant change from the many books that overload all but the most technically oriented readers with details that are really beside the point.

If the volume has any real weakness, it is that it does not deal with some basic design issues of which too many localizers are completely unaware. In my experience, one of the major obstacles in localization for Asian languages is a lack of awareness on the part of localizers (including, unfortunately, native speakers of Asian languages) of Asian typography conventions and technologies, and this volume does not address some of these very common problems.
For example, CJKV Information Processing nowhere mentions that it is a bad idea to use italics with CJK fonts, for two reasons. First, italic versions do not exist for most CJK typefaces (and trying to use them can lead to significant and expensive problems when documents are taken to press). Second, the various uses of italics are Roman-script conventions, each of which has its own native equivalent in CJK-script languages; using italics is a sure way to mark your localized document as foreign. When even the “experts” in the field typically lack such fundamental knowledge, information on basic typography conventions really should have been included in CJKV Information Processing.

Another minor complaint is that Lunde uses certain terms and conventions (such as the W-notation in the font name HeiseiMin-W3) without explanation, even though he is extremely careful to define and explain other, less useful, terms. Granted, most readers who have any experience with CJK fonts will know that “W3” refers to the relative weight (essentially the degree of boldness) of a typeface, but for those new to CJKV typography an explanation would be welcome. Such lapses, though infrequent, could leave readers missing important information.

Overall these are mere quibbles with an otherwise comprehensive and impressive volume. CJKV Information Processing is sure to be a staple on the shelves of Asian localizers and others who deal with Asian text, at least until another edition of the book appears.

Audience
CJKV Information Processing is geared towards readers who need to deal with CJKV text and who have basic computer knowledge and some background in linguistics. Although Lunde does not state that a linguistics background is required, such a background, particularly with regard to writing systems in general, certainly helps make the book more accessible.
Importantly (and correctly, considering the varied job descriptions and backgrounds of those who must deal with Asian text to some extent or another), Lunde assumes no knowledge of any Asian language in his readership. While knowledge of an Asian language would certainly help readers of the book, one does not need to be intimately familiar with Japanese, for instance, in order to follow even the most arcane points made in the book concerning Japanese. In general, language industry professionals should have no trouble following Lunde’s writing and examples.

The book is not aimed at designers or DTP specialists. Although they would benefit from reading CJKV Information Processing, it does not tell readers how to properly set up Asian documents, nor does it explain the very real differences between Asian and Western document layout and design; its focus is on CJKV text from an informatics standpoint and on how to correctly design CJKV-specific text features for various purposes. (Unfortunately, to the best of my knowledge, there are no good and complete English-language sources on Asian document design.)

Topics
CJKV Information Processing lives up to its name, covering most aspects of text and information processing in these languages, while giving considerable information relevant to other scripts as well, particularly in a Unicode environment. Among the topics covered are the writing systems (and their histories) used in China, Taiwan, Japan, Vietnam, and Korea (and, to a lesser extent, in other countries such as Singapore), the various character sets and encoding methods for these writing systems, font formats and typography (within the limits noted above), input and output methods, and software that uses CJKV.

It is in his discussion of the writing systems and their implementation in computing systems that Lunde is at his best. Even individuals who speak the languages covered in the book may well learn a thing or two about their writing systems.
The issues involved in dealing with CJKV are made clear without overstating their complexity.

It is unlikely that any one individual will need all of the information contained in CJKV Information Processing, so the modular layout of the book keeps topics cleanly separated and makes it easy to find information relevant to particular questions. Some of the chapters do rely on a working knowledge of earlier chapters, but readers should find it easy to go to a chapter and locate specific and understandable information on most topics without having read the entire book.

In addition to the main topics discussed in the book, almost 500 pages of appendices present detailed descriptions of encodings, examples of code, and various other material that allows readers to find almost any specific piece of information they are likely to need in dealing with CJKV texts. While a detailed examination of the appendices is beyond the scope of this review, it seems likely that most readers will find whatever information they need there (the appendices include, for example, full code-point lookup tables for all the major public CJKV character-encoding standards).

Summary
If you have to deal with CJKV-script texts on anything beyond the most superficial level, CJKV Information Processing is an indispensable volume. A full understanding of even a fraction of the material covered in the volume would be of the greatest benefit to anyone dealing with CJKV texts on a professional level.

The reviewer (arle@lisa.org) is a project manager with LISA and an emeritus member of the BYU Translation Research Group (TRG). Originally from Alaska, he has a degree in linguistics from Brigham Young University in Provo, Utah, and is active in linguistics research and publication.