In this issue…
BOOK REVIEW: Understanding Japanese Information Processing, Review II: A Tour Book for the Land Ahead
by Ken Lunde, published by O’Reilly & Associates, Inc., ISBN: 1-56592-043-0
Last year the LISA Forum newsletter announced the publication of Ken Lunde's book. Within a few months his publishers were amazed at its success, not only in terms of sales but also in view of what people are saying about his technical expertise and linguistic savvy. If you manage a Japanization program or plan to seriously compete in the Asian market, your office should not be without this indispensable reference work. To give you a satisfactory evaluation of this book, both a native Japanese and a non-Japanese perspective are provided.

It's a daunting task. Ken Lunde has captured and organized a set of subjects that are not easily summarized. He has not only written a tutorial on Japanese text representation, he has also supplied a wealth of references and explanations. Engineers actively developing products for the Japanese (or Chinese or Taiwanese) market will probably keep this book near at hand.

This book is good for a spectrum of engineers, from those who must write I/O processors or string processing libraries to applications programmers like me who just use all these tools. Managers can also gain an understanding of the magnitude of the work involved. This book explains the market requirements as it covers some of the problems and details engineers need to be ready to cope with. A co-worker wrote to me about this book: "I have over 6 years experience in Japanese processing--I did not expect to learn much from the book, but I did. (To the extent that I will buy a personal copy)." As a reference book it also provides clear pictures of how encodings use their number space, and has many useful tables in the appendices.

What about people like me, the neophytes? I didn't know much about the Japanese language or writing system before I read this book. Oh, I thought there were three writing systems (there are four), and that there were thousands of single characters for thousands of words.
I also heard that every computer manufacturer has its own character code "standard" (it's not as bad as I thought). This book is a tutorial on a new way of writing and on making a computer program deal with that writing.

Now, I don't expect to implement my own Front End Processor to input kanji, or even to worry about how to display Japanese text. As a compiler kernel portability engineer my primary goal is to process the text from disk and put it back on disk. Having an understanding of how Japanese nuts and bolts are different from my American nuts and bolts helps me plan accordingly. No software component is truly isolated from another if even one radical assumption changes. And when you look ahead to this market, more than one assumption has to change.

A fairly good balance has been struck between the details and their human context. A full accounting of the history could be interesting and helpful in deciding what to implement, but that would have made for a much larger book. And if he had glossed over the problems and incompatibilities an engineer can expect, you'd feel cheated when you tried to sell your first product and it failed. Besides, you want to read this book for the most modern reason anyway: you haven't a clue where to start designing for processing Japanese text and don't know what questions to ask.

One way to look at this book is as a tour book for a completely new region of cyberspace, a kind of hitchhiker's guide to kana and kanji. It doesn't have exhaustive detail, but I feel it has given me a bird's eye view of Japanese text processing.

Mr. Lunde's style is light and casual, necessary for such a ponderous undertaking. Sometimes I felt like I was reading email from him, only with illustrations and tables to make the point clear. He's written well, and to my level. I'm a little concerned that there is a brief section on hex and octal; if you need these explained then maybe this is not the right book for you.
Otherwise the material is aimed well at people who have been coding for computers for several years, but haven't spoken a lick of Japanese. There are a lot of graphical tables showing the numeric ranges used by various encoding methods. He has also included C code fragments to illustrate certain processes precisely. These are all short and easily keyed in by hand if need be.

I like his free use of Japanese characters in his sentences, and not just as examples. It helps me feel a bit more comfortable with the mixed English/Japanese future I'm faced with. I have the clear impression he has lived in Japan and reads and writes the language well; that inspires confidence in his book.

The book divides into two major parts. The first part is a bit longer than the second. Together they are useful as a reference whenever you have a question. O'Reilly would help us in the future by adding section numbers and putting the section titles in the page footer. I found it difficult to quickly flip to the short sections pertaining to material in later places in the book. Major sections are shown in the header, but this is not enough to make this an easy-to-use reference book.

The first part of the book is a tour of how computers process Japanese text. Each chapter builds nicely on the preceding chapters. I expect that, depending on how much you know, you can skim the book until you come to something you don't know. I think as a neophyte I should have read for less detail and done a light first reading without worrying about remembering everything. Mr. Lunde has made sure it's possible to do this without too much backtracking (again, footers would help). The book is trying to be a large list of the existing choices while at the same time reading like a connected narrative. This is a hard thing to do.

The book starts with an introduction to a whole new writing system, without going into the language it symbolizes. Actually there are four writing systems in use.
In case you thought the number 65 meant the way the letter A looks, Mr. Lunde defines the differences between character sets, codes, encodings and fonts, and each of these gets discussed in further detail in the following five chapters. Input and output get their own chapters, though I suspect these are too cursory for someone to charge off and implement their own input and output processing.

Once the first six chapters have laid down the groundwork that ASCII users accept on a simple reference card, the last three chapters discuss some of the practical uses by way of examples. Chapter 7 provides the bridge by appropriately discussing conversion algorithms. Anyone processing Japanese text probably wants to distinguish between external encoding and internal encoding, or at least be able to convert the user's data, however it's been encoded, to what the application expects to use.

Discussions of some real world applications in text processing (operating systems, terminals, word processing) and email wrap up the first part of the book. The email chapter will be useful for companies which intend to actually support their product. If those companies can't see user bug reports or perform adequate marketing that includes some Japanese text, they probably won't fare well. Furthermore, some of the on-line references in the second part of the book are email addresses.

In the latter part of the book there are 15 appendices, plus a bibliography. These form almost half the book's content. The appendices contain many tables to save you from having to research further for basic data, plus a long list of free and commercial resources for getting more information. Here's a summary of what you can reference in these appendices: kanji tables; conversion rules and tables; alternatively indexed kanji to help you find them faster (by sound, stroke count and radical); a couple of pure kanji lists; descriptions of corporate character sets and encoding methods.
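To give a flavor of the kind of conversion between external and internal encodings discussed above, here is a minimal sketch in C. It is not taken from the book: the function names are made up for this illustration, and it handles only two-byte JIS X 0208 characters, converting the 7-bit JIS form to EUC and to Shift-JIS (a widely used encoding I have picked as the second example). Escape sequences, ASCII and half-width katakana are all left out.

```c
#include <assert.h>

/* Hypothetical helper names, chosen for this sketch only. */

/* Each byte of a 7-bit JIS X 0208 character lies in 0x21..0x7E. */
static int is_jis_byte(unsigned char b)
{
    return b >= 0x21 && b <= 0x7E;
}

/* 7-bit JIS -> EUC: set the high bit of both bytes. */
void jis_to_euc(unsigned char *b1, unsigned char *b2)
{
    assert(is_jis_byte(*b1) && is_jis_byte(*b2));
    *b1 |= 0x80;
    *b2 |= 0x80;
}

/* EUC -> 7-bit JIS: clear the high bit of both bytes. */
void euc_to_jis(unsigned char *b1, unsigned char *b2)
{
    assert(*b1 >= 0xA1 && *b1 <= 0xFE && *b2 >= 0xA1 && *b2 <= 0xFE);
    *b1 &= 0x7F;
    *b2 &= 0x7F;
}

/* 7-bit JIS -> Shift-JIS. */
void jis_to_sjis(unsigned char *b1, unsigned char *b2)
{
    unsigned char row = *b1;
    unsigned char cell = *b2;

    assert(is_jis_byte(row) && is_jis_byte(cell));

    /* Two JIS rows share one Shift-JIS lead byte. Rows 0x21..0x5E
       land on 0x81..0x9F, rows 0x5F..0x7E on 0xE0..0xEF. */
    *b1 = (unsigned char)(((row + 1) >> 1) + (row < 0x5F ? 0x70 : 0xB0));

    if (row & 1) {
        /* Odd row: trail bytes 0x40..0x9E, skipping 0x7F. */
        *b2 = (unsigned char)(cell + (cell >= 0x60 ? 0x20 : 0x1F));
    } else {
        /* Even row: trail bytes 0x9F..0xFC. */
        *b2 = (unsigned char)(cell + 0x7E);
    }
}
```

With these functions, the JIS bytes 0x30 0x21 become 0xB0 0xA1 in EUC and 0x88 0x9F in Shift-JIS. Even this small case shows why an application wants to settle on one internal encoding and convert at its edges.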
You'll succeed with the on-line references if you have FTP access to the Internet. Even if you don't, Mr. Lunde describes the FTPMAIL service for performing FTP transfers via email. Finally, he wraps up with an extensive glossary and a bibliography.

There is not much missing from this book. It doesn't answer everyone's questions, but as I said it can help people like me formulate better questions. I was disappointed in Mr. Lunde's recommendation of EUC as the internal encoding method of choice. I would have hoped for a Unicode recommendation, or, barring that, at least a comparison of the merits of the two to make clear what the tradeoffs are. Primarily, his rationale for recommending EUC is Japan-centric. It ignores the other world scripts. I think the case for Unicode is stronger when you are making a product for more than just the U.S. and Japan.

O'Reilly & Associates could make some improvements as well. They should add multilevel section numbers, and add footers that show the full section number and low-level section title. Also, the index should be the very last part of the book. Instead, they have followed the index with advertising, making a reader doing a quick lookup work a bit more to find the index. I did, however, like the lay-flat binding. Lay-flat bindings are a nearly lost feature in books, even reference books.

This book is required reading for any company thinking of writing globally-accessible software. Sooner or later you'll want to sell software to Japan, China, or Taiwan, and without the information in this book you'll still be stuck processing, at best, ISO Latin-1.

Trademarks: Microsoft, Windows, Win32 and Windows NT are trademarks of Microsoft Corporation.