BOOK REVIEW: Understanding Japanese Information Processing, Review I: A Japanese perspective of Understanding Japanese Information Processing
by Ken Lunde, published by O’Reilly & Associates, Inc., ISBN: 1-56592-043-0
Last year the LISA Forum newsletter announced the publication of Ken Lunde's book. Within a few months his publishers were amazed by its success, not only in terms of sales but also in view of what people are saying about his technical expertise and linguistic savvy. If you manage a Japanization program or plan to compete seriously in the Asian market, your office should not be without this indispensable reference work. To give you a satisfactory evaluation of this book, both a native Japanese and a non-Japanese perspective are provided.
Ken Lunde's book, "Understanding Japanese Information Processing" (UJIP for short), is a good reference for anyone interested in localizing software for the Japanese market. It is in a class of its own, and no comparable book exists today. Since the accompanying review by a non-Japanese reviewer gives a good summary of the whole book, I try to focus on what is interesting to me as a Japanese reader. I also try to present topics that are missing from UJIP and that I think are important to the non-Japanese audience.
First, let me give you a brief account of my background. I majored in physics and then studied computer science as a graduate student. I spent a year at the Computer Science Department of the University of British Columbia, Vancouver, B.C., Canada, as an exchange student. This overseas stay first made me realize how difficult it is for foreign students to learn Japanese. For the last seven or eight years, my work at my present company has involved localization of overseas software products (mainly from North America). Before that, I was a member of a team that wrote a Japanese word processor for the then-emerging 16-bit Japanese personal computers. The targets of my localization work have been NEC 9800 series PCs and various UNIX workstations.
In so doing, I have gained a general idea of how to go about localization, collected rules of thumb, and begun to wonder whether there is a quick and efficient way to explain Japanese language processing to foreigners. To ease the explanation, over the years I have written a short summary of Japanese processing on computers so that I can present the problems, and the techniques to solve them, to overseas software houses. UJIP is a good reference book that I can heartily recommend to anyone interested in entering the Japanese market. Now I can do away with my own little document. I hope my background gives me some credibility as a reviewer of UJIP.
Before starting the main review, a word of caution is necessary. You have to take some of my comments with a grain of salt. For that matter, we should always read review articles with a grain of salt. However, there is a good reason for mentioning it here: the nature of the subject. One's mother tongue is like air. You just take it for granted. You seldom have to look at it squarely to analyze its grammar, for example. People also take the idiosyncrasies of the language for granted. When a foreigner asks someone about the details of the grammar of his or her mother tongue, it is likely that its subtleties have never entered his or her mind unless, of course, he or she is a linguist. One has also built up one's own idea of how people should use, speak and write the language. A different usage is looked upon with suspicion. If a compatriot does it, it is just a forgivable aberration, but if a non-native speaker does it, the usage might be called wrong. I was born and grew up in Japan. I am not free of biased judgment about how Japanese is used. Also, my experience with particular brands of software and hardware in my localization work is limited.
My targets were these computers: PCs with Japanese MS-DOS, and engineering workstations running UNIX with various versions of the X11 window system or proprietary window systems such as HP Starbase and Sun's SunView. Therefore, my comments about Japanese processing may strongly reflect my own view of how it should be done, although I try to be very even-handed in this review. On the other hand, I can say that my experience with localization work has made me better qualified than ordinary programmers. At least I am aware that many I18N and localization problems exist. In any case, you always need to study carefully for yourself and get a good second opinion from a Japanese partner if you seriously try to enter the Japanese market. Other Japanese may say things very different from what I say here.
Before starting the review, a historical perspective is in order. One big problem anyone faces in presenting Japanese information processing to anyone (Japanese and non-Japanese alike) is the MESS it is in. The author, for example, says toward the end of Chapter 4 that "this chapter may have given you the impression that Japanese encoding is a real mess." This mess is the result of the short history of Japanese information processing on computers. The Japanese character code set was first standardized in a comprehensive manner in 1978, and has since gone through a few revisions. All during this period, many companies kept their own character code sets with quite a few nonstandard characters on the side. These nonstandard characters cannot be interchanged between different standards and yet have been used widely. The reader should remember that Japanese language processing is very new and many changes may come in the future. Also, backward compatibility with data created before the standards came into existence may become a big problem for you. The mess, or the confusion, is not limited to the character set.
Since Japanese processing on computers is relatively new, many technical words have been coined for it. But not all of them have gained wide acceptance yet. Therefore, some phrases used in one office may not be quite understood elsewhere. You can substitute "network", "university" and so on for "office" in the previous sentence. Many technical words once used solely in print shops are now used by computer programmers. Again, some of them are quite new, and many programmers are still unfamiliar with them. Thus there is always the pitfall of picking up Japanese technical jargon that is not [yet] widely used. I have seen young Japanese programmers perplexed when their narrowly circulated vocabulary of Japanese processing was not understood at a meeting until somebody pointed out an equivalent, widely accepted phrase. UJIP is free of this problem. Except for a few cases, which I mention in this review, the Japanese technical jargon used in the text is widely accepted, so you can talk to Japanese programmers with ease. (Provided, of course, that either you speak Japanese or he/she speaks your language.)
With this historical perspective in mind, I can say that the book is an excellent snapshot or exposition of how Japanese processing is done on computers today. I can recommend this book to anyone who has a software product that needs localization for the Japanese market, that is, Japanization. Not only is UJIP free of the major errors found in some articles on Japanese (I can recall some major gaffes in a BYTE magazine article some years ago), but the author's work at Adobe Systems has also made it possible to print Japanese characters in easy-to-read fonts. I think this is the first time that an English technical book has used good-quality modern Japanese fonts. Even before I started to read the book in detail, I was pleasantly surprised to find such good-looking Japanese characters, and the illustration of a fugu (blowfish) on the cover.
For some readers, this may be the first time you see Japanese characters. So I think the use of a good-quality modern Japanese font is a good thing. I used the word 'modern' because overseas print shops often carry very old-looking Japanese fonts that are no longer used by modern Japanese print shops. A font face becomes outdated for two reasons. First, there are fashions in preferred font faces. Second, Japanese characters have gone through vast change in the last 50 years: immediately after the end of W.W.II, the language itself, the way it is taught, and the way it is printed and written went through radical change. Character simplification was one such change. Many small overseas shops with Japanese characters still carry the old casts of the "old" Japanese characters that came with the early Japanese immigrants of the first half of the 20th century. If you plan to have Japanese documentation printed locally, check whether the local shop carries the latest typefaces. You don't want to print the documentation of high-tech software using vintage 1940's typefaces.
This change of characters makes Japanese information processing on computers difficult. People who grew up when the old characters were used never really threw them away. People's names, for example, were legally registered using old characters, and their names must appear in the original form in legal documents today. The government once tried to discourage the old characters in newly born children's names by "banning" certain characters, but popular pressure stopped the ban. The government now has a list of the Kanji characters [added to the standard character set in the last 10 years or so] that can be used to register new babies' names. However, not all the old characters are permissible even today. This list is in Appendix G (Jinmei-yo Kanji list) of UJIP.
If your software products have anything to do with legal work, or with government databases that must deal with people's names, then you must really tackle the tricky issue of handling proper names with old and new characters and typefaces. Understandably, UJIP can't describe the details of the turbulent history of the Japanese language. The trouble with the kanji list for people's names happened even without the existence of computers. You will see other irrational or perplexing problems surrounding Japanese computer standards when you study them. These are quite likely the result of the turmoil the Japanese language itself has suffered in the last half century. Computers magnified the problem in a sense. Now, with this historical background, let us look at the chapters in turn.
Overview of Japanese Information Processing.
The first chapter tries to give a summary of the Japanese language. In this chapter, "half-width katakana" (page 6), "Transcribe[ing] Japanese using Roman characters" (page 9), and "compound(s)" (page 10) appear without sufficient explanation. All these are covered in detail in later chapters. A complex topic like Japanese information processing on computers requires concentration as well as persistence. Some topics can only be explained indirectly and recursively. So readers should go on reading even if they are confused at the first try. (I wonder how many other topics are new, unknown ideas or phrases to non-Japanese readers when they first read this chapter. Of the three phrases I picked out above, I know the first two quite well, but chose them as possible problems for the non-Japanese reader. The last one turns out to be the English word for Jyukugo, compound word, explained later in UJIP, which was new to me. Non-Japanese readers will find more new phrases.)
The author's current work at Adobe Systems producing Japanese fonts also introduces the word "weights" (page 14), meaning the "thickness" of the strokes of a typeface [I am not sure myself], without explanation. To his credit, he explains the basics of font technology very well here and later in the book, although some people might find the space spent on typeface issues a little overwhelming. Since the reader of this book may be a new student of Japanese, examples of real-world Japanese computer environments would have been a welcome addition. What I have in mind is something like these: a copy of a page of a Japanese newspaper (vertical writing), a page of a technical or scientific magazine (horizontal writing), photos of a Japanese dumb CRT, a Macintosh with KanjiTalk, a PC with MS-Windows 3.1J, an X11 display on a workstation, and so on. Adding these in a future edition would be very illustrative.
The Japanese Writing System.
This chapter explains the basics of the characters used in Japan. The explanation is standard, and the use of good-quality Japanese fonts to show examples is a big pleasure. How the Japanese writing system has developed is described briefly. By the way, there are topics in this chapter I haven't thought about since my high school days. I don't know if the history helps non-Japanese to understand our writing system. Japanese children learn the writing system through osmosis and repetition. In my case, only at the late age of 17 did I come to learn about the history of the language per se. (In the third grade or so, there was a mention of the different pronunciations of Kanji, but the historical explanation escaped me.) Knowledge of the history has helped me understand why the Japanese writing systems are the way they are today, with all their idiosyncrasies. But the realization came only after I talked to Canadian students learning Japanese, and later began working on localization and tackling the language issues on computers.
I think that teaching the grammar and other details of one's mother tongue so that foreign programmers can build software for it has been a blind spot, overlooked for many years by the programming community in general.
By the way, don't be overwhelmed by the content of the chapter: Katakana, hiragana, and kanji certainly number a few thousand in total. But many foreign students do learn them so that they can at least read many, if not all, of them. Also, educated Japanese grown-ups do read practically all of them, but I know that many are forced to look in a dictionary if they need to write complex and seldom-used Kanji.
In my opinion, the idea of the pictograph, a character that denotes the shape of the object it represents (page 27), is overrated. If I look at Egyptian hieroglyphs, I can certainly recognize the bird, eye, fish and other figures in them. On the other hand, Kanji have been used as ideograms so often in Japan that nobody thinks of them as pictographs in an everyday setting. [Now, this may be my biased opinion.] Only in a classroom, when the history of Kanji is taught, is the idea of the pictograph resurrected. Even simple characters such as the Kanji for the Sun or the Moon, listed on page 27 as examples of pictographic Kanji, have become almost ideographic. Usually people associate the "idea" of the Sun and the Moon with the characters, and no longer the shapes. (And the meaning associated with them is often "day" for the Sun character and "month" for the Moon character, and changes according to context.)
I don't want to confuse non-Japanese readers, but the usage of Katakana shows how big a change has occurred in the last 50 years. On page 21, it is explained that "Katakana are used primarily in two ways: to write words of foreign origin, called gairaigo, and for emphasis ...". This is true today. However, not so long ago, until about 1955, some documents used Hiragana for words of foreign origin and Katakana for the main text.
The usage of Katakana and Hiragana was completely reversed! Such writing was found in some magazines, and in technical books such as the mathematics texts that I read at a local library. In addition, before 1945 many old laws were written and published using Katakana instead of Hiragana for the main body of text. Even today they are reproduced using Katakana. (A publisher made headlines several years ago when it decided to print the old laws using hiragana for the main text.) The processing of Japanese characters on computers can't remain unaffected when there are such big changes of usage over a short time. (In the computer industry, things change very fast. However, the rest of society changes at a snail's pace. Fifty years is not a long time for a change of language usage to take firm root in the general population.) In this sense, I don't think Japanese language usage has the stability needed for stable computer standardization. But we must cope with it somehow.
Japanese Character Set Standards.
This is a comprehensive treatment of Japanese character set standards. The author tries to separate the non-electronic character set standards from the electronic character set standards in the text. The reason for this may not be clear to, say, English-speaking readers. For them, the characters are A to Z in upper and lower case, some numbers and symbols. That's it, isn't it? The problem with Japanese is that the "A to Z" part is actually three sets: Hiragana, Katakana, and Kanji. Since there are so many characters, it makes sense to talk about which characters (Kanji) to include in the standard! Some kanji are either too old to use today or too specialized to have current usage. For example, some kanji were used for transcribing Buddhist writings from Sanskrit. This is why the selection of characters for the standard is an issue at all. The Hiragana and Katakana sets are more or less agreed upon, except for minor disagreements.
One disagreement was about the inclusion of old Hiragana and Katakana for sounds that are no longer used in modern Japanese. The other was about how to use Hiragana and Katakana to transcribe sounds like "vi" and "ve" that are not indigenous modern Japanese sounds. Selection of Kanji poses major problems. In UJIP, you will find the various lists of Kanji that are taught in schools and used in print, as well as additional characters. The JIS character code sets are described and the alphabet soup of JIS standards is explained. Because of these topics, I think it was wise to separate the presentation of the non-electronic character standards from the computer standards. Only at the second reading did I realize that this separation will help non-Japanese readers to understand the knotty problems in Japanese standards. At the end of the chapter is a section titled "Advice to Developers." This and the other advice you can find in various chapters is well thought out. Readers will not regret following it. However, as the author also points out, things are in a state of flux: for example, UNICODE and ISO 10646 may see some additions in the future. So try to keep up to date by joining the mailing lists explained in Appendix K. This chapter and the next become somewhat dry on occasion, but I can hardly blame the author, given the nature of the topic.
Japanese Encoding Methods.
This chapter discusses Japanese character encoding on computers. Three basic encodings, JIS, Shift-JIS, and EUC (Extended UNIX Code), are presented. Chapter 7 discusses the conversion between these encoding methods. This chapter is a very comprehensive survey of various encoding methods. The author says (page 99), "The information presented in this chapter may have given you the impression that Japanese encoding is a real mess." Luckily, the above three encoding methods are the ones that the reader mostly has to handle. So the reader needs to concentrate on these three encodings.
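To give non-Japanese readers a feel for how the three encodings differ, here is a minimal sketch (mine, not taken from UJIP) that encodes the same two kanji in each of them. It assumes that the codecs which Python's standard library calls `iso2022_jp`, `shift_jis` and `euc_jp` correspond to what this review calls JIS, Shift-JIS and EUC; the function name `encode_all` is just an illustrative choice.

```python
# Map the names used in this review to Python's standard codec names
# (an assumption of this sketch, not something stated in the book).
CODECS = {
    "JIS": "iso2022_jp",       # 7-bit, switches character sets with escape sequences
    "Shift-JIS": "shift_jis",  # 8-bit, the usual PC encoding
    "EUC": "euc_jp",           # 8-bit, the usual UNIX workstation encoding
}


def encode_all(text):
    """Return the byte sequence of `text` in each of the three encodings."""
    return {name: text.encode(codec) for name, codec in CODECS.items()}


if __name__ == "__main__":
    # U+65E5 U+672C: the two kanji for "Japan"
    for name, data in encode_all("\u65e5\u672c").items():
        print(f"{name:10} {data.hex(' ')}")
```

The JIS form stays within 7 bits and brackets the kanji with escape sequences, which is why it suits E-mail, while the other two forms set the high bit and need an 8-bit clean path.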
The book gives sound advice regarding the choice of encoding. I agree with the advice. EUC is the way to go for workstations, and JIS is for information interchange such as E-mail. (Chapter 9 discusses Japanese E-mail in detail.) Today, Shift-JIS is the de facto standard on PCs. So you must support Shift-JIS for a PC software product. However, if you are asked to port to a certain hardware environment, you may not have a choice, and must support the encoding that the hardware system supports.
Japanese Input
In this chapter, you will find out how to enter the many Japanese characters on Japanese computers. Again, the coverage is good and the explanation is standard. On page 113, "In-line Conversion" appears. This is an input method in which the transformation from phonetic rendering to a mixed kanji-kana string takes place on the screen at the very spot where the application expects the input string. It seems to me that "on-the-spot conversion" may be the preferred term nowadays. I would use "on-the-spot" instead of "in-line" myself. Sun Microsystems uses "on-the-spot" in its English documentation of the Asian language extension of the XView window library.
One misunderstanding ought to be clarified. The author explains in detail the M-style array used by the M-style keyboard sold by NEC. This is a keyboard with a special arrangement of the physical key layout and key mapping to facilitate Japanese input. The author mentions, on page 121, that "This keyboard array is also used for the TRON (The Real-time Operating systems Nucleus) project." This is incorrect. The TRON project designed the TRON keyboard specification from scratch, and its keyboard layout is different. At a cursory look, the two certainly look similar, but again they are different. It would have been instructive if the differences between the M-style and TRON keyboard layouts, and the design philosophies of these ergonomic keyboards, had been compared.
(I doubt that non-Japanese readers are very interested in the ergonomic issues of Japanese input. However, please take note that ergonomic problems are very important when there are more than two thousand different characters to input. The fatigue of input operators is certainly great.)
The author's thorough study and research shine in the last section of this chapter, titled "Japanese Character Dictionaries." This and the bibliography at the end make a good guide to the books and dictionaries. This alone, along with the list of contact addresses in the appendix, may be worth the purchase of the book.
I currently use the FEP (front end processor) input method (page 102) myself. The FEP input method, based on conversion using a dictionary, is not deterministic, in that there is always the chance that the first choice presented by the conversion engine is not what I want to enter. Such an incorrect first choice can sometimes be frustrating. Further, without diligent checking on the user's part, we can enter incorrect characters. It is not uncommon to find someone's writing with incorrect characters that are clearly the result of incorrect conversion: a type of mistake that can't happen when people write with a pen. Incorrect input caused by the FEP is a new hazard for writers and editors today.
I am interested in direct input using a non-associative (or unassociated) input method such as T-code (page 112). T-code uses a pair of keystrokes to represent a Japanese character, and Professor Yamada of the University of Tokyo and his students put much research effort into developing it. They have clearly shown the advantage of T-code for professional keyboard users such as typists. (I am not a professional keyboard typist, but my computer work requires me to hit keys often. A key counting routine has been built into my copy of kterm (page 227), a Japanese terminal emulator.
I have found out that the key count can go as high as 2,000,000 (yes, two million) in just a few months. At this rate, I can be considered a typist of a sort. The key count includes backspaces to delete mistyped characters and cursor motion keys, as well as the keystrokes for ordinary UNIX commands outside my favorite editor, Nemacs (page 212). You will be surprised to find that people using computers for work tend to type MANY characters. If you are a programmer, try instrumentation and find out.) The only practical problem with T-code is obtaining a suite of software that runs on popular computer hardware systems, so that I can use the T-code input method on every computer I might use. Unless I can use the input method everywhere, its attractiveness diminishes. [Since the FEP method is the most used human-machine interface, the portability issue is a great one today.] Admittedly, the users of T-code are a minority today. Probably their number is less than the number of users of the Dvorak keyboard. Given the short history of Japanese processing on computers, I can't guess which method will be popular in the year 2001.
Japanese Output
The author's knowledge of font technology, as exemplified in the description of Adobe's font suite, shines in this chapter. However, this also makes the chapter a little disappointing. It doesn't cover the old-fashioned character printer (as opposed to printers with advanced page description language support such as PostScript). The explanation given of printing font technology is good. I think someone not familiar with font technology can read this book and come away with a good minimum knowledge of it. (However, as a non-native English user, I find the usage of the word "weight" as in "a difference in weight" (of two fonts, on page 139) a little obscure.)
The detailed explanation given of Japanese font technology such as Adobe's ATM will be convenient for someone with Macintosh software that needs localization, since the Macintosh uses font technology extensively. (This reviewer has used Macs for work, but has never done localization work on them.)
One general problem facing anyone who tries to come to the Japanese market is that basic system software, such as an OS developed elsewhere and then localized for Japan, doesn't have non-Japanese documentation for the localized features. This is very frustrating for the non-Japanese developer, and for the Japanese developer as well. The non-Japanese developer can't access the information easily. On the other hand, the Japanese developer has a difficult time translating and explaining the localized features in English or whatever language the foreign software developer speaks. For example, PostScript has been extended to support Japanese fonts and the Japanese language. The author suggests (on page 139), "For more information on handling Japanese fonts under PostScript, I suggest reading the Japanese edition of the PostScript Language Tutorial & Cookbook, written by Adobe Systems, and published by ASCII Corporation in Japan. An additional 30 pages or so that are not found in the English version provide a tutorial on handling Japanese fonts." This means that you definitely need a Japanese partner to understand the Japanese handling in certain types of software products.
Another reason that having a Japanese partner is a good idea is that basic software such as an OS is not released on the same date everywhere in the world. The Japanese version may be older than, say, the English version, and sometimes you can't use the advanced features available in the latest release of such software if you plan to bring the application software to Japan. Such a delay is very common. One example of such a delay is mentioned for the case of Super ATM.
This extension to ATM (Adobe Type Manager) allows computer users to substitute an available, similar-looking font if the designated font doesn't exist. However, it is not available for Japanese ATM. Knowledge of version mismatches, localized features, and features missing from the Japanese version of basic system software is hard to come by without a Japanese partner. From my own experience of working with software companies in the USA, the sales offices of hardware companies in the USA don't have good information about the localized products they sell abroad. In this regard, they are not of much help. I suspect that the situation is similar everywhere in the world. A good Japanese contact is indispensable.
One thing I miss is a treatment of inexpensive character printers. It is true that the font technology used by page description languages (PDLs) has become available at decreasing cost. However, there are many inexpensive printers without full-fledged PDL support. Essentially these printers print the Japanese characters that correspond to the character codes sent from the computer. They have a built-in Japanese font ROM at a fixed resolution such as 24x24, 32x32, or 48x48. Often, the printers understand functions invoked by special escape sequences in the output stream. These functions are often used to draw lines or to change the character style (say, italics or bold). In Japan, there are three popular printer control languages. These are not full PDLs in the sense of the PostScript language, but are based on features triggered by escape sequences in the output stream. They are the NEC printer escape sequences, often called PC/PR; the Epson printer escape sequences, ESC/P; and the Canon LIPS escape sequences used for Canon's laser printers. Although high-end page printers now support PostScript, there are more printers that support the above escape sequence functions.
Localization of software products that must support a variety of output devices must pay attention to the escape sequence functions above. From my limited knowledge, ESC/P (it may not be called that outside Japan) seems to be available on many Japanese-made printers sold abroad. So the reader may be familiar with it.
Another thing that is missing from this book (and I think it is probably inevitable given the diversity of the topic) is the handling of Japanese output under window systems such as X11. X11 is now the de facto standard for window technology in the workstation market. The latest release, X11R5, has support for wide characters (a wide character is a character code more than one byte long, such as the ones used for Japanese). On top of the basic wide character support of X11R5, vendors have provided Japanese language support such as Sun's JLE (Japanese Language Environment). The handling of Japanese fonts and the functions for using the I18N features of X11 could be the topic of a full-length book. But I think a somewhat longer mention of these would have been desirable in this chapter. X11R5 provides output functions for wide character strings and multi-byte character strings. These functions correspond directly to the output functions for 8-bit character strings. (X11 titles from the same publisher cover the I18N features of X11R5 very well.)
Japanese Information Processing Techniques
This chapter presents in detail how to perform conversion between the different Japanese code systems. I believe any good C programmer can see from the examples in this chapter how to handle Japanese characters in computer programs. The author's examples of what can go wrong are also very illustrative. The examples are detailed enough. One Japanese editor from a big vendor, which shall remain anonymous here, had exactly the bug described in the chapter. Character handling alone is not what localization is all about.
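For readers who want to see what such a code conversion involves, here is a minimal sketch of the well-known arithmetic that maps one two-byte JIS character to Shift-JIS and to EUC. It is written in Python rather than the C used in the book, it is not the author's code, and the function names `jis_to_sjis` and `jis_to_euc` are illustrative choices of mine. It assumes the input is a plain two-byte JIS X 0208 code with both bytes in the range 0x21 to 0x7E, and it ignores escape sequences and half-width katakana.

```python
def jis_to_sjis(j1, j2):
    """Convert one two-byte JIS X 0208 code (j1, j2) to Shift-JIS bytes."""
    if not (0x21 <= j1 <= 0x7E and 0x21 <= j2 <= 0x7E):
        raise ValueError("not a two-byte JIS code")
    if j1 % 2:                 # odd first byte: lower half of the second-byte range
        s2 = j2 + 0x1F
        if s2 >= 0x7F:         # Shift-JIS never uses 0x7F as a second byte
            s2 += 1
    else:                      # even first byte: upper half of the second-byte range
        s2 = j2 + 0x7E
    s1 = ((j1 - 0x21) >> 1) + 0x81   # two JIS rows share one Shift-JIS first byte
    if s1 > 0x9F:              # skip the range reserved for half-width katakana
        s1 += 0x40
    return s1, s2


def jis_to_euc(j1, j2):
    """Convert one two-byte JIS X 0208 code to EUC by setting the high bits."""
    return j1 | 0x80, j2 | 0x80


if __name__ == "__main__":
    # JIS 0x467C is the kanji for "sun/day"
    print([hex(b) for b in jis_to_sjis(0x46, 0x7C)])
    print([hex(b) for b in jis_to_euc(0x46, 0x7C)])
```

The EUC conversion is trivial, while the Shift-JIS one folds two JIS rows into each first byte, which is where off-by-one mistakes tend to creep in.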
Line breaking, word wrapping and other text processing algorithms are also important for localization. As is mentioned on the first page of this chapter, a "locale model" of I18N and L10N (localization) has been proposed. Unfortunately, anyone who has studied the locale issue seriously will find that what to model (and how) is scarcely settled today. Features of language processing such as the currency symbol and the date format are often mentioned as the targets of locale handling. In my opinion, they are not enough. At the least, we have to see how far the locale model can go and build additional methods if the locale model is not enough.
For example, I had a tough time localizing a US-made spreadsheet-like application for money traders. It runs under X11 on a Sun workstation. It has to show the currency mark for the Japanese YEN and the dollar mark (or other currency symbols) simultaneously. The Japanese currency mark for YEN is mapped to the same character code as the ASCII '\' (backslash) in the Japanese character code system (JIS-ROMAN). (By the way, this means you see many Yen currency marks in C program listings and DOS file path names.) Unfortunately, the backslash character was used to signify interactive command input for the program. To show the Yen currency mark, I had to choose the Japanese font using the Sun JLE system's indirect way of choosing font sets. I could change the font through the value given to the LANG environment variable of UNIX and the JLE system's font set mapping. However, if I chose the Japanese font, I couldn't show the backslash marks easily. In the end, it was obvious that internal modification of the program was necessary. I don't think this was a problem of the locale model alone. If the original developer had known that the YEN currency symbol has the same code position in the Japanese character code as the ASCII backslash character, he would have designed the software differently in the first place.
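The collision can be shown in a few lines. The sketch below is my own illustration, not code from UJIP or from the application in question; the function `render` and the charset labels are hypothetical. It relies only on the fact that JIS-Roman differs from ASCII at two code positions, 0x5C (Yen sign instead of backslash) and 0x7E (overline instead of tilde).

```python
# Code positions where JIS-Roman differs from ASCII.
JIS_ROMAN_OVERRIDES = {
    0x5C: "\u00a5",  # Yen sign where ASCII has the backslash
    0x7E: "\u203e",  # overline where ASCII has the tilde
}


def render(data, charset):
    """Show how 7-bit bytes look under 'ascii' or 'jis-roman' (hypothetical labels)."""
    if charset not in ("ascii", "jis-roman"):
        raise ValueError("unknown charset: " + charset)
    out = []
    for b in data:
        if b > 0x7F:
            raise ValueError("only 7-bit bytes are handled in this sketch")
        if charset == "jis-roman" and b in JIS_ROMAN_OVERRIDES:
            out.append(JIS_ROMAN_OVERRIDES[b])
        else:
            out.append(chr(b))
    return "".join(out)


if __name__ == "__main__":
    path = b"C:\\DOS\\COMMAND.COM"
    print(render(path, "ascii"))      # backslashes, as an American user sees it
    print(render(path, "jis-roman"))  # Yen signs, as a Japanese user sees it
```

The bytes are identical in both cases; only the font or character set chosen for display decides whether the user sees a path separator or a currency mark, which is why one byte cannot serve both purposes on the same screen.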
But again, what locale should be used for money trader programs? The locale model is limited to localization for a single target. I think the locale model is good only for small-scale, simple applications and probably doesn't work for a program meant for multinational environments. This doesn't mean that the locale model is useless. Rather, I think the limits of the locale model should be pushed as far as possible, and the things that don't work with it ought to be collected and documented for the future benefit of the international programming community.

Japanese Text Processing Tools

This chapter lists the software products (commercial and non-commercial) that are available on many different computers. Non-Japanese readers will find this chapter useful for locating Japanese text processing tools that run on their computers. Since new versions of software come out often, the chapter is in danger of going out of date. The author maintains a file accessible on the Internet to record the latest developments.

If the reader owns an IBM-PC compatible, the best way to get a Japanese processing environment is to use IBM DOS J/V, an extension to MS-DOS by IBM Japan. The reviewer bought an IBM PC compatible with an Intel 486DX/33 for home use after IBM DOS J/V came out. (I am writing this review on this PC.) The NEC-PC, the most popular brand of PC in Japan, was very expensive for comparable CPU power back in 1992 due to the lack of competition. DOS J/V made it possible to use competitively priced compatible PCs for Japanese processing.

By the way, the NEC-PC, the most popular computer in Japan, is NOT compatible with the IBM-PC. The lack of a detailed explanation of the NEC-PC is another point missing from the book. If your software runs on PCs, it is impossible to ignore the NEC-PC.

The difference in hardware is now supposed to be absorbed by the underlying software layer.
On the workstation, X11R5 now supports wide and multi-byte characters and should provide the common ground for supporting the Japanese language. On PCs, Windows 3.1J is supposed to provide the common ground for Japanese processing. However, the rivalry between Microsoft and IBM has spawned two different versions of MS-Windows. IBM Japan added its own features to MS-Windows and released its own version of MS-Windows 3.1J. The original version and the IBM version use different Japanese font file names, and users have reported different behavior of the DOS window (generally with favorable comments on the IBM version's DOS window support). This situation may plague users for some time to come.

The section describing GomTalk7 (Macintosh) discusses the delay between the original (say, American) release of an OS and the corresponding Japanese version. There was a delay of about two years between the release of the English version of System 7 and the release of KanjiTalk 7.1 (the Japanese version of System 7). For commercial development and marketing, such a delay has to be taken into account. Be careful if your software depends on an advanced feature of an OS that is available only in the latest release. It is quite possible that your software will be useless in Japan, where the Japanese OS is a few versions older. The situation is improving, though. At least Sun Microsystems has shortened the delay to three months with their latest Solaris upgrade. Other companies are not so fortunate. Windows 3.1J was introduced in June 1993, well over a year after the release of the English version.

As for word processors, many big-name word processors have lately become available for Japanese Windows.
I can praise WordPerfect for its support of vertical editing of Japanese on the screen. Ichitaro, the most popular Japanese word processor on MS-DOS, has been ported to the MS-Windows environment.

Many dictionaries not listed in the book are now available in CD-ROM formats for specially built CD readers. (Sony, NEC and others make CD book readers these days.) A freeware program that runs under MS-Windows to read these special CDs exists and is widely used today (DDwin: DD stands for Data Disk, I think). I have seen an article that reports favorably on a professor's use of this software and dictionary disks for the study of English literature. The number of titles is growing, and I believe the author's on-line file will keep readers updated on these developments.

There is a section called "Machine Translation Software". The reviewer has evaluated many such products by looking at demos at trade shows and uses a commercial product called the Sharp DUET system (an English-to-Japanese translation system). The reason for my research is the high cost of translation. Frankly speaking, the systems mentioned in the book are in the lower-priced category, and I found them lacking in capability when I evaluated these and others about four years ago. Sharp DUET is probably 5 to 10 times as expensive as the products mentioned here. My suggestion for anyone looking for good translation software is to ask for demos using your own documents and to see how good the support is. Big-name companies fell off my candidate list due to the lack of good technical support.

Machine translation works well if you understand its limitations. For example, pre-editing and post-editing are necessary today. You must have a native Japanese speaker check the translation result. You must count the cost of document translation (which is not small) if you are interested in entering the Japanese market. One successful translation agency uses Sharp DUET and has translated some HP manuals with the system. Nihon NCR also uses the DUET system internally.
(I was surprised to find people from many big computer companies and semiconductor companies at the user group meeting of Sharp DUET a couple of years ago. Everyone in the computer industry seems to have the problem of document translation.) Successful use of translation software, or working with a translation agency that uses such software, may be the key to cutting down the documentation cost. But, again, you need a good Japanese partner to carry out the translation task in a satisfactory manner.

Using Japanese E-mail and News

This chapter describes the basics of exchanging e-mail and reading and posting news in Japanese. Again the explanation is very clear. I can add one small clarification. At the beginning of page 234, the C News and INN packages are mentioned as news reading programs that allow escape characters (which are essential for the Japanese character code to work). I think it is the underlying transport layer program, such as I-news (of B-news), rather than the news reading program, that is responsible for stripping escape characters. Again, the distinction may not be that important for the intended audience.

The author shares his practical knowledge of the pitfalls of handling Japanese in e-mail and news articles by showing us how to restore mangled Japanese text. An earlier chapter explains fully how to restore Japanese text when escape characters have been stripped. This is not a mental exercise. Now and then I DO receive such mangled messages from Japanese sites due to misconfigured software.

Appendix

It is not often that a book deserves praise for its appendix as well as its main text. This book is an exception. The main text alone is good enough to earn a recommendation, and the author's compilation of valuable information in the appendix makes the book even more attractive. (Including the bibliography, it runs to more than 170 pages.)
Japanese Code Conversion Table

This is a short table to simplify manual code conversion among four code systems (Kuten, JIS, EUC and Shift-JIS).

JIS X 0208-1990 Table

All the characters of this character set are listed. This may be the first time you see all these characters. Again, Adobe font technology makes this list very readable. By the way, don't be overwhelmed by the sheer number of characters. The list contains 6355 characters. I, for one, don't use many of the characters in this list myself. Many of them are used only in certain words and phrases, and someone leading a software engineer's life won't use them often anyway. I can write at least 2000 or so of them and read practically all of them if I encounter them in books.

JIS X 0212-1990 Table

This is the supplemental character set listing.

JIS Code Table Supplements

This lists two index tables (Pronunciation Index and Radical Index) to help you use the previous two tables in Appendix B and C.

Joyo Kanji List