Chinese is not Chinese!
Starting to do business in China? This short article covers just one of the many aspects you will need to pay attention to: the language, in all its myriad forms (spoken, written, encoded, and displayed).

The Chinese Language

Suppose you have an interesting product for the Chinese market and you want to cover other Chinese-speaking Asian markets, beginning with Hong Kong. So you look for a localization supplier and, since they speak Mandarin in China, you tell your supplier you want Mandarin. Your assumption is both right and wrong. Although a good part of the world's population understands Chinese, the language actually comes in many different shapes and flavors, as will be shown below.

Spoken Language

Since you may have implemented audio/speech sequences in your product, or voice control may be involved as well, the spoken language could play an important role in the localization process.

Hong Kong

Hong Kong mainly uses Cantonese as its spoken language. It is very different from the standard language spoken in Taiwan or Mainland China (you might as well compare Swedish and Spanish). Although standard Chinese as spoken in Taiwan or Mainland China would be understood, the spoken language in Hong Kong would still be Cantonese.

Taiwan

In Taiwan the spoken language is Guoyu. The pronunciation is the same as in Mainland China.

Mainland China

The standard spoken language is Putong Hua.

Remarks

Even though characters and encoding systems are different, Taiwan Chinese and Mainland Chinese (as spoken languages) sound the same. (This is sometimes called "Mandarin" in Western countries or "Beijing" in Japan and Korea. Mandarin/Beijing refers to the standard spoken language, without indication of the region or encoding used.) However, the way things are expressed may differ. Whereas the Chinese spoken in Hong Kong sounds very different from that in Taiwan, some expressions, when you look at the written version of what has been spoken, are much more like those used in Taiwan.
Nevertheless, the language used in Hong Kong has different expressions from those used in Taiwan and Mainland China.

Written Language

Hong Kong

Hong Kong uses traditional characters, which represent the "long" form of Chinese characters.

Taiwan

Taiwan also uses traditional characters.

Mainland China

Mainland China uses simplified characters. These were introduced in the 1950s to support education by using a short, simplified form of some characters. The use of traditional characters in publications is (theoretically) forbidden.

Remarks

Chinese is also one of the four official languages of Singapore, where simplified characters are also used. In the days "before Windows", the shape of the characters used (i.e., the long or short form) was determined by the respective encoding (see below). This is very different under Windows, where fonts determine the display.

Encoding

Chinese is a double-byte language, i.e., two ANSI characters (hence the term "double-byte") are combined by the Chinese system to yield a Chinese character. All characters above ASCII 128 are used to build Chinese characters. Certain applications are "double-byte enabled" (i.e., they have the capability to detect double-byte code). Thus they do not allow any formatting that would destroy Chinese characters or result in the wrong positioning of punctuation marks. A double-byte enabled word count gives you a correct count when Chinese is involved. (Single-byte word counts tend to count each single ANSI character.) Unlike English, which can be covered with one code page, Chinese forces you to choose an encoding.

Hong Kong

Big5 Code (Dawuma), with about 13,600 characters in the character set.

Taiwan

Big5 Code.

Mainland China

GB Code, an abbreviation for "Guobiao" Code. There are about 6,700 characters in the character set.

Remarks

These code systems have no similarities code-wise, and although the characters look the same, the encodings are not compatible. However, code converters are available.
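The difference between byte-based and double-byte enabled counting, and the incompatibility of Big5 and GB, can be illustrated with a minimal Python sketch. It assumes Python's built-in `big5` and `gb2312` codecs as stand-ins for the two code systems, and it converts by way of Unicode, which is how Python works and not necessarily how the converters mentioned above operate. The helper names are hypothetical.

```python
# Minimal sketch; helper names are hypothetical, codec names are Python's own.

def naive_count(data: bytes) -> int:
    """Count like a single-byte tool: every byte is one 'character'."""
    return len(data)


def double_byte_count(data: bytes, encoding: str) -> int:
    """Count like a double-byte enabled tool: decode first, then count."""
    return len(data.decode(encoding))


def big5_to_gb(data: bytes) -> bytes:
    """Re-encode Big5 bytes as GB bytes by going through Unicode.

    This changes the code only. It does not turn traditional characters
    into simplified ones; that would need a mapping table not shown here.
    """
    return data.decode("big5").encode("gb2312")


def traditional_only_fails() -> bool:
    """Check that a traditional-only character cannot be re-encoded as GB."""
    try:
        big5_to_gb("語".encode("big5"))
    except UnicodeEncodeError:
        return True
    return False


# These two characters are written the same way in both character sets.
SAMPLE = "中文"
big5_bytes = SAMPLE.encode("big5")
gb_bytes = SAMPLE.encode("gb2312")

print(naive_count(big5_bytes), double_byte_count(big5_bytes, "big5"))
print(big5_bytes.hex(), gb_bytes.hex())
print(traditional_only_fails())
```

The same two characters yield different byte sequences in the two code systems, so text must be converted before it can be read under the other one. A character that exists only in its traditional form has no GB code at all, so re-encoding alone is not enough for real text.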
Code converters usually only work with pure ANSI text. Supposing the product was localized to Big5 code, it would run without modifications in Hong Kong and Taiwan on almost all Chinese systems. In Mainland China, the product would run with the help of Chinese subsystems on almost all machines (see below). To create a product version that is readable (in terms of the code) on localized Microsoft Windows in Mainland China, the Big5 code has to be converted to GB Code. Formatted text, like help files or manuals, usually poses a problem for code converters: a program that strips the formatting from documents (i.e., separates the contents from the layout) has to be used. After code conversion, the original formatting is reapplied to the converted text. Most translation memory applications can handle this task.

Display/Fonts

To display traditional, long characters, a traditional character font has to be used. The standard font in Big5 comes with traditional characters, but traditional fonts are available for GB as well. To display simplified, short characters, simplified fonts have to be used. The standard font in GB is in simplified characters, but simplified fonts are available for Big5 as well. Some Chinese subsystems enable switching between traditional and simplified displays without actually having to change the fonts.

Chinese Systems

MS Localized Windows

Some products are especially designed for localized Windows. (This design forces the user to use one special operating system that might not be the optimal system for handling a lot of multilingual projects.) These products will not produce correct Chinese output under Chinese subsystems.

Encoding

If localized MS Windows is used, the user is normally only able to display information that was written using the respective encoding of his or her operating system.
MS Office is an exception to the rule; non-MS products have to be written in Big5 to be usable on Traditional Windows, and GB products have to be run on Simplified Windows. It is not possible to switch off Chinese.

Font Mapping

Further restrictions apply to font mapping. Usually, certain Chinese fonts are linked to certain single-byte fonts (e.g., Heiti is mapped to Arial). In a text formatted as Chinese, single-byte characters such as numbers appear in that predefined font. This mapping is not adjustable under localized MS Windows. This sometimes results in a weird display (or no display) of characters above ASCII 128, since all these characters are interpreted as part of a Chinese character. This behavior prevents correct output of these special characters, which matters especially when languages with special characters such as Spanish, French, or German are involved.

Input

Input methods for Chinese characters are restricted as well. Traditional Windows comes with a different set of input systems than Simplified Windows does.

System Font Display

The display of system menus, error messages, etc. usually depends on the respective coding system. That is, Traditional Windows shows traditional characters, while Simplified Windows shows simplified characters.

Chinese Subsystems

Encoding

Chinese subsystems such as CStar, Richwin, TwinBridge, or the like can switch the system code to either Big5 or GB. A non-Chinese Windows (such as English Windows) could be used as well. This would allow the user to switch off Chinese if necessary. However, some applications might not produce correct Chinese output under the subsystems.

Font Mapping

In subsystems, font mappings can be customized (the degree to which this is possible depends on the subsystem used). Correct display of characters above ASCII 128 can often be maintained. Handling of special characters in Spanish, French, etc. is possible.
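Why characters above ASCII 128 are swallowed on a system that reads everything as Chinese can be shown with a small Python sketch. It assumes the Western text is stored in the Windows code page `cp1252` (the article does not name one) and uses Python's `gb2312` and `big5` codecs as stand-ins for the two Chinese code systems. The function name is hypothetical.

```python
# Minimal sketch; cp1252 is an assumed Western code page, not from the article.
WESTERN_CODE_PAGE = "cp1252"


def read_as_chinese(text: str, chinese_encoding: str) -> str:
    """Store text as single-byte Western bytes, then read it as Chinese."""
    raw = text.encode(WESTERN_CODE_PAGE)
    return raw.decode(chinese_encoding)


# "ö" and "ß" are two adjacent bytes above 127 in cp1252.
word = "Größe"
as_gb = read_as_chinese(word, "gb2312")
as_big5 = read_as_chinese(word, "big5")

print(word, len(word))
print(as_gb, len(as_gb))
print(as_big5, len(as_big5))
```

The two accented letters are merged into a single Chinese character, so the five-letter word comes back with four characters. This word was chosen because its two high bytes happen to form a valid pair in both code systems; many other combinations would raise a decoding error instead, which corresponds to the "no display" case.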
Input

Several input methods are offered to serve a wide range of customers. Thus, it is possible to use simplified input methods on a traditional system and vice versa.

System Font Display

Depending on the subsystem used, the display can be customized. The following restrictions apply: on MS Traditional Windows (Big5 encoded), the system display can only be changed from Traditional to Simplified when the subsystem is set to Big5. If the subsystem were set to GB, the system display would be garbage. The same is true for MS Simplified Windows; here the subsystem would have to be set to GB, whereas with Big5 the menus, messages, etc. would not be displayed correctly.

Summary

It should also be mentioned that, depending on your area of specialization, your product might have to be adjusted to the different (legal, financial, etc.) systems in the various Chinese-speaking countries. You could consider this very basic localization, in which no language is involved.

As said before, suppose you started with a product version localized for Hong Kong. For audio output, although Cantonese would be appropriate, you could stay compatible with all of China by using Guoyu or Putong Hua. Voice control would be difficult because of the vast difference in the spoken languages. Your chosen encoding system would be Big5. This poses no problem for Taiwan, but is difficult for Mainland China. If you want to stick to the "one version/one encoding" solution, mainland users on localized MS Windows would have to install a subsystem in order to use your product. The approximate additional cost for each user would be between USD 90 and USD 180. However, the "minimal" solution of simple code conversion might leave the user unsatisfied. Depending on the amount of material to be localized, an additional version for GB environments could make more sense. So in the end, three different product versions might have to be created.

Peter Stumpf