In this issue…
Korean Localization Standards
Past, Present and Future, and the Microsoft Perspective
In a reprint of the speech he gave at the LISA Forum - Asia in Singapore in December 1996, Christopher Chung traces the evolution of standards for representing Korean and gives an overview of Microsoft's policy in this area.

The Structure of Hangul

There are 24 basic consonant and vowel symbols in Hangul (14 consonants and 10 vowels), called jamos. Each jamo represents a Korean sound element. Two or more basic jamos can be combined to create compound vowels or consonants, which also represent distinct Korean sound elements. This extends the set to 51 jamos in modern Hangul: 21 vowels (10 basic and 11 compound) and 30 consonants (14 basic and 16 compound).

There are two types of Hangul syllables. The simpler form consists of a leading consonant and a middle vowel (C+V form), while the more complex form has an additional trailing consonant (C+V+C form). Of the 30 consonants, 19 can be leading consonants and 27 can be trailing consonants; all 21 vowels can be middle vowels. This results in 399 (19*21) possible two-jamo syllables and 10,773 (19*21*27) possible three-jamo syllables, making a total of 11,172 different modern Hangul syllables. Of these, about 3,000 to 4,000 are in daily use. The rest are used far less frequently, although they are needed, particularly by the professional printing industry.

Multiple Korean standards

A number of different coding methods are used to represent the Korean language in the computer industry (covering everything from the PC to the mainframe). Mainstream coding standards include:
The n-byte code system assigns a unique code to each individual jamo in a 7-bit code space, so a Hangul syllable is represented by two to n 7-bit bytes.

Since the introduction of PCs and the ensuing Hangul MS-DOS version 2.12 (and 3.2), the PC industry has used the double-byte Johab coding system as the standard. The code assignment uses two bytes (sixteen bits): the MSB is set to one, and five bits are assigned to each jamo of a C+V or C+V+C syllable. However, although the Microsoft code assignment became a de facto industry standard, other manufacturers had vested interests in maintaining their HBIOS (Hangul BIOS implementation for Hangul automata), and some of them chose to adopt proprietary variations of the five-bit assignments for each jamo. This caused confusion among PC users and hindered both application development and the interoperability of PCs from different manufacturers.

In the mid-eighties, the Korean government was working on a master plan for the new Korean government networks for Administration, Education, Finances, Defense, and Security. The first implementation, the Administration network, aimed to connect various government, provincial and district offices using a standardized hardware and software platform. The government commissioned a research institute to design the platform, and a consortium of several PC companies joined the project. The design finally chosen was based on super-micro Unix hubs and PC-based front ends. The Wansung system was invented to accommodate this hardware platform: the n-byte Unix code system was not amenable to PC platforms, and the Johab system required a major investment in the full localization of Unix communication software, which neither the Korean government nor the industry was prepared to make. As a result, in 1987 the South Korean government announced a new Korean Standard Code System (KSC 5601-1987) and the Hangul Standard Basic Input Output System (KSC 5842-1987).
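The two-byte Johab layout described above (MSB set to one, followed by five bits for each jamo position) can be sketched as follows. This is a minimal illustration, not the code table of any vendor or standard: the function names are made up for this example, and the field values passed in are hypothetical indices, since the article does not list the actual five-bit assignments.

```python
from typing import Tuple

# Sketch of a Johab-style 16-bit layout: 1 flag bit + three 5-bit jamo fields.
# The field values used below are hypothetical indices, NOT the jamo codes
# assigned by Microsoft, by other manufacturers, or by the Korean standard.

FIELD_BITS = 5
FIELD_MASK = (1 << FIELD_BITS) - 1  # 0b11111


def pack_syllable(lead: int, mid: int, trail: int) -> int:
    """Pack three 5-bit jamo fields into a 16-bit word with the MSB set."""
    for value in (lead, mid, trail):
        if not 0 <= value <= FIELD_MASK:
            raise ValueError("each jamo field must fit in five bits")
    return 0x8000 | (lead << 10) | (mid << 5) | trail


def unpack_syllable(word: int) -> Tuple[int, int, int]:
    """Split a 16-bit word back into its (lead, mid, trail) fields."""
    if not word & 0x8000:
        raise ValueError("MSB not set: not a double-byte syllable code")
    return (word >> 10) & FIELD_MASK, (word >> 5) & FIELD_MASK, word & FIELD_MASK


word = pack_syllable(2, 3, 1)            # hypothetical field values
high_byte, low_byte = word >> 8, word & 0xFF
print(hex(word), unpack_syllable(word))  # 0x8861 (2, 3, 1)
```

Because the jamo fields stay separable, a program can inspect or replace the leading consonant, vowel, or trailing consonant with simple shifts and masks. A C+V syllable still occupies all three fields, so a real assignment has to reserve a "no trailing consonant" value; the value 1 used here is only a placeholder.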
To avoid conflicts with Unix, and with Unix control characters in particular, the Wansung code system sets the 8th bit of each byte of a double-byte Wansung code. This greatly limited the available code space, which meant that Wansung code covers only the 2,350 most commonly used syllables in modern Hangul. These were selected via a frequency survey, and each syllable was assigned a sequential code point without any consideration being given to its internal structure.

The Wansung code made a uniform national coding standard possible and promoted the interoperability of PCs. However, it contains only about a fifth of all 11,172 possible Hangul syllables, a fact which caused an uproar in various sectors. Furthermore, because the Wansung system ignored the internal structure of Hangul jamos and syllables, it was intrinsically deficient when it came to text processing in Hangul. To process Hangul text, it was necessary to process the input in Johab and then convert the results to Wansung for internal or external storage. This critical flaw in the Wansung code system has been criticized right from the start, and the demand for a Johab-based code standard has been mounting ever since.

The move to Johab

Once the Korean government had realized the need for Johab, it announced a new code standard in October 1992, amending the existing one. The new standard is based on double-byte Johab code, which means that all 11,172 Hangul syllables can be expressed. The new specification states that, to comply with the standard, systems should support Wansung, Johab, or both. The new standard thus allows the Wansung and Johab systems to coexist for the time being, although the government intends to convert to the Johab standard as Johab-compliant solutions become available. The Ministry of Culture is strongly pushing the Johab code standard and is actively pressuring the computer industry to deliver Johab-based platforms.
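The syllable counts and the coverage gap between Wansung and Johab can be checked with a short script. This is a sketch under two assumptions that do not come from the article: that the Unicode Hangul Syllables block (U+AC00 to U+D7A3) enumerates the same 11,172 modern syllables, and that Python's built-in `euc_kr` and `johab` codecs are reasonable stand-ins for the Wansung and Johab code systems. The codecs may differ in detail from the Hangul MS-DOS implementations discussed here.

```python
# Counts quoted in the article: leading consonants, vowels, trailing consonants.
LEADS, VOWELS, TRAILS = 19, 21, 27

two_jamo = LEADS * VOWELS               # C+V syllables
three_jamo = LEADS * VOWELS * TRAILS    # C+V+C syllables
total = two_jamo + three_jamo

# Assumption: the Unicode Hangul Syllables block lists the modern syllables.
syllables = [chr(cp) for cp in range(0xAC00, 0xD7A4)]


def two_byte_code(ch, codec):
    """Return the 2-byte code for ch in the given codec, or None if ch has none."""
    try:
        data = ch.encode(codec)
    except UnicodeEncodeError:
        return None
    # Some codec implementations emit a longer multi-byte sequence for
    # syllables outside the 2,350; those are not counted as covered.
    return data if len(data) == 2 else None


wansung = [c for c in (two_byte_code(s, "euc_kr") for s in syllables) if c]
johab = [c for c in (two_byte_code(s, "johab") for s in syllables) if c]

# Wansung sets the 8th bit of both bytes; Johab sets the MSB of the 16-bit code.
wansung_high_bits = all(b[0] & 0x80 and b[1] & 0x80 for b in wansung)
johab_msb = all(b[0] & 0x80 for b in johab)

print(two_jamo, three_jamo, total)
print(len(wansung), len(johab), round(len(wansung) / total, 3))
```

Counting only two-byte results keeps the comparison fair: a syllable counts as covered only if the code system gives it a single double-byte code, which is the sense in which the article says Wansung covers 2,350 syllables.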
It is anticipated that the government will restrict its procurement programs to Johab-based platforms as soon as a Johab-compliant solution becomes available, putting further pressure on the industry to move to Johab.

Microsoft CH supported Johab in Hangul MS-DOS V. 2.12 and 3.2 and in applications such as Hangul Multiplan and Chart. However, in 1987 we converted to Wansung and moved all platforms to it. Currently, the only Johab-compliant product we ship is Microsoft Hangul MS-DOS 6.0, which is based on the Wansung code system and provides API-level Johab support for developing Johab applications.

Christopher D. Chung