© 2010 SMP Marketing • ISSN 1420-3693 • www.localization.org



From the 2003 Globalization Insider Archives

Which Direction Is This Language Written In?

Tex Texin, Founder, Chief Architect, XenCraft

In a companion piece to our recent article on the politics and culture of scripts, Tex Texin looks at some of the issues associated with different scripts and the directionality of scripts. Even if you know what script to use for a given language, you may find that your software doesn’t support the direction you need, or that your product, document or web site requires substantial modification to look right when “reversed” or rotated to deal with other scripts.




With the same trepidation that one might ask a doctor, “Will it hurt much?” I am occasionally asked, “Which direction is this language written in?” The apprehension is easy to understand. Supporting text written in new or different directions has a reputation for both complexity and difficulty.

The reputation is not unearned. Writing text vertically requires a careful look at the orientation and placement of text. Some characters (e.g. punctuation) will change their shape when displayed vertically.

Writing text right-to-left requires specialized rendering algorithms, in part because right-to-left text is generally mixed with text written left-to-right. Numbers, for example, are written with the most significant digit placed left-most, regardless of the horizontal writing direction. Languages such as Arabic also have sophisticated rendering requirements: character glyphs change based on the position of the character in a word, and some character sequences form ligatures. The drawing of lines connecting characters is affected by the justification of the text.
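As a rough illustration of why mixed-direction text needs a dedicated algorithm, the following Python sketch uses the standard library's unicodedata module to inspect the Unicode bidirectional class of a few characters and the positional forms recorded for one Arabic letter. The helper names and the sample characters are chosen here for illustration and are not part of the original article.

```python
import unicodedata

def bidi_classes(text):
    """Return (character name, bidi class) pairs for each character in text."""
    return [(unicodedata.name(ch), unicodedata.bidirectional(ch)) for ch in text]

def positional_forms(code_points):
    """Map each presentation-form tag to the base letter it is a shape of.

    unicodedata.decomposition() returns strings such as '<initial> 0628'
    for the Arabic compatibility presentation forms.
    """
    forms = {}
    for cp in code_points:
        tag, base = unicodedata.decomposition(chr(cp)).split()
        forms[tag.strip("<>")] = chr(int(base, 16))
    return forms

if __name__ == "__main__":
    # Latin a, Hebrew alef, Arabic beh, a European digit, an Arabic-Indic digit.
    sample = "a" + "\u05D0" + "\u0628" + "7" + "\u0667"
    for name, bidi in bidi_classes(sample):
        print(f"{bidi:>3}  {name}")
    # U+FE8F..U+FE92 are the isolated, final, initial and medial shapes of beh.
    print(sorted(positional_forms(range(0xFE8F, 0xFE93))))
```

Letters carry a strong direction while digits carry a weaker numeric class, which is one reason a renderer cannot simply reverse a right-to-left string. In practice, shaping is done by the font and rendering engine rather than by substituting presentation-form code points; the lookup above only shows that one letter has several positional shapes.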

User interfaces and reports will also be laid out differently. The user interface must change because users will scan the screen (or page) differently. Left-to-right readers expect the most important information to be presented in the upper left corner. The least significant information is expected in the lower right. Right-to-left readers start scanning in the upper right.

Changing user interfaces from left-to-right (LTR) to right-to-left (RTL) can be both confusing and amusing. For example, a control such as the VCR control with arrows from left-to-right for First, Previous, Next, Last, is symmetric, so its mirror image is identical to the original.



However, the meaning of each button is changed, so that it now reads (left-to-right again) Last, Next, Previous, First.
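The effect can be reproduced with a small Python sketch. The ASCII glyphs below are hypothetical stand-ins for the arrow buttons, and mirroring is modeled simply as reversing the row and flipping each glyph.

```python
# Hypothetical ASCII stand-ins for the four arrow buttons, in left-to-right order.
LTR_BUTTONS = [("|<", "First"), ("<", "Previous"), (">", "Next"), (">|", "Last")]

_FLIP = str.maketrans("<>", "><")

def flip_glyph(glyph):
    """Mirror one glyph horizontally: reverse it and swap the arrow heads."""
    return glyph[::-1].translate(_FLIP)

def mirror_layout(buttons):
    """Mirror a whole row: reverse the button order and flip every glyph."""
    return [(flip_glyph(glyph), label) for glyph, label in reversed(buttons)]

if __name__ == "__main__":
    mirrored = mirror_layout(LTR_BUTTONS)
    print(" ".join(glyph for glyph, _ in LTR_BUTTONS))
    print(" ".join(glyph for glyph, _ in mirrored))
    print(", ".join(label for _, label in mirrored))
```

The two glyph rows come out identical, while the labels attached to them, read left-to-right, are now Last, Next, Previous, First.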

It can be amusing to look at multi-panel images that tell a story from left-to-right and view them from the perspective of a right-to-left reader. My I18nGuy web site has a 3-panel example advertisement, where a detergent washes a dirty shirt. In reverse, the detergent ruins a clean shirt. (See User Interfaces for Bidirectional Languages.)

Both vertical and right-to-left writing can require a careful review of language, since references to location or direction will need revision. In some cases, localization costs can be reduced by careful upfront attention to terminology to avoid such language. The W3C Internationalization Group is now recommending to specification writers that terms such as “property-left” and “property-right” be avoided in favor of terms such as “property-before” and “property-after.” When the writing direction changes, for example from left-to-right to either top-to-bottom or right-to-left, “before” and “after” are still correct and do not need to be modified. (This is true for most W3C specifications purposes. Your functionality may vary.)
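A minimal Python sketch of the idea, following this article's use of "before" and "after" relative to the writing direction: the logical name stays fixed in the product, and only a lookup table decides which physical side it means. The table, the direction keys and the function name are illustrative assumptions, not taken from any W3C specification.

```python
# Illustrative mapping from a logical side to a physical side per writing direction.
PHYSICAL_SIDES = {
    "ltr": {"before": "left", "after": "right"},   # left-to-right
    "rtl": {"before": "right", "after": "left"},   # right-to-left
    "ttb": {"before": "top", "after": "bottom"},   # top-to-bottom
}

def resolve_side(logical_side, direction):
    """Translate a logical side ('before' or 'after') into a physical side."""
    return PHYSICAL_SIDES[direction][logical_side]

if __name__ == "__main__":
    for direction in PHYSICAL_SIDES:
        print(direction, "->", resolve_side("before", direction))
```

Text and code that refer only to "before" and "after" need no change when a new direction is added; only the table grows.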

There are several other difficulties, but we can understand why a localization manager might be apprehensive that adding a new language will require a product to pioneer new writing directions. Knowing the directionality of text will be important to web designers, authors, programmers and localization managers, because text direction can be taken as an indicator of both complexity and difficulty and because the organization and directionality of screen and page layouts are affected. Therefore, knowing the writing direction can be relevant to estimating the work involved to support a new language. At least that is the common perception. After gaining some experience, as with other aspects of globalization, it will not seem so daunting.

Which Languages Are Written Right-To-Left?

To be precise, languages don’t have a direction. We represent language using a writing system or “a set of visible or tactile signs used to represent units of language in a systematic way.” (Source: Coulmas, Florian. 1996. The Blackwell Encyclopedia of Writing Systems. Blackwell: London) The collection of signs or “symbols used to represent textual information in one or more writing systems” (Source: Unicode Consortium Glossary) is called a script.

Microsoft’s Global Development Web Site offers a fairly comprehensive definition of “script”: “A collection of characters for displaying written text, all of which have a common characteristic that justifies their consideration as a distinct set. One script can be used for several different languages (for example, Latin script, which covers all of Western Europe). Some written languages require multiple scripts (for example, Japanese, which requires at least three scripts: the hiragana and katakana syllabaries and the kanji ideographs imported from China). This sense of the word ‘script’ has nothing to do with programming scripts such as Perl or Visual Basic Scripting Edition (VBScript).”

So it is scripts that have a writing direction, and languages that are written in a particular script will be written with the direction of that script.

Some languages can be written in more than one script. For example, Azeri can be written in any of the Latin, Cyrillic, or Arabic scripts. When written in Latin or Cyrillic scripts, Azeri is written left-to-right (LTR). When written in the Arabic script, it is written right-to-left.
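In code, this suggests keying direction on the script rather than on the language. The sketch below is one possible way to do that in Python, using ISO 15924 four-letter script codes; the table covers only the scripts mentioned here and the names are illustrative.

```python
# Direction is a property of the script (ISO 15924 codes), not of the language.
SCRIPT_DIRECTION = {
    "Latn": "ltr",  # Latin
    "Cyrl": "ltr",  # Cyrillic
    "Arab": "rtl",  # Arabic
    "Hebr": "rtl",  # Hebrew
}

# Scripts in which Azeri can be written, as described above.
AZERI_SCRIPTS = ["Latn", "Cyrl", "Arab"]

def direction_for(script_code):
    """Return 'ltr' or 'rtl' for a known script code."""
    return SCRIPT_DIRECTION[script_code]

if __name__ == "__main__":
    for script in AZERI_SCRIPTS:
        print(f"Azeri in {script}: {direction_for(script)}")
```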

Which Script Should I Use?

If a language can be written in more than one script, which script should a localizer use, or should the text be provided in all scripts?

The answer will depend on your target audience. The script may change for different countries or regions. The script may also change by legislation or with changes in government policy. For example, to reach the Azeri-speaking population in Iran, you would use Arabic script. However, in Azerbaijan, from the late 1930s, Cyrillic was the script of choice and it became policy in 1940. Beginning in 1991, after the fall of the Soviet Union, a gradual switch to Latin occurred. Latin became mandatory for official uses in 2001. Even so, for software applications deployed for non-governmental operations, you should consider your target audience. Cyrillic script may be most appropriate for an older audience and Latin script for a younger market. You will most likely support both scripts to reach the general Azerbaijani population.
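One way a product might record such a decision is a table from language and region to the scripts to ship, expressed as language-Script-REGION tags in the style of BCP 47. The sketch below is an assumption-laden example: the table holds only the Azeri suggestions from this section, and the function name is hypothetical.

```python
# Suggested scripts per (language, region), taken from the Azeri example above.
SCRIPTS_BY_REGION = {
    ("az", "IR"): ["Arab"],          # Azeri speakers in Iran
    ("az", "AZ"): ["Latn", "Cyrl"],  # Azerbaijan: Latin, plus Cyrillic for older users
}

def locale_tags(language, region):
    """Build language-Script-REGION tags for each suggested script."""
    scripts = SCRIPTS_BY_REGION[(language, region)]
    return [f"{language}-{script}-{region}" for script in scripts]

if __name__ == "__main__":
    print(locale_tags("az", "IR"))
    print(locale_tags("az", "AZ"))
```

Keeping the script explicit in the tag makes it possible to ship more than one script for the same language and region when the audience calls for it.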

If you want to reach all Azeri speakers, you will implement all three scripts. In doing so, you will need to take into account that there may be terminology and other differences among Azeri speakers based in different countries, just as there are differences among English speakers, or among French speakers, in different countries.

In choosing among scripts, be aware that your decision may have political, religious, demographic or cultural overtones. In countries where the language of higher learning was Russian, Cyrillic will be used by educated people. Consider whether offering Cyrillic script sends a message of sophistication and capability or of being difficult to use. Latin is associated with Pan-Turkic movements, and more generally can indicate Western-leaning movements. Arabic script has associations with Islamist movements. (For more on the topic of political and social factors in script choice, read Scripts, Scripts Everywhere, But Nary a Letter to Use.)

Therefore, just as you research the languages, date formats, number formats, currency symbols, etc. that are required for proper localization to a particular culture, you may also need to investigate the correct script or scripts to use. Towards that end, I have created a list of script suggestions and their writing directions for several countries, for the GEO task force of the W3C Internationalization working group. The suggestions are part of a FAQ on Script Direction and Languages. See the Directionality of Commonly Requested Languages Table.

As many countries have more than one official language, and often have large numbers of speakers of minority languages, you should not use this list to define your localization strategy, but should independently evaluate your regional market requirements.

For example, Israel has two official languages: Hebrew and Arabic. However, Russian and English are also popularly used there. The languages of China include Cantonese, Gan, Hakka, Mandarin, Minbei, Minnan, Wu and Xiang, among others. The languages of India (also known as the land of 1,000 languages) include Assamese, Bengali, Bihari, English, Gujarati, Hindi, Kannada, Kashmiri, Malayalam, Oriya, Panjabi, Sindhi, Tamil, Telugu, Tibetan and Urdu.

One of the difficulties I had producing the FAQ list was the sizable amount of misinformation on the web, sometimes from credible and popular authors. Some prolific authors who are not in the globalization field make egregious errors when they do cover globalization. For example, Thai has been described as requiring double-byte character sets and right-to-left rendering. It requires neither.
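Claims like these are easy to check. The Python sketch below tests one Thai consonant with the standard library: its Unicode bidirectional class, and its length in bytes when encoded with Windows code page 874, a single-byte Thai encoding. The choice of character and the helper names are assumptions made for illustration.

```python
import unicodedata

THAI_KO_KAI = "\u0E01"  # THAI CHARACTER KO KAI, the first Thai consonant

def is_strong_ltr(ch):
    """True if the character has the strong left-to-right bidi class 'L'."""
    return unicodedata.bidirectional(ch) == "L"

def encoded_length(ch, encoding="cp874"):
    """Number of bytes the character occupies in the given legacy encoding."""
    return len(ch.encode(encoding))

if __name__ == "__main__":
    print(unicodedata.name(THAI_KO_KAI))
    print("left-to-right:", is_strong_ltr(THAI_KO_KAI))
    print("bytes in cp874:", encoded_length(THAI_KO_KAI))
```

Thai does have its own rendering requirements, such as stacked vowel and tone marks, but neither right-to-left layout nor a double-byte character set is among them.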

Of course, Arabic, Azeri, Farsi, Hebrew, Pashto and Yiddish are examples of languages that can be written in one of the two right-to-left scripts, Hebrew or Arabic.

Languages written in the Cyrillic, (Modern) Greek, Indic, Latin and Southeast Asian scripts are written left-to-right. Examples include the modern languages of the Americas, Europe, India and Southeast Asia.

Languages such as Chinese, Japanese and Korean are more flexible in their writing direction. Historically, they were written vertically top-to-bottom, with the vertical lines proceeding from right to left. More commonly today, they are written horizontally left-to-right. However, they are also occasionally written right-to-left. Chinese newspapers sometimes combine all of these writing directions on the same page! Mongolian is still written vertically, with columns proceeding left-to-right.

The languages mentioned above are not the only ones to be written in more than one direction simultaneously. Early Greek was at first written right-to-left. It then became boustrophedon, where one line is written right-to-left and the next is written left-to-right, alternating directions with each row. The characters also changed direction, reversing the way they faced with each row change. (Boustrophedon literally means “ox-turning.” The name is apt since the writing flows back and forth the same way an ox is used to plow a field.) Eventually (after 500 B.C.), Greek settled on the left-to-right direction.

Ancient Egyptian hieroglyphs also changed direction, especially when used for decoration or to indicate the person or figure being discussed. The writing could go right-to-left, left-to-right, or in columns top-to-bottom.

By now, I am sure you are wondering if any text is written bottom-to-top. Numidian is an ancient script written bottom-to-top, with columns going left-to-right. Not so ancient is the script Hanuno'o, or Mangyan, used in the Philippines until the 17th century. It is also written vertically bottom-to-top, with columns running left-to-right.

Fortunately, the modern languages used in business and requiring localization today do not require all of these combinations. But in case you do, the industry is moving to support you. CSS3 Text Module is a World Wide Web Consortium specification for Cascading Style Sheets in the Candidate Recommendation stage. It includes some new features such as “writing-mode,” in which you can specify horizontal direction of left-to-right, right-to-left, or vertical writing of top-to-bottom, with columns proceeding either to the left or to the right. Sorry, no boustrophedon or bottom-to-top yet!
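To make the combinations concrete, here is a small Python sketch that maps a writing orientation and progression to CSS declarations. It assumes the property values of the later CSS Writing Modes specification ("horizontal-tb", "vertical-rl", "vertical-lr", plus the "direction" property), which are not the value names of the draft described above; the table keys and the function name are hypothetical.

```python
# Assumed value names from the later CSS Writing Modes specification.
CSS_WRITING_MODES = {
    ("horizontal", "ltr"): {"writing-mode": "horizontal-tb", "direction": "ltr"},
    ("horizontal", "rtl"): {"writing-mode": "horizontal-tb", "direction": "rtl"},
    # Vertical text, top-to-bottom, with columns proceeding to the left or right.
    ("vertical", "columns-right-to-left"): {"writing-mode": "vertical-rl"},
    ("vertical", "columns-left-to-right"): {"writing-mode": "vertical-lr"},
}

def css_declarations(orientation, progression):
    """Return the CSS declarations for one writing direction as a single string."""
    props = CSS_WRITING_MODES[(orientation, progression)]
    return " ".join(f"{name}: {value};" for name, value in props.items())

if __name__ == "__main__":
    # Traditional Chinese or Japanese layout: vertical, columns right to left.
    print(css_declarations("vertical", "columns-right-to-left"))
    # Mongolian: vertical, columns left to right.
    print(css_declarations("vertical", "columns-left-to-right"))
    # Arabic or Hebrew: horizontal, right-to-left.
    print(css_declarations("horizontal", "rtl"))
```

The table has no entry for boustrophedon or bottom-to-top, matching the gap noted above.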

If you have a question about scripts or script directions, please send it to www-international@w3.org with “Script FAQ suggestions” as the subject.



Tex Texin is XenCraft's founder, chief architect and Xen Master. XenCraft is a software consulting and training company specializing in software globalization and helping companies around the world move to new markets and the web. XenCraft is a member of the Unicode Consortium, and Texin has been using Unicode to internationalize products since 1993. He is also an active member of the World Wide Web Consortium’s Internationalization working group. He can be reached at Tex@XenCraft.com.



