Who needs ローカラィゼーシヨン.ком?
Internationalized domain names
In a recent article in Scientific American on internationalized URLs, Dr. Markus G. Kuhn of the University of Cambridge argues that internationalized domain names aren't needed because "familiarity with the ASCII repertoire and basic proficiency in entering these ASCII characters on any keyboard are the very first steps in computer literacy worldwide". Often we assume that only Americans and the occasional Brit still believe that the Roman alphabet is good enough for everything, and that it is simply the cultural myopia of Americans that has kept the world from experiencing a full multilingual computing revolution. If ever we needed proof that inertia with regard to international issues is not a uniquely American trait, consider that Dr. Kuhn is a German in Britain arguing against the need for further internationalization of one of the most basic aspects of the Internet experience. (To be fair, we only know what Scientific American chose to quote from its interview with Dr. Kuhn -- his overall attitude may be quite different.)

It is surprising that Kuhn's statement assumes people will access computers via a traditional PC with a keyboard (or via any device with a keyboard at all). One need only look at the success of iMode and similar services to know that the PC-with-a-keyboard paradigm is being shaken. Although L&H may have gone under in a now all-too-typical accounting scandal, the idea of voice-activated portable computers certainly did not die with L&H, and it is in the context of fully localized voice-activated portables accessing local content that Kuhn's statement is most likely to become a quaint anachronism. When people don't type URLs and need something handy in their own (spoken) language as a handle for a place on the Internet, there will be a real need for internationalized access to the Internet -- internationalized to an extent far beyond anything we now have.

That said, in 2002 Kuhn is largely right.
To successfully access the Internet (and many computer programs as well), knowledge of ASCII is vital, because content and websites require it and because many internationalized applications are a shell over a fundamentally non-internationalized backbone. Considering the surprising longevity of 7-bit gateways on the Internet (how long has it been since 7-bit machines were made?), it is unlikely that technical dependence on ASCII will change at the worldwide level any time soon. But if history teaches us anything, it is that where there is a will there is a way and, more importantly, where there is a customer there is a way. If customers demand internationalized URLs they will appear, technical obstacles notwithstanding.

While the present technical obstacles are not trivial, proposals already exist (and have, to an extent, been implemented) that allow the use of non-Roman scripts in URLs. These proposals rely on mappings between Unicode and ASCII that encode URLs as text strings that can safely pass through 7-bit gateways on the Internet. Under these proposals the native-script URL is transformed into an ASCII string, which passes across the Internet and is mapped to a numerical (IP) address for a machine. Today users see these URLs as ASCII strings (and rather incomprehensible ones at that) once they have entered them. In the near future, however, this is expected to change as end-user applications display the script-appropriate URL and hide the conversion, making it appear to end-users as though internationalized URLs are what the Internet "really" uses.

At present the way in which these new URLs are created, handled and controlled is somewhat haphazard, and various services for domain name selection have sprung up, with companies even offering machine translation(!) of domain names for a fee (it's likely you'd get what you pay for with those services...).
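The Unicode-to-ASCII mapping described above can be illustrated with a short sketch. It assumes Python 3 and uses the standard library's built-in "idna" codec, which implements one such mapping (IDNA as specified in RFC 3490, with Punycode per RFC 3492). Whether this is the same scheme as the proposals discussed here is an assumption; the helper names to_ascii and to_unicode and the sample domain are illustrative only.

```python
# Sketch only: Python's built-in "idna" codec stands in for the
# Unicode-to-ASCII mapping; it may differ from the proposals in the article.

def to_ascii(domain):
    """Map a native-script domain name to an ASCII-only string."""
    return domain.encode("idna").decode("ascii")


def to_unicode(ascii_domain):
    """Reverse the mapping so an application can show the native script."""
    return ascii_domain.encode("ascii").decode("idna")


native = "bücher.example"          # hypothetical domain, not a real site
wire = to_ascii(native)            # the form that crosses 7-bit gateways

print(wire)                        # xn--bcher-kva.example
print(all(ord(ch) < 128 for ch in wire))  # True: every character is ASCII
print(to_unicode(wire))            # bücher.example
```

The ASCII form is what the network carries; the last line shows the kind of reverse conversion an end-user application would perform in order to hide it.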
Given time, though, this haphazard situation will change as society learns to deal with the multilingual web and companies learn to avoid hidden dangers, such as URLs that aren't what they appear to be (see When is a .com a .con?, below). The new international name possibilities have some companies scrambling to register their names in a world where competitors could snap them up in Devanagari. Cyber-squatters have found whole new continents of names to sit on. Increasingly there will be a real need for companies to help navigate these waters. LISA members, with their expertise and skills, can help find culturally appropriate domain names in various countries and protect investments against opponents and opportunists.

The prospect of internationalized URLs should be exciting for LISA members, as they stand only to benefit from these proposals. VeriSign and the IETF, prime movers in the internationalization of domain names, have links with LISA members to deal with these issues and to bring a degree of intelligent forethought to the design of systems that handle multilingual URLs. We are on the cutting edge in this arena, and it is up to us to get involved in the design of standards for URLs to make sure that the needs of those "on the ground" are met.

When is a .com a .con?

Internationalized URLs lead to at least one potential problem reported on in Scientific American: clever misuse of superficial (or historical) resemblances between scripts to fool the end-user. One example cited is that a hacker could register a domain that ends in what appears to be ".com", but in which the "c" is really the Cyrillic letter "es" and the "o" is the Cyrillic "o", both of which happen to be virtually indistinguishable at the glyph level from the first two Latin letters in ".com". Thus, by mixing Roman and Cyrillic characters, the following two textually distinct, but visually identical, URLs could be created. (Note that although pixel-for-pixel identical, the images of URLs shown below did in fact come from distinct text. This isn't a trick in which both images are from the same source.)
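The difference between two such strings is invisible on screen but easy to expose in code. The following is a minimal sketch, assuming Python 3 and its standard unicodedata module. The helper names scripts_used and is_mixed_script are hypothetical, "domain.com" is only the placeholder name used in this article, and treating the first word of a character's Unicode name as its script is a simplifying assumption, not the method any registrar or browser is known to use.

```python
import unicodedata

# Sketch only: a rough way to expose look-alike characters in a domain name.


def scripts_used(text):
    """Return the scripts of the letters in text.

    Assumption: the first word of a letter's Unicode name
    (e.g. "LATIN", "CYRILLIC") is used as a stand-in for its script.
    """
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}


def is_mixed_script(text):
    """True if the letters in text come from more than one script."""
    return len(scripts_used(text)) > 1


real = "domain.com"                 # all Latin letters
fake = "domain.\u0441\u043em"       # Cyrillic "es" and "o", then Latin "m"

print(real == fake)                 # False: the text differs
print(sorted(scripts_used(real)))   # ['LATIN']
print(sorted(scripts_used(fake)))   # ['CYRILLIC', 'LATIN']
print(is_mixed_script(fake))        # True
```

A check of this kind only flags names that mix scripts. A name spelled entirely in look-alike letters from a single script would pass it, so it is an illustration of the problem rather than a complete defense.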
The two URLs could lead to different sites, even though they look the same. An enterprising hacker could gain access to "domain.com" (the real site) and bury a link (perhaps in a credit-card processing script) to "domain.com" (the counterfeit) in order to steal data intended for the real "domain.com". If carefully done, such a deep hack could sit undetected for years, since the end user would never know he or she had been pulled from one site to another. The hacker could collect all sorts of information without anyone knowing it. Variations on this scam could be pulled off using at least two non-Roman scripts -- Cyrillic and Classical Greek, both of which are supported by Unicode -- and there may be others as well.

(As an aside, we clearly aren't yet at the point where this could work. Most browsers still don't really support Unicode text and ignore the W3C's recommendations for displaying multiple character sets in an HTML document. That's why the URLs above are graphics rather than text -- most of our readers wouldn't even be able to see them if they were Unicode text. LISA members don't seem to be better equipped than the general public in this regard either, so we clearly have a long way to go…)

With any technological leap there is the potential for abuse, but in general the benefits outweigh the risks and ways are found to deal with the risk. (For example, use of the https:// protocol for secure transactions was largely fueled by the need to prevent on-line theft of credit-card or other financial information.) Internationalized URLs should prove no exception. Concerns about fraud are justified, but to say that there is no need for internationalized URLs in non-Roman scripts ignores the increasing trend in computing to treat text as text, regardless of the script.
To grant ASCII a privileged place among scripts is logical given today's technology, but there is no fundamental reason why this place of importance should last as language technology improves and more and more people demand that computers work with their languages.

Politics, power and domain names

The political dimension of internationalization also cannot be ignored. In contrast to the "English is good enough" or "Learn English, Buddy!" mentality is the growing tendency to use native languages, both out of concern for the needs of people in various countries and as a means of asserting political independence. The move to internationalized and localized URLs is just a continuation of this trend, which has shown no signs of stopping. Internationalized URLs will become but one strand of the natural corollary to globalization -- localization. There is no a priori reason why localization would extend to everything but URLs.

Given the importance people attach to their language and culture, it is understandable that many would desire fully localized URLs. There is a bit of irony, for example, in Indian nationalist sites that have to use English (which they claim is the language of the oppressor) in their URLs and content because the languages they would prefer to use aren't supported by their computers. Similarly, why should an Armenian need to learn ASCII in order to access an Armenian-language site aimed at Armenians in Armenia? To the extent that language technology serves as a barrier to the empowerment and betterment of individuals, it is language technology that needs to change, not people. Concerns about politics and power are almost certain to increase demands for non-ASCII URLs.
The author is LISA Publications Manager. A native of Alaska, he currently resides in Indiana. In addition to working for LISA, he is an emeritus member of the Brigham Young University Translation Research Group (TRG), a Provo, Utah-based translation, theory and technology think-tank directed by Dr. Alan Melby, and has edited a number of books on linguistics.