© 2010 SMP Marketing • ISSN 1420-3693 • www.localization.org
More Than You Ever Wanted to Know About Java Locales
Java Software Development in a Global Economy, Part II

Tim Stevens, LexisNexis

Today’s software developers need to be conscious of their companies’ efforts to go global and familiar with how to implement the related requirements in their code. With Java, this support is built into the language, yet even today, most experienced Java developers lack the knowledge to fully exploit that functionality. In this, our second in a series of articles on Java’s support for internationalization (i18n), we cover the Locale class in detail and take a brief look at some of the problems inherent in client/server i18n programming.



In our previous article, Java Software Development in a Global Economy, we discussed a number of internationalization (i18n)-related topics, from defining what a Locale is to handling character decomposition with the RuleBasedCollator class. This time we’ll narrow our focus and go deeper. We’ll start back at the beginning with the Locale class, but with much more detail and some code examples. We’ll also take a brief look at some of the problems inherent in client/server i18n programming. So, without further ado, let’s dive into the Locale class.

Locale is the key to most of Java’s i18n-friendly classes, yet it has little, if any, real functionality in itself. It drives how dates and numbers are formatted, how GUIs are displayed, which fonts are used… you name it. It consists of three parts:

  • a language
  • a regional variation (country code)
  • a variant

A Locale is typically represented by the values of these three fields in order, separated by underscore characters ('_'). The language is specified by a lower-case ISO 639 two-letter language code, such as "en" for English, "fr" for French, and "de" for German. The full code listing is published as part of the ISO 639 standard. It’s worth noting that these codes have changed over the years and continue to change. The current Locale constructor accepts both old and new codes, though an already instantiated Locale will always return old codes when queried.

The regional variation, or country code, is specified by an upper-case ISO 3166 two-letter country code, such as "GB" for the United Kingdom, "US" for the United States, "FR" for France or "DE" for Germany. (As you can see, a country code and its respective language code will often match, distinguished only by case.) The full English code list is published as part of the ISO 3166 standard. If no country code is provided, the Locale represents only the language in question.

The final parameter, the variant, has no such easily explainable standard, and is optional. The variant is basically a catchall field to describe anything about the Locale that isn’t covered by the first two fields. For this reason, multiple variants can be combined with the underscore character ('_'). A commonly used variant is "EURO," to specify use of the Euro, rather than that Locale’s native currency. Another example is the "Traditional" variant which, when applied to the "es_ES" locale (Spanish in Spain), indicates that any text that is processed should be sorted according to the traditional Spanish sorting rules as applied in Spain. Additionally, the "es_ES_Traditional_EURO" locale indicates traditional sorting plus the use of the Euro.

The following short code example shows a few ways to create a Locale:

//Get the default Locale
Locale def = Locale.getDefault();

//Get a Locale representing the Spanish language
Locale spanish = new Locale("es");

//Get a Locale representing Spain
Locale spain = new Locale("es", "ES");

//Get a Locale representing Spain, using
//  traditional collation rules
Locale spainTraditional = new Locale("es", "ES", "Traditional");

Being able to create Locales by country and language codes is very useful, but the vast majority of the time, Java applications can simply rely on the Java Runtime Environment (JRE) to determine which Locale is appropriate for the current machine. You do that with a simple call to the static method Locale.getDefault(), as above. The JRE depends on various operating system-specific features to determine which Locale is the correct one for the current machine. In Windows 2000, along with other versions of Windows, this default Locale is very easily changed via the Regional Options section within the Control Panel. It should look something like this:

[Screen shot: the Regional Options section of the Windows 2000 Control Panel]

As you can see in the screen shot above, it’s very easy to change the default Locale from the top drop-down list. It’s not as easy or apparent in other operating systems, however, and you’ll have to do some research for your particular development environment to find out how to change the default Locale. One way to change it temporarily, regardless of platform, is the user.language system property, which can be passed to the JRE on the command line (for example, -Duser.language=de). It overrides the language that the operating system reports to the JRE, which is very useful for testing, when Locales are changed frequently. However, this property is not supported in all versions or on all platforms of the JRE and should not be used as a way to force a specific Locale in a production environment.
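As a rough illustration of the paragraph above, the following sketch prints what the JRE reports as its default and, for test code only, swaps the default Locale programmatically with Locale.setDefault() before restoring it. The class and method names (DefaultLocaleCheck, formatUnderDefault) are invented for this example, and user.country is assumed to be the companion property for the country code on the JRE in use.

```java
import java.text.NumberFormat;
import java.util.Locale;

class DefaultLocaleCheck {

    // Formats a number under a temporarily overridden default Locale,
    // then restores the previous default. Intended for tests only.
    static String formatUnderDefault(Locale temporary, double value) {
        Locale previous = Locale.getDefault();
        try {
            Locale.setDefault(temporary);
            return NumberFormat.getInstance().format(value);
        } finally {
            Locale.setDefault(previous);
        }
    }

    public static void main(String[] args) {
        // Values the JRE picked up from the operating system or from -D overrides
        System.out.println("user.language = " + System.getProperty("user.language"));
        System.out.println("user.country  = " + System.getProperty("user.country"));
        System.out.println("default       = " + Locale.getDefault());

        System.out.println(formatUnderDefault(Locale.GERMANY, 1234.5)); // 1.234,5
        System.out.println(formatUnderDefault(Locale.US, 1234.5));      // 1,234.5
    }
}
```

Launching it with something like java -Duser.language=de -Duser.country=DE DefaultLocaleCheck should report a German default on JREs that honor these properties.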

In addition to the constructors above, which take specific language and country codes, and the default Locale, there are static constants attached to the Locale class, representing various commonly used Locales. Here are a few examples (note Java has “Locale.UK” and not “Locale.GB,” as you might expect from the country code):

//Get a Locale representing the US
Locale america = Locale.US;

//Get a Locale representing the UK
Locale england = Locale.UK;

//Get a Locale representing Canada
Locale canada = Locale.CANADA;

//Get a Locale representing the English language
Locale english = Locale.ENGLISH;
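
To make the relationship between these constants and the constructors concrete, here is a small sketch that compares a few constants with Locales built from the equivalent codes. The class name LocaleConstantsDemo and the helper matches() are hypothetical names used only for illustration.

```java
import java.util.Locale;

class LocaleConstantsDemo {

    // True when a predefined constant equals a Locale built from the given codes.
    static boolean matches(Locale constant, String language, String country) {
        return constant.equals(new Locale(language, country));
    }

    public static void main(String[] args) {
        // Locale.UK carries the ISO 3166 country code "GB"
        System.out.println(Locale.UK);                           // en_GB
        System.out.println(matches(Locale.UK, "en", "GB"));      // true
        System.out.println(matches(Locale.CANADA, "en", "CA"));  // true
        // A language-only constant has an empty country code
        System.out.println(matches(Locale.ENGLISH, "en", ""));   // true
    }
}
```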

Once you have your Locale instance, you can then feed that Locale into any Locale-aware class to control its behavior. As mentioned above, the Locale itself has very little inherent functionality, aside from methods to access the language, country and variant, and to get a description of the Locale. Its role is to tell those other classes how to behave, and in that sense it is the class that controls the i18n aspects of Java.
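
The accessor and description methods mentioned above can be sketched as follows. The class name LocaleDescription and the helper summarize() are made up for this example; the description is requested in English via getDisplayName(Locale.ENGLISH) so that the result does not depend on the machine’s default Locale.

```java
import java.util.Locale;

class LocaleDescription {

    // Builds a one-line summary from the accessors the Locale class provides.
    static String summarize(Locale locale) {
        return "language=" + locale.getLanguage()
             + ", country=" + locale.getCountry()
             + ", variant=" + locale.getVariant()
             + ", name=" + locale.getDisplayName(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        // prints: language=de, country=DE, variant=, name=German (Germany)
        System.out.println(summarize(Locale.GERMANY));
        // The exact wording of the description for a variant depends on the JRE
        System.out.println(summarize(new Locale("es", "ES", "Traditional")));
    }
}
```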

Take the NumberFormat class as an example. As with most internationalized classes, the correct way to obtain an instance of NumberFormat is to use the getInstance() method. There are two versions of this method, as shown in the following example:

import java.text.*;
import java.util.*;

public class NumberFormatExample
{
    public static void main(String[] args)
    {
        //Get an instance of NumberFormat to work
        //  with this machine’s
        //  default Locale
        NumberFormat defaultFormatter =
            NumberFormat.getInstance();
        System.out.println("Default: " +
            defaultFormatter.format(999999999.99));

        //Get an instance of NumberFormat
        //  to work with Germany’s
        //  formatting rules
        NumberFormat deutscherFormatierer =
            NumberFormat.getInstance(new Locale("de", "DE"));
        System.out.println("Deutscher: " +
            deutscherFormatierer.format(999999999.99));
    }
}

Executing the above example on a machine with a default Locale set to US English produces the following output:

 Default: 999,999,999.99
 Deutscher: 999.999.999,99

Running the same application on a machine with a default locale of "de_DE" produces the following output:

 Default: 999.999.999,99
 Deutscher: 999.999.999,99

As you can see, this application formats the first number as appropriate for the Locale of the machine running the application, while the second number is always formatted as appropriate for Germany. This example shows how it’s possible to have an application process data in multiple Locales simultaneously, and to process data according to the default Locale as well as custom Locales.

A more complicated example of this would be a client/server type application, with a Java Swing GUI on one end capturing information from a user, and a Java server running on a machine accepting data from a number of different clients. In this case, a client application could accept information on various banking transactions, including dates and currency amounts, and then format it into an XML String and perform an HTTP POST to the server.

This application is simple enough to design when the client and server are running in a single Locale, but when you start to think about it in global terms, things get a bit more complicated. To begin with, the GUI must be designed to work correctly in various Locales. This not only means presenting text in the correct language, but also formatting menus, buttons and graphics as appropriate for the Locale in question. None of this affects the server design, but the data being sent to the server must be considered for i18n as well.

Let’s imagine two users, one in the U.S. and another in the U.K., both entering information for the current date, which is February 1, 2003. The American user would enter the current date as 02/01/2003, while the U.K. user would enter 01/02/2003. Both users click the "Submit" button on the GUI, and the server receives both bundles of data from the two GUIs at the same time. How does it know how to handle these dates?
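
The ambiguity can be reproduced in a few lines. This is only a sketch: it uses the java.time classes from Java 8 and later, which the article itself does not use, and the two patterns are assumptions standing in for what each user typed. AmbiguousDateDemo and readAs() are hypothetical names.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class AmbiguousDateDemo {

    // Assumed input conventions for the two users in the example
    static final DateTimeFormatter US_STYLE = DateTimeFormatter.ofPattern("MM/dd/yyyy");
    static final DateTimeFormatter UK_STYLE = DateTimeFormatter.ofPattern("dd/MM/yyyy");

    // Interprets the text under the given convention.
    static LocalDate readAs(String text, DateTimeFormatter style) {
        return LocalDate.parse(text, style);
    }

    public static void main(String[] args) {
        String fromAmerican = "02/01/2003";
        String fromBriton   = "01/02/2003";

        // Read with the sender's own convention, both mean February 1, 2003
        System.out.println(readAs(fromAmerican, US_STYLE)); // 2003-02-01
        System.out.println(readAs(fromBriton, UK_STYLE));   // 2003-02-01

        // A server that assumes a single convention misreads the other user
        System.out.println(readAs(fromAmerican, UK_STYLE)); // 2003-01-02
    }
}
```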

There are a few ways around this problem. The first solution is to rely on the users by placing example text like "DD/MM/YYYY" next to the date field. In this way, the server can always expect dates using a single standard and thus remain basically i18n-ignorant. However, this method will not only result in a number of incorrect entries from users, but is also not very flexible. Imagine that the application is later installed in Germany. Suddenly, the server will have to distinguish between "1,000" as received from American, English and German users.

There are several ways to lift the burden of providing consistent input from the user. The first is to use formatters on the client side to create output that always looks the same to the server. Let’s say the server is running in the U.K., and we want all client output to be formatted correctly for the server. Here is an example of the type of method that can be used to ensure that output is formatted properly for the server:

/**
 * Takes a number as entered by a user and formats
 * it as is appropriate for the server.
 * Requires java.text.NumberFormat, java.text.ParseException
 * and java.util.Locale to be imported.
 * @param     userNumber  The data entered by the user
 * @return    The number formatted for the server
 * @throws    ParseException thrown if the number
 *            cannot be parsed
 */
public static String formatNumber(String userNumber)
 throws ParseException
{
    // Parse using the client machine's default Locale
    NumberFormat localFormatter  = NumberFormat.getInstance();
    // Format using the server's Locale (the U.K. in this example)
    NumberFormat serverFormatter = NumberFormat.getInstance(Locale.UK);

    Number number = localFormatter.parse(userNumber);
    return serverFormatter.format(number);
}

If an American user enters "1,000", this method will return "1,000" to be wrapped in XML. However, if a German user enters "1,000", this method will return "1", because the German Locale treats the comma as the decimal separator and the input is therefore parsed as the number one. Using similar methods to process dates, currencies and anything else that may differ by Locale before wrapping it in XML and sending it to the server will again ensure that your server can essentially remain ignorant of all localization issues. However, should your server’s Locale ever change for any reason (server relocation, etc.), you will have to update all of the clients to format the data for the server’s new Locale. Even if the clients all pull the server Locale from a configuration file (as they should), that’s still a sizable update.
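
The behavior described above can be checked without changing the machine’s default Locale by passing both Locales in explicitly. The following is a sketch of that idea, not the article’s method: the three-argument formatNumber() and the class name ServerNumberFormatDemo are hypothetical.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

class ServerNumberFormatDemo {

    // Parses the text under the user's Locale and re-formats it for the server's Locale.
    static String formatNumber(String userNumber, Locale userLocale, Locale serverLocale)
            throws ParseException {
        Number number = NumberFormat.getInstance(userLocale).parse(userNumber);
        return NumberFormat.getInstance(serverLocale).format(number);
    }

    public static void main(String[] args) throws ParseException {
        // American input: the comma is a grouping separator
        System.out.println(formatNumber("1,000", Locale.US, Locale.UK));      // 1,000
        // German input: the comma is the decimal separator, so this is the number one
        System.out.println(formatNumber("1,000", Locale.GERMANY, Locale.UK)); // 1
        // A German user who means one thousand types a period instead
        System.out.println(formatNumber("1.000", Locale.GERMANY, Locale.UK)); // 1,000
    }
}
```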

A better option is to encode the Locale in the data being sent to the server. XML makes this very easy with the inclusion of the xml:lang attribute, which can exist at any element level. xml:lang specifies the language for all of the content within the element to which it is applied, including any children. However, xml:lang values set on children override any xml:lang values set on parents. This allows for XML documents to contain data represented as appropriate for any number of languages, which is very flexible, but can be a headache in certain circumstances. You may want to restrict its usage internally.

Let’s say your XML data has a single highest-level element. Your clients will simply need to add an xml:lang attribute to it, containing the language and country codes of the client’s default Locale, separated by a '-' character. For example, a client running in Germany with a default Locale of "de_DE" will produce an xml:lang value of "de-DE".


Back on the server-side, obtaining the xml:lang value and parsing its values to create a Locale instance isn’t difficult. Then, simply use that Locale to create instances of any Locale-aware classes (Formatters, etc.) in the server code, and you should have no i18n-related issues on the server, regardless of where your clients or server are installed.
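
One way to sketch both halves of this exchange is with Locale.toLanguageTag() and Locale.forLanguageTag(). These methods were added in Java 7, so they are an assumption about the JRE in use; on older JREs the value would have to be split on the '-' character by hand. The class and method names (XmlLangLocale, toXmlLang, fromXmlLang) are invented for this example, and the XML handling itself is left out.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

class XmlLangLocale {

    // Client side: the value to place in the xml:lang attribute, e.g. "de-DE".
    static String toXmlLang(Locale locale) {
        return locale.toLanguageTag();
    }

    // Server side: turns an xml:lang value back into a Locale.
    static Locale fromXmlLang(String xmlLang) {
        return Locale.forLanguageTag(xmlLang);
    }

    public static void main(String[] args) throws ParseException {
        String tag = toXmlLang(Locale.GERMANY);
        Locale clientLocale = fromXmlLang(tag);

        // The server parses the client's data using the client's own rules
        Number amount = NumberFormat.getInstance(clientLocale).parse("1.000,50");

        // prints: de-DE -> de_DE -> 1000.5
        System.out.println(tag + " -> " + clientLocale + " -> " + amount);
    }
}
```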

I hope this overview of the Java Locale class has been helpful. We’ll be looking into other Java i18n-related classes in detail in future articles and will continue to demonstrate applications for their use.


Tim Stevens is a software engineer for LexisNexis, specializing in Java software development. He recently worked on a small team to develop a corporate standard for Java internationalization. Tim also contributes articles and reviews to a number of gaming-related sites and magazines.



