Globalization of Voice Applications: It’s Only the Beginning! (part 2)
Installment 2 of 2
Globalizing software (creating software for multiple languages and locales), and the follow-on process of localization, is challenging enough for “normal” software products and not-too-complex web sites. However, when it comes to one of the “newest kids on the block,” voice-enabled applications, the fun really begins. There are only a handful of voice technology providers who have attempted to create globalized solutions, and Oracle Corporation is one of them. Recently, LISA interviewed Curtis Tuckey, Director, and Ashish Vora, Senior Speech Applications Engineer, at Oracle’s Voice Laboratory in Chicago in the U.S., to gain insight into their vision for voice application globalization. In installment one, the two men outlined Oracle’s voice applications strategy, as well as the business and technical challenges that lie ahead. In this second installment, in this issue of the Globalization Insider, they discuss industry standards, the shortcomings of current internationalization and localization practices for voice applications, how Oracle is addressing those shortcomings, and what vendors and LISA can do to prepare.
If you would like to meet Curtis Tuckey or Ashish Vora in person to increase your knowledge of voice-enabled applications, plan to attend their presentations at the LISA FORUM EUROPE: “Managing Content - Moving Markets: Streamlining Global Workflow Through Content Management,” to be held in London from June 30-July 3, 2003.
What standards exist in the voice applications industry? What groups are driving these standards?
Unfortunately, there are very few standards in voice applications development. Within the Internet application space, we have started to see an effort at standardization driven by the emergence of markup-driven application development languages such as VoiceXML. The VoiceXML specification actually incorporates aspects of several other specifications including Speech Recognition Grammar Specification (SRGS), Semantic Interpretation of SRGS (SI) and Speech Synthesis Markup Language (SSML). There is another proposal for a voice application development language called Speech Application Language Tags (SALT) that is being pushed by Microsoft. Additionally, there are a variety of standards and specifications for lower level, telephony-related issues, including Call Control XML (CCXML), Session Initiation Protocol (SIP), Parlay, Java APIs for Integrated Networks (JAIN), etc.
There are a variety of groups driving these standards efforts. Most of the VoiceXML efforts (as well as CCXML) are being driven by the W3C. The W3C is also actively exploring the creation of a new language for multimodal application development (applications with both visual and voice-based interfaces). VoiceXML will form a significant part of this new language. Microsoft’s SALT proposals are being driven by the SALT Forum. Many of the various telephony specifications have their own working groups helping to drive the definition of the specification. For example, SIP is being driven by the SIP Forum, Parlay is organized by the Parlay Group and JAIN is being led by the Java Community Process.
What standards does Oracle support in this field and why?
Oracle supports a number of standards in the field of voice applications. The Oracle9iAS platform supports all of the W3C proposals in the context of voice application development.
We are active participants in working groups related to VoiceXML Interoperability and Conformance. We feel strongly that VoiceXML provides a good model for application development and that there is a large enough development community behind the specification to ensure its success.
Why does Oracle view globalization as one of the critical driving factors in the adoption of voice-enabled applications?
Globalization is a driving factor in the adoption of voice-enabled applications quite simply because having more applications available in more languages increases the reach of any software offering. More specifically, there are two main reasons to treat globalization as a critical factor in voice applications:
What are the shortcomings of current internationalization/localization practices as applied to voice apps?
Without going into too much technical detail (please refer to Globalization of Voice Applications: Issues, Approaches and Challenges for the Future for a more in-depth treatment of this question), current internationalization/localization practices for screen-based applications within Oracle follow four main guidelines:
For voice application development, several of these guidelines fall short, namely the guidelines concerning freeform user input interactions and binary resource files, both of which are supposed to be minimized. Many interactions in voice applications tend to approximate freeform user input because there are often a variety of different inputs (synonyms) that map to a particular behavior. As voice applications become more sophisticated and make use of more conversational interfaces, this problem is exacerbated as it becomes necessary to do semantic evaluation of the input being passed to the application. As far as binary resource files go, many voice applications output content to users through professionally recorded audio files, which are analogous to image files in terms of their complexity for translation.
Beyond these shortcomings, voice applications also create certain new requirements. Foremost among these is the need to present all data to the user in a way that achieves maximum understandability. Because there is no visual or spatial awareness associated with a voice application, it is imperative that voice applications format content so that it is free of abbreviations and symbols that may have ambiguous pronunciations. This is especially true for various types of content that need to be presented in a locale-aware fashion – dates, times, currencies, etc. In screen-based applications, this content must be formatted according to the conventions of a particular locale, but often this simply affects the ordering of elements. The final representation of the information still relies on numeric and symbolic information that a user can interpret when viewing it, e.g., a string written as “6/2/2003” can be interpreted as a date that means the second day of June in the U.S. versus the sixth day of February in Great Britain. For voice applications, this level of formatting is insufficient.
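The “6/2/2003” case can be made concrete with a short sketch. The Python below is illustrative only and is not taken from any Oracle product: the function names (`expand_date`, `spoken_year`, `two_digits`) are hypothetical, only the `en-US` and `en-GB` locales are handled, and years are limited to 1000 through 2099. It shows how the same numeric string has to become two different fully spelled-out phrases before it can be spoken.

```python
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
ORDINALS = ["first", "second", "third", "fourth", "fifth", "sixth", "seventh",
            "eighth", "ninth", "tenth", "eleventh", "twelfth", "thirteenth",
            "fourteenth", "fifteenth", "sixteenth", "seventeenth",
            "eighteenth", "nineteenth", "twentieth", "twenty-first",
            "twenty-second", "twenty-third", "twenty-fourth", "twenty-fifth",
            "twenty-sixth", "twenty-seventh", "twenty-eighth", "twenty-ninth",
            "thirtieth", "thirty-first"]


def two_digits(n):
    """Spell out an integer from 1 to 99 (hypothetical helper)."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def spoken_year(year, locale):
    """Spell out a year the way it is usually read aloud in English."""
    if not 1000 <= year <= 2099:
        raise ValueError("year outside the range this sketch handles")
    century, rest = divmod(year, 100)
    if century == 20 and rest < 10:
        if rest == 0:
            return "two thousand"
        # British usage normally inserts "and"; American usage omits it.
        joiner = " and " if locale == "en-GB" else " "
        return "two thousand" + joiner + ONES[rest]
    if rest == 0:
        return two_digits(century) + " hundred"
    if rest < 10:
        return two_digits(century) + " oh " + ONES[rest]
    return two_digits(century) + " " + two_digits(rest)


def expand_date(text, locale):
    """Turn a numeric date such as '6/2/2003' into a speakable phrase."""
    first, second, year = (int(part) for part in text.split("/"))
    if locale == "en-US":
        month, day = first, second      # month/day/year
    elif locale == "en-GB":
        day, month = first, second      # day/month/year
    else:
        raise ValueError("locale not covered by this sketch: " + locale)
    d = date(year, month, day)          # raises ValueError on impossible dates
    year_words = spoken_year(d.year, locale)
    if locale == "en-US":
        return f"{MONTHS[d.month - 1]} {ORDINALS[d.day - 1]}, {year_words}"
    return f"the {ORDINALS[d.day - 1]} of {MONTHS[d.month - 1]}, {year_words}"


print(expand_date("6/2/2003", "en-US"))
print(expand_date("6/2/2003", "en-GB"))
```

Note that the locale changes more than the order of the fields: the connecting words (“the ... of ...”) and the reading of the year differ too, which is why reordering numeric fields is not enough for a voice interface.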
There is also a larger variety of platforms to write speech applications for than there is for screen-based Internet applications. If your goal is to write a truly portable voice application, this fact – combined with the variations in the implementation of the VoiceXML specification by different platform providers – presents a huge set of challenges. Even if you are only planning on running against a single VoiceXML browser, variations in the underlying ASR and TTS engines can cause your application to create a different user experience, or in the worst case, not work at all.
How is Oracle addressing these shortcomings?
Oracle has defined the Voice Globalization Framework to cover two main aspects: application output and application input. On the application output side, Oracle has put together two pieces of technology. The first of these is the Structured Datatype Expansion Framework (SDEF), which takes various primitive datatypes as input and formats each value as the fully spelled-out, correctly localized interpretation of that datatype. The SDEF allows application developers to write applications free of abbreviations and symbolic representations of data. Once content has been expanded correctly, the remaining task is to associate pre-recorded audio files with that content to achieve a professional-sounding interface. To accomplish this, Oracle has created the Concatenative Speech Server (CSS), a domain-specific text-to-speech synthesis system. Application developers create application- or domain-specific libraries that contain mappings between text strings and audio files. The CSS can then use these mappings to match strings of textual content and replace them with the matching audio file reference.
Voice application input presents its own set of challenges. Again, more detailed information on these issues is provided in the white paper.
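Returning briefly to the output side, the matching step attributed to the Concatenative Speech Server above can be sketched as a longest-match lookup over a phrase library. This is a minimal illustration in Python under stated assumptions, not Oracle's implementation: the `concatenate` function, the greedy longest-match strategy, the fallback to generic text-to-speech, and every file path in the sample library are hypothetical.

```python
def concatenate(text, library):
    """Split text into audio-file references and leftover text.

    `library` maps lower-case phrases to recorded audio files. Phrases are
    matched greedily, longest first; words with no recording are returned
    as ("tts", words) segments for a generic synthesizer to speak.
    """
    # Normalize: drop surrounding punctuation and ignore case.
    words = [w.strip(",.;:!?").lower() for w in text.split()]
    words = [w for w in words if w]
    longest = max((len(key.split()) for key in library), default=0)

    segments = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(longest, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in library:
                segments.append(("audio", library[phrase]))
                i += n
                break
        else:
            # No recording covers this word: merge it into a TTS segment.
            if segments and segments[-1][0] == "tts":
                segments[-1] = ("tts", segments[-1][1] + " " + words[i])
            else:
                segments.append(("tts", words[i]))
            i += 1
    return segments


# Hypothetical domain library; the paths are placeholders.
LIBRARY = {
    "your flight departs on": "prompts/departs_on.wav",
    "june": "months/june.wav",
    "second": "ordinals/02.wav",
    "two thousand three": "years/2003.wav",
}

for segment in concatenate(
        "Your flight departs on June second, two thousand three", LIBRARY):
    print(segment)
```

Matching the longest phrase first lets a whole recorded prompt win over its individual words, which is what keeps the output sounding like a single recording session rather than a string of isolated clips.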
Voice application input is facilitated through the use of grammars, which are codified representations of the words or phrases that a user may speak. There are a variety of grammars that require internationalization; for the initial version of the Voice Globalization Framework, Oracle decided to address one of these – the VoiceXML Builtin Grammars. In an effort to simplify voice application development, the VoiceXML specification defines certain input grammars for a handful of basic datatypes such as dates, times, numbers, digits, etc. Unfortunately, the behavior of these builtin grammars varies greatly from one VoiceXML implementation to another. Furthermore, the specification provides very little direction on how these grammars are to be handled for other languages. Therefore, in an effort to create some standardization around this, Oracle has created the Oracle Global Builtin Grammars (OGBG), which enforce a standard set of functionality on the builtin grammars, both across VoiceXML platforms and across languages.
There has been ongoing work in Natural Language Processing (NLP) for many years. In an ideal world, what should NLP technology be able to deliver for globalized voice-enabled applications?
In an ideal world, the promise of NLP is conversational voice interfaces with a minimal amount of effort required to constrain the types of input at application development time. Thus, an application developer could write an application without really knowing what a user might say, and the NLP processing engine would be able to recognize arbitrary speech and perform some useful instructions based on its recognition results. Unfortunately, the reality is that NLP is a really difficult problem, and we have yet to see it done in an effective way, even for English.
Expanding the complexity of this problem to many other languages only increases the challenges that NLP researchers face, but we certainly look forward to breakthroughs in this space in the years to come.
What recommendations can you provide to content creators and localization vendors to enable them to become preferred vendors to voice applications developers?
Here’s what they can do to prepare:
What can LISA do to help bridge the gap between all of the various stakeholders (platform providers, voice applications developers, NLP researchers, content and localization vendors, etc.)?
We think LISA is in an excellent position to help drive innovation among the different groups that interact with the voice application development process. In particular, we would like to see LISA take an active role in the following areas:
Are there additional requirements for support organizations when supporting global voice-enabled applications?
From a production standpoint, there are not any special requirements for supporting global voice-enabled applications. But when it comes to deployment of these applications, there can be some requirements placed on support organizations. For example, though no special configuration is required of the actual application server, a support organization will have to build up the voice gateway infrastructure in each of the target languages. Additionally, once an application goes live, it is necessary to have help desks trained on the various language-specific versions.
For more detailed technical information, please consult Globalization of Voice Applications: Issues, Approaches and Challenges for the Future, a white paper by Ashish Vora, available only on the LISA web site.
Curtis Tuckey is Director of the Voice Laboratory at Oracle Corporation. Before joining Oracle, he held various research and development positions at Motorola, Lucent Technologies, AT&T and General Motors. He holds a Ph.D. in mathematics from the University of Wisconsin and can be reached at curtis.tuckey@oracle.com.
Ashish Vora is a Senior Speech Applications Engineer in the Voice Laboratory at Oracle Corporation. He has developed a set of voice applications that ship with Oracle9i Application Server Wireless & Voice, co-authored an integration and acceptance process for voice gateway vendors and created an architecture to simplify the globalization of voice applications. He holds a B.S. degree in Computer Science from Stanford University and can be reached at ashish.vora@oracle.com.