Focus on Standards

OAXAL: What Is It and Why Should I Care?
As we watch the ever-increasing adoption of XML in the publishing domain, it becomes more obvious that certain things were missing from the original, standard perspective. Proprietary solutions quickly appeared to plug the gaps, but they came with the commensurate drawbacks: a lack of openness and a lack of transparency. Fortunately, proprietary solutions are no longer the only option in the publishing domain. Open Standards are now being applied successfully to meet the challenges.
One of the most fundamental aspects of Open Standards is how well designed they are. It is not surprising: provide a group of involved industry experts with a democratic charter and peer and public review processes, and the result is usually a well-designed solution. I never cease to be surprised how much better Open Standards based solutions are than their proprietary equivalents. No wonder many people refer to RTF as “really terrible format!”

Open Standards also provide an example of IT best practice: You create a specification by talking to all of the interested parties, publish the results for public comment, and then print the results. Yes, rarely does the process go according to plan, but the nature of standards allows for revision and review in light of practical feedback. Rather like democracy, they have the ability for self-correction.

OAXAL (Open Architecture for XML Authoring and Localization Reference Model) is a newly founded OASIS reference architecture technical committee. It covers all aspects of technical publishing to allow the creation of open and effective solutions.

OAXAL – The Basics

One of the things that XML kicked off was a veritable explosion of Open Standards. The main reason for this is that XML provides the necessary extensible vocabulary. At the same time, there has been a dramatic reduction in communication costs, which allows for cheap and regular teleconferencing throughout the world and has made it much easier for people to collaborate worldwide on standards.

Let’s take a look at all of these standards in detail:

Unicode

Those of us who remember the 'bad old days' before Unicode praise it every day. The various illogical and contradictory encoding schemes that made up the 'Tower of Babel' that preceded Unicode were the cause of much grief to anyone involved in the translation of electronic documents.

XML

Where would we be without XML?
It has been a monumental standard that has given us the extensible language that was lacking previously. It was as if, finally, the IT industry was given a common language to enable everyone to talk to one another. It is not perfect, but rather like democracy, all of the alternatives are so much worse that anything else is not worth considering. Coming on the back of the lessons learned from SGML, it will remain for many years the fundamental building block of all sensible IT systems.

Interestingly enough, the adoption of XML in the publishing industry has been much slower than that in computer science in general. There have been many reasons for this, but with Open Office and the latest version of Microsoft Office, the final hurdles have been cleared.

Here are two very important tips to allow you to leverage XML for globalization-related tasks:
W3C ITS

ITS stands for Internationalization Tag Set. It allows for the declaration of Document Rules for localization. In effect, it provides a vocabulary that allows the declaration of the following for a given XML document type such as DITA:
W3C ITS provides much more, including a namespace vocabulary that allows for fine-tuning localization for individual instances of elements within a document instance. W3C ITS is therefore at the core of localization processing.

Standard XML Vocabularies

DITA, DocBook, XHTML, SVG: all of these standards dramatically reduce the cost of XML adoption. One of the factors that initially limited the adoption of XML was the high cost of implementation, since XML DTD and/or Schema definition is neither simple nor cheap. Not only can costs be reduced dramatically, but as is the case with DITA, these standard vocabularies and their associated tools and utilities often introduce key advances in the way we understand, build and use electronic documentation.

xml:tm

xml:tm (xml:text memory) is a key standard from LISA OSCAR. Think of xml:tm as the standard for tracking changes in a document. It allocates a unique identifier to each translatable sentence or standalone piece of text in an XML document. It is a core element of OAXAL, as it links all of the other standards into an elegant, integrated system. At the core of xml:tm are the following concepts, which together make up 'Text Memory':
You can think of Author Memory in terms of change tracking, but also as a way to ensure authoring consistency, a key concept in improving authoring quality and reducing translation costs. As far as Translation Memory (TM) is concerned, xml:tm introduces a revolutionary approach to document localization. It is very rare that a standard introduces such a fundamental change to an industry. Rather than separating memory from the document by storing all TM data away from the document in a relational database, xml:tm uses the document as the main repository with no duplication of data. This approach recognizes, fundamentally, that documents have a lifecycle, that within that lifecycle they evolve and change, and that at regular stages in that cycle they require translation.

SRX

SRX (Segmentation Rules eXchange) is the LISA OSCAR standard for defining and exchanging segmentation rules. SRX uses an XML vocabulary to define the segmentation rules for a given language and to specify all of the exceptions. SRX uses Unicode regular expressions to achieve this. The key benefit of SRX is not so much exchange as the ability to create industry-wide repositories for the segmentation rules for each language. To this end, companies such as Heartsome, Max Programs and LISA member XML-INTL have donated their own rule sets to LISA.

Unicode Technical Report 29

Unicode does not end with the encoding of character sets. The technical reports, which form part of the standard and are included as an annex, are equally important. TR29 stands out as the way to define what constitutes words, characters and punctuation. If you are writing a tokenizer for text, Unicode TR29 is where you start.

TMX

TMX (Translation Memory eXchange) is the original standard from LISA OSCAR. It helped break the monopoly that proprietary systems had over translation memory content. TMX allows customers to change systems and Language Service Providers without losing their TM assets.
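To make the portability point concrete, here is a minimal sketch of reading a TMX file with nothing but the Python standard library. It assumes the TMX 1.4 layout (`tu`, `tuv` and `seg` elements); the sample sentences, the `creationtool` value and the function name `read_translation_units` are hypothetical and invented for illustration only.

```python
import xml.etree.ElementTree as ET

# xml:lang belongs to the predefined XML namespace, so ElementTree
# exposes it under this fully qualified attribute name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A tiny, hypothetical TMX 1.4 document (sample content is invented).
SAMPLE_TMX = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Press the power button.</seg></tuv>
      <tuv xml:lang="fr"><seg>Appuyez sur le bouton d'alimentation.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Close the cover.</seg></tuv>
      <tuv xml:lang="fr"><seg>Fermez le couvercle.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_translation_units(tmx_text, source_lang, target_lang):
    """Return (source, target) text pairs for every translation unit
    that has a variant in both requested languages."""
    root = ET.fromstring(tmx_text.encode("utf-8"))
    pairs = []
    for tu in root.iter("tu"):
        variants = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline markup.
                variants[tuv.get(XML_LANG)] = "".join(seg.itertext())
        if source_lang in variants and target_lang in variants:
            pairs.append((variants[source_lang], variants[target_lang]))
    return pairs


if __name__ == "__main__":
    for source, target in read_translation_units(SAMPLE_TMX, "en", "fr"):
        print(source, "->", target)
```

Because the memory is plain XML, any tool that can parse XML can read it, which is what frees the content from a single vendor's system. A production importer would also need to handle inline formatting elements and language codes with regional variants, which this sketch ignores.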
GMX

GMX (Global information Management Metrics Exchange) is a three-part standard from LISA OSCAR that focuses on translation metrics. GMX/V defines what constitutes word and character counts, and allows for the exchange of metrics information within an XML vocabulary. Believe it or not, before GMX/V, there was no standard for word or character counts. GMX/V defines a canonical form for counting words and characters in a transparent and unambiguous way. The two associated standards, yet to be defined, will be GMX/C for complexity and GMX/Q for quality. Once the three GMX standards are available, they will provide a comprehensive way of defining a given localization task.

XLIFF

XLIFF (XML Localization Interchange File Format) is an OASIS standard for the exchange of data for translation. Rather than having to send full unprotected electronic documents for localization, with the inevitable problems of data and file corruption, XLIFF provides a lossless way of round-tripping text to be translated. Language Service Providers, rather than having to acquire or write filters for different file formats or XML vocabularies, merely have to be able to process XLIFF files, which can include translation memory matching, terminology, etc. Similarly, Computer Assisted Translation (CAT) tool providers have only one format to deal with, rather than a spectrum of original or proprietary exchange formats.

Putting it all together

All of the above-mentioned standards can now be put together in the following elegant architecture:

[Figure: the OAXAL architecture]

Why does this matter? The answer is simple, but not necessarily obvious at first glance. Prior to OAXAL, the typical workflow for a localization task looked as follows:

[Figure: typical localization workflow prior to OAXAL]

This is how the vast majority of localization tasks are conducted. Each arrow is a potential point of failure, as well as being very labor-intensive.
At the ASLIB conference in 2002, Professor Reinhard Schäler of the Localisation Research Centre at the University of Limerick presented the standard cost model for the Localization Industry as follows:

[Figure: standard cost model for the localization industry]

Over half of the cost of a localization task is consumed in project management costs. This is a very error-prone and labor-intensive way of doing things. With OAXAL, you can automate the complete workflow as follows:

[Figure: automated localization workflow with OAXAL]

This provides considerable cost savings and improves speed, efficiency and consistency. It also allows for a standard and consistent way of presenting the text to be translated via a browser interface, which further removes many manual processes. The current generation of Web 2.0 browsers allows for the creation of a fully functional translator workbench, including the ability to have multiple translators working on the same file, auto-propagation of matches within the file being worked upon, and the support of infinitely large files. Translators' work is constantly saved, as well as written to a translation memory database, for immediate availability.

The Proof

So much for the theory. None of this would be convincing without a reference implementation. Thankfully, OAXAL is backed up with a real-life, successful proof-of-concept product that is available on the market today: www.doczone.com. DocZone is a SaaS (Software-as-a-Service) solution for technical documentation. It is a comprehensive, web-based XML publishing solution that encompasses all aspects of XML authoring and localization and a full implementation of OAXAL. DocZone comprises an XML content management system, an XML editor and the XTM suite of author memory and CAT tools from XML-INTL. In addition, it breaks the mold of expensive XML publishing systems and provides a subscription or pay-per-use model to remove the usual inhibitors to implementation of such systems.
Up until this year, it would have been difficult to persuade people that an economical XML publishing solution would be available. Thanks to open standards within an OAXAL framework, DocZone has achieved this.

Andrzej Zydroń is a member of LISA OSCAR and Chair of the OASIS OAXAL Technical Committee (TC). He also sits on OASIS TCs for Translation Web Services, XLIFF and XLIFF segmentation, and serves as an invited expert to the W3C ITS TC. As CTO for XML-INTL Ltd., Zydroń is currently developing the next generation of XML-based text memory systems to reduce authoring and translation costs for documentation. Zydroń is fluent in English, Polish and French.
8-12 December 2008