TMX 2.0 Specification Draft

OSCAR Working Draft - October 15, 2007

This version:

http://www.heartsome.org/tmx/tmx-03282007.html

Latest version:

http://www.heartsome.org/tmx/tmx.html

Previous version:

http://www.lisa.org/standards/tmx/tmx.html

Editor:

Rodolfo M. Raya <rmraya@heartsome.net>

Previous Editors:

Yves Savourel
Alan K. Melby

Copyright © The Localisation Industry Standards Association [ LISA ] 1997-2007. All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to LISA.

The limited permissions granted above are perpetual and will not be revoked by LISA or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and LISA DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Abstract

This document defines version 2.0 of the Translation Memory eXchange format (TMX). The purpose of the TMX format is to provide a standard method to describe translation memory data that is being exchanged among tools and/or translation vendors, while introducing little or no loss of critical data during the process.

Status of this Document

This document constitutes an initial draft for discussion. Comments may be sent to tmx2@lisa.org.

Table of Contents


Abstract

1. Introduction
1.1. XML Compliance
1.2. Character Encoding
1.3. Extensibility
1.3.1. Extension Points

2. General Structure
2.1. Header
2.2. Body

3. Detailed Specifications
3.1. Elements
3.1.1. Structural Elements
3.1.2. Inline Elements
3.2. Attributes
3.2.1. TMX Attributes
3.2.2. XML Namespace Attributes

4. Content Markup
4.1. Overview
4.2. Selection Rules for Inline Elements

5. TMX Compliance
5.1. Validation of TMX Files

6. Changes Since Previous Version (Non-Normative)
6.1. Backwards Compatibility

Appendices
A. Sample Document
B. XML Schema for TMX
C. Glossary
D. References
Normative
Non-Normative


1. Introduction

TMX is defined in two parts:

  • A specification of the format of the container (the higher-level elements that provide information about the file as a whole and about entries). In TMX, an entry consisting of aligned segments of text in two or more languages is called a Translation Unit (the <tu> element).

  • A specification of a low-level meta-markup format for the content of a segment of translation-memory text. In TMX, an individual segment of translation-memory text in a particular language is denoted by a <seg> element. See the section on Content Markup for more details.

1.1. XML Compliance

TMX is XML-compliant. The TMX vocabulary is defined using an XML Schema (see Appendix B). It also uses various third-party standards for date/time and language codes. See the References section for more details.

TMX files are intended to be created automatically by export routines and processed automatically by import routines. TMX files are "well-formed" XML documents that can be processed without explicit reference to the TMX Schema. However, a "valid" TMX file must conform to the TMX Schema, and any suspicious TMX file should be verified against the TMX Schema using a validating XML parser.

Since XML syntax is case sensitive, any XML application must define casing conventions. All element and attribute names of TMX are defined in lowercase.

The namespace for TMX 2.0 is defined as "http://www.lisa.org/tmx20". For example, if you want to use TMX in another XML document, your document would look like this:

<?xml version="1.0"?>
<myformat xmlns:tmx="http://www.lisa.org/tmx20">
<data>
  <tmx:tmx version="2.0">
    <tmx:header ...
       ... TMX data ... 
    </tmx:body>      				
  </tmx:tmx>
</data>
</myformat>
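
As a rough illustration of how a consumer might locate TMX data embedded in another vocabulary, the following Python sketch searches a host document for elements in the TMX 2.0 namespace. The host format "myformat" and the empty header and body are placeholders, so the sample is well-formed but not a valid TMX document.

```python
import xml.etree.ElementTree as ET

TMX_NS = "http://www.lisa.org/tmx20"  # namespace stated in this section

# Hypothetical host document. The header attributes are omitted for brevity,
# so this sample is well-formed but would not validate against the schema.
SAMPLE = """<?xml version="1.0"?>
<myformat xmlns:tmx="http://www.lisa.org/tmx20">
<data>
  <tmx:tmx version="2.0">
    <tmx:header/>
    <tmx:body/>
  </tmx:tmx>
</data>
</myformat>"""


def find_embedded_tmx(xml_text):
    """Return every <tmx> element in the TMX 2.0 namespace, at any depth."""
    root = ET.fromstring(xml_text)
    return list(root.iter("{%s}tmx" % TMX_NS))


embedded = find_embedded_tmx(SAMPLE)
print(len(embedded), embedded[0].get("version"))
```

Matching on the namespace name rather than on the "tmx:" prefix keeps the lookup working when the host document binds the namespace to a different prefix.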
		

1.2. Character Encoding

TMX files are always in Unicode. They can use any of three encoding methods: UTF-16 (16-bit files), UTF-8 (8-bit files) or ISO-646 [a.k.a. US-ASCII] (7-bit files).

In all cases, unlike in HTML, only the following five character entity references are allowed: &amp; (&), &lt; (<), &gt; (>), &apos; ('), and &quot; ("). For 7-bit files, extended (non-ASCII) characters are always represented by numeric character references. For example: &#x0396; or &#918; for a GREEK CAPITAL LETTER ZETA.

Since all XML processors must accept the UTF-8 and UTF-16 encodings and since US-ASCII is a subset of UTF-8, a TMX document can omit the encoding declaration in the XML declaration.

Note that UTF-16 files always start with the Unicode byte-order-mark (BOM) value: U+FEFF.
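
The encoding rules above can be sketched in Python as follows. The helper names are illustrative only and are not part of TMX. The first helper rewrites non-ASCII characters as decimal numeric character references for a 7-bit file; it does not escape markup characters such as "&" and "<", which must be handled separately. The second checks for a UTF-16 byte-order mark.

```python
import codecs


def to_seven_bit(text):
    """Return text with every non-ASCII character as a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def starts_with_utf16_bom(data):
    """Return True if the byte string begins with a UTF-16 byte-order mark."""
    return data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))


print(to_seven_bit("\u0396"))                               # character U+0396
print(starts_with_utf16_bom("<tmx/>".encode("utf-16")))     # encoder adds a BOM
```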

1.3. Extensibility

Although TMX provides a rich set of elements for exchanging Translation Memory data, it may sometimes be necessary to extend the TMX vocabulary using XML Namespaces.

You can add non-TMX elements, as well as attributes and attribute values, to any TMX document. All foreign elements and attributes added to a TMX file must be defined using an XML Schema. All XML Schemas declared in a TMX document must be made available to permit validation of the foreign constructs included in the file.

Although TMX offers this extensibility mechanism, in order to avoid an excess of information and to increase interoperability between tools, it is strongly recommended to use TMX capabilities whenever possible, rather than to create non-standard user-defined elements or attributes.

Applications that depend on the TMX format for exchanging Translation Memory data are not required to understand and support non-TMX elements or attributes. A TMX application can safely ignore foreign elements or attributes present in a TMX document.

1.3.1. Extension Points

TMX supports the use of foreign XML elements in the following elements: <body>, <header>, <internal-file>, <tu> and <tuv>.

Foreign attributes can be added to any TMX element, provided that the attribute name is fully qualified with the corresponding namespace prefix.
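
A tool that chooses to ignore foreign elements needs a way to tell them apart from TMX elements. The following Python sketch does this by namespace; the extension namespace "urn:example:my-extension" and the <my:review> element are invented for the illustration.

```python
import xml.etree.ElementTree as ET

TMX_NS = "http://www.lisa.org/tmx20"

# Hypothetical translation unit carrying one foreign element.
SAMPLE = """<tu xmlns="http://www.lisa.org/tmx20" xmlns:my="urn:example:my-extension">
  <tuv xml:lang="en"><seg>Hello</seg></tuv>
  <tuv xml:lang="fr"><seg>Bonjour</seg></tuv>
  <my:review status="approved"/>
</tu>"""


def foreign_children(elem):
    """Return the direct children of elem that are not in the TMX namespace."""
    prefix = "{%s}" % TMX_NS
    return [child for child in elem if not child.tag.startswith(prefix)]


tu = ET.fromstring(SAMPLE)
foreign = foreign_children(tu)
print([child.tag for child in foreign])
```

An importer could skip the returned elements, or keep them untouched so that they survive a round trip.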


2. General Structure

A TMX document is enclosed in a <tmx> root element. The <tmx> element contains two elements: <header> and <body>.

2.1. Header

The <header> contains meta-data about the document. In addition to its attributes, <header> can also store document-level information in <note> and <prop> elements. The SRX segmentation rules used to generate a TMX file can be included in the <header> using a <segmentation> element. Inline codes extracted from <seg> elements and replaced with <g> or <x/> elements are stored in the <header> inside an <inline-data> element.

2.2. Body

The <body> contains the collection of translation units (the <tu> elements). This collection is in no specific order.

Each <tu> element contains at least two translation unit variants (the <tuv> element). Each <tuv> contains the segment and the information pertaining to that segment for a given language.

The text itself is stored in the <seg> element, while <note> and <prop> allow you to store information specific to each <tuv>.

A segment can contain markup content elements: The <bpt>, <ept>, <g>, <ph> and <x/> elements allow you to encapsulate or replace original native inline codes. The <hi> element allows you to add extra markup not related to existing inline codes. And the <sub> element, used inside encapsulated inline code, allows you to delimit embedded translatable text.

See the Sample Document section for an example of a TMX document.
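
The overall shape described in this section can be sketched with a small Python builder. This is only an illustration: the function name and all header attribute values are placeholders, and the TMX namespace declaration is omitted for brevity.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang


def build_tmx(pairs, srclang="en", tgtlang="fr"):
    """Build a <tmx> tree with one <tu> per (source, target) text pair."""
    tmx = ET.Element("tmx", {"version": "2.0"})
    # Required header attributes; every value here is a placeholder.
    ET.SubElement(tmx, "header", {
        "creationtool": "ExampleTool",
        "creationtoolversion": "1.0",
        "segtype": "sentence",
        "o-tmf": "x-example",
        "adminlang": "en",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx


doc = build_tmx([("Hello", "Bonjour")])
print(ET.tostring(doc, encoding="unicode"))
```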


3. Detailed Specifications

3.1. Elements

TMX elements are divided into two main categories: the structural elements (the container), and the inline elements (the content markup).

3.1.1. Structural Elements

The structural elements are the following:


<body>

Body - The <body> element encloses the main data, that is, the set of <tu> elements contained in the file.

Required attributes:

None.

Optional attributes:

None.

Contents:

Zero, one or more <tu> elements, followed by
Zero, one or more non-TMX elements.


<context>

Context Information - The <context> element describes the context of a <tu>. The purpose of this context information is to allow certain pieces of text to have different translations depending on where they came from. For example, the translation of a piece of text may differ depending on whether it appears in a web form, a dialog, an Oracle form or a Lotus form. This information is therefore needed by a translator working on the file. Likewise, it may be used by any tool that attempts to leverage the text automatically.

Required attributes:

context-type.

Optional attributes:

None.

Contents:

Text.


<external-file>

External file - The <external-file> element specifies the location of the actual SRX file being referenced. The required href attribute provides a URL to the file. The crc attribute accepts a value that can be used to assure the integrity of the file. The optional uid attribute allows a unique ID to be assigned to the file.

Required attributes:

href.

Optional attributes:

crc, uid.

Contents:

Empty.


<header>

File header - The <header> element contains information pertaining to the whole document.

Required attributes:

creationtool, creationtoolversion, segtype, o-tmf, adminlang, srclang, datatype.

Optional attributes:

o-encoding, creationdate, creationid, changedate, changeid.

Contents:

Zero, one or more <note> or <prop> elements in any order, followed by
Zero or one <inline-data> element, followed by
Zero or one <segmentation> element, followed by
Zero, one or more non-TMX elements.


<inline-data>

Inline data - The <inline-data> element holds the elements with the information necessary to rebuild the inline tags in a translated document.

Required attributes:

None.

Optional attributes:

None.

Contents:

One or more <tag> elements.


<internal-file>

Internal file - The <internal-file> element contains the actual SRX file with the segmentation rules used when generating the TMX document.

Required attributes:

None.

Optional attributes:

None.

Contents:

One SRX file embedded using SRX namespace.


<note>

Note - The <note> element is used for comments.

Required attributes:

None.

Optional attributes:

creationdate, creationid, changedate, changeid, o-encoding, xml:lang.

Contents:

Text.


<prop>

Property - The <prop> element is used to define the various properties of the parent element (or of the document when <prop> is used in the <header> element). These properties are not defined by the standard.

As your tool is fully responsible for handling the content of a <prop> element, you can use it in any way you wish. For example, the content can be a list of instructions that your tool can parse, not just plain text.

<prop name="user-defined">name:domain value:Computer science</prop>
<prop name="x-domain">Computer science</prop>

The <prop> element may be deprecated in future versions of the TMX standard. Use attributes defined in a namespace different from TMX instead. See the Extensibility section for more information.

Required attributes:

name.

Optional attributes:

xml:lang, o-encoding.

Contents:

Tool-specific data or text.


<seg>

Segment - The <seg> element contains the text of the given segment. There is no length limitation to the content of a <seg> element. All spacing and line-breaking characters are significant within a <seg> element.

Required attributes:

None.

Optional attributes:

xml:space.

Contents:

Text data (without leading or trailing white space characters),
Zero, one or more of the following elements: <bpt>, <ept>, <g>, <hi>, <ph> and <x/>.
They can be in any order, except that each <bpt> element must have a subsequent corresponding <ept> element.


<segmentation>

Segmentation - The <segmentation> element points to or contains the SRX segmentation rules that were used in the generation of the TMX file.

Required attributes:

None.

Optional attributes:

None.

Contents:

Either exactly one <internal-file> or one <external-file> element.


<tag>

Tag - The <tag> element contains the actual inline information represented with <g> and <x/> in <seg> elements.

Required attributes:

id, type.

Optional attributes:

endmrk, o-encoding.

Contents:

Code data.


<tmx>

TMX document - The <tmx> element encloses all the other elements of the document.

Required attributes:

version.

Contents:

One <header> followed by
One <body> element.


<tu>

Translation unit - The <tu> element contains the data for a given translation unit.

Required attributes:

None.

Optional attributes:

tuid, o-encoding, datatype, usagecount, lastusagedate, creationtool, creationtoolversion, creationdate, creationid, changedate, segtype, changeid, o-tmf, srclang, group, g-order.

Contents:

Zero, one or more <note>, <prop> or <context> elements in any order, followed by
Two or more <tuv> elements, followed by
Zero, one or more non-TMX elements.


<tuv>

Translation Unit Variant - The <tuv> element specifies text in a given language.

Required attributes:

xml:lang.

Optional attributes:

o-encoding, datatype, usagecount, lastusagedate, creationtool, creationtoolversion, creationdate, creationid, changedate, changeid, o-tmf, xml:space.

Contents:

Zero, one or more <note>, or <prop> elements in any order, followed by
One <seg> element, followed by
Zero, one or more non-TMX elements.


3.1.2. Inline Elements

The inline elements are the elements that can appear inside a segment. With the exception of the <hi> and <sub> elements, they all enclose or replace formatting or control codes that are not text but reside within the segment. See also the Content Markup section for more information.

The inline elements are the following:


<bpt>

Begin paired tag - The <bpt> element is used to delimit the beginning of a paired sequence of native codes. Each <bpt> has a corresponding <ept> element within the segment. A <bpt> element must contain a <sub> element if the matching <ept> does not contain one.

Required attributes:

i, type.

Optional attributes:

equiv-text, x.

Contents:

Code data,
Zero, one or more <sub> elements.


<ept>

End paired tag - The <ept> element is used to delimit the end of a paired sequence of native codes. Each <ept> has a corresponding <bpt> element within the segment. An <ept> element must contain a <sub> element if the matching <bpt> does not contain one.

Required attributes:

i.

Optional attributes:

equiv-text.

Contents:

Code data,
Zero, one or more <sub> elements.


<g>

Generic group placeholder - The <g> element is used to replace any inline code of the original document that has a beginning and an end, does not overlap other paired inline codes and can be moved within its parent structural element. The actual inline data is stored in <tag> elements in the header of the file. The required xid attribute is used to reference the <tag> element that contains the replaced code.

Required attributes:

xid, type.

Optional attributes:

equiv-text, x.

Contents:

Text data.


<hi>

Highlight - The <hi> element delimits a section of text that has special meaning, such as a terminological unit, a proper name, or an item that should not be modified. It can be used for various processing tasks: for example, to indicate to a Machine Translation tool which proper names should not be translated, to support terminology verification, or to mark suspect expressions after grammar checking.

Required attributes:

type.

Optional attributes:

x, comment.

Contents:

Text data,
Zero, one or more of the following elements: <bpt>, <ept>, <g>, <hi>, <ph> and <x/>.
They can be in any order, except that each <bpt> element must have a subsequent corresponding <ept> element.


<ph>

Placeholder - The <ph> element is used to delimit a sequence of native standalone codes in the segment, or the initial or ending portion of a paired tag that does not have its matching code within the segment, that contains embedded translatable text.

Required attributes:

type.

Optional attributes:

x, assoc, equiv-text.

Contents:

Code data,
One or more <sub> elements.


<sub>

Sub-flow - The <sub> element is used to delimit sub-flow text inside a sequence of native code, for example the definition of a footnote or the text of the title attribute in an HTML anchor element.

Here are some examples, in which the sub-flow text is the content of the <sub> element:

Footnote in RTF:

Original RTF:

Elephants{\cs16\super \chftn {\footnote \pard\plain \s15\widctlpar \f4\fs20
{\cs16\super \chftn } An elephant is a very large animal. }} are big.

TMX with content mark-up:

Elephants<ph type="fnote">{\cs16\super \chftn {\footnote \pard\plain \s15\widctlpar \f4\fs20
{\cs16\super \chftn } <sub type="fnote">An elephant is a very large animal. </sub>}}</ph> are big.

Index marker in RTF:

Original RTF:

Elephants{\pard\plain \widctlpar
\v\f4\fs20 {\xe {Big animal\bxe }}} are big.

TMX with content mark-up:

Elephants<ph type="index">{\pard\plain \widctlpar
\v\f4\fs20 {\xe {<sub type="index">Big animal </sub>\bxe }}}</ph> are big.

Text of an attribute in an HTML element:

Original HTML:

See the <A TITLE="Go to Notes "
HREF="notes.htm">Notes</A> for more details.

TMX with content mark-up:

See the <bpt i="1" type="link">&lt;A TITLE="<sub type="link">Go to Notes </sub>"
HREF="notes.htm"></bpt>Notes<ept i="1">&lt;/A></ept> for more details.

Note that sub-flows are related to segmentation and can cause interoperability issues when one tool keeps the sub-flow within its main segment while another extracts the sub-flow text as an independent segment.

Required attributes:

type.

Optional attributes:

datatype.

Contents:

Text data,
Zero, one or more of the following elements: <bpt>, <ept>, <g>, <ph>, <x/> and <hi>.
They can be in any order, except that each <bpt> element must have a subsequent corresponding <ept> element.


<x/>

Generic placeholder - The <x/> element is used to replace any inline code of the original document. The actual inline data is stored in <tag> elements in the header of the file. The required xid attribute is used to reference the <tag> element that contains the replaced code.

Required attributes:

xid, type.

Optional attributes:

equiv-text, x.

Contents:

Empty.
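
Because <g> and <x/> only carry a reference, a consumer has to look the original code up in the header. The following Python sketch checks that every xid value points at an existing <tag> element. The sample document is invented for the illustration, and its header omits the required attributes for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; header attributes are omitted for brevity.
SAMPLE = """<tmx version="2.0">
  <header>
    <inline-data>
      <tag id="t1" type="bold" endmrk="&lt;/b&gt;">&lt;b&gt;</tag>
      <tag id="t2" type="lb">&lt;br/&gt;</tag>
    </inline-data>
  </header>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>A <g xid="t1" type="bold">bold</g> word.<x xid="t2" type="lb"/></seg></tuv>
      <tuv xml:lang="fr"><seg>Un mot en <g xid="t1" type="bold">gras</g>.<x xid="t2" type="lb"/></seg></tuv>
    </tu>
  </body>
</tmx>"""


def unresolved_xids(tmx_root):
    """Return the xid values used by <g> or <x/> that match no <tag> id."""
    known = {tag.get("id") for tag in tmx_root.iter("tag")}
    used = [el.get("xid") for name in ("g", "x") for el in tmx_root.iter(name)]
    return sorted({xid for xid in used if xid is not None and xid not in known})


root = ET.fromstring(SAMPLE)
print(unresolved_xids(root))
```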

3.2. Attributes

This section lists the various attributes used in the TMX elements.

3.2.1. TMX Attributes


adminlang

Administrative language - Specifies the default language for the administrative and informative elements <note> and <prop>.

Value description:

A language code as described in [RFC 4646]. Unlike the other TMX attributes, the values for adminlang are not case-sensitive.

Default value:

Undefined.

Used in:

<header>.


assoc

Association - Indicates whether a <ph> is associated with the text before it, the text after it, or both.

Value description:

"p" (the element is associated with the text preceding the element), "f" (the element is associated with the text following the element), or "b" (the element is associated with the text on both sides).

Default value:

Undefined.

Used in:

<ph>.


changedate

Change date - Specifies the date of the last modification of the element.

Value description:

Date in [ISO 8601] Format. The recommended pattern to use is: YYYYMMDDThhmmssZ
Where: YYYY is the year (4 digits), MM is the month (2 digits), DD is the day (2 digits), hh is the hours (2 digits), mm is the minutes (2 digits), ss is the seconds (2 digits), and Z indicates the time is UTC time. For example:

date="20020125T210600Z"
is January 25, 2002 at 9:06pm GMT
is January 25, 2002 at 2:06pm US Mountain Time
is January 26, 2002 at 6:06am Japan time

Default value:

Undefined.

Used in:

<header>, <tu>, <tuv>.
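
The recommended date pattern maps directly onto a strptime format string. The following Python sketch parses and formats such values; the helper names are illustrative, and the two fixed offsets stand in for US Mountain Standard Time and Japan time in the example above.

```python
from datetime import datetime, timedelta, timezone

TMX_DATE_PATTERN = "%Y%m%dT%H%M%SZ"  # YYYYMMDDThhmmssZ


def parse_tmx_date(value):
    """Parse a TMX date string into a timezone-aware UTC datetime."""
    return datetime.strptime(value, TMX_DATE_PATTERN).replace(tzinfo=timezone.utc)


def format_tmx_date(moment):
    """Format a timezone-aware datetime using the recommended pattern."""
    return moment.astimezone(timezone.utc).strftime(TMX_DATE_PATTERN)


stamp = parse_tmx_date("20020125T210600Z")
mountain = stamp.astimezone(timezone(timedelta(hours=-7)))  # fixed UTC-7 offset
japan = stamp.astimezone(timezone(timedelta(hours=9)))      # fixed UTC+9 offset
print(mountain.isoformat(), japan.isoformat())
```

The same pattern applies to the other date attributes of this specification.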


changeid

Change identifier - Specifies the identifier of the user who modified the element last.

Value description:

Text.

Default value:

Undefined.

Used in:

<header>, <tu>, <tuv>.


comment

Comment - Holds a comment associated with the element.

Value description:

Text.

Default value:

Undefined.

Used in:

<hi>.


context-type

Context type - The context-type attribute specifies the context and the type of resource or style of the data of a given element. For example, to define if it is a label, or a menu item in the case of resource-type data, or the style in the case of document-related data.

Value description:

Text without spaces. Pre-defined values are as follows:

database

Indicates database content.

element

Indicates the content of an element within an XML document.

elementtitle

Indicates the name of an element within an XML document.

linenumber

Indicates the line number from the sourcefile (see context-type="sourcefile") where the source text is found.

numparams

Indicates the number of parameters contained within the source text.

paramnotes

Indicates notes pertaining to the parameters in the source text.

record

Indicates the content of a record within a database.

recordtitle

Indicates the name of a record within a database.

sourcefile

Indicates the original source file from which the TMX file is created.

In addition, user-defined values can be used with this attribute. A user-defined value must start with an "x-" prefix.

Default value:

Undefined.

Used in:

<context>.
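
The value rules for context-type can be sketched as a simple check in Python. The function name is illustrative; the set lists the pre-defined values given above.

```python
PREDEFINED_CONTEXT_TYPES = frozenset({
    "database", "element", "elementtitle", "linenumber", "numparams",
    "paramnotes", "record", "recordtitle", "sourcefile",
})


def is_valid_context_type(value):
    """Accept pre-defined values and user-defined values starting with "x-"."""
    if not value or any(ch.isspace() for ch in value):
        return False  # the value must be non-empty text without spaces
    return value in PREDEFINED_CONTEXT_TYPES or value.startswith("x-")


print(is_valid_context_type("sourcefile"), is_valid_context_type("dialog"))
```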


creationdate

Creation date - Specifies the date of creation of the element.

Value description:

Date in [ISO 8601] Format. The recommended pattern to use is: YYYYMMDDThhmmssZ
Where: YYYY is the year (4 digits), MM is the month (2 digits), DD is the day (2 digits), hh is the hours (2 digits), mm is the minutes (2 digits), ss is the seconds (2 digits), and Z indicates the time is UTC time. For example:

date="20020125T210600Z"
is January 25, 2002 at 9:06pm GMT
is January 25, 2002 at 2:06pm US Mountain Time
is January 26, 2002 at 6:06am Japan time

Default value:

Undefined.

Used in:

<header>, <tu>, <tuv>.


creationid

Creation identifier - Specifies the identifier of the user who created the element.

Value description:

Text.

Default value:

Undefined.

Used in:

<header>, <tu>, <tuv>.


creationtool

Creation tool - Identifies the tool that created the TMX document. Its possible values are not specified by the standard but each tool provider should publish the string identifier it uses.

Value description:

Text.

Default value:

Undefined.

Used in:

<header>, <tu>, <tuv>.


creationtoolversion

Creation tool version - Identifies the version of the tool that created the TMX document. Its possible values are not specified by the standard but each tool provider should publish the string identifier it uses.

Value description:

Text.

Default value:

Undefined.

Used in:

<header>, <tu>, <tuv>.


crc

Cyclic redundancy checking - A private value used to verify data as it is returned to the producer. The generation and verification of this number is tool-specific.

Value description:

Number (possibly not decimal).

Default value:

Undefined.

Used in:

<external-file>.


datatype

Data type - Specifies the type of data contained in the element. Depending on that type, you may apply different processes to the data.

Value description:

Text. The recommended values for the datatype attribute are as follows:

unknown

undefined (default)

alptext

WinJoust data.

cdf

Channel Definition Format.

cmx

Corel CMX Format.

cpp

C and C++ style text.

hptag

HP-Tag.

html

HTML, DHTML, etc.

interleaf

Interleaf documents.

ipf

IPF/BookMaster.

java

Java, source and property files.

javascript

JavaScript, ECMAScript scripts.

lisp

Lisp.

mif

Framemaker MIF, MML, etc.

opentag

OpenTag data.

pascal

Pascal, Delphi style text.

plaintext

Plain text.

pm

PageMaker.

resx

Windows .NET resources.

rtf

Rich Text Format.

sgml

SGML.

stf-f

S-Tagger for FrameMaker.

stf-i

S-Tagger for Interleaf.

transit

Transit data.

vbscript

Visual Basic scripts.

winres

Windows resources from RC, DLL, EXE.

xliff

XLIFF (XML Localization Interchange File Format).

xml

XML.

xptag

Quark XPressTag.

In addition, user-defined values can be used with this attribute. A user-defined value must start with an "x-" prefix.

Default value:

"unknown".

Used in:

<header>, <tu>, <tuv>, <sub>.


endmrk

End marker - The "endmrk" attribute contains the formatting code of a closing tag replaced by a <g> element.

Value description:

Text.

Used in:

<tag>.


equiv-text

Equivalent text - Indicates the equivalent text to substitute in place of an inline tag.

Value description:

Text.

Used in:

<bpt>, <ept>, <ph>, <g> and <x/>.


group

Group identifier - Indicates that a given <tu> element belongs to a logical group of related translation units.

Value description:

Text without spaces.

Used in:

<tu>.


g-order

Group order - Defines the order of the <tu> within a given logical group. Used together with the group attribute.

Value description:

Number starting at 1 and incremented in steps of 1. Must be unique within each logical group defined with the group attribute. Its initial value is reset to 1 in each logical group.

Used in:

<tu>.
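
The numbering rule for g-order can be checked mechanically. The following Python sketch takes (group, g-order) pairs read from <tu> elements and reports the groups whose values are not exactly 1, 2, ... n. The function name and the sample group names are illustrative.

```python
def badly_ordered_groups(units):
    """Return the groups whose g-order values are not exactly 1..n.

    units is an iterable of (group, g_order) pairs taken from <tu> elements.
    """
    by_group = {}
    for group, order in units:
        by_group.setdefault(group, []).append(int(order))
    return sorted(
        group
        for group, orders in by_group.items()
        if sorted(orders) != list(range(1, len(orders) + 1))
    )


print(badly_ordered_groups([("menu", "1"), ("menu", "3"), ("dialog", "1")]))
```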


i

Internal matching - The "i" attribute is used to pair <bpt> elements with <ept> elements. This mechanism lets TMX mark up ranges of codes that may overlap. Such constructions are not common, but several formats allow them. For example, the following HTML segment, even if not strictly legal, is accepted by some HTML editors and usually interpreted correctly by browsers:

[----------------------------]
<B>Bold <I>Bold and Italic</B> Italics</I>
        [--------------------------------]

With the TMX content mark-up, since the <ept> element does not have a type, it can be difficult to know which sequence of codes it closes as illustrated by the following segment:

TMX (with incomplete content mark-up):

<bpt> &lt;B></bpt> Bold,
<bpt> &lt;I></bpt> Bold+Italic<ept> &lt;/B></ept> ,
Italic<ept> &lt;/I></ept>

The attribute i is used to specify which <ept> is closing which <bpt>:

TMX (with correct content mark-up):

<bpt i="1" x="1" type="bold"> &lt;B></bpt> Bold,
<bpt i="2" x="1" type="italic"> &lt;I></bpt> Bold+Italic<ept i="1"> &lt;/B></ept> ,
Italic<ept i="2"> &lt;/I></ept>

Value description:

Number starting at 1 and incremented in steps of 1. Must be unique for each <bpt> within a given <seg> element. Its initial value is reset to 1 in every <seg> element.

Default value:

Undefined.

Used in:

<bpt>, <ept>.
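
The pairing rule can be sketched as a small checker in Python. It walks one <seg> in document order and reports <ept> elements without a preceding <bpt>, repeated "i" values, and <bpt> elements that are never closed. The function name and message wording are illustrative.

```python
import xml.etree.ElementTree as ET

# The overlapping bold/italic segment from this section, with "i" values.
SEG = (
    '<seg><bpt i="1" type="bold">&lt;B&gt;</bpt>Bold, '
    '<bpt i="2" type="italic">&lt;I&gt;</bpt>Bold+Italic'
    '<ept i="1">&lt;/B&gt;</ept>, Italic<ept i="2">&lt;/I&gt;</ept></seg>'
)


def pairing_errors(seg):
    """Return a list of problems with the "i" attributes in one <seg>."""
    errors = []
    seen = set()      # every "i" value used by a <bpt> so far
    open_ids = set()  # <bpt> values still waiting for their <ept>
    for el in seg.iter():  # iter() visits elements in document order
        i = el.get("i")
        if el.tag == "bpt":
            if i in seen:
                errors.append("duplicate bpt i=%s" % i)
            seen.add(i)
            open_ids.add(i)
        elif el.tag == "ept":
            if i in open_ids:
                open_ids.remove(i)
            else:
                errors.append("ept i=%s has no preceding bpt" % i)
    errors.extend("bpt i=%s is never closed" % i for i in sorted(open_ids, key=str))
    return errors


print(pairing_errors(ET.fromstring(SEG)))
```

Because pairs are matched by value rather than by nesting, overlapping ranges such as the one above pass the check.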


href

Hypertext reference - The "href" attribute contains a valid URL that describes the location of a file.

Value description:

Text.

Default value:

Undefined.

Used in:

<external-file>.


id

Identifier - The "id" attribute is used in <tag> elements as unique identifier. The value of the "id" attribute is determined by the tool creating the TMX file and must be unique within the document.

Value description:

Text, matching the Name production as required by the ID attribute type in the XML standard.

Default value:

Undefined.

Used in:

<tag>.


lastusagedate

Last usage date - Specifies the last time the content of a <tu> or <tuv> element was used in the original translation memory environment.

Value description:

Date in [ISO 8601] Format. The recommended pattern to use is: YYYYMMDDThhmmssZ
Where: YYYY is the year (4 digits), MM is the month (2 digits), DD is the day (2 digits), hh is the hours (2 digits), mm is the minutes (2 digits), ss is the seconds (2 digits), and Z indicates the time is UTC time. For example:

date="20020125T210600Z"
is January 25, 2002 at 9:06pm GMT
is January 25, 2002 at 2:06pm US Mountain Time
is January 26, 2002 at 6:06am Japan time

Default value:

Undefined.

Used in:

<tu>, <tuv>.


name

Property name - Tool-specific name used to identify the type of a <prop> element.

Value description:

Text.

Default value:

Undefined.

Used in:

<prop>.


o-encoding

Original encoding - As stated in the Character Encoding section, all TMX documents are in Unicode. However, it is sometimes useful to know what code set was used to encode text that was converted to Unicode for purposes of interchange. The o-encoding attribute specifies the original or preferred code set of the data of the element in case it is to be re-encoded in a non-Unicode code set.

Value description:

One of the [IANA] recommended "charset identifiers", if possible.

Default value:

Undefined.

Used in:

<header>, <tag>, <tu>, <tuv>, <note>, <prop>.


o-tmf

Original translation memory format - Specifies the format of the translation memory file from which the TMX document or segment thereof has been generated.

Value description:

Text.

Default value:

Undefined.

Used in:

<header>, <tu>, <tuv>.


segtype

Segment type - Specifies the kind of segmentation used in the <tu> element. If a <tu> element does not have a segtype attribute specified, it uses the one defined in the <header> element.

The "block" value is used when the segment does not correspond to one of the other values, for example when you want to store a chapter composed of several paragraphs in a single <tu>.

<tu segtype="block">
<prop name="x-sentbreak">$#$</prop>
<tuv xml:lang="en"><seg>This is the first paragraph of a big section.$#$
This is the second paragraph.$#$This is the third.</seg></tuv>
</tu>

In the example above the property "x-sentbreak" defines the token used to indicate the separation between sentences within the block of text. You can therefore easily break down the segment into smaller units if needed. You can imagine many other ways to use this mechanism.
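
Breaking such a block down into smaller units can be sketched in Python as follows. The sketch assumes the separator token is stored in a <prop> identified through the name attribute that the <prop> definition requires, and that the segment contains plain text only (no inline elements). The function name is illustrative.

```python
import xml.etree.ElementTree as ET

TU = """<tu segtype="block">
<prop name="x-sentbreak">$#$</prop>
<tuv xml:lang="en"><seg>This is the first paragraph of a big section.$#$
This is the second paragraph.$#$This is the third.</seg></tuv>
</tu>"""


def split_block(tu):
    """Split the text of the first <seg> on the tool-specific x-sentbreak token."""
    token = next(
        (prop.text for prop in tu.findall("prop") if prop.get("name") == "x-sentbreak"),
        None,
    )
    text = tu.find("tuv/seg").text or ""
    if not token:
        return [text]  # no separator declared: keep the block whole
    return [part.strip() for part in text.split(token)]


parts = split_block(ET.fromstring(TU))
print(parts)
```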

A TMX file can include sentence level segmentation for maximum portability, so it is recommended that you use such segmentation rather than a specific, proprietary method like the one above.

The rules on how the text was segmented can be carried in a Segmentation Rules eXchange (SRX) document.

Value description:

"block", "paragraph", "sentence", or "phrase".

Default value:

Undefined.

Used in:

<header>, <tu>.


srclang

Source language - Specifies the language of the source text. In other words, the <tuv> holding the source segment has its xml:lang attribute set to the same value as srclang (unless srclang is set to "*all*"). If a <tu> element does not specify a srclang attribute, it uses the one defined in the <header> element.

Value description:

A language code as described in [RFC 4646], or the value "*all*" if any language can be used as the source language. Unlike the other TMX attributes, the values for srclang are not case-sensitive.

Default value:

Undefined.

Used in:

<header>, <tu>.
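
The inheritance and case rules for srclang can be sketched in Python as follows. The function name and the sample elements are illustrative; the sample header omits its other required attributes for brevity.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def source_variant(tu, header):
    """Return the <tuv> holding the source segment, or None.

    None is returned when srclang is "*all*" or when no <tuv> matches.
    """
    srclang = tu.get("srclang") or header.get("srclang")  # <tu> overrides <header>
    if srclang is None or srclang.lower() == "*all*":
        return None
    for tuv in tu.findall("tuv"):
        # srclang values are not case-sensitive, so compare in lowercase
        if (tuv.get(XML_LANG) or "").lower() == srclang.lower():
            return tuv
    return None


header = ET.fromstring('<header srclang="en-US"/>')
tu = ET.fromstring(
    '<tu><tuv xml:lang="EN-us"><seg>Hello</seg></tuv>'
    '<tuv xml:lang="fr"><seg>Bonjour</seg></tuv></tu>'
)
print(source_variant(tu, header).find("seg").text)
```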


tuid

Translation unit identifier - Specifies an identifier for the <tu> element. Its value must be unique within the file.

Value description:

Text without spaces.

Default value:

Undefined.

Used in:

<tu>.
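
The uniqueness requirement for tuid can be verified with a short Python sketch. The function name is illustrative, and the sample <tu> elements are left empty for brevity, so the sample is not a valid TMX body.

```python
import xml.etree.ElementTree as ET
from collections import Counter

BODY = '<body><tu tuid="a1"/><tu tuid="a2"/><tu tuid="a1"/><tu/></body>'


def duplicate_tuids(body):
    """Return the tuid values that occur on more than one <tu> element."""
    counts = Counter(
        tu.get("tuid") for tu in body.iter("tu") if tu.get("tuid") is not None
    )
    return sorted(tuid for tuid, count in counts.items() if count > 1)


duplicates = duplicate_tuids(ET.fromstring(BODY))
print(duplicates)
```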


type

Type - Specifies the kind of data a <bpt>, <g>, <hi>, <ph>, <sub> or <x/> element represents.

Value description:

Text. Depends on the element where the attribute is used.

The recommended values for the type attribute, when used in <bpt> and <g>, are as follows:

bold

Bold.

color

Color change.

dulined

Double-underlined.

font

Font change.

italic

Italic.

link

Linked text.

scap

Small caps.

struct

XML/SGML structure.

ulined

Underlined.

xliff-bpt

XLIFF <bpt> tag.

xliff-g

XLIFF <g> tag.

The recommended values for the type attribute, when used in <ph> and <x/>, are as follows:

index

Index marker.

date

Date.

time

Time.

fnote

Footnote.

enote

End-note.

alt

Alternate text.

image

Image.

pb

Page break.

lb

Line break.

cb

Column break.

inset

Inset.

xliff-bx

XLIFF <bx/> tag.

xliff-ex

XLIFF <ex/> tag.

xliff-it

XLIFF <it> tag.

xliff-ph

XLIFF <ph> tag.

xliff-x

XLIFF <x/> tag.

The recommended values for the type attribute, when used in <hi>, are as follows:

abbrev

Indicates the marked text is an abbreviation.

abbreviated-form

ISO-12620 2.1.8: A term resulting from the omission of any part of the full term while designating the same concept.

abbreviation

ISO-12620 2.1.8.1: An abbreviated form of a simple term resulting from the omission of some of its letters (e.g. 'adj.' for 'adjective').

acronym

ISO-12620 2.1.8.4: An abbreviated form of a term made up of letters from the full form of a multi-word term strung together into a sequence pronounced only syllabically (e.g. 'radar' for 'radio detecting and ranging').

appellation

ISO-12620: A proper-name term, such as the name of an agency or other proper entity.

collocation

ISO-12620 2.1.18.1: A recurrent word combination characterized by cohesion in that the components of the collocation must co-occur within an utterance or series of utterances, even though they do not necessarily have to maintain immediate proximity to one another.

common-name

ISO-12620 2.1.5: A synonym for an international scientific term that is used in general discourse in a given language.

datetime

Indicates the marked text is a date and/or time.

equation

ISO-12620 2.1.15: An expression used to represent a concept based on a statement that two mathematical expressions are, for instance, equal as identified by the equal sign (=), or assigned to one another by a similar sign.

expanded-form

ISO-12620 2.1.7: The complete representation of a term for which there is an abbreviated form.

formula

ISO-12620 2.1.14: Figures, symbols or the like used to express a concept briefly, such as a mathematical or chemical formula.

head-term

ISO-12620 2.1.1: The concept designation that has been chosen to head a terminological record.

initialism

ISO-12620 2.1.8.3: An abbreviated form of a term consisting of some of the initial letters of the words making up a multi-word term or the term elements making up a compound term when these letters are pronounced individually (e.g. 'BSE' for 'bovine spongiform encephalopathy').

international-scientific-term

ISO-12620 2.1.4: A term that is part of an international scientific nomenclature as adopted by an appropriate scientific body.

internationalism

ISO-12620 2.1.6: A term that has the same or nearly identical orthographic or phonemic form in many languages.

logical-expression

ISO-12620 2.1.16: An expression used to represent a concept based on mathematical or logical relations, such as statements of inequality, set relationships, Boolean operations, and the like.

materials-management-unit

ISO-12620 2.1.17: A unit used to track an object.

name

Indicates the marked text is a name.

near-synonym

ISO-12620 2.1.3: A term that represents the same or a very similar concept as another term in the same language, but for which interchangeability is limited to some contexts and inapplicable in others.

part-number

ISO-12620 2.1.17.2: A unique alphanumeric designation assigned to an object in a manufacturing system.

phrase

Indicates the marked text is a phrase.

phraseological-unit

ISO-12620 2.1.18: Any group of two or more words that form a unit, the meaning of which frequently cannot be deduced based on the combined sense of the words making up the phrase.

protected

Indicates the marked text should not be translated.

romanized-form

ISO-12620 2.1.12: A form of a term resulting from an operation whereby non-Latin writing systems are converted to the Latin alphabet.

set-phrase

ISO-12620 2.1.18.2: A fixed, lexicalized phrase.

short-form

ISO-12620 2.1.8.2: A variant of a multi-word term that includes fewer words than the full form of the term (e.g. 'Group of Twenty-four' for 'Intergovernmental Group of Twenty-four on International Monetary Affairs').

sku

ISO-12620 2.1.17.1: Stock keeping unit, an inventory item identified by a unique alphanumeric designation assigned to an object in an inventory control system.

standard-text

ISO-12620 2.1.19: A fixed chunk of recurring text.

symbol

ISO-12620 2.1.13: A designation of a concept by letters, numerals, pictograms or any combination thereof.

synonym

ISO-12620 2.1.2: Any term that represents the same or a very similar concept as the main entry term in a term entry.

synonymous-phrase

ISO-12620 2.1.18.3: Phraseological unit in a language that expresses the same semantic content as another phrase in that same language.

term

Indicates the marked text is a term.

transcribed-form

ISO-12620 2.1.11: A form of a term resulting from an operation whereby the characters of one writing system are represented by characters from another writing system, taking into account the pronunciation of the characters converted.

transliterated-form

ISO-12620 2.1.10: A form of a term resulting from an operation whereby the characters of an alphabetic writing system are represented by characters from another alphabetic writing system.

truncated-term

ISO-12620 2.1.8.5: An abbreviated form of a term resulting from the omission of one or more term elements or syllables (e.g. 'flu' for 'influenza').

variant

ISO-12620 2.1.9: One of the alternate forms of a term.

Any of the suggested values listed in the tables above can be used with the <sub> element.

The values listed for <bpt>/<g> and <ph>/<x/> can be used with <tag>.

In addition, user-defined values can be used with this attribute. A user-defined value must start with an "x-" prefix.

Default value:

Undefined.

Used in:

<prop>, <bpt>, <ph>, <hi>, <sub>, <x/>.
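
As an illustration of how a tool might distinguish recommended values from user-defined ones, the following sketch classifies a type value for <bpt>, <g>, <ph> and <x/>. The function name classify_type and the three result labels are hypothetical; the user-defined pattern mirrors the "Custom" type in the schema of Appendix B, and the <hi> value list is omitted for brevity.

```python
import re

# Recommended values copied from the lists above.
PAIRED = {"bold", "color", "dulined", "font", "italic", "link", "scap",
          "struct", "ulined", "xliff-bpt", "xliff-g"}
PLACEHOLDER = {"index", "date", "time", "fnote", "enote", "alt", "image",
               "pb", "lb", "cb", "inset", "xliff-bx", "xliff-ex",
               "xliff-it", "xliff-ph", "xliff-x"}
RECOMMENDED = {"bpt": PAIRED, "g": PAIRED, "ph": PLACEHOLDER, "x": PLACEHOLDER}

# User-defined values: "x-" followed by one or more non-space characters.
CUSTOM = re.compile(r"x-[^\s]+")

def classify_type(element, value):
    """Return 'recommended', 'user-defined' or 'other' for a type value."""
    if element not in RECOMMENDED:
        raise ValueError(f"element not covered by this sketch: {element!r}")
    if value in RECOMMENDED[element]:
        return "recommended"
    if CUSTOM.fullmatch(value):
        return "user-defined"
    return "other"
```

For example, classify_type("ph", "x-title") yields "user-defined", while a value such as "lb" on a <g> element is reported as "other" because it is only recommended for <ph> and <x/>.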


uid

Unique ID - The "uid" attribute is used to provide a unique ID to identify the file that contains the segmentation rules used when generating the TMX document.

Value description:

Text.

Default value:

Undefined.

Used in:

<external-file>.


usagecount

Usage count - Specifies the number of times a <tu> or the content of the <tuv> element has been accessed in the original TM environment.

Value description:

Number.

Default value:

Undefined.

Used in:

<tu>, <tuv>.


version

TMX version - The version attribute indicates the version of the TMX format to which the document conforms.

Value description:

Fixed text: the major version number, a period, and the minor version number. For example: version="2.0".

Default value:

"2.0"

Used in:

<tmx>.


x

External matching - The x attribute is used to match the inline elements <bpt>, <ph>, <g>, <x/> and <hi> between each <tuv> element of a given <tu> element. This mechanism facilitates the pairing of allied codes in source and target text, even if the order of code occurrence differs between the two because of the translation syntax. Note that an <ept> element is matched based on the x attribute of its corresponding <bpt> element.

For example:

<seg>link to <bpt i="1" type="link" x="1">&lt;a href="www.mysite.com" title="<sub type="x-title">my site</sub>"&gt;</bpt>my web site<ept i="1">&lt;/a&gt;</ept>, and this is <ph type="image" x="2">&lt;img src="john.gif" alt="<sub type="alt">John's picture</sub>"/&gt;</ph> John.</seg>

<seg>enlace a <bpt i="1" type="link" x="1">&lt;a href="www.mysite.com/es" title="<sub type="x-title">mi sitio</sub>"&gt;</bpt>mi sitio web<ept i="1">&lt;/a&gt;</ept>, y este es <ph type="image" x="2">&lt;img src="juan.gif" alt="<sub type="alt">foto de Juan</sub>"/&gt;</ph> Juan.</seg>

Value description:

Number starting at 1 and incremented in steps of 1. Must be unique within a given <seg> element. Numbering restarts at 1 in every <seg> element.

Default value:

Undefined.

Used in:

<bpt>, <ph>, <g>, <x/>, <hi>.
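
The matching mechanism can be sketched as a lookup keyed on the x value. The code below is an illustrative assumption about how a tool might use the attribute, not a prescribed algorithm; index_by_x and pair_inline are hypothetical names, and the sample segments omit the TMX namespace.

```python
import xml.etree.ElementTree as ET

# Elements that may carry the x attribute (see "Used in" above).
MATCHABLE = {"bpt", "ph", "g", "x", "hi"}

def index_by_x(seg):
    """Map each x value found in a <seg> to the inline element carrying it."""
    index = {}
    for el in seg.iter():
        x = el.get("x")
        if el.tag in MATCHABLE and x is not None:
            if x in index:
                # x must be unique within a given <seg>
                raise ValueError(f"duplicate x={x!r} in <seg>")
            index[x] = el
    return index

def pair_inline(source_seg, target_seg):
    """Pair inline elements of two <seg> elements that share an x value."""
    src, tgt = index_by_x(source_seg), index_by_x(target_seg)
    return {x: (src[x], tgt[x]) for x in src if x in tgt}

# The inline codes appear in a different order in the translation.
source = ET.fromstring(
    '<seg><g xid="t1" type="bold" x="1">Bold</g> then '
    '<x xid="t2" type="lb" x="2"/> break</seg>')
target = ET.fromstring(
    '<seg><x xid="t2" type="lb" x="2"/> salto, luego '
    '<g xid="t1" type="bold" x="1">negrita</g></seg>')
pairs = pair_inline(source, target)
```

Because pairing relies on x rather than on position, the <g> element in the source is matched with the <g> element in the target even though their order is reversed.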


xid

External Identifier - The "xid" attribute is used in <g> or <x/> elements to reference the id attribute of the <tag> element that contains the original corresponding code data or format replaced by the given element.

Value description:

The value of the referenced id.

Default value:

Undefined.

Used in:

<g> and <x/>.

3.2.2. XML Namespace Attributes
xml:lang

Language - The "xml:lang" attribute specifies the locale of the text of a given element.

Value description:

A language code as described in [RFC 4646]. This declared value is considered to apply to all elements within the content of the element where it is specified, unless overridden with another instance of the xml:lang attribute. Unlike the other TMX attributes, the values for xml:lang are not case-sensitive. For more information see the section on xml:lang in the XML specification.

Default value:

Undefined.

Used in:

<tuv>, <note>, <prop>.


xml:space

White space - The "xml:space" attribute specifies how white space (ASCII spaces, tabs and line breaks) should be treated.

Value description:

default or preserve. The value default signals that an application's default white-space processing modes are acceptable for this element; the value preserve indicates the intent that applications preserve all the white space. This declared intent is considered to apply to all elements within the content of the element where it is specified, unless overridden with another instance of the xml:space attribute. For more information see the section on xml:space in the XML specification.

Default value:

default.

Used in:

<seg>


4. Content Markup

4.1. Overview

Each TM system uses a different method of marking up the formatting. Formats are constantly evolving, and new formats will be introduced on a regular basis. Attempting to collect, interpret, disseminate and maintain finite descriptions of each formatting tag used at any given time by any of the TM systems is not possible.

The best way to deal with these native codes is to delimit them by a specific set of elements that convey where they begin and end, and possibly additional information about what they are (bold, italic, footnote, etc.).

Native codes can be grouped into three categories:

  • Codes that either begin or end an instruction, and whose beginning and ending functions both appear within a single segment. For example, an instruction to begin bold formatting for a range of words, followed in the same segment by an instruction to end bold formatting.

  • Codes that either begin or end an instruction, but whose beginning and ending functions are not both contained within a single segment. For example, an instruction to embolden text may apply to the first three sentences in a paragraph, but the instruction to turn off bolding may only appear at the end of the third sentence. Its beginning instruction is present in the first segment, while its closing tag is present in the third segment.

  • Codes that represent self-contained functions that do not require explicit ending instructions. Image and cross-reference tokens are examples of these standalone codes, as are codes that have unknown behavior.

Content markup can also be classified, using a different point of view, in two categories:

  • Native codes that contain embedded translatable text. For example, the "alt" attribute used in links and images in HTML documents.

  • Pure native code. For example, the <br/> tag in HTML.

The <sub> element is provided to delimit sub-flow text within a sequence of native codes. For instance, if the text content of a footnote is defined within the footnote marker code, it may be delimited with the <sub> element.

4.2. Selection Rules for Inline Elements

Combining the two classification criteria listed in the previous sub-section, the rules for selecting the inline tags used to mark up each category of native code sequences are:

  1. Use <bpt> and <ept> elements to enclose paired sequences of native code that begin and end within the <seg> element and contain translatable text in either the initial or final sequence, requiring the use of a <sub> element.

  2. Use a <g> element to replace paired native codes that begin and end within the segment and don't contain translatable text. The replaced sequences of native codes are stored in a <tag> element in the <inline-data> element of the <header> of the TMX document.

  3. Use a <ph> element to enclose a standalone sequence of native code, or a paired code isolated from its partner, that contains translatable text and requires the use of a <sub> element.

  4. Use an <x/> element to replace any standalone sequence of native code, or paired code isolated from its partner, that doesn't contain translatable text. The replaced sequences of native codes are stored in a <tag> element in the <inline-data> element of the <header> of the TMX document.
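
The four rules above reduce to a decision on two properties of a native code sequence. The following sketch expresses that decision; the function and parameter names are hypothetical, and a paired code isolated from its partner is treated as not paired within the segment.

```python
def select_inline_element(paired_in_segment: bool, has_translatable_text: bool) -> str:
    """Pick the inline element for a native code sequence.

    paired_in_segment: True when the code begins and ends within the <seg>.
    has_translatable_text: True when the code embeds text needing a <sub>.
    """
    if paired_in_segment:
        # Rules 1 and 2
        return "bpt/ept" if has_translatable_text else "g"
    # Rules 3 and 4: standalone codes and isolated halves of a pair
    return "ph" if has_translatable_text else "x"

# An HTML link with a translatable title attribute (rule 1)
link_choice = select_inline_element(True, True)     # "bpt/ept"
# An HTML <br/> tag (rule 4)
break_choice = select_inline_element(False, False)  # "x"
```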

Examples:

  1. Paired codes containing translatable text

    Source text:

    <p>link to <a href="www.mysite.com" title="My Site">my web site</a>.</p>

    Text with content markup:

    <seg>link to <bpt i="0" type="link">&lt;a href="www.mysite.com" title="<sub 
    type="x-title">My Site</sub>"&gt;</bpt>my web site<ept i="0">&lt;/a&gt;</ept>.</seg>
  2. Paired codes without translatable text

    Source text:

    Text in {\i italics}.

    Text with content markup:

    ...
     <inline-data>
      <tag id="id2345" endmrk="}" type="italic">{\i </tag>
      ...
     </inline-data>
    </header>
    ...
    <seg>Text in <g xid="id2345" type="italic">italics</g>.</seg>
    ...
  3. Standalone sequence with translatable text

    Source text:

    ...
    This is <img src="john.gif" alt="John's picture"/> John.
    ...

    Text with content markup:

    ...
    <seg>This is <ph type="image">&lt;img src="john.gif" alt="<sub type="alt">John's picture</sub>"/&gt;</ph> John.</seg>
    ...
  4. Standalone sequence without translatable text

    Source text:

    text displayed in <br/> two lines.

    Text with content markup:

    ...
    <inline-data>
       <tag id="id457" type="lb">&lt;br/&gt;</tag>
    ...
    <seg>text displayed in <x xid="id457" type="lb"/> two lines.</seg>
    ...
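
To show how the native codes replaced by <g> and <x/> can be recovered from the <tag> elements, here is a minimal sketch based on examples 2 and 4. It assumes that the content of <tag> holds the opening (or standalone) code and that endmrk holds the closing code, as example 2 suggests; load_tags and restore_native are hypothetical helper names, and <bpt>, <ept> and <ph> are not handled.

```python
import xml.etree.ElementTree as ET

def load_tags(inline_data):
    """Map each <tag> id to (native start code, closing code or "")."""
    return {t.get("id"): (t.text or "", t.get("endmrk", ""))
            for t in inline_data.findall("tag")}

def restore_native(seg, tags):
    """Rebuild the native text of a <seg>, expanding <g> and <x/> via tags."""
    parts = [seg.text or ""]
    for child in seg:
        if child.tag == "g":
            start, end = tags[child.get("xid")]
            parts.append(start + restore_native(child, tags) + end)
        elif child.tag == "x":
            parts.append(tags[child.get("xid")][0])
        else:
            raise ValueError(f"element not handled in this sketch: {child.tag}")
        parts.append(child.tail or "")
    return "".join(parts)

inline_data = ET.fromstring(
    r'<inline-data><tag id="id2345" endmrk="}" type="italic">{\i </tag>'
    r'<tag id="id457" type="lb">&lt;br/&gt;</tag></inline-data>')
tags = load_tags(inline_data)

seg_italic = ET.fromstring(
    '<seg>Text in <g xid="id2345" type="italic">italics</g>.</seg>')
seg_break = ET.fromstring(
    '<seg>text displayed in <x xid="id457" type="lb"/> two lines.</seg>')
```

With these inputs, restore_native(seg_italic, tags) reproduces the source text of example 2 and restore_native(seg_break, tags) reproduces the source text of example 4.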


5. TMX Compliance

TMX compliance is defined as follows:

  • Given:

    • An original document with inline codes (for example an HTML file) translated by a tool XYZ.

    • The translation memory of that document saved in TMX format, using <bpt>, <ept>, <g>, <ph> and <x/> elements as described in the section Selection Rules for Inline Elements.

    • The segmentation rules in SRX format used to break blocks of source text into smaller fragments, either embedded in the TMX document or referenced in an <external-file> element.

  • Assuming:

    • The translated segments do not have more or fewer tags than the source segments.

    • All non-TMX elements and attributes have been removed from the TMX file.

The tool XYZ supports TMX Export if the TMX document created by tool XYZ contains all the information required to re-create the translated document without loss of text, data or formatting.

The tool XYZ supports TMX Import if any TMX document containing all the information required to re-create the translated document (possibly created by a TMX Export compliant tool), can be imported in tool XYZ and effectively be used to re-create the translated document without loss of text, data or formatting.

Tools that offer both import and export features must support both TMX Import and TMX Export to be TMX compliant.
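
The first assumption listed above, that translated segments contain neither more nor fewer tags than the source segments, can be checked mechanically. The sketch below compares the number of each inline element across all <tuv> elements of a <tu>; it is an illustration only, the function names are hypothetical, every <tuv> is assumed to contain a <seg>, and the TMX namespace is omitted.

```python
import xml.etree.ElementTree as ET
from collections import Counter

INLINE = ("bpt", "ept", "g", "ph", "x")

def inline_counts(seg):
    """Count the inline elements found anywhere inside a <seg>."""
    return Counter(el.tag for el in seg.iter() if el.tag in INLINE)

def tu_tag_counts_match(tu):
    """True when every <tuv> of the <tu> has the same inline element counts."""
    counts = [inline_counts(tuv.find("seg")) for tuv in tu.findall("tuv")]
    return all(c == counts[0] for c in counts)

balanced = ET.fromstring(
    '<tu><tuv><seg>a <x xid="t"/> b</seg></tuv>'
    '<tuv><seg>c <x xid="t"/> d</seg></tuv></tu>')
unbalanced = ET.fromstring(
    '<tu><tuv><seg>a <x xid="t"/> b</seg></tuv>'
    '<tuv><seg>c d</seg></tuv></tu>')
```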

Whenever possible, the original formatting information should be included in the exported TMX file, enclosed in <bpt>, <ept> and <ph> elements or stored in <tag> elements in the header of the file.

Under special circumstances, for example if the source document is a binary file, it may not be possible to include the original source formatting codes in inline elements. In such cases, the formatting information necessary to build the translated document must be extracted from the source document. Nevertheless, all inline elements must still be present in the correct places, acting as empty placeholders, and they must comply with the Selection Rules for Inline Elements. When <g> and <x/> elements are used under these circumstances, their xid attributes can point to a common empty <tag> element.

5.1 Validation of TMX Files

A cross-platform utility that validates TMX documents against the TMX Schema and verifies that they follow the requirements described in this document is included as part of the TMX 2.0 specification.

The source code of the validation tool is available for download from OSCAR's web site.


6. Changes Since Previous Version (Non-Normative)

The main changes in this version (2.0) relative to the previous version (1.4b) are as follows:

6.1 Backwards Compatibility

TMX 2.0 was designed to be compatible with TMX 1.4b. It should be possible to upgrade a valid TMX 1.4b file to 2.0 by:

  1. Removing any DOCTYPE declaration from the file

  2. Changing the value of version attribute from "1.4" to "2.0"

  3. Removing all elements and attributes that were deprecated in TMX 1.4 (i.e. <ut>)

  4. Replacing <bpt>/<ept> pairs and <ph> elements with <g> and <x/> as necessary to comply with the Selection Rules for Inline Elements.
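
Steps 1 to 3 above can be sketched as follows. This is a minimal illustration under several assumptions: the helper names are hypothetical, the content of each removed <ut> element is discarded while the surrounding text is kept, the TMX 2.0 namespace declaration is not added, and step 4 is not attempted. ElementTree does not retain the DOCTYPE declaration when the tree is serialized again, which covers step 1.

```python
import xml.etree.ElementTree as ET

def remove_ut(parent):
    """Remove <ut> elements below parent, keeping the text that follows them."""
    prev = None
    for child in list(parent):
        if child.tag == "ut":
            tail = child.tail or ""
            if prev is None:
                parent.text = (parent.text or "") + tail
            else:
                prev.tail = (prev.tail or "") + tail
            parent.remove(child)
        else:
            remove_ut(child)
            prev = child

def upgrade(xml_text):
    """Apply steps 1-3 to a TMX 1.4 document given as a string."""
    root = ET.fromstring(xml_text)   # step 1: DOCTYPE is not kept on output
    root.set("version", "2.0")       # step 2
    remove_ut(root)                  # step 3
    return ET.tostring(root, encoding="unicode")

old = ('<!DOCTYPE tmx SYSTEM "tmx14.dtd"><tmx version="1.4"><header/><body>'
       '<tu><tuv xml:lang="en"><seg>A <ut>&lt;b&gt;</ut>bold'
       '<ut>&lt;/b&gt;</ut> word</seg></tuv></tu></body></tmx>')
new = upgrade(old)
```

A production converter would more likely turn the removed codes into inline elements that satisfy the Selection Rules rather than drop them, since discarding them loses formatting information.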


A. Sample Document

<?xml version="1.0" encoding="UTF-8"?>
<tmx version="2.0" 
   xmlns="http://www.lisa.org/tmx20"
   xsi:schemaLocation="http://www.lisa.org/tmx20 tmx20.xsd"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:xyz="urn:myApps:xyz">
   <header creationtool="Sample Creator" creationtoolversion="1.1.1" 
      segtype="block" o-tmf="unknown" adminlang="en-US" srclang="*all*" datatype="x-sample">
      <inline-data>
         <tag id="id2345" endmrk="}" type="italic">{\i </tag>
         <tag id="id457" type="lb">&lt;br/&gt;</tag>
         <tag id="id458" type="lb">&lt;br/&gt;</tag>
      </inline-data>
      <segmentation>
         <internal-file xyz:myattribute="custom rules">
            <!-- Segmentation rules in SRX 2.0 format -->
            <srx:srx version="2.0" xmlns:srx="http://www.lisa.org/srx20">
               <srx:header segmentsubflows="yes" cascade="yes">
                  <srx:formathandle type="start" include="no"/>
                  <srx:formathandle type="end" include="yes"/>
                  <srx:formathandle type="isolated" include="yes"/>
               </srx:header>
               <srx:body>
                  <srx:languagerules>
                     <srx:languagerule languagerulename="Default">
                        <!-- Common rule for most languages -->
                        <srx:rule break="yes">
                           <srx:beforebreak>[\.\?!]+</srx:beforebreak>
                           <srx:afterbreak>\s</srx:afterbreak>
                        </srx:rule>
                     </srx:languagerule>
                  </srx:languagerules>
                  <srx:maprules>
                     <!-- Common breaking rules -->
                     <srx:languagemap languagepattern=".*" 
                       languagerulename="Default"/>
                  </srx:maprules>
               </srx:body>
            </srx:srx>
         </internal-file>
      </segmentation>
      <!-- Other elements -->
      <xyz:other />
   </header>   
   <body>
      <!-- Paired codes with translatable text -->
      <tu srclang="en-US" datatype="html" tuid="sample1">
         <tuv xml:lang="en" datatype="html">
            <seg>link to <bpt i="1" type="link" 
            x="1">&lt;a href="www.mysite.com" title="<sub type="x-title">my 
            site</sub>"&gt;</bpt>my web site<ept i="1">&lt;/a&gt;</ept>.</seg>
         </tuv>
         <tuv xml:lang="es" datatype="html">
            <seg>enlace a <bpt i="1" type="link" x="1">&lt;a 
            href="www.mysite.com/es" title="<sub type="x-title">mi 
            sitio</sub>"&gt;</bpt>mi sitio web<ept i="1">&lt;/a&gt;</ept>.</seg>
         </tuv>
      </tu>
      <!-- Paired codes without translatable text -->
      <tu datatype="rtf">
         <context context-type="x-my-context">text formatting options</context>
         <tuv xml:lang="en">
            <seg>Text in <g xid="id2345" type="italic">italics</g>.</seg>
         </tuv>
         <tuv xml:lang="fr">
            <seg>Texte en <g xid="id2345" type="italic">italiques</g>.</seg>
         </tuv>
      </tu>
      <!-- Standalone sequence with translatable text -->
      <tu datatype="html">
         <tuv xml:lang="en-US">
            <seg>This is <ph type="image">&lt;img src="john.gif" alt="<sub 
            type="alt">John's picture</sub>"/&gt;</ph> John.</seg>
         </tuv>
         <tuv xml:lang="es">
            <seg>Este es <ph type="image">&lt;img src="juan.gif" alt="<sub 
            type="alt">foto de Juan</sub>"/&gt;</ph> Juan.</seg>
         </tuv>
      </tu>
      <!-- Standalone sequence without translatable text -->
      <tu>
         <tuv xml:lang="en">
            <seg>text displayed in <x xid="id457" type="lb" equiv-text="&#0010;"/> 
            two lines.</seg>
         </tuv>
         <tuv xml:lang="es">
            <seg>texto en <x xid="id458" type="lb" equiv-text="&#0010;"/> dos 
            lineas.</seg>
         </tuv>
      </tu>
      <!-- Notes and properties -->
      <tu tuid="90293837" creationid="jean-claude" srclang="zh-CN" segtype="phrase">
         <note>Salutations</note>
         <note>Machine translation</note>
         <prop name="mt">web translator</prop>
         <tuv xml:lang="en">
            <seg>Hello!</seg>
         </tuv>         
         <tuv o-encoding="BIG5" xml:lang="zh-CN">
            <note>Enable Unicode support for viewing this entry.</note>
            <prop name="srcCodePage">BIG5</prop>            
            <seg>你好!</seg>
         </tuv>
      </tu>
      <!-- Untranslatable text -->
      <tu o-tmf="xliff" creationdate="20060125T210600Z" changedate="20060315T130700Z" 
      creationid="ted@mail.com">
         <tuv xml:lang="en" xml:space="default">
            <seg><hi type="protected" comment="product name">Ultrabalancer</hi> 
            support is excellent.</seg>
         </tuv>
         <tuv xml:lang="es" xml:space="default">
            <seg>El soporte de <hi type="protected">Ultrabalancer</hi> es 
            excelente.</seg>
         </tuv>
      </tu>
      <!-- grouped segments -->
      <tu group="numbers" g-order="1" datatype="plaintext" 
      creationdate="20060125T210600Z">
         <tuv xml:lang="fr">
            <seg>un</seg>
         </tuv>
         <tuv xml:lang="de">
            <seg>eine</seg>
         </tuv>
         <tuv xml:lang="en">
            <seg>one</seg>
         </tuv>
      </tu>
      <tu group="numbers" g-order="2" datatype="plaintext">
         <tuv xml:lang="de">
            <seg>zwei</seg>
         </tuv>
         <tuv xml:lang="fr">
            <seg>deux</seg>
         </tuv>
         <tuv xml:lang="en">
            <seg>two</seg>
         </tuv>
      </tu>
      <tu group="numbers" g-order="3" datatype="plaintext">
         <tuv xml:lang="en">
            <seg>three</seg>
         </tuv>
         <tuv xml:lang="de">
            <seg>drei</seg>
         </tuv>
         <tuv xml:lang="fr">
            <seg>trois</seg>
         </tuv>
      </tu>
      <!-- Foreign elements -->
      <xyz:database>main server</xyz:database>
      <xyz:purpose>general</xyz:purpose>
   </body>
</tmx>                                                   
               


B. XML Schema for TMX

The XML Schema for TMX is available at: http://www.lisa.org/tmx/tmx20.xsd.

						
<?xml version="1.0" encoding="UTF-8"?>
<!--
  Document        : tmx20.xsd
  Version         : 1.0
  Created on      : December 2, 2006
  Author          : rmraya@heartsome.net
  Description     : This XML Schema defines the structure of TMX 2.0
  Status          : Preliminary draft
  
  Copyright © The Localisation Industry Standards Association [LISA] 1997-2007. 
  All Rights Reserved.
-->
<xs:schema xmlns:tmx="http://www.lisa.org/tmx20"
    targetNamespace="http://www.lisa.org/tmx20" xml:lang="en"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xs:import namespace="http://www.w3.org/XML/1998/namespace"
        schemaLocation="http://www.w3.org/2001/xml.xsd"/>
    <!--
    ================================================== 
     Restrictions
    ================================================== 
    -->    
    <!-- Restrictions for segtype attribute -->
    <xs:simpleType name="segtypes">
        <xs:restriction base="xs:token">
            <xs:enumeration value="block"/>
            <xs:enumeration value="paragraph"/>
            <xs:enumeration value="sentence"/>
            <xs:enumeration value="phrase"/>
        </xs:restriction>
    </xs:simpleType>
    <!-- Restrictions for xml:space attribute -->
    <xs:simpleType name="space">
        <xs:restriction base="xs:token">
            <xs:enumeration value="default"/>
            <xs:enumeration value="preserve"/>
        </xs:restriction>
    </xs:simpleType>
    <!-- Restrictions for assoc attribute -->
    <xs:simpleType name="assoc_type">
        <xs:restriction base="xs:token">
            <xs:enumeration value="p"/>
            <xs:enumeration value="f"/>
            <xs:enumeration value="b"/>            
        </xs:restriction>
    </xs:simpleType>
    <!-- Restrictions for datatype attribute -->
    <xs:simpleType name="datatype">
        <xs:restriction base="xs:token">
            <xs:enumeration value="unknown"/>
            <xs:enumeration value="undefined"/>
            <xs:enumeration value="alptext"/>
            <xs:enumeration value="cdf"/>
            <xs:enumeration value="cmx"/>
            <xs:enumeration value="cpp"/>
            <xs:enumeration value="hptag"/>
            <xs:enumeration value="html"/>
            <xs:enumeration value="interleaf"/>
            <xs:enumeration value="ipf"/>
            <xs:enumeration value="java"/>
            <xs:enumeration value="javascript"/>
            <xs:enumeration value="lisp"/>
            <xs:enumeration value="mif"/>
            <xs:enumeration value="opentag"/>
            <xs:enumeration value="pascal"/>
            <xs:enumeration value="plaintext"/>
            <xs:enumeration value="pm"/>
            <xs:enumeration value="resx"/>
            <xs:enumeration value="rtf"/>
            <xs:enumeration value="sgml"/>
            <xs:enumeration value="stf-f"/>
            <xs:enumeration value="stf-i"/>
            <xs:enumeration value="transit"/>
            <xs:enumeration value="vbscript"/>
            <xs:enumeration value="winres"/>
            <xs:enumeration value="xliff"/>
            <xs:enumeration value="xml"/>
            <xs:enumeration value="xptag"/>
        </xs:restriction>
    </xs:simpleType>
    <!-- Restrictions for type attribute when used in <bpt> or <g> -->
    <xs:simpleType name="paired_type">
        <xs:restriction base="xs:token">
            <xs:enumeration value="bold"/>
            <xs:enumeration value="color"/>
            <xs:enumeration value="dulined"/>
            <xs:enumeration value="font"/>
            <xs:enumeration value="italic"/>
            <xs:enumeration value="link"/>
            <xs:enumeration value="scap"/>
            <xs:enumeration value="struct"/>
            <xs:enumeration value="ulined"/>
            <xs:enumeration value="xliff-bpt"/>
            <xs:enumeration value="xliff-g"/>
        </xs:restriction>
    </xs:simpleType>
    <!-- Restrictions for type attribute when used in <ph> or <x/> -->
    <xs:simpleType name="placeholder_type">
        <xs:restriction base="xs:token">
            <xs:enumeration value="index"/>
            <xs:enumeration value="date"/>
            <xs:enumeration value="time"/>
            <xs:enumeration value="fnote"/>
            <xs:enumeration value="enote"/>
            <xs:enumeration value="alt"/>
            <xs:enumeration value="image"/>
            <xs:enumeration value="pb"/>
            <xs:enumeration value="lb"/>
            <xs:enumeration value="cb"/>
            <xs:enumeration value="inset"/>
            <xs:enumeration value="xliff-bx"/>
            <xs:enumeration value="xliff-ex"/>
            <xs:enumeration value="xliff-it"/>
            <xs:enumeration value="xliff-ph"/>
            <xs:enumeration value="xliff-x"/>
        </xs:restriction>
    </xs:simpleType>
    <!-- Restrictions for type attribute when used in <hi> -->
    <xs:simpleType name="term_type">
        <xs:restriction base="xs:token">
            <xs:enumeration value="abbrev"/>
            <xs:enumeration value="abbreviated-form"/>
            <xs:enumeration value="abbreviation"/>
            <xs:enumeration value="acronym"/>
            <xs:enumeration value="appellation"/>
            <xs:enumeration value="collocation"/>
            <xs:enumeration value="common-name"/>
            <xs:enumeration value="datetime"/>
            <xs:enumeration value="equation"/>
            <xs:enumeration value="expanded-form"/>
            <xs:enumeration value="formula"/>
            <xs:enumeration value="head-term"/>
            <xs:enumeration value="initialism"/>
            <xs:enumeration value="international-scientific-term"/>
            <xs:enumeration value="internationalism"/>
            <xs:enumeration value="logical-expression"/>
            <xs:enumeration value="materials-management-unit"/>
            <xs:enumeration value="name"/>
            <xs:enumeration value="near-synonym"/>
            <xs:enumeration value="part-number"/>
            <xs:enumeration value="phrase"/>
            <xs:enumeration value="phraseological-unit"/>
            <xs:enumeration value="protected"/>
            <xs:enumeration value="romanized-form"/>
            <xs:enumeration value="set-phrase"/>
            <xs:enumeration value="short-form"/>
            <xs:enumeration value="sku"/>
            <xs:enumeration value="standard-text"/>
            <xs:enumeration value="symbol"/>
            <xs:enumeration value="synonym"/>
            <xs:enumeration value="synonymous-phrase"/>
            <xs:enumeration value="term"/>
            <xs:enumeration value="transcribed-form"/>
            <xs:enumeration value="transliterated-form"/>
            <xs:enumeration value="truncated-term"/>
            <xs:enumeration value="variant"/>
        </xs:restriction>
    </xs:simpleType>
    <!-- Restrictions for context-type attribute -->
    <xs:simpleType name="context_type">
        <xs:restriction base="xs:token">
            <xs:enumeration value="database"/>
            <xs:enumeration value="element"/>
            <xs:enumeration value="elementtitle"/>
            <xs:enumeration value="linenumber"/>
            <xs:enumeration value="numparams"/>
            <xs:enumeration value="paramnotes"/>
            <xs:enumeration value="record"/>
            <xs:enumeration value="recordtitle"/>
            <xs:enumeration value="sourcefile"/>
        </xs:restriction>
    </xs:simpleType>
    <!-- Restrictions for user-defined attribute values -->
    <xs:simpleType name="Custom">
        <xs:restriction base="xs:string">
            <xs:pattern value="x-[^\s]+"/>
        </xs:restriction>
    </xs:simpleType>    
    <!--
    ================================================== 
    Structural Elements     
    ================================================== 
    -->
    <!-- Base Document Element -->
    <xs:element name="tmx">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="tmx:header"/>
                <xs:element ref="tmx:body"/>
            </xs:sequence>
            <xs:attribute name="version" use="required">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:enumeration value="2.0"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:attribute>
            <xs:anyAttribute namespace="##any" processContents="lax"/>
        </xs:complexType>
    </xs:element>
    <!-- Body -->
    <xs:element name="body">
        <xs:complexType>
            <xs:sequence>
                <xs:element minOccurs="0" maxOccurs="unbounded" ref="tmx:tu"/>
                <xs:any maxOccurs="unbounded" minOccurs="0" namespace="##other" processContents="lax"/>                
            </xs:sequence>
            <xs:anyAttribute namespace="##any" processContents="lax"/>
        </xs:complexType>
    </xs:element>
    <!-- Context Information -->
    <xs:element name="context">
        <xs:complexType mixed="true">
            <xs:attribute name="context-type" use="required">
                <xs:simpleType>
                    <xs:union memberTypes="tmx:context_type tmx:Custom"/>
                </xs:simpleType>
            </xs:attribute>
            <xs:anyAttribute namespace="##any" processContents="lax"/>
        </xs:complexType>
    </xs:element>
    <!-- External File -->
    <xs:element name="external-file">
        <xs:complexType>
            <xs:attribute name="href" use="required"/>
            <xs:attribute name="crc"/>
            <xs:attribute name="uid"/>
            <xs:anyAttribute namespace="##any" processContents="lax"/>
        </xs:complexType>
    </xs:element>
    <!-- Header -->
    <xs:element name="header">
        <xs:complexType>
            <xs:sequence>
                <xs:choice minOccurs="0" maxOccurs="unbounded">
                    <xs:element ref="tmx:note"/>
                    <xs:element ref="tmx:prop"/>
                </xs:choice>
                <xs:element minOccurs="0" ref="tmx:inline-data"/>
                <xs:element minOccurs="0" ref="tmx:segmentation"/>
                <xs:any maxOccurs="unbounded" minOccurs="0" namespace="##other"
                    processContents="lax"/>
            </xs:sequence>
            
            <xs:attribute name="creationtool" use="required"/>
            <xs:attribute name="creationtoolversion" use="required"/>
            <xs:attribute name="segtype" use="required" type="tmx:segtypes"/>
            <xs:attribute name="o-tmf" use="required"/>
            <xs:attribute name="adminlang" use="required"/>
            <xs:attribute name="srclang" use="required"/>
            <xs:attribute name="datatype" use="required">
                <xs:simpleType>
                    <xs:union memberTypes="tmx:datatype tmx:Custom"/>
                </xs:simpleType>
            </xs:attribute>
            <xs:attribute name="o-encoding"/>
            <xs:attribute name="creationdate"/>
            <xs:attribute name="creationid"/>
            <xs:attribute name="changedate"/>
            <xs:attribute name="changeid"/>
            <xs:anyAttribute namespace="##any" processContents="lax"/>
        </xs:complexType>
    </xs:element>
    <!-- Inline Data -->
    <xs:element name="inline-data">
        <xs:complexType>
            <xs:sequence>
                <xs:element maxOccurs="unbounded" ref="tmx:tag"/>
            </xs:sequence>
            <xs:anyAttribute namespace="##any" processContents="lax"/>
        </xs:complexType>
    </xs:element>
    <!-- Internal File -->
    <xs:element name="internal-file">
        <xs:complexType mixed="true">
            <xs:sequence>
                <xs:any maxOccurs="unbounded" minOccurs="0" namespace="##other"
                    processContents="lax"/>                
            </xs:sequence>
            <xs:anyAttribute namespace="##any" processContents="lax"/>            
        </xs:complexType>
    </xs:element>
    <!-- Note -->
    <xs:element name="note">
        <xs:complexType mixed="true">
            <xs:attribute name="o-encoding"/>
            <xs:attribute ref="xml:lang"/>
            <xs:attribute name="creationdate"/>
            <xs:attribute name="creationid"/>
            <xs:attribute name="changedate"/>
            <xs:attribute name="changeid"/>
            <xs:anyAttribute namespace="##any" processContents="lax"/>
        </xs:complexType>
    </xs:element>
    <!-- Property -->
    <xs:element name="prop">
        <xs:complexType mixed="true">
            <xs:attribute name="name" use="required"/>
            <xs:attribute ref="xml:lang"/>
            <xs:attribute name="o-encoding"/>
            <xs:anyAttribute namespace="##any" processContents="lax"/>
        </xs:complexType>
    </xs:element>
    <!-- Segment -->
    <xs:element name="seg">
        <xs:complexType mixed="true">
            <xs:choice minOccurs="0" maxOccurs="unbounded">
                <xs:element ref="tmx:bpt"/>
                <xs:element ref="tmx:ept"/>
                <xs:element ref="tmx:ph"/>
                <xs:element ref="tmx:hi"/>
                <xs:element ref="tmx:x"/>
                <xs:element ref="tmx:g"/>
            </xs:choice>
            <xs:attribute ref="xml:space" default="default"/>
            <xs:anyAttribute namespace="##any" processContents="lax"/>
        </xs:complexType>
    </xs:element>
    <!-- Segmentation -->
    <xs:element name="segmentation">
        <xs:complexType>
            <xs:choice>
                <xs:element ref="tmx:internal-file"/>
                <xs:element ref="tmx:external-file"/>
            </xs:choice>
            <xs:anyAttribute namespace="##any" processContents="lax"/>
        </xs:complexType>
    </xs:element>
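    <!--
    Illustration (not normative): <segmentation> holds exactly one child,
    either <internal-file> or <external-file>. A reference to an external
    rules file could look like the line below; the file name is hypothetical.
        <segmentation><external-file href="rules.srx"/></segmentation>
    -->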
    <!-- Tag -->
    <xs:element name="tag">
        <xs:complexType mixed="true">
            <xs:attribute name="id" use="required" type="xs:ID"/>
            <xs:attribute name="endmrk"/>
            <xs:attribute name="type" use="required">
                <xs:simpleType>
                    <xs:union memberTypes="tmx:paired_type tmx:placeholder_type tmx:Custom"/>
                </xs:simpleType>
            </xs:attribute>
            <xs:attribute name="o-encoding"/>
            <xs:anyAttribute namespace="##any" processContents="lax"/>
        </xs:complexType>
    </xs:element>
    <!-- Translation Unit -->
    <xs:element name="tu">
        <xs:complexType>
            <xs:sequence>
                <xs:choice minOccurs="0" maxOccurs="unbounded">
                    <xs:element ref="tmx:note"/>
                    <xs:element ref="tmx:prop"/>
                    <xs:element ref="tmx:context"/>
                </xs:choice>
                <xs:element ref="tmx:tuv" minOccurs="2" maxOccurs="unbounded"/>
                <xs:any maxOccurs="unbounded" minOccurs="0" namespace="##other"
                    processContents="lax"/>
            </xs:sequence>
            <xs:attribute name="tuid"/>
            <xs:attribute name="o-encoding"/>
            <xs:attribute name="datatype" default="unknown" type="tmx:datatype"/>
            <xs:attribute name="usagecount"/>
            <xs:attribute name="lastusagedate"/>
            <xs:attribute name="creationtool"/>
            <xs:attribute name="creationtoolversion"/>
            <xs:attribute name="creationdate"/>
            <xs:attribute name="creationid"/>
            <xs:attribute name="changedate"/>
            <xs:attribute name="segtype" type="tmx:segtypes"/>
            <xs:attribute name="changeid"/>
            <xs:attribute name="o-tmf"/>
            <xs:attribute name="srclang"/>
            <xs:attribute name="group"/>
            <xs:attribute name="g-order">
                <xs:simpleType>
                    <xs:restriction base="xs:integer">
                        <xs:minInclusive value="1"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:attribute>
            <xs:anyAttribute namespace="##any" processContents="lax"/>
        </xs:complexType>
    </xs:element>
    <!-- Translation Unit Variant -->
    <xs:element name="tuv">
        <xs:complexType>
            <xs:sequence>
                <xs:choice minOccurs="0" maxOccurs="unbounded">
                    <xs:element ref="tmx:note"/>
                    <xs:element ref="tmx:prop"/>
                </xs:choice>
                <xs:element ref="tmx:seg"/>
                <xs:any maxOccurs="unbounded" minOccurs="0" namespace="##other" processContents="lax"/>
            </xs:sequence>
            <xs:attribute ref="xml:lang" use="required"/>
            <xs:attribute name="o-encoding"/>
            <xs:attribute name="datatype" default="unknown" type="tmx:datatype"/>
            <xs:attribute name="usagecount"/>
            <xs:attribute name="lastusagedate"/>
            <xs:attribute name="creationtool"/>
            <xs:attribute name="creationtoolversion"/>
            <xs:attribute name="creationdate"/>
            <xs:attribute name="creationid"/>
            <xs:attribute name="changedate"/>
            <xs:attribute name="o-tmf"/>
            <xs:attribute name="changeid"/>
            <xs:anyAttribute namespace="##any" processContents="lax"/>
        </xs:complexType>
    </xs:element>
    <!--
    ================================================== 
     Content Markup 
    ================================================== 
    -->
    <!-- Begin Paired Tag -->
    <xs:element name="bpt">
        <xs:complexType mixed="true">
            <xs:sequence>
                <xs:element minOccurs="0" maxOccurs="unbounded" ref="tmx:sub"/>
            </xs:sequence>
            <xs:attribute name="i" use="required">
                <xs:simpleType>
                    <xs:restriction base="xs:integer">
                        <xs:minInclusive value="1"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:attribute>
            <xs:attribute name="x">
                <xs:simpleType>
                    <xs:restriction base="xs:integer">
                        <xs:minInclusive value="1"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:attribute>
            <xs:attribute name="equiv-text"/>
            <xs:attribute name="type" use="required">
                <xs:simpleType>
                    <xs:union memberTypes="tmx:paired_type tmx:Custom"/>
                </xs:simpleType>
            </xs:attribute>
            <xs:anyAttribute namespace="##any" processContents="lax"/>
        </xs:complexType>
    </xs:element>
    <!-- End Paired Tag -->
    <xs:element name="ept">
        <xs:complexType mixed="true">
            <xs:sequence>
                <xs:element minOccurs="0" maxOccurs="unbounded" ref="tmx:sub"/>
            </xs:sequence>
            <xs:attribute name="i" use="required">
                <xs:simpleType>
                    <xs:restriction base="xs:integer">
                        <xs:minInclusive value="1"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:attribute>
            <xs:attribute name="equiv-text"/>
            <xs:anyAttribute namespace="##any" processContents="lax"/>
        </xs:complexType>
    </xs:element>
    <!-- Generic Group Placeholder -->
    <xs:element name="g">
        <xs:complexType mixed="true">
            <xs:attribute name="xid" use="required" type="xs:IDREF"/>
            <xs:attribute name="type" use="required">
                <xs:simpleType>
                    <xs:union memberTypes="tmx:paired_type tmx:Custom"/>
                </xs:simpleType>
            </xs:attribute>
            <xs:attribute name="equiv-text"/>
            <xs:attribute name="x">
                <xs:simpleType>
                    <xs:restriction base="xs:integer">
                        <xs:minInclusive value="1"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:attribute>
            <xs:anyAttribute namespace="##any" processContents="lax"/>
        </xs:complexType>
    </xs:element>
    <!-- Highlight -->
    <xs:element name="hi">
        <xs:complexType mixed="true">
            <xs:choice minOccurs="0" maxOccurs="unbounded">
                <xs:element ref="tmx:bpt"/>
                <xs:element ref="tmx:ept"/>
                <xs:element ref="tmx:ph"/>
                <xs:element ref="tmx:x"/>
                <xs:element ref="tmx:g"/>
                <xs:element ref="tmx:hi"/>
            </xs:choice>
            <xs:attribute name="x">
                <xs:simpleType>
                    <xs:restriction base="xs:integer">
                        <xs:minInclusive value="1"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:attribute>
            <xs:attribute name="type" use="required">
                <xs:simpleType>
                    <xs:union memberTypes="tmx:term_type tmx:Custom"/>
                </xs:simpleType>
            </xs:attribute>
            <xs:attribute name="comment"/>
            <xs:anyAttribute namespace="##any" processContents="lax"/>
        </xs:complexType>
    </xs:element>
    <!-- Placeholder -->
    <xs:element name="ph">
        <xs:complexType mixed="true">
            <xs:sequence>
                <xs:element minOccurs="0" maxOccurs="unbounded" ref="tmx:sub"/>
            </xs:sequence>
            <xs:attribute name="x">
                <xs:simpleType>
                    <xs:restriction base="xs:integer">
                        <xs:minInclusive value="1"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:attribute>
            <xs:attribute name="assoc" type="tmx:assoc_type"/>
            <xs:attribute name="equiv-text"/>
            <xs:attribute name="type" use="required">
                <xs:simpleType>
                    <xs:union memberTypes="tmx:placeholder_type tmx:Custom"/>
                </xs:simpleType>
            </xs:attribute>
            <xs:anyAttribute namespace="##any" processContents="lax"/>
        </xs:complexType>
    </xs:element>
    <!-- Subflow -->
    <xs:element name="sub">
        <xs:complexType mixed="true">
            <xs:choice minOccurs="0" maxOccurs="unbounded">
                <xs:element ref="tmx:bpt"/>
                <xs:element ref="tmx:ept"/>
                <xs:element ref="tmx:ph"/>
                <xs:element ref="tmx:hi"/>
                <xs:element ref="tmx:g"/>
                <xs:element ref="tmx:x"/>
            </xs:choice>
            <xs:attribute name="datatype" default="unknown" type="tmx:datatype"/>
            <xs:attribute name="type" use="required">
                <xs:simpleType>
                    <xs:union memberTypes="tmx:paired_type tmx:placeholder_type tmx:term_type tmx:Custom"/>
                </xs:simpleType>
            </xs:attribute>
            <xs:attribute name="x">
                <xs:simpleType>
                    <xs:restriction base="xs:integer">
                        <xs:minInclusive value="1"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:attribute>
            <xs:anyAttribute namespace="##any" processContents="lax"/>
        </xs:complexType>
    </xs:element>
    <!-- Generic Placeholder -->
    <xs:element name="x">
        <xs:complexType>
            <xs:attribute name="xid" use="required" type="xs:IDREF"/>
            <xs:attribute name="equiv-text"/>
            <xs:attribute name="type" use="required">
                <xs:simpleType>
                    <xs:union memberTypes="tmx:placeholder_type tmx:Custom"/>
                </xs:simpleType>
            </xs:attribute>
            <xs:attribute name="x">
                <xs:simpleType>
                    <xs:restriction base="xs:integer">
                        <xs:minInclusive value="1"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:attribute>
            <xs:anyAttribute namespace="##any" processContents="lax"/>
        </xs:complexType>
    </xs:element>
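    <!--
    Illustration (not normative): xid is an xs:IDREF, so a validating parser
    requires an element in the same document whose xs:ID value matches it,
    such as the id of a <tag> declared in <inline-data>. Hypothetical values:
        <tag id="t1" type="x-image"/>
        <x xid="t1" type="x-image"/>
    -->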
</xs:schema>
<!-- End -->
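
As an informal, non-normative illustration of the structural elements declared above, the following sketch shows a minimal document with a single translation unit. The namespace declaration on the root element is left out because the target namespace is declared in the first part of the schema. The tool name, the o-tmf value, and the segtype value "sentence" are assumptions made for the example. The datatype and type values use the "x-" prefix permitted by the Custom type.

```xml
<!-- Illustrative sketch only; namespace declaration on <tmx> omitted. -->
<tmx version="2.0">
  <!-- All seven header attributes shown here are declared use="required". -->
  <header creationtool="ExampleTool" creationtoolversion="1.0"
          segtype="sentence" o-tmf="x-example" adminlang="en"
          srclang="en" datatype="x-example"/>
  <body>
    <!-- A <tu> must contain at least two <tuv> elements. -->
    <tu tuid="1">
      <tuv xml:lang="en">
        <!-- <bpt> and <ept> are paired through the same value of "i". -->
        <seg>Press <bpt i="1" type="x-bold">&lt;b&gt;</bpt>Enter<ept i="1">&lt;/b&gt;</ept>.</seg>
      </tuv>
      <tuv xml:lang="fr">
        <seg>Appuyez sur <bpt i="1" type="x-bold">&lt;b&gt;</bpt>Entrée<ept i="1">&lt;/b&gt;</ept>.</seg>
      </tuv>
    </tu>
  </body>
</tmx>
```

The optional child elements of the header (note, prop, inline-data, segmentation) are left out to keep the sketch short.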
																
					


C. Glossary

DTD

A Document Type Definition (DTD) specifies the rules for the structure of an SGML document. Several industries have standardized on particular DTDs for the types of documents that they share.

OSCAR

OSCAR stands for Open Standards for Container/Content Allowing Re-use. It is a LISA special interest group.

SGML

SGML stands for Standard Generalized Markup Language, an ISO standard (ISO 8879) for defining structured formats. SGML is not a format in itself, but a set of rules for defining formats. SGML markup systems are defined in Document Type Definition (DTD) files.

UTC

UTC stands for Coordinated Universal Time.

XML

XML stands for Extensible Markup Language. XML is a simplified and restricted subset of SGML.


D. References

Normative
[IANA Charsets]

IANA Names for Character Sets. IANA (Internet Assigned Numbers Authority), Aug 2001.

[ISO 8601]

Representation of dates and times. ISO (International Organization for Standardization), Dec 2000.

[RFC 4646]

RFC 4646: Tags for Identifying Languages. IETF (Internet Engineering Task Force), September 2006. This document, in combination with RFC 4647, replaces RFC 3066, which replaced RFC 1766.

[SRX 2.0]

Segmentation Rules Exchange (SRX) is an XML-based standard for describing how translation and other language-processing tools segment text for processing.

[XML 1.0]

Extensible Markup Language (XML) 1.0 Second Edition. W3C (World Wide Web Consortium), Oct 2000.

[XML Namespaces]

Namespaces in XML. W3C (World Wide Web Consortium), August 2006.


Non-Normative
[ISO]

International Organization for Standardization Web site.

[LISA]

Localisation Industry Standards Association Web site.

[Unicode]

Unicode Consortium Web site.

[W3C]

World Wide Web Consortium Web site.