TMX 2.0 Specification Draft

OSCAR Working Draft - October 15, 2007

This version:

http://www.heartsome.org/tmx/tmx-03282007.html

Latest version:

http://www.heartsome.org/tmx/tmx.html

Previous version:

http://www.lisa.org/standards/tmx/tmx.html

Editor:

Rodolfo M. Raya <rmraya@heartsome.net>

Previous Editors:

Yves Savourel
Alan K. Melby

Copyright © The Localisation Industry Standards Association [ LISA ] 1997-2007. All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to LISA.

The limited permissions granted above are perpetual and will not be revoked by LISA or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and LISA DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Abstract

This document defines version 2.0 of the Translation Memory eXchange format (TMX). The purpose of the TMX format is to provide a standard method to describe translation memory data that is being exchanged among tools and/or translation vendors, while introducing little or no loss of critical data during the process.

Status of this Document

This document constitutes an initial draft for discussion. Comments may be sent to tmx2@lisa.org.

Table of Contents


Abstract

1. Introduction
1.1. XML Compliance
1.2. Character Encoding
1.3 Extensibility
1.3.1. Extension Points

2. General Structure
2.1. Header
2.2. Body

3. Detailed Specifications
3.1. Elements
3.1.1. Structural Elements
3.1.2. Inline Elements
3.2. Attributes
3.2.1. TMX Attributes
3.2.2. XML Namespace Attributes

4. Content Markup
4.1. Overview
4.2. Selection Rules for Inline Elements

5. TMX Compliance
5.1 Validation of TMX Files

6. Changes Since Previous Version (Non-Normative)
6.1 Backwards Compatibility

Appendices
A. Sample Document
B. XML Schema for TMX
C. Glossary
D. References
Normative
Non-Normative


1. Introduction

TMX is defined in two parts:

  • A specification of the format of the container (the higher-level elements that provide information about the file as a whole and about entries). In TMX, an entry consisting of aligned segments of text in two or more languages is called a Translation Unit (the <tu> element).

  • A specification of a low-level meta-markup format for the content of a segment of translation-memory text. In TMX, an individual segment of translation-memory text in a particular language is denoted by a <seg> element. See the section on Content Markup for more details.

1.1. XML Compliance

TMX is XML-compliant. The TMX vocabulary is defined using an XML Schema (see Appendix B). It also uses various third-party standards for date/time and language codes. See the References section for more details.

TMX files are intended to be created automatically by export routines and processed automatically by import routines. TMX files are "well-formed" XML documents that can be processed without explicit reference to the TMX Schema. However, a "valid" TMX file must conform to the TMX Schema, and any suspicious TMX file should be verified against the TMX Schema using a validating XML parser.

Since XML syntax is case sensitive, any XML application must define casing conventions. All element and attribute names in TMX are defined in lowercase.

The namespace for TMX 2.0 is defined as "http://www.lisa.org/tmx20". For example, if you want to use TMX in another XML document, your document would look like this:

<?xml version="1.0"?>
<myformat xmlns:tmx="http://www.lisa.org/tmx20">
<data>
  <tmx:tmx version="2.0">
    <tmx:header ...
       ... TMX data ... 
    </tmx:body>      				
  </tmx:tmx>
</data>
</myformat>
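
As a non-normative sketch, the following Python fragment shows how a consumer might locate embedded TMX data by namespace URI rather than by prefix. Only the namespace URI comes from this section; the sample content (with header attributes omitted for brevity) and the helper name find_tmx_roots are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

TMX_NS = "http://www.lisa.org/tmx20"  # namespace URI given in this section

# Hypothetical host document; header attributes are omitted for brevity.
SAMPLE = """<?xml version="1.0"?>
<myformat xmlns:tmx="http://www.lisa.org/tmx20">
<data>
  <tmx:tmx version="2.0">
    <tmx:header/>
    <tmx:body/>
  </tmx:tmx>
</data>
</myformat>"""


def find_tmx_roots(xml_text):
    """Return every <tmx> element bound to the TMX 2.0 namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree spells qualified names as {namespace-uri}local-name,
    # so the lookup works whatever prefix the host document chose.
    return list(root.iter("{%s}tmx" % TMX_NS))


roots = find_tmx_roots(SAMPLE)
print(len(roots), roots[0].get("version"))
```

Matching on the URI means the lookup keeps working if the host document binds the namespace to a prefix other than "tmx".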
		

1.2. Character Encoding

TMX files are always in Unicode. They can use any of three encodings: UTF-16 (16-bit files), UTF-8 (8-bit files) or ISO-646 [a.k.a. US-ASCII] (7-bit files).

In all cases, unlike in HTML, only the following five character entity references are allowed: &amp; (&), &lt; (<), &gt; (>), &apos; ('), and &quot; ("). For 7-bit files, extended (non-ASCII) characters are always represented by numeric character references. For example: &#x0396; or &#918; for a GREEK CAPITAL LETTER ZETA.
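
The conversion of non-ASCII characters for a 7-bit file could be sketched in Python as follows. The function name to_seven_bit is hypothetical, and the sketch assumes the markup-significant characters (&, <, >) have already been escaped separately.

```python
def to_seven_bit(text):
    """Encode text as US-ASCII, replacing other characters with numeric references."""
    # The "xmlcharrefreplace" error handler emits decimal references such as &#918;.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_seven_bit("\u0396"))  # the character U+0396
```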

Since all XML processors must accept the UTF-8 and UTF-16 encodings and since US-ASCII is a subset of UTF-8, a TMX document can omit the encoding declaration in the XML declaration.

Note that UTF-16 files always start with the Unicode byte-order-mark (BOM) value: U+FEFF.
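
As an illustration only, a reader could test for the byte-order mark before handing the data to an XML parser. The helper name has_utf16_bom is an assumption of this sketch; the two constants come from the standard Python codecs module.

```python
import codecs


def has_utf16_bom(data):
    """Return True if the byte string starts with U+FEFF encoded as UTF-16."""
    # U+FEFF is FF FE in little-endian order and FE FF in big-endian order.
    return data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))


print(has_utf16_bom("<tmx/>".encode("utf-16")))
```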

1.3 Extensibility

Although TMX provides a rich set of elements for exchanging Translation Memory data, sometimes it may be necessary to extend the TMX vocabulary using XML Namespaces.

You can add non-TMX elements, as well as attributes and attribute values, to any TMX document. All foreign elements and attributes added to a TMX file must be defined using an XML Schema. All XML Schemas declared in a TMX document must be made available to permit validation of the foreign constructs included in the file.

Although TMX offers this extensibility mechanism, in order to avoid an excess of information and increase interoperability between tools, it is strongly recommended to use TMX capabilities whenever possible, rather than to create non-standard user-defined elements or attributes.

Applications that depend on the TMX format for exchanging Translation Memory data are not required to understand and support non-TMX elements or attributes. A TMX application can safely ignore foreign elements or attributes present in a TMX document.

1.3.1. Extension Points

TMX supports the use of foreign XML elements in the following elements: <body>, <header>, <internal-file>, <tu> and <tuv>.

Foreign attributes can be added to any TMX element, provided that the attribute name is fully qualified with the corresponding namespace prefix.
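
A tool that chooses to ignore foreign constructs might filter them out as in the following sketch. The "ext" namespace, its <review> element and status attribute, and the helper names are all invented for illustration. The sketch treats unqualified names as TMX names, which is an assumption about how the sample is written rather than a rule of this specification.

```python
import xml.etree.ElementTree as ET

TMX_NS = "http://www.lisa.org/tmx20"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical fragment: the "ext" namespace and everything in it are invented.
SAMPLE = """<tu xmlns:ext="http://example.com/ext" ext:status="approved">
  <tuv xml:lang="en"><seg>Hello</seg></tuv>
  <tuv xml:lang="fr"><seg>Bonjour</seg></tuv>
  <ext:review>checked</ext:review>
</tu>"""


def is_foreign(name):
    """True for names qualified with a namespace other than TMX or XML."""
    if not name.startswith("{"):
        return False  # unqualified names are treated as TMX names here
    namespace = name[1:].split("}", 1)[0]
    return namespace not in (TMX_NS, XML_NS)


def tmx_children(element):
    """Return the child elements that are not foreign."""
    return [child for child in element if not is_foreign(child.tag)]


tu = ET.fromstring(SAMPLE)
print([child.tag for child in tmx_children(tu)])
print([name for name in tu.attrib if is_foreign(name)])
```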


2. General Structure

A TMX document is enclosed in a <tmx> root element. The <tmx> element contains two elements: <header> and <body>.

2.1. Header

The <header> contains meta-data about the document. In addition to its attributes, <header> can also store document-level information in <note> and <prop> elements. The SRX segmentation rules used to generate a TMX file can be included in the <header> using a <segmentation> element. Inline codes extracted from <seg> elements and replaced with <g> or <x/> elements are stored in the <header> inside an <inline-data> element.

2.2. Body

The <body> contains the collection of translation units (the <tu> elements). This collection is in no specific order.

Each <tu> element contains at least two translation unit variants (the <tuv> element). Each <tuv> contains the segment and the information pertaining to that segment for a given language.

The text itself is stored in the <seg> element, while <note> and <prop> allow you to store information specific to each <tuv>.

A segment can contain markup content elements: The <bpt>, <ept>, <g>, <ph> and <x/> elements allow you to encapsulate or replace original native inline codes. The <hi> element allows you to add extra markup not related to existing inline codes. And the <sub> element, used inside encapsulated inline code, allows you to delimit embedded translatable text.

See the Sample Document section for an example of a TMX document.
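
The general structure described above can be sketched by building a small document programmatically. This is an illustration under stated assumptions, not a reference implementation: the function name build_minimal_tmx is hypothetical, the header attribute values are placeholders, and the elements are left unqualified for brevity.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_minimal_tmx(units, srclang="en"):
    """Build a <tmx> tree from a list of {language: text} dictionaries."""
    tmx = ET.Element("tmx", version="2.0")
    # Header values below are placeholders, not values mandated by the specification.
    ET.SubElement(tmx, "header", {
        "creationtool": "example-tool", "creationtoolversion": "1.0",
        "segtype": "sentence", "o-tmf": "example", "adminlang": "en",
        "srclang": srclang, "datatype": "plaintext"})
    body = ET.SubElement(tmx, "body")
    for variants in units:
        tu = ET.SubElement(body, "tu")
        for lang, text in variants.items():
            # One <tuv> per language, each holding exactly one <seg>.
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx


doc = build_minimal_tmx([{"en": "Hello", "fr": "Bonjour"}])
print(ET.tostring(doc, encoding="unicode"))
```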


3. Detailed Specifications

3.1. Elements

TMX elements are divided into two main categories: the structural elements (the container), and the inline elements (the content markup).

3.1.1. Structural Elements

The structural elements are the following:


<body>

Body - The <body> element encloses the main data: the set of <tu> elements contained in the file.

Required attributes:

None.

Optional attributes:

None.

Contents:

Zero, one or more <tu> elements, followed by
Zero, one or more non-TMX elements.


<context>

Context Information - The <context> element describes the context of a <tu>. The purpose of this context information is to allow certain pieces of text to have different translations depending on where they came from. For example, the translation of a piece of text may differ depending on whether it comes from a web form, a dialog, an Oracle form or a Lotus form. This information is thus required by a translator when working on the file. Likewise, the information may be used by any tool that attempts to leverage the text automatically.

Required attributes:

context-type.

Optional attributes:

None.

Contents:

Text.


<external-file>

External file - The <external-file> element specifies the location of the actual SRX file being referenced. The required href attribute provides a URL to the file. The crc attribute accepts a value that can be used to assure the integrity of the file. The optional uid attribute allows a unique ID to be assigned to the file.

Required attributes:

href.

Optional attributes:

crc, uid.

Contents:

Empty.


<header>

File header - The <header> element contains information pertaining to the whole document.

Required attributes:

creationtool, creationtoolversion, segtype, o-tmf, adminlang, srclang, datatype.

Optional attributes:

o-encoding, creationdate, creationid, changedate, changeid.

Contents:

Zero, one or more <note> or <prop> elements in any order, followed by
Zero or one <inline-data> element, followed by
Zero or one <segmentation> element, followed by
Zero, one or more non-TMX elements.


<inline-data>

Inline data - The <inline-data> element holds the elements with the information necessary to rebuild the inline tags in a translated document.

Required attributes:

None.

Optional attributes:

None.

Contents:

One or more <tag> elements.


<internal-file>

Internal file - The <internal-file> element contains the actual SRX file with the segmentation rules used when generating the TMX document.

Required attributes:

None.

Optional attributes:

None.

Contents:

One SRX file embedded using SRX namespace.


<note>

Note - The <note> element is used for comments.

Required attributes:

None.

Optional attributes:

creationdate, creationid, changedate, changeid, o-encoding, xml:lang.

Contents:

Text.


<prop>

Property - The <prop> element is used to define the various properties of the parent element (or of the document when <prop> is used in the <header> element). These properties are not defined by the standard.

As your tool is fully responsible for handling the content of a <prop> element, you can use it in any way you wish. For example, the content can be a list of instructions your tool can parse, not just simple text.

<prop name="user-defined">name:domain value:Computer science</prop>
<prop name="x-domain">Computer science</prop>

The <prop> element may be deprecated in future versions of the TMX standard. Use attributes defined in a namespace different from TMX instead. See the Extensibility section for more information.

Required attributes:

name.

Optional attributes:

xml:lang, o-encoding.

Contents:

Tool-specific data or text.


<seg>

Segment - The <seg> element contains the text of the given segment. There is no length limitation to the content of a <seg> element. All spacing and line-breaking characters are significant within a <seg> element.

Required attributes:

None.

Optional attributes:

xml:space.

Contents:

Text data (without leading or trailing white space characters),
Zero, one or more of the following elements: <bpt>, <ept>, <g>, <hi>, <ph> and <x/>.
They can be in any order, except that each <bpt> element must have a subsequent corresponding <ept> element.


<segmentation>

Segmentation - The <segmentation> element points to or contains the SRX segmentation rules that were used in the generation of the TMX file.

Required attributes:

None.

Optional attributes:

None.

Contents:

Either exactly one <internal-file> or one <external-file> element.


<tag>

Tag - The <tag> element contains the actual inline information represented with <g> and <x/> in <seg> elements.

Required attributes:

id, type.

Optional attributes:

endmrk, o-encoding.

Contents:

Code data.


<tmx>

TMX document - The <tmx> element encloses all the other elements of the document.

Required attributes:

version.

Contents:

One <header> followed by
One <body> element.


<tu>

Translation unit - The <tu> element contains the data for a given translation unit.

Required attributes:

None.

Optional attributes:

tuid, o-encoding, datatype, usagecount, lastusagedate, creationtool, creationtoolversion, creationdate, creationid, changedate, segtype, changeid, o-tmf, srclang, group, g-order.

Contents:

Zero, one or more <note>, <prop> or <context> elements in any order, followed by
Two or more <tuv> elements, followed by
Zero, one or more non-TMX elements.


<tuv>

Translation Unit Variant - The <tuv> element specifies text in a given language.

Required attributes:

xml:lang.

Optional attributes:

o-encoding, datatype, usagecount, lastusagedate, creationtool, creationtoolversion, creationdate, creationid, changedate, changeid, o-tmf, xml:space.

Contents:

Zero, one or more <note>, or <prop> elements in any order, followed by
One <seg> element, followed by
Zero, one or more non-TMX elements.
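
The content rules for <tu> and <tuv> can be checked with a few lines of code. The following sketch is illustrative only: the function name check_tu and its messages are hypothetical, it covers just the rules about <tuv> count, xml:lang and <seg>, and it is not a substitute for validation against the TMX Schema.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def check_tu(tu):
    """Return a list of problems found in one <tu> element."""
    problems = []
    tuvs = tu.findall("tuv")
    if len(tuvs) < 2:
        problems.append("fewer than two <tuv> elements")
    for position, tuv in enumerate(tuvs, start=1):
        if tuv.get(XML_LANG) is None:
            problems.append("<tuv> %d has no xml:lang" % position)
        if len(tuv.findall("seg")) != 1:
            problems.append("<tuv> %d must contain exactly one <seg>" % position)
    return problems


good = ET.fromstring(
    '<tu><tuv xml:lang="en"><seg>Hello</seg></tuv>'
    '<tuv xml:lang="fr"><seg>Bonjour</seg></tuv></tu>')
bad = ET.fromstring('<tu><tuv><seg>Hello</seg></tuv></tu>')
print(check_tu(good))
print(check_tu(bad))
```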


3.1.2. Inline Elements

The inline elements are the elements that can appear inside a segment. With the exception of the <hi> and <sub> elements, they all enclose or replace formatting or control codes that are not text but reside within the segment. See also the Content Markup section for more information.

The inline elements are the following:


<bpt>

Begin paired tag - The <bpt> element is used to delimit the beginning of a paired sequence of native codes. Each <bpt> has a corresponding <ept> element within the segment. A <bpt> element must contain a <sub> element if the matching <ept> does not contain one.

Required attributes:

i, type.

Optional attributes:

equiv-text, x.

Contents:

Code data,
Zero, one or more <sub> elements.


<ept>

End paired tag - The <ept> element is used to delimit the end of a paired sequence of native codes. Each <ept> has a corresponding <bpt> element within the segment. An <ept> element must contain a <sub> element if the matching <bpt> does not contain one.

Required attributes:

i.

Optional attributes:

equiv-text.

Contents:

Code data,
Zero, one or more <sub> elements.


<g>

Generic group placeholder - The <g> element is used to replace any inline code of the original document that has a beginning and an end, does not overlap other paired inline codes and can be moved within its parent structural element. The actual inline data is stored in <tag> elements in the header of the file. The required xid attribute is used to reference the <tag> element that contains the replaced code.

Required attributes:

xid, type.

Optional attributes:

equiv-text, x.

Contents:

Text data.


<hi>

Highlight - The <hi> element delimits a section of text that has special meaning, such as a terminological unit, a proper name, an item that should not be modified, etc. It can be used for various processing tasks, for example to indicate to a Machine Translation tool which proper names should not be translated, to support terminology verification, or to mark suspect expressions after grammar checking.

Required attributes:

type.

Optional attributes:

x, comment.

Contents:

Text data,
Zero, one or more of the following elements: <bpt>, <ept>, <g>, <hi>, <ph> and <x/>.
They can be in any order, except that each <bpt> element must have a subsequent corresponding <ept> element.


<ph>

Placeholder - The <ph> element is used to delimit a sequence of native standalone codes in the segment, or the initial or ending portion of a paired tag that does not have its matching code within the segment, that contains embedded translatable text.

Required attributes:

type.

Optional attributes:

x, assoc, equiv-text.

Contents:

Code data,
One or more <sub> elements.


<sub>

Sub-flow - The <sub> element is used to delimit sub-flow text inside a sequence of native code, for example: the definition of a footnote or the text of a title attribute in an HTML anchor element.

Here are some examples. In each one, the sub-flow text is the content of the <sub> element:

Footnote in RTF:

Original RTF:

Elephants{\cs16\super \chftn {\footnote \pard\plain \s15\widctlpar \f4\fs20
{\cs16\super \chftn } An elephant is a very large animal. }} are big.

TMX with content mark-up:

Elephants<ph type="fnote">{\cs16\super \chftn {\footnote \pard\plain \s15\widctlpar \f4\fs20
{\cs16\super \chftn } <sub type="fnote">An elephant is a very large animal. </sub>}}</ph> are big.

Index marker in RTF:

Original RTF:

Elephants{\pard\plain \widctlpar
\v\f4\fs20 {\xe {Big animal\bxe }}} are big.

TMX with content mark-up:

Elephants<ph type="index">{\pard\plain \widctlpar
\v\f4\fs20 {\xe {<sub type="index">Big animal </sub>\bxe }}}</ph> are big.

Text of an attribute in an HTML element:

Original HTML:

See the <A TITLE="Go to Notes "
HREF="notes.htm">Notes</A> for more details.

TMX with content mark-up:

See the <bpt i="1" type="link">&lt;A TITLE="<sub type="link">Go to Notes </sub>"
HREF="notes.htm"></bpt>Notes<ept i="1">&lt;/A></ept> for more details.

Note that sub-flows are related to segmentation and can cause interoperability issues when one tool keeps the sub-flow within its main segment, while another extracts the sub-flow text as an independent segment.

Required attributes:

type.

Optional attributes:

datatype.

Contents:

Text data,
Zero, one or more of the following elements: <bpt>, <ept>, <g>, <ph>, <x/> and <hi>.
They can be in any order, except that each <bpt> element must have a subsequent corresponding <ept> element.


<x/>

Generic placeholder - The <x/> element is used to replace any inline code of the original document. The actual inline data is stored in <tag> elements in the header of the file. The required xid attribute is used to reference the <tag> element that contains the replaced code.

Required attributes:

xid, type.

Optional attributes:

equiv-text, x.

Contents:

Empty.
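
The relationship between an <x/> placeholder and the <tag> element it references can be sketched as follows. The sample document, the id value "t1" and the function name restore_codes are invented for illustration; header attributes are omitted for brevity, and the sketch handles only <x/> placeholders, not <g> or the encapsulating elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the id value "t1" and the stored code are invented.
SAMPLE = """<tmx version="2.0">
  <header>
    <inline-data>
      <tag id="t1" type="lb">&lt;br/&gt;</tag>
    </inline-data>
  </header>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>First line<x xid="t1" type="lb"/>second line</seg></tuv>
      <tuv xml:lang="fr"><seg>Ligne un<x xid="t1" type="lb"/>ligne deux</seg></tuv>
    </tu>
  </body>
</tmx>"""


def restore_codes(seg, tags):
    """Rebuild the native text of a <seg> that only uses <x/> placeholders."""
    parts = [seg.text or ""]
    for child in seg:
        if child.tag == "x":
            # The xid attribute names the <tag> that holds the replaced code.
            parts.append(tags[child.get("xid")])
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(SAMPLE)
tags = {tag.get("id"): tag.text or "" for tag in root.iter("tag")}
seg = root.find("body/tu/tuv/seg")
print(restore_codes(seg, tags))
```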

3.2. Attributes

This section lists the various attributes used in the TMX elements.

3.2.1. TMX Attributes

adminlang

Administrative language - Specifies the default language for the administrative and informative elements <note> and <prop>.

Value description:

A language code as described in [RFC 4646]. Unlike the other TMX attributes, the values for adminlang are not case-sensitive.

Default value:

Undefined.

Used in:

<header>.


assoc

Association - Indicates whether a <ph> is associated with the text before it, the text after it, or both.

Value description:

"p" (the element is associated with the text preceding the element), "f" (the element is associated with the text following the element), or "b" (the element is associated with the text on both sides).

Default value:

Undefined.

Used in:

<ph>.


changedate

Change date - Specifies the date of the last modification of the element.

Value description:

Date in [ISO 8601] Format. The recommended pattern to use is: YYYYMMDDThhmmssZ
Where: YYYY is the year (4 digits), MM is the month (2 digits), DD is the day (2 digits), hh is the hours (2 digits), mm is the minutes (2 digits), ss is the second (2 digits), and Z indicates the time is UTC time. For example:

date="20020125T210600Z"
is January 25, 2002 at 9:06pm GMT
is January 25, 2002 at 2:06pm US Mountain Time
is January 26, 2002 at 6:06am Japan time

Default value:

Undefined.

Used in:

<header>, <tu>, <tuv>.
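
The recommended date pattern can be parsed with standard library calls, as in this sketch. The function name parse_tmx_date is hypothetical, and the fixed UTC offsets used for US Mountain Time and Japan time are assumptions that match the January example above only.

```python
from datetime import datetime, timedelta, timezone

TMX_DATE_PATTERN = "%Y%m%dT%H%M%SZ"  # YYYYMMDDThhmmssZ


def parse_tmx_date(value):
    """Parse a date in the recommended pattern into an aware UTC datetime."""
    return datetime.strptime(value, TMX_DATE_PATTERN).replace(tzinfo=timezone.utc)


stamp = parse_tmx_date("20020125T210600Z")
# Fixed offsets are used for illustration; they ignore daylight saving time.
mountain = stamp.astimezone(timezone(timedelta(hours=-7)))
japan = stamp.astimezone(timezone(timedelta(hours=9)))
print(stamp.isoformat())
print(mountain.isoformat())
print(japan.isoformat())
```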


changeid

Change identifier - Specifies the identifier of the user who modified the element last.

Value description:

Text.

Default value:

Undefined.

Used in:

<header>, <tu>, <tuv>.


comment

Comment - Contains a comment attached to the element.

Value description:

Text.

Default value:

Undefined.

Used in:

<hi>.


context-type

Context type - The context-type attribute specifies the context and the type of resource or style of the data of a given element. For example, it can indicate whether the data is a label or a menu item in the case of resource-type data, or the style in the case of document-related data.

Value description:

Text without spaces. Pre-defined values are as follows:

database

Indicates database content.

element

Indicates the content of an element within an XML document.

elementtitle

Indicates the name of an element within an XML document.

linenumber

Indicates the line number from the sourcefile (see context-type="sourcefile") where the source text is found.

numparams

Indicates the number of parameters contained within the source text.

paramnotes

Indicates notes pertaining to the parameters in the source text.

record

Indicates the content of a record within a database.

recordtitle

Indicates the name of a record within a database.

sourcefile

Indicates the original source file from which the TMX file is created.

In addition, user-defined values can be used with this attribute. A user-defined value must start with an "x-" prefix.

Default value:

Undefined.

Used in:

<context>.
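
A check of context-type values against the rules above might look like the following sketch. The function name is_valid_context_type is hypothetical; the set of pre-defined values is copied from the list in this entry.

```python
PREDEFINED_CONTEXT_TYPES = {
    "database", "element", "elementtitle", "linenumber", "numparams",
    "paramnotes", "record", "recordtitle", "sourcefile",
}


def is_valid_context_type(value):
    """Accept pre-defined values and user-defined values that start with "x-"."""
    if not value or any(ch.isspace() for ch in value):
        return False  # the value must be text without spaces
    return value in PREDEFINED_CONTEXT_TYPES or value.startswith("x-")


for candidate in ("sourcefile", "x-dialog", "dialog", "x-my form"):
    print(candidate, is_valid_context_type(candidate))
```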


creationdate

Creation date - Specifies the date of creation of the element.

Value description:

Date in [ISO 8601] Format. The recommended pattern to use is: YYYYMMDDThhmmssZ
Where: YYYY is the year (4 digits), MM is the month (2 digits), DD is the day (2 digits), hh is the hours (2 digits), mm is the minutes (2 digits), ss is the second (2 digits), and Z indicates the time is UTC time. For example:

date="20020125T210600Z"
is January 25, 2002 at 9:06pm GMT
is January 25, 2002 at 2:06pm US Mountain Time
is January 26, 2002 at 6:06am Japan time

Default value:

Undefined.

Used in:

<header>, <tu>, <tuv>.


creationid

Creation identifier - Specifies the identifier of the user who created the element.

Value description:

Text.

Default value:

Undefined.

Used in:

<header>, <tu>, <tuv>.


creationtool

Creation tool - Identifies the tool that created the TMX document. Its possible values are not specified by the standard but each tool provider should publish the string identifier it uses.

Value description:

Text.

Default value:

Undefined.

Used in:

<header>, <tu>, <tuv>.


creationtoolversion

Creation tool version - Identifies the version of the tool that created the TMX document. Its possible values are not specified by the standard but each tool provider should publish the string identifier it uses.

Value description:

Text.

Default value:

Undefined.

Used in:

<header>, <tu>, <tuv>.


crc

Cyclic redundancy checking - A private value used to verify data as it is returned to the producer. The generation and verification of this number is tool-specific.

Value description:

Number (possibly not decimal).

Default value:

Undefined.

Used in:

<external-file>.


datatype

Data type - Specifies the type of data contained in the element. Depending on that type, you may apply different processes to the data.

Value description:

Text. The recommended values for the datatype attribute are as follows:

unknown

undefined (default)

alptext

WinJoust data.

cdf

Channel Definition Format.

cmx

Corel CMX Format.

cpp

C and C++ style text.

hptag

HP-Tag.

html

HTML, DHTML, etc.

interleaf

Interleaf documents.

ipf

IPF/BookMaster.

java

Java, source and property files.

javascript

JavaScript, ECMAScript scripts.

lisp

Lisp.

mif

FrameMaker MIF, MML, etc.

opentag

OpenTag data.

pascal

Pascal, Delphi style text.

plaintext

Plain text.

pm

PageMaker.

resx

Windows .NET resources.

rtf

Rich Text Format.

sgml

SGML.

stf-f

S-Tagger for FrameMaker.

stf-i

S-Tagger for Interleaf.

transit

Transit data.

vbscript

Visual Basic scripts.

winres

Windows resources from RC, DLL, EXE.

xliff

XLIFF (XML Localization Interchange File Format).

xml

XML.

xptag

Quark XPressTag.

In addition, user-defined values can be used with this attribute. A user-defined value must start with an "x-" prefix.

Default value:

"unknown".

Used in:

<header>, <tu>, <tuv>, <sub>.


endmrk

End marker - The "endmrk" attribute contains the formatting code of a closing tag replaced by a <g> element.

Value description:

Text.

Used in:

<tag>.


equiv-text

Equivalent text - Indicates the equivalent text to substitute in place of an inline tag.

Value description:

Text.

Used in:

<bpt>, <ept>, <ph>, <g> and <x/>.


group

Group identifier - Indicates that a given <tu> element belongs to a logical group of related translation units.

Value description:

Text without spaces.

Used in:

<tu>.


g-order

Group order - Defines the order of the <tu> within a given logical group. Used together with the group attribute.

Value description:

Number starting at 1 and incremented in steps of 1 unit. Must be unique within each logical group defined with the group attribute. Its initial value is reset to 1 in each logical group.

Used in:

<tu>.
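
The numbering rule for g-order can be verified per group, as in this sketch. The function name check_group_order is hypothetical, and the list of (group, g-order) pairs is a simplified stand-in for the attributes read from <tu> elements.

```python
from collections import defaultdict


def check_group_order(pairs):
    """Return the groups whose g-order values are not exactly 1, 2, ..., n.

    `pairs` is a list of (group, g_order) tuples taken from <tu> elements.
    """
    orders = defaultdict(list)
    for group, g_order in pairs:
        orders[group].append(int(g_order))
    return sorted(
        group for group, values in orders.items()
        if sorted(values) != list(range(1, len(values) + 1)))


print(check_group_order([("menu", "1"), ("menu", "2"), ("help", "1")]))
print(check_group_order([("menu", "1"), ("menu", "1")]))
```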


i

Internal matching - The "i" attribute is used to pair <bpt> elements with <ept> elements. This mechanism lets TMX mark up ranges of codes that may overlap. Such constructions are not used often, but several formats allow them. For example, the following HTML segment, even if not strictly legal, is accepted by some HTML editors and usually interpreted correctly by browsers:

[----------------------------]
<B>Bold <I>Bold and Italic</B> Italics</I>
        [--------------------------------]

With the TMX content mark-up, since the <ept> element does not have a type, it can be difficult to know which sequence of codes it closes as illustrated by the following segment:

TMX (with incomplete content mark-up):

<bpt> &lt;B></bpt> Bold,
<bpt> &lt;I></bpt> Bold+Italic<ept> &lt;/B></ept> ,
Italic<ept> &lt;/I></ept>

The attribute i is used to specify which <ept> is closing which <bpt>:

TMX (with correct content mark-up):

<bpt i="1" x="1" type="bold"> &lt;B></bpt> Bold,
<bpt i="2" x="1" type="italic"> &lt;I></bpt> Bold+Italic<ept i="1"> &lt;/B></ept> ,
Italic<ept i="2"> &lt;/I></ept>

Value description:

Number starting at 1 and incremented in steps of 1 unit. Must be unique for each <bpt> within a given <seg> element. Its initial value is reset to 1 in every <seg> element.

Default value:

Undefined.

Used in:

<bpt>, <ept>.
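
The pairing of <bpt> and <ept> through the i attribute can be checked in document order, as in the following sketch. The function name unmatched_pairs is hypothetical; the sample segment is adapted from the overlapping example above, with the x attribute left out.

```python
import xml.etree.ElementTree as ET


def unmatched_pairs(seg):
    """Return the i values that break the <bpt>/<ept> pairing rules."""
    seen, open_codes, problems = set(), set(), []
    for element in seg.iter():  # iter() walks the tree in document order
        i = element.get("i")
        if element.tag == "bpt":
            if i in seen:
                problems.append(i)  # i must be unique for each <bpt>
            seen.add(i)
            open_codes.add(i)
        elif element.tag == "ept":
            if i in open_codes:
                open_codes.remove(i)
            else:
                problems.append(i)  # <ept> without a preceding <bpt>
    return problems + sorted(open_codes)  # leftovers are <bpt> never closed


overlapping = ET.fromstring(
    '<seg><bpt i="1" type="bold">&lt;B&gt;</bpt>Bold, '
    '<bpt i="2" type="italic">&lt;I&gt;</bpt>Bold+Italic'
    '<ept i="1">&lt;/B&gt;</ept>, Italic<ept i="2">&lt;/I&gt;</ept></seg>')
unclosed = ET.fromstring('<seg><bpt i="1" type="bold">&lt;B&gt;</bpt>Bold</seg>')
print(unmatched_pairs(overlapping))
print(unmatched_pairs(unclosed))
```

Because the check keys on the i value rather than on nesting, the overlapping bold and italic ranges are accepted.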


href

Hypertext reference - The "href" attribute contains a valid URL that describes the location of a file.

Value description:

Text.

Default value:

Undefined.

Used in:

<external-file>.


id

Identifier - The "id" attribute is used in <tag> elements as unique identifier. The value of the "id" attribute is determined by the tool creating the TMX file and must be unique within the document.

Value description:

Text, matching the Name production as required by the ID attribute type in the XML standard.

Default value:

Undefined.

Used in:

<tag>.


lastusagedate

Last usage date - Specifies the last time the content of a <tu> or <tuv> element was used in the original translation memory environment.

Value description:

Date in [ISO 8601] Format. The recommended pattern to use is: YYYYMMDDThhmmssZ
Where: YYYY is the year (4 digits), MM is the month (2 digits), DD is the day (2 digits), hh is the hours (2 digits), mm is the minutes (2 digits), ss is the second (2 digits), and Z indicates the time is UTC time. For example:

date="20020125T210600Z"
is January 25, 2002 at 9:06pm GMT
is January 25, 2002 at 2:06pm US Mountain Time
is January 26, 2002 at 6:06am Japan time

Default value:

Undefined.

Used in:

<tu>, <tuv>.


name

Property name - Tool-specific name used to identify the type of a <prop> element.

Value description:

Text.

Default value:

Undefined.

Used in:

<prop>.


o-encoding

Original encoding - As stated in the Character Encoding section, all TMX documents are in Unicode. However, it is sometimes useful to know what code set was used to encode text that was converted to Unicode for purposes of interchange. The o-encoding attribute specifies the original or preferred code set of the data of the element in case it is to be re-encoded in a non-Unicode code set.

Value description:

One of the [IANA] recommended "charset identifiers", if possible.

Default value:

Undefined.

Used in:

<header>, <tag>, <tu>, <tuv>, <note>, <prop>.


o-tmf

Original translation memory format - Specifies the format of the translation memory file from which the TMX document or segment thereof has been generated.

Value description:

Text.

Default value:

Undefined.

Used in:

<header>, <tu>, <tuv>.


segtype

Segment type - Specifies the kind of segmentation used in the <tu> element. If a <tu> element does not have a segtype attribute specified, it uses the one defined in the <header> element.

The "block" value is used when the segment does not correspond to one of the other values, for example when you want to store a chapter composed of several paragraphs in a single <tu>.

<tu segtype="block">
<prop name="x-sentbreak">$#$</prop>
<tuv xml:lang="en"><seg>This is the first paragraph of a big section.$#$
This is the second paragraph.$#$This is the third.</seg></tuv>
</tu>

In the example above the property "x-sentbreak" defines the token used to indicate the separation between sentences within the block of text. You can therefore easily break down the segment into smaller units if needed. You can imagine many other ways to use this mechanism.

A TMX file can include sentence-level segmentation for maximum portability, so it is recommended that you use such segmentation rather than a specific, proprietary method like the one above.

The rules on how the text was segmented can be carried in a Segmentation Rules eXchange (SRX) document.

Value description:

"block", "paragraph", "sentence", or "phrase".

Default value:

Undefined.

Used in:

<header>, <tu>.
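
The two mechanisms described in this entry, inheriting segtype from the <header> and breaking a "block" segment on a tool-defined token, can be sketched as follows. Both function names are hypothetical, the attribute dictionaries stand in for parsed elements, and the "$#$" token is the tool-specific convention from the example above, not something defined by TMX.

```python
def effective_segtype(tu_attributes, header_attributes):
    """Return the segtype of a <tu>, falling back to the <header> value."""
    return tu_attributes.get("segtype", header_attributes["segtype"])


def split_block(text, token):
    """Split a block segment on the token carried in the x-sentbreak property."""
    return [part.strip() for part in text.split(token)]


header = {"segtype": "sentence"}
print(effective_segtype({}, header))
print(effective_segtype({"segtype": "block"}, header))

block = ("This is the first paragraph of a big section.$#$\n"
         "This is the second paragraph.$#$This is the third.")
print(split_block(block, "$#$"))
```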


srclang

Source language - Specifies the language of the source text. In other words, the <tuv> holding the source segment will have its xml:lang attribute set to the same value as srclang (except if srclang is set to "*all*"). If a <tu> element does not have a srclang attribute specified, it uses the one defined in the <header> element.

Value description:

A language code as described in [RFC 4646], or the value "*all*" if any language can be used as the source language. Unlike the other TMX attributes, the values for srclang are not case-sensitive.

Default value:

Undefined.

Used in:

<header>, <tu>.
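
Locating the source variant of a translation unit could be sketched as below. The function name source_tuv is hypothetical. Comparing xml:lang and srclang without regard to case is an assumption of this sketch, based on the statement that srclang values are not case-sensitive.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def source_tuv(tu, header_srclang):
    """Return the <tuv> holding the source segment, or None if there is none."""
    # A srclang on the <tu> takes precedence over the one in the <header>.
    srclang = tu.get("srclang", header_srclang).lower()
    if srclang == "*all*":
        return None  # any language can be used as the source language
    for tuv in tu.findall("tuv"):
        if (tuv.get(XML_LANG) or "").lower() == srclang:
            return tuv
    return None


tu = ET.fromstring(
    '<tu><tuv xml:lang="en-US"><seg>Hello</seg></tuv>'
    '<tuv xml:lang="fr"><seg>Bonjour</seg></tuv></tu>')
print(source_tuv(tu, "EN-us").find("seg").text)
print(source_tuv(tu, "*all*"))
```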


tuid

Translation unit identifier - Specifies an identifier for the <tu> element. Its value must be unique within the file.

Value description:

Text without spaces.

Default value:

Undefined.

Used in:

<tu>.


type

Type - Specifies the kind of data a <bpt>, <g>, <hi>, <ph>, <sub> or <x/> element represents.

Value description:

Text. Depends on the element where the attribute is used.

The recommended values for the type attribute, when used in <bpt> and <g>, are as follows:

bold

Bold.

color

Color change.

dulined

Double-underlined.

font

Font change.

italic

Italic.

link

Linked text.

scap

Small caps.

struct

XML/SGML structure.

ulined

Underlined.

xliff-bpt

XLIFF <bpt> tag.

xliff-g

XLIFF <g> tag.

The recommended values for the type attribute, when used in <ph> and <x/>, are as follows:

index

Index marker.

date

Date.

time

Time.

fnote

Footnote.

enote

End-note.

alt

Alternate text.

image

Image.

pb

Page break.

lb

Line break.

cb

Column break.

inset

Inset.

xliff-bx

XLIFF <bx/> tag.

xliff-ex

XLIFF <ex/> tag.

xliff-it

XLIFF <it> tag.

xliff-ph

XLIFF <ph> tag.

xliff-x

XLIFF <x/> tag.

The recommended values for the type attribute, when used in