TMX 2.0 Specification
Draft
OSCAR Working Draft - October 15, 2007
This version:
http://www.heartsome.org/tmx/tmx-03282007.html
Latest version:
http://www.heartsome.org/tmx/tmx.html
Previous version:
http://www.lisa.org/standards/tmx/tmx.html
Editor:
Rodolfo M. Raya <rmraya@heartsome.net>
Previous Editors:
Yves Savourel, Alan K. Melby
Copyright © The Localisation Industry Standards Association [LISA] 1997-2007. All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative
works that comment on or otherwise explain it or assist in its implementation may be
prepared, copied, published and distributed, in whole or in part, without restriction of
any kind, provided that the above copyright notice and this paragraph are included on all
such copies and derivative works. However, this document itself may not be modified in any
way, such as by removing the copyright notice or references to LISA.
The limited permissions granted above are perpetual and will not be revoked by LISA or its
successors or assigns.
This document and the information contained herein is provided on an "AS IS" basis and LISA
DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY
THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES
OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Abstract
This document defines version 2.0 of the Translation Memory eXchange format (TMX). The
purpose of the TMX format is to provide a standard method to describe translation memory data
that is being exchanged among tools and/or translation vendors, while introducing little
or no loss of critical data during the process.
Status of this Document
This document constitutes an initial draft for discussion. Comments may be sent to
tmx2@lisa.org.
Table of Contents
Abstract
1. Introduction
1.1. XML Compliance
1.2. Character Encoding
1.3 Extensibility
1.3.1. Extension Points
2. General Structure
2.1. Header
2.2. Body
3. Detailed Specifications
3.1. Elements
3.1.1. Structural Elements
3.1.2. Inline Elements
3.2. Attributes
3.2.1. TMX Attributes
3.2.2. XML Namespace Attributes
4. Content Markup
4.1. Overview
4.2. Selection Rules for Inline Elements
5. TMX Compliance
5.1 Validation of TMX Files
6. Changes Since Previous Version (Non-Normative)
6.1 Backwards Compatibility
Appendices
A. Sample Document
B. XML Schema for TMX
C. Glossary
D. References
Normative
Non-Normative
1. Introduction
TMX is defined in two parts:
A specification of the format of the container (the higher-level elements that
provide information about the file as a whole and about entries). In TMX, an entry
consisting of aligned segments of text in two or more languages is called a
Translation Unit (the
<tu> element).
A specification of a low-level meta-markup format for the content of a segment of
translation-memory text. In TMX, an individual segment of translation-memory
text in a particular language is denoted by a
<seg> element. See the section on
Content Markup for more
details.
1.1. XML Compliance
TMX is XML-compliant. The TMX vocabulary is defined using an XML Schema (see
Appendix B). It also uses various third-party standards for
date/time and language codes. See the
References section for more details.
TMX files are intended to be created automatically by export routines and processed
automatically by import routines. TMX files are "well-formed" XML documents that can be
processed without explicit reference to the TMX Schema. However, a "valid" TMX file must
conform to the TMX Schema, and any suspicious TMX file should be verified against the TMX
Schema using a validating XML parser.
Since XML syntax is case-sensitive, any XML application must define casing conventions.
All TMX element and attribute names are defined in
lowercase.
The namespace for TMX 2.0 is defined as "http://www.lisa.org/tmx20". For example, if you
want to use TMX in another XML document, your document would look like this:
<?xml version="1.0"?>
<myformat xmlns:tmx="http://www.lisa.org/tmx20">
  <data>
    <tmx:tmx version="2.0">
      <tmx:header ... />
      <tmx:body>
        ... TMX data ...
      </tmx:body>
    </tmx:tmx>
  </data>
</myformat>
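Because only the namespace URI identifies TMX content, a consumer can match on the URI rather than on the prefix. The following Python sketch, assuming the standard library's xml.etree.ElementTree, finds the first embedded tmx:tmx element and lists its segments. The host names (myformat, data), the sample segments and the helper find_tmx_segments are hypothetical, and the <header> attributes are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Only the namespace URI identifies TMX 2.0; the "tmx" prefix is arbitrary.
TMX_NS = "http://www.lisa.org/tmx20"
NS = {"tmx": TMX_NS}

# Hypothetical host document (header attributes omitted for brevity).
HOST_DOC = """<?xml version="1.0"?>
<myformat xmlns:tmx="http://www.lisa.org/tmx20">
  <data>
    <tmx:tmx version="2.0">
      <tmx:header/>
      <tmx:body>
        <tmx:tu>
          <tmx:tuv xml:lang="en"><tmx:seg>Hello</tmx:seg></tmx:tuv>
          <tmx:tuv xml:lang="fr"><tmx:seg>Bonjour</tmx:seg></tmx:tuv>
        </tmx:tu>
      </tmx:body>
    </tmx:tmx>
  </data>
</myformat>"""


def find_tmx_segments(xml_text):
    """Return (version, segment texts) of the first embedded tmx:tmx element."""
    root = ET.fromstring(xml_text)
    tmx = root.find(".//tmx:tmx", NS)  # search descendants by namespace URI
    if tmx is None:
        return None, []
    segments = [seg.text or "" for seg in tmx.iterfind(".//tmx:seg", NS)]
    return tmx.get("version"), segments


version, segments = find_tmx_segments(HOST_DOC)
print(version, segments)  # 2.0 ['Hello', 'Bonjour']
```

The same lookup works whatever prefix the host document chooses, since ElementTree resolves prefixes to URIs at parse time.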
1.2. Character Encoding
TMX files are always in Unicode. They can use any of three encoding methods: UTF-16
(16-bit files), UTF-8 (8-bit files) or ISO-646 [a.k.a. US-ASCII] (7-bit files).
In all cases, unlike in HTML, only the following five character entity references are
allowed:
&amp; (&), &lt; (<), &gt; (>), &apos; ('), and &quot; (").
For 7-bit files, extended (non-ASCII) characters are always represented by numeric
character references. For example: &#x0394; or &#916; for GREEK CAPITAL LETTER
DELTA (Δ).
Since all XML processors must accept the UTF-8 and UTF-16 encodings and since US-ASCII is a
subset of UTF-8, a TMX document can omit the encoding declaration in the XML
declaration.
Note that UTF-16 files always start with the Unicode byte-order-mark (BOM) value:
U+FEFF.
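A minimal Python sketch of the two encoding rules above, assuming only the standard library: to_7bit_text escapes &, < and > and turns every remaining non-ASCII character into a decimal numeric character reference, and sniff_encoding picks a codec from a leading byte-order mark. Both function names are hypothetical; sniff_encoding ignores the XML encoding declaration, and accepting a UTF-8 BOM is a tolerance choice, not a TMX requirement.

```python
import codecs
from xml.sax.saxutils import escape


def to_7bit_text(text):
    """Escape character data for a 7-bit (US-ASCII) TMX file.

    escape() turns &, < and > into &amp;, &lt; and &gt;; every remaining
    non-ASCII character becomes a decimal numeric character reference.
    """
    return escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")


def sniff_encoding(data):
    """Guess a codec name for raw TMX bytes from a leading byte-order mark."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # Python's utf-16 codec reads the BOM to pick byte order
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # tolerate an optional UTF-8 BOM
    return "utf-8"  # also covers US-ASCII, a subset of UTF-8


print(to_7bit_text("\u0394 < 5 & more"))  # &#916; &lt; 5 &amp; more
print(sniff_encoding("<tmx/>".encode("utf-16")))  # utf-16
```

Quotes and apostrophes are left alone because they only need escaping inside attribute values, not in character data.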
1.3 Extensibility
Although TMX provides a rich set of elements for exchanging Translation Memory data,
sometimes it may be necessary to extend TMX vocabulary using
XML Namespaces.
You can add non-TMX elements, as well as attributes and attribute values, to any TMX
document. All foreign elements and attributes added to a TMX file must be defined using an XML
Schema. All XML Schemas declared in a TMX document must be made available to permit
validation of the foreign constructs included in the file.
Although TMX offers this extensibility mechanism, in order to avoid an excess of
information and increase interoperability between tools, it is strongly recommended to
use TMX capabilities whenever possible, rather than to create non-standard
user-defined elements or attributes.
Applications that depend on the TMX format for exchanging Translation Memory data are not
required to understand and support non-TMX elements or attributes. A TMX application can
safely ignore foreign elements or attributes present in a TMX document.
1.3.1. Extension Points
TMX supports the use of foreign XML elements in the following elements:
<body>,
<header>,
<internal-file>,
<tu> and
<tuv>.
Foreign attributes can be added to any TMX element, provided that the attribute name is
fully qualified with the corresponding namespace prefix.
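A reader can honour the rule that foreign constructs may be ignored by filtering on namespace URIs while walking the tree. This Python sketch assumes ElementTree's {uri}local naming for qualified names; the urn:example:acme namespace and the acme:* names are hypothetical extensions. It only skips foreign items and does not delete them, so an <internal-file> carrying SRX content stays intact for tools that need it.

```python
import xml.etree.ElementTree as ET

TMX_NS = "http://www.lisa.org/tmx20"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # xml:lang, xml:space


def tmx_children(elem):
    """Yield child elements in the TMX namespace, skipping foreign ones."""
    for child in elem:
        if child.tag.startswith("{%s}" % TMX_NS):
            yield child


def tmx_attributes(elem):
    """Return unqualified (TMX) attributes plus xml:* ones; drop foreign ones."""
    return {
        name: value
        for name, value in elem.attrib.items()
        if not name.startswith("{") or name.startswith("{%s}" % XML_NS)
    }


# Hypothetical <tu> carrying a foreign element and a foreign attribute.
TU = """<tu xmlns="http://www.lisa.org/tmx20" xmlns:acme="urn:example:acme"
    tuid="42" acme:reviewed="yes">
  <acme:workflow state="approved"/>
  <tuv xml:lang="en"><seg>Hello</seg></tuv>
  <tuv xml:lang="de"><seg>Hallo</seg></tuv>
</tu>"""

tu = ET.fromstring(TU)
print(len(list(tmx_children(tu))))  # 2
print(tmx_attributes(tu))  # {'tuid': '42'}
```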
2. General Structure
A TMX document is enclosed in a
<tmx> root element. The
<tmx> element contains two elements:
<header> and
<body>.
2.1. Header
The
<header> contains meta-data about the document. In
addition to its attributes,
<header> can also store document-level information
in
<note> and
<prop> elements. The
SRX segmentation rules used to generate a TMX file can be
included in the
<header> using a
<segmentation> element. Inline codes
extracted from
<seg> elements and replaced with
<g> or
<x/> elements are stored in the
<header> inside an
<inline-data> element.
2.2. Body
The
<body> contains the collection of translation units
(the
<tu> elements). This collection is in no specific
order.
Each
<tu> element contains at least two translation unit
variants (the
<tuv> element). Each
<tuv> contains the segment and the information
pertaining to that segment for a given language.
The text itself is stored in the
<seg> element, while
<note> and
<prop> allow you to store information specific to each
<tuv>.
A segment can contain markup content elements: The
<bpt>,
<ept>,
<g>,
<ph> and
<x/> elements allow you to encapsulate or replace
original native inline codes. The
<hi> element allows you to add extra markup not related to
existing inline codes. And the
<sub> element, used inside encapsulated inline code,
allows you to delimit embedded translatable text.
See the
Sample Document section for an example of a TMX
document.
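The structure described in this section can be assembled with Python's xml.etree.ElementTree, as in the following sketch: a <tmx> root, a <header> carrying the attributes that section 3.1.1 lists as required, and a <body> with one <tu> (two <tuv> elements) per pair of segments. The helper names and every attribute value (ExampleTool, ExampleTM, the en/fr language pair) are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

TMX_NS = "http://www.lisa.org/tmx20"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Serialize TMX elements in the default namespace instead of "ns0:".
ET.register_namespace("", TMX_NS)


def q(name):
    """Qualify a TMX element name with the TMX 2.0 namespace URI."""
    return "{%s}%s" % (TMX_NS, name)


def build_tmx(pairs, srclang="en", tgtlang="fr"):
    """Return a <tmx> element with a header and one <tu> per (source, target)."""
    tmx = ET.Element(q("tmx"), version="2.0")
    ET.SubElement(tmx, q("header"), {
        # The required <header> attributes; all values here are illustrative.
        "creationtool": "ExampleTool",
        "creationtoolversion": "1.0",
        "segtype": "sentence",
        "o-tmf": "ExampleTM",
        "adminlang": "en",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, q("body"))
    for source, target in pairs:
        tu = ET.SubElement(body, q("tu"))  # each <tu> gets two <tuv> elements
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, q("tuv"), {XML_LANG: lang})
            ET.SubElement(tuv, q("seg")).text = text
    return tmx


doc = build_tmx([("Hello", "Bonjour")])
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

Registering the empty prefix makes the output look like a hand-written TMX file, with xmlns="http://www.lisa.org/tmx20" on the root element.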
3. Detailed Specifications
3.1. Elements
TMX elements are divided into two main categories: the structural elements (the
container), and the inline elements (the content markup).
Structural elements
<body>,
<context>,
<external-file>,
<header>,
<inline-data>,
<internal-file>,
<note>,
<prop>,
<seg>,
<segmentation>,
<tag>,
<tmx>,
<tu>,
<tuv>.
Inline elements
<bpt>,
<ept>,
<g>,
<hi>,
<ph>,
<sub>,
<x/>.
3.1.1. Structural Elements
The structural elements are the following:
<body>
Body - The <body> element encloses the main data, the set of
<tu> elements contained in the file.
Required attributes:
None.
Optional attributes:
None.
Contents:
Zero, one or more
<tu> elements, followed by
Zero, one or more non-TMX elements.
<context>
Context Information - The <context> element describes
the context of a
<tu>. The purpose of this context information is to allow
certain pieces of text to have different translations depending on where they came from. For
example, the translation of a piece of text may differ depending on whether it comes from a web
form, a dialog box, an Oracle form or a Lotus form. A translator may therefore need this
information when working on the file, and a tool that automatically leverages existing
translations can use it to select the appropriate one.
Required attributes:
context-type.
Optional attributes:
None.
Contents:
Text.
<external-file>
External file - The <external-file> element specifies
the location of the actual
SRX file being referenced. The required
href attribute provides a URL to the file. The optional
crc attribute accepts a value that can be used to verify the
integrity of the file. The optional
uid attribute allows a unique ID to be assigned to the
file.
Required attributes:
href.
Optional attributes:
crc,
uid.
Contents:
Empty.
<header>
File header - The <header> element contains information
pertaining to the whole document.
Required attributes:
creationtool,
creationtoolversion,
segtype,
o-tmf,
adminlang,
srclang,
datatype.
Optional attributes:
o-encoding,
creationdate,
creationid,
changedate,
changeid.
Contents:
Zero, one or more
<note> or
<prop> elements in any order, followed by
Zero or one
<inline-data> element, followed by
Zero or one
<segmentation> element, followed by
Zero, one or more non-TMX elements.
<inline-data>
Inline data - The <inline-data> element holds the
<tag> elements that contain the information necessary to rebuild the inline codes in a translated
document.
Required attributes:
None.
Optional attributes:
None.
Contents:
One or more
<tag> elements.
<internal-file>
Internal file - The <internal-file> element contains the
actual
SRX file with the segmentation rules used when generating the
TMX document.
Required attributes:
None.
Optional attributes:
None.
Contents:
One SRX file embedded using the SRX namespace.
<note>
Note - The <note> element is used for comments.
Required attributes:
None.
Optional attributes:
creationdate,
creationid,
changedate,
changeid,
o-encoding,
xml:lang.
Contents:
Text.
<prop>
Property - The <prop> element is used to define the various
properties of the parent element (or of the document when
<prop> is used in the
<header> element). These properties are not defined
by the standard.
As your tool is fully responsible for handling the content of a
<prop> element, you can use it in any way you wish. For
example, the content can be a list of instructions your tool can parse, not only simple
text.
<prop name="user-defined">name:domain value:Computer science</prop>
<prop name="x-domain">Computer science</prop>
The
<prop> element may be deprecated in future versions of the
TMX standard. Use attributes defined in a namespace other than the TMX namespace instead. See the
Extensibility section for more
information.
Required attributes:
name.
Optional attributes:
xml:lang,
o-encoding.
Contents:
Tool-specific data or text.
<seg>
Segment - The <seg> element contains the text of the given
segment. There is no length limitation on the content of a <seg> element. All spacing
and line-breaking characters are significant within a <seg> element.
Required attributes:
None.
Optional attributes:
xml:space.
Contents:
Text data (without leading or trailing white-space characters), Zero, one
or more of the following elements:
<bpt>,
<ept>,
<g>,
<hi>,
<ph> and
<x/>. They can be in any order, except that
each
<bpt> element must have a subsequent corresponding
<ept> element.
<segmentation>
Segmentation - The <segmentation> element points to or
contains the
SRX segmentation rules that were used in the generation of the
TMX file.
Required attributes:
None.
Optional attributes:
None.
Contents:
Either exactly one
<internal-file> or one
<external-file> element.
<tag>
Tag - The <tag> element contains the actual inline
information represented with
<g> and
<x/> in
<seg> elements.
Required attributes:
id, type.
Optional attributes:
endmrk, o-encoding.
Contents:
Code data.
<tmx>
TMX document - The <tmx> element encloses all the other
elements of the document.
Required attributes:
version.
Contents:
One
<header> followed by One
<body> element.
<tu>
Translation unit - The <tu> element contains the data for a
given translation unit.
Required attributes:
None.
Optional attributes:
tuid,
o-encoding,
datatype,
usagecount,
lastusagedate,
creationtool,
creationtoolversion,
creationdate,
creationid,
changedate,
segtype,
changeid,
o-tmf,
srclang,
group,
g-order.
Contents:
Zero, one or more
<note>,
<prop> or
<context> elements in any order, followed
by Two or more
<tuv> elements, followed by Zero, one or more non-TMX elements.
<tuv>
Translation Unit Variant - The <tuv> element specifies
text in a given language.
Required attributes:
xml:lang.
Optional attributes:
o-encoding,
datatype,
usagecount,
lastusagedate,
creationtool,
creationtoolversion,
creationdate,
creationid,
changedate,
changeid,
o-tmf,
xml:space.
Contents:
Zero, one or more
<note> or
<prop> elements in any order, followed by
One
<seg> element, followed by
Zero, one or more non-TMX elements.
3.1.2. Inline Elements
The inline elements are the elements that can appear inside a segment. With the
exception of the
<hi> and
<sub> elements, they all enclose or replace
formatting or control codes that are not text but reside within the segment. See also the
Content Markup section for more
information.
The inline elements are the following:
<bpt>
Begin paired tag - The <bpt> element is used to delimit the
beginning of a paired sequence of native codes. Each <bpt> has a corresponding
<ept> element within the segment. A <bpt> element
must contain a
<sub> element if the matching
<ept> does not contain one.
Required attributes:
i, type.
Optional attributes:
equiv-text, x.
Contents:
Code data, Zero, one or more
<sub> elements.
<ept>
End paired tag - The <ept> element is used to delimit the end
of a paired sequence of native codes. Each <ept> has a corresponding
<bpt> element within the segment. An <ept> element
must contain a
<sub> element if the matching
<bpt> does not contain one.
Required attributes:
i.
Optional attributes:
equiv-text.
Contents:
Code data, Zero, one or more
<sub> elements.
<g>
Generic group placeholder - The <g> element is used to
replace any inline code of the original document that has a beginning and an end, does not
overlap other paired inline codes and can be moved within its parent structural element. The
actual inline data is stored in
<tag> elements in the header of the file. The required
xid attribute is used to reference the
<tag> element that contains the replaced code.
Required attributes:
xid, type.
Optional attributes:
equiv-text, x.
Contents:
Text data.
<hi>
Highlight - The <hi> element delimits a section of text
that has special meaning, such as a terminological unit, a proper name, an item that should
not be modified, etc. It can be used for various processing tasks: for example, to indicate to
a Machine Translation tool proper names that should not be translated, to support terminology
verification, or to mark suspect expressions after grammar checking.
Required attributes:
type.
Optional attributes:
x,
comment.
Contents:
Text data, Zero, one or more of the following elements:
<bpt>,
<ept>,
<g>,
<hi>,
<ph> and
<x/>. They can be in any order, except that
each
<bpt> element must have a subsequent corresponding
<ept> element.
<ph>
Placeholder - The <ph> element is used to delimit a
sequence of native standalone codes in the segment, or the initial or ending portion of a
paired tag that does not have its matching code within the segment, that contains embedded
translatable text.
Required attributes:
type.
Optional attributes:
x,
assoc, equiv-text.
Contents:
Code data, One or more
<sub> elements.
<sub>
Sub-flow - The <sub> element is used to delimit sub-flow
text inside a sequence of native code, for example: the definition of a footnote or the text of
the title attribute in an HTML anchor element.
Here are some examples (sub-flow text is enclosed in <sub> elements):
Footnote in RTF:
Original RTF:
Elephants{\cs16\super \chftn {\footnote \pard\plain \s15\widctlpar \f4\fs20 {\cs16\super \chftn }An elephant is a very large animal.}} are big.
TMX with content mark-up:
Elephants<ph type="fnote">{\cs16\super \chftn {\footnote \pard\plain \s15\widctlpar \f4\fs20 {\cs16\super \chftn }<sub type="fnote">An elephant is a very large animal.</sub>}}</ph> are big.
Index marker in RTF:
Original RTF:
Elephants{\pard\plain \widctlpar \v\f4\fs20 {\xe {Big animal\bxe }}} are big.
TMX with content mark-up:
Elephants<ph type="index">{\pard\plain \widctlpar \v\f4\fs20 {\xe {<sub type="index">Big animal</sub>\bxe }}}</ph> are big.
Text of an attribute in an HTML element:
Original HTML:
See the <A TITLE="Go to Notes" HREF="notes.htm">Notes</A> for more details.
TMX with content mark-up:
See the <bpt i="1" type="link">&lt;A TITLE="<sub type="link">Go to Notes</sub>" HREF="notes.htm"&gt;</bpt>Notes<ept i="1">&lt;/A&gt;</ept> for more details.
Note that sub-flows are related to segmentation and can cause interoperability issues
when one tool keeps sub-flow text within its main segment while another extracts it
as an independent segment.
Required attributes:
type.
Optional attributes:
datatype.
Contents:
Text data, Zero, one or more of the following elements:
<bpt>,
<ept>,
<g>,
<ph>,
<x/> and
<hi>. They can be in any order, except that
each
<bpt> element must have a subsequent corresponding
<ept> element.
<x/>
Generic placeholder - The <x/> element is used to replace
any inline code of the original document. The actual inline data is stored in
<tag> elements in the header of the file. The required
xid attribute is used to reference the
<tag> element that contains the replaced code.
Required attributes:
xid, type.
Optional attributes:
equiv-text, x.
Contents:
Empty.
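When exporting a translation, a tool has to put the native codes back in place of <g> and <x/>. The Python sketch below, assuming xml.etree.ElementTree, reads the <tag> elements from <inline-data> and expands one segment. It assumes, following the <tag> definition, that the closing code of a <g> element travels in the endmrk attribute of the referenced <tag>. The function names and sample codes are hypothetical, other inline elements are rejected rather than handled, and the required <header> attributes and the second <tuv> are omitted for brevity.

```python
import xml.etree.ElementTree as ET

TMX_NS = "http://www.lisa.org/tmx20"
NS = {"t": TMX_NS}

# Hypothetical file fragment; required <header> attributes omitted for brevity.
DOC = """<tmx xmlns="http://www.lisa.org/tmx20" version="2.0">
  <header>
    <inline-data>
      <tag id="t1" type="bold" endmrk="&lt;/b&gt;">&lt;b&gt;</tag>
      <tag id="t2" type="image">&lt;img src="logo.png"/&gt;</tag>
    </inline-data>
  </header>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Click <g xid="t1" type="bold">here</g> for the logo: <x xid="t2" type="image"/></seg></tuv>
    </tu>
  </body>
</tmx>"""


def load_inline_data(tmx):
    """Map each <tag> id to (code data, closing code from endmrk or "")."""
    return {
        tag.get("id"): (tag.text or "", tag.get("endmrk", ""))
        for tag in tmx.iterfind("t:header/t:inline-data/t:tag", NS)
    }


def rebuild_native(seg, tags):
    """Return the text of a <seg> with <g> and <x/> expanded to native codes."""
    out = [seg.text or ""]
    for child in seg:
        local = child.tag.split("}", 1)[1]
        if local == "x":
            out.append(tags[child.get("xid")][0])
        elif local == "g":
            code, end = tags[child.get("xid")]
            out.append(code + (child.text or "") + end)  # <g> holds text only
        else:
            raise ValueError("not handled in this sketch: <%s>" % local)
        out.append(child.tail or "")  # text that follows the inline element
    return "".join(out)


tmx = ET.fromstring(DOC)
tags = load_inline_data(tmx)
seg = tmx.find(".//t:seg", NS)
print(rebuild_native(seg, tags))  # Click <b>here</b> for the logo: <img src="logo.png"/>
```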
3.2. Attributes
This section lists the various attributes used in the TMX elements.
TMX attributes
adminlang,
assoc,
changedate,
changeid,
comment,
context-type,
creationdate,
creationid,
creationtool,
creationtoolversion,
crc,
datatype,
endmrk,
equiv-text,
group,
g-order,
href,
i,
id,
lastusagedate,
name,
o-encoding,
o-tmf,
segtype,
srclang,
tuid,
type,
uid,
usagecount,
version,
x.
XML namespace attributes
xml:lang,
xml:space
3.2.1. TMX Attributes
adminlang
Administrative language - Specifies the default language for
the administrative and informative elements
<note> and
<prop>.
Value description:
A language code as described in [RFC 4646]. Unlike
most other TMX attributes, the values for adminlang are not case-sensitive.
Default value:
Undefined.
Used in:
<header>.
assoc
Association - Indicates the association of a
<ph> with the text preceding or following it.
Value description:
"p" (the element is associated with the text preceding the element), "f" (the element is
associated with the text following the element), or "b" (the element is associated with the
text on both sides).
Default value:
Undefined.
Used in:
<ph>.
changedate
Change date - Specifies the date of the last modification of the
element.
Value description:
Date in [ISO 8601] format. The recommended pattern to
use is: YYYYMMDDThhmmssZ
Where: YYYY is the year (4 digits), MM is the month (2 digits), DD is the day (2
digits), hh is the hour (2 digits), mm is the minutes (2 digits), ss is the seconds (2 digits),
and Z indicates that the time is in UTC. For example:
changedate="20020125T210600Z" is January 25, 2002 at 9:06pm GMT, which is
January 25, 2002 at 2:06pm US Mountain Time and January 26, 2002 at 6:06am Japan
time.
Default value:
Undefined.
Used in:
<header>,
<tu>,
<tuv>.
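The recommended pattern maps directly onto Python's datetime format codes. This sketch handles only that pattern, not every ISO 8601 form, and reproduces the time-zone arithmetic of the example above (US Mountain Time taken as UTC-7, Japan as UTC+9); the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Recommended TMX pattern YYYYMMDDThhmmssZ; "T" and "Z" are literal characters.
TMX_DATE_FORMAT = "%Y%m%dT%H%M%SZ"


def parse_tmx_date(value):
    """Parse a value such as "20020125T210600Z" as an aware UTC datetime."""
    return datetime.strptime(value, TMX_DATE_FORMAT).replace(tzinfo=timezone.utc)


def format_tmx_date(moment):
    """Format an aware datetime in the recommended pattern, converting to UTC first."""
    return moment.astimezone(timezone.utc).strftime(TMX_DATE_FORMAT)


utc = parse_tmx_date("20020125T210600Z")
mountain = utc.astimezone(timezone(timedelta(hours=-7)))  # US Mountain Standard Time
japan = utc.astimezone(timezone(timedelta(hours=9)))
print(mountain.isoformat())  # 2002-01-25T14:06:00-07:00
print(japan.isoformat())     # 2002-01-26T06:06:00+09:00
```

The same two helpers apply to creationdate and lastusagedate, which use the same recommended pattern.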
changeid
Change identifier - Specifies the identifier of the user who
last modified the element.
Value description:
Text.
Default value:
Undefined.
Used in:
<header>,
<tu>,
<tuv>.
comment
Comment - A comment in a tag.
Value description:
Text.
Default value:
Undefined.
Used in:
<hi>.
context-type
Context type - The context-type attribute specifies the context
and the type of resource or style of the data of a given element; for example, whether it is a
label or a menu item in the case of resource-type data, or the style in the case of
document-related data.
Value description:
Text without spaces. Pre-defined values are as follows:
database - Indicates database content.
element - Indicates the content of an element within an XML document.
elementtitle - Indicates the name of an element within an XML document.
linenumber - Indicates the line number in the source file (see context-type="sourcefile") where the source text is found.
numparams - Indicates the number of parameters contained within the source text.
paramnotes - Indicates notes pertaining to the parameters in the source text.
record - Indicates the content of a record within a database.
recordtitle - Indicates the name of a record within a database.
sourcefile - Indicates the original source file from which the TMX file is created.
In addition, user-defined values can be used with this attribute. A user-defined value must start with an "x-" prefix.
Default value:
Undefined.
Used in:
<context>.
creationdate
Creation date - Specifies the date of creation of the
element.
Value description:
Date in [ISO 8601] format. The recommended pattern to
use is: YYYYMMDDThhmmssZ
Where: YYYY is the year (4 digits), MM is the month (2 digits), DD is the day (2
digits), hh is the hour (2 digits), mm is the minutes (2 digits), ss is the seconds (2 digits),
and Z indicates that the time is in UTC. For example:
creationdate="20020125T210600Z" is January 25, 2002 at 9:06pm GMT, which is
January 25, 2002 at 2:06pm US Mountain Time and January 26, 2002 at 6:06am Japan
time.
Default value:
Undefined.
Used in:
<header>,
<tu>,
<tuv>.
creationid
Creation identifier - Specifies the identifier of the user who
created the element.
Value description:
Text.
Default value:
Undefined.
Used in:
<header>,
<tu>,
<tuv>.
creationtool
Creation tool - Identifies the tool that created the TMX
document. Its possible values are not specified by the standard but each tool provider
should publish the string identifier it uses.
Value description:
Text.
Default value:
Undefined.
Used in:
<header>,
<tu>,
<tuv>.
creationtoolversion
Creation tool version - Identifies the version of the tool that
created the TMX document. Its possible values are not specified by the standard but each tool
provider should publish the string identifier it uses.
Value description:
Text.
Default value:
Undefined.
Used in:
<header>,
<tu>,
<tuv>.
crc
Cyclic redundancy checking - A private value used to verify data
as it is returned to the producer. The generation and verification of this number is
tool-specific.
Value description:
Number (possibly not decimal).
Default value:
Undefined.
Used in:
<external-file>.
datatype
Data type - Specifies the type of data contained in the element.
Depending on that type, you may apply different processes to the data.
Value description:
Text. The recommended values for the datatype attribute are as follows:
unknown - Undefined (default).
alptext - WinJoust data.
cdf - Channel Definition Format.
cmx - Corel CMX Format.
cpp - C and C++ style text.
hptag - HP-Tag.
html - HTML, DHTML, etc.
interleaf - Interleaf documents.
ipf - IPF/BookMaster.
java - Java, source and property files.
javascript - JavaScript, ECMAScript scripts.
lisp - Lisp.
mif - FrameMaker MIF, MML, etc.
opentag - OpenTag data.
pascal - Pascal, Delphi style text.
plaintext - Plain text.
pm - PageMaker.
resx - Windows .NET resources.
rtf - Rich Text Format.
sgml - SGML.
stf-f - S-Tagger for FrameMaker.
stf-i - S-Tagger for Interleaf.
transit - Transit data.
vbscript - Visual Basic scripts.
winres - Windows resources from RC, DLL, EXE.
xliff - XLIFF (XML Localization Interchange File Format).
xml - XML.
xptag - Quark XPressTag.
In addition, user-defined values can be used with this attribute. A user-defined value must start with an "x-" prefix.
Default value:
"unknown".
Used in:
<header>,
<tu>,
<tuv>,
<sub>.
endmrk
End marker - The endmrk attribute contains the formatting code of the
closing tag replaced by a
<g> element.
Value description:
Text.
Used in:
<tag>.
equiv-text
Equivalent text - Indicates the equivalent text to
substitute in place of an inline tag.
Value description:
Text.
Used in:
<bpt>,
<ept>,
<ph>,
<g> and
<x/>.
group
Group identifier - Indicates that a given
<tu> element belongs to a logical group of related translation units.
Value description:
Text without spaces.
Used in:
<tu>
g-order
Group order - Defines the order of the
<tu> within a given logical group. Used together with the
group attribute.
Value description:
Number starting at 1 and incremented in steps of 1. Must be unique within each logical group defined with the
group attribute. Its initial value is reset to 1 in each logical group.
Used in:
<tu>.
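A consumer that needs the units of a logical group in order can collect them by group and sort them on g-order, checking the numbering rule stated above. The following Python sketch assumes xml.etree.ElementTree; the group_units helper and the sample group name are hypothetical, and the <tuv> children are omitted from the sample for brevity.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TMX_NS = "http://www.lisa.org/tmx20"


def group_units(body):
    """Return {group id: [<tu>, ...]}, each list sorted by its g-order values.

    <tu> elements without a group attribute are ignored. A ValueError is raised
    unless each group's g-order values are exactly 1, 2, ..., n.
    """
    groups = defaultdict(list)
    for tu in body.iterfind("{%s}tu" % TMX_NS):
        if tu.get("group") is not None:
            groups[tu.get("group")].append(tu)
    for name, units in groups.items():
        # A missing g-order counts as 0, so it fails the 1..n check below.
        orders = sorted(int(tu.get("g-order", "0")) for tu in units)
        if orders != list(range(1, len(units) + 1)):
            raise ValueError("group %r has g-order values %r" % (name, orders))
        units.sort(key=lambda tu: int(tu.get("g-order")))
    return dict(groups)


# Hypothetical <body>; the <tuv> children every <tu> needs are omitted.
BODY = """<body xmlns="http://www.lisa.org/tmx20">
  <tu tuid="save" group="file-menu" g-order="2"/>
  <tu tuid="open" group="file-menu" g-order="1"/>
  <tu tuid="about"/>
</body>"""

groups = group_units(ET.fromstring(BODY))
print([tu.get("tuid") for tu in groups["file-menu"]])  # ['open', 'save']
```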
i
Internal matching - The "i" attribute is used to pair the
<bpt> elements with
<ept> elements. This mechanism allows TMX to mark up
possibly overlapping ranges of codes. Such constructions are not common,
but several formats allow them. For example, the following HTML segment, although
not strictly legal, is accepted by some HTML editors and usually interpreted correctly by
browsers:
[----------------------------]
<B>Bold <I>Bold and Italic</B> Italics</I>
[--------------------------------]
With the TMX content mark-up, since the
<ept> element does not have a type, it can be difficult to
know which sequence of codes it closes as illustrated by the following segment:
TMX (with incomplete content mark-up):
<bpt>&lt;B&gt;</bpt>Bold, <bpt>&lt;I&gt;</bpt>Bold+Italic<ept>&lt;/B&gt;</ept>, Italic<ept>&lt;/I&gt;</ept>
The attribute
i is used to specify which
<ept> is closing which
<bpt>:
TMX (with correct content mark-up):
<bpt i="1" x="1" type="bold">&lt;B&gt;</bpt>Bold, <bpt i="2" x="2" type="italic">&lt;I&gt;</bpt>Bold+Italic<ept i="1">&lt;/B&gt;</ept>, Italic<ept i="2">&lt;/I&gt;</ept>
Value description:
Number starting at 1 and incremented in steps of 1. Must be unique for each
<bpt> within a given
<seg> element. Its initial value is reset to 1 in every
<seg> element.
Default value:
Undefined.
Used in:
<bpt>,
<ept>.
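The pairing rule can be checked mechanically. In this Python sketch (assuming xml.etree.ElementTree; check_pairs is a hypothetical helper), each <ept> must follow a still-open <bpt> with the same i value, and each <bpt> must eventually be closed. Only the i values are compared, so overlapping ranges such as the <B>/<I> example are accepted; the optional x attributes are left out of the samples.

```python
import xml.etree.ElementTree as ET


def check_pairs(seg):
    """List <bpt>/<ept> pairing problems in one <seg>, matched through "i".

    Only i values are compared, not nesting, so overlapping ranges pass.
    An empty list means no problem was found.
    """
    problems = []
    opened, closed = set(), set()
    for elem in seg.iter():  # document order, including nested inline elements
        local = elem.tag.rsplit("}", 1)[-1]
        i = elem.get("i")
        if local == "bpt":
            if i in opened:
                problems.append("duplicate bpt i=%s" % i)
            opened.add(i)
        elif local == "ept":
            if i not in opened or i in closed:
                problems.append("ept i=%s has no preceding unmatched bpt" % i)
            closed.add(i)
    for i in sorted(opened - closed, key=str):
        problems.append("bpt i=%s is never closed" % i)
    return problems


OVERLAP = ('<seg xmlns="http://www.lisa.org/tmx20">'
           '<bpt i="1" type="bold">&lt;B&gt;</bpt>Bold, '
           '<bpt i="2" type="italic">&lt;I&gt;</bpt>Bold+Italic'
           '<ept i="1">&lt;/B&gt;</ept>, Italic<ept i="2">&lt;/I&gt;</ept></seg>')
BROKEN = ('<seg xmlns="http://www.lisa.org/tmx20">'
          '<bpt i="1" type="bold">&lt;B&gt;</bpt>Bold<ept i="2">&lt;/B&gt;</ept></seg>')

print(check_pairs(ET.fromstring(OVERLAP)))  # []
print(check_pairs(ET.fromstring(BROKEN)))
```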
href
Hypertext reference - The "href" attribute contains a valid URL
that describes the location of a file.
Value description:
Text.
Default value:
Undefined.
Used in:
<external-file>.
id
Identifier - The "id" attribute is used in
<tag> elements as a unique identifier. The value of the "id"
attribute is determined by the tool creating the TMX file and must be unique within the
document.
Value description:
Text, matching the Name production as required by the ID attribute type in the XML standard.
Default value:
Undefined.
Used in:
<tag>.
lastusagedate
Last usage date - Specifies the last time the content of a
<tu> or
<tuv> element was used in the original translation
memory environment.
Value description:
Date in [ISO 8601] format. The recommended pattern to
use is: YYYYMMDDThhmmssZ
Where: YYYY is the year (4 digits), MM is the month (2 digits), DD is the day (2
digits), hh is the hour (2 digits), mm is the minutes (2 digits), ss is the seconds (2 digits),
and Z indicates that the time is in UTC. For example:
lastusagedate="20020125T210600Z" is January 25, 2002 at 9:06pm GMT, which is
January 25, 2002 at 2:06pm US Mountain Time and January 26, 2002 at 6:06am Japan
time.
Default value:
Undefined.
Used in:
<tu>,
<tuv>.
name
Property name - Tool specific name used to identify the type of a
<prop> element.
Value description:
Text.
Default value:
Undefined.
Used in:
<prop>.
o-encoding
Original encoding - As stated in the
Character Encoding section, all TMX documents are in Unicode.
However, it is sometimes useful to know what code set was used to encode text that was
converted to Unicode for purposes of interchange. The o-encoding attribute specifies the
original or preferred code set of the data of the element in case it is to be re-encoded in a
non-Unicode code set.
Value description:
One of the [IANA] recommended charset
identifiers, if possible.
Default value:
Undefined.
Used in:
<header>,
<tag>,
<tu>,
<tuv>,
<note>,
<prop>.
o-tmf
Original translation memory format - Specifies the format of the
translation memory file from which the TMX document or segment thereof has been
generated.
Value description:
Text.
Default value:
Undefined.
Used in:
<header>,
<tu>,
<tuv>.
segtype
Segment type - Specifies the kind of segmentation used in the
<tu> element. If a
<tu> element does not have a segtype attribute specified,
it uses the one defined in the
<header> element.
The "block" value is used when the segment does not correspond to one of the other values,
for example when you want to store a chapter composed of several paragraphs in a single
<tu>.
<tu segtype="block">
  <prop name="x-sentbreak">$#$</prop>
  <tuv xml:lang="en"><seg>This is the first paragraph of a big section.$#$ This is the second paragraph.$#$This is the third.</seg></tuv>
</tu>
In the example above the property "x-sentbreak" defines the token used to indicate the
separation between sentences within the block of text. You can therefore easily break down
the segment into smaller units if needed. You can imagine many other ways to use this
mechanism.
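A tool that prefers smaller units can split such a block using the declared token. The Python sketch below assumes xml.etree.ElementTree and reads the token from the <prop> whose name attribute (the required attribute that identifies a <prop>) is x-sentbreak. split_block is a hypothetical helper, and the second <tuv> that a complete <tu> requires is omitted from the sample.

```python
import xml.etree.ElementTree as ET

TMX_NS = "http://www.lisa.org/tmx20"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
NS = {"t": TMX_NS}


def split_block(tu, lang):
    """Split one language's block segment using the x-sentbreak token.

    Inline markup is flattened with itertext(), and surrounding spaces are
    trimmed, which a real tool may not want since spacing in <seg> is
    significant.
    """
    prop = tu.find("t:prop[@name='x-sentbreak']", NS)
    if prop is None or not prop.text:
        raise ValueError("no x-sentbreak property on this <tu>")
    for tuv in tu.iterfind("t:tuv", NS):
        if tuv.get(XML_LANG, "").lower() == lang.lower():
            text = "".join(tuv.find("t:seg", NS).itertext())
            return [part.strip() for part in text.split(prop.text) if part.strip()]
    raise KeyError(lang)


# Sample block unit; the second <tuv> a complete <tu> requires is omitted.
TU = """<tu xmlns="http://www.lisa.org/tmx20" segtype="block">
  <prop name="x-sentbreak">$#$</prop>
  <tuv xml:lang="en"><seg>This is the first paragraph of a big section.$#$ This is the second paragraph.$#$This is the third.</seg></tuv>
</tu>"""

for sentence in split_block(ET.fromstring(TU), "en"):
    print(sentence)
```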
For maximum portability, it is recommended that a TMX file use sentence-level
segmentation rather than a specific, proprietary method like
the one above.
The rules on how the text was segmented can be carried in a Segmentation Rules eXchange
(SRX) document.
Value description:
"block", "paragraph", "sentence", or "phrase".
Default value:
Undefined.
Used in:
<header>,
<tu>.
srclang
Source language - Specifies the language of the source text. In
other words, the
<tuv> holding the source segment will have its
xml:lang attribute set to the same value as srclang
(except if srclang is set to "*all*"). If a
<tu> element does not have a srclang attribute specified,
it uses the one defined in the
<header> element.
Value description:
A language code as described in [RFC 4646], or the
value "*all*" if any language can be used as the source language. Unlike most other TMX
attributes, the values for srclang are not case-sensitive.
Default value:
Undefined.
Used in:
<header>,
<tu>.
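The inheritance and matching rules for srclang can be expressed as a small lookup. This Python sketch assumes xml.etree.ElementTree and compares xml:lang values case-insensitively; source_tuvs is a hypothetical helper, and the sample <header> carries only srclang for brevity.

```python
import xml.etree.ElementTree as ET

TMX_NS = "http://www.lisa.org/tmx20"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def source_tuvs(tu, header):
    """Return the <tuv> elements that hold source text for a <tu>.

    srclang is inherited from <header> when the <tu> does not set it,
    "*all*" makes every <tuv> a source, and language codes are compared
    case-insensitively.
    """
    srclang = tu.get("srclang") or header.get("srclang")
    tuvs = list(tu.iterfind("{%s}tuv" % TMX_NS))
    if srclang == "*all*":
        return tuvs
    return [tuv for tuv in tuvs
            if tuv.get(XML_LANG, "").lower() == srclang.lower()]


HEADER = ET.fromstring('<header xmlns="http://www.lisa.org/tmx20" srclang="en-US"/>')
TU = ET.fromstring(
    '<tu xmlns="http://www.lisa.org/tmx20">'
    '<tuv xml:lang="en-us"><seg>Hi</seg></tuv>'
    '<tuv xml:lang="de-DE"><seg>Hallo</seg></tuv></tu>')

print([tuv.get(XML_LANG) for tuv in source_tuvs(TU, HEADER)])  # ['en-us']
```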
tuid
Translation unit identifier - Specifies an identifier for the
<tu> element. Its value must be unique within the file.
Value description:
Text without spaces.
Default value:
Undefined.
Used in:
<tu>.
type
Type - Specifies the kind of data a
<bpt>,
<g>,
<hi>,
<ph>,
<sub> or
<x/> element represents.
Value description:
Text. Depends on the element where the attribute is used.
The recommended values for the type attribute, when used in
<bpt> and
<g> are as follows:
bold - Bold.
color - Color change.
dulined - Double-underlined.
font - Font change.
italic - Italic.
link - Linked text.
scap - Small caps.
struct - XML/SGML structure.
ulined - Underlined.
xliff-bpt - XLIFF <bpt> tag.
xliff-g - XLIFF <g> tag.
The recommended values for the type
attribute, when used in
<ph> and <x/> are as follows:
index - Index marker.
date - Date.
time - Time.
fnote - Footnote.
enote - End-note.
alt - Alternate text.
image - Image.
pb - Page break.
lb - Line break.
cb - Column break.
inset - Inset.
xliff-bx - XLIFF <bx/> tag.
xliff-ex - XLIFF <ex/> tag.
xliff-it - XLIFF <it> tag.
xliff-ph - XLIFF <ph> tag.
xliff-x - XLIFF <x/> tag.
The recommended values for the type
attribute, when used in
<hi>, are as follows:
| Value | Description |
| abbrev | Indicates the marked text is an abbreviation. |
| abbreviated-form | ISO-12620 2.1.8: A term resulting from the omission of any part of the full term while designating the same concept. |
| abbreviation | ISO-12620 2.1.8.1: An abbreviated form of a simple term resulting from the omission of some of its letters (e.g. 'adj.' for 'adjective'). |
| acronym | ISO-12620 2.1.8.4: An abbreviated form of a term made up of letters from the full form of a multi-word term strung together into a sequence pronounced only syllabically (e.g. 'radar' for 'radio detecting and ranging'). |
| appellation | ISO-12620: A proper-name term, such as the name of an agency or other proper entity. |
| collocation | ISO-12620 2.1.18.1: A recurrent word combination characterized by cohesion in that the components of the collocation must co-occur within an utterance or series of utterances, even though they do not necessarily have to maintain immediate proximity to one another. |
| common-name | ISO-12620 2.1.5: A synonym for an international scientific term that is used in general discourse in a given language. |
| datetime | Indicates the marked text is a date and/or time. |
| equation | ISO-12620 2.1.15: An expression used to represent a concept based on a statement that two mathematical expressions are, for instance, equal as identified by the equal sign (=), or assigned to one another by a similar sign. |
| expanded-form | ISO-12620 2.1.7: The complete representation of a term for which there is an abbreviated form. |
| formula | ISO-12620 2.1.14: Figures, symbols or the like used to express a concept briefly, such as a mathematical or chemical formula. |
| head-term | ISO-12620 2.1.1: The concept designation that has been chosen to head a terminological record. |
| initialism | ISO-12620 2.1.8.3: An abbreviated form of a term consisting of some of the initial letters of the words making up a multi-word term or the term elements making up a compound term when these letters are pronounced individually (e.g. 'BSE' for 'bovine spongiform encephalopathy'). |
| international-scientific-term | ISO-12620 2.1.4: A term that is part of an international scientific nomenclature as adopted by an appropriate scientific body. |
| internationalism | ISO-12620 2.1.6: A term that has the same or nearly identical orthographic or phonemic form in many languages. |
| logical-expression | ISO-12620 2.1.16: An expression used to represent a concept based on mathematical or logical relations, such as statements of inequality, set relationships, Boolean operations, and the like. |
| materials-management-unit | ISO-12620 2.1.17: A unit to track objects. |
| name | Indicates the marked text is a name. |
| near-synonym | ISO-12620 2.1.3: A term that represents the same or a very similar concept as another term in the same language, but for which interchangeability is limited to some contexts and inapplicable in others. |
| part-number | ISO-12620 2.1.17.2: A unique alphanumeric designation assigned to an object in a manufacturing system. |
| phrase | Indicates the marked text is a phrase. |
| phraseological-unit | ISO-12620 2.1.18: Any group of two or more words that form a unit, the meaning of which frequently cannot be deduced based on the combined sense of the words making up the phrase. |
| protected | Indicates the marked text should not be translated. |
| romanized-form | ISO-12620 2.1.12: A form of a term resulting from an operation whereby non-Latin writing systems are converted to the Latin alphabet. |
| set-phrase | ISO-12620 2.1.18.2: A fixed, lexicalized phrase. |
| short-form | ISO-12620 2.1.8.2: A variant of a multi-word term that includes fewer words than the full form of the term (e.g. 'Group of Twenty-four' for 'Intergovernmental Group of Twenty-four on International Monetary Affairs'). |
| sku | ISO-12620 2.1.17.1: Stock keeping unit, an inventory item identified by a unique alphanumeric designation assigned to an object in an inventory control system. |
| standard-text | ISO-12620 2.1.19: A fixed chunk of recurring text. |
| symbol | ISO-12620 2.1.13: A designation of a concept by letters, numerals, pictograms or any combination thereof. |
| synonym | ISO-12620 2.1.2: Any term that represents the same or a very similar concept as the main entry term in a term entry. |
| synonymous-phrase | ISO-12620 2.1.18.3: Phraseological unit in a language that expresses the same semantic content as another phrase in that same language. |
| term | Indicates the marked text is a term. |
| transcribed-form | ISO-12620 2.1.11: A form of a term resulting from an operation whereby the characters of one writing system are represented by characters from another writing system, taking into account the pronunciation of the characters converted. |
| transliterated-form | ISO-12620 2.1.10: A form of a term resulting from an operation whereby the characters of an alphabetic writing system are represented by characters from another alphabetic writing system. |
| truncated-term | ISO-12620 2.1.8.5: An abbreviated form of a term resulting from the omission of one or more term elements or syllables (e.g. 'flu' for 'influenza'). |
| variant | ISO-12620 2.1.9: One of the alternate forms of a term. |
Any of the suggested values listed in the tables above can be used with the
<sub> element.
The values listed for <bpt>/<g>
and <ph>/<x/> can be used with the
<tag> element.
In addition, user-defined values can be used with this attribute. A user-defined value
must start with an "x-" prefix.
Default value:
Undefined.
Used in:
<bpt>,
<g>,
<ph>,
<x/>,
<hi>,
<sub>,
<tag>.
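A tool could check a type value against the tables above, accepting any user-defined "x-" value, roughly as follows. This Python sketch is not part of TMX: the function and constant names are hypothetical, and the "x-" test mirrors the Custom pattern of the schema in Appendix B (XML Schema treats only space, tab, CR and LF as whitespace in \s).

```python
import re

# Recommended values, copied from the three tables above.
PAIRED = set("bold color dulined font italic link scap struct ulined "
             "xliff-bpt xliff-g".split())
PLACEHOLDER = set("index date time fnote enote alt image pb lb cb inset "
                  "xliff-bx xliff-ex xliff-it xliff-ph xliff-x".split())
TERM = set((
    "abbrev abbreviated-form abbreviation acronym appellation collocation "
    "common-name datetime equation expanded-form formula head-term initialism "
    "international-scientific-term internationalism logical-expression "
    "materials-management-unit name near-synonym part-number phrase "
    "phraseological-unit protected romanized-form set-phrase short-form sku "
    "standard-text symbol synonym synonymous-phrase term transcribed-form "
    "transliterated-form truncated-term variant").split())

# User-defined values: "x-" followed by at least one non-whitespace character.
CUSTOM = re.compile(r"x-[^ \t\n\r]+")

# Which recommended list applies to which element ("x" stands for <x/>).
ALLOWED = {
    "bpt": PAIRED, "g": PAIRED,
    "ph": PLACEHOLDER, "x": PLACEHOLDER,
    "hi": TERM,
    "sub": PAIRED | PLACEHOLDER | TERM,
    "tag": PAIRED | PLACEHOLDER,
}


def is_acceptable_type(element, value):
    """True if value is a recommended or user-defined type for the element."""
    if element not in ALLOWED:
        raise ValueError(f"type is not used on <{element}>")
    return value in ALLOWED[element] or CUSTOM.fullmatch(value) is not None


print(is_acceptable_type("g", "italic"))     # True
print(is_acceptable_type("hi", "italic"))    # False: not in the <hi> table
print(is_acceptable_type("sub", "x-title"))  # True: user-defined value
```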
uid
Unique ID - The "uid" attribute is used to provide a unique ID to
identify the file that contains the segmentation rules used when generating the TMX
document.
Value description:
Text.
Default value:
Undefined.
Used in:
<external-file>.
usagecount
Usage count - Specifies the number of times a
<tu> or the content of the
<tuv> element has been accessed in the original TM
environment.
Value description:
Number.
Default value:
Undefined.
Used in:
<tu>,
<tuv>.
version
TMX version - The version attribute indicates the version of the
TMX format to which the document conforms.
Value description:
Fixed text: the major version number, a period, and the minor version number. For example:
version="2.0".
Default value:
"2.0"
Used in:
<tmx>.
x
External matching - The x attribute is used to match the inline
elements
<bpt>,
<g>,
<ph>,
<x/> and
<hi> between the
<tuv> elements of a given
<tu> element. This mechanism facilitates the pairing of
allied codes in source and target text, even if the order of code occurrence differs between
the two because of the translation syntax. Note that an
<ept> element is matched based on the x attribute of its
corresponding
<bpt> element.
For example:
<seg>link to <bpt i="1" type="link" x="1">&lt;a href="www.mysite.com" title="<sub type="x-title">my site</sub>"&gt;</bpt>my web site<ept i="1">&lt;/a&gt;</ept>, and this is <ph type="image" x="2">&lt;img src="john.gif" alt="<sub type="alt">John's picture</sub>"/&gt;</ph> John.</seg>
<seg>enlace a <bpt i="1" type="link" x="1">&lt;a href="www.mysite.com/es" title="<sub type="x-title">mi sitio</sub>"&gt;</bpt>mi sitio web<ept i="1">&lt;/a&gt;</ept>, y este es <ph type="image" x="2">&lt;img src="juan.gif" alt="<sub type="alt">foto de Juan</sub>"/&gt;</ph> Juan.</seg>
Value description:
An integer starting at 1 and incremented by 1. It must be unique within a given
<seg> element; numbering restarts at 1 in each
<seg> element.
Default value:
Undefined.
Used in:
<bpt>,
<ph>,
<g>,
<x/>,
<hi>.
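The matching described above can be sketched in Python with xml.etree.ElementTree: codes are indexed by x in each <seg>, paired across segments regardless of their order, and an <ept> is resolved through the i value it shares with its <bpt>. The helper names are hypothetical, and the check that x values run exactly 1..n is one reading of the value description above, not a rule quoted from a validator.

```python
import xml.etree.ElementTree as ET

# Inline elements that carry an x attribute ("x" stands for <x/>).
MATCHABLE = {"bpt", "g", "ph", "x", "hi"}


def local(tag):
    """Strip a {namespace} prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def x_values(seg):
    """All x values in a <seg>, in document order (duplicates kept)."""
    return [el.get("x") for el in seg.iter()
            if local(el.tag) in MATCHABLE and el.get("x") is not None]


def x_numbering_ok(seg):
    """True if the x values of a <seg> are exactly 1, 2, ..., n."""
    values = sorted(int(x) for x in x_values(seg))
    return values == list(range(1, len(values) + 1))


def codes_by_x(seg):
    """Map each x value of a <seg> to the inline element carrying it."""
    return {el.get("x"): el for el in seg.iter()
            if local(el.tag) in MATCHABLE and el.get("x") is not None}


def pair_codes(src_seg, tgt_seg):
    """Pair source and target codes that share an x; also list unmatched x."""
    src, tgt = codes_by_x(src_seg), codes_by_x(tgt_seg)
    pairs = [(x, src[x], tgt[x]) for x in sorted(src, key=int) if x in tgt]
    unmatched = sorted(set(src) ^ set(tgt), key=int)
    return pairs, unmatched


def ept_partner(seg, ept):
    """An <ept> has no x of its own: find its <bpt> through the shared i value."""
    for el in seg.iter():
        if local(el.tag) == "bpt" and el.get("i") == ept.get("i"):
            return el
    return None


EN = ET.fromstring('<seg>link to <bpt i="1" type="link" x="1">&lt;a href="www.mysite.com"&gt;</bpt>'
                   'my web site<ept i="1">&lt;/a&gt;</ept>, and this is '
                   '<ph type="image" x="2">&lt;img src="john.gif"/&gt;</ph> John.</seg>')
# Hypothetical target in which the two codes appear in the opposite order.
ES = ET.fromstring('<seg>este es <ph type="image" x="2">&lt;img src="juan.gif"/&gt;</ph> Juan, '
                   'enlace a <bpt i="1" type="link" x="1">&lt;a href="www.mysite.com/es"&gt;</bpt>'
                   'mi sitio web<ept i="1">&lt;/a&gt;</ept>.</seg>')

pairs, unmatched = pair_codes(EN, ES)
for x, s, t in pairs:
    print(x, local(s.tag), "->", local(t.tag))
```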
xid
External Identifier - The "xid" attribute is used in
<g> or
<x/> elements to reference the
id attribute of the
<tag> element that contains the original native code or formatting replaced by the
given element.
Value description:
The value of the referenced
id.
Default value:
Undefined.
Used in:
<g> and <x/>.
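Resolving an xid and reporting references that point nowhere might look like this Python sketch. The helper names are hypothetical, the namespace URI is assumed from Appendix A, and the demo deliberately contains a dangling xid ("id9999") to show the check.

```python
import xml.etree.ElementTree as ET

TMX = "{http://www.lisa.org/tmx20}"  # assumed namespace, as in Appendix A


def tag_table(root):
    """Index the <tag> elements of <header>/<inline-data> by their id."""
    inline = root.find(f"{TMX}header/{TMX}inline-data")
    return {} if inline is None else {t.get("id"): t for t in inline.findall(TMX + "tag")}


def resolve_xid(table, el):
    """Return the <tag> element that a <g> or <x/> points to, or None."""
    return table.get(el.get("xid"))


def dangling_xids(root):
    """List xid values on <g>/<x/> that match no <tag id> ("" if xid is missing)."""
    table = tag_table(root)
    return sorted({el.get("xid", "") for name in ("g", "x")
                   for el in root.iter(TMX + name)
                   if el.get("xid") not in table})


DOC = """<tmx version="2.0" xmlns="http://www.lisa.org/tmx20">
<header creationtool="demo" creationtoolversion="1" segtype="sentence" o-tmf="demo"
        adminlang="en-US" srclang="en" datatype="rtf">
  <inline-data><tag id="id2345" endmrk="}" type="italic">{\\i </tag></inline-data>
</header>
<body><tu>
  <tuv xml:lang="en"><seg>Text in <g xid="id2345" type="italic">italics</g>.</seg></tuv>
  <tuv xml:lang="fr"><seg>Texte en <g xid="id9999" type="italic">italiques</g>.</seg></tuv>
</tu></body></tmx>"""

root = ET.fromstring(DOC)
g = root.find(f".//{TMX}g")
print(repr(resolve_xid(tag_table(root), g).text))
print(dangling_xids(root))
```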
3.2.2. XML Namespace Attributes
xml:lang
Language
- The "xml:lang" attribute specifies the locale of the text of a given
element.
Value description:
A language code as described in [RFC 4646]. This
declared value is considered to apply to all elements within the content of the element where
it is specified, unless overridden with another instance of the xml:lang attribute. Unlike
the other TMX attributes, the values for xml:lang are not case-sensitive. For more
information see
the section on
xml:lang in the XML specification.
Default value:
Undefined.
Used in:
<tuv>,
<note>,
<prop>.
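Because xml:lang is inherited until overridden, a reader has to walk up the tree to find the value in force. ElementTree keeps no parent links, so this hypothetical Python helper builds a parent map first; the comparison helper reflects the case-insensitivity stated above.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_lang(root, element):
    """Return the xml:lang in force for element: its own or the nearest ancestor's."""
    parent = {child: p for p in root.iter() for child in p}  # child -> parent
    node = element
    while node is not None:
        if XML_LANG in node.attrib:
            return node.get(XML_LANG)
        node = parent.get(node)
    return None


def same_language(a, b):
    """Language codes are not case-sensitive in TMX."""
    return a.lower() == b.lower()


TUV = ET.fromstring('<tuv xml:lang="zh-CN"><note>Inherits zh-CN</note>'
                    '<note xml:lang="en">Overrides with en</note><seg>Hello</seg></tuv>')
notes = TUV.findall("note")
print(effective_lang(TUV, notes[0]))      # zh-CN
print(effective_lang(TUV, notes[1]))      # en
print(same_language("zh-CN", "ZH-cn"))    # True
```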
xml:space
White space - The "xml:space" attribute specifies how white
space (ASCII spaces, tabs and line breaks) should be treated.
Value description:
default or
preserve. The value
default signals that an application's default white-space processing
modes are acceptable for this element; the value
preserve indicates the intent that applications preserve all the white
space. This declared intent is considered to apply to all elements within the content of the
element where it is specified, unless overridden with another instance of the xml:space
attribute. For more information see the section on xml:space in the XML
specification.
Default value:
default.
Used in:
<seg>.
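What an application's default white-space processing does is left to the application. The Python sketch below assumes one hypothetical choice (collapsing each run of the four white-space characters named above into a single space) and skips it when xml:space="preserve"; the attribute defaults to "default", as in the schema.

```python
import re
import xml.etree.ElementTree as ET

XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
# The white-space characters named above: space, tab, CR and LF.
WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")


def seg_text(seg):
    """All text inside a <seg> (inline codes included), honouring xml:space."""
    text = "".join(seg.itertext())
    if seg.get(XML_SPACE, "default") == "preserve":
        return text
    # Hypothetical "default" processing: collapse each white-space run to one space.
    return WHITESPACE_RUN.sub(" ", text)


a = ET.fromstring('<seg>two\n\t spaces</seg>')
b = ET.fromstring('<seg xml:space="preserve">two\n\t spaces</seg>')
print(repr(seg_text(a)))  # 'two spaces'
print(repr(seg_text(b)))  # 'two\n\t spaces'
```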
4. Content Markup
4.1. Overview
Each TM system uses a different method of marking up the formatting. Formats are
constantly evolving, and new formats will be introduced on a regular basis. Attempting to
collect, interpret, disseminate and maintain finite descriptions of each formatting tag
used at any given time by any of the TM systems is not possible.
The best way to deal with these native codes is to delimit them by a specific set of elements
that convey where they begin and end, and possibly additional information about what they
are (bold, italic, footnote, etc.).
Native codes can be grouped into three categories:
Codes that either begin or end an instruction, and whose beginning and ending
functions both appear within a single segment. For example, an instruction to
begin bold formatting for a range of words, followed in the same segment by an
instruction to end bold formatting.
Codes that either begin or end an instruction, but whose beginning and ending
functions are not both contained within a single segment. For example, an
instruction to embolden text may apply to the first three sentences in a paragraph,
but the instruction to turn off bolding may only appear at the end of the third
sentence. Its beginning instruction is present in the first segment, while its
closing tag is present in the third segment.
Codes that represent self-contained functions that do not require explicit
ending instructions. Images and cross-reference tokens are examples of these
standalone codes, as are codes whose behavior is unknown.
Content markup can also be classified, from a different point of view, into two categories:
Native codes that contain embedded translatable text. For example,
the "alt" attribute used in links and images in HTML documents.
Pure native code. For example, the
<br/> tag in HTML.
The element
<sub> is provided to delimit sub-flow text within a
sequence of native codes. For instance, if the text content of a footnote is defined within
the footnote marker code, it may be delimited with the
<sub> element.
4.2. Selection Rules for Inline Elements
Combining the two classification criteria listed in the previous sub-section, the rules
for selecting the inline tags used to mark up each category of native code sequences
are:
Use
<bpt> and
<ept> elements to enclose paired sequences of
native code that begin and end within the
<seg> element and contain translatable text in
either the initial or final sequence, requiring the use of a
<sub> element.
Use a
<g> element to replace paired native codes that
begin and end within the segment and don't contain translatable text. The replaced
sequences of native codes are stored in a
<tag> element in the
<inline-data> element of the
<header> of the TMX document.
Use a
<ph> element to enclose a standalone sequence of
native code, or a paired code isolated from its partner, that contains translatable
text and requires the use of a
<sub> element.
Use an
<x/> element to replace any standalone sequence
of native code, or paired code isolated from its partner, that doesn't contain
translatable text. The replaced sequences of native codes are stored in a
<tag> element in the
<inline-data> element of the
<header> of the TMX document.
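The four rules above reduce to two questions: does the code have its partner inside the same <seg>, and does it contain translatable text? A hypothetical Python encoding of that decision (the NativeCode type is illustrative, not part of TMX) could be:

```python
from dataclasses import dataclass


@dataclass
class NativeCode:
    """Hypothetical description of one native code sequence in a segment."""
    paired: bool                 # a begin or end code, e.g. bold on / bold off
    partner_in_segment: bool     # for paired codes: the other half is in the same <seg>
    has_translatable_text: bool  # e.g. an HTML alt or title value needing <sub>


def select_inline_element(code: NativeCode) -> str:
    """Return the inline element chosen by the selection rules of this section."""
    if code.paired and code.partner_in_segment:
        # Native code kept inline around <sub>, or moved to a <tag> in <inline-data>.
        return "bpt/ept" if code.has_translatable_text else "g"
    # Standalone codes, and paired codes isolated from their partner.
    return "ph" if code.has_translatable_text else "x"


print(select_inline_element(NativeCode(True, True, True)))     # <a title="..."> ... </a>
print(select_inline_element(NativeCode(True, True, False)))    # RTF italics
print(select_inline_element(NativeCode(False, False, True)))   # <img alt="..."/>
print(select_inline_element(NativeCode(False, False, False)))  # <br/>
```

Note that the only path to <g> or <bpt>/<ept> requires both halves of a pair inside the segment; an isolated half is treated exactly like a standalone code.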
Examples:
Paired codes containing translatable text
Source text:
<p>link to <a href="www.mysite.com" title="My Site">my web site</a>.</p>
Text with content markup:
<seg>link to <bpt i="0" type="link">&lt;a href="www.mysite.com" title="<sub
type="x-title">My Site</sub>"&gt;</bpt>my web site<ept i="0">&lt;/a&gt;</ept>.</seg>
Paired codes without translatable text
Source text:
Text in {\i italics}.
Text with content markup:
...
<inline-data>
<tag id="id2345" endmrk="}" type="italic">{\i </tag>
...
</inline-data>
</header>
...
<seg>Text in <g xid="id2345" type="italic">italics</g>.</seg>
...
Standalone sequence with translatable text
Source text:
...
This is <img src="john.gif" alt="John's picture"/> John.
...
Text with content markup:
...
<seg>This is <ph type="image">&lt;img src="john.gif" alt="<sub type="alt">John's picture</sub>"/&gt;</ph> John.</seg>
...
Standalone sequence without translatable text
Source text:
text displayed in <br/> two lines.
Text with content markup:
...
<inline-data>
<tag id="id457" type="lb">&lt;br/&gt;</tag>
...
</inline-data>
</header>
...
<seg>text displayed in <x xid="id457" type="lb"/> two lines.</seg>
...
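The examples above can be run backwards: given a <seg> and the <tag> elements of <inline-data>, the native text can be rebuilt. The Python sketch below rests on two assumptions inferred from the italic example, not stated rules: a <tag> holds the opening native code and its endmrk the closing one, and the native code inside <bpt>, <ept> and <ph> is XML-escaped (&lt; ... &gt;), so the parser restores it and itertext() splices the <sub> text back in. Function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip a {namespace} prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def to_native(el, tags):
    """Rebuild the native text of a <seg> (or of a <g>/<hi> inside it).

    tags maps <tag> ids from <inline-data> to their <tag> elements.
    """
    out = [el.text or ""]
    for child in el:
        out.append(render(child, tags))
        out.append(child.tail or "")
    return "".join(out)


def render(el, tags):
    name = local(el.tag)
    if name in ("bpt", "ept", "ph"):
        # Native code is stored inline; itertext() includes the <sub> text.
        return "".join(el.itertext())
    if name == "x":
        return tags[el.get("xid")].text or ""
    if name == "g":
        tag = tags[el.get("xid")]
        # Assumption: <tag> holds the opening code and endmrk the closing one.
        return (tag.text or "") + to_native(el, tags) + (tag.get("endmrk") or "")
    if name == "hi":
        return to_native(el, tags)  # <hi> marks up text but carries no native code
    raise ValueError(f"unexpected inline element <{name}>")


TAGS = {t.get("id"): t for t in ET.fromstring(
    '<inline-data><tag id="id2345" endmrk="}" type="italic">{\\i </tag>'
    '<tag id="id457" type="lb">&lt;br/&gt;</tag></inline-data>')}

for snippet in ('<seg>Text in <g xid="id2345" type="italic">italics</g>.</seg>',
                '<seg>text displayed in <x xid="id457" type="lb"/> two lines.</seg>',
                '<seg>This is <ph type="image">&lt;img src="john.gif" alt="<sub type="alt">'
                'John&apos;s picture</sub>"/&gt;</ph> John.</seg>'):
    print(to_native(ET.fromstring(snippet), TAGS))
```

If the reconstruction matches the source text, the markup carries everything needed to re-create the document, which is the property the next section builds on.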
5. TMX Compliance
TMX compliance is defined as follows:
Given:
An original document with inline codes (for example an HTML file)
translated by a tool XYZ.
The translation memory of that document saved in TMX format, using
<bpt>,
<ept>,
<g>,
<ph> and
<x/> elements as
described in the section
Selection Rules for
Inline Elements.
The segmentation rules in SRX format used to break blocks of source text
into smaller fragments, either embedded in the TMX document or
referenced in an
<external-file>
element.
Assuming:
The tool XYZ supports
TMX Export if the TMX document created by tool XYZ contains all the
information required to re-create the translated document without loss of text, data or
formatting.
The tool XYZ supports
TMX Import if any TMX document containing all the information required
to re-create the translated document (possibly created by a TMX Export compliant tool), can
be imported into tool XYZ and used effectively to re-create the translated document without
loss of text, data or formatting.
Tools that offer both import and export features must support both TMX Import and TMX
Export to be TMX compliant.
Whenever possible, the original formatting information should be included in the exported
TMX file, enclosed in <bpt>, <ept>
and <ph> elements or stored in <tag>
elements in the header of the file.
Under special circumstances, for example if the source document is a binary file, it may not be
possible to include the original source formatting codes in inline elements. In such cases, the
formatting information necessary to build the translated document must be extracted from the source
document. Nevertheless, all inline elements must still be present in the correct places (acting as
empty placeholders), and they must comply with the Selection Rules
for Inline Elements. When <g> and
<x/> elements are used under these circumstances, they can point their
xid attribute to a common empty <tag> element.
5.1 Validation of TMX Files
A cross-platform utility that validates TMX documents against the TMX Schema, and also verifies that they
follow the requirements described in this document, is included as part of the TMX 2.0
specification.
Source code of the validation tool is available for download from OSCAR's web site.
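The official utility is not reproduced here. As a rough illustration of the kind of non-schema checks it performs, this hypothetical Python function tests three requirements listed in section 6 (unique tuid values, at least two <tuv> elements per <tu>, a type attribute on inline elements). It is a sketch under an assumed namespace URI, not a substitute for schema validation.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TMX = "{http://www.lisa.org/tmx20}"  # assumed namespace, as in Appendix A
INLINE_WITH_TYPE = ("bpt", "g", "ph", "x", "hi")


def check_tmx(root):
    """Return human-readable problems found in a parsed TMX 2.0 document."""
    problems = []
    if root.tag != TMX + "tmx" or root.get("version") != "2.0":
        problems.append('root element must be <tmx version="2.0">')
    tus = list(root.iter(TMX + "tu"))
    # tuid values must be unique within the file
    counts = Counter(tu.get("tuid") for tu in tus if tu.get("tuid") is not None)
    problems += [f"duplicate tuid {t!r}" for t, n in sorted(counts.items()) if n > 1]
    # every <tu> needs at least two <tuv> children
    for number, tu in enumerate(tus, start=1):
        if len(tu.findall(TMX + "tuv")) < 2:
            problems.append(f"<tu> #{number} has fewer than two <tuv> elements")
    # the type attribute is required on inline elements
    for name in INLINE_WITH_TYPE:
        for el in root.iter(TMX + name):
            if el.get("type") is None:
                problems.append(f"<{name}> without a type attribute")
    return problems


DOC = """<tmx version="2.0" xmlns="http://www.lisa.org/tmx20">
<header creationtool="demo" creationtoolversion="1" segtype="sentence" o-tmf="demo"
        adminlang="en-US" srclang="en" datatype="plaintext"/>
<body>
  <tu tuid="a1"><tuv xml:lang="en"><seg>one</seg></tuv><tuv xml:lang="fr"><seg>un</seg></tuv></tu>
  <tu tuid="a1"><tuv xml:lang="en"><seg>two <x xid="t1"/></seg></tuv></tu>
</body></tmx>"""

for problem in check_tmx(ET.fromstring(DOC)):
    print(problem)
```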
6. Changes Since Previous Version (Non-Normative)
The main changes in this version (2.0) relative to the previous version (1.4b) are as follows:
TMX 2.0 is based on an XML Schema instead of a DTD.
New elements. The following elements were
added to the TMX standard:
<context>,
<g>,
<x/>,
<segmentation>,
<inline-data>,
<tag>,
<internal-file> and
<external-file>.
Removed elements. The following elements
were removed from the TMX standard: <it>, <map>,
<ude> and <ut>.
New attributes. The following attributes
were incorporated:
xml:space,
comment,
context-type,
crc,
group,
g-order,
href,
id,
name,
xid,
equiv-text.
At least two
<tuv> elements are now required in each
<tu> element.
A new set of rules for selecting inline elements was designed. See
the section
Selection Rules for Inline
Elements for more details.
Marked the type attribute as required in all inline elements.
Changed element
<ph> to require at least one
<sub> child.
Replaced attribute
type with
name in
<prop> element.
Replaced implementation levels 1 and 2 with a single level of compliance. TMX
files must include all the necessary inline data to re-create the translation of
source documents to be considered TMX compliant. See the section
TMX Compliance for more details.
Required the tuid attribute to be unique within a TMX file.
6.1 Backwards Compatibility
TMX 2.0 was designed to be compatible with TMX 1.4b. It should be possible to upgrade a valid
TMX 1.4b file to 2.0 by:
Removing any DOCTYPE declaration from the file
Changing the value of the version attribute from "1.4" to "2.0"
Removing all elements and attributes that were deprecated in TMX 1.4
(i.e. <ut>)
Replacing
<bpt>/<ept>
pairs and
<ph> elements with
<g> and
<x/> as necessary to comply with the Selection Rules for Inline
Elements.
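The mechanical part of these steps can be scripted; replacing <bpt>/<ept> and <ph> with <g> and <x/> needs knowledge of the native codes and is left out. The Python sketch below also renames the type attribute of <prop> to name and moves elements into the TMX 2.0 namespace; both are assumptions drawn from section 6 and from the sample in Appendix A rather than listed steps. Function names are hypothetical, and removing <ut> keeps the text that followed it.

```python
import xml.etree.ElementTree as ET

# Assumption: TMX 2.0 elements live in the namespace used by Appendix A.
TMX20 = "http://www.lisa.org/tmx20"


def drop_elements(parent, name):
    """Remove every <name> descendant, keeping the text that follows it."""
    for child in list(parent):
        if child.tag != name:
            drop_elements(child, name)
            continue
        tail = child.tail or ""
        index = list(parent).index(child)
        if index == 0:
            parent.text = (parent.text or "") + tail
        else:
            previous = parent[index - 1]
            previous.tail = (previous.tail or "") + tail
        parent.remove(child)


def upgrade_14b(text):
    """Mechanical part of a 1.4b -> 2.0 upgrade; inline-element rework is not done."""
    root = ET.fromstring(text)          # the DOCTYPE declaration is not carried over
    root.set("version", "2.0")
    drop_elements(root, "ut")           # <ut> was deprecated in TMX 1.4
    for prop in root.iter("prop"):      # 2.0 uses name instead of type on <prop>
        if "type" in prop.attrib:
            prop.set("name", prop.attrib.pop("type"))
    for el in root.iter():              # move unqualified elements into the 2.0 namespace
        if not el.tag.startswith("{"):
            el.tag = f"{{{TMX20}}}{el.tag}"
    ET.register_namespace("", TMX20)    # serialize it as the default namespace
    return ET.tostring(root, encoding="unicode")


OLD = """<!DOCTYPE tmx SYSTEM "tmx14.dtd">
<tmx version="1.4">
<header creationtool="demo" creationtoolversion="1" segtype="sentence" o-tmf="demo"
        adminlang="en" srclang="en" datatype="plaintext"><prop type="x-client">ACME</prop></header>
<body><tu><tuv xml:lang="en"><seg>A<ut>[code]</ut>B</seg></tuv>
<tuv xml:lang="fr"><seg>AB</seg></tuv></tu></body>
</tmx>"""

print(upgrade_14b(OLD))
```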
A. Sample Document
<?xml version="1.0" encoding="UTF-8"?>
<tmx version="2.0"
xmlns="http://www.lisa.org/tmx20"
xsi:schemaLocation="http://www.lisa.org/tmx20 tmx20.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xyz="urn:myApps:xyz">
<header creationtool="Sample Creator" creationtoolversion="1.1.1"
segtype="block" o-tmf="unknown" adminlang="en-US" srclang="*all*" datatype="x-sample">
<inline-data>
<tag id="id2345" endmrk="}" type="italic">{\i </tag>
<tag id="id457" type="lb">&lt;br/&gt;</tag>
<tag id="id458" type="lb">&lt;br/&gt;</tag>
</inline-data>
<segmentation>
<internal-file xyz:myattribute="custom rules">
<!-- Segmentation rules in SRX 2.0 format -->
<srx:srx version="2.0" xmlns:srx="http://www.lisa.org/srx20">
<srx:header segmentsubflows="yes" cascade="yes">
<srx:formathandle type="start" include="no"/>
<srx:formathandle type="end" include="yes"/>
<srx:formathandle type="isolated" include="yes"/>
</srx:header>
<srx:body>
<srx:languagerules>
<srx:languagerule languagerulename="Default">
<!-- Common rule for most languages -->
<srx:rule break="yes">
<srx:beforebreak>[\.\?!]+</srx:beforebreak>
<srx:afterbreak>\s</srx:afterbreak>
</srx:rule>
</srx:languagerule>
</srx:languagerules>
<srx:maprules>
<!-- Common breaking rules -->
<srx:languagemap languagepattern=".*"
languagerulename="Default"/>
</srx:maprules>
</srx:body>
</srx:srx>
</internal-file>
</segmentation>
<!-- Other elements -->
<xyz:other />
</header>
<body>
<!-- Paired codes with translatable text -->
<tu srclang="en" datatype="html" tuid="sample1">
<tuv xml:lang="en" datatype="html">
<seg>link to <bpt i="1" type="link"
x="1">&lt;a href="www.mysite.com" title="<sub type="x-title">my site</sub>"&gt;</bpt>my web site<ept i="1">&lt;/a&gt;</ept>.</seg>
</tuv>
<tuv xml:lang="es" datatype="html">
<seg>enlace a <bpt i="1" type="link" x="1">&lt;a href="www.mysite.com/es" title="<sub type="x-title">mi sitio</sub>"&gt;</bpt>mi sitio web<ept i="1">&lt;/a&gt;</ept>.</seg>
</tuv>
</tu>
<!-- Paired codes without translatable text -->
<tu datatype="rtf">
<context context-type="x-my-context">text formatting options</context>
<tuv xml:lang="en">
<seg>Text in <g xid="id2345" type="italic">italics</g>.</seg>
</tuv>
<tuv xml:lang="fr">
<seg>Texte en <g xid="id2345" type="italic">italiques</g>.</seg>
</tuv>
</tu>
<!-- Standalone sequence with translatable text -->
<tu datatype="html">
<tuv xml:lang="en-US">
<seg>This is <ph type="image">&lt;img src="john.gif" alt="<sub
type="alt">John's picture</sub>"/&gt;</ph> John.</seg>
</tuv>
<tuv xml:lang="es">
<seg>Este es <ph type="image">&lt;img src="juan.gif" alt="<sub
type="alt">foto de Juan</sub>"/&gt;</ph> Juan.</seg>
</tuv>
</tu>
<!-- Standalone sequence without translatable text -->
<tu>
<tuv xml:lang="en">
<seg>text displayed in <x xid="id457" type="lb" equiv-text="&#10;"/> two lines.</seg>
</tuv>
<tuv xml:lang="es">
<seg>texto en <x xid="id458" type="lb" equiv-text="&#10;"/> dos líneas.</seg>
</tuv>
</tu>
<!-- Notes and properties -->
<tu tuid="90293837" creationid="jean-claude" srclang="zh-CN" segtype="phrase">
<note>Salutations</note>
<note>Machine translation</note>
<prop name="mt">web translator</prop>
<tuv xml:lang="en">
<seg>Hello!</seg>
</tuv>
<tuv o-encoding="BIG5" xml:lang="zh-CN">
<note>Enable Unicode support for viewing this entry.</note>
<prop name="srcCodePage">BIG5</prop>
<seg>你好!</seg>
</tuv>
</tu>
<!-- Untranslatable text -->
<tu o-tmf="xliff" creationdate="20060125T210600Z" changedate="20060315T130700Z"
creationid="ted@mail.com">
<tuv xml:lang="en" xml:space="default">
<seg><hi type="protected" comment="product name">Ultrabalancer</hi>
support is excellent.</seg>
</tuv>
<tuv xml:lang="es" xml:space="default">
<seg>El soporte de <hi type="protected">Ultrabalancer</hi> es
excelente.</seg>
</tuv>
</tu>
<!-- grouped segments -->
<tu group="numbers" g-order="1" datatype="plaintext"
creationdate="20060125T210600Z">
<tuv xml:lang="fr">
<seg>un</seg>
</tuv>
<tuv xml:lang="de">
<seg>eine</seg>
</tuv>
<tuv xml:lang="en">
<seg>one</seg>
</tuv>
</tu>
<tu group="numbers" g-order="2" datatype="plaintext">
<tuv xml:lang="de">
<seg>zwei</seg>
</tuv>
<tuv xml:lang="fr">
<seg>deux</seg>
</tuv>
<tuv xml:lang="en">
<seg>two</seg>
</tuv>
</tu>
<tu group="numbers" g-order="3" datatype="plaintext">
<tuv xml:lang="en">
<seg>three</seg>
</tuv>
<tuv xml:lang="de">
<seg>drei</seg>
</tuv>
<tuv xml:lang="fr">
<seg>trois</seg>
</tuv>
</tu>
<!-- Foreign elements -->
<xyz:database>main server</xyz:database>
<xyz:purpose>general</xyz:purpose>
</body>
</tmx>
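A consumer of the sample could reassemble the "numbers" group as follows. This Python sketch assumes g-order values are integers giving the position of each <tu> within its group; the function name is hypothetical, and the demo lists the units out of order on purpose.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TMX = "{http://www.lisa.org/tmx20}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_groups(root):
    """Collect grouped <tu> elements, ordered by g-order, as {lang: text} dicts."""
    groups = defaultdict(list)
    for tu in root.iter(TMX + "tu"):
        if tu.get("group") is not None:
            groups[tu.get("group")].append(tu)
    result = {}
    for name, members in groups.items():
        members.sort(key=lambda tu: int(tu.get("g-order", "0")))  # assumes integer g-order
        result[name] = [
            {tuv.get(XML_LANG): "".join(tuv.find(TMX + "seg").itertext())
             for tuv in tu.findall(TMX + "tuv")}
            for tu in members]
    return result


DOC = """<tmx version="2.0" xmlns="http://www.lisa.org/tmx20">
<header creationtool="Sample Creator" creationtoolversion="1.1.1" segtype="block"
        o-tmf="unknown" adminlang="en-US" srclang="*all*" datatype="x-sample"/>
<body>
  <tu group="numbers" g-order="2" datatype="plaintext">
    <tuv xml:lang="de"><seg>zwei</seg></tuv><tuv xml:lang="en"><seg>two</seg></tuv></tu>
  <tu group="numbers" g-order="1" datatype="plaintext">
    <tuv xml:lang="de"><seg>eine</seg></tuv><tuv xml:lang="en"><seg>one</seg></tuv></tu>
</body></tmx>"""

for entry in read_groups(ET.fromstring(DOC))["numbers"]:
    print(entry["en"], "=", entry["de"])
```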
B. XML Schema for TMX
The XML Schema for TMX is available at:
http://www.lisa.org/tmx/tmx20.xsd.
<?xml version="1.0" encoding="UTF-8"?>
<!--
Document : tmx20.xsd
Version : 1.0
Created on : December 2, 2006
Author : rmraya@heartsome.net
Description : This XML Schema defines the structure of TMX 2.0
Status : Preliminary draft
Copyright © The Localisation Industry Standards Association [LISA] 1997-2007.
All Rights Reserved.
-->
<xs:schema xmlns:tmx="http://www.lisa.org/tmx20"
targetNamespace="http://www.lisa.org/tmx20" xml:lang="en"
xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
<xs:import namespace="http://www.w3.org/XML/1998/namespace"
schemaLocation="http://www.w3.org/2001/xml.xsd"/>
<!--
==================================================
Restrictions
==================================================
-->
<!-- Restrictions for segtype attribute -->
<xs:simpleType name="segtypes">
<xs:restriction base="xs:token">
<xs:enumeration value="block"/>
<xs:enumeration value="paragraph"/>
<xs:enumeration value="sentence"/>
<xs:enumeration value="phrase"/>
</xs:restriction>
</xs:simpleType>
<!-- Restrictions for xml:space attribute -->
<xs:simpleType name="space">
<xs:restriction base="xs:token">
<xs:enumeration value="default"/>
<xs:enumeration value="preserve"/>
</xs:restriction>
</xs:simpleType>
<!-- Restrictions for assoc attribute -->
<xs:simpleType name="assoc_type">
<xs:restriction base="xs:token">
<xs:enumeration value="p"/>
<xs:enumeration value="f"/>
<xs:enumeration value="b"/>
</xs:restriction>
</xs:simpleType>
<!-- Restrictions for datatype attribute -->
<xs:simpleType name="datatype">
<xs:restriction base="xs:token">
<xs:enumeration value="unknown"/>
<xs:enumeration value="undefined"/>
<xs:enumeration value="alptext"/>
<xs:enumeration value="cdf"/>
<xs:enumeration value="cmx"/>
<xs:enumeration value="cpp"/>
<xs:enumeration value="hptag"/>
<xs:enumeration value="html"/>
<xs:enumeration value="interleaf"/>
<xs:enumeration value="ipf"/>
<xs:enumeration value="java"/>
<xs:enumeration value="javascript"/>
<xs:enumeration value="lisp"/>
<xs:enumeration value="mif"/>
<xs:enumeration value="opentag"/>
<xs:enumeration value="pascal"/>
<xs:enumeration value="plaintext"/>
<xs:enumeration value="pm"/>
<xs:enumeration value="resx"/>
<xs:enumeration value="rtf"/>
<xs:enumeration value="sgml"/>
<xs:enumeration value="stf-f"/>
<xs:enumeration value="stf-i"/>
<xs:enumeration value="transit"/>
<xs:enumeration value="vbscript"/>
<xs:enumeration value="winres"/>
<xs:enumeration value="xliff"/>
<xs:enumeration value="xml"/>
<xs:enumeration value="xptag"/>
</xs:restriction>
</xs:simpleType>
<!-- Restrictions for type attribute when used in <bpt> or <g> -->
<xs:simpleType name="paired_type">
<xs:restriction base="xs:token">
<xs:enumeration value="bold"/>
<xs:enumeration value="color"/>
<xs:enumeration value="dulined"/>
<xs:enumeration value="font"/>
<xs:enumeration value="italic"/>
<xs:enumeration value="link"/>
<xs:enumeration value="scap"/>
<xs:enumeration value="struct"/>
<xs:enumeration value="ulined"/>
<xs:enumeration value="xliff-bpt"/>
<xs:enumeration value="xliff-g"/>
</xs:restriction>
</xs:simpleType>
<!-- Restrictions for type attribute when used in <ph> or <x/> -->
<xs:simpleType name="placeholder_type">
<xs:restriction base="xs:token">
<xs:enumeration value="index"/>
<xs:enumeration value="date"/>
<xs:enumeration value="time"/>
<xs:enumeration value="fnote"/>
<xs:enumeration value="enote"/>
<xs:enumeration value="alt"/>
<xs:enumeration value="image"/>
<xs:enumeration value="pb"/>
<xs:enumeration value="lb"/>
<xs:enumeration value="cb"/>
<xs:enumeration value="inset"/>
<xs:enumeration value="xliff-bx"/>
<xs:enumeration value="xliff-ex"/>
<xs:enumeration value="xliff-it"/>
<xs:enumeration value="xliff-ph"/>
<xs:enumeration value="xliff-x"/>
</xs:restriction>
</xs:simpleType>
<!-- Restrictions for type attribute when used in <hi> -->
<xs:simpleType name="term_type">
<xs:restriction base="xs:token">
<xs:enumeration value="abbrev"/>
<xs:enumeration value="abbreviated-form"/>
<xs:enumeration value="abbreviation"/>
<xs:enumeration value="acronym"/>
<xs:enumeration value="appellation"/>
<xs:enumeration value="collocation"/>
<xs:enumeration value="common-name"/>
<xs:enumeration value="datetime"/>
<xs:enumeration value="equation"/>
<xs:enumeration value="expanded-form"/>
<xs:enumeration value="formula"/>
<xs:enumeration value="head-term"/>
<xs:enumeration value="initialism"/>
<xs:enumeration value="international-scientific-term"/>
<xs:enumeration value="internationalism"/>
<xs:enumeration value="logical-expression"/>
<xs:enumeration value="materials-management-unit"/>
<xs:enumeration value="name"/>
<xs:enumeration value="near-synonym"/>
<xs:enumeration value="part-number"/>
<xs:enumeration value="phrase"/>
<xs:enumeration value="phraseological-unit"/>
<xs:enumeration value="protected"/>
<xs:enumeration value="romanized-form"/>
<xs:enumeration value="set-phrase"/>
<xs:enumeration value="short-form"/>
<xs:enumeration value="sku"/>
<xs:enumeration value="standard-text"/>
<xs:enumeration value="symbol"/>
<xs:enumeration value="synonym"/>
<xs:enumeration value="synonymous-phrase"/>
<xs:enumeration value="term"/>
<xs:enumeration value="transcribed-form"/>
<xs:enumeration value="transliterated-form"/>
<xs:enumeration value="truncated-term"/>
<xs:enumeration value="variant"/>
</xs:restriction>
</xs:simpleType>
<!-- Restrictions for context-type attribute -->
<xs:simpleType name="context_type">
<xs:restriction base="xs:token">
<xs:enumeration value="database"/>
<xs:enumeration value="element"/>
<xs:enumeration value="elementtitle"/>
<xs:enumeration value="linenumber"/>
<xs:enumeration value="numparams"/>
<xs:enumeration value="paramnotes"/>
<xs:enumeration value="record"/>
<xs:enumeration value="recordtitle"/>
<xs:enumeration value="sourcefile"/>
</xs:restriction>
</xs:simpleType>
<!-- Restrictions for user-defined attribute values -->
<xs:simpleType name="Custom">
<xs:restriction base="xs:string">
<xs:pattern value="x-[^\s]+"/>
</xs:restriction>
</xs:simpleType>
<!--
==================================================
Structural Elements
==================================================
-->
<!-- Base Document Element -->
<xs:element name="tmx">
<xs:complexType>
<xs:sequence>
<xs:element ref="tmx:header"/>
<xs:element ref="tmx:body"/>
</xs:sequence>
<xs:attribute name="version" use="required">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="2.0"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:anyAttribute namespace="##any" processContents="lax"/>
</xs:complexType>
</xs:element>
<!-- Body -->
<xs:element name="body">
<xs:complexType>
<xs:sequence>
<xs:element minOccurs="0" maxOccurs="unbounded" ref="tmx:tu"/>
<xs:any maxOccurs="unbounded" minOccurs="0" namespace="##other" processContents="lax"/>
</xs:sequence>
<xs:anyAttribute namespace="##any" processContents="lax"/>
</xs:complexType>
</xs:element>
<!-- Context Information -->
<xs:element name="context">
<xs:complexType mixed="true">
<xs:attribute name="context-type" use="required">
<xs:simpleType>
<xs:union memberTypes="tmx:context_type tmx:Custom"/>
</xs:simpleType>
</xs:attribute>
<xs:anyAttribute namespace="##any" processContents="lax"/>
</xs:complexType>
</xs:element>
<!-- External File -->
<xs:element name="external-file">
<xs:complexType>
<xs:attribute name="href" use="required"/>
<xs:attribute name="crc"/>
<xs:attribute name="uid"/>
<xs:anyAttribute namespace="##any" processContents="lax"/>
</xs:complexType>
</xs:element>
<!-- Header -->
<xs:element name="header">
<xs:complexType>
<xs:sequence>
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element ref="tmx:note"/>
<xs:element ref="tmx:prop"/>
</xs:choice>
<xs:element minOccurs="0" ref="tmx:inline-data"/>
<xs:element minOccurs="0" ref="tmx:segmentation"/>
<xs:any maxOccurs="unbounded" minOccurs="0" namespace="##other"
processContents="lax"/>
</xs:sequence>
<xs:attribute name="creationtool" use="required"/>
<xs:attribute name="creationtoolversion" use="required"/>
<xs:attribute name="segtype" use="required" type="tmx:segtypes"/>
<xs:attribute name="o-tmf" use="required"/>
<xs:attribute name="adminlang" use="required"/>
<xs:attribute name="srclang" use="required"/>
<xs:attribute name="datatype" use="required">
<xs:simpleType>
<xs:union memberTypes="tmx:datatype tmx:Custom"/>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="o-encoding"/>
<xs:attribute name="creationdate"/>
<xs:attribute name="creationid"/>
<xs:attribute name="changedate"/>
<xs:attribute name="changeid"/>
<xs:anyAttribute namespace="##any" processContents="lax"/>
</xs:complexType>
</xs:element>
<!-- Inline Data -->
<xs:element name="inline-data">
<xs:complexType>
<xs:sequence>
<xs:element maxOccurs="unbounded" ref="tmx:tag"/>
</xs:sequence>
<xs:anyAttribute namespace="##any" processContents="lax"/>
</xs:complexType>
</xs:element>
<!-- Internal File -->
<xs:element name="internal-file">
<xs:complexType mixed="true">
<xs:sequence>
<xs:any maxOccurs="unbounded" minOccurs="0" namespace="##other"
processContents="lax"/>
</xs:sequence>
<xs:anyAttribute namespace="##any" processContents="lax"/>
</xs:complexType>
</xs:element>
<!-- Note -->
<xs:element name="note">
<xs:complexType mixed="true">
<xs:attribute name="o-encoding"/>
<xs:attribute ref="xml:lang"/>
<xs:attribute name="creationdate"/>
<xs:attribute name="creationid"/>
<xs:attribute name="changedate"/>
<xs:attribute name="changeid"/>
<xs:anyAttribute namespace="##any" processContents="lax"/>
</xs:complexType>
</xs:element>
<!-- Property -->
<xs:element name="prop">
<xs:complexType mixed="true">
<xs:attribute name="name" use="required"/>
<xs:attribute ref="xml:lang"/>
<xs:attribute name="o-encoding"/>
<xs:anyAttribute namespace="##any" processContents="lax"/>
</xs:complexType>
</xs:element>
<!-- Segment -->
<xs:element name="seg">
<xs:complexType mixed="true">
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element ref="tmx:bpt"/>
<xs:element ref="tmx:ept"/>
<xs:element ref="tmx:ph"/>
<xs:element ref="tmx:hi"/>
<xs:element ref="tmx:x"/>
<xs:element ref="tmx:g"/>
</xs:choice>
<xs:attribute ref="xml:space" default="default"/>
<xs:anyAttribute namespace="##any" processContents="lax"/>
</xs:complexType>
</xs:element>
<!-- Segmentation -->
<xs:element name="segmentation">
<xs:complexType>
<xs:choice>
<xs:element ref="tmx:internal-file"/>
<xs:element ref="tmx:external-file"/>
</xs:choice>
<xs:anyAttribute namespace="##any" processContents="lax"/>
</xs:complexType>
</xs:element>
<!-- Tag -->
<xs:element name="tag">
<xs:complexType mixed="true">
<xs:attribute name="id" use="required" type="xs:ID"/>
<xs:attribute name="endmrk"/>
<xs:attribute name="type" use="required">
<xs:simpleType>
<xs:union memberTypes="tmx:paired_type tmx:placeholder_type tmx:Custom"/>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="o-encoding"/>
<xs:anyAttribute namespace="##any" processContents="lax"/>
</xs:complexType>
</xs:element>
<!-- Translation Unit -->
<xs:element name="tu">
<xs:complexType>
<xs:sequence>
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element ref="tmx:note"/>
<xs:element ref="tmx:prop"/>
<xs:element ref="tmx:context"/>
</xs:choice>
<xs:element ref="tmx:tuv" minOccurs="2" maxOccurs="unbounded"/>
<xs:any maxOccurs="unbounded" minOccurs="0" namespace="##other"
processContents="lax"/>
</xs:sequence>
<xs:attribute name="tuid"/>
<xs:attribute name="o-encoding"/>
<xs:attribute name="datatype" default="unknown" type="tmx:datatype"/>
<xs:attribute name="usagecount"/>
<xs:attribute name="lastusagedate"/>
<xs:attribute name="creationtool"/>
<xs:attribute name="creationtoolversion"/>
<xs:attribute name="creationdate"/>
<xs:attribute name="creationid"/>
<xs:attribute name="changedate"/>
<xs:attribute name="segtype" type="tmx:segtypes"/>
<xs:attribute name="changeid"/>
<xs:attribute name="o-tmf"/>
<xs:attribute name="srclang"/>
<xs:attribute name="group"/>
<xs:attribute name="g-order">
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="1"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:anyAttribute namespace="##any" processContents="lax"/>
</xs:complexType>
</xs:element>
<!-- Translation Unit Variant -->
<xs:element name="tuv">
<xs:complexType>
<xs:sequence>
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element ref="tmx:note"/>
<xs:element ref="tmx:prop"/>
</xs:choice>
<xs:element ref="tmx:seg"/>
<xs:any maxOccurs="unbounded" minOccurs="0" namespace="##other" processContents="lax"/>
</xs:sequence>
<xs:attribute ref="xml:lang" use="required"/>
<xs:attribute name="o-encoding"/>
<xs:attribute name="datatype" default="unknown" type="tmx:datatype"/>
<xs:attribute name="usagecount"/>
<xs:attribute name="lastusagedate"/>
<xs:attribute name="creationtool"/>
<xs:attribute name="creationtoolversion"/>
<xs:attribute name="creationdate"/>
<xs:attribute name="creationid"/>
<xs:attribute name="changedate"/>
<xs:attribute name="o-tmf"/>
<xs:attribute name="changeid"/>
<xs:anyAttribute namespace="##any" processContents="lax"/>
</xs:complexType>
</xs:element>
<!--
==================================================
Content Markup
==================================================
-->
<!-- Begin Paired Tag -->
<xs:element name="bpt">
<xs:complexType mixed="true">
<xs:sequence>
<xs:element minOccurs="0" maxOccurs="unbounded" ref="tmx:sub"/>
</xs:sequence>
<xs:attribute name="i" use="required">
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="1"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="x">
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="1"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="equiv-text"/>
<xs:attribute name="type" use="required">
<xs:simpleType>
<xs:union memberTypes="tmx:paired_type tmx:Custom"/>
</xs:simpleType>
</xs:attribute>
<xs:anyAttribute namespace="##any" processContents="lax"/>
</xs:complexType>
</xs:element>
<!-- End Paired Tag -->
<xs:element name="ept">
<xs:complexType mixed="true">
<xs:sequence>
<xs:element minOccurs="0" maxOccurs="unbounded" ref="tmx:sub"/>
</xs:sequence>
<xs:attribute name="i" use="required">
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="1"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="equiv-text"/>
<xs:anyAttribute namespace="##any" processContents="lax"/>
</xs:complexType>
</xs:element>
<!-- Generic Group Placeholder -->
<xs:element name="g">
<xs:complexType mixed="true">
<xs:attribute name="xid" use="required" type="xs:IDREF"/>
<xs:attribute name="type" use="required">
<xs:simpleType>
<xs:union memberTypes="tmx:paired_type tmx:Custom"/>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="equiv-text"/>
<xs:attribute name="x">
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="1"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:anyAttribute namespace="##any" processContents="lax"/>
</xs:complexType>
</xs:element>
<!-- Highlight -->
<xs:element name="hi">
<xs:complexType mixed="true">
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element ref="tmx:bpt"/>
<xs:element ref="tmx:ept"/>
<xs:element ref="tmx:ph"/>
<xs:element ref="tmx:x"/>
<xs:element ref="tmx:g"/>
<xs:element ref="tmx:hi"/>
</xs:choice>
<xs:attribute name="x">
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="1"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="type" use="required">
<xs:simpleType>
<xs:union memberTypes="tmx:term_type tmx:Custom"/>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="comment"/>
<xs:anyAttribute namespace="##any" processContents="lax"/>
</xs:complexType>
</xs:element>
<!-- Placeholder -->
<xs:element name="ph">
<xs:complexType mixed="true">
<xs:sequence>
<xs:element minOccurs="0" maxOccurs="unbounded" ref="tmx:sub"/>
</xs:sequence>
<xs:attribute name="x">
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="1"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="assoc" type="tmx:assoc_type"/>
<xs:attribute name="equiv-text"/>
<xs:attribute name="type" use="required">
<xs:simpleType>
<xs:union memberTypes="tmx:placeholder_type tmx:Custom"/>
</xs:simpleType>
</xs:attribute>
<xs:anyAttribute namespace="##any" processContents="lax"/>
</xs:complexType>
</xs:element>
<!-- Subflow -->
<xs:element name="sub">
<xs:complexType mixed="true">
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element ref="tmx:bpt"/>
<xs:element ref="tmx:ept"/>
<xs:element ref="tmx:ph"/>
<xs:element ref="tmx:hi"/>
<xs:element ref="tmx:g"/>
<xs:element ref="tmx:x"/>
</xs:choice>
<xs:attribute name="datatype" default="unknown" type="tmx:datatype"/>
<xs:attribute name="type" use="required">
<xs:simpleType>
<xs:union memberTypes="tmx:paired_type tmx:placeholder_type tmx:term_type tmx:Custom"/>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="x">
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="1"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:anyAttribute namespace="##any" processContents="lax"/>
</xs:complexType>
</xs:element>
<!-- Generic Placeholder -->
<xs:element name="x">
<xs:complexType>
<xs:attribute name="xid" use="required" type="xs:IDREF"/>
<xs:attribute name="equiv-text"/>
<xs:attribute name="type" use="required">
<xs:simpleType>
<xs:union memberTypes="tmx:placeholder_type tmx:Custom"/>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="x">
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="1"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:anyAttribute namespace="##any" processContents="lax"/>
</xs:complexType>
</xs:element>
</xs:schema>
<!-- End -->
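Some rules implied by the schema above are easiest to see in a small checker. The Python sketch below uses only the standard library's `xml.etree.ElementTree` and mirrors a few constraints from the listing: a `<tu>` needs at least two `<tuv>` elements, each `<tuv>` needs `xml:lang` and exactly one `<seg>`, and every `xid` on `<g>` or `<x>` should name the `id` of a `<tag>`. It is an illustration, not a conformance checker. The function names `check_tmx` and `check_seg` are hypothetical. It ignores namespace URIs, because the target namespace is not shown in this part of the schema. It treats `<sub>` content as part of the enclosing segment. It also assumes, following TMX 1.4 usage, that every `<bpt>` must be closed by an `<ept>` with the same `i` value in the same segment.

```python
import xml.etree.ElementTree as ET

# xml:lang is always in the fixed W3C XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def local(tag):
    """Drop any '{namespace-uri}' prefix, so the check does not depend
    on the TMX 2.0 namespace URI (not shown in this part of the schema)."""
    return tag.rsplit("}", 1)[-1]


def check_seg(seg, tag_ids):
    """Check inline markup in one <seg>: every <ept> closes an open <bpt>
    with the same 'i', every <bpt> gets closed (TMX 1.4 convention, assumed),
    and every <g>/<x> 'xid' names the 'id' of a <tag>.
    Content of <sub> is not treated as a separate flow in this sketch."""
    problems = []
    open_i = set()
    for el in seg.iter():  # document order, seg itself included
        name = local(el.tag)
        if name == "bpt":
            open_i.add(el.get("i"))
        elif name == "ept":
            if el.get("i") in open_i:
                open_i.remove(el.get("i"))
            else:
                problems.append(f"ept i={el.get('i')} has no matching bpt")
        elif name in ("g", "x") and el.get("xid") not in tag_ids:
            problems.append(f"{name} xid={el.get('xid')} matches no tag id")
    problems.extend(f"bpt i={i} is never closed" for i in sorted(open_i, key=str))
    return problems


def check_tmx(root):
    """Return a list of problems found in a parsed TMX tree (empty if none)."""
    problems = []
    tag_ids = {el.get("id") for el in root.iter() if local(el.tag) == "tag"}
    for tu in (el for el in root.iter() if local(el.tag) == "tu"):
        tuvs = [c for c in tu if local(c.tag) == "tuv"]
        if len(tuvs) < 2:  # schema: tuv minOccurs="2"
            problems.append(f"tu {tu.get('tuid')}: fewer than two tuv elements")
        for tuv in tuvs:
            if tuv.get(XML_LANG) is None:  # schema: xml:lang use="required"
                problems.append("tuv without xml:lang")
            segs = [c for c in tuv if local(c.tag) == "seg"]
            if len(segs) != 1:  # schema: exactly one seg per tuv
                problems.append("tuv must contain exactly one seg")
            for seg in segs:
                problems.extend(check_seg(seg, tag_ids))
    return problems


# Minimal, un-namespaced sample (not a schema-valid document). The placement
# of <inline-data> inside <header> and the type value "bold" are assumptions.
sample = """<tmx>
  <header>
    <inline-data><tag id="b1" type="bold">&lt;b&gt;</tag></inline-data>
  </header>
  <body>
    <tu tuid="1">
      <tuv xml:lang="en"><seg>Click <g xid="b1" type="bold">here</g></seg></tuv>
      <tuv xml:lang="es"><seg>Haga clic <bpt i="1" type="bold">&lt;b&gt;</bpt>aquí</seg></tuv>
    </tu>
  </body>
</tmx>"""

print(check_tmx(ET.fromstring(sample)))  # ['bpt i=1 is never closed']
```

The only problem reported is the Spanish segment's `<bpt i="1">`, which has no matching `<ept>`; inserting `<ept i="1">&lt;/b&gt;</ept>` before `</seg>` makes the list empty. The `xid` check goes slightly beyond plain XSD validation: because `xid` is typed `xs:IDREF`, a validator only confirms that it names some `xs:ID` in the document, whereas the set-based lookup here accepts only `<tag>` identifiers.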
C. Glossary
DTD
An SGML or XML document may have an associated Document Type Definition (DTD) that
specifies the rules for the structure of the document. Several industries have
standardized on particular DTDs for the types of documents that they share.
OSCAR
Open Standards for Container/Content Allowing Re-use, the LISA special interest group that
develops TMX and related standards.
SGML
SGML stands for Standard Generalized Markup Language, an ISO standard (ISO 8879) for
defining structured formats. SGML is not a format in itself but a set of rules for defining
formats; SGML markup systems are defined in Document Type Definition files (DTDs).
UTC
UTC stands for Coordinated Universal Time.
XML
XML stands for Extensible Markup Language. XML is a simplified and restricted subset of
SGML.
D. References
Normative
[IANA Charsets]
IANA Names for Character Sets. IANA (Internet Assigned Numbers Authority), August 2001.
[ISO 8601]
Representation of dates and times. ISO (International Organization for Standardization),
December 2000.
[RFC 4646]
RFC 4646: Tags for Identifying Languages. IETF (Internet Engineering Task Force),
September 2006. This document, in combination with RFC 4647, replaces RFC 3066, which
replaced RFC 1766.
[SRX 2.0]
Segmentation Rules Exchange (SRX), an XML-based standard for describing how translation
and other language-processing tools segment text for processing.
[XML 1.0]
Extensible Markup Language (XML) 1.0 (Second Edition). W3C (World Wide Web Consortium),
October 2000.
[XML Namespaces]
Namespaces in XML. W3C (World Wide Web Consortium), August 2006.
Non-Normative
[ISO]
International Organization for Standardization Web site.
[LISA]
Localisation Industry Standards Association Web site.
[Unicode]
Unicode Consortium Web site.
[W3C]
World Wide Web Consortium Web site.