
LISA Version: Copyright © Localization Industry Standards Association (LISA) 2007
ISO Version: Copyright © International Organization for Standardization (ISO) 2007
All Rights Reserved.
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
LISA (the Localization Industry Standards Association – http://www.lisa.org) is the standards organization for the globalization industry. LISA’s OSCAR (Open Standards for Container/content Allowing Reuse) Special Interest Group develops XML-based standards for automated language-processing in the areas of globalization, internationalization, localization, and translation, including standards for translation memory, terminology, text memory, word/character counts, and other related areas.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of ISO technical committees is to prepare International Standards. Draft International Standards adopted by the technical committees are circulated to the member bodies for voting. Publication as an International Standard requires approval by at least 75 % of the member bodies casting a vote. The main task of LISA’s OSCAR Special Interest Group is to develop standards to facilitate and automate the globalization of products and services in a way that supports local language and culture conventions. Publication as an OSCAR standard requires approval by the OSCAR steering committee.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. Neither LISA nor ISO shall be held responsible for identifying any or all such patent rights.
ISO 30042/LISA TBX was prepared by LISA OSCAR and Technical Committee ISO/TC 37, Terminology and other language and content resources, Subcommittee SC 3, Computer applications in terminology.
TBX and the TBX logo are registered trademarks of LISA, and the TBX logo is subject to terms of use as defined by LISA. LISA maintains copyright on TBX, with ISO maintaining copyright on ISO 30042. These two standards are subject to joint maintenance by a team of ISO TC37 and LISA OSCAR members.
The TBX (TermBase eXchange) format facilitates the representation of highly-structured terminological information derived from either human-oriented terminological data collections (terminologies) or natural language processing (NLP) lexicons. The information represented in TBX should be concept-oriented, with the understanding that real-life practicalities may dictate variations. The terms in a single entry are assumed to be synonymous unless otherwise noted. TBX allows the representation of various kinds of information about individual terms that distinguish them from other terms in the same concept entry. It also allows for the documentation of directionality between equivalents or other departures from the ideal that all terms in an entry are totally equivalent in both directions. Once terminological information is represented in TBX, it can serve multiple purposes, including the analysis, dissemination, and exchange of terminological data collections or subsets thereof.
TBX is an open XML-based standard format for terminological data. This standard provides a number of benefits so long as TBX files can be imported into and exported from most software packages that include a terminological database. This capability greatly facilitates the flow of terminological information throughout the information cycle both inside an organization and with outside service providers. In addition, terminology that is made available to the general public becomes much more accessible to human users and can be more easily integrated into existing terminological resources.
This OSCAR/ISO standard defines an XML-based application referred to as the TBX (TermBase eXchange) format. The intended audience for this document consists of three groups: (1) programmers and analysts who desire to develop software applications that process TBX-compliant data files, for example, by converting them to files in some other format or by deriving TBX-compliant files from some other format; (2) terminologists and other language specialists who desire to analyze a terminological data collection for representation in TBX or to understand a TBX file; and (3) managers who desire to obtain an overview of the TBX format. A TBX data file is a well-formed XML document, which may be stored as a file on some medium, such as magnetic or optical disc, or transmitted as a text string over a network.
Each of these three groups should be familiar with this Introduction. In addition to an understanding of this Introduction, terminologists and other language specialists need a basic understanding of the structure of XML documents and the data categories contained in the terminology subset of the TC 37 Data Category Registry (DCR), which is described in ISO 12620 DIS 2006. Besides having or obtaining this background information, they should study the body of this OSCAR/ISO standard (Sections 1-8) and Annexes C and D, but they do not need the ability to write or modify XML DTDs or schemas. See the Bibliography section of this document for further background information on data categories and the DCR [1, 2]. Programmers and analysts developing software applications to process TBX documents must have a thorough knowledge of XML and familiarity with the entirety of this OSCAR/ISO standard and the various standards on which it is based.
For various types of machine processing, including transmission over the Internet, terminological data can be represented using XML. The TBX format defined by this standard is an XML application designed to support machine processing of terminological data in various computer environments, including standalone computers, the Internet, and intranets.
TBX is designed to support the analysis, representation, dissemination, and exchange of information from terminological databases (termbases). It is intended to qualify as a TML (Terminology Markup Language) as defined in the Terminology Markup Framework (TMF) specified in ISO 16642:2003. In addition, TBX is intended to support the extraction and merging of information from other, non-TMF-compliant, formats, although these processes may involve some information loss.
The following referenced documents are indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
For the purposes of this standard, the following terms and definitions apply:
The terminological framework for TBX is provided by three established international standards (ISO DIS 12620:2006, ISO 12200:1999, and ISO 16642:2003). ISO DIS 12620 provides a framework for the creation of a Data Category Registry that lists data categories with standardized names that function as data element types that can serve as either field names or as predefined picklist values (permissible instances). ISO 12200, also known as MARTIF, provides the basis for the core structure of TBX. TMF includes a structural metamodel for TMLs (Terminology Markup Languages) in general, regardless of which XML style of representation is used.
TBX is a format that qualifies as a TML by complying with the requirements of TMF. It is based on the TMF structural metamodel; it specifies a Data Category Selection (DCS) from the ISO TC 37 Data Category Registry (DCR) that is a large subset of the DCS for the Terminology domain; and it adopts an XML style compatible with ISO 12200. Thus, TBX is a standards-based format, being an XML application based on ISO 12200, ISO 12620, and TMF. Its intended uses include analyzing, representing, manipulating, and sharing terminological data.
Not all terminology environments use the same set of data categories. Therefore, TBX is a flexible format that allows user groups to specify their own Data Category Selections as subsets of the TBX DCS and other constraints on the core structure, so long as data categories from the ISO DCR are used whenever possible. These constraints are represented in a TBX eXtensible Constraint Specification (XCS) file. The following figure shows how the flexibility of TBX is based on the classic form-content distinction. Each combination of the core structure DTD/schema (which defines the form) and a particular XCS file (which defines the allowed content) results in an application-specific variant of TBX.
Figure 1. Using TBX Basic and XCS to produce TBX variants
Each variant of TBX is a TML within TMF (Terminology Markup Framework) in compliance with ISO 16642. Since each TML is interoperable with every other TML, limited only by incompatibilities in the choice of data categories, TBX XML documents can be converted to XML documents in other formats within TMF. However, interoperability between TBX and formats that do not qualify as TMLs is not guaranteed. Nevertheless, limited interoperability is possible between TBX and non-TML formats such as OLIF and TMX.
Even though TBX supports customization according to user needs, there are limits to what variations can be defined by an XCS file; otherwise, certain variations would not qualify as TMLs according to ISO 16642. All acceptable variations on TBX have the same core structure. They differ mainly with respect to the data categories from the TC 37 DCR that are allowed by a particular user group.
Some related formats that are not TMLs but still involve some form of terminology are OLIF and TMX (Translation Memory eXchange). The OLIF format is primarily designed to represent machine-translation lexicons. The TMX format is primarily designed to represent translation-memory data. However, there is a connection between machine translation and translation memory. The rapidly evolving information, typically subject-field specific nominals, in a machine-translation lexicon can be treated as terminology. Function words and general vocabulary are not terms, but their entries in a machine-translation lexicon are usually rather stable. Furthermore, the segments of text in a translation-memory database can often be treated as terminological units. Therefore, it is anticipated that filters will become available between TBX and OLIF and between TBX and TMX in order to exchange terminology among these formats. This exchange will require extended TBX XCS files that include the data categories present in OLIF and TMX that are not found in the master TBX XCS file.
For an XML data file to be a TBX file, it must meet the following three criteria:
In practice, TBX documents are typically created by an export routine in some piece of software. They may be displayed, for example, using a browser and an XSLT stylesheet, or be processed by an import routine that is part of some other piece of software. So long as the XML documents that are created and processed are TBX-compliant, it is not necessary for a human to inspect them, and no formal conformance check is necessary. However, in some circumstances, such as dealing with suspected data corruption, TBX-compliance can be checked using TBX-validation software.
The first two aspects of TBX-compliance can be mostly checked by validating the TBX document against the DTD of the core structure using a validating XML parser, and the third aspect can be checked either by using a custom software application that checks for adherence to the constraints declared in the XCS module or by validating against a comprehensive XML schema generated from the information in the core structure plus the information in the XCS module. A freeware custom software application that checks against the core structure and the current XCS was developed by the SALT project [7] and is available at the LISA (www.lisa.org) website. One aspect of TBX-compliance that cannot be checked by a general-purpose parser is the proper use of meta-markup elements.
As noted above, it is possible to validate whether any given well-formed XML data file is TBX-compliant. However, this validation is a formal process and does not ensure that appropriate terminological methods have been used to create the data or that the content of the data categories is accurate. Validation can determine, for instance, that the value of an XML element such as descrip is not one of the allowed values, which include subject field and definition, but validation cannot detect an inaccurate definition. See Figure 2 for examples of these distinctions in TBX. The first segment of the figure is not well-formed, since the first <descrip> element has a spelling error in the end tag and since the <term> element has no closing tag at all. The second segment is well-formed but not core-structure valid, since the core-structure module of TBX does not allow for a <desskrip> tag. The third segment is valid according to the TBX core structure DTD but does not adhere to the Default XCS of TBX, since there is no TBX data category called "conflagration" in the default XCS file. The fourth segment is core-structure valid and XCS-adherent but not accurate, since a kitten is not a dog or wolf.
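The distinctions drawn above (well-formed, core-structure valid, XCS-adherent) can be illustrated with a minimal Python sketch. It is not a TBX validator: the `KNOWN_ELEMENTS` set is a hypothetical stand-in for validation against the core-structure DTD, and `ALLOWED_DESCRIP_TYPES` is an assumed, hard-coded subset of what an XCS file would declare. The function name `classify` is likewise illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the core-structure DTD: only these element
# names are treated as valid in this sketch.
KNOWN_ELEMENTS = {"termEntry", "descrip", "langSet", "tig", "term"}

# Assumed subset of the descrip type values that an XCS file would permit.
ALLOWED_DESCRIP_TYPES = {"subjectField", "definition"}


def classify(fragment: str) -> str:
    """Return the first level at which the fragment fails, else 'XCS-adherent'."""
    try:
        root = ET.fromstring(fragment)
    except ET.ParseError:
        return "not well-formed"
    for elem in root.iter():
        if elem.tag not in KNOWN_ELEMENTS:
            return "not core-structure valid"
    for descrip in root.iter("descrip"):
        if descrip.get("type") not in ALLOWED_DESCRIP_TYPES:
            return "not XCS-adherent"
    return "XCS-adherent"


misspelled_end_tag = "<termEntry><descrip type='definition'>x</desskrip></termEntry>"
unknown_element = "<termEntry><desskrip type='definition'>x</desskrip></termEntry>"
unknown_category = "<termEntry><descrip type='conflagration'>x</descrip></termEntry>"
inaccurate = "<termEntry><descrip type='definition'>a kitten is a young dog</descrip></termEntry>"
```

Note that the last fragment passes every formal check in this sketch even though its definition is wrong, which is the point made above: accuracy is outside the reach of validation.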
There are three levels of certification for a TBX-aware software application:
To achieve level one certification, the software application must produce and accept TBX files that are well-formed and core-structure-valid XML and that adhere to at least one XCS file and, of course, detect when purported TBX files are not well-formed or not core-structure valid or not XCS-adherent. It is, of course, hoped that TBX files will be accurate, but this quality check is beyond the scope of software certification testing. To require accuracy would be akin to requiring that a word processor only produce pragmatically coherent prose.
To achieve level two certification, the software application must achieve level one certification and must be able to check for adherence to the constraints in the default XCS file specified in this ISO/OSCAR document. Thus, level two certification supports a degree of blindness in that it can import TBX files from any outside source that also achieves level two certification.
To achieve level three certification, the software application must achieve level two certification and be able to check adherence to a comprehensive XCS that supports a lossless roundtrip for the termbase in the application. Thus, once the information in the termbase has been exported to TBX, the termbase can be emptied and subsequently regenerated from the information in the TBX file. Furthermore, the comprehensive XCS must be built on the default XCS of level two so that non-default data categories are included only when necessary and so that non-default data categories are taken from the TC 37 DCR where possible.
The following is an example of a simple but complete TBX document. The numbers in square brackets to the left of certain lines are not part of the TBX document. They serve as footnote numbers to the comments below.
The following sample TBX document will not validate unless the line numbers in square brackets are removed.
[1] <?xml version='1.0'?>
<!DOCTYPE martif SYSTEM "TBXcoreStrucV01.dtd">
[2] <martif type='TBX' xml:lang='en' >
[3] <martifHeader>
<fileDesc><sourceDesc><p>from an Oracle corporation termBase</p></sourceDesc></fileDesc>
<encodingDesc><p type='DCSName'>TBXmasterXCSV01.XML</p></encodingDesc>
</martifHeader>
[4] <text> <body>
[5] <termEntry id='eid-Oracle-67'>
[6] <descrip type='subjectField'>manufacturing</descrip>
[7] <descrip type='definition'>A value between 0 and 1 used in …</descrip>
[8] <langSet xml:lang='en'>
[9] <tig>
<term id='tid-Oracle-67-en1'>alpha smoothing factor</term>
[10] <termNote type='termType'>fullForm</termNote>
[11] </tig>
[12] </langSet>
[13] <langSet xml:lang='hu'>
[14] <tig>
<term id='tid-Oracle-67-hu1'>Alfa simítási tényező</term>
</tig>
[15] </langSet>
[16] </termEntry>
[17] </body> </text>
[18] </martif>
Only a minimal acquaintance with XML is assumed in the following explanation. For key TBX elements, the correspondence to the structural component of the metamodel in ISO 16642 (TMF) is indicated.
This sample TBX entry has several properties:
This section defines the core structure of TBX informally, particularly for a human analyst who is either seeking to understand a TBX document or to analyze source or target terminological data in order to prepare a mapping that a programmer can use to write an automatic conversion routine from the source format to, for example, TBX or from TBX to the target format.
The highest-level XML element in a TBX document is the martif element, which consists of a martifHeader element and a text element. (See Figure 3.) These element names are found in ISO 12200 and have roots in the Text Encoding Initiative, which submitted the MARTIF project to ISO Technical Committee 37 for standardization.
The text element in Figure 3 consists of terminological entries, which together make up the TBX body element, and complementary Information (a meta-model object class). In TBX, complementary information is found in the front and back elements.
The martifHeader element corresponds to global information in the meta-model and consists of a description of the whole terminological data collection (in the fileDesc element), information about the applicable XCS file and unusual character encoding (in the encodingDesc element), and a history of major revisions to the collection (in the revisionDesc element). The only time character encoding information needs to be included in the header is when non-Unicode characters are included, that is, either references beyond the Basic Multilingual Plane of ISO 10646 or non-ISO-10646 characters documented in the ude element, which functions as it does in TMX.
A question mark after an element in the box-and-line diagrams below indicates that the element is optional. A plus sign after an element indicates that one or more occurrences of the element are allowed. A plus sign in a box by itself indicates that the structure connected to the right of the box occurs one or more times. The small symbols in the corners of the boxes are artifacts of the drawing process and can be ignored.
See Annex A for more detail on these elements.
Each terminological concept entry in the body element is called a termEntry (see Figure 3) and follows the structure of the meta-model.
The auxInfo element in Figure 4 corresponds to pieces of terminology-related information that can be associated with any one of three levels: the Terminological Entry level (termEntry in TBX, i.e., the concept level), the Language-section level (langSet in TBX), and the Term-section level (ntig, or its simplified version, tig, in TBX). The termNote and termNoteGrp elements are also terminology-related information but can appear only at the Term-section level and below. The termCompList element corresponds to the term component section object class of the meta-model. Although the model seems to indicate that a concept entry is strictly hierarchical (one concept, represented by fully synonymous terms in various languages), the transferComment data category (a type of termNote) allows direct links between specific terms, with a note that indicates various departures from the ideal of a terminological data collection consisting of a set of concept entries in which all the terms in a given entry are perfectly synonymous and reversible.
Figure 4. The levels of a terminological entry
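As an illustration of the three levels, the following Python sketch walks one entry from the Terminological Entry level through each Language Section down to the Term Sections and collects the terms per language. The entry is abbreviated from the sample document given earlier; the function name `terms_by_language` is an assumption made for this example, not part of TBX.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE_ENTRY = """<termEntry id='eid-Oracle-67'>
  <descrip type='subjectField'>manufacturing</descrip>
  <langSet xml:lang='en'>
    <tig>
      <term id='tid-Oracle-67-en1'>alpha smoothing factor</term>
      <termNote type='termType'>fullForm</termNote>
    </tig>
  </langSet>
  <langSet xml:lang='hu'>
    <tig>
      <term id='tid-Oracle-67-hu1'>Alfa simítási tényező</term>
    </tig>
  </langSet>
</termEntry>"""


def terms_by_language(entry):
    """Map each langSet language to the terms found in its tig/ntig sections."""
    result = {}
    for lang_set in entry.findall("langSet"):          # Language-section level
        lang = lang_set.get(XML_LANG)
        sections = lang_set.findall("tig") + lang_set.findall("ntig")
        # Term-section level: the term may be nested, so search descendants.
        result[lang] = [section.find(".//term").text for section in sections]
    return result


entry = ET.fromstring(SAMPLE_ENTRY)                    # Terminological Entry level
terms = terms_by_language(entry)
```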
In TBX, auxInfo consists of any combination of the following elements:
descrip, descripGrp, admin, adminGrp, transacGrp, note, ref, and xref
A ref element is a cross-reference that points somewhere inside the martif element. An xref element is a cross-reference that points to an external object using a URI (a URL or other Web address). This mechanism provides some aspects of the XLink standards from the W3C. A note element, obviously, is a note. These three elements appear at various levels to allow the creation of links (ref and xref) and the recording of supplementary information (note).
A transacGrp element gives information about a transaction. The data category definition for ISO TC 37 data category /terminology management functions/ states that the two terminology management functions involved in a transaction are date and responsibility. A date is specified by a date element, and a responsibility is specified by a transacNote element. Thus, a transacGrp contains a transac element that describes the transaction, accompanied by any combination of transacNote, date, note, ref, and xref elements that apply to the transaction. Any date in TBX must appear within a transacGrp, even if an implicit transaction must be made explicit.
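The rule that every date must sit inside a transacGrp lends itself to a simple mechanical check. The Python sketch below is one possible way to do it; the function name `stray_dates` is hypothetical, and the `type` value and content of the transac element in the sample data are assumptions chosen for illustration rather than values taken from this section.

```python
import xml.etree.ElementTree as ET


def stray_dates(root):
    """Return every <date> element whose parent is not a <transacGrp>."""
    return [
        child
        for parent in root.iter()
        for child in parent
        if child.tag == "date" and parent.tag != "transacGrp"
    ]


# The transac type and content below are assumed values for illustration.
WITH_GROUP = """<termEntry>
  <transacGrp>
    <transac type='transactionType'>origination</transac>
    <date>2007-01-15</date>
  </transacGrp>
</termEntry>"""

WITHOUT_GROUP = "<termEntry><date>2007-01-15</date></termEntry>"

ok = stray_dates(ET.fromstring(WITH_GROUP))
bad = stray_dates(ET.fromstring(WITHOUT_GROUP))
```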
An adminGrp element is similar to a transacGrp in that it contains information pertaining to another element, in this case an admin rather than a transac, specifically, a combination of adminNote, note, ref, and xref. An admin is a simplified adminGrp in which there is just a single admin element and the adminGrp container has been omitted.
A descripGrp element consists of a descrip element followed by any combination of descripNote, admin, adminGrp, transacGrp, note, ref, and xref elements.
The descrip and admin elements are examples of metadata categories in TBX. Each instance of a metadata category in TBX is an element that is specialized by the value of its type attribute. The various instantiations of the metadata categories are given in section 9. The TBX DCS file restricts each instantiation of a descrip to certain levels.
A termNoteGrp element, like other Grp elements, consists of a base element, in this case a termNote, and auxiliary information, in this case, admin, adminGrp, transacGrp, note, ref, and xref elements. A descrip element contains concept-related data categories that do not apply to the term itself; consequently, a comparison with descripGrp shows that the difference is that there are no descrip elements in a termNoteGrp.
A termCompList element shows the internal composition of a term and consists of a combination of termCompGrp and, in the simplified case, termComp elements. A termCompGrp, consistent with the pattern set by other ...Grp elements, consists of a termComp element and a combination of termNote, termNoteGrp, admin, adminGrp, transacGrp, note, ref, and xref elements that apply to it. Each termComp element contains some component of a term, such as one of the words of which it is composed.
In TBX, elements such as descrip, descripNote, admin, adminNote, transac, transacNote, termNote, note, ref, and xref, contain text. Sometimes, the permissible values of the element are restricted to a picklist. In these cases, the field name is considered to be a closed data category and the picklist values are called its permissible instances or its value domain. In other cases, the element can contain free text. There are two types of free text in TBX: plain and note. Plain text (#PCDATA) is defined by the XML specification. It contains no elements, only characters and character entities. Note text, which is used in definitions and contextual examples and similar elements, allows the following additional embedded elements besides hi: foreign, bpt, ept, it, ph, and ut.
In TBX, every element containing free text must have an explicit or implicit language indicated by an explicit or inherited xml:lang attribute. In TBX, the xml:lang attribute does not apply to attributes and their values, since these attributes all have picklist values that are codes, not free text.
In TBX, all text is in Unicode. There are three allowable encodings of Unicode: UTF-8, UCS-2, and seven-bit ASCII with non-ASCII characters represented as hex character references to their Unicode code point.
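The third encoding option can be sketched in a few lines of Python: every character outside seven-bit ASCII is replaced by a hexadecimal character reference to its Unicode code point. The function name `to_ascii_with_hex_refs` is an assumption for this example, and the sketch expects text in which markup characters such as "&" and "<" have already been escaped.

```python
import xml.etree.ElementTree as ET


def to_ascii_with_hex_refs(text: str) -> str:
    """Replace each non-ASCII character with a hex character reference."""
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};"
        for ch in text
    )


encoded = to_ascii_with_hex_refs("tényező")
# An XML parser resolves the references back to the original characters.
decoded = ET.fromstring(f"<term>{encoded}</term>").text
```

Python's built-in `xmlcharrefreplace` error handler produces decimal rather than hexadecimal references, which is why the conversion is written out by hand here.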
The foreign element is used to mark a segment of text that is in a different language from the surrounding text, e.g., "the French word pamplemousse is called a grapefruit in English." A hi element highlights a segment of text and optionally points to another element. One use of hi is to mark an entailed term inside a definition or other text field.
The five elements, bpt, ept, it, ph, and ut, are meta-markup tags that are used to mark up (i.e., encapsulate) embedded non-TBX markup to distinguish it from text. They allow TBX elements to contain various kinds of other markup (such as HTML or text processing markup) that needs to be retained but should not necessarily be processed during terminology management functions. Any such enclosed markup is modified so that start-tag characters ("<") become entities (&lt;) and ampersands become entities (&amp;). If a piece of markup to be encapsulated consists of two paired pieces of markup, such as the markup used to show that a piece of text is to be in bold or italics, then bpt and ept (begin and end paired tags) are used. If the markup to be encapsulated consists of one piece that would be paired except that the other piece was cut off and appears outside the current element, then an it (isolated tag) is used. If the piece of markup to be encapsulated stands on its own, marking a place such as a footnote, then ph (placeholder) is used. If the categorization of the piece of markup is unknown, then ut (unknown tag) is used. (These markup conventions reflect procedures adopted in the OSCAR TMX environment.)
Suppose one has the following segment of text to put into an XML element in TBX:
We need a big dog.
The marked-up text underlying this presentation might be:
We need a <bold> big </bold> dog.
This is not a problem for meta-markup tags. One can express the entire text chunk in a TBX element as follows:
We need a <bpt i='1'>&lt;bold></bpt> big <ept i='1'>&lt;/bold></ept> dog.
Then it is possible to regenerate the original segment by removing the meta-markup tags and converting any "&lt;" inside a meta-markup tag back to "<".
Now consider the following segment, which uses SGML markup:
We need a big but < 50 pound dog
which might have the following underlying SGML markup:
We need a <bold> big but &lt; 50 pound </bold> dog
(i.e., a "big but less-than-fifty-pound dog" in which the less-than sign "<" has already been converted to an SGML entity in the source segment before placing it into TBX, since in this case the less-than sign is a literal rather than an escape character).
One would express this character in a TBX segment as follows:
We need a <bpt i='1'>&lt;bold></bpt> big but &amp;lt; 50 pound <ept i='1'>&lt;/bold></ept> dog.
Then, when the original segment is reconstructed, we will get what we started with, since the &amp; will be converted back to an ampersand.
As noted above, HTML tags are one kind of markup that can be enclosed inside meta-markup elements. This allows the markup to be retained and processed during display or import without unduly complicating the core structure by including the XHTML DTD in the TBX core structure. Any kind of markup, including RTF, can be encapsulated in meta-markup tags and later retrieved without loss of information.
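The encapsulation and regeneration steps described in this section can be sketched in Python as plain string operations. This is a simplified illustration, not a reference implementation: it handles only properly paired tags (bpt and ept), ignores the it, ph, and ut cases, and the function names `encapsulate` and `restore` are assumptions made for the example.

```python
import re


def escape(text: str) -> str:
    """Escape ampersands first, then start-tag characters."""
    return text.replace("&", "&amp;").replace("<", "&lt;")


def unescape(text: str) -> str:
    """Reverse of escape(): restore '<' first, then '&'."""
    return text.replace("&lt;", "<").replace("&amp;", "&")


SOURCE_TAG = re.compile(r"<(/?)\w+[^>]*>")
META_TAG = re.compile(r"</?(?:bpt|ept|it|ph|ut)\b[^>]*>")


def encapsulate(source: str) -> str:
    """Wrap paired source tags in bpt/ept and escape all source content."""
    out, pos, open_ids, next_id = [], 0, [], 1
    for match in SOURCE_TAG.finditer(source):
        out.append(escape(source[pos:match.start()]))
        if match.group(1):                      # closing tag of a pair
            i = open_ids.pop()                  # sketch: assumes balanced tags
            out.append(f"<ept i='{i}'>{escape(match.group(0))}</ept>")
        else:                                   # opening tag of a pair
            i, next_id = next_id, next_id + 1
            open_ids.append(i)
            out.append(f"<bpt i='{i}'>{escape(match.group(0))}</bpt>")
        pos = match.end()
    out.append(escape(source[pos:]))
    return "".join(out)


def restore(tbx_content: str) -> str:
    """Remove the meta-markup tags and undo the escaping."""
    return unescape(META_TAG.sub("", tbx_content))


simple = "We need a <bold> big </bold> dog."
with_entity = "We need a <bold> big but &lt; 50 pound </bold> dog"
```

Escaping the ampersand before the start-tag character, and reversing the two steps in the opposite order, is what keeps an entity that was already present in the source (such as the less-than entity in the second segment) distinct from the entities introduced by encapsulation.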
The metadata categories of TBX are as follows. Each of them can potentially be given multiple instantiations in an XCS module, and each instantiation specifies one data category. In TBX, the specific data category instantiation is indicated by the value of a type attribute (e.g., <descrip type='definition'>). The data category associated with a metadata category can in some cases be derived either from a superordinate element (inheritance) or, in some special cases, from a subordinate category (reverse inheritance).
In general, a …Grp element in TBX receives the data category of the first element of the group, and all the elements of a …List element inherit the data category of the list. If the …Grp elements were not optional in the simple case of a single element, then the data category would be specified directly on the …Grp element.
A term is not formally a metadata category in TBX, but the termType data category used with a termNote element is used to specify term type, thus rendering a term element an indirect metadata category.
The main attributes used in TBX are xml:lang (language), type, id (to identify an element uniquely within the XML document), and target (to point to an ID). Additional attributes are found in Annex A.
The value of the xml:lang attribute inherits downward through the implied tree structure of the XML document unless overridden by another xml:lang attribute. The martif element is required to have an xml:lang attribute. The language specified in the martif element becomes the working language of the entire TBX file. Each langSet element shall also specify a language that applies to that Language Section. This becomes the object language of the language section. Thus, a definition at the Terminological Entry level is assumed to be in the working language of the martif file unless otherwise specified, and a note in a Language Section is assumed to be in the same language as the terms in that Language Section unless otherwise specified.
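The downward inheritance of xml:lang described above can be sketched in Python as a recursive walk that passes the current language to each child unless the child declares its own. The function name `effective_languages` and the abbreviated sample document are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(root):
    """Map every element to its explicit or inherited xml:lang value."""
    languages = {}

    def walk(element, inherited):
        lang = element.get(XML_LANG, inherited)  # an explicit value overrides
        languages[element] = lang
        for child in element:
            walk(child, lang)

    walk(root, None)
    return languages


SAMPLE = """<martif type='TBX' xml:lang='en'>
  <text><body>
    <termEntry id='eid-Oracle-67'>
      <descrip type='definition'>A value between 0 and 1 used in ...</descrip>
      <langSet xml:lang='hu'>
        <note>megjegyzés</note>
        <tig><term>Alfa simítási tényező</term></tig>
      </langSet>
    </termEntry>
  </body></text>
</martif>"""

root = ET.fromstring(SAMPLE)
languages = effective_languages(root)
```

In this sample the definition at the Terminological Entry level resolves to the working language of the file (en), while the note inside the Hungarian Language Section resolves to the object language of that section (hu).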
The allowed values of the xml:lang attribute in TBX are the same as the allowed values of the lang attribute in TMX. These values are found in IETF RFC 4646.
The id and target attributes work together to point unambiguously between elements in the same TBX file. For example, one entry:
<termEntry id="eid-database-5574">
...(entry for "hunting dog")
</termEntry>
could be pointed to by another entry:
<termEntry>
<descrip type="superordinateConceptGeneric" target="eid-database-5574">hunting dog</descrip>
...(entry for "Retriever" [a type of hunting dog])
</termEntry>
The redundant content "hunting dog" in the second entry is for display purposes. It provides a name for the link to the other entry that can be viewed by a human who is deciding whether to follow the link.
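A consumer of a TBX file might resolve such links by indexing every id value and then looking up each target. The Python sketch below shows one way to do this; the function name `resolve_targets` and the identifier `eid-missing` are hypothetical. It also assumes that xref elements should be skipped because they point to external objects rather than to ids in the same file.

```python
import xml.etree.ElementTree as ET


def resolve_targets(root):
    """Return (links, dangling): resolved (source, target) pairs and unmatched ids."""
    by_id = {e.get("id"): e for e in root.iter() if e.get("id")}
    links, dangling = [], []
    for element in root.iter():
        target = element.get("target")
        if target is None or element.tag == "xref":  # xref points outside the file
            continue
        if target in by_id:
            links.append((element, by_id[target]))
        else:
            dangling.append(target)
    return links, dangling


BODY = """<body>
  <termEntry id="eid-database-5574">
    <langSet xml:lang="en"><tig><term>hunting dog</term></tig></langSet>
  </termEntry>
  <termEntry>
    <descrip type="superordinateConceptGeneric" target="eid-database-5574">hunting dog</descrip>
    <descrip type="superordinateConceptGeneric" target="eid-missing">working dog</descrip>
    <langSet xml:lang="en"><tig><term>Retriever</term></tig></langSet>
  </termEntry>
</body>"""

links, dangling = resolve_targets(ET.fromstring(BODY))
```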
This section describes the master data category selection (DCS) for TBX, which is based on a selection of data categories taken from the Terminology Thematic Domain of the ISO TC 37 Data Category Registry. This DCS was selected to support somewhat blind interchange. The formal, machine-processable version of the TBX master DCS can be found in Annex B. The master DCS is not the only possible DCS for TBX. Particular user groups are expected to define subsets of the master DCS. If a necessary data category is not found in the master DCS, a user group can define an extension of the master DCS.
NOTE: The list presents the data categories in an order that reflects one way of grouping them according to a kind of concept system. It is also the order in which they appear in the master XCS module.
Guidelines for encoding particular data categories in TBX (e.g., as XML elements) are given in Annex C.
Each data category other than the basic data categories in Table T.1 is related to the meta-model by being classified as either administrative or descriptive. Descriptive data categories can describe either a concept or a term. All data categories that use the tag name descrip are concept-related descriptive data categories. All data categories that have the word "note" in their tag name, as well as the tag name termCompList, are term-related descriptive data categories. All data categories that use the tag name admin are administrative. Descriptive and administrative data categories are further divided into properties and relations. In TBX, a data category is a relation if the target attribute is allowed by the XCS file. Notes can be either administrative or descriptive.
The dataCategories in the first set of tables (T.) are implemented directly as TBX elements and therefore are not subordinate to metadata categories. The remaining subsections contain data categories that are specializations of metadata categories in the core structure of TBX.
| data category identifier | dataType | Target | element type | Level | Comments |
|---|---|---|---|---|---|
| term | noteText | none | <term> | term | |
| note | noteText | none | <note> | | |
| date * | date (ISO format) | none | <date> | | |
| entailedTerm ** | | term | <hi> | | <hi type="entailedTerm" target="id45">term</hi> |
| foreign | | none | <foreign> | | <foreign xml:lang="fr"> |
* Permissible date values comply with ISO 8601: yyyy-mm-dd or yyyymmdd. The date element is used with terminology management data categories in a transacGrp element.
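The two permitted date layouts can be checked mechanically. The following is a minimal sketch in Python, assuming that only the two forms named in the note (yyyy-mm-dd and yyyymmdd) are accepted. The function name `parse_tbx_date` is hypothetical and is not defined by TBX.

```python
import re
from datetime import date

# The two layouts named in the note: yyyy-mm-dd and yyyymmdd.
_EXTENDED = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
_BASIC = re.compile(r"(\d{4})(\d{2})(\d{2})")


def parse_tbx_date(text: str) -> date:
    """Return the calendar date encoded in text, or raise ValueError.

    Hypothetical helper for illustration; not part of the TBX specification.
    """
    match = _EXTENDED.fullmatch(text) or _BASIC.fullmatch(text)
    if match is None:
        raise ValueError(f"not in yyyy-mm-dd or yyyymmdd form: {text!r}")
    year, month, day = (int(part) for part in match.groups())
    # date() itself rejects impossible values such as month 13.
    return date(year, month, day)
```

Both `parse_tbx_date("2007-03-15")` and `parse_tbx_date("20070315")` yield the same date object, while a nine-digit string or a month of 13 is rejected.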
| data category identifier | dataType | Target | Attribute | Level | Comments |
|---|---|---|---|---|---|
| elementIdentifier ** | | none | id | | id="eid-45631" |
| lang | language codes | none | xml:lang | | |
** The examples shown are possible instances of these elements. Element identifiers can include entry identifiers (eid-...), concept identifiers (cid-...), term identifiers (tid-...), among others.
The remaining tables (T.2 and beyond) are arranged according to the metadata categories with which individual data categories are associated. In these tables, the first column (encoding) is a link to a section of the encoding guidelines for that data category. The second column (TBX Data Category Name) corresponds to the unique name of the data category in the TC 37 DCR. Typically, it consists of the "camel case" name as it appears in the DCR, which means that the spaces between discrete words in the name have been removed and the first letter of the second and each subsequent word is in upper case. The third column (dataType) indicates what kind of text is allowed in the element. The fourth column (Target) indicates whether this element can take a target attribute, and if so, what kind of element can be targeted. The fifth column (metaType) indicates which metadata category is used in TBX for this data category. The sixth column (Level) gives any exceptional information about the levels in the metamodel at which a particular data category can appear. Admin elements can appear at any level. Descrip elements can appear at the entry, language, or term levels unless otherwise restricted (using codes TE for Terminological Entry, LS for Language Section, and TM for term). TermNote elements can appear only at the term level, unless authorized (by a TC code) to appear at the Term Component level as well.
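The camel-case convention described above can be illustrated with a short Python sketch. It implements only what the paragraph states (spaces removed, first letter of the second and each subsequent word in upper case). The function name `to_camel_case` is hypothetical, and names that contain hyphens in the tables (such as antonym-term) are outside the scope of this sketch.

```python
def to_camel_case(name: str) -> str:
    """Join space-separated words, capitalizing every word after the first.

    Hypothetical helper for illustration; the first word is left unchanged.
    """
    words = name.split()
    if not words:
        return ""
    first, rest = words[0], words[1:]
    return first + "".join(word[0].upper() + word[1:] for word in rest)
```

For instance, `to_camel_case("part of speech")` produces the name used in the tables, `partOfSpeech`.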
| encoding | TBX Data Category Name | dataType | Target | metaType | Level | Comments |
|---|---|---|---|---|---|---|
| ISO12620A-0201 | termType | picklist | none | termNote | term | values listed below table |
| ISO12620A-02013002 | abbreviatedFormFor | noteText | term | termNote | term | |
| ISO12620A-02013004 | shortFormFor | noteText | term | termNote | term | |
termType: entryTerm synonym internationalScientificTerm fullForm transcribedForm symbol formula equation logicalExpression commonName abbreviatedFormOfTerm variant shortFormOfTerm transliteratedForm sku partNumber phraseologicalUnit synonymousPhrase standardText string internationalism
| encoding | TBX Data Category Name | dataType | Target | metaType | Level | Comments |
|---|---|---|---|---|---|---|
| ISO12620A-020201 | partOfSpeech | plainText | none | termNote | term termComponent | |
| ISO12620A-020202 | grammaticalGender | picklist | none | termNote | term termComponent | values listed below table |
| ISO12620A-020203 | grammaticalNumber | picklist | none | termNote | term termComponent | values listed below table |
| ISO12620A-020204 | animacy | picklist | none | termNote | term termComponent | values listed below table |
| ISO12620A-020207 | grammaticalValency | noteText | none | termNote | term | |
| ISO12620A-020301 | usageNote | noteText | none | termNote | term | |
| ISO12620A-020302 | geographicalUsage | noteText | none | termNote | term | |
| ISO12620A-020303 | register | picklist | none | termNote | term | values listed below table |
| ISO12620A-020304 | frequency | picklist | none | termNote | term | values listed below table |
| ISO12620A-020305 | temporalQualifier | picklist | none | termNote | term | values listed below table |
| ISO12620A-020306 | timeRestriction | noteText | none | termNote | term | |
| ISO12620A-020307 | proprietaryRestriction | picklist | none | termNote | term | values listed below table |
| ISO12620A-020401 | termProvenance | picklist | none | termNote | term | values listed below table |
| ISO12620A-020402 | etymology | noteText | none | termNote | term termComponent | |
grammaticalGender: masculine feminine neuter otherGender
grammaticalNumber: singular plural dual massNoun otherNumber
animacy: animate inanimate otherAnimacy
register: neutralRegister technicalRegister in-houseRegister bench-levelRegister slangRegister vulgarRegister
frequency: commonlyUsed infrequentlyUsed rarelyUsed
temporalQualifier: archaicTerm outdatedTerm obsoleteTerm
proprietaryRestriction: trademark serviceMark tradeName
termProvenance: transdisciplinaryBorrowing translingualBorrowing loanTranslation neologism
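A picklist restricts the content of an element to one of the listed values. The following Python sketch shows one way such a check could be applied to a single termNote element, using three of the picklists listed above. The dictionary `PICKLISTS` and the function `picklist_value_ok` are hypothetical names for illustration; a real processor would read the picklists from the XCS file rather than hard-code them.

```python
import xml.etree.ElementTree as ET

# Three of the picklists listed above, keyed by data category name.
PICKLISTS = {
    "grammaticalGender": {"masculine", "feminine", "neuter", "otherGender"},
    "grammaticalNumber": {"singular", "plural", "dual", "massNoun", "otherNumber"},
    "animacy": {"animate", "inanimate", "otherAnimacy"},
}


def picklist_value_ok(element_xml: str) -> bool:
    """Check one element's content against the picklist for its type.

    Data categories without a picklist in this sketch are accepted as-is.
    """
    element = ET.fromstring(element_xml)
    allowed = PICKLISTS.get(element.get("type"))
    if allowed is None:
        return True
    return (element.text or "").strip() in allowed
```

With these definitions, `<termNote type="grammaticalGender">feminine</termNote>` passes the check, whereas the same element with the content `female` does not.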
| encoding | TBX Data Category Name | dataType | Target | metaType | Level | Comments |
|---|---|---|---|---|---|---|
| ISO12620A-0205 | pronunciation | noteText | none | termNote | term termComponent | |
| ISO12620A-0206 | syllabification | noteText | none | termCompList | ||
| ISO12620A-0207 | hyphenation | noteText | none | termCompList | ||
| ISO12620A-020801 | morphologicalElement | noteText | none | termCompList | ||
| ISO12620A-020802 | termElement | noteText | none | termCompList | ||
| ISO12620A-020803 | lemma | noteText | none | termCompList | ||
| ISO12620A-020804 | termStructure | noteText | none | termNote | term termComponent | |
| encoding | TBX Data Category Name | dataType | Target | metaType | Level | Comments |
|---|---|---|---|---|---|---|
| ISO12620A-020901 | normativeAuthorization | picklist | none | termNote | term | values listed below table |
| ISO12620A-020902 | language-planningQualifier | picklist | none | termNote | term | values listed below table |
| ISO12620A-020903 | administrativeStatus | picklist | none | termNote | term | values listed below table |
| ISO12620A-020904 | processStatus | picklist | none | termNote | term | values listed below table |
normativeAuthorization: standardizedTerm preferredTerm admittedTerm deprecatedTerm supersededTerm legalTerm regulatedTerm
language-planningQualifier: recommendedTerm nonstandardizedTerm proposedTerm newTerm
administrativeStatus: standardizedTerm-admn-sts preferredTerm-admn-sts admittedTerm-admn-sts deprecatedTerm-admn-sts supersededTerm-admn-sts legalTerm-admn-sts regulatedTerm-admn-sts
processStatus: unprocessed provisionallyProcessed finalized
| encoding | TBX Data Category Name | dataType | Target | metaType | Level | Comments |
|---|---|---|---|---|---|---|
| ISO12620A-0302 | falseFriend | noteText | term | termNote | term | |
| ISO12620A-0304 | reliabilityCode | picklist | none | descrip | langSet termEntry term | values listed below table |
| ISO12620A-0305 | transferComment | noteText | term | termNote | term | |
reliabilityCode: 1 2 3 4 5 6 7 8 9 10
| encoding | TBX Data Category Name | dataType | Target | metaType | Level | Comments |
|---|---|---|---|---|---|---|
| ISO12620A-04 | subjectField | plainText | none | descrip | langSet termEntry term | |
| ISO12620A-0402 | classificationCode | noteText | bibl | descrip | langSet termEntry term | |
| encoding | TBX Data Category Name | dataType | Target | metaType | Level | Comments |
|---|---|---|---|---|---|---|
| ISO12620A-0501 | definition | noteText | none | descrip | langSet termEntry term | |
| ISO12620A-0502 | explanation | noteText | none | descrip | langSet termEntry term | |
| ISO12620A-050302 | sampleSentence | noteText | none | descrip | term | |
| ISO12620A-0504 | example | noteText | none | descrip | langSet termEntry term | |
| ISO12620A-050501 | figure | noteText | binaryData | descrip | langSet termEntry term | |
| ISO12620A-050502 | audio | noteText | binaryData | descrip | langSet termEntry term | |
| ISO12620A-050503 | video | noteText | binaryData | descrip | langSet termEntry term | |
| ISO12620A-050504 | table | noteText | binaryData | descrip | langSet termEntry term | |
| ISO12620A-050505 | otherBinaryData | noteText | binaryData | descrip | langSet termEntry term | |
| ISO12620A-0506 | unit | noteText | none | descrip | term | |
| ISO12620A-0507 | range | noteText | none | descrip | term | |
| ISO12620A-050701 | quantity | noteText | none | descrip | term | |
| ISO12620A-0508 | characteristic | noteText | none | descrip | term | |
| ISO12620A-0509 | conceptOrigin | noteText | none | admin | | |
| ISO12620A-0803 | contextType | picklist | none | descripNote | | values listed below table |
contextType: definingContext explanatoryContext associativeContext linguisticContext metalinguisticContext translatedContext
| encoding | TBX Data Category Name | dataType | Target | metaType | Level | Comments |
|---|---|---|---|---|---|---|
| ISO12620A-0702 | conceptPosition | noteText | conceptSysDescrip | descrip | langSet termEntry | |
| ISO12620A-070201 | broaderConceptGeneric | noteText | entry | descrip | langSet termEntry | |
| ISO12620A-070202 | broaderConceptPartitive | noteText | entry | descrip | langSet termEntry | |
| ISO12620A-07020202 | superordinateConceptGeneric | noteText | entry | descrip | langSet termEntry | |
| ISO12620A-07020203 | superordinateConceptPartitive | noteText | entry | descrip | langSet termEntry | |
| ISO12620A-07020301 | subordinateConceptGeneric | noteText | entry | descrip | langSet termEntry | |
| ISO12620A-07020302 | subordinateConceptPartitive | noteText | entry | descrip | langSet termEntry | |
| ISO12620A-07020401 | coordinateConceptGeneric | noteText | entry | descrip | langSet termEntry | |
| ISO12620A-07020402 | coordinateConceptPartitive | noteText | entry | descrip | langSet termEntry | |
| ISO12620A-070205 | relatedConcept | noteText | entry | descrip | langSet termEntry | |
| ISO12620A-07020501 | relatedConceptBroader | noteText | entry | descrip | langSet termEntry | |
| ISO12620A-07020502 | relatedConceptNarrower | noteText | entry | descrip | langSet termEntry | |
| ISO12620A-070206 | sequentiallyRelatedConcept | noteText | entry | descrip | langSet termEntry | |
| ISO12620A-070207 | temporallyRelatedConcept | noteText | entry | descrip | langSet termEntry | |
| ISO12620A-070208 | spatiallyRelatedConcept | noteText | entry | descrip | langSet termEntry | |
| ISO12620A-070210 | associatedConcept | noteText | entry | descrip | langSet termEntry | |
| encoding | TBX Data Category Name | dataType | Target | metaType | Level | Comments |
|---|---|---|---|---|---|---|
| ISO12620A-0902 | thesaurusDescriptor | noteText | thesaurusDescrip | descrip | termEntry | |
| ISO12620A-0904 | keyword | noteText | none | admin | | |
| ISO12620A-0905 | indexHeading | noteText | none | admin | | |
| encoding | TBX Data Category Name | dataType | Target | metaType | Level | Comments |
|---|---|---|---|---|---|---|
| ISO12620A-1001 | terminologyManagementTransactions | picklist | none | transac | | values listed below table |
| ISO12620A-100202 | responsibility | noteText | respPerson | transacNote | | |
| ISO12620A-10020210 | subsetOwner | plainText | none | admin | | |
| ISO12620A-100203 | usageCount | noteText | none | transacNote | | |
terminologyManagementTransactions: origination input modification check approval withdrawal standardization exportation importation proposal userAccess
| encoding | TBX Data Category Name | dataType | Target | metaType | Level | Comments |
|---|---|---|---|---|---|---|
| ISO12620A-100301 | customerSubset | noteText | none | admin | ||
| ISO12620A-100303 | projectSubset | noteText | none | admin | ||
| ISO12620A-100305 | productSubset | noteText | none | admin | ||
| ISO12620A-100306 | applicationSubset | noteText | none | admin | ||
| ISO12620A-100307 | environmentSubset | noteText | none | admin | ||
| ISO12620A-100308 | businessUnitSubset | noteText | none | admin | ||
| ISO12620A-100309 | securitySubset | picklist | none | admin | | values listed below table |
securitySubset: public confidential
| encoding | TBX Data Category Name | dataType | Target | metaType | Level | Comments |
|---|---|---|---|---|---|---|
| ISO12620A-100602 | sortKey | noteText | none | admin | ||
| ISO12620A-100603 | searchTerm | noteText | none | admin | ||
| ISO12620A-100604 | hotkey | noteText | none | hi | ||
| ISO12620A-1011 | elementWorkingStatus | picklist | none | admin | | values listed below table |
| ISO12620A-1013 | entrySource | noteText | none | admin | ||
| ISO12620A-1018 | cross-reference | noteText | element | ref | ||
| ISO12620A-101801 | see | noteText | element | ref | ||
| ISO12620A-101805 | homograph | noteText | term | termNote | term | |
| ISO12620A-10180601 | antonym-term | noteText | term | descrip | term | |
| ISO12620A-10180602 | antonym-concept | noteText | entry | descrip | termEntry | |
| ISO12620A-101807 | externalCrossReference | noteText | external | xref | ||
| ISO12620A-101808 | corpusTrace | noteText | external | xref | ||
| ISO12620A-1019 | source | noteText | none | admin | ||
| ISO12620A-1020 | sourceIdentifier | noteText | bibl | admin | ||
| ISO12620A-102001 | sourceType | picklist | none | adminNote | | values listed below table |
| ISO12620A-102201 | originatingPerson | plainText | none | admin | ||
| ISO12620A-102202 | originatingInstitution | noteText | none | admin | ||
| ISO12620A-102203 | originatingDatabase | noteText | none | admin | ||
| ISO12620A-10220301 | databaseType | plainText | none | admin | ||
| ISO12620A-1023 | sourceLanguage | noteText | none | admin | ||
| ISO12620A-1024 | targetLanguage | picklist | none | admin | ||
| ISO12620A-1025 | domainExpert | noteText | bibl | admin | | |
elementWorkingStatus: starterElement workingElement consolidatedElement archiveElement importedElement exportedElement
sourceType: parallelText backgroundText
| encoding | TBX Data Category Name | dataType | Target | metaType | Level | Comments |
|---|---|---|---|---|---|---|
| ISO12620A-0503 | context | noteText | none | descrip | term | |
| ISO12620A-0801 | descriptionType | picklist | element | descripNote | | |
| ISO12620A-0802 | definitionType | picklist | element | descripNote | | values listed below table |
definitionType: intensionalDefinition extensionalDefinition partitiveDefinition translatedDefinition
Data categories that do not have a picklist in the TBX master XCS can have a picklist in a user-group subset XCS of TBX (see section 10) if the user-group in question can agree on a picklist for that data category. One obvious candidate for a user-group picklist is partOfSpeech, for which there is no agreed-on picklist when all the languages of the world and all linguistic theories are to be taken into account, but whose options are easily standardized for European languages.
As indicated in Section 4, user-group subsets (user-group-specific DCSs) can be defined for TBX. In TBX, a DCS is formally represented in a data structure called an XCS (eXtensible Constraint Specification). This section describes one such sample subset, which will be called the Supplier Subset. For illustrative purposes, this subset allows only the minimal terminological information that would be provided to a fictitious supplier of translation along with the source text to be translated, in a very restrictive environment involving manufacturing and finance only.
This subset will allow only two types of terms, full forms and abbreviated forms. This is done by specifying a picklist of allowed values for the data category termType, which is an instantiation of the metadata category termNote. The following information is placed in the XCS module concerning term type:
This specification is a strict subset of the specification for termType in the master XCS module. The only difference is that the master XCS module allows more options in the picklist. Clearly, any TBX document that conforms to the subset specification shall also conform to the master specification.
This subset will allow only two types of descriptive information: a subject field and a definition. The subject fields allowed in this subset are manufacturing and finance, and subject field specifications are allowed only at the terminological entry level:
The master XCS module allows any plain text value for a subject field, so a subset module can specify a picklist. Obviously, there can be no picklist of possible definitions, so the specification for definition contains the same type of text found in general notes (noteText) and is allowed at two levels, entry and language.
This is done by placing the following information in the XCS module:
An actual machine-processable XCS file for the Supplier subset of TBX would look like this:
<?xml version="1.0"?>
<!DOCTYPE TBXXCS SYSTEM "tbxxcsdtd.dtd">
<TBXXCS id='DXFd-supplier' version="1.0" lang='en'>
<header><title>subset DCS file for the Supplier example</title></header>
<datCatSet>
<termNoteSpec id="termType" datcatId="ISO12620A-0201">
<contents datatype="picklist" targetclass="none">fullForm abbreviatedForm</contents>
</termNoteSpec>
<descripSpec id="subjectField" datcatId="ISO12620A-04">
<contents datatype="picklist" targetclass="none">manufacturing finance</contents>
<levels>termEntry</levels>
</descripSpec>
<descripSpec id="definition" datcatId="ISO12620A-0501">
<contents datatype="noteText" targetclass="none"/>
<levels>termEntry langSet</levels>
</descripSpec>
</datCatSet>
</TBXXCS>
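To illustrate how a processor might apply such a subset, the following Python sketch reads the datCatSet portion of the Supplier XCS file shown above and tests whether a given data category, value, and level are permitted. This is a sketch under stated assumptions, not a TBX validator: the function names `load_constraints` and `is_allowed` are hypothetical, the level names passed in are assumed to match those used in the levels element, and any data category absent from the subset is treated as disallowed.

```python
import xml.etree.ElementTree as ET

# The <datCatSet> portion of the Supplier XCS file, copied from the listing above.
SUPPLIER_DATCATSET = """
<datCatSet>
<termNoteSpec id="termType" datcatId="ISO12620A-0201">
<contents datatype="picklist" targetclass="none">fullForm abbreviatedForm</contents>
</termNoteSpec>
<descripSpec id="subjectField" datcatId="ISO12620A-04">
<contents datatype="picklist" targetclass="none">manufacturing finance</contents>
<levels>termEntry</levels>
</descripSpec>
<descripSpec id="definition" datcatId="ISO12620A-0501">
<contents datatype="noteText" targetclass="none"/>
<levels>termEntry langSet</levels>
</descripSpec>
</datCatSet>
"""


def load_constraints(datcatset_xml):
    """Map each data category name to its metadata category, picklist, and levels."""
    constraints = {}
    for spec in ET.fromstring(datcatset_xml):
        contents = spec.find("contents")
        levels = spec.find("levels")
        is_picklist = contents.get("datatype") == "picklist"
        constraints[spec.get("id")] = {
            # "termNoteSpec" -> "termNote", "descripSpec" -> "descrip"
            "meta": spec.tag[: -len("Spec")],
            "picklist": (contents.text or "").split() if is_picklist else None,
            "levels": levels.text.split() if levels is not None else None,
        }
    return constraints


def is_allowed(constraints, meta, name, value, level):
    """Return True if the subset permits this value at this level."""
    rule = constraints.get(name)
    if rule is None or rule["meta"] != meta:
        return False  # assumption: categories outside the subset are rejected
    if rule["picklist"] is not None and value not in rule["picklist"]:
        return False
    if rule["levels"] is not None and level not in rule["levels"]:
        return False
    return True


RULES = load_constraints(SUPPLIER_DATCATSET)
```

A call such as `is_allowed(RULES, "descrip", "subjectField", "finance", "termEntry")` is accepted, while the same subject field at the langSet level, or a subject field value outside the picklist, is rejected.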
Note that the machine-processable XCS module corresponds in a straightforward manner to the information listed for the three data categories presented in the above examples. The data category identifiers (the datcatId values) are found in the master data category selection. Creating an XCS module requires neither extensive knowledge of XML nor expertise in writing XML DTDs or schemas.
Clearly, specifying only three data categories (term type, subject field, and definition) as instances of metadata categories defines a very limited subset of TBX; nevertheless, this limited data-category module can be logically combined with the core-structure module of TBX to allow such TBX-compliant documents as the example in section 7. Elements that are not metadata categories, i.e., basic TBX data categories such as <term> and <note>, are allowed without explicit mention in the XCS module because they are part of the core structure and need not be specified in the XCS.
This annex contains a formal representation of the core structure as a DTD. When reformatted as a separate file, it should be named "TBXcoreStrucV01.dtd" (TBX Core DTD Version 01).
The XML entities (such as noteText) listed in TBXcoreStrucV01.dtd allow mnemonic names to be given to pieces of text, especially text used in several places. The elements of TBX are divided into three groups: (a) the low-level elements used to mark up text, such as markup inside definitions and contextual examples; (b) elements needed to constitute a terminological entry (<termEntry>); and (c) high-level elements and other elements not used in a terminological entry, e.g., header elements. Within each element group, the attribute list for an element is given immediately after its declaration. Comments associated with metadata categories indicate that their specializations and other constraints on them are in the XCS module.
<!-- TBX Core DTD Version 01 -->
<!-- Based on DXF core-structure DTD version 0.4, from the SALT project, compatible with ISO 12200 amended -->
<!-- declaration: martif PUBLIC "ISO 12200:1999A//DTD MARTIF core (DXFcdV04)//EN" -->
<!-- note: see DCS for values of type on metadata categories and for values of lang -->
<!-- =================================================================================
SOME USEFUL ENTITIES THAT ARE REFERENCED BELOW
================================================================================== -->
<!ENTITY % basicText '(#PCDATA | hi)*'>
<!ENTITY % noteText '(#PCDATA | hi | foreign | bpt | ept | it | ph | ut)*'>
<!ENTITY % auxInfo '(descrip | descripGrp | admin | adminGrp | transacGrp | note | ref | xref)*' >
<!ENTITY % noteLinkInfo '(admin | adminGrp | transacGrp | note | ref | xref)*' >
<!-- Entities that define common sets of attributes -->
<!ENTITY % impIDLang ' id ID #IMPLIED xml:lang CDATA #IMPLIED '>
<!ENTITY % impIDType ' id ID #IMPLIED type CDATA #IMPLIED '>
<!ENTITY % impIDLangTypTgtDtyp ' id ID #IMPLIED xml:lang CDATA #IMPLIED type CDATA #REQUIRED target IDREF #IMPLIED datatype CDATA #IMPLIED '>
<!-- ================================================================================
ELEMENTS USED FOR TEXT MARKUP
================================================================================ -->
<!ELEMENT hi (#PCDATA) >
<!ATTLIST hi type (entailedTerm | xlink) #IMPLIED target IDREF #IMPLIED xml:lang CDATA #IMPLIED href CDATA #IMPLIED show CDATA #IMPLIED actuate CDATA #IMPLIED role CDATA #IMPLIED behavior CDATA #IMPLIED >
<!ELEMENT foreign %noteText; >
<!ATTLIST foreign id ID #IMPLIED xml:lang CDATA #IMPLIED >
<!-- meta-markup elements borrowed from OSCAR -->
<!ELEMENT bpt (#PCDATA)* >
<!ATTLIST bpt i CDATA #IMPLIED x CDATA #IMPLIED type CDATA #IMPLIED >
<!ELEMENT ept (#PCDATA)* >
<!ATTLIST ept i CDATA #IMPLIED >
<!ELEMENT it (#PCDATA)* >
<!ATTLIST it pos (begin|end) #REQUIRED x CDATA #IMPLIED type CDATA #IMPLIED >
<!ELEMENT ph (#PCDATA)* >
<!ATTLIST ph assoc CDATA #IMPLIED x CDATA #IMPLIED type CDATA #IMPLIED >
<!ELEMENT ut (#PCDATA) >
<!ATTLIST ut x CDATA #IMPLIED >
<!-- ================================================================================
ELEMENTS NEEDED FOR TERMINOLOGICAL ENTRIES (IN ALPHABETICAL ORDER)
================================================================================ -->
<!ELEMENT admin %noteText; >
<!-- meta: see DCS for values of type -->
<!ATTLIST admin %impIDLangTypTgtDtyp;>
<!ELEMENT adminGrp (admin, (adminNote|note|ref|xref)*) >
<!ATTLIST adminGrp id ID #IMPLIED >
<!ELEMENT adminNote %noteText; >
<!-- meta: see DCS for values of type -->
<!ATTLIST adminNote %impIDLangTypTgtDtyp; >
<!ELEMENT date (#PCDATA) >
<!ATTLIST date id ID #IMPLIED >
<!ELEMENT descrip %noteText; >
<!-- meta: see DCS for values of type -->
<!ATTLIST descrip %impIDLangTypTgtDtyp;>
<!ELEMENT descripGrp (descrip, (descripNote|admin|adminGrp|transacGrp|note|ref|xref)*) >
<!ATTLIST descripGrp id ID #IMPLIED >
<!ELEMENT descripNote %noteText; >
<!-- meta: see DCS for values of type -->
<!ATTLIST descripNote %impIDLangTypTgtDtyp;>
<!ELEMENT langSet ((%auxInfo;), (tig | ntig)+) >
<!ATTLIST langSet id ID #IMPLIED xml:lang CDATA #REQUIRED lang CDATA #IMPLIED >
<!ELEMENT note %noteText; >
<!ATTLIST note %impIDLang; >
<!ELEMENT ntig (termGrp, %auxInfo;) >
<!ATTLIST ntig id ID #IMPLIED >
<!ELEMENT ref (#PCDATA) >
<!-- meta: see DCS for values of type -->
<!ATTLIST ref %impIDLangTypTgtDtyp; >
<!ELEMENT term %noteText; >
<!ATTLIST term id ID #IMPLIED >
<!ELEMENT termComp %noteText; >
<!ATTLIST termComp %impIDLang; >
<!ELEMENT termCompGrp (termComp, (termNote|termNoteGrp)*, %noteLinkInfo;) >
<!ATTLIST termCompGrp id ID #IMPLIED >
<!ELEMENT termCompList ((%auxInfo;), (termComp | termCompGrp)+) >
<!-- meta: see DCS for values of type -->
<!ATTLIST termCompList id ID #IMPLIED type CDATA #REQUIRED >
<!ELEMENT termEntry ((%auxInfo;),(langSet+)) >
<!ATTLIST termEntry id ID #IMPLIED >
<!ELEMENT termGrp (term, (termNote|termNoteGrp)*, (termCompList)* ) >
<!ATTLIST termGrp id ID #IMPLIED >
<!ELEMENT termNote %noteText; >
<!-- meta: see DCS for values of type -->
<!ATTLIST termNote %impIDLangTypTgtDtyp;>
<!ELEMENT termNoteGrp (termNote, %noteLinkInfo;) >
<!ATTLIST termNoteGrp id ID #IMPLIED >
<!ELEMENT tig (term, (termNote)*, %auxInfo;) >
<!ATTLIST tig id ID #IMPLIED >
<!ELEMENT transac %noteText; >
<!-- meta: see DCS for values of type -->
<!ATTLIST transac type CDATA "transactionType" xml:lang CDATA #IMPLIED target IDREF #IMPLIED datatype CDATA #IMPLIED >
<!ELEMENT transacGrp (transac, (transacNote|date|note|ref|xref)* ) >
<!ATTLIST transacGrp id ID #IMPLIED >
<!ELEMENT transacNote %noteText; >
<!-- meta: see DCS for values of type -->
<!ATTLIST transacNote %impIDLangTypTgtDtyp; >
<!ELEMENT xref (#PCDATA) >
<!-- meta: see DCS for values of type -->
<!ATTLIST xref %impIDType; target CDATA #REQUIRED >
<!-- ===================================================================================
OTHER ELEMENTS (in hierarchical order)
=================================================================================== -->
<!ELEMENT martif (martifHeader, text) >
<!-- *** starting element *** -->
<!ATTLIST martif type CDATA #REQUIRED xml:lang CDATA #REQUIRED >
<!ELEMENT martifHeader (fileDesc, encodingDesc?, revisionDesc?) >
<!ATTLIST martifHeader id ID #IMPLIED >
<!ELEMENT p %noteText; >
<!-- p is used in several header elements -->
<!ATTLIST p id ID #IMPLIED type (langDeclaration|DCSName) #IMPLIED xml:lang CDATA #IMPLIED >
<!ELEMENT fileDesc (titleStmt?, publicationStmt?, sourceDesc+) >
<!ATTLIST fileDesc id ID #IMPLIED >
<!ELEMENT titleStmt (title, note*) >
<!ATTLIST titleStmt %impIDLang; >
<!ELEMENT title (#PCDATA) >
<!ATTLIST title %impIDLang; >
<!ELEMENT publicationStmt (p+) >
<!ATTLIST publicationStmt id ID #IMPLIED >
<!ELEMENT sourceDesc (p+) >
<!ATTLIST sourceDesc %impIDLang; >
<!ELEMENT encodingDesc (ude?, p+) >
<!ATTLIST encodingDesc id ID #IMPLIED >
<!ELEMENT ude (map+)>
<!ATTLIST ude id ID #IMPLIED name CDATA #REQUIRED base CDATA #IMPLIED >
<!ELEMENT map EMPTY>
<!ATTLIST map unicode CDATA #REQUIRED code CDATA #REQUIRED ent CDATA #REQUIRED subst CDATA #REQUIRED >
<!ELEMENT revisionDesc (change+) >
<!ATTLIST revisionDesc %impIDLang; >
<!ELEMENT change (p+) >
<!ATTLIST change %impIDLang; >
<!ELEMENT text (front?, body, back?) >
<!ATTLIST text id ID #IMPLIED >
<!ELEMENT front (#PCDATA) >
<!-- here put Other Resources, each in a namespace -->
<!ATTLIST front id ID #IMPLIED >
<!ELEMENT body (termEntry+) >
<!ATTLIST body id ID #IMPLIED >
<!ELEMENT back ((refObjectList)*) >
<!ATTLIST back id ID #IMPLIED >
<!ELEMENT refObjectList (refObject+) >
<!-- meta: see DCS for values of type -->
<!ATTLIST refObjectList id ID #IMPLIED type CDATA #REQUIRED >
<!ELEMENT refObject ((itemSet | itemGrp | item)+) >
<!ATTLIST refObject id ID #IMPLIED >
<!ELEMENT item %noteText; >
<!ATTLIST item %impIDType; >
<!ELEMENT itemGrp (item, %noteLinkInfo;)>
<!ATTLIST itemGrp id ID #IMPLIED >
<!ELEMENT itemSet ((item | itemGrp)+)>
<!ATTLIST itemSet %impIDType; >
<!-- end -->
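The nesting defined by the core-structure DTD (martif, text, body, termEntry, langSet, then tig or ntig) can be walked with any XML parser. The following Python sketch builds a small document that follows the content models above and lists each term with its entry identifier and language. The sample content, the value "TBX" for the type attribute of martif, and the function name `list_terms` are assumptions made for illustration; the sketch does not validate against the DTD.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample following the content models of the core-structure DTD.
SAMPLE_TBX = """
<martif type="TBX" xml:lang="en">
<martifHeader>
<fileDesc><sourceDesc><p>hypothetical sample file</p></sourceDesc></fileDesc>
</martifHeader>
<text>
<body>
<termEntry id="eid-1">
<langSet xml:lang="en">
<tig><term>hunting dog</term></tig>
</langSet>
<langSet xml:lang="fr">
<ntig><termGrp><term>chien de chasse</term></termGrp></ntig>
</langSet>
</termEntry>
</body>
</text>
</martif>
"""


def list_terms(tbx_xml):
    """Return (entry id, language, term text) for every term in the body."""
    rows = []
    root = ET.fromstring(tbx_xml)
    for entry in root.iter("termEntry"):
        for lang_set in entry.findall("langSet"):
            lang = lang_set.get(XML_LANG)
            # A term sits directly in a tig, or inside the termGrp of an ntig.
            terms = lang_set.findall("tig/term") + lang_set.findall("ntig/termGrp/term")
            for term in terms:
                # itertext() also collects text inside inline markup such as hi.
                rows.append((entry.get("id"), lang, "".join(term.itertext())))
    return rows
```

Calling `list_terms(SAMPLE_TBX)` yields one row per term, which is often a convenient starting point for exporting a term list from a TBX file.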
<!-- Parameter Entities First-->
<!ENTITY % specAtt "datcatId CDATA #REQUIRED name CDATA #REQUIRED">
<!-- Elements and Attributes-->
<!ELEMENT TBXXCS (header,languages,datCatSet,refObjectSet)>
<!ATTLIST TBXXCS lang CDATA #REQUIRED name CDATA #REQUIRED version CDATA #REQUIRED>
<!ELEMENT header (title)>
<!ELEMENT languages (langInfo)+>
<!ELEMENT datCatSet (adminNoteSpec|adminSpec|descripNoteSpec |descripSpec|hiSpec|refSpec|termCompListSpec |termNoteSpec|transacNoteSpec|transacSpec |xrefSpec)+>
<!ELEMENT refObjectSet (refObjectDef)+>
<!ELEMENT title (#PCDATA)>
<!ELEMENT langInfo (langCode,langName)>
<!ELEMENT langCode (#PCDATA)>
<!ELEMENT langName (#PCDATA)>
<!ELEMENT descripSpec (contents,levels)>
<!ATTLIST descripSpec %specAtt;>
<!ELEMENT levels (#PCDATA)>
<!ELEMENT adminNoteSpec (contents)>
<!ATTLIST adminNoteSpec %specAtt;>
<!ELEMENT adminSpec (contents)>
<!ATTLIST adminSpec %specAtt;>
<!ELEMENT descripNoteSpec (contents)>
<!ATTLIST descripNoteSpec %specAtt;>
<!ELEMENT hiSpec (contents)>
<!ATTLIST hiSpec %specAtt;>
<!ELEMENT refSpec (contents)>
<!ATTLIST refSpec %specAtt;>
<!ELEMENT termCompListSpec (contents)>
<!ATTLIST termCompListSpec %specAtt;>
<!ELEMENT termNoteSpec (contents)>
<!ATTLIST termNoteSpec %specAtt;>
<!ELEMENT transacNoteSpec (contents)>
<!ATTLIST transacNoteSpec %specAtt;>
<!ELEMENT transacSpec (contents)>
<!ATTLIST transacSpec %specAtt;>
<!ELEMENT xrefSpec (contents)>
<!ATTLIST xrefSpec %specAtt;>
<!ELEMENT contents (#PCDATA)>
<!ATTLIST contents datatype CDATA #REQUIRED forTermComp CDATA #IMPLIED targetType CDATA #REQUIRED>
<!ELEMENT refObjectDef (refObjectType,itemSet)>
<!ELEMENT refObjectType (#PCDATA)>
<!ELEMENT itemSet (item)+>
<!ATTLIST itemSet type CDATA #REQUIRED>
<!ELEMENT item (#PCDATA)>
<!ATTLIST item type CDATA #REQUIRED>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE TBXXCS SYSTEM "tbxxcsdtd.dtd">
<TBXXCS name='master' version="0.4" lang='en'>
<header>
<title>TBX master XCS (extensible constraint specification) file</title>
</header>
<languages>
<langInfo> <langCode>en</langCode> <langName>English</langName> </langInfo>
<langInfo> <langCode>de</langCode> <langName>German</langName> </langInfo>
</languages>
<datCatSet>
<termNoteSpec name="abbreviatedFormFor" datcatId="ISO12620A-02013002">
<contents datatype="noteText" targetType="term"/>
</termNoteSpec>
<termNoteSpec name="administrativeStatus" datcatId="ISO12620A-020903">
<contents datatype="picklist" targetclass="none">standardizedTerm-admn-sts preferredTerm-admn-sts admittedTerm-admn-sts deprecatedTerm-admn-sts supersededTerm-admn-sts legalTerm-admn-sts regulatedTerm-admn-sts </contents>
</termNoteSpec>
<termNoteSpec name="animacy" datcatId="ISO12620A-020204">
<contents datatype="picklist" targetclass="none" forTermComp="yes">animate inanimate otherAnimacy </contents>
</termNoteSpec>
<descripSpec name="antonym-concept" datcatId="ISO12620A-10180602">
<contents datatype="noteText" targetType="entry"/>
<levels>termEntry </levels>
</descripSpec>
<descripSpec name="antonym-term" datcatId="ISO12620A-10180601">
<contents datatype="noteText" targetType="term"/>
<levels>term </levels>
</descripSpec>
<adminSpec name="applicationSubset" datcatId="ISO12620A-100306">
<contents datatype="noteText" targetclass="none"/>
</adminSpec>
<descripSpec name="associatedConcept" datcatId="ISO12620A-070210">
<contents datatype="noteText" targetType="entry"/>
<levels>langSet termEntry </levels>
</descripSpec>
<descripSpec name="audio" datcatId="ISO12620A-050502">
<contents datatype="noteText" targetType="binaryData"/>
<levels>langSet termEntry term </levels>
</descripSpec>
<descripSpec name="broaderConceptGeneric" datcatId="ISO12620A-070201">
<contents datatype="noteText" targetType="entry"/>
<levels>langSet termEntry </levels>
</descripSpec>
<descripSpec name="broaderConceptPartitive" datcatId="ISO12620A-070202">
<contents datatype="noteText" targetType="entry"/>
<levels>langSet termEntry </levels>
</descripSpec>
<adminSpec name="businessUnitSubset" datcatId="ISO12620A-100308">
<contents datatype="noteText" targetclass="none"/>
</adminSpec>
<descripSpec name="characteristic" datcatId="ISO12620A-0508">
<contents datatype="noteText" targetclass="none"/>
<levels>term </levels>
</descripSpec>
<descripSpec name="classificationCode" datcatId="ISO12620A-0402">
<contents datatype="noteText" targetType="bibl"/>
<levels>langSet termEntry term </levels>
</descripSpec>
<adminSpec name="conceptOrigin" datcatId="ISO12620A-0509">
<contents datatype="noteText" targetclass="none"/>
</adminSpec>
<descripSpec name="conceptPosition" datcatId="ISO12620A-0702">
<contents datatype="noteText" targetType="conceptSysDescrip"/>
<levels>langSet termEntry </levels>
</descripSpec>
<descripSpec name="context" datcatId="ISO12620A-0503">
<contents datatype="noteText" targetclass="none"/>
<levels>term </levels>
</descripSpec>
<descripNoteSpec name="contextType" datcatId="ISO12620A-0803">
<contents datatype="picklist" targetclass="none">definingContext explanatoryContext associativeContext linguisticContext metalinguisticContext translatedContext </contents>
</descripNoteSpec>
<descripSpec name="coordinateConceptGeneric" datcatId="ISO12620A-07020401">
<contents datatype="noteText" targetType="entry"/>
<levels>langSet termEntry </levels>
</descripSpec>
<descripSpec name="coordinateConceptPartitive" datcatId="ISO12620A-07020402">
<contents datatype="noteText" targetType="entry"/>
<levels>langSet termEntry </levels>
</descripSpec>
<xrefSpec name="corpusTrace" datcatId="ISO12620A-101808">
<contents datatype="noteText" targetType="external"/>
</xrefSpec>
<refSpec name="cross-reference" datcatId="ISO12620A-1018">
<contents datatype="noteText" targetType="element"/>
</refSpec>
<adminSpec name="customerSubset" datcatId="ISO12620A-100301">
<contents datatype="noteText" targetclass="none"/>
</adminSpec>
<adminSpec name="databaseType" datcatId="ISO12620A-10220301">
<contents datatype="plainText" targetclass="none"/>
</adminSpec>
<descripSpec name="definition" datcatId="ISO12620A-0501">
<contents datatype="noteText" targetclass="none"/>
<levels>langSet termEntry term </levels>
</descripSpec>
<descripNoteSpec name="definitionType" datcatId="ISO12620A-0802">
<contents datatype="picklist" targetType="element">intensionalDefinition extensionalDefinition partitiveDefinition translatedDefinition </contents>
</descripNoteSpec>
<descripNoteSpec name="descriptionType" datcatId="ISO12620A-0801">
<contents datatype="picklist" targetType="element"/>
</descripNoteSpec>
<adminSpec name="domainExpert" datcatId="ISO12620A-1025">
<contents datatype="noteText" targetType="bibl"/>
</adminSpec>
<adminSpec name="elementWorkingStatus" datcatId="ISO12620A-1011">
<contents datatype="picklist" targetclass="none" forTermComp="yes">starterElement workingElement consolidatedElement archiveElement importedElement exportedElement </contents>
</adminSpec>
<hiSpec name="entailedTerm" datcatId="ISO12620A-100601">
<contents datatype="noteText" targetclass="none"/>
</hiSpec>
<adminSpec name="entrySource" datcatId="ISO12620A-1013">
<contents datatype="noteText" targetclass="none"/>
</adminSpec>
<adminSpec name="environmentSubset" datcatId="ISO12620A-100307">
<contents datatype="noteText" targetclass="none"/>
</adminSpec>
<termNoteSpec name="etymology" datcatId="ISO12620A-020402">
<contents datatype="noteText" targetclass="none" forTermComp="yes"/>
</termNoteSpec>
<descripSpec name="example" datcatId="ISO12620A-0504">
<contents datatype="noteText" targetclass="none"/>
<levels>langSet termEntry term </levels>
</descripSpec>
<descripSpec name="explanation" datcatId="ISO12620A-0502">
<contents datatype="noteText" targetclass="none"/>
<levels>langSet termEntry term </levels>
</descripSpec>
<xrefSpec name="externalCrossReference" datcatId="ISO12620A-101807">
<contents datatype="noteText" targetType="external"/>
</xrefSpec>
<termNoteSpec name="falseFriend" datcatId="ISO12620A-0302">
<contents datatype="noteText" targetType="term"/>
</termNoteSpec>
<descripSpec name="figure" datcatId="ISO12620A-050501">
<contents datatype="noteText" targetType="binaryData"/>
<levels>langSet termEntry term </levels>
</descripSpec>
<termNoteSpec name="frequency" datcatId="ISO12620A-020304">
<contents datatype="picklist" targetclass="none">commonlyUsed infrequentlyUsed rarelyUsed </contents>
</termNoteSpec>
<termNoteSpec name="geographicalUsage" datcatId="ISO12620A-020302">
<contents datatype="noteText" targetclass="none"/>
</termNoteSpec>
<termNoteSpec name="grammaticalGender" datcatId="ISO12620A-020202">
<contents datatype="picklist" targetclass="none" forTermComp="yes">masculine feminine neuter otherGender </contents>
</termNoteSpec>
<termNoteSpec name="grammaticalNumber" datcatId="ISO12620A-020203">
<contents datatype="picklist" targetclass="none" forTermComp="yes">singular plural dual massNoun otherNumber </contents>
</termNoteSpec>
<termNoteSpec name="grammaticalValency" datcatId="ISO12620A-020207">
<contents datatype="noteText" targetclass="none"/>
</termNoteSpec>
<termNoteSpec name="homograph" datcatId="ISO12620A-101805">
<contents datatype="noteText" targetType="term"/>
</termNoteSpec>
<hiSpec name="hotkey" datcatId="ISO12620A-100604">
<contents datatype="noteText" targetclass="none"/>
</hiSpec>
<termCompListSpec name="hyphenation" datcatId="ISO12620A-0207">
<contents datatype="noteText" targetclass="none" forTermComp="yes"/>
</termCompListSpec>
<adminSpec name="indexHeading" datcatId="ISO12620A-0905">
<contents datatype="noteText" targetclass="none"/>
</adminSpec>
<adminSpec name="keyword" datcatId="ISO12620A-0904">
<contents datatype="noteText" targetclass="none"/>
</adminSpec>
<termNoteSpec name="language-planningQualifier" datcatId="ISO12620A-020902">
<contents datatype="picklist"
targetclass="none">recommendedTerm nonstandardizedTerm proposedTerm newTerm </contents> </termNoteSpec> <termCompListSpec name="lemma" datcatId="ISO12620A-020803"> <contents datatype="noteText" targetclass="none" forTermComp="yes"/> </termCompListSpec> <termCompListSpec name="morphologicalElement" datcatId="ISO12620A-020801"> <contents datatype="noteText" targetclass="none" forTermComp="yes"/> </termCompListSpec> <termNoteSpec name="normativeAuthorization" datcatId="ISO12620A-020901"> <contents datatype="picklist" targetclass="none">standardizedTerm preferredTerm admittedTerm deprecatedTerm supersededTerm legalTerm regulatedTerm </contents> </termNoteSpec> <adminSpec name="originatingDatabase" datcatId="ISO12620A-102203"> <contents datatype="noteText" targetclass="none"/> </adminSpec> <adminSpec name="originatingInstitution" datcatId="ISO12620A-102202"> <contents datatype="noteText" targetclass="none"/> </adminSpec> <adminSpec name="originatingPerson" datcatId="ISO12620A-102201"> <contents datatype="plainText" targetclass="none"/> </adminSpec> <descripSpec name="otherBinaryData" datcatId="ISO12620A-050505"> <contents datatype="noteText" targetType="binaryData"/> <levels>langSet termEntry term </levels> </descripSpec> <termNoteSpec name="partOfSpeech" datcatId="ISO12620A-020201"> <contents datatype="plainText" targetclass="none" forTermComp="yes"/> </termNoteSpec> <termNoteSpec name="processStatus" datcatId="ISO12620A-020904"> <contents datatype="picklist" targetclass="none">unprocessed provisionallyProcessed finalized </contents> </termNoteSpec> <adminSpec name="productSubset" datcatId="ISO12620A-100305"> <contents datatype="noteText" targetclass="none"/> </adminSpec> <adminSpec name="projectSubset" datcatId="ISO12620A-100303"> <contents datatype="noteText" targetclass="none"/> </adminSpec> <termNoteSpec name="pronunciation" datcatId="ISO12620A-0205"> <contents datatype="noteText" targetclass="none" forTermComp="yes"/> </termNoteSpec> <termNoteSpec 
name="proprietaryRestriction" datcatId="ISO12620A-020307"> <contents datatype="picklist" targetclass="none">trademark serviceMark tradeName </contents> </termNoteSpec> <descripSpec name="quantity" datcatId="ISO12620A-050701"> <contents datatype="noteText" targetclass="none"/> <levels>term </levels> </descripSpec> <descripSpec name="range" datcatId="ISO12620A-0507"> <contents datatype="noteText" targetclass="none"/> <levels>term </levels> </descripSpec> <termNoteSpec name="register" datcatId="ISO12620A-020303"> <contents datatype="picklist" targetclass="none">neutralRegister technicalRegister in-houseRegister bench-levelRegister slangRegister vulgarRegister </contents> </termNoteSpec> <descripSpec name="relatedConcept" datcatId="ISO12620A-070205"> <contents datatype="noteText" targetType="entry"/> <levels>langSet termEntry </levels> </descripSpec> <descripSpec name="relatedConceptBroader" datcatId="ISO12620A-07020501"> <contents datatype="noteText" targetType="entry"/> <levels>langSet termEntry </levels> </descripSpec> <descripSpec name="relatedConceptNarrower" datcatId="ISO12620A-07020502"> <contents datatype="noteText" targetType="entry"/> <levels>langSet termEntry </levels> </descripSpec> <descripSpec name="reliabilityCode" datcatId="ISO12620A-0304"> <contents datatype="picklist" targetclass="none">1 2 3 4 5 6 7 8 9 10 </contents> <levels>langSet termEntry term </levels> </descripSpec> <transacNoteSpec name="responsibility" datcatId="ISO12620A-100202"> <contents datatype="noteText" targetType="respPerson" forTermComp="yes"/> </transacNoteSpec> <descripSpec name="sampleSentence" datcatId="ISO12620A-050302"> <contents datatype="noteText" targetclass="none"/> <levels>term </levels> </descripSpec> <adminSpec name="searchTerm" datcatId="ISO12620A-100603"> <contents datatype="noteText" targetclass="none"/> </adminSpec> <adminSpec name="securitySubset" datcatId="ISO12620A-100309"> <contents datatype="picklist" targetclass="none">public confidential </contents> 
</adminSpec> <refSpec name="see" datcatId="ISO12620A-101801"> <contents datatype="noteText" targetType="element" forTermComp="yes"/> </refSpec> <descripSpec name="sequentiallyRelatedConcept" datcatId="ISO12620A-070206"> <contents datatype="noteText" targetType="entry"/> <levels>langSet termEntry </levels> </descripSpec> <termNoteSpec name="shortFormFor" datcatId="ISO12620A-02013004"> <contents datatype="noteText" targetType="term"/> </termNoteSpec> <adminSpec name="sortKey" datcatId="ISO12620A-100602"> <contents datatype="noteText" targetclass="none"/> </adminSpec> <adminSpec name="source" datcatId="ISO12620A-1019"> <contents datatype="noteText" targetclass="none"/> </adminSpec> <adminSpec name="sourceIdentifier" datcatId="ISO12620A-1020"> <contents datatype="noteText" targetType="bibl"/> </adminSpec> <adminSpec name="sourceLanguage" datcatId="ISO12620A-1023"> <contents datatype="noteText" targetclass="none"/> </adminSpec> <adminNoteSpec name="sourceType" datcatId="ISO12620A-102001"> <contents datatype="picklist" targetclass="none">parallelText backgroundText </contents> </adminNoteSpec> <descripSpec name="spatiallyRelatedConcept" datcatId="ISO12620A-070208"> <contents datatype="noteText" targetType="entry"/> <levels>langSet termEntry </levels> </descripSpec> <descripSpec name="subjectField" datcatId="ISO12620A-04"> <contents datatype="plainText" targetclass="none"/> <levels>langSet termEntry term </levels> </descripSpec> <descripSpec name="subordinateConceptGeneric" datcatId="ISO12620A-07020301"> <contents datatype="noteText" targetType="entry"/> <levels>langSet termEntry </levels> </descripSpec> <descripSpec name="subordinateConceptPartitive" datcatId="ISO12620A-07020302"> <contents datatype="noteText" targetType="entry"/> <levels>langSet termEntry </levels> </descripSpec> <adminSpec name="subsetOwner" datcatId="ISO12620A-10020210"> <contents datatype="plainText" targetclass="none"/> </adminSpec> <descripSpec name="superordinateConceptGeneric" 
datcatId="ISO12620A-07020202"> <contents datatype="noteText" targetType="entry"/> <levels>langSet termEntry </levels> </descripSpec> <descripSpec name="superordinateConceptPartitive" datcatId="ISO12620A-07020203"> <contents datatype="noteText" targetType="entry"/> <levels>langSet termEntry </levels> </descripSpec> <termCompListSpec name="syllabification" datcatId="ISO12620A-0206"> <contents datatype="noteText" targetclass="none" forTermComp="yes"/> </termCompListSpec> <descripSpec name="table" datcatId="ISO12620A-050504"> <contents datatype="noteText" targetType="binaryData"/> <levels>langSet termEntry term </levels> </descripSpec> <adminSpec name="targetLanguage" datcatId="ISO12620A-1024"> <contents datatype="picklist" targetclass="none"/> </adminSpec> <termNoteSpec name="temporalQualifier" datcatId="ISO12620A-020305"> <contents datatype="picklist" targetclass="none">archaicTerm outdatedTerm obsoleteTerm </contents> </termNoteSpec> <descripSpec name="temporallyRelatedConcept" datcatId="ISO12620A-070207"> <contents datatype="noteText" targetType="entry"/> <levels>langSet termEntry </levels> </descripSpec> <termCompListSpec name="termElement" datcatId="ISO12620A-020802"> <contents datatype="noteText" targetclass="none" forTermComp="yes"/> </termCompListSpec> <termNoteSpec name="termProvenance" datcatId="ISO12620A-020401"> <contents datatype="picklist" targetclass="none">transdisciplinaryBorrowing translingualBorrowing loanTranslation neologism </contents> </termNoteSpec> <termNoteSpec name="termStructure" datcatId="ISO12620A-020804"> <contents datatype="noteText" targetclass="none" forTermComp="yes"/> </termNoteSpec> <termNoteSpec name="termType" datcatId="ISO12620A-0201"> <contents datatype="picklist" targetclass="none">entryTerm synonym internationalScientificTerm fullForm transcribedForm symbol formula equation logicalExpression commonName abbreviatedFormOfTerm variant shortFormOfTerm transliteratedForm sku partNumber phraseologicalUnit synonymousPhrase 
standardText string internationalism </contents> </termNoteSpec> <transacSpec name="terminologyManagementTransactions" datcatId="ISO12620A-1001"> <contents datatype="picklist" targetclass="none" forTermComp="yes">origination input modification check approval withdrawal standardization exportation importation proposal userAccess </contents> </transacSpec> <descripSpec name="thesaurusDescriptor" datcatId="ISO12620A-0902"> <contents datatype="noteText" targetType="thesaurusDescrip"/> <levels>termEntry </levels> </descripSpec> <termNoteSpec name="timeRestriction" datcatId="ISO12620A-020306"> <contents datatype="noteText" targetclass="none"/> </termNoteSpec> <termNoteSpec name="transferComment" datcatId="ISO12620A-0305"> <contents datatype="noteText" targetType="term"/> </termNoteSpec> <descripSpec name="unit" datcatId="ISO12620A-0506"> <contents datatype="noteText" targetclass="none"/> <levels>term </levels> </desc