© 2010 SMP Marketing • ISSN 1420-3693 • www.localization.org

Symantec Houdini: How Symantec addresses some of its help localisation issues

John Rowley, Symantec

Introduction

Localising Windows Help should be a straightforward process. After all, it's simply a matter of translating the Rich Text Format (RTF) files used to build the help text, and making sure they compile without errors, isn't it? Well, it is if you have been given accurate counts for words and graphics, receive the help files when they are completely stable, and have complete confidence that your translators will be consistent in their cross-referencing (even if the US writers were not) and are unlikely to destroy any of the internal links in the help. And then there's the real world.


Help files are more likely to be delivered piecemeal, either as individual RTF files or, worse, as individual topics. Then, towards the end, the writers decide to change their browse sequences, alter the context strings and improve their keyword indexing. New topics also get added at the last minute to take account of feature creep, existing topics have their cross-referencing improved, and topics (already in translation) get deleted because the writers do some last-minute restructuring. How do you keep track of it all? Translation memory systems may offer a solution, but there are other alternatives.

Symantec has developed an internal tool - Houdini - to cut down the time spent testing help and comparing it against the original files. Houdini is distributed to our current vendor base and has attracted so much positive comment that we are considering releasing it to the market with the next release of Symantec C++. This article looks at the general issues associated with localising help, and how Houdini tries to address those issues.

Help Localisation Issues

Help localisation issues can be grouped into five categories:

  1. Generating project statistics
  2. Building the Help
  3. Ensuring the consistency of the help system
  4. Maintaining the integrity of the help
  5. Managing project updates.

Generating project statistics

Estimating the word count for a help project is particularly troublesome. You need to open each RTF file in a word processor and generate a word count. A product like Norton Utilities for Windows 95 has approximately 10 help projects, using about 35 RTF files. Getting a word count for each RTF file in a project like this is time-consuming. So imagine the headache you'd have trying to get help writers to supply you with this information on a regular basis.

Another problem is identifying the number of bitmaps used in a project. Ideally you are working with a clean help project, where the only bitmaps in the help directory are those used by the project. However, it's not unusual for writers to leave redundant bitmaps in the directory as they build and test their help. Traditionally, one way to find out which bitmaps are used by the project is to remove all the bitmaps from the directory, compile the help file and note the errors reported for missing bitmaps. Once again, this is time-consuming and not particularly efficient, either for the publisher or the vendor.
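The bitmap check described above can also be done by scanning the RTF sources rather than by deleting files and recompiling. The following is a minimal sketch of that idea, not a description of how Houdini works. It assumes bitmaps are referenced with the standard Windows Help `bmc`, `bml` and `bmr` statements, that all source files sit in one directory, and that only `.bmp` and `.shg` files matter; the function names are hypothetical.

```python
import re
from pathlib import Path

# Windows Help bitmap references look like {bmc name.bmp}; in RTF source the
# braces are usually escaped as \{ and \}. The pattern accepts both forms and
# tolerates extra letters after bmc/bml/bmr.
BITMAP_REF = re.compile(r"\\?\{\s*bm[clr]\w*\s+([^\\}\s]+)", re.IGNORECASE)


def referenced_bitmaps(rtf_text):
    """Return the lower-cased bitmap file names referenced in one RTF source."""
    return {name.lower() for name in BITMAP_REF.findall(rtf_text)}


def find_unused(bitmap_names, rtf_texts):
    """Return the bitmap names that none of the given RTF sources refer to."""
    used = set()
    for text in rtf_texts:
        used |= referenced_bitmaps(text)
    return sorted(name for name in bitmap_names if name.lower() not in used)


def unused_bitmaps(help_dir):
    """Apply find_unused to every .rtf, .bmp and .shg file in help_dir."""
    help_dir = Path(help_dir)
    texts = [p.read_text(errors="ignore") for p in help_dir.glob("*.rtf")]
    bitmaps = [p.name for p in help_dir.iterdir()
               if p.suffix.lower() in {".bmp", ".shg"}]
    return find_unused(bitmaps, texts)
```

Calling `unused_bitmaps` on a help directory lists the redundant bitmaps without touching any files, and the size of the referenced set gives the bitmap count for the project.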

Generally it's useful to know where topics are located within the help system, how they refer to each other and what sort of attributes are associated with each topic (such as context strings, titles, browse sequences and so on). Trying to "map" out a help system like this is difficult, but if you have such a map it makes it a lot easier to track down and sort out problems.

Building the Help

Building the help involves organising and capturing any new bitmaps that need to be translated, including segmented bitmaps (bitmaps containing jumps), translating various options in the help project file, and finally compiling the help project. Most of this is a fairly straightforward process.

Ensuring the consistency of the help

Maintaining consistency covers two areas: cross-referencing and formatting.

Checking the consistency of cross-referencing usually looks for inconsistencies between:

  • Page titles and footnote titles -- ideally, these should match
  • The jump text and the page title of the topics being jumped to
  • Keyword usage -- for example, to ensure there are no instances of mixed case keywords (such as Utilities and utilities).
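The mixed-case keyword check in the last item is easy to illustrate. The sketch below assumes the keywords have already been extracted from the help sources into a plain list; the function name is hypothetical and this is not Houdini's own code.

```python
from collections import defaultdict


def mixed_case_keywords(keywords):
    """Group keywords that differ only in letter case.

    Returns a dict mapping the case-folded keyword to the sorted list of
    spellings found, for keywords with more than one spelling.
    """
    variants = defaultdict(set)
    for keyword in keywords:
        variants[keyword.casefold()].add(keyword)
    return {key: sorted(forms)
            for key, forms in variants.items()
            if len(forms) > 1}
```

For the keywords `Utilities`, `utilities` and `Backup`, the result contains a single entry listing both spellings of "utilities", which is the inconsistency a reviewer would want to resolve.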

Checking the consistency of formatting looks for inconsistencies in:

  • Formatting of topics which are designed to "popup". Topics that have had a non-scrolling region defined in the heading (in other words, the style attribute "keep with next" was set) will not display correctly -- a problem not reported by the help compiler.
  • "Orphaned" hotspots -- where the underlined text (a jump or popup) is accidentally separated from the hidden text (the context string), usually by a space
  • Paragraph formatting -- a common problem is when paragraphs are formatted with the "hidden" attribute, causing text to display incorrectly on the screen.

Maintaining the integrity of the help

If you are translating a help file, particularly on a topic by topic basis during simultaneous translation, there's always a huge risk your files will get out of step with the US teams. Integrity inspections try to identify:

  • Help topics that have duplicate titles -- although an optional inspection, duplicate titles can confuse end-users when they are searching for help on a particular subject.
  • Missing topics -- these are topics referred to in the help which are not present in the help project
  • Duplicate topics -- although the help compiler informs you of the problem, irritatingly, it does not tell you where BOTH topics are located, making it difficult to track down.
  • References to external topics (topics contained in other help files) -- the help compiler has no means of checking if these topics exist.
  • Structural differences between the English help project and the translated help project. For example, you need to ensure that topics in the English files have matching topics in the translated files, and that the original topic and the translated topic have the same browse sequences, build tags, cross-references, bitmaps and so forth.
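Two of the inspections above, duplicate topics and missing topics, can be sketched as follows. The example assumes the context strings and jump targets have already been pulled out of the RTF files as (file, identifier) pairs, and that context strings are compared without regard to case. Both are assumptions for illustration, as are the function names.

```python
from collections import defaultdict


def find_duplicates(topics):
    """Report every file in which a duplicated context string is defined.

    topics: iterable of (rtf_file, context_string) pairs.
    Returns {context_string_lowercased: [rtf_file, ...]} for duplicates only,
    so that BOTH locations are visible.
    """
    locations = defaultdict(list)
    for rtf_file, context in topics:
        locations[context.lower()].append(rtf_file)
    return {ctx: files for ctx, files in locations.items() if len(files) > 1}


def find_missing(topics, jumps):
    """Report jumps whose target topic is not defined anywhere in the project.

    jumps: iterable of (rtf_file, target_context_string) pairs.
    Returns a sorted list of (rtf_file, target) pairs.
    """
    defined = {context.lower() for _, context in topics}
    return sorted({(rtf_file, target) for rtf_file, target in jumps
                   if target.lower() not in defined})
```

Keeping the file name alongside each context string is the point of the design: it supplies the location information that the compiler's duplicate-topic message leaves out.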

Managing Updates

Managing updates can involve tracking weekly changes between help projects during a simultaneous ship, or identifying changes between two significant product releases. Tracking is easy if writers use tracking sheets to identify their changes, but tracking sheets that are maintained manually are difficult to implement, especially as deadlines draw close. If you are working on a simultaneous ship, then this sort of information becomes even more crucial; significantly, you need to get this information on a regular basis so that you can monitor the progress of the help system.

From a localisation perspective, the crucial task is to determine:

  • Which topics can be deleted from the project - these can be manually extracted before the files are sent on to vendors.
  • Which new topics have been added to the project - once identified, these can be extracted and sent out to vendors for translation while further analysis is going on.
  • What structural changes have been made to topics common to both releases. For example, has the word count changed between topics, and have cross-references been added, deleted or modified in any way?
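The three questions above amount to a comparison of two topic inventories. The sketch below assumes each release has been summarised as a dictionary keyed by context string, with a small dictionary of attributes per topic; the attribute names (`words`, `browse`) and the function name are hypothetical.

```python
def diff_releases(old, new):
    """Compare two topic inventories.

    old, new: dicts mapping context string -> dict of topic attributes.
    Returns a dict with the deleted topics, the added topics, and, for topics
    present in both releases, the names of the attributes that differ.
    """
    report = {
        "deleted": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
        "changed": {},
    }
    for context in sorted(old.keys() & new.keys()):
        before, after = old[context], new[context]
        fields = sorted(name for name in before.keys() | after.keys()
                        if before.get(name) != after.get(name))
        if fields:
            report["changed"][context] = fields
    return report
```

Note that a renamed context string shows up here as one deleted topic plus one added topic, which is exactly why renaming is so costly for the translated edition.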

Once you've identified the changes you'll probably find that the biggest problem is modifying the topics common to both releases: changing browse sequences, updating the keyword list, and (an absolute nightmare) modifying the topic's unique identifier (the context string) because writers renamed them in the new project. (On one Symantec project, a simple one hour change in the US required about 96 hours of work on the translated edition.)

Symantec Houdini

Clearly, there's a lot involved in localising help. Traditionally, the main tools would have been a word processor to translate the material and the help compiler to check that everything worked. However, to check consistency between jumps and formatting issues, you would have had to work your way through the file manually. This is time-consuming and, depending on the size of the help file and the type of checks you want to make, can take at least 5 days.

Symantec developed Houdini to try to address many of these problems. The tool works on the project file (the HPJ) and the source files (the rich text format files). The first version simply reported on the consistency between page and footnote titles and jumps. This feature alone reduced consistency testing to about a day, compared with the more usual 5 days of checking.

The next version introduced a statistics feature, providing statistics such as topic counts, word counts, bitmap counts and so on. This version was given to the US writers so that they could keep track of their word counts without having to go through the tedious process of counting everything in a word processor. Additional reports were subsequently added to identify formatting errors and keyword issues. All Houdini reports can be saved as a text file; to make the reports easy to read, the files are tab-delimited, so they can be opened in a spreadsheet.
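As an illustration of the tab-delimited report format, the sketch below writes a small statistics table using Python's standard `csv` module. The column names and the function name are invented for the example and do not reflect Houdini's actual report layout.

```python
import csv
import io


def format_report(headers, rows):
    """Return a tab-delimited report as text, one line per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(headers)
    writer.writerows(rows)
    return buffer.getvalue()
```

Saving the returned text to a file gives something a spreadsheet can open directly, with each statistic in its own column.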

The next edition of Houdini focused on keyword translation. Typically, keywords are duplicated throughout the help project but there is no easy way to translate them once and have the translation replicated across the help project. Houdini now extracts the unique keywords into a table. Translators can edit this table and have the changes updated within the RTF files. Again, this saves time and ensures consistency.
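The translate-once idea can be sketched in a few lines. The example assumes keyword footnotes have already been extracted as strings in which keywords are separated by semicolons, and it works on those strings rather than on the RTF files themselves. The function names are hypothetical and this is not Houdini's implementation.

```python
def extract_keywords(footnotes):
    """Build a translation table of the unique keywords, in first-seen order.

    Each keyword maps to an empty string until a translation is filled in.
    """
    table = {}
    for footnote in footnotes:
        for keyword in footnote.split(";"):
            keyword = keyword.strip()
            if keyword:
                table.setdefault(keyword, "")
    return table


def apply_translations(footnotes, table):
    """Rewrite each footnote using the table; untranslated keywords are kept."""
    translated = []
    for footnote in footnotes:
        keywords = [k.strip() for k in footnote.split(";") if k.strip()]
        translated.append(";".join(table.get(k) or k for k in keywords))
    return translated
```

A translator fills in each entry of the table once, and `apply_translations` then replicates that choice wherever the keyword occurs, which is what keeps the index consistent.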

The latest edition of Houdini focuses on comparing the structure of the English help project with the translated project. The comparison reports on topics missing from the translated file, translated topics which no longer appear in the English file, hotspot differences between English and translated topics, and differences in footnotes such as browse sequences, macros, build tags and so on (footnotes which frequently get changed at the last minute).

The next edition of Houdini will focus on inserting and extracting individual topics to and from RTF files. This feature will make it easier to provide updates to translators during a simultaneous ship, rather than supplying them with complete RTF files containing half-written topics. It will also have a footnote updating feature, so that the latest English footnotes (such as browse sequences) can be carried over to the translation build.

Like all tools, Houdini's main success comes from being integrated into the localisation process. US writers use the tool to obtain word counts and improve the consistency within the text before passing it on for translation; localisation uses the tool to compare the various differences between US builds, so that we can keep vendors informed of progress; and vendors are given the tool so that they can check their work before returning the project to Symantec. The quality of work being returned by vendors has been raised significantly, reducing the amount of inspection work we have to do internally. We hope to release a version of Houdini with the next release of Symantec C++, sometime in the December quarter.


John Rowley
Symantec Ltd.
Ballycoolin Industrial Park
Blanchardstown
Dublin 15
Ireland
Tel: (353) 1 820 5060
Fax: (353) 1 820 4055
Email: jrowley@symantec.com



