Tuesday, September 29, 2015

Translating AutoCAD Drawings: TranslateCAD + Studio

Translating AutoCAD files may seem like a daunting proposition. After all, it's not an everyday file format for many of us. Luckily, there's a way to handle them with relative ease in SDL Trados Studio with the assistance of a separate program called TranslateCAD and a little file type customization via regular expressions.

1. Preparing the files

If the source files are DWGs, the first step will be converting them to DXFs, as TranslateCAD can only process files in DXF format. If AutoCAD is not available for the conversion, a free program like DraftSight will work as well.

2. Extracting text with TranslateCAD

Start TranslateCAD, navigate to the folder containing the DXF files, select all the files that need to be translated and click the Extract Text button.

TranslateCAD will create two text files for each DXF, adding "trans1" and "trans2" to the name. The "trans1" files contain the translatable text, so those are the ones to be used in Studio, while the "trans2" files should be left untouched.
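When a folder holds many drawings, it can help to list the "trans1" files separately before adding them to Studio. The following is a minimal sketch, assuming the extracted files have a .txt extension and that "trans1" or "trans2" appears somewhere in the file name; the function name and the folder path in the usage note are hypothetical.

```python
from pathlib import Path


def split_translatecad_outputs(folder):
    """Separate 'trans1' files (to translate) from 'trans2' files (leave untouched).

    Assumption: both kinds of file end in .txt and carry the marker in their name.
    """
    to_translate, leave_untouched = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.suffix.lower() != ".txt":
            continue
        stem = path.stem.lower()
        if "trans1" in stem:
            to_translate.append(path)
        elif "trans2" in stem:
            leave_untouched.append(path)
    return to_translate, leave_untouched
```

Calling `split_translatecad_outputs("C:/Jobs/Drawings")` would return two lists of paths; only the first list goes into Studio.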

3. Customizing the Studio TXT file type*

*This step only needs to be completed once if the same file type settings are used to process future AutoCAD-based TXT files.

The TXT file produced by TranslateCAD includes some content that is not translatable text, as shown here:

Opening the file in Studio with the default TXT file type settings will result in each of those lines being included as a translatable segment. To exclude them, we can customize the file type by adding regular expressions to tell Studio to interpret that text as inline tags, and therefore non-translatable.

We can do this for all future projects and single files by going to Options > File Types > Text, or only for our active project by going to Project Settings > File Types > Text.

Once the Text file type is selected, we go to Inline tags and click the Add rule button.

When the Add inline rule window opens, we select Placeholder for the Rule Type and enter our regular expression in the Rule Opening field:

After clicking OK, the rule will be added to the list. Additional rules can be added by repeating this procedure.

These are a few examples of regular expressions that can be useful when working with AutoCAD files.

Regular Expression
Sample captured text
A segment starting with two pound signs followed by one or more numbers and ending in two pound signs
A numbers-only segment
A segment consisting of a single uppercase letter
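As a rough sketch, the three descriptions above could be written as the patterns below and tried out in Python before being entered in Studio. The patterns, the rule names and the sample lines are assumptions based on those descriptions, not necessarily the exact rules from the table, and Studio's regex flavor may differ from Python's in small details.

```python
import re

# Hypothetical patterns written from the three descriptions above.
RULES = {
    "pound-delimited number": re.compile(r"##\d+##"),
    "numbers only": re.compile(r"\d+"),
    "single uppercase letter": re.compile(r"[A-Z]"),
}


def classify(line):
    """Return the name of the first rule matching the whole line, or None."""
    text = line.strip()
    for name, pattern in RULES.items():
        # fullmatch means the entire line must fit the pattern,
        # which is what ^...$ anchors would express in a Studio rule.
        if pattern.fullmatch(text):
            return name
    return None


# Invented sample lines, for illustration only.
for sample in ["##12##", "2015", "A", "PUMP ROOM"]:
    print(repr(sample), "->", classify(sample))
```

A line that returns None here is one that would stay translatable; anything matched by a rule is what we want Studio to treat as a placeholder.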

Once the appropriate rules have been added, we can close the Options (or Project Settings) dialog box.

A bug-related update (April 2016)
Paul Filkin has brought to my attention that there is a bug in SDL Trados Studio 2015 SR2 that causes the above procedure to fail, for which he has found a workaround. This may be helpful for anyone who comes across this issue.

Paul writes:

In the current version of Studio this doesn’t seem to work.  I’ve logged a bug and found a workaround by adding these things to the structure as opposed to inline tagging:

4. Translating the files in Studio

Now that everything is ready, we can add the TXT files to Studio, translate them as we normally would, and generate the target files. Since the target files must replace the original TXT files, their names should remain unchanged and they should be placed in the same folder as the original source files.

5. Converting the translated TXT files back to DXF

Once the target TXT files are ready, we go back to TranslateCAD and use the "Join TXT files to recreate Translated DXF" feature. It's a simple matter of selecting the files we need to convert and clicking Re-Construct DXF.

TranslateCAD will produce new DXF files with "trans" appended to the name.

6. Converting the DXF files to DWG

For the final step we will need to go back to AutoCAD or DraftSight and save the DXF files as DWGs.

And that's it! This may sound complicated, but in fact, once we've set up the file type regex rules, any future files can be processed rather quickly.

Sometimes it’s the Little Things: Managing Abbreviations for Better Segmentation

One of the main sentence-level segmentation rules in Studio uses a full stop to indicate the end of a sentence and therefore the end of a segment. For most jobs, this default segmentation works well, but what if our source text looks like this?

Since Studio interprets each of those periods as the end of a sentence, our file will look like this:

If word order differs between the source and target languages, working with this file means entering the translation and locking each segment without confirming it. As a result, the new translations won’t be added to the TM, and we lose all concordance and propagation benefits, as well as any potential future leverage. If the file has hundreds or thousands of segments like this, productivity can suffer significantly.

Luckily, there’s a simple solution: adding “ELEC.”, “HYDR.” and “SYS.” (or any other relevant abbreviations) to Studio’s list of recognized abbreviations. Studio creates a new segment after every period except when that period is part of a recognized abbreviation, and the list of abbreviations can be edited in the Translation Memory’s settings. Here’s how to do it.

First, go to your TM settings, then Language Resources > Abbreviation List, and click Edit.

This opens the Abbreviations list. Scroll to the bottom and add your abbreviations.

After we click OK three times to close the TM settings window, the TM will recognize the new abbreviations, so Studio will no longer end a segment at their periods when segmenting a file.
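To see why the abbreviation list makes such a difference, here is a simplified Python sketch of the idea: break after every period followed by a space, unless the word ending in that period is on the list. This is not Studio's actual segmentation engine, and the sample line is made up for illustration.

```python
import re


def segment(text, abbreviations=()):
    """Split after each period followed by whitespace, except after listed abbreviations."""
    known = {a.upper() for a in abbreviations}
    segments, start = [], 0
    for match in re.finditer(r"\.\s+", text):
        end = match.start() + 1  # position just past the period
        # The word that ends with this period, e.g. "ELEC."
        token = text[start:end].split()[-1]
        if token.upper() in known:
            continue  # the period belongs to an abbreviation: no break here
        segments.append(text[start:end].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments


sample = "ELEC. SYS. CONTROL PANEL"
print(segment(sample))
print(segment(sample, ["ELEC.", "HYDR.", "SYS."]))
```

With an empty list the sample breaks into three pieces; with the three abbreviations added it stays together as a single segment.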

Note that this new segmentation cannot be applied to an existing SDLXLIFF file, which is already segmented, so the source file will need to be processed again by either adding it to a project or opening it as a single file, using the TM that contains the new abbreviations.

After doing so, we get the following Studio file for our example above.

Much better!

As a final note, keep in mind that abbreviations are part of the Translation Memory, which means we can customize them as needed, based on our various files and projects.