EMT Blog
  • News article
  • 10 May 2022
  • 12 min read

Multi-user translation and open-source CAT software: OmegaT in action

By Lilian FAEDI and Ismaël GARIN, students of UFR Arts, Lettres et Langues Metz – Université de Lorraine, supervised by Jean-Christophe Helary, their internship supervisor, translator, and member of the OmegaT development team.

This article was written as part of the authors' internship for the second year of the master’s degree in Translation Technologies (TeTra).

Introduction

The main objectives of this article are to introduce the reader to multi-user translation in OmegaT, to demonstrate the features most useful in a multi-user setting, and to present some general principles underlying the operation of OmegaT. OmegaT is open-source software and can be used free of charge in a multi-user configuration. This makes easily accessible solutions available to professionals at a time when teleworking, paradoxically, enables greater collaboration between colleagues.

The question is: how does working on a multi-user project with OmegaT fit in with the translator's needs?

OmegaT, open-source software

The focus here is on the benefits that open-source software can bring to collaborative working in general. OmegaT's license allows the software to be used freely, which also means that it is available free of charge. The rights guaranteed to the user are precisely defined by that license (GPL, version 3):

https://en.wikipedia.org/wiki/GNU_General_Public_License

We can summarize these rights as follows:

  • The freedom to run the program, for any purpose;
  • The freedom to study how the program works and to adapt it to your needs;
  • The freedom to redistribute copies of the program (which implies the possibility of both giving and selling copies);
  • The freedom to improve the program and to distribute these improvements to the public, for the benefit of the whole community.

Access to the source code is a precondition for the second and fourth of these freedoms (studying the program and improving it).

OmegaT is a professional CAT tool that has been around for 20 years. It has benefited from code contributions from the European Commission's Directorate-General for Translation (DGT), the South African government, and companies in the CAT sector such as Welocalize and Kilgray, in addition to the many lesser-known companies that have contributed code and features to the project:

https://omegat.org/sponsorship.html

Multi-user project: data sharing and data integrity

OmegaT itself does not support data sharing. However, it does include features that allow it to use data sharing systems.

There are three types of data sharing systems:

  • Systems that do not check the integrity of the data when it is accessed simultaneously by several users ("Dropbox" type);
  • Systems that guarantee data integrity by limiting access to one user at a time ("database" type in client-server mode);
  • Systems that provide a copy of the data to each user and synchronize changes by regularly merging them with the "parent" version, while providing a mechanism for managing potentially conflicting merges (such as the "version control systems" used in online multi-developer coding projects).

OmegaT chose the third option for the following reasons:

  • "Dropbox" type systems do not guarantee the integrity of simultaneously accessed data;
  • Database systems are difficult to install and require advanced technical knowledge;
  • Collaborative development systems are ubiquitous on the Internet. They do not require additional software installation, and the knowledge required to use them does not exceed that of an average user of IT tools.

OmegaT Screenshot number 1

"Gitea" collaborative development platform hosted by the April association

Multi-user project for OmegaT

This article does not cover project setup, which will be addressed in a later article. We present the multi-user project as it was made available to us for the duration of our internship. The project can be viewed freely, although write access is restricted:

https://forge.chapril.org/brandelune/documentation_emacs

Access to a multi-user project is determined by the access settings of the website where the files are hosted:

  • The project can be invisible to people who are not members (typical case of a translation agency project);
  • The project can be visible to the general public, but writable only by members (typical case of an open volunteer project);
  • The project can be visible to the public and open for writing (a typical test project).

Like any other OmegaT project (multi-user or local), it consists of an `omegat.project` file containing the project settings, an `omegat/` folder containing, among other things, the translation memory in `TMX` format, which is shared for reading and writing, and a `glossary/` folder containing, among other things, the glossary in `TSV` format (encoded in `UTF-8`), which is also shared for reading and writing.
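
The layout described above can be checked with a few lines of Python. This is only a minimal sketch: the function name `missing_entries` is hypothetical, and the check covers only the three entries named in this article, not everything a real project contains.

```python
import tempfile
from pathlib import Path

# Entries named in the article; a real project contains more than these.
EXPECTED = {"omegat.project": "file", "omegat": "dir", "glossary": "dir"}


def missing_entries(project_root):
    """Return the expected entries that are absent from project_root."""
    root = Path(project_root)
    missing = []
    for name, kind in EXPECTED.items():
        path = root / name
        present = path.is_file() if kind == "file" else path.is_dir()
        if not present:
            missing.append(name)
    return missing


# Build a throwaway folder that lacks glossary/ and check it.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "omegat.project").write_text("<omegat/>", encoding="utf-8")
    (Path(tmp) / "omegat").mkdir()
    print(missing_entries(tmp))  # ['glossary']
```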

Access conflicts, write conflicts and similar issues are handled entirely by the `gitea` system. OmegaT acts as a transparent intermediary between the user and that system.

The multi-user project is accessed from the software interface, which copies all the project files locally and then synchronizes them at regular intervals (every three minutes by default). Because the work is done on the local copy, it is also possible to translate offline and synchronize later.

To do this, each member must open the software and choose the "Download Team Project" option (available in the main window, via the "Project" menu):

OmegaT Screenshot number 2

Download Team Project

OmegaT Screenshot number 3

Enter the project address and the location for the local copy

Each participant in the project is registered by the project manager as having write access to the files hosted online. After downloading, OmegaT will ask for the participant's login credentials.

OmegaT Screenshot number 4

Entering individual login credentials

OmegaT will now display the list of files in the project. Note that the source files are not hosted in the same repository as the project itself, but in a separate repository that is declared in the `omegat.project` file described above:

https://forge.chapril.org/brandelune/documentation_emacs/src/branch/main/omegat.project

Line `30` of the file gives us the address of the repository, which we can access directly:

https://git.sr.ht/~brandelune/emacs_documentation_repository/tree
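
To illustrate how such a repository declaration can be read, here is a minimal Python sketch that parses a small, made-up `omegat.project` fragment. The element and attribute names (`repositories`, `repository`, `mapping`, `url`, `local`) are assumptions about the team-project format, and the `example.org` addresses are placeholders; the real file linked above is the reference.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the actual file contains many more settings.
SAMPLE = """<omegat>
  <project version="1.0">
    <repositories>
      <repository type="git" url="https://example.org/team/project.git">
        <mapping local="" repository=""/>
      </repository>
      <repository type="git" url="https://example.org/team/sources.git">
        <mapping local="source/" repository="doc/"/>
      </repository>
    </repositories>
  </project>
</omegat>
"""


def list_mappings(project_xml):
    """Return (url, local path, remote path) for every declared mapping."""
    root = ET.fromstring(project_xml)
    result = []
    for repo in root.iter("repository"):
        for mapping in repo.findall("mapping"):
            result.append(
                (repo.get("url"), mapping.get("local"), mapping.get("repository"))
            )
    return result


for url, local, remote in list_mappings(SAMPLE):
    print(f"{url}: remote '{remote}' -> local '{local}'")
```

In this made-up fragment, the second entry is the one that would map a separate repository onto the local `source/` folder.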

Once the files are displayed in the OmegaT translation interface, we can translate them one by one, validating each translation with `Enter`. Once the synchronization interval has elapsed, OmegaT sends the translated content to the server, and the other members receive it the next time their own copy synchronizes. Any conflict (where the same segment has been translated in two different ways between two synchronizations) is shown in the OmegaT interface to the user whose synchronization revealed it, in a window asking them to choose between their own version and the version received through synchronization.
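
The notion of conflict described above can be sketched as a three-way comparison between the last synchronized state and the two diverging copies. This is an illustration of the general technique only, not OmegaT's actual merge code; the function name and the sample segments are hypothetical, and deletions are deliberately ignored.

```python
def merge_segments(base, local, remote):
    """Three-way merge of {source: target} maps.

    Returns (merged, conflicts). A conflict is a segment that both sides
    changed, in different ways, since the common base.
    """
    merged = dict(base)
    conflicts = {}
    for source in set(local) | set(remote):
        base_t = base.get(source)
        local_t = local.get(source, base_t)
        remote_t = remote.get(source, base_t)
        if local_t == remote_t:
            merged[source] = local_t      # same result on both sides
        elif local_t == base_t:
            merged[source] = remote_t     # only the remote side changed
        elif remote_t == base_t:
            merged[source] = local_t      # only the local side changed
        else:
            conflicts[source] = (local_t, remote_t)  # the user must choose
    return merged, conflicts


base = {"Open the file.": "Ouvrir le fichier."}
local = {**base, "Save the buffer.": "Enregistrer le tampon."}
remote = {**base, "Save the buffer.": "Sauvegarder le tampon.",
          "Quit Emacs.": "Quitter Emacs."}

merged, conflicts = merge_segments(base, local, remote)
print(conflicts)
```

Here the segment "Save the buffer." was translated differently on both sides, so it is reported as a conflict, while "Quit Emacs." is simply taken from the remote copy.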

OmegaT Screenshot number 5

Conflict when adding the segment

Translation memories in OmegaT

Project memory

Understanding how OmegaT uses translation memories starts with the project memory. OmegaT creates one translation memory per project by default, so the user never needs to create a memory and associate it with a project. In effect, an OmegaT project is built around its translation memory, which is fed by the translator's work and, in multi-user mode, by the synchronized translations of other project members. As mentioned above, it is located in the `omegat/` folder:

https://forge.chapril.org/brandelune/documentation_emacs/src/branch/main/omegat

All the memories used by OmegaT are in the TMX standard format. This format allows interoperability with other CAT software.
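
Because TMX is plain XML, any tool can read it, which is what makes this interoperability possible. The sketch below uses Python's standard library on a tiny hand-written sample; the sample content and the function name `read_pairs` are illustrative. Both the `xml:lang` attribute and the older `lang` attribute are accepted, and inline elements inside a segment are flattened to their text.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the xml: prefix to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Open the file.</seg></tuv>
      <tuv xml:lang="fr"><seg>Ouvrir le fichier.</seg></tuv>
    </tu>
    <tu>
      <tuv lang="en"><seg>Quit Emacs.</seg></tuv>
      <tuv lang="fr"><seg>Quitter Emacs.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_pairs(tmx_text, source_lang, target_lang):
    """Return (source, target) text pairs for the two requested languages."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        texts = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                texts[lang.lower()] = "".join(seg.itertext())
        if source_lang in texts and target_lang in texts:
            pairs.append((texts[source_lang], texts[target_lang]))
    return pairs


for source, target in read_pairs(SAMPLE_TMX, "en", "fr"):
    print(f"{source} -> {target}")
```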

Auxiliary memories

An OmegaT project is a folder hierarchy that contains, at the same level as `omegat/`, `source/` and `glossary/`, a `tm/` folder intended for reference memories (generally in `TMX`, but also in other bilingual formats supported by OmegaT: `XLIFF`, `PO`, etc.).

This folder can be divided into sub-folders to organize these references. Some sub-folder names trigger specific behavior that the translator is free to use:

  • `tm/auto/` will include memories whose translations the translator wants inserted automatically throughout the project, as if they came from the project memory;
  • `tm/enforce/` will include memories that are prioritized over contents already saved in the project. The new contents will then overwrite contents already saved in the project's memory;
  • `tm/mt/` will include memories that originate from machine translation systems, which the translator should be specifically aware of. These contents will be displayed and highlighted in red whenever they are inserted in a segment;
  • `tm/penalty-010/` will include memories whose match rate is reduced by the number given after the hyphen (here: `10 %`);
  • `tm/tmx2source/` will include memories that originate from a document translated into another target language. The translator can then see the translation in that other language displayed directly beneath the segment to be translated.
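
As an illustration of the `penalty-010` naming convention, the sketch below derives the penalty from a path below `tm/` and applies it to a match score. It assumes the penalty is simply subtracted from the match percentage; the function names and file names are hypothetical, and OmegaT's internal computation may differ.

```python
import re

# A folder named penalty-NNN carries the penalty NNN.
PENALTY_DIR = re.compile(r"^penalty-(\d+)$")


def penalty_for(tm_relative_path):
    """Return the penalty implied by a path below tm/, or 0 if there is none."""
    for part in tm_relative_path.split("/"):
        match = PENALTY_DIR.match(part)
        if match:
            return int(match.group(1))
    return 0


def adjusted_score(raw_score, tm_relative_path):
    """Lower a match percentage by the penalty of the memory's folder."""
    return max(0, raw_score - penalty_for(tm_relative_path))


print(adjusted_score(95, "penalty-010/legacy.tmx"))  # 85
print(adjusted_score(95, "reference/current.tmx"))   # 95
```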

A memory only has to be copied into the `tm/` folder to be taken into account immediately in the project. Likewise, it only has to be removed from the folder for its contents to stop being referenced.

Shared memories

Each time a user creates the target documents, OmegaT generates three `TMX` files that contain only the translated segments of the source files currently being translated, never the entire content of the project memory. The three files differ as follows:

  • `[name of the project]-omegat.tmx`, which contains the translations together with the tags used internally by OmegaT, for reuse in other OmegaT projects;
  • `[name of the project]-level1.tmx`, which contains `TMX 1.4b level 1` data, i.e. only the text content of the translation, without tags;
  • `[name of the project]-level2.tmx`, which contains `TMX 1.4b level 2` data, i.e. the text content of the translation, with tags in the syntax defined by the `TMX` standard.

https://www.gala-global.org/tmx-14b#SectionIntroduction

Those memories can be shared with other users, kept for later use, etc.

Glossaries

Alongside translation memories, OmegaT uses glossaries, like other CAT tools. Within the `glossary/` folder of the shared project, each member of the translation team can add their own glossaries. A glossary is a simple text file in `TSV` format (Tab-Separated Values) encoded in `UTF-8`.

To use one, the user simply creates a text file (with the `.txt` extension). Each entry is a line made up of the following elements, separated by tab characters:

  • The source term (e.g.: and so on);
  • The target term (e.g.: et ainsi de suite);
  • A remark (e.g.: idiomatic expression to keep).
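
The three-column layout above is easy to produce and read programmatically. The following is a minimal sketch; the function names are hypothetical, and it assumes one entry per line with an optional third column.

```python
def format_entry(source, target, comment=""):
    """Build one tab-separated glossary line: source, target, remark."""
    fields = [source, target, comment]
    if any("\t" in field or "\n" in field for field in fields):
        raise ValueError("fields must not contain tabs or line breaks")
    return "\t".join(fields) + "\n"


def parse_glossary(text):
    """Return a list of (source, target, remark) tuples from glossary text."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        parts = line.split("\t")
        parts += [""] * (3 - len(parts))  # the remark column is optional
        entries.append(tuple(parts[:3]))
    return entries


line = format_entry("and so on", "et ainsi de suite",
                    "idiomatic expression to keep")
print(parse_glossary(line))
```

When writing such lines to a file, opening it with `encoding="utf-8"` keeps the file in the encoding mentioned above.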

OmegaT Screenshot number 6

Glossary opened in a text editor

Glossaries can be accessed directly in two ways:

  • Either go to "Project" > "Access Project Contents" > "Writeable Glossary";
  • Or click on the gear icon in the top-right corner of the Glossary window (if it is open during translation).

Changes are taken into account within the project as soon as they are saved in the text editor.

Terms can also be added directly from the OmegaT interface, without a text editor, through the "Edit" > "Create Glossary Entry" menu. These entries are added to the project's writeable glossary, which is synchronized in the multi-user project.

OmegaT Screenshot number 7

How glossaries are displayed in OmegaT

Communication between the project's members with synchronized notes

The "Notes" in OmegaT serve many purposes, one of which is proof-reading. Each user can attach a note to a particular segment to point out elements to correct, or to leave remarks and advice about the glossary or the consistency of the translation. Notes are the basic tool for proof-readers.

OmegaT Screenshot number 8

The "Notes" window showing the note attached to the current segment

OmegaT's notes are stored using the note element defined by the `TMX` standard:

https://www.gala-global.org/tmx-14b#note

The content of the notes is therefore synchronized in a multi-user project and can serve as a communication channel between team members.

In the "View" menu, the "Mark Segments with Notes" option makes OmegaT highlight the segments that have notes attached (in light blue by default). In the main translation window, these segments are then clearly identified:

OmegaT Screenshot number 9

Segments with notes among translated segments

The content of the notes can be included in a search to display the related segments. It is thus possible to search only for segments that have notes attached, in order to work on those specifically.

QA check tools

OmegaT offers many features for QA checks and proof-reading. Other articles (in French) have already covered them.

Teamwork in a multi-user project

A multi-user project aims to divide tasks between the members of the project.

Each team member is identified by the name they have entered in the OmegaT settings. This user ID is saved with each segment of the `TMX` memory, together with a timestamp, either as the member who created the segment or as the member who edited it.

https://forge.chapril.org/brandelune/documentation_emacs/src/branch/main/omegat/project_save.tmx#L43

OmegaT does not provide role-assignment features, but it offers enough clearly identified features for the members of a given project to divide the tasks among themselves. For instance, a team may have:

  • A member who translates a file;
  • Another who proof-reads the translation, possibly only ten minutes behind the translator;
  • A third member who performs global changes to standardize contents (glossary, references, etc.);
  • A fourth member who generates the translated files in order to proof-read them in the context of the final format, and then carries the corrections back into the project.

The search feature is extremely powerful and can search by author name, date and content (with or without regular expressions), in either the source or the target. It is therefore simple to identify the segments translated by a given member between two given moments.

The search interface can also filter the editor so that OmegaT displays only the segments that were found.
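
The same kind of query (segments by one member between two moments) can be sketched outside OmegaT by reading the author and date attributes that the TMX standard defines (`changeid`, `changedate`, `creationid`, `creationdate`). The sample below is hand-written and omits the TMX header; the user names are invented, and placing the attributes on the target `tuv` is an assumption about how the project memory is laid out.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

TMX_DATE = "%Y%m%dT%H%M%SZ"  # date format defined by the TMX standard

SAMPLE_TMX = """<tmx version="1.1">
  <body>
    <tu>
      <tuv lang="en"><seg>Open the file.</seg></tuv>
      <tuv lang="fr" changeid="lilian" changedate="20220301T101500Z"><seg>Ouvrir le fichier.</seg></tuv>
    </tu>
    <tu>
      <tuv lang="en"><seg>Quit Emacs.</seg></tuv>
      <tuv lang="fr" changeid="ismael" changedate="20220301T103000Z"><seg>Quitter Emacs.</seg></tuv>
    </tu>
    <tu>
      <tuv lang="en"><seg>Save the buffer.</seg></tuv>
      <tuv lang="fr" changeid="lilian" changedate="20220302T090000Z"><seg>Enregistrer le tampon.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def parse_tmx_date(value):
    """Convert a TMX timestamp such as 20220301T101500Z to an aware datetime."""
    return datetime.strptime(value, TMX_DATE).replace(tzinfo=timezone.utc)


def segments_by(tmx_text, author, start, end):
    """Return the text of segments last touched by author within [start, end]."""
    root = ET.fromstring(tmx_text)
    found = []
    for tuv in root.iter("tuv"):
        who = tuv.get("changeid") or tuv.get("creationid")
        when = tuv.get("changedate") or tuv.get("creationdate")
        seg = tuv.find("seg")
        if who != author or when is None or seg is None:
            continue
        if start <= parse_tmx_date(when) <= end:
            found.append("".join(seg.itertext()))
    return found


day_start = datetime(2022, 3, 1, tzinfo=timezone.utc)
day_end = datetime(2022, 3, 1, 23, 59, 59, tzinfo=timezone.utc)
print(segments_by(SAMPLE_TMX, "lilian", day_start, day_end))
```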

Conclusion

This article has presented the multi-user features that make OmegaT a capable tool for complex translation projects. Synchronization through collaborative development platforms gives each user, and the whole translation team, access to all the elements of a project while keeping them continually up to date. Synchronized translation memories are what make OmegaT so useful for shared multi-user projects. Together with shared glossaries, notes and the QA tools, they simplify teamwork on a translation and also support content-checking tasks.

Details

Publication date
10 May 2022
Language
  • English
  • French
EMT Category
  • Translation technology