What is tokenization on GlobalLink Web?

GlobalLink Web uses tokenization to minimize repetitive translation and content bleeding. Tokenization identifies common patterns in the text and generalizes them so that a single translation applies to multiple variants of a similar phrase. This eliminates the need to re-translate phrases that are identical except for a variable value that may not require translation. For example:

Source Segment                  | Tokenized Variant
You have 5 items in your cart.  | You have {{n}} items in your cart.
You have 9 items in your cart.  | You have {{n}} items in your cart.
You have 12 items in your cart. | You have {{n}} items in your cart.

In this example, GlobalLink Web ignores the variable value in the source segment, allowing one translation to cover every value the variable can take. While tokenization offers obvious time and cost efficiencies, your business may require certain tokens to be localized.
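The idea can be illustrated with a minimal Python sketch. This is not GlobalLink Web's actual implementation: the regular expression and the `tokenize` function are hypothetical and handle only plain numbers, and the `{{n}}` placeholder is borrowed from the example above.

```python
import re

# Hypothetical pattern: matches integers and decimals such as 5, 12, or 1,250.75.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")

def tokenize(segment: str) -> str:
    """Replace every number in the segment with the placeholder {{n}}."""
    return NUMBER.sub("{{n}}", segment)

segments = [
    "You have 5 items in your cart.",
    "You have 9 items in your cart.",
    "You have 12 items in your cart.",
]

# All three source segments collapse to one tokenized variant,
# so a single translation covers them.
unique_variants = {tokenize(s) for s in segments}
print(unique_variants)  # {'You have {{n}} items in your cart.'}
```

Because the translation is stored against the tokenized variant rather than the raw segment, a new cart count does not create a new segment to translate.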

To understand the right strategy for your website, let's begin by reviewing the units that are treated as tokens by default on GlobalLink Web:

  • Numbers
  • Time zones
  • Times
  • Dates
  • Currencies
  • Measurements

If you have business or regional requirements to localize any of these units, there are a few nuances to understand. Here are some examples of localized tokens that might apply to your use case:

English Source Segment                         | French Translation
Our company was incorporated on June 05, 2000. | Notre société a été constituée le 5 juin 2000.
This is a 3,000 sq ft house.                   | Il s'agit d'une maison de 278,70 mètres carrés.
Your remaining balance is $100.00              | Votre solde restant est de 89,31 €

Each example involves several intricacies of token localization, which can be reduced to three basic categories:

1 = Reformatting (e.g. MM-DD-YYYY to DD-MM-YYYY)

2 = Conversion & translation (e.g. 3,000 sq ft house to maison de 278,70 mètres carrés)

3 = Placement of separators (e.g. $100.00 to 89,31 €)

Best practices for each category vary by language and region, but the general concepts apply universally. Before implementing any solution, make sure you understand the nuances of the local markets you are targeting.
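As a rough illustration of the three categories, the following Python sketch handles one French-style case for each. The function names are hypothetical, the formatting rules are simplified (thousands separators are not handled), and currency conversion is omitted because it depends on an exchange rate.

```python
from datetime import datetime

SQ_M_PER_SQ_FT = 0.09290304  # exact, since 1 ft = 0.3048 m

def reformat_date(us_date: str) -> str:
    """Category 1: reformat MM-DD-YYYY as DD-MM-YYYY."""
    return datetime.strptime(us_date, "%m-%d-%Y").strftime("%d-%m-%Y")

def sq_ft_to_sq_m(sq_ft: float) -> float:
    """Category 2: convert the value; the unit name still needs translating."""
    return sq_ft * SQ_M_PER_SQ_FT

def format_french_amount(value: float, suffix: str) -> str:
    """Category 3: decimal comma, with the unit or symbol after the number."""
    return f"{value:.2f}".replace(".", ",") + " " + suffix

print(reformat_date("06-05-2000"))                      # 05-06-2000
print(format_french_amount(89.31, "€"))                 # 89,31 €
print(format_french_amount(sq_ft_to_sq_m(1000), "m²"))  # 92,90 m²
```

Each function covers a single market convention. A real implementation would need separate rules per language and region, which is why understanding the target markets comes first.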

What are my options?

There are three approaches to token localization on GlobalLink Web, listed below from least to most recommended.

1 = Disable GlobalLink Web tokenization entirely, forcing every segment on your website (including those with variable tokens) to be Machine Translated. This approach is discouraged because you lose the efficiencies discussed at the outset of this article and unnecessarily increase the amount of Machine Translation you consume. It also creates challenges for your reviewers, because an edit to a segment containing a variable token does not apply to the same segment with a different token value.

2 = Use GlobalLink Web to localize certain tokens, such as dates, times, and time zones. Measurement and currency conversions are not available with this approach. Please contact your GlobalLink Web Support team if you wish to implement one of these custom token approaches.

3 = The best approach to token localization is to set up the required logic for each market on the origin website that GlobalLink Web is translating. This gives you complete control over custom formats and conversions while preserving GlobalLink Web's default token logic, which is designed to maximize time and cost efficiencies.
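For the third approach, the origin website chooses the format for each market before the page is served. The sketch below shows one possible shape for that logic in Python. The market codes, the `DATE_PATTERNS` table, and the `render_date` function are all hypothetical, and how the origin site learns which market a request belongs to is left as an assumption.

```python
from datetime import date

# Hypothetical market codes and date patterns, for illustration only.
DATE_PATTERNS = {
    "en-US": "%m-%d-%Y",
    "fr-FR": "%d-%m-%Y",
    "de-DE": "%d.%m.%Y",
}

def render_date(value: date, market: str) -> str:
    """Format a date for the given market, falling back to the en-US pattern."""
    pattern = DATE_PATTERNS.get(market, DATE_PATTERNS["en-US"])
    return value.strftime(pattern)

incorporated = date(2000, 6, 5)
print(render_date(incorporated, "en-US"))  # 06-05-2000
print(render_date(incorporated, "fr-FR"))  # 05-06-2000
print(render_date(incorporated, "de-DE"))  # 05.06.2000
```

Keeping the rules in one table on the origin side means a new market only needs a new entry, and the formatted value can still be treated as a token during translation.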