Locale management

A locale is a combination of language, region, script, and regional preferences that determines how the user wants their data formatted.

There are multiple models of locale data structures in the industry, with varying degrees of compatibility between them. Historically, each major platform has used its own, and many standards bodies have published conflicting proposals.

Mozilla, alongside most modern platforms, follows the Unicode and W3C recommendations and conforms to a standard known as BCP 47, which describes a low-level textual representation of a locale known as a language tag.

A few examples of language tags: en-US, de, ar, zh-Hans, es-CL.

Locales and Language Tags

A locale data structure consists of four primary fields.

  • Language (Example: English - en, French - fr, Serbian - sr)
  • Script (Example: Latin - Latn, Cyrillic - Cyrl)
  • Region (Example: United States - US, Canada - CA, Russia - RU)
  • Variants (Example: Mac OS - macos, Windows - windows, Linux - linux)

BCP 47 specifies the syntax for each of those fields (called subtags) when represented as a string. The syntax defines the allowed selection of characters, their capitalization, and the order in which the fields should be defined.

Most of the base subtags are valid ISO codes, such as ISO 639 for the language subtag, or ISO 3166-1 for the region subtag.

The examples above present language tags with several fields omitted, which is allowed by the standard.

On top of that, a locale may contain:

  • extensions and private fields
    These fields can be used to carry additional information about a locale. Mozilla currently has partial support for them in the JS implementation and plans to extend support to all APIs.
  • extkeys and "grandfathered" tags (unfortunate language, but part of the spec)
    Mozilla does not support these yet.

An example locale can be visualized as:

{
    "language": "sr",
    "script": "Cyrl",
    "region": "RU",
    "variants": [],
    "extensions": {},
    "privateuse": [],
}

which can then be serialized into a string: "sr-Cyrl-RU".

Important

Since locales are often stored and passed around the codebase as language tag strings, it is important to always use an appropriate API to parse, manipulate and serialize them. Avoid Do-It-Yourself solutions which leave your code fragile and may break on unexpected language tag structures.
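As a minimal sketch of what an "appropriate API" can look like, the standard ECMAScript `Intl.Locale` object parses a language tag into its fields and serializes it back. This is plain JavaScript used for illustration, not Gecko's internal implementation, and the variable names are arbitrary.

```javascript
// Parse a language tag with the standard Intl.Locale API instead of
// splitting the string by hand.
const locale = new Intl.Locale("sr-Cyrl-RU");

// Individual subtags are exposed as properties.
const parsed = {
  language: locale.language,
  script: locale.script,
  region: locale.region,
};

// Serializing canonicalizes the capitalization of each subtag.
const canonical = new Intl.Locale("SR-cyrl-ru").toString();

console.log(parsed, canonical);
```

Reading fields from the parsed object keeps working when a tag omits the script or carries extensions, which is where hand-written string splitting tends to break.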

Locale Fallback Chains

Locale-sensitive operations are always considered "best-effort". That means it cannot be assumed that a perfect match will exist between what the user requested and what the API can provide.

As a result, the best practice is to always operate on locale fallback chains - ordered lists of locales according to the user preference.

An example of a locale fallback chain may be: ["es-CL", "es-ES", "es", "fr", "en"].

The above is a request to format the data according to Chilean Spanish if possible, falling back to European Spanish (es-ES), then to any (generic) Spanish, then to French, and eventually to English.

Important

It is always better to use a locale fallback chain over a single locale. In case there's only one locale available, a list with one element will work while allowing for future extensions without a costly refactor.
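A consumer of a fallback chain could look roughly like the sketch below. The `greetings` table and the `lookupWithFallback` helper are hypothetical names introduced only for this example; the point is that the caller walks the chain in order and stops at the first locale for which data exists.

```javascript
// Hypothetical resource table keyed by language tag.
const greetings = new Map([
  ["es", "Hola"],
  ["en", "Hello"],
]);

// Walk the chain in order of preference and return the first hit.
function lookupWithFallback(chain, resources) {
  for (const tag of chain) {
    if (resources.has(tag)) {
      return { locale: tag, value: resources.get(tag) };
    }
  }
  return null; // nothing in the chain is available
}

const chain = ["es-CL", "es-ES", "es", "fr", "en"];
const hit = lookupWithFallback(chain, greetings);
console.log(hit);
```

A chain with a single element goes through the same code path, so callers do not need a separate branch for the one-locale case.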

Language Negotiation

Due to the imperfections in data matching, all operations on locales should always use a language negotiation algorithm to resolve the best available set of locales, based on the list of all available locales and an ordered list of requested locales.

Such algorithms may vary in sophistication and number of strategies. Mozilla's solution is based on modified logic from RFC 4647.

The three lists of locales used in negotiation:

  • Available - locales that are locally installed
  • Requested - locales that the user selected in decreasing order of preference
  • Resolved - result of the negotiation

The result of a negotiation is an ordered list of locales that are available to the system, and the consumer is expected to attempt using the locales in the resolved order.

Negotiation should be used in all scenarios like selecting language resources, calendar, number formatting, etc.

Single Locale Matching

Every negotiation strategy goes through a list of steps in an attempt to find the best possible match between locales.

The exact algorithm is custom and consists of a six-level strategy:

1) Attempt to find an exact match for each requested locale in available
   locales.
   Example: ['en-US'] * ['en-US'] = ['en-US']

2) Attempt to match a requested locale to an available locale treated
   as a locale range.
   Example: ['en-US'] * ['en'] = ['en']
                          ^^
                          |-- becomes 'en-*-*-*'

3) Attempt to use the maximized version of the requested locale, to
   find the best match in available locales.
   Example: ['en'] * ['en-GB', 'en-US'] = ['en-US']
              ^^
              |-- ICU likelySubtags expands it to 'en-Latn-US'

4) Attempt to look for a different variant of the same locale.
   Example: ['ja-JP-win'] * ['ja-JP-mac'] = ['ja-JP-mac']
              ^^^^^^^^^
              |----------- replace variant with range: 'ja-JP-*'

5) Attempt to look for a maximized version of the requested locale,
   stripped of the region code.
   Example: ['en-CA'] * ['en-ZA', 'en-US'] = ['en-US', 'en-ZA']
              ^^^^^
              |----------- look for likelySubtag of 'en': 'en-Latn-US'

6) Attempt to look for a different region of the same locale.
   Example: ['en-GB'] * ['en-AU'] = ['en-AU']
              ^^^^^
              |----- replace region with range: 'en-*'
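Some of the steps above can be approximated in plain JavaScript with the standard `Intl.Locale` API. The sketch below covers only steps 1, 2, 3 and 6, ignores variants, and uses hypothetical helper names (`fields`, `inRange`, `matchSingle`); it is a simplified illustration, not Gecko's implementation.

```javascript
// Extract the subtags this sketch cares about.
function fields(tag) {
  const { language, script, region } = new Intl.Locale(tag);
  return { language, script, region };
}

// True when every subtag present in `range` equals the one in `locale`.
// An omitted subtag in `range` acts as a wildcard.
function inRange(locale, range) {
  return (
    range.language === locale.language &&
    (range.script === undefined || range.script === locale.script) &&
    (range.region === undefined || range.region === locale.region)
  );
}

function matchSingle(requested, available) {
  // Step 1: exact match.
  if (available.includes(requested)) return requested;

  // Step 2: treat each available locale as a range.
  const req = fields(requested);
  const ranged = available.find(tag => inRange(req, fields(tag)));
  if (ranged) return ranged;

  // Step 3: maximize the requested locale using likely subtags.
  const max = fields(new Intl.Locale(requested).maximize().toString());
  const likely = available.find(tag => inRange(max, fields(tag)));
  if (likely) return likely;

  // Step 6: accept a different region of the same language.
  return available.find(tag => fields(tag).language === req.language) ?? null;
}

console.log(matchSingle("en", ["en-GB", "en-US"]));
```

Step 3 relies on the likely-subtags data shipped with the JavaScript engine, so results for less common languages may differ between environments.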

Filtering / Matching / Lookup

When negotiating between lists of locales, Mozilla's LocaleService API offers three language negotiation strategies:

Filtering

This is the most common scenario, where there is an advantage in creating a maximal possible list of locales that the user may benefit from.

An example of a scenario:

let requested = ["fr-CA", "en-US"];
let available = ["en-GB", "it", "en-ZA", "fr", "de-DE", "fr-CA", "fr-CH"];

let result = Services.locale.negotiateLanguages(requested, available);

result == ["fr-CA", "fr", "fr-CH", "en-GB", "en-ZA"];

In the example above the algorithm was able to match "fr-CA" as a perfect match, but then was able to find other matches as well - a generic French is a very good match, and Swiss French is also very close to the top requested language.

In case of the second of the requested locales, unfortunately American English is not available, but British English and South African English are.

The algorithm is greedy and attempts to match as many locales as possible. This is usually what the developer wants.

Matching

In less common scenarios the code needs to match a single, best available locale for each of the requested locales.

An example of this scenario:

let requested = ["fr-CA", "en-US"];
let available = ["en-GB", "it", "en-ZA", "fr", "de-DE", "fr-CA", "fr-CH"];

let result = Services.locale.negotiateLanguages(
  requested,
  available,
  undefined,
  Services.locale.langNegStrategyMatching);

result == ["fr-CA", "en-GB"];

The best available locale for "fr-CA" is a perfect match, and for "en-US" the algorithm selected British English.

Lookup

The third strategy should be used in cases where, no matter what, only one locale can ever be used. Some third-party APIs don't support fallback, and it doesn't make sense to continue resolving after finding the first locale.

It is still advised to treat the result of this API as a fallback chain list, just in this case with a single element.

let requested = ["fr-CA", "en-US"];
let available = ["en-GB", "it", "en-ZA", "fr", "de-DE", "fr-CA", "fr-CH"];

let result = Services.locale.negotiateLanguages(
  requested,
  available,
  Services.locale.defaultLocale,
  Services.locale.langNegStrategyLookup);

result == ["fr-CA"];
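To make the difference between the three strategies concrete, here is a deliberately simplified sketch in plain JavaScript. It only distinguishes exact matches from same-language matches, whereas the real algorithm applies all six steps described earlier. The `negotiate` function and its string strategy names are assumptions made for this example and are not the LocaleService API.

```javascript
const languageOf = tag => new Intl.Locale(tag).language;

// Candidates for one requested locale: exact match first, then any
// available locale sharing the same language, in available order.
function candidates(requested, available) {
  const exact = available.filter(tag => tag === requested);
  const sameLanguage = available.filter(
    tag => tag !== requested && languageOf(tag) === languageOf(requested)
  );
  return [...exact, ...sameLanguage];
}

function negotiate(requested, available, strategy = "filtering") {
  const resolved = [];
  for (const tag of requested) {
    const found = candidates(tag, available);
    // Filtering keeps every candidate; matching and lookup keep the best one.
    const picked = strategy === "filtering" ? found : found.slice(0, 1);
    for (const match of picked) {
      if (!resolved.includes(match)) resolved.push(match);
    }
    // Lookup stops as soon as a single locale has been resolved.
    if (strategy === "lookup" && resolved.length > 0) break;
  }
  return resolved;
}

const requestedList = ["fr-CA", "en-US"];
const availableList = ["en-GB", "it", "en-ZA", "fr", "de-DE", "fr-CA", "fr-CH"];
console.log(negotiate(requestedList, availableList, "filtering"));
console.log(negotiate(requestedList, availableList, "matching"));
console.log(negotiate(requestedList, availableList, "lookup"));
```

For these inputs the sketch yields the same lists as the three examples above, but that agreement should not be expected for arbitrary inputs.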

Default Locale

Besides Available, Requested and Resolved locale lists, there's also a concept of DefaultLocale, which is a single locale out of the list of available ones that should be used in case there is no match to be found between available and requested locales.

Every Firefox build has a single default locale. For example, Firefox zh-CN has DefaultLocale set to zh-CN, since this locale is guaranteed to be packaged in and to have all the resources, and it should be used if the negotiation fails to return any matches.

let requested = ["fr-CA", "en-US"];
let available = ["it", "de", "zh-CN", "pl", "sr-RU"];
let defaultLocale = "zh-CN";

let result = Services.locale.negotiateLanguages(requested, available, defaultLocale);

result == ["zh-CN"];
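The role of the default locale can be sketched as follows. This hypothetical `resolveWithDefault` helper uses exact matching only, to keep the example short; it is meant to illustrate the fallback described above rather than reproduce the behavior of `negotiateLanguages`.

```javascript
// Exact-match negotiation that falls back to the default locale
// when nothing requested is available.
function resolveWithDefault(requested, available, defaultLocale) {
  const resolved = requested.filter(tag => available.includes(tag));
  return resolved.length > 0 ? resolved : [defaultLocale];
}

const fallbackResult = resolveWithDefault(
  ["fr-CA", "en-US"],
  ["it", "de", "zh-CN", "pl", "sr-RU"],
  "zh-CN"
);
console.log(fallbackResult);
```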

Chained Language Negotiation

In some cases the user may want to link a language selection to another component.

For example, a Firefox extension may come with its own list of available locales, which may have locales that Firefox doesn't.

In that case, negotiation between user requested locales and the add-on's list may result in a selection of locales superseding that of Firefox itself.

     Fx Available
    +-------------+
    |  it, fr, ar |
    +-------------+                 Fx Locales
                  |                +--------+
                  +--------------> | fr, ar |
                  |                +--------+
        Requested |
 +----------------+
 | es, fr, pl, ar |
 +----------------+                 Add-on Locales
                  |                +------------+
                  +--------------> | es, fr, ar |
  Add-on Available |               +------------+
+-----------------+
|  de, es, fr, ar |
+-----------------+

In that case, an add-on may end up being displayed in Spanish, while Firefox UI will use French. In most cases this results in a bad UX.

To avoid that, one can chain the add-on negotiation: take Firefox's resolved locales as the requested list and negotiate them against the add-on's available list.

    Fx Available
   +-------------+
   |  it, ar, fr |
   +-------------+                Fx Locales (as Add-on Requested)
                 |                +--------+
                 +--------------> | fr, ar |
                 |                +--------+
       Requested |                         |                Add-on Locales
+----------------+                         |                +--------+
| es, fr, pl, ar |                         +------------->  | fr, ar |
+----------------+                         |                +--------+
                                           |
                          Add-on Available |
                         +-----------------+
                         |  de, es, ar, fr |
                         +-----------------+
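The two diagrams can be reproduced with a small sketch. The `negotiateExact` helper below is a stand-in that only performs exact matching; in Gecko code the same chaining idea would use the real negotiation API, with the application's resolved locales passed as the requested list.

```javascript
// Stand-in negotiator: keep requested locales that are available,
// in the order they were requested.
const negotiateExact = (requested, available) =>
  requested.filter(tag => available.includes(tag));

const userRequested = ["es", "fr", "pl", "ar"];
const fxAvailable = ["it", "fr", "ar"];
const addonAvailable = ["de", "es", "fr", "ar"];

// Firefox resolves its own locales from the user's request.
const fxLocales = negotiateExact(userRequested, fxAvailable);

// Unchained: the add-on negotiates directly against the user's request
// and may pick a language Firefox itself is not using.
const addonUnchained = negotiateExact(userRequested, addonAvailable);

// Chained: the add-on uses Firefox's resolved locales as its request.
const addonChained = negotiateExact(fxLocales, addonAvailable);

console.log(fxLocales, addonUnchained, addonChained);
```

In the unchained case the add-on's first locale is Spanish while Firefox's is French; chaining keeps both lists aligned.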

Available Locales

In Gecko, available locales come from the Packaged Locales and the installed language packs. Language packs are a variant of WebExtensions providing just localized resources for one or more languages.

The primary notion of which locales are available is based on which locales Gecko has UI localization resources for, and other datasets such as internationalization may carry different lists of available locales.

Requested Locales

The list of requested locales can be read and set using the LocaleService::requestedLocales API.

Using the API will perform necessary sanity checks and canonicalize the values.

After sanitization, the value is stored in the pref intl.locale.requested. The pref usually stores a comma-separated list of valid BCP 47 locale codes, but it can also have two special meanings:

  • If the pref is not set at all, Gecko will use the default locale as the requested one.
  • If the pref is set to an empty string, Gecko will use the OS app locales as the requested ones.

The former is the current default setting for Firefox Desktop, and the latter is the default setting for Firefox for Android.

If the developer wants to programmatically request the app to follow OS locales, they can assign null to requestedLocales.
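The three states of the pref can be summarized in a short sketch. The `requestedFromPref` function and its parameters are hypothetical and exist only to illustrate the rules above; canonicalization is approximated with the standard `Intl.getCanonicalLocales`, which is an assumption and not necessarily what Gecko does internally.

```javascript
// `prefValue` is undefined when the pref is not set at all.
function requestedFromPref(prefValue, defaultLocale, osLocales) {
  // Pref not set: use the default locale.
  if (prefValue === undefined) return [defaultLocale];

  // Empty string: follow the OS app locales.
  if (prefValue === "") return [...osLocales];

  // Otherwise: a comma-separated list of language tags.
  const tags = prefValue
    .split(",")
    .map(tag => tag.trim())
    .filter(tag => tag !== "");
  return Intl.getCanonicalLocales(tags);
}

console.log(requestedFromPref("fr-ca, DE", "en-US", ["pl"]));
```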

Regional Preferences

Every locale comes with a set of default preferences that are specific to a culture and region. This contains preferences such as calendar system, way to display time (24h vs 12h clock), which day the week starts on, which days constitute a weekend, what numbering system and date time formatting a given locale uses (for example "MM/DD" in en-US vs "DD/MM" in en-AU).

For all such preferences Gecko has a list of default settings for every region, but there's also a degree of customization every user may want to make.

All major operating systems have a Settings UI for selecting those preferences, and since Firefox does not provide its own, Gecko looks into the OS for them.

A special API mozilla::intl::OSPreferences handles communication with the host operating system, retrieving regional preferences and altering internationalization formatting with user preferences.

One thing to notice is that the boundary between regional preferences and language selection is not strong. In many cases the internationalization formats will contain language-specific terms and literals. For example, a date formatted for Japanese may look like "2018年3月24日", or the date format may contain names of months or weekdays that need to be translated ("April", "Tuesday", etc.).

For that reason it is tricky to follow regional preferences in a scenario where Operating System locale selection does not match the Firefox UI locales.

Such behavior might lead to a UI case like "Today is 24 października" in an English Firefox with Polish date formats.

For that reason, by default, Gecko will only look into OS Preferences if the language portions of the OS and Firefox locales match. That means that if Windows is in "en-AU" and Firefox is in "en-US", Gecko will look into Windows Regional Preferences, but if Windows is in "de-CH" and Firefox is in "fr-FR", it won't. To force Gecko to look into OS preferences regardless of the language match, set the flag intl.regional_prefs.use_os_locales to true.
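The rule can be expressed as a small predicate. The function name and the `forceOSLocales` parameter (standing in for the intl.regional_prefs.use_os_locales flag) are illustrative assumptions, not part of any Gecko API.

```javascript
// Decide whether OS regional preferences should be consulted.
function shouldUseOSRegionalPrefs(osLocale, appLocale, forceOSLocales = false) {
  if (forceOSLocales) return true;
  // Compare only the language subtags of the two locales.
  return (
    new Intl.Locale(osLocale).language === new Intl.Locale(appLocale).language
  );
}

console.log(shouldUseOSRegionalPrefs("en-AU", "en-US"));
console.log(shouldUseOSRegionalPrefs("de-CH", "fr-FR"));
```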

UI Direction

Since the UI direction is so tightly coupled with the locale selection, the main method of testing the directionality of the Gecko app lives in LocaleService.

LocaleService::IsAppLocaleRTL returns a boolean indicating if the current direction of the app UI is right-to-left.

Default and Last Fallback Locales

Every Gecko application is built with a single locale as the default one. Such a locale is guaranteed to have all linguistic resources available, should be used as the default locale in case language negotiation cannot find any match, and also serves as the last locale to look for in a fallback chain.

If all else fails, Gecko also supports a notion of a last fallback locale, which is currently hardcoded to "en-US" and is the very final locale to try in case nothing else (including the default locale) works. Notice that Unicode and ICU use "en-GB" in that role because more English-speaking people around the world recognize British regional preferences than American ones (metric vs. imperial, Celsius vs. Fahrenheit, etc.). Mozilla may switch to "en-GB" in the future.

Packaged Locales

When a Gecko application is packaged, it bundles a selection of locale resources to be available within it. At the moment, for example, most Firefox for Android builds come with almost 100 locales packaged into them, while Desktop Firefox usually comes with just one packaged locale.

There is currently work being done on enabling more flexibility in how the locales are packaged to allow for bundling applications with different sets of locales in different areas - dictionaries, hyphenations, product language resources, installer language resources, etc.

Web Exposed Locales

For anti-tracking or other reasons, we tend to expose spoofed locales to web content instead of the default locales. This can be done by setting the pref intl.locale.privacy.web_exposed. The pref is a comma-separated list of locales; an empty string implies the default locales.

The pref has no effect while privacy.spoof_english is set to 2, in which case "en-US" will always be returned.

Multi-Process

Locale management can operate in a client/server model. This allows a Gecko process to manage locales (server mode) or just receive the locale selection from a parent process (client mode).

The client mode is currently used by all child processes of Desktop Firefox, and may be used by, for example, GeckoView to follow locale selection from a parent process.

To check the mode the process is operating in, the LocaleService::IsServer method is available.

Note that L10nRegistry.registerSources, L10nRegistry.updateSources, and L10nRegistry.removeSources each trigger an IPC synchronization between the parent process and any extant content processes, which is expensive. If you need to change the registration of multiple sources, the best way to do so is to coalesce multiple requests into a single array and then call the method once.

Mozilla Exceptions

There's currently only a single exception to BCP 47 in use, and that's the legacy "ja-JP-mac" locale. The "mac" is a variant, and BCP 47 requires all variants to be 5 to 8 characters long.

Gecko works around the limitation by accepting 3-letter variants in its APIs, and also provides a special appLocalesAsLangTags method which returns this locale in that form. (appLocalesAsBCP47 will canonicalize it and turn it into "ja-JP-macos".)

Usage of language negotiation etc. shouldn't rely on this behavior.
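Code that receives the legacy form and needs to pass it to a standards-based API could normalize it first, along the lines of the sketch below. The `toBCP47` helper is hypothetical and handles only this one known exception.

```javascript
// Rewrite the legacy tag so that standard APIs accept it.
function toBCP47(tag) {
  return tag === "ja-JP-mac" ? "ja-JP-macos" : tag;
}

// The normalized tag has a well-formed 5-letter variant, so the
// standard Intl.Locale constructor accepts it.
const macLocale = new Intl.Locale(toBCP47("ja-JP-mac"));
console.log(macLocale.baseName);
```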

Events

LocaleService emits two events: intl:app-locales-changed and intl:requested-locales-changed which all code can listen to.

Those events may be broadcast in response to language packs being installed or uninstalled, or to the user's selection of languages changing.

In most cases, the code should observe the intl:app-locales-changed and react to only that event since this is the one indicating a change in the currently used language settings that the components should follow.

Testing

Many components may have logic encoded to react to changes in requested, available or resolved locales.

In order to test the component's behavior, it is important to replicate the environment in which such change may happen.

Since in most cases it is advised for a component to tie its language negotiation to the main application (see Chained Language Negotiation), it is not enough to add a new locale to trigger the language change.

First, it is necessary to add a new locale to the available ones, then change the requested ones; only then will a new negotiation and language change happen.

There are two primary ways to add a locale to available ones.

Testing Localization

If the goal is to test that the correct localization ends up in the correct place, the developer needs to register a new L10nFileSource in L10nRegistry and provide mock cached data to be returned by the API.

It may look like this:

let source = L10nFileSource.createMock(
  "mock-source", "app",
  ["ko-KR", "ar"],
  "resource://mock-addon/localization/{locale}",
  [
    {
      path: "resource://mock-addon/localization/ko-KR/test.ftl",
      source: "key = Value in Korean"
    },
    {
      path: "resource://mock-addon/localization/ar/test.ftl",
      source: "key = Value in Arabic"
    }
  ]
);

L10nRegistry.registerSources([source]);

let availableLocales = Services.locale.availableLocales;

assert(availableLocales.includes("ko-KR"));
assert(availableLocales.includes("ar"));

Services.locale.requestedLocales = ["ko-KR"];

let appLocales = Services.locale.appLocalesAsBCP47;
assert(appLocales[0] === "ko-KR");

From here, a resource test.ftl can be added to a Localization and for ID key the correct value from the mocked cache will be returned.

Testing Locale Switching

The second method is much more limited, as it only mocks the locale availability, but it is also simpler:

Services.locale.availableLocales = ["ko-KR", "ar"];
Services.locale.requestedLocales = ["ko-KR"];

let appLocales = Services.locale.appLocalesAsBCP47;
assert(appLocales[0] === "ko-KR");

In the future, Mozilla plans to add a third way for add-ons (bug 1440969) to disconnect their locales from those of the main application, for either manual or automated testing purposes.

Testing the outcome

Except when testing for reactions to locale changes, it is advised to avoid writing tests that expect a certain locale to be selected, or certain internationalization or localization data to be used.

Doing so locks down the test infrastructure to be only usable when launched in a single locale environment and requires those tests to be updated whenever the underlying data changes.

In the case of testing locale selection it is best to use a fake locale like x-test, that will not be present at the beginning of the test.

In the case of testing for internationalization data it is best to use resolvedOptions(), to verify the right data is being used, rather than comparing the output string.

In the case of localization, it is best to test against the correct data-l10n-id being set or, in edge cases, verify that a given variable is present in the string using String.prototype.includes.
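The last two recommendations could look like the sketch below. The formatter options, the `formatGreeting` helper and the sample template are made up for illustration; the point is that neither check depends on the exact localized output.

```javascript
// Internationalization: verify the resolved configuration instead of
// comparing the formatted string, which varies by locale and data version.
const formatter = new Intl.NumberFormat(undefined, {
  style: "percent",
  maximumFractionDigits: 1,
});
const options = formatter.resolvedOptions();
const usesExpectedOptions =
  options.style === "percent" && options.maximumFractionDigits === 1;

// Localization edge case: only verify that the variable made it into
// the message, not the surrounding translated text.
function formatGreeting(template, name) {
  return template.replace("{ $name }", () => name);
}
const message = formatGreeting("Witaj, { $name }!", "Ada");
const containsName = message.includes("Ada");

console.log(usesExpectedOptions, containsName);
```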

Deep Dive

Below is a list of articles with additional details on selected subjects:

.. toctree::
   :maxdepth: 1

   locale_env
   locale_startup

Feedback

In case of questions, please consult the Intl module peers.