We're currently conducting some experiments with using ICU4X in Gecko rather than ICU4C. This file documents the procedures for building with ICU4X. The current implementation is incomplete, and hopefully we can begin to land code incrementally. This document will serve as a documentation for the status of this experimentation on what has been landed in tree.
To enable the ICU4X experimentation:
- Add the
ac_add_options --enable-icu4x
mozconfig. - Generate the locale data by running
./intl/update-icu4x.sh
. - Do a full build.
The data is bundled directly into the binary in the config/external/icu4x
directory. The "icu4xdata" is a separate library, which consists of an assembly file that directly includes the ICU4X locale binary data. Eventually this binary data will be directly accessed via ICU4X's StaticDataProvider.
The script intl/update-icu4x.sh
can generate and update this binary data. At the time of this writing, this data is not checked in to source control since it's quite large with no ability to prune the data with only exporting certain keys. See unicode-org/icu4x#192 for supporting splitting out keys.
At the time of this writing, ICU4X has been added to the allow list for vendored code but the files still need to be vendored in. This is currently blocked on some large testdata being included. There is a tracking issue unicode-org/icu4x#849 for the ICU4X integration, but we might find a way to prune the files on the Gecko vendoring code.
The data provider in ICU4X is the mechanism to load the locale-specific data. In previous experiments we used the FsDataProvider
which hits the file system every time the APIs need any data. Future integrations should look into using the StaticDataProvider
.
The final step in the ICU4X integration is to actually build and include the C++ FFI files. In the Summer 2021 experimentation we used manually built FFIs, but future experiments will rely on the Diplomat-built FFIs.