Bug 1719535 - Part 1. Update ICU4X data generator script. r=TYLin,platform-i18n-reviewers,gregtatum

Generate baked data in intl/icu4x_data/data/baked instead of postcard, since ICU4X 1.2 can use custom baked data without modifying icu_capi.

Differential Revision: https://phabricator.services.mozilla.com/D167670
makotokato committed Aug 2, 2023
1 parent 787f7d3 commit caac2ca
Showing 1 changed file with 28 additions and 18 deletions.
46 changes: 28 additions & 18 deletions intl/update-icu4x.sh
@@ -6,16 +6,20 @@
set -e

# Update the icu4x binary data for a given release:
- # Usage: update-icu4x.sh <URL of ICU GIT> <release tag name>
- # update-icu4x.sh https://github.com/unicode-org/icu4x.git icu@0.3.0
+ # Usage: update-icu4x.sh <URL of ICU GIT> <release tag name> <CLDR version> <ICU release tag name>
+ # update-icu4x.sh https://github.com/unicode-org/icu4x.git icu@1.2.0 43.0.0 release-73-1
#
# Update to the main branch:
- # Usage: update-icu4x.sh <URL of ICU GIT> <branch>
- # update-icu4x.sh https://github.com/unicode-org/icu4x.git main
+ # Usage: update-icu4x.sh <URL of ICU GIT> <branch> <CLDR version> <ICU release tag name>
+ # update-icu4x.sh https://github.com/unicode-org/icu4x.git main 43.0.0 release-73-1

+ # default
+ cldr=${3:-43.0.0}
+ icuexport=${4:-release-73-1}

if [ $# -lt 2 ]; then
- echo "Usage: update-icu4x.sh <URL of ICU4X GIT> <release tag name> <CLDR version>"
- echo "Example: update-icu4x.sh https://github.com/unicode-org/icu4x.git icu@0.3.0 39.0.0"
+ echo "Usage: update-icu4x.sh <URL of ICU4X GIT> <ICU4X release tag name> <CLDR version> <ICU release tag name>"
+ echo "Example: update-icu4x.sh https://github.com/unicode-org/icu4x.git icu@1.2.0 43.0.0 release-73-1"
exit 1
fi
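
Note on this hunk: the third and fourth arguments are now optional, because `${3:-43.0.0}` and `${4:-release-73-1}` fall back to a default when the argument is unset or empty. The sketch below isolates that behaviour in a standalone function. The name `resolve_versions` and the override values `42.0.0` and `release-72-1` are illustrative assumptions, not part of the script.

```shell
#!/bin/sh
# Hypothetical helper (not in update-icu4x.sh): mirrors how the updated
# script defaults its third and fourth positional arguments.
resolve_versions() {
    # $1 = repository URL, $2 = ICU4X tag or branch (still required)
    # $3 = CLDR version, $4 = ICU release tag (optional after this change)
    cldr=${3:-43.0.0}
    icuexport=${4:-release-73-1}
    printf '%s %s\n' "$cldr" "$icuexport"
}

# Only the two required arguments: the defaults are used.
resolve_versions https://github.com/unicode-org/icu4x.git icu@1.2.0

# All four arguments: the explicit (illustrative) values win.
resolve_versions https://github.com/unicode-org/icu4x.git main 42.0.0 release-72-1
```

Because the `$# -lt 2` check is unchanged, calls with only a URL and a tag keep working and pick up the defaults.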

@@ -35,12 +39,11 @@ export LC_ALL=en_US.UTF-8
# Define all of the paths.
original_pwd=$(pwd)
top_src_dir=$(cd -- "$(dirname "$0")/.." >/dev/null 2>&1 ; pwd -P)
- data_dir=${top_src_dir}/config/external/icu4x
- data_file=${data_dir}/icu4x.postcard
+ data_dir=${top_src_dir}/intl/icu_testdata/data/baked
git_info_file=${data_dir}/ICU4X-GIT-INFO

log "Remove the old data"
- rm -f ${data_file}
+ rm -rf ${data_dir}

log "Clone ICU4X"
tmpclonedir=$(mktemp -d)
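
Note on this hunk: the cleanup step now removes a whole directory (`rm -rf ${data_dir}`) rather than a single file. The sketch below shows one defensive variant that quotes the path and refuses an empty or root path. `remove_data_dir` is a hypothetical helper and a suggestion only. It is not what the commit does.

```shell
#!/bin/sh
# Hypothetical defensive wrapper around the "rm -rf ${data_dir}" step.
remove_data_dir() {
    dir=$1
    # Refuse an empty path or the filesystem root so that a bad variable
    # cannot expand into a destructive command.
    case "$dir" in
        ''|/)
            echo "refusing to remove '${dir}'" >&2
            return 1
            ;;
    esac
    # Quote the path and end option parsing before it.
    rm -rf -- "$dir"
}

# Demonstration on a throwaway directory.
demo_root=$(mktemp -d)
mkdir -p "${demo_root}/data/baked"
remove_data_dir "${demo_root}/data/baked"
rm -rf -- "${demo_root}"
```

In the script, `data_dir` is derived from `top_src_dir`, so it should never be empty in practice. The guard only protects against future edits to the path logic.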
@@ -51,24 +54,31 @@ log ${tmpclonedir}
cd ${tmpclonedir}

log "Run the icu4x-datagen tool to regenerate the data."
- log "Saving the data to: ${data_file}"
+ log "Saving the data into: ${data_dir}"

# TODO(Bug 1741262) - Should locales be filtered as well? It doesn't appear that the existing ICU
# data builder is using any locale filtering.

# TODO(Bug 1741264) - Keys are not supported yet: https://github.com/unicode-org/icu4x/issues/192
# --keys <KEYS>...
# Include this resource key in the output. Accepts multiple arguments.
# --key-file <KEY_FILE>
# Path to text file with resource keys to include, one per line. Empty lines and
# lines starting with '#' are ignored.
- cargo run --bin icu4x-datagen -- \
- --cldr-tag $3 \
- --all-keys \
- --all-locales \
- --format blob \
- --out ${data_file} \
- -v \
+ cargo run --bin icu4x-datagen \
+ --features=bin \
+ -- \
+ --cldr-tag ${cldr} \
+ --icuexport-tag ${icuexport} \
+ --keys segmenter/dictionary/w_auto@1 \
+ --keys segmenter/grapheme@1 \
+ --keys segmenter/line@1 \
+ --keys segmenter/lstm/wl_auto@1 \
+ --keys segmenter/sentence@1 \
+ --keys segmenter/word@1 \
+ --all-locales \
+ --use-separate-crates \
+ --format mod \
+ --out ${data_dir} \

log "Record the current cloned git information to:"
log ${git_info_file}
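
Note on this hunk: `--all-keys` is replaced by six explicit `--keys` flags for the segmenter data. If the list grows, the repeated flags could be generated from a list of key names. The sketch below shows one way to do that. `key_args` is a hypothetical helper and the approach is an assumption for illustration. The commit itself spells the flags out by hand.

```shell
#!/bin/sh
# Hypothetical helper: print one "--keys <key>" pair per argument,
# separated by spaces, so the output can be spliced into a command line.
key_args() {
    for key in "$@"; do
        # "--keys" is passed as data rather than as the format string,
        # so printf never tries to parse it as an option.
        printf '%s %s ' --keys "$key"
    done
}

# The six key names are the ones listed in the hunk above.
key_args \
    segmenter/dictionary/w_auto@1 \
    segmenter/grapheme@1 \
    segmenter/line@1 \
    segmenter/lstm/wl_auto@1 \
    segmenter/sentence@1 \
    segmenter/word@1
echo
```

The output relies on word splitting when used unquoted as `$(key_args ...)`. That is safe here only because the key names contain no whitespace or glob characters.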