regen/unicode_constants.pl: Add name parameter
A future commit will want to use the first surrogate code point's UTF-8
value.  Add this to the generated macros, and give it a name, since
there is no official one.  The program has to be modified to cope with
this.
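The value the commit message refers to follows directly from the UTF-8 bit layout. Below is a minimal C sketch, assuming an ASCII platform (EBCDIC builds use UTF-EBCDIC, which the generator handles separately). The function `utf8_encode` is hypothetical and not part of Perl. It encodes a code point using the standard 1- to 4-byte patterns and deliberately does not reject surrogates, because the point is to show what U+D800 would encode to.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode cp as UTF-8 into out (room for 4 bytes) and
 * return the number of bytes written, or 0 if cp is above U+10FFFF.
 * Surrogates (U+D800..U+DFFF) are NOT rejected here; strict UTF-8
 * (RFC 3629) forbids them, but this sketch only shows the bit layout. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                    /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* 11110xxx 10xxxxxx x3 */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For U+D800 this yields the three bytes `ED A0 80`. Only the first byte, `0xED`, is what the new `FIRST_SURROGATE_UTF8_FIRST_BYTE` macro captures.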
Karl Williamson committed Sep 14, 2012
1 parent a0e786e commit 765ec46
Showing 2 changed files with 12 additions and 3 deletions.
regen/unicode_constants.pl (11 additions, 3 deletions)
@@ -44,6 +44,9 @@ END
# native indicates that the output is the code point, converted to the
# platform's native character set if applicable
#
+# If the code point has no official name, the desired name may be appended
+# after the flag, which will be ignored if there is an official name.
+#
# This program is used to make it convenient to create compile time constants
# of UTF-8, and to generate proper EBCDIC as well as ASCII without manually
# having to figure things out.
@@ -56,14 +59,16 @@ END

chomp;
unless ($_ =~ m/ ^ ( [^\ ]* ) # Name or code point token
-(?: [\ ]+ ( .* ) )? # optional flag
+(?: [\ ]+ ( [^ ]* ) )? # optional flag
+(?: [\ ]+ ( .* ) )? # name if unnamed; flag is required
/x)
{
die "Unexpected syntax at line $.: $_\n";
}

my $name_or_cp = $1;
my $flag = $2;
+my $desired_name = $3;

my $name;
my $cp;
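The revised regex splits each data line into up to three blank-separated fields: the name or code point, an optional flag, and (new in this commit) an optional desired name that takes the rest of the line. The following is a rough C analogue of that split, for illustration only. The struct and function names are hypothetical, the fixed buffer sizes are an assumption, and the real program does this with the Perl regex above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the regex captures $1, $2 and $3. */
struct const_line {
    char name_or_cp[64];    /* $1: name or hex code point */
    char flag[32];          /* $2: e.g. "string", "tail", "first", "native" */
    char desired_name[64];  /* $3: used only if there is no official name */
};

/* Copy len bytes starting at src into dst, truncating to fit n bytes. */
static void copy_field(char *dst, size_t n, const char *src, size_t len)
{
    if (len >= n)
        len = n - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Split line as  ^ ([^ ]*) (?: +([^ ]*))? (?: +(.*))?  does.
 * Returns 1 if the first field is non-empty, 0 otherwise. */
int parse_const_line(const char *line, struct const_line *out)
{
    const char *p = line;
    size_t len;

    out->name_or_cp[0] = out->flag[0] = out->desired_name[0] = '\0';

    len = strcspn(p, " ");                  /* first field: [^ ]* */
    copy_field(out->name_or_cp, sizeof out->name_or_cp, p, len);
    p += len;

    if (*p == ' ') {                        /* optional flag: [^ ]* */
        p += strspn(p, " ");
        len = strcspn(p, " ");
        copy_field(out->flag, sizeof out->flag, p, len);
        p += len;
    }
    if (*p == ' ') {                        /* optional name: rest of line */
        p += strspn(p, " ");
        copy_field(out->desired_name, sizeof out->desired_name, p, strlen(p));
    }
    return out->name_or_cp[0] != '\0';
}
```

Switching the flag capture from `.*` to `[^ ]*` is what lets a third field exist at all: with `.*`, the flag group would swallow `first FIRST_SURROGATE` whole.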
@@ -77,11 +82,13 @@ END
}
else {
$cp = $name_or_cp;
-$name = charnames::viacode("0$cp"); # viacode requires a leading zero
-# to be sure that the argument is hex
+$name = charnames::viacode("0$cp") // ""; # viacode requires a leading
+# zero to be sure that the
+# argument is hex
die "Unknown code point '$cp' at line $.: $_\n" unless defined $cp;
}

+$name = $desired_name if $name eq "";
$name =~ s/ /_/g; # The macro name can have no blanks in it

my $str = join "", map { sprintf "\\x%02X", $_ }
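The naming rule these lines implement is: use the official Unicode name if `viacode` returned one (the `// ""` turns "no name" into an empty string), otherwise fall back to the desired name from the data line, then replace blanks with underscores so the result is a valid macro name. A small C sketch of the same rule follows. `macro_base_name` is a hypothetical name, and treating a NULL desired name as empty is an assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical helper: choose the official name if non-empty, else the
 * desired name, and copy it into out with blanks turned into underscores.
 * A NULL desired name is treated as empty (assumption of this sketch). */
void macro_base_name(const char *official, const char *desired,
                     char *out, size_t n)
{
    const char *src = (official != NULL && official[0] != '\0')
                          ? official
                          : (desired != NULL ? desired : "");
    size_t i;

    if (n == 0)
        return;
    for (i = 0; i + 1 < n && src[i] != '\0'; i++)
        out[i] = (src[i] == ' ') ? '_' : src[i];
    out[i] = '\0';
}
```

As the new header comment says, a desired name is simply ignored when an official name exists.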
@@ -128,6 +135,7 @@ END
03C5 tail
2010 string
+D800 first FIRST_SURROGATE
007F native
00DF native
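Each data line maps to one `#define` in `unicode_constants.h`, with the flag selecting the form: `string` gives all UTF-8 bytes, `tail` gives the bytes after the first, `first` gives only the first byte as a number, and `native` gives the code point itself. The following C sketch reproduces the output lines shown in the header excerpt below, assuming an ASCII platform (so "native" equals the code point) and single-space formatting. `emit_define` and `encode_utf8` are hypothetical names; the actual generator is the Perl program in this diff.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Standard UTF-8 encoding (ASCII platform); returns byte count, 0 if invalid. */
static size_t encode_utf8(uint32_t cp, unsigned char u[4])
{
    if (cp < 0x80)    { u[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800)   { u[0] = (unsigned char)(0xC0 | (cp >> 6));
                        u[1] = (unsigned char)(0x80 | (cp & 0x3F)); return 2; }
    if (cp < 0x10000) { u[0] = (unsigned char)(0xE0 | (cp >> 12));
                        u[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
                        u[2] = (unsigned char)(0x80 | (cp & 0x3F)); return 3; }
    if (cp <= 0x10FFFF) {
        u[0] = (unsigned char)(0xF0 | (cp >> 18));
        u[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        u[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        u[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Write one #define line for (name, cp, flag) into buf.
 * Returns 1 on success, 0 for an unknown flag or invalid code point. */
int emit_define(const char *name, uint32_t cp, const char *flag,
                char *buf, size_t n)
{
    unsigned char u[4];
    char bytes[4 * 4 + 1] = "";         /* up to four "\xHH" escapes */
    size_t len = encode_utf8(cp, u), i;

    if (len == 0)
        return 0;
    if (strcmp(flag, "first") == 0) {
        snprintf(buf, n, "#define %s_UTF8_FIRST_BYTE 0x%02X /* U+%04X */",
                 name, (unsigned)u[0], (unsigned)cp);
        return 1;
    }
    if (strcmp(flag, "native") == 0) {  /* ASCII platform assumed */
        snprintf(buf, n, "#define %s_NATIVE 0x%04X /* U+%04X */",
                 name, (unsigned)cp, (unsigned)cp);
        return 1;
    }
    if (strcmp(flag, "string") == 0 || strcmp(flag, "tail") == 0) {
        int tail = strcmp(flag, "tail") == 0;
        for (i = tail ? 1 : 0; i < len; i++)
            sprintf(bytes + strlen(bytes), "\\x%02X", (unsigned)u[i]);
        snprintf(buf, n, "#define %s_UTF8%s \"%s\" /* U+%04X */",
                 name, tail ? "_TAIL" : "", bytes, (unsigned)cp);
        return 1;
    }
    return 0;
}
```

With the new line `D800 first FIRST_SURROGATE`, the name comes from the third field because U+D800 has no official name, and the `first` flag produces `#define FIRST_SURROGATE_UTF8_FIRST_BYTE 0xED /* U+D800 */`.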

unicode_constants.h (1 addition, 0 deletions)
@@ -31,6 +31,7 @@
#define GREEK_SMALL_LETTER_UPSILON_UTF8_TAIL "\x85" /* U+03C5 */

#define HYPHEN_UTF8 "\xE2\x80\x90" /* U+2010 */
+#define FIRST_SURROGATE_UTF8_FIRST_BYTE 0xED /* U+D800 */

#define DELETE_NATIVE 0x007F /* U+007F */
#define LATIN_SMALL_LETTER_SHARP_S_NATIVE 0x00DF /* U+00DF */
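The commit message only says that a later commit needs this value; it does not say how it will be used. One plausible use, sketched here purely as an illustration, is a fast check for whether a UTF-8 sequence encodes a surrogate. All surrogates U+D800..U+DFFF encode as `ED A0..BF 80..BF`, while `ED 80..9F` begins ordinary code points U+D000..U+D7FF, so the first byte alone is not enough. The helper name is hypothetical, and the fallback `#define` copies the ASCII-platform value from the header above so that the sketch compiles on its own.

```c
#include <stddef.h>

#ifndef FIRST_SURROGATE_UTF8_FIRST_BYTE
#define FIRST_SURROGATE_UTF8_FIRST_BYTE 0xED /* U+D800, ASCII platform */
#endif

/* Hypothetical helper: does s begin with the 3-byte UTF-8 form of a
 * surrogate (U+D800..U+DFFF)?  Requires the first byte to be 0xED and the
 * second to lie in A0..BF; ED 80..9F starts U+D000..U+D7FF instead. */
int starts_with_utf8_surrogate(const unsigned char *s, size_t len)
{
    return len >= 3
        && s[0] == FIRST_SURROGATE_UTF8_FIRST_BYTE
        && s[1] >= 0xA0 && s[1] <= 0xBF
        && (s[2] & 0xC0) == 0x80;       /* third byte is a continuation */
}
```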