diff-highlight: do not split multibyte characters
When the input is UTF-8 and Perl is operating on bytes instead of
characters, a diff that changes one multibyte character to another
that shares an initial byte sequence will result in a broken diff
display as the common byte sequence prefix will be separated from
the rest of the bytes in the multibyte character.

For example, if a single line contains only the Unicode character
U+C9C4 (encoded in UTF-8 as 0xEC, 0xA7, 0x84) and that line is then
changed to the Unicode character U+C9C0 (encoded in UTF-8 as 0xEC,
0xA7, 0x80), then when operating on bytes diff-highlight will show
only the single-byte change from 0x84 to 0x80, thus creating invalid
UTF-8 and a broken diff display.
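
The byte-level view of that example can be reproduced with a minimal sketch. It is an illustration only, not part of diff-highlight; the variable names are made up here, and the two strings are the raw UTF-8 bytes quoted above.

```perl
use strict;
use warnings;

# The two characters from the example, given as raw UTF-8 bytes.
my $old = "\xEC\xA7\x84";   # U+C9C4
my $new = "\xEC\xA7\x80";   # U+C9C0

# Split into bytes, which is what happens when the string is not decoded.
my @old_bytes = split //, $old;
my @new_bytes = split //, $new;

# Count how many leading bytes the two lines share.
my $prefix = 0;
$prefix++ while $prefix < @old_bytes
    && $prefix < @new_bytes
    && $old_bytes[$prefix] eq $new_bytes[$prefix];

printf "shared prefix: %d bytes, differing tail: %s vs %s\n",
    $prefix,
    join(' ', map { sprintf '0x%02X', ord } @old_bytes[$prefix .. $#old_bytes]),
    join(' ', map { sprintf '0x%02X', ord } @new_bytes[$prefix .. $#new_bytes]);
```

Highlighting only the differing tail would cut each three-byte character after its second byte, which is the broken display described above.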

Fix this by putting Perl into character mode when splitting the line
and then back into byte mode after the split is finished.
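
As a rough sketch of that approach (not the code from this commit), the helper below decodes, splits, and re-encodes. The name split_chars is hypothetical, and the byte-wise fallback for input that is not valid UTF-8 is an assumption modelled on the description above.

```perl
use strict;
use warnings;

# Hypothetical helper: split a byte string into characters when it is
# valid UTF-8, otherwise fall back to splitting it into bytes.
sub split_chars {
    my $s = shift;                  # work on a copy of the caller's string
    if (utf8::decode($s)) {         # true when $s held well-formed UTF-8
        # Split on characters, then turn each one back into bytes.
        return map { my $c = $_; utf8::encode($c); $c } split //, $s;
    }
    return split //, $s;            # not valid UTF-8: split byte by byte
}

my @parts = split_chars("a\xEC\xA7\x84b");
print scalar(@parts), " elements\n";   # "a", the 3-byte character, "b"
```

Each returned element is again a byte string, so the rest of a byte-oriented script can keep working with it unchanged.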

The utf8::xxx functions require Perl 5.8 so we require that as well.

Also, since we are mucking with code in the split_line function, we
change a '*' quantifier to a '+' quantifier when matching the $COLOR
expression, which has the side effect of speeding everything up while
eliminating useless '' elements in the returned array.
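
The effect of the quantifier on split can be seen in a small sketch. The $COLOR pattern here is an assumed stand-in that matches an ANSI color escape sequence, and the sample line is invented for illustration.

```perl
use strict;
use warnings;

# Assumed stand-in for the script's $COLOR pattern: an ANSI SGR escape.
my $COLOR = qr/\x1b\[[0-9;]*m/;
my $line  = "ab\x1b[31mcd";

my @star = split /($COLOR*)/, $line;   # '*' can also match the empty string
my @plus = split /($COLOR+)/, $line;   # '+' needs at least one escape sequence

# Count the empty strings each variant hands back.
my $empty_star = grep { $_ eq '' } @star;
my $empty_plus = grep { $_ eq '' } @plus;

print "star: ", scalar(@star), " elements, $empty_star empty\n";
print "plus: ", scalar(@plus), " elements, $empty_plus empty\n";
```

With '*', the pattern matches between ordinary characters too, so split returns empty captured separators; with '+', only real escape sequences act as separators.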

Reported-by: Yi EungJun <[email protected]>
Signed-off-by: Kyle J. McKay <[email protected]>
Acked-by: Jeff King <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
mackyle authored and gitster committed Apr 4, 2015
1 parent 3759d27 commit 8d00662
Showing 1 changed file with 7 additions and 2 deletions.
9 changes: 7 additions & 2 deletions contrib/diff-highlight/diff-highlight
@@ -1,5 +1,6 @@
 #!/usr/bin/perl
 
+use 5.008;
 use warnings FATAL => 'all';
 use strict;
 
@@ -160,8 +161,12 @@ sub highlight_pair {
 
 sub split_line {
 	local $_ = shift;
-	return map { /$COLOR/ ? $_ : (split //) }
-	       split /($COLOR*)/;
+	return utf8::decode($_) ?
+		map { utf8::encode($_); $_ }
+			map { /$COLOR/ ? $_ : (split //) }
+			split /($COLOR+)/ :
+		map { /$COLOR/ ? $_ : (split //) }
+		split /($COLOR+)/;
 }
 
 sub highlight_line {