.Dd May 14, 2015
.Dt UTFDECODE 1
.Sh NAME
.Nm utfdecode
.Nd Unicode decoder, encoder and debugger
.Sh SYNOPSIS
.Nm utfdecode
.Bk -words
.Op Fl hlmnoqst
.Op Fl e Ar format
.Op Fl d Ar format
.Ek
.Sh DESCRIPTION
The
.Nm utfdecode
utility reads input from stdin, decodes it, and writes the re-encoded result to stdout.
It can be used to debug valid and invalid UTF-encoded content,
as well as to clean up and convert between the UTF-8, UTF-16 and UTF-32 formats.
.Pp
The following options are available:
.Bl -tag -width Ds
.It Fl d Ar format , Fl Fl decode-format Ns = Ns Ar format
Determine how input should be decoded.
The format can be 'utf8' (default) to decode UTF-8
or 'codepoint' to decode textual U+XXXX code points.
.It Fl e Ar format , Fl Fl encode-format Ns = Ns Ar format
Encode the output in the specified format, such as 'utf8', 'utf16be', 'utf32be' or 'silent'.
.It Fl l Ar limit , Fl Fl limit Ns = Ns Ar limit
Limit the decoding to the specified number of bytes.
.It Fl m Ar handling , Fl Fl malformed Ns = Ns Ar handling
Specify what should happen on decoding errors: 'ignore' to skip invalid input,
\&'replace' to substitute the Unicode replacement character (U+FFFD),
or 'abort' to exit immediately with exit value 65.
.It Fl o Ar offset , Fl Fl offset Ns = Ns Ar offset
Skip the specified number of bytes before starting decoding.
.It Fl q , Fl Fl quiet-errors
Do not log errors to stderr.
.It Fl s , Fl Fl summary
Print a summary at the end of decoding showing the number of bytes, characters and decoding errors.
.El
.Sh EXIT STATUS
The
.Nm utfdecode
utility exits with one of the following values:
.Pp
.Bl -tag -width flag -compact
.It Li 0
There was no decoding error.
.It Li 64
There was invalid command line usage.
.It Li 65
There was at least one decoding error.
.It Li 66
The specified input file could not be opened.
.It Li 74
There was an input/output error.
.El
.Sh EXAMPLES
Show terminal input as keys are pressed, writing a newline after each read (key press):
.Pp
.Dl $ utfdecode -n
.Pp
Check a file for UTF-8 encoding errors, listing all the errors:
.Pp
.Dl $ utfdecode -e silent myfile
.Pp
Debug the UTF-8 encoding of 100 bytes starting at byte offset 1024:
.Pp
.Dl $ utfdecode -l 100 -o 1024 myfile
.Pp
See a summary of how many bytes, Unicode characters and encoding errors there
are on the main page of the English Wikipedia:
.Pp
.Dl $ curl -s http://en.wikipedia.org/wiki/Main_Page | utfdecode -e silent -s
.Pp
Validate a file in a shell script without producing output:
.Pp
.Dl $ if utfdecode -e silent -q myfile; then process_file; else handle_error; fi
.Pp
Convert a file from UTF-16BE to UTF-8:
.Pp
.Dl $ utfdecode -d utf16be -e utf8 myfile
.Pp
See how the Unicode character
.Sq U+8278
looks when written to the terminal in UTF-8:
.Pp
.Dl $ echo U+8278 | utfdecode -d codepoint -e utf8
.Pp
Compare how many bytes a file would be in different encodings:
.Pp
.Dl $ for e in utf8 utf16be utf32be; do echo "$e:"; utfdecode -e $e myfile | wc -c; done
.Sh SEE ALSO
.Xr iconv 1
.Sh AUTHORS
.An Fredrik Fornwall Aq Mt [email protected]