Python has immutable strings of Unicode code points, `str`, and of 8-bit bytes, `bytes`, both of which are sequences as well as having further specialized methods. Though `b[0]` of a `bytes` or similar returns an `int`, there's no separate char type; `s[0]` of a `str` produces another `str` of length 1.
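A minimal sketch of the indexing difference described above; the sample text and variable names are only illustrative.

```python
s = "h\u00e9llo"            # 'héllo': five code points
b = s.encode("utf-8")       # six bytes, since 'é' needs two in UTF-8

first_char = s[0]           # indexing a str gives a str of length 1
first_byte = b[0]           # indexing bytes gives an int in range(256)
byte_slice = b[0:1]         # slicing bytes keeps the bytes type

print(type(first_char).__name__, repr(first_char))
print(type(first_byte).__name__, first_byte)
print(len(s), len(b))
```

Slicing rather than indexing is the usual way to get a one-byte `bytes` object.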
Other binary sequence types include:
- `bytearray`: Mutable counterpart to `bytes`. No string literal constructor but otherwise all the same methods plus mutators.
- `memoryview`: Memory buffers to access internal data of objects supporting the (C-level) buffer protocol.
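As a rough illustration of these two types, the sketch below mutates a `bytearray` in place and then reads and writes it through a `memoryview`. The buffer contents are made up for the example.

```python
buf = bytearray(b"hello")
buf[0] = ord("j")            # item assignment takes an int
buf.extend(b"!!")            # in-place mutator that bytes does not have

view = memoryview(buf)       # shares memory with buf; nothing is copied
window_bytes = bytes(view[1:4])   # copy out a slice: b'ell'

view[1] = ord("E")           # writes through to the underlying bytearray
view.release()               # release the buffer so buf can be resized again

print(buf)
```

Note that a `bytearray` cannot be resized while a `memoryview` on it is still active, which is why the view is created after `extend()` here.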
- `str(obj='')`
- `str(obj=b'', encoding='utf-8', errors='strict')`
- `bytes(10)`: Zero-filled byte string of the given length.
- `bytes(range(20))`: From iterable of integers 0 ≤ i < 256.
- `bytes(b'abc')`: Copy of binary data via buffer protocol.
- `bytes.fromhex('2Ef0 F1F2')`: ASCII hex representation, skipping whitespace.
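The constructor forms above can be tried directly; this is a small sketch with arbitrary sample values.

```python
zeros = bytes(4)                        # b'\x00\x00\x00\x00'
from_ints = bytes(range(3))             # b'\x00\x01\x02'
copied = bytes(bytearray(b"abc"))       # copy via the buffer protocol
from_hex = bytes.fromhex("2Ef0 F1F2")   # b'.\xf0\xf1\xf2'

decoded = str(b"caf\xc3\xa9", encoding="utf-8")   # decodes to 'café'
as_repr = str(b"abc")                   # no encoding given: the repr, "b'abc'"

try:
    bytes([256])                        # values must satisfy 0 <= i < 256
except ValueError as exc:
    print("rejected:", exc)
```

Calling `str()` on bytes without an encoding does not decode; it returns the printable representation, which is rarely what is wanted.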
Literals are quoted with single (`'`) or double (`"`) quotes; each allows the other in its string. Triple-quoted strings (`'''` or `"""`) may span multiple lines. Adjacent string literals are concatenated into a single string.
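A short sketch of the quoting rules; the strings themselves are placeholders.

```python
single = 'He said "hi"'        # double quotes allowed inside single quotes
double = "It's fine"           # and vice versa
multi = """line one
line two"""                    # triple quotes may span lines
joined = ("adjacent literals "
          'are joined '
          "into one string")   # concatenated by the compiler, no + needed

print(joined)
```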
String literals may be prefixed with case-insensitive one-character prefixes to change their interpretation:
- `b`: Produce a `bytes` instead of a `str`. Only ASCII chars (code points < 128) and backslash escape sequences allowed.
- `r`: Raw string or bytes; backslashes are interpreted literally. (Not usable with `u`.)
- `u`: Unicode literal. Does nothing in Python ≥3.3; in Python 2, where `str` is the equivalent of `bytes`, reads string literal as Unicode instead.
- `f`: (≥3.6) Formatted string literal. Cannot be combined with `b` or `u`.
More, including escape code list, at String and Bytes literals.
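The prefixes can be compared side by side; this sketch uses invented sample values.

```python
raw = r"C:\new\table"          # backslashes kept as-is
cooked = "C:\\new\\table"      # same text written with escapes
data = b"caf\xc3\xa9"          # bytes literal: ASCII plus escapes only
raw_bytes = rb"\x00"           # raw bytes: backslash, 'x', '0', '0'
legacy = u"text"               # accepted, same as "text" in Python 3

name = "world"
greeting = f"hello {name!r}"   # formatted string literal

print(raw, len(data), len(raw_bytes), greeting)
```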
All methods below apply to both character and byte strings (`str` and `bytes`) unless otherwise indicated. Methods that assume chars (e.g., `capitalize`) assume ASCII in bytestrings. Methods available on immutable objects always return a new copy, even when called on a mutable object (e.g., `bytearray.replace()`).
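A small sketch of both points, using made-up data: `bytearray.replace()` leaves the original untouched, and `capitalize()` on bytes only understands ASCII letters.

```python
buf = bytearray(b"hello world")
replaced = buf.replace(b"world", b"there")   # new bytearray; buf is unchanged

text = "\u00e9lan"                           # 'élan'
encoded = text.encode("utf-8")
str_cap = text.capitalize()                  # 'Élan': Unicode-aware
bytes_cap = encoded.capitalize()             # first byte is not ASCII: no change

print(buf, replaced)
print(str_cap, bytes_cap == encoded)
```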
- `t [not] in s`: Subsequence test, e.g., `'bar' in 'foobarbaz'` is `True`.
- `s + t`: Concatenation returning new object. For better efficiency, use `''.join([s, t, ...])` (a single iterable argument) or write to `io.StringIO`.
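A sketch of the alternatives to repeated `+`, with an illustrative list of parts. `join()` takes one iterable, and `io.StringIO` accumulates writes in memory.

```python
import io

parts = ["alpha", "beta", "gamma"]

joined = "".join(parts)            # one pass, one new string
listed = ", ".join(parts)          # the separator is the string join is called on

out = io.StringIO()
for part in parts:                 # incremental building without quadratic copying
    out.write(part)
    out.write("\n")
built = out.getvalue()

contains = "bar" in "foobarbaz"    # subsequence test
print(joined, listed, contains)
```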
Encoding:
- `decode(encoding='utf-8', errors='strict')`: Returns `str` decoded from `bytes` read as encoding. errors may be `strict` (raises `UnicodeError`), `ignore`, `replace`, etc.; see codec error handlers.
- `encode(encoding='utf-8', errors='strict')`: Returns `bytes` object encoded from `str`.
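The error handlers are easiest to see on text that does not fit the target encoding. This is a sketch with sample strings chosen for that purpose.

```python
text = "na\u00efve caf\u00e9"                  # 'naïve café'

utf8 = text.encode("utf-8")
replaced = text.encode("ascii", errors="replace")   # b'na?ve caf?'
ignored = text.encode("ascii", errors="ignore")     # b'nave caf'

try:
    text.encode("ascii")                       # errors='strict' is the default
except UnicodeError as exc:
    error_name = type(exc).__name__            # UnicodeEncodeError, a subclass

bad = b"caf\xe9"                               # Latin-1 bytes, invalid as UTF-8
lenient = bad.decode("utf-8", errors="replace")     # 'caf' + U+FFFD
latin = bad.decode("latin-1")                       # 'café'

print(utf8, replaced, ignored, error_name, lenient, latin)
```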
Character Class Predicates (all chars must match and len ≥ 1; `isprintable`, `isdecimal`, `isnumeric` and `isidentifier` are `str` only, the others also exist on `bytes` with ASCII semantics):
- `isprintable()`: Includes space but not other whitespace; true if empty as well
- `isspace()`: Whitespace
- `isalnum()`: Is alpha, decimal, digit or numeric
- `isalpha()`: Unicode 'Letter' (not Unicode 'Alphabetic')
- `isdecimal()`: Chars form numbers in base 10
- `isnumeric()`: Includes e.g., fractions
- `isdigit()`: Includes non-decimal, e.g., superscripts, Kharosthi numbers
- `istitle()`: Cased chars after uncased chars upper, all else lower
- `isupper()`, `islower()`: Must include a cased character
- `isidentifier()`: According to Python language def; also see `keyword.iskeyword()`
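The differences between the numeric predicates are subtle, so here is a sketch on a few hand-picked characters. The sample characters are this example's choice, not part of the original notes.

```python
import keyword

five = "5"
superscript_two = "\u00b2"    # '²'
one_half = "\u00bd"           # '½'

# (isdecimal, isdigit, isnumeric) for each sample
flags = {
    ch: (ch.isdecimal(), ch.isdigit(), ch.isnumeric())
    for ch in (five, superscript_two, one_half)
}

empty_printable = "".isprintable()      # True: the one predicate true when empty
empty_alpha = "".isalpha()              # False: needs at least one char
tab_printable = "\t".isprintable()      # False: only space counts as printable
no_cased = "123".isupper()              # False: no cased character present
is_name = "class".isidentifier()        # True syntactically...
is_reserved = keyword.iskeyword("class")  # ...but reserved

print(flags)
```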
String Predicates (the methods take optional start and end indexes):
- s₁ `in` s₂
- `startswith(s)`, `endswith(s)`
- `count(s)`: Count of non-overlapping s
Indexing (all take optional start and end indexes):
- `find(s)`, `rfind(s)`: Returns lowest/highest index of s, or -1 when not found
- `index(s)`, `rindex(s)`: As find but raise `ValueError` when not found
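A sketch of the predicate and indexing methods on one sample word, including the optional start index.

```python
s = "banana"

occurrences = s.count("ana")       # 1: matches do not overlap
lowest = s.find("an")              # 1
highest = s.rfind("an")            # 3
after_two = s.find("an", 2)        # 3: search starts at index 2
absent = s.find("x")               # -1 rather than an exception
prefix_at_two = s.startswith("nan", 2)   # True: test begins at index 2

try:
    s.index("x")                   # like find(), but raises
except ValueError:
    index_raised = True

print(occurrences, lowest, highest, after_two, absent, prefix_at_two)
```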
Modification:
- `lstrip(cs)`, `rstrip(cs)`, `strip(cs)`: Remove leading/trailing/both chars of set made from string cs, default whitespace
- `replace(old, new[, count])`: Replace substring old with new, at most count times if count is given
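Because the `strip` argument is a set of characters rather than a prefix or suffix, results can be surprising. A short sketch with sample inputs:

```python
trimmed = "  padded \n".strip()                     # 'padded'
host = "www.example.com".strip("cmowz.")            # 'example': cs is a char set
left_only = "xxhixx".lstrip("x")                    # 'hixx'
right_only = "xxhixx".rstrip("x")                   # 'xxhi'
limited = "a-b-c-d".replace("-", "+", 2)            # 'a+b+c-d': only first two

print(trimmed, host, left_only, right_only, limited)
```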
Case modification:
- `upper()`, `lower()`
- `swapcase()`: Not necessarily reversible
- `capitalize()`: First char capitalized; rest lowered
- `title()`: First letter after each non-letter uppered, rest lowered; can produce weird results (e.g., after apostrophes)
- `casefold()`: (≥3.3) More aggressive "lower casing" as per section 3.13 of the Unicode Standard.
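The caveats above show up with a few well-known inputs; this sketch uses the German sharp s and an apostrophe example.

```python
sharp = "\u00df"                             # 'ß'

swapped = sharp.swapcase()                   # 'SS'
swapped_back = swapped.swapcase()            # 'ss', not the original 'ß'

titled = "they're bill's".title()            # "They'Re Bill'S"
capitalized = "hELLO wORLD".capitalize()     # 'Hello world'

lowered = "Stra\u00dfe".lower()              # 'straße': ß is already lowercase
folded = "Stra\u00dfe".casefold()            # 'strasse': suited to caseless matching

print(swapped, swapped_back, titled, capitalized, lowered, folded)
```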
Padding:
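- `expandtabs(tabsize=8)`: Tabs replaced by spaces up to the next multiple of tabsize; column 0 at start of string and after each newline
- `center(width, fillchar=' ')`
- `ljust(width, fillchar=' ')`
- `rjust(width, fillchar=' ')`
- `zfill(width)`: Pad with `0` between sign and digits; sign included in width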
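A sketch of the padding methods on small sample strings, chosen so the widths are easy to count.

```python
expanded = "a\tbc\td".expandtabs(4)   # 'a   bc  d': tab stops at columns 4 and 8
centered = "ab".center(6, "*")        # '**ab**'
left = "ab".ljust(4, ".")             # 'ab..'
right = "ab".rjust(4, ".")            # '..ab'
negative = "-42".zfill(6)             # '-00042': zeros go after the sign
positive = "+7".zfill(4)              # '+007': the sign counts toward width

print(repr(expanded), centered, left, right, negative, positive)
```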
Splitting:
- `partition(sep)`, `rpartition(sep)`: Return a 3-tuple of `(pre, sep, post)`; if sep is not found, `partition` returns `(str, '', '')` and `rpartition` returns `('', '', str)`
- `split(sep=None, maxsplit=-1)`, `rsplit(sep=None, maxsplit=-1)`:
  - sep=None separates with runs of consecutive whitespace; leading/trailing whitespace is removed
  - Consecutive non-None seps delimit empty strings
  - Returns unlimited if maxsplit is -1, or no more than maxsplit+1 elements
- `splitlines(keepends=False)`: Splits on `\r`, `\n`, `\r\n`, `\v`, `\f`, `\x1c`, `\x1d`, `\x1e` (file/group/record separator), `\x85` (next line C1), `\u2028` (line sep), `\u2029` (para sep); `bytes` split only on `\r`, `\n` and `\r\n`
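The splitting rules can be sketched as follows; the inputs are arbitrary examples.

```python
found = "key=a=b".partition("=")          # ('key', '=', 'a=b')
rfound = "key=a=b".rpartition("=")        # ('key=a', '=', 'b')
missing = "plain".partition("=")          # ('plain', '', '')
rmissing = "plain".rpartition("=")        # ('', '', 'plain')

words = "  a  b \t c ".split()            # whitespace runs; ends trimmed
fields = "a,,b".split(",")                # ['a', '', 'b']: empty field kept
first = "a,b,c".split(",", 1)             # ['a', 'b,c']: at most 2 elements
last = "a,b,c".rsplit(",", 1)             # ['a,b', 'c']: splits from the right

lines = "one\ntwo\r\nthree\x0cfour".splitlines()        # form feed also splits
kept = "one\ntwo\r\n".splitlines(keepends=True)         # terminators retained

print(found, rfound, words, fields, first, last, lines, kept)
```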
Other:
- `join(iterable)`: Concatenation of iterable separated by string providing this method.
- `maketrans(x, y=None, z=None)`: Make translation table
  - 1 arg: dict mapping ints of Unicode code points or chars to Unicode code points, chars, strings or None
  - 2 args: strings of equal length
  - 3 args: as 2, but 3rd arg is chars to delete
- `translate(table)`: Chars translated through `maketrans` table
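A sketch of the three `maketrans` call forms feeding `translate`. The mappings are invented for the example.

```python
# 1 arg: dict keys may be single chars or code points; None deletes
by_dict = str.maketrans({"a": "4", ord("e"): "3", "!": None})
leet = "leet speak!".translate(by_dict)        # 'l33t sp34k'

# 2 args: equal-length strings, mapped position by position
by_pairs = str.maketrans("abc", "xyz")
swapped = "aabbcc".translate(by_pairs)         # 'xxyyzz'

# 3 args: as above, plus characters to delete
with_delete = str.maketrans("abc", "xyz", "-")
cleaned = "a-b-c".translate(with_delete)       # 'xyz'

# bytes has its own two-argument maketrans
byte_table = bytes.maketrans(b"abc", b"xyz")
byte_result = b"cab".translate(byte_table)     # b'zxy'

print(leet, swapped, cleaned, byte_result)
```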
- `f'...'`, `F'...'`: (≥3.6) Formatted string literals or f-strings (generally the fastest formatter)
- `format(*args, **kwargs)`: See format() String Syntax below
- `format_map(mapping)`: Specifications `{key}` in the `str` are looked up in mapping and replaced by the returned values.
- s `%` values: Not recommended. See printf-string and printf-bytes (≥3.5) for more info.
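The four approaches side by side, as a rough sketch. The `Default` class is a hypothetical helper defined only for this example, showing why `format_map` can be useful: the mapping is used directly, so a `dict` subclass can supply missing keys.

```python
name, count = "Ada", 3

via_fstring = f"{name} has {count} items"
via_format = "{} has {} items".format(name, count)
via_percent = "%s has %d items" % (name, count)


class Default(dict):
    """Hypothetical helper: leave unknown keys as visible placeholders."""

    def __missing__(self, key):
        return "{" + key + "}"


# format_map does not copy the mapping, so __missing__ is honoured
partial = "{name} has {count} items".format_map(Default(name=name))

print(via_fstring, "|", partial)
```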
The string is literal text, doubled braces `{{` and `}}` for literal braces, and substitutions of the form `{spec}`. spec renders values from the arguments to `str.format()`:
- `{0}` or another number to use a positional argument.
- `{}` alone may be used (multiple times) for auto-numbered positionals if no numbered specs are used.
- `{name}` to use named argument name. If calling as `format(**mapping)`, consider using `str.format_map(mapping)` instead.
- `{name.attr}` to use attribute attr of argument name.
- `{name[n]}`, `{name.attr[n]}` to use index n of name or name.attr.
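Each field form above, sketched with a throwaway object. `SimpleNamespace` is used here only as a convenient attribute holder; the field names are illustrative.

```python
from types import SimpleNamespace

pt = SimpleNamespace(x=1, tags=["a", "b"])

numbered = "{0} {1} {0}".format("a", "b")          # 'a b a'
auto = "{} {}".format("a", "b")                    # 'a b'
named = "{who} {}".format("hi", who="Ada")         # auto and named may be mixed
attribute = "{pt.x}".format(pt=pt)                 # '1'
indexed = "{pt.tags[1]}".format(pt=pt)             # 'b'
by_key = "{d[key]}".format(d={"key": "val"})       # string keys are unquoted
braces = "{{{}}}".format("x")                      # '{x}'

try:
    "{} {0}".format("a")        # auto and explicit numbering cannot be mixed
except ValueError:
    mixing_rejected = True

print(numbered, auto, named, attribute, indexed, by_key, braces)
```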
Optionally append `!conversion` to apply a specific conversion to the value: `s` for `str()`, `r` for `repr()` and `a` for `ascii()`. The last escapes non-ASCII values using `\x` (hex 00-FF), `\u` (16-bit, four hex digits) and `\U` (32-bit, eight hex digits) escapes.
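A sketch of the three conversions, with sample characters picked to trigger each escape width.

```python
word = "caf\u00e9"                              # 'café'

as_str = "{!s}".format(word)                    # café
as_repr = "{!r}".format(word)                   # 'café', quotes included
as_ascii = "{!a}".format(word)                  # 'caf\xe9'
euro = "{!a}".format("\u20ac")                  # '\u20ac': needs four hex digits
emoji = "{!a}".format("\U0001f600")             # '\U0001f600': needs eight

combined = f"{word!r:>10}"                      # conversion first, then the spec

print(as_str, as_repr, as_ascii, euro, emoji, combined)
```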
Optionally append `:format` for further formatting including field width, alignment and padding, numeric formatting and grouping, etc. The format characters are in order:
- fill: Any character (default space).
- align: `<` left, `>` right, `^` centered, `=` padding after sign.
- sign: Determines sign on positive numbers: `-` none (default), `+`, or a space.
- `#`: Alternate form for conversion: `0b` `0o` `0x` prefix on ints, always decimal point on other numbers and trailing zeros left on floats.
- width: One or more digits for minimum field width; leading `0` for fill `0` and `=` padding.
- grouping: `_` (≥3.6) or `,` to separate thousands in the integer digits.
- `.`precision: Max field size, or precision for floats.
- type: Default `s` (str) or none. Other values (uppercase converts `nan` to `NAN`, etc.):
  - `e`, `E`: Exponent notation, default precision 6.
  - `f`, `F`: Fixed-point notation, default precision 6.
  - `g`, `G`: "General"; rounds to precision and uses `f` or `e`.
  - `n`: "Number"; as `g` but inserts locale separators.
  - `%`: Percentage in `f` format.
  - None: similar to `g` but at least one digit past decimal point and unlimited default precision.
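The pieces of the format spec can be exercised one at a time with the built-in `format()`, which takes the same spec as the part after `:`. This is a sketch with arbitrary sample values.

```python
# fill, align, width
centered = format("hi", "*^8")            # '***hi***'
right = format(42, ">6")                  # '    42'

# sign options on a positive number
plus = format(5, "+d")                    # '+5'
space = format(5, " d")                   # ' 5'

# a leading 0 in width pads with zeros between sign and digits
padded = format(-3.14159, "08.3f")        # '-003.142'

# alternate form and grouping
hexed = format(255, "#x")                 # '0xff'
grouped = format(1234567, ",")            # '1,234,567'
underscored = format(1000000, "_")        # '1_000_000'

# precision: max field size for str, digits for floats
clipped = format("abcdef", ".3")          # 'abc'
percent = format(0.256, ".1%")            # '25.6%'
exponent = format(12345.678, "e")         # '1.234568e+04'
general = format(1.0, "g")                # '1'
no_type = format(1.0, "")                 # '1.0': keeps a digit after the point
upper_nan = format(float("nan"), "F")     # 'NAN'

print(centered, padded, hexed, grouped, percent, exponent)
```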
The format specs above also allow `{...}` specification of parameters ("nested replacement fields") for programmatically specifying field widths etc.
See Format String Syntax and Format Specification Mini-Language for further details and examples.
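Nested replacement fields can be sketched like this; the parameter names `w`, `p`, `fill` and `align` are just labels chosen for the example.

```python
value = 3.14159
w, p = 8, 2

via_format = "{:{w}.{p}f}".format(value, w=w, p=p)      # '    3.14'
via_fstring = f"{value:{w}.{p}f}"                       # same result

# fill and alignment can be parameters too
boxed = "{:{fill}{align}{width}}".format("hi", fill="*", align="^", width=8)

print(repr(via_format), repr(boxed))
```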
- `io.StringIO`, `io.BytesIO`: In-memory I/O
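A minimal sketch of both in-memory file types, using throwaway contents.

```python
import io

text_buf = io.StringIO("first\nsecond\n")
lines = text_buf.readlines()          # reads like a text file
text_buf.write("third\n")             # position is now at the end, so this appends
all_text = text_buf.getvalue()

bin_buf = io.BytesIO()
bin_buf.write(b"\x00\x01\x02")
bin_buf.seek(0)                       # rewind before reading back
head = bin_buf.read(1)                # b'\x00'
rest = bin_buf.read()                 # b'\x01\x02'

print(lines, repr(all_text), head, rest)
```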