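Python has immutable strings of Unicode code points, `str`, and immutable sequences of 8-bit bytes, `bytes`, both of which are [sequences] as well as having further specialized methods. There's no separate char type; for a `str`, `s[0]` produces a `str` of length 1. For `bytes`, `s[0]` produces an `int`, and the slice `s[0:1]` produces a `bytes` of length 1.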
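
As a small illustration of the indexing behaviour described above, the following sketch compares a `str` with its UTF-8 encoding. The variable names are illustrative only.

```python
# A str is a sequence of code points; bytes is a sequence of small integers.
text = "h\u00e9llo"            # 'héllo', five code points
data = text.encode("utf-8")    # b'h\xc3\xa9llo', six bytes

first_char = text[0]           # 'h': a str of length 1
first_byte = data[0]           # 104: an int, not a bytes object
one_byte = data[0:1]           # b'h': slicing keeps the bytes type

print(type(first_char).__name__, type(first_byte).__name__, type(one_byte).__name__)
print(len(text), len(data))
```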
Other binary sequence types include:
- `bytearray`: Mutable counterpart to `bytes`. No string literal constructor but otherwise all the same methods plus mutators.
- `memoryview`: Memory buffers to access internal data of objects supporting the (C-level) buffer protocol.
Constructors:
- `str(object='')`
- `str(object=b'', encoding='utf-8', errors='strict')`
- `bytes(10)`: Zero-filled `bytes` object of the given length.
- `bytes(range(20))`: From iterable of integers 0 ≤ i < 256.
- `bytes(b'abc')`: Copy of binary data via buffer protocol
- `bytes.fromhex('2Ef0 F1F2')`: ASCII hex representation, skipping whitespace
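
The constructors above can be tried out as follows. This is a minimal sketch; the variable names are made up for illustration, and the `bytearray` and `memoryview` lines only show that both expose the same underlying buffer.

```python
zeros = bytes(4)                          # four zero bytes
from_ints = bytes(range(3))               # b'\x00\x01\x02'
from_hex = bytes.fromhex("2Ef0 F1F2")     # hex digits are case-insensitive
decoded = str(b"caf\xc3\xa9", encoding="utf-8")   # decode bytes into a str

buf = bytearray(b"abc")                   # mutable counterpart to bytes
buf[0] = ord("x")                         # in-place mutation
frozen = bytes(buf)                       # immutable copy via the buffer protocol

view = memoryview(buf)                    # zero-copy view of buf's data
tail = view[1:3].tobytes()                # copy the viewed slice out as bytes
view.release()

print(zeros, from_ints, from_hex, decoded, frozen, tail)
```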
Literals are quoted with single (`'`) or double (`"`) quotes; each allows the other in its string. Triple-quoted strings (`'''` or `"""`) may span multiple lines. Adjacent string literals are concatenated into a single string.
String literals may be prefixed with case-insensitive one-character prefixes to change their interpretation:
- `b`: Produce a `bytes` instead of a `str`. Only ASCII chars (code points < 128) and backslash escape sequences allowed.
- `r`: Raw string or bytes; backslashes are interpreted literally. (Not usable with `u`.)
- `u`: Unicode literal. Does nothing in Python ≥3.3; in Python 2, where `str` is the equivalent of `bytes`, reads string literal as Unicode instead.
- `f`: (≥3.6) Formatted string literal. Cannot be combined with `b` or `u`.
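
A short sketch of the quoting rules and prefixes described above (Python ≥3.6 is assumed because of the f-string; all names are illustrative):

```python
raw = r"C:\new\table"             # raw: \n and \t stay as two characters each
cooked = "C:\\new\\table"         # same text written with escaped backslashes

data = b"abc\x00"                 # bytes literal: ASCII plus escape sequences
raw_bytes = rb"\d+"               # r and b may be combined

name = "world"
greeting = f"hello {name!r}"      # f-string with a repr conversion

joined = "adjacent " 'literals'   # adjacent literals are concatenated
multi = """line one
line two"""                       # triple-quoted literal spanning two lines

print(raw, data, raw_bytes, greeting, joined, multi.splitlines())
```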
More, including escape code list, at String and Bytes literals.
All methods below apply to both character and byte strings (`str` and `bytes`) unless otherwise indicated. Methods that assume chars (e.g., `capitalize`) assume ASCII in bytestrings. Methods available on immutable objects always return a new copy, even when called on a mutable object (e.g., `bytearray.replace()`).
- `t [not] in s`: Subsequence test, e.g., `'bar' in 'foobarbaz'` is True
- `s + t`: Concatenation returning new object. For better efficiency when combining many pieces, use `''.join([s, t, ...])` (`join` takes a single iterable) or write to `io.StringIO`.
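
The membership test and the two alternatives to repeated `+` can be sketched like this. The list `parts` is an invented example input.

```python
import io

parts = ["foo", "bar", "baz"]     # hypothetical pieces to combine

# join takes ONE iterable of strings, not several positional arguments.
joined = "".join(parts)

# Alternative: accumulate into an in-memory text buffer.
buf = io.StringIO()
for piece in parts:
    buf.write(piece)
streamed = buf.getvalue()

has_bar = "bar" in joined         # subsequence (substring) test
has_qux = "qux" not in joined

# The bytes equivalent uses a bytes separator.
joined_bytes = b"".join([b"foo", b"bar"])

print(joined, streamed, has_bar, has_qux, joined_bytes)
```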
Encoding:
- `decode(encoding='utf-8', errors='strict')`: Returns `str` decoded from `bytes` read as encoding. errors may be `strict` (raises `UnicodeError`), `ignore`, `replace`, etc.; see codec error handlers.
- `encode(encoding='utf-8', errors='strict')`: Returns `bytes` object encoded from `str`.
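
A minimal sketch of `encode` and `decode` with different error handlers. The sample text and the deliberately invalid byte string are assumptions chosen for the demonstration.

```python
text = "na\u00efve"                                   # 'naïve'

utf8 = text.encode("utf-8")                           # non-ASCII char becomes two bytes
replaced = text.encode("ascii", errors="replace")     # unencodable char becomes b'?'
ignored = text.encode("ascii", errors="ignore")       # unencodable char is dropped

# The default errors='strict' raises a UnicodeError subclass.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeError:
    strict_failed = True

latin1_bytes = b"na\xefve"                            # valid Latin-1, invalid UTF-8
round_trip = latin1_bytes.decode("latin-1")           # decodes back to the original text
patched = latin1_bytes.decode("utf-8", errors="replace")  # bad byte becomes U+FFFD

print(utf8, replaced, ignored, strict_failed, round_trip, patched)
```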
Character Class Predicates (all chars must match and len ≥ 1; `isprintable`, `isdecimal`, `isnumeric` and `isidentifier` are `str` only):
- `isprintable()`: Includes space but not other whitespace; true if empty as well
- `isspace()`: Whitespace
- `isalnum()`: Is alpha, decimal, digit or numeric
- `isalpha()`: Unicode 'Letter' (not Unicode 'Alphabetic')
- `isdecimal()`: Chars form numbers in base 10
- `isnumeric()`: Includes e.g., fractions
- `isdigit()`: Includes non-decimal, e.g., superscripts, Kharosthi numbers
- `istitle()`: Cased chars following uncased chars are upper, all other cased chars lower
- `isupper()`, `islower()`: Must include a cased character
- `isidentifier()`: According to Python language def; also see `keyword.iskeyword()`
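
The differences between `isdecimal`, `isdigit` and `isnumeric` are easiest to see on a few sample characters. The sketch below uses an ASCII digit, a superscript and a fraction; the `samples` mapping is purely illustrative.

```python
import keyword

samples = {
    "5": "ASCII digit",
    "\u00b2": "superscript two",
    "\u00bd": "vulgar fraction one half",
}
for ch, label in samples.items():
    print(f"{label:26} decimal={ch.isdecimal()} digit={ch.isdigit()} numeric={ch.isnumeric()}")

empty_printable = "".isprintable()      # the one predicate that is true when empty
empty_alpha = "".isalpha()              # other predicates need at least one char
space_printable = " ".isprintable()
newline_printable = "\n".isprintable()

# A keyword is a syntactically valid identifier, so check both.
is_ident = "def".isidentifier()
is_kw = keyword.iskeyword("def")
```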
String Predicates (the methods take optional start and end indexes):
- s₁ `in` s₂
- `startswith(s)`, `endswith(s)`
- `count(s)`: Count of non-overlapping occurrences of s
Indexing (all take optional start and end indexes):
- `find(s)`, `rfind(s)`: Returns lowest/highest index of s, or -1 when not found
- `index(s)`, `rindex(s)`: As find but raise `ValueError` when not found
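
A quick sketch of the search methods and their optional start and end arguments, using an arbitrary sample string:

```python
s = "abracadabra"

first = s.find("bra")            # lowest index of the substring
last = s.rfind("bra")            # highest index of the substring
missing = s.find("xyz")          # find signals "not found" with -1
after_first = s.find("a", 1)     # start searching at index 1

# index behaves like find but raises instead of returning -1.
try:
    s.index("xyz")
    index_raised = False
except ValueError:
    index_raised = True

occurrences = s.count("abra")        # non-overlapping occurrences
overlapping = "aaaa".count("aa")     # overlapping matches are not counted
starts = s.startswith("cad", 4)      # test begins at index 4
ends = s.endswith("bra", 0, 4)       # test is limited to s[0:4]

print(first, last, missing, after_first, index_raised, occurrences, overlapping, starts, ends)
```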
Modification:
- `lstrip(cs)`, `rstrip(cs)`, `strip(cs)`: Remove leading/trailing/both chars of set made from string cs, default whitespace
- `replace(old, new[, count])`: Replace substring old with new; if count is given, only the first count occurrences are replaced
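
Because the argument to `strip` is treated as a set of characters rather than as a prefix or suffix, the result can be surprising. A small sketch with made-up inputs:

```python
padded = "  hello \n"
trimmed = padded.strip()                        # default: strip whitespace
left_only = padded.lstrip()                     # trailing whitespace is kept

# cs is a SET of characters: each of 'w', '.', 'c', 'o', 'm' is stripped
# from both ends until a character outside the set is reached.
host = "www.example.com".strip("w.com")

first_only = "a-b-c".replace("-", "+", 1)       # replace at most one occurrence
all_of_them = "a-b-c".replace("-", "+")

print(repr(trimmed), repr(left_only), host, first_only, all_of_them)
```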
Case modification:
- `upper()`, `lower()`
- `swapcase()`: Not necessarily reversible
- `capitalize()`: First char capitalized; rest lowered
- `title()`: All letters following non-letters uppered; can produce weird results (e.g., after apostrophes)
- `casefold()`: (≥3.3, `str` only) More aggressive "lower casing" as per section 3.13 of the Unicode Standard.
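
The following sketch shows a few of the edge cases noted above, using the German sharp s (`ß`, U+00DF) and a phrase with apostrophes as sample inputs:

```python
capitalized = "hello WORLD".capitalize()     # first char upper, rest lower

# title() treats the apostrophe as a word boundary.
titled = "they're bill's friends".title()

sharp_s = "\u00df"                           # 'ß'
upper = sharp_s.upper()                      # uppercases to two characters
lower = sharp_s.lower()                      # already lowercase, unchanged
folded = sharp_s.casefold()                  # casefold goes further than lower

# swapcase is not its own inverse for this character.
swapped_twice = sharp_s.swapcase().swapcase()

# Caseless comparison is the main use for casefold.
same = "Stra\u00dfe".casefold() == "STRASSE".casefold()

print(capitalized, titled, upper, lower, folded, swapped_twice, same)
```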
Padding:
- `expandtabs(tabsize=8)`: Column 0 at start of string
- `center(width, fillchar=' ')`
- `ljust(width, fillchar=' ')`
- `rjust(width, fillchar=' ')`
- `zfill(width)`: Pad with `0` between sign and digits; sign included in width
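
A brief sketch of the padding methods with arbitrary widths and fill characters:

```python
centered = "ab".center(6, "*")
left = "ab".ljust(4, ".")
right = "ab".rjust(4)                     # default fill is a space

# zfill inserts zeros after the sign; the sign counts toward the width.
negative = "-42".zfill(6)
positive = "42".zfill(6)

# Tab stops are counted from column 0 at the start of the string.
expanded = "a\tbc\td".expandtabs(4)

print(centered, left, repr(right), negative, positive, repr(expanded))
```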
Splitting:
- `partition(sep)`, `rpartition(sep)`: Return a 3-tuple of `(pre, sep, post)`; if sep is not found, `partition` returns `(str, '', '')` and `rpartition` returns `('', '', str)`
- `split(sep=None, maxsplit=-1)`, `rsplit(sep=None, maxsplit=-1)`:
  - sep=None separates with runs of consecutive whitespace; leading/trailing whitespace is removed
  - Consecutive non-None seps delimit empty strings
  - Number of splits is unlimited if maxsplit is -1; otherwise no more than maxsplit+1 elements are returned
- `splitlines(keepends=False)`: Splits on `\r`, `\n`, `\r\n`, `\v`, `\f`, `\x1c`, `\x1d`, `\x1e` (file/group/record separator), `\x85` (next line C1), `\u2028` (line sep), `\u2029` (para sep). For `bytes`, only `\r`, `\n` and `\r\n` are line boundaries.
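
The splitting rules above can be checked with a few made-up inputs:

```python
head = "key=value=x".partition("=")        # split at the FIRST separator
tail = "key=value=x".rpartition("=")       # split at the LAST separator
no_sep = "novalue".partition("=")          # separator missing
no_sep_r = "novalue".rpartition("=")       # the string ends up in the last slot

words = "  a  b ".split()                  # whitespace runs, edges trimmed
empties = "a,,b".split(",")                # consecutive separators give ''
limited = "a,b,c,d".split(",", 2)          # at most 2 splits, so 3 elements
from_right = "a,b,c,d".rsplit(",", 1)      # splits counted from the right

mixed = "one\ntwo\r\nthree\x0cfour"
lines = mixed.splitlines()
kept = mixed.splitlines(keepends=True)     # line endings are retained

# bytes only recognises \r, \n and \r\n as line boundaries.
byte_lines = b"one\ntwo\x0cthree".splitlines()

print(head, tail, no_sep, no_sep_r, words, empties, limited, from_right, lines, kept, byte_lines)
```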
Other:
- `join(iterable)`: Concatenation of iterable separated by string providing this method.
- `maketrans(x, y=None, z=None)`: Make translation table
  - 1 arg: dict mapping ints of Unicode code points or chars to Unicode code points, chars, strings or None
  - 2 args: strings of equal length
  - 3 args: as 2, but 3rd arg is chars to delete
- `translate(table)`: Chars translated through `maketrans` table
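
A minimal sketch of `join` and of the one-argument and three-argument forms of `str.maketrans`. The mappings are arbitrary examples; `bytes.maketrans` is shown last because it only accepts two equal-length bytes arguments.

```python
csv_row = ",".join(["a", "b", "c"])          # the separator provides the method

# Three arguments: map a->x, b->y, c->z and delete every '!'.
table3 = str.maketrans("abc", "xyz", "!")
swapped = "aabbcc!!".translate(table3)

# One argument: keys may be chars or code points; None means delete.
table1 = str.maketrans({"a": "1", ord("b"): None, "c": "see"})
mapped = "abc".translate(table1)

# bytes has its own two-argument maketrans.
byte_table = bytes.maketrans(b"ab", b"xy")
byte_mapped = b"abc".translate(byte_table)

print(csv_row, swapped, mapped, byte_mapped)
```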
- `f'...'`, `F'...'`: (≥3.6) Formatted string literals or f-strings
- `format(*args, **kwargs)`: See format string syntax
- `format_map(mapping)`: mapping is used directly and not copied to a dict (useful for dict subclasses)
- s `%` values: Not recommended. See printf-string and printf-bytes for more info.
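
The formatting options above side by side, as a rough sketch. `Default` is a hypothetical `dict` subclass written for this example to show why `format_map` not copying its mapping matters.

```python
pi = 3.14159
fixed = f"{pi:8.2f}"                               # width 8, two decimals

positional = "{0}-{0}-{name}".format("a", name="b")

class Default(dict):
    """Hypothetical dict subclass that supplies a placeholder for missing keys."""
    def __missing__(self, key):
        return f"<{key}>"

# format_map uses the mapping directly, so __missing__ is honoured.
filled = "{a} {b}".format_map(Default(a=1))

# printf-style formatting also works for bytes (Python >= 3.5).
old_style = "%s has %d items" % ("cart", 3)
old_bytes = b"%s=%d" % (b"n", 5)

print(repr(fixed), positional, filled, old_style, old_bytes)
```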
- `io.StringIO`, `io.BytesIO`: In-memory I/O