Python Character and Byte String Handling

Python has two immutable string types: str, a sequence of Unicode code points, and bytes, a sequence of 8-bit values. Both are [sequences] and add further specialized methods. There's no separate char type: s[0] on a str produces a str of length 1, while s[0] on a bytes produces an int in range(256) (use s[0:1] to get a bytes of length 1).
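A minimal sketch of how indexing and len() behave on the two types; the variable names are illustrative only:

```python
s = "h\u00e9llo"            # 'héllo', five code points
b = s.encode("utf-8")       # six bytes: 'é' needs two bytes in UTF-8

first_char = s[0]           # 'h', a str of length 1
first_byte = b[0]           # 104, an int
first_slice = b[0:1]        # b'h', a bytes of length 1

# len() counts code points for str and bytes for bytes.
lengths = (len(s), len(b))  # (5, 6)
```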

Other binary sequence types include:

bytearray: Mutable counterpart to bytes. No string literal constructor but otherwise all the same methods plus mutators.
memoryview: Memory buffers to access internal data of objects supporting the (C-level) buffer protocol.
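As a rough illustration of the two types above, the following sketch mutates a bytearray in place and then writes through a memoryview without copying; the names are illustrative only:

```python
buf = bytearray(b"hello")
buf[0] = ord("J")        # item assignment takes an int
buf.extend(b"!!")        # a mutator that bytes does not have

view = memoryview(buf)
view[1:5] = b"ELLO"      # writes into buf; the sizes must match
result = bytes(buf)      # b'JELLO!!'
view.release()           # buf may be resized again after release
```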

Constructors

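str(object=''): String version of object; the empty string if omitted.
str(object=b'', encoding='utf-8', errors='strict'): When encoding or errors is given, decodes a bytes-like object, as object.decode(encoding, errors).
bytes(10): Zero-filled bytes object of length 10.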
bytes(range(20)): From iterable of integers 0 ≤ i < 256.
bytes(b'abc'): Copy of binary data via buffer protocol
bytes.fromhex('2Ef0 F1F2'): ASCII hex representation, skipping whitespace
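The constructors above can be tried out as follows (a short sketch; the values are arbitrary):

```python
as_text = str(42)                                # '42'
decoded = str(b"caf\xc3\xa9", encoding="utf-8")  # 'caf\xe9' (café)
zeros = bytes(3)                                 # b'\x00\x00\x00'
from_ints = bytes(range(65, 70))                 # b'ABCDE'
copied = bytes(bytearray(b"abc"))                # b'abc', via the buffer protocol
from_hex = bytes.fromhex("2Ef0 F1F2")            # b'.\xf0\xf1\xf2'
```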

Literals are quoted with single (') or double (") quotes; each allows the other unescaped inside it. Triple-quoted strings (''' or """) may span multiple lines. Adjacent string literals are concatenated into a single string.

String literals may be prefixed with case-insensitive one-character prefixes to change their interpretation:

b: Produce a bytes instead of a str. Only ASCII chars (codepoints < 128) and backslash escape sequences allowed.
r: Raw string or bytes; backslashes are interpreted literally. (Not usable with u.)
u: Unicode literal. Does nothing in Python ≥3.3; in Python 2, where str is the equivalent of bytes, reads string literal as Unicode instead.
f: (≥3.6) Formatted string literal. May be combined with r; cannot be combined with b or u.
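A brief sketch of the prefixes and of adjacent-literal concatenation; the names are illustrative only:

```python
adjacent = "foo" 'bar'        # 'foobar'
raw = r"C:\new\table"         # backslashes kept literally
raw_len = len(r"\n")          # 2: a backslash and an 'n'
data = b"\x00abc"             # bytes; the escape yields a single zero byte
raw_bytes_len = len(rb"\x00") # 4: raw bytes, no escape processing

name = "world"
greeting = f"hello {name!r}"  # "hello 'world'"
raw_f = rf"\d+{name}"         # backslash kept, {name} still interpolated
```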

More, including escape code list, at String and Bytes literals.

Methods

All methods below apply to both character and byte strings (str and bytes) unless otherwise indicated. Methods that assume chars (e.g., capitalize) assume ASCII in bytestrings. Methods available on immutable objects always return a new copy, even when called on a mutable object (e.g., bytearray.replace()).

Common Sequence Operations:

t [not] in s: Subsequence test, e.g., 'bar' in 'foobarbaz' is True
s + t: Concatenation returning new object. For better efficiency when joining many pieces, use ''.join([s, t, ...]) (join takes a single iterable) or write to io.StringIO.
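A small sketch of the membership test and of the two suggested alternatives to repeated concatenation; the list `parts` is illustrative only:

```python
import io

found = "bar" in "foobarbaz"          # True
missing = "qux" not in "foobarbaz"    # True
byte_found = b"bar" in b"foobarbaz"   # True
int_found = 98 in b"abc"              # True: bytes membership also accepts an int

parts = ["a", "b", "c"]
joined = "".join(parts)               # 'abc'; join takes one iterable

buf = io.StringIO()
for p in parts:
    buf.write(p)
built = buf.getvalue()                # 'abc'
```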

Encoding:

decode(encoding='utf-8', errors='strict'): Returns str decoded from bytes read as encoding. errors may be strict (raises UnicodeError), ignore, replace, etc.; see codec error handlers.
encode(encoding='utf-8', errors='strict'): Return bytes object encoded from str.
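For instance, a round trip through UTF-8 and a few of the error handlers might look like this (a sketch; the sample text is arbitrary):

```python
encoded = "caf\u00e9".encode("utf-8")    # b'caf\xc3\xa9'
decoded = encoded.decode("utf-8")        # 'caf\xe9' (café)

try:
    encoded.decode("ascii")              # errors='strict' is the default
except UnicodeDecodeError as exc:        # a subclass of UnicodeError
    strict_error = type(exc).__name__

ignored = encoded.decode("ascii", errors="ignore")          # 'caf'
replaced = encoded.decode("ascii", errors="replace")        # 'caf\ufffd\ufffd'
ascii_only = "caf\u00e9".encode("ascii", errors="replace")  # b'caf?'
```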

Character Class Predicates (all chars must match and len ≥ 1 unless noted; isprintable, isdecimal, isnumeric and isidentifier are str only):

isprintable(): Includes space but not other whitespace; true if empty as well
isspace(): Whitespace
isalnum(): Is alpha, decimal, digit or numeric
isalpha(): Unicode 'Letter' (not Unicode 'Alphabetic')
isdecimal(): Chars form numbers in base 10
isnumeric(): Includes e.g., fractions
isdigit(): Includes non-decimal, e.g., superscripts, Kharosthi numbers
istitle(): Uppercase chars only after uncased chars, lowercase chars only after cased chars
isupper(), islower(): Must include a cased character
isidentifier(): According to Python language def; also see keyword.iskeyword()
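The differences between the numeric predicates are easiest to see on a few sample characters. The following is a sketch; the dictionary keys are just labels:

```python
import keyword

checks = {
    "superscript_isdigit": "\u00b2".isdigit(),      # True: SUPERSCRIPT TWO
    "superscript_isdecimal": "\u00b2".isdecimal(),  # False
    "fraction_isnumeric": "\u00bd".isnumeric(),     # True: VULGAR FRACTION ONE HALF
    "fraction_isdigit": "\u00bd".isdigit(),         # False
    "empty_isalpha": "".isalpha(),                  # False: len must be >= 1
    "empty_isprintable": "".isprintable(),          # True: the exception
    "tab_isprintable": "\t".isprintable(),          # False
    "title": "Hello World".istitle(),               # True
    "upper_with_digits": "ABC123".isupper(),        # True
    "upper_no_cased": "123".isupper(),              # False: no cased character
    "keyword_isidentifier": "class".isidentifier(), # True, so also check iskeyword
    "keyword_iskeyword": keyword.iskeyword("class"),  # True
}
```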

String Predicates (the methods take optional start and end indexes):

s₁ in s₂
startswith(s), endswith(s)
count(s): Count of non-overlapping s
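A short sketch of these methods with and without the optional indexes:

```python
s = "banana"

starts = s.startswith("ba")              # True
starts_any = s.startswith(("x", "ba"))   # True: a tuple of candidates is accepted
ends_in_range = s.endswith("na", 0, 4)   # True: tests s[0:4] == 'bana'
non_overlapping = s.count("ana")         # 1, not 2: matches may not overlap
from_index = s.count("a", 2)             # 2: counts within s[2:] == 'nana'
```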

Indexing (all take optional start and end indexes):

find(s), rfind(s): Returns lowest/highest index of s, or -1 if not found
index(s), rindex(s): As find/rfind but raise ValueError when not found
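The difference in not-found behaviour can be sketched as:

```python
s = "banana"

first = s.find("an")       # 1
last = s.rfind("an")       # 3
bounded = s.find("an", 2)  # 3: search starts at index 2
absent = s.find("x")       # -1

try:
    s.index("x")
except ValueError:
    raised = True          # index() raises where find() returns -1
else:
    raised = False
```

Because -1 is also a valid index, testing the result of find() with `in` or an explicit comparison is usually safer than using it directly as a subscript.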

Modification:

lstrip(cs), rstrip(cs), strip(cs): Remove leading/trailing/both chars of set made from string cs, default whitespace
replace(old, new[, count]): Replace substring old
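Note that the argument to the strip methods is a set of characters, not a prefix or suffix. A brief sketch:

```python
stripped = "  hi  ".strip()                   # 'hi'
left = "xxhixx".lstrip("x")                   # 'hixx'
char_set = "www.example.com".strip("cmowz.")  # 'example': any of these chars removed
byte_strip = b"  hi\n".strip()                # b'hi'

replaced = "a-b-c".replace("-", "+")          # 'a+b+c'
replaced_once = "a-b-c".replace("-", "+", 1)  # 'a+b-c': count limits replacements
```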

Case modification:

upper(), lower()
swapcase(): Not necessarily reversible; s.swapcase().swapcase() may differ from s
capitalize(): First char capitalized; rest lowered
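title(): First char of each word uppercased, the rest lowered; a word is any run of consecutive letters, so apostrophes split words (e.g., "they're" becomes "They'Re")
casefold(): (≥3.3, str only) More aggressive lowercasing intended for caseless matching, as per section 3.13 of the Unicode Standard.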
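A sketch of some of the less obvious case conversions, using the German sharp s (U+00DF) as the sample character:

```python
capitalized = "hELLO wORLD".capitalize()  # 'Hello world'
titled = "they're bill's".title()         # "They'Re Bill'S"
swapped = "Hello".swapcase()              # 'hELLO'

sharp = "\u00df"                          # 'ß'
sharp_lower = sharp.lower()               # 'ß': already lowercase
sharp_fold = sharp.casefold()             # 'ss'
sharp_upper = sharp.upper()               # 'SS'
round_trip = sharp.swapcase().swapcase()  # 'ss', not the original 'ß'
```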

Padding:

expandtabs(tabsize=8): Tabs replaced by spaces up to the next multiple of tabsize; column 0 is at start of string and again after each \n or \r
center(width, fillchar=' ')
ljust(width, fillchar=' ')
rjust(width, fillchar=' ')
zfill(width): Pad with 0 between sign and digits; sign included in width
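The padding methods can be sketched as follows; none of them truncates a string that is already wider than width:

```python
expanded = "a\tbc\td".expandtabs(4)  # 'a   bc  d': tab stops at columns 4 and 8
centered = "hi".center(6, "*")       # '**hi**'
left = "hi".ljust(5, ".")            # 'hi...'
right = "hi".rjust(5)                # '   hi'
padded = "-42".zfill(6)              # '-00042': the sign counts toward the width
unchanged = "12345".zfill(3)         # '12345'
```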

Splitting:

partition(sep), rpartition(sep):
Return a 3-tuple of (pre, sep, post), split at the first/last sep; if sep is not found, partition returns (str, '', '') and rpartition returns ('', '', str)
split(sep=None, maxsplit=-1), rsplit(sep=None, maxsplit=-1):
- sep=None separates on runs of consecutive whitespace; leading/trailing whitespace is removed
- Consecutive non-None separators delimit empty strings
- Unlimited splits if maxsplit is -1; otherwise at most maxsplit splits (no more than maxsplit+1 elements)
splitlines(keepends=False): For str, splits on \n, \r, \r\n, \v, \f, \x1c, \x1d, \x1e (file/group/record separator), \x85 (next line C1), \u2028 (line sep), \u2029 (para sep); for bytes, only on \n, \r, \r\n
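A sketch of the splitting methods on a few arbitrary inputs:

```python
head = "key=value=x".partition("=")          # ('key', '=', 'value=x')
tail = "key=value=x".rpartition("=")         # ('key=value', '=', 'x')
no_sep = "abc".partition("=")                # ('abc', '', '')
no_sep_r = "abc".rpartition("=")             # ('', '', 'abc')

words = "  a  b   c ".split()                # ['a', 'b', 'c']
fields = "a,,b".split(",")                   # ['a', '', 'b']
limited = "a,b,c,d".split(",", 1)            # ['a', 'b,c,d']
limited_r = "a,b,c,d".rsplit(",", 1)         # ['a,b,c', 'd']

lines = "one\ntwo\r\nthree\n".splitlines()   # ['one', 'two', 'three']
kept = "one\ntwo".splitlines(keepends=True)  # ['one\n', 'two']
```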

Other:

join(iterable): Concatenation of iterable separated by string providing this method.
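maketrans(x, y=None, z=None): Static method (str.maketrans) making a translation table for translate(); bytes.maketrans(from, to) takes two equal-length bytes objects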
- 1 arg: dict mapping ints of Unicode code points or chars to Unicode code points, chars, strings or None
- 2 args: strings of equal length
- 3 args: as 2, but 3rd arg is chars to delete
translate(table): Chars translated through maketrans table
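The three call forms of str.maketrans, plus the bytes variant, can be sketched as follows (the tables are arbitrary examples):

```python
joined = ", ".join(["a", "b", "c"])             # 'a, b, c'

by_dict = str.maketrans({"a": "1", "b": None})  # 1 arg: None deletes the char
t1 = "abcabc".translate(by_dict)                # '1c1c'

by_pair = str.maketrans("abc", "xyz")           # 2 args: equal-length strings
t2 = "aabbcc".translate(by_pair)                # 'xxyyzz'

with_delete = str.maketrans("abc", "xyz", "-")  # 3 args: third is chars to delete
t3 = "a-b-c".translate(with_delete)             # 'xyz'

byte_table = bytes.maketrans(b"abc", b"xyz")
t4 = b"a-b-c".translate(byte_table, b"-")       # b'xyz': delete passed to translate
```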

Formatting

f'...', F'...': (≥3.6) Formatted string literals or f-strings
format(*args, **kwargs): See format string syntax
format_map(mapping): mapping is used directly and not copied to a dict (useful for dict subclasses)
s % values: Not recommended. See printf-string and printf-bytes for more info.
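The four formatting styles side by side, as a sketch. The Default class is a hypothetical dict subclass, used here only to show why format_map not copying its mapping matters:

```python
name, count = "spam", 3

f_string = f"{name!r} x {count:03d}"              # "'spam' x 003"
formatted = "{0} x {n:>4}".format(name, n=count)  # 'spam x    3'


class Default(dict):
    """Hypothetical dict subclass supplying a placeholder for missing keys."""

    def __missing__(self, key):
        return "<" + key + ">"


# format_map uses the mapping directly, so __missing__ is honoured.
mapped = "{name} {other}".format_map(Default(name="spam"))  # 'spam <other>'

old_style = "%s x %d" % (name, count)             # 'spam x 3'
old_bytes = b"%s x %d" % (b"spam", 3)             # b'spam x 3' (bytes, >=3.5)
```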

I/O

io.StringIO, io.BytesIO: In-memory I/O
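A minimal sketch of both in-memory streams, one used for writing text and one for reading bytes:

```python
import io

out = io.StringIO()
print("hello", file=out)     # anything that writes to a text file works
out.write("world\n")
text = out.getvalue()        # 'hello\nworld\n'

src = io.BytesIO(b"\x01\x02\x03\x04")
header = src.read(2)         # b'\x01\x02'
rest = src.read()            # b'\x03\x04'
```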
