Name		Name	Last commit message	Last commit date
Latest commit History 65 Commits
.github/workflows		.github/workflows
tests		tests
url_normalize		url_normalize
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
pyproject.toml		pyproject.toml
tox.ini		tox.ini

Repository files navigation

url-normalize

A Python library for standardizing and normalizing URLs with support for internationalized domain names (IDN).

Introduction

url-normalize provides a robust URI normalization function that:

Takes care of IDN domains.
Always provides the URI scheme in lowercase characters.
Always provides the host, if any, in lowercase characters.
Only performs percent-encoding where it is essential.
Always uses uppercase A-through-F characters when percent-encoding.
Prevents dot-segments appearing in non-relative URI paths.
For schemes that define a default authority, uses an empty authority if the default is desired.
For schemes that define an empty path to be equivalent to a path of "/", uses "/".
For schemes that define a port, uses an empty port if the default is desired
Ensures all portions of the URI are utf-8 encoded NFC from Unicode strings

Inspired by Sam Ruby's urlnorm.py

Features

IDN Support: Full internationalized domain name handling
Configurable Defaults:
- Customizable default scheme (https by default)
- Configurable default domain for absolute paths
Query Parameter Control:
- Parameter filtering with allowlists
- Support for domain-specific parameter rules
Versatile URL Handling:
- Empty string URLs
- Double slash URLs (//domain.tld)
- Shebang (#!) URLs
Developer Friendly:
- Cross-version Python compatibility (3.8+)
- 100% test coverage
- Modern type hints and string handling

Installation

pip install url-normalize

Usage

Python API

from url_normalize import url_normalize

# Basic normalization (uses https by default)
print(url_normalize("www.foo.com:80/foo"))
# Output: https://www.foo.com/foo

# With custom default scheme
print(url_normalize("www.foo.com/foo", default_scheme="http"))
# Output: http://www.foo.com/foo

# With query parameter filtering enabled
print(url_normalize("www.google.com/search?q=test&utm_source=test", filter_params=True))
# Output: https://www.google.com/search?q=test

# With custom parameter allowlist as a dict
print(url_normalize(
    "example.com?page=1&id=123&ref=test",
    filter_params=True,
    param_allowlist={"example.com": ["page", "id"]}
))
# Output: https://example.com?page=1&id=123

# With custom parameter allowlist as a list
print(url_normalize(
    "example.com?page=1&id=123&ref=test",
    filter_params=True,
    param_allowlist=["page", "id"]
))
# Output: https://example.com?page=1&id=123

# With default domain for absolute paths
print(url_normalize("/images/logo.png", default_domain="example.com"))
# Output: https://example.com/images/logo.png

# With default domain and custom scheme
print(url_normalize("/images/logo.png", default_scheme="http", default_domain="example.com"))
# Output: http://example.com/images/logo.png

Command-line Usage

You can also use url-normalize from the command line:

$ url-normalize "www.foo.com:80/foo"
# Output: https://www.foo.com/foo

# With custom default scheme
$ url-normalize -s http "www.foo.com/foo"
# Output: http://www.foo.com/foo

# With query parameter filtering
$ url-normalize -f "www.google.com/search?q=test&utm_source=test"
# Output: https://www.google.com/search?q=test

# With custom allowlist
$ url-normalize -f -p page,id "example.com?page=1&id=123&ref=test"
# Output: https://example.com/?page=1&id=123

# With default domain for absolute paths
$ url-normalize -d example.com "/images/logo.png"
# Output: https://example.com/images/logo.png

# With default domain and custom scheme
$ url-normalize -d example.com -s http "/images/logo.png"
# Output: http://example.com/images/logo.png

# Via uv tool/uvx
$ uvx url-normalize www.foo.com:80/foo
# Output: https://www.foo.com:80/foo

Documentation

For a complete history of changes, see CHANGELOG.md.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

url-normalize

Table of Contents

Introduction

Features

Installation

Usage

Python API

Command-line Usage

Documentation

Contributing

License

About

Releases 8

Packages

Contributors 8

Languages

License

niksite/url-normalize

Folders and files

Latest commit

History

Repository files navigation

url-normalize

Table of Contents

Introduction

Features

Installation

Usage

Python API

Command-line Usage

Documentation

Contributing

License

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 8

Packages 0

Contributors 8

Languages

Packages