The Synthax that is required to encode an URI is very general. There is fixed encoding according to US-ASCII standards
There is also a standard that revers to the referees to the Universal Character Code (Unicode/ISO 10646) where it is possible to also use international symbols., i.e. symbols that are not in the US-ASCII encoding.
US-ASCII Encoding
Percent Encoding for reserved characters, or characters that do not exist in US-ASCII encoding and there are reserved characters that have special functions and permitted characters as well as not-permitted characters.
pct-endoced = "%" HEDIG HEXDIG
Reserved characters with special function
reserved = gen-delims / sub-delims
gen-delimns = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delimns = "!" / "$" / "'" / "(" / ")" / "+" / "," / ";" / "="
Permitted characters
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
Extension to Universal Character Code (Unicode/ISO 10646)
Internalization Resource Identifier (IRI, RFC 3987)
The schema for an URI is very simple. It starts with a schema that describes the primary access mechanisms and the protocol that is used to receive or to get the address / the name. Then there are mandatory/optional parts, e.g. the information about username and password, the information about port numbers to use or a pathname or a query string, or a fragment identifier. Mandatory or essential is the access schema, then there is a separator. After the separator there is the host name. After the host name there might come access path and maybe also the filename of the document that has to be retrieved from the files server. After the filename additional parameters an be transfered and communicated via a query or specifying a special part of the document.
URI = schema"://"[userinfo@"]host[:port][path]["?"query]["#"fragment]
- schema: e.g. http, ftp, mailto,...
- userinfo: e.g. username:password
- host: e.g. Domain-Name, IPv4/IPv6 Address
- port: e.g. :80 for standard http port
- path: e.g. path in file system of WWW server
- query: e.g. parameters to be passed over to Applications
- fragment: e.g. determines a specific fragment of a document
Therefore and URI consists of:
host or domain name
file system path
query parameters
fragment identifier
The URL() constructor is handy to parse (and validate) URLs in JavaScript.
new URL(relativeOrAbsolute [, absoluteBase]) accepts as first argument an absolute or relative URL. When the first argument is relative, you have to indicate the second argument as an abolsute URL that serves the base for the first argument.
After creating the URL() instance, you can easily access the most common URL components like:
url.search for raw query string
url.searchParams for an instance of URLSearchParams to pick query string parameters
url.hostname to access the hostname
url.pathname to read the pathname
url.hash to determine the hash value
Regarding browser support, URL constructor is available in modern browsers. It is not, however, available in Internet Explorer.
Let’s use this invalid URL to initialize the parser:
try {
const url = new URL('https://example.com');
} catch (error) {
error; // => TypeError, "Failed to construct URL: Invalid URL"
}
Aside from accessing URL components, the properties like search, hostname, pathname, hash are writeable — this you can manipulate the URL.
For example, let’s modify the hostname of an existing URL from red.com to blue.io:
const url = new URL('http://jpaths.com/index.html');
url.href; // => 'http://jpaths.com/index.html'
url.hostname = 'softreck.com';
url.href; // => 'http://softreck.com/path/index.html'