Understanding URL Encoding: Percent-Encoding Explained
Learn how URL encoding (percent-encoding) works, which characters must be encoded, and how to handle special characters in URLs and query strings.
What Is URL Encoding
URL encoding, also called percent-encoding, replaces unsafe or reserved characters in URLs with a percent sign (%) followed by two hexadecimal digits representing a byte value (for ASCII characters, this is the character's ASCII code). For example, a space becomes %20, an ampersand (&) becomes %26, and a question mark (?) becomes %3F. This encoding is necessary because URLs can only contain a limited set of characters: letters, digits, a few symbols that are always safe (- _ . ~), and reserved characters that act as delimiters. Characters outside this set, such as spaces and accented letters, must always be encoded, and reserved characters must be encoded whenever they appear as data. Without encoding, a search query like 'mac & cheese' would break the URL structure because & is a query parameter separator.
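The mechanism can be sketched in a few lines of Python. The helper name percent_encode_char is hypothetical and exists only for illustration; urllib.parse.quote is the standard-library function that does the real work.

```python
from urllib.parse import quote


def percent_encode_char(ch: str) -> str:
    """Hypothetical helper: percent-encode every byte of one character."""
    return "".join(f"%{byte:02X}" for byte in ch.encode("utf-8"))


space = percent_encode_char(" ")      # '%20'
ampersand = percent_encode_char("&")  # '%26'
question = percent_encode_char("?")   # '%3F'

# The standard library applies the same rule, skipping unreserved characters.
# safe="" asks quote() to encode every reserved character as well.
query_value = quote("mac & cheese", safe="")  # 'mac%20%26%20cheese'
```

With the ampersand encoded as %26, the value can sit in a query string without being mistaken for a parameter separator.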
Reserved vs. Unreserved Characters
Unreserved characters can appear in URLs without encoding: uppercase and lowercase letters (A-Z, a-z), digits (0-9), and four symbols (- _ . ~). Reserved characters have special meaning in URL syntax and must be encoded when used as data in a place where they would otherwise be read as delimiters: : / ? # [ ] @ ! $ & ' ( ) * + , ; =. For example, the / character separates path segments, so a filename containing a slash must encode it as %2F. The @ symbol separates user info from the host, so an email address embedded in a URL is commonly written with %40 to avoid any ambiguity. The context determines whether a reserved character should be encoded: in its structural role, it stays unencoded; as data content that could be confused with that role, it must be encoded.
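As a rough illustration of "structure versus data", the Python sketch below encodes a filename and an email address for use inside a single path segment. The filename, email address, and /files/ prefix are made-up values.

```python
from urllib.parse import quote

filename = "reports/2024.pdf"  # hypothetical filename that contains a slash
email = "user@example.com"     # hypothetical email address

# By default quote() treats "/" as structural and leaves it alone.
as_path = quote(filename)               # 'reports/2024.pdf'

# Passing safe="" treats "/" as data, so it is encoded.
as_segment = quote(filename, safe="")   # 'reports%2F2024.pdf'
encoded_email = quote(email, safe="")   # 'user%40example.com'

# The slash in "/files/" keeps its structural role; the one inside the
# filename does not.
path = "/files/" + as_segment           # '/files/reports%2F2024.pdf'
```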
URL Encoding in Practice
In JavaScript, encodeURIComponent() encodes everything except unreserved characters and five others (! ' ( ) *), which makes it the right choice for query parameter values. encodeURI() is less aggressive and preserves URL structural characters, so it suits full URLs. In Python, urllib.parse.quote() and quote_plus() handle encoding. quote() leaves / unencoded by default, and quote_plus() encodes spaces as + instead of %20, which is common in HTML form submissions. Most web frameworks handle URL encoding automatically in their routing and URL-building utilities. When constructing URLs manually, always encode user-provided values before inserting them into the URL. Double encoding (%2520 instead of %20) is a common bug that occurs when an already-encoded string is encoded a second time.
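A minimal Python sketch of encoding a user-provided value once, and of what double encoding looks like. The example.com URL and the input string are placeholders.

```python
from urllib.parse import quote, quote_plus

user_value = "mac & cheese"  # hypothetical user input

# Encode the raw value exactly once before placing it in the URL.
encoded_once = quote(user_value, safe="")  # 'mac%20%26%20cheese'
form_style = quote_plus(user_value)        # 'mac+%26+cheese'
url = "https://example.com/search?q=" + encoded_once

# Bug: encoding the already-encoded string turns every "%" into "%25".
encoded_twice = quote(encoded_once, safe="")  # 'mac%2520%2526%2520cheese'
```

A practical way to avoid the bug is to keep values in their raw form throughout the program and encode only at the point where the URL is assembled.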
Unicode and International Characters
Non-ASCII characters (accented letters, CJK characters, emoji) are first converted to their UTF-8 byte representation, then each byte is percent-encoded. The character é (e with an acute accent) is UTF-8 bytes C3 A9, so it becomes %C3%A9 in a URL. Internationalized Domain Names (IDNs) use a separate encoding called Punycode for the domain portion of URLs. Modern browsers display decoded Unicode characters in the address bar for readability while transmitting the encoded form. IRIs (Internationalized Resource Identifiers) extend URIs to allow Unicode characters directly, though percent-encoding remains the standard for HTTP transmission.
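The two steps (UTF-8 bytes first, then percent-encoding) can be checked in Python. The domain name below is only an example, and the built-in "idna" codec used here follows the older IDNA 2003 rules, so treat the last lines as a sketch, not a complete IDN implementation.

```python
from urllib.parse import quote, unquote

char = "é"

utf8_bytes = char.encode("utf-8")  # b'\xc3\xa9'
encoded = quote(char)              # '%C3%A9'
round_trip = unquote(encoded)      # 'é'

# Domain names use Punycode, not percent-encoding.
# Example hostname; Python's built-in codec implements IDNA 2003.
ascii_host = "münchen.de".encode("idna").decode("ascii")  # 'xn--mnchen-3ya.de'
```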
Frequently Asked Questions
What is the difference between encodeURI and encodeURIComponent?
encodeURI encodes a complete URI while preserving characters that have special meaning in URL structure (: / ? # @ etc.). Use it when encoding a full URL. encodeURIComponent also encodes those structural characters, leaving only unreserved characters and ! ' ( ) * untouched. Use it when encoding a value that will be inserted into a URL component like a query parameter. Using encodeURI on a query value can leave & and = unencoded, breaking the parameter structure.
Should spaces be encoded as %20 or +?
Both are valid in different contexts. In the URL path, spaces must be %20, and a + there is a literal plus sign. In query strings (the part after ?), spaces can be either %20 or +. The + encoding comes from the application/x-www-form-urlencoded format used by HTML forms. The %20 encoding is more universally correct and works in all URL components. Most modern APIs accept both in query strings.
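For illustration, Python's urllib.parse.urlencode can produce either style, and parse_qs reads both back to the same value. The parameter name q and its value are placeholders.

```python
from urllib.parse import parse_qs, quote, urlencode

params = {"q": "mac & cheese"}  # hypothetical query parameters

# Default: form-style encoding, spaces become "+".
form_style = urlencode(params)                       # 'q=mac+%26+cheese'

# quote_via=quote switches to %20 for spaces.
percent_style = urlencode(params, quote_via=quote)   # 'q=mac%20%26%20cheese'

# Both forms parse back to the same value.
from_form = parse_qs(form_style)        # {'q': ['mac & cheese']}
from_percent = parse_qs(percent_style)  # {'q': ['mac & cheese']}
```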
How do I decode a URL-encoded string?
In JavaScript, use decodeURIComponent() for individual values or decodeURI() for full URLs. In Python, use urllib.parse.unquote() or unquote_plus(). In your browser, the address bar automatically displays decoded URLs. Be careful with untrusted encoded strings — always validate decoded output before using it in database queries, HTML output, or file system operations to prevent injection attacks.
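A short Python sketch of decoding, followed by one possible validation step. The traversal check is a deliberately simple, hypothetical example and not a complete defense; the right validation depends on where the decoded value is used.

```python
from urllib.parse import unquote, unquote_plus

decoded = unquote("mac%20%26%20cheese")        # 'mac & cheese'
decoded_form = unquote_plus("mac+%26+cheese")  # 'mac & cheese'

# unquote() does not treat "+" as a space; only unquote_plus() does.
plus_kept = unquote("a+b")                     # 'a+b'

# Untrusted input can hide dangerous content behind encoding.
untrusted = "..%2F..%2Fetc%2Fpasswd"
decoded_path = unquote(untrusted)              # '../../etc/passwd'

# Hypothetical check: reject any path that contains a ".." segment.
is_safe = ".." not in decoded_path.split("/")  # False
```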