Why btoa() Breaks on Unicode, and How to Fix It

btoa() throws InvalidCharacterError on emoji and accented text. Here is why Base64 breaks on Unicode and how to encode any string safely.

[Cover image: the words 'Base64 and Unicode' beside btoa, UTF-8, and URL-safe cards.]

Call btoa('Hello 🌍') and the console answers with a red InvalidCharacterError. Plain ASCII works, but an emoji stops it cold, and an accented letter slips through only to come back garbled when the result is decoded as UTF-8. The cause is one old assumption baked into a single function.

Why btoa() stops at the first non-Latin character

The browser’s built-in btoa and atob treat every character as a single byte — a value from 0 to 255. That range only holds Latin-1 characters, and the one assumption produces two different failures.

A globe emoji 🌍 sits at code point U+1F30D, far outside the byte range, so btoa refuses it outright with InvalidCharacterError. An accented é (U+00E9) is sneakier: it fits inside 0–255, so btoa accepts it without complaint — but it stores the raw Latin-1 byte instead of the UTF-8 bytes the rest of the web expects. Decode that result as UTF-8 and you get é. The MDN reference for btoa() spells out the same Latin-1 limitation.

Latin-1 directly vs going through UTF-8

The same input takes two different paths depending on how you feed it in.

Inputbtoa directlyVia UTF-8 bytes
Helloworksworks
caféno error, but decodes to éworks
🌍 (emoji)InvalidCharacterErrorworks

Pure ASCII is identical in Latin-1 and UTF-8, so both paths agree. Everything above U+007F either throws or quietly changes the bytes.

The fix: turn the string into UTF-8 bytes first

The fix is small. Instead of handing the raw string to btoa, run it through TextEncoder to get a UTF-8 byte array, then encode those bytes. To decode, reverse the steps through new TextDecoder('utf-8'). Every character is now split into single bytes, and each byte maps to one character in the 0–255 range, which is exactly what btoa expects.

Drop Hello 🌍 into the PiPi Worlds Base64 tool and you get SGVsbG8g8J+MjQ==. Paste that value straight back, decode it, and Hello 🌍 returns without losing a single byte — because the tool routes through UTF-8 internally. Apply the same principle in your own code and the error disappears.
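
In your own code, the UTF-8 round trip might look like the sketch below. The helper names encodeUtf8ToBase64 and decodeBase64ToUtf8 are hypothetical, chosen for this example; this is one way to apply the principle, not necessarily how the tool implements it.

```javascript
// Encode any string: text -> UTF-8 bytes -> one char per byte -> Base64.
function encodeUtf8ToBase64(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = '';
  for (const byte of bytes) {
    binary += String.fromCharCode(byte); // every value is 0-255, safe for btoa
  }
  return btoa(binary);
}

// Decode: Base64 -> one char per byte -> byte array -> UTF-8 text.
function decodeBase64ToUtf8(base64) {
  const binary = atob(base64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder('utf-8').decode(bytes);
}

const encoded = encodeUtf8ToBase64('Hello 🌍');
console.log(encoded);                     // SGVsbG8g8J+MjQ==
console.log(decodeBase64ToUtf8(encoded)); // Hello 🌍
```

The loop builds the binary string one byte at a time rather than spreading the whole array into String.fromCharCode, which can hit argument-count limits on large inputs.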

Standard vs URL-safe Base64

Standard Base64 uses +, /, and a trailing = for padding. The catch is that those characters mean something else inside a URL or a filename.

That is what the URL-safe variant is for. It swaps + for -, / for _, and strips the = padding. The URL-safe version of the example above is SGVsbG8g8J-MjQ, with the padding gone. The header and payload segments of a JWT are exactly this base64url form, so when you want to read a token, the JWT decoder is the quicker stop. If you need to handle a full value destined for a query string, the URL encoder covers that case.
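
The character swap described above is a plain string transformation. The following sketch uses hypothetical helper names (toBase64Url, fromBase64Url) and assumes the input is already valid Base64 in one alphabet or the other.

```javascript
// Standard -> URL-safe: swap the two problem characters and drop padding.
function toBase64Url(base64) {
  return base64.replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}

// URL-safe -> standard: swap back, then restore padding to a multiple of 4.
function fromBase64Url(base64url) {
  const standard = base64url.replace(/-/g, '+').replace(/_/g, '/');
  const padding = (4 - (standard.length % 4)) % 4;
  return standard + '='.repeat(padding);
}

console.log(toBase64Url('SGVsbG8g8J+MjQ==')); // SGVsbG8g8J-MjQ
console.log(fromBase64Url('SGVsbG8g8J-MjQ')); // SGVsbG8g8J+MjQ==
```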

Before you paste: where does that token go?

Base64 is not encryption. It is a reversible encoding that anyone can undo. So when you decode a production access token or a credential, any online tool that ships your input to an unknown server becomes a leak path in its own right.

The PiPi Worlds Base64 tool runs encoding and decoding entirely in your browser, so the value you paste never leaves the page — safe for tokens and private snippets. Decoding auto-detects both standard and URL-safe input and quietly cleans up stray whitespace or line breaks. Paste the value you received and read it back, with no broken characters to chase.
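
A tolerant decoder along these lines can be sketched as follows. This is an assumption about how such input clean-up could work, not the tool's actual code, and decodeLenient is a hypothetical name.

```javascript
// Accepts standard or URL-safe Base64, with or without padding or whitespace.
function decodeLenient(input) {
  // Remove spaces, tabs, and line breaks picked up while copying.
  const compact = input.replace(/\s+/g, '');
  // Map the URL-safe alphabet onto the standard one and drop any padding.
  const standard = compact.replace(/-/g, '+').replace(/_/g, '/').replace(/=+$/, '');
  // Re-pad so the length is a multiple of 4 before calling atob.
  const padded = standard + '='.repeat((4 - (standard.length % 4)) % 4);
  const bytes = Uint8Array.from(atob(padded), (ch) => ch.charCodeAt(0));
  return new TextDecoder('utf-8').decode(bytes);
}

console.log(decodeLenient('SGVsbG8g\n8J+MjQ==')); // Hello 🌍
console.log(decodeLenient(' SGVsbG8g8J-MjQ '));   // Hello 🌍
```

Because `-` and `_` never appear in standard Base64, and `+` and `/` never appear in the URL-safe form, mapping both onto one alphabet needs no separate detection step.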

Frequently asked questions

Why does btoa() throw InvalidCharacterError on emoji or accented text?
The browser's `btoa` assumes every character is one byte (0–255), which only covers Latin-1. An emoji like 🌍 sits far above that range, so `btoa` rejects it with `InvalidCharacterError`. An accented letter like é stays inside the range, so `btoa` accepts it but writes a single Latin-1 byte instead of the UTF-8 bytes other tools expect. Either way, convert the string to UTF-8 bytes first, then encode.
My decoded text came out as "Ã©" instead of "é". How do I fix it?
The Base64 string itself is fine — the garbled result comes from decoding the bytes as Latin-1 instead of UTF-8. Decode the same value as UTF-8 and the original characters return. The PiPi Worlds Base64 tool decodes as UTF-8, so Unicode round-trips cleanly.
What is the difference between standard and URL-safe Base64?
Standard Base64 uses `+`, `/`, and `=` padding. The URL-safe variant swaps `+` for `-`, `/` for `_`, and drops the `=` padding, so the result is safe inside URLs, JWT segments, and filenames.
Does Base64 make data larger?
Yes. Base64 represents every 3 bytes as 4 ASCII characters, so output grows by roughly 33%. That overhead is the cost of moving binary safely as text.
Is the text I paste sent to a server?
No. The PiPi Worlds Base64 tool runs encoding and decoding entirely in your browser, so your input is never sent to or stored on a server.
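
For the size question above, the 3-bytes-to-4-characters rule gives a simple formula for padded output length. The sketch below is illustrative only; base64Length is a hypothetical helper name.

```javascript
// Padded Base64 length: every group of up to 3 bytes becomes 4 characters.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

// "Hello 🌍" is 10 bytes in UTF-8: 6 ASCII bytes plus 4 for the emoji.
const utf8Bytes = new TextEncoder().encode('Hello 🌍');

console.log(utf8Bytes.length);               // 10
console.log(base64Length(utf8Bytes.length)); // 16, the length of SGVsbG8g8J+MjQ==
console.log(base64Length(3000));             // 4000, about 33% larger
```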

Sources

Written by the PiFl Labs content team from public sources and reviewed in-house before publishing.
