charset.school
Encode UTF-16

UTF-16 sandbox (encode)

Convert a Unicode code point into UTF-16 bytes, with a chosen endianness.

Accepts U+XXXX, 0xXX, decimal, or a single character.
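A minimal sketch of how the four accepted input forms could be normalised to one integer code point. The function name `parse_code_point` is hypothetical, and the handling of ambiguous input (a lone digit such as `5` is read as decimal, not as the character "5") is an assumption, not necessarily what this page does.

```python
def parse_code_point(text: str) -> int:
    """Turn 'U+1F389', '0x1F389', '127881' or a single character into a code point."""
    # Assumption: one character that is not an ASCII digit stands for itself.
    if len(text) == 1 and not ("0" <= text <= "9"):
        return ord(text)

    s = text.strip()
    prefix = s[:2].upper()
    if prefix in ("U+", "0X"):
        value = int(s[2:], 16)   # hexadecimal after the prefix
    else:
        value = int(s, 10)       # plain decimal

    # Unicode code points run from U+0000 to U+10FFFF.
    if not 0 <= value <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {text!r}")
    return value


print(parse_code_point("U+1F389"))
print(parse_code_point("\U0001F389"))
```

Both calls print the same integer, because all four notations describe the same code point.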

Endianness: Little Endian (LE)
Input: U+1F389 🎉
Output: 4 bytes

Decimal

127881

Hexadecimal

0x3C 0xD8 0x89 0xDF

Binary

00111100
11011000
10001001
11011111
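As a cross-check, the three views above can probably be reproduced with Python's built-in `utf-16-le` codec, which writes no byte order mark. This is only a verification sketch; the variable names are illustrative.

```python
ch = "\U0001F389"  # the party popper emoji, U+1F389

data = ch.encode("utf-16-le")  # little-endian UTF-16, no BOM

decimal_view = ord(ch)
hex_view = " ".join(f"0x{b:02X}" for b in data)
binary_view = [f"{b:08b}" for b in data]

print(decimal_view)  # 127881
print(hex_view)      # 0x3C 0xD8 0x89 0xDF
for bits in binary_view:
    print(bits)
```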

Step-by-step breakdown

  1. Pick the endianness

    The low-order byte of each code unit comes first (Little Endian).

    Little Endian (LE)
  2. Pick the UTF-16 form

    Range U+10000 to U+10FFFF: beyond the Basic Multilingual Plane (BMP), in the supplementary planes (emoji, historic scripts, rare CJK characters, ...). Encoded as a surrogate pair: 2 code units = 4 bytes.

    2 code units · surrogate pair
  3. Convert to binary

    Subtract 0x10000 from the code point and write the result as a 20-bit binary number: 0x1F389 - 0x10000 = 0xF389.

    00001111001110001001
  4. Split for the surrogates

    Split the 20 bits into 2 groups of 10 bits.
    The left 10 bits (high-order) form the high surrogate; the right 10 bits (low-order) form the low surrogate.
    Each 10-bit group is an integer between 0 and 1,023 (10 bits give 2¹⁰ = 1,024 values). Add it to the surrogate base: 0xD800 for the high surrogate, 0xDC00 for the low surrogate. Here, 0xD800 + 0x03C = 0xD83C and 0xDC00 + 0x389 = 0xDF89.

    0000111100 | 1110001001
  5. Convert to hexadecimal

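    The 2 code units, 0xD83C (high surrogate) and 0xDF89 (low surrogate), yield 4 bytes. The high surrogate always comes first; endianness only sets the byte order within each code unit, so Little Endian gives 3C D8, then 89 DF.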

    0x3C 0xD8 0x89 0xDF
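The five steps above can be sketched as one small function. This is an illustrative implementation, not the page's own code: the name `encode_utf16` and its `little_endian` parameter are hypothetical, and the single-code-unit branch for code points below U+10000 is an assumption based on standard UTF-16, since the walkthrough only shows the surrogate-pair case.

```python
def encode_utf16(code_point: int, little_endian: bool = True) -> bytes:
    """Encode one code point as UTF-16 bytes without a byte order mark."""
    # Surrogate code points (U+D800 to U+DFFF) cannot be encoded on their own.
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"cannot encode U+{code_point:04X}")

    # Step 1: pick the endianness.
    byteorder = "little" if little_endian else "big"

    # Step 2: pick the UTF-16 form.
    if code_point < 0x10000:
        # Assumed BMP case: the code point is its own single code unit.
        units = [code_point]
    else:
        # Step 3: subtract 0x10000, leaving a 20-bit value.
        offset = code_point - 0x10000
        # Step 4: top 10 bits go to the high surrogate, bottom 10 to the low.
        high = 0xD800 + (offset >> 10)
        low = 0xDC00 + (offset & 0x3FF)
        units = [high, low]

    # Step 5: write each 16-bit code unit as 2 bytes in the chosen order.
    return b"".join(unit.to_bytes(2, byteorder) for unit in units)


le = encode_utf16(0x1F389)
be = encode_utf16(0x1F389, little_endian=False)
print(" ".join(f"0x{b:02X}" for b in le))  # 0x3C 0xD8 0x89 0xDF
print(" ".join(f"0x{b:02X}" for b in be))  # 0xD8 0x3C 0xDF 0x89
```

Switching to Big Endian swaps the two bytes inside each code unit but leaves the high surrogate ahead of the low surrogate.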

Teaching tool. No tracking, no ads.

Developed by Florent Sorel