charset.school

UTF-32 sandbox (encode)

Convert a Unicode code point into 4 UTF-32 bytes, with a chosen endianness.

Accepts U+XXXX, 0xXX, decimal, or a single character.
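
The accepted input forms could be normalized to a single integer along these lines. This is a minimal sketch, not the page's actual parser: the name `parse_code_point` is hypothetical, and treating a lone ASCII digit as a decimal number rather than as a character is an assumption.

```python
def parse_code_point(text: str) -> int:
    """Turn 'U+1F389', '0x1F389', '127881', or a single character into an int."""
    # Assumption: a lone ASCII digit such as "7" means decimal 7, not the character '7'.
    if len(text) == 1 and not ("0" <= text <= "9"):
        value = ord(text)  # single character: take its code point directly
    else:
        s = text.strip()
        prefix = s[:2].upper()
        if prefix in ("U+", "0X"):
            value = int(s[2:], 16)  # hexadecimal forms
        else:
            value = int(s, 10)      # plain decimal
    # Unicode code points stop at U+10FFFF.
    # Surrogates (U+D800..U+DFFF) are not rejected in this sketch.
    if not 0 <= value <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {text!r}")
    return value


print(parse_code_point("U+1F389"))  # 127881
```

All four spellings of the same code point give the same integer, which is the only thing the encoding steps need.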

Input: U+1F389 (🎉)
Endianness: Little Endian (LE)
Output: 4 bytes

Decimal

127881

Hexadecimal

0x89 0xF3 0x01 0x00

Binary

10001001
11110011
00000001
00000000
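
The three views above can be reproduced from the code point with Python's built-in `int.to_bytes` and format specifiers. The helper name `views` is hypothetical and only illustrates the formatting; it is a sketch, not the page's own code.

```python
def views(code_point: int, byteorder: str = "little"):
    """Return the decimal, hexadecimal, and binary views of the 4 UTF-32 bytes."""
    data = code_point.to_bytes(4, byteorder=byteorder)  # always 4 bytes in UTF-32
    decimal_view = str(code_point)
    hex_view = " ".join(f"0x{b:02X}" for b in data)     # two hex digits per byte
    binary_view = [f"{b:08b}" for b in data]            # eight bits per byte
    return decimal_view, hex_view, binary_view


decimal_view, hex_view, binary_view = views(0x1F389)
print(decimal_view)  # 127881
print(hex_view)      # 0x89 0xF3 0x01 0x00
for row in binary_view:
    print(row)
```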

Step-by-step breakdown

  1. Pick the endianness

    In Little Endian, the low-order (least significant) byte comes first, so the 4 bytes are written in the reverse of the order in which the number is normally read.

    Little Endian (LE)
  2. Convert to 32-bit binary

    UTF-32 always uses 32 bits, regardless of the code point.
    Every code point fits in at most 21 significant bits (Unicode stops at U+10FFFF), so the 11 high bits are always zero. U+1F389 itself needs only 17 bits.
    This padding is what makes UTF-32 simple: no format to guess, no marker, just the code point in its 32 bits.

    00000000000000011111001110001001
  3. Convert to hexadecimal

    Split the binary into 4 groups of 8 bits (one per byte), then write each group as two hex digits.
    The order of the 4 bytes depends on the chosen endianness.

    0x89 0xF3 0x01 0x00
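
The three steps above can be sketched in Python as follows. The function name `encode_utf32` and its `little_endian` flag are hypothetical names chosen for illustration. It deliberately follows the breakdown literally (binary string, 8-bit groups, byte order) rather than taking the shortest route, and it is checked against Python's built-in `utf-32-le` and `utf-32-be` codecs, which write no byte order mark.

```python
def encode_utf32(code_point: int, little_endian: bool = True) -> bytes:
    """Encode one code point as 4 UTF-32 bytes, following the steps above."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")

    # Step 2: pad the code point to exactly 32 bits.
    bits = f"{code_point:032b}"

    # Step 3: split into 4 groups of 8 bits, most significant group first.
    groups = [bits[i:i + 8] for i in range(0, 32, 8)]
    byte_values = [int(group, 2) for group in groups]

    # Step 1: the chosen endianness decides the order the bytes are written in.
    if little_endian:
        byte_values.reverse()  # low-order byte first

    return bytes(byte_values)


encoded = encode_utf32(0x1F389)
print(" ".join(f"0x{b:02X}" for b in encoded))  # 0x89 0xF3 0x01 0x00

# Cross-check with the standard library codecs.
assert encoded == "\U0001F389".encode("utf-32-le")
assert encode_utf32(0x1F389, little_endian=False) == "\U0001F389".encode("utf-32-be")
```

In everyday code, `code_point.to_bytes(4, "little")` or `chr(code_point).encode("utf-32-le")` gives the same bytes; the longer form here only makes each step visible.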

Teaching tool. No tracking, no ads.

Developed by Florent Sorel