charset.school

UTF-8 sandbox (decode)

Convert a sequence of UTF-8 bytes into a Unicode code point.

Enter the bytes in hex (with or without the 0x prefix), separated by spaces or commas, or with no separator at all.
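The accepted input formats can be handled with a small parser. The following is a minimal sketch, not the sandbox's actual code: the function name `parse_hex_bytes` is hypothetical, and treating a lone digit such as `9` as the byte `0x09` is an assumption.

```python
import re


def parse_hex_bytes(text: str) -> bytes:
    """Parse hex bytes written with or without 0x, separated by
    spaces, commas, or nothing at all (hypothetical helper)."""
    # Split on runs of whitespace and/or commas; drop empty tokens.
    tokens = [t for t in re.split(r"[\s,]+", text.strip()) if t]
    digits = []
    for token in tokens:
        # Remove any 0x / 0X prefixes inside the token.
        token = re.sub(r"0[xX]", "", token)
        # Assumption: a single digit like "9" means the byte 0x09.
        if len(token) == 1:
            token = "0" + token
        digits.append(token)
    # bytes.fromhex raises ValueError on non-hex or odd-length input.
    return bytes.fromhex("".join(digits))
```

For example, `parse_hex_bytes("0xC3 0xA9")`, `parse_hex_bytes("C3,A9")` and `parse_hex_bytes("c3a9")` all yield the same two bytes.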

U+00E9 é
2 bytes

Decimal

233

Bytes

0xC3 0xA9

Binary

11000011
10101001
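The values shown above can be cross-checked with Python's built-in UTF-8 decoder. This is only a sketch for verification; the variable names are illustrative and are not taken from the sandbox.

```python
# The two bytes from the example above.
data = bytes([0xC3, 0xA9])

# Strict decoding: raises UnicodeDecodeError on invalid UTF-8.
char = data.decode("utf-8")
code_point = ord(char)

# Reproduce the notations displayed by the sandbox.
unicode_notation = f"U+{code_point:04X}"          # Unicode notation
hex_row = " ".join(f"0x{b:02X}" for b in data)    # bytes in hex
binary_rows = [f"{b:08b}" for b in data]          # one 8-bit row per byte

print(unicode_notation, char, code_point)
print(hex_row)
print("\n".join(binary_rows))
```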

Step-by-step breakdown

  1. Identify the byte count

    The first byte matches 110xxxxx (two 1s, then a 0), which marks the 2-byte form.

    2 bytes · U+0080 → U+07FF
  2. Extract the data bits

    For each byte, strip the format marker (110/1110/11110 on the leading byte, 10 on continuation bytes). What remains are the data bits.

    00011 | 101001
  3. Reassemble the binary

    Concatenate the groups to rebuild the code point's binary value (11 data bits in the 2-byte form).

    00011101001
  4. Convert to a code point

    The binary equals 233 in decimal (E9 in hexadecimal), i.e. U+00E9 in Unicode notation.

    U+00E9
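
The four steps above can be sketched in code. This is a minimal illustration, not the sandbox's own implementation: the name `decode_utf8_char` is hypothetical, and the extra checks (overlong forms, surrogates, values above U+10FFFF) are additions that follow the usual UTF-8 well-formedness rules rather than anything stated on this page.

```python
def decode_utf8_char(data: bytes) -> int:
    """Decode exactly one UTF-8 encoded character; return its code point."""
    if not data:
        raise ValueError("empty input")
    lead = data[0]

    # Step 1: identify the byte count from the leading byte's marker,
    # keeping only the leading byte's data bits.
    if (lead & 0b1000_0000) == 0b0000_0000:      # 0xxxxxxx
        length, code_point = 1, lead
    elif (lead & 0b1110_0000) == 0b1100_0000:    # 110xxxxx
        length, code_point = 2, lead & 0b0001_1111
    elif (lead & 0b1111_0000) == 0b1110_0000:    # 1110xxxx
        length, code_point = 3, lead & 0b0000_1111
    elif (lead & 0b1111_1000) == 0b1111_0000:    # 11110xxx
        length, code_point = 4, lead & 0b0000_0111
    else:
        raise ValueError(f"invalid leading byte 0x{lead:02X}")

    if len(data) != length:
        raise ValueError(f"expected {length} bytes, got {len(data)}")

    # Steps 2 and 3: strip the 10 marker from each continuation byte
    # and append its 6 data bits to the value built so far.
    for byte in data[1:]:
        if (byte & 0b1100_0000) != 0b1000_0000:  # must be 10xxxxxx
            raise ValueError(f"invalid continuation byte 0x{byte:02X}")
        code_point = (code_point << 6) | (byte & 0b0011_1111)

    # Extra checks (assumption: strict decoding is wanted).
    smallest = (0x0, 0x80, 0x800, 0x10000)[length - 1]
    if code_point < smallest:
        raise ValueError("overlong encoding")
    if 0xD800 <= code_point <= 0xDFFF or code_point > 0x10FFFF:
        raise ValueError("not a valid Unicode scalar value")

    # Step 4: the accumulated integer is the code point.
    return code_point


cp = decode_utf8_char(bytes([0xC3, 0xA9]))
print(cp, f"U+{cp:04X}")
```

The sketch works on integers rather than bit strings, so the shift by 6 plays the role of the concatenation in step 3.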

Teaching tool. No tracking, no ads.

Developed by Florent Sorel