UTF-32
/juː-ti-ɛf ˈθɜːrti-tuː/
noun — "a fixed-length Unicode encoding using 32-bit units."
UTF-32 (Unicode Transformation Format, 32-bit) is a character encoding standard that represents every Unicode code point using a fixed 32-bit code unit. Unlike variable-length encodings such as UTF-8 or UTF-16, each Unicode character in UTF-32 is stored in exactly 4 bytes, providing simple and direct access to any character without the need for parsing multiple bytes or surrogate pairs.
Technically, UTF-32 works by storing each code point directly as its numeric value in a single 32-bit unit. Because Unicode code points range from U+0000 to U+10FFFF, the upper eleven bits of every unit are always zero; byte order is either fixed by the UTF-32BE and UTF-32LE variants or signaled with a byte order mark (BOM).
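A minimal Python sketch of the fixed-width property; the sample string is arbitrary, and the codec name "utf-32-le" is Python's standard spelling for the little-endian variant:

    # A minimal sketch of UTF-32's fixed-width property, using Python's
    # built-in codecs; the sample string is arbitrary.
    import struct

    text = "Aπ🌍"  # ASCII letter, BMP character, supplementary character

    encoded = text.encode("utf-32-le")    # little-endian, no byte order mark
    assert len(encoded) == 4 * len(text)  # every code point takes exactly 4 bytes

    # Fixed width means the i-th character starts at byte offset 4*i:
    for i, ch in enumerate(text):
        (code_point,) = struct.unpack_from("<I", encoded, 4 * i)
        assert code_point == ord(ch)
        print(f"U+{code_point:06X} {ch!r}")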
Unicode Transformation Format
/juː-ti-ɛf/
noun — "a family of Unicode Transformation Format encodings."
UTF (Unicode Transformation Format) refers collectively to a set of character encoding schemes designed to represent Unicode code points as sequences of bytes or code units. Each UTF variant defines a method to convert the abstract numeric code points of Unicode into a binary format suitable for storage, transmission, and processing in digital systems. The most common UTFs are UTF-8, UTF-16, and UTF-32, each with different characteristics optimized for efficiency, compatibility, or simplicity.
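As an illustration of how the variants trade size for simplicity, the following Python sketch encodes the same characters with each of the three common UTFs (the sample characters are arbitrary; the -le codec variants are used so no byte order mark inflates the counts):

    # A comparison sketch: the same code points under the three common UTFs.
    for ch in ("A", "é", "€", "🌍"):
        sizes = {enc: len(ch.encode(enc))
                 for enc in ("utf-8", "utf-16-le", "utf-32-le")}
        print(f"U+{ord(ch):06X} {ch!r}: {sizes}")
    # UTF-8 uses 1-4 bytes, UTF-16 uses 2 or 4, and UTF-32 always uses 4.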
UTF-16
/juː-ti-ɛf sɪksˈtiːn/
noun — "a fixed- or variable-length encoding for Unicode using 16-bit units."
UTF-8
/juː-ti-ɛf eɪt/
noun — "a variable-length encoding for Unicode characters."
UTF-8 (Unicode Transformation Format, 8-bit) is a character encoding system that represents every Unicode code point using sequences of 1 to 4 bytes. It is designed to be backward-compatible with ASCII, efficient for storage, and fully capable of representing every character defined in the Unicode standard. UTF-8 has become the dominant encoding for web content, software, and data interchange because it combines compatibility, compactness, and universality.
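A small Python sketch of these two headline properties (the sample characters are arbitrary): pure ASCII text encodes byte-for-byte identically, and higher code points expand to multi-byte sequences:

    # A sketch of UTF-8's ASCII compatibility and 1-to-4-byte sequences.
    assert "hello".encode("utf-8") == b"hello"  # ASCII is unchanged

    for ch in ("A", "é", "€", "🌍"):
        data = ch.encode("utf-8")
        print(f"U+{ord(ch):06X} -> {len(data)} byte(s): {data.hex(' ')}")
    # U+000041 -> 1 byte(s): 41
    # U+0000E9 -> 2 byte(s): c3 a9
    # U+0020AC -> 3 byte(s): e2 82 ac
    # U+01F30D -> 4 byte(s): f0 9f 8c 8d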
Character Encoding
/ˈkærɪktər ɛnˈkoʊdɪŋ/
noun — "the method of representing characters as digital data."
Unicode
/ˈjuːnɪˌkoʊd/
noun — "a universal standard for encoding, representing, and handling text."
Unicode is a computing industry standard designed to provide a consistent and unambiguous way to encode, represent, and manipulate text from virtually all writing systems in use today. It assigns a unique code point — a numeric value — to every character, symbol, emoji, or diacritical mark, enabling computers and software to interchange text across different platforms, languages, and devices without loss of meaning or corruption.
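A brief Python sketch of code points as plain numbers, independent of any byte encoding (ord, chr, and the unicodedata module are Python built-ins):

    # A sketch of Unicode code points as numbers, independent of encoding.
    import unicodedata

    assert ord("A") == 0x41        # character -> code point
    assert chr(0x1F600) == "😀"    # code point -> character (U+1F600)
    assert "\u00e9" == "é"         # escape sequences name code points directly

    print(unicodedata.name("é"))   # LATIN SMALL LETTER E WITH ACUTE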