This is one of the 52 terms in The Language of Localization, published by XML Press in 2017; this term was contributed by Ken Lunde.

What is it?

A character encoding standard that provides a cross-platform, uniform, and robust digital representation of the scripts for the world’s languages.

Why is it important?

Unicode has become the de facto standard for representing the characters of the world's scripts on modern digital devices, which makes Unicode support a prerequisite for handling digital text.

Why does a business professional need to know this?

Unicode[Lunde-Ken 1] provides the foundation for anything related to text data. The Unicode Standard is developed and maintained by the Unicode Consortium. As of Version 10.0, released on June 20, 2017, Unicode encodes 136,690 characters. Its uniform representation of all of them helps ensure that text data interoperates across systems, which in turn supports any text-related task you might encounter, from multilingual user interface (UI) strings to translations of entire manuals.
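To make "uniform representation" concrete, the following minimal Python sketch shows that each character has exactly one Unicode code point, and that the standard encoding forms (UTF-8, UTF-16, UTF-32) are simply different byte serializations of that same code point. The function name `describe` and the sample characters are illustrative choices, not anything defined by Unicode; only Python's built-in `str.encode`, `bytes.decode`, and `ord` are used.

```python
# A minimal sketch: one code point per character, several encoding forms.
samples = ["A", "é", "中", "😀"]  # illustrative characters from four scripts/blocks


def describe(ch: str) -> dict:
    """Return the code point of `ch` and its UTF-8/UTF-16/UTF-32 bytes (big-endian, no BOM)."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf8": ch.encode("utf-8").hex(" "),
        "utf16": ch.encode("utf-16-be").hex(" "),
        "utf32": ch.encode("utf-32-be").hex(" "),
    }


for ch in samples:
    print(ch, describe(ch))
    # Round trip: any Unicode-aware system decoding these bytes recovers the same text.
    assert ch.encode("utf-8").decode("utf-8") == ch
```

Note that U+1F600 needs four bytes in UTF-8 and a surrogate pair (`d8 3d de 00`) in UTF-16, yet it remains a single character with a single code point; that invariance is what lets Unicode-based systems exchange text reliably.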

Any implementation that handles text but does not support Unicode is a completely wasted effort, because its text data cannot easily interoperate with Unicode-based implementations that are now commonplace (see the Unicode FAQ).

It is important to understand that Unicode is much more than a huge bucket of characters covering 139 scripts (including Egyptian hieroglyphs) that are used by an even larger number of the world's languages:

  • Unicode defines a large number of properties that determine how its characters behave.
  • The UCD (Unicode Character Database) is the primary source for these properties, which are documented in UAX #44. They include line breaking, casing, bidirectionality, and inherent (East Asian) width, among others.
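As a rough illustration of character properties in practice, the sketch below queries a few UCD-derived properties through Python's standard `unicodedata` module, plus the full uppercase mapping via `str.upper`. The helper name `properties` is hypothetical. The results reflect whatever UCD version ships with the running Python (reported by `unicodedata.unidata_version`), which may differ from Version 10.0. Python's `unicodedata` does not expose the line-breaking property; a library such as ICU is the usual choice for that.

```python
import unicodedata


def properties(ch: str) -> dict:
    """Look up a few UCD-derived properties of a single character (hypothetical helper)."""
    return {
        "name": unicodedata.name(ch, "<unnamed>"),
        "general_category": unicodedata.category(ch),      # e.g. "Ll" = lowercase letter
        "bidi_class": unicodedata.bidirectional(ch),       # e.g. "L" = left-to-right, "R" = right-to-left
        "east_asian_width": unicodedata.east_asian_width(ch),  # e.g. "W" = wide, "Na" = narrow
        "uppercase": ch.upper(),                           # full case mapping; may yield several characters
    }


for ch in ["a", "ß", "\u05d0", "中"]:  # Latin, Latin, Hebrew alef, CJK ideograph
    print(f"U+{ord(ch):04X}", properties(ch))

print("UCD version bundled with this Python:", unicodedata.unidata_version)
```

The output shows, for example, that Hebrew alef has a right-to-left bidirectional class, that the CJK ideograph is wide, and that uppercasing "ß" yields the two-character string "SS". These are behaviors a Unicode-aware implementation gets from the UCD rather than hard-coding per language.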

Two important projects are closely related to Unicode:

  • ICU (International Components for Unicode), which provides robust libraries that implement many functions for properly handling Unicode-based text data according to the UCD.
  • CLDR (Common Locale Data Repository), which provides an enormous amount of locale data that are used by an increasingly large number of OSes and apps.

Both projects are frequently updated, and Unicode itself is now on an annual release cycle.