What is Unicode


  • Unicode defines semantics for each character in almost all of the world's written languages, standardizing script behavior, providing a standard algorithm for bidirectional text, and defining cross-mappings to other standards. Unicode-enabled functions are often referred to as wide-character functions because, on platforms such as Windows, each unit of Unicode text is 16 bits wide, which gives separate values for up to 65,536 characters. Later versions of the standard extend beyond this limit: characters outside the first 65,536 are represented in the 16-bit form (UTF-16) by a pair of 16-bit units.
  • Unicode is intended to replace ASCII as the worldwide character encoding standard. Its original design treated every character as having a fixed width of 16 bits (2 bytes), and most characters in common use still fit in a single 16-bit unit. Unicode aims to represent all the world's characters used in modern computing, across different languages, including technical symbols and special characters used in publishing.
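
The 16-bit width described above can be checked directly. The following is only a minimal sketch in Python, not part of this project's code. The helper name `utf16_code_units` and the sample characters are illustrative assumptions. It counts how many 16-bit units a character needs when encoded as UTF-16. Characters from the first 65,536 code points need one unit, while later additions to Unicode need two.

```python
def utf16_code_units(text: str) -> int:
    """Return how many 16-bit code units the text occupies in UTF-16."""
    # "utf-16-le" omits the byte-order mark, so every code unit is exactly 2 bytes.
    return len(text.encode("utf-16-le")) // 2


# Illustrative samples: Latin "A", Bengali letter A, and an emoji beyond U+FFFF.
samples = ["A", "\u0985", "\U0001F600"]

for ch in samples:
    # ord() gives the Unicode code point of a single character.
    print(f"U+{ord(ch):04X} needs {utf16_code_units(ch)} UTF-16 code unit(s)")
```

The last sample is why a program should not assume that one 16-bit value always equals one character.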

Internationalization (i18n)

  • Unicode is the new foundation for the process of internationalization. The older code page concept was very complicated to use for most complex (Asian) languages. Code pages were never designed for such use, and their ad hoc extensions have led to inconsistent definitions for many characters. Internationalizing your code while keeping a single code base is difficult, since you would have to support different character sets with different architectures for different markets. Modern business requirements are even more demanding: programs have to handle characters from a wide variety of languages at the same time, and the EU alone requires several different older character sets to cover all its languages. Mixing older character sets together is a nightmare, since all data has to be tagged with its character set, and mixing data from different sources is nearly impossible to do reliably.
  • With Unicode, a single internationalization process can produce code that handles the requirements of all the world's languages at the same time. Since Unicode has a single definition for each character, you avoid the data corruption problems that plague programs using mixed code sets. Since it handles the characters for all the world's markets in a uniform way, it also avoids the complexities of different character code architectures.
  • Microsoft's products are rapidly being adapted to use Unicode: most of Office 97 is Unicode capable. This is a good illustration. Microsoft first merged its East Asian (Chinese, Japanese and Korean) editions and its US edition into a single program using Unicode, and then merged its Middle East and South Asian support in the newer Office product, MS Office XP. Unicode provides a unique encoding for every character. Once your data is in Unicode, it can all be handled the same way (sorted, searched, and manipulated) without fear of data corruption.
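
The contrast between mixed code pages and a single Unicode representation can be sketched in a few lines of Python. This is only an illustration under stated assumptions. The helper `fits_code_page`, the sample words, and the choice of the legacy code pages `cp1252` (Western European) and `iso8859_7` (Greek) are picked for the example and do not come from any particular product.

```python
# Text from three scripts held together in one ordinary list of strings.
words = ["Unicode", "বাংলা", "Ελληνικά"]


def fits_code_page(text: str, code_page: str) -> bool:
    """Return True if every character of text exists in the given legacy code page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


# Each legacy code page covers only some of the words.
for code_page in ("cp1252", "iso8859_7"):
    print(code_page, [fits_code_page(w, code_page) for w in words])

# One Unicode encoding (UTF-8 here) round-trips all of them unchanged.
round_trip = [w.encode("utf-8").decode("utf-8") for w in words]
print(round_trip == words)

# Searching works the same way regardless of script.
print([w for w in words if "ল" in w])
```

With a legacy code page, each word would need a tag recording which character set it uses. With Unicode, the same list can be stored, searched, and exchanged without such tags.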

Some Unicode-related information on the Bangla language is available at http://bangla.uni.cc/

You can read more about Unicode, or see "What is Unicode" in Bangla.