MySQL | Character sets
Start your free 7-days trial now!
What is a Character set?
A character set defines the set of characters that can be stored in a string.
Each character is made up of a symbol and its corresponding encoding.
For example if we had an imaginary character set consisting of two characters: A
and a
Symbol | Encoding |
---|---|
A | 0 |
a | 1 |
The combination of the two symbols and their corresponding encoding represents the character set.
Each character set also defines the maximum number of bytes required to store a character from the character set. For some character sets all characters take the same number of bytes to store (e.g. 1 for ASCII), however, for other character sets the bytes required will differ depending on the character (e.g. 1 to 4 for utf8mb4).
Each character set also has at least one collation. A collation defines a set of rules for comparing characters within the character set.
Character Sets in MySQL
The default character set in MySQL is utf8mb4.
We can check the list of full supported character sets in MySQL by running:
SHOW CHARACTER SET;
+----------+---------------------------------+---------------------+--------+| Charset | Description | Default collation | Maxlen |+----------+---------------------------------+---------------------+--------+| armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 || ascii | US ASCII | ascii_general_ci | 1 || big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 || binary | Binary pseudo charset | binary | 1 || cp1250 | Windows Central European | cp1250_general_ci | 1 || cp1251 | Windows Cyrillic | cp1251_general_ci | 1 || cp1256 | Windows Arabic | cp1256_general_ci | 1 || cp1257 | Windows Baltic | cp1257_general_ci | 1 || cp850 | DOS West European | cp850_general_ci | 1 || cp852 | DOS Central European | cp852_general_ci | 1 || cp866 | DOS Russian | cp866_general_ci | 1 || cp932 | SJIS for Windows Japanese | cp932_japanese_ci | 2 || dec8 | DEC West European | dec8_swedish_ci | 1 || eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci | 3 || euckr | EUC-KR Korean | euckr_korean_ci | 2 || gb18030 | China National Standard GB18030 | gb18030_chinese_ci | 4 || gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 || gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 || geostd8 | GEOSTD8 Georgian | geostd8_general_ci | 1 || greek | ISO 8859-7 Greek | greek_general_ci | 1 || hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 || hp8 | HP West European | hp8_english_ci | 1 || keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | 1 || koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 || koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 || latin1 | cp1252 West European | latin1_swedish_ci | 1 || latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 || latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 || latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 || macce | Mac Central European | macce_general_ci | 1 || macroman | Mac West European | macroman_general_ci | 1 || sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 || swe7 | 7bit Swedish | swe7_swedish_ci | 1 || tis620 | TIS620 Thai | tis620_thai_ci | 1 || ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 || ujis | EUC-JP Japanese | ujis_japanese_ci | 3 || utf16 | UTF-16 Unicode | utf16_general_ci | 4 || utf16le | UTF-16LE Unicode | utf16le_general_ci | 4 || utf32 | UTF-32 Unicode | utf32_general_ci | 4 || utf8 | UTF-8 Unicode | utf8_general_ci | 3 || utf8mb4 | UTF-8 Unicode | utf8mb4_0900_ai_ci | 4 |+----------+---------------------------------+---------------------+--------+