MySQL Character Sets

Summary: in this tutorial, we’ll explore MySQL character sets and their importance in how MySQL stores text data.

In MySQL, the character set and collation are essential concepts that define how data is stored and sorted in text columns.

Understanding character sets and collations are crucial for designing databases that can handle different languages and text requirements.

Introduction to MySQL Character Sets

A character set is a collection of characters with a unique encoding. It defines the set of characters that can be used in a text column, such as letters, numbers, symbols, and special characters.

MySQL supports various character sets, and the choice of character set determines the range of characters that can be stored in a column.

Common character sets include utf8, utf8mb4, latin1, utf16, and many others. To list all character sets in the current MySQL server, you use the following statement:

SHOW CHARACTER SET;Code language: SQL (Structured Query Language) (sql)

Here’s the partial output:

+----------+---------------------------------+---------------------+--------+
| Charset  | Description                     | Default collation   | Maxlen |
+----------+---------------------------------+---------------------+--------+
| armscii8 | ARMSCII-8 Armenian              | armscii8_general_ci |      1 |
| ascii    | US ASCII                        | ascii_general_ci    |      1 |
| big5     | Big5 Traditional Chinese        | big5_chinese_ci     |      2 |
| binary   | Binary pseudo charset           | binary              |      1 |
...Code language: PHP (php)

The Maxlen column specifies the number of bytes that a character in a character set holds.

Some character sets contain single-byte characters e.g., latin1 , latin2 , cp850 , etc., whereas other character sets contain multi-byte characters.

The default character sets are utf8mb4 and utf8mb4_0900_ai_ci. However, you can change them accordingly.

Converting between different character sets

MySQL provides two functions that allow you to convert strings between different character sets: CONVERT and CAST.

The syntax of the CONVERT function is as follows:

CONVERT(expression USING character_set_name)Code language: SQL (Structured Query Language) (sql)

The CAST function is similar to the CONVERT function. It converts a string to a different character set:

CAST(string AS character_type CHARACTER SET character_set_name)Code language: SQL (Structured Query Language) (sql)

Take a look at the following example of using the CAST function:

SELECT CAST(_latin1'MySQL character set' AS CHAR CHARACTER SET utf8);Code language: SQL (Structured Query Language) (sql)

Setting character sets for client connections

To configure a character set for a client connection, you can do one of the following ways:

1) Using the SET NAMES statement

Issue the SET NAME  statement after the client connected to the MySQL database server. For example, to set a Unicode character set utf8mb4, you use the following statement:

SET NAMES 'utf8mb4';Code language: SQL (Structured Query Language) (sql)

2) Using –default-character-set option

If the application supports the --default-character-set  option, you can use it to set the character set.

For example, mysql client tool supports --default-character-set option and you can set it up in the configuration file as follows:

[mysql]
default-character-set=utf8mb4Code language: plaintext (plaintext)

3) Using the charset in connection strings

Some MySQL connectors allow you to specify a character set. For example, if you use PHP PDO, you can set the character set in the data source name as follows:

$dsn ="mysql:host=$host;dbname=$db;charset=utf8mb4";Code language: PHP (php)

Regardless of which way you use it, make sure that the character set used by the application matches the character set stored in the MySQL database server.

Summary

  • Character sets define how MySQL stores data in text columns.
  • Use the SHOW CHARACTER SET statement to list all character sets.
  • Use the CONVERT() or CAST() function to convert between character sets.
Was this tutorial helpful?