MySQL Internals - Character Sets and Collations
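Every computer needs a character set to store data, because everything a computer stores is ultimately binary. The mapping from each character to its binary code is the character encoding (character set). And how should these characters be ordered? The rules that decide how characters sort and compare are the collation.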
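To make the "character to binary code" mapping concrete, here is a minimal Java sketch (Java because the client-side example later in this article is a Java program). The class CharsetMapping and its toHex helper are hypothetical names made up for this illustration; only the standard java.nio.charset API is used, and the GBK charset is assumed to be available, as it is in standard JDK builds. The sketch encodes the same text with UTF-8 and with GBK and prints the bytes in hex.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: shows that the same characters map to different
// byte sequences depending on the character set used.
class CharsetMapping {

    // Encodes the text with the given charset and renders each byte as two hex digits.
    static String toHex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "A" followed by the Chinese character U+4E2D
        String text = "A\u4e2d";
        Charset gbk = Charset.forName("GBK");
        System.out.println("UTF-8: " + toHex(text, StandardCharsets.UTF_8));
        System.out.println("GBK:   " + toHex(text, gbk));
        // Decoding with the same charset that was used for encoding restores the text.
        System.out.println(new String(text.getBytes(gbk), gbk).equals(text));
    }
}
```

The letter A is the single byte 41 under both charsets, while the Chinese character U+4E2D becomes E4 B8 AD in UTF-8 but D6 D0 in GBK: the same character, mapped to different bytes.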
Viewing the built-in character sets and collations
The show charset; command lists all character sets.
Only the commonly used ones are shown below:
+----------+---------------------------------+---------------------+--------+
| Charset  | Description                     | Default collation   | Maxlen |
+----------+---------------------------------+---------------------+--------+
| latin1   | cp1252 West European            | latin1_swedish_ci   |      1 |
| ascii    | US ASCII                        | ascii_general_ci    |      1 |
| gb2312   | GB2312 Simplified Chinese       | gb2312_chinese_ci   |      2 |
| cp1250   | Windows Central European        | cp1250_general_ci   |      1 |
| gbk      | GBK Simplified Chinese          | gbk_chinese_ci      |      2 |
| utf8     | UTF-8 Unicode                   | utf8_general_ci     |      3 |
| utf8mb4  | UTF-8 Unicode                   | utf8mb4_general_ci  |      4 |
| utf16    | UTF-16 Unicode                  | utf16_general_ci    |      4 |
| utf32    | UTF-32 Unicode                  | utf32_general_ci    |      4 |
+----------+---------------------------------+---------------------+--------+
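ascii: 128 characters in total, including space, punctuation, digits, upper- and lowercase letters, and some non-printing characters. With only 128 characters, each one fits in a single byte.
latin1: 256 characters in total. It extends ASCII with 128 more characters common in Western Europe (including German and French letters) and also uses one byte per character. (As the table shows, MySQL's latin1 is actually cp1252.)
gb2312: covers Chinese characters plus Latin letters, Greek letters, Japanese hiragana and katakana, and Russian Cyrillic letters: 6,763 Chinese characters and 682 other symbols, and it is ASCII-compatible. It is a variable-length character set: a character in the ASCII set takes 1 byte, anything else takes 2 bytes.
gbk: GBK is an extension of gb2312 that adds many more Chinese characters, including traditional ones. It is likewise variable-length: 1 byte for characters in the ASCII set, 2 bytes for everything else.
utf8 and utf8mb4: Unicode aims to include every character in use anywhere in the world, and it keeps growing. UTF-8 is ASCII-compatible and variable-length, using 1 to 4 bytes per character. To save space, MySQL's utf8 (an alias of utf8mb3) is a cut-down UTF-8 that only stores characters encoded in 1 to 3 bytes, which covers nearly all commonly used characters. To store emoji, use utf8mb4, which is complete UTF-8.
utf16: unlike utf8, utf16 encodes each character in 2 or 4 bytes (4 bytes for characters outside the Basic Multilingual Plane). It encodes the same Unicode characters as utf8, but ASCII text takes twice as much space.
utf32: encodes every character in a fixed 4 bytes, the least space-efficient way to encode the same Unicode characters.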
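The byte counts above can be checked with a short Java sketch. It assumes Java's charset names are reasonable stand-ins for MySQL's (US-ASCII for ascii, windows-1252 for latin1, GBK, UTF-8, UTF-16BE for utf16, UTF-32BE for utf32); the big-endian variants are used so that no byte-order mark is counted. EncodedLengths, byteLength and fitsUtf8mb3 are hypothetical names; fitsUtf8mb3 encodes the rule stated above that MySQL's utf8 (utf8mb3) only holds characters up to U+FFFF, which are exactly the ones UTF-8 encodes in at most 3 bytes.

```java
import java.nio.charset.Charset;

// Hypothetical helper: counts how many bytes a string needs in several charsets.
class EncodedLengths {

    // Java names used as stand-ins for the MySQL charsets above (MySQL's latin1 is cp1252).
    static final String[] CHARSETS = {
        "US-ASCII", "windows-1252", "GBK", "UTF-8", "UTF-16BE", "UTF-32BE"
    };

    // Returns the encoded length in bytes, or -1 if the charset cannot encode the text.
    static int byteLength(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        if (!cs.newEncoder().canEncode(text)) {
            return -1;
        }
        return text.getBytes(cs).length;
    }

    // utf8mb3 stores at most 3 bytes per character, i.e. only code points up to U+FFFF;
    // anything above (most emoji, for example) needs utf8mb4.
    static boolean fitsUtf8mb3(String text) {
        return text.codePoints().allMatch(cp -> cp <= 0xFFFF);
    }

    public static void main(String[] args) {
        // A, e with acute accent, the Chinese character U+4E2D, and the emoji U+1F600
        String[] samples = {"A", "\u00e9", "\u4e2d", "\uD83D\uDE00"};
        for (String s : samples) {
            StringBuilder row = new StringBuilder(
                    "U+" + Integer.toHexString(s.codePointAt(0)).toUpperCase());
            for (String name : CHARSETS) {
                int n = byteLength(s, name);
                row.append("  ").append(name).append('=').append(n < 0 ? "n/a" : String.valueOf(n));
            }
            row.append("  utf8mb3-ok=").append(fitsUtf8mb3(s));
            System.out.println(row);
        }
    }
}
```

Running main shows, for example, that U+4E2D needs 3 bytes in UTF-8 but 2 in GBK, and that the emoji U+1F600 needs 4 bytes in UTF-8 and therefore does not fit in utf8mb3.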
The information_schema.character_sets table lists all character sets as well:
mysql> select * from information_schema.character_sets where character_set_name = "utf8";
+--------------------+----------------------+---------------+--------+
| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION   | MAXLEN |
+--------------------+----------------------+---------------+--------+
| utf8               | utf8_general_ci      | UTF-8 Unicode |      3 |
+--------------------+----------------------+---------------+--------+
1 row in set (0.06 sec)
The show collation; command lists all collations. Here we look at the collations of utf8mb4:
mysql> show collation like 'utf8mb4%';
+------------------------+---------+-----+---------+----------+---------+
| Collation              | Charset | Id  | Default | Compiled | Sortlen |
+------------------------+---------+-----+---------+----------+---------+
| utf8mb4_general_ci     | utf8mb4 |  45 | Yes     | Yes      |       1 |
| utf8mb4_bin            | utf8mb4 |  46 |         | Yes      |       1 |
| utf8mb4_unicode_ci     | utf8mb4 | 224 |         | Yes      |       8 |
| utf8mb4_icelandic_ci   | utf8mb4 | 225 |         | Yes      |       8 |
| utf8mb4_latvian_ci     | utf8mb4 | 226 |         | Yes      |       8 |
| utf8mb4_romanian_ci    | utf8mb4 | 227 |         | Yes      |       8 |
| utf8mb4_slovenian_ci   | utf8mb4 | 228 |         | Yes      |       8 |
| utf8mb4_polish_ci      | utf8mb4 | 229 |         | Yes      |       8 |
| utf8mb4_estonian_ci    | utf8mb4 | 230 |         | Yes      |       8 |
| utf8mb4_spanish_ci     | utf8mb4 | 231 |         | Yes      |       8 |
| utf8mb4_swedish_ci     | utf8mb4 | 232 |         | Yes      |       8 |
| utf8mb4_turkish_ci     | utf8mb4 | 233 |         | Yes      |       8 |
| utf8mb4_czech_ci       | utf8mb4 | 234 |         | Yes      |       8 |
| utf8mb4_danish_ci      | utf8mb4 | 235 |         | Yes      |       8 |
| utf8mb4_lithuanian_ci  | utf8mb4 | 236 |         | Yes      |       8 |
| utf8mb4_slovak_ci      | utf8mb4 | 237 |         | Yes      |       8 |
| utf8mb4_spanish2_ci    | utf8mb4 | 238 |         | Yes      |       8 |
| utf8mb4_roman_ci       | utf8mb4 | 239 |         | Yes      |       8 |
| utf8mb4_persian_ci     | utf8mb4 | 240 |         | Yes      |       8 |
| utf8mb4_esperanto_ci   | utf8mb4 | 241 |         | Yes      |       8 |
| utf8mb4_hungarian_ci   | utf8mb4 | 242 |         | Yes      |       8 |
| utf8mb4_sinhala_ci     | utf8mb4 | 243 |         | Yes      |       8 |
| utf8mb4_german2_ci     | utf8mb4 | 244 |         | Yes      |       8 |
| utf8mb4_croatian_ci    | utf8mb4 | 245 |         | Yes      |       8 |
| utf8mb4_unicode_520_ci | utf8mb4 | 246 |         | Yes      |       8 |
| utf8mb4_vietnamese_ci  | utf8mb4 | 247 |         | Yes      |       8 |
+------------------------+---------+-----+---------+----------+---------+
26 rows in set (0.13 sec)
Likewise, you can query information_schema.collations:
mysql> select * from information_schema.collations where character_set_name = "utf8mb4";
+------------------------+--------------------+-----+------------+-------------+---------+
| COLLATION_NAME         | CHARACTER_SET_NAME | ID  | IS_DEFAULT | IS_COMPILED | SORTLEN |
+------------------------+--------------------+-----+------------+-------------+---------+
| utf8mb4_general_ci     | utf8mb4            |  45 | Yes        | Yes         |       1 |
| utf8mb4_bin            | utf8mb4            |  46 |            | Yes         |       1 |
| utf8mb4_unicode_ci     | utf8mb4            | 224 |            | Yes         |       8 |
| utf8mb4_icelandic_ci   | utf8mb4            | 225 |            | Yes         |       8 |
| utf8mb4_latvian_ci     | utf8mb4            | 226 |            | Yes         |       8 |
| utf8mb4_romanian_ci    | utf8mb4            | 227 |            | Yes         |       8 |
| utf8mb4_slovenian_ci   | utf8mb4            | 228 |            | Yes         |       8 |
| utf8mb4_polish_ci      | utf8mb4            | 229 |            | Yes         |       8 |
| utf8mb4_estonian_ci    | utf8mb4            | 230 |            | Yes         |       8 |
| utf8mb4_spanish_ci     | utf8mb4            | 231 |            | Yes         |       8 |
| utf8mb4_swedish_ci     | utf8mb4            | 232 |            | Yes         |       8 |
| utf8mb4_turkish_ci     | utf8mb4            | 233 |            | Yes         |       8 |
| utf8mb4_czech_ci       | utf8mb4            | 234 |            | Yes         |       8 |
| utf8mb4_danish_ci      | utf8mb4            | 235 |            | Yes         |       8 |
| utf8mb4_lithuanian_ci  | utf8mb4            | 236 |            | Yes         |       8 |
| utf8mb4_slovak_ci      | utf8mb4            | 237 |            | Yes         |       8 |
| utf8mb4_spanish2_ci    | utf8mb4            | 238 |            | Yes         |       8 |
| utf8mb4_roman_ci       | utf8mb4            | 239 |            | Yes         |       8 |
| utf8mb4_persian_ci     | utf8mb4            | 240 |            | Yes         |       8 |
| utf8mb4_esperanto_ci   | utf8mb4            | 241 |            | Yes         |       8 |
| utf8mb4_hungarian_ci   | utf8mb4            | 242 |            | Yes         |       8 |
| utf8mb4_sinhala_ci     | utf8mb4            | 243 |            | Yes         |       8 |
| utf8mb4_german2_ci     | utf8mb4            | 244 |            | Yes         |       8 |
| utf8mb4_croatian_ci    | utf8mb4            | 245 |            | Yes         |       8 |
| utf8mb4_unicode_520_ci | utf8mb4            | 246 |            | Yes         |       8 |
| utf8mb4_vietnamese_ci  | utf8mb4            | 247 |            | Yes         |       8 |
+------------------------+--------------------+-----+------------+-------------+---------+
26 rows in set (0.11 sec)
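Each character set has one default collation: the one whose IS_DEFAULT column is Yes.
A collation's name starts with the name of its character set, so you can find all collations of a character set by that prefix (as with LIKE 'utf8mb4%' above), or filter information_schema.collations on the exact character set.
After the character set name comes the language, because utf8mb4 contains the characters of every language and different languages sort text differently. (Some names, such as general, unicode and unicode_520, name a sorting algorithm rather than a language.)
The final suffix, ci here, stands for case insensitive. The possible suffixes are:
ai: accent insensitive, accents are not distinguished
as: accent sensitive, accents are distinguished
ci: case insensitive, case is not distinguished
cs: case sensitive, case is distinguished
bin: binary, values are compared by their binary encoding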
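MySQL implements its collations inside the server, but the meaning of these suffixes can be approximated in Java with java.text.Collator strength levels: PRIMARY ignores both case and accents (roughly ai_ci), SECONDARY distinguishes accents but not case (roughly as_ci), and TERTIARY also distinguishes case (roughly as_cs), while a _bin comparison simply compares the encoded bytes. This is an analogy under those assumptions, not MySQL's comparison algorithm, and CollationSuffixes is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical analogy between MySQL collation suffixes and Collator strengths.
class CollationSuffixes {

    // Builds an English collator; canonical decomposition splits accented letters
    // into base letter + combining accent so the accent is compared separately.
    static Collator collator(int strength) {
        Collator c = Collator.getInstance(Locale.ENGLISH);
        c.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        c.setStrength(strength);
        return c;
    }

    // PRIMARY ~ ai_ci, SECONDARY ~ as_ci, TERTIARY ~ as_cs
    static boolean equalAt(int strength, String a, String b) {
        return collator(strength).compare(a, b) == 0;
    }

    // ~ _bin: unsigned comparison of the UTF-8 bytes, with no linguistic rules.
    static int binaryCompare(String a, String b) {
        return Arrays.compareUnsigned(
                a.getBytes(StandardCharsets.UTF_8),
                b.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Pairs that differ only in case, and only in an accent (e vs e with acute).
        String[][] pairs = {{"a", "A"}, {"e", "\u00e9"}};
        for (String[] p : pairs) {
            System.out.printf("%s vs %s: ai_ci=%b as_ci=%b as_cs=%b bin=%b%n",
                    p[0], p[1],
                    equalAt(Collator.PRIMARY, p[0], p[1]),
                    equalAt(Collator.SECONDARY, p[0], p[1]),
                    equalAt(Collator.TERTIARY, p[0], p[1]),
                    binaryCompare(p[0], p[1]) == 0);
        }
    }
}
```

For the pair a/A, only the case-sensitive and binary comparisons see a difference; for e/é, every comparison except the accent-insensitive one does.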
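Applying character sets and collations
Character sets and collations can be configured at four levels:
MySQL server (instance) level
Database level
Table level
Column level
The finer the level, the higher its precedence. For example, if the server's character set is utf8mb4 and a table's character set is latin1, then every column of that table that does not specify a character set is encoded in latin1.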
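The precedence rule can be modeled as "take the most specific level that was specified". The Java sketch below is a hypothetical model, not MySQL code: null stands for "not specified in the DDL", and CharsetInheritance is a made-up name. Keep in mind that MySQL applies this resolution when an object is created and stores the result, which is why changing a database or table default later does not touch existing tables or columns.

```java
import java.util.Objects;
import java.util.stream.Stream;

// Hypothetical model of charset inheritance: column > table > database > server.
class CharsetInheritance {

    // Returns the most specific explicitly specified charset; null means "not specified".
    static String effective(String column, String table, String database, String server) {
        return Stream.of(column, table, database, server)
                .filter(Objects::nonNull)
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("server level must be set"));
    }

    public static void main(String[] args) {
        // The example from the text: server utf8mb4, table latin1, column not specified.
        System.out.println(effective(null, "latin1", null, "utf8mb4"));
    }
}
```

With the example from the text, main prints latin1.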
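Because a character set and its collations belong together, changing only one of them also changes the other, as follows:
If only the character set is changed, the collation becomes the new character set's default collation.
If only the collation is changed, the character set becomes the one that the new collation belongs to.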
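The two rules above amount to completing a (character set, collation) pair from whichever half was given. The Java sketch below is a hypothetical model using a small excerpt of the catalog shown earlier (the default collations are the ones from the show charset output); a real server looks names up in its full catalog and rejects unknown ones, whereas this sketch simply returns null for them.

```java
import java.util.Map;

// Hypothetical model of how a charset/collation pair is completed when only one side is given.
class CharsetCollationPair {

    // Small excerpt of the catalog: each charset's default collation.
    static final Map<String, String> DEFAULT_COLLATION = Map.of(
            "utf8mb4", "utf8mb4_general_ci",
            "utf8", "utf8_general_ci",
            "latin1", "latin1_swedish_ci");

    // Each collation belongs to exactly one charset.
    static final Map<String, String> CHARSET_OF = Map.of(
            "utf8mb4_general_ci", "utf8mb4",
            "utf8mb4_bin", "utf8mb4",
            "utf8_general_ci", "utf8",
            "utf8_bin", "utf8",
            "latin1_swedish_ci", "latin1",
            "latin1_bin", "latin1");

    static final class Setting {
        final String charset;
        final String collation;

        Setting(String charset, String collation) {
            this.charset = charset;
            this.collation = collation;
        }

        @Override
        public String toString() {
            return charset + " / " + collation;
        }
    }

    // Only the charset changes: the collation becomes that charset's default.
    static Setting changeCharset(String charset) {
        return new Setting(charset, DEFAULT_COLLATION.get(charset));
    }

    // Only the collation changes: the charset becomes the one the collation belongs to.
    static Setting changeCollation(String collation) {
        return new Setting(CHARSET_OF.get(collation), collation);
    }

    public static void main(String[] args) {
        System.out.println(changeCharset("utf8"));         // like ALTER TABLE ... CHARACTER SET = 'utf8'
        System.out.println(changeCollation("latin1_bin")); // like MODIFY COLUMN ... COLLATE latin1_bin
    }
}
```

changeCharset("utf8") yields utf8 / utf8_general_ci and changeCollation("latin1_bin") yields latin1 / latin1_bin, consistent with the table-level and column-level examples in this article.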
Server level
Two system variables, character_set_server and collation_server, set the server-level character set and collation. In the configuration file:
[server]
character_set_server=utf8mb4
collation_server=utf8mb4_general_ci
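After startup, both variables can be viewed and changed. A plain SET, as below, changes the value for the current session only; SET GLOBAL changes the server-wide value.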
mysql> show variables like 'character_set_server';
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| character_set_server | utf8mb4 |
+----------------------+---------+
1 row in set (0.06 sec)
mysql> show variables like 'collation_server';
+------------------+--------------------+
| Variable_name    | Value              |
+------------------+--------------------+
| collation_server | utf8mb4_general_ci |
+------------------+--------------------+
1 row in set (0.05 sec)
mysql> set character_set_server = 'utf8mb4';
Query OK, 0 rows affected (0.00 sec)
mysql> set collation_server = 'utf8mb4_general_ci';
Query OK, 0 rows affected (0.00 sec)
Database level
A character set and collation can be specified when creating a database:
mysql> create database test_db character set utf8mb4 collate utf8mb4_general_ci;
Query OK, 1 row affected (0.01 sec)
If they are omitted, the server-level character set and collation are used.
To check the current database's character set and collation, select the database with use and then look at the character_set_database and collation_database variables:
mysql> show variables like 'character_set_database';
+------------------------+---------+
| Variable_name          | Value   |
+------------------------+---------+
| character_set_database | utf8mb4 |
+------------------------+---------+
1 row in set (0.07 sec)
mysql> show variables like 'collation_database';
+--------------------+--------------------+
| Variable_name      | Value              |
+--------------------+--------------------+
| collation_database | utf8mb4_general_ci |
+--------------------+--------------------+
1 row in set (0.09 sec)
Setting these two variables directly does not change the database's character set:
mysql> set character_set_database = 'utf8';
Query OK, 0 rows affected (0.00 sec)
mysql> show variables like 'character_set_database';
+------------------------+---------+
| Variable_name          | Value   |
+------------------------+---------+
| character_set_database | utf8mb4 |
+------------------------+---------+
1 row in set (0.09 sec)
To change a database's character set and collation:
mysql> alter database test_db character set = 'utf8';
Query OK, 1 row affected (0.01 sec)
mysql> show variables like 'character_set_database';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| character_set_database | utf8  |
+------------------------+-------+
1 row in set (0.08 sec)
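This change only applies to tables created later without an explicit character set or collation; existing tables keep their character set and collation.
Table level
A table's character set and collation can be specified at creation time; if they are omitted, the database's are used. Both can also be changed later: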
mysql> create table test (name varchar(32)) character set utf8mb4 collate utf8mb4_bin;
Query OK, 0 rows affected (0.04 sec)
mysql> show create table test;
+-------+---------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+---------------------------------------------------------------------------------------------------------------------------------------+
| test | CREATE TABLE `test` (
`name` varchar(32) COLLATE utf8mb4_bin DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+-------+---------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.09 sec)
mysql> alter table test character set = 'utf8';
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> show create table test;
+-------+--------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+--------------------------------------------------------------------------------------------------------------------------------------+
| test | CREATE TABLE `test` (
`name` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+-------+--------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.06 sec)
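As you can see, only the table's default character set and collation changed; the existing column kept its character set and collation.
Column level
Each column can be given its own character set and collation when the table is created, and a column's character set and collation can also be changed later: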
mysql> create table test (name varchar(32) character set utf8 collate utf8_bin) character set utf8mb4 collate utf8mb4_bin;
Query OK, 0 rows affected (0.03 sec)
mysql> show create table test;
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| test | CREATE TABLE `test` (
`name` varchar(32) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.09 sec)
mysql> alter table test modify column name varchar(32) COLLATE latin1_bin;
Query OK, 0 rows affected (0.09 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> show create table test;
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
| test | CREATE TABLE `test` (
`name` varchar(32) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.09 sec)
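MySQL client character encoding issues
Sometimes a program breaks because of inconsistent character encodings. For example, a Java program connected through JDBC prints the data it reads as garbled text, or MySQL cannot understand the statements the client sends. Both are encoding problems: to avoid garbled text, the Java program's character encoding must match the encoding specified on the JDBC connection.
Setting the Java program's encoding: the startup parameter -Dfile.encoding=UTF-8 makes the default charset (as returned by java.nio.charset.Charset.defaultCharset()) UTF-8, which corresponds to MySQL's utf8mb4 (and to utf8 for characters of up to 3 bytes). Since JDK 18 the default charset is UTF-8 even without this parameter.
Setting the JDBC connection encoding: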
jdbc:mysql://127.0.0.1:3306/test?characterEncoding=utf8
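The garbling described above happens when bytes written in one encoding are decoded with another. The Java sketch below (hypothetical class name, standard library only) reproduces that on the client side: it encodes Chinese text as UTF-8 and decodes it as ISO-8859-1, the kind of mismatch that produces garbled output. It also prints the JVM default charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of garbled text: bytes written in one charset, read back in another.
class MojibakeDemo {

    // Encodes the text with one charset and decodes the bytes with another.
    static String misdecode(String text, Charset written, Charset readAs) {
        return new String(text.getBytes(written), readAs);
    }

    public static void main(String[] args) {
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        String original = "\u4e2d\u6587"; // the two Chinese characters U+4E2D U+6587
        Charset utf8 = StandardCharsets.UTF_8;
        Charset latin1 = StandardCharsets.ISO_8859_1;

        // Writer uses UTF-8, reader assumes ISO-8859-1: six 1-byte "characters" appear.
        String garbled = misdecode(original, utf8, latin1);
        System.out.println("garbled:  " + garbled);

        // ISO-8859-1 maps every byte to exactly one char, so this mistake is reversible.
        String repaired = misdecode(garbled, latin1, utf8);
        System.out.println("repaired: " + repaired);
    }
}
```

This particular mistake can be undone because no bytes were lost. Mismatches in other directions are not always recoverable: decoding bytes that are not valid UTF-8 as UTF-8, for example, replaces the invalid sequences with U+FFFD.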
Specifying the character set for the mysql command-line client:
mysql -h 127.0.0.1 -P 3306 -u root --default-character-set=utf8mb4 -p
Afterwards, the encoding-related system variables all have the character set that was specified:
mysql> SHOW VARIABLES LIKE 'character_set_client';
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| character_set_client | utf8mb4 |
+----------------------+---------+
1 row in set, 1 warning (0.00 sec)
mysql> SHOW VARIABLES LIKE 'character_set_connection';
+--------------------------+---------+
| Variable_name            | Value   |
+--------------------------+---------+
| character_set_connection | utf8mb4 |
+--------------------------+---------+
1 row in set, 1 warning (0.00 sec)
mysql> SHOW VARIABLES LIKE 'character_set_results';
+-----------------------+---------+
| Variable_name         | Value   |
+-----------------------+---------+
| character_set_results | utf8mb4 |
+-----------------------+---------+
1 row in set, 1 warning (0.00 sec)
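Where:
character_set_client: the character set the server uses to decode incoming statements.
character_set_connection: the character set the server converts statements to for processing. When a statement operates on a specific column, values are then converted again to that column's character set.
character_set_results: the character set the server uses for the results it sends back to the client.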
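To see how the three variables and the column's own character set interact, the Java sketch below models an INSERT followed by a SELECT as a chain of conversions. It is a simplified, hypothetical model, not MySQL's implementation: ISO-8859-1 stands in for MySQL's latin1, and every unrepresentable character becomes '?' (Java's default replacement for these charsets). That roughly matches what MySQL does when converting result sets; on INSERT, strict SQL mode may reject such a value with an error instead.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified model of MySQL's session charset conversions.
class SessionCharsetPipeline {

    // Converts text into the given charset and back; characters the charset
    // cannot represent are replaced with '?' by String.getBytes.
    static String convert(String text, Charset target) {
        return new String(text.getBytes(target), target);
    }

    // INSERT a value, then SELECT it back, returning the bytes the client receives.
    static byte[] insertThenSelect(byte[] clientBytes, Charset client, Charset connection,
                                   Charset column, Charset results) {
        String received = new String(clientBytes, client); // decode with character_set_client
        String processed = convert(received, connection);  // convert to character_set_connection
        String stored = convert(processed, column);        // convert to the column's charset
        return stored.getBytes(results);                   // encode with character_set_results
    }

    public static void main(String[] args) {
        Charset utf8 = StandardCharsets.UTF_8;
        Charset latin1 = StandardCharsets.ISO_8859_1;
        byte[] sent = "\u4e2d".getBytes(utf8); // the Chinese character U+4E2D sent as UTF-8

        // Everything consistent: the character survives the round trip.
        System.out.println(new String(insertThenSelect(sent, utf8, utf8, utf8, utf8), utf8));
        // Column stored as latin1: the character cannot be stored and becomes '?'.
        System.out.println(new String(insertThenSelect(sent, utf8, utf8, latin1, utf8), utf8));
        // Results sent as latin1: the client only receives '?'.
        System.out.println(new String(insertThenSelect(sent, utf8, utf8, utf8, latin1), latin1));
    }
}
```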
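MySQL uses these three separate variables for the following reasons:
One MySQL server may have clients using different languages, operating systems, or locales, so character_set_client and character_set_results let each client be served in its own encoding.
Operating on column data requires an encoding conversion, which can be skipped when character_set_connection matches the column's character set. character_set_connection therefore lets MySQL interpret every statement in one encoding, and setting it to the most commonly used character set minimizes conversions.
In most cases, keep all three the same and simply set the character set of the connection; for example, SET NAMES utf8mb4 sets all three at once.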