Chinese Text in Python

Published: 2024/10/12

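Python 2 has two built-in string types: str and unicode. Take care to distinguish the notions of "Unicode string" and "Unicode object".

In the rest of this article, "Unicode string" always means a unicode object.

A traditional string can be represented entirely by a str object. It is nothing more than a stream of bytes, and it carries no real meaning until it is decoded into a unicode object.

Let's start with an example.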

>>> s='哈哈'
>>> s
'\xe5\x93\x88\xe5\x93\x88'
>>> type(s)
<type 'str'>
>>>
>>> ss = u'哈哈'
>>> ss
u'\u54c8\u54c8'
>>> type(ss)
<type 'unicode'>

Here s is a plain str holding the UTF-8 bytes of the text, while ss is declared as a unicode object.
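To make the difference between the two types concrete, here is a minimal sketch. It assumes Python 3, where bytes plays the role of the article's str and str plays the role of unicode. The names text and raw are illustrative and do not come from the session above.

```python
# Python 3 sketch: bytes ~ the article's str, str ~ the article's unicode.
text = u'\u54c8\u54c8'        # the two characters from the session, as code points
raw = text.encode('utf-8')    # the UTF-8 byte stream for the same text

print(type(text).__name__, len(text))   # str 2   (two characters)
print(type(raw).__name__, len(raw))     # bytes 6 (three bytes per character)
print(raw)
```

The character count and the byte count differ, which is the practical reason the two types must not be mixed up.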

>>> u = s.decode('utf8')
>>> u
u'\u54c8\u54c8'
>>> print u
哈哈
>>>
>>> u = s.decode('utf-16')
>>>
>>> u
u'\u93e5\ue588\u8893'
>>> print u
鏥袓

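Decoding the UTF-8 encoded string s with decode('utf8') yields a unicode object equal to ss, which was declared as unicode directly. Decoding the same bytes as UTF-16 also succeeds, but it produces unrelated characters.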
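The effect of choosing the right or the wrong codec can be checked with a short sketch. It assumes Python 3 (bytes in place of the article's str) and uses 'utf-16-le' explicitly, on the assumption that the original session ran on a little-endian machine. The names right and wrong are illustrative.

```python
raw = b'\xe5\x93\x88\xe5\x93\x88'   # UTF-8 bytes of the text, as in the session

right = raw.decode('utf-8')         # the codec the bytes were produced with
wrong = raw.decode('utf-16-le')     # a different codec: no error, wrong text

print(right == u'\u54c8\u54c8')     # True
print(ascii(wrong))                 # '\u93e5\ue588\u8893'
```

A decode that raises no exception is therefore not proof that the codec was the right one.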

>>> u=ss.decode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/sinasrv2/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
>>> u=ss.encode('utf8')
>>> u
'\xe5\x93\x88\xe5\x93\x88'
>>> print u
哈哈

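A unicode object can be encoded into another character set with encode, but it should not be decoded: decode is meant for byte strings in an encoding such as UTF-8 or GBK. In Python 2, calling decode on a unicode object first encodes it implicitly with the ASCII codec, which is why the failure above is reported as a UnicodeEncodeError.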
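The direction of the two operations can be sketched as follows. This assumes Python 3, which enforces the rule at the type level: str (the article's unicode) has only encode, and bytes (the article's str) has only decode. The variable names are illustrative.

```python
text = u'\u54c8\u54c8'

# Text goes to bytes with encode; the result depends on the chosen codec.
utf8_bytes = text.encode('utf-8')
gbk_bytes = text.encode('gbk')
print(utf8_bytes)   # b'\xe5\x93\x88\xe5\x93\x88'
print(gbk_bytes)    # b'\xb9\xfe\xb9\xfe'

# In Python 3 the wrong-direction methods simply do not exist.
text_can_decode = hasattr(text, 'decode')
bytes_can_encode = hasattr(utf8_bytes, 'encode')
print(text_can_decode, bytes_can_encode)   # False False
```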

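Here is another example to illustrate:

1. Declare a unicode object.

2. Encode it as GBK.

3. Print the result directly: nothing is displayed.

4. Decode it from GBK with decode, and it prints correctly.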

>>> ss = u'哈哈'
>>> ss
u'\u54c8\u54c8'
>>> t = ss.encode('gbk')
>>> t
'\xb9\xfe\xb9\xfe'
>>> print t
>>>
>>> print t.decode('gbk')
哈哈
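One plausible reason the GBK bytes print as nothing readable is that the terminal expects UTF-8, as the first session suggests. The sketch below, assuming Python 3, shows that the GBK bytes are not valid UTF-8 and that only decoding with the matching codec restores the text. The names shown and restored are illustrative.

```python
text = u'\u54c8\u54c8'
gbk_bytes = text.encode('gbk')

# A consumer that assumes UTF-8 cannot interpret these bytes.
try:
    gbk_bytes.decode('utf-8')
    readable_as_utf8 = True
except UnicodeDecodeError:
    readable_as_utf8 = False

# With errors='replace' the undecodable bytes become U+FFFD placeholders.
shown = gbk_bytes.decode('utf-8', errors='replace')

# Decoding with the codec that produced the bytes recovers the text.
restored = gbk_bytes.decode('gbk')
print(readable_as_utf8, restored == text)
```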

Serializing a unicode object with str():

>>> str(ss)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
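In Python 2, str() on a unicode object encodes it with the default ASCII codec, which is where the error above comes from. The sketch below reproduces the same failure explicitly. It assumes Python 3, where the implicit step has to be written out as encode('ascii'). The names error and ascii_safe are illustrative.

```python
text = u'\u54c8\u54c8'

# Spell out the ASCII encode that Python 2's str() performs implicitly.
try:
    text.encode('ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc

print(error)

# If ASCII output is really required, an error handler avoids the exception.
ascii_safe = text.encode('ascii', 'backslashreplace')
print(ascii_safe)   # b'\\u54c8\\u54c8'
```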

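Writing ss directly to a file raises an exception. When handling Chinese unicode strings, you must first call encode to convert the text to a concrete encoding before writing it out.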
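A minimal sketch of writing such text to a file, assuming Python 3 and a throwaway temporary path (the file name out.txt is hypothetical). It shows the direct write failing, then two ways to encode: by hand, or by letting the file object do it through the encoding parameter.

```python
import io
import os
import tempfile

text = u'\u54c8\u54c8'
path = os.path.join(tempfile.mkdtemp(), 'out.txt')   # hypothetical scratch file

# Writing text to a byte-oriented file without encoding is rejected.
try:
    with open(path, 'wb') as f:
        f.write(text)
    direct_write_failed = False
except TypeError:
    direct_write_failed = True

# Option 1: encode by hand and write the bytes.
with open(path, 'wb') as f:
    f.write(text.encode('utf-8'))

# Option 2: open in text mode and let the file object encode.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(text)

with open(path, 'rb') as f:
    on_disk = f.read()
print(on_disk)
```

Either option puts the same bytes on disk. Naming the encoding explicitly avoids depending on a platform default.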

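Summary: in Python, a str object is just an array of bytes. Python does not care whether its contents form a valid string, or which encoding that string uses.

Keeping track of those facts is the user's job. The same limitation applies to unicode objects: remember that the contents of a unicode object are by no means guaranteed to be a valid Unicode string.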
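Since neither type guarantees valid contents, validity has to be checked by the program. The sketch below assumes Python 3 and uses UTF-8 as the reference encoding. The helper names is_valid_utf8 and is_well_formed are hypothetical and not part of any library.

```python
def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False


def is_well_formed(text):
    """Return True if the text can be encoded as UTF-8.

    In Python 3 this fails for lone surrogates such as u'\\ud800'.
    """
    try:
        text.encode('utf-8')
        return True
    except UnicodeEncodeError:
        return False


print(is_valid_utf8(b'\xe5\x93\x88'))   # True: one complete character
print(is_valid_utf8(b'\xe5\x93'))       # False: truncated sequence
print(is_well_formed(u'\u54c8'))        # True
print(is_well_formed(u'\ud800'))        # False: lone surrogate
```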

Reposted from: https://www.cnblogs.com/gsblog/p/3860584.html
