Declare
    # -*- coding:utf-8 -*-
This makes clear that the source file is encoded in utf-8.
Without the declaration, Python throws an exception as soon as a non-ASCII character appears in the source:
    SyntaxError: Non-ASCII character '\xe4' in file code_test.py on line 6,
Setting default encoding
    import sys
This tells Python to use utf8 as the default encoding when handling text.
In Python 2.7 the str type uses ascii as the default encoding, which otherwise leads to errors like:
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u4e07' in position 0:
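The error above can be reproduced in a few lines. This is a minimal sketch rather than the original code: `try_encode` is a hypothetical helper name, and the snippet is written so that it should run on both Python 2.7 and Python 3. It shows that the ascii codec has no way to represent u'\u4e07', while an explicit utf-8 encode succeeds.

```python
# -*- coding: utf-8 -*-

def try_encode(text, codec):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None

wan = u"\u4e07"  # the character from the error message above

ascii_result = try_encode(wan, "ascii")  # ascii has no mapping for U+4E07
utf8_result = try_encode(wan, "utf-8")   # utf-8 encodes it as three bytes

print(ascii_result)       # None
print(repr(utf8_result))
```

Encoding explicitly like this avoids depending on whatever the interpreter's default encoding happens to be.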
Exception in db
    UnicodeEncodeError: 'latin-1' codec can't encode character ...
Declare the encoding on both the connection and the cursor:
    # With PyMySQL (import pymysql):
    db = pymysql.connect(host="localhost", user="user", password="passwd", database="db_name", use_unicode=True, charset="utf8")
    # Or with MySQLdb (import MySQLdb):
    db = MySQLdb.connect(host="localhost", user="root", passwd="", db="testdb", use_unicode=True, charset="utf8")
    cursor = db.cursor()
    cursor.execute('SET NAMES utf8;')
    cursor.execute('SET CHARACTER SET utf8;')
    cursor.execute('SET character_set_connection=utf8;')
Exception in web
If the web page is encoded in gbk or gb2312, displaying it as utf8 produces garbled text.
    html = unicode(html, "gbk").encode("utf8")
First decode the byte stream to unicode using gbk, then encode it in utf8.
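The decode-then-encode step can be sketched as a small function. `transcode` is a hypothetical helper and the sample page content is made up for illustration. The sketch uses `bytes.decode` instead of the Python 2-only `unicode()` so that it should also run on Python 3.

```python
# -*- coding: utf-8 -*-

def transcode(raw, source_codec="gbk", target_codec="utf-8"):
    """Decode raw bytes from source_codec, then re-encode in target_codec."""
    text = raw.decode(source_codec)    # bytes -> unicode text
    return text.encode(target_codec)   # unicode text -> bytes

# Hypothetical page content, encoded in gbk as a server might send it.
gbk_html = u"<p>\u4e07</p>".encode("gbk")

utf8_html = transcode(gbk_html)
print(utf8_html.decode("utf-8") == u"<p>\u4e07</p>")  # True
```

Since gb2312 is a subset of gbk, decoding with gbk should cover pages declared as either.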
Encoding transform
- If a string literal in the source code is declared with the prefix u, it is a unicode string.
- If a string literal in the source code has no u prefix, it is a str in the encoding of the source file.
- Use unicode(str, codec) to convert a str to unicode.
- Most of the time, use str.decode(codec) to decode the byte stream into unicode.
    # -*- coding:utf-8 -*-
In a bash terminal on Windows the encoding is utf8, so gbk output shows up garbled.
In the Windows DOS terminal the encoding is gbk, so utf8 output shows up garbled.
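The terminal mismatch comes down to bytes written in one encoding being read in another. A rough sketch, where `show_as` is a hypothetical helper that imitates a terminal interpreting bytes with its own encoding:

```python
# -*- coding: utf-8 -*-

def show_as(raw, terminal_codec):
    """Interpret raw bytes the way a terminal using terminal_codec would."""
    # "replace" substitutes U+FFFD for byte sequences the codec cannot decode.
    return raw.decode(terminal_codec, "replace")

wan = u"\u4e07"
utf8_bytes = wan.encode("utf-8")
gbk_bytes = wan.encode("gbk")

print(show_as(utf8_bytes, "utf-8") == wan)  # True: encodings match
print(show_as(gbk_bytes, "gbk") == wan)     # True: encodings match
print(show_as(utf8_bytes, "gbk") == wan)    # False: utf8 bytes read as gbk
print(show_as(gbk_bytes, "utf-8") == wan)   # False: gbk bytes read as utf8
```

The text only survives when the encoding used to write the bytes matches the one the terminal uses to read them.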
Encoding in json
    json.dumps # turn json object -> string
    # add encoding parameter
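As a sketch of how `json.dumps` treats non-ASCII text: by default it escapes such characters as `\uXXXX`, and passing `ensure_ascii=False` keeps them as they are. The sample dictionary is made up, and using `ensure_ascii` here is an assumption on my part rather than something this post states.

```python
# -*- coding: utf-8 -*-
import json

data = {u"name": u"\u4e07"}  # hypothetical sample object

escaped = json.dumps(data)                       # non-ASCII escaped as \uXXXX
readable = json.dumps(data, ensure_ascii=False)  # non-ASCII kept as-is

print(escaped)  # {"name": "\u4e07"}
print(json.loads(escaped) == data)   # True
print(json.loads(readable) == data)  # True
```

Both forms load back to the same object, so the choice mostly affects how the serialized string looks when stored or displayed.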