Hello crawler

Urllib

import urllib2

# Open the URL and print the raw response body
response = urllib2.urlopen("http://www.linchenguang.com")
print response.read()

Analysis

urlopen(url)
urlopen(url, data, timeout)
urlopen(Request)

  • urlopen behaves like an overloaded function: its first argument may be either a URL string or a Request object
  • timeout defaults to socket._GLOBAL_DEFAULT_TIMEOUT
  • data defaults to None

request = urllib2.Request("http://www.linchenguang.com")
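
The call forms above can be explored without sending any traffic. What follows is a minimal sketch rather than the author's code: the `fetch` helper and its 10-second timeout are hypothetical, and the imports fall back to `urllib.request` and `urllib.error`, which is where `urllib2` lives in Python 3. It shows that a Request built without data carries `data = None`, and that `urlopen` takes either a URL string or a Request together with an explicit timeout.

```python
try:  # Python 2, as used in this post
    from urllib2 import Request, urlopen, URLError
except ImportError:  # Python 3: urllib2 was split into urllib.request / urllib.error
    from urllib.request import Request, urlopen
    from urllib.error import URLError


def fetch(target, timeout=10):
    """Hypothetical helper: open a URL string or a Request, return the body or None."""
    try:
        response = urlopen(target, timeout=timeout)
    except URLError:
        # Covers unreachable hosts, unknown URL schemes and HTTP error statuses
        return None
    try:
        return response.read()
    finally:
        response.close()


request = Request("http://www.linchenguang.com")
print(request.get_full_url())   # the URL wrapped by the Request
print(request.data is None)     # no data was supplied, so data stays None
```

Calling `fetch(request)` or `fetch("http://www.linchenguang.com", timeout=5)` would then perform the actual download; passing an explicit timeout avoids relying on the global socket default.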

Get

import urllib
import urllib2

values={}
values['username'] = "admin"
values['password'] = "admin"
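# Equivalent literal form: values = {"username": "admin", "password": "admin"}
# URL-encode the parameters into a query string
data = urllib.urlencode(values)
url = "http://passport.csdn.net/account/login"
# Append the query string to the URL so it is sent as a GET
geturl = url + "?" + data
request = urllib2.Request(geturl)
response = urllib2.urlopen(request)
print response.read()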
  • values is a dict; its literal syntax looks much like a JSON or JavaScript object
  • The parameters must be URL-encoded so that reserved characters are not ambiguous
  • The encoded parameters are appended to the URL after a ?
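
To make the encoding step concrete, here is a small sketch that builds a query string without sending a request. The password value and the `example.com` endpoint are made up for illustration, and the import falls back to `urllib.parse`, where `urlencode` lives in Python 3.

```python
try:  # Python 2, as used in this post
    from urllib import urlencode
except ImportError:  # Python 3: urlencode moved to urllib.parse
    from urllib.parse import urlencode

# A list of pairs keeps the parameter order fixed; the password is a made-up value
values = [("username", "admin"), ("password", "p@ss word&1")]
data = urlencode(values)
print(data)  # username=admin&password=p%40ss+word%261

# Hypothetical endpoint, used only to show how the query string is attached
url = "http://example.com/login"
geturl = url + "?" + data
print(geturl)
```

Without the encoding, the `&` inside the password would be read as the start of a new parameter, and the space would make the URL invalid.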

Post

import urllib
import urllib2

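values = {"username": "admin", "password": "admin"}
# URL-encode the parameters; they travel in the request body
data = urllib.urlencode(values)
url = "https://passport.csdn.net/account/login"
# Passing data as the second argument turns the request into a POST
request = urllib2.Request(url, data)
response = urllib2.urlopen(request)
print response.read()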
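  • data is passed to the Request constructor as its second argument, which makes the request a POST
  • A POST is transmitted in two parts: the request headers first, then the body that carries the data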
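
The difference between the two request types can be checked without contacting the server. The sketch below is an illustration under assumptions rather than part of the original example: it builds both Request objects and asks each which HTTP method it would use. The `.encode("ascii")` step is needed on Python 3, where the body must be bytes, and is harmless on Python 2.

```python
try:  # Python 2, as used in this post
    from urllib import urlencode
    from urllib2 import Request
except ImportError:  # Python 3 equivalents
    from urllib.parse import urlencode
    from urllib.request import Request

url = "https://passport.csdn.net/account/login"
# Python 3 requires a bytes body; on Python 2 the encoded str is already bytes
data = urlencode({"username": "admin", "password": "admin"}).encode("ascii")

get_request = Request(url + "?" + data.decode("ascii"))  # parameters in the URL
post_request = Request(url, data)                        # parameters in the body

print(get_request.get_method())   # method chosen when no data is attached
print(post_request.get_method())  # method chosen when data is attached
```

Nothing is sent until one of these objects is passed to `urlopen`, so this is a cheap way to confirm which method a Request will use.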