Urllib & Requests
Urllib
Error types: 1. URLError 2. HTTPError (a subclass of URLError), which also returns the HTTP status code (its code attribute)
Range      Status
100–299    Success
300–399    Can be handled (redirects; urllib's default handlers follow most of them)
400–599    Error
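The table can be read as a simple range check. A minimal sketch; classify_status and its return labels are hypothetical names chosen to mirror the three rows above:

```python
def classify_status(code):
    """Map an HTTP status code to the categories in the table (hypothetical helper)."""
    if 100 <= code <= 299:
        return "success"
    if 300 <= code <= 399:
        return "redirect"  # "can be handled": urllib's default handlers follow most redirects
    if 400 <= code <= 599:
        return "error"
    raise ValueError(f"not an HTTP status code: {code}")


print(classify_status(200), classify_status(301), classify_status(404))
```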
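A subclass exception must be caught before its parent class (otherwise the subclass handler is never reached), so handle HTTPError first, then URLError.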
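A small Python 3 sketch of why the order matters, using urllib.error and no network access: the hypothetical helper which_branch raises a given exception and reports which except clause caught it. If the two clauses were swapped, the HTTPError would land in the URLError branch and its status code would be ignored.

```python
import urllib.error


def which_branch(exc):
    """Raise exc and report which handler catches it (hypothetical helper)."""
    try:
        raise exc
    except urllib.error.HTTPError as e:
        # Subclass first: the server replied with an error status code
        return "HTTPError", e.code
    except urllib.error.URLError as e:
        # Parent class second: network-level failure, no status code
        return "URLError", e.reason


http_err = urllib.error.HTTPError("http://example.com", 404, "Not Found", None, None)
print(which_branch(http_err))
print(which_branch(urllib.error.URLError("timed out")))
```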
Usage differs between Python 2 and Python 3: in Python 3, urllib2 was split into urllib.request and urllib.error.
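One common way to write code that imports the same names on both versions is a try/except ImportError shim. A minimal sketch; it only covers the four names used in this page:

```python
try:
    # Python 3: urllib2 was split into urllib.request and urllib.error
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError, URLError
except ImportError:
    # Python 2 fallback: everything lives in urllib2
    from urllib2 import Request, urlopen, HTTPError, URLError

print(HTTPError.__module__)
```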
Python2
import urllib2
from bs4 import BeautifulSoup

# Fetch the HTML source of url and parse it with BeautifulSoup
def getHtml(url):
    try:
        header = {
            "Accept": "text/html",
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.160 Safari/537.22",
            # 'Cookie': 'over18=1',  # extra values can be sent in the headers, e.g. the "I am over 18" confirmation on PTT's Gossiping board
        }
        request = urllib2.Request(url, headers=header)
        soup = BeautifulSoup(urllib2.urlopen(request).read(), 'lxml')  # requires the lxml package
        return soup
    except urllib2.HTTPError as e:
        # HTTPError is a subclass of URLError, so it is caught first; e.code holds the status code
        return 'error'
    except urllib2.URLError as e:
        return 'error'
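A possible Python 3 port of the function above, using urllib.request and urllib.error. It is a sketch under a few assumptions: it returns decoded text instead of a BeautifulSoup object (to stay standard-library only), returns None instead of 'error', and get_html is a hypothetical name.

```python
import urllib.error
import urllib.request


def get_html(url, timeout=10):
    """Return the page as text, or None if the request fails (hypothetical helper)."""
    headers = {
        "Accept": "text/html",
        "User-Agent": "Mozilla/5.0",  # placeholder; substitute a full browser UA string
    }
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            # Use the charset from the Content-Type header, falling back to UTF-8
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    except urllib.error.HTTPError as e:  # subclass first
        print("HTTP error:", e.code)
        return None
    except urllib.error.URLError as e:  # then the parent class
        print("URL error:", e.reason)
        return None
```

For example, get_html("foo://example") prints the URLError reason (an unknown URL type) and returns None.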
Requests
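# python3
import requests
from bs4 import BeautifulSoup

def getHtml(url):
    header = {
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/55.0.2883.87 Safari/537.36'
    }
    res = requests.get(url, headers=header, timeout=10)
    res.encoding = 'utf8'  # force UTF-8 decoding of res.text (assumes the page is UTF-8)
    res.raise_for_status()  # raise requests.HTTPError for a 4xx/5xx status; left uncaught, it stops the program
    print(res.status_code)  # status code of the response (int); can also be checked directly to tell whether the request succeeded
    soup = BeautifulSoup(res.text, 'lxml')  # requires the lxml package
    return soup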
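raise_for_status only covers error status codes; connection problems raise earlier, inside requests.get. A minimal sketch that catches both, with the subclass (requests.exceptions.HTTPError) before the base class (requests.exceptions.RequestException), mirroring the urllib ordering rule. get_text_or_none and the short User-Agent are hypothetical placeholders.

```python
import requests


def get_text_or_none(url, timeout=10):
    """Return the response text, or None on any request failure (hypothetical helper)."""
    headers = {"User-Agent": "Mozilla/5.0"}  # placeholder UA string
    try:
        res = requests.get(url, headers=headers, timeout=timeout)
        res.raise_for_status()  # turns 4xx/5xx responses into requests.exceptions.HTTPError
    except requests.exceptions.HTTPError as e:
        # The server answered, but with an error status code
        print("HTTP error:", e.response.status_code)
        return None
    except requests.exceptions.RequestException as e:
        # Base class for the rest: ConnectionError, Timeout, InvalidSchema, ...
        print("Request failed:", e)
        return None
    return res.text
```

A URL with an unsupported scheme such as "foo://example" raises InvalidSchema, a RequestException subclass, so the helper prints the error and returns None.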
References:
https://tw.saowen.com/a/5fc6e9419520438129df8e091c27683af5cc933a01db3e259e8b44fde106b91e
http://docs.python-requests.org/zh_CN/latest/user/quickstart.html
SSL: CERTIFICATE_VERIFY_FAILED
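import requests
import urllib3

url = 'https://example.com'  # placeholder: the site whose certificate fails verification
# Silence the InsecureRequestWarning that every unverified request would otherwise print
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
requests.get(url, timeout=10, verify=False)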
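Adding verify=False skips certificate verification, so the error no longer occurs (the connection is then no longer protected against man-in-the-middle attacks).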
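The same error can also come from urllib. A minimal sketch of the urllib counterpart of verify=False: build an ssl.SSLContext with verification switched off and pass it to urlopen's context parameter. The helper names are hypothetical.

```python
import ssl
import urllib.request


def make_unverified_context():
    """SSL context that skips certificate and hostname checks, like verify=False."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be turned off before CERT_NONE is allowed
    ctx.verify_mode = ssl.CERT_NONE  # do not validate the server certificate
    return ctx


def fetch_unverified(url, timeout=10):
    """Download url without certificate verification (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout, context=make_unverified_context()) as resp:
        return resp.read()
```

The order of the two assignments matters: setting verify_mode to CERT_NONE while check_hostname is still True raises ValueError.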
References:
https://blog.csdn.net/zahuopuboss/article/details/52964809
https://www.itread01.com/content/1549509138.html