Urllib & Requests

Urllib

Error types: 1. URLError 2. HTTPError (a subclass of URLError), which carries the HTTP status code

Range      Status
100~299    Success
300~399    Can be handled (redirects)
400~599    Error

Catch the subclass exception before its parent class, so handle HTTPError first and URLError after.

Usage differs slightly between Python 2 and Python 3.

Python 2

import urllib2
from bs4 import BeautifulSoup

# Fetch the page at url and return it as a BeautifulSoup object
def getHtml(url):
    try:
        header = {
            "Accept" : "text/html",
            "User-Agent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.160 Safari/537.22",
            #'Cookie':'over18=1',  # extra values can be passed in the header, e.g. the PTT Gossiping board's "I am over 18" confirmation
        }
        request = urllib2.Request(url, headers=header)
        soup = BeautifulSoup(urllib2.urlopen(request).read(), 'lxml')  # remember to install the lxml package
        return soup

    except urllib2.HTTPError, e:  # subclass first; e.code holds the HTTP status code
        return 'error'
    except urllib2.URLError, e:
        return 'error'

Requests

Sources: URLerror異常處理 (URLError exception handling)

Python爬虫入门六之Cookie的使用 (Python crawler tutorial part 6: using cookies)

https://tw.saowen.com/a/5fc6e9419520438129df8e091c27683af5cc933a01db3e259e8b44fde106b91e

http://docs.python-requests.org/zh_CN/latest/user/quickstart.html
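
The quickstart linked above covers the basics of Requests; as a rough sketch (not taken from any of these sources), the fetch from the urllib examples could look like this with requests, reusing the illustrative header and cookie values from above:

import requests
from bs4 import BeautifulSoup

# Fetch the page at url and return it as a BeautifulSoup object
def getHtml(url):
    header = {
        "Accept": "text/html",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.160 Safari/537.22",
    }
    cookies = {"over18": "1"}  # e.g. the PTT Gossiping board age check, as in the urllib example
    try:
        response = requests.get(url, headers=header, cookies=cookies, timeout=10)
        response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
        return BeautifulSoup(response.text, 'lxml')
    except requests.RequestException:
        return 'error'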

SSL: CERTIFICATE_VERIFY_FAILED

Workaround: pass verify=False to the request (this disables SSL certificate verification).
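
A minimal example of that workaround (the URL is a placeholder); note that requests emits an InsecureRequestWarning because certificate checking is turned off:

import requests

# verify=False skips SSL certificate verification; only use it when the risk is acceptable
response = requests.get("https://self-signed.example.com", verify=False)
print(response.status_code)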

References:

https://blog.csdn.net/zahuopuboss/article/details/52964809
https://www.itread01.com/content/1549509138.html
