# Urllib & Requests

## Urllib

Error types:

1. `URLError`
2. `HTTPError` (a subclass of `URLError`) — carries the HTTP status code

| Range    | Meaning                  |
| -------- | ------------------------ |
| 100\~299 | Success                  |
| 300\~399 | Redirection (handleable) |
| 400\~599 | Error                    |
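As a quick sketch (the function name is my own, not from the original tutorial), the ranges in the table can be mapped in code:

```python
def classify_status(code):
    """Map an HTTP status code to the categories in the table above."""
    if 100 <= code <= 299:
        return "success"
    if 300 <= code <= 399:
        return "redirect (handleable)"
    if 400 <= code <= 599:
        return "error"
    return "unknown"

print(classify_status(200))  # success
print(classify_status(404))  # error
```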

Catch the subclass exception before the parent class, so `HTTPError` must come before `URLError`.
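A minimal Python 3 sketch of this ordering (in Python 3 the exceptions live in `urllib.error`; the function name here is just for illustration):

```python
import urllib.request
import urllib.error

def fetch(url):
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.read()
    except urllib.error.HTTPError as e:  # subclass first: has a status code
        print("HTTP error:", e.code)
    except urllib.error.URLError as e:   # parent class second
        print("URL error:", e.reason)

# HTTPError subclasses URLError, which is why the order matters:
print(issubclass(urllib.error.HTTPError, urllib.error.URLError))  # True
```

If the order were reversed, the `URLError` branch would swallow every `HTTPError` and the status code would never be inspected.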

**Usage differs between Python 2 and Python 3.**

### Python2

```python
import urllib2
from bs4 import BeautifulSoup

# Fetch the page at `url` and return it as a BeautifulSoup object
def getHtml(url):
    try:
        header = {
            "Accept": "text/html",
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.160 Safari/537.22",
            # Extra parameters can be passed via headers,
            # e.g. 'Cookie': 'over18=1' for PTT's Gossiping board ("I am over 18")
        }
        request = urllib2.Request(url, headers=header)
        soup = BeautifulSoup(urllib2.urlopen(request).read(), 'lxml')  # requires the lxml package
        return soup

    except urllib2.HTTPError, e:  # subclass first
        return 'error'
    except urllib2.URLError, e:   # parent class second
        return 'error'
```

## Requests

```python
# Python 3
import requests
from bs4 import BeautifulSoup

def getHtml(url):
    header = {
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/55.0.2883.87 Safari/537.36'
    }

    res = requests.get(url, headers=header)
    res.encoding = 'utf8'
    res.raise_for_status()  # aborts on a 4xx/5xx response (similar to a try/except)

    print(res.status_code)  # HTTP status code (int); can also be used to check success

    soup = BeautifulSoup(res.text, 'lxml')  # requires the lxml package
    return soup
```
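`raise_for_status()` raises `requests.exceptions.HTTPError` for 4xx/5xx responses. A network-free sketch (building a bare `Response` by hand, which is not how you would normally use the library, only to demonstrate the exception):

```python
import requests

res = requests.models.Response()
res.status_code = 404  # simulate a client error

try:
    res.raise_for_status()
except requests.exceptions.HTTPError as e:
    print("caught:", e)
```

In real code you would catch this around the `requests.get(...)` call instead of checking `status_code` manually.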

[Source: URLError exception handling](http://cuiqingcai.com/961.html)

[Python crawler tutorial, part 6: using Cookies](http://cuiqingcai.com/968.html)

<https://tw.saowen.com/a/5fc6e9419520438129df8e091c27683af5cc933a01db3e259e8b44fde106b91e>

<http://docs.python-requests.org/zh_CN/latest/user/quickstart.html>

## SSL: CERTIFICATE\_VERIFY\_FAILED

```python
import requests

# Suppress the warnings so they are less noisy
requests.packages.urllib3.disable_warnings()

requests.get(url, timeout=10, verify=False)
```

Add `verify=False` to skip certificate verification.

References:

<https://blog.csdn.net/zahuopuboss/article/details/52964809>\
<https://www.itread01.com/content/1549509138.html>
