# Urllib & Requests

## Urllib

Error types\
1\. URLError\
2\. HTTPError (a subclass of URLError), which also carries the HTTP status code

| Range    | Status                                  |
| -------- | --------------------------------------- |
| 100\~299 | Success                                 |
| 300\~399 | Handled automatically (e.g. redirects)  |
| 400\~599 | Error                                   |

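A subclass exception must be caught **before** its parent class, so catch HTTPError first and URLError second. Otherwise the URLError clause would also catch every HTTPError.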
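To see why the order matters, here is a minimal sketch. It assumes Python 3, where both classes live in `urllib.error` (in Python 2 they are in `urllib2`). The `handle` helper and the example URL are made up for illustration.

```python
import urllib.error

# HTTPError inherits from URLError, so `except URLError` also matches it.
print(issubclass(urllib.error.HTTPError, urllib.error.URLError))  # True


def handle(exc):
    """Hypothetical helper: report which except clause catches `exc`."""
    try:
        raise exc
    except urllib.error.HTTPError as e:   # subclass first
        return "HTTPError %d" % e.code
    except urllib.error.URLError as e:    # parent class second
        return "URLError: %s" % e.reason


# Build the exceptions by hand so no network access is needed.
not_found = urllib.error.HTTPError(
    "http://example.invalid/", 404, "Not Found", {}, None
)
print(handle(not_found))                                    # HTTPError 404
print(handle(urllib.error.URLError("connection refused")))  # URLError: connection refused
```

If the two `except` clauses were swapped, the 404 would be reported by the URLError branch and its status code would go unused.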

**Usage differs somewhat between Python 2 and Python 3.**

### Python2

```python
import urllib2  
from bs4 import BeautifulSoup

# Fetch the page at `url` and return it parsed as a BeautifulSoup object
def getHtml(url):
    try:
        header = {
            "Accept": "text/html",
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.160 Safari/537.22",
            # 'Cookie': 'over18=1',  # extra values can be sent in the header, e.g. the Gossiping board's "I am over 18" confirmation
        }
        request = urllib2.Request(url, headers=header)
        soup = BeautifulSoup(urllib2.urlopen(request).read(), 'lxml')  # requires the lxml package
        return soup

    except urllib2.HTTPError as e:  # subclass first
        return 'error'
    except urllib2.URLError as e:   # parent class second
        return 'error'
```
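### Python3

In Python 3 the `urllib2` module is split into `urllib.request` and `urllib.error`. The following is a minimal sketch of the same idea under that assumption, not a line-by-line port: it returns the decoded page source (or `None` on failure) and leaves the BeautifulSoup parsing to the caller. The name `get_html`, the shortened User-Agent, and the 10-second timeout are illustrative choices.

```python
import urllib.error
import urllib.request


def get_html(url, timeout=10):
    """Return the decoded page source, or None if the request fails."""
    headers = {
        "Accept": "text/html",
        "User-Agent": "Mozilla/5.0",  # shortened placeholder User-Agent
    }
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as e:  # subclass first: server sent an error status
        print("HTTP error:", e.code)
        return None
    except urllib.error.URLError as e:   # parent second: no usable response at all
        print("URL error:", e.reason)
        return None
```

The returned string can be passed to `BeautifulSoup(html, 'lxml')` in the same way as in the Python 2 version.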

## Requests

```python
# python3
import requests
from bs4 import BeautifulSoup

def getHtml(url):
    header = {
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/55.0.2883.87 Safari/537.36 '
    }

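    res = requests.get(url, headers=header)
    res.encoding = 'utf8'
    res.raise_for_status()  # raises requests.exceptions.HTTPError for a 4xx/5xx response

    print(res.status_code)  # HTTP status code (int); can also be used to check whether the request succeeded

    soup = BeautifulSoup(res.text, 'lxml')  # requires the lxml package

    return soup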
```
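`raise_for_status()` raises an exception instead of returning an error value, so the caller usually wraps the request in `try`/`except`. Here is a minimal sketch, assuming the caller wants `None` on failure. The name `get_text`, the shortened User-Agent, and the 10-second timeout are illustrative choices, not part of the original code.

```python
import requests


def get_text(url, timeout=10):
    """Return the page text, or None if the request fails."""
    headers = {"User-Agent": "Mozilla/5.0"}  # shortened placeholder User-Agent
    try:
        res = requests.get(url, headers=headers, timeout=timeout)
        res.raise_for_status()  # raises HTTPError for a 4xx/5xx status
    except requests.exceptions.HTTPError as e:         # subclass first
        print("HTTP error:", e.response.status_code)
        return None
    except requests.exceptions.RequestException as e:  # base class: timeouts, DNS, connection errors
        print("Request failed:", e)
        return None
    res.encoding = "utf-8"
    return res.text
```

As with urllib, the more specific `HTTPError` is caught before its base class `RequestException`.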

[Source: URLError exception handling](http://cuiqingcai.com/961.html)

[Python crawler basics, part 6: using Cookies](http://cuiqingcai.com/968.html)

<https://tw.saowen.com/a/5fc6e9419520438129df8e091c27683af5cc933a01db3e259e8b44fde106b91e>

<http://docs.python-requests.org/zh_CN/latest/user/quickstart.html>

## SSL: CERTIFICATE\_VERIFY\_FAILED

```python
import requests

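url = "https://example.com"  # placeholder; replace with the target URL

# Silence the InsecureRequestWarning that verify=False would otherwise trigger
requests.packages.urllib3.disable_warnings()

requests.get(url, timeout=10, verify=False)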
```

Pass `verify=False` to `requests.get` so that the server certificate is not verified.
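The same error can appear with `urllib`, which has no `verify` argument. Here is a minimal sketch of an equivalent workaround, assuming Python 3: build an `ssl` context with verification turned off and pass it to `urlopen`. Both helper names are made up for illustration.

```python
import ssl
import urllib.request


def make_unverified_context():
    """Hypothetical helper: SSL context that skips certificate checks."""
    context = ssl.create_default_context()
    context.check_hostname = False       # must be turned off before verify_mode
    context.verify_mode = ssl.CERT_NONE  # do not validate the server certificate
    return context


def fetch_without_verify(url, timeout=10):
    """Hypothetical helper: urllib counterpart of requests.get(url, verify=False)."""
    context = make_unverified_context()
    with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
        return response.read()
```

The order of the two assignments matters: setting `verify_mode` to `ssl.CERT_NONE` while `check_hostname` is still enabled raises `ValueError`.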

References:

<https://blog.csdn.net/zahuopuboss/article/details/52964809>\
<https://www.itread01.com/content/1549509138.html>

