# pandas & csv

### Pandas

官方文件

<https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy>

1.讀取csv : `data=pandas.read_csv('filename')`

2.瀏覽資料內容 : `data.head()`

3.讀取特定欄位資料(列，直直的) : `detail=data[['name1','name2']......]`

3-1. 轉為一般陣列: `data['name'].tolist()`

3-2. 可用numpy將series或者dataframe型態轉為numpy.array

4.計算各欄位的多種統計資訊(count,mean,std,min,max,25%,50%,75%) : `data.describe()`

5.所有欄位名稱: `data.colums`

6.直接選取各筆資料 `data[1:]` (list切割方式)

Source:\
[pandas 資料處理](http://brucecodeworld.logdown.com/posts/240977-python-pandas-little-teaching)

## iloc

data.iloc\[2] --> 第三個row (第三筆資料，橫的)

data.iloc\[2:,0:2] --> 切割區塊(row,colums)，抓取第一筆之後所有資料，但只包含第一、二項特徵

data.iloc\[1:4,:] --> 以row選取

data.iloc\[:,1:4] --> 以colums選取

iloc\[0] --> Series

iloc\[\[0]] --> DataFrame

Source:[Python Pandas DataFrame 的 iloc\[0\] 與 iloc\[\[0\]\]](http://rickchungtw-blog.logdown.com/posts/686802)

### 讀取

```python
import pandas as pd
df = pd.read_csv("City_Zhvi_AllHomes.csv")
df.head(15)
# df.地址，在read_excel()進來時，第一列就會被當成他的屬性。所以df.地址就抓到地址這一行
```

<https://tinycorner.tw/2018/04/24/使用python讀取csv和xlsx檔/>

<https://tinycorner.tw/2018/04/24/使用python讀取csv和xlsx檔/>

### 處理空值

```python
test['name'] = test['name'].fillna('我是空的！')
```

### 判斷nan

```python
import math
x = float('nan')
math.isnan(x)  # True
```

### 四捨五入

```python
df = pd.DataFrame(np.random.random([3, 3]),
     columns=['A', 'B', 'C'], index=['first', 'second', 'third']

df.round(2)  # 取至小數點第二位
```

<https://blog.csdn.net/shuibuzhaodeshiren/article/details/84324885>

### Series轉換

```python
scores = pd.Series({'小明':90, '小華':80, '小李':70, '小強':55})
# 把Series轉成Dataframe只需要使用指令就好
score_df = scores.to_frame() #搞定
```

[https://medium.com/@weilihmen/python-pandas-%E5%9F%BA%E6%9C%AC%E6%93%8D%E4%BD%9C%E6%95%99%E5%AD%B8-%E6%88%90%E7%B8%BE%E8%A1%A8-f6d0ec4f89](https://medium.com/@weilihmen/python-pandas-基本操作教學-成績表-f6d0ec4f89)

### 轉置矩陣

```python
df.transpose()
```

<http://violin-tao.blogspot.com/2017/06/pandas-1-indexing.html>

### row

用loop讀取

```python
for (index_label, row_series) in empDfObj.iterrows():
    print('Row Index label : ', index_label)
    print('Row Content as Series : ', row_series.values)
```

<https://thispointer.com/pandas-6-different-ways-to-iterate-over-rows-in-a-dataframe-update-while-iterating-row-by-row/>

直接轉成numpy array

```python
df.to_numpy()
```

<https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy>

### colum

```python
# 新增
df.insert(loc, column, value)

#刪除
del df['A']
```

<https://www.delftstack.com/zh-tw/howto/python-pandas/how-to-delete-pandas-dataframe-column/>

### dataframe轉dict

可是value會是list (\[0])

```python
dataframe.to_dict('list')
# {key: [value]}
```

<https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_dict.html>

<https://stackoverflow.com/questions/26716616/convert-a-pandas-dataframe-to-a-dictionary>

### 日期轉換

```python
df['DATE'] = df['Date'].apply(lambda x: x.strftime('%Y-%m-%d'))
```

<https://t.codebug.vip/questions-1537063.htm>

### CSV

1.讀取csv : `inputfile=open('filepath','r')`

2.讀取內容 : `data=csv.reader(inputfile)` (list, 包含欄位名稱(第一項\[0]))

Source : [CSV讀寫](http://dewerzht-blog.logdown.com/posts/701780-python-log-with-csv)

### Python之xlsx檔案與csv檔案相互轉換 <a href="#outline__1" id="outline__1"></a>

Workbook轉csv

```python
from openpyxl import Workbook
for sheet in wb.get_sheet_names():
    with open("export/" + file_csv, 'w', newline='') as op:
        csv_file_writer = csv.writer(op)
        sheet_ranges = wb[sheet]
        for row in sheet_ranges.rows:
            row_container = []
            for cell in row:
                if cell.value == None:
                    row_container.append('')
                else:
                    row_container.append(str(cell.value).replace('\n', ''))

            csv_file_writer.writerow(row_container)
```

[https://codertw.com/%E7%A8%8B%E5%BC%8F%E8%AA%9E%E8%A8%80/434360/](https://codertw.com/程式語言/434360/)

<https://blog.csdn.net/qq_33689414/article/details/78307031>

[https://codertw.com/%E7%A8%8B%E5%BC%8F%E8%AA%9E%E8%A8%80/369201/](https://codertw.com/程式語言/369201/)

issue: 輸出csv多一行空白

```
writefile = open('result.csv','w'，newline =‘’)
writer = csv.writer(writefile)
```

<https://blog.csdn.net/pfm685757/article/details/47806469>

## openpyxl

```python
import openpyxl 

wb = openpyxl.Workbook() 
sheet = wb.active
sheet.title = "New Title"

# 新增資料表
ws1 = wb.create_sheet("Mysheet")

sheet.cell(row = 2, column = 2).value = ' everyone '

# 設定欄位長度與寬度
sheet.row_dimensions[1].height = 70
sheet.column_dimensions['B'].width = 20
```

<https://www.geeksforgeeks.org/python-adjusting-rows-and-columns-of-an-excel-file-using-openpyxl-module/>

<https://www.itread01.com/content/1544850004.html>

自動調整寬度尚未測試


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://stb11816.gitbook.io/python_note/web/456.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
