Pandas
官方文件
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy
1.讀取csv : data=pandas.read_csv('filename')
2.瀏覽資料內容 : data.head()
3.讀取特定欄位資料(列,直直的) : detail=data[['name1','name2']......]
3-1. 轉為一般陣列: data['name'].tolist()
3-2. 可用numpy將series或者dataframe型態轉為numpy.array
4.計算各欄位的多種統計資訊(count,mean,std,min,max,25%,50%,75%) : data.describe()
5.所有欄位名稱: data.colums
6.直接選取各筆資料 data[1:]
(list切割方式)
Source:
pandas 資料處理
iloc
data.iloc[2] --> 第三個row (第三筆資料,橫的)
data.iloc[2:,0:2] --> 切割區塊(row,colums),抓取第一筆之後所有資料,但只包含第一、二項特徵
data.iloc[1:4,:] --> 以row選取
data.iloc[:,1:4] --> 以colums選取
iloc[0] --> Series
iloc[[0]] --> DataFrame
Source:Python Pandas DataFrame 的 iloc[0] 與 iloc[[0]]
讀取
import pandas as pd
df = pd.read_csv("City_Zhvi_AllHomes.csv")
df.head(15)
# df.地址,在read_excel()進來時,第一列就會被當成他的屬性。所以df.地址就抓到地址這一行
https://tinycorner.tw/2018/04/24/使用python讀取csv和xlsx檔/
https://tinycorner.tw/2018/04/24/使用python讀取csv和xlsx檔/
處理空值
test['name'] = test['name'].fillna('我是空的!')
判斷nan
import math
x = float('nan')
math.isnan(x) # True
四捨五入
df = pd.DataFrame(np.random.random([3, 3]),
columns=['A', 'B', 'C'], index=['first', 'second', 'third']
df.round(2) # 取至小數點第二位
https://blog.csdn.net/shuibuzhaodeshiren/article/details/84324885
Series轉換
scores = pd.Series({'小明':90, '小華':80, '小李':70, '小強':55})
# 把Series轉成Dataframe只需要使用指令就好
score_df = scores.to_frame() #搞定
https://medium.com/@weilihmen/python-pandas-%E5%9F%BA%E6%9C%AC%E6%93%8D%E4%BD%9C%E6%95%99%E5%AD%B8-%E6%88%90%E7%B8%BE%E8%A1%A8-f6d0ec4f89
轉置矩陣
http://violin-tao.blogspot.com/2017/06/pandas-1-indexing.html
row
用loop讀取
for (index_label, row_series) in empDfObj.iterrows():
print('Row Index label : ', index_label)
print('Row Content as Series : ', row_series.values)
https://thispointer.com/pandas-6-different-ways-to-iterate-over-rows-in-a-dataframe-update-while-iterating-row-by-row/
直接轉成numpy array
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy
colum
# 新增
df.insert(loc, column, value)
#刪除
del df['A']
https://www.delftstack.com/zh-tw/howto/python-pandas/how-to-delete-pandas-dataframe-column/
dataframe轉dict
可是value會是list ([0])
dataframe.to_dict('list')
# {key: [value]}
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_dict.html
https://stackoverflow.com/questions/26716616/convert-a-pandas-dataframe-to-a-dictionary
日期轉換
df['DATE'] = df['Date'].apply(lambda x: x.strftime('%Y-%m-%d'))
https://t.codebug.vip/questions-1537063.htm
CSV
1.讀取csv : inputfile=open('filepath','r')
2.讀取內容 : data=csv.reader(inputfile)
(list, 包含欄位名稱(第一項[0]))
Source : CSV讀寫
Python之xlsx檔案與csv檔案相互轉換
Workbook轉csv
from openpyxl import Workbook
for sheet in wb.get_sheet_names():
with open("export/" + file_csv, 'w', newline='') as op:
csv_file_writer = csv.writer(op)
sheet_ranges = wb[sheet]
for row in sheet_ranges.rows:
row_container = []
for cell in row:
if cell.value == None:
row_container.append('')
else:
row_container.append(str(cell.value).replace('\n', ''))
csv_file_writer.writerow(row_container)
https://codertw.com/%E7%A8%8B%E5%BC%8F%E8%AA%9E%E8%A8%80/434360/
https://blog.csdn.net/qq_33689414/article/details/78307031
https://codertw.com/%E7%A8%8B%E5%BC%8F%E8%AA%9E%E8%A8%80/369201/
issue: 輸出csv多一行空白
writefile = open('result.csv','w',newline =‘’)
writer = csv.writer(writefile)
https://blog.csdn.net/pfm685757/article/details/47806469
openpyxl
import openpyxl
wb = openpyxl.Workbook()
sheet = wb.active
sheet.title = "New Title"
# 新增資料表
ws1 = wb.create_sheet("Mysheet")
sheet.cell(row = 2, column = 2).value = ' everyone '
# 設定欄位長度與寬度
sheet.row_dimensions[1].height = 70
sheet.column_dimensions['B'].width = 20
https://www.geeksforgeeks.org/python-adjusting-rows-and-columns-of-an-excel-file-using-openpyxl-module/
https://www.itread01.com/content/1544850004.html
自動調整寬度尚未測試