python pd Series add rows_Common Python tools for data analysis and mining
The Python language: a brief summary of the Python features most commonly used in data analysis and mining:
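Lists (mutable) and tuples (immutable)
Dictionaries (key-value structure)
Sets (the same concept as sets in mathematics)
Functional programming (mainly lambda, map(), reduce(), and filter(); in Python 3, reduce() lives in functools)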
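The features listed above can be sketched in a few lines. This is only an illustration: the variable names and sample values are made up for the example and do not come from the original article.

```python
from functools import reduce  # in Python 3, reduce() lives in functools

# A list is mutable: elements can be changed or appended in place.
scores = [2, 0, 1, 5]
scores[0] = 3
scores.append(7)              # scores is now [3, 0, 1, 5, 7]

# A tuple is immutable: item assignment raises TypeError.
point = (2, 0, 1, 5)
try:
    point[0] = 3
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False

# A dictionary maps keys to values.
ages = {"alice": 30, "bob": 25}
ages["carol"] = 41

# A set keeps unique elements and supports intersection and union.
evens = {0, 2, 4}
small = {0, 1, 2}
common = evens & small        # {0, 2}
either = evens | small        # {0, 1, 2, 4}

# Functional tools: lambda, map(), filter(), reduce().
squares = list(map(lambda x: x ** 2, scores))               # [9, 0, 1, 25, 49]
odd_squares = list(filter(lambda x: x % 2 == 1, squares))   # [9, 1, 25, 49]
total = reduce(lambda acc, x: acc + x, odd_squares)         # 84
print(squares, odd_squares, total)
```

In everyday code a list comprehension often replaces map() and filter(), but the functional forms are the ones the list above names.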
Commonly used Python libraries for data analysis and data mining:
NumPy
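NumPy provides true arrays, which are much faster than Python's built-in lists. NumPy is also a dependency of Scipy, Matplotlib, Pandas, and other libraries. Its built-in functions process data at C-level speed, so prefer them over hand-written Python loops wherever possible.
Example: basic NumPy operations
    import numpy as np  # np is the conventional alias
    a = np.array([2, 0, 1, 5])
    print(a)        # the whole array
    print(a[:3])    # the first three elements
    print(a.min())  # the smallest element
    a.sort()        # sorts in place, so a is overwritten
    print(a)
    b = np.array([[1, 2, 3], [4, 5, 6]])
    print(b * b)    # element-wise product, not a matrix product
Output:
    [2 0 1 5]
    [2 0 1]
    0
    [0 1 2 5]
    [[ 1  4  9]
     [16 25 36]]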
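The advice to prefer NumPy's built-in functions can be illustrated with a small sketch. The array contents and variable names here are assumptions chosen for the example; the point is only that the vectorized call gives the same result as the explicit loop while doing the iteration inside NumPy's compiled code.

```python
import numpy as np

values = np.arange(1, 1001)           # 1, 2, ..., 1000

# Hand-written Python loop: sum of squares, one element at a time.
loop_total = 0
for v in values:
    loop_total += int(v) ** 2

# Built-in vectorized version: the loop runs inside NumPy.
vector_total = int(np.sum(values ** 2))

print(loop_total == vector_total)
```

On large arrays the vectorized form is typically much faster, although the exact speedup depends on the machine and the array size.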
Scipy
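Together, NumPy and Scipy give Python a MATLAB-like feel. Scipy depends on NumPy. NumPy provides multidimensional arrays, but these are general-purpose arrays rather than matrices: multiplying two arrays with *, for example, only multiplies corresponding elements. Scipy provides true matrices, along with a large number of objects and functions built on matrix operations. Its features include optimization, linear algebra, integration, interpolation, fitting, special functions, fast Fourier transforms, signal processing, image processing, and solving ordinary differential equations.
Example: solving a system of nonlinear equations and numerical integration with Scipy
    # Solve a system of equations
    from scipy.optimize import fsolve
    def f(x):
        x1 = x[0]
        x2 = x[1]
        return [2 * x1 - x2 ** 2 - 1, x1 ** 2 - x2 - 2]
    result = fsolve(f, [1, 1])
    print(result)
    # Numerical integration
    from scipy import integrate
    def g(x):  # the integrand
        return (1 - x ** 2) ** 0.5
    pi_2, err = integrate.quad(g, -1, 1)  # integral value and error estimate
    print(pi_2 * 2, err)
Output: 1.91963957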
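One way to sanity-check results like these is to substitute the solution back into the equations and to compare the integral with its known closed form (the area of a unit half-disc is pi/2). The following is a sketch along those lines; the names residual and area_gap are introduced here for illustration only.

```python
import math

import numpy as np
from scipy import integrate
from scipy.optimize import fsolve

def f(x):
    x1, x2 = x
    return [2 * x1 - x2 ** 2 - 1, x1 ** 2 - x2 - 2]

def g(x):
    return (1 - x ** 2) ** 0.5

# Substitute the root back in: both equations should evaluate to about 0.
root = fsolve(f, [1, 1])
residual = np.abs(f(root)).max()

# Twice the half-disc area should be close to pi.
half_area, err = integrate.quad(g, -1, 1)
area_gap = abs(2 * half_area - math.pi)

print(residual, area_gap)
```

Both printed values should be very close to zero if the solver and the integrator converged.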
Matplotlib
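A minimal sketch of basic 2D plotting with Matplotlib's pyplot interface follows. The curves, labels, and the in-memory buffer are assumptions made for this illustration, and the non-interactive Agg backend is selected so that the code runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 200)           # 200 sample points on [0, 10]

fig, ax = plt.subplots(figsize=(7, 5))
ax.plot(x, np.sin(x), "r--", label="sin(x)")   # red dashed line
ax.plot(x, np.cos(x), "b-", label="cos(x)")    # blue solid line
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A simple example")
ax.legend()

buffer = io.BytesIO()                 # in-memory target instead of a file
fig.savefig(buffer, format="png")
plt.close(fig)
```

In an interactive session, plt.show() would display the figure instead of saving it.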
Matplotlib is a well-known Python plotting library. It is mainly used for 2D plotting, but can also produce simple 3D plots.
Example: basic Matplotlib plotting operations
    import matplotlib.pyplot
Pandas
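As a minimal sketch of working with a Series and a DataFrame, including adding a row to a Series as in the title of this post, something like the following could be used. The labels and values are made up for the example, and pd.concat is used for appending because Series.append was removed in pandas 2.0.

```python
import pandas as pd

# A Series is a one-dimensional labeled array; the Index plays the role of a key.
s = pd.Series([1, 2, 3], index=["a", "b", "c"])

# Add a row by assigning to a new label (modifies s in place).
s.loc["d"] = 4

# Add rows by concatenating with another Series (returns a new Series).
s2 = pd.concat([s, pd.Series([5], index=["e"])])

# A DataFrame is a two-dimensional table; each column is a Series.
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=["r1", "r2", "r3"])
col = df["x"]                 # selecting a column gives a Series
df.loc["r4"] = [7, 8]         # add a row by label

print(s2)
print(df.head())              # preview the first rows
print(df.describe())          # basic summary statistics
```

Assigning through .loc changes the object in place, while pd.concat leaves the inputs untouched and returns a new object.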
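Pandas is a very powerful data analysis tool for Python. It is built on top of NumPy, supports SQL-like create, read, update, and delete operations, offers a rich set of data processing functions, supports time series analysis, and handles missing data flexibly. The basic data structures in Pandas are Series and DataFrame. A Series is a sequence, similar to a one-dimensional array. A DataFrame is like a two-dimensional table, similar to a two-dimensional array, and each of its columns is a Series. To locate elements in a Series, Pandas provides the Index object, which works much like a primary key. In essence, a DataFrame is a container of Series.
Example: simple Pandas operations
    import pandas
Output: 1
Scikit-Learn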
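Scikit-Learn depends on NumPy, Scipy, and Matplotlib. It is a powerful machine learning library for Python that provides data preprocessing, classification, regression, clustering, prediction, and model analysis.
Example: creating a linear regression model
    from sklearn.linear_model
Interface provided by all models:
model.fit(): trains the model; fit(X, y) for supervised models, fit(X) for unsupervised models
Interface provided by supervised models:
model.predict(X_new): predicts labels for new samples
model.predict_proba(X_new): predicts probabilities; only available for some models (such as LR, logistic regression)
Interface provided by unsupervised models:
model.transform(): transforms data into the new "basis space" learned from the data
model.fit_transform(): learns a new basis from the data and transforms the data according to that basis
Scikit-Learn ships with several datasets, such as the iris flower dataset and a handwritten digit image dataset. Take the iris dataset as an example: each sample has four features (sepal length and width, petal length and width) plus a class label for one of three species.
Example:
    from sklearn
Output: 0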
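The interface described above can be sketched on the built-in iris dataset. The choice of LogisticRegression and PCA is an assumption made for this illustration, not necessarily the models the original example used; max_iter=1000 is set only to give the solver room to converge.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X, y = iris.data, iris.target         # X: (150, 4) features, y: labels 0, 1, 2

# Supervised model: fit(X, y), then predict() and predict_proba().
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)
labels = clf.predict(X[:3])           # predicted class for the first 3 samples
probs = clf.predict_proba(X[:3])      # one probability per class, rows sum to 1

# Unsupervised model: fit_transform(X) needs no labels.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)           # learn 2 components and project the data

print(X.shape, len(set(y.tolist())))
print(labels, probs.shape, X_2d.shape)
```

Predicting on the training samples, as done here, only demonstrates the calls; a real evaluation would hold out a test set.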
Keras
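Keras is a deep learning library that was originally built on top of Theano. It can be used to build not only ordinary neural networks but also a variety of deep learning models, such as autoencoders, recurrent neural networks, recursive neural networks, and convolutional neural networks. It runs fast, simplifies the steps of building neural network models, lets ordinary users easily build deep networks with hundreds of input nodes, and is highly customizable.
Example: a simple MLP (multilayer perceptron)
    from keras.models
References:
Keras Chinese documentation
How to Compute the Similarity of Two Documents (Part 2)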
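To make the idea of an MLP concrete without relying on any particular Keras version, here is a NumPy-only sketch of the forward pass of a small multilayer perceptron. It is not the Keras API: the layer sizes, the random untrained weights, and the helper names are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed, so the run is repeatable

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: 20 inputs -> 64 hidden -> 64 hidden -> 1 output.
sizes = [20, 64, 64, 1]
weights = [rng.normal(0.0, 0.1, size=(m, n))
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    """Run a batch x of shape (batch, 20) through the untrained network."""
    h = x
    for w, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ w + b)                        # hidden layers
    return sigmoid(h @ weights[-1] + biases[-1])   # output in (0, 1)

batch = rng.normal(size=(5, 20))
out = forward(batch)
print(out.shape)
```

A library such as Keras wraps this layer stacking, and adds the training loop (loss, gradients, optimizer) that this sketch leaves out.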
Gensim
Gensim is mainly used for natural language processing tasks, such as text similarity computation, LDA, and Word2Vec.
Example:
    import logging
Output:
2017-10-24 19:02:40,785 : INFO : collecting all words and their counts
2017-10-24 19:02:40,785 : INFO : PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2017-10-24 19:02:40,785 : INFO : collected 3 word types from a corpus of 4 raw words and 2 sentences
2017-10-24 19:02:40,785 : INFO : Loading a fresh vocabulary
2017-10-24 19:02:40,785 : INFO : min_count=1 retains 3 unique words (100% of original 3, drops 0)
2017-10-24 19:02:40,785 : INFO : min_count=1 leaves 4 word corpus (100% of original 4, drops 0)
2017-10-24 19:02:40,786 : INFO : deleting the raw counts dictionary of 3 items
2017-10-24 19:02:40,786 : INFO : sample=0.001 downsamples 3 most-common words
2017-10-24 19:02:40,786 : INFO : downsampling leaves estimated 0 word corpus (5.7% of prior 4)
2017-10-24 19:02:40,786 : INFO : estimated required memory for 3 words and 100 dimensions: 3900 bytes
2017-10-24 19:02:40,786 : INFO : resetting layer weights
2017-10-24 19:02:40,786 : INFO : training model with 3 workers on 3 vocabulary and 100 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
2017-10-24 19:02:40,788 : INFO : worker thread finished; awaiting finish of 2 more threads
2017-10-24 19:02:40,788 : INFO : worker thread finished; awaiting finish of 1 more threads
2017-10-24 19:02:40,788 : INFO : worker thread finished; awaiting finish of 0 more threads
2017-10-24 19:02:40,789 : INFO : training on 20 raw words (0 effective words) took 0.0s, 0 effective words/s
2017-10-24 19:02:40,789 : WARNING : under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
[ -1.54225400e-03 -2.45212857e-03 -2.20486755e-03 -3.64410551e-03
-2.28137174e-03 -1.70348200e-03 -1.05830852e-03 -4.37875278e-03
-4.97106137e-03 3.93485563e-04 -1.97932171e-03 -3.40653211e-03
1.54990738e-03 8.97102174e-04 2.94041773e-03 3.45200230e-03
-4.60584508e-03 3.81468004e-03 3.07120802e-03 2.85422982e-04
7.01598416e-04 2.69670971e-03 4.17246483e-03 -6.48593705e-04
1.11404411e-03 4.02203249e-03 -2.34672683e-03 2.35153269e-03
2.32632101e-05 3.76200466e-03 -3.95653257e-03 3.77303245e-03
8.48884694e-04 1.61545759e-03 2.53374409e-03 -4.25464474e-03
-2.06338940e-03 -6.84972096e-04 -6.92955102e-04 -2.27969326e-03
-2.13766913e-03 3.95324081e-03 3.52649018e-03 1.29243149e-03
4.29229392e-03 -4.34781052e-03 2.42843386e-03 3.12117115e-03
-2.99768522e-03 -1.17538485e-03 6.67148328e-04 -6.86432002e-04
-3.58940102e-03 2.40547652e-03 -4.18888079e-03 -3.12567432e-03
-2.51603196e-03 2.53451476e-03 3.65199335e-03 3.35336081e-03
-2.50071986e-04 4.15537134e-03 -3.89242987e-03 4.88173496e-03
-3.34603712e-03 3.18462006e-03 1.57053335e-04 3.51517834e-03
-1.20337342e-03 -1.81524854e-04 3.57784083e-05 -2.36600707e-03
-3.77405947e-03 -1.70441647e-03 -4.51521482e-03 -9.47134569e-04
4.53894213e-03 1.55767589e-03 8.57840874e-04 -1.12304837e-03
-3.95945460e-03 5.37869288e-04 -2.04461766e-03 5.24829782e-04
3.76719423e-03 -4.38512256e-03 4.81262803e-03 -4.20147832e-03
-3.87057988e-03 1.67581497e-03 1.51928759e-03 -1.31744961e-03
3.28474329e-03 -3.28777428e-03 -9.67226923e-04 4.62622894e-03
1.34165725e-03 3.60148447e-03 4.80416557e-03 -1.98963983e-03]
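The text similarity computation mentioned above can be illustrated without the library itself. The sketch below uses plain Python (bag-of-words counts and cosine similarity) rather than the Gensim API, and the sample sentences are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between two whitespace-tokenized documents."""
    counts_a = Counter(doc_a.lower().split())
    counts_b = Counter(doc_b.lower().split())
    shared = set(counts_a) & set(counts_b)
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0                     # an empty document matches nothing
    return dot / (norm_a * norm_b)

same = cosine_similarity("first sentence", "first sentence")
partial = cosine_similarity("first sentence", "second sentence")
disjoint = cosine_similarity("first sentence", "another document")
print(same, partial, disjoint)
```

Word counts ignore word order and meaning; learned vectors such as those from Word2Vec are one way to capture similarity between different words.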