Entering the Ping An Bank Data Competition with PyTorch Deep Learning: Predicting Driving Risk from Driving Behavior

Published: 2024/3/24 | pytorch | 豆豆

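Competition link: http://www.datafountain.cn/#/competitions/284/intro
The competition provides, as a training set, minute-level driving behavior data for a subset of customers together with each customer's claim payout ratio. The data includes anonymized latitude/longitude positions and driving state. Participants are expected to mine the data and train machine-learning models as needed. The organizers also hold driving behavior data from other customers over the same period for evaluation, to check whether an algorithm can accurately identify each customer's driving risk at the time.
Unlike previous competitions, the training and evaluation sets are not public because of data-security concerns. Participants submit Python code that the platform can execute, not prediction results. See [Submission Requirements] for the exact rules. To help participants understand the data, the sponsor provides some sample data.
I won't repeat the competition details here. In short, the task is to predict the insurance payout ratio from a driver's driving behavior data.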

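Data sample

The first column is the user id. The downloadable sample data set contains 100 users.
trip_id is the id of the user's trip.
The next two columns are the longitude and latitude, followed by the altitude, speed, phone call state, and time at that moment.
Y is the payout ratio.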

Reading the data
The data is read with pandas.
Cleaning: records in which the speed and direction values are abnormal are removed.
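
The post does not say which values count as abnormal, so the following is only a minimal sketch of the cleaning step. It assumes that a negative speed or a direction outside [0, 360) marks an invalid reading and drops any row with such a value; the thresholds and the helper name `clean_records` are assumptions, while the column names `SPEED` and `DIRECTION` come from the feature extraction code in this post.

```python
import pandas as pd


def clean_records(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows whose SPEED or DIRECTION is outside an assumed valid range."""
    # Assumption: valid speed is non-negative, valid direction lies in [0, 360).
    valid_speed = df["SPEED"] >= 0
    valid_direction = (df["DIRECTION"] >= 0) & (df["DIRECTION"] < 360)
    return df.loc[valid_speed & valid_direction].reset_index(drop=True)


# Tiny made-up sample: rows 1 and 2 carry invalid readings.
sample = pd.DataFrame(
    {
        "TERMINALNO": [1, 1, 1, 2],
        "SPEED": [12.5, -1.0, 30.0, 8.0],
        "DIRECTION": [90, 180, -1, 359],
    }
)
cleaned = clean_records(sample)
print(cleaned)
```

If the real data marks missing readings differently, only the two boolean masks need to change.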

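Feature engineering
A useful starting point is to guess what kind of driver, and which road sections, are prone to accidents.
To check this, we built a front-end visualization that compares the driving trajectories for Y = 0 and Y > 0.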

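First, load the data into a database. The table structure is shown below. The database is only there to make retrieving data convenient, so the column types do not need careful thought:

The last column converts the time in the table into a conventional date and time. The original value is the time elapsed since January 1, 1970 (a Unix timestamp).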
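
As a rough illustration of that conversion, the sketch below turns Unix timestamps into readable date-times with pandas. It assumes the `TIME` column holds whole seconds and interprets them in UTC; the two sample values are made up. The extraction code in this post uses `datetime.fromtimestamp`, which converts to the local time zone of the machine running it, so hour-of-day features can shift between machines.

```python
import pandas as pd

# Made-up timestamps, assumed to be seconds since 1970-01-01 00:00:00 UTC.
records = pd.DataFrame({"TIME": [0, 1476952380]})

# unit="s" tells pandas the integers are seconds; utc=True makes the zone explicit.
records["datetime_utc"] = pd.to_datetime(records["TIME"], unit="s", utc=True)
records["hour"] = records["datetime_utc"].dt.hour

print(records)
```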

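We pulled different driving trajectories from the database:
Trajectories for Y = 0:

Trajectories for Y > 0:

Comparing the two plots in detail on Baidu Maps shows no clear relationship between road section and payout ratio, so for now we do not use latitude and longitude as features.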

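Time distribution of the driving data:
We can hypothesize that someone who likes driving at night is more likely to have an accident.
So we extract the distribution of each user's location records over the 24 hours of the day as features.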
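
The 24-hour distribution can be computed in one step once each record has an hour value. The sketch below is a vectorized alternative to the per-hour loop in the extraction code; the helper name `hourly_distribution` and the definition of "night" as 22:00 to 04:59 are assumptions made for illustration.

```python
import pandas as pd


def hourly_distribution(hours: pd.Series) -> list:
    """Return 24 fractions: the share of records falling in each hour 0..23."""
    # value_counts only lists hours that occur, so reindex fills the rest with 0.
    counts = hours.value_counts().reindex(range(24), fill_value=0)
    return (counts / len(hours)).tolist()


# Made-up hours for one user: two records at 8, one at 9, one at 23.
hours = pd.Series([8, 8, 9, 23])
dist = hourly_distribution(hours)

# Assumption: "night" means 22:00-04:59.
night_share = sum(dist[22:]) + sum(dist[:5])
print(dist[8], dist[9], dist[23], night_share)
```

A single `night_share` number is a coarser feature than the 24 fractions, but it tests the night-driving hypothesis more directly.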

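Suppose also that someone who brakes hard or likes making sharp turns is more accident-prone.
So we use the first difference of speed, the mean speed, and the first difference of direction as features. (Note that the extraction code in this post computes the mean and standard deviation of speed and the mean first difference of direction; it does not compute a speed difference.)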
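
A hedged sketch of how such difference features might be computed is shown here. It differs from the extraction code in this post in two ways that are my own assumptions: differences are taken within each trip after sorting by time, so that jumps between trips are not counted, and direction changes are wrapped into [-180, 180) so that turning from 350 to 10 degrees counts as 20 degrees. The helper name and the feature names are hypothetical.

```python
import pandas as pd


def driving_diff_features(user_df: pd.DataFrame) -> dict:
    """First-difference features for one user's records."""
    ordered = user_df.sort_values(["TRIP_ID", "TIME"])
    grouped = ordered.groupby("TRIP_ID")
    # Change in speed between consecutive records of the same trip.
    speed_diff = grouped["SPEED"].diff()
    # Change in heading, wrapped so the result is the signed smallest angle.
    raw_turn = grouped["DIRECTION"].diff()
    turn = ((raw_turn + 180) % 360) - 180
    return {
        "mean_speed": float(ordered["SPEED"].mean()),
        "mean_abs_speed_diff": float(speed_diff.abs().mean()),
        "max_deceleration": float(-speed_diff.min()),
        "mean_abs_turn": float(turn.abs().mean()),
    }


# Made-up records for one user with two trips.
user = pd.DataFrame(
    {
        "TRIP_ID": [1, 1, 1, 2, 2],
        "TIME": [0, 60, 120, 0, 60],
        "SPEED": [10.0, 20.0, 5.0, 30.0, 30.0],
        "DIRECTION": [350, 10, 10, 90, 180],
    }
)
features = driving_diff_features(user)
print(features)
```

Adding any of these to the feature vector would change its length, so the input size of the networks in this post (36) would have to change with it.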

The feature extraction program is as follows:

import datetime

import numpy as np
import pandas as pd

# path_train is the path to the training CSV; it is not defined in the original post.
data = pd.read_csv(path_train)
train1 = []
label1 = []  # classification labels
label2 = []  # regression labels
alluser = data['TERMINALNO'].nunique()

# Feature engineering: build one feature vector per user.
# Features: call-state distribution, 24-hour time distribution,
# speed mean/std, altitude mean/std, and mean direction change.
for item in data['TERMINALNO'].unique():
    # Copy so that adding the 'hour' column does not write into a slice of data.
    temp = data.loc[data['TERMINALNO'] == item, :].copy()
    temp.index = range(len(temp))
    # Trip feature
    num_of_trips = temp['TRIP_ID'].nunique()
    # Record features
    num_of_records = temp.shape[0]
    nsh = float(num_of_records)
    # Share of records in each call state 0..4
    call_state_share = [float((temp['CALLSTATE'] == s).sum() / nsh) for s in range(5)]
    # Time features: share of records in each hour of the day.
    # fromtimestamp uses the local time zone of the machine running the code.
    # temp['weekday'] = temp['TIME'].apply(lambda x: datetime.datetime.fromtimestamp(x).weekday())
    temp['hour'] = temp['TIME'].apply(lambda x: datetime.datetime.fromtimestamp(x).hour)
    hour_state = [float((temp['hour'] == i).sum() / nsh) for i in range(24)]
    # Driving behavior features
    mean_speed = temp['SPEED'].mean()  # mean speed
    std_speed = temp['SPEED'].std()  # speed standard deviation
    mean_height = temp['HEIGHT'].mean()  # mean altitude
    std_height = temp['HEIGHT'].std()  # altitude standard deviation
    diffmean_direction = temp['DIRECTION'].diff().mean()  # mean first difference of direction
    # Label
    target = temp.loc[0, 'Y']
    # All 36 features: 2 counts + 5 call states + 5 driving statistics + 24 hours
    feature = ([num_of_trips, num_of_records] + call_state_share
               + [mean_speed, std_speed, mean_height, std_height, diffmean_direction]
               + hour_state)
    train1.append(feature)
    label2.append([target])

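After extracting the features, we first run a classification. Y is split into three classes: Y = 0, Y greater than 0 but no larger than the mean, and everything else.
The labeling program is as follows: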

mean_y = np.mean(np.array(label2))
for label in label2:
    if label[0] == 0:
        label1.append([0])
    elif label[0] <= mean_y:
        label1.append([1])
    else:
        label1.append([2])

Next, build the neural network:
After tuning the hyperparameters, I found that the network below fits the data reasonably well, so I did not make it deeper or wider (that is, add more neurons):

The training code is as follows:

import os

import torch

# Convert the data to tensors.
x = torch.tensor(train1, dtype=torch.float32)
# CrossEntropyLoss expects class indices of shape (N,) and dtype int64.
y = torch.tensor(label1, dtype=torch.long).squeeze(1)

# Build the classification model.
net1 = torch.nn.Sequential(
    torch.nn.Linear(36, 13),
    torch.nn.ReLU(),
    torch.nn.Linear(13, 13),
    torch.nn.ReLU(),
    torch.nn.Linear(13, 3),
)
print(net1)  # print the network structure

optimizer = torch.optim.SGD(net1.parameters(), lr=0.002)
loss_func = torch.nn.CrossEntropyLoss()
os.makedirs('model', exist_ok=True)  # torch.save does not create the directory

# Train the classification model.
len_data = len(label1)
for t in range(100000):
    out = net1(x)  # forward pass on the full training set
    loss = loss_func(out, y)  # arguments are (network output, target); the target is NOT one-hot
    optimizer.zero_grad()  # clear gradients from the previous step
    loss.backward()  # backpropagation, compute gradients
    optimizer.step()  # apply gradients
    if t % 100 == 0:
        print("Training step " + str(t))
        prediction = torch.max(out, 1)[1]
        pred_y = prediction.numpy()
        target_y = y.numpy()
        # Accuracy on the training set, not on held-out data.
        accuracy = (pred_y == target_y).sum() / float(len_data)
        print('Accuracy=%.2f' % accuracy)
        if accuracy > 0.95:
            # Save the model.
            torch.save(net1, 'model/net1.pkl')
            print("Saved classification model")
            break
    if t == 99999:
        torch.save(net1, 'model/net1.pkl')
        print("Saved classification model")

Then a regression is fitted on all the data with Y > 0. The regression model is as follows:

The training code is as follows:

import datetime
import os

import numpy as np
import pandas as pd
import torch

# Read the data. path_train is the path to the training CSV (not defined in the original post).
data = pd.read_csv(path_train)
train1 = []  # regression training set
label2 = []  # regression labels
alluser = data['TERMINALNO'].nunique()

# Feature engineering: same per-user features as for the classifier.
for item in data['TERMINALNO'].unique():
    temp = data.loc[data['TERMINALNO'] == item, :].copy()
    temp.index = range(len(temp))
    # Trip feature
    num_of_trips = temp['TRIP_ID'].nunique()
    # Record features
    num_of_records = temp.shape[0]
    nsh = float(num_of_records)
    call_state_share = [float((temp['CALLSTATE'] == s).sum() / nsh) for s in range(5)]
    # Time features
    # temp['weekday'] = temp['TIME'].apply(lambda x: datetime.datetime.fromtimestamp(x).weekday())
    temp['hour'] = temp['TIME'].apply(lambda x: datetime.datetime.fromtimestamp(x).hour)
    hour_state = [float((temp['hour'] == i).sum() / nsh) for i in range(24)]
    # Driving behavior features
    mean_speed = temp['SPEED'].mean()  # mean speed
    std_speed = temp['SPEED'].std()  # speed standard deviation
    mean_height = temp['HEIGHT'].mean()  # mean altitude
    std_height = temp['HEIGHT'].std()  # altitude standard deviation
    diffmean_direction = temp['DIRECTION'].diff().mean()  # mean first difference of direction
    # Label
    target = temp.loc[0, 'Y']
    # All 36 features
    feature = ([num_of_trips, num_of_records] + call_state_share
               + [mean_speed, std_speed, mean_height, std_height, diffmean_direction]
               + hour_state)
    if target > 0:  # only users with Y > 0 are used for the regression
        train1.append(feature)
        label2.append([target])

# Normalize. AutoNorm is the min-max function given in the normalization code of this post.
train1 = AutoNorm(train1)

# Build the regression model.
net2 = torch.nn.Sequential(
    torch.nn.Linear(36, 4),
    torch.nn.ReLU(),
    torch.nn.Linear(4, 4),
    torch.nn.ReLU(),
    torch.nn.Linear(4, 1),
)
print(net2)  # print the network structure

# Train the regression model.
x = torch.tensor(train1, dtype=torch.float32)
y = torch.tensor(label2, dtype=torch.float32)  # shape (N, 1), matching the network output
optimizer = torch.optim.SGD(net2.parameters(), lr=0.02)
loss_func = torch.nn.MSELoss()
os.makedirs('model', exist_ok=True)  # torch.save does not create the directory
for t in range(100000):
    prediction = net2(x)
    loss = loss_func(prediction, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if t % 10 == 0:
        print("Training step " + str(t))
        print(loss.item())
        # Stop and save once the loss relative to mean_y falls below 5%.
        # mean_y is the mean payout ratio computed in the labeling step.
        if loss.item() / mean_y < 0.05:
            torch.save(net2, 'model/net2.pkl')
            print("Saved regression model")
            break
    if t == 99999:
        torch.save(net2, 'model/net2.pkl')
        print("Saved regression model")

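There are two things to note here. First, before the data is fed into the network, all features must be normalized, so that features measured on different scales do not cause training to break down.
The normalization code is as follows: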

# Min-max normalization
def width(lst):
    # Number of columns, taken from the first row.
    return len(lst[0])


def AutoNorm(mat):
    m = width(mat)
    # Column-wise minimum and maximum (also correct for negative values).
    MinNum = [min(row[j] for row in mat) for j in range(m)]
    MaxNum = [max(row[j] for row in mat) for j in range(m)]
    section = [hi - lo for hi, lo in zip(MaxNum, MinNum)]
    print(section)
    NormMat = []
    for k in mat:
        # A constant column has range 0; map it to 0 instead of dividing by zero.
        value = [(v - lo) / s if s != 0 else 0.0 for v, lo, s in zip(k, MinNum, section)]
        NormMat.append(value)
    return NormMat
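
The same min-max scaling, x' = (x - min) / (max - min) per column, can be written more compactly with NumPy. The sketch below also separates fitting from applying, so that the minimum and range learned on the training features can be reused on test features; the post does not show how its test features were scaled, so treat that part as an assumption. The function names `fit_min_max` and `apply_min_max` are hypothetical.

```python
import numpy as np


def fit_min_max(train: np.ndarray):
    """Learn the per-column minimum and range from the training features."""
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    # A constant column has range 0; use 1 so it maps to 0 instead of NaN.
    col_range[col_range == 0] = 1.0
    return col_min, col_range


def apply_min_max(data: np.ndarray, col_min: np.ndarray, col_range: np.ndarray) -> np.ndarray:
    """Scale data with statistics learned elsewhere (normally on the training set)."""
    return (data - col_min) / col_range


# Made-up features: the third column is constant.
train = np.array([[1.0, 100.0, 5.0], [3.0, 300.0, 5.0], [2.0, 200.0, 5.0]])
test = np.array([[2.0, 400.0, 5.0]])

col_min, col_range = fit_min_max(train)
train_norm = apply_min_max(train, col_min, col_range)
test_norm = apply_min_max(test, col_min, col_range)
print(train_norm)
print(test_norm)
```

Test values can fall outside [0, 1] when they lie outside the range seen in training, as the second column of `test_norm` does here.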

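Second, a note on how to deal with overfitting when training the model:
What is a learning curve?

A learning curve plots the training and cross-validation accuracy for different training-set sizes. It shows how the model performs on new data, which lets us judge whether the model suffers from high variance or high bias, and whether enlarging the training set would reduce overfitting.

When the training and test errors converge but remain high, the model has high bias.
In the top-left plot the bias is high: both training and validation accuracy are low, which most likely indicates underfitting.
In this case we can increase model capacity, for example by building more features or weakening the regularization term.
Adding more data does not help here.

When there is a large gap between the training and test errors, the model has high variance.
When the training accuracy is clearly higher than the accuracy measured on other, independent data sets, the model is usually overfitting.
In the top-right plot the variance is high: training and validation accuracy differ too much, which points to overfitting.
Here we can enlarge the training set, reduce model complexity, strengthen the regularization term, or reduce the number of features through feature selection.

The ideal is a setting where both bias and variance are small, that is, the curves converge and the error is low.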
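
To make the learning-curve idea concrete, here is a small self-contained sketch using NumPy only. It is not the author's procedure: it uses synthetic data, a degree-9 polynomial least-squares fit in place of the neural network, and mean squared error in place of accuracy. All of these are assumptions chosen so the example runs without the competition data.

```python
import numpy as np


def poly_features(x: np.ndarray, degree: int) -> np.ndarray:
    """Columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)


def learning_curve(x_train, y_train, x_val, y_val, sizes, degree):
    """Return (n, train_mse, val_mse) for a least-squares fit on the first n samples."""
    rows = []
    features_val = poly_features(x_val, degree)
    for n in sizes:
        features = poly_features(x_train[:n], degree)
        coef, *_ = np.linalg.lstsq(features, y_train[:n], rcond=None)
        train_mse = float(np.mean((features @ coef - y_train[:n]) ** 2))
        val_mse = float(np.mean((features_val @ coef - y_val) ** 2))
        rows.append((n, train_mse, val_mse))
    return rows


# Synthetic data: a smooth function plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 400)
y = np.sin(3.0 * x) + rng.normal(0.0, 0.2, 400)
x_train, y_train, x_val, y_val = x[:200], y[:200], x[200:], y[200:]

sizes = [10, 25, 50, 100, 200]
curve = learning_curve(x_train, y_train, x_val, y_val, sizes, degree=9)
for n, train_mse, val_mse in curve:
    print(f"n={n:4d}  train MSE={train_mse:.4f}  validation MSE={val_mse:.4f}")
```

With only 10 samples the degree-9 polynomial can pass through every training point, so the training error is near zero while the validation error is not. That gap is the high-variance pattern described above, and watching how it changes as n grows shows whether more data helps.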

Finally, load the trained networks and predict:

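# Load the networks and predict.
import torch

# test1 and test2 are the feature matrices for the classifier and the regressor.
# They are not defined in the original post.
x = torch.tensor(test1, dtype=torch.float32)
print("Loading classification network")
# The whole model object was pickled, so on PyTorch 2.6 and later
# torch.load needs weights_only=False to load it.
net1 = torch.load('model/net1.pkl')
out = net1(x)
prediction = torch.max(out, 1)[1]
pred_y = prediction.numpy().tolist()

x = torch.tensor(test2, dtype=torch.float32)
# Load the regression network.
net2 = torch.load('model/net2.pkl')
prediction = net2(x)
# detach() is required because the output is still attached to the autograd graph.
y = prediction.detach().numpy().tolist()
print(y)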
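
The post does not say how the two outputs are merged into a single payout prediction. One plausible way, offered only as an assumption, is to predict 0 for every user the classifier assigns to class 0 and to use the regression output, clipped at 0, for everyone else. The function name `combine_predictions` is hypothetical.

```python
import numpy as np


def combine_predictions(pred_class, pred_value) -> np.ndarray:
    """Merge class labels and regression outputs into one payout prediction."""
    pred_class = np.asarray(pred_class)
    # The regression network returns shape (N, 1); flatten it to (N,).
    pred_value = np.asarray(pred_value, dtype=float).reshape(-1)
    # Assumption: class 0 means "no payout"; negative regression outputs are clipped to 0.
    return np.where(pred_class == 0, 0.0, np.clip(pred_value, 0.0, None))


# Made-up outputs for four users.
final = combine_predictions([0, 1, 2, 0], [[0.3], [0.1], [-0.2], [1.5]])
print(final.tolist())
```

This requires both networks to be run on the same users in the same order, so the regression features would need to be built for all test users, not only for those expected to have Y > 0.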

Of course, this alone is far from enough to do well in the competition. An ensemble of several models is still needed for accurate predictions.

Reposted from: https://blog.51cto.com/yixianwei/2118759
