
[Python Learning Series 29] Implementing the Tianchi Smart Traffic Prediction Competition with scikit-learn

Published: 2025/4/16 · python · 42 views · 豆豆

1. Background: https://tianchi.aliyun.com/competition/introduction.htm?spm=5176.100066.0.0.3f6e7d83RQgWEL&raceId=231598

The score is 0.58, which is fairly weak; the code is offered here for reference only.

2. Features extracted for the competition:

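| Feature | Type | Description |
|---|---|---|
| link_ID | string | Unique identifier of each road segment (link) |
| link_seq | int | The 132 links, numbered 1 to 132 |
| length | int | Link length (m) |
| width | int | Link width (m) |
| link_class | int | Road class of the link, e.g. 1 = arterial road |
| date | string | Date, e.g. 2015-10-01 |
| week | int | Day of the week derived from the date, 1 to 7 |
| time_interval | string | Time interval, e.g. [2015-09-01 00:00:00,2015-09-01 00:02:00) |
| time_slot | int | Time slot: the 24-hour day is split into 2-minute slots numbered 1 to 720 |
| avg_travel_time | float | Mean of the average travel time in this time slot, reflecting the central tendency |
| inlinks_atl_1 | float | Average travel time on the link during the period (time_slot - upstream average travel time). At most 4 upstream links feed into a link; if there are fewer than 4, the unused columns are 0. In the test set this value is predicted with decision-tree regression. |
| inlinks_atl_2 | float | As above |
| inlinks_atl_3 | float | As above |
| inlinks_atl_4 | float | As above |
| outlinks_atl_1 | float | Average travel time on the link during the period (time_slot + average travel time). At most 4 downstream links leave a link; if there are fewer than 4, the unused columns are 0. In the test set this value is predicted with decision-tree regression. |
| outlinks_atl_2 | float | As above |
| outlinks_atl_3 | float | As above |
| outlinks_atl_4 | float | As above |
| travel_time | float | Average travel time of vehicles on the link (s); the prediction target |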
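The `week` and `time_slot` columns are derived from `date` and `time_interval`. Below is a minimal sketch of that mapping, assuming Monday = 1 through Sunday = 7 (the table only says "1 to 7") and that slots are counted from midnight in 2-minute steps. `pad_links` is a hypothetical helper that illustrates the rule of filling missing upstream/downstream columns with 0 when a link has fewer than 4 neighbours.

```python
from datetime import datetime


def week_of(date_str):
    """Map a date such as '2015-10-01' to 1..7 (assumption: Monday = 1, Sunday = 7)."""
    return datetime.strptime(date_str, "%Y-%m-%d").isoweekday()


def time_slot_of(time_interval):
    """Map '[2015-09-01 00:00:00,2015-09-01 00:02:00)' to a slot in 1..720 (2 minutes each)."""
    start = time_interval.strip("[)").split(",")[0]  # left end of the half-open interval
    t = datetime.strptime(start, "%Y-%m-%d %H:%M:%S")
    return (t.hour * 60 + t.minute) // 2 + 1


def pad_links(values, n=4):
    """Hypothetical helper: keep at most n neighbour travel times, fill missing ones with 0."""
    values = list(values)[:n]
    return values + [0.0] * (n - len(values))


print(week_of("2015-10-01"))                                           # 4 (a Thursday)
print(time_slot_of("[2015-09-01 00:00:00,2015-09-01 00:02:00)"))       # 1
print(time_slot_of("[2015-09-01 23:58:00,2015-09-02 00:00:00)"))       # 720
print(pad_links([12.5, 30.0]))                                         # [12.5, 30.0, 0.0, 0.0]
```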
3. Reference code:

```python
# -*- coding: utf-8 -*-
import time

import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import LinearRegression

# Column layout of the labelled file; the test file is identical except it has no travel_time
TRAIN_COLUMNS = ['link_id', 'link_seq', 'length', 'width', 'link_class', 'start_date', 'week',
                 'time_interval', 'time_slot', 'travel_time',
                 'avg_travel_time', 'sd_travel_time', 'inlinks_num', 'outlinks_num',
                 'inlinks_avg_travel_time', 'outlinks_avg_travel_time',
                 'inlinks_atl_1', 'inlinks_atl_2', 'inlinks_atl_3', 'inlinks_atl_4',
                 'outlinks_atl_1', 'outlinks_atl_2', 'outlinks_atl_3', 'outlinks_atl_4']
TEST_COLUMNS = [c for c in TRAIN_COLUMNS if c != 'travel_time']

STR_COLUMNS = ['link_id', 'start_date', 'time_interval']
INT_COLUMNS = ['link_seq', 'length', 'width', 'link_class', 'week', 'time_slot',
               'inlinks_num', 'outlinks_num']   # every other column is float

# Features used by the model
FEATURES = ['link_seq', 'time_slot', 'length', 'avg_travel_time',
            'inlinks_atl_1', 'inlinks_atl_2', 'inlinks_atl_3', 'inlinks_atl_4',
            'outlinks_atl_1', 'outlinks_atl_2', 'outlinks_atl_3', 'outlinks_atl_4']


def load(path, columns):
    """Read a tab-separated file and cast every column to its intended type."""
    df = pd.read_csv(path, sep='\t', encoding='utf8', names=columns)
    dtypes = {c: (str if c in STR_COLUMNS else 'int' if c in INT_COLUMNS else 'float')
              for c in columns}
    return df.astype(dtypes)


def main():
    label_ds = load("link_train_0801.txt", TRAIN_COLUMNS)    # labelled data
    unlabel_ds = load("link_test_0801.txt", TEST_COLUMNS)    # data to predict

    dates = pd.to_datetime(label_ds["start_date"])
    train_df = label_ds.loc[dates < '2016-06-01']    # training set
    valid_df = label_ds.loc[dates >= '2016-06-01']   # validation set
    # train_df.sample(frac=0.2) in the original discarded its result, so it had no effect;
    # use train_df = train_df.sample(frac=0.2) if subsampling is actually wanted.

    outputs = []   # per-link predictions
    scores = []    # per-link [link_seq, MAPE, RMSE]
    for linkid in range(1, 133):
        # Extract this link's training, validation and test rows
        train_df_id = train_df.loc[train_df["link_seq"] == linkid]
        valid_df_id = valid_df.loc[valid_df["link_seq"] == linkid]
        test_df = unlabel_ds.loc[unlabel_ds["link_seq"] == linkid]
        print("link", linkid, "train:", train_df_id.shape, "valid:", valid_df_id.shape,
              "test:", test_df.shape)

        # Train; missing values are replaced by 0 (tree.DecisionTreeRegressor() is an alternative)
        model_it = LinearRegression()
        model_it.fit(train_df_id[FEATURES].fillna(0), train_df_id['travel_time'])

        # Validate. MAPE = mean(|prediction - truth| / truth); RMSE = sqrt(mean squared error)
        valid_y = valid_df_id['travel_time']
        pre_valid_y = model_it.predict(valid_df_id[FEATURES].fillna(0))
        mape_id = np.mean(np.abs(pre_valid_y - valid_y) / valid_y)
        rmse_id = np.sqrt(metrics.mean_squared_error(valid_y, pre_valid_y))
        print("linkseq=%d MAPE=%.4f RMSE=%.4f" % (linkid, mape_id, rmse_id))
        scores.append([linkid, mape_id, rmse_id])

        # Predict; .copy() avoids pandas' SettingWithCopyWarning when adding the column
        test_info = test_df[['link_id', 'start_date', 'time_interval']].copy()
        test_info["travel_time"] = model_it.predict(test_df[FEATURES].fillna(0))
        outputs.append(test_info)

    mr_df = pd.DataFrame(scores, columns=['link_seq', 'mape', 'rmse'])
    print("all MAPE:", mr_df['mape'].mean())
    print("all RMSE:", mr_df['rmse'].mean())
    mr_df.to_csv('linkmape.txt', sep='#', index=False, header=False)
    pd.concat(outputs).to_csv('outit.txt', sep='#', index=False, header=False)  # predictions


if __name__ == '__main__':
    start = time.perf_counter()
    main()
    end = time.perf_counter()
    print('finish all in %s' % str(end - start))
```

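Because the competition files are not included here, the per-link train/validate pattern can be tried on synthetic data. This is a minimal sketch under stated assumptions: `mape` and `per_link_scores` are hypothetical helper names, the data below is made up, and MAPE is computed as mean(|prediction - truth| / truth), which assumes strictly positive travel times.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


def mape(y_true, y_pred):
    """Mean absolute percentage error; assumes every y_true is positive."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true) / y_true))


def per_link_scores(df, features, target="travel_time", split_date="2016-06-01"):
    """Fit one LinearRegression per link_seq on rows before split_date and score the rest."""
    dates = pd.to_datetime(df["start_date"])
    train, valid = df[dates < split_date], df[dates >= split_date]
    rows = []
    for link, train_link in train.groupby("link_seq"):
        valid_link = valid[valid["link_seq"] == link]
        if valid_link.empty:
            continue  # nothing to score for this link
        model = LinearRegression().fit(train_link[features].fillna(0), train_link[target])
        pred = model.predict(valid_link[features].fillna(0))
        rows.append([link, mape(valid_link[target], pred),
                     np.sqrt(mean_squared_error(valid_link[target], pred))])
    return pd.DataFrame(rows, columns=["link_seq", "mape", "rmse"])


# Synthetic data (illustration only): 3 links x 60 days, travel time exactly linear in avg_travel_time
rng = np.random.default_rng(0)
n = 3 * 60
demo = pd.DataFrame({
    "link_seq": np.repeat([1, 2, 3], 60),
    "start_date": np.tile(pd.date_range("2016-05-02", periods=60).strftime("%Y-%m-%d"), 3),
    "avg_travel_time": rng.uniform(10, 50, n),
})
demo["travel_time"] = 2.0 * demo["avg_travel_time"] + 5.0

scores = per_link_scores(demo, ["avg_travel_time"])
print(scores)
print("mean MAPE:", scores["mape"].mean())
```

Iterating over `groupby("link_seq")` instead of `range(1, 133)` means links without training rows are skipped automatically, and links without validation rows are skipped explicitly, so a missing link does not crash the run. Since the synthetic target is exactly linear, both error columns come out at essentially zero.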