iFLYTEK Developer Competition: Ambient Air Quality Evaluation Challenge baseline

Published: 2023/12/29

Preface

The iFLYTEK Developer Competition is in full swing. Every track poses a challenging problem, and anyone can take part.
Competition site: http://challenge.xfyun.cn/?ch=ds-sq-bm

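Ambient Air Quality Evaluation Challenge

Data description

The data can be downloaded only after registering for the competition. The dataset is small: the preliminary-round training set and test set each contain only a few hundred records.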

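Evaluation metric

Submissions are scored on the submitted result file using root mean square error (RMSE).
(1) Each sample has a relative comprehensive pollution coefficient, IPRC, which is used to judge the relative pollution level between samples.
(2) RMSE is computed on IPRC: RMSE = sqrt((1/m) * sum_{i=1..m} (y_i - y_pred_i)^2), where m is the number of samples, y is the true IPRC value, and y_pred is the predicted IPRC value.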
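
As a minimal sketch of the RMSE in (2), assuming the true and predicted IPRC values are plain Python sequences of equal length. The function name `rmse` and the sample numbers are illustrative only and do not come from the competition data:

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error over m paired samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    m = len(y_true)
    if m == 0:
        raise ValueError("at least one sample is required")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / m)


# Made-up IPRC values, for illustration only
y = [0.50, 0.75, 1.00]
y_pred = [0.50, 0.25, 1.00]
print(rmse(y, y_pred))
```

Lower is better: a perfect prediction gives an RMSE of 0, and large errors on individual samples are penalised more heavily than under a mean absolute error.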

For beginners, a baseline makes it much easier to get started, so I picked an XGBoost model as a first baseline. Its score on the online leaderboard is 0.08247. The code is as follows:

import math

import pandas as pd
import xgboost as xgb
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder

# Load the training and test files, then stack them so that both
# receive identical feature processing
train = pd.read_csv('保定2016年.csv')
test = pd.read_csv('石家庄20160701-20170701.csv')
data = pd.concat([train, test])

# Encode the categorical air-quality grade as integers
quality_le = LabelEncoder()
quality_le.fit(data['质量等级'].values)
data['质量等级'] = quality_le.transform(data['质量等级'].values)

# Simple date features
data['日期'] = pd.to_datetime(data['日期'], format='%Y-%m-%d')
data['month'] = data['日期'].dt.month
data['day'] = data['日期'].dt.day
data['weekday'] = data['日期'].dt.weekday

# Rows with an IPRC label are training rows; rows without one are test rows
train_new = data[data['IPRC'].notnull()]
test_new = data[data['IPRC'].isnull()]

train_x = train_new.drop(['日期', 'IPRC'], axis=1)  # training features
target = train_new['IPRC']                          # training labels
test_x = test_new.drop(['日期', 'IPRC'], axis=1)    # test features

# XGBoost regressor. Recent xgboost releases expect early_stopping_rounds
# in the constructor rather than in fit()
xlf = xgb.XGBRegressor(max_depth=7, learning_rate=0.05, n_estimators=10000,
                       subsample=0.8, early_stopping_rounds=100)

answers = []
score = 0
n_fold = 5
folds = KFold(n_splits=n_fold, shuffle=True, random_state=1314)
for fold_n, (train_index, valid_index) in enumerate(folds.split(train_x)):
    X_train, X_valid = train_x.iloc[train_index], train_x.iloc[valid_index]
    # Positional indexing, so this does not depend on the index labels
    y_train, y_valid = target.iloc[train_index], target.iloc[valid_index]
    xlf.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=100)
    y_pre = xlf.predict(X_valid)
    fold_mse = mean_squared_error(y_valid, y_pre)
    print('Validation MSE for this fold: ' + str(fold_mse))
    score = score + fold_mse
    answers.append(xlf.predict(test_x))  # test-set predictions from this fold

# Average the test-set predictions over all folds
xgb_pre = sum(answers) / n_fold
print('xgb validation RMSE: ' + str(math.sqrt(score / n_fold)))

result = pd.DataFrame()
result['date'] = test['日期']
result['IPRC'] = xgb_pre
result.to_csv('空气质量.csv', index=False)  # save the submission file
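
One detail of the code above is worth spelling out: the final validation score it prints is the square root of the mean of the per-fold MSE values, which is generally not the same number as the mean of the per-fold RMSE values. The sketch below uses made-up fold MSEs (hypothetical numbers, not results from this competition) and two illustrative helper names to show the difference:

```python
import math


def rmse_from_fold_mses(fold_mses):
    """Square root of the average per-fold MSE (what the baseline prints)."""
    return math.sqrt(sum(fold_mses) / len(fold_mses))


def mean_of_fold_rmses(fold_mses):
    """Average of the per-fold RMSE values, shown only for comparison."""
    return sum(math.sqrt(mse) for mse in fold_mses) / len(fold_mses)


# Hypothetical per-fold MSE values, for illustration only
fold_mses = [0.01, 0.04, 0.09, 0.16, 0.25]
print(rmse_from_fold_mses(fold_mses))
print(mean_of_fold_rmses(fold_mses))
```

When the folds are the same size, the first quantity equals the RMSE over all out-of-fold predictions pooled together, so it is the closer match to an RMSE computed once over a whole result file. The second quantity is never larger than the first.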

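Closing remarks

My knowledge is limited, so if anything here is wrong, please bear with me and point it out. Questions are welcome for discussion. Good luck to everyone in the competition!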
