tushare in practice: predicting gold prices with LSTM
Contents
- tushare in practice: predicting gold prices with LSTM
- Pulling the data
- Data preprocessing
- Training the model
- Model prediction and evaluation
- The overall picture first
- A closer look at a short segment
- Result analysis
拉取數(shù)據(jù)
Same routine as before: this is exactly what we did in the earlier tushare analysis of the relationship between gold and US dollar yields. Note: the argument passed to pro_api is your tushare token.
# 導(dǎo)入tushare import tushare as ts # 初始化pro接口 pro = ts.pro_api('xxx')# 拉取數(shù)據(jù) df = pro.fx_daily(**{"ts_code": "XAUUSD.FXCM","trade_date": "","start_date": 20160910,"end_date": 20210910,"exchange": "FXCM","limit": "","offset": "" }, fields=["ts_code","trade_date","bid_open","bid_close","bid_high","bid_low","ask_open","ask_close","ask_high","ask_low","tick_qty" ]) print(df) df.set_index('trade_date',inplace=True) df.to_csv('黃金數(shù)據(jù)2016-9-10至2021-9-10.csv')數(shù)據(jù)預(yù)處理
Data preprocessing

The core of an LSTM model is using a sequence of past data to predict future data. The idea for building the sequences: construct a queue and treat each day's data as one element; whenever a new element enters the queue, the element at the front is pushed out. Take a 'snapshot' of the queue at every step and you end up with a 3-D array of shape (len, mem_days, attribute), where len is the number of sequences (with 100 rows of data and a 5-day window there are 100 - 5 + 1 = 96 sequences), mem_days is that 5, i.e. the sequence length, and attribute is the number of covariates (features) in the original data.
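A minimal toy illustration of this queue idea (standalone example with dummy data, not the preprocessing function used later):

```python
import numpy as np
from collections import deque

data = np.random.rand(100, 3)          # 100 days, 3 features (dummy data)
window = deque(maxlen=5)               # the queue holding the last 5 days
snapshots = []
for row in data:
    window.append(list(row))
    if len(window) == 5:               # once the queue is full, take a 'snapshot'
        snapshots.append(list(window))

X = np.array(snapshots)
print(X.shape)                         # (96, 5, 3) -> (len, mem_days, attribute)
```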
Having understood the input, we turn to the output. Suppose we want to use the data up to today to predict the price five days from now. When building the training set, the price serves as the label, shifted forward by 5 days: besides its original fields, each row also carries the price from 5 days later. Following this idea, we can write the preprocessing function below, where deque is the queue described above.
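To see which direction the shift goes, here is a tiny standalone pandas example (toy data, not the gold series): shift(-5) pulls the value from 5 rows ahead into the current row, so the last 5 rows have no label.

```python
import pandas as pd

s = pd.Series(range(10))
print(s.shift(-5))   # row 0 now holds 5, row 4 holds 9, rows 5-9 become NaN
```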
mem_his_days is the sequence length and pre_days is how many days ahead we predict.
```python
import numpy as np
from collections import deque
from sklearn.preprocessing import StandardScaler

def Stock_Price_LSTM_Data_Precesing(df, mem_his_days, pre_days):
    df.dropna(inplace=True)
    df.sort_index(inplace=True)
    df.drop(columns='ts_code', inplace=True)

    # the label is ask_open shifted forward by pre_days days
    df['label'] = df['ask_open'].shift(-pre_days)

    # standardize the features (everything except the label)
    scaler = StandardScaler()
    sca_X = scaler.fit_transform(df.iloc[:, :-1])

    # sliding window: every time the queue is full, record a snapshot
    deq = deque(maxlen=mem_his_days)
    X = []
    for i in sca_X:
        deq.append(list(i))
        if len(deq) == mem_his_days:
            X.append(list(deq))

    # the last pre_days sequences have no label yet -> keep them for real prediction
    X_lately = X[-pre_days:]
    X = X[:-pre_days]
    y = df['label'].values[mem_his_days - 1:-pre_days]

    return np.array(X), np.array(y), X_lately
```
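A quick sanity check of the function's outputs might look like this (assuming `golden` is the DataFrame reloaded above; the .copy() is only there so this check does not mutate the original frame):

```python
# assumption: golden is the DataFrame reloaded from the CSV above
X, y, X_lately = Stock_Price_LSTM_Data_Precesing(golden.copy(), mem_his_days=5, pre_days=5)
print(X.shape)         # (n_sequences, 5, n_features)
print(y.shape)         # one label (ask_open 5 days later) per sequence
print(len(X_lately))   # the 5 most recent sequences, whose labels are not known yet
```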
Training the model

Finally, the familiar hyperparameter-tuning step. After a fair amount of tuning, the values I settled on for mem_days, the number of LSTM layers, the number of dense layers and the number of units per layer are 5, 3, 2 and 64 respectively; feel free to try other values yourself.
```python
pre_days = 5

# grids tried during tuning:
# mem_days = [5, 10, 15]
# lstm_layers = [1, 2]
# dense_layers = [1, 2, 3]
# units = [16, 32]
mem_days = [5]
lstm_layers = [3]
dense_layers = [2]
units = [64]

from sklearn.model_selection import train_test_split
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

for the_mem_days in mem_days:
    for the_lstm_layers in lstm_layers:
        for the_dense_layers in dense_layers:
            for the_units in units:
                # the checkpoint name encodes val_mape, epoch and the hyperparameters
                filepath = ('./models_only_problem/{val_mape:.2f}_{epoch:02d}_'
                            + f'men_{the_mem_days}_lstm_{the_lstm_layers}_dense_{the_dense_layers}_unit_{the_units}')
                checkpoint = ModelCheckpoint(
                    filepath=filepath,
                    save_weights_only=False,
                    monitor='val_mape',
                    mode='min',
                    save_best_only=True)

                X, y, X_lately = Stock_Price_LSTM_Data_Precesing(golden, the_mem_days, pre_days)
                X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=False, test_size=0.1)

                model = Sequential()
                model.add(LSTM(the_units, input_shape=X.shape[1:], activation='relu', return_sequences=True))
                model.add(Dropout(0.1))
                # stacked LSTM layers
                for i in range(the_lstm_layers):
                    model.add(LSTM(the_units, activation='relu', return_sequences=True))
                    model.add(Dropout(0.1))
                # final LSTM layer returns only the last hidden state
                model.add(LSTM(the_units, activation='relu'))
                model.add(Dropout(0.1))
                # fully connected layers
                for i in range(the_dense_layers):
                    model.add(Dense(the_units, activation='relu'))
                    model.add(Dropout(0.1))
                model.add(Dense(1))

                model.compile(optimizer='adam', loss='mse', metrics=['mape'])
                model.fit(X_train, y_train,
                          batch_size=32,
                          epochs=50,
                          validation_data=(X_test, y_test),
                          callbacks=[checkpoint])
```

The models are saved under the models_only_problem folder. The first number in each name is the validation MAPE (the smaller the better), followed by the hyperparameter values. Just pick the hyperparameter combination whose name starts with the smallest number and fix those values; my lowest MAPE was 1.66.
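If you end up with many checkpoints, one way to pick the best one automatically is to sort the folder entries by the leading val_mape value in their names; a small sketch, assuming every entry follows the naming pattern above:

```python
import os

model_dir = './models_only_problem'
# entry names start with the validation MAPE, e.g. '1.66_28_men_5_lstm_3_dense_2_unit_64'
candidates = sorted(os.listdir(model_dir), key=lambda name: float(name.split('_')[0]))
best_path = os.path.join(model_dir, candidates[0])
print('best checkpoint:', best_path)
```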
模型預(yù)測(cè)及查看效果
將剛才訓(xùn)練好的最好的模型拿下來,加載模型,把需要預(yù)測(cè)的數(shù)據(jù)扔進(jìn)去
The overall picture first
```python
from tensorflow.keras.models import load_model
import matplotlib.pyplot as plt

best_model = load_model('./models_only_problem/1.66_28_men_5_lstm_3_dense_2_unit_64')

pre = best_model.predict(X)
print(len(pre))

plt.plot(y, color='red', label='price')
plt.plot(pre, color='green', label='predict')
plt.legend()
plt.show()
```

A closer look at a short segment
```python
x_time1 = y[200:300]
pre_time1 = pre[200:300]

plt.plot(x_time1, color='red', label='price')
plt.plot(pre_time1, color='green', label='predict')
plt.legend()
plt.show()
```

Result analysis
Overall the LSTM fit looks fairly good, but over a shorter window the predictions actually lag behind the real price, which could bias investment decisions. One possible remedy is to add more features to the data: the quality of the data determines the quality of the network's predictions.