
MLAT-Autoencoders, Part 3 (Final): Key Code and Results

Published 2023/12/14 by 豆豆 on 生活随笔

Conditional Autoencoders for Return Prediction and Trading

This section covers the application of autoencoders to asset pricing.

The application proceeds in four steps:
Step 1: Build a new dataset combining stock prices and metadata
Step 2: Compute the predictive asset characteristics
Step 3: Build and train the conditional autoencoder architecture
Step 4: Evaluate the results
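These steps implement a conditional factor model in the spirit of Gu, Kelly, and Xiu: factor loadings are a function of lagged stock characteristics, while latent factors are extracted from the cross-section of returns, and their dot product reconstructs returns. A minimal numpy sketch of the core computation (the dimensions and single linear maps are illustrative assumptions, not the trained network):

```python
import numpy as np

# Toy dimensions (illustrative): 5 stocks, 3 characteristics, 2 latent factors
n_tickers, n_characteristics, n_factors = 5, 3, 2
rng = np.random.default_rng(0)

z = rng.normal(size=(n_tickers, n_characteristics))  # lagged characteristics
r = rng.normal(size=n_tickers)                       # current-period returns

# Beta side: characteristics -> factor loadings (a single linear map stands in
# for the hidden layer + batch norm of the actual network)
W_beta = rng.normal(size=(n_characteristics, n_factors))
beta = z @ W_beta                                    # (n_tickers, n_factors)

# Factor side: the cross-section of returns -> latent factors (linear map)
W_factor = rng.normal(size=(n_tickers, n_factors))
f = r @ W_factor                                     # (n_factors,)

# Reconstruction: predicted returns are the loading/factor dot product
r_hat = beta @ f                                     # (n_tickers,)
```

The network trained later learns `W_beta` (through a nonlinear hidden layer) and `W_factor` jointly by minimizing the reconstruction MSE.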

Part 1

Data

from pathlib import Path

import numpy as np
import pandas as pd
from statsmodels.regression.rolling import RollingOLS
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns

idx = pd.IndexSlice
sns.set_style('whitegrid')

results_path = Path('results', 'asset_pricing')
if not results_path.exists():
    results_path.mkdir(parents=True)

Price data

prices = pd.read_hdf(results_path / 'data.h5', 'stocks/prices/adjusted')
prices.info(show_counts=True)

Output omitted.

Metadata

metadata = pd.read_hdf(results_path / 'data.h5', 'stocks/info').rename(columns=str.lower)
metadata.info()

Output omitted.

Selecting stocks with metadata

sectors = (metadata.sector.value_counts() > 50).index
tickers_with_errors = ['FTAI', 'AIRT', 'CYBR', 'KTB']
tickers_with_metadata = metadata[metadata.sector.isin(sectors) &
                                 metadata.marketcap.notnull() &
                                 metadata.sharesoutstanding.notnull() &
                                 (metadata.sharesoutstanding > 0)].index.drop(tickers_with_errors)

metadata = metadata.loc[tickers_with_metadata, ['sector', 'sharesoutstanding', 'marketcap']]
metadata.index.name = 'ticker'

prices = prices.loc[idx[tickers_with_metadata, :], :]
prices.info(show_counts=True)  # null_counts was removed in pandas 1.5

metadata.info()

close = prices.close.unstack('ticker').sort_index()
close.info()

volume = prices.volume.unstack('ticker').sort_index()
volume.info()


Creating weekly returns

returns = (prices.close
           .unstack('ticker')
           .resample('W-FRI')
           .last()
           .sort_index()
           .pct_change()
           .iloc[1:])
returns.info()
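The resampling logic takes the last close of each week (stamped on its Friday) and then computes week-over-week percentage changes, dropping the leading NaN row. A small self-contained example of the same pattern on toy prices:

```python
import numpy as np
import pandas as pd

# Toy daily prices over three full business weeks
days = pd.bdate_range('2020-01-01', periods=15)
prices = pd.Series(100 + np.arange(15, dtype=float), index=days)

weekly = prices.resample('W-FRI').last()        # last close of each week, stamped on Friday
weekly_returns = weekly.pct_change().iloc[1:]   # drop the leading NaN

print(weekly_returns)
```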

dates = returns.index
sns.histplot(returns.count(1))  # observations per week; distplot is deprecated in recent seaborn

with pd.HDFStore(results_path / 'autoencoder.h5') as store:
    store.put('close', close)
    store.put('volume', volume)
    store.put('returns', returns)
    store.put('metadata', metadata)

Factors

MONTH = 21  # approximate number of trading days per month

Price momentum

Short-term momentum
(one-month cumulative return)

dates[:5]

mom1m = (close
         .pct_change(periods=MONTH)
         .resample('W-FRI')
         .last()
         .stack()
         .to_frame('mom1m'))
mom1m.info()

mom1m.squeeze().to_hdf(results_path / 'autoencoder.h5', 'factor/mom1m')

Stock momentum
(eleven-month cumulative return ending one month before the most recent month end)

mom12m = (close
          .pct_change(periods=11 * MONTH)
          .shift(MONTH)
          .resample('W-FRI')
          .last()
          .stack()
          .to_frame('mom12m'))
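The `shift(MONTH)` skips the most recent month, so `mom12m` is the classic 12-1 momentum signal: the return from twelve months ago to one month ago. A toy check of the indexing (dates and prices are synthetic):

```python
import numpy as np
import pandas as pd

MONTH = 21  # trading days per month

# Synthetic daily closes long enough for an 11-month window plus a 1-month skip
days = pd.bdate_range('2018-01-01', periods=13 * MONTH)
close = pd.Series(np.linspace(100.0, 200.0, len(days)), index=days)

# 11-month cumulative return, shifted to skip the most recent month (12-1 momentum)
mom12m = close.pct_change(periods=11 * MONTH).shift(MONTH)

# At the last date this equals close[t - 1 month] / close[t - 12 months] - 1
expected = close.iloc[-1 - MONTH] / close.iloc[-1 - 12 * MONTH] - 1
assert np.isclose(mom12m.iloc[-1], expected)
```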

Momentum change

Market momentum

Maximum return

The liquidity indicators and risk measures are omitted here.
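For reference, one of the omitted liquidity characteristics used later, `ill`, is conventionally the Amihud (2002) illiquidity ratio. A hedged sketch on hypothetical data (the tickers, window, and input frames are assumptions, not the article's actual computation):

```python
import numpy as np
import pandas as pd

MONTH = 21  # trading days per month

# Hypothetical inputs: daily returns and daily dollar volume for three tickers
rng = np.random.default_rng(1)
days = pd.bdate_range('2020-01-01', periods=63)
tickers = ['A', 'B', 'C']
returns = pd.DataFrame(rng.normal(0, 0.02, (63, 3)), index=days, columns=tickers)
dollar_vol = pd.DataFrame(rng.uniform(1e6, 1e7, (63, 3)), index=days, columns=tickers)

# Amihud illiquidity: rolling average of |return| / dollar volume
ill = returns.abs().div(dollar_vol).rolling(MONTH).mean()
```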

Part 2

Modeling

Imports

import warnings
warnings.filterwarnings('ignore')

import sys, os
from time import time
from pathlib import Path
from itertools import product

from tqdm import tqdm

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Dot, Reshape, BatchNormalization
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import TensorBoard

from sklearn.preprocessing import quantile_transform
from scipy.stats import spearmanr

sys.path.insert(1, os.path.join(sys.path[0], '..'))
from utils import MultipleTimeSeriesCV, format_time

idx = pd.IndexSlice
sns.set_style('whitegrid')
np.random.seed(42)

results_path = Path('results', 'asset_pricing')
if not results_path.exists():
    results_path.mkdir(parents=True)

characteristics = ['beta', 'betasq', 'chmom', 'dolvol', 'idiovol', 'ill', 'indmom',
                   'maxret', 'mom12m', 'mom1m', 'mom36m', 'mvel', 'retvol', 'turn', 'turn_std']

Data preparation

with pd.HDFStore(results_path / 'autoencoder.h5') as store:
    print(store.info())


Weekly returns

data = (pd.read_hdf(results_path / 'autoencoder.h5', 'returns')
        .stack(dropna=False)
        .to_frame('returns')
        .loc[idx['1993':, :], :])

# nobs_by_date and nobs_by_characteristic are computed in code omitted above
with sns.axes_style("white"):
    fig, axes = plt.subplots(ncols=2, figsize=(14, 4))
    sns.distplot(nobs_by_date, kde=False, ax=axes[0])
    axes[0].set_title('# of Stocks per Week')
    axes[0].set_xlabel('# of Observations')
    sns.boxplot(x='Characteristic',
                y='# Observations',
                data=nobs_by_characteristic,
                ax=axes[1],
                palette='Blues')
    axes[1].set_xticklabels(axes[1].get_xticklabels(),
                            rotation=25,
                            ha='right')
    axes[1].set_title('# of Observations per Stock Characteristic')
    sns.despine()
    fig.tight_layout()


Normalizing the characteristic columns

data.loc[:, characteristics] = (data
                                .loc[:, characteristics]
                                .groupby(level='date')
                                .apply(lambda x: pd.DataFrame(
                                    quantile_transform(x, copy=True, n_quantiles=x.shape[0]),
                                    columns=characteristics,
                                    index=x.index.get_level_values('ticker')))
                                .mul(2)
                                .sub(1))
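The transform maps each characteristic to its cross-sectional quantile in [0, 1], and the `mul(2).sub(1)` rescales that to [-1, 1], making characteristics comparable regardless of their raw scale. A toy single-column example:

```python
import numpy as np
from sklearn.preprocessing import quantile_transform

# One toy characteristic column with wildly different raw scales
x = np.array([[10.], [200.], [3000.], [40000.], [500000.]])

# Map to empirical quantiles in [0, 1], then rescale to [-1, 1] as in mul(2).sub(1)
q = quantile_transform(x, n_quantiles=x.shape[0], copy=True)
scaled = q * 2 - 1

print(scaled.ravel())  # evenly spaced in [-1, 1] regardless of the raw values
```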

Model summary

def make_model(hidden_units=8, n_factors=3):
    input_beta = Input((n_tickers, n_characteristics), name='input_beta')
    input_factor = Input((n_tickers,), name='input_factor')

    hidden_layer = Dense(units=hidden_units, activation='relu', name='hidden_layer')(input_beta)
    batch_norm = BatchNormalization(name='batch_norm')(hidden_layer)

    output_beta = Dense(units=n_factors, name='output_beta')(batch_norm)
    output_factor = Dense(units=n_factors, name='output_factor')(input_factor)

    output = Dot(axes=(2, 1), name='output_layer')([output_beta, output_factor])

    model = Model(inputs=[input_beta, input_factor], outputs=output)
    model.compile(loss='mse', optimizer='adam')
    return model

model = make_model()
model.summary()
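The `Dot(axes=(2, 1))` layer contracts the factor dimension: loadings of shape `(batch, n_tickers, n_factors)` against factors of shape `(batch, n_factors)` yield predicted returns of shape `(batch, n_tickers)`. An equivalent numpy shape check (dimensions are illustrative):

```python
import numpy as np

batch, n_tickers, n_factors = 4, 10, 3
rng = np.random.default_rng(0)

output_beta = rng.normal(size=(batch, n_tickers, n_factors))  # factor loadings
output_factor = rng.normal(size=(batch, n_factors))           # latent factors

# Dot(axes=(2, 1)) contracts axis 2 of the first input with axis 1 of the second
predicted_returns = np.einsum('btf,bf->bt', output_beta, output_factor)
assert predicted_returns.shape == (batch, n_tickers)
```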

Training the model

Key code (the output is too long to include here).

start = time()
for units, n_factors in param_grid:
    scores = []
    model = make_model(hidden_units=units, n_factors=n_factors)
    for fold, (train_idx, val_idx) in enumerate(cv.split(data)):
        X1_train, X2_train, y_train, X1_val, X2_val, y_val = get_train_valid_data(data,
                                                                                  train_idx,
                                                                                  val_idx)
        for epoch in range(250):
            model.fit([X1_train, X2_train], y_train,
                      batch_size=batch_size,
                      validation_data=([X1_val, X2_val], y_val),
                      epochs=epoch + 1,
                      initial_epoch=epoch,
                      verbose=0,
                      shuffle=True)
            result = (pd.DataFrame({'y_pred': model.predict([X1_val, X2_val]).reshape(-1),
                                    'y_true': y_val.stack().values},
                                   index=y_val.stack().index)
                      .replace(-2, np.nan)
                      .dropna())
            r0 = spearmanr(result.y_true, result.y_pred)[0]
            r1 = result.groupby(level='date').apply(lambda x: spearmanr(x.y_pred,
                                                                        x.y_true)[0])
            scores.append([units, n_factors, fold, epoch, r0,
                           r1.mean(), r1.std(), r1.median()])
            if epoch % 50 == 0:
                print(f'{format_time(time()-start)} | {n_factors} | {units:02} | {fold:02}-{epoch:03} | '
                      f'{r0:6.2%} | {r1.mean():6.2%} | {r1.median():6.2%}')
    scores = pd.DataFrame(scores, columns=cols)
    scores.to_hdf(results_path / 'scores.h5', f'{units}/{n_factors}')

Evaluation

scores = []
with pd.HDFStore(results_path / 'scores.h5') as store:
    for key in store.keys():
        scores.append(store[key])
scores = pd.concat(scores)
scores.info()

avg = (scores
       .groupby(['n_factors', 'units', 'epoch'])[['ic_mean', 'ic_daily_mean', 'ic_daily_median']]
       .mean()
       .reset_index())
avg.nlargest(n=20, columns=['ic_daily_median'])

fig, axes = plt.subplots(ncols=5, nrows=2, figsize=(20, 8), sharey='row', sharex=True)
for n in range(2, 7):
    df = avg[avg.n_factors == n].pivot(index='epoch', columns='units', values='ic_mean')
    df.rolling(10).mean().loc[:200].plot(ax=axes[0][n-2], lw=1, title=f'{n} Factors')
    axes[0][n-2].axhline(0, ls='--', c='k', lw=1)
    axes[0][n-2].get_legend().remove()
    axes[0][n-2].set_ylabel('IC (10-epoch rolling mean)')

    df = avg[avg.n_factors == n].pivot(index='epoch', columns='units', values='ic_daily_median')
    df.rolling(10).mean().loc[:200].plot(ax=axes[1][n-2], lw=1)
    axes[1][n-2].axhline(0, ls='--', c='k', lw=1)
    axes[1][n-2].get_legend().remove()
    axes[1][n-2].set_ylabel('IC, Daily Median (10-epoch rolling mean)')

handles, labels = axes[0][0].get_legend_handles_labels()
fig.legend(handles, labels, loc='center right', title='# Units')
fig.suptitle('Cross-Validation Performance (2015-2019)', fontsize=16)
fig.tight_layout()
fig.subplots_adjust(top=.9)
fig.savefig(results_path / 'cv_performance', dpi=300);

Part 3

Generating predictions

We average predictions over a range of epochs that appear to produce good forecasts.

n_factors = 4
units = 32
batch_size = 32
first_epoch = 50
last_epoch = 80

predictions = []
for epoch in tqdm(list(range(first_epoch, last_epoch))):
    epoch_preds = []
    for fold, (train_idx, val_idx) in enumerate(cv.split(data)):
        X1_train, X2_train, y_train, X1_val, X2_val, y_val = get_train_valid_data(data,
                                                                                  train_idx,
                                                                                  val_idx)
        model = make_model(n_factors=n_factors, hidden_units=units)
        model.fit([X1_train, X2_train], y_train,
                  batch_size=batch_size,
                  epochs=epoch,
                  verbose=0,
                  shuffle=True)
        epoch_preds.append(pd.Series(model.predict([X1_val, X2_val]).reshape(-1),
                                     index=y_val.stack().index).to_frame(epoch))
    predictions.append(pd.concat(epoch_preds))

predictions_combined = pd.concat(predictions, axis=1).sort_index()
predictions_combined.info()
predictions_combined.to_hdf(results_path / 'predictions.h5', 'predictions')
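With one prediction column per epoch, averaging across the columns yields the ensemble forecast per (date, ticker). A toy illustration of that final step with hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical predictions: one column per training epoch, one row per (date, ticker)
index = pd.MultiIndex.from_product(
    [pd.to_datetime(['2019-01-04', '2019-01-11']), ['AAPL', 'MSFT']],
    names=['date', 'ticker'])
predictions_combined = pd.DataFrame({50: [0.01, 0.02, -0.01, 0.00],
                                     51: [0.03, 0.00, 0.01, 0.02]}, index=index)

# The ensemble forecast averages over the epoch columns
ensemble = predictions_combined.mean(axis=1)
print(ensemble)
```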


Summary

This concludes the key code and results for conditional autoencoders applied to return prediction and trading, the final installment of the MLAT-Autoencoders series.