[Time Series] Automating Time Series Forecasting with Auto-TS

Published: 2025/3/12

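Auto-TS belongs to the AutoML family: it automates some components of the machine learning pipeline. Automation libraries like this help non-experts train basic machine learning models without much knowledge of the field. In this article we learn how to use the Auto-TS library to automate time series forecasting.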

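What is Auto-TS?

It is an open-source Python library used mainly for automating time series forecasting. With a single line of code it trains multiple time series models, which helps us choose the best model for our problem.

In the Auto-TS library, auto_timeseries() (imported from the auto_ts package) is the main entry point, and it is called with the training data. We can choose the type of model we want, such as stats, ml, or FB Prophet-based models. We can also tune the parameters; the best model is then selected automatically according to the scoring metric we specify. It returns the best model and a dictionary containing forecasts for the requested number of forecast periods (default = 2).

auto_timeseries is a complex model-building utility for time series data. Because it automates many of the tasks involved in this work, it assumes many intelligent defaults, but we can change them. auto_timeseries quickly builds forecasting models based on Statsmodels ARIMA, Seasonal ARIMA, and Scikit-Learn ML, and automatically selects the model with the best score on the specified metric.

auto_timeseries lets us build and select among multiple time series models using techniques such as ARIMA, SARIMAX, VAR, decomposable (trend + seasonality + residual) models, and ensemble machine learning models.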
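The "score every candidate, keep the best" idea can be sketched in a few lines of plain Python. This is only an illustration of the selection step, not the library's code: pick_best_model is a hypothetical helper, and the RMSE values are the leaderboard figures reported later in this article.

```python
def pick_best_model(scores, lower_is_better=True):
    """Rank models by score and return (best_entry, full_ranking).

    `scores` maps a model name to its cross-validated score.
    For error metrics such as RMSE, lower is better.
    """
    if not scores:
        raise ValueError("no models were scored")
    ranking = sorted(
        scores.items(),
        key=lambda item: item[1],
        reverse=not lower_is_better,
    )
    return ranking[0], ranking


# Hypothetical input: RMSE per model, copied from the article's leaderboard.
cv_rmse = {"Prophet": 93.53, "auto_SARIMAX": 73.92, "ML": 246.63}

best, leaderboard = pick_best_model(cv_rmse)
print("best model:", best)
for name, rmse in leaderboard:
    print(f"{name:>14}  {rmse:8.2f}")
```

For a metric where higher is better, pass lower_is_better=False.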

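Features of the Auto-TS library

It finds the best time series forecasting model using genetic programming optimization.
It trains naive models, statistical models, machine learning models, and deep learning models across their hyperparameter configurations, with cross-validation.
It performs data transformations to handle messy data by learning the best NaN imputation and outlier removal.
It lets you choose a combination of metrics for model selection.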

Installation

pip install auto-ts
# or install from GitHub:
pip install git+https://github.com/AutoViML/Auto_TS.git

Dependencies (the following packages need to be installed beforehand):

dask, scikit-learn, FB Prophet, statsmodels, pmdarima, XGBoost
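Before going further, it can help to check which of these dependencies are already importable. The snippet below is a small sketch using only the standard library; the import names in the list are assumptions (they often differ from the pip/conda package names, e.g. scikit-learn is imported as sklearn, and Prophet may be installed as either prophet or fbprophet).

```python
import importlib.util

# Assumed top-level import names for the dependencies listed above.
CANDIDATE_MODULES = [
    "dask",
    "sklearn",
    "prophet",
    "fbprophet",
    "pystan",
    "statsmodels",
    "pmdarima",
    "xgboost",
]


def check_modules(names):
    """Return {module_name: True/False} depending on whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for module, available in check_modules(CANDIDATE_MODULES).items():
    print(f"{module:<12} {'ok' if available else 'MISSING'}")
```

Anything reported as MISSING is a candidate for the installation troubleshooting described below.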

Importing the library

from auto_ts import auto_timeseries

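Major pitfall warning

After following the installation steps above, there is a good chance you will see an error like this:

Running setup.py clean for fbprophet
Failed to build fbprophet
Installing collected packages: fbprophet
Running setup.py install for fbprophet ... error
......
from pystan import StanModel
ModuleNotFoundError: No module named 'pystan'

At this point you would naturally install pystan with pip install pystan. Even after that, the same error still appears. If this happens to you, don't panic: the author has already worked through these pitfalls for you.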

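Reference solution (Mac/Anaconda):

1. Install Ephem:

conda install -c anaconda ephem

2. Install Pystan:

conda install -c conda-forge pystan

3. Install Fbprophet (this can take 4+ hours):

conda install -c conda-forge fbprophet

4. Finally, install:

pip install prophet
pip install fbprophet

5. Wait until you see:

Successfully installed cmdstanpy-0.9.5 fbprophet-0.7.1 holidays-0.13

If that still does not work, first try restarting Anaconda. If it still fails, install gcc first:

conda install gcc

Then go through the steps above again.

The whole process can take as long as a day!

Finally, try the import again. Success!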

from auto_ts import auto_timeseries

Imported auto_timeseries version:0.0.65. Call by using:
model = auto_timeseries(score_type='rmse',
        time_interval='M',
        non_seasonal_pdq=None,
        seasonality=False,
        seasonal_period=12,
        model_type=['best'],
        verbose=2,
        dask_xgboost_flag=0)
model.fit(traindata, ts_column,target)
model.predict(testdata, model='best')

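Parameters available in auto_timeseries

model = auto_timeseries(
    score_type='rmse',
    time_interval='Month',
    non_seasonal_pdq=None,
    seasonality=False,
    seasonal_period=12,
    model_type=['Prophet'],
    verbose=2)

You can adjust these parameters and analyze how model performance changes. For more details on the parameters, see the auto-ts documentation [1].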

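Dataset used

This article uses the Amazon stock price dataset [2], covering January 2006 to January 2018, downloaded from Kaggle. The library only trains time series forecasting models, so the dataset must have a column in time or date format.

First, load the time series dataset together with its time/date column:

import pandas as pd

df = pd.read_csv("Amazon_Stock_Price.csv", usecols=['Date', 'Close'])
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values('Date')

Now split the data into training and test sets:

train_df = df.iloc[:2800]
test_df = df.iloc[2800:]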
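The split above is positional, so it is only a valid time-based split if the rows are sorted by date first. The sketch below illustrates this on a small synthetic frame (the toy data and the chronological_split helper are made up for illustration; they are not part of the article's dataset or of Auto-TS).

```python
import pandas as pd


def chronological_split(df, date_col, n_train):
    """Sort by date, then put the first n_train rows in train and the rest in test."""
    ordered = df.sort_values(date_col).reset_index(drop=True)
    return ordered.iloc[:n_train], ordered.iloc[n_train:]


# Synthetic example: 10 daily rows, deliberately stored in reverse order.
dates = pd.date_range("2020-01-01", periods=10, freq="D")
toy = pd.DataFrame({"Date": dates[::-1], "Close": list(range(10))})

train, test = chronological_split(toy, "Date", n_train=7)
print(train["Date"].max(), "<", test["Date"].min())
```

Every training date precedes every test date, which is what keeps future information out of the training set.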

Now visualize the train/test split:

train_df.Close.plot(figsize=(15, 8), title='AMZN Stock Price', fontsize=14, label='Train')
test_df.Close.plot(figsize=(15, 8), title='AMZN Stock Price', fontsize=14, label='Test')

Now initialize the Auto-TS model object and fit it on the training data:

model = auto_timeseries(forecast_period=219, score_type='rmse', time_interval='D', model_type='best')
model.fit(traindata=train_df, ts_column="Date", target="Close")

Now compare the performance of the different models:

model.get_leaderboard()
model.plot_cv_scores()

This produces the following output:

Start of Fit.....
    Target variable given as = Close
Start of loading of data.....
    Inputs: ts_column = Date, sep = ,, target = ['Close']
    Using given input: pandas dataframe...
    Date column exists in given train data...
    train data shape = (2800, 1)
Alert: Could not detect strf_time_format of Date. Provide strf_time format during "setup" for better results.
Running Augmented Dickey-Fuller test with paramters:
    maxlag: 31 regression: c autolag: BIC
Data is stationary after one differencing
There is 1 differencing needed in this datasets for VAR model
No time series plot since verbose = 0. Continuing
Time Interval is given as D
    Correct Time interval given as a valid Pandas date-range frequency...
WARNING: Running best models will take time... Be Patient...

==================================================
Building Prophet Model
==================================================
Running Facebook Prophet Model...
  Starting Prophet Fit
      No seasonality assumed since seasonality flag is set to False
  Starting Prophet Cross Validation
Max. iterations using expanding window cross validation = 5
Fold Number: 1 --> Train Shape: 1705 Test Shape: 219
    RMSE = 30.01
    Std Deviation of actuals = 19.52
    Normalized RMSE (as pct of std dev) = 154%
Cross Validation window: 1 completed
Fold Number: 2 --> Train Shape: 1924 Test Shape: 219
    RMSE = 45.33
    Std Deviation of actuals = 34.21
    Normalized RMSE (as pct of std dev) = 132%
Cross Validation window: 2 completed
Fold Number: 3 --> Train Shape: 2143 Test Shape: 219
    RMSE = 65.61
    Std Deviation of actuals = 39.85
    Normalized RMSE (as pct of std dev) = 165%
Cross Validation window: 3 completed
Fold Number: 4 --> Train Shape: 2362 Test Shape: 219
    RMSE = 178.53
    Std Deviation of actuals = 75.28
    Normalized RMSE (as pct of std dev) = 237%
Cross Validation window: 4 completed
Fold Number: 5 --> Train Shape: 2581 Test Shape: 219
    RMSE = 148.18
    Std Deviation of actuals = 57.62
    Normalized RMSE (as pct of std dev) = 257%
Cross Validation window: 5 completed
-------------------------------------------
Model Cross Validation Results:
-------------------------------------------
    MAE (Mean Absolute Error = 85.20
    MSE (Mean Squared Error = 12218.34
    MAPE (Mean Absolute Percent Error) = 17%
    RMSE (Root Mean Squared Error) = 110.5366
    Normalized RMSE (MinMax) = 18%
    Normalized RMSE (as Std Dev of Actuals)= 60%
Time Taken = 13 seconds
  End of Prophet Fit

==================================================
Building Auto SARIMAX Model
==================================================
Running Auto SARIMAX Model...
    Using smaller parameters for larger dataset with greater than 1000 samples
SARIMAX RMSE (all folds): 73.9230
SARIMAX Norm RMSE (all folds): 35%
-------------------------------------------
Model Cross Validation Results:
-------------------------------------------
    MAE (Mean Absolute Error = 64.24
    MSE (Mean Squared Error = 7962.95
    MAPE (Mean Absolute Percent Error) = 12%
    RMSE (Root Mean Squared Error) = 89.2354
    Normalized RMSE (MinMax) = 14%
    Normalized RMSE (as Std Dev of Actuals)= 48%
    Using smaller parameters for larger dataset with greater than 1000 samples
Refitting data with previously found best parameters
    Best aic metric = 18805.2
                               SARIMAX Results
==============================================================================
Dep. Variable:                  Close   No. Observations:                 2800
Model:               SARIMAX(2, 2, 0)   Log Likelihood               -9397.587
Date:                Mon, 28 Feb 2022   AIC                          18805.174
Time:                        19:45:31   BIC                          18834.854
Sample:                             0   HQIC                         18815.888
                               - 2800
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
intercept     -0.0033      0.557     -0.006      0.995      -1.094       1.088
drift       3.618e-06      0.000      0.015      0.988      -0.000       0.000
ar.L1         -0.6405      0.008    -79.601      0.000      -0.656      -0.625
ar.L2         -0.2996      0.009    -32.618      0.000      -0.318      -0.282
sigma2        48.6323      0.456    106.589      0.000      47.738      49.527
===================================================================================
Ljung-Box (L1) (Q):                  14.84   Jarque-Bera (JB):             28231.48
Prob(Q):                              0.00   Prob(JB):                         0.00
Heteroskedasticity (H):              19.43   Skew:                             0.56
Prob(H) (two-sided):                  0.00   Kurtosis:                        18.53
===================================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
===============================================
Skipping VAR Model since dataset is > 1000 rows and it will take too long
===============================================

==================================================
Building ML Model
==================================================
Creating 2 lagged variables for Machine Learning model...
    You have set lag = 3 in auto_timeseries setup to feed prior targets. You cannot set lags > 10 ...
### Be careful setting dask_xgboost_flag to True since dask is unstable and doesn't work sometime's ###
########### Single-Label Regression Model Tuning and Training Started ####
Fitting ML model
    11 variables used in training ML model = ['Close(t-1)', 'Date_hour', 'Date_minute', 'Date_dayofweek', 'Date_quarter', 'Date_month', 'Date_year', 'Date_dayofyear', 'Date_dayofmonth', 'Date_weekofyear', 'Date_weekend']
Running Cross Validation using XGBoost model..
    Max. iterations using expanding window cross validation = 2
train fold shape (2519, 11), test fold shape = (280, 11)
### Number of booster rounds = 250 for XGBoost which can be set during setup ####
    Hyper Param Tuning XGBoost with CPU parameters. This will take time. Please be patient...
Cross-validated Score = 31.896 in num rounds = 249
Time taken for Hyper Param tuning of XGBoost (in minutes) = 0.0
Top 10 features:
['Date_year', 'Close(t-1)', 'Date_quarter', 'Date_month', 'Date_weekofyear', 'Date_dayofyear', 'Date_dayofmonth', 'Date_dayofweek']
    Time taken for training XGBoost on entire train data (in minutes) = 0.0
Returning the following:
    Model = <xgboost.core.Booster object at 0x7feb8dd30070>
    Scaler = Pipeline(steps=[('columntransformer',
                 ColumnTransformer(transformers=[('simpleimputer',
                                                  SimpleImputer(),
                                                  ['Close(t-1)', 'Date_hour',
                                                   'Date_minute',
                                                   'Date_dayofweek',
                                                   'Date_quarter', 'Date_month',
                                                   'Date_year',
                                                   'Date_dayofyear',
                                                   'Date_dayofmonth',
                                                   'Date_weekofyear',
                                                   'Date_weekend'])])),
                ('maxabsscaler', MaxAbsScaler())])
    (3) sample predictions:[359.8374  356.59747 355.447  ]
XGBoost model tuning completed
Target = Close...CV results:
    RMSE = 246.63
    Std Deviation of actuals = 94.60
    Normalized RMSE (as pct of std dev) = 261%
Fitting model on entire train set. Please be patient...
    Time taken to train model (in seconds) = 0
Best Model is: auto_SARIMAX
    Best Model (Mean CV) Score: 73.92
--------------------------------------------------
Total time taken: 52 seconds.
--------------------------------------------------
Leaderboard with best model on top of list:
           name        rmse
1  auto_SARIMAX   73.922971
0       Prophet   93.532440
2            ML  246.630613
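The Prophet section of the log mentions "expanding window cross validation" with five folds. A minimal sketch of that scheme follows; it is an illustration under stated assumptions, not the library's implementation. The helper name expanding_window_folds is hypothetical, and the sizing rule (each fold's training window grows by one forecast horizon, with the last fold ending at the final observation) is inferred from the fold sizes printed in the log.

```python
def expanding_window_folds(n_obs, horizon, n_folds):
    """Build (train_indices, test_indices) pairs for expanding-window CV.

    Each fold trains on all observations before its test window, and the
    test window is always the next `horizon` observations.
    """
    first_train_end = n_obs - n_folds * horizon
    if horizon <= 0 or n_folds <= 0 or first_train_end <= 0:
        raise ValueError("not enough observations for the requested folds")
    folds = []
    for k in range(n_folds):
        train_end = first_train_end + k * horizon
        folds.append((range(0, train_end), range(train_end, train_end + horizon)))
    return folds


# Same setting as the article: 2800 training rows, horizon 219, 5 folds.
folds = expanding_window_folds(n_obs=2800, horizon=219, n_folds=5)
for number, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"Fold Number: {number} --> Train Shape: {len(train_idx)} Test Shape: {len(test_idx)}")
```

With these inputs the training sizes come out as 1705, 1924, 2143, 2362, and 2581, matching the fold shapes in the Prophet log above.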

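Now test the model on the test data:

future_predictions = model.predict(testdata=219)
# or: model.predict(testdata=test_df.Close)

This forecasts with the auto_SARIMAX model, using a forecast period of 219 as input:

future_predictions

Displaying future_predictions shows what the forecast looks like.

Finally, visualize the test values against the predictions:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(15, 8))
pred_df = pd.concat([test_df, future_predictions], axis=1)
ax.plot('Date', 'Close', 'b', data=pred_df, label='Test')
ax.plot('Date', 'yhat', 'r', data=pred_df, label='Predictions')
ax.legend()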
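Besides the plot, the forecast can be scored numerically with the same kinds of metrics that appear in the training log (MAE, MSE, RMSE, MAPE, and RMSE normalized by the standard deviation of the actuals). The sketch below uses plain Python and made-up numbers; forecast_errors is a hypothetical helper, and whether the library uses the population or sample standard deviation is not stated in the article, so the population form here is an assumption.

```python
import math


def forecast_errors(actual, predicted):
    """Compute common forecast error metrics for two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE is undefined when an actual value is zero.
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mean_actual = sum(actual) / n
    # Assumption: population standard deviation of the actuals.
    std_actual = math.sqrt(sum((a - mean_actual) ** 2 for a in actual) / n)
    nrmse_std = rmse / std_actual if std_actual > 0 else math.inf
    return {"mae": mae, "mse": mse, "rmse": rmse, "mape": mape, "nrmse_std": nrmse_std}


# Made-up values standing in for test_df['Close'] and the 'yhat' forecasts.
actual = [100.0, 110.0, 120.0, 130.0]
predicted = [102.0, 108.0, 123.0, 126.0]

metrics = forecast_errors(actual, predicted)
for name, value in metrics.items():
    print(f"{name:>10} = {value:.4f}")
```

A normalized RMSE above 100% means the forecast error exceeds the natural spread of the actual values, which is how the per-fold percentages in the log can be read.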

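Parameters available in model.fit():

model.fit(traindata=train_data, ts_column=ts_column, target=target, cv=5, sep=",")

Parameters available in model.predict():

predictions = model.predict(
    testdata=219,   # a dataframe, or an integer giving the forecast period
    model='best',   # or any other string naming one of the trained models
)

We can experiment with all of these parameters, analyze how our model performs, and then choose the most suitable model for our problem. See the auto-ts documentation [3] for the details of each parameter.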

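Closing remarks

This article discussed how to automate time series modeling in a single line of Python code. Auto-TS preprocesses the data: it removes outliers and handles messy data by learning the best NaN imputation.

By initializing an Auto-TS object and fitting it on training data, the library automatically trains multiple time series models, such as ARIMA, SARIMAX, FB Prophet, and VAR, and returns the best-performing one. Model results depend to some extent on the size of the dataset; with a larger dataset, the results should improve.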

References

[1] auto-ts documentation: https://pypi.org/project/auto-ts/
[2] Amazon stock prices: https://www.kaggle.com/szrlee/stock-time-series-20050101-to-20171231?select=AMZN_2006-01-01_to_2018-01-01.csv
[3] auto-ts documentation: https://pypi.org/project/auto-ts/
