Predicting Interest Rate with Classification Models - Part 3


This is the final article of the series “Predicting Interest Rate with Classification Models”. Here are the links to the First and the Second articles of the series, where I explain the challenge I faced when I started at M2X Investments, in case you haven't read them. As mentioned before, I will try my best to make this article understandable on its own. I will skip the explanation of the assumptions regarding the data for article-length reasons; nevertheless, you can check them in the previous posts of the series. Let's do it!


Fast Recap

In previous articles, I applied a couple of classification models to the problem of predicting up movements of the Fed Fund Effective Rate. In short, it is a binary classification problem where 1 represents up movement and 0, neutral or negative movement. The models applied were Logistic Regression, Naive Bayes, and Random Forest. Random Forest was the one that yielded the best results so far, without hyperparameter optimization, with an F1-score of 0.76.
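As a quick reminder of what that number means (a toy check, not part of the original analysis): the F1-score is the harmonic mean of precision and recall, and it is the same figure that scikit-learn's classification_report prints per class. The labels below are made up for illustration.

# Toy illustration of the F1-score (hypothetical labels, not the article's data)
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]   # one up movement missed
print(f1_score(y_true, y_pred))   # 2 * precision * recall / (precision + recall) ≈ 0.86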


If you are curious to know more about the data, please refer to Part 1 or Part 2 of the series. I omitted the explanation about them in this article for practical purposes only.


A brief introduction to Catboost and Support Vector Machines

Catboost

Catboost is an open-source library for gradient boosting on decision trees. Ok, so what is Gradient Boosting?


Gradient boosting is a machine learning algorithm that can be used for classification and regression problems. It usually gives great results when tackling problems with heterogeneous data and small data sets. But what, in essence, is this algorithm? Let's start by defining Boosting.


Boosting is an ensemble technique that tries to transform weak learners into strong learners by training them sequentially, with the objective of making each one better than its predecessors. The sequential part means that each learner (usually a tree) is built by taking the previous tree's errors into account (in the case of the AdaBoost algorithm, the trees are called stumps).


As an example, imagine that we train a tree and give each observation equal weight. Next, we evaluate the tree and get its errors. Then, for the next tree, we increase the weights of the observations that were incorrectly classified by the first one and lower the weights of the ones correctly classified. This is basically saying that the next tree should give more importance to the mistakenly classified observations and classify them correctly. The process goes on until we stop and collect the final votes of our trees.
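To make the reweighting idea concrete, here is a minimal sketch of a single AdaBoost-style update. The labels, predictions, and the 0.5·log weighting rule are illustrative assumptions, not taken from the article.

# One AdaBoost-style reweighting round on toy labels (illustrative only)
import numpy as np

y_true = np.array([1, 0, 1, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0])           # the first stump misclassifies observation 3
w = np.full(len(y_true), 1 / len(y_true))    # start with equal weights

err = np.sum(w[y_true != y_pred])            # weighted error of the stump
alpha = 0.5 * np.log((1 - err) / err)        # the stump's "amount of say"

# raise the weights of misclassified points, lower the rest, then renormalize
w = w * np.exp(alpha * np.where(y_true != y_pred, 1.0, -1.0))
w = w / w.sum()
print("updated weights:", np.round(w, 3))    # the misclassified point now carries weight 0.5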


Let's go back to Gradient Boosting now. With the concept of Boosting in mind, we can think of Gradient Boosting as an algorithm that follows the same process described above. The difference is that now we define a loss function to be optimized (minimized). This means that, after calculating the loss, the next tree we create will have to reduce it (follow the gradient by reducing the residual loss).
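A minimal sketch of that idea, fitting each new tree to the current residuals of a squared loss. This is a generic illustration on synthetic data, not the article's model or dataset.

# Gradient boosting by hand: each tree fits the residuals (negative gradient of squared loss)
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(13)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())       # start from a constant prediction
trees = []
for _ in range(50):
    residuals = y - prediction               # how wrong the current ensemble is
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print("final training MSE:", round(float(np.mean((y - prediction) ** 2)), 4))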


What about Catboost?


Catboost is a gradient boosting decision tree library. On their page, they say that it performs well with default parameters, that it has categorical feature support and built-in model analysis tools, and that it trains fast on both CPU and GPU.


Support Vector Machines — SVC

SVMs are supervised learning algorithms used for classification, regression, and outlier detection. We will use Support Vector Classifiers (SVC) to find a hyperplane in n-dimensional space that accurately classifies the data. This hyperplane will have the maximum distance between the data points of the different classes — this distance is called maximum margin. Let’s take a two-dimensional space as an example. The hyperplane will be a line dividing the space into two parts with maximum distance between the classification labels.


Image by LAMFO

The data points closest to the line separating the space are called Support Vectors and dictate the hyperplane's margin. So we start with our data in a low dimension, and if we can't classify it in that dimension, we move to a higher dimension to find a Support Vector Classifier that best divides our data into two groups, and so on. To transform the plane the data lies on and find our Support Vector Classifier, we use a function called a Kernel.
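A small sketch of that, on synthetic blobs rather than the article's data: after fitting, scikit-learn exposes exactly which observations became support vectors and therefore set the margin.

# Inspecting the support vectors of a linear SVC on toy data (illustrative only)
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=13)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

print("support vectors per class:", clf.n_support_)      # only these points define the margin
print("first support vector:", clf.support_vectors_[0])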


The kernel function can have different shapes; for example, it can be a polynomial kernel or a radial kernel. It is important to notice that, for the sake of computational cost, kernel functions calculate the relationships between data points as if they were in the higher dimension, without actually transforming them into that dimension. This trick is called the Kernel Trick.
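To see why the kernel matters, here is a toy comparison (concentric circles generated with scikit-learn's make_circles, not the article's data): a linear SVC cannot separate the two rings, while an RBF kernel implicitly works in a higher-dimensional space where a separating hyperplane exists.

# Linear vs. RBF kernel on data that is not linearly separable (toy example)
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=500, factor=0.3, noise=0.1, random_state=13)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=13)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X_tr, y_tr)
    print(kernel, "test accuracy:", round(clf.score(X_te, y_te), 2))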


The code

Catboost

Starting with Catboost. In previous articles, we talked about the data and the assumptions we made to binarize it and deal with NaNs, so we will skip the explanation of this part and focus on the model's results and their application.


import numpy as np
import pandas as pd
import quandl as qdl
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="white")
from imblearn.over_sampling import ADASYN
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
import statsmodels.api as sm
from catboost import CatBoostClassifier
from sklearn import metrics

# get data from Quandl
data = pd.DataFrame()
meta_data = ['RICIA','RICIM','RICIE']
for code in meta_data:
    df = qdl.get('RICI/'+code, start_date="2005-01-03", end_date="2020-07-01")
    df.columns = [code]
    data = pd.concat([data, df], axis=1)

meta_data = ['EMHYY','AAAEY','USEY']
for code in meta_data:
    df = qdl.get('ML/'+code, start_date="2005-01-03", end_date="2020-07-01")
    df.columns = [code]
    data = pd.concat([data, df], axis=1)

# dealing with possible empty values (not much attention to this part, but it is very important)
data.fillna(data.mean(), inplace=True)
print(data.head())
print("\nData shape:\n", data.shape)

# histograms
data.hist()
plt.show()

# scaling values to make them vary between 0 and 1
scaler = MinMaxScaler()
data_scaled = pd.DataFrame(scaler.fit_transform(data.values), columns=data.columns, index=data.index)

# pulling dependent variable from Quandl (par yield curve)
par_yield = qdl.get('FED/RIFSPFF_N_D', start_date="2005-01-03", end_date="2020-07-01")
par_yield.columns = ['FED/RIFSPFF_N_D']

# create an empty df with the same index as the variables and fill it with our dependent var values
par_data = pd.DataFrame(index=data_scaled.index, columns=['FED/RIFSPFF_N_D'])
par_data.update(par_yield['FED/RIFSPFF_N_D'])

# get the variation and binarize it
par_data = par_data.pct_change()
par_data.fillna(0, inplace=True)
par_data = par_data.apply(lambda x: [0 if y <= 0 else 1 for y in x])
print("Number of 0 and 1s:\n", par_data.value_counts())

# plot number of 0s and 1s
sns.countplot(x='FED/RIFSPFF_N_D', data=par_data, palette='Blues')
plt.title('0s and 1s')
plt.savefig('0s and 1s')

# over-sampling with the ADASYN method
sampler = ADASYN(random_state=13)
X_os, y_os = sampler.fit_sample(data_scaled, par_data.values.ravel())
columns = data_scaled.columns
data_scaled = pd.DataFrame(data=X_os, columns=columns)
par_data = pd.DataFrame(data=y_os, columns=['FED/RIFSPFF_N_D'])

print("\nProportion of 0s in oversampled data: ", len(par_data[par_data['FED/RIFSPFF_N_D']==0])/len(data_scaled))
print("\nProportion of 1s in oversampled data: ", len(par_data[par_data['FED/RIFSPFF_N_D']==1])/len(data_scaled))

After adjusting the proportions of 0s and 1s in our label set, we split the data into training and test sets and create the model.


# split data into test and train set
X_train, X_test, y_train, y_test = train_test_split(data_scaled, par_data, test_size=0.2, random_state=13)

# just make it easier to write y
y = y_train['FED/RIFSPFF_N_D']

# Catboost model
clf = CatBoostClassifier(iterations=None, learning_rate=None, depth=None, l2_leaf_reg=None, model_size_reg=None, rsm=None, loss_function=None, border_count=None, feature_border_type=None, per_float_feature_quantization=None, input_borders=None, output_borders=None, fold_permutation_block=None, od_pval=None, od_wait=None, od_type=None, nan_mode=None, counter_calc_method=None, leaf_estimation_iterations=None, leaf_estimation_method=None, thread_count=None, random_seed=None, use_best_model=None, verbose=None, logging_level=None, metric_period=None, ctr_leaf_count_limit=None, store_all_simple_ctr=None, max_ctr_complexity=None, has_time=None, allow_const_label=None, classes_count=None, class_weights=None, one_hot_max_size=None, random_strength=None, name=None, ignored_features=None, train_dir=None, custom_loss=None, custom_metric=None, eval_metric=None, bagging_temperature=None, save_snapshot=None, snapshot_file=None, snapshot_interval=None, fold_len_multiplier=None, used_ram_limit=None, gpu_ram_part=None, allow_writing_files=None, final_ctr_computation_mode=None, approx_on_full_history=None, boosting_type=None, simple_ctr=None, combinations_ctr=None, per_feature_ctr=None, task_type=None, device_config=None, devices=None, bootstrap_type=None, subsample=None, sampling_unit=None, dev_score_calc_obj_block_size=None, max_depth=None, n_estimators=None, num_boost_round=None, num_trees=None, colsample_bylevel=None, random_state=None, reg_lambda=None, objective=None, eta=None, max_bin=None, scale_pos_weight=None, gpu_cat_features_storage=None, data_partition=None, metadata=None, early_stopping_rounds=None, cat_features=None, grow_policy=None, min_data_in_leaf=None, min_child_samples=None, max_leaves=None, num_leaves=None, score_function=None, leaf_estimation_backtracking=None, ctr_history_unit=None, monotone_constraints=None, feature_weights=None, penalties_coefficient=None, first_feature_use_penalties=None, model_shrink_rate=None, model_shrink_mode=None, langevin=None, diffusion_temperature=None, boost_from_average=None, text_features=None, tokenizers=None, dictionaries=None, feature_calcers=None, text_processing=None)

As you can see, I made sure to include every parameter that the model accepts. Yes… a lot! But don't worry, now is not the time to optimize them; it is time to get a glimpse of the model's performance. So we are not going to change any of them (all of them will be None).


clf.fit(X_train, y)
y_pred = clf.predict(X_test)
print('\nAccuracy of Catboost Classifier on test set: {:.2f}'.format(clf.score(X_test, y_test)))

# confusion matrix
confusion_matrix = metrics.confusion_matrix(y_test, y_pred)
print('\nConfusion matrix:\n', confusion_matrix)
print('\nClassification report:\n', metrics.classification_report(y_test, y_pred))

# plot confusion matrix
disp = metrics.plot_confusion_matrix(clf, X_test, y_test, cmap=plt.cm.Blues)
disp.ax_.set_title('Confusion Matrix')
plt.savefig('Confusion Matrix')

Classification report | Image by Author

Catboost ROC curve | Image by Author

The results show an F1-score of 0.72, close to the Random Forest model results. It seems that this model will enter our “Potential Good Models” list for further investigation and hyperparameter optimization! Let’s see what the SVC model tells us!


Support Vector Classifier

The first part of the code, up until the oversampling, is pretty much the same as posted above, so we will dive straight into the model code.


# split data into test and train set
X_train, X_test, y_train, y_test = train_test_split(data_scaled, par_data, test_size=0.2, random_state=13)

# just make it easier to write y
y = y_train['FED/RIFSPFF_N_D']

# Support Vector Classifier model
# (import not shown in the original snippet, added here so the block runs on its own)
from sklearn.svm import SVC
clf = SVC(C=1.0, kernel='rbf', degree=3, gamma='scale', coef0=0.0, shrinking=True, probability=True, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape='ovr', break_ties=False, random_state=13)

# fit model
clf.fit(X_train, y)

# predict
y_pred = clf.predict(X_test)
print('\nAccuracy of SVC classifier on test set: {:.2f}'.format(clf.score(X_test, y_test)))

SVC accuracy | Image by Author

The accuracy of the SVC is 0.65. Let’s see what the classification report shows us.


# confusion matrix
confusion_matrix = metrics.confusion_matrix(y_test, y_pred)
print('\nConfusion matrix:\n', confusion_matrix)
print('\nClassification report:\n', metrics.classification_report(y_test, y_pred))

# plot confusion matrix
disp = metrics.plot_confusion_matrix(clf, X_test, y_test, cmap=plt.cm.Blues)
disp.ax_.set_title('Confusion Matrix')
plt.savefig('Confusion Matrix')

# roc curve
logit_roc_auc = metrics.roc_auc_score(y_test, clf.predict(X_test))
fpr, tpr, thresholds = metrics.roc_curve(y_test, clf.predict_proba(X_test)[:,1])
plt.figure()
plt.plot(fpr, tpr, label='SVC (area = %0.2f)' % logit_roc_auc)
plt.plot([0, 1], [0, 1], 'r--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC curve - Support Vector Classifier')
plt.legend(loc="lower right")
plt.savefig('SVC_ROC')

SVC Confusion Matrix | Image by Author

Classification report | Image by Author

SVC ROC curve | Image by Author

Ok, so it turns out that our F1-score with the SVC model is 0.65. As we saw earlier, both the CatBoost and the Random Forest models performed better, so we will stick with those two in our “Potential Good Models” list for the hyperparameter optimization step. With the results of the five models at hand, we ended up with two promising models. The next step would be to optimize these two to see which performs best, but that will be another series on how to optimize and compare models.


This article was written in conjunction with Guilherme Bezerra Pujades Magalhães.


References and great links

[1] Laboratory of Machine Learning in Finance and Organizations. LAMFO.


[2] J. Starmer, StatQuest with Josh Starmer on Support Vector Machines, YouTube.


[3] Catboost Documentation.


[4] M2X Investments.


Translated from: https://towardsdatascience.com/predicting-interest-rate-with-classification-models-part-3-3eef38dd7b32
