DeepCTR-Torch: a deep-learning-based CTR prediction library

Published: 2024/1/23 | pytorch | 豆豆

The CTR prediction problem

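Click-through rate (CTR) prediction is usually formalized as follows: given a user, an item, and a context, estimate the probability that the user clicks the item, that is, pCTR = p(click=1|user,item,context).

In short, advertising systems use pCTR to compute the expected revenue of an ad, while recommender systems use pCTR to produce a ranked list of candidate items.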
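The two uses of pCTR described above can be sketched in a few lines. This is only an illustration: the `Candidate` class, the per-click `bid` field, and the rule "expected revenue = pCTR × bid" are assumptions made here for the example, not something the article or DeepCTR-Torch prescribes.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    pctr: float       # predicted click probability from some CTR model
    bid: float = 0.0  # hypothetical per-click bid (ads only)


def expected_revenue(c: Candidate) -> float:
    # Assumed pricing rule for this sketch: revenue per impression = pCTR * bid.
    return c.pctr * c.bid


def rank_by_pctr(cands):
    # Recommendation-style ranking: highest click probability first.
    return sorted(cands, key=lambda c: c.pctr, reverse=True)


def rank_by_expected_revenue(cands):
    # Advertising-style ranking: highest expected revenue first.
    return sorted(cands, key=expected_revenue, reverse=True)


candidates = [
    Candidate("a", pctr=0.10, bid=1.0),
    Candidate("b", pctr=0.05, bid=3.0),
    Candidate("c", pctr=0.20, bid=0.4),
]

print([c.name for c in rank_by_pctr(candidates)])
print([c.name for c in rank_by_expected_revenue(candidates)])
```

With these made-up numbers the two rankings differ: the item most likely to be clicked is not the one with the highest expected revenue.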

DeepCTR-Torch

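Practitioners improve prediction quality by constructing effective combination features and by using more complex models to learn the patterns in the data. Methods based on factorization machines learn feature interactions as products of embedding vectors, and can therefore generalize to feature combinations that never appeared in the training data.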
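To make the "vector product" idea concrete, here is a minimal pure-Python sketch of the second-order term of a factorization machine, sum over i<j of <v_i, v_j> x_i x_j. The function names and toy numbers are made up for illustration; this is not DeepCTR-Torch's implementation. The second function uses the well-known identity that computes the same sum in linear time.

```python
def fm_pairwise_naive(x, v):
    """Sum over all pairs i<j of <v[i], v[j]> * x[i] * x[j]."""
    total = 0.0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            total += dot * x[i] * x[j]
    return total


def fm_pairwise_fast(x, v):
    """Same value via 0.5 * sum_f ((sum_i v[i][f] x[i])^2 - sum_i (v[i][f] x[i])^2)."""
    k = len(v[0])
    total = 0.0
    for f in range(k):
        terms = [v[i][f] * x[i] for i in range(len(x))]
        s = sum(terms)
        s_sq = sum(t * t for t in terms)
        total += s * s - s_sq
    return 0.5 * total


# Toy input: three features, each with a 2-dimensional embedding.
x = [1.0, 0.0, 1.0]
v = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.5]]

print(fm_pairwise_naive(x, v), fm_pairwise_fast(x, v))
```

Because every feature has its own embedding, a pair of features that never co-occurred during training still receives an interaction weight through the dot product of their vectors.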

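With the rapid progress of deep neural networks in many fields, researchers have in recent years proposed a number of deep-learning-based factorization models that learn low-order and high-order feature interactions at the same time, such as:

PNN, Wide&Deep, DeepFM, Attentional FM, Neural FM, DCN, xDeepFM, AutoInt, FiBiNET

as well as DIN, DIEN, and DSIN, which model the sequence of a user's historical behavior.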

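Newcomers to this area may not yet be familiar with the details of these methods. There are many introductions online, but the accompanying code follows no common structure, and adapting it to your own dataset for experiments is inconvenient. This article introduces DeepCTR-Torch, a package of deep-learning-based CTR models implemented in PyTorch that is convenient both for practical use and for learning.

DeepCTR-Torch is an easy-to-use, modular, and extensible package of deep-learning-based CTR models. Besides the mainstream models of recent years, it also includes many core component layers that you can use to build your own custom models.

You can train and predict with these complex models simply by calling model.fit() and model.predict(), and you choose whether to run on the CPU or the GPU through the device argument passed when the model is initialized.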

Installation and usage

Installation

pip install -U deepctr-torch
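After installing, you may want to confirm which version ended up in your environment. The snippet below is one way to do that using only the Python standard library (Python 3.8+); the helper name `installed_version` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="deepctr-torch"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("deepctr-torch is not installed in this environment")
else:
    print("deepctr-torch version:", found)
```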

Usage example

The simple example below shows how to quickly apply a deep-learning-based CTR model. The code is available at:

https://github.com/shenweichen/DeepCTR-Torch/blob/master/examples/run_classification_criteo.py

The Criteo Display Ads dataset is a CTR prediction competition dataset from Kaggle. It contains 13 numerical features, I1-I13, and 26 categorical features, C1-C26.

# -*- coding: utf-8 -*-
import pandas as pd
import torch
from sklearn.metrics import log_loss, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from deepctr_torch.models import DeepFM
# Note: get_fixlen_feature_names matches the API used by the original example;
# newer releases may expose get_feature_names instead.
from deepctr_torch.inputs import SparseFeat, DenseFeat, get_fixlen_feature_names

# Read the data described above with pandas and fill in missing values.
# The sample file is at:
# https://github.com/shenweichen/DeepCTR-Torch/blob/master/examples/criteo_sample.txt
data = pd.read_csv('./criteo_sample.txt')

sparse_features = ['C' + str(i) for i in range(1, 27)]
dense_features = ['I' + str(i) for i in range(1, 14)]

data[sparse_features] = data[sparse_features].fillna('-1')
data[dense_features] = data[dense_features].fillna(0)
target = ['label']

# Preprocess the features: re-encode categorical features with LabelEncoder
# (hash encoding is an alternative) and squeeze numerical features into the
# range 0~1 with MinMaxScaler.
for feat in sparse_features:
    lbe = LabelEncoder()
    data[feat] = lbe.fit_transform(data[feat])
mms = MinMaxScaler(feature_range=(0, 1))
data[dense_features] = mms.fit_transform(data[dense_features])

# This is the key step: categorical features are embedded, so the model must
# be told how many embedding vectors each feature group needs. We count the
# distinct values with pandas' nunique().
fixlen_feature_columns = [SparseFeat(feat, data[feat].nunique())
                          for feat in sparse_features] + \
                         [DenseFeat(feat, 1) for feat in dense_features]

dnn_feature_columns = fixlen_feature_columns
linear_feature_columns = fixlen_feature_columns

fixlen_feature_names = get_fixlen_feature_names(
    linear_feature_columns + dnn_feature_columns)

# Finally, assemble the model inputs in the order of the feature columns
# generated in the previous step.
train, test = train_test_split(data, test_size=0.2)
train_model_input = [train[name] for name in fixlen_feature_names]
test_model_input = [test[name] for name in fixlen_feature_names]

# Check whether a GPU can be used.
device = 'cpu'
use_cuda = True
if use_cuda and torch.cuda.is_available():
    print('cuda ready...')
    device = 'cuda:0'

# Initialize the model, then train and predict.
model = DeepFM(linear_feature_columns=linear_feature_columns,
               dnn_feature_columns=dnn_feature_columns,
               task='binary', l2_reg_embedding=1e-5, device=device)

model.compile("adagrad", "binary_crossentropy",
              metrics=["binary_crossentropy", "auc"])
model.fit(train_model_input, train[target].values,
          batch_size=256, epochs=10, validation_split=0.2, verbose=2)

pred_ans = model.predict(test_model_input, 256)
print("")
print("test LogLoss", round(log_loss(test[target].values, pred_ans), 4))
print("test AUC", round(roc_auc_score(test[target].values, pred_ans), 4))
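To see what the preprocessing in the example does to the data, here is a small pure-Python sketch of the two transformations: label encoding for categorical features and min-max scaling for numerical ones. The helpers `label_encode` and `min_max_scale` are written for this illustration and are not the scikit-learn implementations. In this sketch, codes are assigned in sorted order of the distinct values, and a constant column is mapped to 0.0.

```python
def label_encode(values):
    """Map each distinct value to an integer code; also return the vocabulary size."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values], len(classes)


def min_max_scale(values):
    """Linearly rescale numbers into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: nothing to spread out, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# A toy categorical column after fillna('-1') and a toy numerical column.
codes, vocab_size = label_encode(["b", "a", "-1", "b"])
scaled = min_max_scale([0.0, 5.0, 10.0])

print(codes, vocab_size)
print(scaled)
```

The vocabulary size returned here plays the same role as `data[feat].nunique()` in the example: it is the number of embedding vectors the categorical feature needs.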

Related resources

DeepCTR-Torch code repository:

https://github.com/shenweichen/DeepCTR-Torch

DeepCTR-Torch documentation:

https://deepctr-torch.readthedocs.io/en/latest/index.html

DeepCTR (TensorFlow version) code repository:

https://github.com/shenweichen/DeepCTR

DeepCTR (TensorFlow version) documentation:

https://deepctr-doc.readthedocs.io/en/latest/index.html
