[Python Learning Series 24] Predicting Vipshop User Purchase Behavior with scikit-learn Logistic Regression

Published: 2025/4/16 | python | 豆豆

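1. Background: http://www.datafountain.cn/#/competitions/260/intro

This is the Vipshop user purchase behavior prediction competition on DataFountain. I implemented it with logistic regression. The score was 0.48, which is fairly weak; the code is provided here for reference.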


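2. The features extracted for the competition are as follows:

| Feature category | Feature name | Description | Training notes |
| --- | --- | --- | --- |
| Basic features | u_id | Unique user identifier | |
| | spu_id | Unique item identifier | |
| | brand_id | Identifier of the brand the item belongs to | |
| | cat_id | Identifier of the category the item belongs to | |
| User features | u_buy_num | Number of purchases | |
| | u_click_num | Number of clicks | |
| | u_buy_date | Number of days with a purchase | |
| | u_click_date | Number of days with a click | |
| | u_num_ratio | Purchase-to-click count ratio: purchases / clicks | |
| | u_date_ratio | Purchase-to-click day ratio: purchase days / click days | |
| | u_buy_freq | Purchase frequency: purchases / 90 days | |
| | u_click_freq | Click frequency: clicks / 90 days | |
| Item features | spu_buy_num | Number of purchases | |
| | spu_click_num | Number of clicks | |
| | spu_buy_date | Number of days with a purchase | |
| | spu_click_date | Number of days with a click | |
| | spu_num_ratio | Purchase-to-click count ratio: purchases / clicks | |
| | spu_date_ratio | Purchase-to-click day ratio: purchase days / click days | |
| | spu_buy_freq | Purchase frequency: purchases / 90 days | |
| | spu_click_freq | Click frequency: clicks / 90 days | |
| Brand features | brand_buy_num | Number of purchases | |
| | brand_click_num | Number of clicks | |
| | brand_buy_date | Number of days with a purchase | |
| | brand_click_date | Number of days with a click | |
| | brand_num_ratio | Purchase-to-click count ratio: purchases / clicks | |
| | brand_date_ratio | Purchase-to-click day ratio: purchase days / click days | |
| | brand_buy_freq | Purchase frequency: purchases / 90 days | |
| | brand_click_freq | Click frequency: clicks / 90 days | |
| Category features | cat_buy_num | Number of purchases | |
| | cat_click_num | Number of clicks | |
| | cat_buy_date | Number of days with a purchase | |
| | cat_click_date | Number of days with a click | |
| | cat_num_ratio | Purchase-to-click count ratio: purchases / clicks | |
| | cat_date_ratio | Purchase-to-click day ratio: purchase days / click days | |
| | cat_buy_freq | Purchase frequency: purchases / 90 days | |
| | cat_click_freq | Click frequency: clicks / 90 days | |
| Label | action_type | Whether the user will buy this item on the given day (0 = no, 1 = yes) | |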
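The post does not show how these features were computed. As a rough illustration of the definitions in the table, the user-level features could be aggregated with pandas along the following lines. This is only a sketch under assumptions: the raw log layout (columns `u_id`, `date`, `action`, with `action` 1 for a purchase and 0 for a click), the function name `user_features`, and the choice to set a ratio to 0 when its denominator is 0 are all hypothetical and not taken from the original.

```python
import pandas as pd


def user_features(log, window_days=90):
    """Aggregate per-user buy/click statistics from a behavior log.

    Assumed (hypothetical) log columns: u_id, date, action (1 = buy, 0 = click).
    """
    buys = log[log["action"] == 1].groupby("u_id")
    clicks = log[log["action"] == 0].groupby("u_id")

    # Series with different user sets are aligned on u_id; missing users get 0.
    feats = pd.DataFrame({
        "u_buy_num": buys.size(),
        "u_click_num": clicks.size(),
        "u_buy_date": buys["date"].nunique(),
        "u_click_date": clicks["date"].nunique(),
    }).fillna(0)

    # Ratios: purchases / clicks and purchase days / click days.
    # Assumption: a ratio is set to 0 when its denominator is 0.
    feats["u_num_ratio"] = (feats["u_buy_num"] / feats["u_click_num"]).where(
        feats["u_click_num"] > 0, 0.0)
    feats["u_date_ratio"] = (feats["u_buy_date"] / feats["u_click_date"]).where(
        feats["u_click_date"] > 0, 0.0)

    # Frequencies over the observation window (90 days in the table).
    feats["u_buy_freq"] = feats["u_buy_num"] / window_days
    feats["u_click_freq"] = feats["u_click_num"] / window_days
    return feats.rename_axis("u_id").reset_index()


# Tiny made-up log, for illustration only.
log = pd.DataFrame({
    "u_id":   [1, 1, 1, 1, 1, 1, 2],
    "date":   ["d1", "d1", "d1", "d2", "d3", "d3", "d1"],
    "action": [1, 1, 0, 0, 0, 0, 0],
})
feats = user_features(log)
print(feats)
```

The item, brand, and category features follow the same pattern with `spu_id`, `brand_id`, or `cat_id` as the grouping key.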

3. The logistic regression reference code is as follows:

# -*- coding: utf-8 -*-
import time

import pandas as pd
from sklearn import metrics, preprocessing
from sklearn.linear_model import LogisticRegression

# Entity prefixes: user, item (SPU), brand, category.
ENTITIES = ["u", "spu", "brand", "cat"]
# Per-entity statistics, in the order they appear in the feature files.
STATS = ["buy_num", "click_num", "buy_date", "click_date",
         "num_ratio", "date_ratio", "buy_freq", "click_freq", "last_date"]
# The model only uses the four purchase-to-click count ratios.
MODEL_FEATURES = ["u_num_ratio", "spu_num_ratio", "brand_num_ratio", "cat_num_ratio"]


def stat_columns(entity):
    """Column names of the statistics for one entity, e.g. u_buy_num."""
    return ["%s_%s" % (entity, stat) for stat in STATS]


def load_features(path, columns):
    """Read a headerless tab-separated feature file and cast the column types."""
    df = pd.read_csv(path, sep="\t", encoding="utf8", names=columns)
    # Ratios and frequencies are floats; ids, counts, day counts and the label are ints.
    dtypes = {c: "float" if c.endswith(("_ratio", "_freq")) else "int" for c in columns}
    return df.astype(dtypes)


def main():
    # Step 1: load the training set and the test set.
    # Labeled data: for each entity its id followed by its statistics, then the label.
    train_cols = [c for e in ENTITIES for c in [e + "_id"] + stat_columns(e)]
    train_cols.append("action_type")
    label_ds = load_features("train_features_0714.txt", train_cols)
    print("Training set: %d rows, %d columns" % label_ds.shape)

    # Unlabeled data: row id, the four entity ids, then the statistics per entity.
    # The original named the second column "uid" but then read "u_id" (a KeyError);
    # "u_id" is used consistently here.
    # Inline notes in the original: 391萬 (3.91 million) beside u_buy_num and
    # 241萬 (2.41 million) beside spu_num_ratio; their meaning is not stated.
    test_cols = ["id"] + [e + "_id" for e in ENTITIES]
    test_cols += [c for e in ENTITIES for c in stat_columns(e)]
    unlabel_ds = load_features("test_features_0714.txt", test_cols)
    print("Test set: %d rows, %d columns" % unlabel_ds.shape)

    # Model training: keep all positives and down-sample the negatives to 1%.
    ds_0 = label_ds[label_ds["action_type"] == 0]  # samples labeled 0
    ds_0_train = ds_0.sample(frac=0.01)
    ds_1 = label_ds[label_ds["action_type"] == 1]  # samples labeled 1
    ds_train = pd.concat([ds_1, ds_0_train])
    # preprocessing.scale standardizes each column to zero mean and unit variance.
    label_X_scale = preprocessing.scale(ds_train[MODEL_FEATURES])
    label_y = ds_train["action_type"]
    model = LogisticRegression()  # alternative tried: ensemble.GradientBoostingClassifier()
    model.fit(label_X_scale, label_y)

    # Step 5: model validation and selection.
    # The validation rows are drawn from the training sample itself, so this
    # F1 score is optimistic.
    test_df = ds_train.sample(frac=0.2)
    test_X_scale = preprocessing.scale(test_df[MODEL_FEATURES])
    test_y = test_df["action_type"]
    predicted = model.predict(test_X_scale)
    f1_score = metrics.f1_score(test_y, predicted)
    print(f1_score)

    # Step 6: prediction. Each set is standardized with its own statistics,
    # as in the original.
    unlabel_X_scale = preprocessing.scale(unlabel_ds[MODEL_FEATURES])
    # Probability of the positive class; positives are selected by thresholding it.
    unlabel_y = model.predict_proba(unlabel_X_scale)[:, 1]
    out_y = pd.DataFrame(unlabel_y, columns=["prob"])
    out_1 = out_y[out_y["prob"] > 0.5]  # rows with probability above 0.5
    print(out_1.shape)
    print(out_y["prob"].round(3).value_counts())  # distribution of predicted values
    # Write the predictions with three decimals; the fangjs directory must exist.
    out_y.to_csv("fangjs/outvip.txt", index=False, header=False, float_format="%.3f")


if __name__ == "__main__":
    start = time.perf_counter()
    main()
    end = time.perf_counter()
    print("finish all in %s" % str(end - start))
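The script above validates on rows sampled from the data the model was trained on, and it standardizes the training, validation, and test sets separately, each with its own mean and variance. A common alternative, which is not what the original does, is to hold out a validation split before fitting, fit the scaler on the training split only, and then pick the probability threshold that maximizes F1 on the held-out rows. The sketch below illustrates this on synthetic data standing in for the four `*_num_ratio` features; the data, the 0.1 to 0.9 threshold grid, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the four ratio features (NOT the competition data).
rng = np.random.default_rng(0)
n = 2000
y = (rng.random(n) < 0.3).astype(int)
# Positives get slightly higher ratios than negatives.
X = rng.random((n, 4)) * 0.2 + y[:, None] * 0.1

# Hold out 20% for validation before any fitting, keeping the class balance.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# The pipeline fits the scaler on the training split only and reuses the same
# mean and variance when transforming validation or test rows.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# Probability of the positive class on the held-out rows.
val_prob = model.predict_proba(X_val)[:, 1]

# Sweep the decision threshold and keep the one with the best validation F1.
thresholds = np.arange(0.1, 0.91, 0.05)
scores = [f1_score(y_val, (val_prob >= t).astype(int), zero_division=0)
          for t in thresholds]
best = int(np.argmax(scores))
print("best threshold %.2f, validation F1 %.3f" % (thresholds[best], scores[best]))
```

With the competition data, `X` and `y` would be the down-sampled training frame's feature columns and `action_type`, and the chosen threshold would then be applied to `model.predict_proba` on the test features.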
