
Mathematical Derivation of Alternating Least Squares (ALS) for Implicit Feedback Datasets, with a Python Implementation

Published: 2025/6/15

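I have recently been reading papers on collaborative filtering (CF). The ideas in "Collaborative Filtering for Implicit Feedback Datasets" are very good and easy to follow. However, the paper does not explain well how the update formulas for x_u and y_i are derived from the objective function

$$\min_{x_*, y_*} \sum_{u,i} c_{ui}\,(p_{ui} - x_u^T y_i)^2 + \lambda \Big( \sum_u \|x_u\|^2 + \sum_i \|y_i\|^2 \Big)$$

Here, as defined in the paper, p_ui = 1 if r_ui > 0 and 0 otherwise, and c_ui = 1 + alpha * r_ui. The derivation is written out below.
Derivation: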
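First, differentiate the objective function (denoted J) with respect to x_u:

$$\frac{\partial J}{\partial x_u} = -2 \sum_i c_{ui}\,(p_{ui} - x_u^T y_i)\, y_i + 2\lambda x_u = -2\, Y^T C^u p(u) + 2\, Y^T C^u Y x_u + 2\lambda x_u$$

Here Y is the item matrix of dimension n*f, each row of which is one item vector. C^u is an n*n diagonal matrix
whose i-th diagonal entry is c_ui, and p(u) is an n*1 column vector whose i-th element is p_ui.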
Setting the derivative to zero gives:

$$x_u = (Y^T C^u Y + \lambda I)^{-1}\, Y^T C^u p(u)$$
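As a quick numerical sanity check of the closed form x_u = (Y^T C^u Y + lambda*I)^{-1} Y^T C^u p(u), the sketch below builds a small random problem with dense NumPy arrays and confirms that the gradient with respect to x_u vanishes at the solution. The function names `solve_user_vector` and `gradient_wrt_xu`, the toy sizes, and the values of `alpha` and `lam` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def solve_user_vector(Y, c_u, p_u, lam):
    """Closed form x_u = (Y^T C^u Y + lam*I)^-1 Y^T C^u p(u), dense version."""
    f = Y.shape[1]
    # Scaling the rows of Y by c_u equals C^u @ Y for a diagonal C^u.
    A = Y.T @ (c_u[:, None] * Y) + lam * np.eye(f)
    b = Y.T @ (c_u * p_u)
    return np.linalg.solve(A, b)


def gradient_wrt_xu(x_u, Y, c_u, p_u, lam):
    """Gradient -2 sum_i c_ui (p_ui - x_u^T y_i) y_i + 2 lam x_u."""
    residual = p_u - Y @ x_u
    return -2.0 * Y.T @ (c_u * residual) + 2.0 * lam * x_u


rng = np.random.default_rng(0)
n_items, n_factors = 6, 3          # hypothetical toy sizes
alpha, lam = 40.0, 0.8             # assumed hyperparameters
r_u = rng.integers(0, 3, size=n_items).astype(float)  # raw counts r_ui
p_u = (r_u > 0).astype(float)      # p_ui = 1 if r_ui > 0 else 0
c_u = 1.0 + alpha * r_u            # c_ui = 1 + alpha * r_ui
Y = rng.normal(size=(n_items, n_factors))

x_u = solve_user_vector(Y, c_u, p_u, lam)
grad = gradient_wrt_xu(x_u, Y, c_u, p_u, lam)
print(np.abs(grad).max())  # should be at floating-point noise level
```

Using `np.linalg.solve` rather than forming the inverse explicitly is the usual choice here, since only the product of the inverse with a vector is needed.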
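Because x_u and y_i are symmetric in the objective function, the derivative with respect to y_i follows directly:

$$\frac{\partial J}{\partial y_i} = -2\, X^T C^i p(i) + 2\, X^T C^i X y_i + 2\lambda y_i$$

Here X is the user matrix of dimension m*f, each row of which is one user vector. C^i is an m*m diagonal matrix whose u-th diagonal entry is c_ui, and p(i) is an m*1 column vector whose u-th element is p_ui.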
Setting the derivative to zero gives:

$$y_i = (X^T C^i X + \lambda I)^{-1}\, X^T C^i p(i)$$
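Alternating the two updates, x_u = (Y^T C^u Y + lambda*I)^{-1} Y^T C^u p(u) with Y fixed and y_i = (X^T C^i X + lambda*I)^{-1} X^T C^i p(i) with X fixed, gives the ALS procedure. The following is a minimal dense sketch on a toy count matrix. The helper names `loss` and `solve_rows`, the matrix sizes, and the hyperparameter values are assumptions made for illustration. Since each half-step minimizes the objective exactly over one block of variables, the recorded loss should never increase.

```python
import numpy as np


def loss(X, Y, C, P, lam):
    """Objective: sum_ui c_ui (p_ui - x_u^T y_i)^2 + lam (|X|^2 + |Y|^2)."""
    err = P - X @ Y.T
    return float((C * err ** 2).sum() + lam * ((X ** 2).sum() + (Y ** 2).sum()))


def solve_rows(fixed, C, P, lam):
    """Solve one vector per row of C against the fixed factor matrix.

    Row k of C and P holds the confidences and preferences of the entity
    being solved (a user, or an item when C and P are passed transposed).
    """
    f = fixed.shape[1]
    out = np.empty((C.shape[0], f))
    for k in range(C.shape[0]):
        A = fixed.T @ (C[k][:, None] * fixed) + lam * np.eye(f)
        b = fixed.T @ (C[k] * P[k])
        out[k] = np.linalg.solve(A, b)
    return out


rng = np.random.default_rng(1)
R = rng.integers(0, 4, size=(5, 7)).astype(float)  # toy counts r_ui
alpha, lam, f = 10.0, 0.5, 2                        # assumed values
P = (R > 0).astype(float)                           # preferences p_ui
C = 1.0 + alpha * R                                 # confidences c_ui
X = rng.normal(size=(R.shape[0], f))
Y = rng.normal(size=(R.shape[1], f))

history = [loss(X, Y, C, P, lam)]
for _ in range(5):
    X = solve_rows(Y, C, P, lam)        # user step: Y is held fixed
    history.append(loss(X, Y, C, P, lam))
    Y = solve_rows(X, C.T, P.T, lam)    # item step: X is held fixed
    history.append(loss(X, Y, C, P, lam))
print(history)
```

This dense version stores the full m*n confidence matrix, so it is only suitable for toy sizes.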
Below is a Python implementation of the algorithm from the paper:

```python
import time

import numpy as np
import scipy.sparse as sparse
from scipy.sparse.linalg import spsolve


def load_matrix(filename, num_users, num_items):
    t0 = time.time()
    counts = np.zeros((num_users, num_items))
    total = 0.0
    num_zeros = num_users * num_items
    # enumerate yields both the index and the element. For a file, the index
    # is the line number and the element is the content of that line.
    with open(filename, 'r') as f:
        for i, line in enumerate(f):
            # strip() removes leading and trailing whitespace
            user, item, count = line.strip().split('\t')
            user = int(user)
            item = int(item)
            count = float(count)
            if user >= num_users:
                continue
            if item >= num_items:
                continue
            if count != 0:
                counts[user, item] = count
                total += count
                num_zeros -= 1
            if i % 100000 == 0:
                print('loaded %i counts...' % i)
    # After loading, alpha is the number of zero entries divided by the
    # sum of all counts.
    alpha = num_zeros / total
    print('alpha %.2f' % alpha)
    counts *= alpha
    # Compress the matrix into Compressed Sparse Row format
    counts = sparse.csr_matrix(counts)
    t1 = time.time()
    print('Finished loading matrix in %f seconds' % (t1 - t0))
    return counts


class ImplicitMF:

    def __init__(self, counts, num_factors=40, num_iterations=30,
                 reg_param=0.8):
        self.counts = counts
        self.num_users = counts.shape[0]
        self.num_items = counts.shape[1]
        self.num_factors = num_factors
        self.num_iterations = num_iterations
        self.reg_param = reg_param

    def train_model(self):
        # Initialize user_vectors and item_vectors with entries drawn from N(0, 1)
        self.user_vectors = np.random.normal(size=(self.num_users,
                                                   self.num_factors))
        self.item_vectors = np.random.normal(size=(self.num_items,
                                                   self.num_factors))
        # In Python 3, range is lazy (like xrange in Python 2), so no large
        # list is allocated up front.
        for i in range(self.num_iterations):
            t0 = time.time()
            print('Solving for user vectors...')
            self.user_vectors = self.iteration(True, sparse.csr_matrix(self.item_vectors))
            print('Solving for item vectors...')
            self.item_vectors = self.iteration(False, sparse.csr_matrix(self.user_vectors))
            t1 = time.time()
            print('iteration %i finished in %f seconds' % (i + 1, t1 - t0))

    def iteration(self, user, fixed_vecs):
        # Like C's ternary operator: num_users if user is True, else num_items
        num_solve = self.num_users if user else self.num_items
        num_fixed = fixed_vecs.shape[0]
        YTY = fixed_vecs.T.dot(fixed_vecs)
        eye = sparse.eye(num_fixed)
        lambda_eye = self.reg_param * sparse.eye(self.num_factors)
        solve_vecs = np.zeros((num_solve, self.num_factors))

        t = time.time()
        for i in range(num_solve):
            if user:
                counts_i = self.counts[i].toarray()
            else:
                # When solving for an item vector, counts_i is the
                # transpose of column i of counts.
                counts_i = self.counts[:, i].T.toarray()
            # The paper uses c_ui = 1 + alpha * r_ui. To reduce the cost of
            # Y'CuY, use Y'CuY = Y'Y + Y'(Cu - I)Y. Cu is diagonal with entries
            # c_ui, so Cu - I is diagonal with entries alpha * r_ui, which is
            # what counts_i holds (counts was scaled by alpha in load_matrix).
            CuI = sparse.diags(counts_i, [0])
            pu = counts_i.copy()
            # np.where(pu != 0) returns the indices of the nonzero entries,
            # which are set to 1. This is the paper's binary preference:
            # p_ui = 1 if r_ui > 0, else 0.
            pu[np.where(pu != 0)] = 1.0
            YTCuIY = fixed_vecs.T.dot(CuI).dot(fixed_vecs)
            YTCupu = fixed_vecs.T.dot(CuI + eye).dot(sparse.csr_matrix(pu).T)
            xu = spsolve(YTY + YTCuIY + lambda_eye, YTCupu)
            solve_vecs[i] = xu
            if i % 1000 == 0:
                print('Solved %i vecs in %d seconds' % (i, time.time() - t))
                t = time.time()

        return solve_vecs
```
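The speed-up mentioned in the comments of `iteration` rests on the identity Y^T C^u Y = Y^T Y + Y^T (C^u - I) Y. Y^T Y is the same for every user and can be computed once, while C^u - I has nonzero diagonal entries only for the items the user actually interacted with, so the correction term needs only those rows of Y. The check below is a small dense sketch of that identity. The variable names, sizes, and the value of `alpha` are assumptions for illustration, and it uses plain NumPy rather than the sparse matrices of the implementation above.

```python
import numpy as np

rng = np.random.default_rng(2)
n_items, n_factors, alpha = 8, 3, 40.0      # assumed toy values
Y = rng.normal(size=(n_items, n_factors))
r_u = np.zeros(n_items)
r_u[[1, 4]] = [3.0, 1.0]                    # the user interacted with two items
c_u = 1.0 + alpha * r_u                     # diagonal of C^u

# Direct form: touches every row of Y for every user.
direct = Y.T @ np.diag(c_u) @ Y

# Split form: Y^T Y is shared by all users, and the correction term only
# needs the rows of Y where r_ui != 0.
YtY = Y.T @ Y
nz = np.flatnonzero(r_u)
Y_nz = Y[nz]
split = YtY + Y_nz.T @ ((c_u[nz] - 1.0)[:, None] * Y_nz)

print(np.allclose(direct, split))
```

With n items and f factors, the direct form costs on the order of n*f^2 per user, while the split form costs on the order of n_u*f^2, where n_u is the number of items that user has nonzero counts for.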
