Programming Collective Intelligence Notes (2 / 12): Making Recommendations

Published: 2023/12/13

Making Recommendations

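Contents

- Collaborative Filtering
- Collecting Preferences
- Finding Similar Users
  - Euclidean Distance Score
  - Pearson Correlation Score
  - Which Similarity Metric Should You Use?
  - Ranking the Critics
- Recommending Items
- Matching Products
- Building a Link Recommender on a Data Platform
  - The Data Platform API
  - Building the Dataset
  - Recommending Neighbors and Links
- Item-Based Filtering
  - Building the Item Comparison Dataset
  - Getting Recommendations
- Using the MovieLens Dataset
- User-Based or Item-Based Filtering?
- Summary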

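This chapter covers how to make recommendations for people based on the preferences of a group.

Collaborative Filtering

A collaborative filtering algorithm usually works by searching a large group of people and finding a smaller set whose tastes are similar to yours.

The algorithm looks at the other things these people like and combines them into a ranked list of recommendations. There are many ways to decide which people have similar tastes and to combine their choices into a list.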

Collecting Preferences

A nested Python dictionary stores the critics' ratings of movies.

Data structure:

ratings = {person_a: {movie_A: (rating 1~5), movie_B: (rating 1~5), ...}, person_b: ...}

# A dictionary of movie critics and their ratings of a small
# set of movies
critics={'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
 'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
 'The Night Listener': 3.0},
'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
 'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
 'You, Me and Dupree': 3.5},
'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
 'Superman Returns': 3.5, 'The Night Listener': 4.0},
'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
 'The Night Listener': 4.5, 'Superman Returns': 4.0,
 'You, Me and Dupree': 2.5},
'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
 'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
 'You, Me and Dupree': 2.0},
'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
 'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}

Source file: recommendations.py

>>> import os
>>> os.getcwd()
>>> os.chdir('.')
>>> from recommendations import critics
>>> critics['Lisa Rose']['Lady in the Water']
2.5
>>> critics['Toby']['Snakes on a Plane']=4.5
>>> critics['Toby']
{'Snakes on a Plane': 4.5, 'Superman Returns': 4.0, 'You, Me and Dupree': 1.0}

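Finding Similar Users

We need a way to determine how similar people are in their tastes.

To do this, compare each person with every other person and compute a similarity score for each pair.

Two methods are covered:

Euclidean distance;

Pearson correlation.

Euclidean Distance Score

The closer two people are in "preference space", the more similar their preferences.

The same distance formula applies to any number of rated items.

Computing the distance between two people: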

>>> from math import sqrt
>>> sqrt(pow(4.5-4,2)+pow(1-2,2))
1.118033988749895

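This number is usually transformed so that more similar preferences give a larger value.

To do that, add 1 to the distance (which avoids a division-by-zero error) and take the reciprocal: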

>>> 1/(1+sqrt(pow(4.5-4,2)+pow(1-2,2)))
0.4721359549995794

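This always yields a value between 0 and 1. The larger the value, the more similar the preferences, and a value of 1 means the preferences are identical.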
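The same calculation extends beyond two items. As a minimal sketch (the function name `euclidean_similarity` and the two small rating dictionaries are illustrative and not part of recommendations.py), the squared differences can be summed over every item both people have rated:

```python
from math import sqrt

def euclidean_similarity(ratings_a, ratings_b):
    """Return 1 / (1 + d), where d is the Euclidean distance over shared items."""
    shared = [item for item in ratings_a if item in ratings_b]
    if not shared:
        return 0.0  # nothing in common, so there is no basis for comparison
    distance = sqrt(sum((ratings_a[i] - ratings_b[i]) ** 2 for i in shared))
    return 1 / (1 + distance)

# Hypothetical two-item example reusing the numbers from the session above
person_x = {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0}
person_y = {'Snakes on a Plane': 4.0, 'You, Me and Dupree': 2.0}
score = euclidean_similarity(person_x, person_y)
print(score)
```

With only two shared items this reproduces the interactive result; adding more shared items simply adds more squared terms under the square root.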

Build a function that computes this similarity.

from math import sqrt

# Returns a distance-based similarity score for person1 and person2
def sim_distance(prefs,person1,person2):
    # Get the list of shared_items
    si={}
    for item in prefs[person1]:
        if item in prefs[person2]: si[item]=1

    # if they have no ratings in common, return 0
    if len(si)==0: return 0

    # Add up the squares of all the differences
    sum_of_squares=sum([pow(prefs[person1][item]-prefs[person2][item],2)
                        for item in prefs[person1] if item in prefs[person2]])

    # Note: no square root is taken here, so this returns 1/(1+d^2)
    # rather than the 1/(1+d) computed interactively above
    return 1/(1+sum_of_squares)

Usage example

>>> import recommendations
>>> recommendations.sim_distance(recommendations.critics,'Lisa Rose', 'Gene Seymour')
0.14814814814814814

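Pearson Correlation Score

The Pearson correlation coefficient measures how well two sets of data fit on a straight line.

The formula is more complicated than the Euclidean distance formula, but it tends to give better results when the data is not well normalized (for example, when critics' ratings are routinely far from the average).

[Figure: scatter plot comparing two critics' ratings, with a straight line drawn through the points]

The line in such a chart is called the best-fit line because it is drawn as close as possible to all the points.

If the two critics had rated every movie identically, this line would be the diagonal and would pass through every point on the chart, giving a perfect correlation score of 1.

[Figure: a second scatter plot whose points fit the line more closely than in the first]

The Pearson method corrects for grade inflation. If one person tends to give higher scores than another, but the difference between their scores stays consistent, the two can still be highly correlated.

The Euclidean distance score, by contrast, would conclude that the two are dissimilar because one person is consistently harsher (and so scores consistently lower), even when their tastes are very similar.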
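The grade-inflation point can be illustrated with a small sketch. The two rating lists below are made up for illustration, and `pearson` and `distance_score` are simplified stand-ins written for this example, not the functions from recommendations.py:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of ratings."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    den = sqrt(var_x * var_y)
    return cov / den if den else 0.0

def distance_score(xs, ys):
    """1 / (1 + Euclidean distance) between the two rating lists."""
    return 1 / (1 + sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys))))

# Hypothetical critics: same taste, but one always scores 1.5 points higher
strict = [1.0, 2.0, 3.0, 2.5]
generous = [r + 1.5 for r in strict]

pearson_value = pearson(strict, generous)
distance_value = distance_score(strict, generous)
print(pearson_value, distance_value)
```

The Pearson value comes out at essentially 1 because the constant offset does not affect correlation, while the distance-based score drops to about 0.25 for the same pair.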

Build a function that computes this similarity.

# Returns the Pearson correlation coefficient for p1 and p2
# (uses sqrt, imported from math at the top of recommendations.py)
def sim_pearson(prefs,p1,p2):
    # Get the list of mutually rated items
    si={}
    for item in prefs[p1]:
        if item in prefs[p2]: si[item]=1

    # if there are no ratings in common, return 0
    if len(si)==0: return 0

    # Number of shared items
    n=len(si)

    # Sums of all the preferences
    sum1=sum([prefs[p1][it] for it in si])  #1
    sum2=sum([prefs[p2][it] for it in si])  #2

    # Sums of the squares
    sum1Sq=sum([pow(prefs[p1][it],2) for it in si])  #3
    sum2Sq=sum([pow(prefs[p2][it],2) for it in si])  #4

    # Sum of the products
    pSum=sum([prefs[p1][it]*prefs[p2][it] for it in si])  #5

    # Calculate r (Pearson score)
    num=pSum-(sum1*sum2/n)
    den=sqrt((sum1Sq-pow(sum1,2)/n)*(sum2Sq-pow(sum2,2)/n))
    if den==0: return 0

    r=num/den
    return r

This function returns a value between -1 and 1. A value of 1 means the two people's ratings line up perfectly, which includes the case where they rate every item identically.
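Written out, the quantity that sim_pearson computes is the following, where x_i and y_i are the two people's ratings of the i-th shared item and n is the number of shared items (these symbols are introduced here for illustration):

```latex
r = \frac{\sum_i x_i y_i - \dfrac{\sum_i x_i \, \sum_i y_i}{n}}
         {\sqrt{\left(\sum_i x_i^2 - \dfrac{\left(\sum_i x_i\right)^2}{n}\right)
                \left(\sum_i y_i^2 - \dfrac{\left(\sum_i y_i\right)^2}{n}\right)}}
```

The numerator corresponds to `num` and the denominator to `den` in the function; when the denominator is zero the function returns 0 instead of dividing.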

>>> recommendations.sim_pearson(recommendations.critics,'Lisa Rose', 'Gene Seymour')
0.39605901719066977

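Which Similarity Metric Should You Use?

Which method is best depends entirely on the application.

Ranking the Critics

Return the critics who are most similar to a given person.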

# Returns the best matches for person from the prefs dictionary.
# Number of results and similarity function are optional params.
def topMatches(prefs,person,n=5,similarity=sim_pearson):
    scores=[(similarity(prefs,person,other),other)
            for other in prefs if other!=person]
    scores.sort()
    scores.reverse()
    return scores[0:n]

Usage example

>>> recommendations.topMatches(recommendations.critics,'Toby',n=3)
[(0.9912407071619299, 'Lisa Rose'), (0.9244734516419049, 'Mick LaSalle'), (0.8934051474415647, 'Claudia Puig')]
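The contents list a "Recommending Items" section, and getRecommendations is called later in these notes, but its source is not shown. The summary table describes it as computing similarities and then taking a weighted average of ratings. A minimal sketch under that description follows; the name `get_recommendations_sketch`, the toy `inverse_square_distance` similarity, and the three-person dataset are assumptions for illustration, not the original code:

```python
def inverse_square_distance(prefs, a, b):
    """Toy similarity: 1 / (1 + sum of squared differences over shared items)."""
    shared = [item for item in prefs[a] if item in prefs[b]]
    if not shared:
        return 0.0
    return 1 / (1 + sum((prefs[a][i] - prefs[b][i]) ** 2 for i in shared))

def get_recommendations_sketch(prefs, person, similarity):
    """Rank items the person has not rated by a similarity-weighted average."""
    totals = {}    # item -> sum of (similarity * rating)
    sim_sums = {}  # item -> sum of the similarities that contributed
    for other in prefs:
        if other == person:
            continue
        sim = similarity(prefs, person, other)
        if sim <= 0:
            continue  # ignore people with zero or negative similarity
        for item, rating in prefs[other].items():
            if item in prefs[person]:
                continue  # only score items the person has not rated
            totals[item] = totals.get(item, 0.0) + sim * rating
            sim_sums[item] = sim_sums.get(item, 0.0) + sim
    rankings = [(total / sim_sums[item], item) for item, total in totals.items()]
    rankings.sort(reverse=True)
    return rankings

# Hypothetical three-person dataset
toy_prefs = {
    'ann': {'A': 4.0, 'B': 2.0},
    'bob': {'A': 4.0, 'B': 2.0, 'C': 5.0},
    'cat': {'A': 1.0, 'B': 5.0, 'C': 1.0},
}
recs = get_recommendations_sketch(toy_prefs, 'ann', inverse_square_distance)
print(recs)
```

Here bob agrees with ann exactly and cat disagrees strongly, so the predicted rating for item C lands at roughly 4.8, much closer to bob's 5.0 than to cat's 1.0. Dividing by the sum of similarities keeps an item from ranking highly merely because many people rated it.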

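Matching Products

We also want to know which products are similar to each other.

In this case, similarity can be determined by looking at who liked a particular item and which other items those same people liked.

This is the same method used earlier to measure similarity between people; you only need to swap the people and the items.

The earlier data structure

ratings = {person_a: {movie_A: (rating 1~5), movie_B: (rating 1~5), ...}, person_b: ...}

becomes

ratings = {movie_A: {person_a: (rating 1~5), person_b: (rating 1~5), ...}, movie_B: ...}

Define a function to do the swap:

def transformPrefs(prefs):
    result={}
    for person in prefs:
        for item in prefs[person]:
            result.setdefault(item,{})

            # Flip item and person
            result[item][person]=prefs[person][item]
    return result

Then call the topMatches function to get the movies most similar to Superman Returns: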

>>> movies = recommendations.transformPrefs(recommendations.critics)
>>> recommendations.topMatches(movies,"Superman Returns")
[(0.6579516949597695, 'You, Me and Dupree'), (0.4879500364742689, 'Lady in the Water'), (0.11180339887498941, 'Snakes on a Plane'), (-0.1798471947990544, 'The Night Listener'), (-0.42289003161103106, 'Just My Luck')]

'Just My Luck' is negatively correlated with 'Superman Returns'.

We can also recommend critics for a movie, for example when deciding whom to invite to a re-screening of a film.

>>> recommendations.getRecommendations(movies, 'Just My Luck')
[(4.0, 'Michael Phillips'), (3.0, 'Jack Matthews')]

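More examples

To recommend products to individuals, an online retailer might collect purchase histories. Swapping people and products then lets it find potential customers for a given product.

On a site that recommends links, the same swap can ensure that new links reach the users most likely to be interested in them.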

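Building a Link Recommender on a Data Platform

The data platform is del.icio.us, an online bookmarking site. The site could not be accessed when these notes were written.

Goal: use the platform's data to find similar users and recommend links they have not seen before.

The Data Platform API

The pydelicious.py source could not be used, for unknown reasons.

The next two sections are therefore left empty.

Building the Dataset

Omitted.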

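Item-Based Filtering

The motivation here is performance.

Consider a site like Amazon with its huge number of customers: comparing one customer with every other customer, and then comparing every product each customer has rated, is an enormous amount of work.

Likewise, on a site that sells millions of products, customers may overlap very little in their preferences, which makes it hard to judge which customers are similar.

The technique used so far is called user-based collaborative filtering.

The alternative introduced next is item-based collaborative filtering.

Advantages of item-based collaborative filtering

With very large datasets, item-based collaborative filtering can give better results, and it allows much of the computation to be done in advance (trading space for time), so a user who needs recommendations gets them faster.

The general approach of item-based collaborative filtering is:

Precompute the most similar items for each item.

Then, when making recommendations for a user, look at the items that user has rated and pick the top-rated ones.

Finally, build a weighted list of the items most similar to those selected.

The important difference is that, although the first step requires examining all the data, comparisons between items do not change as often as comparisons between users.

This means there is no need to keep recalculating each item's most similar items. The work can be scheduled for low-traffic times or run on a separate machine from the main application (trading space for time).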
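One way to act on this is to compute the item similarity table in an offline job, save it, and have the main application only load it. The sketch below assumes JSON as the storage format and uses hypothetical helper names (`save_item_similarities`, `load_item_similarities`) and a made-up two-item table; none of this comes from recommendations.py:

```python
import json
import os
import tempfile

def save_item_similarities(item_sim, path):
    """Write the precomputed item-to-item table to disk as JSON."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(item_sim, f)

def load_item_similarities(path):
    """Read the table back, restoring the (score, item) tuples."""
    with open(path, encoding='utf-8') as f:
        raw = json.load(f)
    # JSON has no tuple type, so each pair comes back as a two-element list
    return {item: [(score, other) for score, other in pairs]
            for item, pairs in raw.items()}

# Hypothetical precomputed table with two items
item_sim = {'Movie A': [(0.4, 'Movie B')], 'Movie B': [(0.4, 'Movie A')]}
cache_path = os.path.join(tempfile.mkdtemp(), 'itemsim.json')
save_item_similarities(item_sim, cache_path)
restored = load_item_similarities(cache_path)
print(restored == item_sim)
```

The expensive step then runs only as often as the table needs refreshing, while each recommendation request pays only for a file load or an in-memory lookup.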

Building the Item Comparison Dataset

def calculateSimilarItems(prefs,n=10):
    # Create a dictionary of items showing which other items they
    # are most similar to.
    result={}

    # Invert the preference matrix to be item-centric
    itemPrefs=transformPrefs(prefs)
    c=0
    for item in itemPrefs:
        # Status updates for large datasets
        c+=1
        if c%100==0: print("%d / %d" % (c,len(itemPrefs)))
        # Find the most similar items to this one,
        # using the item-to-person dataset
        scores=topMatches(itemPrefs,item,n=n,similarity=sim_distance)
        result[item]=scores
    return result

Usage example

>>> import recommendations
>>> itemsim = recommendations.calculateSimilarItems(recommendations.critics)
>>> itemsim
{'Lady in the Water': [(0.4, 'You, Me and Dupree'), (0.2857142857142857, 'The Night Listener'), (0.2222222222222222, 'Snakes on a Plane'), (0.2222222222222222, 'Just My Luck'), (0.09090909090909091, 'Superman Returns')],
 'Snakes on a Plane': [(0.2222222222222222, 'Lady in the Water'), (0.18181818181818182, 'The Night Listener'), (0.16666666666666666, 'Superman Returns'), (0.10526315789473684, 'Just My Luck'), (0.05128205128205128, 'You, Me and Dupree')],
 'Just My Luck': [(0.2222222222222222, 'Lady in the Water'), (0.18181818181818182, 'You, Me and Dupree'), (0.15384615384615385, 'The Night Listener'), (0.10526315789473684, 'Snakes on a Plane'), (0.06451612903225806, 'Superman Returns')],
 'Superman Returns': [(0.16666666666666666, 'Snakes on a Plane'), (0.10256410256410256, 'The Night Listener'), (0.09090909090909091, 'Lady in the Water'), (0.06451612903225806, 'Just My Luck'), (0.05333333333333334, 'You, Me and Dupree')],
 'You, Me and Dupree': [(0.4, 'Lady in the Water'), (0.18181818181818182, 'Just My Luck'), (0.14814814814814814, 'The Night Listener'), (0.05333333333333334, 'Superman Returns'), (0.05128205128205128, 'Snakes on a Plane')],
 'The Night Listener': [(0.2857142857142857, 'Lady in the Water'), (0.18181818181818182, 'Snakes on a Plane'), (0.15384615384615385, 'Just My Luck'), (0.14814814814814814, 'You, Me and Dupree'), (0.10256410256410256, 'Superman Returns')]}

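Getting Recommendations

With the item similarity dictionary from the previous section, recommendations can now be made without going through the whole dataset.

Take all the items the user has rated, find the items similar to each, and weight them by similarity.

The similarities are easy to look up in the item dictionary.

[Table: finding recommendations with the item-based method]

[Table: the table used earlier for recommending items]

Unlike the earlier table for recommending items, the table in this section does not involve all the critics.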
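As a worked example of the weighting, the score for 'The Night Listener' for Toby can be recomputed by hand from Toby's three ratings and the similarities listed in the itemsim output above. The variable names below are illustrative:

```python
# Toby's ratings, taken from the critics dictionary
toby = {
    'Snakes on a Plane': 4.5,
    'Superman Returns': 4.0,
    'You, Me and Dupree': 1.0,
}

# Similarity of each rated movie to 'The Night Listener', copied from itemsim
sim_to_night_listener = {
    'Snakes on a Plane': 0.18181818181818182,
    'Superman Returns': 0.10256410256410256,
    'You, Me and Dupree': 0.14814814814814814,
}

# Weighted sum of ratings, then normalize by the total similarity
weighted_sum = sum(toby[m] * sim_to_night_listener[m] for m in toby)
total_sim = sum(sim_to_night_listener.values())
predicted = weighted_sum / total_sim
print(round(predicted, 4))  # 3.1826
```

This agrees with the top entry that getRecommendedItems reports for Toby. Only Toby's own ratings and the precomputed item similarities are needed; no other critic's ratings are consulted at recommendation time.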

def getRecommendedItems(prefs,itemMatch,user):
    userRatings=prefs[user]
    scores={}
    totalSim={}

    # Loop over items rated by this user
    for (item,rating) in userRatings.items( ):

        # Loop over items similar to this one
        for (similarity,item2) in itemMatch[item]:

            # Ignore if this user has already rated this item
            if item2 in userRatings: continue

            # Weighted sum of rating times similarity
            scores.setdefault(item2,0)
            scores[item2]+=similarity*rating

            # Sum of all the similarities
            totalSim.setdefault(item2,0)
            totalSim[item2]+=similarity

    # Divide each total score by total weighting to get an average
    rankings=[(score/totalSim[item],item) for item,score in scores.items( )]

    # Return the rankings from highest to lowest
    rankings.sort( )
    rankings.reverse( )
    return rankings

Usage example

>>> recommendations.getRecommendedItems(recommendations.critics,itemsim,'Toby')
[(3.182634730538922, 'The Night Listener'), (2.5983318700614575, 'Just My Luck'), (2.4730878186968837, 'Lady in the Water')]

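Using the MovieLens Dataset

MovieLens is a real dataset of movie ratings.

The hosting site provides several movie datasets; the smallest file, ml-100k.zip, was chosen here.

The site also has datasets on books, jokes, and more, waiting to be explored.

Note: I placed the data files in the accompanying source package.

The first file of interest: u.item

It is a list of movie IDs and titles. The first 5 lines of the file: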

1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0
2|GoldenEye (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?GoldenEye%20(1995)|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
3|Four Rooms (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
4|Get Shorty (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Get%20Shorty%20(1995)|0|1|0|0|0|1|0|0|1|0|0|0|0|0|0|0|0|0|0
5|Copycat (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Copycat%20(1995)|0|0|0|0|0|0|1|0|1|0|0|0|0|0|0|0|1|0|0

The second file of interest: u.data

The first 5 lines of the file (the fields are tab-separated):

user id	movie id	rating	timestamp
196	242	3	881250949
186	302	3	891717742
22	377	1	878887116
244	51	2	880606923
166	346	1	886397596

A function to load the files

def loadMovieLens(path='/data/movielens'):
    # Get movie titles
    movies={}
    for line in open(path+'/u.item'):
        (id,title)=line.split('|')[0:2]
        movies[id]=title

    # Load data
    prefs={}
    for line in open(path+'/u.data'):
        (user,movieid,rating,ts)=line.split('\t')
        prefs.setdefault(user,{})
        prefs[user][movies[movieid]]=float(rating)
    return prefs
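On Python 3, opening u.item with the default text encoding may fail if the file contains non-ASCII characters. The sketch below is a variant of the loader that passes an explicit encoding; ISO-8859-1 is an assumption here, and the name `load_movielens_sketch` and the tiny made-up sample files are only for illustration:

```python
import os
import tempfile

def load_movielens_sketch(path, encoding='ISO-8859-1'):
    """Return {user: {title: rating}} from u.item and u.data under path."""
    # Map movie id -> title from the first two pipe-separated fields
    movies = {}
    with open(os.path.join(path, 'u.item'), encoding=encoding) as f:
        for line in f:
            movie_id, title = line.split('|')[0:2]
            movies[movie_id] = title

    # Each u.data row is: user, movie id, rating, timestamp (tab-separated)
    prefs = {}
    with open(os.path.join(path, 'u.data'), encoding=encoding) as f:
        for line in f:
            user, movie_id, rating, _timestamp = line.rstrip('\n').split('\t')
            prefs.setdefault(user, {})[movies[movie_id]] = float(rating)
    return prefs

# Build two tiny made-up files in the same layout to exercise the loader
sample_dir = tempfile.mkdtemp()
with open(os.path.join(sample_dir, 'u.item'), 'w', encoding='ISO-8859-1') as f:
    f.write('1|Toy Story (1995)|01-Jan-1995||url|0\n')
    f.write('2|GoldenEye (1995)|01-Jan-1995||url|0\n')
with open(os.path.join(sample_dir, 'u.data'), 'w', encoding='ISO-8859-1') as f:
    f.write('196\t1\t3\t881250949\n')
    f.write('196\t2\t4\t881250950\n')

sample_prefs = load_movielens_sketch(sample_dir)
print(sample_prefs)
```

The `with` blocks also close the files promptly, which the one-line `open(...)` loops leave to the garbage collector.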

Usage example

>>> prefs = recommendations.loadMovieLens()  # pass path=... if the data is stored elsewhere
>>> prefs['20']
{'Return of the Jedi (1983)': 4.0, 'Jungle2Jungle (1997)': 4.0, 'Back to the Future (1985)': 3.0,
 'Jurassic Park (1993)': 4.0, 'Sting, The (1973)': 3.0, 'Sabrina (1995)': 4.0,
 'Island of Dr. Moreau, The (1996)': 1.0, 'Mission: Impossible (1996)': 3.0, 'Twister (1996)': 4.0,
 'Toy Story (1995)': 3.0, 'Willy Wonka and the Chocolate Factory (1971)': 3.0,
 'Sound of Music, The (1965)': 3.0, 'Home Alone (1990)': 2.0, 'Scream (1996)': 1.0,
 'Braveheart (1995)': 5.0, 'Indiana Jones and the Last Crusade (1989)': 4.0,
 'Young Frankenstein (1974)': 2.0, 'Raiders of the Lost Ark (1981)': 4.0,
 "Dante's Peak (1997)": 4.0, "Mr. Holland's Opus (1995)": 4.0, 'Die Hard (1988)': 2.0,
 'Speed (1994)': 4.0, 'Michael (1996)': 1.0, 'Christmas Carol, A (1938)': 4.0,
 'Lost World: Jurassic Park, The (1997)': 4.0, 'Ghost and the Darkness, The (1996)': 5.0,
 'African Queen, The (1951)': 3.0, 'Space Jam (1996)': 2.0, 'Ransom (1996)': 4.0,
 'Silence of the Lambs, The (1991)': 3.0, 'Searching for Bobby Fischer (1993)': 5.0,
 "Preacher's Wife, The (1996)": 4.0, 'Blues Brothers, The (1980)': 3.0,
 'Happy Gilmore (1996)': 1.0, 'Volcano (1997)': 4.0, 'Aliens (1986)': 2.0,
 'Independence Day (ID4) (1996)': 3.0, 'E.T. the Extra-Terrestrial (1982)': 2.0,
 'Seven (Se7en) (1995)': 2.0, 'Forrest Gump (1994)': 1.0, 'Aladdin (1992)': 3.0,
 'Miracle on 34th Street (1994)': 3.0, 'Empire Strikes Back, The (1980)': 3.0,
 'Eraser (1996)': 3.0, "It's a Wonderful Life (1946)": 5.0, 'Star Wars (1977)': 3.0,
 'Beauty and the Beast (1991)': 4.0, "One Flew Over the Cuckoo's Nest (1975)": 1.0}

User-based recommendations

>>> recommendations.getRecommendations(prefs,'20')[0:10]
[(5.0, 'World of Apu, The (Apur Sansar) (1959)'), (5.0, 'Whole Wide World, The (1996)'),
 (5.0, 'Thieves (Voleurs, Les) (1996)'), (5.0, 'Strawberry and Chocolate (Fresa y chocolate) (1993)'),
 (5.0, 'Star Kid (1997)'), (5.0, "Someone Else's America (1995)"), (5.0, 'Sliding Doors (1998)'),
 (5.0, 'Santa with Muscles (1996)'), (5.0, 'Saint of Fort Washington, The (1993)'),
 (5.0, 'Quiet Room, The (1996)')]

Item-based recommendations

>>> itemsim = recommendations.calculateSimilarItems(prefs,n=50)
100 / 1664
200 / 1664
300 / 1664
400 / 1664
500 / 1664
600 / 1664
700 / 1664
800 / 1664
900 / 1664
1000 / 1664
1100 / 1664
1200 / 1664
1300 / 1664
1400 / 1664
1500 / 1664
1600 / 1664
>>> recommendations.getRecommendedItems(prefs,itemsim,'87')[0:30]
[(5.0, "What's Eating Gilbert Grape (1993)"), (5.0, 'Vertigo (1958)'), (5.0, 'Usual Suspects, The (1995)'),
 (5.0, 'Toy Story (1995)'), (5.0, 'Titanic (1997)'), (5.0, 'Sword in the Stone, The (1963)'),
 (5.0, 'Stand by Me (1986)'), (5.0, 'Sling Blade (1996)'), (5.0, 'Silence of the Lambs, The (1991)'),
 (5.0, 'Shining, The (1980)'), (5.0, 'Shine (1996)'), (5.0, 'Sense and Sensibility (1995)'),
 (5.0, 'Scream (1996)'), (5.0, 'Rumble in the Bronx (1995)'), (5.0, 'Rock, The (1996)'),
 (5.0, 'Robin Hood: Prince of Thieves (1991)'), (5.0, 'Reservoir Dogs (1992)'),
 (5.0, 'Police Story 4: Project S (Chao ji ji hua) (1993)'), (5.0, 'House of the Spirits, The (1993)'),
 (5.0, 'Fresh (1994)'), (5.0, 'Denise Calls Up (1995)'),
 (5.0, 'Day the Sun Turned Cold, The (Tianguo niezi) (1994)'),
 (5.0, 'Before the Rain (Pred dozhdot) (1994)'), (5.0, 'Assignment, The (1997)'), (5.0, '1-900 (1994)'),
 (4.875, "Ed's Next Move (1996)"), (4.833333333333333, 'Anna (1996)'), (4.8, 'Dark City (1998)'),
 (4.75, 'Flower of My Secret, The (Flor de mi secreto, La) (1995)'), (4.75, 'Broken English (1996)')]

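User-Based or Item-Based Filtering?

When generating recommendations for a large dataset, item-based filtering is significantly faster than user-based filtering, but it carries the extra overhead of maintaining the item similarity table.

For sparse datasets (for example, where most bookmarks are saved by only a small group of people), item-based filtering usually outperforms user-based filtering. For dense datasets (such as movie ratings), the two perform about equally.

User-based filtering is simpler to implement and needs no extra steps, so it is often the better fit for smaller in-memory datasets that change very frequently.

On the application side, showing users which other people share their preferences has some value, for example for a service with a social or dating feature. A shopping site, however, would probably not want to do this.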

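Summary

Two ways to score similarity:
- Euclidean distance
- Pearson correlation

Collaborative filtering:
- User-based collaborative filtering
- Item-based collaborative filtering

Basic underlying data structures:

Person-to-item

ratings = {person_a: {movie_A: (rating 1~5), movie_B: (rating 1~5), ...}, person_b: ...}

Item-to-person

ratings = {movie_A: {person_a: (rating 1~5), person_b: (rating 1~5), ...}, movie_B: ...}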

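Important functions used in these notes

Function or variable	Purpose	Notes (example call and section)

critics	Person-to-item dataset	Passed as the prefs argument. Section: Collecting Preferences
sim_distance(prefs, person1, person2)	Computes similarity with the Euclidean distance formula	recommendations.sim_distance(recommendations.critics, 'Lisa Rose', 'Gene Seymour'). Section: Euclidean Distance Score
sim_pearson(prefs, p1, p2)	Computes similarity with the Pearson correlation formula	recommendations.sim_pearson(recommendations.critics, 'Lisa Rose', 'Gene Seymour'). Section: Pearson Correlation Score
topMatches(prefs, person, n=5, similarity=sim_pearson)	Given a person-to-item or item-to-person dataset and a similarity function (Euclidean or Pearson), returns the n most similar people or items	recommendations.topMatches(recommendations.critics, 'Toby', n=3). Sections: Ranking the Critics, Matching Products
getRecommendations(prefs, person, similarity=sim_pearson)	From a person-to-item dataset, computes similarities and then a weighted average to produce recommended items	recommendations.getRecommendations(recommendations.critics, 'Toby'); recommendations.getRecommendations(movies, 'Just My Luck'). Sections: Recommending Items, Matching Products
transformPrefs(prefs)	Converts a person-to-item dataset into an item-to-person dataset	recommendations.transformPrefs(recommendations.critics). Section: Matching Products
calculateSimilarItems(prefs, n=10)	From the item-to-person dataset, computes the similarity between items (the item-to-item dataset)	recommendations.calculateSimilarItems(recommendations.critics). Section: Building the Item Comparison Dataset
getRecommendedItems(prefs, itemMatch, user)	From one person's ratings and the item-to-item dataset, computes a weighted average to produce recommended items	recommendations.getRecommendedItems(recommendations.critics, itemsim, 'Toby'). Section: Getting Recommendations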
