Hybrid-Model Recommendation Algorithms (ACM Summer School Case Study)

Published: 2025/3/15

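Recommendation engines that rely purely on content-based, knowledge-based, or collaborative filtering techniques are becoming increasingly rare. Content-based recommendation suffers from over-personalization and a lack of serendipity, while collaborative filtering suffers from the cold-start problem. A better approach is to combine the strengths of several techniques: use a content-based strategy to handle cold start, and use collaborative filtering to bring serendipity to the user. This post implements a hybrid recommender that combines content-based and collaborative filtering.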

1. Introduction: the success of Netflix

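As mentioned at the start, a hybrid recommender combines several simple models into a stronger, more robust system that gives users more accurate product suggestions. There are currently a few common paradigms for building one: (1) run a content-based model and a collaborative filtering model separately, then combine their predictions with adaptive weights; (2) embed a content-based technique inside a collaborative filtering framework to form an end-to-end recommendation engine; (3) embed a collaborative filtering technique inside a content-based framework to form an end-to-end recommendation engine.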
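The first paradigm, weighting the two models' predictions, can be illustrated with a minimal sketch. The weighting rule below is an assumption made for illustration, not something the post prescribes: the weight on the collaborative score grows with the size of the user's rating history, so a brand-new user falls back on the content-based score. The names `adaptive_weight`, `blend`, and `half_point` are hypothetical.

```python
def adaptive_weight(num_user_ratings, half_point=20):
    """Weight given to the collaborative score.

    half_point is a hypothetical tuning knob: the history size at which
    both models receive equal weight.
    """
    return num_user_ratings / (num_user_ratings + half_point)


def blend(content_score, collab_score, num_user_ratings, half_point=20):
    """Weighted average of two predictions on the same rating scale."""
    w = adaptive_weight(num_user_ratings, half_point)
    return w * collab_score + (1.0 - w) * content_score


# A brand-new user: the blend relies entirely on the content-based score.
cold = blend(content_score=4.0, collab_score=2.0, num_user_ratings=0)   # 4.0

# A user with 20 ratings: both models are weighted equally.
warm = blend(content_score=4.0, collab_score=2.0, num_user_ratings=20)  # 3.0
```

In practice the weights could also be learned from held-out ratings; the fixed formula here only shows how the two scores are mixed.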

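Netflix won its market share, and achieved unprecedented success in online TV and film, largely on the strength of a powerful and highly accurate hybrid recommender. While we are watching a film, the Netflix recommender uses content-based techniques to suggest similar titles. An example:

When Ratatouille is selected, Netflix recommends the five most similar titles. As the results show, all of them are Disney Pixar animated films.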

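However, people do not come to Netflix only to watch animated films; they may also enjoy drama, action, comedy, and so on. For this, Netflix uses collaborative filtering to identify groups of similar users and recommend films with an element of serendipity. The recommendation page looks like this:

For films of other genres, Netflix makes its recommendations with collaborative filtering.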

In short, Netflix employs both content-based and collaborative filtering techniques, and a recommendation engine built this way has proven effective in practice.

2. Case study: building a hybrid recommender

The hybrid model here is one that combines the strengths of the content-based and collaborative filtering approaches.

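Content-based recommendation: going deeper within a topic

Take YouTube as an example. Whenever we watch a video, a list of recommendations appears on the right side of the page, and these recommendations are produced by a content-based method. This exploits the fine-grained item descriptions that content-based models work with: when users are watching a video they find interesting, they tend to want to keep watching similar content.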
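To make "similar content" concrete, here is a minimal sketch of content-based similarity using cosine similarity over item feature vectors. The genre flags below are made up for illustration and are not taken from any dataset used in this post; a real system would derive features from metadata such as genres, cast, and keywords.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical binary features: [animation, family, action, crime]
features = {
    'Ratatouille':     [1, 1, 0, 0],
    'Toy Story':       [1, 1, 0, 0],
    'The Incredibles': [1, 1, 1, 0],
    'The Dark Knight': [0, 0, 1, 1],
}


def most_similar(title, k=2):
    """Return the k titles whose features are closest to the given title."""
    scores = [(other, cosine(features[title], vec))
              for other, vec in features.items() if other != title]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


top = most_similar('Ratatouille')
```

With these toy features, the two animated titles rank above The Dark Knight for Ratatouille, because they share more feature flags with it.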

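Collaborative filtering: broadening across topics

Suppose a user is watching The Dark Knight, a Batman film. A purely content-based recommender would most likely suggest other Batman (or superhero) films, with no control over the quality of the films it recommends. For example, most people who like The Dark Knight do not rate other Batman and superhero films very highly, even though they share the same lead character and a similar theme. This is where collaborative filtering needs to come in, so that the recommendations reflect what similar users actually enjoyed and offer more serendipity.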
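The intuition can be sketched with a simple co-rating heuristic: look at the users who liked the seed film and check how they rated each candidate. This is a deliberately simplified stand-in for collaborative filtering, not the SVD model used later in this post. The ratings, the candidate titles 'Superhero Film A' and 'Drama B', and the `like_threshold` parameter are all hypothetical.

```python
def mean_rating_among_fans(ratings, seed, candidate, like_threshold=4.0):
    """Average rating of `candidate` among users who liked `seed`.

    Returns None when no fan of the seed has rated the candidate.
    """
    fans = [user for user, rated in ratings.items()
            if rated.get(seed, 0.0) >= like_threshold]
    scores = [ratings[user][candidate] for user in fans
              if candidate in ratings[user]]
    return sum(scores) / len(scores) if scores else None


# Made-up ratings: user -> {title: rating}
ratings = {
    'u1': {'The Dark Knight': 5.0, 'Superhero Film A': 2.0, 'Drama B': 5.0},
    'u2': {'The Dark Knight': 4.5, 'Superhero Film A': 1.5, 'Drama B': 4.0},
    'u3': {'The Dark Knight': 2.0, 'Superhero Film A': 5.0},
}

similar_theme = mean_rating_among_fans(ratings, 'The Dark Knight', 'Superhero Film A')
different_theme = mean_rating_among_fans(ratings, 'The Dark Knight', 'Drama B')
```

In this toy data the thematically similar film scores lower among fans of The Dark Knight than the unrelated drama, which is exactly the signal a content-based model cannot see.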

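The hybrid recommender can therefore be designed to work as follows:

Take a movie title and a user as input

Use the content-based model to find the 25 most similar movies

Use the collaborative filtering model to predict the ratings this user would give those 25 movies

Return the 10 movies with the highest predicted ratings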
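The flow above can be sketched independently of any particular library. In the sketch below, `similarity` and `predict_rating` are hypothetical stand-ins for the content-based model and the collaborative filter, and the toy data is made up; the candidate and result counts are shrunk from 25 and 10 to 2 so the example stays small.

```python
def hybrid_sketch(user_id, title, similarity, predict_rating,
                  n_candidates=25, n_results=10):
    """Shortlist by content similarity, then rank by predicted rating."""
    # Content-based stage: the titles most similar to the input title,
    # excluding the input title itself.
    neighbours = similarity[title]
    candidates = sorted((t for t in neighbours if t != title),
                        key=lambda t: neighbours[t],
                        reverse=True)[:n_candidates]

    # Collaborative stage: predicted rating of each candidate for this user.
    scored = [(t, predict_rating(user_id, t)) for t in candidates]

    # Final ranking: highest predicted rating first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n_results]


# Toy stand-ins for the two models (all values are made up).
similarity = {'A': {'B': 0.9, 'C': 0.8, 'D': 0.1}}
known_predictions = {(1, 'B'): 2.5, (1, 'C'): 4.5, (1, 'D'): 5.0}


def predict_rating(user_id, title):
    return known_predictions[(user_id, title)]


result = hybrid_sketch(1, 'A', similarity, predict_rating,
                       n_candidates=2, n_results=2)
```

Here `result` is `[('C', 4.5), ('B', 2.5)]`: movie D has the highest predicted rating but never reaches the ranking stage, because the content-based stage filters it out as too dissimilar.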

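Preparing the datasets:

ratings_small.csv: https://www.kaggle.com/rounakbanik/the-movies-dataset/downloads/ratings_small.csv/7 (about 100,000 ratings from roughly 700 users on roughly 9,000 movies; highly sparse, with only about 1.6% of all user-movie pairs rated)
movie_ids.csv: https://drive.google.com/drive/folders/1H9pnfVTzP46s7VwOTcC5ZY_VahRTr5Zv?usp=sharing (the links_small.csv file there contains the movie IDs of every movie rated in ratings_small.csv)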

import numpy as np
import pandas as pd

# Import or compute the cosine_sim matrix
cosine_sim = pd.read_csv('../data/cosine_sim.csv')

# Import or compute the cosine sim mapping matrix
cosine_sim_map = pd.read_csv('../data/cosine_sim_map.csv', header=None)

# Convert cosine_sim_map into a Pandas Series (title -> cosine_sim index)
cosine_sim_map = cosine_sim_map.set_index(0)
cosine_sim_map = cosine_sim_map[1]

# Build the SVD based Collaborative filter
from surprise import SVD, Reader, Dataset

reader = Reader()
ratings = pd.read_csv('../data/ratings_small.csv')
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)

# Train the SVD model on the full set of ratings
svd = SVD()
trainset = data.build_full_trainset()
svd.fit(trainset)

# Build title to ID and ID to title mappings
id_map = pd.read_csv('../data/movie_ids.csv')
id_to_title = id_map.set_index('id')
title_to_id = id_map.set_index('title')

# Import or compute relevant metadata of the movies
smd = pd.read_csv('../data/metadata_small.csv')


def hybrid(userId, title):
    # Extract the cosine_sim index of the movie
    idx = cosine_sim_map[title]

    # Extract the TMDB ID of the movie
    tmdbId = title_to_id.loc[title]['id']

    # Extract the movie ID internally assigned by the dataset
    movie_id = title_to_id.loc[title]['movieId']

    # Extract the similarity scores and their corresponding index
    # for every movie from the cosine_sim matrix
    sim_scores = list(enumerate(cosine_sim[str(int(idx))]))

    # Sort the (index, score) tuples in decreasing order of similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Select the top 25 tuples, excluding the first
    # (as it is the similarity score of the movie with itself)
    sim_scores = sim_scores[1:26]

    # Store the cosine_sim indices of the top 25 movies in a list
    movie_indices = [i[0] for i in sim_scores]

    # Extract the metadata of the aforementioned movies
    # (copy so that adding a column does not touch smd)
    movies = smd.iloc[movie_indices][['title', 'vote_count', 'vote_average', 'year', 'id']].copy()

    # Compute the predicted ratings using the SVD filter
    movies['est'] = movies['id'].apply(
        lambda x: svd.predict(userId, id_to_title.loc[x]['movieId']).est)

    # Sort the movies in decreasing order of predicted rating
    movies = movies.sort_values('est', ascending=False)

    # Return the top 10 movies as recommendations
    return movies.head(10)

Testing the system:

hybrid(1, 'Avatar')
