Recommending a Movie Using Collaborative Filtering
Also, are recommender systems influencing our taste?
An excerpt on creating a movie recommender system similar to those on OTT platforms.
INTRODUCTION
Formally, a recommender system is a system that seeks to predict or filter items according to the user's preferences. The demand for a good recommender system is soaring, especially since the onset of the Covid-19-induced lockdown, which forced everyone to stay home and watch movies of their favourite genre, actor, director… you get it, right? This is where a recommender system plays an important role: it provides users with content they are more likely to watch, rather than leaving them to search for something that interests them, which would hurt the user experience.
The essence of a recommender system lies in its recommendation engine. There are two types of recommendation engine:
Content-based filtering engine: It provides recommendations by matching the description of the movie against a user profile generated from the interests the user provides. It has an explicit understanding of the recommendation. You might have noticed that some apps ask about your preferences as soon as you sign up. This is what those questions are for.
Collaborative filtering engine: It makes automatic predictions about a user's interests by collecting preference or taste information from the activity of the current user along with many other users with similar activity (collaborating). The underlying assumption of the collaborative filtering approach is that if person A has the same opinion as person B on one issue, A is more likely to share B's opinion on a different issue than that of a randomly chosen person. It need not have any explicit understanding of the recommendation. You might have noticed that when you open a particular movie on an OTT platform, an array of movies appears under the heading “people who watched this movie also watched”. That feature uses collaborative filtering.
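To make the "A shares B's opinion" assumption concrete, here is a minimal sketch of user-based collaborative filtering in plain Python. The users, movies, ratings and the choice of cosine similarity are all illustrative assumptions, not the method used later in this article (which relies on SVD).

```python
from math import sqrt

# Hypothetical toy ratings: user -> {movie: rating}
ratings = {
    "A": {"Inception": 5.0, "Titanic": 1.0, "Up": 4.0},
    "B": {"Inception": 5.0, "Titanic": 1.5, "Up": 4.5, "Alien": 5.0},
    "C": {"Inception": 1.0, "Titanic": 5.0, "Alien": 1.0},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the movies both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[m] * v[m] for m in common)
    norm_u = sqrt(sum(u[m] ** 2 for m in common))
    norm_v = sqrt(sum(v[m] ** 2 for m in common))
    return dot / (norm_u * norm_v)

def predict(user, movie):
    """Similarity-weighted average of the other users' ratings for `movie`."""
    num = den = 0.0
    for other, their_ratings in ratings.items():
        if other == user or movie not in their_ratings:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        num += sim * their_ratings[movie]
        den += sim
    return num / den if den else None

sim_ab = cosine_similarity(ratings["A"], ratings["B"])
sim_ac = cosine_similarity(ratings["A"], ratings["C"])
alien_for_a = predict("A", "Alien")  # A has not rated "Alien"
print(sim_ab, sim_ac, alien_for_a)
```

Because A's ratings resemble B's far more than C's, the predicted rating for the unseen movie is pulled towards B's opinion of it.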
Equipped with these basics, let's dive into creating a movie recommender system using collaborative filtering.
We start by importing the required libraries. We will be using Scikit-surprise, which includes an SVD (Singular Value Decomposition) algorithm. SVD allows us to extract and untangle information, which is really helpful in creating a recommender system.
This topic involves a lot of statistical data analysis. Resources to learn more about Scikit-surprise and SVD:
The first thing one must do before creating a model is observe the data. This gives us a lot of insight into the type of data it is and how we can get the most out of it.
As we observe the data, we see that timestamp is a redundant column, and it is best to remove it.
It is always good practice to check for NaNs in your dataset. Luckily, we don't have any.
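These first steps could be sketched with pandas as follows. The tiny inline table and the column names `userId`, `movieId`, `rating` and `timestamp` are assumptions made for illustration; the real dataset would be loaded from a file instead.

```python
import pandas as pd

# Hypothetical toy ratings table; the column names are assumed
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 30, 20],
    "rating": [4.0, 3.5, 5.0, 2.0, 4.5],
    "timestamp": [964982703, 964981247, 964982224, 964983815, 964982931],
})

# A first look at the data
print(ratings.head())

# Drop the timestamp column, which is not used for the recommendations
ratings = ratings.drop(columns=["timestamp"])

# Count missing values per column
nan_counts = ratings.isna().sum()
print(nan_counts)
```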
Now comes the main part of this model: Exploratory Data Analysis
To start, we look for the number of movies and users in the dataset.
Now we find the sparsity of the data. Sparsity tells us the percentage of user-movie pairs that have no rating: not every user rates every movie, so it measures missing values as a percentage of the total possible values. Sparsity for this data is 98%. Usually, the lower the sparsity, the better, but in the case of collaborative filtering, anything below 99% is manageable.
Sparsity (%) = (Number of Missing Values / Total Values) * 100
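As a rough sketch, the user and movie counts and the sparsity formula can be computed like this. The toy table and column names are assumptions, and the calculation assumes each user rates a given movie at most once.

```python
import pandas as pd

# Hypothetical toy ratings table; the column names are assumed
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 30, 20],
    "rating": [4.0, 3.5, 5.0, 2.0, 4.5],
})

n_users = ratings["userId"].nunique()
n_movies = ratings["movieId"].nunique()

# Every (user, movie) pair is a possible rating; the ones not present are missing
total_values = n_users * n_movies
missing_values = total_values - len(ratings)
sparsity = missing_values / total_values * 100

print(f"{n_users} users, {n_movies} movies, sparsity = {sparsity:.1f}%")
```

Here 3 users and 3 movies give 9 possible ratings, of which 4 are missing.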
Now we try to visualize the ratings distribution.
Most of the ratings are between 3 and 5, and the ratings range from 0.5 to 5.
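One way to inspect the distribution without a plot is to count how often each rating value occurs. The ratings below are made up for illustration.

```python
import pandas as pd

# Hypothetical ratings column
ratings = pd.DataFrame(
    {"rating": [4.0, 3.5, 5.0, 2.0, 4.5, 4.0, 3.0, 0.5, 4.0, 5.0]}
)

# Number of ratings per rating value, ordered from lowest to highest
distribution = ratings["rating"].value_counts().sort_index()
print(distribution)

# Share of ratings that fall between 3 and 5 (both ends included)
share_3_to_5 = ratings["rating"].between(3, 5).mean()
rating_min, rating_max = ratings["rating"].min(), ratings["rating"].max()
print(f"{share_3_to_5:.0%} of ratings are in 3-5; range {rating_min} to {rating_max}")
```

If matplotlib is installed, `distribution.plot(kind="bar")` should turn the same counts into a bar chart.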
FEATURE ENGINEERING
Now comes the next essential part of the system, feature engineering. I always believe that feature engineering is as important as building a model, as it helps the model understand the data and converge better.
Here we are reducing the dimensions by removing redundant data such as movies with fewer than 3 ratings or users who rated fewer than 3 movies, as it is difficult to recommend something with so little data to analyse.
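A minimal sketch of this filtering step with pandas, assuming the same hypothetical column names as before. `MIN_RATINGS` is an illustrative name for the threshold of 3.

```python
import pandas as pd

MIN_RATINGS = 3  # threshold from the text; the name is illustrative

# Hypothetical toy ratings table; the column names are assumed
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 2, 3, 3, 3, 4],
    "movieId": [10, 20, 30, 10, 20, 30, 10, 20, 30, 40],
    "rating":  [4.0, 3.5, 5.0, 2.0, 4.5, 3.0, 4.0, 5.0, 3.5, 1.0],
})

# Keep only movies that received at least MIN_RATINGS ratings
movie_counts = ratings.groupby("movieId")["rating"].transform("count")
filtered = ratings[movie_counts >= MIN_RATINGS]

# Then keep only users who rated at least MIN_RATINGS of the remaining movies
user_counts = filtered.groupby("userId")["rating"].transform("count")
filtered = filtered[user_counts >= MIN_RATINGS]

print(filtered)
```

This is a single pass: removing users can push some movies back under the threshold, so on real data the two filters may need repeating until nothing changes.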
Now let's start creating the model.
We create a Surprise dataset for training using the Reader class that we imported, giving it the expected rating scale that we found during our exploratory data analysis. The ratings data can then be loaded using the Dataset import.
Since we are using our whole dataset for training, we create an anti-test set, which consists of all the user-movie pairs that have no rating, and on which we can test.
We create our SVD model, which untangles the information for us to complete the recommender model.
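For reference, the SVD algorithm in Scikit-surprise is, as far as its documentation describes it, a matrix factorization model rather than a literal singular value decomposition. It predicts the rating of user $u$ for movie $i$ from a global mean $\mu$, a user bias $b_u$, a movie bias $b_i$, and latent factor vectors $p_u$ and $q_i$, which are learned by minimizing a regularized squared error over the known ratings:

```latex
\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top} p_u

\min_{b_u,\, b_i,\, p_u,\, q_i} \;
\sum_{r_{ui} \in R_{\text{train}}} \left( r_{ui} - \hat{r}_{ui} \right)^2
+ \lambda \left( b_u^2 + b_i^2 + \lVert p_u \rVert^2 + \lVert q_i \rVert^2 \right)
```

The latent factors are what "untangle" the ratings: each dimension can be read as a hidden taste that users have and movies satisfy to different degrees.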
We then evaluate our model with the metrics Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). MAE is the average of the absolute differences between the predicted rating and the actual observation, while RMSE is the square root of the average squared difference, so it penalizes large errors more heavily.
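A small worked example of the two metrics in plain Python, using made-up actual and predicted ratings:

```python
from math import sqrt

# Hypothetical actual ratings and model predictions for four user-movie pairs
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 3.0]

errors = [p - a for p, a in zip(predicted, actual)]

# Mean Absolute Error: average size of the errors
mae = sum(abs(e) for e in errors) / len(errors)

# Root Mean Square Error: squaring weights large errors more heavily
rmse = sqrt(sum(e ** 2 for e in errors) / len(errors))

print(f"MAE = {mae}, RMSE = {rmse}")
```

Here the errors are -0.5, 0, -1 and 1, so MAE is 0.625 and RMSE is 0.75. In Scikit-surprise, one way to obtain both metrics is `cross_validate(algo, data, measures=["RMSE", "MAE"], cv=5)` from `surprise.model_selection`, though the evaluation code used for this article is not shown here.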
Predicting
The prediction gives us a movie id for user id 1.
This finishes our recommender system's job.
Now… let's discuss something debatable.
Are recommender systems influencing our taste in movies and taking control away from us?
Photo by Juan Rumimpunu on Unsplash
My father, who is in no way related to computer science, asked me this one fine morning. He was going through his favourite video streaming service and observed that he was seeing videos related to only a few areas. It made him feel that his choices were being influenced by it, and he was unable to come across anything new.
I explained this to him in my own words, based on my own understanding:
He had been watching the same kind of videos over and over every day, thus creating a profile saying that he is interested only in this particular topic. That was the reason he was shown videos from that particular topic only.
But does it mean you have no control over it?
The answer is NO.
You still have control. If you are not interested in a topic that the engine recommended, just let the engine know that you are not interested. Yes, you have that option. Expand your viewing horizons with diverse content. A recommender system is there just to help you, not to control you. In the end, it is up to the viewer to watch or not.
Let's share our views on this and spread some knowledge. Let's learn and grow as a community, because all we are left with is people, memories and knowledge.
Thank you.
Translated from: https://medium.com/swlh/recommending-a-movie-using-collaborative-filtering-6dab1b8f4472