Segmenting website navigation sessions
Goal and applications
This article will show a method for segmenting website navigation sessions according to the pages visited, using a topic modelling approach.
Possible applications include descriptive analysis, segment-oriented marketing, and custom website navigation patterns. It can also give you a sense of how people use your website and which types of content are often viewed together.
You could also apply the same approach to segment users instead of sessions, then target those segments differently in marketing campaigns, or cross this data with your existing transactional segments to understand how each of them uses your website.
Clustering
Clustering means grouping objects by their similarities. There are many ways of doing it, partly because there are many different definitions of what a cluster is. The common denominator is that a cluster is a group of data objects.
Hard clustering methods assign each object to exactly one cluster, whereas soft clustering methods give each object a degree of membership in every cluster (expressed, for instance, as a likelihood).
Topic modelling
Topic modelling, on the other hand, is used to infer topics from texts, based on words that often appear together. One algorithm that can be used for this is Latent Dirichlet Allocation (LDA). We give LDA a set of documents and the number of topics we think there are, and it returns a list of topics, each one described by the words that best identify it.
LDA for identifying navigation patterns
Ok, so how can we use LDA to segment website sessions? It's actually quite straightforward: we treat each session as a document, where each page is a word (you can use URLs or page names, whatever helps you uniquely identify each page). If it helps, think of a session as a text written by a user who is trying to tell you what they want from your website, using pages instead of words.
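To make the "session as a document" idea concrete, here is a minimal sketch. It assumes the raw data is a page-view log of `(session_id, page_name)` pairs already sorted by visit time; the log contents and the helper name `sessions_to_documents` are hypothetical and only serve as illustration.

```python
from collections import defaultdict

# Hypothetical page-view log: (session_id, page_name), already in visit order.
page_views = [
    ("s1", "home"), ("s1", "pricing"), ("s1", "signup"),
    ("s2", "home"), ("s2", "blog"), ("s2", "blog_post_1"),
    ("s1", "checkout"),
]


def sessions_to_documents(views):
    """Group pages by session and join them into one space-separated 'text'."""
    pages_by_session = defaultdict(list)
    for session_id, page in views:
        pages_by_session[session_id].append(page)
    return {sid: " ".join(pages) for sid, pages in pages_by_session.items()}


documents = sessions_to_documents(page_views)
for session_id, text in documents.items():
    print(session_id, "->", text)
```

Each value in `documents` can then be handed to a text vectorizer exactly as if it were a sentence.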
The right number of topics is not always evident, and purely optimizing a score won't necessarily yield the best results. Try a few different numbers, look at the results, and see if they make sense.
LDA is a soft clustering algorithm, meaning that, in the end, it will not tell you which session belongs to which group. Instead, for each session it gives you a vector with the probabilities of that session belonging to each of the groups. You can then assign the session to the group with the highest probability (if you want a hard clustering in the end).
Code tutorial with Python
We import the necessary libraries:
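A minimal sketch of the imports the following steps rely on, assuming pandas for the DataFrame and scikit-learn for the vectorizer, the grid search and LDA. The exact set used in the original code may differ.

```python
import pandas as pd  # holds the original sessions DataFrame
from sklearn.decomposition import LatentDirichletAllocation  # the topic model
from sklearn.feature_extraction.text import CountVectorizer  # sessions -> page counts
from sklearn.model_selection import GridSearchCV  # parameter search
```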
We then extract our sessions from our original DataFrame and fit a CountVectorizer to them. CountVectorizer transforms each session's text into a sparse vector of page counts.
Warning: pages must be separated by a space, and their names/URLs should not contain punctuation such as dots and commas, because CountVectorizer's default tokenizer treats punctuation as a separator and would split one page into several words. Replace all such characters with an underscore “_”.
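The cleaning and vectorizing steps could look like the sketch below. The sample sessions and the helper `page_to_token` are hypothetical; the only scikit-learn call used is `CountVectorizer` with its default settings, which lowercases text and keeps tokens of two or more word characters (underscores count as word characters).

```python
import re

from sklearn.feature_extraction.text import CountVectorizer


def page_to_token(page: str) -> str:
    """Collapse every run of non-word characters into a single underscore."""
    return re.sub(r"\W+", "_", page).strip("_").lower()


# Hypothetical sessions: each one is the ordered list of URLs visited.
sessions = [
    ["/home", "/products/shoes.html", "/cart"],
    ["/home", "/blog/2020,review", "/blog/how-to"],
    ["/products/shoes.html", "/cart", "/checkout"],
]

# One space-separated "document" per session, one token per page.
documents = [" ".join(page_to_token(p) for p in pages) for pages in sessions]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(documents)  # sparse matrix: sessions x pages

print(documents[0])
print(X.shape)
print(sorted(vectorizer.vocabulary_))
```

Without the cleaning step, a URL such as `/products/shoes.html` would be split into `products`, `shoes` and `html`, and the model would no longer see it as one page.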
In the next step, we perform a grid search: we run through every combination of the parameter values we specify for the LDA function, find the one that yields the best groups, and then apply this combination to our data. Warning: this step might take a while…
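A grid search of this kind can be sketched with scikit-learn's `GridSearchCV`. The sample sessions, the parameter grid and the choice of `cv=3` are assumptions made for illustration, not the values used in the original tutorial. With no explicit scoring function, `GridSearchCV` ranks candidates by the estimator's own `score` method, which for `LatentDirichletAllocation` is an approximate log-likelihood on the held-out fold.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV

# Hypothetical sessions: one string per session, pages separated by spaces.
documents = [
    "home products product_a cart checkout",
    "home products product_b cart",
    "products product_a product_b cart checkout",
    "home blog blog_post_1 blog_post_2",
    "blog blog_post_2 blog_post_3",
    "home blog blog_post_1",
    "home support faq contact",
    "support faq faq contact",
    "home support contact",
]
X = CountVectorizer().fit_transform(documents)

# Assumed grid: number of topics and the decay of the online learning rate.
param_grid = {"n_components": [2, 3, 4], "learning_decay": [0.5, 0.7, 0.9]}
lda = LatentDirichletAllocation(learning_method="online", random_state=0)

search = GridSearchCV(lda, param_grid, cv=3)
search.fit(X)  # unsupervised: no labels are passed

best_lda = search.best_estimator_  # refitted on all sessions
doc_topic = best_lda.transform(X)  # one probability vector per session

print(search.best_params_)
print(doc_topic.round(2))
```

As noted earlier, the best-scoring combination is not necessarily the most interpretable one, so it is worth inspecting the topics for a few neighbouring values of `n_components` as well.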
Finally, we print each topic, together with the list of words (pages) that best describe it and their importance.
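One way to produce such a listing is sketched below. The arrays are stand-in values shaped like a fitted model's `components_` attribute (topics × pages) and the vectorizer's vocabulary; the helper `top_pages_per_topic` is hypothetical. In scikit-learn the entries of `components_` are pseudo-counts rather than probabilities, so they are best read as relative weights within a topic.

```python
import numpy as np


def top_pages_per_topic(components, vocabulary, n_top=3):
    """Return, for each topic, its n_top highest-weighted pages with weights."""
    topics = []
    for weights in np.asarray(components):
        top = np.argsort(weights)[::-1][:n_top]  # indices, largest weight first
        topics.append([(vocabulary[i], float(weights[i])) for i in top])
    return topics


# Stand-in values with the shape of `lda.components_` (2 topics x 5 pages).
components = np.array([
    [5.1, 0.2, 4.3, 0.1, 3.0],
    [0.3, 6.0, 0.2, 4.8, 0.4],
])
vocabulary = ["cart", "blog", "checkout", "blog_post_1", "home"]

topics = top_pages_per_topic(components, vocabulary)
for k, pages in enumerate(topics):
    print(f"Topic {k}: " + ", ".join(f"{p} ({w:.1f})" for p, w in pages))
```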
Once you are satisfied with the results, you can turn them into a hard clustering if needed, by assigning each session to its most probable topic.
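The hard assignment amounts to taking the position of the largest probability in each session's vector. A minimal sketch, where `doc_topic` is a stand-in for the matrix returned by the fitted LDA model's `transform` method:

```python
import numpy as np

# Stand-in for the session x topic probability matrix (each row sums to 1).
doc_topic = np.array([
    [0.85, 0.10, 0.05],
    [0.20, 0.30, 0.50],
    [0.34, 0.33, 0.33],
])

hard_labels = doc_topic.argmax(axis=1)  # index of the most probable topic
confidence = doc_topic.max(axis=1)      # probability of that topic

for session, (label, prob) in enumerate(zip(hard_labels, confidence)):
    print(f"session {session}: topic {label} (p={prob:.2f})")
```

The third session shows what a hard label hides: its top probability is barely above the others, so you may prefer to leave such sessions unassigned below a threshold of your choosing.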
Conclusion
Once this work is adapted to your existing data and infrastructure, you can use it as it is in a few different ways, or try to expand it. One possible extension is to use the probability vectors, instead of the hard cluster assignments, to measure distances between sessions or users. This could open doors to content recommendation: for example, once you have calculated distances between users from their probability vectors, you could recommend content to new users based on other users whose vectors are close to theirs. Another possible extension is to take into account the order in which pages are visited, which can be quite relevant depending on the context.
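As a sketch of the distance idea, the snippet below compares users through their topic probability vectors. The Hellinger distance is one common choice for probability distributions and is an assumption here, not something the method prescribes; the user names and vectors are hypothetical.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


# Hypothetical per-user topic probability vectors.
user_topics = {
    "alice": [0.80, 0.15, 0.05],
    "bob": [0.75, 0.20, 0.05],
    "carol": [0.05, 0.10, 0.85],
}


def nearest_user(name, vectors):
    """Return the other user whose topic vector is closest to `name`'s."""
    others = (u for u in vectors if u != name)
    return min(others, key=lambda u: hellinger(vectors[name], vectors[u]))


print(nearest_user("alice", user_topics))
```

Content viewed by the nearest users could then serve as recommendation candidates for the user in question.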
This approach shows the importance of sometimes thinking outside the box and applying algorithms to problems that are not their intended application but share an equivalent framework. In this specific case, it helped to think of users as authors, writing their thoughts using page names instead of words.
I hope you have enjoyed this article, and please let me know if you have applied it or improved it somehow.
Source: https://towardsdatascience.com/segmenting-website-navigation-sessions-f9258117737c