
Simple Product Recommendation Using a Data-Driven Approach

Published: 2025/4/16 · 豆豆


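Background

When a customer buys a product, the retailer has a chance to learn what else that customer wants to buy, so that products most customers are willing to buy together can be sold side by side to increase sales. Once enough data has been collected, the retailer can run an affinity analysis on it to determine which products are suited to being sold together.
What is affinity? Put simply, it is the similarity, or correlation, between items. For example, a shopper at a supermarket buys apples and also buys bananas; if many people buy both apples and bananas, placing the two next to each other will often increase sales. The idea behind this is that products which have frequently been bought together are likely to be bought together again. Simple as it seems, this idea is the basis of many online and offline product recommendation services.
In the past, product recommendation was often done by hand, offline; it was time-consuming and labor-intensive, and not very accurate. Now it can be done automatically in a data-driven way, which saves cost and improves efficiency. Let's see how.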

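Data preparation and overview

Loading the dataset into a NumPy array X (the loading code is part of the full listing further below) prints:

This dataset has 100 samples and 5 features

To explain this data, let's look at what was bought in the first five transactions: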

print(X[:5])

[[ 0. 0. 1. 1. 1.]
 [ 1. 1. 0. 1. 0.]
 [ 1. 0. 1. 1. 0.]
 [ 0. 0. 1. 1. 1.]
 [ 0. 1. 0. 0. 1.]]

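Read vertically, each column records whether one product was bought. The columns are, in order, bread, milk, cheese, apples and bananas. Each row represents one purchase by a customer. For example, the first row is a customer who bought cheese, apples and bananas, and nothing else.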
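To make the row-by-row reading concrete, the short sketch below decodes the first transaction back into product names. The helper name `decode_transaction` is made up for this illustration; the column order is the one stated above.

```python
import numpy as np

# Column order as described in the text above.
features = ["bread", "milk", "cheese", "apples", "bananas"]


def decode_transaction(row, names):
    """Return the names of the products whose column is 1 in this row.

    Hypothetical helper, introduced only for this illustration.
    """
    return [name for name, bought in zip(names, row) if bought == 1]


# First row of the X[:5] output shown above.
first_row = np.array([0., 0., 1., 1., 1.])
print(decode_transaction(first_row, features))
```

The same helper can be applied to any row of X to read a transaction in plain words.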

Data processing

We give the features names so that later processing is easier:

# The names of the features, for your reference.
features = ["bread", "milk", "cheese", "apples", "bananas"]

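Next we compute the support and confidence of the rule "a customer who buys apples also buys bananas". Support is the number of samples in the whole dataset for which the rule holds. Confidence is the support divided by the number of samples to which the rule applies; for this rule, that is the number of customers who bought apples and bananas plus the number who bought apples but not bananas. In other words, anyone who bought apples counts toward the denominator.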
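Written as formulas, the two definitions above could look as follows. The notation is an assumption made for this note and does not come from the original article: t ranges over transactions, A is the premise product, B is the conclusion product, and n counts transactions.

```latex
\[
\mathrm{support}(A \Rightarrow B) = \bigl|\{\, t : A \in t \ \text{and}\ B \in t \,\}\bigr| = n_{A,B}
\]
\[
\mathrm{confidence}(A \Rightarrow B)
  = \frac{\mathrm{support}(A \Rightarrow B)}{\bigl|\{\, t : A \in t \,\}\bigr|}
  = \frac{n_{A,B}}{n_{A,B} + n_{A,\neg B}}
\]
```

Here n_{A,B} is the number of transactions containing both products and n_{A,\neg B} is the number containing A but not B, so the denominator is simply the number of transactions that contain A.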

# How many of the cases that a person bought Apples involved the people purchasing Bananas too?
# Record both cases where the rule is valid and is invalid.
rule_valid = 0
rule_invalid = 0
for sample in X:
    if sample[3] == 1:  # This person bought Apples
        if sample[4] == 1:
            # This person bought both Apples and Bananas
            rule_valid += 1
        else:
            # This person bought Apples, but not Bananas
            rule_invalid += 1
print("{0} cases of the rule being valid were discovered".format(rule_valid))
print("{0} cases of the rule being invalid were discovered".format(rule_invalid))

The output is:

21 cases of the rule being valid were discovered
15 cases of the rule being invalid were discovered
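From these two counts, the support and confidence of the apples -> bananas rule follow directly. The snippet below is a small sketch that hard-codes the two numbers printed above instead of reloading the dataset.

```python
# Counts copied from the output above.
rule_valid = 21    # bought apples and bananas
rule_invalid = 15  # bought apples but not bananas

# Support: number of transactions in which the rule holds.
support = rule_valid
# Confidence: support divided by the number of transactions containing apples.
confidence = rule_valid / (rule_valid + rule_invalid)

print("Support: {0}".format(support))
print("Confidence: {0:.3f}".format(confidence))  # 21 / 36, about 0.583
```

So 36 of the 100 transactions include apples, and in a little over half of those the customer also bought bananas.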

From basic combinatorics, choosing 2 products out of 5 gives C(5, 2) = 10 possible pairs. We compute the confidence for every pair and print the three highest:
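The count of 10 can be checked with `itertools`. The sketch below also counts ordered pairs, because a rule has a direction: "apples -> bananas" and "bananas -> apples" share the same support but are divided by different premise counts, so their confidences generally differ. The variable names are illustrative only.

```python
from itertools import combinations, permutations

features = ["bread", "milk", "cheese", "apples", "bananas"]
indices = range(len(features))

# Unordered pairs: (0, 1) and (1, 0) count as the same pair.
unordered_pairs = list(combinations(indices, 2))
# Ordered pairs: every (premise, conclusion) with premise != conclusion.
ordered_rules = list(permutations(indices, 2))

print(len(unordered_pairs))  # 10
print(len(ordered_rules))    # 20
```

Checking one rule per unordered pair therefore covers 10 of the 20 possible directed rules.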

from collections import defaultdict

import numpy as np

dataset_filename = "affinity_dataset.txt"
X = np.loadtxt(dataset_filename)
n_samples, n_features = X.shape
print("This dataset has {0} samples and {1} features".format(n_samples, n_features))

# The names of the features, for your reference.
features = ["bread", "milk", "cheese", "apples", "bananas"]

# Now compute for all possible rules
valid_rules = defaultdict(int)
invalid_rules = defaultdict(int)
num_occurences = defaultdict(int)  # number of transactions that contain each premise

for sample in X:  # each sample is the record of one purchase
    for premise in range(n_features):
        if sample[premise] == 0:
            continue
        # Record that the premise was bought in another transaction
        num_occurences[premise] += 1
        # Following the combinatorial reasoning above, each premise is only
        # compared with the products after it: 0 with 1, 2, 3, 4; 1 with 2, 3, 4;
        # 2 with 3, 4; 3 with 4. Ten comparisons cover every pair, the index
        # never runs out of range, and X -> X is never measured.
        for conclusion in range(premise + 1, n_features):
            if sample[conclusion] == 1:
                # This person also bought the conclusion item
                valid_rules[(premise, conclusion)] += 1
            else:
                # This person bought the premise, but not the conclusion
                invalid_rules[(premise, conclusion)] += 1

support = valid_rules
confidence = defaultdict(float)
for premise, conclusion in valid_rules.keys():
    confidence[(premise, conclusion)] = valid_rules[(premise, conclusion)] / num_occurences[premise]
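The same counts can also be obtained without explicit loops. The sketch below is an alternative formulation, not the method used in this article: it multiplies the transaction matrix by its own transpose to get co-occurrence counts for every product pair in both directions, whereas the loop above records only rules from a lower-indexed product to a higher-indexed one. It uses just the five rows printed earlier (named `X_small` here) so that it runs without the dataset file.

```python
import numpy as np

features = ["bread", "milk", "cheese", "apples", "bananas"]

# The five transactions shown by print(X[:5]) earlier; a stand-in for the full X.
X_small = np.array([
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
], dtype=float)

# co_occurrence[i, j] = number of transactions containing both product i and product j.
co_occurrence = X_small.T @ X_small
# The diagonal holds how often each product was bought at all.
num_occurrences = np.diag(co_occurrence)
# confidence_matrix[i, j] = confidence of the rule i -> j.
# Assumes every product appears at least once; otherwise this divides by zero.
confidence_matrix = co_occurrence / num_occurrences[:, np.newaxis]

apples, bananas, cheese = 3, 4, 2
print(co_occurrence[apples, bananas])      # support of apples -> bananas in these 5 rows
print(confidence_matrix[apples, bananas])  # 2 of the 4 apple buyers bought bananas
print(confidence_matrix[cheese, apples])   # all 3 cheese buyers bought apples
```

The diagonal of `confidence_matrix` is always 1 (the rule X -> X) and should be ignored when ranking rules.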

Finally we sort the rules and print the top three. First, let's look at what the processed results look like:

# pprint prints Python data structures in a tidy, readable layout,
# which is handy when inspecting them on the command line.
from pprint import pprint
pprint(list(support.items()))

[((0, 1), 14),
 ((1, 2), 7),
 ((3, 2), 25),
 ((1, 3), 9),
 ((0, 2), 4),
 ((3, 0), 5),
 ((4, 1), 19),
 ((3, 1), 9),
 ((1, 4), 19),
 ((2, 4), 27),
 ((2, 0), 4),
 ((2, 3), 25),
 ((2, 1), 7),
 ((4, 3), 21),
 ((0, 4), 17),
 ((4, 2), 27),
 ((1, 0), 14),
 ((3, 4), 21),
 ((0, 3), 5),
 ((4, 0), 17)]

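We define a function for printing a rule, which makes the later output easier.
Because we created the features list earlier, a product index is enough to look up the product name. A single list does the job and no dictionary is needed (a neat approach).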

def print_rule(premise, conclusion, support, confidence, features):
    premise_name = features[premise]
    conclusion_name = features[conclusion]
    print("Rule: If a person buys {0} they will also buy {1}".format(premise_name, conclusion_name))
    print(" - Confidence: {0:.3f}".format(confidence[(premise, conclusion)]))
    print(" - Support: {0}".format(support[(premise, conclusion)]))
    print("")

Example call and its output:

premise = 1
conclusion = 3
print_rule(premise, conclusion, support, confidence, features)

Rule: If a person buys milk they will also buy apples
- Confidence: 0.196
- Support: 9

Then we sort the rules by confidence, in descending order:

# sort and print the first three results
from operator import itemgetter
sorted_confidence = sorted(confidence.items(), key=itemgetter(1), reverse=True)
for index in range(3):
    print("Rule #{0}".format(index + 1))
    (premise, conclusion) = sorted_confidence[index][0]
    print_rule(premise, conclusion, support, confidence, features)

The results are:
Rule #1
Rule: If a person buys cheese they will also buy bananas
- Confidence: 0.659
- Support: 27

Rule #2
Rule: If a person buys bread they will also buy bananas
- Confidence: 0.630
- Support: 17

Rule #3
Rule: If a person buys cheese they will also buy apples
- Confidence: 0.610
- Support: 25
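Confidence is not the only way to rank rules; the same `sorted` and `itemgetter` pattern also works for support. The sketch below hard-codes support counts copied from the pprint output earlier in this article, keeping one direction per product pair (premise index lower than conclusion index), so it runs without the dataset. The name `support_counts` is introduced here for illustration only.

```python
from operator import itemgetter

features = ["bread", "milk", "cheese", "apples", "bananas"]

# Support counts copied from the pprint output above (premise index < conclusion index).
support_counts = {
    (0, 1): 14, (0, 2): 4, (0, 3): 5, (0, 4): 17,
    (1, 2): 7, (1, 3): 9, (1, 4): 19,
    (2, 3): 25, (2, 4): 27,
    (3, 4): 21,
}

# Sort (rule, support) pairs by the support value, largest first.
sorted_support = sorted(support_counts.items(), key=itemgetter(1), reverse=True)

for index, ((premise, conclusion), count) in enumerate(sorted_support[:3]):
    print("Rule #{0}: {1} -> {2}, support {3}".format(
        index + 1, features[premise], features[conclusion], count))
```

On these numbers the top three by support are cheese -> bananas (27), cheese -> apples (25) and apples -> bananas (21), so the two cheese rules rank highly under both measures.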

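Judging from the ranking, the rules "a customer who buys cheese also buys bananas" and "a customer who buys cheese also buys apples" both have high support and confidence. A supermarket manager can use rules like these to adjust where products are placed. For example, if cheese is on promotion this week, put apples and bananas next to it; that may well lift the store's sales.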

References:
《Python數據挖掘入門與實踐》
Dataset
