

30. E-commerce Analysis and Service Recommendation: Analysis Methods and Process

Published: 2024/9/16

1. Analysis Methods and Process

1.1 Objectives

The goal of this case study is to make recommendations to users, that is, to establish links between users and items in some fashion. To help users quickly discover pages of interest within massive amounts of data, we extend the currently rather limited recommendation system. The analysis methods and process for e-commerce service recommendation cover:

  • Data extraction
  • Exploratory data analysis

2. Data Extraction

  • The recommendation algorithm used by the recommender system
  • This project uses collaborative filtering, which finds similar users or pages from historical data. During data extraction, select as much data as possible: this reduces the randomness of the recommendation results, improves their accuracy, and better surfaces long-tail pages that interest users.
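The user-to-user similarity at the heart of collaborative filtering can be sketched as follows. This is a minimal illustration with made-up click data, not the project's actual implementation:

```python
import math

# Toy user-page click data (assumed for illustration only).
clicks = {
    'user1': {'pageA': 1, 'pageB': 1},
    'user2': {'pageA': 1, 'pageB': 1, 'pageC': 1},
    'user3': {'pageC': 1},
}

def cosine_sim(a, b):
    """Cosine similarity between two users' click vectors."""
    common = set(a) & set(b)
    num = sum(a[k] * b[k] for k in common)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

sim12 = cosine_sim(clicks['user1'], clicks['user2'])  # shared pages -> high
sim13 = cosine_sim(clicks['user1'], clicks['user3'])  # no overlap -> 0
print(round(sim12, 3), round(sim13, 3))
```

Users (or pages, in the item-based variant) with the highest similarity scores are then used as the source of recommendations.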

Characteristics of the user access data

  • Using access time as the selection criterion, three months of user access data were taken as the raw dataset, 837,450 records in total, with attributes including user ID, access time, source site, visited page, page title, referring page, tags, page category, and keywords.

Flow of the intelligent recommendation system

  • Build the database
  • Import the data
  • Set up the Python environment
  • Analyze the data
  • Build the model

Python code for accessing the database

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mysql+pymysql://root:222850@127.0.0.1:3306/7law?charset=utf8')
sql = pd.read_sql('all_gzdata', engine, chunksize=10000)
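One detail worth noting: with chunksize set, read_sql returns an iterator of DataFrames that can be consumed only once, so read_sql must be re-run before each new pass over the data. A self-contained sketch of the pattern, using an in-memory SQLite database as a stand-in for the MySQL instance above:

```python
import pandas as pd
from sqlalchemy import create_engine

# Stand-in: in-memory SQLite instead of the project's MySQL database,
# with a few assumed rows, so the chunked-read pattern runs on its own.
engine = create_engine('sqlite://')
pd.DataFrame({'fullURLId': ['101003', '101003', '107001']}).to_sql(
    'all_gzdata', engine, index=False)

# read_sql with chunksize yields DataFrames of at most `chunksize` rows.
sql = pd.read_sql('all_gzdata', engine, chunksize=2)
sizes = [len(chunk) for chunk in sql]
print(sizes)  # chunk sizes; the iterator is now exhausted
```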

3. Exploratory Data Analysis

Page category analysis

  • First, count the page categories clicked by users in the raw data. The page category is the first 3 digits of the "URL type" field (the full field has 6 or 7 digits).

Page category statistics

  • Clicks on consultation-related pages (page category 101) account for 49.16% of the records, followed by the "other" category (199) at about 24% and knowledge-related pages (107) at about 22%.
  • The ranking of page categories by user clicks is therefore: consultation, knowledge, other pages, regulations (category 310), and lawyer-related pages (category 102). A preliminary conclusion is that, compared with long knowledge articles, users prefer viewing or posting consultations.
  • Statistics within the knowledge category

Code for the page category analysis

counts = [i['fullURLId'].value_counts() for i in sql]     # count chunk by chunk
counts = pd.concat(counts).groupby(level=0).sum()         # merge the chunk results (group by index and sum)
counts = counts.reset_index()                             # turn the index into a regular column
counts.columns = ['index', 'num']                         # rename the columns (the second defaults to 0)
counts['type'] = counts['index'].str.extract(r'(\d{3})')  # first three digits = category id
counts['percent'] = counts['num'] / counts['num'].sum() * 100
counts_ = counts[['type', 'num', 'percent']].groupby('type').sum()  # merge by category
counts_ = counts_.sort_values('num', ascending=False)     # sort descending (assign back, or the sort is lost)
print(counts_)
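The per-chunk value_counts followed by groupby(level=0).sum() is the pattern that merges partial counts into a global tally. A tiny stand-alone check, with two assumed toy chunks in place of the read_sql iterator:

```python
import pandas as pd

# Toy chunks standing in for the read_sql iterator (assumed data).
chunks = [
    pd.DataFrame({'fullURLId': ['101003', '107001']}),
    pd.DataFrame({'fullURLId': ['101003', '199899']}),
]

# Same merge pattern as above: per-chunk value_counts, then group the
# concatenated partial counts by index and sum.
counts = [c['fullURLId'].value_counts() for c in chunks]
counts = pd.concat(counts).groupby(level=0).sum()
print(counts.sort_index().to_dict())
```

The '101003' entries from both chunks collapse into a single total, which is exactly why the groupby-and-sum step is needed after pd.concat.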

Click count analysis

  • Count how many pages each user (distinguished by real IP) browsed in the raw data; the results are shown in the table below. Users who browsed only once account for about 58% of all users, most users browsed 2 to 7 times, and the average is 3 page views per user.


Code for the click count analysis

# Count clicks per user; value_counts gives the frequency of each value
c = [i['realIP'].value_counts() for i in sql]        # clicks per IP, chunk by chunk
count3 = pd.concat(c).groupby(level=0).sum()         # merge the chunk results
count3 = pd.DataFrame({'realIP': count3.values})     # click count per IP (name it explicitly; value_counts naming varies across pandas versions)
count3[1] = 1                                        # helper column: one row per IP
count3 = count3.groupby('realIP').sum()              # number of users for each click count

# keep 1-7 clicks as-is and fold everything above into one row
# (DataFrame.append was removed in pandas 2.x, so use pd.concat)
count3_ = pd.concat([count3.iloc[:7, :],
                     count3.iloc[7:, :].sum().to_frame().T],
                    ignore_index=True)
count3_.index = list(range(1, 8)) + ['7次以上']
print(count3_)
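The bucketing above (counts 1 to 7 kept as-is, everything larger folded together) can be checked on toy data; the per-IP click totals here are invented for illustration:

```python
import pandas as pd

# Assumed click totals for six users (one value per IP).
count3 = pd.DataFrame({'clicks': [1, 1, 2, 3, 9, 15]})
count3['users'] = 1                                  # one row per IP
dist = count3.groupby('clicks')['users'].sum()       # users per click count

low = dist[dist.index <= 7]                          # 1-7 clicks, kept as-is
high = int(dist[dist.index > 7].sum())               # everything above, folded
print(low.to_dict(), high)
```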

Page ranking

  • From the analysis objectives, personalized recommendation mainly targets pages with an .html suffix, so the click counts of .html pages are computed from the raw data.
  • The table shows that among the top 20 pages by clicks, "regulation topic" pages make up the majority, followed by "knowledge" and then "consultation" pages.
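Filtering down to .html pages with str.contains can be sketched as follows; the URLs are hypothetical examples, not records from the project's dataset:

```python
import pandas as pd

# Assumed sample records; the recommendation step only covers .html pages.
df = pd.DataFrame({'fullURL': [
    'http://www.example.com/info/hunyin/a/201312303123.html',
    'http://www.example.com/ask/browse.php?page=2',
]})

# Keep only URLs containing ".html" (regex, so the dot is escaped).
html_only = df[df['fullURL'].str.contains(r'\.html')]
print(len(html_only))
```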

Click counts by category

Pagination page statistics

6. Summary

Analysis methods and process

  • Data extraction
    1. Build the database → import the data → set up the Python environment → analyze the data → build the model

  • Exploratory data analysis
    2. Page category analysis
    3. Page click count analysis
    4. Page ranking analysis

7. Complete Code

7.1 Code directory structure

7.2 Complete code

1 sql_value_counts.py

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mysql+pymysql://root:222850@127.0.0.1:3306/7law?charset=utf8')
sql = pd.read_sql('all_gzdata', engine, chunksize=10000)
'''
create_engine builds the connection; the connection string reads, in order,
"database dialect (mysql) + driver (pymysql) + user:password@host:port/database name",
with the encoding set to utf8 at the end.
all_gzdata is the table name, engine is the connection engine, and chunksize
reads 10,000 records at a time. At this point sql is an iterator; no data
has actually been read yet.
'''

counts = [i['fullURLId'].value_counts() for i in sql]     # count chunk by chunk
counts = pd.concat(counts).groupby(level=0).sum()         # merge the chunk results (group by index and sum)
counts = counts.reset_index()                             # turn the index into a regular column
counts.columns = ['index', 'num']                         # rename the columns (the second defaults to 0)
counts['type'] = counts['index'].str.extract(r'(\d{3})')  # first three digits = category id
counts['percent'] = counts['num'] / counts['num'].sum() * 100
counts_ = counts[['type', 'num', 'percent']].groupby('type').sum()  # merge by category
counts_ = counts_.sort_values('num', ascending=False)     # sort descending (assign back, or the sort is lost)
print(counts_)

2 ask_value_counts.py

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mysql+pymysql://root:222850@127.0.0.1:3306/7law?charset=utf8')
sql = pd.read_sql('all_gzdata', engine, chunksize=10000)

# Breakdown of category 101
def count101(i):  # per-chunk counting function
    j = i[['fullURLId']][i['fullURLId'].str.contains('101')].copy()  # records whose category contains 101
    return j['fullURLId'].value_counts()

counts2 = [count101(i) for i in sql]                   # count chunk by chunk
counts2 = pd.concat(counts2).groupby(level=0).sum()    # merge the chunk results
counts2 = pd.DataFrame(counts2)
counts2.columns = ['num']
counts2['percent'] = counts2['num'] / counts2['num'].sum() * 100
counts2 = counts2.sort_values('num', ascending=False)  # sort descending (assign back, or the sort is lost)

print(counts2)

3 know_value_counts.py

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mysql+pymysql://root:222850@127.0.0.1:3306/7law?charset=utf8')
sql = pd.read_sql('all_gzdata', engine, chunksize=10000)

# Breakdown of category 107
def count107(i):  # per-chunk counting function
    j = i[['fullURL']][i['fullURLId'].str.contains('107')].copy()  # records whose category contains 107
    j['type'] = None  # empty label column
    # use .loc to avoid chained-assignment warnings
    j.loc[j['fullURL'].str.contains('info/.+?/'), 'type'] = u'知识首页'
    j.loc[j['fullURL'].str.contains('info/.+?/.+?'), 'type'] = u'知识列表页'
    j.loc[j['fullURL'].str.contains(r'/\d+?_*\d+?\.html'), 'type'] = u'知识内容页'
    return j['type'].value_counts()

counts2 = [count107(i) for i in sql]                 # count chunk by chunk
counts2 = pd.concat(counts2).groupby(level=0).sum()  # merge the chunk results
counts2 = pd.DataFrame(counts2)
counts2.columns = ['num']
counts2['percent'] = counts2['num'] / counts2['num'].sum() * 100
print(counts2)

4 other_value_counts.py

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mysql+pymysql://root:222850@127.0.0.1:3306/7law?charset=utf8')
sql = pd.read_sql('all_gzdata', engine, chunksize=10000)

# Breakdown of category 1999001
def count1999001(i):  # per-chunk counting function
    j = i[['pageTitle']][i['fullURLId'].str.contains('1999001')].copy()  # records whose category contains 1999001
    j['type'] = u'其他'
    # use .loc to avoid chained-assignment warnings
    j.loc[(j['pageTitle'] != '') & j['pageTitle'].str.contains(u'快车-律师助手'), 'type'] = u'快车-律师助手'
    j.loc[(j['pageTitle'] != '') & j['pageTitle'].str.contains(u'免费发布法律咨询'), 'type'] = u'免费发布咨询'
    j.loc[(j['pageTitle'] != '') & j['pageTitle'].str.contains(u'咨询发布成功'), 'type'] = u'咨询发布成功'
    j.loc[(j['pageTitle'] != '') & j['pageTitle'].str.contains(u'快搜'), 'type'] = u'快搜'
    return j['type'].value_counts()

counts2 = [count1999001(i) for i in sql]             # count chunk by chunk
counts2 = pd.concat(counts2).groupby(level=0).sum()  # merge the chunk results
counts2 = pd.DataFrame(counts2)
counts2.columns = ['num']
counts2['percent'] = counts2['num'] / counts2['num'].sum() * 100
counts2 = counts2.sort_values('num', ascending=False)  # sort descending
print(counts2)

5 web_click_counts.py

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mysql+pymysql://root:222850@127.0.0.1:3306/7law?charset=utf8')
sql = pd.read_sql('all_gzdata', engine, chunksize=10000)

# Count clicks per user; value_counts gives the frequency of each value
c = [i['realIP'].value_counts() for i in sql]        # clicks per IP, chunk by chunk
count3 = pd.concat(c).groupby(level=0).sum()         # merge the chunk results
count3 = pd.DataFrame({'realIP': count3.values})     # click count per IP (name it explicitly; value_counts naming varies across pandas versions)
count3[1] = 1                                        # helper column: one row per IP
count3 = count3.groupby('realIP').sum()              # number of users for each click count

# keep 1-7 clicks as-is and fold everything above into one row
# (DataFrame.append was removed in pandas 2.x, so use pd.concat)
count3_ = pd.concat([count3.iloc[:7, :],
                     count3.iloc[7:, :].sum().to_frame().T],
                    ignore_index=True)
count3_.index = list(range(1, 8)) + ['7次以上']
print(count3_)

# Break down the 7+ group further; most of those users browsed 8-100 times
counts3_7 = pd.concat([count3.iloc[7:100, :].sum(),
                       count3.iloc[100:300, :].sum(),
                       count3.iloc[300:, :].sum()])
counts3_7.index = ['8-100', '101-300', '301以上']
counts3_7df = pd.DataFrame(counts3_7)
counts3_7df.index.name = '点击次数'
counts3_7df.columns = ['用户数']
print(counts3_7df)

6 web_sort.py

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mysql+pymysql://root:222850@127.0.0.1:3306/7law?charset=utf8')
sql = pd.read_sql('all_gzdata', engine, chunksize=10000)

counts4 = [i[['realIP', 'fullURL', 'fullURLId']] for i in sql]  # keep the needed columns per chunk
counts4_ = pd.concat(counts4)
a = counts4_[counts4_['fullURL'].str.contains(r'\.html')]       # keep only .html pages
print(a.head())
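To go from the filtered records to the click ranking described in the text, value_counts on the URL column sorts pages by click count; a toy sketch with invented URLs (head(2) here standing in for the top-20 table):

```python
import pandas as pd

# Assumed toy records after the .html filter above.
a = pd.DataFrame({'fullURL': [
    'http://x/info/a/1.html',
    'http://x/info/a/1.html',
    'http://x/faguizt/b/2.html',
]})

# value_counts returns pages sorted by clicks, descending.
ranking = a['fullURL'].value_counts().head(2)
print(ranking.to_dict())
```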

