Optimizing the Hadoop Implementation of ItemCF (Python)

Published: 2025/4/5 | python | 豆豆

The raw data looks like this, one user per line, followed by the comma-separated items that user interacted with:

u1 a,d,b,c
u2 a,a,c
u3 b,d
u4 a,d,c
u5 a,b,c
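The similarity formula used is the Jaccard coefficient, where U(i) is the set of users who interacted with item i:

sim(i, j) = |U(i) ∩ U(j)| / |U(i) ∪ U(j)|

where:

|U(i) ∪ U(j)| = |U(i)| + |U(j)| - |U(i) ∩ U(j)|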
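As a reference point for the MapReduce version, the formula can be checked in memory on the sample data. This is only a minimal sketch: the names `LOG`, `item_users`, and `jaccard` are made up for illustration and do not appear in the original scripts. Unlike the MapReduce pipeline, it also reports pairs that never co-occur (with similarity 0), although every pair in the sample data does co-occur.

```python
from collections import defaultdict
from itertools import combinations

# Sample log from the article: user, then comma-separated items.
LOG = """u1 a,d,b,c
u2 a,a,c
u3 b,d
u4 a,d,c
u5 a,b,c"""


def item_users(log):
    """Build U(i): the set of users seen with each item."""
    users = defaultdict(set)
    for line in log.splitlines():
        user, item_str = line.split()
        for item in item_str.split(','):
            users[item].add(user)  # a set, so repeated items count once
    return users


def jaccard(users):
    """Return {(i, j): sim} for every item pair with i < j."""
    sims = {}
    for i, j in combinations(sorted(users), 2):
        dot = len(users[i] & users[j])               # |U(i) ∩ U(j)|
        union = len(users[i]) + len(users[j]) - dot  # |U(i) ∪ U(j)|
        sims[(i, j)] = float(dot) / union
    return sims


sims = jaccard(item_users(LOG))
for pair in sorted(sims):
    print(pair, sims[pair])
```

On the sample data, items a and c are seen by exactly the same four users, so their similarity is 1.0; b and d share two of their four distinct users, giving 0.5.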

The original Hadoop implementation needed five rounds of MapReduce; after optimization, two rounds are enough.

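The earlier version needed so many rounds mainly because computing |U(i) ∪ U(j)| required changing the key several times, not because the computation itself is heavy. By changing which key is passed between stages, the whole job fits in two rounds.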

mapper_1.py

#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys

for line in sys.stdin:
    user, item_str = line.strip().split()
    # Deduplicate and sort so each pair is emitted once, with i1 < i2.
    item_list = sorted(set(item_str.split(',')))
    for i, i1 in enumerate(item_list):
        # This user adds 1 to |U(i1)|.
        print("%s 1 norm" % i1)
        for i2 in item_list[i + 1:]:
            # This user adds 1 to the intersection of U(i1) and U(i2).
            print("%s %s 1 dot" % (i1, i2))
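To make the two record types concrete, the sketch below wraps the same logic in a function and applies it to one input line. The function name `map_record` is hypothetical and only used for illustration; it returns the `norm` and `dot` records instead of printing them, and leaves out any debug output.

```python
def map_record(line):
    """Return the 'norm' and 'dot' records for one input line."""
    user, item_str = line.strip().split()
    item_list = sorted(set(item_str.split(',')))
    out = []
    for i, i1 in enumerate(item_list):
        out.append("%s 1 norm" % i1)             # counts toward |U(i1)|
        for i2 in item_list[i + 1:]:
            out.append("%s %s 1 dot" % (i1, i2))  # counts toward the pair (i1, i2)
    return out


for row in map_record("u2 a,a,c"):
    print(row)
```

For `u2 a,a,c` the duplicate `a` is dropped, so the record yields one `norm` line each for a and c plus a single `dot` line for the pair a, c.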
reducer_1.py

#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys


def PrintOut():
    # Emit the norm of i1, then every pair count together with that norm.
    i1 = old_key
    print("%s %d norm" % (i1, old_dict['norm']))
    for i2 in old_dict['dot']:
        print("%s-%s %d %d dot-norm_i1" % (i1, i2, old_dict['dot'][i2], old_dict['norm']))


old_key = ""
old_dict = {'norm': 0, 'dot': {}}
for line in sys.stdin:
    sp = line.strip().split()
    if not sp:
        continue
    if sp[-1] == 'norm':
        key, value = sp[:2]
        if key == old_key:
            old_dict['norm'] += int(value)
        else:
            if old_key != "":
                PrintOut()
            old_key = key
            # Notice: norm part should be int(value)
            old_dict = {'norm': int(value), 'dot': {}}
    elif sp[-1] == 'dot':
        key, i2, value = sp[:3]
        if key == old_key:
            if i2 not in old_dict['dot']:
                old_dict['dot'][i2] = 0
            old_dict['dot'][i2] += int(value)
        else:
            # Fixed: the original tested an undefined name (old_dot_key) here
            # and stored the dot count as the norm.
            if old_key != "":
                PrintOut()
            old_key = key
            old_dict = {'norm': 0, 'dot': {i2: int(value)}}
if old_key != "":
    PrintOut()
mapper_2.py

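#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys

for line in sys.stdin:
    sp = line.strip().split()
    if not sp:
        continue
    if sp[-1] == 'norm':
        # Pass item norms through unchanged.
        print(line.strip())
    elif sp[-1] == "dot-norm_i1":
        key, dot, norm_i1 = sp[:3]
        i1, i2 = key.split('-')
        # Re-key by i2 so the record meets the norm of i2 in the reducer.
        print("%s %s %s %s dot-norm_i1" % (i2, i1, dot, norm_i1))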
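The re-keying step is what saves the extra rounds, so here is a small sketch of its effect. The helper name `rekey` and the three input lines are illustrative (the lines are what round one would produce for items a and b on the sample data), and Python's `sorted` stands in for the shuffle, assuming a plain byte-order sort.

```python
def rekey(line):
    """Mimic mapper_2.py for one line; assumes only the two known tags occur."""
    sp = line.strip().split()
    if sp[-1] == 'norm':
        return line.strip()
    key, dot, norm_i1 = sp[:3]
    i1, i2 = key.split('-')
    return "%s %s %s %s dot-norm_i1" % (i2, i1, dot, norm_i1)


# A slice of round-one output for the sample data.
round1 = ["a 4 norm", "a-b 2 4 dot-norm_i1", "b 3 norm"]

# Sorting stands in for the shuffle between mapper_2 and reducer_2.
shuffled = sorted(rekey(l) for l in round1)
for row in shuffled:
    print(row)
```

After the sort, the pair record for a and b sits directly under the `norm` line of b, so the second reducer sees the intersection count, the norm of a, and the norm of b within one key group.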
reducer_2.py

#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys


def GenSim(norm_i1, norm_i2, dot):
    # Jaccard: intersection / (norm_i1 + norm_i2 - intersection)
    return float(dot) / (int(norm_i1) + int(norm_i2) - int(dot))


def PrintOut():
    i2 = old_key
    norm_i2 = old_dict['norm']
    for i1 in old_dict['dot']:
        dot, norm_i1 = old_dict['dot'][i1]
        sim = GenSim(norm_i1, norm_i2, dot)
        print("%s-%s %s %s %s %s dot,norm_i1,norm_i2,sim"
              % (i1, i2, dot, norm_i1, norm_i2, sim))


old_key = ""
old_dict = {'norm': "", 'dot': {}}
for line in sys.stdin:
    sp = line.strip().split()
    if not sp:
        continue
    if sp[-1] == 'norm':
        key, value = sp[:2]
        if key == old_key:
            old_dict['norm'] = value
        else:
            if old_key != "":
                PrintOut()
            old_key = key
            old_dict = {'norm': value, 'dot': {}}
    elif sp[-1] == 'dot-norm_i1':
        key, i1, dot, norm_i1 = sp[:4]  # key is i2.
        if key == old_key:
            if i1 not in old_dict['dot']:
                old_dict['dot'][i1] = (dot, norm_i1)
        else:
            if old_key != "":
                PrintOut()
            old_key = key
            # Fixed: the original reused a stale 'value' from a previous key;
            # the norm of i2 is unknown until its 'norm' line arrives.
            old_dict = {'norm': "", 'dot': {i1: (dot, norm_i1)}}
if old_key != "":
    PrintOut()
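As a quick check of the arithmetic in the second reducer, the sketch below processes the key group for item b by hand. The variable names and the two-line `group` are illustrative assumptions based on the sample data; `gen_sim` repeats the `GenSim` formula from reducer_2.py.

```python
def gen_sim(norm_i1, norm_i2, dot):
    """Same formula as GenSim in reducer_2.py."""
    return float(dot) / (int(norm_i1) + int(norm_i2) - int(dot))


# Key group for item b after the round-two shuffle (sample data, pair a-b only).
group = ["b 3 norm", "b a 2 4 dot-norm_i1"]

norm_i2 = None
pairs = []
for line in group:
    sp = line.split()
    if sp[-1] == 'norm':
        norm_i2 = sp[1]                    # |U(b)|
    else:
        _, i1, dot, norm_i1 = sp[:4]
        pairs.append((i1, dot, norm_i1))   # intersection count and |U(i1)|

results = {i1 + "-b": gen_sim(n1, norm_i2, dot) for i1, dot, n1 in pairs}
print(results)
```

Here a and b share 2 users, with norms 4 and 3, so the union has 4 + 3 - 2 = 5 users and the similarity is 2 / 5 = 0.4.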
Run script t.sh, which simulates both rounds locally with pipes and sort:

#!/bin/bash
cat user_log.txt | ./mapper_1.py | sort -k1 > d.m.1
cat d.m.1 | ./reducer_1.py > d.r.1
cat d.r.1 | ./mapper_2.py | sort -k1 > d.m.2
cat d.m.2 | ./reducer_2.py > d.r.2




