Applying the K-Nearest-Neighbors Algorithm: Price Prediction
1. Constructing the Dataset
```python
from random import random, randint

# Estimate a wine's price from its rating and age
def wineprice(rating, age):
    peak_age = rating - 50
    price = rating / 2
    if age > peak_age:
        # Past its peak year, the wine loses value
        price = price * (5 - (age - peak_age))
    else:
        # Approaching its peak year, the wine gains value
        price = price * (5 * (age + 1) / peak_age)
    if price < 0:
        price = 0
    return price

# Build a dataset. Inputs: rating, age; output: price
def wineset1():
    rows = []
    for i in range(300):
        rating = random() * 50 + 50
        age = random() * 50
        price = wineprice(rating, age)
        price *= (random() * 0.4 + 0.8)
        rows.append({'input': (rating, age), 'result': price})
    return rows

# Build a dataset. Inputs: rating, age, aisle, bottlesize; output: price
def wineset2():
    rows = []
    for i in range(300):
        rating = random() * 50 + 50
        age = random() * 50
        aisle = float(randint(1, 20))
        bottlesize = [375.0, 750.0, 1500.0, 3000.0][randint(0, 3)]
        price = wineprice(rating, age)
        price *= (bottlesize / 750)
        price *= (random() * 0.9 + 0.2)
        rows.append({'input': (rating, age, aisle, bottlesize), 'result': price})
    return rows
```
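As a quick sanity check on the pricing model, the snippet below (restating `wineprice` from above so it runs on its own) confirms that a wine's value climbs toward its peak year, drops afterwards, and is clamped at zero once it is far past its peak:

```python
# Same pricing model as above: value peaks at age = rating - 50
def wineprice(rating, age):
    peak_age = rating - 50
    price = rating / 2
    if age > peak_age:
        price = price * (5 - (age - peak_age))  # past the peak: value drops
    else:
        price = price * (5 * (age + 1) / peak_age)  # before the peak: value rises
    if price < 0:
        price = 0
    return price

# A 95-rated wine peaks at 45 years
print(wineprice(95.0, 10.0))  # young: well below peak value
print(wineprice(95.0, 45.0))  # at the peak: maximum value
print(wineprice(95.0, 48.0))  # a few years past the peak: value drops
print(wineprice(95.0, 51.0))  # far past the peak: clamped to 0
```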
2. Distance Measurement

```python
from math import sqrt

# To make the distance measure more meaningful, shrink or stretch the data
# according to how much influence each dimension should have
def rescale(data, scale):
    scaleddata = []
    for row in data:
        scaled = [scale[i] * row['input'][i] for i in range(len(scale))]
        scaleddata.append({'input': scaled, 'result': row['result']})
    return scaleddata

# Euclidean distance between two vectors
def euclidean(v1, v2):
    d = 0.0
    for i in range(len(v1)):
        d += (v1[i] - v2[i]) ** 2
    return sqrt(d)

# Sort the records in data by their Euclidean distance to vec1, ascending
def getdistances(data, vec1):
    distancelist = []
    for i in range(len(data)):
        vec2 = data[i]['input']
        distancelist.append((euclidean(vec1, vec2), i))
    distancelist.sort()
    return distancelist
```
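The rescaling step is what lets the distance measure ignore irrelevant fields. A minimal sketch with hand-picked toy vectors (restating the Euclidean distance from above in compact form): scaling the aisle dimension by 0 makes two wines that differ only in aisle come out at distance zero.

```python
from math import sqrt

# Same Euclidean distance as above, written compactly
def euclidean(v1, v2):
    return sqrt(sum((v1[i] - v2[i]) ** 2 for i in range(len(v1))))

# What rescale applies to each row's input vector
def scale_vec(v, scale):
    return [scale[i] * v[i] for i in range(len(scale))]

# Two wines identical except for the aisle they sit in
a = (90.0, 10.0, 3.0, 750.0)
c = (90.0, 10.0, 18.0, 750.0)
print(euclidean(a, c))  # 15.0: the raw distance is driven purely by aisle

scale = [10, 10, 0, 0.5]
print(euclidean(scale_vec(a, scale), scale_vec(c, scale)))  # 0.0: aisle no longer matters
```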
3. K-Nearest-Neighbor Prediction

```python
import math

# Predict the price as the average over the k nearest neighbors
def knnestimate(data, vec1, k=5):
    dlist = getdistances(data, vec1)
    avg = 0.0
    for i in range(k):
        idx = dlist[i][1]
        avg += data[idx]['result']
    avg = avg / k
    return avg

# Three ways to weight the k neighbors so that the farther a neighbor is
# from the item being predicted, the smaller its influence.

# Inverse function: sensitive to very near and very far neighbors,
# especially to very close "noisy" records
def inverseweight(dist, num=1.0, const=0.1):
    return num / (dist + const)

# Decreasing linear function: can easily drive every weight to zero
def subtractweight(dist, const=1.0):
    if dist > const:
        return 0
    else:
        return const - dist

# Gaussian function (hats off to the great Gauss!)
def gaussian(dist, sigma=10.0):
    return math.e ** (-dist ** 2 / (2 * sigma ** 2))

# k-nearest-neighbor prediction with distance-based weights
def weightedknn(data, vec1, k=5, weightf=gaussian):
    dlist = getdistances(data, vec1)
    avg = 0.0
    totalweight = 0.0
    for i in range(k):
        dist = dlist[i][0]
        idx = dlist[i][1]
        weight = weightf(dist)
        avg += weight * data[idx]['result']
        totalweight += weight
    avg = avg / totalweight
    return avg
```
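Comparing the three weight functions at a few distances (restated from above so the block is self-contained) makes their trade-offs concrete:

```python
import math

def inverseweight(dist, num=1.0, const=0.1):
    return num / (dist + const)

def subtractweight(dist, const=1.0):
    return 0 if dist > const else const - dist

def gaussian(dist, sigma=10.0):
    return math.e ** (-dist ** 2 / (2 * sigma ** 2))

# Weight given to a neighbor at each distance
for d in (0.0, 0.5, 2.0, 20.0):
    print(d, inverseweight(d), subtractweight(d), gaussian(d))
```

With the defaults, inverseweight spikes to 10 at distance 0, subtractweight is exactly 0 for any distance beyond const (so all k weights can vanish at once), and gaussian decays smoothly but never reaches 0, which is why weightedknn can never divide by a zero totalweight when it uses gaussian.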
4. Cross-Validation

```python
# Split the data into a training set and a test set
def dividedata(data, test=0.05):
    trainset = []
    testset = []
    for row in data:
        if random() < test:
            testset.append(row)
        else:
            trainset.append(row)
    return trainset, testset

# Predict with the training set, then compute the mean squared error on the test set
def testalgorithm(algf, trainset, testset):
    error = 0.0
    for row in testset:
        guess = algf(trainset, row['input'])
        error += (row['result'] - guess) ** 2
    return error / len(testset)

# Cross-validate: repeat the random split N times and average the error
def crossvalidate(algf, data, trials=100, test=0.05):
    error = 0.0
    for i in range(trials):
        trainset, testset = dividedata(data, test)
        error += testalgorithm(algf, trainset, testset)
    return error / trials

# k-nearest-neighbor predictors with k fixed at 3 or 1
def knn3(d, v): return knnestimate(d, v, k=3)
def knn1(d, v): return knnestimate(d, v, k=1)
def wknn3(d, v): return weightedknn(d, v, k=3)
def wknn1(d, v): return weightedknn(d, v, k=1)
```
5. Testing

```python
data = wineset2()
data = rescale(data, [10, 10, 0, 0.5])
print(crossvalidate(weightedknn, data))
print(crossvalidate(wknn3, data))
print(crossvalidate(wknn1, data))
```

Output:
```
7553.2723966454005
6079.271962207396
7672.686618173495
```

The three runs use k=5 (the weightedknn default), k=3, and k=1. As the results show, the prediction error grows when k is either too large or too small: a small k is prone to overfitting, while a large k lets irrelevant, more distant records pull the estimate away.
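The overfitting effect of a small k can be reproduced on a tiny hand-built dataset with one mispriced record (restating knnestimate and the helpers it needs from above, so the block runs on its own):

```python
from math import sqrt

def euclidean(v1, v2):
    return sqrt(sum((v1[i] - v2[i]) ** 2 for i in range(len(v1))))

def getdistances(data, vec1):
    # (distance, index) pairs sorted by distance, as in the section above
    return sorted((euclidean(vec1, row['input']), i) for i, row in enumerate(data))

def knnestimate(data, vec1, k=5):
    dlist = getdistances(data, vec1)
    return sum(data[idx]['result'] for _, idx in dlist[:k]) / k

# The true price is 100 everywhere, but one record is mispriced at 40
data = [{'input': (1.0,), 'result': 100.0},
        {'input': (2.0,), 'result': 40.0},   # the noisy record
        {'input': (3.0,), 'result': 100.0},
        {'input': (4.0,), 'result': 100.0}]

print(knnestimate(data, (2.1,), k=1))  # 40.0: k=1 latches onto the noisy record
print(knnestimate(data, (2.1,), k=3))  # 80.0: averaging over 3 neighbors dilutes the noise
```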
6. Optimization
I'll come back and fill this in later.