[Learning to Rank] The Listwise Approach in Learning to Rank: ListNet Explained and Implemented
Copyright notice: this is an original article by the blogger. When reposting, please cite the CSDN blog source address. Let's learn together and improve together~
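Contents: I. Introduction to Listwise Learning to Rank  II. Introduction to the ListNet Algorithm  III. Java Implementation of ListNet  IV. Summary

The previous article, "Source Implementation of the Pointwise PRank Algorithm in Learning to Rank", covered the implementation of the pointwise PRank algorithm. This article covers the Listwise Approach and the neural-network-based ListNet algorithm together with a Java implementation. It includes:
    1. Introduction to list-based learning to rank (Listwise)
    2. Introduction to the ListNet algorithm
    3. Java implementation of ListNet

In LTR, the pointwise (single-document) approach treats each document in the training set as one training instance; the pairwise (document-pair) approach treats any two documents in the search results of the same query as one training instance; the listwise (document-list) approach treats the whole list of search results of a query as one training instance.

I. Introduction to Listwise Learning to Rank

The Listwise approach treats the scores of all search results of one query as a single instance and trains an optimal scoring function from such instances. Consider the following dataset (described in detail in the previous article):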
===============================================================================
0 qid:10 1:0.000000 2:0.000000 3:0.000000 ... 45:0.000000 46:0.000000 #docid =
1 qid:10 1:0.031310 2:0.666667 3:0.500000 ... 45:0.448276 46:0.000000 #docid =
1 qid:10 1:0.078682 2:0.166667 3:0.500000 ... 45:1.000000 46:0.000000 #docid =
0 qid:50 1:0.044248 2:0.400000 3:0.333333 ... 45:0.622951 46:0.000000 #docid =
2 qid:50 1:0.764381 2:0.200000 3:0.000000 ... 45:0.252874 46:0.000000 #docid =
1 qid:50 1:0.693584 2:0.000000 3:0.000000 ... 45:0.275862 46:0.000000 #docid =
===============================================================================
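The Listwise Approach trains on all documents of qid=10 as one instance, that is, one query together with the scores of all of its search results forms one training instance. Once a final scoring function F has been trained, a new query from the test set is handled by letting F score every candidate document and sorting the documents by score from high to low; the sorted list is the search result.

Below is one way of training, based on the probability distribution over the possible orderings of the search results (figure from Chapter 5 of 《这就是搜索引擎:核心技术详解》 by Zhang Junlin).

Suppose the user enters query Q1 and the returned result set contains three documents A, B and C. The search engine has to order them, and three documents can be ordered in 6 ways: ABC, ACB, BAC, BCA, CAB and CBA. Each permutation is one possible ranking of the search results.

Think of the function g as the optimal scoring function (human judgments). For query Q1, document A gets 6 points, document B gets 4 points and document C gets 3 points. Our task is to find a function whose scoring of the results for Q1 is as close as possible to the reference function g. The functions f and h are two actual scoring functions. Each scoring function induces a probability distribution over the 6 permutations; comparing the KL distance between each of these distributions and the one induced by g shows that f is closer to the hypothetical optimal function g than h is, so f is chosen as the scoring function for search.

The main Listwise algorithms include AdaRank, SVM-MAP, ListNet and LambdaMART.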
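The comparison described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the scores 6, 4, 3 for g come from the text, while the scores used for f and h are hypothetical, and exp() is assumed as the positive increasing transform of the scores. The class and method names are made up for this sketch.

```java
class LucePermutationSketch {

    // Probability of one permutation under the Luce model with phi = exp.
    // The scores are passed in ranking order: first-ranked document first.
    static double permutationProbability(double[] scoresInRankOrder) {
        double prob = 1.0;
        for (int t = 0; t < scoresInRankOrder.length; t++) {
            double denom = 0.0;
            for (int l = t; l < scoresInRankOrder.length; l++) {
                denom += Math.exp(scoresInRankOrder[l]);
            }
            prob *= Math.exp(scoresInRankOrder[t]) / denom;
        }
        return prob;
    }

    // Distribution over the 6 orderings of three documents,
    // in the fixed order ABC, ACB, BAC, BCA, CAB, CBA.
    static double[] distribution(double a, double b, double c) {
        double[][] orders = {
            {a, b, c}, {a, c, b}, {b, a, c}, {b, c, a}, {c, a, b}, {c, b, a}
        };
        double[] dist = new double[orders.length];
        for (int i = 0; i < orders.length; i++) {
            dist[i] = permutationProbability(orders[i]);
        }
        return dist;
    }

    // KL divergence KL(p || q) = sum_i p_i * log(p_i / q_i)
    static double kl(double[] p, double[] q) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            sum += p[i] * Math.log(p[i] / q[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] g = distribution(6, 4, 3); // reference scores from the text
        double[] f = distribution(5, 4, 3); // hypothetical scores, same order as g
        double[] h = distribution(3, 4, 6); // hypothetical scores, reversed order
        System.out.println("KL(g || f) = " + kl(g, f));
        System.out.println("KL(g || h) = " + kl(g, h));
    }
}
```

With these made-up scores, f orders the documents the same way as g and therefore ends up with the smaller KL distance, which is the selection criterion described above.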
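II. Introduction to the ListNet Algorithm

Pointwise learning to rank treats each document in the training set as one sample for learning the ranking function. Its main idea is to convert the ranking problem into classification or regression on single documents, as in PRank.

Pairwise learning to rank (covered in the next article) treats two documents of the same query with different relevance labels as one sample. Its main idea is to convert the ranking problem into binary classification, as in RankNet.

Listwise learning to rank treats the whole document list as one sample. It is mainly realised in two ways: directly optimising an information-retrieval evaluation measure, or defining a loss function on lists. ListNet introduces the Luce model into learning to rank to represent document lists, and most neural-network-based ranking algorithms also use the Luce model to represent the ordering of a list (the Luce model expresses any permutation of a list as a single probability value).

References for the ListNet algorithm:
    《Learning to Rank: From Pairwise Approach to Listwise Approach》
    《基于神经网络的Listwise排序学习方法的研究》 (Research on Neural-Network-Based Listwise Learning-to-Rank Methods), by Lin Yuan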
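The steps of the algorithm are explained below.

1. Read the training set train.txt. {x, y} denotes the sample documents of a query: y is the relevance label (the 46-dimensional Microsoft dataset has 3 levels: 0 = not relevant, 1 = partially relevant, 2 = fully relevant) and x holds the features and their values. Note that m is the number of qids, so there are x(1) ... x(m), and each x(i) contains several sample documents.

2. Initialisation. Set the number of iterations T (30 here) and the learning rate (ita can be 0.003, 0.001, 0.03, 0.01 and so on), and initialise the weights w.

3. Two nested loops. The outer loop runs over the iterations: for t = 1 to T do. The inner loop runs over all queries (the number of qids): for i = 1 to m do.

4. Compute the scores of the documents of the current query with the current weights w. The weight vector w[46] is a one-dimensional array with one entry per feature, and f(w) = w * x.

5. Compute the gradient vector delta_w (46 dimensions). For a linear scoring function, the gradient given in the cited ListNet paper is:

$$\Delta w = -\sum_{j=1}^{n^{(i)}} P_{y^{(i)}}\big(x_j^{(i)}\big)\, x_j^{(i)} + \frac{1}{\sum_{j=1}^{n^{(i)}} \exp\big(f_w(x_j^{(i)})\big)} \sum_{j=1}^{n^{(i)}} \exp\big(f_w(x_j^{(i)})\big)\, x_j^{(i)}$$

Here n(i) is the total number of documents of query qid=i and j indexes the current document of qid=i. The superscript of x is the qid and the subscript is the document index. P is the function that computes a probability. Under the Luce model, for three documents with scores s1, s2 and s3:

$$P(\pi) = \frac{\phi(s_1)}{\phi(s_1)+\phi(s_2)+\phi(s_3)} \cdot \frac{\phi(s_2)}{\phi(s_2)+\phi(s_3)} \cdot \frac{\phi(s_3)}{\phi(s_3)}$$

This is the probability that S1 is ranked first, S2 second and S3 third. In this way the Luce model expresses one ordering of a list as a single probability value. In practice exp() is used for phi, mainly because it guarantees a positive, increasing value.

Enumerating all N! permutations is clearly far too expensive, so the Top-K probability was proposed: the probability of the first k positions is used to approximate the probability of the whole list, trading some precision for running time. The Top-K probability is:

$$P\big(G_k(j_1,\dots,j_k)\big) = \prod_{t=1}^{k} \frac{\exp(s_{j_t})}{\sum_{l=t}^{n} \exp(s_{j_l})}$$

The Java code below uses Top-1, that is, the probability that the current document is ranked first, so P in the gradient is the Top-1 probability computed from the labels y.

6. Update the weights w inside the loop.

7. Finally output the weights w[46]; training is finished. The model can then be used for prediction: score each line of test.txt with w * x and sort from high to low.

PS: this is only my personal understanding after reading the two papers; if there are mistakes or omissions, discussion is welcome. Thanks also to my classmates XP and MT; it was by discussing and sharing together that we came to understand parts of the ListNet algorithm and the code.

III. Java Implementation of ListNet

(PS: many thanks to my group leader XP and to MT for this part of the code. Their help along my programming path will last a lifetime, and I hope to find more teachers and close friends to guide me in my future work~)

The code is commented in detail and follows the steps above. The left figure is the main function, which reads and parses the file, writes data, learns the ranking model and predicts scores; the right figure is the core learning-to-rank algorithm.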
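Before the full listing, the Top-1 probability from step 5 and the loss it is used in can be looked at in isolation. The sketch below assumes the loss of the cited ListNet paper, the cross entropy between the Top-1 distribution of the labels and that of the model scores; the class name, method names and the example scores are hypothetical. Subtracting the maximum score before calling exp() is a standard numerical safeguard and is not part of the article's code.

```java
class Top1LossSketch {

    // Top-1 probability of each document: exp(s_j) / sum_k exp(s_k).
    // The maximum is subtracted first so that exp() cannot overflow;
    // this does not change the resulting probabilities.
    static double[] top1(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            max = Math.max(max, s);
        }
        double[] p = new double[scores.length];
        double sum = 0.0;
        for (int j = 0; j < scores.length; j++) {
            p[j] = Math.exp(scores[j] - max);
            sum += p[j];
        }
        for (int j = 0; j < p.length; j++) {
            p[j] /= sum;
        }
        return p;
    }

    // Cross entropy between the label-based and the model-based Top-1 distributions.
    static double crossEntropy(double[] labels, double[] modelScores) {
        double[] py = top1(labels);
        double[] pz = top1(modelScores);
        double loss = 0.0;
        for (int j = 0; j < py.length; j++) {
            loss -= py[j] * Math.log(pz[j]);
        }
        return loss;
    }

    public static void main(String[] args) {
        double[] labels = {2, 1, 0};              // relevance levels of three documents
        double[] agreeing = {3.0, 1.5, 0.2};      // hypothetical scores, same order as the labels
        double[] reversed = {0.2, 1.5, 3.0};      // hypothetical scores, opposite order
        System.out.println("loss (agreeing) = " + crossEntropy(labels, agreeing));
        System.out.println("loss (reversed) = " + crossEntropy(labels, reversed));
    }
}
```

The gradient in step 5 is the derivative of this loss with respect to w when the score is the linear function w * x, so lowering the loss moves the model's Top-1 distribution towards the one given by the labels.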
The code is as follows:
package listNet_xiuzhang;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;

public class listNet {

    // Next free row index; rows are stored from index 1
    private static int sumLabel;
    // Feature values, 46 per document (indices 1-46)
    private static double feature[][] = new double[100000][48];
    // Feature weights, 46 in total (indices 1-46)
    private static double weight[] = new double[48];
    // Relevance label with three levels 0-2, stored from index 1
    private static int label[] = new int[1000000];
    // Query id, stored from index 1
    private static int qid[] = new int[1000000];
    // Number of documents of each qid
    private static int doc_ofQid[] = new int[100000];

    private static int ITER_NUM = 30; // number of iterations
    private static int weidu = 46;    // number of features
    private static int qid_Num = 0;   // number of qids
    private static int tempQid = -1;  // qid of the previous line
    private static int tempDoc = 0;   // running document count of the current qid

    /**
     * Reads the file and parses the data.
     * @param filePath path of the file
     */
    public static void ReadTxtFile(String filePath) {
        try {
            String encoding = "GBK";
            File file = new File(filePath);
            if (file.isFile() && file.exists()) { // check that the file exists
                InputStreamReader read = new InputStreamReader(new FileInputStream(file), encoding);
                BufferedReader bufferedReader = new BufferedReader(read);
                String lineTxt = null;
                sumLabel = 1; // start recording at index 1
                // Read the data line by line and split each line
                while ((lineTxt = bufferedReader.readLine()) != null) {
                    // Fields are separated by single spaces
                    String arrays[] = lineTxt.split(" ");
                    for (int i = 0; i < arrays.length; i++) {
                        if (i == 0) { // label of this sample
                            label[sumLabel] = Integer.parseInt(arrays[0]);
                        } else if (i >= weidu + 2) { // skip from '#' onwards: 0 = label, 1 = qid, 2-47 = features
                            continue;
                        } else {
                            String subArrays[] = arrays[i].split(":"); // feature:value
                            if (i == 1) { // qid
                                // Is this a new qid?
                                if (tempQid != Integer.parseInt(subArrays[1])) {
                                    if (tempQid != -1) { // not the very first qid
                                        // Store the document count of the previous qid
                                        doc_ofQid[qid_Num] = tempDoc;
                                        tempDoc = 0;
                                    }
                                    // A new qid starts: advance the qid index
                                    qid_Num++;
                                    tempQid = Integer.parseInt(subArrays[1]);
                                }
                                tempDoc++; // one more document for this qid
                                qid[sumLabel] = Integer.parseInt(subArrays[1]);
                            } else { // one of the 46 feature values
                                int number = Integer.parseInt(subArrays[0]); // feature index 1-46
                                double value = Double.parseDouble(subArrays[1]);
                                feature[sumLabel][number] = value;
                            }
                        }
                    }
                    sumLabel++;
                }
                doc_ofQid[qid_Num] = tempDoc; // document count of the last qid
                read.close();
            } else {
                System.out.println("Cannot find the specified file\n");
            }
        } catch (Exception e) {
            System.out.println("Error reading the file");
            e.printStackTrace();
        }
    }

    /**
     * Learning to rank:
     * trains the model and obtains the 46 weights.
     */
    public static void LearningToRank() {
        double yita = 0.00003; // learning rate

        // Initialise the weights: indices 1-46 are used, 0 and 47 are unused
        for (int i = 0; i < weidu + 2; i++) {
            weight[i] = 1.0;
        }
        System.out.println("training...");

        for (int iter = 0; iter < ITER_NUM; iter++) { // loop over T iterations
            System.out.println("---iteration: " + iter);
            int now_doc = 0; // global index of the last document of the previous qid
            for (int i = 1; i <= qid_Num; i++) { // loop over the m queries
                double delta_w[] = new double[weidu + 2]; // gradient vector, 46 entries
                int doc_of_i = doc_ofQid[i];              // number of documents of this qid

                /* Step 1: scores f(w); one score per document of this qid */
                double fw[] = new double[doc_of_i + 2];
                for (int k = 1; k <= doc_of_i; k++) {
                    for (int p = 1; p <= weidu; p++) {
                        fw[k] = fw[k] + weight[p] * feature[now_doc + k][p];
                    }
                }

                /*
                 * Step 2: gradient vector delta_w
                 *   a = sum_k P_y(k) * x_k       (vector)
                 *   b = sum_k exp(f(x_k))        (number)
                 *   c = sum_k exp(f(x_k)) * x_k  (vector)
                 *   delta_w = -a + (1/b) * c     (vector)
                 */
                double[] a = new double[weidu + 2], c = new double[weidu + 2];
                double b = 0.0;

                // a: P_y(k) is the top-1 probability of document k under the
                // ground-truth labels, exp(y_k) / (exp(y_1) + ... + exp(y_n))
                double labelSum = 0.0;
                for (int m = 1; m <= doc_of_i; m++) {
                    labelSum = labelSum + Math.exp(label[now_doc + m]);
                }
                for (int k = 1; k <= doc_of_i; k++) {
                    double p = Math.exp(label[now_doc + k]) / labelSum;
                    for (int q = 1; q <= weidu; q++) {
                        a[q] = a[q] + p * feature[now_doc + k][q];
                    }
                }

                // b: sum of exp(score) over the documents of this qid
                for (int k = 1; k <= doc_of_i; k++) {
                    b = b + Math.exp(fw[k]);
                }

                // c: sum of exp(score) * feature vector
                for (int k = 1; k <= doc_of_i; k++) {
                    for (int q = 1; q <= weidu; q++) {
                        c[q] = c[q] + Math.exp(fw[k]) * feature[now_doc + k][q];
                    }
                }

                // Gradient: delta_w = -a + (1/b) * c
                for (int q = 1; q <= weidu; q++) {
                    delta_w[q] = (-1) * a[q] + ((1.0 / b) * c[q]);
                }

                /* Step 3: update the weights */
                for (int k = 1; k <= weidu; k++) {
                    weight[k] = weight[k] - yita * delta_w[k];
                }
                now_doc = now_doc + doc_of_i; // advance the global document index
            }
        }

        // Print the weights: indices 1-46 are used
        for (int i = 1; i <= weidu; i++) {
            System.out.println(i + "wei:" + weight[i]);
        }
    }

    /**
     * Writes the weights to the file fileModel.
     * @param fileModel path of the model file
     */
    public static void WriteFileModel(String fileModel) {
        try {
            System.out.println("write start. total lines: " + sumLabel);
            FileWriter fileWriter = new FileWriter(fileModel);
            fileWriter.write("## ListNet");
            fileWriter.write("\r\n");
            fileWriter.write("## Epochs = " + ITER_NUM);
            fileWriter.write("\r\n");
            fileWriter.write("## No. of features = 46");
            fileWriter.write("\r\n");
            fileWriter.write("1 2 3 4 5 6 7 8 9 10 ...  39 40 41 42 43 44 45 46");
            fileWriter.write("\r\n");
            fileWriter.write("0");
            fileWriter.write("\r\n");
            for (int k = 0; k < weidu; k++) {
                fileWriter.write("0 " + k + " " + weight[k + 1]);
                fileWriter.write("\r\n");
            }
            fileWriter.close();
            System.out.println("write fin.");
        } catch (Exception e) {
            System.out.println("Error writing the file");
            e.printStackTrace();
        }
    }

    /**
     * Predicts the ranking scores.
     * Properly, test.txt should be scored and sorted, but we implemented that
     * step on Hadoop; this function only scores train.txt as a test.
     */
    public static void PredictRank(String fileScore) {
        try {
            System.out.println("write start. total lines: " + sumLabel);
            FileWriter fileWriter = new FileWriter(fileScore);
            for (int k = 1; k < sumLabel; k++) {
                double score = 0.0;
                for (int j = 1; j <= weidu; j++) {
                    score = score + weight[j] * feature[k][j];
                }
                fileWriter.write("qid:" + qid[k] + " score:" + score + " label:" + label[k]);
                fileWriter.write("\r\n");
            }
            fileWriter.close();
            System.out.println("write fin.");
        } catch (Exception e) {
            System.out.println("Error writing the file");
            e.printStackTrace();
        }
    }

    /**
     * Main function
     */
    public static void main(String args[]) {
        String fileInput = "Fold1\\train.txt";   // training data
        String fileModel = "model_weight.txt";   // output: weight model
        String fileScore = "score_listNet.txt";  // output: scores

        // Step 1: read the file and parse the data
        System.out.println("read...");
        ReadTxtFile(fileInput);
        System.out.println("read and write well.");
        // Step 2: learn the ranking model
        LearningToRank();
        // Step 3: write the model
        WriteFileModel(fileModel);
        // Step 4: score and predict the ranking
        PredictRank(fileScore);
    }
}
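IV. Summary

In the code above, the part I most want you to focus on is the ListNet training code, that is, how the 46-dimensional weight model is obtained from train.txt. With this model you can score test.txt (weight * feature value) and sort the results. The code above only performs a simple scoring of train.txt, because our assignment was built on distributed processing with Hadoop or Spark, and that part was done by other classmates.

You can also study ListNet through the open-source RankLib or through Luo Lei's ListNet implementation, at the following addresses:
    http://sourceforge.net/projects/minorthird/
    http://code.google.com/p/learning-to-rank-listnet/
    http://people.cs.umass.edu/~vdang/ranklib.html

Finally, we used open-source MAP and NDCG@r tools for a simple performance evaluation of the algorithm, and attached screenshots of the run on Hadoop (for MapReduce only one screenshot, of PRank, could be found).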
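The test-time step mentioned above (score each document with weight * feature value, sort from high to low, then evaluate) can be sketched as follows. This is not the Hadoop code or the evaluation tool used for the article. It assumes the common NDCG definition with gain 2^label - 1 and discount log2(1 + rank), which may differ from the script the authors used, and the weights and feature vectors in main are made up. Arrays are 0-based here, unlike the 1-based arrays in the listing.

```java
import java.util.Arrays;
import java.util.Comparator;

class RankAndNdcgSketch {

    // Linear score w * x for one document.
    static double score(double[] w, double[] x) {
        double s = 0.0;
        for (int j = 0; j < w.length; j++) {
            s += w[j] * x[j];
        }
        return s;
    }

    // Document indices of one query, sorted by score from high to low.
    static Integer[] rank(double[] scores) {
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -scores[i]));
        return order;
    }

    // DCG@k with gain 2^label - 1 and discount log2(1 + rank), rank starting at 1.
    static double dcg(int[] labelsInRankOrder, int k) {
        double sum = 0.0;
        for (int i = 0; i < Math.min(k, labelsInRankOrder.length); i++) {
            sum += (Math.pow(2, labelsInRankOrder[i]) - 1) / (Math.log(i + 2) / Math.log(2));
        }
        return sum;
    }

    // NDCG@k of the ranking induced by the scores; 0 if no document is relevant.
    static double ndcg(double[] scores, int[] labels, int k) {
        Integer[] order = rank(scores);
        int[] ranked = new int[labels.length];
        for (int i = 0; i < order.length; i++) {
            ranked[i] = labels[order[i]];
        }
        int[] ideal = labels.clone();
        Arrays.sort(ideal); // ascending, reversed below
        for (int i = 0; i < ideal.length / 2; i++) {
            int tmp = ideal[i];
            ideal[i] = ideal[ideal.length - 1 - i];
            ideal[ideal.length - 1 - i] = tmp;
        }
        double idealDcg = dcg(ideal, k);
        return idealDcg == 0.0 ? 0.0 : dcg(ranked, k) / idealDcg;
    }

    public static void main(String[] args) {
        double[] w = {0.5, 1.0};                                  // hypothetical weights
        double[][] docs = {{0.2, 0.1}, {0.9, 0.8}, {0.4, 0.5}};   // hypothetical features
        int[] labels = {0, 2, 1};
        double[] scores = new double[docs.length];
        for (int i = 0; i < docs.length; i++) {
            scores[i] = score(w, docs[i]);
        }
        System.out.println("ranking: " + Arrays.toString(rank(scores)));
        System.out.println("NDCG@3:  " + ndcg(scores, labels, 3));
    }
}
```

In a real run the ranking and NDCG would be computed per qid and then averaged over all queries of test.txt.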
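I hope this article is helpful. I wrote the Java code based on the papers, so please bear with any mistakes or shortcomings, and feel free to ask questions. I am still a beginner in machine learning and algorithms, but I will do my best to answer. I also found that there is very little code available for this topic, which is why I wrote these articles; later I plan to write about Pairwise and about the MAP and NDCG evaluation measures.

(By: Eastmount, 2015-2-5, 10 pm, http://blog.csdn.net/eastmount/article/)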