當前位置：首頁 > 编程资源 > 编程问答 >内容正文

编程问答

贝叶斯网络结构学习之MCMC算法（基于FullBNT-1.0.4的MATLAB实现）

發布時間：2023/12/9 编程问答 23 豆豆

生活随笔收集整理的這篇文章主要介紹了贝叶斯网络结构学习之MCMC算法（基于FullBNT-1.0.4的MATLAB实现）小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

題目：貝葉斯網絡結構學習之MCMC算法（基于FullBNT-1.0.4的MATLAB實現）

? ? ? ? 有關貝葉斯網絡結構學習的一基本概念可以參考：貝葉斯網絡結構學習方法簡介

? ? ? ? 有關函數輸入輸出參數的解釋可以參考：貝葉斯網絡結構學習若干問題解釋

? ? ? ? 本篇所基于的馬爾可夫鏈蒙特卡羅(Markov Chain Monte Carlo, MCMC)方法是一個通用的框架，其代表性的算法是Metropolis-Hastings(MH)算法，MH算法的特例Gibbs采樣算法應用也很廣泛。本文不具體探討MCMC的細節，若想了解更多有關MCMC算法內容推薦參考【Bishop C M. Pattern Recognition and Machine Learning (Information Science and Statistics)[M]. Springer-Verlag New York, Inc. 2006.】（簡稱PRML）的11.2節（MarkovChain Monte Carlo），中文資料參考【周志華. 機器學習[M]. 清華大學出版社，2016.】（簡稱西瓜書）的14.5.1節（MCMC采樣），Gibbs采樣算法的細節可以參考PRML的11.3節（Gibbs Sampling），中文資料參考西瓜書的7.5.3節（推斷），本文附錄中會簡要給出幾個查閱文獻時的關鍵點，但若想真正弄懂還須閱讀推薦的文獻資料。

? ? ? ? 將工具箱添加到matlab中的步驟參見上一篇《貝葉斯網絡結構學習之K2算法（基于FullBNT-1.0.4的MATLAB實現）》的附錄2，若僅想使用本文介紹的算法學得貝葉斯網絡結構，可以在添加工具箱到Matlab中后，調用本文后面部分個人進行二次封裝的函數：learn_struct_mcmc_basic.m，此函數僅需輸入數據集，輸出即為學得的網絡結構。但是，建議大致瀏覽一遍全文。

? ? ? ? 據文獻【胡春玲. 貝葉斯網絡結構學習及其應用研究[D]. 合肥工業大學, 2011.】所述：

? ? ? ? ?將MH方法引入貝葉斯網絡結構學習的文獻有如下幾篇：

[45] Friedman N, Koller D. Being Bayesianabout network structure. A Bayesian approach to structure discovery in Bayesiannetworks[J]. Machine learning, 2003, 50(1-2): 95-125.

[62] Laskey K B, Myers J W. Population markovchain monte carlo[J]. Machine Learning, 2003, 50(1): 175-196.

[101] Madigan D, York J, Allard D. Bayesiangraphical models for discrete data[J]. International Statistical Review/RevueInternationale de Statistique, 1995: 215-232.

[102] Giudici P, Castelo R. ImprovingMarkov chain Monte Carlo model search for data mining[J]. Machine learning,2003, 50(1-2): 127-158.

? ? ? ? 從頁碼可以看出，這幾篇文獻篇幅都比較長，就不去細究了。接下來重點說說基于工具箱FullBNT-1.0.4的實現。

? ? ? ? 函數文件路徑：\FullBNT-1.0.4\BNT\learning\learn_struct_mcmc.m

? ? ? ? 例子文件路徑：\FullBNT-1.0.4\BNT\examples\static\StructLearn\mcmc1.m

learn_struct_mcmc.m的代碼如下：

function [sampled_graphs, accept_ratio, timing] = learn_struct_mcmc(data, ns, varargin) % MY_LEARN_STRUCT_MCMC Monte Carlo Markov Chain search over DAGs assuming fully observed data % [sampled_graphs, accept_ratio, num_edges] = learn_struct_mcmc(data, ns, ...) % % data(i,m) is the value of node i in case m. % ns(i) is the number of discrete values node i can take on. % % sampled_graphs{m} is the m'th sampled graph. % accept_ratio(t) = acceptance ratio at iteration t % num_edges(t) = number of edges in model at iteration t % % The following optional arguments can be specified in the form of name/value pairs: % [default value in brackets] % % scoring_fn - 'bayesian' or 'bic' [ 'bayesian' ] % Currently, only networks with all tabular nodes support Bayesian scoring. % type - type{i} is the type of CPD to use for node i, where the type is a string % of the form 'tabular', 'noisy_or', 'gaussian', etc. [ all cells contain 'tabular' ] % params - params{i} contains optional arguments passed to the CPD constructor for node i, % or [] if none. [ all cells contain {'prior', 1}, meaning use uniform Dirichlet priors ] % discrete - the list of discrete nodes [ 1:N ] % clamped - clamped(i,m) = 1 if node i is clamped in case m [ zeros(N, ncases) ] % nsamples - number of samples to draw from the chain after burn-in [ 100*N ] % burnin - number of steps to take before drawing samples [ 5*N ] % init_dag - starting point for the search [ zeros(N,N) ] % % e.g., samples = my_learn_struct_mcmc(data, ns, 'nsamples', 1000); % % Modified by Sonia Leach (SML) 2/4/02, 9/5/03tic[n ncases] = size(data);% set default params type = cell(1,n); params = cell(1,n); for i=1:ntype{i} = 'tabular';%params{i} = { 'prior', 1};params{i} = { 'prior_type', 'dirichlet', 'dirichlet_weight', 1 }; end scoring_fn = 'bayesian'; discrete = 1:n; clamped = zeros(n, ncases); nsamples = 100*n; burnin = 5*n; dag = zeros(n); lML = [];args = varargin; nargs = length(args); for i=1:2:nargsswitch args{i},case 'nsamples', nsamples = args{i+1};case 'burnin', burnin = args{i+1};case 'init_dag', dag = args{i+1};case 'scoring_fn', scoring_fn = args{i+1};case 'type', type = args{i+1}; case 'discrete', discrete = args{i+1}; case 'clamped', clamped = args{i+1}; case 'lML', lML = args{i+1};case 'params', if isempty(args{i+1}), params = cell(1,n); else params = args{i+1}; endend end% We implement the fast acyclicity check described by P. Giudici and R. Castelo, % "Improving MCMC model search for data mining", submitted to J. Machine Learning, 2001.% SML: also keep descendant matrix C use_giudici = 1; if use_giudici[nbrs, ops, nodes, A] = mk_nbrs_of_digraph(dag); else[nbrs, ops, nodes] = mk_nbrs_of_dag(dag);A = []; endnum_accepts = 1; num_rejects = 1; T = burnin + nsamples; accept_ratio = zeros(1, T); num_edges = zeros(1, T); sampled_graphs = cell(1, nsamples);timing = zeros(1,T-burnin); %sampled_bitv = zeros(nsamples, n^2);for t=1:T[dag, nbrs, ops, nodes, A, accept] = take_step(dag, nbrs, ops, ...nodes, ns, data, clamped, A, ...scoring_fn, discrete, type, params, lML );num_edges(t) = sum(dag(:));num_accepts = num_accepts + accept;num_rejects = num_rejects + (1-accept);accept_ratio(t) = num_accepts/(num_rejects+num_accepts);if t > burninsampled_graphs{t-burnin} = dag;%sampled_bitv(t-burnin, :) = dag(:)';timing(t-burnin) = toc;endif mod(t, 1000)==0fprintf('%i/%i %f\n', t, T, accept_ratio(t));end end%%%%%%%%%function [new_dag, new_nbrs, new_ops, new_nodes, A, accept] = ...take_step(dag, nbrs, ops, nodes, ns, data, clamped, A, ...scoring_fn, discrete, type, params, lML, prior_w )use_giudici = ~isempty(A); if use_giudici[new_dag, op, i, j, new_A] = pick_digraph_nbr(dag, nbrs, ops, nodes,A); % updates A[new_nbrs, new_ops, new_nodes] = mk_nbrs_of_digraph(new_dag, new_A); elsed = sample_discrete(normalise(ones(1, length(nbrs))));new_dag = nbrsozvdkddzhkzd;op = opsozvdkddzhkzd;i = nodes(d, 1); j = nodes(d, 2);[new_nbrs, new_ops, new_nodes] = mk_nbrs_of_dag(new_dag); endbf = bayes_factor(dag, new_dag, op, i, j, ns, data, clamped, scoring_fn, discrete, type, params, lML);%R = bf * (new_prior / prior) * (length(nbrs) / length(new_nbrs)); R = bf * (length(nbrs) / length(new_nbrs)); u = rand(1,1); if u > min(1,R) % reject the moveaccept = 0;new_dag = dag;new_nbrs = nbrs;new_ops = ops;new_nodes = nodes; elseaccept = 1;if use_giudici A = new_A; % new_A already updated in pick_digraph_nbrend end%%%%%%%%%function bfactor = bayes_factor(old_dag, new_dag, op, i, j, ns, data, clamped, scoring_fn, discrete, type, params, lML)if isempty(lML)u = find(clamped(j,:)==0);LLnew = score_family(j, parents(new_dag, j), type{j}, scoring_fn, ns, discrete, data(:,u), params{j});LLold = score_family(j, parents(old_dag, j), type{j}, scoring_fn, ns, discrete, data(:,u), params{j}); elsepaNew = find(new_dag(:,j));paOld = find(old_dag(:,j));kNew = sum(2.^(paNew-1));kOld = sum(2.^(paOld-1));LLnew = lML(j, kNew+1);LLold = lML(j, kOld+1); endbf1 = exp(LLnew - LLold);if strcmp(op, 'rev') % must also multiply in the changes to i's familyif isempty(lML)u = find(clamped(i,:)==0);LLnew = score_family(i, parents(new_dag, i), type{i}, scoring_fn, ns, discrete, data(:,u), params{i});LLold = score_family(i, parents(old_dag, i), type{i}, scoring_fn, ns, discrete, data(:,u), params{i});elsepaNew = find(new_dag(:,i));paOld = find(old_dag(:,i));kNew = sum(2.^(paNew-1));kOld = sum(2.^(paOld-1));LLnew = lML(i, kNew+1);LLold = lML(i, kOld+1);endbf2 = exp(LLnew - LLold);elsebf2 = 1; end bfactor = bf1 * bf2;%%%%%%%% Giudici stuff follows %%%%%%%%%%% SML: This now updates A as it goes from digraph it choses function [new_dag, op, i, j, new_A] = pick_digraph_nbr(dag, digraph_nbrs, ops, nodes, A)d = sample_discrete(normalise(ones(1, length(digraph_nbrs)))); %d = myunidrnd(length(digraph_nbrs),1,1); i = nodes(d, 1); j = nodes(d, 2); new_dag = digraph_nbrs(:,:,d); op = opsozvdkddzhkzd; new_A = update_ancestor_matrix(A, op, i, j, new_dag); %%%%%%%%%%%%%%function A = update_ancestor_matrix(A, op, i, j, dag)switch op case 'add',A = do_addition(A, op, i, j, dag); case 'del', A = do_removal(A, op, i, j, dag); case 'rev', A = do_removal(A, op, i, j, dag);A = do_addition(A, op, j, i, dag); end%%%%%%%%%%%%function A = do_addition(A, op, i, j, dag)A(j,i) = 1; % i is an ancestor of j anci = find(A(i,:)); if ~isempty(anci)A(j,anci) = 1; % all of i's ancestors are added to Anc(j) end ancj = find(A(j,:)); descj = find(A(:,j)); if ~isempty(ancj)for k=descj(:)'A(k,ancj) = 1; % all of j's ancestors are added to each descendant of jend end%%%%%%%%%%% function A = do_removal(A, op, i, j, dag)% find all the descendants of j, and put them in topological order% SML: originally Kevin had the next line commented and the %* lines % being used but I think this is equivalent and much less expensive % I assume he put it there for debugging and never changed it back...? descj = find(A(:,j)); %* R = reachability_graph(dag); %* descj = find(R(j,:));order = topological_sort(dag);% SML: originally Kevin used the %* line but this was extracting the % wrong things to sort %* descj_topnum = order(descj); [junk, perm] = sort(order); %SML:node i is perm(i)-TH in order descj_topnum = perm(descj); %SML:descj(i) is descj_topnum(i)-th in order% SML: now re-sort descj by rank in descj_topnum [junk, perm] = sort(descj_topnum); descj = descj(perm);% Update j and all its descendants A = update_row(A, j, dag); for k = descj(:)'A = update_row(A, k, dag); end%%%%%%%%%%%function A = old_do_removal(A, op, i, j, dag)% find all the descendants of j, and put them in topological order % SML: originally Kevin had the next line commented and the %* lines % being used but I think this is equivalent and much less expensive % I assume he put it there for debugging and never changed it back...? descj = find(A(:,j)); %* R = reachability_graph(dag); %* descj = find(R(j,:)); order = topological_sort(dag); descj_topnum = order(descj); [junk, perm] = sort(descj_topnum); descj = descj(perm); % Update j and all its descendants A = update_row(A, j, dag); for k = descj(:)'A = update_row(A, k, dag); end%%%%%%%%%function A = update_row(A, j, dag)% We compute row j of A A(j, :) = 0; ps = parents(dag, j); if ~isempty(ps)A(j, ps) = 1; end for k=ps(:)'anck = find(A(k,:));if ~isempty(anck)A(j, anck) = 1;end end%%%%%%%%function A = init_ancestor_matrix(dag)order = topological_sort(dag); A = zeros(length(dag)); for j=order(:)'A = update_row(A, j, dag); end
mcmc1.m的代碼如下：

% We compare MCMC structure learning with exhaustive enumeration of all dags.N = 3; %N = 4; dag = mk_rnd_dag(N); ns = 2*ones(1,N); bnet = mk_bnet(dag, ns); for i=1:Nbnet.CPD{i} = tabular_CPD(bnet, i); endncases = 100; data = zeros(N, ncases); for m=1:ncasesdata(:,m) = cell2num(sample_bnet(bnet)); enddags = mk_all_dags(N); score = score_dags(data, ns, dags); post = normalise(exp(score));[sampled_graphs, accept_ratio] = learn_struct_mcmc(data, ns, 'nsamples', 100, 'burnin', 10); mcmc_post = mcmc_sample_to_hist(sampled_graphs, dags);if 0subplot(2,1,1)bar(post)subplot(2,1,2)bar(mcmc_post)print(gcf, '-djpeg', '/home/cs/murphyk/public_html/Bayes/Figures/mcmc_post.jpg')clfplot(accept_ratio)print(gcf, '-djpeg', '/home/cs/murphyk/public_html/Bayes/Figures/mcmc_accept.jpg') end

? ? ? ? 首先來說函數learn_struct_mcmc.m，此函數必須設置的輸入參數比較少，大部分可以默認。僅兩個必須的輸入參數如下：

? ? ? ? data和ns，含義與learn_struct_K2.m相同（參見《貝葉斯網絡結構學習之K2算法（基于FullBNT-1.0.4的MATLAB實現）》）

% data(i,m) is the value of node i in case m. % ns(i) is the number of discrete values node i can take on.

? ? ? ? 再解釋nsamples和burnin兩個可選輸入參數，因為例子當中用到了這兩個參數：

? ? ? ? 因為MCMC本身就是通過多次迭代收斂到平穩狀態，此時產出的樣本近似服從分布p。也就是說必須經過多次迭代后產生的樣本才是符合要求的，因此需要設定兩個參數。第一個是從第幾次迭代之后開始產生的樣本認為符合要求，此即burnin參數；第二個是一共產生多少個合格的樣本（誠然貝葉斯網絡結構只需每次產生一個即可，但很多MCMC應用實際上是產生多個樣本求期望），此即nsamples參數；因此總共迭代次數是nsamples+burnin，即前burnin次迭代產生的結果舍掉，保存后nsamples次迭代結果。

% nsamples - number of samples to draw from the chain after burn-in [ 100*N ] % burnin - number of steps to take before drawing samples [ 5*N ]

以上注釋中最后中括號內的值為此參數的默認值。其它參數含義詳見函數注釋。

? ? ? ? 值得注意的是，learn_struct_mcmc.m的第66~67行提到了一篇參考文獻，個人認為應該是上面提到的四篇英文文獻的最后一篇，即【Giudici P, Castelo R. Improving Markov chain Monte Carlo modelsearch for data mining[J]. Machine learning, 2003, 50(1-2): 127-158.】。

? ? ? ? 接下來說例子mcmc1.m，此例子的大致流程是：

? ? ? ? (1)人工生成一個貝葉斯網絡BN（含結構和參數）（前10行）；

? ? ? ? (2)根據貝葉斯網絡BN生成數據集D（第12~16行）；

? ? ? ? (3)得到所有結點個數為N的貝葉斯網絡結構并用評分函數進行依次評分（第18~20行）；

? ? ? ? (4)調用learn_struct_mcmc函數根據數據集D學習網絡結構（第22行）；

? ? ? ? (5)對學得的所有網絡結構進行統計，得到每種網絡結構的比例（第23行）

? ? ? ? (6)畫出第3步所有評分函數的柱狀圖和第5步學得各種網絡結構比例的柱狀圖（第26~29行），以及接受概率曲線圖（第33行）。

? ? ? ? 直接運行mcmc1.m，貼三張圖，結合這三張圖進一步解釋整個例子細節：

圖1 工作區變量

圖2 評分函數的柱狀圖（上）和各種網絡結構比例的柱狀圖（下）

圖3 接受概率曲線圖

? ? ? ? 前兩步沒什么特別的，生成一個人工數據集供后面使用。

? ? ? ? 第3步先是得到所有結點個數為N的貝葉斯網絡結構（第18行），從圖1可以知道dags大小為25，也就是說共有25種合法的網絡結構（有向無環圖）；此處節點個數N=3，任意兩個節點A和B之間可以由A指向B，也可以由B指向A，也可以沒有連接，因此共三種情況，而三個節點共有三對節點，因此共計3^3=27種可能的結構，但這里面有兩種結構并不是有向無環圖，如下圖所示：

圖4 兩種非有向無環圖

因此共計27-2=25種結構。得到所有結構之后，對每種結構用評分函數進行打分（第19行），可以執行“edit score_dags”命令打開score_dags.m文件看一下，其實就是調用評分函數score_family依次計算每種結構的評分，每種網絡結構的評分是該結構中各節點評分之和。接下來，第20行對評分結果求指數并正規化（即所有評分結果之和為1），但是為什么要調用exp求指數我并不是很明白，第19行得到的score都是負數，用exp轉換后就變為0和1之間的數，不知道是不是這個原因。

? ? ? ? 第4步調用learn_struct_mcmc函數，輸入參數當中data即為前兩部生成的數據集，ns為各個節點可能的取值個數（每個節點的取值是離散的，只有有限的幾個值）；接下來nsamples設置為100即得到100個采樣結果（貝葉斯網絡結構），burnin設置為10意思是舍棄最初的10次采樣結果（貝葉斯網絡結構）。輸出參數有兩個，sampled_graph存儲著100個采樣結果，accept_ratio存儲著歷史采樣的接受概率。從圖1中可以知道sampled_graphs大小為100，即采樣個數，accept_ratio大小為110，即所有采樣次數nsamples+burnin。

? ? ? ? 這里有一個最關鍵問題：我只是想調用learn_struct_mcmc函數根據數據集data學得一個最好的貝葉斯網絡結果，但這個函數卻輸出了很多個（例子中sampled_graphs存儲著100種結構，因為最多才25種結構，即100個結果肯定有相互一樣的），到底選哪一個呢？是不是最后一次采樣結果就是最好的結構呢？因為迭代次數越多結果應該越穩定呀。這應該是每一個使用者疑問，畢竟作為一個普通人，僅想調用某個結構學習函數根據數據集學到一個貝葉斯網絡結構而已。答案很簡單，選擇在結果中出現次數最多的那一個結構！

? ? ? ? 第5步調用mcmc_sample_to_hist函數，對學得的所有網絡結構進行統計，得到每種網絡結構的比例。比如例子中共有25可能的結構，mcmc_sample_to_hist將sampled_graphs中的100種結構與25種可能的結構進行對比，統計每一種可能結構在sampled_graphs中的出現次數，再正規化，即分別計算出25種可能的結構在100個采樣結果中出現的比例。

? ? ? ? 第6步是繪出顯示工作，執行與否由第25行的if語句決定，默認不執行，若要執行則將if 0改為if 1即可。第27行將可能的25種結構的正規化評分結果（第20行所得）用柱狀圖顯示，第29行將sampled_graphs正規化的統計結果（第23行所得）用柱狀圖顯示，分別如圖2的上圖和下圖所示。第33行繪出每次迭代的接受概率，因為Metropolis-Hastings算基于“拒絕采樣”(reject sampling)來逼近平穩分布的。

? ? ? ? 比較圖2中的兩幅子圖可以發現，它們很相識！是的，MCMC方法的關鍵就在于通過構造“平穩分布為p的馬爾可夫鏈”來產生樣本：若馬爾可夫鏈運行時間足夠長（即收斂到平穩狀態），則此時產出的樣本x近似服從于分布p（摘自西瓜書P333）。learn_struct_mcmc函數產生的樣本x即為sampled_graphs中的網絡結構，分布p即為第20行得到的正規化的各種結構評分結果。到此，應該明白選擇sampled_graphs中的哪一個結構了吧，出現頻率服從正規化評分結果，即出現次數越多的結構評分越好；因此，當然選擇sampled_graphs中出現次數最多的那一個結構！

? ? ? ? 可能你會發現，第5行生成的dag并不是評分最好的結構。也就是說基于結構dag生成數據集D，但在此數據集D上用評分函數去評價各種可能的結構時有優于dag的結構！打個比方說，你(結構dag)生的孩子(數據集D)長的居然不是與你長的最像？！為什么呢？不是親生的？稍安勿躁，孩子還是親生的^_^，只是決定孩子長像的除了你(結構dag)之外還有你的另一半(網絡參數)，另外遺傳過程除了crossover還有mutation，所以淡定淡定^_^

? ? ? ? 總結一下這個例子，首先人工生成數據集，然后對比MCMC結構學習結果正規化分布與所有可能結構的評分正規化分布。其實例子第1行的注釋已經說明了：

% We compare MCMC structure learning withexhaustive enumeration of all dags.

? ? ? ? 另外，特別注意一下learn_struct_mcmc函數的輸入參數nsamples和burnin的設置，本例中分別設置為100和10，默認值分別是100*N和5*N（N為節點個數，本例中N=3），當在實際中應用該函數時節點個數一般大于3，所以就不要再使用本例中的100和10兩個值了，其實最簡單的辦法就是不管這兩個參數，直接使用默認值！

? ? ? ? 下面是我對learn_struct_mcmc進行了二次封裝處理所得的learn_struct_mcmc_basic.m，輸入參數只有數據集data，輸出是評分結果最好的網絡結構：

function [DAG] = learn_struct_mcmc_basic(data) %調用工具箱FullBNT-1.0.4中的learn_struct_mcmc函數學習貝葉斯網絡結構 %該函數主要是對learn_struct_mcmc進行二次封裝 %輸入僅有data，輸出即為學得貝葉斯網絡結構DAG%節點個數N = size(data,1);%得到learn_struct_mcmc的輸入參數nsns = zeros(1,N);for nn=1:Nns_temp = unique(data(nn,:));ns(1,nn)=length(ns_temp);end%調用learn_struct_mcmc[sampled_graphs, ~] = learn_struct_mcmc(data, ns);%將sampled_graphs的所有結構矩陣展成一行存在sampled_bitvsnsamples = length(sampled_graphs);%結構個數sampled_bitvs = zeros(nsamples, N*N);for nn=1:nsamplessampled_bitvs(nn, :) = sampled_graphs{nn}(:)';end%找出sampled_graphs中共有多少種不同的結果（即去除重復的結構）[ugraphs, I, J] = unique(sampled_bitvs, 'rows');%統計每種結構的個數num = zeros(1,length(I));for nn=1:length(I)num(nn) = sum(J==nn);end%根據統計結果找出出現次數最多的結構[~,p] = max(num); DAG = reshape(ugraphs(p,:),N,N);%恢得N*N矩陣%DAG= reshape(sampled_bitvs(I(p),:),N,N) end
? ? ? ? 但是，我多次運行learn_struct_mcmc函數，發現對于同一數據集，每次的結果都是不同的，這也很正常，因為本身該函數使用了Metropolis-Hastings算法，里面的有一定的隨機性。

? ? ? ? 最后總結一下基于MCMC的結構學習方法本質：該方法不同于K2算法直接搜索評分函數值最大的結構，而是通過構建一個馬爾可夫鏈，使其收斂至平穩分布p（該分布即為每種貝葉斯網絡結構的規格化評分函數值，即例子中的第20行得到的結果，該分布在真實應用中是未知的，因為當貝葉斯網絡節點個數較多時不可能遍歷每種結構求出其評分函數值，但通過MCMC方法卻可以近似產生分布服從p的樣本），依此馬爾可夫鏈產生的樣本（即網絡結構）中出現次數最多（出現概率最大）的結構即為評分函數最好的結構。因此基于MCMC的結構學習方法本質上講仍然是基于評分搜索的方法，只是這種搜索策略并不是直接搜索評分函數最優的網絡結構，而是按各種結構的評分函數值大小為概率產生好多結構。所以一般文獻也會將此方法歸類到基于評分搜索的方法之中，例如：

? ? ? ? 【胡春玲. 貝葉斯網絡結構學習及其應用研究[D]. 合肥工業大學, 2011.】

? ? ? ? 只是不清楚為何上圖文獻的作者在3.3.1節如此描述完之后（注意上圖最后四個字“抽樣算法”）又在3.3.3節單列一節專門講述“基于隨機抽樣的貝葉斯網絡結構學習”，可能正如3.3.3節第一段最后一句話所述，“將隨機抽樣的思想引入評分搜索學習方法的搜索過程是解決評分搜索算法收斂于局部最優的有效途徑之一”，作者雖然在3.3.1節點明確了抽樣算法是一種搜索策略，但可能又發現基于隨機抽樣的思想不同于普通的搜索策略，所以單列說明。

? ? ? ? 另外，從這個算法中可以進一步理解何為蒙特卡羅(Monte Carlo)仿真，蒙特卡羅仿真的確是通過大量仿真來說明問題的，但絕不是隨機進行仿真的，而是按實際概率分布進行抽樣得到樣本后進行大量仿真分析。再回到MCMC方法，就是先通過構造馬爾可夫鏈得到分布為p的樣本（這是蒙特卡羅仿真的前提），然后再進行蒙特卡羅分析。

附錄1：西瓜書中有關MCMC方法(MH算法和Gibbs采樣)

? ? ? ? 由于MCMC是一種采樣技術，所以首先說明為什么要研究采樣技術：

? ? ? ? 介紹完了為什么要用采樣法，開始轉入MCMC方法，“概率圖模型中最常用的采樣技術是MCMC方法”，對MCMC簡單介紹之后又轉入一種MCMC具體算法——MH算法：

? ? ? ? 接受概率為什么等于式(14.28)呢？把式(14.28)代入式(14.27)立馬就明白了，具體可參見英文書PRML中的式(11.45)，附錄2中也會有截圖。

? ? ? ? Gibbs采樣在西瓜書中是在貝葉斯網絡推斷中講述的：

? ? ? ? 這只是Gibbs采樣在貝葉斯網絡近似推斷中的具體應用，更通用的表達參見PRML。

附錄2：PRML中有關MCMC方法(MH算法和Gibbs采樣)

? ? ? ? PRML中并沒有一個單獨的MCMC方法或具體的MH算法的步驟，從11.2.2節開始討論MH算法，給出了接受概率表達式(11.44)：

? ? ? ? 其中文中提到的式(11.33)如下：

? ? ? ? 文中提到的式(11.40)如下：

? ? ? ? 為什么接收概率如此設置呢？證明如下：

? ? ? ? 有關Gibbs采樣在PRML中有一個通用的描述：

? ? ? ? 為什么說Gibbs采樣可以看成是MH算法的一個特例呢？

即Gibbs采樣相當于MH算法接收概率等于1的特殊情形。

總結

以上是生活随笔為你收集整理的贝叶斯网络结构学习之MCMC算法（基于FullBNT-1.0.4的MATLAB实现）的全部內容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯，歡迎將生活随笔推薦給好友。

上一篇： jquery ui php,php –
下一篇：读取 wps_软件前世今生篇之WPS（求