INTRODUCTION TO NONLINEAR OPTIMIZATION, Exercise 5.2: Freudenstein and Roth Test Function
Reference: Amir Beck, Introduction to Nonlinear Optimization: Theory, Algorithms, and Applications with MATLAB, SIAM, 2014.
This post covers the MATLAB part of item (ii) of the exercise.
Item (i): https://blog.csdn.net/IYXUAN/article/details/121454946
Objectives
- Master gradient descent with backtracking, the damped Newton method, and the hybrid Newton method;
- Learn to use MATLAB to solve the optimization problem with these three methods;
- Understand the relative strengths and weaknesses of the three methods and be able to explain the observed results.
Environment
MATLAB R2021a
Problem
Consider the Freudenstein and Roth test function
$f(x)=f_1(x)^2+f_2(x)^2,\quad x\in\mathbb{R}^2,$
where
$f_1(x)=-13+x_1+\big((5-x_2)x_2-2\big)x_2\,,$
$f_2(x)=-29+x_1+\big((x_2+1)x_2-14\big)x_2\,.$
Use MATLAB to employ the following three methods on the problem of minimizing $f$: gradient descent with a backtracking line search, the damped (backtracking) Newton method, and the hybrid Newton method.
All the algorithms should use the stopping criterion $\|\nabla f(x)\|\le\varepsilon$, $\varepsilon=10^{-5}$. Each algorithm should be employed four times on the following four starting points: $(-50,7)^T$, $(20,7)^T$, $(20,-18)^T$, $(5,-10)^T$. For each of the four starting points, compare the number of iterations and the point to which each method converged. If a method did not converge, explain why.
Algorithm descriptions
gradient_method_backtracking
Gradient descent with backtracking:
Choose a tolerance $\varepsilon>0$;
Initialize $x_0\in\mathbb{R}^n$;
FOR k = 0, 1, 2, …
1. Pick a descent direction $d_k=-\nabla f(x_k)$;
2. Select a step size $t_k$;
3. Update $x_{k+1}=x_k+t_kd_k$;
4. Stop if $\|\nabla f(x_{k+1})\|\le\varepsilon$; otherwise continue.
In step 2 (step-size selection), a constant-step gradient method simply uses a fixed $t_k=\bar t$. With a backtracking (inexact) line search, the step starts at $t_k=s$, $s>0$, and is shrunk via $t_k\leftarrow\beta t_k$, $\beta\in(0,1)$, until the sufficient-decrease condition $f(x_k)-f(x_k+t_kd_k)\ge-\alpha t_k\nabla f(x_k)^Td_k$, $\alpha\in(0,1)$, is satisfied. One can show that this inequality holds for all sufficiently small $t\ge 0$, so the backtracking loop always terminates; a minimal sketch of one such step follows.
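A minimal MATLAB sketch of a single backtracking gradient step, assuming the objective handle f, gradient handle g, current point x, and parameters s, alpha, beta are already defined (this mirrors, but is not identical to, the full routine used later in this post):

```matlab
% One backtracking gradient step (sketch).
d = -g(x);                                    % steepest-descent direction
t = s;                                        % initial trial step size
while f(x) - f(x + t*d) < -alpha*t*g(x)'*d    % sufficient-decrease test
    t = beta*t;                               % shrink the step
end
x = x + t*d;                                  % accept the step
```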
newton_backtracking
Damped (backtracking) Newton:
The backtracking Newton method is the same as the gradient method above, except that in step 1 the descent direction is replaced by the Newton direction
$d_k=-\left(\nabla^2 f(x_k)\right)^{-1}\nabla f(x_k).$
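In MATLAB the Newton direction is usually obtained by solving a linear system with the backslash operator rather than by forming the inverse Hessian. A sketch, assuming gradient and Hessian handles g and h and a step size t from backtracking; it follows the same sign convention as the implementation later in this post (d = H\grad, then x ← x − t·d):

```matlab
% Newton direction via a linear solve (sketch).
gval = g(x);
hval = h(x);
d = hval \ gval;   % solves hval*d = gval; avoids computing inv(hval) explicitly
x = x - t*d;       % damped Newton update with step size t
```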
newton_hybrid
Hybrid Newton:
When choosing the direction $d_k$: if the Hessian $\nabla^2 f(x_k)$ is not positive definite, fall back to the backtracking gradient step, i.e. $d_k=-\nabla f(x_k)$; if $\nabla^2 f(x_k)$ is positive definite, use the Newton direction. Positive definiteness is tested with a Cholesky factorization, as sketched below.
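A sketch of the Cholesky-based test, using the same convention as the full implementation below: the second output of chol is zero exactly when the factorization succeeds, i.e. when the matrix is positive definite.

```matlab
% Direction selection in the hybrid Newton method (sketch).
[L, p] = chol(hval, 'lower');   % p == 0 iff hval is positive definite
if p == 0
    d = L' \ (L \ gval);        % Newton direction: solves (L*L')*d = hval*d = gval
else
    d = gval;                   % fall back to the gradient direction
end
x = x - t*d;                    % update; the minus sign matches the d = H\g convention
```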
Procedure
Computing $f(x)$, $g(x)$, $h(x)$
$f(x)=2x_1^2+12x_1x_2^2-32x_1x_2-84x_1+2x_2^6-8x_2^5+2x_2^4-80x_2^3+12x_2^2+864x_2+1010$
$g(x)=\left(12x_2^2-32x_2+4x_1-84,\ \ 24x_2-32x_1+24x_1x_2-240x_2^2+8x_2^3-40x_2^4+12x_2^5+864\right)^T$
$h(x)=\begin{pmatrix} 4 & 24x_2-32 \\ 24x_2-32 & 60x_2^4-160x_2^3+24x_2^2-480x_2+24x_1+24 \end{pmatrix}$
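As a quick sanity check on these expressions, the analytic gradient can be compared against central finite differences at an arbitrary test point. A sketch; the test point (1, 2) and the step 1e-6 are arbitrary choices, not part of the original write-up:

```matlab
% Finite-difference check of the analytic gradient (sketch).
f = @(x) 2*x(1)^2 + 12*x(1)*x(2)^2 - 32*x(1)*x(2) - 84*x(1) + 2*x(2)^6 ...
    - 8*x(2)^5 + 2*x(2)^4 - 80*x(2)^3 + 12*x(2)^2 + 864*x(2) + 1010;
g = @(x) [12*x(2)^2 - 32*x(2) + 4*x(1) - 84; ...
          24*x(2) - 32*x(1) + 24*x(1)*x(2) - 240*x(2)^2 + 8*x(2)^3 ...
          - 40*x(2)^4 + 12*x(2)^5 + 864];
x0 = [1; 2];                      % arbitrary test point
delta = 1e-6;                     % finite-difference step
g_fd = zeros(2, 1);
for i = 1:2
    e = zeros(2, 1); e(i) = delta;
    g_fd(i) = (f(x0 + e) - f(x0 - e)) / (2*delta);   % central difference
end
disp(norm(g_fd - g(x0)))          % should be close to zero
```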
The expanded objective, its gradient, and its Hessian are obtained with the Symbolic Math Toolbox:

```matlab
clc; clear
syms f1 f2 x1 x2 f g h
f1 = -13 + x1 + ((5 - x2)*x2 - 2)*x2;
f2 = -29 + x1 + ((x2 + 1)*x2 - 14)*x2;
f = expand(f1^2 + f2^2)
% f = 2*x1^2 + 12*x1*x2^2 - 32*x1*x2 - 84*x1 + 2*x2^6 - 8*x2^5 + 2*x2^4 - 80*x2^3 + 12*x2^2 + 864*x2 + 1010
g = gradient(f)
% g = [12*x2^2 - 32*x2 + 4*x1 - 84; 24*x2 - 32*x1 + 24*x1*x2 - 240*x2^2 + 8*x2^3 - 40*x2^4 + 12*x2^5 + 864]
h = hessian(f)
% h = [4, 24*x2 - 32; 24*x2 - 32, 60*x2^4 - 160*x2^3 + 24*x2^2 - 480*x2 + 24*x1 + 24]
```

Plotting contour and surface views of the function
Figure 1. Contour and surface plots of $f(x)$ around the global optimal solution $x^*=(5,4)$.
Figure 2. Contour and surface plots of $f(x)$ around the local optimal solution $x^*=(11.4128,-0.8968)$.
```matlab
clc; clear
% local optimum
x1 = linspace(11.40, 11.42, 50);
x2 = linspace(-0.897, -0.8965, 50);
% global optimum
% x1 = linspace(4.9, 5.1, 50);
% x2 = linspace(3.9, 4.1, 50);
[X1, X2] = meshgrid(x1, x2);
f = 2*X1.^2 + 12*X1.*X2.^2 - 32*X1.*X2 - 84*X1 + 2*X2.^6 - 8*X2.^5 ...
    + 2*X2.^4 - 80*X2.^3 + 12*X2.^2 + 864*X2 + 1010;
createfigure(X1, X2, f)

%% Generated by the surfc function:
function createfigure(xdata1, ydata1, zdata1)
%CREATEFIGURE(xdata1, ydata1, zdata1)
%  XDATA1: surface xdata
%  YDATA1: surface ydata
%  ZDATA1: surface zdata
%  Auto-generated by MATLAB on 08-Dec-2021 15:44:51

% Create figure
figure1 = figure;

% Create axes
axes1 = axes('Parent', figure1);
hold(axes1, 'on');

% Create surf
surf(xdata1, ydata1, zdata1, 'Parent', axes1);

% Create contour
contour(xdata1, ydata1, zdata1, 'ZLocation', 'zmin');

view(axes1, [-113 13]);
grid(axes1, 'on');
axis(axes1, 'tight');
hold(axes1, 'off');

% Create colorbar
colorbar(axes1);
end
```

Running the three methods
```matlab
clc; clear

%% init
s = 1; alpha = .5; beta = .5; epsilon = 1e-5;
p1 = [-50; 7]; p2 = [20; 7]; p3 = [20; -18]; p4 = [5; -10];
f = @(x) 2*x(1)^2 + 12*x(1)*x(2)^2 - 32*x(1)*x(2) - 84*x(1) ...
    + 2*x(2)^6 - 8*x(2)^5 + 2*x(2)^4 - 80*x(2)^3 + 12*x(2)^2 ...
    + 864*x(2) + 1010;
g = @(x) [12*x(2)^2 - 32*x(2) + 4*x(1) - 84; 24*x(2) - 32*x(1) ...
    + 24*x(1)*x(2) - 240*x(2)^2 + 8*x(2)^3 - 40*x(2)^4 + 12*x(2)^5 + 864];
h = @(x) [4, 24*x(2) - 32; 24*x(2) - 32, 60*x(2)^4 - 160*x(2)^3 ...
    + 24*x(2)^2 - 480*x(2) + 24*x(1) + 24];

%% call func
% gradient_method_backtracking
[x_gb1, fun_val_gb1] = gradient_method_backtracking(f, g, p1, s, alpha, beta, epsilon);
[x_gb2, fun_val_gb2] = gradient_method_backtracking(f, g, p2, s, alpha, beta, epsilon);
[x_gb3, fun_val_gb3] = gradient_method_backtracking(f, g, p3, s, alpha, beta, epsilon);
[x_gb4, fun_val_gb4] = gradient_method_backtracking(f, g, p4, s, alpha, beta, epsilon);

% newton_backtracking
x_nb1 = newton_backtracking(f, g, h, p1, alpha, beta, epsilon);
x_nb2 = newton_backtracking(f, g, h, p2, alpha, beta, epsilon);
x_nb3 = newton_backtracking(f, g, h, p3, alpha, beta, epsilon);
x_nb4 = newton_backtracking(f, g, h, p4, alpha, beta, epsilon);

% newton_hybrid
x_nh1 = newton_hybrid(f, g, h, p1, alpha, beta, epsilon);
x_nh2 = newton_hybrid(f, g, h, p2, alpha, beta, epsilon);
x_nh3 = newton_hybrid(f, g, h, p3, alpha, beta, epsilon);
x_nh4 = newton_hybrid(f, g, h, p4, alpha, beta, epsilon);

function [x, fun_val] = gradient_method_backtracking(f, g, x0, s, alpha, beta, epsilon)
% Gradient method with backtracking stepsize rule
%
% INPUT
%=======================================
% f ......... objective function
% g ......... gradient of the objective function
% x0 ........ initial point
% s ......... initial choice of stepsize
% alpha ..... tolerance parameter for the stepsize selection
% beta ...... the constant in which the stepsize is multiplied
%             at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for stopping rule
% OUTPUT
%=======================================
% x ......... optimal solution (up to a tolerance) of min f(x)
% fun_val ... optimal function value
x = x0;
grad = g(x);
fun_val = f(x);
iter = 0;
while (norm(grad) > epsilon)
    iter = iter + 1;
    t = s;
    while (fun_val - f(x - t*grad) < alpha*t*norm(grad)^2)
        t = beta*t;
    end
    x = x - t*grad;
    fun_val = f(x);
    grad = g(x);
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f \n', ...
        iter, norm(grad), fun_val);
end
end

function x = newton_backtracking(f, g, h, x0, alpha, beta, epsilon)
% Newton's method with backtracking
%
% INPUT
%=======================================
% f ......... objective function
% g ......... gradient of the objective function
% h ......... hessian of the objective function
% x0 ........ initial point
% alpha ..... tolerance parameter for the stepsize selection strategy
% beta ...... the proportion in which the stepsize is multiplied
%             at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for stopping rule
% OUTPUT
%=======================================
% x ......... optimal solution (up to a tolerance) of min f(x)
x = x0;
gval = g(x);
hval = h(x);
d = hval \ gval;    % Newton direction (sign convention: x <- x - t*d)
iter = 0;
while ((norm(gval) > epsilon) && (iter < 10000))
    iter = iter + 1;
    t = 1;
    while (f(x - t*d) > f(x) - alpha*t*gval'*d)
        t = beta*t;
    end
    x = x - t*d;
    fprintf('iter= %2d f(x)=%10.10f\n', iter, f(x))
    gval = g(x);
    hval = h(x);
    d = hval \ gval;
end
if (iter == 10000)
    fprintf('did not converge\n')
end
end

function x = newton_hybrid(f, g, h, x0, alpha, beta, epsilon)
% Hybrid Newton's method
%
% INPUT
%=======================================
% f ......... objective function
% g ......... gradient of the objective function
% h ......... hessian of the objective function
% x0 ........ initial point
% alpha ..... tolerance parameter for the stepsize selection strategy
% beta ...... the proportion in which the stepsize is multiplied
%             at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for stopping rule
% OUTPUT
%=======================================
% x ......... optimal solution (up to a tolerance) of min f(x)
x = x0;
gval = g(x);
hval = h(x);
[L, p] = chol(hval, 'lower');
if (p == 0)
    d = L' \ (L \ gval);   % Hessian positive definite: Newton direction
else
    d = gval;              % otherwise fall back to the gradient direction
end
iter = 0;
while ((norm(gval) > epsilon) && (iter < 10000))
    iter = iter + 1;
    t = 1;
    while (f(x - t*d) > f(x) - alpha*t*gval'*d)
        t = beta*t;
    end
    x = x - t*d;
    fprintf('iter= %2d f(x)=%10.10f\n', iter, f(x))
    gval = g(x);
    hval = h(x);
    [L, p] = chol(hval, 'lower');
    if (p == 0)
        d = L' \ (L \ gval);
    else
        d = gval;
    end
end
if (iter == 10000)
    fprintf('did not converge\n')
end
end
```

Results and analysis
Starting points: $p_1=(-50,7)^T$, $p_2=(20,7)^T$, $p_3=(20,-18)^T$, $p_4=(5,-10)^T$.
gradient_method_backtracking
| Starting point | Converged point | Function value | Iterations |
| --- | --- | --- | --- |
| p1 | 5.0, 4.0 | 0.000000 | - |
| p2 | 5.0, 4.0 | 0.000000 | - |
| p3 | 11.4128, -0.8968 | 48.984254 | - |
| p4 | 5.0, 4.0 | 0.000107 | - |

(- : the stopping criterion $\|\nabla f(x)\|\le 10^{-5}$ was never reached; see the analysis below.)
newton_backtracking
| Starting point | Converged point | Function value | Iterations |
| --- | --- | --- | --- |
| p1 | 5.0, 4.0 | -0.000000 | 8 |
| p2 | 5.0, 4.0 | 0.000000 | 11 |
| p3 | 11.4128, -0.8968 | 48.984254 | 16 |
| p4 | 11.4128, -0.8968 | 48.984254 | 13 |
newton_hybrid
| Starting point | Converged point | Function value | Iterations |
| --- | --- | --- | --- |
| p1 | 5.0, 4.0 | -0.000000 | 8 |
| p2 | 5.0, 4.0 | 0.000000 | 11 |
| p3 | 11.4128, -0.8968 | 48.984254 | 16 |
| p4 | 11.4128, -0.8968 | 48.984254 | 13 |
Analysis
Gradient descent with backtracking (gradient_method_backtracking) performs poorly on $f(x)$: from every starting point (p1–p4) it fails to satisfy the stopping criterion $\|\nabla f(x)\|\le\varepsilon$, $\varepsilon=10^{-5}$, so its iteration count grows without bound. Nevertheless, as the iterations proceed, both the iterate $\bar{x}$ and the function value settle down. From p1, p2, and p4, the method approaches $\bar{x}\to(5.0,4.0)$ with fun_val $\to 0.0$, which is clearly the global minimum of $f(x)$; from p3 it approaches the local minimum instead. Moreover, from p4 the function value only reaches fun_val $\to 0.000107$, noticeably worse than from p1 and p2. In short, although the gradient iterates approach a (global or local) minimizer of $f(x)$, the gradient norm $\|\nabla f(\bar{x})\|$ decreases too slowly to reach the $10^{-5}$ tolerance, which is why the method never terminates.
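Since the gradient routine above has no iteration cap, its main loop runs indefinitely when the tolerance is never reached. One hedged way to make the runs terminate is to replace that loop with a capped variant, mirroring the maxiter guard already used in the two Newton routines; the 10000 limit below is an assumption, not part of the original code, and the surrounding variables (x, grad, fun_val, iter, s, alpha, beta, epsilon, f, g) are assumed to be initialized as in gradient_method_backtracking:

```matlab
% Hypothetical variant of the main loop with an explicit iteration cap.
maxiter = 10000;                          % assumed cap, mirroring the Newton routines
while (norm(grad) > epsilon) && (iter < maxiter)
    iter = iter + 1;
    t = s;
    while (fun_val - f(x - t*grad) < alpha*t*norm(grad)^2)
        t = beta*t;                       % backtracking step-size shrink
    end
    x = x - t*grad;
    fun_val = f(x);
    grad = g(x);
end
if iter == maxiter
    fprintf('gradient method did not converge within %d iterations\n', maxiter);
end
```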
Backtracking Newton (newton_backtracking) and hybrid Newton (newton_hybrid) give almost identical results, and both converge well from all four starting points. From p1 and p2, both reach the global minimum $(5.0,4.0)$ in 8 and 11 iterations, respectively; from p3 and p4, both reach the local minimum $(11.4128,-0.8968)$ in 16 and 13 iterations, respectively. Comparing the two methods' outputs at the four starting points, the iterates are exactly the same, which indicates that $\nabla^2 f(x_k)$ was positive definite at every iterate encountered: the gradient-descent fallback of the hybrid method was therefore never triggered, and hybrid Newton reduces to backtracking Newton on these runs.
In summary, with the stopping criterion $\|\nabla f(x)\|\le\varepsilon$, $\varepsilon=10^{-5}$, the Newton-type methods need far fewer iterations on $f(x)$ than gradient descent. For all of the methods, a different starting point may lead to convergence to a point other than the global minimum.
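As a final check, one can evaluate $f$ and the gradient norm at the two reported minimizers. A short self-contained sketch: f is written here in its original (unexpanded) form, and the local-minimizer coordinates are the rounded values from the tables, so its gradient norm is only approximately zero:

```matlab
% Evaluate the objective and gradient norm at the two reported minimizers.
f = @(x) (-13 + x(1) + ((5 - x(2))*x(2) - 2)*x(2))^2 ...
       + (-29 + x(1) + ((x(2) + 1)*x(2) - 14)*x(2))^2;   % f = f1^2 + f2^2
g = @(x) [12*x(2)^2 - 32*x(2) + 4*x(1) - 84; 24*x(2) - 32*x(1) ...
    + 24*x(1)*x(2) - 240*x(2)^2 + 8*x(2)^3 - 40*x(2)^4 + 12*x(2)^5 + 864];

x_global = [5; 4];                    % f = 0 exactly, since f1 = f2 = 0 here
x_local  = [11.4128; -0.8968];        % rounded coordinates from the result tables
fprintf('f(global) = %g, ||g(global)|| = %g\n', f(x_global), norm(g(x_global)));
fprintf('f(local)  = %g, ||g(local)||  = %g\n', f(x_local),  norm(g(x_local)));
```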
Sylvan Ding