Paper Review: Bayesian Shrinkage towards Sharp Minimaxity
Motivation and Conclusion
Sparse normal mean model (in general $\epsilon \sim N(0, \sigma^2 I_n)$, but here we set $\sigma^2 = 1$):
$$y = \theta + \epsilon, \quad \epsilon \sim N(0, I_n)$$
A general form of shrinkage prior:
$$\pi(\theta \mid \tau) = \prod_{i=1}^n \frac{1}{\tau}\pi_0\left( \frac{\theta_i}{\tau} \right), \quad \tau \sim \pi(\tau)$$
If $\pi_0$ is a scale mixture of Gaussians, the general form leads to local-global shrinkage:
$$\theta_i \sim N(0, \lambda_i^2\tau^2), \quad \lambda_i^2 \sim \pi(\lambda_i^2), \quad \tau \sim \pi(\tau)$$
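As a minimal sketch of the two displays above, the following draws data from the sparse normal mean model and draws one sample from a local-global shrinkage prior. The half-Cauchy distribution for $\lambda_i$ (the horseshoe prior) is an assumed example of a polynomially decaying $\pi_0$, not a choice made in these notes; the function names, the signal size, and the value of $\tau$ are likewise hypothetical.

```python
import numpy as np


def simulate_sparse_means(n, s, signal, rng):
    """Draw y = theta_star + eps with eps ~ N(0, I_n) and s nonzero means."""
    theta_star = np.zeros(n)
    theta_star[:s] = signal  # illustrative: all nonzero entries share one value
    y = theta_star + rng.standard_normal(n)
    return theta_star, y


def sample_local_global_prior(n, tau, rng):
    """Draw theta_i ~ N(0, lambda_i^2 tau^2) with half-Cauchy lambda_i.

    The half-Cauchy local scale (horseshoe) is an assumption made for
    illustration only.
    """
    lam = np.abs(rng.standard_cauchy(n))      # local scales lambda_i
    return rng.standard_normal(n) * lam * tau  # global scale tau multiplies every entry


rng = np.random.default_rng(0)
theta_star, y = simulate_sparse_means(n=1000, s=10, signal=5.0, rng=rng)
theta_prior = sample_local_global_prior(n=1000, tau=0.01, rng=rng)
print("nonzero true means:", np.count_nonzero(theta_star))
print("median |prior draw|:", np.median(np.abs(theta_prior)))
```

A small $\tau$ pulls most prior draws close to zero, while the heavy-tailed local scales still allow occasional large entries.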
Observation: about contraction rate
Let $\theta^*$ be the true parameter, $s$ the number of nonzero entries in $\theta^*$, and $r_n$ the contraction rate.
Frequentist:
$$\min_{\hat \theta} \max_{\theta^*} \left\| \hat \theta - \theta^* \right\| = \sqrt{(2+o(1))\, s\log \frac{n}{s}}$$
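To get a feel for the size of the frequentist benchmark, the leading term $\sqrt{2 s \log(n/s)}$ can be evaluated directly. This is only a sketch: the $o(1)$ term is dropped, and the function name and the $(n, s)$ pairs are hypothetical.

```python
import math


def minimax_l2_rate(n, s):
    """Leading-order minimax L2 rate sqrt(2 s log(n/s)), ignoring the o(1) term."""
    if not 0 < s < n:
        raise ValueError("need 0 < s < n")
    return math.sqrt(2.0 * s * math.log(n / s))


for n, s in [(1_000, 10), (100_000, 10), (100_000, 1_000)]:
    print(f"n={n}, s={s}: rate ~ {minimax_l2_rate(n, s):.3f}")
```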
Bayesian:
Further questions: How does the order of polynomial decay of $\pi_0$ affect the contraction rate? How should $\tau$ be chosen to achieve a (near-)optimal contraction rate for a given polynomially decaying $\pi_0$?
Contribution of this paper:
Questions not covered
Bayesian sharp minimaxity
Important conditions on model sparsity and $\pi_0$
For simplicity, $\tau$ is a deterministic value and the $\theta_i$ are mutually independent.
Remark 1: Conditions for $\tau$,
These conditions imply $\alpha \in (1, 1+w/2)$, and $w$ should be as small as possible.
Remark 2: $E^*$ is the expectation under the true parameter $\theta^*$. The theoretical results indicate that the $L_2$ contraction rate is at most $O(\sqrt{s\log (n/s)})$ and the $L_1$ contraction rate is at most $O(s\sqrt{\log (n/s)})$.
Remark 3: Note that $\log(n/s) \prec (n/s)^c$ for some $c>0$. This observation leads to Corollary 2.1, which unifies (2.1) and (2.2).
Remark 4: Corollary 2.1 indicates $\tau \asymp (s/n)^{c/(\alpha-1)}$. Select $c=\alpha+\delta$ for a very small $\delta>0$, so a good choice would be $\tau \asymp (s/n)^{(\alpha+\delta)/(\alpha-1)}$. However, $s$ is unknown. An alternative is $\tau \asymp (1/n)^{(\alpha+\delta)/(\alpha-1)}$. Theorem 2.2 considers the properties of this alternative.
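The two choices of $\tau$ in Remark 4 can be compared numerically. The sketch below takes the proportionality constant in $\asymp$ to be 1, which is an assumption, and treats $\alpha$ as the polynomial decay order of $\pi_0$; the function names and the values of $n$, $s$, $\alpha$, and $\delta$ are hypothetical.

```python
def tau_oracle(n, s, alpha, delta=0.01):
    """tau = (s/n)^((alpha + delta)/(alpha - 1)); requires the unknown sparsity s."""
    if alpha <= 1:
        raise ValueError("alpha must exceed 1")
    return (s / n) ** ((alpha + delta) / (alpha - 1.0))


def tau_unknown_s(n, alpha, delta=0.01):
    """tau = (1/n)^((alpha + delta)/(alpha - 1)); usable when s is unknown."""
    if alpha <= 1:
        raise ValueError("alpha must exceed 1")
    return (1.0 / n) ** ((alpha + delta) / (alpha - 1.0))


n, s, alpha = 10_000, 50, 1.5
print("oracle tau:   ", tau_oracle(n, s, alpha))
print("s-free tau:   ", tau_unknown_s(n, alpha))
```

Since $1/n \le s/n$ for $s \ge 1$, the $s$-free choice is never larger than the oracle choice, so it shrinks at least as aggressively.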
Remark 5: Conditions for $\tau$,
The theoretical results indicate that the $L_2$ contraction rate is at most $O(\sqrt{s\log n})$ (sub-optimal) and the $L_1$ contraction rate is at most $O(s\sqrt{\log n})$. If $\log(s) \prec \log(n)$, the sub-optimal rate is asymptotically indistinguishable from the optimal rate. If $s \asymp n^c$ with $c \in (0,1)$, the sub-optimal rate has the same order as the optimal rate. If $\log(s) \sim \log(n)$, the sub-optimal rate is of strictly greater order.
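The three regimes follow from the ratio of the two $L_2$ rates, $\sqrt{s\log n}/\sqrt{s\log(n/s)} = \sqrt{\log n/\log(n/s)}$, which can be checked numerically. In this sketch the function name and the specific sparsity sequences ($s = 10$, $s = \sqrt{n}$, $s = n/\log n$) are illustrative assumptions, one per regime.

```python
import math


def suboptimal_to_optimal_ratio(n, s):
    """Ratio sqrt(s log n) / sqrt(s log(n/s)) = sqrt(log n / log(n/s))."""
    return math.sqrt(math.log(n) / math.log(n / s))


for n in (10**4, 10**8, 10**16):
    fixed_s = suboptimal_to_optimal_ratio(n, 10)                # log s << log n
    power_s = suboptimal_to_optimal_ratio(n, n**0.5)            # s = n^c with c = 1/2
    near_n = suboptimal_to_optimal_ratio(n, n / math.log(n))    # log s ~ log n
    print(f"n=1e{round(math.log10(n))}: fixed s {fixed_s:.3f}, "
          f"s=sqrt(n) {power_s:.3f}, s=n/log n {near_n:.3f}")
```

For fixed $s$ the ratio drifts towards 1, for $s = n^{c}$ it stays at the constant $\sqrt{1/(1-c)}$, and for $s = n/\log n$ it keeps growing with $n$.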
Remark 6: The theorems above are derived for a deterministic $\tau$. Now consider a prior $\pi(\tau)$. It should concentrate near zero, but not too quickly, because $\pi(\tau)$ still needs to assign some density around $(s/n)^{(\alpha+\delta)/(\alpha-1)}$. Theorem 3.1 provides sufficient conditions on $\pi(\tau)$ to guarantee (2.1) and (2.2).
Remark 7: The prior mass of $\tau$ is split into three parts: around zero, from $(s/n)^{(1+w/2)/(\alpha-1)}$ to $(s/n)^{\alpha/(\alpha-1)}$, and greater than $(s/n)^{\alpha/(\alpha-1)}$. The first part carries most of the mass and the second part carries only a small amount. The mass of the third part is assumed to decay to zero.
Remark 8: A possible choice of $\pi(\tau)$ is based on the Beta distribution (and may be multi-modal), i.e. $\tau \sim [\mathrm{Beta}(1,n)]^c$ with $c \in (\alpha/(\alpha-1), (1+w/2)/(\alpha-1))$.
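Sampling from the prior in Remark 8 is straightforward: draw $B \sim \mathrm{Beta}(1,n)$ and set $\tau = B^c$. The sketch below is illustrative only; the function names are hypothetical, and the values $\alpha = 1.5$ and $w = 2$ are assumed purely so that the interval for $c$ is non-empty (they are not recommended in the paper or in these notes).

```python
import numpy as np


def c_interval(alpha, w):
    """Open interval (alpha/(alpha-1), (1+w/2)/(alpha-1)) for the exponent c."""
    if not 1.0 < alpha < 1.0 + w / 2.0:
        raise ValueError("need alpha in (1, 1 + w/2)")
    return alpha / (alpha - 1.0), (1.0 + w / 2.0) / (alpha - 1.0)


def sample_tau_beta_power(n, c, size, rng):
    """Draw tau = B^c with B ~ Beta(1, n)."""
    return rng.beta(1.0, n, size=size) ** c


alpha, w = 1.5, 2.0            # assumed values, for illustration only
lo, hi = c_interval(alpha, w)  # (3.0, 4.0) for these values
c = 0.5 * (lo + hi)            # midpoint of the admissible interval

rng = np.random.default_rng(1)
taus = sample_tau_beta_power(n=1000, c=c, size=5000, rng=rng)
print("admissible c interval:", (lo, hi))
print("median tau:", np.median(taus))
```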
Remark 9: Note that the restriction on $\theta^*$ is a technical assumption. Without this assumption, it is still possible to achieve the sub-optimal rate. See Theorem 3.2.