
Paper Review: Bayesian Shrinkage towards Sharp Minimaxity

Published: 2025/4/14 | Programming Q&A | 35 | 豆豆

This article, collected and compiled by 生活随笔, introduces Paper Review: Bayesian Shrinkage towards Sharp Minimaxity. The editor found it quite good and shares it here for reference.


Motivation and Conclusion

Sparse normal means model (in general $\epsilon \sim N(0,\sigma^2 I_n)$, but here set $\sigma^2=1$):
$y = \theta+\epsilon,\ \epsilon \sim N(0,I_n)$
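To make the setting concrete, here is a minimal simulation sketch of the sparse normal means model with $\sigma^2=1$. The function name, the placement of the support on the first $s$ coordinates, and the common signal size are hypothetical choices for illustration only.

```python
import random

def simulate_sparse_normal_means(n, s, signal, seed=0):
    """Draw y = theta* + eps with eps ~ N(0, I_n) and an s-sparse theta*.

    Hypothetical illustration: the support is the first s coordinates and
    every nonzero entry equals `signal`.
    """
    rng = random.Random(seed)
    theta_star = [signal if i < s else 0.0 for i in range(n)]
    # Add independent standard normal noise to each coordinate
    y = [t + rng.gauss(0.0, 1.0) for t in theta_star]
    return theta_star, y

theta_star, y = simulate_sparse_normal_means(n=1000, s=10, signal=5.0)
print(sum(t != 0 for t in theta_star), len(y))  # 10 1000
```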

A general form of shrinkage prior:
$\pi(\theta|\tau) = \prod_{i=1}^n \frac{1}{\tau}\pi_0\left( \frac{\theta_i}{\tau} \right),\ \tau \sim \pi(\tau)$
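The product form says that, given $\tau$, each $\theta_i$ is an independent draw from $\pi_0$ rescaled by $\tau$. A minimal sketch of the resulting log-density, assuming (hypothetically) a standard Cauchy $\pi_0$, which has a polynomial tail of order $|x|^{-2}$; any other log-density can be passed as `log_pi0`.

```python
import math

def log_cauchy(x):
    # Standard Cauchy log-density: a hypothetical polynomially-tailed pi_0
    return -math.log(math.pi) - math.log1p(x * x)

def log_prior_given_tau(theta, tau, log_pi0=log_cauchy):
    """log pi(theta | tau) = sum_i [ -log(tau) + log pi_0(theta_i / tau) ]."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    return sum(-math.log(tau) + log_pi0(t / tau) for t in theta)

# A smaller tau puts more density near zero and less on large entries
print(log_prior_given_tau([0.0], 0.1), log_prior_given_tau([0.0], 1.0))
print(log_prior_given_tau([10.0], 0.1), log_prior_given_tau([10.0], 1.0))
```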

If $\pi_0$ is a scale mixture of Gaussians, the general form leads to local-global shrinkage:
$\theta_i \sim N(0,\lambda_i^2\tau^2),\ \lambda_i^2 \sim \pi(\lambda_i^2),\ \tau \sim \pi(\tau)$
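As an illustration of the local-global form, the sketch below draws $\theta$ with half-Cauchy local scales $\lambda_i$ (the horseshoe choice) and a fixed global scale $\tau$. The fixed $\tau = 0.01$, the sample size, and the function name are assumptions for illustration; the output shows most coordinates shrunk near zero while a few stay large.

```python
import math
import random

def sample_horseshoe_like(n, tau, seed=0):
    """Draw theta_i ~ N(0, lambda_i^2 tau^2) with lambda_i ~ half-Cauchy(0, 1)."""
    rng = random.Random(seed)
    theta = []
    for _ in range(n):
        # Inverse-CDF draw: the half-Cauchy CDF is (2/pi) * arctan(x)
        lam = math.tan(math.pi * rng.random() / 2.0)
        theta.append(rng.gauss(0.0, lam * tau))
    return theta

theta = sample_horseshoe_like(n=2000, tau=0.01, seed=1)
small = sum(abs(t) < 0.1 for t in theta) / len(theta)
print(f"share with |theta_i| < 0.1: {small:.2f}, max |theta_i|: {max(map(abs, theta)):.1f}")
```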

Observation about the contraction rate:
Let $\theta^*$ be the true parameter, $s$ the number of nonzero entries in $\theta^*$, and $r_n$ the contraction rate.

Frequentist:
$\min_{\hat \theta} \max_{\|\theta^*\|_0 \le s} \left\| \hat \theta - \theta^* \right\|=\sqrt{(2+o(1))s\log \frac{n}{s}}$
Bayesian:

Dirichlet-Laplace prior: $r_n\asymp\sqrt{s\log \frac{n}{s}}$ when $\left\| \theta^*\right\| \le \sqrt{s} \log^2\frac{n}{s}$

Horseshoe prior: $r_n=M_n\sqrt{s\log \frac{n}{s}}$ for any $M_n \to \infty$

In general, a polynomially decaying $\pi_0$ leads to a near-optimal rate.
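For a sense of scale, the quantities above can be evaluated numerically. This sketch computes the leading term $\sqrt{2s\log(n/s)}$ of the sharp minimax rate (dropping the $o(1)$ term) and the signal-size bound $\sqrt{s}\log^2(n/s)$ from the Dirichlet-Laplace result; the $(n, s)$ pairs are arbitrary examples.

```python
import math

def minimax_l2_benchmark(n, s):
    """Leading term sqrt(2 s log(n/s)) of the sharp minimax L2 rate (o(1) dropped)."""
    return math.sqrt(2 * s * math.log(n / s))

def dl_signal_bound(n, s):
    """Bound sqrt(s) * log^2(n/s) on ||theta*|| quoted for the Dirichlet-Laplace result."""
    return math.sqrt(s) * math.log(n / s) ** 2

for n, s in [(1000, 10), (10**6, 100)]:
    print(n, s, round(minimax_l2_benchmark(n, s), 3), round(dl_signal_bound(n, s), 1))
```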

Further questions: How does the order of polynomial decay of $\pi_0$ affect the contraction rate? Given a polynomially decaying $\pi_0$, how should $\tau$ be chosen to achieve the (near-)optimal contraction rate?

Contribution of this paper:

If the polynomial decay order of $\pi_0$, say $\alpha$, is close to 1, then $r_n/\sqrt{2s\log\frac{n}{s}} \approx 1$ (Bayesian sharp minimaxity, Thm 2.1).

Choosing $\tau$ requires knowledge of $s/n$, so the author proposed a Beta-based prior on $\tau$ to avoid relying on this unknown quantity.

Questions not covered:

How does $r_n$ change with $\alpha_n$ when $\lim_n \alpha_n = 1$?

What happens when $\alpha = 1$ (where Thm 2.1 breaks down)?

Beyond the contraction rate, how does $\alpha$ affect model selection?

How does $\alpha$ affect the contraction rate in the linear regression setting?

Bayesian sharp minimaxity

Important conditions are imposed on model sparsity and on $\pi_0$.

For simplicity, $\tau$ is taken to be deterministic and the $\theta_i$ are mutually independent.

Remark 1: Conditions for $\tau$:

$\tau^{\alpha-1}\ge (s/n)^c\sqrt{\log (n/s)}$ for some $c \in (0,1+w/2)$: $\tau$ cannot be too small, or $\theta$ will be over-shrunk.

$\tau^{\alpha-1}\prec (s/n)^{\alpha}[\log (n/s)]^{\alpha}$: $\tau$ cannot be too large, or $\theta$ will be insufficiently shrunk.

$\tau^{\alpha-1}\prec (s/n)^{\alpha}[\log (n/s)]^{(1+\alpha)/2}$: this is the condition for the $L_1$ contraction rate.

These conditions indicate $\alpha \in (1,1+w/2)$, and $w$ should be as small as possible.
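Since $\prec$ is an asymptotic statement, the conditions in Remark 1 cannot be verified at a single finite $n$. The sketch below is only a rough proxy, assuming unit constants and replacing $\prec$ with a strict inequality; the values $\alpha = 1.5$, $c = 1.9$ (inside $(0, 1+w/2)$ for a hypothetical $w = 2$), $n = 1000$ and $s = 10$ are illustrative, and the function name is made up.

```python
import math

def remark1_checks(tau, n, s, alpha, c):
    """Finite-n proxy for the Remark 1 conditions (unit constants assumed)."""
    r = s / n
    L = math.log(n / s)
    t = tau ** (alpha - 1)
    return {
        # lower bound: tau not too small (avoid over-shrinking)
        "not_too_small": t >= r ** c * math.sqrt(L),
        # upper bound for the L2 result: tau not too large
        "not_too_large_l2": t < r ** alpha * L ** alpha,
        # upper bound for the L1 result
        "not_too_large_l1": t < r ** alpha * L ** ((1 + alpha) / 2),
    }

# tau = (s/n)^{(alpha+delta)/(alpha-1)} with delta = 0.1, i.e. 0.01 ** 3.2
print(remark1_checks(tau=0.01 ** 3.2, n=1000, s=10, alpha=1.5, c=1.9))
print(remark1_checks(tau=1e-12, n=1000, s=10, alpha=1.5, c=1.9))  # tau too small
print(remark1_checks(tau=1e-2, n=1000, s=10, alpha=1.5, c=1.9))   # tau too large
```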

Remark 2: $E^*$ denotes the expectation under the true parameter $\theta^*$. The theoretical results indicate that the $L_2$ contraction rate is at most $O(\sqrt{s\log (n/s)})$ and the $L_1$ contraction rate is at most $O(s\sqrt{\log (n/s)})$.

Remark 3: Note that $\log(n/s)\prec (n/s)^c$ for every $c>0$. This observation leads to Corollary 2.1, which unifies (2.1) and (2.2).
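The observation in Remark 3 is the standard fact that logarithms grow slower than any power. A quick numeric sketch (with an arbitrary $c = 0.1$) shows that the ratio $\log(x)/x^c$ does go to zero, but only after peaking at $x = e^{1/c}$, so for small $c$ the domination sets in late.

```python
import math

def log_over_power(x, c):
    """Ratio log(x) / x**c; it tends to 0 as x grows, for any fixed c > 0."""
    return math.log(x) / x ** c

c = 0.1
for x in (1e3, math.exp(1 / c), 1e10, 1e50, 1e100):
    print(f"x = {x:.3g}: log(x) / x^{c} = {log_over_power(x, c):.3g}")
```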

Remark 4: Corollary 2.1 indicates $\tau \asymp (s/n)^{c/(\alpha-1)}$. Select $c=\alpha+\delta$ for a very small $\delta>0$; a good choice would then be $\tau \asymp (s/n)^{(\alpha+\delta)/(\alpha-1)}$. However, $s$ is unknown. An alternative is $\tau \asymp (1/n)^{(\alpha+\delta)/(\alpha-1)}$, and Theorem 2.2 studies the properties of this alternative.
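The two choices of $\tau$ in Remark 4 can be compared numerically. Because the exponent $(\alpha+\delta)/(\alpha-1)$ blows up as $\alpha \to 1$, this sketch works with $\log\tau$ rather than $\tau$ to avoid floating-point underflow; the function names and parameter values are hypothetical.

```python
import math

def log_tau_oracle(n, s, alpha, delta):
    """log of tau = (s/n)^{(alpha+delta)/(alpha-1)}; needs the unknown sparsity s."""
    if not (alpha > 1 and delta > 0):
        raise ValueError("need alpha > 1 and delta > 0")
    return (alpha + delta) / (alpha - 1) * math.log(s / n)

def log_tau_s_free(n, alpha, delta):
    """log of the s-free alternative tau = (1/n)^{(alpha+delta)/(alpha-1)}."""
    return log_tau_oracle(n, 1, alpha, delta)

for alpha in (1.5, 1.1, 1.01):
    lo = log_tau_oracle(1000, 10, alpha, 0.01) / math.log(10)
    ls = log_tau_s_free(1000, alpha, 0.01) / math.log(10)
    print(f"alpha={alpha}: log10 tau_oracle={lo:.1f}, log10 tau_s_free={ls:.1f}")
```

For $\alpha = 1.01$ the $s$-free choice is around $10^{-306}$, close to the smallest positive normal double, which is why the log scale is used here.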

Remark 5: Conditions for $\tau$ (the lower bound of Remark 1 with $s$ replaced by 1 in $(s/n)^c$):

$\tau^{\alpha-1}\ge (1/n)^c\sqrt{\log (n/s)}$

$\tau^{\alpha-1}\prec (s/n)^{\alpha}[\log (n/s)]^{(1+\alpha)/2}$

The theoretical results indicate that the $L_2$ contraction rate is at most $O(\sqrt{s\log n})$ (sub-optimal) and the $L_1$ contraction rate is at most $O(s\sqrt{\log n})$. If $\log(s) \prec \log(n)$, the sub-optimal rate is asymptotically no different from the optimal one. If $s \asymp n^c$ with $c \in (0,1)$, the sub-optimal rate has the same order as the optimal one. If $\log(s) \sim \log(n)$, the sub-optimal rate is of greater order.
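The three regimes come down to the ratio $\sqrt{\log n / \log(n/s)}$ between the $s$-free rate and the optimal $L_2$ rate. This sketch evaluates it for $s = \log n$, $s = \sqrt{n}$ and $s = n/\log n$, treating $s$ as a real number for simplicity; the regime choices and values of $n$ are illustrative.

```python
import math

def rate_inflation(n, s):
    """Ratio sqrt(s log n) / sqrt(s log(n/s)) of the s-free to the optimal L2 rate."""
    return math.sqrt(math.log(n) / math.log(n / s))

for n in (1e6, 1e12, 1e24):
    regimes = {
        "s = log n": math.log(n),        # log s << log n
        "s = sqrt(n)": math.sqrt(n),     # s = n^c with c = 1/2
        "s = n / log n": n / math.log(n),  # log s ~ log n
    }
    print(f"{n:.0e}", {k: round(rate_inflation(n, s), 3) for k, s in regimes.items()})
```

The ratio drifts toward 1 in the first regime, stays at $\sqrt{2}$ (that is, $1/\sqrt{1-c}$ with $c = 1/2$) in the second, and grows in the third, matching the three cases above.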

Remark 6: The theorems above are derived for a deterministic $\tau$. Now consider a prior $\pi(\tau)$. $\pi(\tau)$ should concentrate near zero, but not too fast, because it needs to assign some mass around $(s/n)^{(\alpha+\delta)/(\alpha-1)}$. Theorem 3.1 provides sufficient conditions on $\pi(\tau)$ that guarantee (2.1) and (2.2).

Remark 7: The prior mass of $\tau$ is split into three parts: from zero up to $(s/n)^{(1+w/2)/(\alpha-1)}$; from $(s/n)^{(1+w/2)/(\alpha-1)}$ to $(s/n)^{\alpha/(\alpha-1)}$; and above $(s/n)^{\alpha/(\alpha-1)}$. The first part carries most of the mass and the second part only a small amount; the mass of the third part is assumed to decay to zero.

Remark 8: A possible choice of $\pi(\tau)$ is based on the Beta distribution (which may be multi-modal), i.e. $\tau \sim [Beta(1,n)]^c$ with $c \in (\alpha/(\alpha-1),(1+w/2)/(\alpha-1))$.
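For this prior, the three-part split of Remark 7 can be computed in closed form: if $\tau = X^c$ with $X \sim Beta(1,n)$, then $P(\tau > t) = (1-t^{1/c})^n$ for $t \in [0,1]$. The sketch below uses hypothetical values $n = 1000$, $s = 10$, $\alpha = 1.5$, $w = 2$ and $c = 3.5$, which lies in the required interval $(3, 4)$; the function names are made up for illustration.

```python
import math

def beta1n_power_survival(t, n, c):
    """P(tau > t) for tau = X**c, X ~ Beta(1, n): equals (1 - t**(1/c))**n."""
    return (1.0 - t ** (1.0 / c)) ** n

def region_masses(n, s, alpha, w, c):
    """Prior mass of tau in the three parts described in Remark 7."""
    if not (alpha / (alpha - 1) < c < (1 + w / 2) / (alpha - 1)):
        raise ValueError("c must lie in (alpha/(alpha-1), (1+w/2)/(alpha-1))")
    r = s / n
    lower = r ** ((1 + w / 2) / (alpha - 1))  # (s/n)^{(1+w/2)/(alpha-1)}
    upper = r ** (alpha / (alpha - 1))        # (s/n)^{alpha/(alpha-1)}
    surv_lower = beta1n_power_survival(lower, n, c)
    surv_upper = beta1n_power_survival(upper, n, c)
    return {
        "near_zero": 1.0 - surv_lower,
        "middle": surv_lower - surv_upper,
        "upper_tail": surv_upper,
    }

masses = region_masses(n=1000, s=10, alpha=1.5, w=2.0, c=3.5)
print(masses)
```

With these values, nearly all of the mass sits in the first part, under 1% in the second, and the third is negligible; a Monte Carlo check with `random.betavariate(1, n) ** c` should agree.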

Remark 9: Note that the restriction on $\theta^*$ is a technical assumption. Without it, a sub-optimal rate can still be achieved; see Theorem 3.2.
