Paper Review: Bayesian Shrinkage towards Sharp Minimaxity

Published: 2025/4/14

Motivation and Conclusion

Sparse normal means model (in general $\epsilon \sim N(0, \sigma^2 I_n)$, but here we set $\sigma^2 = 1$):
$$y = \theta + \epsilon, \quad \epsilon \sim N(0, I_n)$$
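To make the model concrete, here is a minimal simulation sketch using only the Python standard library. The function name `simulate_sparse_means`, the layout of the nonzero entries, and the signal size are illustrative assumptions, not part of the paper.

```python
import random


def simulate_sparse_means(n, s, signal, seed=0):
    """Draw y = theta* + eps with eps ~ N(0, I_n) and exactly s nonzero means."""
    rng = random.Random(seed)
    # Illustrative truth: the first s coordinates equal `signal`, the rest are zero.
    theta_star = [signal] * s + [0.0] * (n - s)
    y = [t + rng.gauss(0.0, 1.0) for t in theta_star]
    return theta_star, y


theta_star, y = simulate_sparse_means(n=1000, s=10, signal=5.0)
nonzero = sum(1 for t in theta_star if t != 0.0)
print(f"n = {len(y)}, s = {nonzero}")
```

Here `s` plays the role of the sparsity level of the true mean vector, which is the quantity the contraction rates in the rest of the review depend on.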

A general form of shrinkage prior:
$$\pi(\theta \mid \tau) = \prod_{i=1}^n \frac{1}{\tau}\pi_0\left(\frac{\theta_i}{\tau}\right), \quad \tau \sim \pi(\tau)$$

If $\pi_0$ is a scale mixture of Gaussians, the general form leads to local-global shrinkage:
$$\theta_i \sim N(0, \lambda_i^2\tau^2), \quad \lambda_i^2 \sim \pi(\lambda_i^2), \quad \tau \sim \pi(\tau)$$
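As a rough illustration of the local-global structure, the sketch below draws from the prior with half-Cauchy local scales, which is the horseshoe choice. That specific choice of local prior, the function name `sample_local_global`, and the values of the global scale are assumptions made for the example; the review itself leaves the local prior generic.

```python
import math
import random
import statistics


def sample_local_global(n, tau, seed=0):
    """Draw theta_i ~ N(0, lambda_i^2 tau^2) with half-Cauchy local scales lambda_i."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # Half-Cauchy(0, 1) draw via the inverse CDF of a standard Cauchy.
        lam = abs(math.tan(math.pi * (rng.random() - 0.5)))
        draws.append(rng.gauss(0.0, lam * tau))
    return draws


for tau in (1.0, 0.01):
    theta = sample_local_global(n=5000, tau=tau)
    med = statistics.median(abs(t) for t in theta)
    print(f"tau = {tau}: median |theta_i| = {med:.4g}, max |theta_i| = {max(map(abs, theta)):.4g}")
```

A small global scale pulls most draws toward zero, while the heavy-tailed local scales still allow occasional large values. This is the mechanism that the conditions on the global scale later in the review try to balance.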

Observation: about contraction rate
Let $\theta^*$ be the true parameter, $s$ the number of nonzero entries of $\theta^*$, and $r_n$ the posterior contraction rate.

Frequentist:
$$\min_{\hat\theta}\max_{\theta^*}\left\|\hat\theta - \theta^*\right\| = \sqrt{(2+o(1))\, s\log\frac{n}{s}}$$
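To get a feel for the size of the frequentist benchmark, the following sketch evaluates the leading-order term of the minimax rate, dropping the $o(1)$ correction. The helper name `minimax_l2_rate` and the example values of $n$ and $s$ are illustrative assumptions.

```python
import math


def minimax_l2_rate(n, s):
    """Leading-order minimax L2 rate sqrt(2 s log(n/s)), dropping the o(1) term."""
    return math.sqrt(2.0 * s * math.log(n / s))


for n, s in [(1000, 10), (10**6, 100)]:
    print(f"n = {n}, s = {s}: rate ~ {minimax_l2_rate(n, s):.3f}")
```

The constant 2 inside the square root is what "sharp" minimaxity refers to: a Bayesian procedure that only matches the order of the rate, without matching this constant, is rate-optimal but not sharp.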
Bayesian:

  • Dirichlet-Laplace prior: $r_n \asymp \sqrt{s\log\frac{n}{s}}$ when $\left\|\theta^*\right\| \le \sqrt{s}\log^2\frac{n}{s}$
  • Horseshoe prior: $r_n = M_n\sqrt{s\log\frac{n}{s}}$, where $M_n \to \infty$
  • In general, a polynomially decaying $\pi_0$ leads to a near-optimal rate
  • Further questions: How does the polynomial order of the tail of $\pi_0$ affect the contraction rate? Given a polynomially decaying $\pi_0$, how should $\tau$ be chosen to achieve the (near-)optimal contraction rate?

Contribution of this paper:

  • If the polynomial order $\alpha$ of $\pi_0$ is close to 1 ($\alpha \approx 1$), then $r_n/\sqrt{2s\log\frac{n}{s}} \approx 1$ (Bayesian sharp minimaxity, Thm 2.1)
  • Choosing $\tau$ requires knowledge of $s/n$, so the author proposes a Beta model for $\tau$ that does not rely on this unknown quantity.

Questions not covered:

  • How does $r_n$ change as $\alpha_n \to 1$?
  • What happens when $\alpha = 1$ (Thm 2.1 breaks down)?
  • Beyond the contraction rate, how does $\alpha$ affect model selection?
  • How does $\alpha$ affect the contraction rate in the linear regression setting?

Bayesian sharp minimaxity

Important conditions on model sparsity and $\pi_0$

For simplicity, $\tau$ is treated as a deterministic value and the $\theta_i$ are mutually independent.


Remark 1: Conditions for $\tau$:

  • $\tau^{\alpha-1} \ge (s/n)^c\sqrt{\log(n/s)}$ for some $c \in (0, 1+w/2)$. $\tau$ cannot be too small, or $\theta$ will be over-shrunk.
  • $\tau^{\alpha-1} \prec (s/n)^{\alpha}[\log(n/s)]^{\alpha}$. $\tau$ cannot be too large, or $\theta$ will be insufficiently shrunk.
  • $\tau^{\alpha-1} \prec (s/n)^{\alpha}[\log(n/s)]^{(1+\alpha)/2}$. This is the condition for the $L_1$ contraction rate.
  • These conditions indicate $\alpha \in (1, 1+w/2)$, and $w$ should be as small as possible.

Remark 2: $E^*$ is the expectation under the true parameter $\theta^*$. The theoretical results indicate that the $L_2$ contraction rate is at most $O(\sqrt{s\log(n/s)})$ and the $L_1$ contraction rate is at most $O(s\sqrt{\log(n/s)})$.

Remark 3: Note that $\log(n/s) \prec (n/s)^c$ for any $c > 0$. This observation leads to Corollary 2.1, which unifies (2.1) and (2.2).


Remark 4: Corollary 2.1 indicates $\tau \asymp (s/n)^{c/(\alpha-1)}$. Select $c = \alpha + \delta$ for a very small $\delta > 0$, so a good choice would be $\tau \asymp (s/n)^{(\alpha+\delta)/(\alpha-1)}$. However, $s$ is unknown. An alternative is $\tau \asymp (1/n)^{(\alpha+\delta)/(\alpha-1)}$. Theorem 2.2 considers the properties of this alternative.
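The two choices of the global scale in Remark 4 can be compared numerically. The sketch below works with $\log_{10}\tau$ rather than $\tau$ itself, because the exponent $(\alpha+\delta)/(\alpha-1)$ blows up as $\alpha$ approaches 1. The helper name `log10_tau` and the values of $n$, $s$, $\alpha$, and $\delta$ are illustrative assumptions.

```python
import math


def log10_tau(ratio, alpha, delta):
    """log10 of tau = ratio^((alpha + delta) / (alpha - 1)), computed on the log scale."""
    if alpha <= 1.0:
        raise ValueError("alpha must exceed 1")
    return (alpha + delta) / (alpha - 1.0) * math.log10(ratio)


n, s, delta = 1000, 10, 0.01
for alpha in (1.5, 1.1, 1.01):
    oracle = log10_tau(s / n, alpha, delta)      # uses the unknown s
    adaptive = log10_tau(1.0 / n, alpha, delta)  # replaces s with 1
    print(f"alpha = {alpha}: log10(tau) oracle = {oracle:.1f}, adaptive = {adaptive:.1f}")
```

The adaptive choice is always the smaller of the two, so it shrinks more aggressively. For $\alpha$ very close to 1 the implied $\tau$ approaches the lower limit of double precision, so an implementation would likely need to keep $\tau$ on the log scale.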


Remark 5: Conditions for $\tau$:

  • $\tau^{\alpha-1} \ge (1/n)^c\sqrt{\log(n/s)}$, i.e. $s$ is replaced with 1 in the lower bound
  • $\tau^{\alpha-1} \prec (s/n)^{\alpha}[\log(n/s)]^{(1+\alpha)/2}$
  • The theoretical results indicate that the $L_2$ contraction rate is at most $O(\sqrt{s\log n})$ (sub-optimal) and the $L_1$ contraction rate is at most $O(s\sqrt{\log n})$. If $\log s \prec \log n$, the sub-optimal rate is asymptotically equivalent to the optimal one. If $s \asymp n^c$ with $c \in (0,1)$, the sub-optimal rate has the same order as the optimal one. If $\log s \sim \log n$, the sub-optimal rate is of strictly greater order.
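The three sparsity regimes in the last bullet can be checked numerically by looking at the ratio $\sqrt{s\log n}/\sqrt{s\log(n/s)}$. The sketch below is only an illustration: the helper name `rate_ratio`, the particular growth laws chosen for $s$, and the sample sizes are assumptions, and $s$ is treated as a real number rather than an integer.

```python
import math


def rate_ratio(n, s):
    """Ratio of the sub-optimal rate sqrt(s log n) to the optimal sqrt(s log(n/s))."""
    return math.sqrt(math.log(n) / math.log(n / s))


regimes = {
    "s = log n": lambda n: math.log(n),
    "s = n^0.5": lambda n: math.sqrt(n),
    "s = n / log n": lambda n: n / math.log(n),
}
for name, s_of_n in regimes.items():
    ratios = [rate_ratio(n, s_of_n(n)) for n in (1e6, 1e12, 1e24)]
    print(name, [round(r, 3) for r in ratios])
```

In the first regime the ratio drifts toward 1, in the second it stays at the constant $\sqrt{1/(1-c)}$ (here $\sqrt{2}$ for $c = 0.5$), and in the third it keeps growing with $n$.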

Remark 6: The theorems above are derived for a deterministic $\tau$. Now consider a prior $\pi(\tau)$. It should concentrate near zero, but not too quickly, because $\pi(\tau)$ needs to assign some mass around $(s/n)^{(\alpha+\delta)/(\alpha-1)}$. Theorem 3.1 provides sufficient conditions on the prior of $\tau$ to guarantee (2.1) and (2.2).

Remark 7: The prior mass of $\tau$ is split into three parts: around zero, from $(s/n)^{(1+w/2)/(\alpha-1)}$ to $(s/n)^{\alpha/(\alpha-1)}$, and greater than $(s/n)^{\alpha/(\alpha-1)}$. The first part carries most of the mass and the second part only a small amount. The mass of the third part is assumed to decay to zero.

Remark 8: A possible choice of $\pi(\tau)$ is a power of a Beta distribution (which may be multi-modal), i.e. $\tau \sim [\mathrm{Beta}(1,n)]^c$ with $c \in (\alpha/(\alpha-1), (1+w/2)/(\alpha-1))$.
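Sampling from the prior in Remark 8 is straightforward: draw a Beta(1, n) variable and raise it to the power $c$. The sketch below uses `random.betavariate` from the Python standard library. The values of $\alpha$ and $w$, the midpoint choice of $c$, and the function name `sample_tau` are hypothetical and chosen only so that the admissible interval for $c$ is nonempty.

```python
import random
import statistics


def sample_tau(n, c, size, seed=0):
    """Draw tau = B^c with B ~ Beta(1, n)."""
    rng = random.Random(seed)
    return [rng.betavariate(1.0, n) ** c for _ in range(size)]


alpha, w = 1.1, 0.4  # hypothetical values satisfying alpha < 1 + w/2
lo, hi = alpha / (alpha - 1.0), (1.0 + w / 2.0) / (alpha - 1.0)
c = 0.5 * (lo + hi)  # any point strictly inside the admissible interval
taus = sample_tau(n=1000, c=c, size=2000)
print(f"c in ({lo:.1f}, {hi:.1f}), using c = {c:.2f}")
print(f"median tau = {statistics.median(taus):.3g}")
```

Because Beta(1, n) concentrates around $1/n$, the draws of $\tau$ behave roughly like $(1/n)^c$, which ties this prior back to the adaptive deterministic choice in Remark 4 without requiring knowledge of $s$.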

Remark 9: Note that the restriction on $\theta^*$ is a technical assumption. Without it, the sub-optimal rate can still be achieved. See Theorem 3.2.
