Deep gratitude to Prof. Ed George, who opened a door to the world of empirical Bayes for me. That led me to read Prof. Jim Berger’s book Statistical Decision Theory and Bayesian Analysis (1985). Here I mainly focus on Chapters 1, 3, 4, and 5 of Jim’s book and also share some thoughts on Ed’s paper Minimax Multiple Shrinkage Estimation (1986).

# Why Bayesian?

Different people reach different conclusions depending on their prior beliefs about the plausibility of an event. Bayesian analysis seeks to make use of that prior information.

Some Definitions:

• Bayesian expected loss: $\rho (\pi^*, a) = E^{\pi^*}L(\theta, a) = \int_\Theta L(\theta, a)dF^{\pi^*}(\theta)$
• Risk function of decision rule $\delta(x)$: $R(\theta, \delta) = E^X_\theta [L(\theta,\delta(X))] = \int_{\mathscr{X}} L(\theta, \delta(x))dF^X(x|\theta)$, where $\mathscr{X}$ is the sample space.
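• A decision rule $\delta_1$ is R-better than $\delta_2$ if $R(\theta, \delta_1) \leq R(\theta, \delta_2)$ for all $\theta$, with strict inequality for some $\theta$. A decision rule $\delta$ is admissible if there exists no R-better decision rule. An inadmissible rule should not be used, because a rule with smaller risk can be found.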
• Bayes risk of decision rule $\delta$ with respect to a prior $\pi$: $r(\pi, \delta) = E^\pi [R(\theta, \delta) ]$
• Randomized decision rule: $\delta^*(x,\cdot)$ is, for each $x$, a probability distribution on $\mathscr{A}$. If $x$ is observed, $\delta^*(x,A)$ is the probability that an action in $A \subseteq \mathscr{A}$ will be chosen.
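To make the risk function and the Bayes risk concrete, here is a minimal Monte Carlo sketch. It assumes, purely for illustration, that $X$ is $\mathcal{N}(\theta, 1)$, that the loss is squared error, and that the prior is $\mathcal{N}(0, \tau^2)$. The function names (`monte_carlo_risk`, `monte_carlo_bayes_risk`) are my own and not from the book.

```python
import random


def squared_error_loss(theta, a):
    """L(theta, a) = (theta - a)^2."""
    return (theta - a) ** 2


def monte_carlo_risk(theta, rule, loss=squared_error_loss, n_draws=100_000, seed=0):
    """Approximate R(theta, rule) = E_theta[L(theta, rule(X))].

    Assumption for this sketch: X ~ N(theta, 1).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        x = rng.gauss(theta, 1.0)  # draw X given theta
        total += loss(theta, rule(x))
    return total / n_draws


def monte_carlo_bayes_risk(rule, tau, loss=squared_error_loss, n_draws=100_000, seed=0):
    """Approximate r(pi, rule) = E^pi[R(theta, rule)].

    Assumption for this sketch: prior theta ~ N(0, tau^2), then X | theta ~ N(theta, 1).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        theta = rng.gauss(0.0, tau)  # draw theta from the prior
        x = rng.gauss(theta, 1.0)    # draw X given theta
        total += loss(theta, rule(x))
    return total / n_draws


if __name__ == "__main__":
    identity_rule = lambda x: x
    print(monte_carlo_risk(2.0, identity_rule))
    print(monte_carlo_bayes_risk(identity_rule, tau=2.0))
```

For the rule $\delta(x) = x$ the risk is constant in $\theta$ (it equals the variance of $X$), so both estimates should land close to 1 up to simulation noise.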

## Decision Principle

### Bayes Risk Principle

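A decision rule $\delta_1$ is preferred to a rule $\delta_2$ if $r(\pi, \delta_1) < r(\pi, \delta_2)$. A Bayes rule $\delta^\pi$ is a rule that minimizes the Bayes risk $r(\pi, \delta)$ over all decision rules, so it is optimal under this principle. The minimized value $r(\pi, \delta^\pi)$ is called the Bayes risk for $\pi$ and can also be written as $r(\pi)$.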

Example:
Assume $X$ is $\mathcal{N}(\theta, 1)$, and it is desired to estimate $\theta$ under the squared-error loss $L(\theta, a) = (\theta-a)^2$. Consider the decision rule $\delta_c(x) = cx$, then
\begin{align} R(\theta, \delta_c) &= E_\theta^X L(\theta, \delta_c(X)) = E_\theta^X(\theta - cX)^2 \\ &= E_\theta^X(c[\theta-X] + [1-c]\theta)^2\\ &= c^2 E_\theta^X [\theta-X]^2 + 2c(1-c)\theta E_\theta^X[\theta-X] + (1-c)^2\theta^2\\ &= c^2 + (1-c)^2\theta^2 \end{align}
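Now take the prior $\pi$ to be $\mathcal{N}(0,\tau^2)$, so that $E^\pi[\theta^2] = \tau^2$. The Bayes risk is then $r(\pi, \delta_c) = E^\pi[R(\theta, \delta_c)] = c^2 + (1-c)^2\tau^2$. Setting the derivative in $c$ to zero, $2c - 2(1-c)\tau^2 = 0$, shows that it is minimized at $c = c_0 = \tau^2/(1+\tau^2)$, where it equals
\begin{align} r(\pi) &= r(\pi, \delta_{c_0}) = c_0^2 + (1-c_0)^2\tau^2 = \left(\frac{\tau^2}{1+\tau^2}\right)^2 + \left(\frac{1}{1+\tau^2}\right)^2\tau^2 = \frac{\tau^2}{1+\tau^2} \end{align}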
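The closed-form expressions in this example are easy to check numerically. The sketch below evaluates $c^2 + (1-c)^2\tau^2$ on a grid of $c$ values and compares the grid minimizer with $c_0 = \tau^2/(1+\tau^2)$. The value $\tau = 2$, the grid resolution, and the function names are arbitrary choices of mine for illustration.

```python
def risk_linear_rule(theta, c):
    """R(theta, delta_c) = c^2 + (1 - c)^2 * theta^2 for delta_c(x) = c * x."""
    return c ** 2 + (1 - c) ** 2 * theta ** 2


def bayes_risk_linear_rule(c, tau):
    """r(pi, delta_c) = c^2 + (1 - c)^2 * tau^2 under the N(0, tau^2) prior."""
    return c ** 2 + (1 - c) ** 2 * tau ** 2


def optimal_c(tau):
    """Closed-form minimizer c_0 = tau^2 / (1 + tau^2)."""
    return tau ** 2 / (1 + tau ** 2)


tau = 2.0  # hypothetical prior standard deviation, chosen for illustration
grid = [i / 1000 for i in range(1001)]  # candidate values of c in [0, 1]

# Brute-force minimizer of the Bayes risk over the grid.
c_grid = min(grid, key=lambda c: bayes_risk_linear_rule(c, tau))
c0 = optimal_c(tau)

print("grid minimizer:", c_grid)
print("closed-form c0:", c0)
print("Bayes risk at c0:", bayes_risk_linear_rule(c0, tau))
print("tau^2 / (1 + tau^2):", tau ** 2 / (1 + tau ** 2))
```

The grid minimizer should agree with $c_0$ up to the grid spacing, and the Bayes risk at $c_0$ should match $\tau^2/(1+\tau^2)$ up to floating-point error.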

### Minimax Principle

Let $\delta^*\in \mathscr{D}^*$ be a randomized rule. The quantity $\underset{\theta\in \Theta}{\text{sup }} R(\theta, \delta^* )$ is the worst-case risk of the rule $\delta^*$. To protect against the worst possible state of nature, we use the minimax principle.
A rule $\delta^{*M}$ is a minimax decision rule if it minimizes $\underset{\theta}{\text{sup }} R(\theta, \delta^* )$ among all randomized rules in $\mathscr{D}^*$:
\begin{align} \underset{\theta}{\text{sup }} R(\theta, \delta^{*M} ) = \underset{\delta^*\in\mathscr{D}^*}{\text{inf }} \underset{\theta\in \Theta}{\text{sup }} R(\theta, \delta^* ) \end{align}
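As a small illustration, consider again the linear rules $\delta_c(x) = cx$ from the Bayes risk example, whose risk is $R(\theta, \delta_c) = c^2 + (1-c)^2\theta^2$. Assuming $\Theta = \mathbb{R}$ (my assumption for this sketch), the worst-case risk is
\begin{align} \underset{\theta\in \Theta}{\text{sup }} R(\theta, \delta_c) = \underset{\theta\in \Theta}{\text{sup }} \left[ c^2 + (1-c)^2\theta^2 \right] = \begin{cases} 1 & \text{if } c = 1 \\ \infty & \text{if } c \neq 1 \end{cases} \end{align}
So among the rules $\delta_c$, only $\delta_1(x) = x$ has bounded worst-case risk, and it is the minimax choice within this class. Note that this comparison covers only the nonrandomized linear rules, not all of $\mathscr{D}^*$.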

## Foundation

What we are really interested in determining is whether or not the null hypothesis is approximately true.

## Convexity

• Convex: A set $\Omega \subset R^m$ is convex if for any two points $\mathbf{x}$ and $\mathbf{y}$ in $\Omega$, the point $[\alpha\mathbf{x}+(1-\alpha)\mathbf{y} ]$ is in $\Omega$ for $0\leq \alpha \leq 1$.
• Convex combination: If $\{\mathbf{x}^1, \mathbf{x}^2, \ldots\}$ is a sequence of points in $R^m$, and $0\leq\alpha_i\leq 1$ are numbers such that $\sum_{i=1}^\infty\alpha_i =1$, then $\sum_{i=1}^\infty\alpha_i\mathbf{x}^i$ is called a convex combination of the $\{\mathbf{x}^i\}$.
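The two definitions can be tried out on a small example. The sketch below forms a finite convex combination (the special case where all but finitely many $\alpha_i$ are zero) of points in the closed unit disk in $R^2$, which is a convex set, and checks that the result stays in the disk. The helper names, the points, and the weights are my own illustrative choices.

```python
import math


def convex_combination(points, weights, tol=1e-12):
    """Return sum_i weights[i] * points[i] after validating the weights.

    The weights must lie in [0, 1] and sum to 1 (up to a small tolerance).
    """
    if len(points) != len(weights):
        raise ValueError("points and weights must have the same length")
    if any(w < 0 or w > 1 for w in weights):
        raise ValueError("each weight must lie in [0, 1]")
    if abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must sum to 1")
    dim = len(points[0])
    return tuple(
        sum(w * p[k] for w, p in zip(weights, points)) for k in range(dim)
    )


def in_unit_disk(point, tol=1e-12):
    """Membership test for the closed unit disk in R^2 (a convex set)."""
    return math.hypot(point[0], point[1]) <= 1.0 + tol


# Three points on the unit circle and a set of valid weights.
points = [(1.0, 0.0), (0.0, 1.0), (-0.6, -0.8)]
weights = [0.5, 0.3, 0.2]

mix = convex_combination(points, weights)
print(mix, in_unit_disk(mix))
```

Because the disk is convex, any valid choice of weights gives a point that passes `in_unit_disk`. Weights that are negative or do not sum to 1 are rejected rather than silently normalized.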

# Prior Information

Haven’t finished yet. To be continued…

# Reference

 Berger, James O. “Statistical decision theory and Bayesian analysis.” Berlin: Springer-Verlag (1985).
 George, Edward I. “Minimax multiple shrinkage estimation.” The Annals of Statistics 14.1 (1986): 188-205.
