Deep gratitude to Prof. Ed George, who opened the door to the world of empirical Bayes for me. I began to read Prof. Jim Berger’s book *Statistical Decision Theory and Bayesian Analysis* (1985). Here I mainly focus on Chapters 1, 3, 4, and 5 of Jim’s book, and also share some thoughts on Ed’s paper *Minimax Multiple Shrinkage Estimation* (1986).

# Why Bayesian?

Different people would reach different conclusions based on their prior beliefs about the plausibility of an event; Bayesian analysis seeks to utilize this prior information.

**Some Definitions:**

- Bayesian expected loss: $\rho (\pi^*, a) = E^{\pi^*}L(\theta, a) = \int_\Theta L(\theta, a)dF^{\pi^*}(\theta)$
- Risk function of decision rule $\delta(x)$: $R(\theta, \delta) = E^X_\theta [L(\theta,\delta(X))] = \int_x L(\theta, \delta(x))dF^X(x|\theta)$
- A decision rule $\delta$ is admissible if there exists no R-better decision rule. An inadmissible rule should not be used, because we can find a better decision rule with smaller risk.
- Bayesian risk of decision rule $\delta$: $r(\pi, \delta) = E^\pi [R(\theta, \delta) ]$
- Randomized decision rule: $\delta^*(x,A)$ is, for each $x$, a probability distribution on $\mathscr{A}$, with the interpretation that if $x$ is observed, an action in $A$ will be chosen with probability $\delta^*(x,A)$.
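
The risk function can be checked numerically. A minimal sketch (assuming NumPy and squared-error loss): for the rule $\delta(x) = x$ with $X\sim\mathcal{N}(\theta,1)$, the risk $R(\theta,\delta) = E_\theta(\theta - X)^2 = \text{Var}(X) = 1$ for every $\theta$, which Monte Carlo confirms.

```python
import numpy as np

rng = np.random.default_rng(0)

def risk(theta, delta, n=200_000):
    """Monte Carlo estimate of R(theta, delta) = E_theta[L(theta, delta(X))]
    under squared-error loss, with X ~ N(theta, 1)."""
    x = rng.normal(theta, 1.0, size=n)
    return np.mean((theta - delta(x)) ** 2)

# For delta(x) = x, the risk is constant: R(theta, delta) = Var(X) = 1.
estimates = [risk(theta, lambda x: x) for theta in (-2.0, 0.0, 3.0)]
```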

## Decision Principle

### Bayes Risk Principle

A decision rule $\delta_1$ is preferred to a rule $\delta_2$ if $r(\pi, \delta_1) < r(\pi, \delta_2)$. A **Bayes rule** $\delta^\pi$ is the optimal rule, i.e., the one that minimizes the **Bayes risk** $r(\pi, \delta^\pi)$, which is also written as $r(\pi)$.

**Example:**

Assume $X$ is $\mathcal{N}(\theta, 1)$, and it is desired to estimate $\theta$ under the squared-error loss $L(\theta, a) = (\theta-a)^2$. Consider the decision rule $\delta_c(x) = cx$, then

$$\begin{align}
R(\theta, \delta_c) &= E_\theta^X L(\theta, \delta_c(X)) = E_\theta^X(\theta - cX)^2 \\
&= E_\theta^X(c[\theta-X] + [1-c]\theta)^2\\
&= c^2 E_\theta^X [\theta-X]^2 + 2c(1-c)\theta E_\theta^X[\theta-X] + (1-c)^2\theta^2\\
&= c^2 + (1-c)^2\theta^2
\end{align}$$

So the Bayes risk $r(\pi, \delta_c) = c^2 + (1-c)^2\tau^2$ for $\pi\sim \mathcal{N}(0,\tau^2)$ is minimized when $c = c_0 = \tau^2/(1+\tau^2)$, which is

$$\begin{align}
r(\pi) &= r(\pi, \delta_{c_0}) = c_0^2 + (1-c_0)^2\tau^2 = (\frac{\tau^2}{1+\tau^2})^2 + (\frac{1}{1+\tau^2})^2\tau^2 = \frac{\tau^2}{1+\tau^2}
\end{align}$$
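
The closed-form result $r(\pi) = \tau^2/(1+\tau^2)$ is easy to verify by simulation. A minimal sketch (assuming NumPy, and picking $\tau^2 = 2$ for concreteness): sample $\theta\sim\mathcal{N}(0,\tau^2)$, then $X\mid\theta\sim\mathcal{N}(\theta,1)$, and average the loss.

```python
import numpy as np

rng = np.random.default_rng(1)
tau2 = 2.0
n = 500_000

# Sample theta from the prior, then X from the model given theta.
theta = rng.normal(0.0, np.sqrt(tau2), size=n)
x = rng.normal(theta, 1.0)

def bayes_risk(c):
    """Monte Carlo estimate of r(pi, delta_c) = E[(theta - c X)^2]."""
    return np.mean((theta - c * x) ** 2)

c0 = tau2 / (1.0 + tau2)       # optimal shrinkage factor c_0
r_opt = tau2 / (1.0 + tau2)    # closed-form r(pi) derived above
```

For $\tau^2 = 2$, both $c_0$ and $r(\pi)$ equal $2/3$, and any other shrinkage factor gives a strictly larger Bayes risk.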

### Minimax Principle

Let $\delta^*\in \mathscr{D}^*$ be a randomized rule; the quantity $\underset{\theta\in \Theta}{\text{sup }} R(\theta, \delta^* )$ is the worst-case risk for the rule $\delta^*$. To protect against the worst possible state of nature, we use the **minimax principle**.

A rule $\delta^{*M}$ is a minimax decision rule if it minimizes $\underset{\theta}{\text{sup }} R(\theta, \delta^* )$ among all randomized rules in $\mathscr{D}^*$:

$$\begin{align}
\underset{\theta}{\text{sup }} R(\theta, \delta^{*M} ) = \underset{\delta^*\in\mathscr{D}^*}{\text{inf }} \underset{\theta\in \Theta}{\text{sup }} R(\theta, \delta^* )
\end{align}$$
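
The earlier example illustrates the principle. A minimal sketch (assuming NumPy, using the exact risk $R(\theta, \delta_c) = c^2 + (1-c)^2\theta^2$ derived above): for any $c \neq 1$ the risk grows without bound in $\theta$, so $\sup_\theta R(\theta, \delta_c) = \infty$, while $\delta_1(x) = x$ has constant risk 1.

```python
import numpy as np

def risk_c(c, theta):
    """Exact risk of delta_c(x) = c x under squared-error loss."""
    return c**2 + (1 - c) ** 2 * theta**2

thetas = np.linspace(-10.0, 10.0, 2001)

# Any c != 1 is eventually beaten by the quadratic term in theta,
# while delta_1 keeps the same risk everywhere.
worst_shrunk = risk_c(0.9, thetas).max()
risk_delta1 = risk_c(1.0, thetas)
```

So among the rules $\delta_c$, only $\delta_1$ has bounded worst-case risk, consistent with $\delta_1$ being the minimax estimator here.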

### *Invariance Principle

## Foundation

What we are really interested in determining is whether or not the null hypothesis is *approximately* true.

## Convexity

- Convex: A set $\Omega$ is convex if for any two points $\mathbf{x}$ and $\mathbf{y}$ in $\Omega$, the point $[\alpha\mathbf{x}+(1-\alpha)\mathbf{y}]$ is in $\Omega$ for $0\leq \alpha \leq 1$.
- Convex combination: If $\{\mathbf{x}^1, \mathbf{x}^2, \ldots\}$ is a sequence of points in $\mathbb{R}^m$, and $0\leq\alpha_i\leq 1$ are numbers such that $\sum_{i=1}^\infty\alpha_i = 1$, then $\sum_{i=1}^\infty\alpha_i\mathbf{x}^i$ is a convex combination of $\{\mathbf{x}^i\}$.
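
These definitions can be illustrated numerically. A minimal sketch (assuming NumPy; the unit disk and the Dirichlet weights are my choices, not from the text): a convex combination of points in a convex set, here the unit disk in $\mathbb{R}^2$, must stay inside the set.

```python
import numpy as np

rng = np.random.default_rng(2)

# Five points inside the unit disk (a convex set in R^2).
pts = rng.normal(size=(5, 2))
pts /= np.maximum(1.0, np.linalg.norm(pts, axis=1, keepdims=True))

# Convex weights: alpha_i >= 0 with sum(alpha_i) = 1.
alphas = rng.dirichlet(np.ones(5))

combo = alphas @ pts  # the convex combination sum_i alpha_i x^i
```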

# Prior Information

**Haven’t finished yet. To be continued…**

# References

[1] Berger, James O. *Statistical Decision Theory and Bayesian Analysis.* Springer-Verlag (1985).

[2] George, Edward I. “Minimax multiple shrinkage estimation.” The Annals of Statistics 14.1 (1986): 188-205.