Deep gratitude to Prof. Ed George, who opened the door to the world of empirical Bayes for me; thanks to him, I began reading Prof. Jim Berger’s book Statistical Decision Theory and Bayesian Analysis (1985). Here I mainly focus on Chapters 1, 3, 4, and 5 of Jim’s book, and I also share some thoughts on Ed’s paper Minimax Multiple Shrinkage Estimation (1986).

Why Bayesian?

Different people may reach different conclusions depending on their prior beliefs about the plausibility of an event. Bayesian analysis seeks to make use of this prior information.

Some Definitions:

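Bayesian expected loss of an action $a$, where $\pi^*$ is the distribution believed for $\theta$ at the time of decision making: $\rho (\pi^*, a) = E^{\pi^*}L(\theta, a) = \int_\Theta L(\theta, a)dF^{\pi^*}(\theta)$
Risk function of a decision rule $\delta(x)$: $R(\theta, \delta) = E^X_\theta [L(\theta,\delta(X))] = \int_{\mathscr{X}} L(\theta, \delta(x))dF^X(x|\theta)$, where $\mathscr{X}$ is the sample space.
A rule $\delta_1$ is R-better than a rule $\delta_2$ if $R(\theta, \delta_1) \leq R(\theta, \delta_2)$ for all $\theta\in\Theta$, with strict inequality for some $\theta$. A decision rule $\delta$ is admissible if no R-better decision rule exists. An inadmissible rule should not be used, because another rule has risk that is never larger and is strictly smaller for some $\theta$.
Bayes risk of a decision rule $\delta$ with respect to a prior $\pi$: $r(\pi, \delta) = E^\pi [R(\theta, \delta) ]$
Randomized decision rule: for each $x$, $\delta^*(x,\cdot)$ is a probability distribution on the action space $\mathscr{A}$; if $x$ is observed, $\delta^*(x,A)$ is the probability that an action in $A\subset\mathscr{A}$ will be chosen.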
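To make these definitions concrete, here is a minimal Python sketch on a hypothetical finite problem. It has two states, a binary observation, two actions, and 0-1 loss, and all probabilities are made up for illustration. The sketch computes the risk function, the Bayes risk under a given prior, the Bayes rule, and the R-better relation. It also lists the deterministic rules that no other deterministic rule dominates. Randomized rules are ignored here, so this is only a partial admissibility check.

```python
from fractions import Fraction

# Hypothetical two-state problem (numbers are illustrative, not from the book):
# states theta in {0, 1}, observations x in {0, 1}, actions a in {0, 1}.
THETAS = (0, 1)
XS = (0, 1)

# Sampling model f(x | theta), indexed as LIKELIHOOD[theta][x].
LIKELIHOOD = {0: {0: Fraction(3, 4), 1: Fraction(1, 4)},
              1: {0: Fraction(1, 3), 1: Fraction(2, 3)}}

def loss(theta, a):
    """0-1 loss: 1 if the chosen action differs from the true state, else 0."""
    return 0 if a == theta else 1

def risk(theta, rule):
    """R(theta, delta) = sum_x L(theta, delta(x)) f(x | theta); rule[x] is delta(x)."""
    return sum(loss(theta, rule[x]) * LIKELIHOOD[theta][x] for x in XS)

def bayes_risk(prior, rule):
    """r(pi, delta) = sum_theta R(theta, delta) pi(theta)."""
    return sum(risk(theta, rule) * prior[theta] for theta in THETAS)

def r_better(rule1, rule2):
    """True if rule1 is R-better than rule2: never worse, strictly better somewhere."""
    r1 = [risk(t, rule1) for t in THETAS]
    r2 = [risk(t, rule2) for t in THETAS]
    return all(a <= b for a, b in zip(r1, r2)) and any(a < b for a, b in zip(r1, r2))

# Each deterministic rule maps x -> a and is stored as the tuple (delta(0), delta(1)).
RULES = [(a0, a1) for a0 in (0, 1) for a1 in (0, 1)]

def bayes_rule(prior):
    """Deterministic rule minimizing the Bayes risk (first one found on ties)."""
    return min(RULES, key=lambda d: bayes_risk(prior, d))

def undominated_rules():
    """Deterministic rules that no other deterministic rule is R-better than."""
    return [d for d in RULES if not any(r_better(e, d) for e in RULES)]
```

With these numbers, the rule $\delta(x) = x$ has risk vector $(1/4, 1/3)$. It is R-better than $\delta(x) = 1 - x$, whose risk vector is $(3/4, 2/3)$, so the latter is inadmissible.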

Decision Principle

Bayes Risk Principle

A decision rule $\delta_1$ is preferred to a rule $\delta_2$ if $r(\pi, \delta_1) < r(\pi, \delta_2)$. A Bayes rule $\delta^\pi$ is a rule that minimizes $r(\pi, \delta)$. The minimal value $r(\pi, \delta^\pi)$ is called the Bayes risk of $\pi$ and is written $r(\pi)$.

Example:
Assume $X$ is $\mathcal{N}(\theta, 1)$, and it is desired to estimate $\theta$ under the squared-error loss $L(\theta, a) = (\theta-a)^2$. Consider the decision rule $\delta_c(x) = cx$, then
$$\begin{align} R(\theta, \delta_c) &= E_\theta^XL(\theta, \delta_c(X)) = E_\theta^X(\theta - cX)^2 \\ &= E_\theta^X(c[\theta-X] + [1-c]\theta)^2\\ &= c^2 E_\theta^X [\theta-X]^2 + 2c(1-c)\theta E_\theta^X[\theta-X] + (1-c)^2\theta^2\\ &= c^2 + (1-c)^2\theta^2 \end{align}$$
The last step uses $E_\theta^X[\theta-X] = 0$ and $E_\theta^X[\theta-X]^2 = 1$.
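The closed form for $R(\theta, \delta_c)$ can be sanity-checked by simulation. The sketch below draws $X\sim\mathcal{N}(\theta,1)$ with Python's standard `random` module and averages the squared error $(\theta - cX)^2$. The function names and the sample size are my own choices.

```python
import random

def risk_closed_form(theta, c):
    """R(theta, delta_c) = c^2 + (1 - c)^2 theta^2 for X ~ N(theta, 1), squared-error loss."""
    return c**2 + (1 - c)**2 * theta**2

def risk_monte_carlo(theta, c, n=200_000, seed=0):
    """Estimate E_theta[(theta - c X)^2] by averaging over n draws of X ~ N(theta, 1)."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    total = 0.0
    for _ in range(n):
        x = rng.gauss(theta, 1.0)
        total += (theta - c * x) ** 2
    return total / n
```

For example, with $\theta = 2$ and $c = 0.5$ the closed form gives $0.25 + 0.25 \cdot 4 = 1.25$. The simulated average should agree up to Monte Carlo error, which is roughly $0.002$ at this sample size.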
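For the prior $\pi = \mathcal{N}(0,\tau^2)$ we have $E^\pi\theta^2 = \tau^2$, so the Bayes risk is $r(\pi, \delta_c) = c^2 + (1-c)^2\tau^2$. Setting $\frac{d}{dc}r(\pi, \delta_c) = 2c - 2(1-c)\tau^2 = 0$ shows that it is minimized over $c$ at $c = c_0 = \tau^2/(1+\tau^2)$, giving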
$$\begin{align} r(\pi) &= r(\pi, \delta_{c_0}) = c_0^2 + (1-c_0)^2\tau^2 = (\frac{\tau^2}{1+\tau^2})^2 + (\frac{1}{1+\tau^2})^2\tau^2 = \frac{\tau^2}{1+\tau^2} \end{align}$$
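A quick numerical check of this minimization, assuming the same prior $\mathcal{N}(0,\tau^2)$: a brute-force grid search over $c \in [0, 1]$ should land on $c_0$, and the minimal value should equal $\tau^2/(1+\tau^2)$. The helper names below are illustrative.

```python
def bayes_risk(c, tau2):
    """r(pi, delta_c) = c^2 + (1 - c)^2 tau^2 under the prior pi = N(0, tau^2)."""
    return c**2 + (1 - c)**2 * tau2

def optimal_c(tau2):
    """c_0 = tau^2 / (1 + tau^2), the root of d r / d c = 2c - 2(1 - c) tau^2."""
    return tau2 / (1 + tau2)

def grid_minimizer(tau2, steps=10_000):
    """Brute force: the grid point c in [0, 1] with the smallest Bayes risk."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda c: bayes_risk(c, tau2))
```

For example, with $\tau^2 = 4$ both approaches give $c_0 = 0.8$, and the minimal Bayes risk is $r(\pi) = 4/5 = 0.8$.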

Minimax Principle

Let $\delta^*\in \mathscr{D}^*$ be a randomized rule. The quantity $\underset{\theta\in \Theta}{\text{sup }} R(\theta, \delta^* )$ represents the worst that can happen if $\delta^*$ is used. To protect against the worst possible state of nature, the minimax principle chooses the rule whose worst case is smallest.
A rule $\delta^{*M}$ is a minimax decision rule if it minimizes $\underset{\theta}{\text{sup }} R(\theta, \delta^* )$ among all randomized rules in $\mathscr{D}^*$:
$$\begin{align} \underset{\theta}{\text{sup }} R(\theta, \delta^{*M} ) = \underset{\delta^*\in\mathscr{D}^*}{\text{inf }} \underset{\theta\in \Theta}{\text{sup }} R(\theta, \delta^* ) \end{align}$$
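In a finite problem, a minimax rule can be computed exactly. The sketch below assumes a hypothetical two-state problem described only by the risk vectors $(R(\theta_1, d), R(\theta_2, d))$ of four deterministic rules; the numbers are made up. A randomized rule that uses $d_i$ with probability $p$ and $d_j$ otherwise has the corresponding mixture of their risk vectors as its risk vector. With two states, the set of achievable risk vectors is the convex hull of the deterministic ones, and the minimax point lies on its lower-left boundary. So it suffices to scan pairs of rules. Along one pair, the larger of two linear functions of $p$ is minimized either at an endpoint or where the two risks are equal.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical risk vectors (R(theta_1, d), R(theta_2, d)) of four deterministic rules.
RISKS = {
    "d1": (Fraction(0), Fraction(1)),
    "d2": (Fraction(1, 4), Fraction(1, 3)),
    "d3": (Fraction(3, 4), Fraction(2, 3)),
    "d4": (Fraction(1), Fraction(0)),
}

def mixed_risk(p, ri, rj):
    """Risk vector of the rule using d_i with probability p and d_j with probability 1 - p."""
    return tuple(p * a + (1 - p) * b for a, b in zip(ri, rj))

def minimax_two_states(risks):
    """Return (minimax risk, d_i, d_j, p) over mixtures of at most two deterministic rules."""
    best = None
    for i, j in combinations(risks, 2):
        ri, rj = risks[i], risks[j]
        di, dj = ri[0] - ri[1], rj[0] - rj[1]  # R(theta_1) - R(theta_2) for each rule
        candidates = [Fraction(0), Fraction(1)]
        if di != dj:
            p_cross = dj / (dj - di)  # mixing weight where the two risks coincide
            if 0 <= p_cross <= 1:
                candidates.append(p_cross)
        for p in candidates:
            worst = max(mixed_risk(p, ri, rj))
            if best is None or worst < best[0]:
                best = (worst, i, j, p)
    return best
```

Here no deterministic rule has worst-case risk below $1/3$. Using $d_2$ with probability $12/13$ and $d_4$ with probability $1/13$, however, equalizes the two risks at $4/13$. This is why the minimax principle is stated over the randomized rules $\mathscr{D}^*$.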

*Invariance Principle

Foundation

What we are really interested in determining is whether or not the null hypothesis is approximately true.

Convexity

Convex: A set $\Omega \subset R^m$ is convex if, for any two points $\mathbf{x}$ and $\mathbf{y}$ in $\Omega$, the point $[\alpha\mathbf{x}+(1-\alpha)\mathbf{y} ]$ is in $\Omega$ for all $0\leq \alpha \leq 1$.
Convex combination: If $\{\mathbf{x}^1, \mathbf{x}^2, \dots\}$ is a sequence of points in $R^m$, and $0\leq\alpha_i\leq 1$ are numbers such that $\sum_{i=1}^\infty\alpha_i =1$, then $\sum_{i=1}^\infty\alpha_i\mathbf{x}^i$ (provided it is finite) is a convex combination of $\{\mathbf{x}^i\}$.
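A small sketch of the two definitions for finite combinations in $R^2$, using the closed unit disk as an example convex set $\Omega$. The code forms convex combinations with validated weights and checks, on random samples, that they stay inside $\Omega$. A random check only illustrates the definition; it is not a proof. The function names are hypothetical.

```python
import math
import random

def convex_combination(points, weights):
    """Return sum_i alpha_i x^i, requiring 0 <= alpha_i <= 1 and sum_i alpha_i = 1."""
    if len(points) != len(weights):
        raise ValueError("need one weight per point")
    if any(w < 0 or w > 1 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    dim = len(points[0])
    return tuple(sum(w * p[k] for w, p in zip(weights, points)) for k in range(dim))

def in_unit_disk(point):
    """Membership in Omega = {x in R^2 : ||x|| <= 1}, with a tiny tolerance for rounding."""
    return math.hypot(*point) <= 1.0 + 1e-12

def random_check(trials=1_000, n_points=5, seed=1):
    """Sample points of Omega and random weights; every convex combination should stay in Omega."""
    rng = random.Random(seed)
    for _ in range(trials):
        pts = []
        while len(pts) < n_points:  # rejection sampling from the square [-1, 1]^2
            q = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
            if in_unit_disk(q):
                pts.append(q)
        raw = [rng.random() for _ in range(n_points)]
        total = sum(raw)
        weights = [r / total for r in raw]  # nonnegative and summing to 1
        if not in_unit_disk(convex_combination(pts, weights)):
            return False
    return True
```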

Prior Information

Haven’t finished yet. To be continued…

Reference

[1] Berger, James O. “Statistical decision theory and Bayesian analysis.” Berlin: Springer-Verlag (1985).
[2] George, Edward I. “Minimax multiple shrinkage estimation.” The Annals of Statistics 14.1 (1986): 188-205.

Researcher✨Qiuyi Wu

Bayesian Decision Theory Notes