### Continuous Data and the Gaussian Distribution

**[1]** (##) We are given an IID data set $D = \{x_1,x_2,\ldots,x_N\}$, where $x_n \in \mathbb{R}^M$. Let's assume that the data were drawn from a multivariate Gaussian (MVG),
$$\begin{align*}
p(x_n|\theta) = \mathcal{N}(x_n|\,\mu,\Sigma) = |2 \pi \Sigma|^{-\frac{1}{2}} \exp\left\{-\frac{1}{2}(x_n-\mu)^T
\Sigma^{-1} (x_n-\mu) \right\}
\end{align*}$$

(a) Derive the log-likelihood of the parameters for these data.

(b) Derive the maximum likelihood estimates for the mean $\mu$ and covariance matrix $\Sigma$ by setting the derivatives of the log-likelihood with respect to each parameter to zero.
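
Once you have derived an answer, it can be sanity-checked numerically. The sketch below is not part of the exercise: it assumes the standard closed-form result (sample mean, and scatter matrix normalized by $N$) and uses hypothetical helper names `gaussian_log_likelihood` and `gaussian_mle`, with arbitrary illustrative values for the true parameters.

```python
import numpy as np

def gaussian_log_likelihood(X, mu, Sigma):
    """Log-likelihood of the rows of X under N(mu, Sigma), using the
    |2*pi*Sigma|^(-1/2) normalization from the exercise statement."""
    N, _ = X.shape
    diff = X - mu
    # log|2*pi*Sigma| via slogdet, which is more stable than log(det(...))
    _, logdet = np.linalg.slogdet(2 * np.pi * Sigma)
    # Sum over n of (x_n - mu)^T Sigma^{-1} (x_n - mu)
    quad = np.einsum("ni,ij,nj->", diff, np.linalg.inv(Sigma), diff)
    return -0.5 * N * logdet - 0.5 * quad

def gaussian_mle(X):
    """Assumed closed-form estimates: sample mean and 1/N scatter matrix."""
    N = X.shape[0]
    mu_hat = X.mean(axis=0)
    diff = X - mu_hat
    Sigma_hat = diff.T @ diff / N
    return mu_hat, Sigma_hat

# Illustrative ground truth (arbitrary values chosen for this sketch)
rng = np.random.default_rng(0)
true_mu = np.array([1.0, -2.0])
true_Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
X = rng.multivariate_normal(true_mu, true_Sigma, size=5000)

mu_hat, Sigma_hat = gaussian_mle(X)
ll_at_mle = gaussian_log_likelihood(X, mu_hat, Sigma_hat)
ll_at_truth = gaussian_log_likelihood(X, true_mu, true_Sigma)
print("log-likelihood at estimate:", ll_at_mle)
print("log-likelihood at true parameters:", ll_at_truth)
```

The log-likelihood at the estimate should never be lower than at any other parameter setting, including the true parameters that generated the data. Note that the estimate of $\Sigma$ here divides by $N$, which matches `np.cov(X, rowvar=False, bias=True)` rather than NumPy's default $N-1$ normalization.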

**[2]** (#) Briefly explain why the Gaussian distribution is often preferred as a prior distribution over other distributions with the same support.

**[3]** (###) Prove that the Gaussian distribution is the maximum entropy distribution over the reals with specified mean and variance.

**[4]** (##) Prove that a linear transformation $z=Ax+b$ of a Gaussian variable $\mathcal{N}(x|\mu,\Sigma)$ is Gaussian distributed as
$$
p(z) = \mathcal{N} \left(z \,|\, A\mu+b, A\Sigma A^T \right)
$$
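
A quick Monte Carlo experiment can make the claimed mean and covariance plausible before attempting the proof. The sketch below is only an empirical check under arbitrary, illustrative choices of $\mu$, $\Sigma$, $A$ and $b$; it compares the first two moments of the transformed samples with the stated formula and does not by itself show that $z$ is Gaussian.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters (assumptions for this sketch, not from the exercise)
mu = np.array([0.5, -1.0, 2.0])
L = np.array([[1.0, 0.0, 0.0],
              [0.3, 0.8, 0.0],
              [-0.2, 0.1, 0.5]])
Sigma = L @ L.T                      # symmetric positive definite by construction
A = np.array([[1.0, 2.0, -1.0],
              [0.0, 1.0, 3.0]])      # maps R^3 to R^2
b = np.array([4.0, -3.0])

# Draw samples of x (one per row) and apply z = A x + b to each row
x = rng.multivariate_normal(mu, Sigma, size=200_000)
z = x @ A.T + b

empirical_mean = z.mean(axis=0)
empirical_cov = np.cov(z, rowvar=False)

# Moments predicted by the statement to be proven
predicted_mean = A @ mu + b
predicted_cov = A @ Sigma @ A.T

print("mean error:", np.abs(empirical_mean - predicted_mean).max())
print("cov error: ", np.abs(empirical_cov - predicted_cov).max())
```

Both errors should shrink roughly as $1/\sqrt{N}$ when the number of samples $N$ grows.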

**[5]** (#) Given independent variables
$x \sim \mathcal{N}(\mu_x,\sigma_x^2)$ and $y \sim \mathcal{N}(\mu_y,\sigma_y^2)$, what is the PDF for $z = A\cdot(x -y) + b$?

\begin{equation*}
\int_{-\infty}^{\infty} \exp(-x^2)\mathrm{d}x \,.
\end{equation*}