HW5: Sampling and MCMC


Last modified: 2020-04-17 16:41

Due date: Sun Apr 26, 2020 at 11:59pm ET

Status: RELEASED

How to turn in: Submit PDF to https://www.gradescope.com/courses/82862/assignments/460082/


Questions?: Post to the hw5 topic on the Piazza discussion forums.

Instructions for Preparing your PDF Report

What to turn in: PDF of typeset answers via LaTeX. No handwritten solutions will be accepted, so that grading can be speedy and you get prompt feedback.

Please use the official LaTeX Template: https://raw.githubusercontent.com/tufts-ml-courses/comp136-spr-20s-assignments/master/hw5/hw5_template.tex

Your PDF should include (in order):

  • Your full name
  • About how many hours did you spend (debugging your approach, asking for help, writing it up)?
  • Collaboration statement
  • Problem 1 answer
  • Problem 2 answer
  • Problem 3 answer

For each problem:

Throughout this homework, we are practicing the skills necessary to derive and analyze mathematical expressions involving probability.

Each step of a mathematical derivation that you turn in should be justified by an accompanying short verbal phrase (e.g. "using Bayes rule").

Solutions that lack justifications or skip key steps without showing work will receive poor marks.

Problem 1: Transformations of Scalar Random Variables

Background: Recall what we learned about transformations of random variables on day 20.

Suppose \(u\) is a scalar random variable with known sample space \(\mathcal{U}\) and PDF function \(f(u)\), so: \(\int_{u \in \mathcal{U}} f(u) \, du = 1\).

Suppose further that we have invertible transformation functions \(S\) and \(T\), which satisfy:

  • \(T : \mathcal{U} \rightarrow \mathcal{X}\)
  • \(S : \mathcal{X} \rightarrow \mathcal{U}\)
  • \(S(T(u)) = u\), for all \(u \in \mathcal{U}\)
  • \(T(S(x)) = x\), for all \(x \in \mathcal{X}\)

If we draw a sample \(u \sim f\) and transform it to \(x = T(u)\), then the PDF of the transformed value \(x\) is:

\begin{align} p(x) = f( S(x) ) | S'(x) | \end{align}

where \(S'(x)\) is the first derivative of function \(S\) with respect to \(x\), and the bars denote absolute value.
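This change-of-variables formula can be sanity-checked numerically. The snippet below (not part of the assignment; the lognormal example is my own choice, assuming NumPy and SciPy are available) uses \(u \sim \mathcal{N}(0,1)\) with \(T(u) = e^u\), so \(S(x) = \log x\) and \(S'(x) = 1/x\), and confirms that \(f(S(x))|S'(x)|\) recovers the standard lognormal PDF.

```python
import numpy as np
from scipy import stats

# Source density: standard normal PDF f(u).
def f(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

# Transformed density via the change-of-variables formula:
# T(u) = exp(u), so S(x) = log(x) and S'(x) = 1/x.
def p(x):
    return f(np.log(x)) * np.abs(1.0 / x)

# Compare against scipy's standard lognormal (shape s=1).
x = np.linspace(0.1, 5.0, 50)
assert np.allclose(p(x), stats.lognorm.pdf(x, s=1.0))
```

Any invertible transform could be substituted here; only the pair \((S, S')\) changes.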

Problem 1 Setup: Consider a sample \(u \in \mathbb{R}\) from the univariate standard Gaussian distribution (standard means it has mean zero and variance 1).

  • The source PDF is \(f(u) = \frac{1}{(2\pi)^{\frac{1}{2}}} e^{-\frac{1}{2} u^2}\).
  • The source-to-target transform is \(T(u) = \sigma u + \mu\)
  • The target-to-source transform is \(S(x) = \frac{1}{\sigma} (x - \mu)\)
  • The parameters are \(\mu \in \mathbb{R}\) and \(\sigma > 0\)

1a: Give an expression for the Jacobian \(S'(x) = \frac{d}{dx} S(x)\), as a function of \(x, \mu, \sigma\), for the specific target-to-source function \(S(x) = \frac{1}{\sigma}(x - \mu)\) defined above.

1b: Show that the pdf of \(x = T(u) = \sigma u + \mu\) is given by:

\begin{align} p(x) = \frac{1}{(2\pi)^{\frac{1}{2}}} \frac{1}{\sigma} e^{-\frac{1}{2\sigma^2} (x-\mu)^2} \end{align}
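Although the assignment asks for a written derivation, the claimed result can be checked numerically before you derive it. This optional sketch (my own illustrative values \(\mu = 2.0\), \(\sigma = 1.5\); not something to submit) compares the claimed PDF against SciPy's Gaussian, and verifies by sampling that \(x = \sigma u + \mu\) has the expected mean and standard deviation.

```python
import numpy as np
from scipy import stats

mu, sigma = 2.0, 1.5  # illustrative parameter values

# The PDF claimed in 1b.
def p(x):
    return np.exp(-0.5 * (x - mu)**2 / sigma**2) / (np.sqrt(2 * np.pi) * sigma)

# It should match the N(mu, sigma^2) density everywhere.
x = np.linspace(-3.0, 7.0, 50)
assert np.allclose(p(x), stats.norm.pdf(x, loc=mu, scale=sigma))

# Sampling check: transform standard normal draws by T(u) = sigma * u + mu.
rng = np.random.default_rng(0)
u = rng.standard_normal(100_000)
x_samples = sigma * u + mu
assert abs(x_samples.mean() - mu) < 0.05
assert abs(x_samples.std() - sigma) < 0.05
```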

Problem 2: Transformations of Vector Random Variables

Background: Recall what we learned about transformations of vector random variables on day 20.

Suppose \(u\) is a \(D\)-length vector source random variable with known sample space \(\mathcal{U} \subseteq \mathbb{R}^D\) and PDF function \(f(u)\), so: \(\int_{u \in \mathcal{U}} f(u) \, du = 1\).

Suppose our target random variable \(x\) has a \(D\)-dimension vector sample space \(\mathcal{X} \subseteq \mathbb{R}^D\).

Suppose further that we have invertible transformation functions \(S\) and \(T\), which satisfy:

  • \(T : \mathcal{U} \rightarrow \mathcal{X}\)
  • \(S : \mathcal{X} \rightarrow \mathcal{U}\)
  • \(S(T(u)) = u\), for all \(u \in \mathcal{U}\)
  • \(T(S(x)) = x\), for all \(x \in \mathcal{X}\)

If we draw a sample \(u \sim f\) and transform it to \(x = T(u)\), then the PDF of the transformed value \(x\) is:

\begin{align} p(x) = f( S(x) ) | \det\left( J_S(x) \right) | \end{align}

where \(J_S(x)\) is the \(D \times D\) Jacobian matrix of the function \(S(x)\) defined below. The symbol \(\det\) denotes the determinant computation (which always produces a scalar given a square matrix), and the bars denote the absolute value.

\begin{align} J_S(x) = \left[ \begin{array}{c c c c} \frac{\partial S_1}{\partial x_1} & \frac{\partial S_1}{\partial x_2} & \ldots & \frac{\partial S_1}{\partial x_D} \\ \frac{\partial S_2}{\partial x_1} & \frac{\partial S_2}{\partial x_2} & \ldots & \frac{\partial S_2}{\partial x_D} \\ \vdots & & \ddots & \vdots \\ \frac{\partial S_D}{\partial x_1} & \frac{\partial S_D}{\partial x_2} & \ldots & \frac{\partial S_D}{\partial x_D} \end{array} \right] \end{align}

Problem 2 Setup: Consider a vector sample \(u \in \mathbb{R}^D\) from the multivariate standard Gaussian distribution (standard means it has mean equal to the all-zero vector and covariance equal to the \(D\times D\) identity matrix \(I_D\)).

  • The source PDF is \(f(u) = (2\pi)^{\frac{-D}{2}} e^{-\frac{1}{2} u^T u}\).
  • The source-to-target transform is \(T(u) = L u + \mu\)
  • The target-to-source transform is \(S(x) = L^{-1} (x - \mu)\)
  • The first parameter is \(\mu \in \mathbb{R}^D\), a vector defining a mean
  • The second parameter is \(L\), a lower-triangular \(D \times D\) square matrix whose main-diagonal entries are all positive. Note that any lower triangular matrix with all-positive diagonal entries is invertible.
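The setup above can be exercised in code. This optional check (illustrative \(\mu\) and \(L\) of my own choosing; not part of the assignment) draws standard normal vectors, applies \(T(u) = Lu + \mu\), and confirms that the empirical mean and covariance of the transformed samples approach \(\mu\) and \(\Sigma = LL^T\).

```python
import numpy as np

rng = np.random.default_rng(0)
D = 3
mu = np.array([1.0, -2.0, 0.5])
L = np.array([[ 1.0, 0.0, 0.0],
              [ 0.5, 2.0, 0.0],
              [-0.3, 0.1, 0.7]])  # lower triangular, positive diagonal

# Rows of u are samples from N(0, I_D); apply x = L u + mu row-wise.
u = rng.standard_normal((100_000, D))
x = u @ L.T + mu

# Empirical moments should match mu and Sigma = L L^T.
Sigma = L @ L.T
emp_cov = np.cov(x, rowvar=False)
assert np.allclose(x.mean(axis=0), mu, atol=0.05)
assert np.allclose(emp_cov, Sigma, atol=0.1)
```

This is exactly how multivariate Gaussian samplers are implemented in practice: sample i.i.d. standard normals, then apply one affine transform.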

2a: First, give a compact expression for the Jacobian matrix \(J_S(x)\) as a function of \(x, \mu, L\), for the specific target-to-source function \(S(x) = L^{-1} (x - \mu)\) defined above.

2b: Define \(\Sigma = L L^T\). Show the following:

\begin{align} ( L^{-1} (x - \mu))^T ( L^{-1} (x - \mu)) = (x-\mu)^T \Sigma^{-1} (x-\mu) \end{align}

2c: Define \(\Sigma = L L^T\). Show the following:

\begin{align} | \det (L^{-1}) | = \frac{1}{(\det \Sigma)^{\frac{1}{2}}} \end{align}

2d: Bring the two previous answers together to show that the pdf of \(x = T(u) = L u + \mu\) is given by:

\begin{align} p(x) = \frac{1}{(2\pi)^{\frac{D}{2}}} \frac{1}{(\det \Sigma)^{\frac{1}{2}}} e^{-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu)} \end{align}

where we define \(\Sigma = L L^T\). Provide one full sentence to summarize what we have accomplished in problem 2.
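As an optional numerical cross-check of the 2d result (illustrative \(\mu\) and \(L\) of my own choosing, assuming SciPy is available), the derived density can be compared against SciPy's multivariate normal with \(\Sigma = LL^T\):

```python
import numpy as np
from scipy import stats

mu = np.array([1.0, -1.0])
L = np.array([[2.0, 0.0],
              [0.6, 1.5]])  # lower triangular, positive diagonal
Sigma = L @ L.T

# The PDF claimed in 2d, written directly in terms of Sigma.
def p(x):
    D = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.inv(Sigma) @ diff
    return np.exp(-0.5 * quad) / ((2 * np.pi)**(D / 2) * np.sqrt(np.linalg.det(Sigma)))

x = np.array([0.5, 0.2])
assert np.isclose(p(x), stats.multivariate_normal.pdf(x, mean=mu, cov=Sigma))
```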

Hints: Feel free to use any of the following identities

  • Hint 2(i): For any invertible, lower triangular matrix \(L\), its inverse \(L^{-1}\) is also a lower triangular matrix (see proof).
  • Hint 2(ii): For any invertible, lower triangular matrix \(L\), we know: \((L^{-1})^T = (L^T)^{-1}\)
  • Hint 2(iii): For any invertible matrices \(A\) and \(B\), we have \((A B)^{-1} = B^{-1} A^{-1}\)
  • Hint 2(iv): Determinant of inverse equals inverse of determinant: \(\det(A^{-1}) = \frac{1}{\det A}\)
  • Hint 2(v): Determinant of products equals product of determinants: \(\det(A B) = (\det A) (\det B)\)
  • Hint 2(vi): Determinant of lower triangular matrix is equal to determinant of its transpose: \(\det(L) = \det( L^T )\)

FYI: To understand the statement \(\Sigma = L L^T\), it is helpful to realize that \(L\) is the Cholesky decomposition of \(\Sigma\).
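NumPy exposes the Cholesky decomposition directly, which makes both the factorization and Hint 2(iv)/(v) easy to verify on a small example (my own \(\Sigma\); not part of the assignment):

```python
import numpy as np

# A symmetric positive-definite covariance matrix.
Sigma = np.array([[4.0, 1.0],
                  [1.0, 3.0]])

# np.linalg.cholesky returns the lower-triangular L with Sigma = L L^T.
L = np.linalg.cholesky(Sigma)
assert np.allclose(L @ L.T, Sigma)

# Check the 2c identity: |det(L^{-1})| = det(Sigma)^{-1/2}.
lhs = abs(np.linalg.det(np.linalg.inv(L)))
rhs = 1.0 / np.sqrt(np.linalg.det(Sigma))
assert np.isclose(lhs, rhs)
```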

Problem 3: Detailed Balance

We are interested in a scalar real random variable \(z\) with sample space \(\Omega \subseteq \mathbb{R}\).

We wish we could sample from a target distribution \(p^*(z)\). However, we cannot evaluate this PDF or sample from it. Instead, we can compute an easier function \(\tilde{p}(z)\) for any \(z \in \Omega\), where

\begin{align} p^*(z) = \frac{1}{\int_{\Omega} \tilde{p}(z') \, dz' } \tilde{p}(z) \end{align}

Consider the Metropolis-Hastings algorithm for sampling from \(p^*\). We assume we are given a specific proposal distribution \(Q\) which, when given a current \(z\) value, proposes a new \(z'\) value. This distribution \(Q\) must satisfy:

  • \(Q\) is a valid PDF over the sample space \(\Omega\)
  • \(Q\) has an easy-to-evaluate PDF function, which we'll denote as \(Q(z' | z)\).
  • Given any previous value \(z\), \(Q\) is easy to draw samples from: \(z' \sim Q( \cdot | z)\).

Consider a single propose-accept transition using the Metropolis-Hastings algorithm. The PDF of a transition from \(z\) to some new location \(z' \neq z\) is:

\begin{align} \mathcal{T}( z' | z) = \min\left(1, \frac{\tilde{p}(z') Q(z | z')}{\tilde{p}(z) Q(z' | z)} \right) Q( z' | z) \end{align}

This is the compound probability of two events: we propose the specific different \(z'\) value from \(Q\), and then we accept it using the Metropolis-Hastings threshold.
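To make the propose-accept transition concrete, here is a minimal Metropolis-Hastings sketch (my own illustrative code, not part of the assignment). It targets the unnormalized density \(\tilde{p}(z) = e^{-z^2/2}\) with a symmetric Gaussian random-walk proposal \(Q(z' | z) = \mathcal{N}(z'; z, 1)\), so the \(Q\) terms cancel in the acceptance ratio; the chain's samples should look like draws from a standard normal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized target density (illustrative choice).
def p_tilde(z):
    return np.exp(-0.5 * z**2)

z = 0.0
samples = []
for _ in range(50_000):
    z_prop = z + rng.standard_normal()              # propose z' ~ Q(. | z)
    accept_prob = min(1.0, p_tilde(z_prop) / p_tilde(z))
    if rng.uniform() < accept_prob:                 # MH accept/reject step
        z = z_prop
    samples.append(z)                               # keep z even on reject

samples = np.array(samples[5_000:])                 # discard burn-in
assert abs(samples.mean()) < 0.1                    # target mean is 0
assert abs(samples.std() - 1.0) < 0.1               # target std is 1
```

Note that the sampler only ever evaluates \(\tilde{p}\), never the normalized \(p^*\), which is exactly the point of the algorithm.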

3a: Show the following useful result for any \(x > 0, y > 0\):

\begin{align} \frac { \min \left( 1, \frac{x}{y} \right) } { \min \left( 1, \frac{y}{x} \right) } = \frac{x}{y} \end{align}
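Before deriving this algebraically, you can convince yourself numerically. This optional check (not something to submit) evaluates both sides for a few positive pairs, including the \(x = y\) case:

```python
import math

# Left-hand side of the 3a identity.
def ratio(x, y):
    return min(1.0, x / y) / min(1.0, y / x)

# For any x, y > 0 the ratio should equal x / y exactly.
for x, y in [(2.0, 3.0), (3.0, 2.0), (0.1, 7.5), (4.0, 4.0)]:
    assert math.isclose(ratio(x, y), x / y)
```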

3b: Show that the Metropolis-Hastings transition distribution \(\mathcal{T}\) satisfies detailed balance with respect to the target distribution \(p^*\).

That is, show that:

\begin{align} p^*( a) \mathcal{T}( b | a) = p^*(b) \mathcal{T}( a | b) \end{align}

for all possible \(a \neq b\), where \(a, b\) are any two distinct values of the random variable \(z\). Note that distinctness is key here, because it means the definition of \(\mathcal{T}\) above applies.

Hint: Use the identity from 3a.