HW3: Transformations of Random Variables and MCMC


Last modified: 2021-03-24 13:46

Due date: Wed. Mar. 24, 2021 at 11:59pm AoE (anywhere on Earth)

Status: RELEASED.

How to turn in: Submit PDF to https://www.gradescope.com/courses/239096/assignments/1089705


Questions?: Post to the hw3 topic on the Piazza discussion forums.

Instructions for Preparing your PDF Report

What to turn in: PDF of typeset answers via LaTeX. No handwritten solutions will be accepted, so that grading can be speedy and you get prompt feedback.

Please use the official LaTeX Template: https://raw.githubusercontent.com/tufts-ml-courses/comp136-21s-assignments/main/hw3/hw3_template.tex

Your PDF should include, in order, your solutions to each problem.

Throughout this homework, we are practicing the skills necessary to derive and analyze mathematical expressions involving probability.

Each step of a mathematical derivation that you turn in should be justified by an accompanying short verbal phrase (e.g. "using Bayes rule").

Solutions that lack justifications or skip key steps without showing work will receive poor marks.

Problem 1: Transformations of Vector Random Variables

Background: Recall what we learned about transformations of vector random variables on day 11.

Suppose \(u\) is a \(D\)-length vector source random variable with known sample space \(\mathcal{U} \subseteq \mathbb{R}^D\) and PDF function \(f(u)\), so that \(\int_{u \in \mathcal{U}} f(u) du = 1\).

Suppose our target random variable \(x\) has a \(D\)-dimensional vector sample space \(\mathcal{X} \subseteq \mathbb{R}^D\).

Suppose further that we have invertible transformation functions \(S\) and \(T\), which satisfy:

  • \(T : \mathcal{U} \rightarrow \mathcal{X}\)
  • \(S : \mathcal{X} \rightarrow \mathcal{U}\)
  • \(S(T(u)) = u\), for all \(u \in \mathcal{U}\)
  • \(T(S(x)) = x\), for all \(x \in \mathcal{X}\)

Then, if we draw a sample \(u \sim f\) and transform it to \(x = T(u)\), the PDF of the transformed value \(x\) is:

\begin{align} p(x) = f( S(x) ) | \det\left( J_S(x) \right) | \end{align}

where \(J_S(x)\) is the \(D \times D\) Jacobian matrix of the function \(S(x)\) defined below. The symbol \(\det\) denotes the determinant computation (which always produces a scalar given a square matrix), and the bars denote the absolute value.

\begin{align} J_S(x) = \left[ \begin{array}{c c c c} \frac{\partial S_1}{\partial x_1} & \frac{\partial S_1}{\partial x_2} & \ldots & \frac{\partial S_1}{\partial x_D} \\ \frac{\partial S_2}{\partial x_1} & \frac{\partial S_2}{\partial x_2} & \ldots & \frac{\partial S_2}{\partial x_D} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial S_D}{\partial x_1} & \frac{\partial S_D}{\partial x_2} & \ldots & \frac{\partial S_D}{\partial x_D} \end{array} \right] \end{align}
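
To build intuition for this formula, here is a minimal numerical sanity check in Python. It uses a toy 1-D example of our own choosing (not part of the assignment): source \(u \sim \text{Uniform}(0, 1)\) and transform \(x = T(u) = e^u\), for which the formula predicts \(p(x) = 1/x\) on \((1, e)\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D example: source u ~ Uniform(0, 1), so f(u) = 1 on (0, 1).
u = rng.uniform(0.0, 1.0, size=100_000)

# Transform each sample: x = T(u) = e^u, with target sample space (1, e).
x = np.exp(u)

# Change-of-variables prediction: p(x) = f(S(x)) * |det J_S(x)|,
# where S(x) = log(x) and J_S(x) = dS/dx = 1/x, so p(x) = 1/x on (1, e).
def predicted_pdf(x):
    return 1.0 / x

# Compare the prediction against an empirical histogram of the samples.
counts, edges = np.histogram(x, bins=50, range=(1.0, np.e), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print("max abs deviation:", np.max(np.abs(counts - predicted_pdf(centers))))
```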

Problem 1 Setup: Consider a vector sample \(u \in \mathbb{R}^D\) from the multivariate standard Gaussian distribution (standard means it has mean equal to the all-zero vector and covariance equal to the \(D\times D\) identity matrix \(I_D\)).

  • The source PDF is \(f(u) = (2\pi)^{\frac{-D}{2}} e^{-\frac{1}{2} u^T u}\).
  • The source-to-target transform is \(T(u) = L u + \mu\)
  • The target-to-source transform is \(S(x) = L^{-1} (x - \mu)\)
  • The first parameter is \(\mu \in \mathbb{R}^D\), a vector defining a mean
  • The second parameter is \(L\), a lower-triangular \(D \times D\) square matrix whose main diagonal entries are all positive. Note that any such matrix is invertible, since its determinant equals the product of its diagonal entries and is therefore positive. (A numerical sanity check of this setup appears just after this list.)
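
Before starting the derivations, it can help to check this setup numerically. The sketch below (illustration only; the specific values of \(\mu\) and \(L\) are arbitrary choices of ours) draws many samples \(x = L u + \mu\) and confirms that their empirical mean approaches \(\mu\) and their empirical covariance approaches \(L L^T\), the matrix called \(\Sigma\) in parts 1b-1d.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 3

# Arbitrary example parameters: any mu works; L must be lower
# triangular with an all-positive main diagonal.
mu = np.array([1.0, -2.0, 0.5])
L = np.array([[ 1.0, 0.0, 0.0],
              [ 0.3, 0.8, 0.0],
              [-0.5, 0.2, 1.5]])

# Draw u ~ N(0, I_D), then transform each sample: x = T(u) = L u + mu.
U = rng.standard_normal(size=(1_000_000, D))
X = U @ L.T + mu

print("empirical mean:\n", X.mean(axis=0))    # should approach mu
print("empirical cov:\n", np.cov(X.T))        # should approach L @ L.T
print("L @ L.T:\n", L @ L.T)
```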

Partial Credit Option: If you wish, you can solve all parts of this problem (1a - 1d) for the specialized \(D=1\) univariate case only, for 85% credit.

1a: First, give a compact expression for the Jacobian matrix \(J_S(x)\) as a function of \(x, \mu, L\), for the specific target-to-source function \(S(x) = L^{-1} (x - \mu)\) defined above.

1b: Define \(\Sigma = L L^T\). Show the following:

\begin{align} ( L^{-1} (x - \mu))^T ( L^{-1} (x - \mu)) = (x-\mu)^T \Sigma^{-1} (x - \mu) \end{align}

1c: Define \(\Sigma = L L^T\). Show the following:

\begin{align} | \det (L^{-1}) | = \frac{1}{(\det \Sigma)^{\frac{1}{2}}} \end{align}

1d: Bring the two previous answers together to show that the PDF of \(x = T(u) = L u + \mu\) is given by:

\begin{align} p(x) = \frac{1}{(2\pi)^{\frac{D}{2}}} \frac{1}{(\det \Sigma)^{\frac{1}{2}}} e^{-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu)} \end{align}

where we define \(\Sigma = L L^T\).

1e: Provide one full sentence to summarize what we have accomplished in problem 1.

Hints: Feel free to use any of the following identities

  • Hint (i): For any invertible, lower triangular matrix \(L\), its inverse \(L^{-1}\) is also lower triangular
  • Hint (ii): For any invertible, lower triangular matrix \(L\), we know: \((L^{-1})^T = (L^T)^{-1}\)
  • Hint (iii): For any invertible matrices \(A\) and \(B\), we have \((A B)^{-1} = B^{-1} A^{-1}\)
  • Hint (iv): Determinant of inverse equals inverse of determinant: \(\det(A^{-1}) = \frac{1}{\det A}\)
  • Hint (v): Determinant of products equals product of determinants: \(\det(A B) = (\det A) (\det B)\)
  • Hint (vi): Determinant of a square matrix is equal to determinant of its transpose: \(\det(L) = \det( L^T )\)
  • Hint (vii): Transpose of products equals product of transposes: \( (A B C)^T = C^T B^T A^T\)
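
If any of these identities feel unfamiliar, a quick numerical spot-check can build confidence before you write the algebraic argument (it is not a substitute for the justifications your solution must give). A minimal sketch, using arbitrary matrices of our own choosing:

```python
import numpy as np

# Arbitrary invertible lower-triangular matrix (positive main diagonal).
L = np.array([[2.0, 0.0],
              [0.7, 1.5]])
B = np.array([[1.0, 0.3],
              [0.0, 4.0]])

# Hint (iv): det(A^{-1}) = 1 / det(A)
assert np.isclose(np.linalg.det(np.linalg.inv(L)), 1.0 / np.linalg.det(L))

# Hint (v): det(A B) = det(A) det(B)
assert np.isclose(np.linalg.det(L @ B),
                  np.linalg.det(L) * np.linalg.det(B))

# Hint (vi): det(L) = det(L^T)
assert np.isclose(np.linalg.det(L), np.linalg.det(L.T))

print("all hint spot-checks passed")
```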

FYI: To understand the statement \(\Sigma = L L^T\), it is helpful to realize that \(L\) is the Cholesky factor of \(\Sigma\).

Problem 2: Detailed Balance

We are interested in a scalar real random variable \(Z\) with sample space \(\Omega \subseteq \mathbb{R}\).

We wish we could sample from a target distribution with PDF function \(p^*(z)\). However, we can neither evaluate this PDF nor sample from the distribution directly. Instead, we can compute an easier-to-evaluate function \(\tilde{p}(z)\) for any \(z \in \Omega\), where

\begin{align} p^*(z) = \underbrace{\frac{1}{\int_{z \in \Omega} \tilde{p}(z) dz }}_{\text{normalization constant}} \tilde{p}(z) \end{align}

Remember that the normalization constant is just a positive scalar value, which does not depend on any specific candidate value \(z \in \Omega\) that we might assign to our random variable \(Z\).
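
As a concrete illustration (our own example, not part of the problem): if \(\Omega = \mathbb{R}\) and \(\tilde{p}(z) = e^{-z^2/2}\), then \(\int_{z \in \Omega} \tilde{p}(z) dz = \sqrt{2\pi}\), so

\begin{align} p^*(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \end{align}

which is the standard normal PDF. Rescaling \(\tilde{p}\) by any positive constant changes the normalization constant by the reciprocal factor and leaves \(p^*\) unchanged.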

Consider the Metropolis-Hastings algorithm for sampling from \(p^*\). We assume we are given a specific proposal distribution \(Q\) which, when given a current \(z\) value, proposes a new \(z'\) value. This distribution \(Q\) must satisfy:

  • \(Q\) is a valid PDF over the sample space \(\Omega\)
  • \(Q\) has an easy-to-evaluate PDF function, which we'll denote as \(Q(z' | z)\).
  • Given any previous value \(z\), \(Q\) is easy to draw samples from: \(z' \sim Q( \cdot | z)\).

Consider a single propose-accept transition using the Metropolis-Hastings algorithm. The PDF of a transition from \(z\) to some new location \(z' \neq z\) is:

\begin{align} \mathcal{T}( z' | z) = \min \left( 1, \frac{\tilde{p}(z') Q(z | z')}{\tilde{p}(z) Q(z' | z)} \right) Q( z' | z) \end{align}

This is the joint probability of two events: we propose the specific new value \(z' \neq z\) from \(Q\), and then we accept it according to the Metropolis-Hastings acceptance rule.
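
To make the propose-accept mechanics concrete, here is a minimal Python sketch of a single Metropolis-Hastings transition. The choices of \(\tilde{p}\) and \(Q\) are hypothetical examples of ours; in particular, the Gaussian random-walk proposal used here is symmetric, so the ratio \(Q(z|z')/Q(z'|z)\) cancels to 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(z):
    """Unnormalized target density (hypothetical two-bump example)."""
    return np.exp(-0.5 * (z - 3.0) ** 2) + np.exp(-0.5 * (z + 3.0) ** 2)

def mh_step(z, proposal_std=1.0):
    """One Metropolis-Hastings propose-accept transition.

    The proposal Q(z'|z) = Normal(z' | z, proposal_std^2) is symmetric,
    so Q(z|z') / Q(z'|z) = 1 and the acceptance probability reduces
    to min(1, p_tilde(z') / p_tilde(z)).
    """
    z_prime = z + proposal_std * rng.standard_normal()
    accept_prob = min(1.0, p_tilde(z_prime) / p_tilde(z))
    return z_prime if rng.uniform() < accept_prob else z

# Run a short chain from z = 0 and report the final state.
z = 0.0
for _ in range(10_000):
    z = mh_step(z)
print("final state:", z)
```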

2a: Show the following useful result for any \(x > 0, y > 0\):

\begin{align} \frac { \min \left( 1, \frac{x}{y} \right) } { \min \left( 1, \frac{y}{x} \right) } = \frac{x}{y} \end{align}
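
(Quick numerical spot-check, not a substitute for the derivation: with \(x = 2, y = 5\), the left-hand side is \(\min(1, \tfrac{2}{5}) / \min(1, \tfrac{5}{2}) = \tfrac{2/5}{1} = \tfrac{2}{5} = \tfrac{x}{y}\).)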

2b: Show that the Metropolis-Hastings transition distribution \(\mathcal{T}\) satisfies detailed balance with respect to the target distribution \(p^*\).

That is, show that:

\begin{align} p^*( a) \mathcal{T}( b | a) = p^*(b) \mathcal{T}( a | b) \end{align}

for all possible \(a \neq b\), where \(a, b\) are any two distinct values of the random variable \(z\). Note that distinctness is key here, because it means the definition of \(\mathcal{T}\) above applies.

Hint: Use the identity from 2a.