Bayesian Deep Learning

Tufts CS Special Topics Course | COMP 150 - 03 BDL | Fall 2018

HW1: Gaussian Processes


PDF writeup due by Tuesday Sept 11 at 11:59PM. Submit your PDF to Gradescope: https://www.gradescope.com/courses/23483/assignments/92633

Your PDF should include your answers to both problems below, plus your complete code as an appendix (see Coding Clarifications).


Background

Squared exponential kernel

See the squared exponential kernel formula in the Rasmussen & Williams (R&W) textbook Eq. 4.9:

$$ k_{\text{sq-exp}}(x, x') = \exp \left( -\frac{(x - x')^2}{2 \ell^2} \right) $$

Hyperparameters: \(\ell > 0\)

Matern kernel

See the Matern kernel formula in the R&W textbook Eq. 4.14:

$$ k_{\text{Matern}}(x, x') = \frac{2^{1-\nu}}{\Gamma(\nu)} \bigg( \frac {\sqrt{2\nu} |x - x'| } { \ell } \bigg)^\nu K^{\text{bessel}}_\nu\bigg(\frac{\sqrt{2\nu} |x - x'| }{\ell}\bigg) $$

Hyperparameters: \(\ell > 0\) and \(\nu > 0\)

Useful: scipy.special provides the modified Bessel function \(K_\nu\) (scipy.special.kv) and the Gamma function \(\Gamma\) (scipy.special.gamma) needed for the Matern kernel.

Helpful Hint

You might wish to define two Python functions, sqexp_kernel_func and matern_kernel_func, to compute these kernels given any two possible inputs \(x\) and \(x'\). It is probably smart to write these functions in vectorized form, so that given two vectors of lengths \(A\) and \(B\), the function returns a kernel matrix of size \(A \times B\).
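One possible vectorized sketch of these two helpers is below (the function names match the hint above but are otherwise illustrative, not required; assumes numpy and scipy are available):

```python
import numpy as np
from scipy.special import kv, gamma  # modified Bessel K_nu and Gamma functions

def sqexp_kernel_func(x_A, x_B, length_scale=1.0):
    """ Squared-exponential kernel matrix of size A x B. """
    sq_dist_AB = (x_A[:, None] - x_B[None, :]) ** 2
    return np.exp(-sq_dist_AB / (2.0 * length_scale ** 2))

def matern_kernel_func(x_A, x_B, length_scale=1.0, nu=0.5):
    """ Matern kernel matrix of size A x B. """
    dist_AB = np.abs(x_A[:, None] - x_B[None, :])
    scaled_AB = np.sqrt(2.0 * nu) * dist_AB / length_scale
    K_AB = (2.0 ** (1.0 - nu) / gamma(nu)) * scaled_AB ** nu * kv(nu, scaled_AB)
    # kv(nu, 0) is inf and 0**nu is 0, producing nan at zero distance;
    # the limiting kernel value there is 1.
    K_AB[scaled_AB == 0.0] = 1.0
    return K_AB
```

As a sanity check, the Matern kernel with \(\nu = 1/2\) should reduce to \(\exp(-|x - x'| / \ell)\) (R&W Sec. 4.2).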

Problem 1: Sampling from the Prior

Write Python code to sample function values from a Gaussian Process (GP) prior.

You should sample the function values that correspond to a set of at least 200 evenly-spaced test points \(\{x_i\}\) between -20 and 20. One way to generate a 1D array of \(G\) points would be: x_grid_G = np.linspace(-20, 20, G).

Your prior should be specified by a mean function \(m(x)\) and a covariance (kernel) function \(k(x, x')\), using one of the two kernels above.

Here's an example template (you don't have to use it):

import numpy as np

def draw_GP_prior_samples_at_x_grid(
        x_grid_G, mean_func, cov_func,
        random_seed=42,
        n_samples=1):
    """ Draw sample from GP prior given mean/cov functions

    Args
    ----
    x_grid_G : 1D array, size n_grid_pts (G)
        Contains G specific x_i values to evaluate function at
    mean_func : function, maps (1D arr size A) to (1D arr size A)
        Computes mean value $m(x_i)$ at each input x_i value 
    cov_func : function, maps (1D arr size A, 1D arr size B) to (2D AxB)
        Computes covariance (kernel) value at each pair of inputs.
    random_seed : int
        Seed for the random number generator
    n_samples : int
        Number of samples to draw from the prior

    Returns
    -------
    f_SG : 2D array, n_samples (S) x n_grid_pts (G)
        Contains sampled function values at each point of x_grid
    """
    # TODO compute mean at each grid point

    # TODO compute covariance matrix across grid points

    # Use consistent random number generator for reproducibility
    prng = np.random.RandomState(int(random_seed))
    TODO = prng.multivariate_normal(TODO)

    return TODO

To demonstrate your implementation, you'll make plots of sampled function values. Each individual plot should show a line plot of the test grid points \(x_i\) and the corresponding sampled function values \(f_i = f(x_i)\). Use a matplotlib line style '.-' to show both the specific \(\{x_i, f_i\}\) pair values and connecting lines.
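A minimal plotting sketch is below (names are illustrative; f_SG here is just random noise standing in for your real GP samples, and the non-interactive Agg backend is used so the script runs headlessly — interactively you would call plt.show() instead):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend; use plt.show() when working interactively
import matplotlib.pyplot as plt

x_grid_G = np.linspace(-20, 20, 200)
# Stand-in for your sampled function values (5 samples x 200 grid points)
f_SG = np.random.RandomState(42).randn(5, x_grid_G.size)

fig, ax = plt.subplots(1, 1)
for f_G in f_SG:
    # '.-' shows both the sampled (x_i, f_i) points and the connecting lines
    ax.plot(x_grid_G, f_G, '.-')
ax.set_xlabel('x')
ax.set_ylabel('f(x)')
fig.savefig('prior_samples.png')
```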

For Problem 1, your report PDF should include:

a. 1 row x 3 column grid of plots, where each panel shows 5 samples from the prior using a squared exponential kernel

  • For the columns, try 3 possible length scale values for \(\ell\): 0.25, 1.0, and 4.0

b. Short text description of what each of the SE kernel's hyperparameters do (in terms of qualitative trends). A few short but complete sentences.

c. 3 row x 3 column grid of plots, where each panel shows 5 samples from the prior using a Matern kernel

  • For the columns, try 3 possible length scale values for \(\ell\): 0.25, 1.0, and 4.0
  • For the rows, try 3 possible \(\nu\) values: 0.5, 2, 8

d. Short text description of what each of the Matern kernel's hyperparameters do (in terms of qualitative trends). A few short but complete sentences.

Problem 2: Sampling from the Posterior

Consider the following training data with \(N=6\) example pairs of \(x\) and \(y\) values:

x_train_N = np.asarray([-2.,    -1.8,   -1.,  1.,  1.8,     2.])
y_train_N = np.asarray([-3.,  0.2224,    3.,  3.,  0.2224, -3.])

As in R&W Ch. 2's section on 'Prediction using Noisy Observations', we assume that the observed \(y_i\) values are themselves 'noisy' perturbations of the true function values \(f(x_i)\) that are modeled by a GP. So the likelihood is: \(y_i | x_i \sim \mathcal{N}(f(x_i), \sigma^2)\). For this problem, you can assume the likelihood noise standard deviation is \(\sigma = 0.1\).

Write Python code to sample the function values \(\{f_i\}_{i=1}^G\) that correspond to a given grid of new \(\{x^{\text{grid}}_i\}_{i=1}^G\) values from a Gaussian Process (GP) posterior given this training data. Use the same GP prior specification as in Problem 1 (same mean function, same possible kernel functions).
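As a refresher, the noisy-prediction equations from R&W Ch. 2 (stated here assuming a zero-mean prior) say the posterior over function values \(f_*\) at test inputs \(X_*\) is Gaussian:

$$ f_* \mid X, y, X_* \sim \mathcal{N}\left( \bar{f}_*, \; \text{cov}(f_*) \right) $$

$$ \bar{f}_* = K(X_*, X) \left[ K(X, X) + \sigma^2 I \right]^{-1} y $$

$$ \text{cov}(f_*) = K(X_*, X_*) - K(X_*, X) \left[ K(X, X) + \sigma^2 I \right]^{-1} K(X, X_*) $$

If your prior mean function is nonzero, subtract \(m(X)\) from \(y\) before solving and add \(m(X_*)\) back into the posterior mean.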

Use a one-dimensional grid of test x points between -20 and 20, with at least 200 grid points.

def draw_GP_posterior_samples_at_x_grid(
        x_train_N, y_train_N, x_grid_G, mean_func, cov_func,
        sigma=0.1,
        random_seed=42,
        n_samples=1):
    """ Draw sample from GP posterior given training data and mean/cov

    Args
    ----
    x_train_N : 1D array, size n_train_examples (N)
        Each entry i provides the x value observed at training example i 
    y_train_N : 1D array, size n_train_examples (N)
        Each entry i provides the y value observed at training example i 
    sigma : scalar float
        Specifies the standard deviation of the likelihood.
        y_i drawn from a 1D Normal with mean f(x_i) and std. dev. \sigma.
    Other args same as earlier function: draw_GP_prior_samples_at_x_grid

    Returns
    -------
    f_SG : 2D array, n_samples (S) x n_grid_pts (G)
        Contains sampled function values at each point of x_grid
    """
    TODO

For Problem 2, your report PDF should include:

a. 1 row x 3 col grid of plots, where each panel shows 5 posterior samples using a squared exponential kernel

  • For the columns, try 3 possible length scale values for \(\ell\): 0.25, 1.0, and 4.0

b. Short answer (1-3 sentences): What happens at \(x=0\) that differs across the three different choices of \(\ell\)? Why?

c. 3 row x 3 col grid of plots, where each panel shows 5 posterior samples using a Matern kernel

  • For the columns, try 3 possible length scale values for \(\ell\): 0.25, 1.0, and 4.0
  • For the rows, try 3 possible \(\nu\) values: 0.5, 2, 8

d. Short answer (a few sentences): Consider the plot with \(\ell=4\) and \(\nu=8\). What (roughly) is a typical value of a posterior sample around \(x=-18\) or \(x=+18\)? Why? What do you expect the typical value would be at \(x=+100\)? If you wanted this typical value to be something else, how would you achieve that?

e. Short answer (a few sentences): Consider the plot with \(\ell=4\) and \(\nu=8\). What (roughly) is the standard deviation of a posterior sample around \(x=-18\) or \(x=+18\)? Why? What do you expect the standard deviation of posterior samples would be at \(x=+100\)? How would you set the standard deviation to a specific desired value (like 42.0)?

Coding Clarifications

For both problems, your implementation should of course not use any third-party Gaussian Process libraries; instead, write it from scratch by turning the math/pseudocode in resources like R&W Ch. 2 into working Python code. Post any questions to the HW1 discussion on Canvas.

You may use existing libraries (e.g. numpy, scipy) for basic functions like:

  • random number generation
  • array/matrix operations (product, inverse, linear solve, Cholesky factorization, eigenvalues)
  • special functions (like bessel or gamma)

Be sure to turn in your complete code as an appendix. This could be a concatenation of one or more ".py" files, or a ".ipynb" notebook if you prefer that format. Just make sure it is easy to follow (label the parts, use clear variable names, write decent docstrings).

We are unlikely to run your code, but we may read it, so it must clearly solve the problem at hand.

We reserve the right to have you reproduce key parts of your code at a whiteboard without references if we have questions about integrity. You should be able to explain any code you turn in (because you must have developed it yourself).