{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Intro to neural net training with autograd\n", "\n", "In this notebook, we'll practice\n", "\n", "* using the **autograd** Python package to compute gradients\n", "* using gradient descent to train a basic linear regression model (a NN with 0 hidden layers)\n", "* using gradient descent to train a basic neural network for regression (NN with 1+ hidden layers)\n", "\n", "\n", "### Requirements:\n", "\n", "Follow the Python environment setup instructions here:\n", "https://www.cs.tufts.edu/comp/150BDL/2018f/setup_python_env.html\n", "\n", "All the specific Python packages you'll need are in the 'bdl_basic_env' conda environment:\n", "https://www.cs.tufts.edu/comp/150BDL/2018f/conda_envs/bdl_basic_env.yml" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "import pickle\n", "import copy\n", "import time" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "## Import plotting tools\n", "\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "## Import numpy and pandas\n", "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "## Import autograd\n", "import autograd.numpy as ag_np\n", "import autograd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 5: Neural Networks and Autograd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Let's use a convenient data structure for NN model parameters\n", "\n", "Use a list of dicts of arrays.\n", "\n", "Each entry in the list is a dict that represents the parameters of one \"layer\".\n", "\n", "Each layer-specific dict has two named entries: a matrix of weights 'w' with shape (n_in, n_out) and a vector of biases 'b' with shape (n_out,)" ] }, { "cell_type": "markdown", "metadata": {}, 
"source": [ "#### Here's a function to create NN params as a 'list-of-dicts' that matches a provided set of dimensions" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "def make_nn_params_as_list_of_dicts(\n", "        n_hiddens_per_layer_list=[5],\n", "        n_dims_input=1,\n", "        n_dims_output=1,\n", "        weight_fill_func=np.zeros,\n", "        bias_fill_func=np.zeros):\n", "    nn_param_list = []\n", "    n_hiddens_per_layer_list = [n_dims_input] + n_hiddens_per_layer_list + [n_dims_output]\n", "\n", "    # Given full network size list is [a, b, c, d, e]\n", "    # For loop should loop over (a,b) , (b,c) , (c,d) , (d,e)\n", "    for n_in, n_out in zip(n_hiddens_per_layer_list[:-1], n_hiddens_per_layer_list[1:]):\n", "        nn_param_list.append(\n", "            dict(\n", "                w=weight_fill_func((n_in, n_out)),\n", "                b=bias_fill_func((n_out,)),\n", "            ))\n", "    return nn_param_list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Here's a function to pretty-print any given set of NN parameters to stdout, so we can inspect them" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "def pretty_print_nn_param_list(nn_param_list_of_dict):\n", "    \"\"\" Create pretty display of the parameters at each layer\n", "    \"\"\"\n", "    for ll, layer_dict in enumerate(nn_param_list_of_dict):\n", "        print(\"Layer %d\" % ll)\n", "        print(\" w | size %9s | %s\" % (layer_dict['w'].shape, layer_dict['w'].flatten()))\n", "        print(\" b | size %9s | %s\" % (layer_dict['b'].shape, layer_dict['b'].flatten()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example: NN with 0 hidden layers (equivalent to linear regression)\n", "\n", "For univariate regression: 1D -> 1D\n", "\n", "Will fill all parameters with zeros by default" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Layer 0\n", " w | size (1, 1) | [0.]\n", " b | size (1,) | [0.]\n" ] } ], 
"source": [ "nn_params = make_nn_params_as_list_of_dicts(n_hiddens_per_layer_list=[], n_dims_input=1, n_dims_output=1)\n", "pretty_print_nn_param_list(nn_params)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example: NN with 0 hidden layers (equivalent to linear regression)\n", "\n", "For multivariate regression when |x_i| = 2: 2D -> 1D\n", "\n", "Will fill all parameters with zeros by default" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Layer 0\n", " w | size (2, 1) | [0. 0.]\n", " b | size (1,) | [0.]\n" ] } ], "source": [ "nn_params = make_nn_params_as_list_of_dicts(n_hiddens_per_layer_list=[], n_dims_input=2, n_dims_output=1)\n", "pretty_print_nn_param_list(nn_params)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example: NN with 1 hidden layer of 3 hidden units" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Layer 0\n", " w | size (2, 3) | [0. 0. 0. 0. 0. 0.]\n", " b | size (3,) | [0. 0. 0.]\n", "Layer 1\n", " w | size (3, 1) | [0. 0. 0.]\n", " b | size (1,) | [0.]\n" ] } ], "source": [ "nn_params = make_nn_params_as_list_of_dicts(n_hiddens_per_layer_list=[3], n_dims_input=2, n_dims_output=1)\n", "pretty_print_nn_param_list(nn_params)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example: NN with 1 hidden layer of 3 hidden units\n", "\n", "Use 'ones' as the fill function for weights" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Layer 0\n", " w | size (2, 3) | [1. 1. 1. 1. 1. 1.]\n", " b | size (3,) | [0. 0. 0.]\n", "Layer 1\n", " w | size (3, 1) | [1. 1. 
1.]\n", " b | size (1,) | [0.]\n" ] } ], "source": [ "nn_params = make_nn_params_as_list_of_dicts(\n", " n_hiddens_per_layer_list=[3], n_dims_input=2, n_dims_output=1,\n", " weight_fill_func=np.ones)\n", "pretty_print_nn_param_list(nn_params)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example: NN with 1 hidden layer of 3 hidden units\n", "\n", "Use random draws from standard normal as the fill function for weights" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Layer 0\n", " w | size (2, 3) | [ 1.37995026 0.15780478 -2.31025992 -0.47964717 0.38124501 -0.1729307 ]\n", " b | size (3,) | [0. 0. 0.]\n", "Layer 1\n", " w | size (3, 1) | [-0.42010263 0.50818113 -0.7218878 ]\n", " b | size (1,) | [0.]\n" ] } ], "source": [ "nn_params = make_nn_params_as_list_of_dicts(\n", " n_hiddens_per_layer_list=[3], n_dims_input=2, n_dims_output=1,\n", " weight_fill_func=lambda size_tuple: np.random.randn(*size_tuple))\n", "pretty_print_nn_param_list(nn_params)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example: NN with 7 hidden layers of diff sizes\n", "\n", "Just shows how generic this framework is!" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Layer 0\n", " w | size (2, 3) | [0. 0. 0. 0. 0. 0.]\n", " b | size (3,) | [0. 0. 0.]\n", "Layer 1\n", " w | size (3, 4) | [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " b | size (4,) | [0. 0. 0. 0.]\n", "Layer 2\n", " w | size (4, 5) | [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " b | size (5,) | [0. 0. 0. 0. 0.]\n", "Layer 3\n", " w | size (5, 6) | [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", " 0. 0. 0. 0. 0. 0.]\n", " b | size (6,) | [0. 0. 0. 0. 0. 0.]\n", "Layer 4\n", " w | size (6, 5) | [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", " 0. 
0. 0. 0. 0. 0.]\n", " b | size (5,) | [0. 0. 0. 0. 0.]\n", "Layer 5\n", " w | size (5, 4) | [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " b | size (4,) | [0. 0. 0. 0.]\n", "Layer 6\n", " w | size (4, 3) | [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " b | size (3,) | [0. 0. 0.]\n", "Layer 7\n", " w | size (3, 1) | [0. 0. 0.]\n", " b | size (1,) | [0.]\n" ] } ], "source": [ "nn_params = make_nn_params_as_list_of_dicts(\n", "    n_hiddens_per_layer_list=[3, 4, 5, 6, 5, 4, 3], n_dims_input=2, n_dims_output=1)\n", "pretty_print_nn_param_list(nn_params)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup: Function that performs **prediction**" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "def predict_y_given_x_with_NN(x=None, nn_param_list=None, activation_func=ag_np.tanh):\n", "    \"\"\" Predict y value given x value via feed-forward neural net\n", "\n", "    Args\n", "    ----\n", "    x : array_like, n_examples x n_input_dims\n", "\n", "    Returns\n", "    -------\n", "    y : array_like, n_examples\n", "    \"\"\"\n", "    for layer_id, layer_dict in enumerate(nn_param_list):\n", "        if layer_id == 0:\n", "            if x.ndim > 1:\n", "                in_arr = x\n", "            else:\n", "                if x.size == nn_param_list[0]['w'].shape[0]:\n", "                    in_arr = x[ag_np.newaxis,:]\n", "                else:\n", "                    in_arr = x[:,ag_np.newaxis]\n", "        else:\n", "            in_arr = activation_func(out_arr)\n", "        out_arr = ag_np.dot(in_arr, layer_dict['w']) + layer_dict['b']\n", "    return ag_np.squeeze(out_arr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Toy data for linear regression task" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "N = 300\n", "D = 2\n", "sigma = 0.1\n", "\n", "true_w_D = np.asarray([4.2, -4.2])\n", "true_bias = 0.1\n", "\n", "train_prng = np.random.RandomState(0)\n", "x_ND = train_prng.uniform(low=-5, high=5, size=(N,D))\n", "y_N = np.dot(x_ND, true_w_D) + true_bias + sigma * 
train_prng.randn(N)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "sns.pairplot(\n", " data=pd.DataFrame(np.hstack([x_ND, y_N[:,np.newaxis]]), columns=['x1', 'x2', 'y']));" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example: Make predictions with 0-layer NN whose parameters are filled with the 'true' params for our toy dataset" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "true_nn_params = make_nn_params_as_list_of_dicts(n_hiddens_per_layer_list=[], n_dims_input=2, n_dims_output=1)\n", "true_nn_params[0]['w'][:] = true_w_D[:,np.newaxis]\n", "true_nn_params[0]['b'][:] = true_bias" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0.5,0,u'predicted y|x')" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "yhat_N = predict_y_given_x_with_NN(x_ND, true_nn_params)\n", "assert yhat_N.size == N\n", "\n", "plt.plot(yhat_N, y_N, 'k.')\n", "plt.ylabel('true y')\n", "plt.xlabel('predicted y|x')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example: Make predictions with 0-layer NN whose parameters are filled with all zeros" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0.5,0,u'predicted y|x')" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "zero_nn_params = make_nn_params_as_list_of_dicts(n_hiddens_per_layer_list=[], 
n_dims_input=2, n_dims_output=1)\n", "yhat_N = predict_y_given_x_with_NN(x_ND, zero_nn_params)\n", "assert yhat_N.size == N\n", "\n", "plt.plot(yhat_N, y_N, 'k.')\n", "plt.ylabel('true y')\n", "plt.xlabel('predicted y|x')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup: Gradient descent implementation that can use list-of-dict parameters (not just arrays)" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [], "source": [ "def run_many_iters_of_gradient_descent_with_list_of_dict(f, g, init_x_list_of_dict=None, n_iters=100, step_size=0.001):\n", "\n", "    # Copy the initial parameter vector\n", "    x_list_of_dict = copy.deepcopy(init_x_list_of_dict)\n", "\n", "    # Create data structs to track the per-iteration history of different quantities\n", "    history = dict(\n", "        iter=[],\n", "        f=[],\n", "        x=[],\n", "        g=[])\n", "    start_time = time.time()\n", "    for iter_id in range(n_iters):\n", "        if iter_id > 0:\n", "            # Gradient is a list of layer-specific dicts\n", "            grad_list_of_dict = g(x_list_of_dict)\n", "            for layer_id, x_layer_dict in enumerate(x_list_of_dict):\n", "                for key in x_layer_dict.keys():\n", "                    x_layer_dict[key] = x_layer_dict[key] - step_size * grad_list_of_dict[layer_id][key]\n", "\n", "        fval = f(x_list_of_dict)\n", "        history['iter'].append(iter_id)\n", "        history['f'].append(fval)\n", "        history['x'].append(copy.deepcopy(x_list_of_dict))\n", "        history['g'].append(g(x_list_of_dict))\n", "\n", "        if iter_id < 3 or (iter_id+1) % 50 == 0:\n", "            print(\"completed iter %5d/%d after %7.1f sec | loss %.6e\" % (\n", "                iter_id+1, n_iters, time.time()-start_time, fval))\n", "    return x_list_of_dict, history" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Worked Exercise: Train 0-layer NN via gradient descent on LINEAR toy data" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [], "source": [ "def nn_regression_loss_function(nn_params):\n", " yhat_N = predict_y_given_x_with_NN(x_ND, 
nn_params)\n", " return 0.5 / ag_np.square(sigma) * ag_np.sum(ag_np.square(y_N - yhat_N))" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "completed iter 1/100 after 0.1 sec | loss 1.632803e+02\n", "completed iter 2/100 after 0.3 sec | loss 1.626805e+02\n", "completed iter 3/100 after 0.5 sec | loss 1.622965e+02\n", "completed iter 50/100 after 9.1 sec | loss 1.598157e+02\n", "completed iter 100/100 after 18.6 sec | loss 1.596883e+02\n" ] } ], "source": [ "fromtrue_opt_nn_params, fromtrue_history = run_many_iters_of_gradient_descent_with_list_of_dict(\n", "    nn_regression_loss_function,\n", "    autograd.grad(nn_regression_loss_function),\n", "    true_nn_params,\n", "    n_iters=100,\n", "    step_size=0.000001)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Layer 0\n", " w | size (2, 1) | [ 4.20137427 -4.2023327 ]\n", " b | size (1,) | [0.08737985]\n" ] } ], "source": [ "pretty_print_nn_param_list(fromtrue_opt_nn_params)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(fromtrue_history['iter'], fromtrue_history['f'], 'k.-')" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "completed iter 1/100 after 0.1 sec | loss 4.649827e+06\n", "completed iter 2/100 after 0.3 sec | loss 2.520065e+06\n", "completed iter 3/100 after 0.6 sec | loss 1.367394e+06\n", "completed iter 50/100 after 10.7 sec | loss 1.634750e+02\n", "completed iter 100/100 after 22.0 sec | loss 1.598624e+02\n" ] } ], "source": [ "fromzero_opt_nn_params, 
fromzero_history = run_many_iters_of_gradient_descent_with_list_of_dict(\n", " nn_regression_loss_function,\n", " autograd.grad(nn_regression_loss_function),\n", " zero_nn_params,\n", " n_iters=100,\n", " step_size=0.000001)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Layer 0\n", " w | size (2, 1) | [ 4.20137256 -4.2023501 ]\n", " b | size (1,) | [0.08325989]\n" ] } ], "source": [ "pretty_print_nn_param_list(fromzero_opt_nn_params)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(fromzero_history['iter'], fromzero_history['f'], 'k.-')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Try it yourself!\n", "\n", "* Can you make a function **calc_prior_logpdf** that takes in weights and produces a real value?\n", "* Can you make a function **calc_likelihood_logpdf** that takes in weights and produces a real value?\n", "* Can you make a function **calc_posterior_logpdf** that takes in weights and produces a real value?\n", "* Can you use autograd to make function **calc_gradient_of_posterior_logpdf** that takes in weights and produces the gradient?" 
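, "\n", "A possible starting point, offered as a sketch under assumptions rather than as the intended solution. It assumes Gaussian observation noise with standard deviation $\\sigma$ (the same assumption behind `nn_regression_loss_function`) and, purely for illustration, an independent standard normal prior on each of the $J$ weights and biases, collected here in a vector $w$; the notebook itself does not specify a prior. With $\\hat{y}_n(w)$ denoting the network's prediction for example $n$:\n", "\n", "$$\n", "\\log p(y \\mid x, w) = -\\frac{N}{2} \\log(2 \\pi \\sigma^2) - \\frac{1}{2 \\sigma^2} \\sum_{n=1}^{N} \\left( y_n - \\hat{y}_n(w) \\right)^2\n", "$$\n", "\n", "$$\n", "\\log p(w) = -\\frac{J}{2} \\log(2 \\pi) - \\frac{1}{2} \\sum_{j=1}^{J} w_j^2 \\qquad \\text{(assumed prior)}\n", "$$\n", "\n", "$$\n", "\\log p(w \\mid x, y) = \\log p(w) + \\log p(y \\mid x, w) - \\log p(y \\mid x)\n", "$$\n", "\n", "The regression loss used earlier equals the negative log likelihood up to an additive constant. The last term, $\\log p(y \\mid x)$, does not depend on $w$, so the gradient of the log posterior with respect to $w$ is the gradient of the first two terms, which `autograd.grad` can compute from a function that evaluates them."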
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO:\n", "# Update HW2 to be clear it is a FUNCTION value that we want samples of" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.15" } }, "nbformat": 4, "nbformat_minor": 2 }