{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Intro to neural net training with autograd\n", "\n", "In this notebook, we'll practice\n", "\n", "* using the **autograd** Python package to compute gradients\n", "* using gradient descent to train a basic linear regression (a NN with 0 hidden layers)\n", "* using gradient descent to train a basic neural network for regression (NN with 1+ hidden layers)\n", "\n", "\n", "### Requirements:\n", "\n", "Follow the Python environment setup instructions here:\n", "https://www.cs.tufts.edu/comp/150BDL/2018f/setup_python_env.html\n", "\n", "All the specific Python packages you'll need are in the 'bdl_basic_env' conda environment:\n", "https://www.cs.tufts.edu/comp/150BDL/2018f/conda_envs/bdl_basic_env.yml" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pickle\n", "import copy\n", "import time" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "## Import plotting tools\n", "\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "## Import numpy\n", "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "## Import autograd\n", "import autograd.numpy as ag_np\n", "import autograd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# PART 1: Using autograd's 'grad' function on univariate functions\n", "\n", "Suppose we have a mathematical function of interest $f(x)$. For now, we'll work with functions that have a scalar input and scalar output. 
\n", "\n", "Then we can of course ask: what is the derivative (aka *gradient*) of this function:\n", "\n", "$$\n", "g(x) \\triangleq \\frac{\\partial}{\\partial x} f(x)\n", "$$\n", "\n", "Instead of computing this gradient by hand via calculus/algebra, we can use autograd to do it for us.\n", "\n", "First, we need to implement the math function $f(x)$ as a **Python function** `f`.\n", "\n", "\n", "The Python function `f` needs to satisfy the following requirements:\n", "* INPUT 'x': scalar float\n", "* OUTPUT 'f(x)': scalar float\n", "* All internal operations are composed of calls to functions from `ag_np`, the `autograd` version of numpy\n", "\n", "**Important:**\n", "* You might be used to importing numpy as `import numpy as np`, and then using this shorthand for `np.cos(0.0)` or `np.square(5.0)` etc.\n", "* For autograd to work, you need to instead use **autograd's** provided numpy wrapper interface: `import autograd.numpy as ag_np`\n", "* The `ag_np` module has the same API as `numpy`, so you can call `ag_np.cos(0.0)`, `ag_np.square(5.0)`, etc.\n", "\n", "Now, if `f` meets the above requirements, we can create a Python function `g` to compute derivatives of $f(x)$ by calling `autograd.grad`:\n", "\n", "```\n", "g = autograd.grad(f)\n", "```\n", "\n", "The symbol `g` is now a **Python function** that takes the same input as `f`, but produces the derivative at a given input.\n", "\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def f(x):\n", "    return ag_np.square(x)\n", "\n", "g = autograd.grad(f)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.0" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'g' is just a function. 
You can call it as usual, by providing a possible scalar float input\n", "\n", "g(0.0)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[-2.0, 2.0]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[g(-1.0), g(1.0)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plot to demonstrate the gradient function side-by-side with original function" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0.5,1,u'gradient of f(x)')" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "x_grid_G = np.linspace(-10, 10, 100)\n", "\n", "fig_h, subplot_grid = plt.subplots(nrows=1, ncols=2, sharex=True, sharey=True, squeeze=False)\n", "subplot_grid[0,0].plot(x_grid_G, [f(x_g) for x_g in x_grid_G], 'k.-')\n", "subplot_grid[0,0].set_title('f(x) = x^2')\n", "\n", "subplot_grid[0,1].plot(x_grid_G, [g(x_g) for x_g in x_grid_G], 'b.-')\n", "subplot_grid[0,1].set_title('gradient of f(x)')\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 1a:\n", "\n", "Consider the decaying periodic function below. 
Can you compute its derivative using autograd and plot the result?\n", "\n", "$$\n", "f(x) = e^{-x/10} * cos(x)\n", "$$" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYYAAAEICAYAAABbOlNNAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAFglJREFUeJzt3H2QZXV95/H3RyaSqDwzII+CEY3o1iY6Ykg0YhSEpHCIYgXFcmIgk6zLZrd0N4GyIs9buBExKkkAQYmUUYNPE0VxxLDGlQiDGBENMhJZBhAGQSMawNHv/nHOsPd3uT3d0/dO953h/ao61fec3++c873dv9Ofex66U1VIkrTR4xa7AEnSdDEYJEkNg0GS1DAYJEkNg0GS1DAYJEkNg2GBJHlGkhuS/DDJH/fLjkjy8Tmuf22SZ23ZKje5/19MclqSgxerBm3d+vFzWf96/yQPJNluCup61LE51L5nki/07ef2yw5OsmaO2/9okiMnXfeWZDAsnD8Brq6qHarqnf2y/wmcM8f13wacsUUqm0WSJwOfBV4MfDbJ/kPtv53ki0m+n+S7SS5KssNi1KqtQ1X936p6UlX9dNxtJbk6yYljbGLUsTloJXAvsGNVvalfdibdMTkX5wBnj1HfgjMYFs5TgJs2ziR5HrBTVf3THNdfBbw4yV5boriZJNkR+DTwgap6EXAe8Jkkuw102wk4C9gbeCawL/DnC1mnFlaSJYtdwwQ1x+YM7d+o/q+B+2PwxcCczvar6lpgxyTLxi10wVSV0xaegM8DPwUeBB4Ang68BXjPQJ9fo/tUsl8//x+B7wO/NNBnNbBiwrVdAZw7MP8h4JL+9fbAPwCnDK3zBuBLwBNn2OYrgBsX+/vutNlj4TnADcAPgb/rx8JZfdthwDrgT4HvAu8HdgE+CawH7u9f7zuwvQOB/91vbzXwbuCyvu0AoIAl/fxOwMXAXcAddB80tuvbfg/4It0n9PuBfwWO6tvOHjq23j3De3s53S//7wNXA8/slz/q2Bxa733AT4CH+/aXAq8DPjfQ5xeB+4Dn9PN798fyYQN9LgJOXeyf8ZzHwmIX8FiZ+sF44sD83wH/Y6jP2f1A/QXga8BJQ+3vBN4+w/Zf0A/6maYXzLDek4F7gN8EjgduBXYY872+A/jgYn/PnTbrZ/Z44DbgvwI/RxfuDw8FwwbgrXQfGH4B2A14JfAEYId+TH98YJvXAG/v+/9GHxAzBcPHgQuAJwJ7ANcCf9i3/V7/y/kPgO2A/wTcCaRvb46tEe/t6cCPgMP79/YnwFrg8XNc/30bvw/9/J8D5w/1+QPgm/334krgbUPtbwQ+utg/57lO29Lp4NZmZ7oDZdBpwD/RHRR3AOcPtf8QGHkpqaq+2G9zs1TVd5P8EXAp3cF+TFUN1zVnSQ4HVgDPn+82tCh+FVgCvLO632QfTXLtUJ+f0X3qfaif/3fgIxsbk5xNd4ZJfx/qecBL+/5fSPL3o3acZE/gKGDnqvp34EdJzqO7tn9B3+22qrqo738p8JfAnnRnL7P5XeBTVbW6X/9tdAH4a3ShsLl2Br43uKCqLkpyNPBlusB7+dA6P2Qex+di8R7D4rmf7lPWI6rqJ3SfTp5Nd3ln+D8c7kD36X/SPkn3SezmPmDmJcmvAh8Ajq2qb02qOC2IvYE7hsbc7UN91lfVgxtnkjwhyQVJbkvyb8AXgJ37J432Bu6vqh8NrH/bDPt+Ct0n+bv6Bxi+TxcIewz0eSQAq
urH/csnbcZ7e2TfVfWz/r3tM8f1hz3q2O1dRHfsvmsgPDfaUsfuFmEwLJ6v0Z3iPiLJPsCpwHuBc5NsP7TOM4F/HrWxJC/sH/+baXrhJmo5m+40eK8kr57Pm0nyK3Q3yH+/qq6azza0qO4C9kmSgWX7DfUZ/qDyJuAZwPOrake6y0UA6be3S5InDvTfn9FuBx4Cdq+qnftpx6qa6+PZs/2L6DvpwqcrrnuP+9Gdlc/HqGP3SXSXUC8GTkuy69A6Mx6708hgWDxXAC/aONMP1vfRDawT6A6sMwfatweeS3cT71Gq6h+re/xvpukfR62X5DeA19PdUHsd8K4+oOYsybOBzwD/papGXi7Q1LuG7ibsSUmWJFkOHDLLOjvQXU76fv+L8NSNDVV1G7AGOD3J45O8ADh61Eaq6i66x6HPTbJjksf1fzfzolH9R7gbeOom2j8M/HaSlyT5ObpAe4juAYr5WA08J8nPDyz7C+D6qjoR+BTw10PrvIju6b6tgsGwSKrqK8APkmy8Fv/HdNdM/6w/nX898PqBT/ovp3vW+s5J1dA/ivo3dDe57+gvI10MvHfok+Ns3gQsBS4eOEPZ1ON/mjJV9TDdDecT6C55vJbuEuPwJZFB76C7L3Uv3b2xzwy1v4buXtN9dKHxN5vY1uvoboB/g+5SzeXMcD9thL8Ajk1yf5JH/R1CVd1M937e1dd6NHB0/543W1XdTfeQyHKAPkSPBP6o7/JGuuA4vm9/HvCj6h5b3SpsvKuvRZDkCOANVXXMHPp+GTihqr6+5SuTHhlzf11V713sWqZN/x8ALgUOGXEvcLjvR4CLq+qKBSluAgwGSQD0l25upvtUfTzd5ZCn9pd69Bji46qSNnoG3fX4JwHfpnu6zFB4DPKMQZLU8OazJKmxVV5K2n333euAAw5Y7DK0jbr++uvvraqlC71fx7W2pOuvv/7fgGuqatZ/Ab5VBsMBBxzAmjVz+lfo0mZLMtNf6G5RjmttSUlumUsogJeSJElDDAZJUsNgkCQ1DAZJUsNgkCQ1DAZJUsNgkCQ1DAZJUsNgkCQ1DAZJUsNgkCQ1DAZJUsNgkCQ1DAZJUsNgkCQ1DAZJUsNgkCQ1DAZJUsNgkCQ1DAZJUsNgkCQ1DAZJUsNgkCQ1DAZJUsNgkCQ1JhIMSY5McnOStUlOHtG+fZIP9e1fTnLAUPv+SR5I8t8nUY8kaf7GDoYk2wHnA0cBBwOvTnLwULcTgPur6mnAecBbh9rPAz49bi2SpPFN4ozhEGBtVd1aVQ8DHwSWD/VZDlzav74ceEmSACQ5BrgVuGkCtUiSxjSJYNgHuH1gfl2/bGSfqtoA/ADYLckTgT8FTp9tJ0lWJlmTZM369esnULa0+BzXmkaTCIaMWFZz7HM6cF5VPTDbTqrqwqpaVlXLli5dOo8ypenjuNY0WjKBbawD9huY3xe4c4Y+65IsAXYC7gOeDxyb5H8BOwM/S/JgVb17AnVJkuZhEsFwHXBQkgOBO4DjgNcM9VkFrACuAY4FPl9VBbxwY4ckpwEPGAqStLjGDoaq2pDkJOBKYDvgkqq6KckZwJqqWgVcDLw/yVq6M4Xjxt2vJGnLmMQZA1V1BXDF0LK3DLx+EHjVLNs4bRK1SJLG418+S5IaBoMkqWEwSJIaBoMkqWEwSJIaBoMkqWEwSJIaBoMkqWEwSJIaBoMkqWEwSJIaBoMkqWEwSJIaBoMkqWEwSJIaBoMkqWEwSJIaBoMkqWEwSJIaBoMkqWEwSJIaBoMkqWEwSJIaBoMkqWEwSJIaBoMkqWEwSJIaBoMkqWEwSJIaBoMkqTGRYEhyZJKbk6xNcvKI9u2TfKhv/3KSA/rlhye5PsmN/dffnEQ9kqT5GzsYkmwHnA8cBRwMvDrJwUPdTgDur6qnAecBb+2X3wscXVX/AVgBvH/ceiRJ45nEGcMhwNqqurWqHgY+CCwf6rMcuLR/fTnwkiSpqhuq6s5++U3AzyfZfgI1SZLmaRLBs
A9w+8D8un7ZyD5VtQH4AbDbUJ9XAjdU1UMTqEmSNE+TCIaMWFab0yfJs+guL/3hjDtJViZZk2TN+vXr51WoNG0c15pGkwiGdcB+A/P7AnfO1CfJEmAn4L5+fl/gY8DrqurbM+2kqi6sqmVVtWzp0qUTKFtafI5rTaNJBMN1wEFJDkzyeOA4YNVQn1V0N5cBjgU+X1WVZGfgU8ApVfV/JlCLJGlMYwdDf8/gJOBK4JvAh6vqpiRnJHl53+1iYLcka4E3AhsfaT0JeBrwZ0m+2k97jFuTJGn+lkxiI1V1BXDF0LK3DLx+EHjViPXOAs6aRA2SpMnwL58lSQ2DQZLUMBgkSQ2DQZLUMBgkSQ2DQZLUMBgkSQ2DQZLUMBgkSQ2DQZLUMBgkSQ2DQZLUMBgkSQ2DQZLUMBgkSQ2DQZLUMBgkSQ2DQZLUMBgkSQ2DQZLUMBgkSQ2DQZLUMBgkSQ2DQZLUMBgkSQ2DQZLUMBgkSQ2DQZLUMBgkSQ2DQZLUMBgkSY2JBEOSI5PcnGRtkpNHtG+f5EN9+5eTHDDQdkq//OYkL5tEPZKk+Vsy7gaSbAecDxwOrAOuS7Kqqr4x0O0E4P6qelqS44C3Ar+b5GDgOOBZwN7A55I8vap+Op9aXvva13L55ZezYcMGqmpjfY+0L/Qy9zed+0vC7rvvzumnn87KlSuZdtdcA294A9x4I1TBwNuif1sk7etR7ZNetq3vbxpq2Nx1tt8envc8OOccOPRQ5q+qxpqAQ4ErB+ZPAU4Z6nMlcGj/eglwL5DhvoP9NjU997nPrWHHH398AU5OmzVdcMEFjxpLwJpxj4v5TKPG9Ze+VAVOTps3LVnSjZ35jutJXEraB7h9YH5dv2xkn6raAPwA2G2O6wKQZGWSNUnWrF+//lHtn/70p+dbvx7DPvKRjyzq/mcb11dfvfA1aeu3YcN4Y2cSwZARy2qOfeaybrew6sKqWlZVy5YuXfqo9qOOOmq2OqVHeeUrX7mo+59tXB92GDzOR0S0mZYs6cbOvNefQA3rgP0G5vcF7pyhz7okS4CdgPvmuO6cXHbZZQDeY3B/sy7bmu4xHHoofPGL3mPwHsPc1pnUPYZJBMN1wEFJDgTuoLuZ/JqhPquAFcA1wLHA56uqkqwCPpDk7XQ3nw8Crp1vIZdddtkjASFtKw49FG64YbGr0GPJ2MFQVRuSnER343g74JKquinJGXQ3O1YBFwPvT7KW7kzhuH7dm5J8GPgGsAH4zzXPJ5IkSZMxiTMGquoK4IqhZW8ZeP0g8KoZ1j0bOHsSdUiSxudtLUlSw2CQJDUMBklSw2CQJDUMBklSw2CQJDUMBklSw2CQJDUMBklSw2CQJDUMBklSw2CQJDUMBklSw2CQJDUMBklSw2CQJDUMBklSw2CQJDUMBklSw2CQJDUMBklSw2CQJDUMBklSw2CQJDUMBklSw2CQJDUMBklSw2CQJDUMBklSw2CQJDXGCoYkuyZZneSW/usuM/Rb0fe5JcmKftkTknwqyb8kuSnJOePUIkmajHHPGE4Grqqqg4Cr+vlGkl2BU4HnA4cApw4EyNuq6peAXwF+PclRY9YjSRrTuMGwHLi0f30pcMyIPi8DVlfVfVV1P7AaOLKqflxV/wBQVQ8DXwH2HbMeSdKYxg2GPavqLoD+6x4j+uwD3D4wv65f9ogkOwNH0511SJIW0ZLZOiT5HPDkEU1vnuM+MmJZDWx/CfC3wDur6tZN1LESWAmw//77z3HX0nRzXGsazRoMVfXSmdqS3J1kr6q6K8lewD0juq0DDhuY3xe4emD+QuCWqnrHLHVc2Pdl2bJltam+0tbCca1pNO6lpFXAiv71CuATI/pcCRyRZJf+pvMR/TKSnAXsBPy3MeuQJE3IuMFwDnB4kluAw/t5kixL8h6AqroPOBO4rp/OqKr7kuxLdznqYOArSb6a5MQx65EkjWnWS0mbUlXfA14yYvka4MSB+UuAS4b6rGP0/QdJ0iLyL58lS
Q2DQZLUMBgkSQ2DQZLUMBgkSQ2DQZLUMBgkSQ2DQZLUMBgkSQ2DQZLUMBgkSQ2DQZLUMBgkSQ2DQZLUMBgkSQ2DQZLUMBgkSQ2DQZLUMBgkSQ2DQZLUMBgkSQ2DQZLUMBgkSQ2DQZLUMBgkSQ2DQZLUMBgkSQ2DQZLUMBgkSQ2DQZLUMBgkSY2xgiHJrklWJ7ml/7rLDP1W9H1uSbJiRPuqJF8fpxZJ0mSMe8ZwMnBVVR0EXNXPN5LsCpwKPB84BDh1MECSvAJ4YMw6JEkTMm4wLAcu7V9fChwzos/LgNVVdV9V3Q+sBo4ESPIk4I3AWWPWIUmakHGDYc+qugug/7rHiD77ALcPzK/rlwGcCZwL/Hi2HSVZmWRNkjXr168fr2ppSjiuNY1mDYYkn0vy9RHT8jnuIyOWVZJfBp5WVR+by0aq6sKqWlZVy5YuXTrHXUvTzXGtabRktg5V9dKZ2pLcnWSvqroryV7APSO6rQMOG5jfF7gaOBR4bpLv9HXskeTqqjoMSdKiGfdS0ipg41NGK4BPjOhzJXBEkl36m85HAFdW1V9V1d5VdQDwAuBbhoIkLb5xg+Ec4PAktwCH9/MkWZbkPQBVdR/dvYTr+umMfpkkaQrNeilpU6rqe8BLRixfA5w4MH8JcMkmtvMd4Nnj1CJJmgz/8lmS1DAYJEkNg0GS1DAYJEkNg0GS1DAYJEkNg0GS1DAYJEkNg0GS1DAYJEkNg0GS1DAYJEkNg0GS1DAYJEkNg0GS1DAYJEkNg0GS1DAYJEkNg0GS1DAYJEkNg0GS1DAYJEkNg0GS1DAYJEmNVNVi17DZkqwHbpuheXfg3gUsZybTUgdMTy3TUgdsupanVNXShSwGtppxDdNTy7TUAVtHLQcB11TVkbNtYKsMhk1JsqaqllnH/zcttUxLHTBdtczFNNU7LbVMSx2w7dXipSRJUsNgkCQ1tsVguHCxC+hNSx0wPbVMSx0wXbXMxTTVOy21TEsdsI3Vss3dY5AkjWdbPGOQJI3BYJAkNbaJYEjyqiQ3JflZkmVDbackWZvk5iQvW+C6TktyR5Kv9tNvLfD+j+zf99okJy/kvkfU8p0kN/bfhzULvO9LktyT5OsDy3ZNsjrJLf3XXRayprlybM+4/8f82N6S43qbCAbg68ArgC8MLkxyMHAc8CzgSOAvk2y3wLWdV1W/3E9XLNRO+/d5PnAUcDDw6v77sZhe3H8fFvp57/fR/fwHnQxcVVUHAVf189PIsT3Esf2I97GFxvU2EQxV9c2qunlE03Lgg1X1UFX9K7AWOGRhq1s0hwBrq+rWqnoY+CDd9+Mxp6q+ANw3tHg5cGn/+lLgmAUtao4c2yM5ttmy43qbCIZN2Ae4fWB+Xb9sIZ2U5Gv9ad9CXq6Yhvc+qIDPJrk+ycpFrGOjPavqLoD+6x6LXM/mmoafr2O7M01jeyLjeslES9qCknwOePKIpjdX1SdmWm3Esok+n7upuoC/As7s93kmcC7w+5Pc/6ZKG7FsMZ9N/vWqujPJHsDqJP/Sf+J5zHNsb35pI5Y5tidoqwmGqnrpPFZbB+w3ML8vcOdkKurMta4kFwGfnOS+Z7HF3/vmqKo7+6/3JPkY3eWAxTx47k6yV1XdlWQv4J7FKsSxvdkc2zObyLje1i8lrQKOS7J9kgPp/rvgtQu18/4Hs9Hv0N1IXCjXAQclOTDJ4+luVK5awP0/IskTk+yw8TVwBAv7vRhlFbCif70CmOmT+bRybDu2R5nMuK6qrX6iG5jrgIeAu4ErB9reDHwbuBk4aoHrej9wI/C1/ge21wLv/7eAb/Xv/82L+PN5KvDP/XTTQtcC/C1wF/CTfpycAOxG99TGLf3XXRfr+zNL7Y7t0ft/zI/tLTmu/ZcYkqTGtn4pSZK0mQwGSVLDYJAkNQwGSVLDYJAkNQwGSVLDYJAkNf4fa8CDeTtsR+UAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, 
"metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "def f(x):\n", " return 0.0 # TODO compute the function above\n", " \n", "g = f # TODO define g as gradient of f\n", "\n", "# TODO plot the result\n", "x_grid_G = np.linspace(-10, 10, 500)\n", "fig_h, subplot_grid = plt.subplots(nrows=1, ncols=2, sharex=True, sharey=True, squeeze=False)\n", "subplot_grid[0,0].plot(x_grid_G, [f(x_g) for x_g in x_grid_G], 'k.-');\n", "subplot_grid[0,0].set_title('f(x) = x^2');\n", "\n", "subplot_grid[0,1].plot(x_grid_G, [g(x_g) for x_g in x_grid_G], 'b.-');\n", "subplot_grid[0,1].set_title('gradient of f(x)');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# PART 2: Using autograd's 'grad' function on functions with multivariate input\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, imagine the input $x$ could be a vector of size D. \n", "\n", "Our mathematical function $f(x)$ will map each input vector to a scalar.\n", "\n", "We want the gradient function\n", "\n", "\\begin{align}\n", "g(x) &\\triangleq \\nabla_x f(x)\n", "\\\\\n", "&= [\n", " \\frac{\\partial}{\\partial x_1} f(x)\n", " \\quad \\frac{\\partial}{\\partial x_2} f(x)\n", " \\quad \\ldots \\quad \\frac{\\partial}{\\partial x_D} f(x) ]\n", "\\end{align}\n", "\n", "Instead of computing this gradient by hand via calculus/algebra, we can use autograd to do it for us.\n", "\n", "First, we implement math function $f(x)$ as a **Python function** `f`.\n", "\n", "The Python function `f` needs to satisfy the following requirements:\n", "* INPUT 'x': numpy array of float\n", "* OUTPUT 'f(x)': scalar float\n", "* All internal operations are composed of calls to functions from `ag_np`, the `autograd` version of numpy\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "def f(x_D):\n", " return np.sum(np.square(x_D))\n", "\n", "g = autograd.grad(f)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, 
"outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0. 0. 0. 0.]\n", "0.0\n", "[0. 0. 0. 0.]\n" ] } ], "source": [ "x_D = np.zeros(4)\n", "print(x_D)\n", "print(f(x_D))\n", "print(g(x_D))" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1. 2. 3. 4.]\n", "30.0\n", "[2. 4. 6. 8.]\n" ] } ], "source": [ "x_D = np.asarray([1., 2., 3., 4.])\n", "print(x_D)\n", "print(f(x_D))\n", "print(g(x_D))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 3: Using autograd gradients within gradient descent to solve multivariate optimization problems " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Helper function: basic gradient descent\n", "\n", "Here's a very simple function that will perform many gradient descent steps to optimize a given function.\n", "\n", "Make sure you understand its basic properties (the gradient descent algorithm is one of the prereqs of this course)." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "def run_many_iters_of_gradient_descent(f, g, init_x_D=None, n_iters=100, step_size=0.001):\n", "\n", "    # Copy the initial parameter vector\n", "    x_D = copy.deepcopy(init_x_D)\n", "\n", "    # Create data structs to track the per-iteration history of different quantities\n", "    history = dict(\n", "        iter=[],\n", "        f=[],\n", "        x_D=[],\n", "        g_D=[])\n", "\n", "    for iter_id in range(n_iters):\n", "        # Iteration 0 only records the initial point; later iterations take one gradient step\n", "        if iter_id > 0:\n", "            x_D = x_D - step_size * g(x_D)\n", "\n", "        history['iter'].append(iter_id)\n", "        history['f'].append(f(x_D))\n", "        history['x_D'].append(x_D)\n", "        history['g_D'].append(g(x_D))\n", "    return x_D, history" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Worked Example 3a: Minimize f(x) = sum(square(x))\n", "\n", "It's easy to figure out that the vector with smallest L2 norm (smallest sum of squares) is the all-zero vector.\n", "\n", "Here's a quick example showing how the gradient functions provided by autograd can help us solve the optimization problem:\n", "\n", "$$\n", "\\min_x \\sum_{d=1}^D x_d^2\n", "$$" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "def f(x_D):\n", "    return ag_np.sum(ag_np.square(x_D))\n", "\n", "g = autograd.grad(f)\n", "\n", "# Initialize at x_D = [-3, 4, -5, 6]\n", "init_x_D = np.asarray([-3.0, 4.0, -5.0, 6.0])" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "opt_x_D, history = run_many_iters_of_gradient_descent(f, g, init_x_D, n_iters=1000, step_size=0.01)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAA3sAAADTCAYAAAA4agr9AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAGQRJREFUeJzt3Xu0ZGV55/Hvb7oVIyo3G0WaScPQxrQ63k5AJE5QBNGozYq4RDPaYzAkrmC8xNFGJxKvA2qCunTUXkpk1BEd1Ngi2iKXP5I4yOGi0FykBSMtIAebYIxR7PDMH7Uby3Oq29PnnKp9zq7vZ61aVfvdb+391Hv2Wc96au/9VqoKSZIkSVK3/Ie2A5AkSZIkLTyLPUmSJEnqIIs9SZIkSeogiz1JkiRJ6iCLPUmSJEnqIIs9SZIkSeogiz1JkiRJ6iCLPUmSJEnqIIs9SZIkSeqg5W0HsLse+tCH1qpVq9oOQ5I0ZJdffvmdVbWi7TiWCvOjJI2P2ebIJVfsrVq1isnJybbDkCQNWZJ/ajuGpcT8KEnjY7Y50ss4JUmSJKmDLPYkSZIkqYNaL/aS7J3k3CTXJ7kuyRFtxyRJkiRJS91iuGfvfcBXq+qEJPcHHth2QJIkSZK01LVa7CV5CPBfgP8GUFX3APcMc5+3v/Od/Py664e5C0kae3v89qN4+Bvf2HYYkiSNtbYv4zwEmAL+NsmVST6aZM/pnZKcnGQyyeTU1NToo5QkSZKkJabtyziXA08EXllVlyZ5H7Ae+Mv+TlW1AdgAMDExUfPZod80S5IkSRoHbZ/Z2wpsrapLm+Vz6RV/kiRJkqR5aLXYq6rbgVuS/FbTdDRwbYshSZIkSVIntH0ZJ8ArgU81M3HeBLys5XgkSZIkaclrvdirqquAibbjkCRJkqQuafuePUmSJEnSEFjsSZIkSVIHWexJkiRJUgdZ7EmSJElSB1nsSZIkSVIHWexJkiRJUgdZ7EmSNCRJjktyQ5ItSdYPWL9Hks806y9Nsmra+v+Y5CdJXjeqmCVJ3WGxJ0nSECRZBnwQeBawBnhRkjXTup0E3FVVhwJnAmdMW38m8JVhxypJ6iaLPUmShuMwYEtV3VRV9wDnAGun9VkLnN28Phc4OkkAkhwP3ARsHlG8kqSOsdiTJGk4DgRu6Vve2rQN7FNV24G7gf2S7Am8AXjLrnaQ5OQkk0kmp6amFixwSVI3WOxJkjQcGdBWs+zzFuDMqvrJrnZQVRuqaqKqJlasWDHHMCVJXbW87QAkSeqorcBBfcsrgVt30mdrkuXAXsA24HDghCTvAvYG7k3ys6r6wPDDliR1hcWeJEnDcRmwOsnBwA+AE4EXT+uzEVgHfAM4Abioqgp46o4OSf4K+ImFniRpd1nsSZI0BFW1PckpwCZgGXBWVW1O8lZgsqo2Ah8DPpFkC70zeie2F7EkqWss9iRJGpKqOh84f1rbm/te/wx4wa/Zxl8NJThJUuc5QYskSZIkdZDFniRJkiR1kMWeJEmSJHXQoij2kixLcmWS89qORZIkSZK6YFEUe8CrgOvaDkKSJEmSuqL12TiTrAR+H3gH8Nqh7/Ar6+H2q4e+G0kaaw9/LDzr9LajkCRprC2GM3vvBV4P3LuzDklOTjKZZHJqamp0kUmSJEnSEtXqmb0kzwHuqKrLkxy1s35VtQHYADAxMVHz2qnfNEuSJEkaA22f2TsSeF6S7wHnAE9P8sl2Q5IkSZKkpa/VYq+qTq2qlVW1CjgRuKiq/mubMUmSJElSF7R9Zk+SJEmSNAStz8a5Q1VdAlzSchiSJEmS1Ame2ZMkSZKkDrLYkyRJkqQOstiTJEmSpA6y2JMkSZKkDrLYkyRJkqQOstiTJEmSpA6y2JMkSZKkDrLYkyRJkqQOstiTJEmSpA6y2JMkaUiSHJfkhiRbkqwfsH6PJJ9p1l+aZFXTfkySy5Nc3Tw/fdSxS5KWPos9SZKGIMky4IPAs4A
1wIuSrJnW7STgrqo6FDgTOKNpvxN4blU9FlgHfGI0UUuSusRiT5Kk4TgM2FJVN1XVPcA5wNppfdYCZzevzwWOTpKqurKqbm3aNwMPSLLHSKKWJHWGxZ4kScNxIHBL3/LWpm1gn6raDtwN7Detz/OBK6vq59N3kOTkJJNJJqemphYscElSN1jsSZI0HBnQVrvTJ8mj6V3a+SeDdlBVG6pqoqomVqxYMedAJUndZLEnSdJwbAUO6lteCdy6sz5JlgN7Adua5ZXAF4CXVtV3hx6tJKlzLPYkSRqOy4DVSQ5Ocn/gRGDjtD4b6U3AAnACcFFVVZK9gS8Dp1bVP4wsYklSp1jsSZI0BM09eKcAm4DrgM9W1eYkb03yvKbbx4D9kmwBXgvs+HmGU4BDgb9MclXz2H/EH0GStMQtbzsASZK6qqrOB86f1vbmvtc/A14w4H1vB94+9AAlSZ3W6pm9JAcluTjJdUk2J3lVm/FIkiRJUle0fWZvO/AXVXVFkgcDlye5oKqubTkuSZIkSVrSWj2zV1W3VdUVzet/oXdPw/TfIJIkSZIk7aZFM0FLklXAE4BLB6zzR2MlSZIkaTcsimIvyYOAzwGvrqofT1/vj8ZKkiRJ0u5pvdhLcj96hd6nqurzbccjSZIkSV3Q9mycofcbQ9dV1d+0GYskSZIkdUnbZ/aOBF4CPL3vR2Of3XJMkiRJkrTktfrTC1X190DajEGSJEmSuqjtM3uSJEmSpCGw2JMkSZKkDrLYkyRJkqQOstiTJEmSpA6y2JMkSZKkDrLYkyRJkqQOstiTJEmSpA6y2JMkSZKkDrLYkyRJkqQOWr6rlUn+YFfrq+rzCxuOJEmjl2R/4EjgEcC/AdcAk1V1b6uBSZI0D7ss9oDnNs/7A08BLmqWnwZcAljsSZKWrCRPA9YD+wJXAncADwCOB/5TknOBv66qH7cXpSRJc7PLYq+qXgaQ5DxgTVXd1iwfAHxw+OFJkjRUzwb+uKq+P31FkuXAc4BjgM/NZeNJjgPeBywDPlpVp09bvwfwv4EnAT8CXlhV32vWnQqcBPw78OdVtWkuMUiSxtevO7O3w6odhV7jh8AjhxCPJEkjU1X/fRfrtgN/N9dtJ1lG74vRY4CtwGVJNlbVtX3dTgLuqqpDk5wInAG8MMka4ETg0fQuLf16kkdW1b/PNR5J0viZbbF3SZJNwKeBopeALh5aVJIkjVCSTwCnVNXdzfIq4GNVdfQ8NnsYsKWqbmq2eQ6wFugv9tYCf9W8Phf4QJI07edU1c+Bm5Nsabb3jXnE82u95UubufZWr1iVpGFa84iHcNpzHz2Sfc1qNs6qOgX4MPA44PHAhqp65TADkyRphP4euDTJs5P8MfA14L3z3OaBwC19y1ubtoF9mjOJdwP7zfK9JDk5yWSSyampqXmGK0nqmtme2aOqvgB8YdC6JN+oqiMWLCpJkkaoqj6SZDO9q1buBJ5QVbfPc7MZtKtZ9pnNe6mqDcAGgImJiRnrd9eovmmWJI3GQv3O3gMWaDuSJI1ckpcAZwEvBT4OnJ/kcfPc7FbgoL7llcCtO+vTTAizF7Btlu+VJGmXFqrYm/e3iZIktej5wO9W1aer6lTgT+kVffNxGbA6ycFJ7k/vfveN0/psBNY1r08ALqqqatpPTLJHkoOB1cA35xmPJGnMLFSxN2dJjktyQ5ItSda3HY8kafxU1fFVdUff8jeBw+e5ze3AKcAm4Drgs1W1Oclbkzyv6fYxYL9mApbX0vvNP6pqM/BZepO5fBX4M2filCTtrlnds5dkzbSpoklyVFVdsmNxLjuf5bTUkiQNRZL/Afyvqto2fV1V3ZPk6cADq+q8uWy/qs4Hzp/W9ua+1z8DXrCT974DeMdc9itJEsx+gpbPNtNSv4ve/XnvAiaAHZOyvGSO+5/NtNSSJA3L1cCXkvwMuAKYopfnVtObffrrwDvbC0+SpLmb7WWch9O7Ufwf6d2DcCtw5I6VVXXNHPfv1NK
SpDadUFVH0rvUcjOwDPgx8EngsKp6TVWZeCRJS9Jsz+z9Avg34DfofeN5c1XduwD7b2VqaUmSGk9K8pvAHwJPm7buN+jlPkmSlqTZFnuXAV8Efofej71+JMkJVXXCPPfv1NKSpDZ9mN4EKIcAk33toffl4yFtBCVJ0kKYbbF3UlXtSIK3A2ub3ySar/umpQZ+QG9a6hcvwHYlSfq1qur9wPuTfKiqXtF2PJIkLaRZFXt9hV5/2yfmu/Oq2p5kx7TUy4CzmummJUkaGQs9SVIXzfbM3tAMmpZakiRJkjQ/rf+ouiRJkiRp4VnsSZIkSVIHWexJkiRJUgdZ7EmSJElSB1nsSZIkSVIHWexJkiRJUgdZ7EmSJElSB1nsSZIkSVIHWexJkiRJUgdZ7EmSJElSB1nsSZIkSVIHWexJkiRJUgdZ7EmSJElSB1nsSZIkSVIHWexJkrTAkuyb5IIkNzbP++yk37qmz41J1jVtD0zy5STXJ9mc5PTRRi9J6gqLPUmSFt564MKqWg1c2Cz/iiT7AqcBhwOHAaf1FYXvqapHAU8AjkzyrNGELUnqEos9SZIW3lrg7Ob12cDxA/o8E7igqrZV1V3ABcBxVfXTqroYoKruAa4AVo4gZklSx1jsSZK08B5WVbcBNM/7D+hzIHBL3/LWpu0+SfYGnkvv7OAMSU5OMplkcmpqakEClyR1x/K2dpzk3fQS2D3Ad4GXVdU/txWPJEm7I8nXgYcPWPWm2W5iQFv1bX858Gng/VV106ANVNUGYAPAxMREDeojSRpfbZ7ZuwB4TFX9Z+A7wKktxiJJ0m6pqmdU1WMGPL4I/DDJAQDN8x0DNrEVOKhveSVwa9/yBuDGqnrvsD6DJKnbWiv2quprVbW9Wfx/eD+CJKk7NgLrmtfrgC8O6LMJODbJPs3ELMc2bSR5O7AX8OoRxCpJ6qjFcs/eHwFf2dlK70mQJC0xpwPHJLkROKZZJslEko8CVNU24G3AZc3jrVW1LclKepeCrgGuSHJVkpe38SEkSUvbUO/Z29X9DM1lLiR5E7Ad+NTOtuM9CZKkpaSqfgQcPaB9Enh53/JZwFnT+mxl8P18kiTtlqEWe1X1jF2tb35A9jnA0VVlESdJkiRJC6TN2TiPA94A/F5V/bStOCRJkiSpi9q8Z+8DwIOBC5r7ET7cYiySJEmS1CmtndmrqkPb2rckSZIkdd1imY1TkiRJkrSALPYkSZIkqYMs9iRJkiSpgyz2JEmSJKmDLPYkSZIkqYMs9iRJkiSpg1r76YW2vOVLm7n21h+3HYYkddqaRzyE05776LbDkCRprHlmT5IkSZI6aOzO7PlNsyRJkqRx4Jk9SZIkSeogiz1JkiRJ6iCLPUmSJEnqIIs9SZIkSeogiz1JkiRJ6iCLPUmSJEnqIIs9SZIkSeogiz1JkiRJ6iCLPUmSFliSfZNckOTG5nmfnfRb1/S5Mcm6Aes3Jrlm+BFLkrqo9WIvyeuSVJKHth2LJEkLZD1wYVWtBi5sln9Fkn2B04DDgcOA0/qLwiR/APxkNOFKkrqo1WIvyUHAMcD324xDkqQFthY4u3l9NnD8gD7PBC6oqm1VdRdwAXAcQJIHAa8F3j6CWCVJHdX2mb0zgdcD1XIckiQtpIdV1W0AzfP+A/ocCNzSt7y1aQN4G/DXwE93tZMkJyeZTDI5NTU1/6glSZ3SWrGX5HnAD6rqW7PoazKTJC0qSb6e5JoBj7Wz3cSAtkryeODQqvrCr9tAVW2oqomqmlixYsVuxS9J6r7lw9x4kq8DDx+w6k3AG4FjZ7OdqtoAbACYmJjwLKAkqXVV9YydrUvywyQHVNVtSQ4A7hjQbStwVN/ySuAS4AjgSUm+Ry9P75/kkqo6CkmSdsNQi72dJcIkjwUOBr6VBHoJ7ookh1XV7cOMSZKkEdgIrANOb56/OKDPJuCdfZOyHAucWlXbgA8BJFkFnGehJ0mai6EWeztTVVfTd/9C8+3lRFXd2UY8kiQ
tsNOBzyY5id4kZC8ASDIB/GlVvbyqtiV5G3BZ8563NoWeJEkLopViT5KkLquqHwFHD2ifBF7et3wWcNYutvM94DFDCFGSNAYWRbFXVavajkGSJEmSuqTtn16QJEmSJA2BxZ4kSZIkdZDFniRJkiR1kMWeJEmSJHWQxZ4kSZIkdZDFniRJkiR10KL46YVROuObZ3D9tuvbDkOSOu1R+z6KNxz2hrbDkCRprHlmT5IkSZI6aOzO7PlNsyRJkqRx4Jk9SZIkSeogiz1JkiRJ6iCLPUmSJEnqIIs9SZIkSeogiz1JkiRJ6iCLPUmSJEnqoFRV2zHsliRTwD/NczMPBe5cgHC6xDEZzHGZyTGZyTGZaSHG5DerasVCBDMOFig/gsfzII7JTI7JTI7JYI7LTCPLkUuu2FsISSaraqLtOBYTx2Qwx2Umx2Qmx2Qmx2Tp8m83k2Myk2Myk2MymOMy0yjHxMs4JUmSJKmDLPYkSZIkqYPGtdjb0HYAi5BjMpjjMpNjMpNjMpNjsnT5t5vJMZnJMZnJMRnMcZlpZGMylvfsSZIkSVLXjeuZPUmSJEnqNIs9SZIkSeqgsSv2khyX5IYkW5KsbzueUUlyUJKLk1yXZHOSVzXt+ya5IMmNzfM+TXuSvL8Zp28neWK7n2B4kixLcmWS85rlg5Nc2ozJZ5Lcv2nfo1ne0qxf1Wbcw5Jk7yTnJrm+OV6OGPfjJMlrmv+ba5J8OskDxvE4SXJWkjuSXNPXttvHRpJ1Tf8bk6xr47NoJvOj+XE68+NM5siZzJGLOz+OVbGXZBnwQeBZwBrgRUnWtBvVyGwH/qKqfht4MvBnzWdfD1xYVauBC5tl6I3R6uZxMvCh0Yc8Mq8CrutbPgM4sxmTu4CTmvaTgLuq6lDgzKZfF70P+GpVPQp4HL2xGdvjJMmBwJ8DE1X1GGAZcCLjeZx8HDhuWttuHRtJ9gVOAw4HDgNO25EA1R7zo/lxJ8yPM5kj+5gj7/NxFmt+rKqxeQBHAJv6lk8FTm07rpbG4ovAMcANwAFN2wHADc3rjwAv6ut/X78uPYCVzT/g04HzgAB3AsunHzPAJuCI5vXypl/a/gwLPB4PAW6e/rnG+TgBDgRuAfZt/u7nAc8c1+MEWAVcM9djA3gR8JG+9l/p56O1v6v58Zef3fxY5sedjIk5cuaYmCN/ORaLMj+O1Zk9fnlA7rC1aRsrzSnzJwCXAg+rqtsAmuf9m27jMlbvBV4P3Nss7wf8c1Vtb5b7P/d9Y9Ksv7vp3yWHAFPA3zaX7nw0yZ6M8XFSVT8A3gN8H7iN3t/9csb7OOm3u8dG54+ZJcq/C+bHacyPM5kjpzFH7tKiyI/jVuxlQNtY/fZEkgcBnwNeXVU/3lXXAW2dGqskzwHuqKrL+5sHdK1ZrOuK5cATgQ9V1ROAf+WXlx0M0vkxaS6hWAscDDwC2JPeJRjTjdNxMhs7GwfHZ3Ea+7+L+fGXzI87ZY6cxhw5JyPNj+NW7G0FDupbXgnc2lIsI5fkfvQS2aeq6vNN8w+THNCsPwC4o2kfh7E6Enheku8B59C7VOW9wN5Jljd9+j/3fWPSrN8L2DbKgEdgK7C1qi5tls+ll9jG+Th5BnBzVU1V1S+AzwNPYbyPk367e2yMwzGzFI3138X8OIP5cTBz5EzmyJ1bFPlx3Iq9y4DVzQxB96d3A+nGlmMaiSQBPgZcV1V/07dqI7Bjtp919O5V2NH+0mbGoCcDd+84Fd0VVXVqVa2sqlX0joWLquoPgYuBE5pu08dkx1id0PTv1LdRVXU7cEuS32qajgauZYyPE3qXpjw5yQOb/6MdYzK2x8k0u3tsbAKOTbJP843wsU2b2mV+ND/ex/w4mDlyIHPkzi2O/Nj2zYyjfgDPBr4DfBd4U9vxjPBz/y69U8HfBq5qHs+md530hcCNzfO+Tf/Qm5ntu8DV9GZZav1zDHF8jgLOa14fAnwT2AL8X2CPpv0BzfKWZv0hbcc9pLF4PDD
ZHCt/B+wz7scJ8BbgeuAa4BPAHuN4nACfpndPxi/ofQN50lyODeCPmvHZArys7c/l476/i/nR/DhofMyPvzoe5siZYzL2OXIx58c0G5YkSZIkdci4XcYpSZIkSWPBYk+SJEmSOshiT5IkSZI6yGJPkiRJkjrIYk+SJEmSOshiT2pRkn9snlcleXHb8UiStFiYI6X5s9iTWlRVT2lergJ2K5ElWbbgAUmStEiYI6X5s9iTWpTkJ83L04GnJrkqyWuSLEvy7iSXJfl2kj9p+h+V5OIk/we4OsmeSb6c5FtJrknywtY+jCRJC8gcKc3f8rYDkATAeuB1VfUcgCQnA3dX1e8k2QP4hyRfa/oeBjymqm5O8nzg1qr6/eZ9e7URvCRJQ2SOlObIM3vS4nQs8NIkVwGXAvsBq5t136yqm5vXVwPPSHJGkqdW1d0txCpJ0iiZI6VZstiTFqcAr6yqxzePg6tqx7eW/7qjU1V9B3gSvYT2P5O8uYVYJUkaJXOkNEsWe9Li8C/Ag/uWNwGvSHI/gCSPTLLn9DcleQTw06r6JPAe4ImjCFaSpBEyR0pz5D170uLwbWB7km8BHwfeR2/2sSuSBJgCjh/wvscC705yL/AL4BUjiVaSpNExR0pzlKpqOwZJkiRJ0gLzMk5JkiRJ6iCLPUmSJEnqIIs9SZIkSeogiz1JkiRJ6iCLPUmSJEnqIIs9SZIkSeogiz1JkiRJ6qD/DwxARpf6nCQPAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Make plots of how x parameter values evolve over iterations, and function values evolve over iterations\n", "# Expected result: f goes to zero. 
all x values go to zero.\n", "\n", "fig_h, subplot_grid = plt.subplots(\n", "    nrows=1, ncols=2, sharex=True, sharey=False, figsize=(15,3), squeeze=False)\n", "subplot_grid[0,0].plot(history['iter'], history['x_D'])\n", "subplot_grid[0,0].set_xlabel('iters')\n", "subplot_grid[0,0].set_ylabel('x_d')\n", "\n", "subplot_grid[0,1].plot(history['iter'], history['f'])\n", "subplot_grid[0,1].set_xlabel('iters')\n", "subplot_grid[0,1].set_ylabel('f(x)');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Try it Example 3b: Minimize the 'trid' function\n", "\n", "Given a 2-dimensional vector $x = [x_1, x_2]$, the trid function is:\n", "\n", "$$\n", "f(x) = (x_1-1)^2 + (x_2-1)^2 - x_1 x_2\n", "$$\n", "\n", "Background and Picture: \n", "\n", "Can you use autograd + gradient descent to find the optimal value $x^*$ that minimizes $f(x)$?\n", "\n", "You can initialize your gradient descent at [+1.0, -1.0]." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "def f(x_D):\n", "    return 0.0 # TODO\n", "\n", "g = f # TODO" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "# TODO call run_many_iters_of_gradient_descent() with appropriate args" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# TRID example\n", "# Make plots of how x parameter values evolve over iterations, and function values evolve over iterations\n", "# Expected result: ????\n", "\n", "fig_h, subplot_grid = plt.subplots(\n", "    nrows=1, ncols=2, sharex=True, sharey=False, figsize=(15,3), squeeze=False)\n", "subplot_grid[0,0].plot(history['iter'], history['x_D'])\n", "subplot_grid[0,0].set_xlabel('iters')\n", "subplot_grid[0,0].set_ylabel('x_d')\n", "\n", "subplot_grid[0,1].plot(history['iter'], history['f'])\n", 
"subplot_grid[0,1].set_xlabel('iters')\n", "subplot_grid[0,1].set_ylabel('f(x)');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 4: Solving linear regression with gradient descent + autograd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We observe $N$ examples $(x_i, y_i)$ consisting of D-dimensional 'input' vectors $x_i$ and scalar outputs $y_i$.\n", "\n", "Consider the multivariate linear regression model:\n", "\n", "\\begin{align}\n", "y_i &\\sim \\mathcal{N}(w^T x_i, \\sigma^2), \\forall i \\in 1, 2, \\ldots N\n", "\\end{align}\n", "where we assume $\\sigma = 0.1$.\n", "\n", "One non-Bayesian way to train weights would be to just compute the maximum likelihood solution:\n", "\n", "\\begin{align}\n", "\\min_w - \\log p(y | w, x)\n", "\\end{align}\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Toy Data for linear regression task\n", "\n", "We'll generate data that comes from an idealized linear regression model.\n", "\n", "Each example has D=2 dimensions for x.\n", "\n", "The first dimension is weighted by +4.2.\n", "The second dimension is weighted by -4.2\n" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "N = 300\n", "D = 2\n", "sigma = 0.1\n", "\n", "true_w_D = np.asarray([4.2, -4.2])\n", "true_bias = 0.1\n", "\n", "train_prng = np.random.RandomState(0)\n", "x_ND = train_prng.uniform(low=-5, high=5, size=(N,D))\n", "y_N = np.dot(x_ND, true_w_D) + true_bias + sigma * train_prng.randn(N)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Toy Data Visualization: Pairplots for all possible (x_d, y) combinations\n", "\n", "You can clearly see the slopes of the lines:\n", "* x1 vs y plot: slope is around +4\n", "* x2 vs y plot: slope is around -4" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": 
"display_data" } ], "source": [ "sns.pairplot(\n", " data=pd.DataFrame(np.hstack([x_ND, y_N[:,np.newaxis]]), columns=['x1', 'x2', 'y']));" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "# Define the optimization problem as an AUTOGRAD-able function wrt the weights w_D\n", "def calc_neg_likelihood_linreg(w_D):\n", " return 0.5 / ag_np.square(sigma) * ag_np.sum(ag_np.square(ag_np.dot(x_ND, w_D) - y_N))" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4649826.583041586" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Test the function at an easy initial point\n", "init_w_D = np.zeros(2)\n", "calc_neg_likelihood_linreg(init_w_D)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([-1053476.33939486, 1159525.45599207])" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Test the gradient at that easy point \n", "calc_grad_wrt_w = autograd.grad(calc_neg_likelihood_linreg)\n", "calc_grad_wrt_w(init_w_D)" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "# Because the gradient's magnitude is very large, use very small step size\n", "opt_w_D, history = run_many_iters_of_gradient_descent(\n", " calc_neg_likelihood_linreg, autograd.grad(calc_neg_likelihood_linreg), init_w_D,\n", " n_iters=300, step_size=0.000001,\n", " )" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# LinReg worked example\n", "# Make plots of how w_D parameter values evolve over iterations, and function values evolve over iterations\n", "# Expected result: x\n", "\n", "fig_h, subplot_grid = plt.subplots(\n", " nrows=1, 
ncols=2, sharex=True, sharey=False, figsize=(15,3), squeeze=False)\n", "subplot_grid[0,0].plot(history['iter'], history['x_D'])\n", "subplot_grid[0,0].set_xlabel('iters')\n", "subplot_grid[0,0].set_ylabel('w_d')\n", "\n", "subplot_grid[0,1].plot(history['iter'], history['f'])\n", "subplot_grid[0,1].set_xlabel('iters')\n", "subplot_grid[0,1].set_ylabel('-1 * log p(y | w, x)');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Try it Example 4b: Solve the linear regression problem using a weights-and-bias representation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above example only uses weights on the dimensions of $x_i$, and thus can only learn linear models that pass through the origin.\n", "\n", "Can you instead optimize a model that includes a **bias** term $b>0$?\n", "\n", "\\begin{align}\n", "y_i &\\sim \\mathcal{N}(w^T x_i + b, \\sigma^2), \\forall i \\in 1, 2, \\ldots N\n", "\\end{align}\n", "where we assume $\\sigma = 0.1$.\n", "\n", "One non-Bayesian way to train weights would be to just compute the maximum likelihood solution:\n", "\n", "\\begin{align}\n", "\\min_{w,b} - \\log p(y | w, b, x)\n", "\\end{align}\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An easy way to do this is to imagine that each observation vector $x_i$ is expanded into a $\\tilde{x}_i$ that contains a column of all ones. 
Then, we can write the corresponding expanded weights as $\\tilde{w} = [w_1 w_2 b]$.\n", "\n", "\n", "\\begin{align}\n", "\\min_{\\tilde{w}} - \\log p(y | \\tilde{w},\\tilde{x})\n", "\\end{align}\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Now, each expanded xtilde vector has size E = D+1 = 3\n", "\n", "xtilde_NE = np.hstack([x_ND, np.ones((N,1))])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO: Define f to minimize that takes a COMBINED weights-and-bias vector wtilde_E of size E=3" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO: Compute gradient of f" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO run gradient descent and plot the results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 5 setup: Autograd for functions of data structures of arrays" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Useful Fact: autograd can take derivatives with respect to DATA STRUCTURES of parameters\n", "\n", "This can help us when it is natural to define models in terms of several parts (e.g. NN layers).\n", "\n", "We don't need to turn our many model parameters into one giant weights-and-biases vector. We can express our thoughts more naturally." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Demo 1: gradient of a LIST of parameters" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "def f(w_list_of_arr):\n", " return ag_np.sum(ag_np.square(w_list_of_arr[0])) + ag_np.sum(ag_np.square(w_list_of_arr[1]))\n", "\n", "g = autograd.grad(f)" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Type of the gradient is: \n", "\n", "Result of the gradient is: \n" ] }, { "data": { "text/plain": [ "[array([0., 0., 0.]), array([0., 2., 4., 6., 8.])]" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "w_list_of_arr = [np.zeros(3), np.arange(5, dtype=np.float64)]\n", "\n", "print(\"Type of the gradient is: \")\n", "print(type(g(w_list_of_arr)))\n", "\n", "print(\"Result of the gradient is: \")\n", "g(w_list_of_arr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Demo 2: gradient of DICT of parameters\n" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "def f(dict_of_arr):\n", " return ag_np.sum(ag_np.square(dict_of_arr['weights'])) + ag_np.sum(ag_np.square(dict_of_arr['bias']))\n", "g = autograd.grad(f)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Type of the gradient is: \n", "\n", "Result of the gradient is: \n" ] }, { "data": { "text/plain": [ "{'bias': array(8.4), 'weights': array([0., 2., 4., 6., 8.])}" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dict_of_arr = dict(weights=np.arange(5, dtype=np.float64), bias=4.2)\n", "\n", "print(\"Type of the gradient is: \")\n", "print(type(g(dict_of_arr)))\n", "\n", "print(\"Result of the gradient is: \")\n", "g(dict_of_arr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 
5: Neural Networks and Autograd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Let's use a convenient data structure for NN model parameters\n", "\n", "Use a list of dicts of arrays.\n", "\n", "Each entry in the list is a dict that represents the parameters of one \"layer\".\n", "\n", "Each layer-specific dict has two named entries: a matrix of weights 'w' (shape n_in x n_out) and a vector of biases 'b' (shape n_out)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Here's a function to create NN params as a 'list-of-dicts' that match a provided set of dimensions" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "def make_nn_params_as_list_of_dicts(\n", "        n_hiddens_per_layer_list=[5],\n", "        n_dims_input=1,\n", "        n_dims_output=1,\n", "        weight_fill_func=np.zeros,\n", "        bias_fill_func=np.zeros):\n", "    nn_param_list = []\n", "    n_hiddens_per_layer_list = [n_dims_input] + n_hiddens_per_layer_list + [n_dims_output]\n", "\n", "    # Given full network size list is [a, b, c, d, e]\n", "    # For loop should loop over (a,b) , (b,c) , (c,d) , (d,e)\n", "    for n_in, n_out in zip(n_hiddens_per_layer_list[:-1], n_hiddens_per_layer_list[1:]):\n", "        nn_param_list.append(\n", "            dict(\n", "                w=weight_fill_func((n_in, n_out)),\n", "                b=bias_fill_func((n_out,)),\n", "            ))\n", "    return nn_param_list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Here's a function to pretty-print any given set of NN parameters to stdout, so we can inspect them" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [], "source": [ "def pretty_print_nn_param_list(nn_param_list_of_dict):\n", "    \"\"\" Create pretty display of the parameters at each layer\n", "    \"\"\"\n", "    for ll, layer_dict in enumerate(nn_param_list_of_dict):\n", "        print(\"Layer %d\" % ll)\n", "        print(\" w | size %9s | %s\" % (layer_dict['w'].shape, layer_dict['w'].flatten()))\n", "        print(\" b | size %9s | %s\" % (layer_dict['b'].shape, 
layer_dict['b'].flatten()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example: NN with 0 hidden layers (equivalent to linear regression)\n", "\n", "For univariate regression: 1D -> 1D\n", "\n", "Will fill all parameters with zeros by default" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Layer 0\n", " w | size (1, 1) | [0.]\n", " b | size (1,) | [0.]\n" ] } ], "source": [ "nn_params = make_nn_params_as_list_of_dicts(n_hiddens_per_layer_list=[], n_dims_input=1, n_dims_output=1)\n", "pretty_print_nn_param_list(nn_params)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example: NN with 0 hidden layers (equivalent to linear regression)\n", "\n", "For multivariate regression when |x_i| = 2: 2D -> 1D\n", "\n", "Will fill all parameters with zeros by default" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Layer 0\n", " w | size (2, 1) | [0. 0.]\n", " b | size (1,) | [0.]\n" ] } ], "source": [ "nn_params = make_nn_params_as_list_of_dicts(n_hiddens_per_layer_list=[], n_dims_input=2, n_dims_output=1)\n", "pretty_print_nn_param_list(nn_params)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example: NN with 1 hidden layer of 3 hidden units" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Layer 0\n", " w | size (2, 3) | [0. 0. 0. 0. 0. 0.]\n", " b | size (3,) | [0. 0. 0.]\n", "Layer 1\n", " w | size (3, 1) | [0. 0. 
0.]\n", " b | size (1,) | [0.]\n" ] } ], "source": [ "nn_params = make_nn_params_as_list_of_dicts(n_hiddens_per_layer_list=[3], n_dims_input=2, n_dims_output=1)\n", "pretty_print_nn_param_list(nn_params)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example: NN with 1 hidden layer of 3 hidden units\n", "\n", "Use 'ones' as the fill function for weights" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Layer 0\n", " w | size (2, 3) | [1. 1. 1. 1. 1. 1.]\n", " b | size (3,) | [0. 0. 0.]\n", "Layer 1\n", " w | size (3, 1) | [1. 1. 1.]\n", " b | size (1,) | [0.]\n" ] } ], "source": [ "nn_params = make_nn_params_as_list_of_dicts(\n", " n_hiddens_per_layer_list=[3], n_dims_input=2, n_dims_output=1,\n", " weight_fill_func=np.ones)\n", "pretty_print_nn_param_list(nn_params)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example: NN with 1 hidden layer of 3 hidden units\n", "\n", "Use random draws from standard normal as the fill function for weights" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Layer 0\n", " w | size (2, 3) | [-1.89696697 0.10497134 -0.11371797 1.09290398 0.28420754 1.60185133]\n", " b | size (3,) | [0. 0. 0.]\n", "Layer 1\n", " w | size (3, 1) | [-1.35481542 1.56242066 0.28087651]\n", " b | size (1,) | [0.]\n" ] } ], "source": [ "nn_params = make_nn_params_as_list_of_dicts(\n", " n_hiddens_per_layer_list=[3], n_dims_input=2, n_dims_output=1,\n", " weight_fill_func=lambda size_tuple: np.random.randn(*size_tuple))\n", "pretty_print_nn_param_list(nn_params)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example: NN with 7 hidden layers of diff sizes\n", "\n", "Just shows how generic this framework is!" 
] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Layer 0\n", " w | size (2, 3) | [0. 0. 0. 0. 0. 0.]\n", " b | size (3,) | [0. 0. 0.]\n", "Layer 1\n", " w | size (3, 4) | [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " b | size (4,) | [0. 0. 0. 0.]\n", "Layer 2\n", " w | size (4, 5) | [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " b | size (5,) | [0. 0. 0. 0. 0.]\n", "Layer 3\n", " w | size (5, 6) | [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", " 0. 0. 0. 0. 0. 0.]\n", " b | size (6,) | [0. 0. 0. 0. 0. 0.]\n", "Layer 4\n", " w | size (6, 5) | [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", " 0. 0. 0. 0. 0. 0.]\n", " b | size (5,) | [0. 0. 0. 0. 0.]\n", "Layer 5\n", " w | size (5, 4) | [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " b | size (4,) | [0. 0. 0. 0.]\n", "Layer 6\n", " w | size (4, 3) | [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " b | size (3,) | [0. 0. 0.]\n", "Layer 7\n", " w | size (3, 1) | [0. 0. 
0.]\n", " b | size (1,) | [0.]\n" ] } ], "source": [ "nn_params = make_nn_params_as_list_of_dicts(\n", "    n_hiddens_per_layer_list=[3, 4, 5, 6, 5, 4, 3], n_dims_input=2, n_dims_output=1)\n", "pretty_print_nn_param_list(nn_params)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup: Function that performs **prediction**" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [], "source": [ "def predict_y_given_x_with_NN(x=None, nn_param_list=None, activation_func=ag_np.tanh):\n", "    \"\"\" Predict y value given x value via feed-forward neural net\n", "\n", "    Args\n", "    ----\n", "    x : array_like, n_examples x n_input_dims\n", "\n", "    Returns\n", "    -------\n", "    y : array_like, n_examples\n", "    \"\"\"\n", "    for layer_id, layer_dict in enumerate(nn_param_list):\n", "        if layer_id == 0:\n", "            # First layer: make sure the input is 2D (n_examples x n_input_dims)\n", "            if x.ndim > 1:\n", "                in_arr = x\n", "            else:\n", "                if x.size == nn_param_list[0]['w'].shape[0]:\n", "                    in_arr = x[ag_np.newaxis,:]\n", "                else:\n", "                    in_arr = x[:,ag_np.newaxis]\n", "        else:\n", "            # Later layers: apply the activation to the previous layer's output\n", "            in_arr = activation_func(out_arr)\n", "        out_arr = ag_np.dot(in_arr, layer_dict['w']) + layer_dict['b']\n", "    return ag_np.squeeze(out_arr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example: Make predictions with 0-layer NN whose parameters are filled with the 'true' params for our toy dataset" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [], "source": [ "true_nn_params = make_nn_params_as_list_of_dicts(n_hiddens_per_layer_list=[], n_dims_input=2, n_dims_output=1)\n", "true_nn_params[0]['w'][:] = true_w_D[:,np.newaxis]\n", "true_nn_params[0]['b'][:] = true_bias" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0,0.5,u'predicted y|x')" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": 
"display_data" } ], "source": [ "yhat_N = predict_y_given_x_with_NN(x_ND, true_nn_params)\n", "assert yhat_N.size == N\n", "\n", "plt.plot(yhat_N, y_N, 'k.')\n", "plt.xlabel('true y')\n", "plt.ylabel('predicted y|x')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example: Make predictions with 0-layer NN whose parameters are filled with all zeros" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0,0.5,u'predicted y|x')" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "zero_nn_params = make_nn_params_as_list_of_dicts(n_hiddens_per_layer_list=[], n_dims_input=2, n_dims_output=1)\n", "yhat_N = predict_y_given_x_with_NN(x_ND, zero_nn_params)\n", "assert yhat_N.size == N\n", "\n", "plt.plot(yhat_N, y_N, 'k.')\n", "plt.xlabel('true y')\n", "plt.ylabel('predicted y|x')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup: Gradient descent implementation that can use list-of-dict parameters (not just arrays)" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [], "source": [ "def run_many_iters_of_gradient_descent_with_list_of_dict(f, g, init_x_list_of_dict=None, n_iters=100, step_size=0.001):\n", "\n", " # Copy the initial parameter vector\n", " x_list_of_dict = copy.deepcopy(init_x_list_of_dict)\n", "\n", " # Create data structs to track the per-iteration history of different quantities\n", " history = dict(\n", " iter=[],\n", " f=[],\n", " x=[],\n", " g=[])\n", " start_time = time.time()\n", " for iter_id in range(n_iters):\n", " if iter_id > 0:\n", " # Gradient is a list of layer-specific dicts\n", " grad_list_of_dict = g(x_list_of_dict)\n", " for layer_id, x_layer_dict in enumerate(x_list_of_dict):\n", " for key in x_layer_dict.keys():\n", " x_layer_dict[key] = 
x_layer_dict[key] - step_size * grad_list_of_dict[layer_id][key]\n", "\n", "        fval = f(x_list_of_dict)\n", "        history['iter'].append(iter_id)\n", "        history['f'].append(fval)\n", "        history['x'].append(copy.deepcopy(x_list_of_dict))\n", "        history['g'].append(g(x_list_of_dict))\n", "\n", "        if iter_id < 3 or (iter_id+1) % 50 == 0:\n", "            print(\"completed iter %5d/%d after %7.1f sec | loss %.6e\" % (\n", "                iter_id+1, n_iters, time.time()-start_time, fval))\n", "    return x_list_of_dict, history" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Worked Exercise 5a: Train 0-layer NN via gradient descent on LINEAR toy data" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [], "source": [ "def nn_regression_loss_function(nn_params):\n", "    yhat_N = predict_y_given_x_with_NN(x_ND, nn_params)\n", "    # Use ag_np (not np) for every operation so autograd can trace the computation\n", "    return 0.5 / ag_np.square(sigma) * ag_np.sum(ag_np.square(y_N - yhat_N))" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "completed iter 1/100 after 0.1 sec | loss 1.632803e+02\n", "completed iter 2/100 after 0.4 sec | loss 1.626805e+02\n", "completed iter 3/100 after 0.5 sec | loss 1.622965e+02\n", "completed iter 50/100 after 9.2 sec | loss 1.598157e+02\n", "completed iter 100/100 after 20.8 sec | loss 1.596883e+02\n" ] } ], "source": [ "fromtrue_opt_nn_params, fromtrue_history = run_many_iters_of_gradient_descent_with_list_of_dict(\n", "    nn_regression_loss_function,\n", "    autograd.grad(nn_regression_loss_function),\n", "    true_nn_params,\n", "    n_iters=100,\n", "    step_size=0.000001)" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Layer 0\n", " w | size (2, 1) | [ 4.20137427 -4.2023327 ]\n", " b | size (1,) | [0.08737985]\n" ] } ], "source": [ "pretty_print_nn_param_list(fromtrue_opt_nn_params)" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ 
{ "data": { "text/plain": [ "[]" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(fromtrue_history['iter'], fromtrue_history['f'], 'k.-')" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "completed iter 1/100 after 0.1 sec | loss 4.649827e+06\n", "completed iter 2/100 after 0.3 sec | loss 2.520065e+06\n", "completed iter 3/100 after 0.6 sec | loss 1.367394e+06\n", "completed iter 50/100 after 9.4 sec | loss 1.634750e+02\n", "completed iter 100/100 after 20.2 sec | loss 1.598624e+02\n" ] } ], "source": [ "fromzero_opt_nn_params, fromzero_history = run_many_iters_of_gradient_descent_with_list_of_dict(\n", " nn_regression_loss_function,\n", " autograd.grad(nn_regression_loss_function),\n", " zero_nn_params,\n", " n_iters=100,\n", " step_size=0.000001)" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Layer 0\n", " w | size (2, 1) | [ 4.20137256 -4.2023501 ]\n", " b | size (1,) | [0.08325989]\n" ] } ], "source": [ "pretty_print_nn_param_list(fromzero_opt_nn_params)" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(fromzero_history['iter'], fromzero_history['f'], 'k.-')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Create more complex non-linear toy dataset\n", "\n", "True method *regression from QUADRATIC 
features*:\n", "\n", "$$\n", "y \\sim \\text{Normal}( w_1 x_1 + w_2 x_2 + w_3 x_1^2 + w_4 x_2^2 + b, \\sigma^2)\n", "$$" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [], "source": [ "N = 300\n", "D = 2\n", "sigma = 0.1\n", "\n", "wsq_D = np.asarray([-2.0, 2.0])\n", "w_D = np.asarray([4.2, -4.2])\n", "\n", "train_prng = np.random.RandomState(0)\n", "x_ND = train_prng.uniform(low=-5, high=5, size=(N,D))\n", "y_N = (\n", "    np.dot(np.square(x_ND), wsq_D)\n", "    + np.dot(x_ND, w_D)\n", "    + sigma * train_prng.randn(N))" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "sns.pairplot(\n", "    data=pd.DataFrame(np.hstack([x_ND, y_N[:,np.newaxis]]), columns=['x1', 'x2', 'y']));" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [], "source": [ "def nonlinear_toy_nn_regression_loss_function(nn_params):\n", "    yhat_N = predict_y_given_x_with_NN(x_ND, nn_params)\n", "    # Use ag_np (not np) for every operation so autograd can trace the computation\n", "    return 0.5 / ag_np.square(sigma) * ag_np.sum(ag_np.square(y_N - yhat_N))" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [], "source": [ "# Initialize 1-layer, 10 hidden unit network with small random noise on weights\n", "\n", "H10_init_nn_params = make_nn_params_as_list_of_dicts(\n", "    n_hiddens_per_layer_list=[10], n_dims_input=2, n_dims_output=1,\n", "    weight_fill_func=lambda sz_tuple: 0.1 * np.random.randn(*sz_tuple))" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "completed iter 1/300 after 0.1 sec | loss 1.194825e+07\n", "completed iter 2/300 after 0.3 sec | loss 1.163953e+07\n", "completed iter 3/300 after 0.6 sec | loss 1.117648e+07\n", "completed iter 50/300 after 9.1 sec | loss 2.276573e+06\n", "completed iter 100/300 after 18.9 sec | loss 2.803082e+06\n", 
"completed iter 150/300 after 29.4 sec | loss 1.879504e+06\n", "completed iter 200/300 after 39.9 sec | loss 1.785398e+06\n", "completed iter 250/300 after 50.0 sec | loss 1.680417e+06\n", "completed iter 300/300 after 60.4 sec | loss 1.588551e+06\n" ] } ], "source": [ "H10_opt_nn_params, H10_history = run_many_iters_of_gradient_descent_with_list_of_dict(\n", " nonlinear_toy_nn_regression_loss_function,\n", " autograd.grad(nonlinear_toy_nn_regression_loss_function),\n", " H10_init_nn_params,\n", " n_iters=300,\n", " step_size=0.000001)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Plot objective function vs iters" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(H10_history['iter'], H10_history['f'], 'k.-')\n", "plt.title('10 hidden units');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Plot predicted y vs. true y for each example as a scatterplot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yhat_N = predict_y_given_x_with_NN(x_ND, H10_opt_nn_params)\n", "\n", "plt.plot(yhat_N, y_N, 'k.');\n", "plt.xlabel('predicted y|x');\n", "plt.ylabel('true y');" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_, subplot_grid = plt.subplots(nrows=1, ncols=2, sharex=True, sharey=False, figsize=(10,3), squeeze=False)\n", "subplot_grid[0,0].plot(x_ND[:,0], y_N, 'k.');\n", "subplot_grid[0,0].plot(x_ND[:,0], yhat_N, 'b.')\n", "subplot_grid[0,0].set_xlabel('x_0');\n", "\n", "subplot_grid[0,1].plot(x_ND[:,1], y_N, 'k.');\n", "subplot_grid[0,1].plot(x_ND[:,1], yhat_N, 'b.')\n", "subplot_grid[0,1].set_xlabel('x_1');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## More units! 
Try 1 layer with H=30 hidden units" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize 1-layer, 30 hidden unit network with small random noise on weights\n", "H30_init_nn_params = make_nn_params_as_list_of_dicts(\n", " n_hiddens_per_layer_list=[30], n_dims_input=2, n_dims_output=1,\n", " weight_fill_func=lambda sz_tuple: 0.1 * np.random.randn(*sz_tuple))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "H30_opt_nn_params, H30_history = run_many_iters_of_gradient_descent_with_list_of_dict(\n", " nonlinear_toy_nn_regression_loss_function,\n", " autograd.grad(nonlinear_toy_nn_regression_loss_function),\n", " H30_init_nn_params,\n", " n_iters=300,\n", " step_size=0.000001)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Plot objective function vs iterations" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(H30_history['iter'], H30_history['f'], 'k.-');\n", "plt.title('30 hidden units');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Plot predicted y value vs true y value for each example" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yhat_N = predict_y_given_x_with_NN(x_ND, H30_opt_nn_params)\n", "\n", "plt.plot(yhat_N, y_N, 'k.');\n", "plt.xlabel('predicted y|x');\n", "plt.ylabel('true y');" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_, subplot_grid = plt.subplots(nrows=1, ncols=2, sharex=True, sharey=False, figsize=(10,3), squeeze=False)\n", "subplot_grid[0,0].plot(x_ND[:,0], y_N, 'k.');\n", "subplot_grid[0,0].plot(x_ND[:,0], yhat_N, 'b.')\n", "subplot_grid[0,0].set_xlabel('x_0');\n", "\n", "subplot_grid[0,1].plot(x_ND[:,1], y_N, 'k.');\n", "subplot_grid[0,1].plot(x_ND[:,1], yhat_N, 'b.')\n", "subplot_grid[0,1].set_xlabel('x_1');" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "## Even more units! Try 1 layer with H=100 hidden units" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize 1-layer, 100 hidden unit network with small random noise on weights\n", "H100_init_nn_params = make_nn_params_as_list_of_dicts(\n", " n_hiddens_per_layer_list=[100], n_dims_input=2, n_dims_output=1,\n", " weight_fill_func=lambda sz_tuple: 0.05 * np.random.randn(*sz_tuple))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "H100_opt_nn_params, H100_history = run_many_iters_of_gradient_descent_with_list_of_dict(\n", " nonlinear_toy_nn_regression_loss_function,\n", " autograd.grad(nonlinear_toy_nn_regression_loss_function),\n", " H100_init_nn_params,\n", " n_iters=600,\n", " step_size=0.0000005)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yhat_N = predict_y_given_x_with_NN(x_ND, H100_opt_nn_params)\n", "\n", "plt.plot(yhat_N, y_N, 'k.');\n", "plt.xlabel('predicted y|x');\n", "plt.ylabel('true y');" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_, subplot_grid = plt.subplots(nrows=1, ncols=2, sharex=True, sharey=False, figsize=(10,3), squeeze=False)\n", "subplot_grid[0,0].plot(x_ND[:,0], y_N, 'k.');\n", "subplot_grid[0,0].plot(x_ND[:,0], yhat_N, 'b.')\n", "subplot_grid[0,0].set_xlabel('x_0');\n", "\n", "subplot_grid[0,1].plot(x_ND[:,1], y_N, 'k.');\n", "subplot_grid[0,1].plot(x_ND[:,1], yhat_N, 'b.')\n", "subplot_grid[0,1].set_xlabel('x_1');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Try it yourself!\n", "\n", "* Can you train a prediction network on the non-linear toy data so it has ZERO training error? Is this even possible?\n", "\n", "* Can you make the network train faster? 
What happens if you play with the step_size?\n", "\n", "* What if you made the network **deeper** (more layers)?\n", "\n", "* What other dataset would you want to try out this regression on?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.15" } }, "nbformat": 4, "nbformat_minor": 2 }