{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Physics 503, Lecture 18: Regression\n", "### Phys 503, J.R. Gladden, University of Mississippi" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### Fitting data to a model\n", "Scientists often need to compare measurements to a theoretical model which has some number of adjustable parameters. The task here is to find the optimal values of the parameters which minimize the difference between the model and observed data. This is quantified by the Sum of Squares of the Residuals (SSR). See the slides for definitions.\n", "The simplest model is that of a line $y(x) = mx + b$) with adjustable parameters of the slope ($m$) and intercept ($b$). For this model, we can solve exact;y for the optimal slope and intercept (see slides for details)." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### Example: Linear Fit\n", "\n", "** Imports and function definitions **" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "%matplotlib wx\n", "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "def avg(x):\n", " return sum(x)/float(len(x))\n", "\n", "def ssii(x):\n", " sum=0.0\n", " for i in range(len(x)):\n", " sum += x[i]**2\n", " return sum - len(x)*avg(x)**2\n", "\n", "def ssij(x,y):\n", " sum = 0.0\n", " for i in range(len(x)):\n", " sum += x[i] * y[i]\n", " return sum - len(x) * avg(x)*avg(y)\n", "def slope(x,y):\n", " num=0.0\n", " den=0.0\n", " for i in range(N):\n", " num += y[i]*(x[i] - xbar)\n", " den += x[i]*(x[i] - xbar)\n", " return num/den\n", "\n", "def sumsqr(x,y,m,b):\n", " sum = 0.0\n", " for i in range(N):\n", " sum += (y[i] - m*x[i] - b)**2\n", " return sum\n", "\n", "def gensumsqrs(x,y,mmin,mmax,b):\n", " mvals=np.linspace(mmin,mmax,60)\n", " r2=np.zeros(len(mvals))\n", " for i in range(len(mvals)):\n", " r2[i]=sumsqr(x,y,mvals[i],b)\n", " return mvals,r2" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### Usage" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==============================\n", "Fitted slope: 0.845 +/ 0.063 (7.5%) \n", "Fitted intercept: 1.236 +/-0.371 (30.0%) \n", "Corr. Coeff.: 0.978\n", "==============================\n" ] } ], "source": [ "plt.close('all')\n", "#data=loadtxt('data.dat')\n", "data=[[1,2.1], [2,2.8],[4.2,5.0],[7.1,6.5],[6.5,7.2],[9.6,9.5]]\n", "data=np.array(data)\n", "xdata = data[:,0]\n", "ydata = data[:,1]\n", "\n", "N = len(data)\n", "\n", "xbar = avg(xdata)\n", "ybar = avg(ydata)\n", "\n", "rsq = ssij(xdata,ydata)**2/(ssii(xdata) * ssii(ydata))\n", "\n", "m = slope(xdata,ydata)\n", "b = ybar - m*xbar\n", "\n", "sigma = np.sqrt(sumsqr(xdata,ydata,m,b)/(N-2.))\n", "\n", "delb = sigma*np.sqrt(1/float(N) + avg(xdata)**2/ssii(xdata))\n", "\n", "delm = sigma/(np.sqrt(ssii(xdata)))\n", "\n", "print \"=\"*30\n", "print \"Fitted slope: %2.3f +/ %2.3f (%2.1f%%) \" % (m,delm,delm/m*100.)\n", "print \"Fitted intercept: %2.3f +/-%2.3f (%2.1f%%) \" % (b,delb,delb/b*100.)\n", "print \"Corr. Coeff.: %2.3f\" % rsq\n", "print \"=\"*30\n", "x=np.linspace(min(xdata),max(xdata),20)\n", "\n", "plt.plot(xdata,ydata,'o',x,m*x+b,'g-')\n", "plt.legend(('Data','Linear Fit'),loc=2)\n", "plt.xlabel('X data')\n", "plt.ylabel('Y data')\n", "plt.xlim(0,max(xdata))\n", "plt.ylim(0,max(ydata))\n", "\n", "plt.figure(2)\n", "for bval in np.linspace(b*0.3,b*1.8,7): \n", " mvals,r2=gensumsqrs(xdata,ydata,m*0.7,m*1.3,bval)\n", " plt.plot(mvals,r2,'-',label='b=%2.3f'%bval)\n", "plt.ylabel('$R^2$',fontsize=18)\n", "plt.xlabel('$m$',fontsize=18)\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### Exercise: Polynomial Fits\n", "Some classes of non-linear functions can be linearized to use the same methods described above. The slides show examples of exponential and logarithms. Polynomials can also be linearized (trickier derivation and beyond our scope), but there is a built in function to fit polynomials in pyplot: plt.polyfit(). Here is an example:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "## generate some noisey data following a 4rd order polynomial\n", "def gendata(xmin,xmax,order,noise,N=25):\n", " coeff=(2.0,-0.2,0.1,-0.01,-0.0002,0.00013,0.0000056) #coefficients for each term (low to high)\n", " xdata=np.linspace(xmin,xmax,N)\n", " ydata=np.zeros(N)\n", " for pt in range(N):\n", " for term in range(order+1):\n", " ydata[pt]+=coeff[term]*xdata[pt]**term\n", " #add noise to the data using randn which returns an array of gaussian dist rand. nums.\n", " ydata=ydata+noise*np.random.randn(len(xdata))\n", " return xdata,ydata" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "====================\n", "Fit coefficients:\n", "a_4 = 2.162e-03\n", "a_3 = -5.362e-02\n", "a_2 = 3.730e-01\n", "a_1 = -8.818e-01\n", "a_0 = 2.731e+00\n", "====================\n" ] } ], "source": [ "xdata,ydata=gendata(0,9,4,0.4,N=10)\n", "fitCoeffs=np.polyfit(xdata,ydata,4)\n", "xfit=np.linspace(min(xdata),max(xdata),100)\n", "yfit=np.polyval(fitCoeffs,xfit)\n", "\n", "plt.plot(xdata,ydata,'o',label='Data')\n", "plt.plot(xfit,yfit,'-',label='Polynomial Fit')\n", "plt.legend(loc=2)\n", "plt.xlabel('X data')\n", "plt.ylabel('Y data')\n", "plt.show()\n", "\n", "\n", "print '='*20\n", "print 'Fit coefficients:'\n", "for i in range(len(fitCoeffs)):\n", " print 'a_%i = %2.3e' %(len(fitCoeffs)-1-i,fitCoeffs[i]) \n", "print '='*20\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.13" } }, "nbformat": 4, "nbformat_minor": 0 }