Polynomials

Horner’s Scheme

Consider the polynomial \(\mathsf{P(x)} = \mathsf{3x^3 + 2x^2 - 4x + 7}\). Computing its value for a particular \(\mathsf{x}\) by means of the most naive ordering of arithmetic operations leads to the following code:

3*x*x*x + 2*x*x - 4*x + 7

The number of operations (multiplications and additions) is

\[\mathsf{3 + 1 + 2 + 1 + 1 + 1} = \mathsf{9}.\]

But if instead we rewrite the polynomial in nested form, i.e., \(\mathsf{P(x)} = \mathsf{((3x + 2)x - 4)x + 7}\), then we find that

((3*x + 2)*x - 4)*x + 7

requires only 6 operations. This is a small savings, but it might matter if \(\mathsf{P(x)}\) had to be computed many times (in an inner loop of the program, say). The efficiency gain is correspondingly more significant if the degree of the polynomial is higher. For a polynomial of degree \(\mathsf{n}\) the naive calculation requires

\[\mathsf{n + 1 + (n-1) + 1 + (n-2) + 1 + \cdots + 2 + 1 + 1 + 1} = \mathsf{\frac{n(n+1)}{2} + n} = \mathsf{\frac{n(n+3)}{2}}\]

operations. On the other hand, organized in the form \(\mathsf{P(x)} = \mathsf{(\cdots((a_nx + a_{n-1})x + a_{n-2})x + \cdots)x + a_0},\) the calculation requires only \(\mathsf{2n}\) operations: one addition and one multiplication at each of the \(\mathsf{n}\) levels of nesting.

In so-called “big-O” notation, we say that the naive scheme scales as \(\mathsf{O(n^2)}\), whereas Horner’s scheme scales as \(\mathsf{O(n)}\).
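
Here is a minimal sketch of Horner’s scheme in Python (the function name and the lowest-order-first coefficient convention are our own choices):

def horner(a, x):
    """Evaluate a[0] + a[1]*x + ... + a[N]*x**N by Horner's scheme.

    Coefficients are listed lowest order first. Each pass through the
    loop costs one multiplication and one addition, so the total work
    is O(N).
    """
    q = 0
    for coeff in reversed(a):
        q = q * x + coeff
    return q

# P(x) = 3x^3 + 2x^2 - 4x + 7 at x = 2
print(horner([7, -4, 2, 3], 2))  # 31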

Polynomial division

Suppose the polynomial \(\mathsf{P(x)}\) can be written as \(\mathsf{P(x)} = \mathsf{Q(x)p(x) + r(x)}\), where \(\mathsf{r(x)}\) has degree less than that of \(\mathsf{p(x)}\). If we specialize to a linear divisor, \(\mathsf{p(x)} = \mathsf{x-c}\), then it must be that \(\mathsf{r(x)} = \mathsf{r}\) has degree zero (a constant function). Hence,

\[\mathsf{P(x)} = \mathsf{Q(x)p(x) + r(x)} = \mathsf{Q(x)(x-c) + r}\]

and

\[\mathsf{P(c) = r}.\]

The roots of \(\mathsf{P(x)}\) constitute the set of values \(\mathsf{\{c\}}\) for which polynomial division \(\mathsf{P(x)/(x-c)}\) yields remainder \(\mathsf{r} = \mathsf{0}\). (This is just another way of saying that the roots of \(\mathsf{P(x)}\) can be read off from its factorization into linear factors.) The process of dividing out a root factor is called deflation. We say that \(\mathsf{Q(x)}\) is deflated with respect to \(\mathsf{P(x)}\).

Synthetic division

Horner’s scheme is sometimes referred to as synthetic division. To see why, consider the intermediate values that are calculated along the way. For example, to evaluate the order-\(\mathsf{N}\) polynomial \(\mathsf{P(x)} = \mathsf{(\cdots((a_Nx + a_{N-1})x + a_{N-2})x + \cdots)x + a_0}\) at \(\mathsf{x = c}\), we must compute from the inside out the following \(\mathsf{N}\) terms:

\[\begin{aligned} \mathsf{q_{N-1}} &= \mathsf{a_N} \\ \mathsf{q_{N-2}} &= \mathsf{cq_{N-1} + a_{N-1}} \\ &\ \vdots \\ \mathsf{q_2} &= \mathsf{cq_3 + a_3} \\ \mathsf{q_1} &= \mathsf{cq_2 + a_2} \\ \mathsf{q_0} &= \mathsf{cq_1 + a_1} \end{aligned}\]

This cycle of updates (\(\mathsf{q_N} = \mathsf{0, q_{k-1} = cq_k+ a_k}\)) terminates with the result \(\mathsf{P(c)} = \mathsf{cq_0 + a_0}\).

Moreover, we can show that the coefficients \(\mathsf{q_0}, \mathsf{q_1}, \ldots, \mathsf{q_{N-1}}\) are precisely those of the quotient polynomial \(\mathsf{Q(x)}\) that results when \(\mathsf{P(x)}\) is divided by \(\mathsf{(x-c)}\):

\[\mathsf{P(x)} = \mathsf{P(c) + (x-c)(q_0+ q_1x + q_2x^2+ q_3x^3+ \cdots + q_{N-1}x^{N-1})}\]

The proof is quite simple. The update rule gives \(\mathsf{a_k} = \mathsf{q_{k-1} - cq_k}\) for \(\mathsf{1 \le k \le N-1}\), together with \(\mathsf{a_N = q_{N-1}}\) and \(\mathsf{a_0 = P(c) - cq_0}\). Hence,

\[\begin{aligned} \mathsf{P(x)} &= \mathsf{a_0 + a_1 x + \cdots + a_{N-1} x^{N-1} + a_N x^N} \\ &= \mathsf{(P(c) - cq_0) + (q_0 - cq_1)x + \cdots + (q_{N-2}-cq_{N-1})x^{N-1} + q_{N-1}x^N} \\ &= \mathsf{P(c) + (x-c)(q_0 + q_1x + \cdots + q_{N-1}x^{N-1})} \\ &= \mathsf{P(c) + (x-c)Q(x)} \end{aligned}\]
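
In code, the quotient coefficients and the remainder come out of the same loop. A sketch in Python (lowest-order-first coefficients, as before; the function name is our own):

def synthetic_division(a, c):
    """Divide a[0] + a[1]*x + ... + a[N]*x**N by (x - c).

    Returns (q, r): the quotient coefficients, lowest order first,
    and the remainder r = P(c). The loop is exactly the Horner update
    cycle, with the intermediate values q_k retained.
    """
    q = []
    acc = 0
    for coeff in reversed(a):  # from a_N down to a_0
        acc = acc * c + coeff  # q_{k-1} = c*q_k + a_k
        q.append(acc)
    r = q.pop()                # the final value is P(c)
    return q[::-1], r

# P(x) = 3x^3 + 2x^2 - 4x + 7 divided by (x + 1)
print(synthetic_division([7, -4, 2, 3], -1))  # ([-3, -1, 3], 10)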

Consider the following book-keeping trick (much like long division!) for generation of the \(\mathsf{q_k}\) coefficients. Suppose we want to evaluate \(\mathsf{P(x)} = \mathsf{3x^3+2x^2-4x+7}\) at \(\mathsf{x = -1}\). We start by setting up a table containing the value of \(\mathsf{x}\) and the coefficients of \(\mathsf{P}\) in a row at the top:

-1 | 3    2   -4    7
   |
   |

Let the first coefficient drop down to the bottom row.

-1 | 3    2   -4    7
   |
   | 3

Create a new entry in the second column by multiplying the lowest entry in the first column by the value to the left of the vertical bar.

-1 | 3    2   -4    7
   |     -3
   | 3

Sum the second column and repeat.

-1 | 3    2   -4    7
   |     -3
   | 3   -1

-1 | 3    2   -4    7
   |     -3    1
   | 3   -1   -3

-1 | 3    2   -4    7
   |     -3    1    3
   | 3   -1   -3   10

We read off the remainder from the final entry on the bottom right. That is, \(\mathsf{P(-1) = 10}\). Similarly, we can show that \(\mathsf{P(0) = 7}\), \(\mathsf{P(1) = 8}\), and \(\mathsf{P(2) = 31}\).

0 | 3    2   -4    7
  |      0    0    0
  | 3    2   -4    7

1 | 3    2   -4    7
  |      3    5    1
  | 3    5    1    8

2 | 3    2   -4    7
  |      6   16   24
  | 3    8   12   31

The other entries in the bottom row are the corresponding \(\mathsf{q_k}\) values, i.e., the coefficients of the quotient polynomial.

Consider the following polynomial with roots 1, 2, and 3:

\[\mathsf{P(x)} = \mathsf{2(x-1)(x-2)(x-3)} = \mathsf{2x^3-12x^2+22x-12}\]

The polynomial division theorem tells us that \(\mathsf{P(x)}\) has a factor \(\mathsf{(x-c)}\) if and only if \(\mathsf{P(c)} = \mathsf{0}\).

Let’s show explicitly that \(\mathsf{P(x)/(x-2)} = \mathsf{2x^2-8x+6}\). The remainder is \(\mathsf{P(2)} = \mathsf{0}\).

2 | 2   -12   22   -12
  |       4  -16    12
  | 2    -8    6     0

Similarly, \(\mathsf{P(x)/(x-3)} = \mathsf{2x^2-6x + 4}\).

3 | 2   -12   22   -12
  |       6  -18    12
  | 2    -6    4     0

Here’s an example with a nonzero remainder: \(\mathsf{P(x)/(x-4)} = \mathsf{2x^2 - 4x + 6 + 12/(x-4)}\)

4 | 2   -12   22   -12
  |       8  -16    24
  | 2    -4    6    12

And another example with a nonzero remainder: \(\mathsf{P(x)/x} = \mathsf{2x^2- 12x + 22 -12/x}\).

0 | 2   -12   22   -12
  |       0    0     0
  | 2   -12   22   -12

Cauchy’s bound

If we want to find the real roots of a polynomial \(\mathsf{P(x)} = \mathsf{a_0} + \mathsf{a_1x} + \cdots + \mathsf{a_Nx^N}\), we know that there are at most \(\mathsf{N}\) of them. But if we want to use Newton’s method, we would like to have some hint as to where to start looking for them. A theorem due to Cauchy tells us that every root \(\mathsf{c}\) satisfies \(|\mathsf{c}| \le \mathsf{1} + \max |\mathsf{a_k/a_N}|\), where the maximum is taken over \(\mathsf{k} = \mathsf{0}, \mathsf{1}, \mathsf{2}, \ldots, \mathsf{N-1}\). That is, every root lies within a bounded distance of the origin.

For example, the roots of

\[\mathsf{P(x)} = \mathsf{2(x-1)(x-2)(x-3)} = \mathsf{2(x^2-3x+2)(x-3)} = \mathsf{2x^3-12x^2+22x-12}\]

all live in the interval \([-\mathsf{M},\mathsf{M}]\) with \(\mathsf{M} = \mathsf{1 + 22/2} = \mathsf{12}\).

Note that this is a relatively weak bound and not always useful for very high degree polynomials.
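
A sketch of the bound in Python (coefficients listed lowest order first, as in the earlier snippets):

def cauchy_bound(a):
    """Return M = 1 + max_k |a_k / a_N| over k = 0, ..., N-1.
    All real roots of a[0] + a[1]*x + ... + a[N]*x**N lie in [-M, M]."""
    return 1 + max(abs(coeff / a[-1]) for coeff in a[:-1])

# P(x) = 2x^3 - 12x^2 + 22x - 12
print(cauchy_bound([-12, 22, -12, 2]))  # 12.0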

Multiplicities

It is possible to have roots that coincide, e.g., \(\mathsf{P(x)} = \mathsf{(x-1)(x-2)} \cdots \mathsf{(x-u)} \cdots \mathsf{(x-v)}\) with \(\mathsf{u = v}\). (It may either be that \(\mathsf{u = v}\) exactly or that they agree within floating point accuracy!) We say that \(\mathsf{P(x)}\) has a root \(\mathsf{c}\) of multiplicity \(\mathsf{m}\) if \(\mathsf{(x-c)^m}\) divides \(\mathsf{P}\) but \(\mathsf{(x-c)^{m+1}}\) does not. Higher multiplicities reduce the rate of convergence of Newton’s algorithm but are not fatal.

Root finding algorithm

Putting all these ideas together, we arrive at a good root-finding algorithm for polynomials (a rough sketch in code follows the list):

  1. Use Cauchy’s bound to find an interval \([-\mathsf{M},\mathsf{M}]\) that contains the roots

  2. Perform Newton’s method iterations starting from some \(\mathsf{x}_{\textsf{guess}} \in [-\mathsf{M},\mathsf{M}]\)

  3. If Newton’s method converges to a point \(\mathsf{c}\), try to deflate \(\mathsf{P(x)}\) by \(\mathsf{(x-c)}\), \(\mathsf{(x-c)^2}\), \(\mathsf{(x-c)^3}\), etc.

  4. Repeat until there are no more roots.
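
Here is that sketch in Python (starting guesses are drawn at random from \([-\mathsf{M},\mathsf{M}]\); tolerances and the handling of complex or stubborn roots are treated casually):

import random

def horner_pair(a, x):
    """Evaluate P(x) and P'(x) together by Horner's scheme."""
    p, dp = 0.0, 0.0
    for coeff in reversed(a):
        dp = dp * x + p
        p = p * x + coeff
    return p, dp

def newton_root(a, x0, tol=1e-12, max_iter=100):
    """Newton's method for the polynomial with coefficients a (lowest first)."""
    x = x0
    for _ in range(max_iter):
        p, dp = horner_pair(a, x)
        if dp == 0:
            return None
        step = p / dp
        x -= step
        if abs(step) < tol:
            return x
    return None

def deflate(a, c):
    """Synthetic division by (x - c); returns the quotient coefficients."""
    q, acc = [], 0.0
    for coeff in reversed(a):
        acc = acc * c + coeff
        q.append(acc)
    q.pop()            # discard the remainder P(c)
    return q[::-1]

def real_roots(a, tries=200):
    roots = []
    while len(a) > 1:                                    # degree >= 1 remains
        M = 1 + max(abs(c / a[-1]) for c in a[:-1])      # Cauchy's bound
        for _ in range(tries):
            root = newton_root(a, random.uniform(-M, M))
            if root is not None:
                roots.append(root)
                a = deflate(a, root)   # repeated roots are found repeatedly
                break
        else:
            break                      # no convergence; give up
    return roots

print(sorted(real_roots([-12, 22, -12, 2])))  # approximately [1.0, 2.0, 3.0]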

Polynomial Interpolation

Suppose that we are given a set of \(\mathsf{N+1}\) coordinate pairs \(\{ (\mathsf{x_n,y_n}) \}\), which we interpret as being derived from a functional relationship \(\mathsf{y_n} = \mathsf{f(x_n)}\). Is there any way to deduce \(\mathsf{f}:\mathbb{R}\to\mathbb{R}\) from the data points? In fact, there is not, even if we impose additional conditions, such as smoothness or differentiability.

It is possible to find a polynomial approximation \(\mathsf{p(x)}\) such that \(\mathsf{y_n} = \mathsf{p(x_n)} = \mathsf{f(x_n)}\). An \(\mathsf{N}\)th-order polynomial

\[\mathsf{p(x)} = \mathsf{p_0 + p_1 x + p_2 x^2} + \cdots + \mathsf{p_N x^N}\]

has \(\mathsf{N+1}\) coefficients (free parameters) that can be fixed uniquely by the requirement that it pass through the \(\mathsf{N+1}\) known coordinates.

\[\begin{aligned} \mathsf{p(x_0)} &= \mathsf{p_0 + p_1 x_0 + p_2 x^2_0} + \cdots + \mathsf{p_N x^N_0} \\ \mathsf{p(x_1)} &= \mathsf{p_0 + p_1 x_1 + p_2 x^2_1} + \cdots + \mathsf{p_N x^N_1} \\ &\ \,\vdots\\ \mathsf{p(x_N)} &= \mathsf{p_0 + p_1 x_N + p_2 x^2_N} + \cdots + \mathsf{p_N x^N_N} \end{aligned}\]

This system of equations is linear in the coefficients of \(\mathsf{p(x)}\). Hence, it can be recast as a matrix equation

\[\begin{pmatrix} \mathsf{1} & \mathsf{x_0} & \mathsf{x_0^2} & \cdots & \mathsf{x_0^{N}} \\ \mathsf{1} & \mathsf{x_1} & \mathsf{x_1^2} & \cdots & \mathsf{x_1^{N}} \\ \vdots \\ \mathsf{1} & \mathsf{x_{N}} & \mathsf{x_{N}^2} & \cdots & \mathsf{x_{N}^{N}} \\ \end{pmatrix} \begin{pmatrix} \mathsf{p_0} \\ \mathsf{p_1} \\ \vdots \\ \mathsf{p_{N}} \end{pmatrix} = \begin{pmatrix} \mathsf{y_0} \\ \mathsf{y_1} \\ \vdots \\ \mathsf{y_N} \\ \end{pmatrix} \]

of the form \(\mathsf{X\mathbf{p} = \mathbf{y}}\). Provided that the matrix \(\mathsf{X}\) is non-singular, there is a unique solution \(\mathsf{\mathbf{p} = X^{-1} \mathbf{y}}\).
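
In practice the system can be assembled and solved directly. A sketch with numpy (np.vander with increasing=True builds exactly the matrix above; the sample points here are taken from \(\mathsf{P(x)=3x^3+2x^2-4x+7}\), used earlier):

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([7.0, 8.0, 31.0, 94.0])   # values of 3x^3 + 2x^2 - 4x + 7

X = np.vander(x, increasing=True)      # columns 1, x, x^2, x^3
p = np.linalg.solve(X, y)
print(p)                               # [ 7. -4.  2.  3.]

Be warned that the Vandermonde matrix becomes badly conditioned as \(\mathsf{N}\) grows, a fact closely related to Runge’s phenomenon discussed below.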

For example, suppose that we know the function at two points \(\mathsf{(x_0,y_0)}\) and \(\mathsf{(x_1,y_1)}\). A linear interpolation between the two points must satisfy

\[\begin{pmatrix} \mathsf{1} & \mathsf{x_0}\\ \mathsf{1} & \mathsf{x_1} \end{pmatrix} \begin{pmatrix} \mathsf{p_0} \\ \mathsf{p_1} \end{pmatrix} = \begin{pmatrix} \mathsf{y_0} \\ \mathsf{y_1} \\ \end{pmatrix}. \]

Hence,

\[\begin{pmatrix} \mathsf{p_0} \\ \mathsf{p_1} \end{pmatrix} = \begin{pmatrix} \mathsf{1} & \mathsf{x_0}\\ \mathsf{1} & \mathsf{x_1} \end{pmatrix}^{-1} \begin{pmatrix} \mathsf{y_0} \\ \mathsf{y_1} \\ \end{pmatrix} = \mathsf{\frac{1}{x_1 - x_0}} \begin{pmatrix} \mathsf{x_1} & \mathsf{-x_0}\\ \mathsf{-1} & \mathsf{1} \end{pmatrix} \begin{pmatrix} \mathsf{y_0} \\ \mathsf{y_1} \\ \end{pmatrix} = \mathsf{\frac{1}{x_1 - x_0}} \begin{pmatrix} \mathsf{x_1y_0 - x_0y_1}\\ \mathsf{y_1-y_0} \end{pmatrix}, \]

which yields a linear polynomial

\[\mathsf{p(x) = \frac{x_1y_0 - x_0y_1}{x_1-x_0} + \Biggl(\frac{y_1-y_0}{x_1-x_0}\Biggr)x}.\]

Try this on your own

Convince yourself that \(\mathsf{p(x_0) = y_0}\) and \(\mathsf{p(x_1) = y_1}\).

Lagrange Polynomials

The polynomial in the previous example can be rewritten as follows:

\[\begin{aligned} \mathsf{p(x)} &= \mathsf{\frac{x_1y_0 - x_0y_1}{x_1-x_0} + \Biggl(\frac{y_1-y_0}{x_1-x_0}\Biggr)x}\\ &= \mathsf{y_0\Biggl(\frac{x-x_1}{x_0-x_1}\Biggr) + y_1\Biggl(\frac{x-x_0}{x_1-x_0}\Biggr)}\\ &\equiv \mathsf{y_0 l_0(x) + y_1 l_1(x)} \end{aligned}\]

The functions \(\mathsf{l_0(x)}\) and \(\mathsf{l_1(x)}\) are called Lagrange polynomials. They are defined relative to a particular set of nodes \(\{\mathsf{x_n}\}\) and can be generalized to arbitrary order:

\[\mathsf{l_n(x)} = \prod_{\substack{\mathsf{k=0}\\ \mathsf{k\neq n}}}^\mathsf{N} \,\mathsf{\frac{x-x_k}{x_n - x_k} = \Biggl(\frac{x-x_0}{x_n-x_0}\Biggr)\cdots \Biggl(\frac{x-x_{n-1}}{x_n-x_{n-1}}\Biggr) \Biggl(\frac{x-x_{n+1}}{x_n-x_{n+1}}\Biggr) \cdots \Biggl(\frac{x-x_N}{x_n-x_N}\Biggr)}.\]

Note that these are constructed such that each \(\mathsf{l_n(x)}\) is a polynomial of degree \(\mathsf{N}\) with the property that \(\mathsf{l_n(x_m) = \delta_{n,m}}\). The interpolating polynomial then has the expansion

\[\mathsf{p(x) = \sum_{n=0}^N y_n l_n(x)}.\]
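
A direct transcription into Python (a sketch, nothing optimized):

def lagrange_eval(xs, ys, x):
    """Evaluate p(x) = sum_n y_n l_n(x) at the point x."""
    total = 0.0
    for n, (xn, yn) in enumerate(zip(xs, ys)):
        ln = 1.0
        for k, xk in enumerate(xs):
            if k != n:
                ln *= (x - xk) / (xn - xk)   # one factor of l_n(x)
        total += yn * ln
    return total

# the two-point case reduces to linear interpolation
print(lagrange_eval([0.0, 2.0], [1.0, 5.0], 1.0))  # 3.0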

See also

The same polynomial fitting construction can be carried out recursively using Neville’s algorithm.

Runge’s Phenomenon

We explored this in class in some depth. Runge’s phenomenon refers to the tendency of an interpolating polynomial to become increasingly oscillatory as its order grows.

There is a unique order-\(\mathsf{N}\) polynomial \(\mathsf{P(x)}\) passing through the \(\mathsf{N+1}\) coordinate pairs \(\mathsf{(x_i,y_i)}\). If those points represent a sampling of some function \(\mathsf{f}\), with \(\mathsf{y_i = f(x_i)}\) on a mesh \(\mathsf{x_i}\), then \(\mathsf{P}\) can be understood as an approximation to \(\mathsf{f}\). As \(\mathsf{N}\) gets large, and especially if the \(\mathsf{x_i}\) mesh is uniform, the polynomial approximation actually gets worse: large oscillations develop near the endpoints of the interval.
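
A quick numerical illustration (a sketch using numpy; the test function \(\mathsf{1/(1+25x^2)}\) is the classic example due to Runge):

import numpy as np

f = lambda x: 1 / (1 + 25 * x**2)    # Runge's classic test function

for N in (5, 10, 20):
    xi = np.linspace(-1, 1, N + 1)   # uniform mesh of N+1 points
    p = np.polyfit(xi, f(xi), N)     # degree-N interpolating polynomial
    xx = np.linspace(-1, 1, 1001)
    err = np.abs(np.polyval(p, xx) - f(xx)).max()
    print(N, err)                    # the maximum error grows with N

(np.polyfit will complain about conditioning at high degree, which is itself a symptom of the problem.)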

Splines

An important lesson from Runge’s phenomenon is that very high order polynomials are generally not reliable. An alternative—especially useful for interpolation—is to take the fitting function to be a low-order polynomial defined piecewise. Suppose that we have a data set \(\{(\mathsf{x_n,y_n})\}\) and a collection of polynomial fitting functions. Here, we’ll suppose that \(\mathsf{s_{i-1}(x)}\), \(\mathsf{s_i(x)}\), and \(\mathsf{s_{i+1}(x)}\) cover the intervals \([\mathsf{x_{i-1},x_i}]\), \([\mathsf{x_i,x_{i+1}}]\), and \([\mathsf{x_{i+1},x_{i+2}}]\), respectively.

Linear Splines

If the fitting functions are linear, then the requirements of continuity and interpolation through \(\{(\mathsf{x_n,y_n})\}\) are enough to completely determine all the free parameters: \(\mathsf{s_{i-1}(x_i)} = \mathsf{s_i(x_i)} = \mathsf{y_i}\), \(\mathsf{s_i(x_{i+1})} = \mathsf{s_{i+1}(x_{i+1})} = \mathsf{y_{i+1}}\), etc.

\[\mathsf{s_i(x) = y_i + \frac{y_{i+1} - y_i}{x_{i+1}-x_i}(x-x_i)}\]
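
For what it’s worth, numpy’s np.interp implements exactly this piecewise-linear rule:

import numpy as np

xs = np.array([0.0, 1.0, 2.0, 4.0])
ys = np.array([1.0, 3.0, 2.0, 0.0])
print(np.interp(1.5, xs, ys))  # 2.5, halfway between y_1 = 3 and y_2 = 2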

We can improve the quality of the interpolation by raising the order of the polynomials and imposing some degree of smoothness at each joint (or knot).

Quadratic Splines

In each slice of the domain, we assume a second order polynomial fitting function,

\[\mathsf{s_n(x)} = \mathsf{a_n} + \mathsf{b_n(x-x_n)} + \mathsf{c_n(x-x_n)^2}.\]

Fidelity with the data set requires \(\mathsf{s_n(x_n)} = \mathsf{y_n}\), which implies that \(\mathsf{a_n} = \mathsf{y_n}\). Ensuring that the slopes agree, \(\mathsf{s_{n-1}'(x_n)} = \mathsf{s_n'(x_n)}\), leads to

\[\mathsf{b_{n-1}} + \mathsf{2c_{n-1}(x_n-x_{n-1})} = \mathsf{b_n}.\]

Hence, \(\mathsf{2c_n} = \mathsf{(b_{n+1}-b_n)/(x_{n+1}-x_n)}\), and

\[\mathsf{s_n(x)} = \mathsf{y_n} + \mathsf{b_n(x-x_n)} + \mathsf{\frac{1}{2}\Biggl(\frac{b_{n+1}-b_n}{x_{n+1}-x_n} \Biggr)(x-x_n)^2}.\]

The continuity requirement \(\mathsf{s_{n-1}(x_n)} = \mathsf{s_n(x_n)}\) gives

\[\mathsf{y_{n-1}} + \mathsf{b_{n-1}(x_n-x_{n-1})} + \mathsf{\frac{1}{2}\Biggl(\frac{b_{n}-b_{n-1}}{x_n-x_{n-1}} \Biggr)(x_n-x_{n-1})^2} = \mathsf{y_n}\]
\[\textsf{or}\ \ \mathsf{b_{n+1}} = -\mathsf{b_n} + \mathsf{2\Biggl(\frac{y_{n+1}-y_n}{x_{n+1}-x_n}\Biggr)},\]

which is to say that each linear coefficient is equal to minus the previous one plus twice the straight-line slope of the data across the interval. But this recursion has to be seeded with a value for \(\mathsf{b_0}\).

One possibility is to choose \(\mathsf{b_0} := \mathsf{(y_1-y_0)/(x_1-x_0)}\). Another is to minimize the total length of the curve:

\[\begin{aligned} \mathsf{L(b_0)} &= \int_{\mathsf{x_0}}^{\mathsf{x_N}} \sqrt{(\mathsf{dx})^2 + [\mathsf{f(x+dx)-f(x)}]^2}\\ &= \int_{\mathsf{x_0}}^{\mathsf{x_N}}\!\mathsf{dx}\,\sqrt{1 + [\mathsf{f'(x)}]^2}\\ &= \sum_{\mathsf{n=0}}^{\mathsf{N-1}} \int_{\mathsf{x_n}}^{\mathsf{x_{n+1}}}\!\mathsf{dx}\, \sqrt{ \mathsf{1} + [\mathsf{b_n + 2c_n(x-x_n)}]^2} \end{aligned}\]
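
A sketch of the construction in Python, seeded by default with the first choice \(\mathsf{b_0 = (y_1-y_0)/(x_1-x_0)}\) (the function name and return convention are our own):

def quadratic_spline(x, y, b0=None):
    """Return the list of (a_n, b_n, c_n) for the quadratic spline
    s_n(t) = a_n + b_n (t - x_n) + c_n (t - x_n)^2 on [x_n, x_{n+1}]."""
    if b0 is None:
        b0 = (y[1] - y[0]) / (x[1] - x[0])
    b = [b0]
    for n in range(len(x) - 1):
        # b_{n+1} = -b_n + 2 (y_{n+1} - y_n) / (x_{n+1} - x_n)
        b.append(-b[n] + 2 * (y[n + 1] - y[n]) / (x[n + 1] - x[n]))
    # a_n = y_n and 2 c_n = (b_{n+1} - b_n) / (x_{n+1} - x_n)
    return [(y[n], b[n], (b[n + 1] - b[n]) / (2 * (x[n + 1] - x[n])))
            for n in range(len(x) - 1)]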

Cubic Splines

The additional conditions of matching first and second derivatives \(\mathsf{s_{i-1}'(x_i)} = \mathsf{s_i'(x_i)}\) and \(\mathsf{s_{i-1}''(x_i)} = \mathsf{s_i''(x_i)}\) determine the coefficients of a cubic spline. The only complication is that two additional conditions are required at the endpoints. One can either fix the slopes at either end, if those values are known, or impose \(\mathsf{s_0''(x_0)} = \mathsf{s_{N-1}''(x_N)} = \mathsf{0}\).

\[\mathsf{s_n(x)} = \mathsf{a_n} + \mathsf{b_n(x-x_n)} + \mathsf{c_n(x-x_n)^2} + \mathsf{d_n(x-x_n)^3}\]

Note that the matching conditions alone leave two too few equations, and hence two free parameters in the fit. We have to impose additional conditions, constraints, or assumptions. Here are two commonly used schemes:

Clamped

The slopes at the two end points are fixed, \(\mathsf{s_0'(x_0)} = \mathsf{u}\) and \(\mathsf{s_{N-1}'(x_N)} = \mathsf{v}\).

Natural

The curvature at the two end points is made to vanish, \(\mathsf{s_0''(x_0)} = \mathsf{s_{N-1}''(x_N)} = \mathsf{0}\).
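
Rather than assembling the linear system by hand, one can lean on scipy, whose CubicSpline supports both schemes directly (the sample data here is invented):

import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 0.0, 1.0])

nat = CubicSpline(x, y, bc_type='natural')               # s'' = 0 at both ends
clm = CubicSpline(x, y, bc_type=((1, 0.5), (1, -1.0)))   # clamped: s'(x_0) = 0.5, s'(x_N) = -1

print(nat(1.5), clm(1.5))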

Try this on your own

Determine \(\mathsf{a}\), \(\mathsf{b}\), and \(\mathsf{c}\) such that

\[\mathsf{s(x)} = \begin{cases} \mathsf{1-x+ax^2+x^3} & \mathsf{0} \le \mathsf{x} < \mathsf{1}\\ \mathsf{3+bx+cx^2 - x^3} & \mathsf{1} \le \mathsf{x} \le \mathsf{2} \end{cases}\]

is a natural cubic spline on the interval \([\mathsf{0,2}]\).