Chapter 10. Numerical Integration and Differentiation
Section 10.0. Introduction
Recipe 10.1. Integrating a Definite Integral
Recipe 10.2. Implementing the Trapezoidal Rule in VBA
Recipe 10.3. Computing the Center of an Area Using Numerical Integration
Recipe 10.4. Calculating the Second Moment of an Area
Recipe 10.5. Dealing with Double Integrals
Recipe 10.6. Numerical Differentiation
Numerical integration is the process of approximating integrals in lieu of evaluating them analytically. This subject is very rich in terms of the number of different techniques, history, applications, and even debate as to which approaches are better for given applications. Many of the methods commonly used are derived from Newton-Cotes integration formulas. Most modern books on numerical methods include some chapter(s) devoted to these methods. Some well-known Newton-Cotes integration formulas are the trapezoidal rule, Simpson's first and second rules, and Bode's rule.
There are other methods that are typically referred to as Gaussian methods or Gaussian quadratures . This class of methods includes such formulas as the Gauss-Legendre formula, the Gauss-Chebyshev formula, and the Gauss-Hermite formula.
Of course, there are other methods that combine elements of these classes, expand upon them, or take different approaches altogether, for example, Monte Carlo integration .
In this chapter I'll show you how to implement some of the more popular numerical integration rules and formulas to solve specific problems. My aim is to show you how to implement such rules and formulas in Excel, not to show how to implement every rule, formula, or variation available. In general, most of these rules and formulas follow similar patterns in the way they are constructed, with many of them differing only by coefficients and factors. Therefore, the techniques I discuss here on how to implement some common methods in Excel can be easily extended to deal with other rules and formulas.
Additionally, I've included a recipe here on numerical differentiation. As differentiation is the inverse of integration, I felt it appropriate to address numerical differentiation here as well, and to highlight some of the particular challenges associated with numerical differentiation. In other chapters of this book that deal with solving differential equations, I'll address some other numerical methods that involve differentiation.
Recipe 10.1. Integrating a Definite Integral
You have an analytic function that you need to integrate numerically.
Use the trapezoidal rule of numerical integration.
As discussed in the introduction to this chapter, there exists a multitude of numerical integration formulas and rules to choose from. Your choice depends on several factors including, but not limited to, desired accuracy and ease of implementation. For the purposes of this example, I want to use one of the easiest to implement numerical integration rules, namely the trapezoidal rule. This rule exemplifies the main steps in setting up numerical integration in a spreadsheet without clouding the issue with more complicated mathematics.
In this example, I'll consider the integration of a given analytic function of the form:
where the analytic form of f(x) is known. Even though you know f(x), you may not be able to integrate it analytically, and this is one reason why you would resort to numerical integration.
The trapezoidal rule is well documented in virtually every calculus, engineering analysis, or numerical methods book I've ever read. Basically, this technique approximates the function under consideration with a sequence of linear curves. The general formula (also called extended or composite formula) for the trapezoidal rule is:
Here, s is the distance along the x-axis between samples. This formula assumes that all of the y-values are computed, or correspond, to equally spaced x-values. Since we have an analytic form of f(x), we can easily choose the spacing and thus the number of x-values between the bounds of integration a and b. The smaller the spacing and the greater the number of samples, the higher the accuracy.
Excel is very well suited to integration formulas like the trapezoidal rule. In fact, many popular integration rules are of similar form, in that they consist of the sum of products between coefficients and y-values. This sort of computation is easily set up in Excel.
Figure 10-1 shows a spreadsheet that uses the trapezoidal rule to compute the integral of the function:
for x in the interval [0,1].
Figure 10-1. Trapezoidal rule example
The first step in performing this integration is to set up a table containing the x-values and corresponding y-values. I selected a spacing of 0.1 between x-values and put them in column B. The y-values are computed in column C using formulas like =EXP(-(B6^2)).
The next step is to set up a column containing the coefficients that appear in the integration formula. This is almost trivial in the case of the trapezoidal rule, since all but the first and last coefficients are 1, but it's a very handy way to organize the coefficients for other rules that don't involve a bunch of ones.
I put the x-spacing value (s = 0.1) in cell C18, just for clarity. Cell C20 contains the actual trapezoidal rule formula. This cell is selected in the screenshot so you can see the formula in the formula bar toward the top of the spreadsheet. The formula is =C18*SUMPRODUCT(C6:C16,D6:D16) and it uses the SUMPRODUCT worksheet function to compute the sum of the products of the y-values and the coefficients. The result in this case is 0.746 and represents the area under the curve from x = 0 to x = 1.
The trapezoidal rule shown here is very easy to implement, which makes it very attractive. The theoretical error of the trapezoidal rule is proportional to spacing between samples to the third power. If you increase the number of samples so as to decrease the spacing by a factor of 2, then the error will decrease by a factor of 8. Looking at this from another angle, for the example discussed here, reducing the spacing by a factor of 2 from 0.1 to 0.05 only affects the result beyond the fourth digit after the decimal place. The result goes from 0.746211 to 0.746671.
Increasing the number of samples (thus decreasing the spacing) is one way to improve accuracy when using the trapezoidal rule. Some practitioners take an iterative approach and stop increasing the number of samples when changes in the result from successive iterations fall below some tolerance (see Recipe 10.2).
There are other ways to improve the accuracy of numerical integration. An obvious way is to use a rule or method that is inherently more accurate. One very well-known rule is called Gaussian quadrature or Gaussian integration.
There are many other, inherently more accurate rules, many of which are easier to implement than Gaussian quadrature and which are better suited for tabulated data in lieu of having a known analytic function. Recipes 10.3 and 10.4 discuss examples using Simpson's rule for tabulated data, which is just as easy to implement as the trapezoidal rule, and more accurate.
For now, let's consider the same problem discussed earlier but this time solved using Gaussian integration. The idea behind Gaussian integration is that an integral of a function over a standard interval can be approximated by the weighted sum of functional values over that interval. In equation form this looks like:
There are a couple of tricks to applying this method. First, for integrals over nonstandard intervals, you have to transform (usually using a linear transformation) the actual interval to the standard interval, using a change of variables. Also, you need to derive the proper coefficients by which to multiply the functional values over the interval.
Fortunately, there are several different standard Gaussian integration formulas that have already been derived and are readily available from standard texts on the subject of numerical methods. Once such standard formula is the fifth-order formula :
To apply this formula to the example function discussed in this recipe, the first thing you need to do is transform from variable x over the interval from 0 to 1 to a variable t over the interval from -1 to 1. To do this, let t = 2x - 1. Then x = (t + 1)/2, which gives dx = (1/2)dt. Now we have:
And the integral to be evaluated is now:
The 1/2 factor comes from the change of variables dx = (1/2)dt. Applying the Gaussian formula in a spreadsheet cell formula looks like this:
=0.5 * (5/9 * f_of_t(-SQRT(3/5)) + 8/9 * f_of_t(0) + 5/9 * f_of_t(SQRT(3/5)))
This formula uses a custom VBA function named f_of_t, which simply evaluates the function f(t) shown earlier. The VBA code for this function is shown in Example 10-1.
Example 10-1. VBA code for f_of_t
You don't have to use a custom VBA function as I did here. I did so only to make the spreadsheet formula more readable.
The end result for this example is 0.746814. Comparing this result to that obtained using the trapezoidal rule shows that the results are identical out to the third decimal place. To make the result from the trapezoidal rule equal that obtained from the Gaussian formula out to the fourth decimal place, you'd have to increase the number of samples to 53. To make the result identical to the fifth decimal place, you'd have to increase the number of samples to 67. (I obtained these values iteratively.)
There are two ways to view these results. On one hand, it's clear that the required number of samples when using the trapezoidal rule is higher for the same level of accuracy. On the other hand, in this example at least, the higher number of samples results in an insignificant increase in computational time, whereas applying the Gaussian formula is certainly more effort.
I have to be honest here. I've never applied the Gaussian formula to solve any real-world problem. I've always found it too easy to simply increase the number of samples when using the trapezoidal rule, or more often Simpson's rule, and live with the few extra milliseconds it takes to compute the result, given the power of modern desktop computers. This works for me, but it's not to say the same will work for the type of problems that require you to implement numerical integration.
If you're dealing with a known analytic function, then you might consider using VBA to carry out the integration, as discussed in Recipe 10.2.