Use many enough digits to avoid unintended loss of significance. Use few enough digits to be reasonably convenient. Keep all the raw data. See section 4.2 for details on how to do things right.
If you want to express the uncertainty, express it separately and explicitly. For example, absolute uncertainty can be properly expressed as 1.234(55) or equivalently 1.234±0.055. Relative uncertainty can be expressed as 2900±13%.
State the form of the distribution, unless this is obvious from context. Examples include Gaussian, square, triangular, et cetera. See section 4.4 and section 9.9 for more on this.
The whole notion of significant digits is heavily flawed; see section 11 for more on this. Anything that can be done by means of significant digits can be done much better and more easily by other means. People who care about their data don’t use significant digits.
There are plenty of important cases where following the usual “significant figures” rules would introduce large errors into the calculations. See section 3.1 and section 11.3 for simple examples of miscalculating the nominal value. See section 11.8 and section 4.7 for examples of wildly overestimating and wildly underestimating the width of the distribution.
It is not safe to assume that counting the digits in a numeral implies anything about the significance, uncertainty, accuracy, precision, repeatability, readability, resolution, tolerance, or anything else.
Some things are, for all practical purposes, completely certain. For instance, recently I bought a carton of eggs, and counted how many eggs it contained. The answer was 12. That means 12, exactly, with no uncertainty. I am quite certain that there were 12±0 eggs in that carton. That’s my story, and I’m sticking with it.
Similarly, I don’t know everything there is to know about the moon, and I don’t know everything there is to know about cheese, but I am certain for all practical purposes that the moon is not made of green cheese.
On the other hand, there is a very wide class of observed quantities and calculated quantities to which some uncertainty attaches, and those are the main focus of today’s discussion. Some introductory examples are discussed in section 2.2.
Best current practice is to speak in terms of the uncertainty. We use uncertainty in a broad sense. Other terms such as accuracy, precision, readability, tolerance, etc. are often used as nontechnical terms ... but sometimes connote various sub-types of uncertainty, i.e. as contributions to the overall uncertainty, as discussed in section 8. In most of this document, the terms “precise” and “precision” will be used as generic, not-very-technical antonyms for “uncertain” and “uncertainty”.
The only way to really understand uncertainty is in terms of probability distributions. In section 10 we will give a deep and formal definition of probability, but in the meantime we will try to skate by using rough intuitive notions of probability, as set forth in the following examples.
As a first example, suppose we roll an ordinary six-sided die and observe the outcome. The first time we do the experiment, we observe six spots, which we denote by x1=6. The second time, we observe three spots, which we denote by x2=3. If we repeat the experiment many times, ideally we get the probability distribution X shown in figure 1. To describe the distribution X, we need to say three things: the outline of the distribution is rectangular, the distribution is centered at x=3.5, and the distribution has a half-width at half-maximum (HWHM) of 2.5 units (as shown by the red bar).
The conventional but somewhat abusive notation for describing such a situation is to write x=3.5±2.5, where x is called an “uncertain quantity”. If you want to know formally and precisely what sort of thing this “x” is, the question is only partially answerable. Obviously the intent of the expression x=3.5±2.5 is to describe the distribution X. However, the form of the expression makes x look like an outcome drawn from X, perhaps some sort of abstract “average” outcome. This is an example of form not following function. The notation is not super-terrible, because the intent is reasonably clear.
It is important to appreciate the distinction between the abstraction x=3.5±2.5 and the real outcomes such as x1=6 and x2=3. The outcome x1 is not an uncertain quantity; it has the value x1=6 with no uncertainty whatsoever. The so-called uncertain quantity x=3.5±2.5 describes the distribution from which outcomes such as x1 and x2 are drawn.
Now suppose we roll a pair of dice. The first time we do the experiment, we observe 8 spots, which we denote by x1=8. The second time, we observe 11 spots, which we denote by x2=11. If we repeat the experiment many times, ideally we get the probability distribution X shown in figure 2. To describe the distribution X, we need to say that the outline of the distribution is symmetrical and triangular, the distribution peaks at x=7, and the distribution has a half-width at half-maximum (HWHM) of 3 units (as shown by the red bar).
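The two dice examples can be checked by brute-force enumeration. The following is only a sketch: the helper name `sum_distribution` is made up for illustration, and the HWHM is taken here to mean half the distance between the outermost outcomes whose probability is at least half the peak value.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def sum_distribution(n_dice):
    """Exact probability of each total when rolling n_dice fair six-sided dice."""
    counts = Counter(sum(faces) for faces in product(range(1, 7), repeat=n_dice))
    total = 6 ** n_dice
    return {outcome: Fraction(c, total) for outcome, c in sorted(counts.items())}

def hwhm(dist):
    """Half the distance between the outermost outcomes at or above half the peak."""
    peak = max(dist.values())
    wide = [x for x, p in dist.items() if p >= peak / 2]
    return (max(wide) - min(wide)) / 2

one = sum_distribution(1)   # rectangular: every outcome has probability 1/6
two = sum_distribution(2)   # symmetric triangular, peaked at 7

print("one die :", {x: str(p) for x, p in one.items()}, "HWHM =", hwhm(one))
print("two dice:", {x: str(p) for x, p in two.items()}, "HWHM =", hwhm(two))
```

The enumeration reproduces the halfwidths quoted in the text: 2.5 for one die and 3 for a pair.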
There are many probability distributions in the world, including experimentally-observed probability distributions as well as theoretical probability distributions.
Any set of experimental observations {xi} can be considered a probability distribution unto itself. Typically we assign equal weight (i.e. equal measure, to use the technical term) to each of the observations. To visualize such a distribution, you can make a graph that shows how often xi falls within a given interval. Such a graph is called a histogram.
Oftentimes, given enough observations, the histogram will converge to some well-known theoretical probability distribution. (Or, better, the cumulative distribution will converge, as discussed in section 6.) For example, it is very common to encounter a piecewise-flat distribution as shown by the magenta curve in figure 3. This is also known as a square distribution, a rectangular distribution, or the uniform distribution over a certain interval. Distributions of this form are common in nature: For instance, if you take a snapshot of an ideal rotating wheel at some random time, all angles between 0 and 360 degrees will be equally probable. Similarly, in a well-shuffled deck of cards, all of the 52-factorial permutations are equally probable. As another example, roundoff errors commonly contribute an uncertainty that is uniform over the interval [-0.5, 0.5]; see equation 13. Other quantization errors (such as discrete drops coming from a burette) contribute an uncertainty that is more-or-less uniform over some interval (such as ± half a drop).
It is also very common to encounter a so-called “normal” distribution, also (preferably) called a Gaussian distribution. In figure 3, the black curve is one example of a Gaussian distribution, and the green curve is another example.
The following table lists a few well-known families of distributions. See section 9.9 for more on this.
| Family | # of parameters |
| Bernoulli | 1 |
| Poisson | 1 |
| Gaussian | 2 |
| Rectangular | 2 |
| Symmetric triangular | 2 |
| Asymmetric triangular | 3 |
Each name in the table applies to a family of distributions. Within each such family, to describe a particular member of the family (i.e. a particular distribution), it suffices to specify a few parameters. For a symmetrical two-parameter family, typically one parameter specifies the center-position and the second parameter has something to do with the halfwidth of the distribution. (The height of the curve is implicitly determined by the width, via the requirement that the area under the curve is always 1.0, as it must be for any well-behaved probability distribution.)
In particular, when we write A±B, that means A tells us the center and B tells us something about the halfwidth of the distribution.
We emphasize that B is more closely related to the halfwidth than to the full width, since the expression A±B means A plus-or-minus B, not plus-and-minus.
For a Gaussian distribution, conventionally B represents the standard deviation of the distribution; see section 13 for definitions and useful formulas. In figure 3, the standard deviation of the black curve is 1.0, and is depicted by a blue bar. Meanwhile, the HWHM (half-width at half-maximum) is depicted by a red bar. For any Gaussian, the HWHM is about 18% longer than the standard deviation – i.e. longer by a factor √(2 ln 2) ≈ 1.18, to be precise – as you can verify by plugging into equation 31 or equation 32.
For a square distribution, the expression A±B is somewhat ambiguous. In some circles, B denotes the halfwidth of the distribution, while in other circles, B denotes the standard deviation, which is very much shorter than the HWHM – shorter by a factor of √3, as you can verify from the definition, equation 30. You can visualize this in figure 3, since the black curve and the magenta curve have the same standard deviation.
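The two conversion factors just mentioned are easy to check numerically. This is a minimal sketch, assuming a unit-width Gaussian and a rectangular distribution of unit halfwidth; the variable names are mine.

```python
import math

# Gaussian: the density drops to half its peak where exp(-x**2 / (2*sigma**2)) = 1/2,
# i.e. at x = sigma * sqrt(2 ln 2).
sigma = 1.0
hwhm_gauss = sigma * math.sqrt(2 * math.log(2))

# Rectangular distribution of halfwidth a: the variance is a**2 / 3,
# so the standard deviation is a / sqrt(3).
a = 1.0
sigma_rect = a / math.sqrt(3)

print(f"Gaussian:    HWHM / sigma      = {hwhm_gauss / sigma:.4f}")
print(f"rectangular: sigma / halfwidth = {sigma_rect / a:.4f}")
```

So for a Gaussian the HWHM exceeds the standard deviation by a factor of roughly 1.18, while for a rectangular distribution the standard deviation is only about 0.58 of the halfwidth.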
Let’s be clear: An expression of the form A±B only makes sense provided everybody knows what family of distributions you are talking about, and provided it is a well-behaved two-parameter family. To say the same thing the other way: it is horrifically common for people to violate these provisos, in which case A±B doesn’t suffice to tell you what you need to know. For example: in figure 3, the black curve and the magenta curve have the same mean and the same standard deviation, but they are certainly not the same curve. Data that is well described by the black curve would not be well described by the magenta curve, nor vice versa.
There are, of course, plenty of cases where writing A±B does make sense.
It is important to keep in mind that A±B does not represent a number, but rather a probability distribution. You learned in grade-school how to add, subtract, multiply, and divide numbers ... but now you are being asked to add, subtract, multiply and divide probability distributions. This requires a tremendously higher level of sophistication.
The “significant figures” method attempts to use a single decimal numeral to express both the center and the halfwidth of a distribution: the nominal value of the numeral encodes the center, while the length of the string of digits roughly encodes the halfwidth. This is a horribly clumsy way of doing things.
Professionals avoid these problems by avoiding the whole idea of “significant figures”. Instead, they use two separate numerals, expressing the nominal value and the standard deviation separately. For example, NIST (reference 1) reports the charge of the electron as
$$ 1.602176462(63) \times 10^{-19}\ \text{coulombs} \tag{1} $$
which can equally well be written as
$$ (1.602176462 \pm 0.000000063) \times 10^{-19}\ \text{coulombs} \tag{2} $$
Note that these numbers depart from the usual “sig-digs rules” by a wide margin. The reported nominal value ends in not one but two fairly uncertain digits.
This is as it should be. People who know what they’re doing know that the “significant digits rules” are absurd.
For specific recommendations on what you should do, see section 4.2.
In a moment (section 4.2) we will discuss how to express a given amount of uncertainty. Before we get there, though, we must face a bigger challenge, namely figuring out how much uncertainty there is. Sometimes you are given some quantities (called inputs), and asked to calculate other quantities (called outputs). If you know the uncertainties of the inputs you can calculate the uncertainties of the outputs, by a process known as propagation of uncertainty.
To say the same thing in other words, we are learning a new type of arithmetic, i.e. performing computations on probability distributions rather than on simple numbers.
Ideally, the task of figuring out how much uncertainty there is should be independent of how you express the uncertainty. But in practice the two issues are sometimes related, because bad methods of expressing the uncertainty (e.g. the “sig figs rules”) commonly contribute to making the uncertainty worse. And at a more superficial level, people who don’t think clearly about how to express uncertainty are unlikely to think clearly about propagation of uncertainty.
However, we should not blame the “sig figs rules”, however bogus they may be, for all the world’s problems. There are plenty of other things that can contribute to making the uncertainty worse.
This subsection demonstrates why there cannot possibly be a good “all-purpose” rule for rounding off numbers.
Let’s start with an ultra-simple example
$$ x = (((2 + 0.4) + 0.4) + 0.4) + 0.4 \tag{3} $$
where each of the addends has an uncertainty of ±10%, normally and independently distributed.
Common sense suggests that the correct answer is x = 3.6 with some uncertainty. You might guess that the uncertainty is about 10%, but in fact it is less than 6%, as you can verify using the methods of section 3.5 or otherwise.
In contrast, the usual “significant digits rules” give the ludicrous result x=2. Indeed the “rules” set each of the parenthesized sub-expressions equal to 2.
This is a disaster. Not only do the “sig figs rules” get the answer wrong, they get it wrong by a huge margin. They miss the target by seven times the radius of the target!
Rounding off always introduces some error. This is called roundoff error or quantization error. The basic problem is that in all cases, the “significant digits rules” demand too much roundoff. An additional problem is that in this case (equation 3) the errors accumulate. We suffer from accumulation of errors during the multi-step calculation. The errors do not average out; they just accumulate. See section 3.2 for a way to solve part of the problem.
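Here is a small sketch of the arithmetic behind equation 3. It assumes the uncertainties are independent (so they add in quadrature), and it models the “sig figs rules” in one particular way that is my assumption: since the addend 2 has no digits after the decimal point, every partial sum gets rounded to the nearest integer.

```python
import math

terms = [2.0, 0.4, 0.4, 0.4, 0.4]
rel_sigma = 0.10   # each addend is uncertain by 10%, independently

nominal = sum(terms)
# Independent absolute uncertainties add in quadrature.
sigma = math.sqrt(sum((rel_sigma * t) ** 2 for t in terms))

# Assumed model of the sig-figs rules: round every partial sum to an integer.
sig_figs_result = terms[0]
for t in terms[1:]:
    sig_figs_result = round(sig_figs_result + t)

miss = (nominal - sig_figs_result) / sigma   # miss distance in units of sigma

print(f"proper answer : {nominal:.2f} +- {sigma:.2f} ({100 * sigma / nominal:.1f}%)")
print(f"sig-figs rules: {sig_figs_result} (off by {miss:.1f} sigma)")
```

Each intermediate sum 2.4 rounds back down to 2, so the four contributions of 0.4 never register at all.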
Let’s take another look at the multi-step calculation in equation 3.
Many people have discovered that they can perform multi-step calculations with much greater accuracy by using the following approach: At each intermediate step of the calculation, they use more digits than would be called for by the sig figs rules. (These extra digits are called guard digits.) They apply the sig figs rules only at the very last step, rounding only the final result. This introduces some roundoff error, but at least there was no accumulation of roundoff error. For a calculation with many, many steps this might be a huge, huge improvement.
Keeping a few guard digits reduces the roundoff error by a few orders of magnitude.
Guard digits are always good.
Guard digits do not, however, solve all the world’s problems. In particular, they do not solve all the problems with the sig figs rules:
Alas if you don’t propagate the uncertainty from step to step, when you get to the end of the multi-step calculation, in most cases you don’t know how much uncertainty to assign to the final result.
It is sometimes suggested that you should assume the final result has an uncertainty somehow commensurate with the uncertainty of the original givens. This is, alas, just a wild guess. Sometimes the uncertainty is much less than this, and sometimes much more, as in the example in section 3.8. So, to repeat, in most cases you just don’t know how much uncertainty to assign to the final result of a multi-step calculation.
The only way to solve this problem is to propagate the uncertainty from step to step without using sig figs. As always, the proper approach is to represent the uncertainty separately and explicitly.
The only way to solve this problem is to express the uncertainty of the final result without using sig figs. As always, the proper approach is to represent the uncertainty separately and explicitly.
When there is noise (i.e. uncertainty) in your raw data, guard digits don’t make the raw noise any smaller ... they just make the roundoff errors smaller.
See section 4.7 for more discussion of guard digits.
See section 8 for more discussion of various contributions to the uncertainty.
The example in equation 3 is certainly not the only example where the uncertainty in the final answer is less than the uncertainty in the raw data.
We now turn to a somewhat trickier example, where the nature of the problem is not quite so obvious. Again, for simplicity, let’s assume the data is normally distributed and uncorrelated. As we shall see shortly, roundoff errors can be quite serious even in this case.
Suppose each of the raw data points is uncertain at the 0.01 level. If we average 100 such points, the mean value will be uncertain at the 0.001 level. More generally, if we average N points, the mean will be less uncertain than the raw data by a factor of √N.
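The √N claim can be checked with a quick simulation. This is only a sketch, assuming Gaussian noise with zero mean; the trial count and the fixed seed are arbitrary choices of mine.

```python
import random
import statistics

random.seed(0)      # fixed seed so the run is repeatable
sigma_raw = 0.01    # uncertainty of each raw data point
n_points = 100      # number of points in each average
n_trials = 2000     # number of independent averages to compare

means = [
    statistics.fmean(random.gauss(0.0, sigma_raw) for _ in range(n_points))
    for _ in range(n_trials)
]
sigma_mean = statistics.stdev(means)

print(f"scatter of the raw points  : {sigma_raw}")
print(f"scatter of the 100-pt means: {sigma_mean:.5f}")
print(f"expected sigma / sqrt(N)   : {sigma_raw / n_points ** 0.5}")
```

The scatter of the means should come out close to 0.001, i.e. one tenth of the scatter of the raw points.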
We denote the ith raw data point by A(i) ± σA(i), where σA(i) is the noise. This noise is already present in the raw data.
Next, we round off each data point. That leaves us with something like A(i) ± RA(i) ± σA(i), where RA(i) is the roundoff error. It is easy to fall into situations where even though the σA(i) are independent and normally distributed, the RA(i) have a viciously lopsided non-normal distribution.
Just because the raw data is normally distributed doesn’t mean the roundoff errors will be normally distributed!
| For normally-distributed errors, when you add two numbers, the absolute errors add in quadrature, as discussed in section 3.7. That’s good, because it means errors accumulate relatively slowly, and errors can be reduced by averaging. | For a lopsided distribution of errors, such as can result from roundoff, the errors just plain add, linearly. This can easily result in disastrous accumulation of error. Averaging doesn’t help. |
This is illustrated by the example worked out in the “roundoff” spreadsheet (reference 2), as we now discuss. The first few rows and the last few rows of the spreadsheet are reproduced here. The numbers in red are seriously erroneous.
| row | raw data | Alice | Bob | Carol |
| 1 | 0.062 ± 0.018 | 0.062 ± 0.018 | 0.062 ± 0.018 | 0.06 ± 0.02 |
| 2 | 0.036 ± 0.018 | 0.098 ± 0.025 | 0.098 ± 0.025 | 0.10 ± 0.03 |
| 3 | 0.030 ± 0.018 | 0.128 ± 0.031 | 0.128 ± 0.031 | 0.13 ± 0.03 |
| 4 | 0.026 ± 0.018 | 0.154 ± 0.036 | 0.154 ± 0.036 | 0.16 ± 0.04 |
| ... | | | | |
| 98 | 0.026 ± 0.018 | 4.285 ± 0.178 | 4.36 ± 0.18 | 3.4 ± 0.2 |
| 99 | 0.044 ± 0.018 | 4.329 ± 0.179 | 4.40 ± 0.18 | 3.4 ± 0.2 |
| 100 | 0.021 ± 0.018 | 4.350 ± 0.180 | 4.42 ± 0.18 | 3.4 ± 0.2 |
| | | 4.35 ± 0.18 | 4.42 | 3.4 |
| | | = 4.35 ± 4.1% | | |
The leftmost column is a label giving the row number. The next column is the raw data. You can see that the raw data consists of numbers like 0.048±0.018 and you already see that we are departing from the usual “significant figures” nonsense. There is considerable uncertainty in the second decimal place, so you may be wondering why I am recording the data to three decimal places.
Answer: as will become clear very soon, it is important to keep that third decimal place. We are going to calculate the average of 100 such numbers, and the average will be known tenfold more accurately than any of the raw inputs.
To say the same thing in slightly different terms: there is in fact an important signal – a significant signal – in that third decimal place. The signal is obscured by noise; that is, there is a poor signal-to-noise ratio. Your mission, should you decide to accept it, is to recover that signal.
This sort of signal-recovery is at the core of many activities in real research labs, and in industry. The second thing I ever did in a real physics lab was to build a communications circuit that picked up a signal that was ten million times less powerful than the noise (SNR = -70 dB). Your typical GPS receiver deals with even worse SNRs – and the stuff that JPL puts in the Deep Space Network is mind-boggling. Throwing away the signal at the first step by “rounding” the raw data would be a Bad Idea.
Take-home message #1: Signals can be dug out from the noise. Uncertainty is not the same as insignificance, because a digit that is uncertain (and many digits to the right of that!) can be dug out by techniques such as signal-averaging. Given just a number and its uncertainty level, without knowing the context, you cannot say whether the uncertain digits are significant or not.
Take-home message #2: An expression such as 0.048 ± 0.018 expresses two quantities: the value of the signal, and an estimate of the noise. Combining these two quantities into a single numeral by rounding (according to the “significant figures rules”) is highly unsatisfactory. In cases like this, if you round to express the noise, you destroy the signal.
Now, returning to the numerical example: I assigned three students (Alice, Bob, and Carol) to analyze this data. Alice didn’t round any of the raw data or intermediate results. She got an average of
$$ 0.0435 \pm 0.0018 \tag{4} $$
and the main value (0.0435) is the best that could be done given the points that were drawn from the ensemble. (The error-estimate is a worst-case error; the probable error is somewhat smaller.)
Meanwhile, Bob was doing fine until he got to row 31. At that point he decided it was ridiculous to carry four figures (three decimal places) when the estimated error was more than 100 counts in the last decimal place. He figured that if he rounded off one digit, there would still be at least ten counts of uncertainty in the last place. He figured that would give him not only “enough” accuracy, but would even give him a guard digit for good luck.
Alas, Bob was not lucky. Part of his problem is that he assumed that roundoff errors would be random and would add in quadrature. In this case, they aren’t and they don’t. The errors accumulate linearly (not in quadrature) and cause Bob’s answer to be systematically high. The offset in the answer in this case is slightly less than the error bars, but if we had averaged a couple hundred more points the error would have accumulated to disastrous levels.
Carol was even more unlucky. She rounded off her intermediate results so that every number on the page reflected its own uncertainty (one count, possibly more, in the last digit). In this case, her roundoff errors accumulate in the “down” direction, with spectacularly bad effects.
The three students turned in the following “bottom line” answers:
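$$
\begin{aligned}
\text{Alice:}\quad & 0.0435 \pm 0.0018 \\
\text{Bob:}\quad & 0.0442 \pm 0.0018 \\
\text{Carol:}\quad & 0.034 \pm 0.002
\end{aligned}
\tag{5}
$$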
Note that Alice, Bob, and Carol are all analyzing the same raw data; the discrepancies between their answers are entirely due to the analysis, not due to the randomness with which the data was drawn from the ensemble.
Take-home message #3: Do not assume that roundoff errors are random. Do not assume that they add in quadrature. It is waaaay too easy to run into situations where they accumulate nonrandomly, introducing a bias into the result. Sometimes the bias is obvious, sometimes it’s not.
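To see how this sort of bias arises, here is a deterministic sketch of the three strategies. The readings are made up for illustration (they are NOT the values in the spreadsheet of reference 2), and the rounding schedules are my simplified guess at what Bob and Carol did: Bob rounds the running total to two decimal places from row 31 onward, while Carol rounds to two places at first and to one place from row 31 onward.

```python
from decimal import Decimal, ROUND_HALF_UP

# Made-up readings (NOT the author's spreadsheet data): five values repeated 20 times.
raw = [Decimal(v) for v in ("0.026", "0.044", "0.062", "0.036", "0.048")] * 20

def running_total(data, quantum_for_row):
    """Add up the data. After each row, round the running total to
    quantum_for_row(row), or leave it exact if that returns None."""
    total = Decimal(0)
    for row, value in enumerate(data, start=1):
        total += value
        quantum = quantum_for_row(row)
        if quantum is not None:
            total = total.quantize(quantum, rounding=ROUND_HALF_UP)
    return total

# Alice never rounds; Bob and Carol follow the assumed schedules described above.
alice = running_total(raw, lambda row: None)
bob = running_total(raw, lambda row: Decimal("0.01") if row >= 31 else None)
carol = running_total(raw, lambda row: Decimal("0.1") if row >= 31 else Decimal("0.01"))

print("Alice:", alice)
print("Bob:  ", bob)
print("Carol:", carol)
```

On this made-up data Alice gets 4.320, Bob gets 4.37 (biased high), and Carol gets 2.7 (biased far low). In Carol's case every addend smaller than 0.05 simply vanishes once she is rounding to one decimal place, so the errors pile up in one direction instead of averaging out.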
Important note: computer programs and hand calculators round off the data at every step. IEEE 64-bit floating point is slightly better than 15 decimal places, which is enough for most purposes but not all. Homebrew numerical integration routines are particularly vulnerable to serious errors arising from accumulation of roundoff errors.
One of the things that contributes to Bob’s systematic error can be traced to the following anomaly: Consider the number 0.448. If we round it off, all at once, to one decimal place, we get 0.4. But if we round it off in two steps, we get 0.45 (correct to two places) which we then round off to 0.5. This can be roughly summarized by saying that the roundoff rules do not have the associative property. If you have this problem, you might find it amusing to try the round-to-even rule: round the fives toward even digits. That is, 0.75 rounds up to 0.8, but 0.65 rounds down to 0.6. There are cases where this is imperfect (e.g. 0.454) but it’s better overall, it’s easy to implement, and it has a pleasing symmetry. (This rule has been invented and re-invented many times; I re-invented it myself when I was in high school.) However, you should not imagine that this will solve all your problems. In any situation where fiddling with the roundoff rules makes any significant difference in the results, you are in serious trouble. The situation is overly burdened by roundoff errors, and fiddling with the roundoff rules will only affect the tip of the iceberg. The only real solution is to use more precision (more guard digits) during the calculation. If the rounding is part of a purely mathematical exercise, keep tacking on guard digits until the result is no longer sensitive to the details of the roundoff rules. If the rounding is connected to experimental data, consider redesigning the experiment so that less rounding is required, perhaps by nulling out a common-mode signal early in the process. This might be done using a bridge, or phaselock techniques, or the like.
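The two-step rounding anomaly and the round-to-even rule can be tried out with Python's standard `decimal` module. This is just a sketch; the helper `two_step` is my own name for "round to two places, then to one place".

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

TENTHS = Decimal("0.1")
HUNDREDTHS = Decimal("0.01")

def two_step(x, mode):
    """Round to two decimal places, then round that result to one place."""
    return x.quantize(HUNDREDTHS, rounding=mode).quantize(TENTHS, rounding=mode)

x = Decimal("0.448")
direct = x.quantize(TENTHS, rounding=ROUND_HALF_UP)   # one step: 0.4
staged_up = two_step(x, ROUND_HALF_UP)                # 0.45, then 0.5
staged_even = two_step(x, ROUND_HALF_EVEN)            # 0.45, then 0.4

# The examples from the text: fives go toward the even digit.
up = Decimal("0.75").quantize(TENTHS, rounding=ROUND_HALF_EVEN)    # 0.8
down = Decimal("0.65").quantize(TENTHS, rounding=ROUND_HALF_EVEN)  # 0.6

# The imperfect case: 0.454 rounds directly to 0.5, but in two steps to 0.4.
y = Decimal("0.454")
y_direct = y.quantize(TENTHS, rounding=ROUND_HALF_EVEN)
y_staged = two_step(y, ROUND_HALF_EVEN)

print(direct, staged_up, staged_even, up, down, y_direct, y_staged)
```

Using `Decimal` rather than binary floats matters here, because a value such as 0.65 is not exactly representable in binary and the tie would be broken by the representation error rather than by the rounding rule.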
You can play with the spreadsheet yourself. For fun, see if you can fiddle the formulas so that Bob’s bias is downward rather than upward. Save the spreadsheet (reference 2) to disk and open it with your favorite spreadsheet program.
Notes:
Additional constructive suggestions and rules of thumb:
There exist very detailed guidelines for rounding off if that turns out to be necessary.
Here’s a simple yet powerful way of estimating the uncertainty of a result, given the uncertainty of the thing(s) it depends on.
Set up the calculation. Do it once in the usual way, using the nominal, best-estimate values. Then pick one input variable that you reckon makes the dominant contribution to the uncertainty of the result. Then re-do the calculation with this one variable at the top end of its error bar. Then do it again at the bottom end of the error bar.
I call this the Crank Three Times method. Here is an example:
$$
\begin{aligned}
1/2.02 &= 0.495 \\
1/2.00 &= 0.500 \\
1/1.98 &= 0.505
\end{aligned}
\tag{6}
$$
Equation 6 tells us that if x is distributed according to x = 2±.02 then 1/x is distributed according to 1/x = .5±.005. Equivalently we can say that if x = 2±1% then 1/x = .5±1%. We remark in passing that the percentage uncertainty (aka the relative uncertainty) is the same for x and 1/x, which is what we expect provided the uncertainty is small.
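In code, the method amounts to three function evaluations. The following is a minimal sketch; the function name `crank_three_times` and its return convention are my own, and it assumes the function is monotonic across the error bar.

```python
def crank_three_times(f, nominal, halfwidth):
    """Evaluate f at the nominal input and at both ends of its error bar.

    Returns (nominal result, upward excursion, downward excursion).
    Assumes f is monotonic between the two ends of the error bar.
    """
    center = f(nominal)
    ends = (f(nominal - halfwidth), f(nominal + halfwidth))
    return center, max(ends) - center, center - min(ends)

center, up, down = crank_three_times(lambda x: 1 / x, 2.0, 0.02)
print(f"1/x = {center:.4f} (+{up:.4f} -{down:.4f})")
```

Reporting the upward and downward excursions separately costs nothing, and it means the same function still gives sensible output when the error bars come out lopsided.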
The Crank Three Times method is a type of “what if” analysis. We can also consider it a simple example of an iterative method of estimating the uncertainty (in contrast to the algebraic methods described in section 3.7). This simple method is a nice lead-in to fancier iterative methods such as Monte Carlo, as discussed in section 3.5.
The Crank Three Times method is by no means an exact error analysis. It is an approximation. The nice thing is that you can understand the nature of the approximation, and you can see that better and better results are readily available (for a modest price).
As far as I can tell, for every flaw that this method has, the sig-figs method has the same flaw plus others ... which means Crank Three Times is Pareto superior.
This method requires no new software, no learning curve, and no new concepts beyond the concept of uncertainty itself. In particular, unlike significant digits, it introduces no wrong concepts.
Crank Three Times shouldn’t require more than a few minutes of labor. Once a problem is set up, turning the crank should take only a couple of minutes; if it takes longer than that you should have been doing it on a spreadsheet all along. And if you are using a spreadsheet, Crank Three Times is super-easy and super-quick.
If you have N variables that are (or might be) making a significant contribution to the uncertainty of the result, set up the spreadsheet and wiggle each variable in turn, and see what happens. Wiggle them one at a time, leaving the other N−1 at their original, nominal values. If you are worried about what happens when more than one variable at a time takes on a non-nominal value, the sig figs approach is hopeless, and algebraic methods are painfully impractical. In the rare situation where you want a worst-case analysis, you can move each variable to whichever end of its error bar makes a positive contribution to the final answer, and then flip them all so that each one makes a negative contribution. In most cases, however, a worst-case analysis is wildly over-pessimistic, especially when there are more than a few uncertain variables.
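The one-at-a-time wiggle can be sketched as follows. Everything here is hypothetical: the helper `wiggle_each`, the example function `density`, and the input values are invented purely for illustration.

```python
def wiggle_each(f, nominal, halfwidths):
    """One-at-a-time what-if analysis.

    Moves each input to the low end and then the high end of its error bar,
    holding all the other inputs at their nominal values.
    Returns (nominal result, {name: (change at low end, change at high end)}).
    """
    base = f(**nominal)
    report = {}
    for name, halfwidth in halfwidths.items():
        changes = []
        for sign in (-1, +1):
            trial = dict(nominal)
            trial[name] = nominal[name] + sign * halfwidth
            changes.append(f(**trial) - base)
        report[name] = tuple(changes)
    return base, report

# Hypothetical example: density = mass / volume, with made-up inputs.
def density(mass, volume):
    return mass / volume

base, report = wiggle_each(
    density,
    nominal={"mass": 10.0, "volume": 4.0},
    halfwidths={"mass": 0.1, "volume": 0.2},
)
print("nominal:", base)
for name, (low, high) in report.items():
    print(f"{name:>6}: {low:+.4f} at low end, {high:+.4f} at high end")
```

In this made-up case the volume dominates the uncertainty, which is the kind of thing you want to know before deciding where to spend effort on better measurements.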
In most cases, especially when there are multiple uncertain variables and/or correlations among the variables and/or nonlinearities, your only reasonable option is Monte Carlo, as discussed in section 3.5.
Here is another example, which is more interesting because it exhibits nonlinearity:
$$
\begin{aligned}
1/2.9 &= 0.345 \\
1/2.0 &= 0.500 \\
1/1.1 &= 0.909
\end{aligned}
\tag{7}
$$
Equation 7 tells us that if x is distributed according to x = 2±.9 then 1/x is distributed according to 1/x = .5(+.41−.16). Equivalently we can say that if x = 2±45% then 1/x = .5(+82%−31%). Even though the error bars on x are symmetric, the error bars on 1/x are markedly lopsided.
Lopsided error bars are fairly common in practice. They can arise whenever nonlinearities are involved. Note that the Crank Three Times method can handle nonlinearities just fine. This is vastly superior to the manual, algebraic methods discussed in section 3.7, which treat everything as approximately linear. That is, they effectively expand everything in a Taylor series, and keep only the zeroth-order and first-order terms.
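For comparison, here is what the first-order (linear) approximation gives for the same example, x = 2 ± 0.9. This is just the standard first-derivative estimate, written out as a sketch:

```latex
\delta\!\left(\frac{1}{x}\right)
  \approx \left|\frac{d}{dx}\,\frac{1}{x}\right| \delta x
  = \frac{\delta x}{x^{2}}
  = \frac{0.9}{2^{2}}
  = 0.225
\qquad\Longrightarrow\qquad
\frac{1}{x} \approx 0.5 \pm 0.225
```

The linear estimate is symmetric by construction, so it cannot reproduce the lopsided +.41 and −.16 that come from actually turning the crank.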
Here is yet another example, which is interesting because it shows how to handle correlated uncertainties in simple cases. The task is to calculate the molar mass of natural bromine, given the nuclide mass for each isotope, and the corresponding natural abundance.
The trick here is to realize that the abundances must add up to 100%. So if one isotope is at the low end of its error bar, the other isotope must be at the high end of its error bar. So the abundance numbers are anticorrelated. This is an example of a sum rule. For more about correlations and how to handle them, see section 3.5.
(The uncertainties in the mass of each nuclide are negligible.)
| nuclide | nuclide mass / dalton | natural abundance | abundance case | light case | nominal case | heavy case |
| 79Br | 78.9183376(20) | (50.686 + 0.026)% | more | 40.02107 | | |
| 79Br | 78.9183376(20) | 50.686% | nominal | | 40.00055 | |
| 79Br | 78.9183376(20) | (50.686 − 0.026)% | less | | | 39.98003 |
| 81Br | 80.9162911(30) | (49.314 + 0.026)% | more | | | 39.92410 |
| 81Br | 80.9162911(30) | 49.314% | nominal | | 39.90306 | |
| 81Br | 80.9162911(30) | (49.314 − 0.026)% | less | 39.88202 | | |
| sum | | | | 79.90309 | 79.90361 | 79.90413 |
So by comparing the three columns (light case, nominal case, and heavy case), we find the bottom-line answer: The computed molar mass of natural bromine is 79.90361(52). This is the right answer based on a particular sample of natural bromine. The “textbook” value is usually quoted as 79.904(1), which has nearly twice as much uncertainty, in order to account for sample-to-sample variability.
Note that if you tried to carry out this calculation using “significant figures” you would get the uncertainty wrong. Spectacularly wrong. Off by two orders of magnitude. The relative uncertainty in the molar mass is two orders of magnitude smaller than the relative uncertainty in the abundances.
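The bromine calculation is easy to reproduce. In this sketch the sum rule is built in by computing the 81Br abundance as one minus the 79Br abundance, so the two abundances are automatically anticorrelated. The variable names are mine, and the tiny uncertainties in the nuclide masses are ignored.

```python
masses = {"79Br": 78.9183376, "81Br": 80.9162911}   # dalton
abundance_79 = 0.50686    # nominal abundance of 79Br, as a fraction
delta = 0.00026           # uncertainty in that abundance

def molar_mass(frac_79):
    # Sum rule: the two abundances must add up to 100%.
    frac_81 = 1.0 - frac_79
    return masses["79Br"] * frac_79 + masses["81Br"] * frac_81

light = molar_mass(abundance_79 + delta)    # more of the light isotope
nominal = molar_mass(abundance_79)
heavy = molar_mass(abundance_79 - delta)    # more of the heavy isotope

print(f"light   {light:.5f}")
print(f"nominal {nominal:.5f}")
print(f"heavy   {heavy:.5f}")
print(f"half-spread {(heavy - light) / 2:.5f}")
```

Because of the sum rule, shifting abundance from one isotope to the other changes the result only by the shift times the difference between the two masses, which is why the final uncertainty is so small.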
This is based on question 3:21 on page 122 of reference 4.
Suppose we want to calculate (as accurately as possible) the molar mass of natural magnesium, given the mass of the various isotopes and their natural abundances.
Many older works referred to this as the atomic mass, or (better) the average atomic mass ... but the term molar mass is strongly preferred. For details, see reference 5.
The textbook provides the raw data shown in table 1.
| isotope | molar mass / dalton | abundance |
| 24Mg | 23.9850 | 78.99% |
| 25Mg | 24.9858 | 10.00% |
| 26Mg | 25.9826 | 11.01% |
Table 1: Isotopes of Magnesium, Rough Raw Data
The textbook claims that the answer is 24.31 dalton and that no greater accuracy is possible. But we can get a vastly more accurate result.
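The approach in the textbook has at least three problems: the primary problem is that it does the propagation-of-uncertainty calculation without taking the sum rule into account; the secondary problem is that the “sig digs” rules compel the non-use of guard digits; and the tertiary problem is that the “sig digs” notation expresses the uncertainty very imprecisely.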
It is tempting to blame all the problems on the “sig digs” notation, but that wouldn’t be fair in this case. The primary problem is mis-accounting for the uncertainty, and as we shall see, we are still vulnerable to mis-accounting even if the uncertainty is expressed using proper notation.
Similarly note that if we had solved the primary problem (getting a good estimate of the uncertainty) then the “sig digs” rules would not have called for such drastic rounding. So the propagation-of-error issue really is primary.
Let’s deal with the tertiary problem first. Let’s convert to a more reasonable notation for expressing the uncertainty. The problem is that the “sig digs” notation in table 1 gives us only the crudest idea of the uncertainty ... is it one count in the last decimal place, or two, or many? If we use only the numbers presented in the textbook, we have to guess. Let’s hypothesize a middle-of-the-road value, namely three counts of uncertainty in the last decimal place. We can express this in proper notation, as shown in table 2.
| isotope | molar mass / dalton | abundance |
| 24Mg | 23.9850(3) | 78.99(3)% |
| 25Mg | 24.9858(3) | 10.00(3)% |
| 26Mg | 25.9826(3) | 11.01(3)% |
Table 2: Isotopes of Magnesium, Rough Data with Explicit Uncertainty
This gives the molar mass of the 25Mg isotope with a relative accuracy of 12 parts per million (12 ppm), while the abundance is given with a relative accuracy of 3 parts per thousand (3000 ppm). So in some sense, the abundance number is 250 times less accurate.
It is important to notice that all three isotope masses are in the same ballpark. That means that uncertainties in the abundance numbers will have little effect on the sought-after average mass. Imagine what would happen if all three isotopes had the same identical mass. Then the percentages wouldn’t matter at all; we would know the average mass with 12 ppm accuracy, no matter how inaccurate the percentages were.
If we look deeper, we discover another interesting point that is absolutely necessary if we are to benefit from the aforementioned “ballpark” property: it is important that the percentage numbers are in fact percentages, so they add up to 100%. We say there is a sum rule.
That means the uncertainty in any one of the abundance numbers is strongly anticorrelated with the uncertainty in the other two. The usual elementary “propagation of uncertainty” rules don’t take this into account; instead, they rashly assume that all errors are uncorrelated. If you just add up the abundance numbers without realizing they are percentages, i.e. without any sum rule, you get
| 78.99(3) + 10.00(3) + 11.01(3) = 100.00(5) ??? | (8) |
with 500 ppm uncertainty, but the sum rule tells us they actually add up to 100 with some utterly negligible uncertainty:
| 78.99(3) + 10.00(3) + 11.01(3) = 100.0±0 | (9) |
because if you add up all the possibilities you have to get 100%. Maybe there is some fourth, hitherto-unknown isotope that makes the LHS of equation 9 less than 100%, but any such contribution is utterly negligible relative to the other uncertainties in the problem.
There are various ways to alleviate this problem.
One method, as pointed out by Matt Sanders, is to subtract off the common-mode contribution by artfully regrouping the terms in the calculation. That is, you can subtract 25 (exactly) from each of the masses in table 2, then take the weighted average of what’s left in the usual way, and then add 25 (exactly) to the result. The differences in mass are on the order of unity, i.e. 25 times smaller than the masses themselves, so this trick makes us 25 times less sensitive to problems with the percentages. We are still mis-accounting for the correlated uncertainties in the percentages, but the mis-accounting does 25 times less damage.
The idea of subtracting off the common-mode is a good one, and has many applications. The idea was applied here to a mathematical calculation, but it also applies to the design of experimental apparatus: for best accuracy, make a differential measurement or a null measurement whenever you can.
To summarize, subtracting off the common-mode is a good trick, but (a) it requires understanding the problem to some extent, (b) it only works if the problem is linear, and (c) it doesn’t entirely solve the problem, because it doesn’t fully exploit the sum rule.
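To see roughly how much the regrouping helps, here is a minimal Python sketch using the table 2 data. It deliberately applies the elementary uncorrelated propagation rule to the abundances only (which is the mis-accounting under discussion) and compares the result with and without subtracting 25. The helper names are invented for this illustration.

```python
import math

masses = [23.9850, 24.9858, 25.9826]   # dalton, from table 2
abund = [0.7899, 0.1000, 0.1101]       # abundances as fractions
SIGMA_A = 0.0003                       # three counts in the last place

def weighted_mean(ms, offset=0.0):
    """Subtract offset from each mass, average, then add offset back.
    Adding it back unchanged is valid because the abundances sum to 1."""
    return offset + sum(a * (m - offset) for a, m in zip(abund, ms))

def naive_sigma(ms):
    """Elementary rule: treat the abundance errors as uncorrelated."""
    return math.sqrt(sum((m * SIGMA_A) ** 2 for m in ms))

direct = weighted_mean(masses)
regrouped = weighted_mean(masses, offset=25.0)
sigma_direct = naive_sigma(masses)
sigma_regrouped = naive_sigma([m - 25.0 for m in masses])

print(f"nominal value: {direct:.5f} vs {regrouped:.5f}")
print(f"naive sigma:   {sigma_direct:.5f} vs {sigma_regrouped:.5f}")
print(f"improvement factor: {sigma_direct / sigma_regrouped:.0f}")
```

The nominal value is unchanged by the regrouping; only the damage done by the naive uncertainty estimate shrinks, by a factor of a few tens.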
We now turn to a completely different technique, namely Monte Carlo.
This has many advantages. It is a very general and very powerful technique. It can be applied to nonlinear problems. It is flexible enough to allow us to exploit the sum rule explicitly.
Remember (as mentioned in section 2) that an uncertain quantity is really a probability distribution. There are many ways of representing a probability distribution. We could represent it parametrically (specifying the center and standard deviation). Or we could represent it graphically. Or (!) we could represent it by a huge sample, i.e. a huge list of observations drawn according to the distribution.
The representation in terms of a huge sample is often considered an inelegant, brute-force technique, to be used when you don’t understand the problem ... but sometimes brute force has an elegance all its own. Doing this problem analytically requires a great deal of sophistication (calculus, statistics and all that) and even then it’s laborious and error-prone. The Monte Carlo approach just requires knowing one or two simple tricks, and then the computer does all the work.
You can download the spreadsheet for solving the Mg molar mass question. See reference 6.
The strategy is to treat each of the uncertain quantities in table 3 as a probability distribution, and to represent each distribution by 100 observations. Using these observations, we make 100 independent trial calculations of the average mass, and then compute the mean and standard deviation of these 100 trial values.
Since we’re going to all that trouble, we might as well use the best available data, taken from reference 7, as shown in table 3. (Using the rough textbook data – i.e. table 2 – would lead to nearly the same results, as you can verify by plugging it into the spreadsheet.)
| isotope | molar mass / dalton | abundance |
| 24Mg | 23.9850423(8) | 78.99(4)% |
| 25Mg | 24.9858374(8) | 10.00(1)% |
| 26Mg | 25.9825937(8) | 11.01(3)% |
Table 3: Isotopes of Magnesium, IUPAC Data
In the trial calculations on the spreadsheet, all of the points are drawn independently, except that the numbers representing the 24Mg abundance are not independent at all, but instead are calculated from the other two abundance numbers using the sum rule.
The final answer appears in cells H5 and H6, namely 24.3050(6).
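For readers who prefer code to spreadsheets, the same strategy can be sketched in Python. This is an illustration, not a transcription of the spreadsheet in reference 6: it assumes every uncertain input is Gaussian with the standard deviation given in table 3, it uses more trials than the spreadsheet does, and the names (`trial`, `rng`, and so on) are invented here.

```python
import random
import statistics

rng = random.Random(12345)   # fixed seed so the run is repeatable
N = 20_000                   # number of trials (the spreadsheet uses 100)

# (mean, standard deviation) pairs taken from table 3
M24 = (23.9850423, 8e-7)
M25 = (24.9858374, 8e-7)
M26 = (25.9825937, 8e-7)
A25 = (0.1000, 0.0001)
A26 = (0.1101, 0.0003)

def trial():
    """One trial calculation of the average mass."""
    m24, m25, m26 = (rng.gauss(*m) for m in (M24, M25, M26))
    a25 = rng.gauss(*A25)
    a26 = rng.gauss(*A26)
    a24 = 1.0 - a25 - a26    # sum rule: not drawn independently
    return a24 * m24 + a25 * m25 + a26 * m26

trials = [trial() for _ in range(N)]
mean = statistics.fmean(trials)
sd = statistics.stdev(trials)
print(f"molar mass = {mean:.4f} +/- {sd:.4f} dalton")
```

The key line is the one that computes `a24` from the other two abundances; everything else is drawn independently.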
Technical notes:
If you compare my value against the IUPAC standard value given in reference 8, you find that the Monte Carlo calculation gives the same nominal value and the same standard deviation. That’s encouraging.
Pretend that we didn’t have a sum rule. That is, pretend that the abundance data consisted of three independent random variables, with standard deviations as given in table 2. Modify the spreadsheet accordingly. Observe what happens to the nominal value and the uncertainty of the answer. How important is the sum rule?
Hint: There’s an entire column of independent Gaussian random numbers lying around unused in the spreadsheet.
To summarize: As mentioned near the top of section 3.5, the textbook has at least three problems: Primarily, it does the propagation-of-uncertainty calculations without taking the sum rule into account (which is a huge source of error). Then the dreaded “sig digs” rules make things worse in two ways: they compel the non-use of guard digits, and they express the uncertainty very imprecisely.
The textbook answer is 24.31 dalton, with whatever degree of uncertainty is implied by that number of “sig digs”.
We now compare that with our preferred answer, 24.3050(6) dalton. Our standard deviation is 25 ppm; theirs is something like one part per thousand (although we can’t be sure). In any case, their uncertainty is about 40 times worse than ours.
Their nominal value differs from our nominal value by something like eight times the length of our error bars. Actually the factor is somewhere between 2.5 and 25; we can’t be sure because of the crudity of the halfwidth information in the textbook data (table 1). In any case, it’s a huge discrepancy.
Suppose I’m measuring the sizes of some blocks using a ruler. The ruler is graduated in millimeters. If I look closely, I can measure the blocks more accurately than that, by interpolating between the graduations. As pointed out by Michael Edmiston, sometimes the situation arises where it is convenient to interpolate to the nearest 1/4th of a millimeter. Imagine that the blocks are slightly misshapen so that it is not possible to interpolate more accurately than that.
Let’s suppose you look in my lab notebook and find a column containing the following numbers:
| 40 |
| 40.25 |
| 40.75 |
| 41 |
Table 4: Length of Blocks, Raw Data
and somewhere beside the column is a notation that all the numbers are good to 1/4th of a millimeter.
If we worshipped the “sig digs rules” we would say that the first number (40) had one “sig dig” and therefore had an uncertainty of a few dozen units. But that would be wrong. The actual uncertainty is a hundred times smaller than that. The lab book says the uncertainty is 1/4th of a unit, and it means what it says.
At the other end of the spectrum, the fact that I wrote 40.75 with two digits beyond the decimal point does not mean that it is accurate to a few percent of a millimeter. The actual uncertainty is ten times larger than that. The lab book says that all the numbers are good to 1/4th of a millimeter, and that’s the end of the story.
The numbers in table 4 are perfectly suitable for typing into a computer for further processing. Other ways of recording are also suitable, but it is entirely within my discretion to choose among the various suitable formats that are available.
The usual ridiculous “significant digits rules” would compel me to round off 40.75 to 40.8. That changes the nominal value by 0.05mm. That shifts the distribution by 20% of its half-width. Twenty percent seems like a lot. It may or may not be harmless. In contrast, writing 3/4ths as .75 cannot cause problems, might be better than rounding off, and costs nothing.
Bottom line: Paying attention to the “sig digs rules” is unnecessary at best. Record the nominal value and the uncertainty separately. Keep many enough digits to make sure there is no roundoff error. Keep few enough digits to be reasonably convenient. Keep all the raw data. See section 4.2 for more details.
Even more-extreme examples can be found. Many rulers are graduated in 1/8ths of an inch. When measuring things to the nearest 1/8th, it is often convenient and sensible to write things to three decimal places, e.g. 4.375 in. If we worshipped the “sig digs rules” we might think such a number was accurate to within a few thousandths of an inch, but that would be completely wrong. The actual uncertainty is one or two orders of magnitude larger.
Any time your measurements are quantized with a step-size that doesn’t divide 10 evenly, the “sig digs rules” will mess things up.
I’ve seen alleged rules that say you should read instruments by interpolating to 1/10th of the finest scale division, and/or that the precision of the instrument is 1/10th of the finest scale division. There is no reasonable basis for any such rule. For obvious reasons, instruments are typically calibrated in conventional units (e.g. SI units) times some power of ten. If the readability and/or precision of the instrument happens to coincide with some unit times a power of ten, it’s probably a coincidence.
Usually, the fundamental limit of readability is set by some sort of noise, fluctuations, or fuzz. That makes sense, because if the reading were not fuzzy, you could just apply some magnification and get more accuracy for free.
People often ask for rules for calculating uncertainty by hand. In general, I recommend against it, because in almost all cases you’re better off using an iterative approach: perhaps the Crank Three Times method discussed in section 3.4, or if that’s not good enough, the Monte Carlo method as discussed in section 3.5. Remember, you don’t have to re-invent all the Monte Carlo technology on your own; just copy the existing spreadsheet (reference 6) and re-jigger it to do what you want.
However ... if you insist on doing things without a computer, I will now provide some rough-and-ready techniques you can use. I assume you already know how to add, subtract, multiply, and divide numbers, so we will now discuss how to add, subtract, multiply, and divide probability distributions, subject to certain restrictions.
Each of the capital-letter quantities here (A, B, and C) is a probability distribution. We can write A := m_A ± σ_A, where m_A is the mean and σ_A is the standard deviation.
Remarks:
Suppose somebody who adheres to the sig-digs cult asks you to work the following problem:³
| 4.4 × 2.617 − 9.064 (10) |
Each of the three quantities involved has some uncertainty, so your first task is to figure out how much uncertainty. Another way of saying it is that each of those quantities looks like a number but is really a probability distribution in disguise, and you have to figure out the width of the distribution. One semi-reasonable guess is that each quantity has about three counts of uncertainty in the last digit. But it could be a lot more, or a lot less ... you never know, which is one of the abominable things about significant figures.
So let’s make that guess, and restate the problem as 4.4±.3 × 2.617±.003 − 9.064±.003.
Using the usual precedence rules, we do the multiplication first. According to the propagation rules in section 3.7, we will need to convert the absolute uncertainties to relative uncertainties. That gives us: 4.4±6.82% × 2.617±0.1%. When we carry out the multiplication, the result is 11.5148±6.82%. Note that the uncertainty in the product is entirely dominated by the uncertainty in the first factor, because the uncertainty in the other factor is relatively small.
Next we convert back from relative to absolute uncertainties, then carry out the subtraction. That results in 11.5148±0.785 − 9.064±.003 = 2.4508±0.785.
Now we have to decide how to present this result. One reasonable possibility would be to round it to 2.45±0.79 or equivalently 2.45(79). One could also justify heavier rounding, to 2.5(8). Note that this version differs from the previous version by only 6% of an error bar, so the extra rounding wasn’t particularly disastrous.
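The two steps of this hand calculation can be mirrored in a short Python sketch. The rules coded here are assumptions on my part, since the table of rules is not reproduced in this passage: for uncorrelated quantities with small uncertainties, relative uncertainties are combined in quadrature for a product, and absolute uncertainties are combined in quadrature for a difference. The function names `times` and `minus` are invented for the illustration.

```python
import math

def times(a, b):
    """Product of two uncorrelated (mean, sigma) pairs:
    relative uncertainties combine in quadrature."""
    (ma, sa), (mb, sb) = a, b
    mean = ma * mb
    rel = math.hypot(sa / ma, sb / mb)
    return mean, abs(mean) * rel

def minus(a, b):
    """Difference of two uncorrelated (mean, sigma) pairs:
    absolute uncertainties combine in quadrature."""
    (ma, sa), (mb, sb) = a, b
    return ma - mb, math.hypot(sa, sb)

product = times((4.4, 0.3), (2.617, 0.003))
result = minus(product, (9.064, 0.003))

print(f"product = {product[0]:.4f} +/- {product[1]:.3f}")
print(f"result  = {result[0]:.4f} +/- {result[1]:.3f}")
print(f"relative uncertainty of result = {result[1] / result[0]:.0%}")
```

Because one input dominates the uncertainty, combining in quadrature gives essentially the same width as simply carrying the dominant term through.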
Trying to express the foregoing result using sig digs would be a nightmare, as discussed in more detail in section 11.7.
The calculation set forth in equation 10 is an example of what we call a noise amplifier. We started with three numbers, one of which had about 7% relative uncertainty, and the others much less. We ended up with about 32% relative uncertainty.
It appears that the uncertainty grew during the calculation, but you should not blame the calculation. The calculation did not cause the uncertainty; it merely made manifest the uncertainty that was inherent in the situation from the beginning.
As a rule of thumb: Any time you compute a small difference between large numbers, the relative uncertainty will be magnified.
If you have a noise amplifier situation that results in unacceptable uncertainty in the final answer, you will need to make major changes and start over. In some cases, it suffices to make a more precise measurement of the raw data. In other cases, you will need to make major architectural changes in the experimental apparatus and procedures, perhaps using some sort of “null” technique (electrical bridge, acoustical beats, etc.) so that subtracting off such a large “baseline” number is not required.
Suppose you are taking data. How many raw data points should you take? How accurately should you measure each point? There are reliable schemes for figuring out how much is enough. But the reliable schemes are not simple, and the simple schemes are not reliable. Any simple rule like “Oh, just measure everything to three significant digits and don’t worry about it” is highly untrustworthy. Some helpful suggestions will be presented shortly, but first let’s take a moment to understand why this is a hard problem.
First you need to know how much accuracy is needed in the final answer, and then you need to know how the raw data (and other factors) affect the final answer.
Sometimes the uncertainties in the raw data can have less effect than you might have guessed, because of signal-averaging or other clever data reduction (section 3.3) or because of anticorrelated errors (section 3.5). Conversely, sometimes the uncertainties in the raw data can be much more harmful than you might have guessed, because of correlated errors, or because of unfavorable leverage, as we now discuss.
As an example of how unfavorable leverage can hurt you, suppose we have an angle theta that is approximately 89.3 or 89.4 degrees. If you care about knowing tan(theta) within one part in a hundred, you need to know theta within less than one part in ten thousand.
Whenever there is a singularity or near-singularity, you risk having unfavorable leverage. The proverbial problem of small differences between large numbers falls into this category, if you care about relative error (as opposed to absolute error).
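The leverage in the tangent example is easy to check numerically. The following Python sketch uses the first-order relation d(tan θ)/tan θ = dθ/(sin θ cos θ); the variable names are invented here, and the choice of 89.3 degrees is just one of the two values mentioned above.

```python
import math

theta_deg = 89.3
theta = math.radians(theta_deg)

# First-order relative sensitivity of tan(theta), per radian of theta
leverage = 1.0 / (math.sin(theta) * math.cos(theta))

target_rel = 0.01                  # want tan(theta) to one part in a hundred
dtheta = target_rel / leverage     # allowed error in theta, in radians
rel_theta = dtheta / theta         # same thing as a fraction of theta

print(f"leverage            = {leverage:.1f} per radian")
print(f"allowed error       = {math.degrees(dtheta):.4f} degrees")
print(f"relative to theta   = {rel_theta:.1e}")

# Direct check: perturb theta by the allowed amount and see what tan does
change = math.tan(theta + dtheta) / math.tan(theta) - 1.0
print(f"resulting change in tan(theta) = {change:.2%}")
```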
There are several equally good ways of expressing a number along with an explicit uncertainty. It usually doesn’t matter whether the uncertainty is expressed in absolute or relative terms, so long as it is expressed clearly. For example, here is one common way to express relative uncertainty:
| (11) |
Meanwhile, here are a couple ways to express absolute uncertainty. The following are synonymous:
| (12) |
There are other ways of expressing uncertainty that do not use the “plus or minus” idea. Sometimes it is appropriate to express the range of the data explicitly, for example by saying the observed sizes were in the range of 0.5 to 2.5 femtoliter. Stating the range is particularly useful if you have a flat distribution rather than a Gaussian.
There are special rules for raw data, as described in section 4.3. Otherwise, all these recommendations apply equally well to measured quantities and calculated quantities.
If you have a long list of numbers, you may be able to save yourself some writing by “distributing out” the statement of uncertainty, e.g. by writing a note that applies to the whole list, saying that the list elements are all ±0.055, or all ±4%, or all limited by roundoff. That saves you the effort of attaching the uncertainty directly to each element of the list.
You should report the form of the distribution, as discussed in section 4.4. Once we know the form of the distribution, if it is a two-parameter distribution, then either of the expressions in equation 12 gives us a complete description of the distribution.
In the not-too-unusual case where roundoff error is the dominant contribution to the uncertainty, this can be expressed using a slash in parentheses:
| 0.087(/) | (13) |
This can be viewed as shorthand for 0.087(½) i.e. an uncertainty of half a count in the last place, or equivalently 0.0870(5), but it also conveys the fact that the distribution of roundoff errors is usually highly non-Gaussian, usually closer to a flat distribution. In particular, the standard deviation may be markedly smaller than the halfwidth, as discussed in connection with figure 3.
Similarly, if a number has been truncated, this can be expressed using a plus-sign in parentheses:
| 0.087(+) | (14) |
This can be viewed as shorthand for 0.0875(5), with a highly non-Gaussian distribution.
None of these recommendations dictate an “exactly right” number of digits. You should not be surprised by this; you should have learned by now that many things – most things – do not have exact answers. For example, suppose I know something is ten inches long, plus or minus 10%. If I convert that to millimeters, I get 254 mm, ± 10%. I might choose to round that off to 250 mm, ± 10%, or I might choose not to. In any case I am not required to round it off.
Keep in mind that there are plenty of numbers for which the uncertainty doesn’t matter, in which case you are free to write the number (with plenty of guard digits) and leave its uncertainty unstated. For example, an experiment might involve ten numbers, one of which makes an obviously dominant contribution to the uncertainty, in which case you don’t need to obsess over the others.
When comparing numbers, don’t round them before comparing, except maybe for qualitative, at-a-glance comparisons, and maybe not even then, as discussed in section 4.6.
When doing multi-step calculations, whenever possible leave the numbers in the calculator between steps, so that you retain as many digits as the calculator can handle.⁴ Leaving numbers in the calculator is vastly preferable to copying them from the calculator to the notebook and then keying them back into the calculator; if you round them off you introduce roundoff error, and if you don’t round them off there are so many digits that it raises the risk of miskeying something.
Similarly: When cut-and-pasting numbers from one program to another, you should make sure that all the available digits get copied. And again similarly: When a program writes numbers to a file, to be read back in later, it should ordinarily write out all the available digits. (In very exceptional cases where this would incur unacceptable inefficiency, some sort of careful data compression is needed. Simple rounding does not count as careful data compression.)
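As a small illustration of writing out all the available digits, here is a Python sketch. It relies on the fact that Python's `repr` of a float gives the shortest decimal string that reads back as exactly the same float; the four-digit format is just an arbitrary example of premature rounding.

```python
x = 2.0 / 7.0

full = repr(x)         # shortest string that reads back as the same float
short = f"{x:.4g}"     # rounded to four digits before being written out

print(full, "-> round trip exact?", float(full) == x)
print(short, "-> round trip exact?", float(short) == x)
print("roundoff introduced by the short form:", float(short) - x)
```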
Note that the notion of “no unintended loss of significance” is meant to be somewhat vague. Indeed the whole notion of “significance” is often hard to quantify. You need to take into account the details of the task at hand to know whether or not you care about the roundoff errors introduced by keeping fewer digits. For instance, if I’m adjusting the pH of a swimming pool, I suppose I could use an analytical balance to measure the chemicals to one part in 10⁵, but I don’t, because I know that nobody cares about the exact pH, and there are other far-larger sources of uncertainty.
When thinking about precision and roundoff, it helps to think about the same quantity two ways:
Therefore it makes sense to use a two-step process: First figure out how much roundoff error you can afford, and then use that to give you a lower bound on how many digits to use.
Beware that the terminology can be confusing here: N digits is not the same as N decimal places. Let’s temporarily focus attention on numbers in scientific notation (since the sig-digs rules are even more confusing otherwise). A numeral like 1.234 has four digits, but only three decimal places. Sometimes it makes sense to think of it in four-digit terms, since it can represent roughly 10⁴ different numbers, from 1.000 through 9.999 inclusive. Meanwhile it sometimes makes sense to think of it in three-decimal-place terms, since the stepsize (stepping from one such number to the next) is 10⁻³.
If you want to keep the roundoff errors below one part in 10 to the Nth, you need N decimal places, i.e. N+1 digits of scientific notation. For example, with two decimal places, numbers near 1.015 will be rounded up to 1.02 or rounded down to 1.01. That is, the roundoff error is half a percent, which is below one part in 10².
Also beware that roundoff errors are not normally distributed. In multi-step calculations, roundoff errors accumulate faster than normally-distributed errors would. Details on this problem, and suggestions for dealing with it, can be found in section 3.3. Additional discussion of roundoff procedures can be found in reference 3.
The cost of carrying more guard digits than are really needed is usually very small. In contrast, the cost of carrying too few guard digits can be disastrously large. You don’t want to do a complicated, expensive experiment and then ruin the results due to roundoff errors, due to recording too few digits.
When you are making observations, the rule is that you should record all the raw data, just as it comes from the apparatus. Do not make any “mental conversions” on the fly.
We are making a distinction between the raw data and the calculations used to analyze the data. The point is that if you keep all the raw data, if you discover a problem with the calculation, you can always redo the calculation. Redoing the calculation may be irksome, but it is usually much less laborious and much less costly than redoing all the lab work.
There is a wide class of analog apparatus – including rulers, burettes, graduated cylinders etc. – for which the following rule applies: It is good practice to record all of the certain digits, plus one estimated digit. For example, if the finest marks on the ruler are millimeters, in many cases you can measure a point on the ruler with certainty to the nearest millimeter … and then you should try to estimate how far along the point is between marks. If you estimate that the point is halfway between the 13 mm and 14 mm marks, record it as 13.5 mm. This emphatically does not indicate that you know the reading is exactly 13.5 mm. It is only an estimate. You are keeping one guard digit beyond what is known with certainty, to reduce the roundoff errors. You don’t want roundoff errors to make any significant contribution to the overall uncertainty of the measurement. [Also, if possible, include some indication of how well you think you have estimated the last digit: perhaps 13.5(5)mm or 13.5(3)mm or even 13.5(1)mm if you have really sharp eyes.]
There is a class of instruments, notably analog voltmeters and multimeters, where in order to make sense of the reading you need to look at the needle and at the range-setting knob. (This is in contrast to digital meters, where the display often tells the whole story.) I recommend the following notation:
| Reading | Scale |
| 2.88 | /3*300mV |
| 2.88 | /10*1V |
which is to be interpreted as follows:
| Reading | Scale | Interpretation |
| 2.88 | /3*300mV | “2.88 out of three on the 300mV scale” |
| 2.88 | /10*1V | “2.88 out of ten on the 1V scale” |
Note that both of the aforementioned readings correspond to 0.288 volts.
There are two things going on here: First of all, converting on-the-fly from what the scale says (2.88) to SI units (0.288) is too error prone, so don’t do it that way; record the 2.88 as is, and do the conversion later. Secondly, there are two ways of getting this reading, either most of the way up on the 300mV scale (the first line in the table above) or partway up on the 1V scale (the second line). It is important to record which scale was used, in case the two scales are not equally well calibrated.
Note that the notation “/3*300mV” also tells you the algebraic operations needed to convert the raw data to SI units: in this case divide by 3, and multiply by 300mV.
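Because the notation spells out the arithmetic, the conversion can be left to a program at analysis time. Here is a minimal Python sketch; the function name `to_volts`, the `UNITS` table, and the exact pattern accepted are assumptions made for this illustration, and a real notebook might need more units or a looser format.

```python
import re

UNITS = {"mV": 1e-3, "V": 1.0}   # volts per unit; extend as needed

def to_volts(reading, scale):
    """Convert e.g. (2.88, "/3*300mV") to volts: reading / 3 * 300 mV."""
    match = re.fullmatch(r"/(\d+(?:\.\d+)?)\*(\d+(?:\.\d+)?)(mV|V)", scale)
    if match is None:
        raise ValueError(f"unrecognized scale notation: {scale!r}")
    divisor, full_scale, unit = match.groups()
    return reading / float(divisor) * float(full_scale) * UNITS[unit]

# Raw data exactly as recorded: the reading and the scale it was taken on
notebook = [(2.88, "/3*300mV"), (2.88, "/10*1V")]
for reading, scale in notebook:
    print(f"{reading} {scale} -> {to_volts(reading, scale):.3f} V")
```

The raw reading and the scale stay in the record untouched; only the derived value is computed, so the conversion can be redone if a mistake is found later.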
Whenever you report an uncertain quantity, keep in mind that you are describing some sort of probability distribution.
Therefore it is important to report the form of the distribution, i.e. the family from which your distribution comes. For instance if the data is Gaussian and IID, you should say so, unless this is obvious from context. Only after the family is known does it make sense to report the parameters (such as position and halfwidth) that specify a particular member of the family.
On the other side of the same coin, people have a tendency to assume distributions are Gaussian and IID, even when there is no reasonable basis for such an assumption. Therefore if your data is known to be – or even suspected to be – non-Gaussian and/or non-IID, it is doubly important to point this out explicitly. See section 9.9 for more on this.
Consider the following example: Newton’s constant of universal gravitation, G, is known to about 100 ppm. The mass of the earth, M, is also known to about 100 ppm. So far so good. The tricky thing is that the product GM is known to about 2 parts per billion.
Now suppose you have a value of G and a value of M that are consistent in the sense that when multiplied together, they give the correct value of GM, accurate to 2 ppb. If you round off G and/or M to four or five digits, attempting to match the sig figs of each quantity to the uncertainty of that quantity, you will very seriously degrade the accuracy of the product GM.
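The damage done by such rounding can be seen in a short Python sketch. The numerical values of G and GM below are illustrative assumptions brought in for this example (they are not taken from the text), and M is simply computed as GM/G so that the pair is consistent by construction. The helper names are likewise invented.

```python
def round_sig(x, digits):
    """Round x to the given number of significant digits."""
    return float(f"{x:.{digits - 1}e}")

# Illustrative values (assumptions, not taken from the text)
GM = 3.986004418e14    # m^3/s^2, product of G and the mass of the earth
G = 6.67430e-11        # m^3/(kg s^2)
M = GM / G             # kg; consistent with G and GM by construction

def rel_error_after_rounding(digits):
    """Relative error in the product when G and M are rounded separately."""
    product = round_sig(G, digits) * round_sig(M, digits)
    return product / GM - 1.0

for digits in (4, 5, 12):
    err = rel_error_after_rounding(digits)
    print(f"{digits:2d} digits: relative error in GM = {err:+.1e}")
```

With four or five digits, the error in the product is measured in parts per million, which is orders of magnitude worse than the parts-per-billion accuracy that was available before rounding.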
One way to visualize this situation is shown in figure 4. (In general, you can always visualize a probability distribution in terms of a scatter plot). In the figure, the abscissa represents G and its standard deviation is shown by the magenta bar. The ordinate represents M and its standard deviation is shown by the blue bar. The standard deviation of the product GM is shown by the yellow bar.
In this figure, the amount of correlation has been greatly de-emphasized for clarity. The uncertainty of the product is only six times less than the uncertainty of the raw variables. (This is in contrast to the real physics of mass and gravitation, where the uncertainty of the product is millions of times less than the uncertainty of the raw variables.)
In the case of a simple multi-dimensional Gaussian, the contours of constant probability are ellipses when we plot the probability as in figure 4. If the variables are highly correlated, the ellipses are highly elongated, and the axes of the ellipse are nowhere near aligned with the axes of the plot. (Conversely, in the special case of uncorrelated variables, the axes of the ellipse are aligned with the axes of the plot, and the ellipse may or may not be highly elongated.)
This example serves to reinforce the rule that you should not round off unless you are sure it’s safe. Figuring out what’s safe and what’s not is often quite difficult.
One of the rare situations where rounding off might arguably be helpful concerns eyeball comparison of numbers. In particular, suppose we have the numbers
| (15) |
and we are sure that a half-percent variation in these numbers will never be significant. From that we conclude that on the first line there is no significant difference between a and b, while on the second line there is. It is easier to compare rounded-off numbers, since rounding makes the similarities and differences more immediately apparent to the eye:
| (16) |
However, roundoff to facilitate comparison is definitely not the best procedure. Rounding can get you into trouble, for example if 3.4997 gets rounded down to 3 and 3.5002 gets rounded up to 4, you can easily get a severely false mismatch. In other situations you can get a false match. So, again we see that significant figures are a convenient way of getting the wrong answer.
It is far more sensible to subtract the numbers at full precision, tabulate the results (as in equation 17), and then see whether the magnitude of the difference is smaller than some appropriate amount of “fuzz”.
| (17) |
If you are doing things by computer, computing the deltas is no harder than computing the rounded-off versions, and you should always write programs to display the deltas without rounding. (Here “delta” is shorthand for the difference b−a.) While you are at it, you might as well have the computer display a flag whenever the delta exceeds some configurable threshold.
Compared to equation 15 or even equation 16, the advantage goes to equation 17. It makes it incomparably less likely that important details will be overlooked.
Even if you are doing things by hand, you should consider calculating the deltas, especially if the numbers are going to be looked at more times than they are calculated. It is both easier and less error-prone to look for large-percentage variations in the deltas than to look for small-percentage variations in the original values.
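A program that tabulates the deltas and raises a flag might look like the following Python sketch. The data pairs are hypothetical, made up for this illustration, and the half-percent relative threshold is simply the figure used in the discussion above; the function name `compare` is invented.

```python
pairs = [                 # hypothetical (a, b) values, for illustration only
    (3.4997, 3.5002),
    (3.2100, 3.4400),
]
FUZZ = 0.005              # configurable threshold: half a percent of a

def compare(a, b, fuzz=FUZZ):
    """Return the full-precision delta b - a and whether it exceeds the fuzz."""
    delta = b - a
    return delta, abs(delta) > fuzz * abs(a)

for a, b in pairs:
    delta, flagged = compare(a, b)
    mark = "  <-- differs" if flagged else ""
    print(f"{a:8.4f} {b:8.4f}  delta = {delta:+.4f}{mark}")

# Rounding before comparing can produce a false mismatch:
print(round(3.4997), round(3.5002))   # the rounded values disagree
```

The first pair is not flagged when compared via its delta, even though rounding the two numbers to integers would make them look different.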
The need for guard digits is intimately connected to the fact that uncertainty is not the same as insignificance. See section 3.3, section 11.4, and section 8, especially figure 7 in section 9.2.
As a first example, uncertain digits can be significant in signal averaging, as discussed in section 3.3.
As another example, guard digits are often necessary when there are correlated uncertainties. Such correlations are commonly encountered in situations where there is a small difference between large numbers. As an illustration, suppose the uncertain quantities A and B represent two locations on the number line. Suppose we know that A = 30.050±3.0 km, while B = 30.040±3.0 km. As you can see, both of these quantities have roughly 10% uncertainty. If that were the whole story, you might feel justified in rounding them off quite a bit. However, in this scenario we have additional information, namely that the difference vector (A−B) is known – somehow – to be 10±1 meters. We emphasize that the absolute uncertainty in (A−B) is on the order of meters, whereas the uncertainty in A or B separately is vastly greater, on the order of kilometers. In a situation like this, if you are storing the A and B positions in a database, you might very well decide to store them with ±1 meter precision or better, so that you can faithfully represent the information you have about (A−B).
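To make the point concrete, here is a small sketch using the A and B values from the paragraph above. The function name `stored_difference` is hypothetical, and rounding with Python's built-in `round` merely stands in for whatever a database might do when it stores fewer digits.

```python
# Nominal positions from the example above, in kilometers.
A_km = 30.050
B_km = 30.040


def stored_difference(a, b, decimals):
    """Round a and b to `decimals` decimal places, then subtract the stored values."""
    return round(a, decimals) - round(b, decimals)


if __name__ == "__main__":
    # Storing to the nearest kilometer wipes out the 10-meter difference entirely.
    print("nearest km   :", stored_difference(A_km, B_km, 0), "km")
    # Storing to the nearest meter preserves it (up to floating-point noise).
    print("nearest meter:", stored_difference(A_km, B_km, 3), "km")
```

Rounding to the nearest kilometer looks harmless if you only think about the ±3 km uncertainty of each position, yet it destroys the well-known 10-meter difference.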
I often get questions from people who are afraid there will be an outbreak of too many insignificant digits. A typical question is:
“What if a student divides distance by time and reports the result as 0.285714286 m/s? Isn’t that just wrong? In the absence of other information, it implies an uncertainty of 0.0000000005 m/s, which is a gross underestimate, isn’t it?”
My answer is always the same: No, there is no problem.
To me, that nine-digit number doesn’t imply anything about the uncertainty. Yes, I see nine digits, but no, that doesn’t tell me the uncertainty. The uncertainty might be much greater than one part in 10^9, or it might be much less. If the situation called for stating the uncertainty, I might fault the student for not doing so. But there are plenty of cases where the uncertainty is not yet known, and the only smart thing to do is to write down loads of guard digits.
Suppose we later discover the uncertainty was 10%. Then I interpret 0.285714286 as having eight guard digits. Is that a problem? I wish all my problems were as trivial as that.
If you think excess digits are a crime, we should make the punishment fit the crime. Let’s do the math:
My time is valuable. The amount of my time wasted by people who are worried about the “threat” of excess digits greatly exceeds the amount of my time wasted reading excess digits.
My advice: Breathe in. Breathe out. Relax already. Excess digits aren’t going to kill you.
In an introductory course, the most sensible approach is to adopt the following rules:
This is much simpler than dealing with sig figs. It is also more honest. Reporting no information about the uncertainty is preferable to reporting wrong information about the uncertainty (which is what you get with sig figs).
If the students are “mathematically challenged” and even “reading challenged”, it is a safe bet that they are not doing multi-digit calculations longhand. And they probably aren’t using slide rules either. So let’s assume they are using calculators. Therefore the burden of keeping intermediate results to 6-digit precision or better (indeed much better) is negligible. It has the advantage of getting them in the habit of keeping plenty of guard digits.
Yes, some of those digits will be insignificant. So what? Extra digits will not actually kill anybody.
At some point in the course, we want the students to develop “some” feeling for uncertainty. So let’s do that. We can do it easily and correctly, using the crank-three-times method as described in section 3.4. (Apply it to selected problems now and then, not every problem.) It requires less sophistication and produces better results than anything involving sig figs.
Using sig figs is like trying to eat a bowlful of clear soup using a fork. It’s silly, especially since spoons are readily available. Even if somebody has a phobia about spoons, the fork is still silly; they’d be better off throwing it away and using no utensil at all.
In an introductory course, some students (especially the more thoughtful students) will be appalled by the crudity and unreliability of the sig figs doctrine, and will appreciate the value of guard digits.
On the other hand, there will also be some students (especially the more insecure students) for whom various psychological issues make it hard to appreciate the necessity for guard digits. These issues include the following:
This rule of barnyard ethology applies to some spheres of human activity, including lawyering, politics, and military combat. Never admit weakness, and never admit uncertainty.
However ... students need to realize that science is not like lawyering, or politics, or combat. Scientists do admit uncertainty. The surest way to be recognized as a non-scientist is to pretend to be certain when you’re not.
It may seem ironic or even paradoxical, but it is true: One of the most basic steps toward reducing uncertainty is to admit that there is some uncertainty. For more on this, see reference 9.
Being able to admit uncertainty requires some emotional maturity, some emotional security, some grownupness. This is an important part of why students go to school, to learn such things.
I have seen students go to great lengths to avoid having the slightest imperfection in their lab books. These students need to realize that real science involves approximation, including what we call successive refinement. That is, we first make a rough measurement, and then based on what we just learned, we make successively more refined measurements. If the first measurement were perfect, we wouldn’t need the later measurements. Learning is not a sin.
The goal is not to avoid all mistakes. Everybody makes mistakes. Students are expected to make more mistakes than professionals, but even professionals make mistakes. The goal is to (a) minimize the cost of the mistakes, and (b) learn from the mistakes. For example, real-world chemical engineers commonly build pilot plants, so they can learn from mistakes relatively cheaply, before they commit to building a multi-billion-dollar full-scale plant.
They need to realize that any plain number, with or without guard digits, is not the “true” answer. That’s because the object of attention is not a plain number, but rather a probability distribution. A single number drawn from the distribution will never do a good job of describing the distribution.
If it makes you feel better, first write down the width of the distribution, and then write down the nominal value. If the distribution has a half-width of ±7%, it doesn’t matter whether you write down 51, or 51.13, or 51.1394744. The fact that the trailing digits are uncertain doesn’t make these numbers untrue. They are perfectly respectable elements drawn from the distribution.
If you were to claim that any number such as 51, or 51.13, or 51.1394744 (with or without guard digits) represented an exact measurement, that would be wrong. So don’t pretend it’s exact. Say it has an uncertainty of ±7%. Once you’ve said that, you are free to write down as many guard digits as you like. (You need at least some uncertain digits, to guard against roundoff errors.)
The real world does not offer certainty. Students should not blame themselves for uncertainty, and should not blame the teacher. We live in an uncertain world. The goal is not to eliminate all uncertainty; the goal is to learn how to live in an uncertain world.
One of the crucial techniques for dealing with uncertainty is to represent things as distributions rather than as plain numbers.
There are two issues: writing sig figs, and reading sig figs.
If you ever feel you need to write something using sig figs, you should lie down until the feeling goes away. Figure out what you are trying to say, and find a better way of saying it. If you are going to express the uncertainty at all, express it separately and explicitly. See also section 4.9.
The rest of this section is devoted to reading sig figs. That is, suppose you are given a bunch of numbers and are required to interpret them as having significant digits.
If that’s all you have to go on, it is not necessary – and not possible – to take the situation seriously. If the authors had intended their uncertainties to be taken seriously, they would have encoded the data properly, not using significant digits.
Sometimes, though, you do have more information available.
One good strategy, if possible, is to simply ask the authors what they think the data means. If the data is from a book, there may be a statement somewhere in the book that says what rules the authors are playing by. Along similar lines, I have seen blueprints where explicit tolerance rules were stated in the legend of the blueprint: one example said that numbers with 1, 2, or 3 decimal places had a tolerance of ±0.001 inches, while numbers with 4 decimal places had a tolerance of ±0.0001 inches. That made sense.
Another possibility is to use your judgement as to how much uncertainty attaches to the given data. This judgement may be based on what you know about the source of the data. For instance, if you know that the data results from a counting process, you might decide that 1100 is an exact integer, even though the sig figs rules might tell you it had an uncertainty of ±50 or even ±500 or worse.
As a next-to-last resort, you can try the following procedure. We need to attribute some uncertainty to each of the given numbers. Since we don’t know which sect of the sig-digs cult to follow, we temporarily and hypothetically make the worst-case assumption, namely just shy of ten counts of uncertainty in the last place. For example, 1.23 becomes 1.23±0.099, on the theory that 1.23±0.10 would have been rounded to 1.2 according to the multi-count sect. (The multi-count sect is generally the worst case when you are decoding numbers that are already represented in sig-figs notation. Conversely, the half-count sect is generally the worst case when you are encoding numbers into the sig-figs representation, because it involves the greatest amount of destructive rounding.)
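The worst-case decoding step can be sketched as follows. This is only an illustration under stated assumptions: the function name `worst_case_uncertainty` is made up, the numeral is passed as a string so that the number of written digits is preserved, and "just shy of ten counts" is taken to mean 9.9 counts in the last written place, matching the 1.23±0.099 example.

```python
from decimal import Decimal


def worst_case_uncertainty(numeral):
    """Decode a sig-figs numeral under the worst-case (multi-count) assumption.

    Returns (value, uncertainty), where the uncertainty is 9.9 counts in the
    last written decimal place. Every written digit, including any trailing
    zero, is treated as the last place.
    """
    value = Decimal(numeral)
    # The exponent of the last written digit, e.g. -2 for "1.23".
    last_place = Decimal(1).scaleb(value.as_tuple().exponent)
    return value, Decimal("9.9") * last_place


if __name__ == "__main__":
    for text in ["1.23", "0.5", "1.1e3"]:
        value, unc = worst_case_uncertainty(text)
        print(f"{text} -> {value} ± {unc}")
```

A numeral such as 1100 remains ambiguous under the sig-figs rules, so this sketch cannot decide for you whether the trailing zeros were meant as written digits; writing it as "1.1e3" or "1.100e3" makes the choice explicit.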
Now turn the crank. Do the calculation, using plenty of guard digits on the intermediate results. Propagate the uncertainty using the methods suggested in section 3.
Now there are two possibilities:
At some point you might well decide that the given data is inadequate for the purpose. Go back to Square One and obtain some better data.
I categorically decline to suggest an explicit convention as to what sig figs “should” mean. There are three reasons for this: First of all, the sectarian differences are too huge; anything I could say would be wildly wrong, one way or the other, according to one sect or another. Secondly, as previously mentioned, what’s safest when writing sig figs is not what’s safest when reading and trying to interpret sig figs. Last but not least, sig figs “should” not be used at all; I don’t want to say anything that could be misinterpreted as endorsing their use.
Spreadsheets are great. You need to analyze the data one way or another, so you might as well do it on a spreadsheet. This gives you a big bonus: you can do some “what-if” analysis. You don’t need to do a full-blown Monte Carlo analysis as in section 3.5; instead just wiggle a few of your data points to see how that affects the final answer. The same goes for other quantities such as calibration factors: find out how much of a perturbation is needed to significantly affect the final answer.
If good-sized changes in a data point have negligible effect on the final answer, it means you can relax a bit; you don’t need to drive yourself crazy measuring that data point to extreme precision. Conversely, if you find that smallish changes in a single data point have a major effect on the answer, it tells you that you’d better measure each such data point as accurately as you can, and/or you’d better take a huge amount of data (so you can do some signal-averaging, as discussed in section 3.3). You can also consider upgrading the apparatus, perhaps using more accurate instruments, and/or redesigning the whole experiment to give you better leverage.
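If you prefer a script to a spreadsheet, the same what-if wiggling can be sketched in a few lines. Everything here is an assumption chosen for illustration: the analysis function `g_from_pendulum` (the ideal simple-pendulum formula), the input values, and the 1% wiggle size.

```python
import math


def g_from_pendulum(length_m, period_s):
    """Hypothetical analysis: g from an ideal simple pendulum, g = 4 pi^2 L / T^2."""
    return 4 * math.pi ** 2 * length_m / period_s ** 2


def sensitivities(func, nominal, rel_wiggle=0.01):
    """Wiggle each input by a fraction rel_wiggle, one at a time.

    Returns a dict mapping each input name to the fractional change it
    produces in the final answer.
    """
    base = func(**nominal)
    result = {}
    for name, value in nominal.items():
        wiggled = dict(nominal, **{name: value * (1 + rel_wiggle)})
        result[name] = (func(**wiggled) - base) / base
    return result


if __name__ == "__main__":
    nominal = {"length_m": 1.000, "period_s": 2.006}
    for name, change in sensitivities(g_from_pendulum, nominal).items():
        print(f"+1% in {name:9s} -> {100 * change:+.2f}% in the answer")
```

In this made-up example the answer is about twice as sensitive to the period as to the length, which tells you where to spend your measurement effort.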
There is a lesson here about procedures: It is a really bad idea to take all your data and then do all your analysis. Take some data and do some analysis, so you can see whether you’re on the right track and so you can do the sensitivity analysis we just discussed. Then take some more data and do some more analysis. This is called on-line analysis.
This is so important that engineers commonly build pilot plants and/or carry out pilot programs, so they can learn what the real issues are before they commit to full-scale production.
You should also find ways to make internal consistency checks. If there are good theoretical reasons why the data should follow a certain functional form, see if it does. Exploit any sum rules or other constraints you can find. Make sure there is enough data to overconstrain the intended interpretation. By that I mean do not rely on two points to determine a straight line; use at least three and preferably a lot more than that, so that there will be some internal error checks. Similarly, if you are measuring something that is supposed to be a square, measure all four sides and both diagonals if you can. Measure the angles also if you can.
There are few hard-and-fast rules in this business. It involves tradeoffs. It involves judgement. You have to ask: What is the cost of taking more data points? What is the cost of making them more accurate? What is the cost of a given amount of uncertainty in the final answer?
Additional good advice can be found in reference 10.
In classroom settings, people often get the idea that the goal is to report an uncertainty that reflects the difference between the measured value and the “correct” value. That idea certainly doesn’t work in real life – if you knew the “correct” value you wouldn’t need to make measurements.
In all cases – in the classroom and in real life – you need to determine the uncertainty of your measurement by scrutinizing your measurement procedures and your analysis.
Given two quantities, you can judge how well they agree.
For example, we say the quantities 10±2 and 11±2 agree reasonably well. That is because there is considerable overlap between the probability distributions. It is more-or-less equivalent to say that the two distributions are reasonably consistent. As a counterexample, 10±0.2 does not agree with 11±0.2, because there is virtually no overlap between the distributions.
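One rough way to put a number on the overlap is to express the difference in units of the combined width. The sketch below assumes that both distributions are Gaussian, that the two quantities are independent, and that the ± figures are standard deviations; none of that is guaranteed in general. The function name is made up.

```python
import math


def discrepancy_in_sigmas(a, sigma_a, b, sigma_b):
    """Difference |a - b| in units of the combined width sqrt(sigma_a^2 + sigma_b^2).

    Assumes independent Gaussian distributions.
    """
    return abs(a - b) / math.hypot(sigma_a, sigma_b)


if __name__ == "__main__":
    print("10±2   vs 11±2  :", round(discrepancy_in_sigmas(10, 2, 11, 2), 2))
    print("10±0.2 vs 11±0.2:", round(discrepancy_in_sigmas(10, 0.2, 11, 0.2), 2))
```

The first pair differs by about a third of the combined width, which is unremarkable. The second pair differs by about three and a half combined widths, which is hard to attribute to chance.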
If your results disagree with well-established results, you should comment on this, but you must not fudge your data to improve the agreement. You must start by reporting your nominal value and your uncertainty independently of other people’s values. As an optional later step, you might also report a “unified” value resulting from combining your results with others, but this must be clearly labelled as such, and in no way relieves you of your responsibility to report your data “cleanly”. The reason for this is the same as before: There is always the possibility that your value is better than the “established” value. You can tell whether they agree or not, but you cannot really tell which (if either) of them is correct.
Of course, if a beginner measures the charge of the electron and gets an answer that is wildly inconsistent with the established value, it is overwhelmingly likely that the beginner has made a mistake as to the value and/or the uncertainty. Be that as it may, the honorable way to proceed is to report the data “as is”, without fudging it. Disagreement with established results might motivate you to go back and scrutinize the measurement process and the analysis, looking for errors. That is generally considered acceptable, and seems harmless, but actually it is somewhat risky, because it means that answers that agree with expectations will receive less scrutiny than answers that don’t.
The historical record contains bad examples as well as good examples. Sometimes people who could have made an important discovery talked themselves out of it by fudging their data to agree with expectations. But on other occasions people have done the right thing.
As J.W.S. Rayleigh put it in reference 11:
One’s instinct at first is to try to get rid of a discrepancy, but I believe that experience shows such an endeavour to be a mistake. What one ought to do is to magnify a small discrepancy with a view to finding out the explanation....
When Rayleigh found a tiny discrepancy in his own data on the molar mass of nitrogen, he did not cover it up. He called attention to it, magnified it, and clarified it. The discrepancy was real, and led to the discovery of argon, for which he won the Nobel Prize in 1904.
Whenever possible, raw data should be taken “blind”, i.e. by someone who doesn’t know what the expected answer is, to eliminate the temptation to fudge the data. This is often relatively easy to arrange, for instance by applying a scale factor or baseline-shift that is recorded in the lab book but not told to the observer.
Bottom line: Your data is your data. The other guy’s data is the other guy’s data. You should discuss whether your data agrees with the other guy’s data, but you should not fudge your data to improve the agreement.
You should not assume that all the world’s errors are due to imperfect measurements.
Consider the situation where we are measuring the properties of, say, a real spring. Not some fairy-tale ideal spring, but a real spring. It will exhibit some nonlinear force-versus-extension relationship.
Now suppose that we do a really good job of measuring this relationship. The data is reproducible within some ultra-tiny uncertainty. For all practical purposes, the data is exact.
Next, suppose we want to model this data. Modelling is an important scientific activity. We can model the data using a straight line. We can also model it using an Nth-order polynomial. No matter what we do, there will always be some “error”. This is an error in the model, not in the observed data. It will lead to errors in whatever predictions we make with the model.
Proper error analysis will tell us bounds on the errors of the predictions.
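Here is a small sketch of the spring example. The "real spring" law used here (a linear term plus a cubic term, with made-up coefficients) is purely hypothetical, and the data is generated exactly, with no measurement noise at all. The straight-line fit is ordinary least squares, written out by hand so the block needs no libraries.

```python
def spring_force(x):
    """Hypothetical nonlinear spring: F = 10 x + 2 x^3 (arbitrary units)."""
    return 10.0 * x + 2.0 * x ** 3


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Exact "observations" at extensions 0.0, 0.1, ..., 1.0.
xs = [i / 10 for i in range(11)]
ys = [spring_force(x) for x in xs]

slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

if __name__ == "__main__":
    print(f"fitted line: F = {slope:.4f} x + {intercept:.4f}")
    print(f"largest residual: {max(abs(r) for r in residuals):.4f}")
```

The residuals are not zero even though the data is exact. They are entirely an error in the straight-line model, and they bound how far off the model's predictions can be over this range of extensions.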
Is this an example of “if it doesn’t work, it’s physics”? No! An inexact prediction is often tremendously valuable. An approximate prediction is a lot better than no prediction.
I mention this because far too many intro-level science books seem to describe a fairy-tale axiomatic world where the theorists are always right and the experimentalists are always wrong. Phooey!
It is very important to realize that error analysis is not limited to hunting for errors in the data. In the above example, the data is essentially exact. The spring is not “at fault” for not adhering to Hooke’s so-called law. Instead, the reality is that Hooke’s law is imperfect, in that it does not fully model the complexities of real springs.
A huge part of real-world physics (and indeed a huge part of real life in general) depends on making approximations, which includes finding and using phenomenological relationships. The thing that sets the big leagues apart from the bush leagues is the ability to make controlled approximations.
Figure 5 presents exactly the same information as figure 2, but presents it in a different format. Figure 5 presents the cumulative probability. By that we mean the following: Let’s call the curve in figure 5 P(). Then at any point x (except where there is a discontinuity), the value P(x) is the probability that a given roll of the dice will come out to be less than x. More generally (even when there are discontinuities), the limit of P(x) as we approach x from below is the probability that the outcome will be less than x, and the limit as we approach x from above is the probability that the outcome will be less than or equal to x.
So for example, we see that there is zero probability that the outcome will be less than 2, and 100% probability that the outcome will be less than or equal to 12.
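The two one-sided limits can be computed directly. This sketch assumes that the dice in question are two fair six-sided dice whose outcome is the total, which is consistent with the limits of 2 and 12 mentioned above; the function names are made up. Exact fractions are used so there is no roundoff to worry about.

```python
from fractions import Fraction
from itertools import product

# Probability of each total for two fair six-sided dice (an assumption).
pmf = {}
for first, second in product(range(1, 7), repeat=2):
    total = first + second
    pmf[total] = pmf.get(total, Fraction(0)) + Fraction(1, 36)


def prob_less_than(x):
    """Limit of P from below: probability that the outcome is strictly less than x."""
    return sum((p for total, p in pmf.items() if total < x), Fraction(0))


def prob_at_most(x):
    """Limit of P from above: probability that the outcome is less than or equal to x."""
    return sum((p for total, p in pmf.items() if total <= x), Fraction(0))


if __name__ == "__main__":
    for x in range(2, 13):
        print(x, prob_less_than(x), prob_at_most(x))
```

The size of the jump at each x, namely `prob_at_most(x) - prob_less_than(x)`, is just the probability of rolling exactly x.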
When studying whether one probabilistic process resembles another, you can sometimes get away with comparing the probability density distributions as in figure 2, but it is much safer to look at the cumulative probability as in figure 5. It is less noisy and all-in-all safer to count the observations less than x than to count the observations “at” or “near” x.
In particular, it is quite possible to come up with examples where two probability density distributions are nowhere equal, but the corresponding cumulative probabilities are very nearly equal, as nearly as you like.
For some reason, the probability density distribution (as in figure 2) is more intuitive, i.e. easier for most humans to interpret. In contrast, the cumulative representation (as in figure 5) is more suitable for formal mathematical analysis.
It is easy to convert from one representation to the other, by differentiating or integrating.
When dealing with sets or clusters of measurements, we must deal with several different probability distributions at once, which requires a modicum of care. The conventional terminology in this area is a mess, so I will use some colorful but nonstandard terminology.
This gives us two equivalent ways of forming a cluster: We can draw a cluster directly from V, or we can draw N particles from U and then group them to form a cluster.
Therefore:
See also the definition(s) of sample mean and sample standard deviation in section 7.4.
Linearity guarantees that µV will always be equal to µU. In contrast, the definition of σ is nonlinear, and σV will be smaller than σU by a factor of √N, where N is the number of particles per cluster. And thereby hangs a tale: all too commonly people talk about “the” standard deviation, and sometimes it is hard to figure out whether they are talking about σU or σV.
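A quick simulation can make the √N factor tangible. This is a sketch under assumptions of my choosing: the underlying distribution U is taken to be Gaussian with µU = 0 and σU = 1, each cluster contains N = 25 particles, and the random seed is fixed so the run is repeatable.

```python
import random
import statistics


def cluster_means(n_per_cluster, n_clusters, mu_u=0.0, sigma_u=1.0, seed=12345):
    """Draw clusters of particles from U and return the mean (y-value) of each cluster."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_clusters):
        particles = [rng.gauss(mu_u, sigma_u) for _ in range(n_per_cluster)]
        means.append(statistics.mean(particles))
    return means


if __name__ == "__main__":
    n = 25
    ys = cluster_means(n_per_cluster=n, n_clusters=4000)
    print("center of the y-values:", statistics.mean(ys))   # near mu_U = 0
    print("width of the y-values :", statistics.stdev(ys))  # near sigma_U / sqrt(25) = 0.2
```

The cluster means scatter about five times less than the individual particles do, which is why it matters so much which "standard deviation" is being quoted.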
Given a single cluster consisting of N measurements, we can form an estimate (denoted µU′) of the center (µU) of the underlying distribution. In fact, for a well-behaved distribution, we can set µU′ = y = ⟨x⟩C, i.e. we can let the y-value of the cluster serve as our estimate of µU. Meanwhile, we can also form an estimate (σU′) of the width (σU) of the underlying distribution, as discussed below.
Given a group consisting of M clusters, we can form an estimate (µV′) of the center of the distribution of y-values. Similarly we can form an estimate (σV′) of the width of the distribution of y-values.
To say the same things more formally:
| (18) |
Among other things, we note the following:
Note: Commonly we use [x] as our σU′ i.e. our estimate of σU, using the [⋯] notation defined in section 7.4.
When you report the results of a cluster of measurements, you have a choice:
In either case, you should be very explicit about the choice you have made. If you just report 4.3 ± 2.1 it’s ambiguous, since [x] differs from [y] by a factor of √N, which creates the potential for huge errors.
The relationships among the quantities of interest are shown in figure 6.
Conceptually, [y] would manifest itself in connection with drawing multiple clusters from the distribution V. However, you have enough information within a single cluster to calculate [y]. Just divide [x] by √N.
For a given cluster of data:
⟨x⟩ aka y is our estimate of µU and also of µV.
[x] is our estimate of σU.
[y] = [x]/√N is our estimate of σV.
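For a single cluster, these three quantities can be computed as in the following sketch. The data values are invented, the function name is hypothetical, and [x] is taken to be the bias-corrected sample standard deviation (N−1 in the denominator), which is what `statistics.stdev` computes.

```python
import math
import statistics


def summarize_cluster(xs):
    """Return (y, [x], [y]) for one cluster of measurements.

    y   = sample mean, our estimate of mu_U and of mu_V
    [x] = bias-corrected sample standard deviation, our estimate of sigma_U
    [y] = [x] / sqrt(N), our estimate of sigma_V
    """
    n = len(xs)
    y = statistics.mean(xs)
    width_x = statistics.stdev(xs)
    width_y = width_x / math.sqrt(n)
    return y, width_x, width_y


if __name__ == "__main__":
    cluster = [4.1, 4.5, 3.9, 4.7]  # made-up measurements
    y, width_x, width_y = summarize_cluster(cluster)
    print(f"y = {y:.3f}, [x] = {width_x:.3f}, [y] = {width_y:.3f}")
```

Whichever width you report, say explicitly which one it is.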
The field of statistics, like most fields, has its own terminology and jargon.
Here are some terms where the statistical meaning is ambiguous and/or differs from the vernacular meaning.
In statistics, sample mean refers to y = ⟨x⟩, i.e. the mean of a given sample i.e. a given cluster. This is a natural consequence of the definition of sample.
In contrast, the standard deviation of a distribution is unambiguous. That’s because [x] and [x]b converge in the large-sample limit, and we can draw an arbitrarily large sample from the distribution.
If an event is a set with only one element, it is called a simple event; if it contains multiple elements, it is called a compound event.
To repeat: When dealing with “standard deviation” in connection with clusters (samples) of size N, there are at least six ideas in play:
| (19) |
For large N, note that the left-to-right variation is rather small within each row, but the row-to-row variation is huge.
Suppose we draw a sample consisting of N values of a variable X. The conventional definition of the sample mean is:
$$ \langle X \rangle := \frac{1}{N} \sum_{i=1}^{N} X_i \tag{20} $$
There are, alas, multiple inconsistent definitions for “the” standard deviation. The bias corrected sample standard deviation is:
$$ [X] := \sqrt{ \frac{1}{N-1} \sum_{i=1}^{N} \left( X_i - \langle X \rangle \right)^2 } \tag{21} $$
where my square-bracket notation ([X]) is nonstandard, but helpful since the standard notation suffers from ugliness and inconsistency.
Note the appearance of N−1 in the denominator in equation 21. This appears because we hope to use [X] as an unbiased estimator of the width of the distribution from which the values were drawn.
To understand what is going on, contrast this with the uncorrected sample standard deviation:
$$ [X]_b := \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( X_i - \langle X \rangle \right)^2 } \tag{22} $$
and consider the case of a single observation, i.e. N=1. The value given by equation 22 is automatically zero, because every observation is equal to the sample mean. This is a perfectly well behaved number, but it is not a very good estimator of the width of the underlying distribution.
In contrast, if we use equation 21, we find that when N=1, [X] = 0/0, which is an indeterminate form, which correctly tells us that we cannot estimate the width of the distribution from a single sample.
For all N≥2, equation 21 is an unbiased estimator of the width of the distribution. (We continue to assume the distribution is Gaussian.)
Meanwhile, we have the quantity
$$ [X]_d := \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( X_i - \mu \right)^2 } \tag{23} $$
where µ is the mean of the distribution (not to be confused with ⟨X⟩, which is the mean of the sample). Equation 23 is useful in cases where you have reliable knowledge of µ, so that you don’t need to use ⟨X⟩ as an estimator for µ. In such cases [X]d is an unbiased estimator of the width of the distribution.
Remember, to construct an unbiased estimator of the width: If you need to estimate the mean, you need N−1 in the denominator (equation 21) whereas if you already have a non-estimated value for the mean, you need N in the denominator (equation 23).
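The three definitions can be written out side by side, as in the sketch below. The function names are my own, and returning NaN for N=1 in the bias-corrected case is a choice made here to represent the indeterminate 0/0; a library might instead raise an error.

```python
import math


def sample_mean(xs):
    """Equation 20: the sample mean."""
    return sum(xs) / len(xs)


def sd_corrected(xs):
    """Equation 21: estimated mean, N-1 in the denominator."""
    n = len(xs)
    if n < 2:
        return math.nan  # 0/0: the width cannot be estimated from one observation
    m = sample_mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))


def sd_uncorrected(xs):
    """Equation 22: estimated mean, N in the denominator."""
    m = sample_mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def sd_known_mean(xs, mu):
    """Equation 23: known distribution mean mu, N in the denominator."""
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]  # made-up observations
    print("eq 21:", sd_corrected(data))
    print("eq 22:", sd_uncorrected(data))
    print("eq 23 with mu = 2.5:", sd_known_mean(data, 2.5))
    print("single observation:", sd_corrected([5.0]), sd_uncorrected([5.0]))
```

In the standard library, `statistics.stdev` corresponds to equation 21 and `statistics.pstdev` to equation 22.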
The modern approach is to use uncertainty as a catch-all term. I recommend this approach. Sometimes it is useful to separate out various contributions to the overall uncertainty ... and sometimes not.
A few common sources of uncertainty include:
The first three items on this list are often present in real-world measurements, sometimes to a nontrivial and irreducible degree. In contrast, the last two items are equally applicable to purely theoretical quantities and to experimentally measured quantities.
Neither readability nor roundoff error is usually considered an “irreducible” source of experimental error, since they can usually be reduced by redesigning the experiment.
As an example of statistical fluctuations, suppose you have a tray containing 1000 coins. You randomize the coins, and count how many “heads” turn up. Suppose the first time you do the experiment, you observe x1 = 511, the second time you observe x2 = 493, et cetera.
There are several points we can make about this. First of all, there is no uncertainty of measurement associated with the individual observations x1, x2, etc. after they have been carried out. These are exact counts. On the other hand, if you want to describe the entire distribution X = {xi} from which such outcomes are drawn, it has some mean and some standard deviation. Similarly if you want to predict the outcome of the next observation, there will be some uncertainty. For fair coins, we expect x = 500±16 based on theory, so this is not necessarily an “experimental” uncertainty, unless you want to consider it a Gedanken-experimental uncertainty. If you do the actual experiment with actual coins, then experimental uncertainty would be the correct terminology.
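The figure of 500±16 follows from the standard binomial formulas, as this short sketch shows. It assumes fair, independent coins; the function name is made up.

```python
import math


def binomial_mean_and_sd(n, p=0.5):
    """Mean n*p and standard deviation sqrt(n*p*(1-p)) of the number of heads."""
    return n * p, math.sqrt(n * p * (1 - p))


if __name__ == "__main__":
    mean, sd = binomial_mean_and_sd(1000)
    print(f"expected heads: {mean:.0f} ± {sd:.1f}")
```

The observed counts 511 and 493 are each within one standard deviation of the mean, which is entirely unremarkable.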
See section 9.7 for more on this.
In some contexts (particularly in electronics), the statistical fluctuations of a counting process go by the name of shot noise.
As an example of roundoff error unrelated to measurement error, consider rounding off the value of π or the value of 1/81:
| (24) |
| (25) |
The point is that neither π nor 1/81 has any uncertainty of measurement. In principle they are known exactly, yet when we express them as a decimal numeral there is always some degree of roundoff error. This error is not statistical and is 100% reproducible, in the sense that whenever you round off π to five decimal places you get 3.14159 every time.
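The reproducibility of roundoff error is easy to check by computer. The sketch below uses double-precision floating point as a stand-in for the exact values, which is an approximation in its own right but good to roughly 16 digits, far more than the five decimal places being kept. The function name is hypothetical.

```python
import math


def roundoff_error(value, places):
    """Rounded value minus original value, for rounding to `places` decimal places."""
    return round(value, places) - value


if __name__ == "__main__":
    for name, value in [("pi", math.pi), ("1/81", 1 / 81)]:
        print(f"{name}: rounded = {round(value, 5)}, error = {roundoff_error(value, 5):+.3e}")
```

Run it as many times as you like; the error is the same every time, and it never exceeds half a count in the last place kept.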
Consider the celebrated series expansion
| (26) |
This is a power series, in powers of x. (Digression: In a certain weak sense, equation 24 and equation 25 can also be considered power series, in powers of 10, since for each digit the place-value is tenfold smaller than for the previous digit.)
There are many situations in science where it is necessary to use a truncated series, perhaps because the higher order terms are unknown in principle, or simply because it would be prohibitively expensive to evaluate them. Such situations arise in mathematical analysis and in numerical simulations.
Every time you use a truncated series you introduce some error into the calculation. In an iterative calculation, such errors can add up, and can easily reach troublesome levels.
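To see how truncation error behaves, here is a sketch that uses the geometric series 1/(1−x) = 1 + x + x² + ⋯ as a stand-in. That choice is an assumption made for illustration; it is not necessarily the series in equation 26. The function names are made up.

```python
def geometric_partial_sum(x, n_terms):
    """Sum of the first n_terms terms of 1 + x + x^2 + ... (valid for |x| < 1)."""
    return sum(x ** k for k in range(n_terms))


def truncation_error(x, n_terms):
    """Exact value 1/(1-x) minus the truncated series."""
    return 1 / (1 - x) - geometric_partial_sum(x, n_terms)


if __name__ == "__main__":
    x = 0.5
    for n_terms in (1, 2, 3, 5, 10, 20):
        print(f"{n_terms:2d} terms: error = {truncation_error(x, n_terms):.3e}")
```

For this particular series the error after n terms is x^n/(1−x), so it shrinks quickly when x is small and slowly when x is close to 1. Like roundoff, it is completely reproducible rather than statistical.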
In Appendix D of TN1297 (reference 3) you can find a discussion of some commonly-encountered terms for various contributions to the overall uncertainty, and various related notions. I will now say a few words about some of these terms.
A tolerance serves somewhat as the mirror image of uncertainty of measurement. Tolerances commonly appear in recipes, blueprints, and other specifications. They are used to specify the properties of some manufactured (or about-to-be manufactured) object. Each number on the specification will have some stated tolerance; for example, in the expression 5.000 ± .003 the tolerance is ± .003. The corresponding property of the finished object is required to be within the stated tolerance-band; in this example, greater than 4.997 and less than 5.003.
The idea of tolerance applies to a process of going from numbers to objects. This is the mirror image of a typical scientific observation, which goes from objects to numbers.
The notation is somewhat ambiguous, since tolerance is expressed using exactly the same notation as used to express the uncertainty of a measurement. Same notation, different concept. We speak of the tolerance of a specification, the uncertainty of an observation. (The distinction is not entirely clear-cut, because sometimes tolerance in the specification might result in uncertainty about the properties of the finished object.)
Significance is related to uncertainty, but definitely not the same thing. The significance of data depends not just on the data but also on what you intend to do with the data. Value judgements are involved. For example, if I buy a pound of beans, it may contain a great number of small beans, or a lesser number of larger beans. If desired, I could determine the number of beans with essentially zero uncertainty, simply by counting. However, if I just intend to cook and eat the beans, the cost of counting them exceeds the value of knowing the count. The total mass is more significant than the count (unless the count is wildly large or wildly small).
In summary: according to one definition, we say data is significant if it is worth knowing.
Formerly it was fashionable to use the term “significance” as a general-purpose antonym for uncertainty. Nowadays experts generally avoid the term “significance” and instead concentrate on quantifying the uncertainty.
Here’s how it often works in practice: Before attempting to measure something, you ought to identify one or two significant applications of the data ... not all applications, just enough to convince yourself that the measurement will be worth doing. That is, you establish a rough lower bound on the significance of the data.
After the measurement has been made, you need not (and probably should not) say much about the significance. It’s not up to you to judge the significance; the people who use your data will decide for themselves what’s significant for each particular application.
This explains why in (say) a compendium of fundamental constants, there is much discussion of uncertainty but almost no mention of significance.
When only a single number is involved, and only a single application, it is sometimes tempting to arrange things so that the uncertainty is well matched to the inverse of the significance. However, that is nowhere near being a reliable general rule. It is a common mistake to confuse significance with certainty ... but commonness does not make it any less of a mistake. As a first counterexample, as mentioned above, “bean counting” is proverbial for having low significance, despite its low uncertainty. At the other extreme, highly uncertain data may be highly significant; the significance can sometimes be made manifest by signal averaging or other data-reduction techniques, as exemplified in section 3.3. Various combinations are summarized in figure 7.
Nowadays experts generally avoid using the term “precision” except in a vague, not-very-technical sense, and concentrate instead on quantifying the uncertainty.
Multiple conflicting meanings of “precision” can be found in the literature.
One rather common meaning corresponds roughly to “an empirical estimate of the uncertainty”. That is, suppose we have a set of data that is empirically well described by a probability distribution with a half-width of 0.001; we say that data has a precision of 0.001. Alas that turns the commonsense meaning of precision on its head; it would be more logical to call the half-width the imprecision, because a narrow distribution is more precise.
For more discussion of empirical estimates of uncertainty, see section 9.7.
It is amusing to note that Appendix D of TN1297 (reference 3) pointedly declines to say what precision is, “because of the many definitions that exist for this word”. Apparently “precision” cannot be defined precisely.
Similarly, it says that accuracy is a “qualitative concept”. Apparently “accuracy” cannot be defined accurately.
This is particularly amusing because non-experts commonly make a big fuss about the distinction between accuracy and precision. A better strategy is to talk about the actual uncertainty versus an empirical estimate of the uncertainty, as discussed in section 9.7.
The term “accuracy” suffers from multiple inconsistent definitions.
One of the most-common meanings is as a general-purpose antonym for uncertainty. Nowadays experts by-and-large use “accuracy” only in an informal sense. For careful work, they focus on quantifying the uncertainty. For more on this, see section 9.7.
On a digital instrument, there are only so-many digits. That introduces some irreducible amount of roundoff error into the reading. This is one contribution to the uncertainty.
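To make the roundoff contribution concrete, here is a small simulation, assuming a hypothetical display with a resolution of 0.01 and true values spread evenly over many counts. Under those assumptions the roundoff error is uniformly distributed over one count, so the worst case is half a count and the RMS value is the resolution divided by √12.

```python
import math
import random

def quantize(value, resolution):
    """Round a value to the nearest multiple of the display resolution."""
    return round(value / resolution) * resolution

rng = random.Random(0)          # fixed seed so the sketch is repeatable
resolution = 0.01               # hypothetical: display shows two decimals
true_values = [rng.uniform(1.0, 2.0) for _ in range(100_000)]
errors = [quantize(v, resolution) - v for v in true_values]

max_error = max(abs(e) for e in errors)
rms_error = math.sqrt(sum(e * e for e in errors) / len(errors))

print(max_error, resolution / 2)              # worst case vs. half a count
print(rms_error, resolution / math.sqrt(12))  # RMS vs. uniform-distribution value
```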
A burette is commonly used as an almost-digital instrument, because of the discreteness of the drops. Drop formation introduces quantization error.
On an analog instrument, sometimes you have the opportunity to interpolate between the smallest graduations on the scale. This reduces the roundoff error, but introduces other types of uncertainty, due to the vagaries of human perception. You also have to ask whether you should just replace it with an instrument with finer graduations.
As another example, suppose you are determining the endpoint of a titration by watching a color-change. This suffers from the vagaries of human perception. Often, determining the color-change point is the dominant source of uncertainty; interpolating between graduations on the burette won’t help, and using a more finely graduated burette won’t help. In this case, if more resolution is needed, you might consider using a photometer to quantify the color change, and if necessary use curve fitting to make best use of the photometer data.
The ultimate limit – the fundamental limit – to readability is noise. If the reading is hopping around all over the place, roundoff error is not the dominant contribution to the noise. Interpolating and/or using a finer scale won’t help.
Roughly speaking, errors can be classified as follows:
| Non-systematic errors are random, with a well-behaved distribution, and will average out if you take enough data. | Systematic errors don’t average out. |
| This classification leaves open a nasty gray area when there are random errors that don’t average out, as discussed below. This is a longstanding problem with the terminology, and with the underlying concepts. |
For example: Suppose you measure something using an instrument that is miscalibrated, and the miscalibration is large compared to the empirical scatter that you see in your readings. As far as anybody can tell, today, your results are reproducible, because there is no scatter in the data … yet next month we may learn that your colleagues – using a different instrument – are not able to reproduce your results.
So the question is, how do we describe this situation? The fundamental issue is that there are multiple contributions to the uncertainty. As usual, it should be possible to describe this in statistical terms.
We are in some formal sense “uncertain” as to how well your instrument is calibrated, and we would like to quantify that uncertainty. There is, at least in theory, an ensemble of instruments, some of which are calibrated, and some of which are miscalibrated in various ways, with a horribly abnormal distribution of errors. Your instrument represents an example drawn from this ensemble. Since you have drawn only one example, you have no empirical way of estimating the properties of this ensemble. So we’ve got a nasty problem. There is no convenient empirical method for quantifying how much overall uncertainty attaches to your results.
When we take a larger view, the situation becomes slightly clearer. Your colleagues have drawn additional examples from the ensemble of instruments, so there might be a chance of empirically estimating the distribution of miscalibrations.
However, the empirical approach will never be entirely satisfactory, because even including the colleagues, a too-small sample has been drawn from the ensemble of instruments. If there is any nontrivial chance that your instrument is significantly miscalibrated, you should recalibrate it against a primary standard, or against some more-reliable secondary standard. For instance, if you are worried that your meter stick isn’t really 1 m long, take it to a machine shop. Nowadays they have laser interferometers on the beds of the milling machines, so you can reduce the uncertainty about your stick far beyond what is needed for typical purposes.
The smart way to proceed is to develop a good estimate of the reliability of the instrument, based on considerations such as how the instrument is constructed, whether two instruments are likely to fail in the same way, et cetera. This requires thought and effort, far beyond a simple histogram or scatter-plot of the data.
Also keep in mind that sometimes it is possible to redesign the whole experiment to measure a dimensionless ratio, so that calibration factors drop out. As a famous example, the ratio of (moon mass)/(earth mass) is known vastly better than either mass separately. (The uncertainty of any measurement of either individual mass would be dominated by the uncertainty in Newton’s constant of universal gravitation.)
It is possible to make an empirical measurement of the scatter in your data, perhaps by making a histogram of your data and measuring the width. However, the point remains that this provides only a lower bound on the true uncertainty of your results. This may be a tight lower bound, or it may be a serious underestimate of the true uncertainty. You can get into trouble if there are uncontrolled variables that don’t show up in the histogram. This can happen if you have inadvertently drawn a too-small sample of some variables.
Also beware that although many types of random error will average out if you take enough data, other types will not. Consider the contrast:
| If your measuring instrument has an offset, and the offset is undergoing an unbiased random walk, then we can invoke the central limit theorem to convince ourselves that the average of many measurements will converge to the right answer. | If the offset in your measuring instrument is undergoing a biased random walk, there will be an overall rate of drift, and the longer you sit there taking measurements the more the drift will accumulate. As another example, if the offset suffers from noise with a badly-behaved distribution, such as 1/f noise (“pink noise”), then the offset will never average out. (The statement of the central limit theorem has some important provisos, which are not satisfied in the case of 1/f noise.) |
Given any set of data, we can calculate the standard deviation of that data, as mentioned in section 9.3. This is a completely cut-and-dried mathematical operation on the empirical data. It gives a measure of the scatter in the data.
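For instance, with Python's standard library the calculation is a one-liner. The readings below are made-up values for illustration. `statistics.stdev` computes the sample standard deviation (dividing by n − 1); `statistics.pstdev` would give the population version instead.

```python
import statistics

readings = [9.8, 10.1, 10.0, 9.9, 10.2]   # hypothetical repeated readings

mean = statistics.mean(readings)
scatter = statistics.stdev(readings)      # sample standard deviation (n - 1)

print(mean, scatter)
```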
Things become much less clear when we try to make predictions based on the observed scatter. It would be nice if we could predict how well our data will agree with future measurements of the same quantity ... but this is not always possible, and is never cut-and-dried, because there may be sources of uncertainty that don’t show up in the scatter.
Note that what we have been calling “scatter” is conventionally called the “statistical” uncertainty. Alas, that is something of a misnomer, for the simple reason that virtually anything can be considered “statistical” in some sense. (Even absolute truth is statistical, with 100% probability of correctness, and falsity is statistical, with 0% probability of correctness.) It might be slightly better to call it an empirical estimate or even better an internal estimate of the uncertainty. The informal term scatter is as good as any.
Niels Bohr once said “Never express yourself more clearly than you are able to think”. By that argument, it is not worth coming up with a super-precise name for the concept we are discussing, because it is not a super-precise concept. It depends on the details of how the experiment is done. If one group uses an ensemble of voltmeters, and a second group uses only a single voltmeter, calibration errors will show up as scatter in the first group’s results but not in the second’s.
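The voltmeter scenario can be sketched with a toy simulation. All the numbers here (true voltage, noise level, spread of calibration offsets) are invented for illustration, and the calibration offsets are assumed to be Gaussian merely for convenience; real ensembles may be far less well behaved.

```python
import random
import statistics

rng = random.Random(1)
TRUE_VOLTAGE = 5.000      # hypothetical true value (volts)
READ_NOISE = 0.002        # hypothetical reading-to-reading noise (volts)
CAL_SPREAD = 0.050        # hypothetical spread of calibration offsets (volts)

def new_meter_offset():
    """Draw one meter's fixed calibration offset from the ensemble."""
    return rng.gauss(0.0, CAL_SPREAD)

def read(offset):
    """One reading: true value, plus fixed offset, plus random noise."""
    return TRUE_VOLTAGE + offset + rng.gauss(0.0, READ_NOISE)

# Group 1: a different meter for every reading.
group1 = [read(new_meter_offset()) for _ in range(200)]

# Group 2: one meter, used for every reading.
single_offset = new_meter_offset()
group2 = [read(single_offset) for _ in range(200)]

scatter1 = statistics.stdev(group1)
scatter2 = statistics.stdev(group2)
print(scatter1, scatter2)
```

In this sketch the second group's scatter reflects only the reading noise, even though its results carry the same calibration uncertainty as the first group's. The small scatter is a lower bound on the uncertainty, nothing more.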
An oversimplified view of the relationship between scatter and systematic error is presented in figure 8. In all four parts of the figure, the black data points are essentially the same, except for scaling and/or shifting. Specifically: In the bottom row the spacing between points is 3X larger than the spacing in the top row, and in the right-hand column the pattern is off-center, i.e. shifted to the right relative to where it was in the left-hand column.
The data is a 300-point sample drawn from a two-dimensional Gaussian distribution. That is, the density of points falls off exponentially as a function of the square of the distance from the center of the pattern.
Figure 8 is misleading because it suggests that you can with one glance estimate how much the centroid suffers from systematic error. In contrast, in the real world, it is very very hard to get a decent estimate of this. You can’t tell at a glance how far the data is from the target, because you don’t know where the target is. (If you knew the location of the target, you wouldn’t have needed to take data.) The real-world situation is more like figure 9.
Remark: Terminology: Sometimes people use the word “precision” to mean the lack of scatter, and use the word “accuracy” to mean, roughly speaking, the lack of systematic error of the centroid. It is, alas, hard to quantify these terms, as discussed in section 9.3 and section 9.4.
Here’s another issue: Sometimes people imagine there is a clean dichotomy between precision and accuracy, or between scatter and systematic error ... but this is not right. Scatter is not the antonym or the alternative to systematic error. There can perfectly well be systematic errors in the scatter!
In particular, moving left-to-right in figure 8 illustrates a systematic offset of the centroid. In contrast, moving top-to-bottom in figure 8 illustrates a systematic 3x increase of the standard deviation.
Here’s how such issues can arise in practice: Suppose you want to measure the Brownian motion of a small particle. If the raw data is position, then the mean position is meaningless and the scatter in the data tells you everything you need to know. If you inadvertently use a 10x microscope when you think you are using a 30x microscope, that will systematically decrease the scatter by a factor of 3.
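A toy calculation shows the effect of the magnification mix-up. The particle positions are invented, and the variable names are illustrative only.

```python
import statistics

# Hypothetical true particle positions, in micrometers.
true_positions_um = [0.0, 1.2, -0.7, 2.1, -1.5, 0.4]

ACTUAL_MAG = 10     # magnification really in use
ASSUMED_MAG = 30    # magnification the experimenter believes is in use

# Positions in the image plane, then converted back using the wrong factor.
image_positions = [x * ACTUAL_MAG for x in true_positions_um]
inferred_positions = [y / ASSUMED_MAG for y in image_positions]

true_scatter = statistics.pstdev(true_positions_um)
inferred_scatter = statistics.pstdev(inferred_positions)
ratio = inferred_scatter / true_scatter

print(ratio)   # the inferred scatter is one third of the true scatter
```

Nothing in the inferred data looks wrong on its face; the histogram simply comes out narrower by a factor of 3.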
As another example in the same vein, imagine you want to measure the noise figure of a radio-frequency preamplifier. The raw data is voltage. The mean of the data is meaningless, and is zero by construction in an AC-coupled amplifier. The scatter in the data tells you everything you need to know.
On the other hand, in the last two examples, it might be more practical to shift attention away from the raw data to a slightly cooked (“parboiled”) representation of the data. In the Brownian motion experiment, let the parboiled data be the diffusion constant, i.e. the slope of the curve when you plot the square of the distance traveled versus time. Then we can talk about the mean and standard deviation of the measured diffusion constant.
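Here is a rough sketch of the parboiling step for a simulated one-dimensional random walk. It assumes the common convention that in one dimension the mean squared displacement is 2Dt, so the diffusion constant is half the slope; other conventions absorb the factor differently. The parameter values are arbitrary.

```python
import random

rng = random.Random(2)
D_TRUE = 0.5        # hypothetical diffusion constant (length^2 / time)
DT = 0.1            # time step
N_STEPS = 40
N_PARTICLES = 2000

step_sigma = (2 * D_TRUE * DT) ** 0.5   # 1-D step size so that <x^2> = 2 D t

# Accumulate squared displacement at every time step, summed over particles.
sum_sq = [0.0] * (N_STEPS + 1)
for _ in range(N_PARTICLES):
    x = 0.0
    for k in range(1, N_STEPS + 1):
        x += rng.gauss(0.0, step_sigma)
        sum_sq[k] += x * x

times = [k * DT for k in range(N_STEPS + 1)]
msd = [s / N_PARTICLES for s in sum_sq]

# Least-squares slope of a line through the origin: sum(t*y) / sum(t*t).
slope = sum(t * y for t, y in zip(times, msd)) / sum(t * t for t in times)
d_estimate = slope / 2      # in one dimension the slope is 2 D

print(d_estimate)
```

Repeating the whole procedure with different seeds would give a collection of `d_estimate` values, and it is the mean and standard deviation of that collection that one would report.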
Here’s a two-part constructive suggestion:
Scatter is one contribution to our uncertainty about the nominal value. The measured scatter provides a lower bound on the uncertainty. It tells you nothing about possible systematic offsets of the nominal value, and tells you nothing about possible systematic errors in the amount of scatter itself (as in the microscope example above).
When reporting the uncertainty, what really matters is the total, overall uncertainty. Breaking it down into separate contributions (scatter, systematic error, or whatever) is often convenient, but is not a fundamental requirement.
Quantifying the scatter is easy ... much easier than estimating the systematic errors in the mean and standard deviation. Do your best to estimate the total, overall uncertainty.
In an introductory class, students may not have the time, resources, or skill required to do a meaningful investigation of possible systematic errors. This naturally leads to an emphasis on analyzing the scatter ... but this emphasis should not become an overemphasis. Remember, the scatter is a lower bound on the uncertainty, and should be reported as such. There is nothing wrong with saying “We observed σX to be such-and-such. This provides a lower bound on the uncertainty of ⟨X⟩. There was no investigation of possible systematic errors”.
Remark: Notation: Sometimes you see a measurement reported using an expression of the form A±B±C, where A is the nominal value, B is the observed scatter, and C is an estimate of the systematic error of the centroid. This notation is not very well established, so if you’re going to use it you should be careful to explain what you mean by it.
The title of this section is in scare quotes, because you should be very wary of using the term “experimental error”. The term has a couple of different meanings, which would be bad enough ... but then each meaning has problems of its own.
By way of background, note that the word “error” has the same ancient roots as the word “errand” or “knight errant”, referring to wanderings and excursions, including ordinary normal excursions. However, for thousands of years, the word “error” has also denoted faults, mistakes, or even deceptions, which are all undesirable, i.e. reprehensible things that “should” have been avoided.
Sometimes the term “experimental error” is applied to unavoidable statistical fluctuations, and sometimes it is applied to avoidable mistakes and blunders. These two meanings are dramatically different. They are both problematic, but for different reasons:
Consider the contrast:
| Negative example: Saying “our result differs from the accepted value by 15% due to experimental error” is not an explanation. Often graders, reviewers, and/or editors will automatically reject a report that contains such a statement. | In contrast, you might get away with using “Experimental Error” as the headline of a section in which the specific sources of error were analyzed. Even that is not recommended; a better headline would be “Sources of Uncertainty” or some such. |
Beware that when you have more than one uncertain quantity, you cannot rely on the notion of giving each quantity a “nominal value” with some “uncertainty”. The problem is that if you have a multi-dimensional probability distribution, there will almost always be correlations, in which case you cannot describe the distribution using two numbers per dimension, even if we restrict attention to Gaussian normal distributions … except special cases as discussed below.
An example of correlated data is shown in figure 4 as discussed in section 4.5.
More generally, in D dimensions, a Gaussian can be described using a vector with D components (to describe the center of the distribution) plus a symmetric D×D matrix (to describe the uncertainties). That means you need, in general, D+D(D+1)/2 numbers to describe the Gaussian. In the special case where the uncertainties are all uncorrelated, the matrix is diagonal, so you only need 2D numbers to describe the whole Gaussian, and we recover the simple description in terms of “nominal value ± uncertainty” for each dimension separately. Obviously D=1 is a sub-case of the uncorrelated case.
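The parameter count is easy to tabulate. The helper below is just a restatement of the formulas in the preceding paragraph; its name is an illustrative choice.

```python
def gaussian_parameter_count(d, correlated=True):
    """Numbers needed to specify a d-dimensional Gaussian."""
    if d < 1:
        raise ValueError("d must be a positive integer")
    center = d                                       # one component per dimension
    spread = d * (d + 1) // 2 if correlated else d   # symmetric matrix vs. diagonal
    return center + spread

for d in (1, 2, 3, 10):
    print(d, gaussian_parameter_count(d), gaussian_parameter_count(d, correlated=False))
```

For D=1 the two counts coincide, and the gap grows quadratically with D, which is why the “two numbers per dimension” description breaks down so quickly.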
In the real world, sometimes the uncertainties are uncorrelated, but sometimes they are not. See section 3.5 and section 4.5 for examples where correlations must be taken into account. See section 3.5 for an example of how you can handle correlated data.
Also, beware that not everything is Gaussian. Other distributions – including square, triangular, and Lorentzian among others – can be described using two parameters, and represented using the “value” ± “uncertainty” notation. More-complicated distributions may require more than two parameters.
If you know that your data has correlations or has a non-normal distribution, be sure to say so explicitly.
Mathematicians define probability in terms of measure theory. I realize most people don’t know much about measure theory, but it doesn’t take very long to do it from scratch. Here are the crucial axioms:
In principle, that’s all there is to it.
Remark: By definition, a probability measure is bounded above, bounded by “some” constant. It is conventional and usually convenient to rescale everything so that the bounding constant is unity ... but this is not mandatory, and not always convenient.
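A tiny sketch may help make the remark about rescaling concrete. The weights are invented, and this is not a reconstruction of any particular program; it only illustrates that an unnormalized measure is already additive, and that normalization can be deferred to the very end.

```python
def normalize(weights):
    """Rescale positive weights so they sum to one."""
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Hypothetical unnormalized measure over three outcomes.
weights = {"a": 2.0, "b": 6.0, "c": 12.0}

# Additivity holds without any normalization.
measure_a_or_b = weights["a"] + weights["b"]

# Normalizing only at the end gives ordinary probabilities.
probs = normalize(weights)

# Rescaling the whole measure by a positive constant changes nothing afterward.
rescaled = {k: 1000.0 * w for k, w in weights.items()}
probs_rescaled = normalize(rescaled)

print(measure_a_or_b, probs, probs_rescaled)
```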
Amusing story: I once coauthored an optical character recognition (OCR) program that used a lot of machine learning. Most of the code was devoted to calculating probabilities. The first version of the program did not produce satisfactory results.
After thinking about the measure-theoretic foundations of what we were trying to do, I decided to remove one line of code, namely a line that normalized the probabilities early in the calculation. This created a new version of the program. It did all the same calculations, except that now the probabilities were un-normalized. The probabilities were positive, properly additive, and bounded ... but not scaled to unity. Very late in the calculation we were able to construct a normalized probability, but all the steps leading up to that point – thousands of lines of code – relied on unnormalized probabilities. A lot of smart people had tried to do the job using conventional normalized probabilities, but they never got it to work right, and I doubt it is possible.
In industry, programmer productivity is measured by the number of lines of code written per year. On the OCR project, my most important contribution consisted of minus one lines of code.
In the short term, I don’t much care whether you fully understand the measure-theoretic definition of probability.
In the short term, I am content to make the point that mathematicians have a very precise, very general, very powerful notion of probability. It is not necessarily associated with any notion of randomness! You should not think that probability is sloppy, or that probability is what we do when we don’t know exactly what is going on. Measure theory forms the foundation of a great deal of modern mathematics, including (among other things) the modern definition of what an integral is. I’m talking about serious, industrial-strength rigor and formality.
We can use figure 10 to help get a handle on the main ideas. In this example, we choose to measure sets according to their area. The disk as a whole represents the universal set, and has 100% of the measure. The blue sector has 25% of the measure, while the red region has the other 75% of the measure. We can equally well say that the blue region has 25% of the probability, while the red region has the other 75% of the probability.
This measure-theoretic notion of probability does not require any notion of randomness. On the other hand, if we choose, we may dress it up with a statistical interpretation, as follows: Treat the disk as a dart board, and throw darts at it randomly. When a dart hits the board, ideally there is a 25% chance that it hits the blue region, and a 75% chance that it hits the red region.
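The dart-board interpretation can be tried out numerically. This sketch assumes the blue sector is the quarter of the disk between 0 and 90 degrees (the actual orientation in the figure does not matter) and that darts land uniformly over the disk.

```python
import math
import random

rng = random.Random(3)

def throw_dart():
    """Return a point uniformly distributed over the unit disk."""
    while True:                       # rejection sampling from the enclosing square
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1.0:
            return x, y

def in_blue_sector(x, y):
    """Assumed geometry: the blue sector spans angles from 0 to 90 degrees."""
    angle = math.atan2(y, x) % (2 * math.pi)
    return angle < math.pi / 2

n_throws = 20_000
hits = sum(in_blue_sector(*throw_dart()) for _ in range(n_throws))
blue_fraction = hits / n_throws

print(blue_fraction)   # should be close to the area-based measure, 0.25
```

The area-based measure (exactly 25%) involves no randomness at all; the simulated fraction merely approximates it.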
The notion of additivity in the axioms of probability measure corresponds to the statistical notion that the probability of mutually exclusive events is additive. That is, the probability of “A or B” is the probability of A plus the probability of B, provided A and B are mutually exclusive, i.e. they cannot both occur.
By way of background, note that for most of the last 2000 years, people thought that Euclidean geometry was “the” geometry of the universe. Euclidean geometry was elegant and mathematically precise, and most people were content to leave it at that. Then in the mid-1800s there finally emerged a clear understanding of non-Euclidean geometries. In 1908 (reference 14) it became clear that the geometry of the universe was in fact non-Euclidean. What’s more, in 1915 (reference 15) it became clear that the local geometry in any given part of the universe cannot be determined by abstract mathematical postulates alone. The actual geometry is determined by physics. It must be measured, and/or calculated based on other measured quantities.
A similar situation exists with respect to probability. According to one school of thought, there exists an intuitive notion of probability predicated on “the” state of nature, i.e. “the” probability distribution that describes the natural universe. This notion is in some ways very powerful, but alas it has some troublesome limitations. It is better to proceed as follows: Just as there are innumerable non-Euclidean geometries, there are innumerable probability measures. Anything that satisfies the axiomatic definition of probability measure should be considered a probability measure.
In particular, suppose you have a sample of data drawn from some parent distribution. That data is a probability distribution unto itself. You can represent it using a histogram (as in figure 2) or as a pie chart (as in figure 10) or whatever. Remember that according to the axioms, probability does not have to be random or statistical; it is just a way of assigning measure to things. Loosely speaking, it is just a way of dividing the pie (as in figure 10). Your sample, once drawn, has no remaining randomness. The data points sit where they sit, immutable forevermore.
If you draw another sample, it constitutes another probability distribution, another way of dividing the pie. We get to ask to what degree these two distributions resemble each other, and/or resemble the parent distribution. In contrast, you should not think of the parent distribution as being “the” only probability distribution, or think of the samples as being somehow not quite real probability distributions. In fact they are perfectly good distributions, they’re just not the same distribution.
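To illustrate the idea that each sample is a distribution in its own right, the sketch below treats a sample as a measure that assigns weight 1/N to each data point. The parent distribution (a unit Gaussian) and the sample size are arbitrary choices for illustration.

```python
import random

def empirical_measure(sample, predicate):
    """Measure of a set = fraction of sample points satisfying the predicate."""
    return sum(1 for x in sample if predicate(x)) / len(sample)

rng = random.Random(4)
# Two samples from the same hypothetical parent (Gaussian, mean 0, sigma 1).
sample_a = [rng.gauss(0.0, 1.0) for _ in range(50)]
sample_b = [rng.gauss(0.0, 1.0) for _ in range(50)]

neg_a = empirical_measure(sample_a, lambda x: x < 0)
nonneg_a = empirical_measure(sample_a, lambda x: x >= 0)
neg_b = empirical_measure(sample_b, lambda x: x < 0)

print(neg_a, nonneg_a, neg_b)
```

Each sample divides the pie in its own way: the measures of the negative and non-negative parts always add up to one within a given sample, but `neg_a` and `neg_b` need not equal each other, nor the parent's value of 0.5.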
Curve fitting and similar forms of data analysis are best considered a search through the space of probability distributions, trying to find a model-distribution that best agrees with your data-distribution (and with other information you have about the task). The same goes for much of machine learning, as in the OCR example mentioned above.
See section 12 for a discussion of the mathematical notion of place value and significance.
As discussed in section 2, there is an important distinction between a distribution and an observation drawn from that distribution. An expression of the form 12.3±0.5 clearly refers to a distribution. One problem with the whole idea of significant figures is that in an expression such as x=12.3, you can’t tell whether it is meant to describe a particular observation or an entire distribution over observations.
A chemistry teacher once asked 1000 colleagues the following question:
Consider an experiment to determine the density of some material: mass = 10.065 g and volume = 9.95 mL Should the answer be reported as 1.01 g/mL or 1.011 g/mL?
Soon another teacher replied
Maybe I missed something, that’s a very straightforward problem. The answer should be reported as 1.01 g/mL.
The claim was that since one of the givens is only known to three sig figs, the answer should be reported with only three sig figs, strictly according to the sig-figs rules.
Shortly thereafter, a third teacher chimed in, disagreeing with the previous answers and saying that the answer should be reported as 1.011 g/mL. The argument started by asserting that the aforementioned digit-counting rules were «simplistic» and should be discarded in favor of calculations involving relative uncertainty. The final answer, however, was expressed in terms of sig figs.
Eventually a fourth teacher pointed out that if you do the math carefully, you find that 1.012 is a better answer than either of the choices offered in the original question.
Remarkably, none of these responses attached an explicit uncertainty to the answer. So we don’t know whether 1.01 means 1.01(½) or 1.01(5). It’s ambiguous.
At this point you may be wondering whether this ambiguity is the whole problem. Perhaps we should accept all three answers – 1.01(½), 1.011(5), and 1.012(5) – since they are all close together, within the stated error bars.
Well, sorry, that doesn’t solve the problem. First of all, the ambiguity is a problem unto itself, and secondly there is a deeper problem that should not be swept under the rug of ambiguity.
The deeper problem is that if you solve the problem properly – for instance using the “crank three times” method as described in section 3.4 – you find it might be reasonable to report a density of 1.0116(5) g/mL. (In this paragraph, and the next few paragraphs, we assume the mass and volume started out with a half-count of uncertainty, such as might result from roundoff.)
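The arithmetic can be checked with a few lines of code. Section 3.4 is not reproduced here, so this sketch assumes that “cranking three times” means evaluating the formula once at the nominal inputs and once at each extreme of the inputs; the half-count uncertainties are the ones assumed in the paragraph above.

```python
def density(mass_g, volume_ml):
    """Density in g/mL."""
    return mass_g / volume_ml

mass, d_mass = 10.065, 0.0005        # half a count in the last digit
volume, d_volume = 9.95, 0.005       # half a count in the last digit

nominal = density(mass, volume)
low = density(mass - d_mass, volume + d_volume)    # smallest value in range
high = density(mass + d_mass, volume - d_volume)   # largest value in range
half_width = (high - low) / 2

print(f"{nominal:.5f}  [{low:.5f}, {high:.5f}]  half-width {half_width:.5f}")
```

The nominal value rounds to 1.0116 and the half-width comes out near 0.0005, with almost all of it coming from the volume, since the mass is known about ten times better in relative terms.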
Figure 11 is a histogram of the probability distributions. It uses the same principles as figure 3 does. You can see at a glance that the answer based on the sig figs rules, namely 1.01(½), bears hardly any resemblance to the correct answer. The approximately-similar answers of 1.01(½), 1.011(5) and 1.012(5) are all equally terrible, so we see that appealing to ambiguity does not even begin to solve the problem.
The second answer that was offered was 1.011. If we are generous and interpret that as 1.011(½), it’s not completely crazy, but it’s not very good, either. It has less than 50% overlap with the correct answer, as you can see in figure 11. (If we are ungenerous and interpret it as 1.011(5), it’s terrible, as previously discussed.)
The third answer (namely 1.012) is somewhat better. It is not shown in the figure. If we are generous and interpret it as 1.012(½), it has more than 50% overlap with the correct answer, but still considerably less than 100% agreement. That is, it has been noticeably degraded by roundoff errors.
Therefore it is much better to report 1.0116(5). This complies with the recommendations in section 4.2: it uses few enough digits to be reasonably convenient, it uses many enough digits to keep the roundoff errors from causing problems, and it states the uncertainty separately and explicitly.
To make the discussion more complete, we now switch assumptions and assume that the given mass and volume started out with five counts of uncertainty in the last decimal place, such as might result from the sensible laboratory practice of recording all the certain digits plus one estimated digit.
Under the new assumptions, 1.012(5) is the best answer, although 1.011(5) and even 1.01(½) are entirely acceptable. All three answers agree within the stated error bars.
Note that no matter which assumption you make, it is hard to justify the unadorned answer 1.01 – which is the answer that comes directly from applying the sig figs rules. If the givens have a half-count of uncertainty, 1.01 does not have a half count of uncertainty ... and if the givens have a few counts of uncertainty, 1.01 does not have a few counts of uncertainty. There is no way that 1.01 can be a good answer if you want the rules to be self-consistent.
Recall that uncertainty is not the same as insignificance; see section 3.3, section 4.7, and section 8, especially figure 7 in section 9.2.
The usual “sig figs rules” cause you to round things off far too much. If possible, do not round intermediate results at all. If you must round, keep at least one guard digit.
As an illustration of the harm that “sig figs” can cause, let’s re-do the calculation in section 3.8. The only difference is that when we compute the quotient, 11.5136, we round it to two digits ... since after all it was the result of an operation involving a two-digit number. That gives us 12, from which we subtract 9.064 to obtain the final “result” ... either 2.9 or 3. Unfortunately neither of these results is correct. Not even close.
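The damage done by that one rounding step is easy to reproduce. The inputs that produced the quotient belong to section 3.8 and are not repeated here, so the sketch starts from the quotient itself; `round_sig` is an illustrative helper, not a standard library function.

```python
def round_sig(x, digits):
    """Round x to the given number of significant digits."""
    return float(f"{x:.{digits}g}")

quotient = 11.5136
rounded_quotient = round_sig(quotient, 2)      # sig-figs rule: keep two digits

with_rounding = rounded_quotient - 9.064       # what the sig-figs rules deliver
without_rounding = quotient - 9.064            # what you get with no rounding

print(rounded_quotient, with_rounding, without_rounding)
```

Rounding the intermediate result shifts the final answer by nearly half a unit, which is a large fraction of the answer itself.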
Oddly enough, folks who believe in significant digits typically use them to represent uncertainty. Hmmmm. If they use significant digits to represent uncertainty, what kind of digits do they use to represent significance?
The “sig-digs rules” were never claimed to be precise. They are claimed to give a “rough” idea of the uncertainty of a quantity.
The problem is, this rough idea is very, very rough. Even ignoring the sectarian differences discussed in section 11.6, the “sig-digs rules” convey at best only a range of uncertainties. The top of the range has ten times more uncertainty than the bottom of the range. If you draw the graph of two distributions, one of which is tenfold lower and tenfold broader than the other, you will see that they don’t resemble each other at all. They are radically different distributions. Compare figure 3.
If you do your work even moderately carefully, you will know your uncertainty much more precisely than that, and you will need a way of expressing it. So don’t use significant figures. Instead, follow the guidelines in section 4.2.
Within the sig-digs cult, there are sects that hold mutually-incompatible beliefs. There is no consensus. You cannot get a group of teachers to agree within an order of magnitude what “significant figures” mean.
That makes a certain amount of sense when you are recording readings from laboratory apparatus and instruments. The point is that you want the quantization error (i.e. roundoff error) to be smaller than the intrinsic uncertainty of the instrument. You want the uncertainty of the recorded reading to be dominated by the intrinsic uncertainty of the instrument, and not needlessly increased by rounding.
As is always the case with any form of significant digits, we run into trouble because of the coarseness of the encoding; it is impossible to know by looking at the number how much uncertainty there is in the last digit.
Things get even worse when we consider calculated (rather than observed) numbers. For example, consider the quantity 5.123(9). Nine counts of uncertainty in the third decimal place not only makes the third place uncertain, it makes the second place “somewhat” uncertain. There is no logical basis for deciding how much uncertainty is “too much”, i.e. deciding when to drop a digit.
For present purposes, let’s assume that this sect puts the cutoff just shy of ten counts, so that 1.234(9) will be expressed as 1.234, while 1.234(10) will be rounded to 1.23. (We ignore sub-sects that put the cutoff elsewhere.)
This sect has the advantage, relatively speaking, of requiring less rounding than the other sects mentioned below ... but in absolute terms it still requires too much rounding. It can seriously degrade your data, as discussed in section 3.3.
This rule actually makes sense provided you know that the quantity has been rounded off, and that roundoff error is the dominant contribution to the uncertainty.
On the other hand, there are innumerable important situations where roundoff should not be the dominant contribution, in which case this is the worst of all the sects. It is the most destructive, because it demands the most rounding. It demands an order of magnitude more rounding than the few-count sect. It basically forces you to keep rounding off until the roundoff error becomes a large contribution to the uncertainty.
Let’s try applying these “rules” and see what happens. Some examples are shown in the following table.
| numeral as written: | 0.10 | 0.99 |
| multi-count sect: | 0.100(10) ⋯ 0.100(99) | 0.990(10) ⋯ 0.990(99) |
| percent sect: | 0.100(1) ⋯ 0.100(10) | 0.990(10) ⋯ 0.990(99) |
| half-count sect: | 0.100(5) | 0.990(5) |
| overall range: | 0.100(1) ⋯ 0.100(99) | 0.990(5) ⋯ 0.990(99) |
Let’s consider 0.10, as shown in the table. If we interpret 0.10 according to the multi-count sect’s rules, we get something in the range 0.100(10) to 0.100(99). Meanwhile, if we interpret it according to the percent sect’s rules, we get something in the range 0.100(1) to 0.100(10). Ouch! These two interpretations barely touch; they have nothing in common except the single endpoint 0.100(10). The half-count sect interprets 0.10 as 0.100(5), which is near the middle of the range favored by the percent sect.
Next, let’s consider 0.99. If we interpret 0.99 according to the multi-count sect’s rules, we get something in the range 0.990(10) to 0.990(99). Meanwhile, if we interpret it according to the percent sect’s rules and convert to professional notation, we get something in the range 0.990(10) to 0.990(99). So these two sects agree on the interpretation of this number. However, the half-count sect interprets 0.99 as 0.990(5), which is somewhere between 2x and 20x less uncertainty than the other sects would have you believe.
As shown in the bottom row of the table, when we take sectarian differences into account, there can be two orders of magnitude of vagueness as to what a particular number represents. If you draw the graph of two distributions, one of which is a hundredfold lower and a hundredfold broader than the other, the difference is shocking. It’s outrageous. You cannot possibly consider one to be a useful approximation to the other.
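The rows of the table can be reproduced with a few lines of Python. This is only a sketch: the function name `sect_ranges` is hypothetical, and the three rules are encoded the way I read them off the table (1 to 9.9 counts for the multi-count sect, relative uncertainty between 10^-N and 10^-(N-1) for an N-digit numeral in the percent sect, exactly half a count for the half-count sect). The table rounds 0.0099 to 0.990(10); the code does not.

```python
from decimal import Decimal

def sect_ranges(numeral: str) -> dict:
    """Return (low, high) absolute uncertainty implied by each reading of `numeral`."""
    value = Decimal(numeral)
    _, digits, exponent = value.as_tuple()
    count = Decimal(1).scaleb(exponent)   # one count in the last written place
    n_digits = len(digits)                # number of digits in the numeral as written

    ranges = {
        # assumed: between 1 count and just under 10 counts in the last place
        "multi-count": (count, Decimal("9.9") * count),
        # assumed: relative uncertainty between 10**-N and 10**-(N-1)
        "percent": (abs(value) * Decimal(10) ** -n_digits,
                    abs(value) * Decimal(10) ** -(n_digits - 1)),
        # assumed: exactly half a count in the last place
        "half-count": (count / 2, count / 2),
    }
    lows, highs = zip(*ranges.values())
    ranges["overall"] = (min(lows), max(highs))
    return ranges

for numeral in ("0.10", "0.99"):
    low, high = sect_ranges(numeral)["overall"]
    print(numeral, "overall:", low, "to", high, " ratio:", high / low)
```

For 0.10 the ratio between the largest and smallest implied uncertainty is 99, i.e. essentially two orders of magnitude.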
Let’s look again at the example of the six-sided die, as depicted in figure 1. The number of spots can be described by the expression x=3.5±2.5. There is just no good way to express this using significant figures. If you write x=3.5, those who believe in sig figs will interpret that as perhaps x=3.5(½) or x=3.5(5) or somewhere in between … all of which greatly understate the width of the distribution. If you round off to x=3, that would significantly misstate the center of the distribution.
As a second example, let’s look again at the result calculated in section 3.8, namely 2.4(8). Trying to express this quantity using sig digs would be a nightmare. If you write it as 2.4 and let the reader try to infer how much uncertainty there is, the most basic notions of consistency would suggest that this number has about the same amount of uncertainty as the two-digit number in the statement of the problem ... but in fact it has a great deal more, by a ratio of about eight to three. That is, any consistently-applied sig-digs rule understates the uncertainty of this expression. The right answer is about 260% of the “sig-figs answer”.
Note that the result 2.4(8) has eight counts of uncertainty in the last digit. Another way of saying the same thing is that there is 32% relative uncertainty. That’s so much uncertainty that if you adhere to the percent-sect (as defined in section 11.6) you are obliged to use only one significant digit. That means converting 2.4 to 2. That result differs from the correct value by 57% of an error bar, which is a significant degradation of your hard-won data, in the sense that the distribution specified by 2.45(79) is just not the same as a distribution centered on 2, no matter what width you attach to the latter.
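The two percentages quoted above follow directly from 2.45(79). A short Python check, using only numbers from the text (the variable names are illustrative):

```python
nominal = 2.45   # center of the distribution, from 2.45(79)
sigma = 0.79     # absolute uncertainty, i.e. one error bar

relative = sigma / nominal

rounded = 2.0    # what a one-significant-digit rule would write
shift_in_error_bars = (nominal - rounded) / sigma

print(f"relative uncertainty: {relative:.0%}")
print(f"shift from rounding : {shift_in_error_bars:.0%} of an error bar")
```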
So we discover yet again that the “sig-digs” approach gives us no reasonable way of expressing what needs to be expressed.
For some more-extreme examples of results that simply cannot be expressed in terms of sig figs, see section 11.12.
Consider the notion that one inch equals some number of centimeters. If you adhere to the sig-figs cult, how many digits should you use to express this number? It turns out that the number is 2.54, exactly, by definition. Unless you want to write down an infinite number of digits, you are going to have to give up on the idea of sig figs and express the uncertainty separately, as discussed in section 4.2.
Suppose you see the number 2.54 in the display of a calculator. How much significance attaches to that number? You don’t know! Counting digits will not tell you anything about the uncertainty. Calculators are notorious for displaying large numbers of insignificant digits, so counting digits might cause you to seriously underestimate the uncertainty (i.e. overestimate the precision). On the other hand, 2.54 might represent the centimeter-per-inch conversion factor, in which case it is exact, and counting digits will cause you to spectacularly overestimate the uncertainty (i.e. underestimate the precision).
Suppose somebody asks you what is 4 times 2.1. If you adhere to the sig-figs cult, you can’t tell from the statement of the problem whether 4 is an approximate quantity, with one sig-fig of uncertainty, or whether it is meant to be an exact quantity.
Occasionally somebody tries to distinguish these two cases by making a fuss about units. The idea apparently is that all inexact quantities are measured and have units, and conversely all quantities with units are measured and therefore inexact. Well, this idea is false. Both the obverse and converse are false.
For example, the aspect ratio mentioned above is measured and inexact, but dimensionless. Conversely, in the SI system, the speed of light is exact but has dimensions. (Specifically, the value is 2.99792458×10⁸ ± 0 m/s by definition. See e.g. reference 16.)
To summarize: Dimensionless does not imply exact. Exact does not imply dimensionless. Trying to estimate uncertainty by counting the digits in a numeral is a guaranteed losing proposition, and making a fuss about units does not appreciably alleviate the problem.
There is no mathematical principle that associates any uncertainty with a decimal numeral such as 2.54. On the contrary, 2.54 is defined to be a rational number, i.e. the ratio of two integers, in this case 254/100 or in lowest terms 127/50. In such ratios, the numerator is an exact integer, the denominator is an exact integer, and therefore the ratio is an exact rational number.
By way of contrast, sometimes it may be convenient to approximate a rational number; for instance the ratio 173/68 may be rounded off to 2.54(½) if you think the roundoff error is unimportant in a given situation. Still, the point remains that 2.54(½) is not the same thing as 2.54.
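Python’s standard `fractions` module makes the distinction concrete. The following sketch treats the numeral 2.54 as the exact rational it is, and then compares it with 173/68; nothing here is specific to the text beyond those two numbers.

```python
from fractions import Fraction

exact = Fraction("2.54")        # the decimal numeral, read as an exact rational
other = Fraction(173, 68)       # a different rational that rounds to 2.54
roundoff = other - exact        # exact difference, no floating point involved

print(exact)                    # lowest terms
print(round(float(other), 2))   # what you see after rounding to two decimals
print(roundoff, float(roundoff))
```

The difference is 7/1700, about 0.0041, which is less than half a count in the last decimal place. So 173/68 rounds to 2.54, but it is not equal to 2.54.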
Once I was discussing a quantity that had been calculated to be x=2.1(2). A sig-figs partisan objected that sometimes you don’t know that the uncertainty is exactly 0.2 units, and in such a case it was preferable to write x=2.1 using sig figs, thereby making a vague and ambiguous statement about the uncertainty. The fact that nobody knows what the sig figs expression really means was claimed to be an advantage in such a case. Maybe it means x=2.1(½), or maybe x=2.1(5), or maybe something else.
There are several ways of seeing how silly this argument is. First of all, even if the argument were technically true, it would not be worth learning the sig-figs rules just to handle this unusual case.
Secondly, nobody ever said the uncertainty was “exactly” 0.2 units. In the expression x=2.1(2), nobody would interpret the (2) as being exact, unless they already belonged to the sig-fig cult. The rest of us know that the (2) is just an estimate.
Thirdly, it is true that the notation x=2.1(2) or equivalently x=2.1±0.2 does not solve all the world’s problems. However, if that notation is problematic, the solution is not to switch to a worse notation such as sig figs. Instead, you should switch to a better notation, such as plain language. If you don’t have a good handle on the uncertainty, just say so. For example, you could say “we find x=2.1. The uncertainty has not been quantitatively analyzed, but is believed to be on the order of 10%”. This adheres to the wise, simple rule:
Say what you mean, and mean what you say. Sig figs neither say what they mean nor mean what they say.
A number such as 4.32±.43 expresses an absolute uncertainty of .43 units. A number such as 4.32±10% expresses a relative uncertainty of 10%. Both of these expressions describe nearly the same distribution, since 10% of 4.32 is nearly .43.
Sometimes relative uncertainty is convenient for expressing the idea behind a quantity, sometimes absolute uncertainty is convenient, and sometimes you can do it either way.
It is interesting to consider the category of null experiments, that is, experiments where the value zero lies well within the distribution that describes the results. Null experiments are fairly common, and some of them are celebrated as milestones or even turning-points in the history of science. Examples include the difference between gravitational and inertial mass (Galileo, Eötvös, etc.), the luminiferous ether (Michelson and Morley), the mass of the photon, the rate-of-change of the fine-structure constant and other fundamental “constants” over time, et cetera.
The point of a null experiment is to obtain a very small absolute uncertainty.
Suppose you re-do the experiment, improving your technique by a factor of ten, so that the absolute uncertainty $\sigma_A$ of the result goes down by a factor of ten. You can expect that the mean value of the result $m_A$ will also go down by a factor of ten, roughly. So to a rough approximation the relative uncertainty is unchanged, even though you did a much better experiment.
On closer scrutiny we see that the idea of relative uncertainty never did make much sense for null experiments. For one thing, there is always the risk that the mean value $m_A$ might come out to be zero. (In a counting experiment, you might get exactly zero counts.) In that case, the relative uncertainty is infinite, and certainly doesn’t tell you anything you need to know.
Scientists have a simple and common-sensical solution: In such cases they quote the absolute uncertainty, not the relative uncertainty.
Life is not so simple if you adhere to the sig-figs cult. The problem is that the sig-figs rules always express relative uncertainty.
To put an even finer point on it, consider the case where the relative uncertainty is greater than 100%, which is what you would expect for a successful null experiment. For concreteness, consider .012±.034. How many digits should be used to express such a result? Let’s consider the choices:
Bottom line: There is an important class of distributions that simply cannot be described using the significant-figures method. This includes distributions that straddle the origin. Such distributions are common; indeed they are expected in the case of null experiments.
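The trouble with relative uncertainty in a null experiment can be seen numerically. In this sketch the function `relative_uncertainty` is a hypothetical helper; the first data point is the .012±.034 example from the text, and the “improved” experiment is an assumed tenfold reduction of both numbers.

```python
import math

def relative_uncertainty(mean: float, sigma: float) -> float:
    """Relative uncertainty sigma/|mean|; infinite when the mean is exactly zero."""
    return math.inf if mean == 0 else sigma / abs(mean)

# The example quoted in the text: 0.012 ± 0.034
first = relative_uncertainty(0.012, 0.034)

# Hypothetical improved experiment: both the mean and sigma shrink tenfold.
better = relative_uncertainty(0.0012, 0.0034)

# Hypothetical run where the mean happens to come out exactly zero.
zero_mean = relative_uncertainty(0.0, 0.0034)

print(f"first: {first:.0%}   better: {better:.0%}   zero mean: {zero_mean}")
```

The relative uncertainty is the same before and after the tenfold improvement, and it becomes infinite when the mean is zero. The absolute uncertainty, by contrast, reports the improvement directly.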
There exists a purely mathematical concept of “place value” which is related to the concept of significance. We mention it only for completeness, because it is never what chemistry textbooks mean when they talk about “significant digits”.
For example, in the numeral 12.345, the “1” has the highest place value, while the “5” has the lowest place value.
Sometimes the term “significance” is used to express this mathematical idea. For example, in the numeral 12.345, the “1” is called the most-significant digit, while the “5” is called the least-significant digit. These are relative terms, indicating that the “1” has relatively more significance, while the “5” has relatively less significance. We have no way of knowing whether any of the digits has any absolute significance with respect to any real application.
This usage is common, logical, and harmless. However, since the other usages of the term “significant digit” are so very harmful, it may be prudent to avoid this usage as well, especially since some attractive alternatives are available. One option is to speak of place value (rather than significance) if that’s what you mean.
Another option is to speak of mantissa digits. For example, if we compare 2.54 with 2.5400, the trailing zeros have no effect on the mantissa. (In fact, they don’t contribute to the characteristic, either, so they are entirely superfluous, but that’s not relevant to the present discussion.) Similarly, if we compare 2.54 to 002.54, the leading zeros don’t contribute to the mantissa (or the characteristic).
It is more interesting to compare .0254 with .000254. In this case, the zeros do not contribute to the mantissa (although they do contribute to the characteristic, so they are not superfluous). This is easy to see if we rewrite the numbers in scientific notation, comparing 2.54×10⁻² versus 2.54×10⁻⁴.
To make a long story short, the mantissa digits are all the digits from the leftmost nonzero digit to the rightmost nonzero digit, inclusive. For example, the number 0.00008009000 has four mantissa digits, from the 8 to the 9 inclusive. In more detail, we say it has a superfluous leading zero, then four place-holder digits, then four mantissa digits, then three superfluous trailing zeros.
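The definition of mantissa digits is mechanical enough to write down in code. A minimal sketch, assuming the numeral is a plain decimal string with no exponent part; the function name `mantissa_digits` is hypothetical.

```python
def mantissa_digits(numeral: str) -> str:
    """Digits from the leftmost nonzero digit to the rightmost nonzero digit."""
    digits = numeral.lstrip("+-").replace(".", "")   # drop sign and decimal point
    return digits.strip("0")                         # drop leading and trailing zeros

for numeral in ("2.54", "2.5400", "002.54", ".0254", ".000254", "0.00008009000"):
    m = mantissa_digits(numeral)
    print(f"{numeral:>14}  mantissa digits: {m:<5} count: {len(m)}")
```

Note that the zeros inside 8009 are kept, because they lie between two nonzero digits.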
Keep in mind that the number of mantissa digits does not tell you anything about the uncertainty, accuracy, precision, readability, reproducibility, tolerance, or anything like that. If you see a number with N digits of mantissa, it does not imply or even suggest that the number was rounded to N digits; it could well be an exact number, as in 2.54 centimeters per inch or 2.99792458×10⁸ meters per second.
When the number system is taught in elementary school, mantissa digits are called “significant digits”. This causes conflict and confusion when the high-school chemistry text uses the same term with a different meaning. For example, some people would say that 0.025400 has three significant digits, while others would say it has five significant digits. I don’t feel like arguing over which meaning is “right”. Suggestions:
The first moment of a distribution P is also known as the arithmetic mean. In this section we denote it by µ:
$$ \mu := \langle x \rangle_P \tag{27} $$
where angle brackets $\langle \cdots \rangle_P$ denote the operation of averaging something over the distribution P:
$$ \langle \cdots \rangle_P := \int \cdots \, dP \tag{28} $$
In the case where P is represented by a set of equally-weighted examples, the mean can be written as:
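$$ \mu = \langle x \rangle = \frac{1}{N} \sum_{i=1}^{N} x_i \tag{29} $$
where N is the number of examples and $x_i$ is the i-th example. It is common to write $\langle \cdots \rangle$ as shorthand for $\langle \cdots \rangle_P$ when it is clear from context what distribution P is involved.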
The variance is denoted σ² and is the second moment about the mean:
$$ \sigma^2 := \langle (x - \mu)^2 \rangle = \langle x^2 \rangle - \mu^2 \tag{30} $$
Exercise: Prove that the two expressions for the second moment given in equation 30 are equivalent. The proof is only a couple of lines long.
The standard deviation is denoted by σ, and is just the square root of the variance.
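As a concrete case, here is a Python sketch that evaluates the mean, variance, and standard deviation for a set of equally-weighted examples, namely the six faces of a fair die. It computes the variance in the two standard ways, as the average of (x − µ)² and as the average of x² minus µ², and dividing by N throughout (not N − 1), since we are describing the distribution itself rather than estimating it from a sample.

```python
import math

faces = [1, 2, 3, 4, 5, 6]   # equally weighted examples: a fair six-sided die
n = len(faces)

mean = sum(faces) / n                                      # <x>
var_about_mean = sum((x - mean) ** 2 for x in faces) / n   # <(x - mean)^2>
var_shortcut = sum(x * x for x in faces) / n - mean ** 2   # <x^2> - mean^2
sigma = math.sqrt(var_about_mean)

print(mean, var_about_mean, var_shortcut, sigma)
```

The two variance expressions agree (both give 35/12), and the standard deviation comes out to about 1.71. A numerical check like this is not a proof, of course.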
Two examples of a Gaussian normal distribution are shown in figure 3. The Gaussian family can also be expressed as an equation. When the mean is zero and the standard deviation is unity, we have:
$$ \text{probability density} \propto \exp(-x^2 / 2) \tag{31} $$
or more generally, if µ is the mean and σ is the standard deviation, we have:
$$ dP(x \mid \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2 \sigma^2} \right) dx \tag{32} $$
The height of the curve has been normalized so that the area under the curve is unity; that is, the total probability is 100%. The corresponding cumulative probability is:
$$ P(x \mid \mu, \sigma) = \tfrac{1}{2} + \tfrac{1}{2} \, \mathrm{erf}\!\left( \frac{x - \mu}{\sigma \sqrt{2}} \right) \tag{33} $$
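The normalization and the cumulative probability can be checked numerically. The following Python sketch uses the standard forms of the Gaussian density and of its cumulative probability in terms of the error function; the function names and the particular values of µ and σ are illustrative assumptions, not taken from the text.

```python
import math

def gaussian_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Normalized Gaussian probability density."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def gaussian_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Cumulative probability, written in terms of the error function."""
    return 0.5 + 0.5 * math.erf((x - mu) / (sigma * math.sqrt(2.0)))

# Crude midpoint-rule check that the area under the density is close to unity.
mu, sigma = 2.0, 0.5   # illustrative values
steps = 20_000
lo, hi = mu - 10 * sigma, mu + 10 * sigma
dx = (hi - lo) / steps
area = sum(gaussian_pdf(lo + (i + 0.5) * dx, mu, sigma) for i in range(steps)) * dx

# Probability of landing within one standard deviation of the mean.
within_one_sigma = gaussian_cdf(mu + sigma, mu, sigma) - gaussian_cdf(mu - sigma, mu, sigma)

print(area, gaussian_cdf(mu, mu, sigma), within_one_sigma)
```

The area comes out very close to 1, the cumulative probability at the mean is one half, and about 68% of the probability lies within one standard deviation of the mean. Because the area is fixed, a distribution that is tenfold broader must also be tenfold lower at its peak: `gaussian_pdf(0, 0, 10)` is one tenth of `gaussian_pdf(0, 0, 1)`.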
Footnotes