Copyright © 2003 jsd

1  Scientific Methods – Overview

Many textbooks and web-sites describe “the scientific method” in terms most scientists find objectionable. Here is an attempt to do better.

1.    There is no such thing as “the” scientific method. Science uses many methods. There will never be a pat answer to the question “what is science”. The very notion that there could be a pat answer bespeaks an attachment to rote learning that is incompatible with scientific thinking.

2.    The major goals of science include making useful predictions and avoiding mistakes.

3.    Very often, scientific predictions are not exact. A prediction does not need to be exact to be useful. Laws, theories, and models have limitations. You should neither over-react nor under-react to these limitations. See item 9 and section 2.3 for more on this.

4.    It is often impractical to measure and/or calculate things exactly. Therefore it is good scientific practice to report not just the nominal value of a result, but also the uncertainty of the result, as discussed in reference 1. This point is related to item 3 and item 9.

5.    The role of hypothesis testing in scientific work is sometimes important, but not as important as non-experts seem to think. It is common to begin a series of experiments with one set of hypotheses, and to end with a completely different set. It is also common to conduct experiments with no clear hypotheses at all, just to explore the territory. See section 4 and reference 2 for more on this.

6.    Scientists use words like rule, law, equation, identity, principle, formula, algorithm, etc. almost interchangeably, to describe the process for making predictions (although there are slight variations in connotations).

7.    The word “theory” can be used in two radically different ways. The first usage means something like law or rule, only much grander, namely a system of rules giving a coherent description and explanation of a broad topic. The other usage refers to a mere speculation. Remarkably, both versions are correct, and the ambiguity can be traced back more than 2000 years. It is best to avoid the word entirely when talking to non-scientists, and especially when debating with persons who can’t be trusted, since if you intend one meaning they’ll use the other meaning against you. Suggestion:

8.    Mathematical results are validated by formality and rigor. This gives us logical statements of the form “If A then B” and suchlike. Physical-science results are sometimes validated by logic, but may also be validated by appeal to experiment. This gives us statements of the form “We observe A” and suchlike. Generally science is a complex lattice of facts and rules, combining observations and logic.

9.    Scientific rules generally have a limited domain of applicability. To state just the headline of a rule – without stating the limits of validity – is improper. For more on this, see section 2.3.

10.    From time to time, an established rule may be refined. It may be supplemented by other rules so as to extend the domain of validity. It may be supplemented by exceptions to improve the accuracy. However a rule with too many caveats and exceptions is likely to be not only inconvenient but unreliable. Occam’s razor and all that.

11.    From time to time, a rule may be supplanted entirely by a simpler and better rule. See reference 3 for a famous study of how new theories compete with old ones.

12.    If a rule stands in need of improvement, you should offer specific and constructive criticism. See section 2.4. In science, as elsewhere, non-specific and/or non-constructive criticism doesn’t do anybody any good. And it’s bad manners.

13.    Consider all the available data. When evaluating a hypothesis, do not “select” just the data that happens to support your pet theory. By the same token, consider all the plausible hypotheses, not just the first one that comes along that seems to more-or-less fit the data. For more on this, see section 7.2.

14.    Do not ignore contradictions between things that you “know”. These are a powerful indication that your knowledge needs to be refined.

15.    Creating new rules from scratch is exceedingly difficult. There is an infinite number of possible rules, and you will never have enough data to decide which of the contenders is best – unless there is some sort of additional guidance. Sometimes guidance is taken from intuition and from notions of “simplicity” or “elegance”. This is bordering on metaphysics, but it is an important part of science.

16.    In science, as in everyday life, it is often necessary to make approximations. Therefore it is important to distinguish good approximations from bad approximations (rather than treating all approximations as equally good or equally bad). See section 3.1.

17.    Scientists, like business executives, government leaders, and everyone else, must often make decisions based on highly incomplete data. Therefore it is important to be able to change your mind as soon as you get new data that contradicts old hunches. This requires keeping score on each of the rules, keeping track of which are well-supported by existing data, and which are less-well-supported and therefore more open to revision. This requires questioning assumptions, as discussed in section 3.3.

18.    An important scientific activity (which applies not just to pure science but also to engineering and even farming, etc.) is called design of experiment. That means designing a series of measurements that will tell you what you need to know, without undue waste. See section 7.1 for more on this.

19.    An important part of scientific thinking is being able to recognize non-scientific and unscientific thinking, as discussed in section 2.

For additional discussion of “thinking skills” per se – including how to learn, and how to teach thinking skills – see reference 4.

See also reference 5, reference 6, and reference 7 for sensible discussions of what science is, and how scientists do science.

2  Scientific versus Unscientific Thought Patterns

It is important to know the difference between science and pseudo-science. An amusing story about this can be found in reference 8.

2.1  Fallacy is More Dangerous than Absurdity

There is an important distinction between fallacy and absurdity. An idea that makes wrong predictions every time is absurd, and is not dangerous, because nobody will pay any attention to it. The most dangerous ideas are the ones that are often correct or nearly correct, but then betray you at some critical moment.

Pernicious fallacies are pernicious precisely because they are not absurd. They work OK some of the time, especially in simple “textbook” situations … but alas they do not work in general.

You need not worry about the “most erroneous” errors. You should worry about the “most deceptive” and “most destructive” errors.

2.2  Examples of Unscientific Thinking

You should avoid using fallacious arguments, and you should object loudly if somebody tries to use them on you. Common examples of unscientific thinking include:

2.3  The Provisos are Part of the Rule

As mentioned in item 3 and item 9, most rules have limitations on their accuracy and/or their range of validity. You should neither over-react nor under-react to these limitations.

Consider the contrast: Equation 1 is very different from equation 2:

x = y          provided a, b, and c              (1)

 

x = y                                                (2)

which means x = y in all generality.

It is a common mistake to mislearn, misremember, or misunderstand the provisos, and thereby to overestimate the range of validity of such a rule.

There are several ways such mistakes can come about. I’ve seen cases where the textbook soft-pedals the provisos “in the interest of simplicity” (at the expense of correctness). I’ve seen even more cases where the text and the teacher emphasize the restrictions in equation 1, yet some students gloss over the provisos and therefore learn the wrong thing, namely equation 2.

Another possibility is that we don’t fully know the provisos. A good example concerns the Wiedemann-Franz law. There are good theoretical reasons to expect it to be true, and experiments have shown it to be reliably true over a very wide range of conditions. That was the whole story until the discovery of superconductivity. The Wiedemann-Franz law does not apply to superconductors, and you will get spectacularly wrong predictions if you try to apply it to superconductors. My point is that before the discovery of superconductivity – which was a complete surprise – there was no way anyone could have had the slightest idea that there was any such limitation to the Wiedemann-Franz law.

2.4  Constructive Criticism

As mentioned in item 12, offering non-specific and/or non-constructive criticism doesn’t help anybody.

It is important to keep track of the limitations of each model, and to communicate the limitations. If you see some folks at risk of error because they are disregarding the limitations, it is helpful to remind them of the limitations. Sometimes it is worth trying to find improved ways of expressing the limitations.

If a model stands in need of improvement, the best thing you can do is to improve it. Devise a rule that has more accuracy and fewer limitations. (You may find this is more easily said than done.) Communicate the new rule to the community, and explain why it is better.

If you can’t devise a better rule on your own, you might hire a scientist to do it for you. (Again, you might find that devising accurate, robust models is more easily said than done.)

There’s a rule that says “don’t borrow trouble”. Conversely, you shouldn’t spread trouble around, either. Let me explain what that means: Suppose a rule is good enough to solve Joe’s problem, but is too limited to solve Moe’s problems. Then it’s not constructive for Moe to complain about what Joe is doing. It’s none of Moe’s business. If Moe accuses Joe of using a “wrong” rule, the accusation is false; just because the rule is no good for Moe’s purposes doesn’t make it no good for Joe’s purposes. Conversely, if Joe notices that the rule is too limited to handle Moe’s problem, that is no reason for Joe to distrust the rule within its proper limitations.

This is worth mentioning, because some people think that “the truth” must be exact and unlimited, and conversely anything that has limitations must be worthless. This can be seen as an extreme form of over-reacting to the limitations of a model, but it is all too common. See section 2.5 for more on this.

If Joe and Moe choose to work together to devise a new, grander model that has fewer limitations, so that it can handle both their problems, that is great – but it is their choice, not their obligation, and should not be an impediment to using the old model to solve Joe’s problems.

2.5  Beyond Black and White

Sometimes we are faced with black-and-white choices, as indicated in figure 1.

Figure 1: Black and White Choices

More often, though, the choices form a one-dimensional continuum: not just black and white, but all shades of gray in between, as indicated in figure 2.

Figure 2: A Gray-Scale Continuum

It is an all-too-common mistake to see things in black-and-white when really there is a continuum. This well-known fallacy has been called by many names, including false dichotomy, black-and-white fallacy, four-legs-good two-legs-bad, Manichaean fallacy, et cetera.

To say the same thing again, it is all too common for people to assume that everything that is not black is completely white, everything that is not white is completely black, everything that is not perfect is worthless, everything that is not completely true is completely false, their friends are always good and their enemies are always evil, et cetera.

A related but more-subtle fallacy is to assume that all things that are not perfect are equally imperfect. In contrast, the fact is that point B in figure 2 is much blacker than point A, even though neither one is perfectly black nor perfectly white.

Understanding this is a crucial part of scientific thinking, because as mentioned in item 3, scientists are continually dealing with rules that are inexact or otherwise imperfect. The point is that we must make judgements about which rules are better or worse for this-or-that application. We cannot just say they are all imperfect and leave it at that. They are definitely not equally imperfect.

Actually, sophisticated thinking requires even more than shades of gray. Often things must be evaluated in multiple dimensions, evaluated according to multiple criteria at once, as indicated in figure 3. Option A is better for some purposes, and option B is better for other purposes.

Figure 3: A Multi-Dimensional Continuum

See reference 12 for more about the distinction between truth and knowledge.

3  Approximations, Assumptions, and Uncertainty

3.1  Approximations

In science as in daily life, it is necessary to make approximations, as mentioned in item 16. For example, when you buy shoes, you don’t buy a pair that is exactly the right size; you buy a pair that is close enough to the right size.

Elementary arithmetic is exact, in the sense that 2 plus 2 equals 4 exactly. In contrast, physics, chemistry, biology, etc. are not exact sciences; they are natural sciences. For example, Newton’s law of universal gravitation

$F_I = G \frac{M m}{r^2}$              (3)

is one of the greatest triumphs in the history of human thought … but we know it is not exact. It is a very good approximation when the gravitational field is not too strong and not changing too quickly. It is also misleading, because $F_I$ is not the only contribution to the weight of ordinary terrestrial objects; there are significant correction terms from other sources including the rotation of the earth, as discussed in reference 13.
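To get a feel for the size of one such correction, here is a rough sketch that compares the equation 3 term with the centrifugal term from the earth's rotation, for an object sitting at the equator. The constants are standard rounded values supplied for illustration (they are not taken from reference 13), and the uniform spherical earth is itself an assumption.

```python
import math

# Rounded textbook constants (assumptions for this sketch, not from the text).
G = 6.674e-11           # gravitational constant, N m^2 / kg^2
M_EARTH = 5.972e24      # mass of the earth, kg
R_EQUATOR = 6.378e6     # equatorial radius, m
SIDEREAL_DAY = 86164.1  # rotation period, s


def newton_acceleration(mass, radius):
    """Acceleration G M / r^2 from equation 3, per unit test mass."""
    return G * mass / radius**2


def centrifugal_acceleration(radius, period):
    """Centrifugal term omega^2 r at the equator of a rotating sphere."""
    omega = 2 * math.pi / period
    return omega**2 * radius


g_newton = newton_acceleration(M_EARTH, R_EQUATOR)
a_spin = centrifugal_acceleration(R_EQUATOR, SIDEREAL_DAY)
fraction = a_spin / g_newton

print(f"G M / r^2 = {g_newton:.3f} m/s^2")
print(f"omega^2 r = {a_spin:.4f} m/s^2")
print(f"ratio     = {fraction:.2%}")
```

Under these assumptions the rotation term comes out to roughly a third of a percent of the equation 3 term. That is small enough to ignore in many problems and large enough to matter in precise work, so whether equation 3 alone is a good approximation depends on the application.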

It is a common mistake to treat all approximations as equally good, or equally bad.

To say the same thing another way, when you are in a situation that requires making an approximation, that does not give you a license to make a bad approximation. It’s your job to figure out what’s good and what’s bad.

It is not always easy to distinguish good approximations from bad approximations. It requires knowledge, skill, and judgement.

3.2  Uncertainty

Science rarely offers certainty. Often it offers near certainty, but not absolute certainty. (This is in contrast to religion, which sometimes offers absolute certainty, and to things like elementary arithmetic, which offers absolute certainty over a limited range.)

One of the surest ways to be recognized as a non-scientist is to pretend to be certain when you’re not.

The world is full of uncertainty. It always has been, and always will be. You should not blame science for “causing” this uncertainty, and you should not expect science to eliminate this uncertainty. Instead, science tells us good ways to live in an uncertain world.

Techniques for quantifying uncertainty are discussed in reference 1.
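As a minimal illustration of reporting a nominal value together with its uncertainty, the sketch below summarizes a handful of repeated readings by their mean and the standard error of the mean. The readings are made-up numbers, and this is only one common convention; it assumes independent readings with no systematic error, and it is not necessarily the approach taken in reference 1.

```python
import math
import statistics

# Hypothetical repeated readings of the same quantity (made-up numbers).
readings = [9.79, 9.82, 9.81, 9.80, 9.83]


def summarize(values):
    """Return (nominal value, standard error of the mean)."""
    nominal = statistics.mean(values)
    spread = statistics.stdev(values)           # sample standard deviation
    std_err = spread / math.sqrt(len(values))   # uncertainty of the mean
    return nominal, std_err


nominal, std_err = summarize(readings)

# Print a few extra digits rather than rounding the uncertainty away.
print(f"result = {nominal:.4f} +- {std_err:.4f}")
```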

3.3  Questioning Assumptions

As mentioned in item 17, it is impossible for anyone to do anything without making assumptions.

Remember that a major purpose of scientific methods is to make useful predictions and to avoid mistakes. False assumptions are a common source of serious mistakes.

At this point, non-experts commonly say “don’t make assumptions” or perhaps “check all your assumptions”. Alas, that’s not helpful. After all, most assumptions are true and useful ... otherwise people wouldn’t assume them. The trick is to filter out the tiny minority of assumptions that turn out to be false. This is far easier said than done. There are too many assumptions, and it is impractical to even list them all, let alone check them all.

The real question is, which assumptions should be checked under what conditions? There is no easy answer to this question.

Assumptions can be classified, approximately, as explicit assumptions and implicit assumptions. Explicit assumptions are the ones you know you are making. They are usually not the main problem; you can make a list of the explicit assumptions and then check them one by one.

The big trouble comes from implicit assumptions that aren’t quite true. These include things that “everybody knows” to be true, but are not in fact true, as discussed in reference 10. They also include rules that have become invalid because you have mistaken the provisos, as discussed in section 2.3.

Skilled scientists can question assumptions somewhat more quickly and more methodically than other folks, because they have had more experience doing it. But it’s never easy. All of us must rack our brains to figure out which assumptions have let us down.

It always looks relatively easy in retrospect. Once somebody has identified the assumption that needed repair, it is easy for everybody else to hop onto the bandwagon.

One sometimes-helpful suggestion is this: If you find a contradiction, inconsistency, or paradox in what you “know”, that is a good reason to start questioning assumptions. Start by questioning the assumptions that are most closely connected to the contradiction.

Some scientists keep lists of paradoxes. If an item stays on the list for a long time, it means there is a problem that is not easily solved, and the solution is likely to be a turning point in the history of science. Examples from the past include the Gibbs paradox, the black-body paradox, various paradoxes associated with the luminiferous ether, the Olbers paradox, et cetera.

An important component of science, especially of scientific research, involves exploring new territory. Commonly assumptions that were valid in the old territory break down in the new territory. Indeed when researchers choose where to explore, they often seek out situations where assumptions can be expected to break down, since that will reveal new information. For more on this, see section 5 and reference 14.

In ordinary applications, when you want to rely on the model, you should stay safely within the limitations of the model.

In research mode, where the model is the object of research, you are testing the model, not relying on it. Then it makes sense to patrol along the boundaries, to see if the limits need to be tightened or loosened. It also sometimes makes sense to go far beyond the limits, in hopes of making a surprising discovery.

4  Hypothesis Testing

The word “hypothesis” has two distinct meanings, as discussed in reference 2. The discussion here applies to “hypothesis testing” (as opposed to “hypothetical scenarios”).

The lattermost stages of any systematic investigation can often be formalized in terms of hypothesis testing. What’s more, it is often possible to describe an already-complete experiment by stating what hypotheses are ruled out by the results, and what hypotheses are consistent with the results. One should not imagine, however, that all scientific work is motivated by hypotheses or organized in terms of hypotheses. Some is, and some isn’t.

Science – and especially research – usually involves a multi-stage iterative process, where the results of early stages are used to guide the later stages. The early stages are exploratory, and are not well described in terms of hypothesis testing, unless we abuse the terminology by including ultra-vague hypotheses such as “I hypothesize that if we explore the jungle we might find something interesting”.

Typical example: When Bardeen, Brattain, and Shockley did their famous work, they started from the hypothesis that a semiconductor amplifier device could be built. This hypothesis turned out to be true, but it was neither novel nor specific. The general idea had been patented decades earlier by Lilienfeld. Indeed a glance at the following table would have led almost anyone to a vague hypothesis about semiconductor triodes.

vacuum-tube diode (known)        vacuum-tube triode (known)
semiconductor diode (known)      ???

The problem was, all non-vague early hypotheses about this topic turned out to be false. It is easy to speculate about semiconductor amplifiers, but hard to make one that actually works. The devil is in the details. Bardeen, Brattain, and Shockley had to do a tremendous amount of work. Experiments led to new theories, which led to new experiments ... and so on, iteratively. Many iterations were required before they figured out the details and built a transistor that worked.

Example: When Kamerlingh Onnes began his famous experiments, he was not entertaining any hypotheses involving superconductivity. He was wondering what the y-intercept would be on the graph of resistivity versus temperature; it had never occurred to him (or anyone else) that the graph might have an x-intercept instead.

Example: When Jansky began his famous experiments, he was not entertaining any hypotheses about radio astronomy. He spent over a year taking data before he discovered that part of the signal had a period of one sidereal day. At this instant – and not before – the correct hypothesis came to mind: that part of the signal was emanating from somewhere far outside the solar system.

Example: At the opposite extreme, in a typical forensic DNA-testing laboratory, a very specific hypothesis is being entertained: Either sample A is consistent with sample B, or it isn’t. This may be “scientific”, but it isn’t research.

As the proverb says: If the only tool you have is a hammer, everything begins to look like a nail. Now, I have nothing against hammers, and I have nothing against hypothesis testing. But the fact remains that in many circumstances, they are not the right tools for the job. Scientists know how to use many different tools.

It is common for people who don’t understand science to radically overemphasize the hypothesis-testing model, and to underestimate the number of iterative stages required before a good set of hypotheses can be formulated. It is a common but ghastly mistake to think that a good set of hypotheses can be written down in advance, and then simply tested.

Overemphasizing hypothesis-testing tends to overstate the importance of deduction and to understate the importance of induction, exploration, and serendipity.

As usual, it is a mistake to focus on the extremes:

5  Well-Managed Risks

At one extreme is the view that science proceeds via routine, plodding hypothesis testing. This view is far off the mark, as discussed in section 4.

At the opposite extreme is the view that science proceeds via dumb luck, via a succession of wildly-improbable accidental discoveries. There’s even a word for this, namely “serendipity”. This view is even farther off the mark.

The reality lies in the middle, far from either extreme, as we now discuss.

Scientists understand probability and statistics. By definition, you can’t make a particular fortunate accident happen on demand, but you can work in an area where valuable discoveries are likely to be made from time to time. Everybody must accept some risk. For example, any sensible farmer knows there is some risk that a freak storm will destroy his entire crop. The key idea is that successful crops are sufficiently common – and the crop is sufficiently valuable – that the farmer makes money on average.

Scientists do not accept all risks, nor do they decline all risks. They accept risks that are likely to pay off well on average.
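The phrase “pay off well on average” is an expected-value statement. Here is a small sketch of the arithmetic, using entirely hypothetical probabilities and dollar amounts, and assuming the cost is paid on every attempt while the payoff arrives only on success.

```python
def expected_profit(p_success, payoff, cost):
    """Average profit per attempt when the cost is paid every time
    but the payoff arrives only with probability p_success."""
    return p_success * payoff - cost


# Hypothetical numbers, chosen only for illustration.
farmer = expected_profit(p_success=0.9, payoff=100_000, cost=60_000)
long_shot = expected_profit(p_success=0.01, payoff=1_000_000, cost=60_000)

print(f"farmer:    {farmer:+.0f} per season on average")
print(f"long shot: {long_shot:+.0f} per attempt on average")
```

With these made-up numbers the farmer's risk is worth taking even though one season in ten is a total loss, while the long shot loses money on average despite its much larger prize.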

See reference 14 for more on this.

6  Correctness and Modesty

As mentioned in item 2, a major purpose of scientific methods is to make useful predictions and to avoid mistakes. The known scientific methods are a collection of guidelines that have been found to work reasonably well.

One of the most important steps in avoiding mistakes is to always keep in mind that mistakes are possible. This is so important that this whole section is devoted to emphasizing it and re-expressing it in assorted ways.

Richard Feynman said you should take care not to fool yourself, keeping in mind that “you are the easiest person to fool”.

Another word for this is modesty. Being aware of your own fallibility is modest. Pretending you are infallible is immodest.

It is OK to a limited extent to be an advocate for your favorite idea, but you must not get carried away. When you collect data in support of an idea, you must also look just as diligently for data that conflicts with that idea. See section 7.2.

A related form of modesty, which is also crucial for avoiding mistakes, is to not overstate your results. Scientists use certain figures of speech that are designed to avoid overstatement. Among other things, this includes recognizing the distinction between data and the interpretation that you wish to place upon the data. As an illustration, imagine some children go on a field trip to the dairy. Upon their return, they write a childish report that says “cows are brown” – or, worse, “all cows are brown”. A more modest, scientific approach would be to say “the cows we observed were all predominantly brown”. A statement about the observed cows sticks closely to the data, while a generalization about all cows requires a leap beyond the data.

As mentioned in item 9 and section 2.3, practically all scientific results have some limits to their validity, and you must clearly understand and clearly communicate these limits.

7  Experimental Techniques

Here is a very incomplete sketch of some of the issues that arise when taking data. (This stands in contrast to the rest of this document, which mostly emphasizes the analysis phase.)

7.1  Design of Experiment

Consider the famous Twelve Coins Puzzle as discussed in reference 15. Suppose you find a casino that is willing to pay you $350 for identifying the odd coin, but makes you pay $100 for each weighing. If you weigh the right combinations of coins, you can do the job in three weighings, so you make money every time. In contrast, if you follow a sub-optimal strategy that requires four or more weighings, you will lose money on average.

This scenario is reasonably analogous to many real-world situations. Commonly there’s a significant price for making a measurement, and you want to maximize the amount of information you get for this price.

I mention this because all too often, people claim that a principle of scientific experimentation is to “change only one variable at a time”. It’s easy to see that such a claim is hogwash. The Twelve Coins Puzzle suffices as a counterexample. If each weighing differs from the previous weighing by only one coin, you cannot come anywhere close to an optimal solution.
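The counting behind the puzzle can be checked by brute force. The sketch below assumes the standard version of the puzzle (exactly one odd coin, which may be either heavy or light) and uses one well-known fixed weighing plan. The plan is an assumption of this sketch and is not taken from reference 15. The payout and fee are the figures from the casino scenario above.

```python
from itertools import product

# A well-known fixed (non-adaptive) weighing plan; coins are numbered 1..12.
# Each entry is (coins on the left pan, coins on the right pan).
# This particular plan is an assumption of the sketch.
PLAN = [
    ({1, 2, 3, 10}, {4, 5, 6, 11}),
    ({1, 2, 3, 11}, {7, 8, 9, 10}),
    ({1, 4, 7, 10}, {2, 5, 8, 12}),
]


def outcome(odd_coin, excess, plan=PLAN):
    """Balance readings for one scenario.

    excess is +1 if the odd coin is heavy, -1 if it is light.
    Each reading is +1 (left pan sinks), -1 (right pan sinks), or 0 (balanced).
    """
    readings = []
    for left, right in plan:
        if odd_coin in left:
            readings.append(excess)
        elif odd_coin in right:
            readings.append(-excess)
        else:
            readings.append(0)
    return tuple(readings)


# 12 coins, each possibly heavy or light: 24 scenarios to tell apart.
scenarios = list(product(range(1, 13), (+1, -1)))
table = {outcome(coin, excess): (coin, excess) for coin, excess in scenarios}

# Counting bound: each weighing has 3 possible readings, so n weighings
# can distinguish at most 3**n scenarios.
min_weighings = next(n for n in range(1, 10) if 3**n >= len(scenarios))

# The plan works if every scenario produces a different set of readings.
plan_works = len(table) == len(scenarios)

profit_3 = 350 - 100 * 3   # payout minus fees for three weighings
profit_4 = 350 - 100 * 4   # payout minus fees for four weighings

print(min_weighings, plan_works, profit_3, profit_4)
```

Note that consecutive weighings in this plan differ by several coins at once. That is what lets three weighings separate all 24 scenarios.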

The suggestion to “change only one variable at a time” might nevertheless be good advice in some special situations. That’s because the cost of making a measurement is not always the dominant cost in the overall information-gathering process. For example, imagine a situation where gathering the raw data is very cheap, while just plain thinking about it is expensive. Then you might want to follow a strategy, such as changing only one variable at a time, that makes the data easy to interpret, even though you had to do a large number of experiments (much larger than theoretically necessary). Consider the contrast:

For young children doing cheap, simple experiments, it might make sense to tell them to change only one thing at a time, because the rate-limiting step is interpreting and understanding the data, and we want to make that step as easy as possible.

For skilled scientists (and engineers, farmers, etc.) doing complex, expensive experiments, changing only one variable at a time would be an unnecessary burden, and often a disastrous burden.

Changing only one variable at a time is a crutch, which may partially compensate for the investigator’s lack of skill in interpreting the data. In contrast, for performers with ordinary ability and training, crutches are harmful, not helpful.

7.2  Fair Sampling

It is OK to a limited extent to be an advocate for your favorite idea, but you must not get carried away. When you collect data in support of an idea, you must also look just as diligently for data that conflicts with that idea. Then you must weigh all the data fairly, and disclose all the data when you discuss your idea. (If you don’t do this it is called “selecting” the data, which is considered a form of scientific fraud.)

The same applies to theories: It does not suffice to show that your favorite theory does a good job of fitting the data. You should diligently search for other theories that do a comparably good job of fitting the data.

This is what sets science apart from debating and lawyering, where advocacy is carried to an extreme, and it is considered acceptable to skip or make light of data that tends to support the “opposing side”.

In science, the phrase “selecting the data” has very nasty connotations, namely selecting the data so as to support some preconceived notion. For example, it is unacceptable to discard some of the data because it seems “out of range” or “implausible”.

On the other hand, there are cases where it is acceptable or indeed necessary to examine a subset. The requirement is to draw a fair sample. That is, the data should be sampled in such a way that the sampling does not bias the result.

For example, if you want to compute the average aspect ratio of eggs, and you have millions of eggs available, it is acceptable to choose a moderately small sample and measure only the sample. You must, however, arrange that the sample is chosen in such a way that no bias is introduced.

Sometimes during the course of an experiment, it is necessary to abandon or veto a measurement. For example, if you are trying to measure the length and width of an egg, and the egg falls to the floor and gets smashed before the measurements are complete, you have to veto that egg. You must, however, make sure that such losses are independent of the thing you are trying to measure. In particular, if it should happen that high-aspect-ratio eggs are more likely to be dropped, this could seriously degrade the aspect-ratio measurement. Redesign the experiment to prevent such losses, and start over.
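The difference between a fair sample and a loss that depends on the measured quantity can be seen in a small simulation. Everything here is hypothetical: the distribution of aspect ratios, the sample size, and especially the `drop_probability` model, which is invented purely to make elongated eggs more likely to be dropped.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of egg aspect ratios (length / width).
population = [random.gauss(1.3, 0.1) for _ in range(100_000)]
true_mean = statistics.mean(population)

# Fair sample: every egg has the same chance of being chosen.
fair_sample = random.sample(population, 1000)
fair_mean = statistics.mean(fair_sample)


def drop_probability(aspect_ratio):
    """Made-up loss model: elongated eggs are more likely to be dropped."""
    return min(1.0, max(0.0, (aspect_ratio - 1.0) / 0.6))


# Biased loss: the veto depends on the very thing being measured.
survivors = [r for r in population if random.random() >= drop_probability(r)]
survivor_mean = statistics.mean(survivors)

print(f"true mean        = {true_mean:.4f}")
print(f"fair sample mean = {fair_mean:.4f}")
print(f"survivor mean    = {survivor_mean:.4f}")
```

In this sketch the fair sample of 1000 eggs tracks the population mean closely, while the mean of the surviving eggs is pulled noticeably low even though far more eggs were measured. Taking more data does not repair a biased veto.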

On the other side of the same coin, unless you are absolutely sure that your sampling does not bias the result, you should assume that it does bias the result. This is not tragic; it just means that the details of the sampling procedure become part of the definition of what you are measuring. For example, you can measure the height of a group of basketball players. That is OK so long as you don’t think that basketball players are representative of the population at large.

Design your experiments with plenty of dynamic range and plenty of “headroom” so as to minimize the chance of data falling outside the range of your instruments. Whenever data is out of range, vetoing the data is just the beginning. You then have to analyze how much distortion that introduces into the measurement you are trying to make. Such an analysis is usually difficult and sometimes impossible. Vetoing out-of-range data is a notorious source of serious error.

Roundoff errors are another notorious problem. To avoid this, record the raw data using plenty of guard digits. Do the analytical calculations using plenty of guard digits. See reference 1.

Recall that a measurement generally has both a nominal value and an uncertainty (“error bars”) as discussed in reference 1. Vetoing out-of-range data is particularly likely to distort the error bars, which is unacceptable, even if the nominal value is not greatly affected.
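A quick simulation shows how this can happen. The signal distribution and the instrument range below are assumptions chosen for illustration. The range is deliberately symmetric about the true mean, which is the case where the nominal value is least affected.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical true signal: mean 0, standard deviation 1 (arbitrary units).
readings = [random.gauss(0.0, 1.0) for _ in range(100_000)]

# Assumed instrument range; anything outside it is vetoed.
LOW, HIGH = -1.0, 1.0
kept = [x for x in readings if LOW <= x <= HIGH]

full_mean = statistics.mean(readings)
full_std = statistics.pstdev(readings)
kept_mean = statistics.mean(kept)
kept_std = statistics.pstdev(kept)

print(f"all data:  mean = {full_mean:+.3f}, std = {full_std:.3f}")
print(f"in range:  mean = {kept_mean:+.3f}, std = {kept_std:.3f}")
print(f"fraction vetoed = {1 - len(kept) / len(readings):.1%}")
```

Here the mean barely moves, but the spread of the surviving data is only a bit more than half the true spread, so error bars computed from the vetoed data set would be far too optimistic. With an asymmetric range the nominal value would shift as well.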

If you are careful, it is OK to do a practice run, then do a for-real run, and publish only the real data. You must, however, decide in advance which runs are for practice and which runs are for real. Otherwise this could become a nasty scheme for selecting the data. In particular, performing run after run until you obtain the “desired” result is completely unacceptable.
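To see why repeating runs until the “desired” result appears is so corrosive, consider the arithmetic under a simple assumption: each run independently has some small chance of showing the desired result purely by accident. The 5% per-run rate below is a hypothetical figure chosen for illustration.

```python
def chance_of_false_success(per_run_rate, n_runs):
    """Probability that at least one of n_runs independent runs shows a
    spurious 'success', given the per-run false-positive rate."""
    return 1.0 - (1.0 - per_run_rate) ** n_runs


# Hypothetical 5% chance per run of an accidental "desired" result.
for n in (1, 5, 20):
    print(n, round(chance_of_false_success(0.05, n), 3))
```

Under these assumptions, twenty tries give roughly a 64% chance of at least one accidental “success”, which is why the choice of which runs count must be made in advance.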

Sometimes it is necessary to have a trigger or veto or triage mechanism, i.e. some rule that selects which data will be kept and which will be discarded. Doing this right is very, very tricky ... so tricky that usually it is simpler, cheaper, and all-around better to just keep all the data. Also: when you write up your results, you should describe the trigger criterion in detail, so that readers can judge for themselves whether the trigger introduced any significant bias.

To repeat: The phrase “selecting the data” may not sound nasty the first time you hear it, but the connotations are very nasty indeed. There are various things that could be going on:

The burden of proof is on you, to show that whatever you are doing is legitimate, i.e. that it does not bias the conclusions.

7.3  Pilot Plants, Practice Runs, and Feedback

Keep in mind the common-sense principle that you should never put yourself in a position where the first mistake is fatal.

In the real world, scientists and engineers do simulations and dry runs. They build pilot plants before committing to full-scale operation, so that most of their mistakes will be small mistakes.

Therefore you should arrange your experiment so that you can take some data, then do some analysis, and then come back and take some more data. This allows feedback from the analysis phase back to the data-taking phase, so that you can improve the data-taking if necessary.

This is all the more important for students who are, after all, students, not experts, and can be expected to make mistakes. Yet we want students to get good results in the end.

If at all possible, arrange it so that analysis (at least some sort of preliminary analysis) happens in real time, so that if anything funny happens during the experiment, the experimenter knows about it, for reasons explained in section 7.4.

7.4  Why You Need Feedback

Let me tell a little story. Once upon a time in a mythical place called La Jolla there was a fellow named John who really liked numbers. The more digits the better. He wanted unbiased numbers so dearly that he would have his grad student, Richard, face the instrument with his eyes closed; when John said “Now!” Richard would momentarily open his eyes and observe the number. John would write it in the lab book. After doing this for many days and weeks, they had lots of lab books filled with lots of numbers.

They were looking for some sign of a superfluid transition in liquid ³He at low temperatures. John had already made a string of important discoveries in low-temperature physics, and finding the superfluid would move him from pre-eminence to immortality.

Aside: The nice thing about theoretical predictions is that there are so many to choose from. Predictions of Tc started at about 300 millikelvin. When that was disproved experimentally, the predictions moved to lower temperatures: 100 mK was disproved; 50 mK was disproved; 30 mK was disproved; 10 mK was disproved. In fact John and Richard had checked experimentally down to less than 2.5 mK without seeing anything. So the theorists gave up and lowered their prediction to something in the microkelvin range.

Once upon the same time, in a mythical place called Ithaca, there were a couple of guys named Bob and Doug. They liked numbers OK, but they also liked graphs. They wanted to see the data. And they didn’t want to wait until the experiment was over to analyze the data and see what it meant; they wanted to see the data in real time. This was in the days before computers, so if you wanted a graph you had to suffer for it, fooling with Leeds chart recorders, i.e. mechanical contraptions with paper that always jammed and pens that always clogged.

One Thursday in late November, Doug was watching the chart as the cell cooled through 2.5 mK. There was a glitch on the T versus t trace. Doug circled it and wrote “Glitch!!” on the chart. He warmed back up and saw it again on the way up. And then again on the way back down. He called Bob. Bob put down his plate of turkey and zoomed into the lab. The two of them stayed up all night walking back and forth through the “glitch”.

I’ve seen that chart. It doesn’t look like what I would call a glitch. It’s more of a corner. A small-angle corner, just a barely-visible departure from the underlying linear trend. Eyes are good at spotting corners in otherwise-straight lines.

Doug and Bob assumed, based on the aforementioned experimental and theoretical evidence, that this wasn’t the superfluid transition. They figured it was something going on in the nearby solid ³He. But they eventually figured out that it was indeed the superfluid. Right there at 2.5 mK.

Everybody assumes that if John and Richard had been plotting their data on a strip-chart recorder, they would in fact have discovered the superfluid. But they weren’t plotting, and they didn’t discover it.

Now, imagine what it was like working in Bob’s lab after that. With strictness bordering on fanaticism, strip-chart recorders were attached to all the significant variables ... and even some of the not-very-significant variables.

Every so often, a new baby grad student, his fingers stained N different colors from trying to unclog the chart pens, would ask whether we really needed all those chart recorders. Somebody would explain by saying, Once upon a time in a mythical place called La Jolla, .......

7.5  Keep Good Records

Lab notebooks are not supposed to be perfect. If there are no mistakes in the lab book, the lab book is a fraud. You are allowed to mark bad data as bad, but you are not allowed to obliterate it or eradicate it. See reference 16 for an example of a well-kept lab book containing a correction.

Perfection is not required;
deception is not allowed.
     

You are not allowed to “clean up” the books. You are certainly not allowed to keep two sets of books (a dirty one to record raw data, and a clean one to show off).

There are many reasons for keeping good records. First and foremost, you and your collaborators need the information on a day-to-day basis. During the analysis phase, it is all too common to find that the data cannot be analyzed, because even though the nominal result of the measurement was recorded, the conditions of the measurement were not adequately recorded. There’s no value in knowing the ordinate if you don’t know the abscissa.

As mentioned in section 7.2, record all the data. If possible, hook up a computer to stream all the data into a file. Record the abscissas as well as the ordinates.
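As a rough sketch of what "stream all the data into a file" might look like, the Python snippet below writes the measurement conditions first, then one row per reading containing a timestamp, the abscissa, and the ordinate. The column names, the conditions, and the fake readings are all hypothetical stand-ins for whatever a real instrument provides.

```python
import csv
import io
import time

def log_readings(out, readings, conditions):
    """Write each reading as a row: wall-clock time, abscissa, ordinate.
    `conditions` (a dict) is written first so the context is never lost."""
    writer = csv.writer(out)
    for key, value in conditions.items():
        writer.writerow(["#", key, value])
    writer.writerow(["unix_time", "temperature_mK", "pressure_reading"])
    for temperature_mK, pressure in readings:
        writer.writerow([f"{time.time():.3f}", temperature_mK, pressure])
        out.flush()  # push each row out promptly so a crash loses little

# Hypothetical data standing in for a real instrument.
fake_readings = [(2.7, 0.118), (2.6, 0.121), (2.5, 0.125)]

# An in-memory buffer keeps the sketch self-contained; in real use one might
# pass a file opened in append mode, e.g. open("run042.csv", "a", newline="").
buffer = io.StringIO()
log_readings(buffer, fake_readings, {"operator": "example", "cell": "A"})
print(buffer.getvalue())
```

Appending rather than overwriting, and never editing the file afterward, keeps the electronic record in the same spirit as the lab book: bad data can be marked as bad elsewhere, but it is not obliterated.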

The rule is to keep good records. It is traditional to keep data in a so-called lab book, aka laboratory notebook, which is one way of keeping records, but not the only acceptable way. Good electronic records are an acceptable alternative, and are in some ways better. For example, if a witness signs and dates a page in a lab book, a bad guy might be able to add something to the page later. Sometimes this is detectable, but sometimes not. (Don’t try it.) In contrast, if an electronic document is cryptographically signed, there is no way to alter any part of the document without invalidating the signature.

There are various ways of obtaining unforgeable date-stamps on electronic documents; see reference 17 for an example. If you don’t want to bother with that, one option is to just email the document (or a hash thereof) to your lawyer, with a cover letter saying that you are not asking him to take any action, just to file away the letter, so that if there is ever any dispute he can authenticate the date. Note that if the document is huge and/or highly sensitive, you don’t need to send the document itself; it suffices to send a cryptographic hash or HMAC (hash-based message authentication code).
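For readers who have not computed a document hash before, here is a minimal sketch using Python's standard hashlib and hmac modules. The choice of SHA-256, the helper names, and the sample document are assumptions made for illustration. A plain hash needs no secret; an HMAC additionally requires a secret key that you keep.

```python
import hashlib
import hmac

def sha256_digest(data: bytes) -> str:
    """Plain cryptographic hash of the document, as a hex string."""
    return hashlib.sha256(data).hexdigest()

def hmac_digest(data: bytes, key: bytes) -> str:
    """Keyed hash (HMAC-SHA256); only someone holding `key` can recompute it."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()

# Hypothetical document contents, standing in for a real file's bytes.
document = b"run 42: corner in the T versus t trace near 2.5 mK\n"
tampered = document.replace(b"2.5", b"2.6")

print("digest to send:", sha256_digest(document))
print("unchanged after tampering?",
      sha256_digest(document) == sha256_digest(tampered))
```

The digest is short and reveals nothing practical about the contents, so it can be sent in place of a huge or sensitive document. If the document is later produced, anyone can recompute the digest and compare it with the one that was filed away; even a one-character change yields a different digest.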

For things like blueprints, circuit diagrams and computer programs, electronic records have huge advantages. The point is that such things get revised again and again, and if you just have “the” document, you have no idea who contributed what when. In contrast, a modern revision control system such as git will conveniently keep track of the current version and all previous versions. It also keeps track of who submitted what changes, and when. Submissions can be digitally signed.

If you are collaborating with people at other locations, electronic documents have tremendous advantages.

In addition to helping you and your collaborators directly, good records have other uses. They are particularly important in connection with patents. Questions about inventorship and priority come up a lot. I’ve seen it first hand, many times. Once I came out on the short end of the stick, which didn’t bother me, because the other guy had records that convinced everybody (including me) that he invented the thing a month before I did, fair and square. More commonly though, one claimant has copious records documenting the gradual evolution of the invention, while the other claimant is just a pretender, with nothing but a bold assertion covering the final, perfect result. Sometimes disputes arise after an application has been filed or after a patent has been granted, but in a large company or large university, disputes can arise intramurally, before the application is filed, when the lawyers try to figure out who should be named as inventor(s) on the application.

You should ask your patent attorneys what they want your records to look like ... and then ask them what they will settle for. It seems some lawyers would “prefer” to see every detail recorded in a lab book, then signed and notarized in triplicate ... but they will settle for a lot less formality.

8  References

1.
John Denker, “Measurements and Uncertainties” ./uncertainty.htm

2.
John Denker, “How to Define Hypothesis” ./hypothesis.htm

3.
Thomas Kuhn, The Structure of Scientific Revolutions

4.
John Denker, “Teaching (and Learning) Thinking Skills” ./thinking.htm

5.
Richard Feynman, The Character of Physical Law

6.
Richard Feynman, “What is Science?” http://www.southerncrossreview.org/32/feynman3.htm

7.
Larry Woolf, “How do scientists really do science?” http://www.sci-ed-ga.org/pdfs/how-do-science-10-10-04.pdf

8.
Richard Feynman, The Pleasure of Finding Things Out, especially the chapter “Cargo Cult Science”.

9.
John Denker, “Gas Laws” ./gas-laws.htm

10.
John Denker, “Valid versus Invalid Arguments: Appeal to Authority etc.” ./authority.htm

11.
John Denker, “Argument from No Evidence” ./no-evidence.htm

12.
John Denker, “Truth in Contrast to Knowledge and Belief” ./truth.htm

13.
John Denker, “Definition of Weight, Gravitational Force, Gravity, g, et cetera” ./hypothesis.htm

14.
John Denker, “How To Evaluate Creative Ideas” ./projectology.htm

15.
John Denker, “The Twelve Coins Puzzle” ./twelve-coins.htm

16.
A page from one of Linus Pauling’s lab books (with links to many other pages), http://osulibrary.oregonstate.edu/specialcollections/rnb/05/05-020.html

17.
USPS Electronic Postmark Services http://www.usps.com/electronicpostmark/welcome.htm