It is very dangerous to live in a society where a few people have high-level thinking skills, and the rest don’t. Democracy does not work well in such a society.
Also: People who have high-level thinking skills are generally more productive than people who don’t. As a consequence, jobs that require high-level thinking generally pay better than jobs that don’t.
| Almost everybody knows how to run, after a fashion. However, if you sign up for the track team, or the soccer team, or anything like that, the coach will train you to run better, possibly a lot better. | Everybody knows how to think. It would be incorrect and insulting to tell someone they don’t know how to think. However, the fact remains that a good science class will train you to think better, possibly a lot better. |
That is, thinking skills are by-and-large separate from domain knowledge. To solve real-world problems in a particular domain, you need knowledge about the domain plus general thinking skills.
If you have high-level thinking skills, you can become proficient in a new domain just by learning the new domain-specific knowledge; you don’t need to learn the thinking skills all over again.
Einstein said “An education is what remains after you have forgotten everything you learned in school.” I’m pretty sure that he was referring to thinking skills, which remain after you have forgotten all the domain-specific factoids.
Anecdote: Once upon a time, a friend and I were conducting sea trials in a large, brand-new sailboat. The two of us had worked together before, debugging large computer programs. As you can imagine, debugging a computer program requires a detailed understanding of the computer language ... whereas debugging a boat requires considerable knowledge about how boats work, which is quite a different body of knowledge. However, both of us were struck by the fact that we used essentially the same process in both cases. We checked the typical case, then we checked the edges of the envelope, then we checked the corners of the envelope. When we observed small anomalies, we made a note of them, and then did whatever was necessary to make them reproducible. And so forth.
I call such things “game-show tests”.
I used to say that such tests don’t predict anything at all, but if things keep going the way they are, such tests will begin to predict success in school ... for the simple reason that success in school is being measured, more and more, by such tests. This is circular in a truly ghastly way. It encourages rote learning and discourages thinking.
In particular, we need tests that measure thinking skills.
If you do it right, kids will increase their thinking skills and enjoy it.
After years of a steady diet of such problems, students will be alarmed and recalcitrant if you suddenly assign them homework that requires nontrivial thinking. You will have to explain that your course is different from other courses, past and present. Then you will have to patiently teach them the required thinking skills. Then you can assign problems that require thinking, with gradually increasing complexity.
Here’s a classic example: The task is to add 198 plus 215. The easiest way to solve this problem in your head is to rearrange it as (215 + (200 − 2)) which is 415 − 2 which is 413. The small point is that by rearranging it, a lot of carrying can be avoided. The larger points are:
In this case, the straightforward approach would have worked; it just would have been inconvenient. This stands in contrast to the nine dots puzzle (section 13.2), where the straightforward approach doesn’t work, and an imaginative approach is absolutely necessary.
For the Mississippi Flow problem (section 13.3), there are two methods of solution, both of which are roughly equally convenient and equally accurate. Having two independent methods of solution is tremendously valuable, because it increases the reliability of the result.
In some quarters, the term “compensation” is applied to situations like this. I’m not sure exactly what it’s supposed to mean, but I think it just means rearranging the problem to make it easier to solve. I deem “compensation” to be an ugly and not-very-descriptive term. I prefer to talk about multiple, imaginative, devious, indirect, and/or outside-the-box approaches to the problem.
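The rearrangement used in the 198 + 215 example can be written out as a short sketch. The function name `add_by_compensation` and the `base` parameter are hypothetical, chosen only for illustration; the idea is simply to round one addend to a convenient number, add, and then undo the rounding.

```python
def add_by_compensation(a, b, base=100):
    """Add a + b by rounding a to a multiple of base, then correcting."""
    rounded = base * round(a / base)   # e.g. 198 -> 200
    adjustment = rounded - a           # e.g. 200 - 198 = 2
    # Add the convenient number first, then take back the excess.
    return (b + rounded) - adjustment


total = add_by_compensation(198, 215)  # (215 + 200) - 2
```

The result is the same as direct addition for any integers; only the amount of carrying changes.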
What is a puzzle? Loosely speaking, any problem that requires thinking is called a puzzle or (equivalently) a riddle. Also, most puzzles have the further property that it is much harder to find a solution than it is to verify and understand a solution once it has been found. For example, consider the “eleven words in one” puzzle (reference 3). A given solution can be verified directly ... but a direct attack to find the solution would be thousands of times harder, since it would require searching through all the six-letter words in the English language.
Note: Easy verification is related to what computer scientists call the NP property. (If you don’t know what this means, don’t worry about it.) This is also related to what some puzzle aficionados call the “Aha!” property, especially if the puzzle hinges on a single point that is obvious in retrospect.
Puzzles can be classified along various axes, as we now discuss.
One axis indicates how much domain knowledge the puzzle requires. Let’s call this the K axis. There are thousands of available puzzles that are near K=0. They are completely self-contained, i.e. the statement of the problem contains all the information necessary to solve it. Good starting places include the “20 questions” game (reference 4) and the “twelve coins” puzzle (reference 5). Reference 6 is a classic source; some of them are word puzzles, while others involve (in subtle ways) a fair bit of mathematical sophistication. There are also whole series of books by the likes of Raymond Smullyan and Martin Gardner. Self-contained puzzles are useful as a starting point, so that students can get accustomed to thinking even before they have much domain knowledge. As it says in reference 7, “Children lack knowledge and experience, but not reasoning ability.”
Moving along the K axis we come to problems that are “almost” self-contained, in the sense that they depend on facts that are unstated but well-known and easy to bring to mind. Farther along this axis are problems that require some amount of domain-specific knowledge. Reference 8 is a well-known source of puzzles that involve modest amounts of physics knowledge.
At the far end of the K axis we find problems that require broad and deep knowledge. To illustrate the range of the K axis, consider the following contrast:
| The “Who Owns the Fish” problem (reference 9) is intricate enough to scare away most people, but it is completely self-contained and well-posed. The statement of the problem contains just the information required to solve the problem ... no less, and essentially no more. | The “Mississippi Flow” problem (section 13.3) is very far from being self-contained. It requires you to rack your brain searching for information that might help solve the problem. A wide search is necessary, because seemingly very disparate tidbits of information turn out to be helpful. This is characteristic of a wide range of real-world problems. |
We can also define a B axis, which indicates to what extent a direct approach suffices, or not. The nine-dots puzzle (section 13.2) is the quintessential example and the source of the expression “outside-the-box thinking”. Other venerable examples where the direct approach fails include the dog-duck-grain problem and the orchard with 10 trees in five straight rows of four trees each.
If an indirect approach is needed, you need to use your imagination to find it. This involves an element of critical thinking, as discussed in section 5.
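As an illustration of why the direct approach fails on the dog-duck-grain problem, here is a minimal sketch of a breadth-first search over the possible states. The rules are assumptions taken from the usual telling of the puzzle, not from the text above: the boat carries the farmer plus at most one item, the dog may not be left alone with the duck, and the duck may not be left alone with the grain. All names are hypothetical.

```python
from collections import deque

ITEMS = ("dog", "duck", "grain")
# Assumed rules: these pairs may not be left together without the farmer.
FORBIDDEN_PAIRS = [{"dog", "duck"}, {"duck", "grain"}]


def is_safe(near_bank, farmer_on_near):
    """Check the bank that the farmer is NOT on."""
    unattended = frozenset(ITEMS) - near_bank if farmer_on_near else near_bank
    return not any(pair <= unattended for pair in FORBIDDEN_PAIRS)


def shortest_plan():
    """Return the cargo carried on each crossing (None = empty boat)."""
    start = (frozenset(ITEMS), True)   # everything on the near bank
    goal = (frozenset(), False)        # everything on the far bank
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (near_bank, farmer_on_near), plan = queue.popleft()
        if (near_bank, farmer_on_near) == goal:
            return plan
        here = near_bank if farmer_on_near else frozenset(ITEMS) - near_bank
        for cargo in [None, *sorted(here)]:
            if cargo is None:
                new_near = near_bank
            elif farmer_on_near:
                new_near = near_bank - {cargo}
            else:
                new_near = near_bank | {cargo}
            state = (new_near, not farmer_on_near)
            if state not in seen and is_safe(new_near, not farmer_on_near):
                seen.add(state)
                queue.append((state, plan + [cargo]))
    return None


plan = shortest_plan()
```

Under these assumed rules, the shortest plan carries the duck three times, which means the duck is brought back at one point. A strategy that only ever moves items forward cannot succeed.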
We can also define an H axis, which indicates how large is the space of hypotheses that must be considered. For example, in the “four hats” puzzle in reference 9, there are four main hypotheses that must be considered. In contrast, in the “Who Owns the Fish” puzzle in reference 9, if you formulate the problem in the obvious way there are billions of hypotheses to be considered. Similarly, chess puzzles commonly involve millions or billions of possibilities.
As large as those numbers are, they are still finite, and one could enumerate all the possibilities, in principle, by straightforward means. This contrasts with Bongard problems (reference 10) where there are no a priori limits on what hypotheses should be considered. Students generally enjoy Bongard problems. They teach some useful thinking skills, including the necessity of looking at a problem from more than one viewpoint.
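To get a feeling for where “billions of hypotheses” can come from, here is a rough count. It assumes the common form of the “Who Owns the Fish” puzzle, with five houses and five kinds of attribute (such as color or pet), each kind assigned to the houses in some order. The version in reference 9 may differ, so treat the specific numbers as an assumption.

```python
import math

houses = 5            # assumed: five houses in a row
attribute_kinds = 5   # assumed: five kinds of attribute per house

# Each kind of attribute can be laid out across the houses in 5! ways.
arrangements_per_kind = math.factorial(houses)

# The kinds are assigned independently, so the counts multiply.
total_hypotheses = arrangements_per_kind ** attribute_kinds

print(f"{total_hypotheses:,} candidate assignments")
```

Under these assumptions the count is about 25 billion, which is large but still finite and enumerable in principle.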
It is also worth noting that some puzzles (and many real-world problems) have multiple solutions; that is, there are multiple members of the solution set. As an elementary example, suppose the desired answer is known to solve the equation x² = 81. If you find a solution to the equation, you may or may not have found the desired answer.
A much more challenging example is to find the complete solution-set to the “south/east/north triangle problem” (section 13.4). Many people find one solution and express absolute certainty that it is the only solution. It’s not.
For some reason that I don’t fully understand, finding one solution creates a tremendous psychological barrier to finding another solution. Perhaps this is just a result of poor training: the students have been trained to expect that every homework problem will have only one solution.
We now turn to a topic that is somewhat related but somewhat different, namely methods of solution. (This topic was introduced in section 3.) For example, there are two completely independent ways of finding how much water flows in the Mississippi. That means we can ask questions at two different metaphysical levels:
Question (a) has essentially only one answer, but question (b) has a solution-set with at least two members.
Again it seems that finding one answer to question (b) creates a tremendous psychological barrier to finding another answer.
It must be emphasized that being able to solve question (a) in two different ways is a tremendously valuable skill, because it vastly decreases the chance of making an undetected error.
We now consider problems that are underspecified, overspecified, or otherwise ill-posed. The most troublesome kind of ill-posed problems involve inconsistencies. That is, sometimes the “facts” you’re working with are not entirely true.
To deal with such problems, you need to move beyond black-and-white notions of true-and-false; instead you need to weigh the probabilities. Similarly, you are no longer dealing with facts; instead you are weighing the evidence.
Some of the inconsistencies are exogenous, i.e. they come from what other people have told you. Other inconsistencies are endogenous, i.e. they come from assumptions that you have made on your own.
Some “recreational” puzzles, especially those that involve outside-the-box thinking, are useful for developing a subset of critical thinking skills, because they tempt you to make false assumptions, and force you to question your assumptions.
On the other hand, the overwhelming majority of “recreational” puzzles are well-posed, which means they don’t really exercise the full range of critical thinking skills.
For more discussion of ill-posed problems, see reference 11.
By way of example, suppose you were asked to fit a sine wave to a set of measured points as shown in figure 1. The obvious solution to this problem is shown in figure 2.
That looks like a good fit. The amplitude, frequency, and phase of the fitted function are determined to high precision.
But we still have the question, how sure are you that this is the right answer? How well does this fitted function predict the position of the next measured point? Hint: reference 12 shows another way of fitting a sine wave to these points.
There are some deep ideas here, ideas of proof, disproof, predictive power, et cetera. These ideas can be quantified using the Vapnik-Chervonenkis dimensionality and related machine-learning ideas. For more on this, see reference 13.
As a much simpler example, a polynomial with N adjustable coefficients has a VC dimensionality of at most N ... and you know that the coefficients are well determined if you have N or more data points in general position.
This sine-wave example calls attention to the fact that the family of fitting functions we are using (sine waves with adjustable amplitude, frequency, and phase) has an infinite VC dimensionality, even though there are only three adjustable parameters. We see that three data points – or even a couple dozen data points – are nowhere near sufficient to pin down these three parameters. This tells us that VC dimensionality is the important concept, and “number of parameters” is only an approximate concept, sometimes valid but definitely not always.
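The following sketch shows one way a sine-wave fit can fail to be unique. The data are hypothetical (two dozen evenly spaced samples), not the points in figure 1, and this is not necessarily the alternative fit shown in reference 12. Two sine waves whose frequencies differ by exactly one cycle per sample interval agree at every sample, yet disagree badly in between.

```python
import math


def sine(amplitude, frequency, phase, t):
    return amplitude * math.sin(2 * math.pi * frequency * t + phase)


# Hypothetical measurement times: 24 samples, one time unit apart.
sample_times = [float(k) for k in range(24)]

slow_fit = (1.0, 0.05, 0.3)   # amplitude, frequency, phase
fast_fit = (1.0, 1.05, 0.3)   # same, but one extra cycle per sample interval

# At the sample times the two candidate fits are indistinguishable ...
worst_gap_at_samples = max(
    abs(sine(*slow_fit, t) - sine(*fast_fit, t)) for t in sample_times
)

# ... but halfway between two samples they make very different predictions.
gap_between_samples = abs(sine(*slow_fit, 0.5) - sine(*fast_fit, 0.5))
```

Adding more samples on the same grid would not help; both candidates would continue to fit them equally well.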
Another example of what can go wrong is shown in figure 3. The black curve represents the raw data. We have lots and lots of data points, with very high precision. We know a priori that the area under the black curve is the sum of two rectangles – a red rectangle and a blue rectangle. All we need to do is a simple fit, to determine the height, width, and center of the two rectangles. As you can see from the figure, there are two equally good solutions. There are two equally perfect fits. Alas, this leaves us with very considerable uncertainty about the area, width, and center of the blue rectangle.
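A minimal numerical sketch of this kind of ambiguity is given below. The numbers are hypothetical and are not taken from figure 3. Each rectangle is represented as a tuple (left edge, right edge, height), and two different red-plus-blue decompositions are shown to produce exactly the same summed curve.

```python
def height_at(rect, x):
    left, right, height = rect
    return height if left <= x < right else 0.0


def summed_curve(red, blue, x):
    return height_at(red, x) + height_at(blue, x)


def area(rect):
    left, right, height = rect
    return (right - left) * height


# Two candidate fits, each given as (red, blue). Hypothetical values.
fit_one = ((0.0, 3.0, 1.0), (1.0, 2.0, 1.0))
fit_two = ((0.0, 2.0, 1.0), (1.0, 3.0, 1.0))

grid = [k / 100 for k in range(-50, 350)]
fits_agree = all(
    summed_curve(*fit_one, x) == summed_curve(*fit_two, x) for x in grid
)

blue_area_one = area(fit_one[1])
blue_area_two = area(fit_two[1])
```

Both fits reproduce the data perfectly, yet in this sketch the blue rectangle’s area, width, and center all differ between the two.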
Some problems in this category can be solved by introducing some sort of regularizer, as discussed in reference 11.
Additional examples to show how easy it is for people to fool themselves into “knowing” that they have “the” answer (when in fact they have not considered all the possibilities) can be found in reference 14.
| The school experience, especially the testing experience, gives many people the destructive idea that if it takes more than 45 seconds to solve a problem, they should give up. In the real world, you don’t get 40 questions in 30 minutes. That’s off by multiple orders of magnitude. More commonly you get 4 questions in 300 minutes, or something even beyond that. Therefore you must learn not to give up too soon. | At some point you should give up. You don’t want to spend the rest of your life stuck on some problem that you can’t solve. If you don’t want to give up entirely, you can set the problem aside temporarily, and return to it later, after you have acquired more knowledge and skill. |
| If you give up on the main goal you are admitting defeat. Many people are too quick to give up on the main goal. | Many problems require exploring the possibilities. That involves choosing tentative, hypothetical sub-goals. If such a hypothesis doesn’t work out satisfactorily, you need to backtrack and redo the analysis, choosing the next item from the list of hypotheses. Many people are too slow to give up on an untenable hypothesis (and therefore too slow to begin consideration of alternative hypotheses). |
The process of exploring the hypotheses can often be formalized as a search tree. Many chess problems involve search trees.
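The choose, test, and backtrack process described above can be sketched with a small chess-flavored example. The choice of the N-queens placement problem is mine, for illustration only. Each level of the search tree is a tentative hypothesis about where the queen in the next row goes; an untenable hypothesis is abandoned as soon as it conflicts with earlier choices.

```python
def count_queen_placements(n, columns=()):
    """Count ways to place n non-attacking queens, one per row.

    columns[r] is the column tentatively chosen for the queen in row r.
    """
    row = len(columns)
    if row == n:
        return 1  # every row filled without conflict: one full solution
    total = 0
    for col in range(n):
        # Tenable only if no earlier queen shares this column or a diagonal.
        tenable = all(
            col != c and abs(col - c) != row - r
            for r, c in enumerate(columns)
        )
        if tenable:
            total += count_queen_placements(n, columns + (col,))
        # Otherwise: give up on this hypothesis and try the next column.
    return total


solutions_on_8x8 = count_queen_placements(8)
```

Giving up early on untenable branches is what keeps the search small compared with trying every possible arrangement.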
Indeed, sometimes solving a small instance of the problem puts you in a position to solve all larger instances by induction.
Any discussion of critical thinking must necessarily cover much of the same ground as a discussion of scientific methods. See reference 15.
Reference 16 explains how a scaling argument based on figure 4 can be used to figure out the formula for the area of an ellipse.
This leaves us with multiple ways of figuring out the area of an ellipse: You could just plain remember the formula from high-school geometry, and/or you could look it up, and/or you could easily reconstruct it whenever it is needed.
I know some people who have quite bad memory who are successful physicists. They carefully remember a few fundamental facts, and rederive everything else on an as-needed basis. For example, with a little practice, you can rederive the formula for the area of an ellipse faster than most people can recall it from memory (and with less probability of error).
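As a cross-check on the rederivation, here is a sketch of one common form of the scaling argument; it may differ in detail from reference 16. An ellipse with semi-axes a and b is a circle of radius a stretched by the factor b/a in one direction, so its area is the circle’s area times b/a. The second function is an independent check by brute-force counting. Both function names are hypothetical.

```python
import math


def ellipse_area_by_scaling(a, b):
    circle_area = math.pi * a * a    # circle of radius a
    return circle_area * (b / a)     # stretch one direction by b/a


def ellipse_area_by_counting(a, b, n=400):
    """Independent estimate: count small grid cells whose centers are inside."""
    dx, dy = 2 * a / n, 2 * b / n
    inside = 0
    for i in range(n):
        x = -a + (i + 0.5) * dx
        for j in range(n):
            y = -b + (j + 0.5) * dy
            if (x / a) ** 2 + (y / b) ** 2 <= 1.0:
                inside += 1
    return inside * dx * dy


derived = ellipse_area_by_scaling(3.0, 2.0)
counted = ellipse_area_by_counting(3.0, 2.0)
```

Having two independent routes to the same number is itself an instance of the cross-checking habit.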
It may be that some people develop extra-sharp thinking skills as a way of compensating for bad memory ... in analogy to the way that blind persons often develop extra-sharp hearing skills. However, I am not going to recommend bad memory any more than I would recommend blindness. Memory is a valuable skill. Obviously it is best to have a good memory and good thinking skills.
Feynman said that knowledge is like a grand tapestry. A forgotten fact is like a hole in the tapestry. You should be able to repair the hole in several different ways, by reweaving down from the top, or up from the bottom, or in from the sides. Any important fact can be rederived in numerous ways, because our knowledge has numerous interconnections.
Therefore: You should practice rederiving things. Even if it is something that you remember, rederive it anyway. This provides multiple advantages: First, it serves as a cross-check on your memory. Secondly, it builds up your thinking skills. Thirdly, it improves your understanding and recall of facts related to the one you are looking for, by exercising the all-important connections between facts.
Remember that any important formula should be derivable in multiple different ways, so if you derived it one way last time, try to derive it another way next time.
Some things can’t be derived, so you just have to remember them.
Conversely, some things can’t be remembered, so you just have to figure them out. In particular, if/when you visit unexplored territory, it is nice to be able to derive new formulas on the spot. It is a really good feeling to know that even though you are in unexplored territory, you are not lost. Based on your good thinking skills, you can move around more freely than most people do in familiar territory.
In contrast, the guy who tries to get by on memory alone, to the neglect of good thinking skills, will get seriously stuck as soon as he sets foot in unexplored territory, because the facts he needs are nowhere in his memory.
Once upon a time, there was a sophomore who heard that fruits and vegetables are good for you. So he ate nothing but apples and celery for three months. Then he died.
Some members of the community reacted by saying “Apples are corruption! Celery is emblematic of everything that is wrong with society today! We must destroy all fruits and vegetables immediately!”
I beg to differ. I still think fruits and vegetables are good for you. I don’t think the problem was what the guy ate ... the problem was what the guy didn’t eat.
Let’s turn our attention now to algorithms and mnemonics.
I get really tired of that.
My point is that algorithms / mnemonics / equations / procedures / formalisms / methods are good for you. Really they are. If a student has some of those tools but lacks a gut feeling for how things work, the problem is not what the student has ... the problem is what the student doesn’t have.
Everyone needs a balanced diet. That is, everyone needs gut feelings and formalism.
Real understanding is represented by point B, in the upper-right corner, where there is a high level of feeling for the subject backed up by a high level of rigor.
As indicated by the red and blue arrows, you don’t get to the goal in one step. You start out with a little bit of feeling and a little bit of formalism. They reinforce each other and provide a foundation for the next step. The red leverages the blue and the blue leverages the red. And so you itsy-bitsy-spider your way up and over toward point B.
Let’s be clear:
The problem is not what the students have; the problem is what they don’t have. They don’t have a feeling for the subject.
This situation is represented by point D in figure 5. It sometimes goes by the name “rigor mortis”, which is a pretty good name for rigor without feeling.
This manifests itself in many ways. As an example, sometimes people sling buzzwords around without any real understanding. If they had checked their feelings against the theory, they would have known their feelings were nonsense.
Many additional examples are classified under the educationalese term “negative transference”. That means your gut feeling based on experience in one domain might give you the wrong answer when applied in another domain.
I’m not saying that gut feelings are bad. I’m saying that gut feelings have to be checked against the facts.
Red Queen: “Why, sometimes I’ve believed as many as six impossible things before breakfast.”
— Lewis Carroll
Also, I’m saying that sometimes having some sophistication gives you useful information about the limits of validity of your gut feelings.
Lady Thiang: “This is a man who thinks with his heart, His heart is not always wise.”
— Oscar Hammerstein
This sheds some light on the so-called “new math” and its relationship to “old math”, which has remained an unsettled issue since the 1960s. (If you’re interested in the history of this, reference 19 is a reasonably informative, non-hysterical, non-polemical news article.) This issue is commonly referred to as the “Math Wars” but I don’t like to use that term. The warlike aspects are a discredit to everyone involved. The sensible approach is to use smart, efficient algorithms and to understand the principles involved.
It’s true that you can memorize algorithms. But what’s wrong with that? As mentioned in section 9, I don’t recommend doing away with memory, for the same reason I don’t recommend blindness. Memory is not the opposite of thought, nor the enemy of thought. Using an algorithm is not necessarily the non-thoughtful approach; usually it is the most thoughtful approach. Algorithms are like tools. When I tighten a bolt, I use a wrench. That does not make me any less skillful than the guy who tries to tighten the bolt with his bare hands. I’m allowed to use the wrench, even though I didn’t invent it or even manufacture it.
Continuing that thought: There have been many occasions where I did invent and construct a specialized wrench or other tool to solve a specialized problem. Building custom tools and jigs requires an investment, but often this approach pays off handsomely, leading to overall faster and better results, compared to the brute-force head-on approach.
It is always possible to learn an algorithm in a mindless way, and to practice the algorithm by rote. That’s unsurprising, because any tool can be abused. Similarly equations can be abused by students who plug and chug, without any thought as to what the symbols mean. However:
| You should never use “equation” as a synonym for plug-and-chug. You should never use “algorithm” as a synonym for mindless. You should never use “systematic” as a synonym for rote. | If you mean rote, say “rote”. If you mean mindless, say “mindless”. If you mean plug-and-chug, say “plug-and-chug”. |
| Having a tool does not oblige you to abuse the tool. | You must not blame the presence of one tool for the absence of another. |
There are of course good tools and bad tools, just as there are good approximations and bad approximations. It’s your job to ascertain which is which. This requires judgement. As an example: If you want a numerical solution to a system of N linear equations in N unknowns, Cramer’s rule is really terrible compared to Gaussian elimination. It is much more laborious, and it is numerically unstable. See reference 20.
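For concreteness, here is a minimal sketch of Gaussian elimination with partial pivoting, written in plain Python. It is meant as an illustration of the technique, not as a replacement for a well-tested linear algebra library; the function name `solve_linear_system` is hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the caller's data is left untouched.
    rows = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        if rows[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Eliminate this column from every row below the pivot row.
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    # Back substitution, starting from the last unknown.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(rows[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (rows[r][n] - tail) / rows[r][r]
    return x


solution = solve_linear_system([[2, 1], [1, 3]], [3, 5])
```

The work here grows roughly as N cubed, whereas Cramer’s rule requires evaluating N + 1 separate determinants.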
We should also say a few words about crutches:
| Sometimes there is a legitimate need for a crutch. That can happen if somebody has a broken leg .... after you have taken direct action to treat the underlying malady and provided the user has been briefed on the correct usage and limitations of the crutch. | On the other hand, crutches can actually cause secondary injuries, especially if overused or abused. For a person with normal abilities, a crutch is worse than useless. It gets in the way, and hinders development of normal performance. |
So ... there are upsides and downsides to crutches. We should not over-react to the upsides or the downsides. I’ve seen some algorithms – such as the infamous “density triangle” – that should be categorized as crutches. They may be useful in some rare, temporary situations, but otherwise are worse than useless.
If you see somebody using a crutch that is not really needed, it is a good idea to wean them off the crutch, sooner rather than later.
Last but not least: The right answer depends on the background and developmental level of the student. If a five-year-old kid asks “how does this flashlight work”, he does not want a lecture on the chemistry of batteries or the physics of LEDs. A more appropriate answer would be something purely operational, such as “you need to twist it, like so.”
If the student actually wants a more detailed answer, he can always ask a more detailed question.
Consider the following scenario: I pose the “Mississippi Flow” problem to two different people who have nominally similar educational backgrounds and experience.
| The usual case is that I work with the person for 45 minutes, telling them “don’t give up” and “if you need to know that, figure it out” ... and giving a series of hints. At the end of this time, they have a solution. They realize in retrospect that in principle they could have solved the problem, in the sense that they knew everything necessary to permit a solution. At the same time, they realize that in practice they could never have found the solution on their own, because they would not have been able to organize their thinking in such a way as to call attention to the relevant facts. | In a substantial minority of the cases, the person can solve the puzzle very very quickly. They outline the method of solution in about four seconds, and then take another few seconds to carry out the required multiplications. |
The fact that proficiency with this sort of problem-solving is so unevenly distributed makes this sort of problem difficult to discuss in a classroom situation. The class as a whole, working as a team, can solve the problem relatively quickly, but that defeats one of the major purposes, namely giving each person experience racking their brain to find and organize the required bits of information. I don’t really know how to solve this problem. It would be ideal to spend 45 minutes with each student one-on-one, going over this puzzle, but that would be prohibitively expensive in a typical school setting.
Similar considerations apply to homework. If the purpose of the exercise is to get experience racking one’s brain, the purpose is defeated if students google the solution, or get the solution from a classmate. This problem cannot be prevented, but it can be fairly well controlled, as follows: You can separate the sheep from the goats by assigning a modified version of the puzzle on a closed-book in-class quiz. Someone who understands the method of solution will be able to solve the modified version instantly, whereas someone who merely copied the solution will not. (I don’t know of any suitable modifications of the “Mississippi Flow” problem, but others such as the “Who Owns the Fish” problem are readily modifiable.)
Let us return to the question of what is a puzzle. Consider the contrast:
| Many puzzles have the unfortunate property that even if you solve the puzzle, it’s still just a puzzle. The reward for solving it is trivial, artificial, or very indirect. Most homework problems are in this category; that is, the teacher already knows the right answer, and is not going to make any life-or-death decisions based on the student’s answer. | In many real-world situations, there is a lot riding on the question. It may truly be a life-and-death decision. |
| As my friend Larry says: If it’s not worth doing, it’s not worth doing right. | If it’s really worth doing, it’s worth double-checking to make sure you did it right. |
Consider someone who is learning to ride a bike. Why are they doing it? They typically are not doing it for the challenge; they are not doing it because the learning process is difficult. They are doing it because being able to ride a bike will empower them to go places and do things they could not do otherwise.
Consider the following four scenarios:
| Problem A is hard, and the solution is worth $10.00. | Problem B is hard, and the solution is worth $100.00. |
| Problem C is easy, and the solution is worth $10.00. | Problem D is easy, and the solution is worth $100.00. |
Given the choice, I would prefer problem B over problem A every time. That is, we should not value puzzles because they are hard; instead we should value puzzles if and when the answer is important. Homework problems have indirect value if (and only if) they teach skills that will have direct value later.
It is also true that given the choice, I would prefer problem C over problem A. Easy problems are preferable to hard problems, other things being equal.
Of course problem D is the most preferable of all.
More generally, I need to do a cost/benefit analysis. Given the choice between an easy, low-value problem and a hard, high-value problem, a tradeoff must be made. Making wise tradeoffs requires analysis and judgement.
In any case, we need to maintain a clear understanding of what is primary versus what is secondary, what is directly valuable versus what is only indirectly valuable, and what is real versus what is artificial.
Therefore do not get carried away with doing puzzles for the sake of doing puzzles. Choose puzzles that cultivate some useful general skill. Explicitly discuss what skills are being taught, and why. (See section 7 for some basic thoughts about this.)
The idea is neither to work harder, nor to work less hard. The idea is to get more done, by being clever. Things that formerly seemed difficult become easy once you know how. Above all, you should learn to solve important problems.
For more on this, see reference 21.
Some of these are interesting because they have more than one answer, i.e. the solution-set is not a singleton. Others are interesting because even though there is only one final answer, there are multiple methods of solution.
I have a quantity x such that x² = 81. Please tell me the value of x. How do you know? How sure are you?
Arrange nine dots in three rows of three:
| • | • | • |
| • | • | • |
| • | • | • |
The task is to draw a path consisting of four straight contiguous line segments, such that the path goes through all of the dots.
Please give me an estimate of how much Mississippi River water flows past New Orleans in a year. This is a closed-book question; don’t look anything up; figure it out.
You start out at point A. You travel strictly south for one mile. You then make a right-angle turn and travel strictly east for one mile. You then make another right-angle turn and travel strictly north for one mile. It turns out that you are now back at point A. So, please tell me, where is point A? How do you know? How sure are you?
Note: For present purposes, we approximate the earth as being perfectly spherical. Point A is on the surface, and all travel takes place along the surface.