Thermodynamics is celebrated for its power, generality, and elegance. However, all too often, students are taught some sort of pseudo-thermodynamics that is infamously confusing, limited, and ugly. This document is an attempt to do better, i.e. to present the main ideas in a clean, simple, modern way.
| The first law of thermodynamics is usually stated in a very unwise form. | We will see how to remedy this. |
| The second law is usually stated in a very unwise form. | We will see how to remedy this, too. |
| The so-called third law is a complete loser. It is beyond repair. | We will see that we can live without it just fine. |
| Many of the basic concepts and terminology (including heat, work, adiabatic, etc.) are usually given multiple mutually-inconsistent definitions. | We will see how to avoid the inconsistencies. |
Many people remember the conventional “laws” of thermodynamics by reference to the following joke:1
It is not optimal to formulate thermodynamics in terms of a short list of enumerated laws, but if you insist on having such a list, here it is, modernized and clarified as much as possible:
| The zeroth law of thermodynamics tries to tell us that certain thermodynamical notions such as “temperature”, “equilibrium”, and “macroscopic state” make sense. | Sometimes these make sense, to a useful approximation … but not always. See section 3. |
| The first law of thermodynamics states that energy obeys a local conservation law. | This is true and important. See section 1.4. |
| The second law of thermodynamics states that entropy obeys a local law of paraconservation. | This is true and important. See section 2. |
| There is no third law of thermodynamics. | The conventional so-called third law alleges that the entropy of some things goes to zero as temperature goes to zero. This is never true, except perhaps in a few extraordinary, carefully-engineered situations. It is never important. See section 4. |
To summarize the situation, we have two laws (#1 and #2) that are very powerful, reliable, and important (but often misstated and/or conflated with other notions) plus a grab-bag of many lesser laws that may or may not be important and indeed are not always true (although sometimes you can make them true by suitable engineering). What’s worse, there are many essential ideas that are not even hinted at in the aforementioned list, as discussed in section 5.
We will not confine our discussion to some small number of axiomatic “laws”. We will carefully formulate a first law and a second law, but will leave numerous other ideas un-numbered. The rationale for this is discussed in section 6.6.
This document is also available in PDF format. You may find this advantageous if your browser has trouble displaying standard HTML math symbols.
This section provides an overview, and indicates how our approach stands in relationship to other treatments of the subject.
In this section we will not explain the ideas, but only mention the ideas that will be explained later. If you want to go directly to the actual explanations, feel free to skip this section.
Most of the fallacies you see in thermo books are pernicious precisely because they are not absurd. They work OK some of the time, especially in simple “textbook” situations … but alas they do not work in general.
The main goal here is to formulate the subject in a way that is less restricted and less deceptive. This makes it vastly more reliable in real-world situations, and forms a foundation for further learning.
In some cases, key ideas can be reformulated so that they work just as well – and just as easily – in simple situations, while working vastly better in more-general situations. In other cases, we must be content with less-than-general results, but we will make them less deceptive by clarifying their limits of validity.
Alas there are some other ideas such as “heat content” that are attractive in the context of cramped thermodynamics but extremely deceptive if you try to extend them to uncramped situations.
We do not define entropy in terms of energy, nor vice versa. We do not define either of them in terms of temperature. Entropy and energy are well defined even in situations where the temperature is zero, unknown, or undefinable.
The usual math-textbook treatment of partial derivatives is dreadful. The standard notation for partial derivatives practically invites misinterpretation. A way of understanding and visualizing partial derivatives can be found in reference 1.
Uncramped thermodynamics is particularly intolerant of sloppiness, partly because it is so multi-dimensional, and partly because there is no notion of length or angle in thermodynamic parameter-space. In particular, the idea that “variables not mentioned are held constant” is a bad idea in general, as discussed in reference 2, and is particularly pernicious when applied to uncramped thermodynamics. Unfortunately, some thermo books are sloppy in the places where sloppiness is least tolerable.
Some fraction of this mess can be cleaned up just by being careful and not taking shortcuts. Even more of the mess can be cleaned up using differential forms, i.e. exterior derivatives and such, as discussed in reference 3. This raises the price of admission somewhat, but not by much, and it’s worth it. Some expressions that seem mysterious in the usual textbook presentation become obviously correct, easy to interpret, and indeed easy to visualize when re-interpreted in terms of gradient vectors. On the other edge of the same sword, some other mysterious expressions are easily seen to be unreliable and highly deceptive.
The term “inexact differential” is sometimes used in this connection, but it is something of a misnomer. You must treat path-dependent integrals as path-dependent integrals, not as potentials, i.e. not as functions of state. See section 17 for more on this.
To say the same thing another way, we will not express the first law as dE = dW + dQ or anything like that, even though it is traditional in some quarters to do so. For starters, although such an equation is meaningful within the narrow context of cramped thermodynamics, it is provably not meaningful for uncramped thermodynamics, as discussed in section 6.8 and section 17. It is provably impossible for there to be any W and/or Q that satisfy such an equation when thermodynamic cycles are involved.
Even in cramped situations where it might be possible to split E (and/or dE) into a thermal part and a non-thermal part, it is often unnecessary to do so. Often it works just as well (or better!) to use the unsplit energy, making a direct appeal to equation 3.
Heat remains central to unsophisticated cramped thermodynamics, but the modern approach to uncramped thermodynamics focuses more on energy and entropy. Energy and entropy are always well defined, even in cases where heat is not.
You can do thermodynamics without heat. You can even do quite a bit of thermodynamics without temperature. But you can’t do thermodynamics without energy and entropy.
There are multiple well-established mutually-inconsistent definitions of “heat”, as discussed in section 15.1. (This is wildly different from the situation with, say, entropy, where there is really only one idea, even if there are multiple ways of looking at it.) There is no consensus as to “the” definition of heat, and no prospect of achieving consensus anytime soon. There is no need to achieve consensus about “heat”, because we already have consensus about entropy and energy, and that suffices quite nicely. Asking students to recite “the” definition of heat is worse than useless; it rewards rote regurgitation and punishes actual understanding of the subject.
It is more important to understand energy than to define energy. We can and will define it (section 1.2), but the definition is not super-simple nor super-concise. The concept of energy is so fundamental that there is no point in looking for a concise definition in terms of anything more fundamental.
To say the same thing in slightly different words, we can achieve more understanding by focussing on what energy does, rather than worrying too much about what energy is.
The most important thing about energy is its role in the law of conservation of energy, as discussed in section 1.4. In an introductory course, there is no point in talking about energy except in connection with conservation of energy.
Let’s start with some examples. Some well-understood examples of energy include the following:
| (1) |
In particular, if you need a starting-point for your understanding of energy, visualize a book on a high shelf. It has more energy than it would on a low shelf. Similarly a fast-moving book has more energy than it would at a lower speed.
The idea of conservation per se is well defined, as discussed in detail in reference 4. We use this as the second step in a recursive definition of energy. That is:
This concludes our definition of energy.
The definition of energy (section 1.2) is recursive. That means we can pull our understanding of energy up by the bootstraps. We can identify new forms of energy as they come along, because they contribute to the conservation law in the same way as the already-known examples. This is the same basic idea as in reference 5.
Recursive is not the same as circular. A circular argument would be fallacious and useless ... but there are many examples of correct, well-accepted definitions that are recursive. Recursion is very commonly used in mathematics and computer science. For example, it is correct and convenient to define the factorial function so that
| factorial(0) = 1;  factorial(N) = N × factorial(N−1) for all integers N > 0 (2) |
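A recursive definition of the factorial translates almost word for word into code. The following is a minimal sketch, not part of the original argument; the function name and the error handling are our own choices. The point to notice is that the recursion always moves toward the base case, which is what keeps it from being circular.

```python
def factorial(n: int) -> int:
    """Recursive definition: one base case plus a rule that refers to a smaller case."""
    if n < 0:
        raise ValueError("this sketch defines factorial only for integers n >= 0")
    if n == 0:
        return 1                      # base case: anchors the recursion
    return n * factorial(n - 1)       # recursive step: always refers to a smaller n


print(factorial(5))   # 120
```

A circular "definition" would be one where factorial(N) referred back to factorial(N) itself, with no base case to stop at.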
As a more sophisticated example, have you ever wondered how mathematicians define the concept of integers? One very common approach is to define the positive integers via the Peano axioms. The details aren’t important, but the interesting point is that these axioms provide a recursive definition … not circular, just recursive. This is a precise, rigorous, formal definition.
This allows us to make another point: There are a lot of people who are able to count, even though they are not able to provide a concise definition of “integer” – and certainly not able to provide a non-recursive definition. By the same token, there are lots of people who have a rock-solid understanding of how energy behaves, even though they are not able to give a concise and/or non-recursive definition of “energy”.
Energy is somewhat abstract. There is no getting around that. You just have to get used to it – by accumulating experience, seeing how energy behaves in various situations. As abstractions go, energy is one of the easiest to understand, because it is so precise and well-behaved.
Tangential remark: The introductory examples of energy itemized in section 1.2 are only approximate, and are subject to various limitations. For example, the formula m g h is exceedingly accurate over laboratory lengthscales, but is not valid over cosmological lengthscales. Similarly the formula ½ m v² is exceedingly accurate when speeds are small compared to the speed of light, but not otherwise. These limitations do not interfere with our efforts to understand energy.
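To get a feel for how remote the limitation on the kinetic-energy formula is, we can compare the low-speed formula with the special-relativistic kinetic energy, (γ − 1) m c². This is only a sketch: the 1 kg mass and the function names are hypothetical, and the relativistic expression is written in the algebraically equivalent form m v² γ²/(γ + 1) to avoid round-off trouble at low speeds.

```python
import math

C = 299_792_458.0   # speed of light in m/s


def kinetic_newtonian(m, v):
    """Low-speed approximation, (1/2) m v^2."""
    return 0.5 * m * v**2


def kinetic_relativistic(m, v):
    """(gamma - 1) m c^2, rewritten as m v^2 gamma^2 / (gamma + 1) for numerical stability."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return m * v**2 * gamma**2 / (gamma + 1.0)


m = 1.0   # hypothetical 1 kg book
for beta in (0.001, 0.1, 0.5, 0.9):
    v = beta * C
    ratio = kinetic_newtonian(m, v) / kinetic_relativistic(m, v)
    print(f"v = {beta} c   newtonian/relativistic = {ratio:.6f}")
```

At everyday speeds the ratio is indistinguishable from 1; it only drifts noticeably below 1 once the speed is an appreciable fraction of the speed of light.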
In non-relativistic physics, energy is a scalar. That means it is not associated with any direction in space. However, in special relativity, energy is not a Lorentz scalar; instead it is recognized as one component of the [energy, momentum] 4-vector, such that energy is associated with the timelike direction. For more on this, see reference 6. To say the same thing in other words, the energy is invariant with respect to spacelike rotations, but not invariant with respect to boosts.
We will denote the energy by E. We will denote various sub-categories of energy by putting subscripts on the E, unless the context makes subscripts unnecessary. Sometimes it is convenient to use U instead of E to denote energy, especially in situations where we want to use E to denote the electric field. Some thermodynamics books state the first law in terms of U, but it means the same thing as E. We will use E throughout this document.
Beware of attaching qualifiers to the concept of energy. Note the following contrast:
| The symbol E (or U) denotes “the” energy of the system we are considering. If you feel obliged to attach some sort of additional words, you can call E the “system” energy or the “plain old” energy. This doesn’t change the meaning. | Most other qualifiers change the meaning. There is an important conceptual point here: “The” energy is conserved, but the various sub-categories of energy are not separately conserved. For example: The “internal” energy is not necessarily conserved, as discussed in section 13.1. Similarly, the “available” energy is not necessarily conserved, as discussed in section 1.5. |
Associated with the foregoing conceptual point there is a point of terminology: E (or U) does not denote “internal” energy. It does not denote “available” energy.
Note: If you want to calculate the total energy of the system by summing the various categories of energy, beware that the categories overlap, so you need to be super-careful not to double count any of the contributions. For example, if you have a macroscopic notion of “thermal energy” and also understand “thermal energy” in terms of microscopic kinetic and potential energy, you must count either the macroscopic or microscopic description, not both. Another example that illustrates the same point concerns the rest energy, E0, which is related to mass via Einstein’s equation2 E0 = mc². You can describe the binding energy of a particle in terms of its internal kinetic energy and potential energy, or in terms of the mass deficit, but you must not add both descriptions together; that would be double-counting.
The first law of thermodynamics states that energy obeys a local conservation law.
By this we mean something very specific:
Any decrease in the amount of energy in a given region of space must be exactly balanced by a simultaneous increase in the amount of energy in an adjacent region of space.
Note the adjectives “simultaneous” and “adjacent”. The laws of physics do not permit energy to disappear now and reappear later. Similarly the laws do not permit energy to disappear from here and reappear at some distant place. Energy is conserved right here, right now.
It is usually possible3 to observe and measure the physical processes whereby energy is transported from one region to the next. This allows us to express the energy-conservation law as an equation:
| change(energy inside boundary) = − flow(energy, outward across boundary) (3) |
The word “flow” in this expression has the same meaning as it has in everyday life. See reference 4 for the details on this.
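The bookkeeping in equation 3 can be checked with a toy model. The sketch below assumes a one-dimensional row of cells in which energy is only ever handed between adjacent cells; the transport rule (a random fraction of the difference between neighbors) is made up purely for illustration, and all the names are hypothetical. Whatever the rule, the change inside any region equals minus the net outward flow across its boundary.

```python
import math
import random


def step(energy, rng):
    """Move energy between adjacent cells only.

    flows[i] is the energy carried from cell i to cell i+1 during this step
    (a negative value means it went the other way).
    """
    n = len(energy)
    flows = [rng.uniform(0.0, 0.25) * (energy[i] - energy[i + 1]) for i in range(n - 1)]
    new_energy = list(energy)
    for i, f in enumerate(flows):
        new_energy[i] -= f
        new_energy[i + 1] += f
    return new_energy, flows


def outward_flow(flows, lo, hi):
    """Net energy leaving the region of cells lo..hi (inclusive) across its two ends."""
    out_right = flows[hi] if hi < len(flows) else 0.0
    out_left = -flows[lo - 1] if lo > 0 else 0.0
    return out_right + out_left


rng = random.Random(0)
energy = [5.0, 1.0, 0.0, 3.0, 8.0, 2.0]
lo, hi = 2, 3                                   # the region inside our "boundary"
new_energy, flows = step(energy, rng)

change_inside = sum(new_energy[lo:hi + 1]) - sum(energy[lo:hi + 1])
balance_ok = math.isclose(change_inside, -outward_flow(flows, lo, hi), abs_tol=1e-12)
print(balance_ok)
```

Because every transfer is between neighbors, the same check passes for any choice of region, which is what makes the law local rather than merely global.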
There is also a global law of conservation of energy: The total energy in the universe cannot change. The local law implies the global law but not conversely. The global law is interesting, but not nearly as useful as the local law, for the following reason: suppose I were to observe that some energy has vanished from my laboratory. It would do me no good to have a global law that asserts that a corresponding amount of energy has appeared “somewhere” else in the universe. There is no way of checking that assertion, so I would not know and not care whether energy was being globally conserved.4 Also it would be very hard to reconcile a non-local law with the requirements of special relativity.
As discussed in reference 4, there is an important distinction between the notion of conservation and the notion of constancy. Local conservation of energy says that the energy in a region is constant except insofar as energy flows across the boundary.
Non-experts sometimes try to define energy as “the capacity to do work”. This notion of “available energy” is useful for some purposes, as discussed in section 1.6, but it would be a terrible mistake to confuse “available energy” with the real physical energy. Alas, this mistake is very common. See section 12.5 for additional discussion of this point.
Any attempt to define energy in terms of “capacity to do work” would be inconsistent with thermodynamics, as we see from the following examples:
| #1: Consider an isolated system containing a hot potato, a cold potato, a tiny heat engine, and nothing else, as illustrated in figure 2. This system has some energy and some ability to do work. | #2: Contrast that with a system that is just the same, but instead of a hot potato and a cold potato, it has two hot potatoes. |
The second system has more energy but less ability to do work.
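To put rough numbers on the two-potato comparison, here is a sketch under a textbook idealization that the text itself does not specify: each potato has the same constant heat capacity, and the engine is reversible. Under those assumptions the potatoes end at the common temperature √(T1·T2), and the work extracted is at most C·(T1 + T2 − 2·√(T1·T2)). The heat capacity and the temperatures are hypothetical values.

```python
import math

HEAT_CAPACITY = 700.0   # J/K, hypothetical constant heat capacity of one potato


def thermal_energy(t1, t2, c=HEAT_CAPACITY):
    """Energy of the pair, up to an additive constant that is the same for both systems."""
    return c * (t1 + t2)


def max_work(t1, t2, c=HEAT_CAPACITY):
    """Upper bound on the work a reversible engine can extract from the pair.

    A reversible engine leaves the total entropy unchanged, which (for constant c)
    forces both potatoes to end at the temperature sqrt(t1 * t2).
    """
    t_final = math.sqrt(t1 * t2)
    return c * (t1 + t2 - 2.0 * t_final)


hot, cold = 350.0, 280.0   # kelvin, hypothetical
print("system 1 (hot + cold):", thermal_energy(hot, cold), "J, max work", max_work(hot, cold), "J")
print("system 2 (hot + hot): ", thermal_energy(hot, hot), "J, max work", max_work(hot, hot), "J")
```

System 2 comes out with more energy and zero extractable work, which is the contrast the two-potato example is meant to show.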
This sheds an interesting side-light on the energy-conservation law. The law, by itself, does not tell you what will happen; it only tells you what cannot happen: you cannot have any process that fails to conserve energy. To say the same thing another way: if something is prohibited by the energy-conservation law, the prohibition is absolute, whereas if something is permitted by the energy-conservation law, the permission is conditional, conditioned on compliance with all the other laws of physics. In particular, as discussed in section 7.2, you can freely convert all the “non-thermal” energy of two rapidly-spinning flywheels to microscopic “thermal” energy, but not the reverse. The reverse would be perfectly consistent with energy conservation, but is forbidden on other grounds (namely the second law of thermodynamics, as discussed in section 2).
Let’s be clear: work can be converted to any other form of energy, but the converse is not true; not every form of energy can be used to do work. Equating energy with doable work is just not correct. (If you speak of energy that might have been done by work, then at least you’re on the correct side of the inequality in the second law of thermodynamics.)
Some people wonder whether the example given above (the two-potato engine) is invalid because it involves closed systems, not interacting with the surrounding environment. Well, the example is perfectly valid, but to clarify the point we can consider another example (due to Logan McCarty):
| #1: Consider a system consisting of a room-temperature potato, a cold potato, and a tiny heat engine. This system has some energy and some ability to do work. | #2: Contrast that with a system that is just the same, except that it has two room-temperature potatoes. |
The second system has more energy but less ability to do work in the ordinary room-temperature environment.
In some impractical theoretical sense, you might be able to define the energy of a system as the amount of work the system would be able to do if it were in contact with an unlimited heat-sink at nearly zero temperature (arbitrarily close to absolute zero). That’s quite impractical because no such heat-sink is available. If it were available, many of the basic ideas of thermodynamics would become irrelevant.
As yet another example, consider the system shown in figure 3. The boundary of the “system” is shown as a dashed line. The system is thermally insulated from its surroundings. The system contains a battery, a moderately-high-resistance resistor, a motor, and a switch. The motor drives a thermally-insulated shaft, so that the system can do mechanical work on its surroundings.
| By closing the switch, we can get the system to perform work on its surroundings by means of the shaft. | On the other hand, if we just wait a moderately long time, the resistor will discharge the battery. This does not change the system’s energy (i.e. the energy within the boundary of the system) … but it greatly decreases the capacity to do work. |
To remove any vestige of ambiguity, imagine that the system was initially far below ambient temperature, so that the Joule heating in the resistor brings the system closer to ambient temperature. (See reference 7 for Joule’s classic paper on electrical heating.)
To repeat: In real-world situations, energy is not the same as “available energy” i.e. the capacity to do work.
What’s worse, any measure of “available” energy is not a function of state. Consider again the two-potato system shown in figure 2. Suppose you know the state of potato #1, including its energy E1, its temperature T1, its entropy S1, its mass m1, its volume V1, its free energy F1, and its free enthalpy G1. That all makes sense so far, because those are all functions of state, determined by the state of potato #1. Alas you don’t know what fraction of that energy should be considered thermodynamically “available” energy, and you can’t figure it out from the properties of object #1. In order to figure it out, you would need to know the properties of potato #2 as well.
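The dependence on potato #2 can be made concrete. In the sketch below, potato #1 is held in one fixed state while the partner potato is varied. It relies on an idealization of our own choosing (equal, constant heat capacities and a reversible engine), and the numerical values are hypothetical.

```python
import math


def max_work(t1, t2, c=700.0):
    """Reversible-engine bound for two bodies of equal constant heat capacity c (J/K)."""
    return c * (t1 + t2 - 2.0 * math.sqrt(t1 * t2))


T1 = 350.0                                   # potato #1: same state every time
partner_temperatures = (350.0, 300.0, 250.0)  # potato #2: three different states

work_by_partner = {t2: max_work(T1, t2) for t2 in partner_temperatures}
for t2, w in work_by_partner.items():
    print(f"potato #2 at {t2} K  ->  max work {w:.1f} J")
```

Potato #1 has the same E1, T1, S1 and so on in every row, yet the extractable work is different each time, so it cannot be computed from the state of potato #1 alone.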
Every beginner wishes there to be a state function to specify the “available energy” content of a system. Alas, wishing does not make it so. No such state function can possibly exist.
Also keep in mind that the law of conservation of energy applies to the real energy, not to the “available” energy.
Beware that the misdefinition of energy in terms of “ability to do work” is extremely common. This misdefinition is all the more pernicious because it works OK in simple non-thermodynamical situations. Many people learn this misdefinition, and some of them have a hard time unlearning it.
There is only one scientific meaning for the term energy. For all practical purposes, there is complete agreement among physicists as to what energy is. (This stands in dramatic contrast to other terms – such as “heat” – that have a confusing multiplicity of technical meanings, on top of innumerable nontechnical meanings; see section 15.1 for more discussion of this point.)
The same goes for the term conservation. There is essentially only one technical meaning of conservation.
However, we run into trouble when we consider the vernacular meanings:
Therefore the simple phrase “energy conservation” is practically begging to be misunderstood. You can easily have two profound misconceptions in a simple two-word phrase.
For example, you may have seen a placard that says “Please Conserve Energy by turning off the lights when you leave” or something similar. Let’s be absolutely clear: the placard is using vernacular notions of “conservation” and “energy” that are grossly inconsistent with the technical notion of conservation of energy (as expressed by equation 3).
The vernacular notion of “energy” is only loosely defined. Often it seems to correspond, more-or-less, with either the Gibbs free enthalpy, G (as defined in section 12.4), or some notion of “available energy” (as discussed in section 1.5 and section 12.5), or some other notion of low-entropy energy.
The vernacular notion of “conservation” means saving, preserving, not wasting, not dissipating. It definitely is not equivalent to equation 3, because it is applied to G, and to wildlife, and to other things that are not, in the technical sense, conserved quantities.
Combining these two notions, we see that when the placard says “Please Conserve Energy” it is nontrivial to translate that into technical terms.
At some schools, the students have found it amusing to add appropriate “translations” or “corrections” to such placards. The possibilities include:
The third version is far and away the most precise, and the most amenable to a quantitative interpretation. We see that the placard wasn’t really talking about energy at all, but about entropy instead.
The law of conservation of energy has been tested and found 100% reliable for all practical purposes, and quite a broad range of impractical purposes besides.
Of course everything has limits. It is not necessary for you to have a very precise notion of the limits of validity of the law of conservation of energy; that is a topic of interest only to a small community of specialists. The purpose of this section is merely to indicate, in general terms, just how remote the limits are from everyday life.
If you aren’t interested in details, feel free to skip this section.
Here’s the situation:
The second law states that entropy obeys a local paraconservation law. That is, entropy is “nearly” conserved.
By that we mean something very specific:
| change(entropy inside boundary) ≥ − flow(entropy, outward across boundary) (4) |
The structure and meaning of equation 4 are very similar to those of equation 3, except that it has an inequality instead of an equality. It tells us that the entropy in a given region can increase, but it cannot decrease except by flowing into adjacent regions.
As usual, the local law implies a corresponding global law, but not conversely; see the discussion at the end of section 1.2.
Entropy is absolutely essential to thermodynamics … just as essential as energy.
Entropy is defined in terms of statistics, as we will discuss in a moment. In some situations, there are important connections between entropy, energy, and temperature … but these do not define entropy. The first law (energy) and the second law (entropy) are logically independent. Entropy is well defined even when the temperature is unknown, undefinable, irrelevant, or zero.5 This is true and important.
Entropy is related to information. Essentially it is the opposite of information, as we see from the following scenarios.
As shown in figure 4, suppose we have three blocks and five cups on a table.
To illustrate the idea of entropy, let’s play the following game: Phase 0 is the preliminary phase of the game. During phase 0, the dealer hides the blocks under the cups however he likes (randomly or otherwise) and optionally makes an announcement about what he has done. As suggested in the figure, the cups are transparent, so the dealer knows the exact microstate at all times. However, the whole array is behind a screen, so the rest of us don’t know anything except what we’re told.
Phase 1 is the main phase of the game. During phase 1, we are required to ascertain the position of each of the blocks. Since in this version of the game, there are five cups and three blocks, the answer can be written as a three-symbol string, such as 122, where the first symbol identifies the cup containing the red block, the second symbol identifies the cup containing the black block, and the third symbol identifies the cup containing the blue block. Each symbol is in the range zero through four inclusive. There are 5³ = 125 such strings. (More generally, in a version where there are N cups and B blocks, there are N^B possible states.)
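The state count is small enough to enumerate directly. The following sketch lists every three-symbol string for five cups and three blocks; the variable names are our own.

```python
from itertools import product

N_CUPS = 5
N_BLOCKS = 3   # red, black, blue, in that order

# Each microstate is a tuple: (cup holding red, cup holding black, cup holding blue).
microstates = list(product(range(N_CUPS), repeat=N_BLOCKS))
labels = ["".join(str(cup) for cup in state) for state in microstates]

print(len(labels))    # 125
print(labels[:3])     # ['000', '001', '002']
```

Changing N_CUPS or N_BLOCKS gives the general count of N_CUPS raised to the power N_BLOCKS.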
We cannot see what’s inside the cups, but we are allowed to ask yes/no questions, whereupon the dealer will answer. Our score in the game is determined by the number of questions we ask; each question contributes one bit to our score. Our objective is to finish the game with the lowest possible score.
To calculate what our score will be, we don’t need to know anything about energy; all we have to do is count states (specifically, the number of microstates consistent with what we know about the situation). States are states; they are not energy states.
If you wish to make this sound more thermodynamical, you can assume that the table is horizontal, and the blocks are non-interacting, so that all possible configurations have the same energy. But really, it is easier to just say that, over a wide range of energies, energy has nothing to do with this game.
The point of all this is that we define the entropy of a given situation according to the number of questions we have to ask to finish the game, starting from the given situation. Each yes/no question contributes one bit to the entropy.
The central, crucial idea of entropy is that it measures how much we don’t know about the situation. Entropy is not knowing.
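The cup game can be played out in code. The sketch below uses one particular questioning strategy of our own choosing, namely repeatedly asking "is the hidden state in this half of the remaining candidates?", and it assumes that all candidates consistent with the dealer's announcement are equally likely. The function name and the sample announcement are hypothetical.

```python
import math
from itertools import product


def play(candidates, hidden):
    """Count the yes/no questions needed to pin down `hidden` by halving the candidates."""
    candidates = list(candidates)
    questions = 0
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        questions += 1                       # "Is the state in this half?"
        answer = hidden in half              # the dealer answers truthfully
        candidates = half if answer else candidates[len(half):]
    return questions


states = list(product(range(5), repeat=3))

# No announcement: all 125 microstates are consistent with what we know.
scores = [play(states, hidden) for hidden in states]
print(max(scores), math.log2(len(states)))

# Hypothetical announcement: "the red block is under cup 2" leaves 25 candidates.
told = [s for s in states if s[0] == 2]
print(max(play(told, hidden) for hidden in told), math.log2(len(told)))
```

The worst-case score tracks the base-2 logarithm of the number of candidates, rounded up, and an informative announcement lowers it. If the dealer announced the exact microstate, the candidate list would have one entry and the score would be zero.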
Here is a card game that illustrates the same points as the cup game. The only important difference is the size of the state space: roughly eighty million million million million million million million million million million million states, rather than 125 states. That is, when we move from 5 cups to 52 cards, the state space gets bigger by a factor of 10⁶⁶ or so.
Consider a deck of 52 playing cards. By re-ordering the deck, it is possible to create a large number (52 factorial) of different configurations. (For present purposes we choose not to flip or rotate the cards, just re-order them. Also, for simplicity, we assume the number of cards is fixed at 52, although this restriction could easily be lifted.)
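These numbers are easy to check, since Python integers have arbitrary precision. A short sketch, with variable names of our own:

```python
import math

n_configs = math.factorial(52)        # number of distinct orderings of the deck
bits = math.log2(n_configs)           # yes/no questions if all orderings are equally likely
growth = n_configs / 125              # state space relative to the cup game

print(f"52! = {n_configs:.4e}")
print(f"log2(52!) = {bits:.2f} bits")
print(f"52!/125 = {growth:.2e}")
```

The logarithm comes out a little under 226, which is where the score for a thoroughly shuffled deck comes from.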
Phase 0 is the preliminary phase of the game. During phase 0, the dealer prepares the deck in a configuration of his choosing, using any combination of deterministic and/or random procedures. He then sets the deck on the table. Finally he makes zero or more announcements about the configuration of the deck.
Phase 1 is the main phase of the game. During phase 1, our task is to fully describe the configuration, i.e. to determine which card is on top, which card is second, et cetera. We cannot look at the cards, but we can ask yes/no questions of the dealer. Each such question contributes one bit to our score. Our objective is to ask as few questions as possible. As we shall see, our score is a measure of the entropy.
One configuration of the card deck corresponds to one microstate. The microstate does not change during phase 1.
The macrostate is the ensemble of microstates consistent with what we know about the situation.
At this point we know that the deck is in some microstate, and the microstate is not changing … but we don’t know which microstate. It would be foolish to pretend we know something we don’t. If we’re going to bet on what happens next, we should calculate our odds based on the ensemble of possibilities, i.e. based on the macrostate.
Our best strategy is as follows: By asking six well-chosen questions, we can find out which card is on top. We can then easily describe every detail of the configuration. Our score is six bits.
This illustrates that the entropy is a property of the ensemble, i.e. a property of the macrostate, not a property of the microstate. Cutting the deck the second time changed the microstate but did not change the macrostate.
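The claim that a second cut leaves the macrostate unchanged can be verified by brute force. In the sketch below we assume that a "cut" is a cyclic rotation of the deck at an unknown position (with position zero allowed) and that the deck starts in an agreed-upon reference order; both are modelling assumptions, and the names are hypothetical.

```python
import math


def cut(deck, k):
    """Move the top k cards to the bottom, preserving their order."""
    return deck[k:] + deck[:k]


reference = tuple(range(52))                       # any agreed-upon known order

# Macrostate after one cut at an unknown position: the set of possible microstates.
once_cut = {cut(reference, k) for k in range(52)}

# Cut every one of those possibilities again, at every possible position.
twice_cut = {cut(deck, k) for deck in once_cut for k in range(52)}

print(len(once_cut), twice_cut == once_cut)
print(math.ceil(math.log2(len(once_cut))))         # questions needed: 6
```

Any individual deck changes when it is cut again, but the set of possibilities, and hence the number of questions we need, stays the same.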
Note that we are not depending on any special properties of the “reference” state. For simplicity, we could agree that our reference state is the factory-standard state (cards ordered according to suit and number), but any other agreed-upon state would work just as well. If we know the deck is in Moe’s favorite state, we can easily rearrange it into Joe’s favorite state. Rearranging it from one known state to another known state does not involve any entropy.
As a variation on the game described in section 2.3, consider what happens if, at the beginning of phase 1, we are allowed to peek at one of the cards.
In the case of the standard deck, example 1, this doesn’t tell us anything we didn’t already know, so the entropy remains unchanged.
In the case of the cut deck, example 3, this lowers our score by six bits, from six to zero.
In the case of the shuffled deck, example 6, this lowers our score by six bits, from 226 to 220.
This is worth mentioning because peeking can (and usually does) change the macrostate, but it cannot change the microstate. (This stands in contrast to cutting an already-cut deck or shuffling an already-shuffled deck, which changes the microstate but does not change the macrostate.)
To repeat: Obviously peeking does not change the microstate, but it can have a large effect on the macrostate. If you don’t think peeking changes the ensemble, I look forward to playing poker with you!
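The three peeking scores can be reproduced by counting how many microstates remain consistent with what we know before and after the peek. This sketch assumes that we peek at the top card, that the remaining possibilities are equally likely, and that a cut deck is one of 52 rotations of a known order; the names are our own.

```python
import math


def questions_needed(n_states):
    """Yes/no questions needed to single out one of n_states equally likely states."""
    return math.ceil(math.log2(n_states))


# (microstates consistent with what we know before the peek, and after it)
scenarios = {
    "standard deck": (1, 1),                                   # nothing left to learn
    "cut deck": (52, 1),                                       # top card fixes the cut
    "shuffled deck": (math.factorial(52), math.factorial(51)), # 51 cards still unknown
}

for name, (before, after) in scenarios.items():
    print(f"{name}: {questions_needed(before)} -> {questions_needed(after)}")
```

In every case the deck itself is untouched; only the count of possibilities consistent with our knowledge has changed.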
If you want to understand entropy, you must first have at least a modest understanding of basic probability. It’s a prerequisite, and there’s no way of getting around it. Anyone who knows about probability can learn about entropy. Anyone who doesn’t, can’t.
Our notion of entropy is completely dependent on having a notion of microstate, and on having a procedure for assigning a probability to each microstate.
In some special cases, the procedure involves little more than counting the “allowed” microstates, as discussed in section 7.7. This type of counting corresponds to a particularly simple, flat probability distribution, which may be a satisfactory approximation in special cases, but is definitely not adequate for the general case.
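To see why simple counting is only a special case, it helps to compute the entropy from an explicit probability assignment. The sketch below assumes the standard formula S = Σ p·log2(1/p), which this section has not yet stated; the skewed distribution is a made-up example for the 125-state cup game.

```python
import math


def entropy_bits(probs):
    """Entropy in bits of a probability distribution over microstates."""
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0.0)


n = 125
flat = [1.0 / n] * n                         # every microstate equally likely
skewed = [0.5] + [0.5 / (n - 1)] * (n - 1)   # hypothetical: one state is a 50% favorite

print(entropy_bits(flat), math.log2(n))      # flat case reduces to counting
print(entropy_bits(skewed))                  # fewer bits: we know more
```

For the flat distribution the result agrees with the logarithm of the number of allowed states, but as soon as some microstates are more probable than others, counting overstates the entropy.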
For simplicity, the cup game and the card game were arranged to embody a clear notion of microstate. That is, the rules of the game specified what situations would be considered the “same” microstate and what would be considered “different” microstates. Such games are a model that is directly and precisely applicable to physical systems where the physics is naturally discrete, such as systems involving only the nonclassical spin of elementary particles (such as the demagnetization refrigerator discussed in section 9.11).
For systems involving continuous variables such as position and momentum, counting the states is somewhat trickier. The correct procedure is discussed in section 10.2.
The point of all this is that the “score” in these games is an example of entropy. Specifically: at each point in the game, there are two numbers worth keeping track of: the number of questions we have already asked, and the number of questions we must ask to finish the game. The latter is what we call the entropy of the situation at that point.
Remember that the macrostate is the ensemble of configurations consistent with what is known about the situation. The entropy is a property of the macrostate.
At each point during the game, the entropy is a property of the macrostate, not of the microstate. The system is in “some” microstate, but we don’t know which microstate, so all our decisions must be based on the macrostate.
The value we assign to the entropy depends on what we know about the situation, not what the dealer knows, or what anybody else knows. This makes the entropy somewhat context-dependent or even subjective. Some people find this irksome or even shocking, but it is real physics. For physical examples of context-dependent entropy, and a discussion, see section 10.7.
Note that entropy has been defined without reference to temperature and without reference to heat. Room temperature is equivalent to zero temperature for purposes of the cup game and the card game; theoretically there is “some” chance that thermal agitation will cause two of the cards to spontaneously hop up and exchange places during the game, but that is really, really negligible.
Non-experts often try to define entropy in terms of energy. This is a mistake. To calculate the entropy, I don’t need to know anything about energy; all I need to know is the probability of each relevant state. See section 2.6 for details on this.
Entropy is not defined in terms of energy, nor vice versa.
In some cases, there is a simple mapping that allows us to identify the ith microstate by means of its energy Ei. It is often convenient to exploit this mapping when it exists, but it does not always exist.
There is another group of non-experts who try to define entropy in terms of disorder. This is another mistake; it is perhaps all the more disruptive because it is in some ways “close” to the truth. We can agree that the number of disorderly states greatly exceeds the number of orderly states … so if all you know is that the system is not in an orderly state, you know the entropy is high. In contrast, though, the important point is that if you know the system is in some particular disorderly state, the entropy is zero. If you know what state the system is in, it doesn’t matter whether that state “looks” disorderly or not.
Loosely speaking, small disorder implies small entropy, but the converse does not hold, not even loosely.
Furthermore, there are additional reasons why the typical text-book illustration of a messy dorm room is not a good model of entropy. For starters, it provides no easy way to define and delimit the states. Even if we stipulate that the tidy state is unique, we still don’t know whether a shirt on the floor “here” is different from a shirt on the floor “there”. If we don’t know how many different disorderly states there are, we can’t quantify the entropy. (In contrast the games in section 2.2 and section 2.3 included a clear rule for defining and delimiting the states.)
As a related point: entropy is not a property of the microstate. Entropy is a property of the probability distribution as a whole. Entropy is defined as a weighted average over all microstates. Asking about the entropy of a particular microstate (disordered or otherwise) within the distribution is asking the wrong question. See section 2.7 for better ways of formulating the question.
There is a long-running holy war between those who try to define entropy in terms of energy, and those who try to define it in terms of disorder. This is based on a grotesquely false dichotomy: If entropy-as-energy is imperfect, then entropy-as-disorder must be perfect … or vice versa. I don’t know whether to laugh or cry when I see this. Actually, both versions are highly imperfect. You might get away with using one or the other in selected situations, but not in general.
The right way to define entropy is in terms of probability, as we now discuss. (The various other notions can then be understood as special cases and/or approximations to the true entropy.)
The idea of entropy set forth in the preceding examples can be quantified quite precisely. Entropy is defined in terms of statistics.6 For any probability distribution P, we can define its entropy as:
| $S[P] := \sum_i P_i \log(1/P_i)$ (5) |
where the sum runs over all possible outcomes and Pi is the probability of the ith outcome. Here we write S[P] to make it explicit that S is a functional that depends on P. For example, if P is a conditional probability then S will be a conditional entropy. Beware that people commonly write simply S, leaving unstated the crucial dependence on P.
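As a minimal sketch, equation 5 can be written as a short function. The name `entropy_bits`, the choice of base-2 logarithms, and the example distributions are assumptions made for illustration.

```python
import math

def entropy_bits(probs):
    """Equation 5 with log base 2: sum over outcomes of P_i * log2(1/P_i)."""
    probs = list(probs)
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Outcomes with P_i = 0 contribute nothing to the sum.
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

known_state = entropy_bits([1.0])                     # one certain outcome
cut_deck = entropy_bits([1 / 52] * 52)                # 52 equally likely outcomes
lopsided = entropy_bits([0.5, 0.25, 0.125, 0.125])    # non-flat distribution

print(known_state, cut_deck, lopsided)
```

The flat 52-outcome case reduces to log2(52), i.e. simple counting, while the lopsided case shows that the general formula does not require equal probabilities.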
Subject to mild restrictions, we can apply this to physics as follows: Suppose the system is in a given macrostate, and the macrostate is well described by a distribution P, where Pi is the probability that the system is in the ith microstate. Then we can say S is the entropy “of the system”.
Expressions of this form date back to Boltzmann (reference 9 and reference 10) and to Gibbs (reference 11). The range of applicability was greatly expanded by Shannon (reference 12).
Beware that uncritical reliance on “the” observed microstate-by-microstate probabilities does not always give a full description of the macrostate, because the Pi might be correlated with something else (section 9.8) or amongst themselves (section 24). In such cases the unconditional entropy will be larger than the conditional entropy, and you have to decide which is/are physically relevant.
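To make the conditional-versus-unconditional distinction concrete, here is a small sketch with a made-up joint distribution over a system variable x and “something else” y that is correlated with it. The numbers in `joint` are purely hypothetical.

```python
import math
from collections import defaultdict

def entropy_bits(probs):
    """Sum of P_i * log2(1/P_i), skipping zero-probability outcomes."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Hypothetical joint distribution P(x, y); x and y agree 80% of the time.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginal distributions of x and of y.
p_x = defaultdict(float)
p_y = defaultdict(float)
for (x, y), p in joint.items():
    p_x[x] += p
    p_y[y] += p

# Unconditional entropy of x, ignoring y.
h_x = entropy_bits(p_x.values())

# Conditional entropy of x given y: average over y of the entropy of P(x | y).
h_x_given_y = 0.0
for y_value, p_of_y in p_y.items():
    conditional = [p / p_of_y for (x, y), p in joint.items() if y == y_value]
    h_x_given_y += p_of_y * entropy_bits(conditional)

print(f"S[x] = {h_x:.4f} bits, S[x|y] = {h_x_given_y:.4f} bits")
```

Here the unconditional entropy is 1 bit, while the conditional entropy is about 0.72 bits. Which one is physically relevant depends on whether y is actually available to you.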
Equation 5 is the faithful workhorse formula for calculating the entropy. It ranks slightly below Equation 109, which is a more general way of expressing the same idea. It ranks above various less-general formulas that may be useful under more-restrictive conditions (as in section 7.7 for example). See section 20 and section 24 for more discussion of the relevance and range of validity of this expression.
In the games discussed above, it was convenient to measure entropy in bits, because I was asking yes/no questions. Other units are possible, as discussed in section 7.6.
Figure 5 shows the contribution to the entropy from one term in the sum in equation 5. Its maximum value is approximately 0.53 bits, attained when Pi=1/e.
Figure 6 shows the total entropy for a two-state system such as a coin. Here H represents the probability of the “heads” state, which gives us one term in the sum. The “tails” state necessarily has probability (1−H) and that gives us the other term in the sum. The total entropy in this case is a symmetric function of H. Its maximum value is 1 bit, attained when H=½.
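The two maxima just quoted are easy to confirm numerically. The following is a rough sketch that scans a grid of probabilities; the function names and the grid spacing are arbitrary choices of mine.

```python
import math

def term_bits(p):
    """One term of equation 5, in bits: p * log2(1/p)."""
    return p * math.log2(1.0 / p) if p > 0 else 0.0

def coin_entropy_bits(h):
    """Total entropy of a two-state system with P(heads) = h."""
    return term_bits(h) + term_bits(1.0 - h)

# Scan probabilities 0.0001, 0.0002, ..., 0.9999.
grid = [i / 10000 for i in range(1, 10000)]
best_p = max(grid, key=term_bits)           # where a single term peaks
best_h = max(grid, key=coin_entropy_bits)   # where the coin entropy peaks

print(f"single term peaks near p = {best_p}, value {term_bits(best_p):.4f} bits")
print(f"coin entropy peaks at H = {best_h}, value {coin_entropy_bits(best_h)} bits")
```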
As discussed in section 7.6 the base of the logarithm in equation 5 is chosen according to what units you wish to use for measuring entropy. If you choose units of joules per kelvin (J/K), we can pull out a factor of Boltzmann’s constant and rewrite the equation as:
| $S = -k \sum_i P_i \ln P_i$ (6) |
Entropy itself is conventionally represented by big S and is an extensive property, with rare peculiar exceptions as discussed in section 10.7. Molar entropy is conventionally represented by small s and is the corresponding intensive property.
Although it is often convenient to measure molar entropy in units of J/K/mol, other units are allowed, for the same reason that mileage is called mileage even when it is measured in metric units. In particular, sometimes additional insight is gained by measuring molar entropy in units of bits per particle. See section 7.6 for more discussion of units.
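Converting between these units is just multiplication by a constant. Here is a small sketch; the function names are mine, and the constants are the exact values fixed by the 2019 SI definitions.

```python
import math

K_BOLTZMANN = 1.380649e-23   # J/K, exact in the SI since 2019
N_AVOGADRO = 6.02214076e23   # 1/mol, exact in the SI since 2019

def bits_to_joules_per_kelvin(bits):
    """Entropy in bits -> entropy in J/K (one bit is k ln 2)."""
    return bits * K_BOLTZMANN * math.log(2)

def bits_per_particle_to_molar(bits_per_particle):
    """Molar entropy in bits per particle -> molar entropy in J/K/mol."""
    return bits_to_joules_per_kelvin(bits_per_particle) * N_AVOGADRO

one_bit = bits_to_joules_per_kelvin(1.0)
one_bit_per_particle = bits_per_particle_to_molar(1.0)

print(f"1 bit              = {one_bit:.4e} J/K")
print(f"1 bit per particle = {one_bit_per_particle:.4f} J/K/mol")
```

So one bit per particle corresponds to roughly 5.76 J/K/mol, which gives a feel for how many bits per particle are hiding in ordinary tabulated molar entropies.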
When discussing a chemical reaction using a formula such as
| 2 O3 → 3 O2 + Δ s (7) |
it is common to speak of “the entropy of the reaction” but properly it is “the molar entropy of the reaction” and should be written Δ s or Δ S/N (not Δ S). All the other terms in the formula are intensive, so the entropy-related term must be intensive also.
Of particular interest is the standard molar entropy, s0 or S0/N, measured at standard temperature and pressure. The entropy of a gas is strongly dependent on density, as mentioned in section 10.2.
If we have a system characterized by a probability distribution P, the surprise value of the ith state is given by
| $i := log(1/Pi) (8) |
By comparing this with equation 5, it is easy to see that the entropy is simply the appropriately-weighted average of the surprise value. In particular, it is the expected value of the surprise value.
Note the following contrast:
| Surprise value is a property of the state i. | Entropy is not a property of the state i; it is a property of the distribution P. |
This should make it obvious that entropy is not, by itself, the solution to all the world’s problems. Entropy measures a particular average property of the distribution. It is easy to find situations where other properties of the distribution are worth knowing.
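The contrast between surprise value and entropy can be seen in a tiny example. This is a sketch with a made-up four-state distribution; `surprisal_bits` is my name for equation 8 with base-2 logarithms.

```python
import math

def surprisal_bits(p):
    """Equation 8 in bits: the surprise value of a state with probability p."""
    return math.log2(1.0 / p)

# Hypothetical distribution over four states.
P = [0.5, 0.25, 0.125, 0.125]

# Each state has its own surprise value ...
surprisals = [surprisal_bits(p) for p in P]

# ... whereas the entropy is one number for the whole distribution:
# the expected value of the surprise value.
entropy = sum(p * s for p, s in zip(P, surprisals))

print("surprise values:", surprisals)
print("entropy:", entropy, "bits")
```

The surprise values range from 1 to 3 bits, state by state, while the entropy is a single weighted average, 1.75 bits.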
There are a bunch of basic notions that are often lumped together and called the zeroth law of thermodynamics. These notions are incomparably less fundamental than the notion of energy (the first law) and entropy (the second law), so despite its name, the zeroth law doesn’t deserve priority.
Here are some oft-cited rules, and some comments on each.
| We can divide the world into some number of regions that are disjoint from each other. | If there are only two regions, some people like to call one of them “the” system and call the other “the” environment, but usually it is better to consider all regions on an equal footing. Regions are sometimes called systems and/or subsystems. Systems are sometimes called objects, especially when they are relatively simple. |
| There is such a thing as thermal equilibrium. | You must not assume that everything is in thermal equilibrium. Thermodynamics and indeed life itself depend on the fact that some regions are out of equilibrium with other regions. |
| There is such a thing as temperature. | There are innumerable important examples of systems that lack a well-defined temperature, such as the three-state laser discussed in section 9.4. |
| Whenever any two systems are in equilibrium with each other, they have the same temperature. See section 8. | This is true and important. (To be precise, we should say they have the same average temperature, since there will be fluctuations, which may be significant for very small systems.) |
| We can establish equilibrium within a system, and equilibrium between selected pairs of systems, without establishing equilibrium between all systems. | This is an entirely nontrivial statement. Sometimes it takes a good bit of engineering to keep some pairs near equilibrium and other pairs far from equilibrium. See section 9.12. |
| If/when we have established equilibrium within a system, a few variables suffice to entirely describe the thermodynamic state (i.e. macrostate) of the system.7 (See section 10.1 for a discussion of microstate versus macrostate.) | This is an entirely nontrivial statement, and to make it useful you have to be cagey about what variables you choose; for instance, |
As mentioned in the introduction, one sometimes hears the assertion that the entropy of a system must go to zero as the temperature goes to zero.
There is no theoretical basis for this assertion, so far as I know – just unsubstantiated opinion.
As for experimental evidence, I know of only one case where (if I work hard enough) I can make this statement true, while there are innumerable cases where it is not true:
Note: It is hard to measure the low-temperature entropy by means of elementary thermal measurements, because typically such measurements are insensitive to “spectator entropy” as discussed in section 10.5. So for typical classical thermodynamic purposes, it doesn’t matter whether the entropy goes to zero or not.
The previous sections have set forth the conventional laws of thermodynamics, cleaned up and modernized as much as possible.
At this point you may be asking, why do these laws call attention to conservation of energy, but not the other great conservation laws (momentum, electrical charge, lepton number, et cetera)? And for that matter, what about all the other physical laws, the ones that aren’t expressed as conservation laws? Well, you’re right, there are some quite silly inconsistencies here.
The fact of the matter is that in order to do thermo, you need to import a great deal of classical mechanics. You can think of this as the minus-oneth law of thermodynamics.
Sometimes the process of importing a classical idea into the world of thermodynamics is trivial, and sometimes not. For example:
| The law of conservation of momentum would be automatically valid if we applied it by breaking a complex object into its elementary components, applying the law to each component separately, and summing the various contributions. That’s fine, but nobody wants to do it that way. In the spirit of thermodynamics, we would prefer a macroscopic law. That is, we would like to be able to measure the overall mass of the object (M), measure its average velocity (V), and from that compute a macroscopic momentum (MV) obeying the law of conservation of momentum. In fact this macroscopic approach works fine, and can fairly easily be proven to be consistent with the microscopic approach. No problem. | The notion of kinetic energy causes trouble when we try to import it. Sometimes you want a microscopic accounting of kinetic energy, and sometimes you want to include only the macroscopic kinetic energy. There is nontrivial ambiguity here, as discussed in section 16.4 and reference 13. |
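The contrast between momentum and kinetic energy can be illustrated with a toy object made of many particles. This is a sketch under stated assumptions: motion in one dimension, randomly chosen masses and velocities, and “average velocity” taken to mean the mass-weighted (center-of-mass) velocity.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical object: 1000 particles moving in one dimension.
masses = [random.uniform(1.0, 2.0) for _ in range(1000)]
velocities = [random.gauss(3.0, 5.0) for _ in range(1000)]

# Macroscopic description: total mass and center-of-mass velocity.
M = sum(masses)
V = sum(m * v for m, v in zip(masses, velocities)) / M

# Momentum: microscopic sum versus macroscopic M*V.
micro_momentum = sum(m * v for m, v in zip(masses, velocities))
macro_momentum = M * V

# Kinetic energy: microscopic sum versus macroscopic (1/2) M V^2.
micro_ke = sum(0.5 * m * v * v for m, v in zip(masses, velocities))
macro_ke = 0.5 * M * V * V

print(f"momentum:       micro {micro_momentum:.3f}, macro {macro_momentum:.3f}")
print(f"kinetic energy: micro {micro_ke:.1f}, macro {macro_ke:.1f}")
```

The two momentum figures agree, whereas the microscopic kinetic energy exceeds the macroscopic kinetic energy by the energy of motion relative to the center of mass. That difference is the ambiguity the right-hand column is warning about.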
Let’s build up a scenario, based on some universal facts plus some scenario-specific assumptions.
We know that the energy of the system is well defined. Similarly we know the entropy of the system is well defined. These aren’t assumptions. Every system has energy and entropy.
Next, as a hypothesis of this scenario, we assume that the system has a well-defined thermodynamic state, i.e. macrostate. This macrostate can be represented as a point in some abstract state-space. At each point in macrostate-space, the macroscopic quantities we are interested in (energy, entropy, pressure, volume, temperature, etc.) take on well-defined values.
We further assume that this macrostate-space has dimensionality M, and that M is not very large. (This M may be larger or smaller than the dimensionality D of the position-space we live in, namely D=3.)
Assuming a well-behaved thermodynamic state is a highly nontrivial assumption.
We further assume that the quantities of interest vary smoothly from place to place in macrostate-space.
We must be careful how we formalize this “smoothness” idea. By way of analogy, consider a point moving along a great-circle path on a sphere. This path is nice and smooth, by which we mean differentiable. We can get into trouble if we try to describe this path in terms of latitude and longitude, because the coordinate system is singular at the poles. This is a problem with the coordinate system, not with the path itself. To repeat: a great-circle route that passes over the pole is differentiable, but its representation in spherical polar coordinates is not differentiable.
Applying this idea to thermodynamics, consider an ice/water mixture at constant pressure. The temperature is a smooth function of the energy content, whereas the energy-content is not a smooth function of temperature. I recommend thinking in terms of an abstract point moving in macrostate-space. Both T and E are well-behaved functions, with definite values at each point in macrostate-space. We get into trouble if we try to parameterize this point using T as one of the coordinates, but this is a problem with the coordinate representation, not with the abstract space itself.
We will now choose a particular set of variables as a basis for specifying points in macrostate-space. We will use this set for a while, but we are not wedded to it. As one of our variables, we choose S, the entropy. The remaining variables we will collectively call V, which is a vector with M−1 dimensions. In particular, we choose the macroscopic variable V in such a way that the microscopic energy Ei of the ith microstate is determined by V. (For an ideal gas in a box, V is just the volume of the box.)
Given these assumptions, we can write:
| $dE = \left.\frac{\partial E}{\partial V}\right|_S dV + \left.\frac{\partial E}{\partial S}\right|_V dS$ (9) |
which is just the chain rule for differentiating a function of two variables. More elaborate versions of this will be discussed in section 16.1.
It is conventional to define the symbols
| $P := -\left.\frac{\partial E}{\partial V}\right|_S$ (10) |
and
| $T := \left.\frac{\partial E}{\partial S}\right|_V$ (11) |
You might say this is just terminology, just a definition of T … but we need to be careful because there are also other definitions of T floating around. More importantly, if we are going to connect this T to our notion of temperature, there are some basic qualitative properties that we want temperature to have, as discussed in section 9.1. Equation 11 is certainly not the most general definition of temperature, because of several assumptions that we made in the lead-up to equation 9. By way of counterexample, in NMR or ESR, a τ2 process changes the entropy without changing the energy. As an even simpler counterexample, internal leakage currents within a thermally-isolated storage battery increase the entropy of the system without changing the energy; see figure 3.
Using the symbols we have just defined, we can rewrite equation 9 in the following widely-used form:
| dE = −P dV + T dS (12) |
(See equation 25 for a generalization of this equation.)
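As a sanity check on equation 12 and on the definitions of P and T, here is a numerical sketch for one specific model: a monatomic ideal gas, for which E(S,V) is proportional to V^(−2/3) exp(2S/(3Nk)). The units, the constants `N_K` and `E0`, the finite-difference helper, and the chosen state point are all assumptions made for illustration.

```python
import math

N_K = 1.0   # N*k in arbitrary units (assumption)
E0 = 1.0    # overall energy scale in arbitrary units (assumption)

def energy(S, V):
    """E(S, V) for a monatomic ideal gas, up to constant factors."""
    return E0 * V ** (-2.0 / 3.0) * math.exp(2.0 * S / (3.0 * N_K))

def partial(f, args, index, h=1e-6):
    """Central-difference estimate of the partial derivative of f."""
    up, down = list(args), list(args)
    up[index] += h
    down[index] -= h
    return (f(*up) - f(*down)) / (2.0 * h)

S, V = 1.3, 2.0
T = partial(energy, (S, V), 0)    # T :=  dE/dS at constant V
P = -partial(energy, (S, V), 1)   # P := -dE/dV at constant S

# Take a small step in macrostate-space and compare with -P dV + T dS.
dS, dV = 1e-4, -2e-4
dE_actual = energy(S + dS, V + dV) - energy(S, V)
dE_formula = -P * dV + T * dS

print(f"P V = {P * V:.6f}, N k T = {N_K * T:.6f}")
print(f"dE actual = {dE_actual:.9f}, -P dV + T dS = {dE_formula:.9f}")
```

For this model the derivatives reproduce the familiar ideal gas law PV = NkT, and the small change in E agrees with −P dV + T dS to first order in the step size.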
Similarly, if we choose to define
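| $w := \left.\frac{\partial E}{\partial V}\right|_S dV = -P\,dV$ (13) |
and
| $q := \left.\frac{\partial E}{\partial S}\right|_V dS = T\,dS$ (14) |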
That’s all fine; it’s just terminology. Note that w and q are one-forms, not scalars, as discussed in section 6.8. They are functions of state, i.e. uniquely determined by the thermodynamic state.9 Using these definitions of w and q we can write
| dE = w + q (15) |
which is fine so long as we don’t misinterpret it. However you should keep in mind that equation 15 and its precursors are very commonly misinterpreted. In particular, it is tempting to interpret w as “work” and q as “heat”, which is either a good idea or a bad idea, depending on which of the various mutually-inconsistent definitions of “work” and “heat” you happen to use. See section 15.1 and section 16.1 for details.
You should also keep in mind that these equations (equation 9, equation 12 and/or equation 15) do not represent the most general case. An important generalization is mentioned in section 6.5.
Recall that we are not wedded to using (V,S) as our basis in macrostate space. As an easy but useful change of variable, consider the case where V = XYZ, in which case we can expand equation 9 as:
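| $dE = \left.\frac{\partial E}{\partial X}\right|_{Y,Z,S} dX + \left.\frac{\partial E}{\partial Y}\right|_{Z,X,S} dY + \left.\frac{\partial E}{\partial Z}\right|_{X,Y,S} dZ + \left.\frac{\partial E}{\partial S}\right|_{X,Y,Z} dS = -F_X\,dX - F_Y\,dY - F_Z\,dZ + T\,dS$ (16) |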
where we define the forces FX, FY, and FZ as directional derivatives of the energy: FX := −∂ E / ∂ X |Y,Z,S and similarly for the others.
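For a rectangular box with V = XYZ, one expects each force to equal the pressure times the area of the corresponding wall. The sketch below checks this numerically for the same kind of model as before, a monatomic ideal gas with E proportional to V^(−2/3) exp(2S/(3Nk)); the units (Nk = 1), the box dimensions, and the helper names are assumptions.

```python
import math

def energy_sv(S, V):
    """E(S, V) for a monatomic ideal gas, with N*k = 1 and arbitrary units."""
    return V ** (-2.0 / 3.0) * math.exp(2.0 * S / 3.0)

def energy_box(S, X, Y, Z):
    """Same energy, written in terms of the box dimensions."""
    return energy_sv(S, X * Y * Z)

def partial(f, args, index, h=1e-6):
    """Central-difference estimate of the partial derivative of f."""
    up, down = list(args), list(args)
    up[index] += h
    down[index] -= h
    return (f(*up) - f(*down)) / (2.0 * h)

S, X, Y, Z = 1.3, 1.0, 2.0, 0.5
state = (S, X, Y, Z)

# Forces as defined in the text: F_X := -dE/dX at constant Y, Z, S, etc.
F_X = -partial(energy_box, state, 1)
F_Y = -partial(energy_box, state, 2)
F_Z = -partial(energy_box, state, 3)

# Pressure: P := -dE/dV at constant S.
P = -partial(energy_sv, (S, X * Y * Z), 1)

print(f"F_X = {F_X:.6f}, P*Y*Z = {P * Y * Z:.6f}")
print(f"F_Y = {F_Y:.6f}, P*Z*X = {P * Z * X:.6f}")
print(f"F_Z = {F_Z:.6f}, P*X*Y = {P * X * Y:.6f}")
```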
Here’s another change of variable that calls attention to some particularly interesting partial derivatives. Now that we have introduced the T variable, we can write
| $dE = \left.\frac{\partial E}{\partial V}\right|_T dV + \left.\frac{\partial E}{\partial T}\right|_V dT$ (17) |
assuming things are sufficiently differentiable.
The derivative in the second term on the RHS is called the heat capacity (at constant volume); that is:
| $C_V := \left.\frac{\partial E}{\partial T}\right|_V$ (18) |
assuming the RHS exists. (This is a nontrivial assumption. By way of counterexample, the RHS does not exist near a first-order phase transition such as the ice/water transition, because the energy is not differentiable with respect to temperature there. This corresponds roughly to an infinite heat capacity, but it takes some care and some sophistication to quantify what this means. See reference 14.)
The heat capacity in equation 18 is an extensive quantity. The corresponding intensive quantities are the specific heat capacity (heat capacity per unit mass) and the molar heat capacity (heat capacity per particle).
The other derivative on the RHS of equation 17 doesn’t have a name so far as I know. It is identically zero for a table-top sample of ideal gas (but not in general).
The term isochoric means “at constant volume”, so CV is the isochoric heat capacity ... but more commonly it is just called the “heat capacity at constant volume”.
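Both derivatives in equation 17 can be evaluated numerically for a simple model. The sketch below assumes one mole of monatomic ideal gas, for which E = (3/2) N k T regardless of volume; the state point and the helper names are arbitrary choices of mine.

```python
K_BOLTZMANN = 1.380649e-23   # J/K, exact in the SI since 2019
N_AVOGADRO = 6.02214076e23   # 1/mol, exact in the SI since 2019
N = N_AVOGADRO               # one mole of particles (assumption)

def energy(T, V):
    """E(T, V) for a monatomic ideal gas; V is accepted but has no effect."""
    return 1.5 * N * K_BOLTZMANN * T

def partial(f, args, index, h):
    """Central-difference estimate of the partial derivative of f."""
    up, down = list(args), list(args)
    up[index] += h
    down[index] -= h
    return (f(*up) - f(*down)) / (2.0 * h)

T, V = 300.0, 0.0248   # kelvin, cubic meters (roughly a table-top sample)

C_V = partial(energy, (T, V), 0, h=1e-3)       # dE/dT at constant V
dE_dV = partial(energy, (T, V), 1, h=1e-6)     # dE/dV at constant T

print(f"C_V = {C_V:.4f} J/K")
print(f"dE/dV at constant T = {dE_dV} J/m^3")
```

For this model C_V comes out to (3/2) N k, about 12.47 J/K for one mole, and the other derivative vanishes, consistent with the remark above about ideal gases.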
Using the chain rule, we can find a useful expression for CV in terms of entropy:
|