by Paul R. Chernoff
About Andy
The course was both a challenge and a pleasure. I can only echo what others have said about Andy’s luminous clarity and massive abstract power. But I must admit that the lectures, always exciting, weren’t absolutely perfect; in the course of a year Andy made one genuine blunder. As to his famous speed, John Schwarz, the well-known string theorist, once said after class that Andy had “the metabolism of a hummingbird”.
I was extremely lucky that Andy was affiliated with Lowell House, my undergraduate residence. Every week Andy came for lunch, where we sat around a large circular table. That’s how Andy and I became friends. Of course we discussed a lot of mathematics around that table, but lots of other things, including Andy’s “war stories”. I am not surprised that someone kept a great treasure: all of Andy’s napkin manuscripts.
Almost any mathematical problem could intrigue Andy. At one of the annual math department picnics, he had fun figuring out how to do cube roots on an abacus. But most important was his unpretentiousness, openness, and great interest in students. I suppose that all teachers are impatient at times; no doubt Andy was sorely tried on occasion. But he rarely, if ever, showed it. The students in one of his classes gave him a framed copy of Picasso’s early painting Mother and Child. Perhaps they chose this gift to symbolize Andy’s nurturing of them. It’s regrettable that there are some teachers for whom Guernica would be more appropriate.
Quantum mechanics
In this section we set the stage for a discussion of Andy’s unique contribution to physics: his remarkable paper “Measures on the closed subspaces of a Hilbert space” [1]. It’s interesting in several ways: its history; its influence in mathematics; and especially its unexpected importance to the analysis of “hidden variable” theories of quantum mechanics by the physicist John Bell.
In classical mechanics, the state of a particle of mass \( m \) is given by its position and momentum. The motion or dynamics of a set of particles with associated forces is determined by Newton’s second law of motion, a system of ordinary differential equations. This yields a picture of the macroscopic world which matches our intuition. The ultramicroscopic world requires a quite different description. The state of a particle of mass \( m \) in \( \mathbb{R}^3 \) is a complex-valued function \( \psi \) on \( \mathbb{R}^3 \). Its momentum is similarly described by the function \( \phi \), the Fourier transform of \( \psi \), normalized by the presence in the exponent of the ratio \( \frac{h}{m} \), where \( h \) is Planck’s constant. (Using standard properties of the Fourier transform, one can deduce Heisenberg’s uncertainty principle.) For \( n \) particles, \( \psi \) is defined on \( \mathbb{R}^{3n} \). This is a brilliant extrapolation of the initial ideas of de Broglie. The Schrödinger equation determines the dynamics. If both \( \psi \) and \( \phi \) are largely concentrated around \( n \) points in position and momentum space respectively, then the quantum state resembles a blurry picture of the classical state. The more massive the particles, the less the blurriness (protons versus baseballs).
The fundamental interpretation of the “wave function” \( \psi \) is the work of Max Born. His paper analyzing collisions of particles ends with the conclusion that \( |\psi |^2 \) should be interpreted as the probability distribution for the positions of the particles. Therefore the wave function must be a unit vector in \( L^2 \). Thus did Hilbert space enter quantum mechanics.
Prior to Schrödinger’s wave mechanics, Heisenberg had begun to develop a theory in which observable quantities are represented by Hermitian-symmetric infinite square arrays. He devised a “peculiar” law for multiplying two arrays by an ingenious use of the physical meaning of their entries. Born had learned matrix theory when he was a student and realized (after a week of “agony”) that Heisenberg’s recipe was just matrix multiplication. Hence the Heisenberg theory is called matrix mechanics. (Schrödinger showed that matrix mechanics and wave mechanics are mathematically equivalent.) As in classical mechanics, the dynamics of a quantum system is determined from its energy \( H \). Periodic orbits correspond to the eigenvalues of \( H \), i.e., the discrete energy levels. The calculation of the eigenvalues is very difficult, save for a few simple systems. The energy levels for the hydrogen atom were ingeniously calculated by Wolfgang Pauli; his results agreed with Bohr’s calculations done at the very beginning of the “old” quantum theory.
Born was quite familiar with Hilbert’s theory of integral equations in \( L^2 \). Accordingly, he was able to interpret Heisenberg’s matrices as Hermitian-symmetric kernels with respect to some orthonormal basis, which might just as well be regarded as the corresponding integral operators on \( L^2 \). Formally, every Hermitian matrix could be regarded as an integral operator, usually with a very singular kernel. (The most familiar example is the identity, with kernel the Dirac delta function.) In this way, Born initiated the standard picture of observables as Hermitian operators \( A \) on \( L^2 \). But at that time, the physicists did not grasp the important distinction between unbounded Hermitian operators and unbounded self-adjoint operators. That was greatly clarified by John von Neumann, a major developer of the theory of unbounded self-adjoint operators.
Having interpreted \( |\psi |^2 \) as the probability distribution for the positions of particles, Born went on to devise what immediately became the standard interpretation of measurements in quantum mechanics: the probability that a measurement of a quantum system will yield a particular result.
Born’s line of thought was this. A state of a quantum system corresponds to a unit vector \( \psi \in L^2 \). What are the possible values of a measurement of the observable represented by the operator \( A \), and what is the probability that a specific value is observed? Born dealt only with operators whose spectrum is discrete, that is, consists entirely of eigenvalues. For simplicity, assume that there are no multiple eigenvalues. Let \( \phi_n \) be the unit eigenvector with eigenvalue \( \lambda_n \). These form an orthonormal basis of \( L^2 \). Expand \( \psi \) as a series \( \sum_k c_k \phi_k \). Since \( \|\psi \|^2 = 1 \), we get \( \sum_k |c_k |^2 = 1 \). Born’s insight was that any measurement must yield one of the eigenvalues \( \lambda_n \) of \( A \), and \( |c_n |^2 \) is the probability that the result of the measurement is \( \lambda_n \). This is known as Born’s rule. It follows that the expected value of a measurement of \( A \) is \( \sum_k |c_k |^2 \lambda_k \). Note that this sum equals the inner product \( (A\psi , \psi) \). This is the same as \( \operatorname{trace}(P A) \), where \( P \) is the projection onto the one-dimensional subspace spanned by \( \psi \). (To jump ahead, George Mackey wondered if Born’s rule might involve some arbitrary choices. Gleason ruled this out.)
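For readers who like to see the bookkeeping, here is a small numerical sketch of Born’s rule in three dimensions. The eigenbasis, the eigenvalues, and the state vector are hypothetical choices made only for illustration; nothing here comes from Born’s paper. The sketch checks that the three expressions for the expected value agree.

```python
# Hypothetical 3-dimensional example: the eigenbasis, eigenvalues and state
# below are illustrative choices, not taken from the text.
import math

s = 1 / math.sqrt(2)
phis = [[s, s, 0.0], [s, -s, 0.0], [0.0, 0.0, 1.0]]  # real orthonormal eigenvectors
lams = [1.0, 2.0, 5.0]                                # distinct eigenvalues
n = 3

# A = sum_k lambda_k |phi_k><phi_k| (no conjugates needed: the phi_k are real)
A = [[sum(lams[k] * phis[k][i] * phis[k][j] for k in range(n))
      for j in range(n)] for i in range(n)]

psi = [0.6, 0.48j, 0.64]  # unit vector: 0.36 + 0.2304 + 0.4096 = 1


def inner(u, v):
    """Inner product (u, v): linear in u, conjugate-linear in v."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))


def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


coeffs = [inner(psi, phi) for phi in phis]   # c_k = (psi, phi_k)
probs = [abs(c) ** 2 for c in coeffs]        # Born probabilities |c_k|^2
expected = sum(p * lam for p, lam in zip(probs, lams))

quad = inner(apply(A, psi), psi).real        # (A psi, psi)

# trace(P A), with P the projection onto the line spanned by psi
P = [[psi[i] * complex(psi[j]).conjugate() for j in range(n)] for i in range(n)]
trace_PA = sum(P[i][j] * A[j][i] for i in range(n) for j in range(n)).real

print(probs, expected, quad, trace_PA)
```

The probabilities sum to 1 because \( \psi \) is a unit vector, and the three numbers `expected`, `quad`, and `trace_PA` coincide up to rounding.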
John von Neumann was the creator of the abstract theory of quantum mechanics. In his theory, a pure state is a unit vector in a Hilbert space \( \mathcal{H} \). Observables are self-adjoint operators, unbounded in general, whose spectrum may be any Borel subset of \( \mathbb{R} \). Von Neumann also developed the important concept of a mixed state. A mixed state \( \mathbf{D} \) describes a situation in which there is not enough information to determine the pure state \( \psi \) of the system. Usually physicists write \( \mathbf{D} \) as a convex combination of orthogonal pure states, \( \sum_k w_k \psi_k \). This notation is confusing; \( \mathbf{D} \) is not a vector in \( \mathcal{H} \)! It may be interpreted as a list of probabilities \( w_k \) that the corresponding pure state is \( \psi_k \). Associated with the state \( \mathbf{D} \) there is a positive operator \( D \) with trace 1, given by the formula \[ D = \sum_k w_k P_k \] where \( P_k \) is the projection on the eigenspace of \( D \) corresponding to the eigenvalue \( w_k \). The expected value of an observable \( A \) is quite clearly \[ E (A) = \sum_k w_k (A\psi_k , \psi_k ) =\operatorname{trace}(D A). \] This is von Neumann’s general Born rule.
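The identity \( \sum_k w_k (A\psi_k , \psi_k ) = \operatorname{trace}(D A) \) can be checked by hand in a toy case. The following sketch does so in \( \mathbb{R}^2 \); the angle, the weights, and the observable are arbitrary illustrative values, and all names are hypothetical.

```python
import math

theta = 0.3  # arbitrary angle fixing an orthonormal basis of R^2
psis = [[math.cos(theta), math.sin(theta)],
        [-math.sin(theta), math.cos(theta)]]   # orthogonal pure states psi_k
weights = [0.25, 0.75]                         # probabilities w_k, summing to 1
A = [[1.0, 2.0], [2.0, -1.0]]                  # a symmetric observable


def matmul(X, Y):
    """Product of two square matrices of the same size."""
    m = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]


def trace(X):
    return sum(X[i][i] for i in range(len(X)))


def quad(M, v):
    """(M v, v) for a real vector v."""
    m = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(m) for j in range(m))


# D = sum_k w_k P_k, where P_k projects onto the line spanned by psi_k
D = [[sum(w * p[i] * p[j] for w, p in zip(weights, psis)) for j in range(2)]
     for i in range(2)]

mixture_average = sum(w * quad(A, p) for w, p in zip(weights, psis))
trace_DA = trace(matmul(D, A))

print(trace(D), mixture_average, trace_DA)
```

The operator `D` has trace 1, and the weighted average of the pure-state expectations agrees with `trace_DA` up to rounding.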
The eigenvalues of a projection operator are 1 and 0; those are the only values a measurement of the corresponding observable can yield. That is why Mackey calls a projection a question; the answer is always either 1 or 0: “yes” or “no”. The fundamental example is the following. Given a self-adjoint operator \( A \), we will apply the spectral theorem. Let \( S \) be any Borel subset of \( \mathbb{R} \) and let \( P_S \) be the corresponding “spectral projection” of \( A \). (If the set \( S \) contains only some eigenvalues of \( A \), then \( P_S \) is simply projection onto the subspace spanned by the corresponding eigenvectors.) Now suppose the state of the system is the mixed state \( \mathbf{D} \). From the general Born rule, the probability that a measurement of \( A \) lies in \( S \) is the expected value of \( P_S \), namely, \( \operatorname{trace}(DP_S ) \). That is the obvious generalization of Born’s formula for the probability that a measurement of \( A \) is a particular eigenvalue of \( A \) or a set of isolated eigenvalues.
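As a concrete instance of the formula \( \operatorname{trace}(DP_S ) \), the sketch below takes a hypothetical observable with eigenvalues 1, 2, 5 and a hypothetical mixed state on \( \mathbb{R}^3 \), and computes the probability that a measurement lands in a given set \( S \). The numbers are chosen only for illustration.

```python
import math

s = 1 / math.sqrt(2)
phis = [[s, s, 0.0], [s, -s, 0.0], [0.0, 0.0, 1.0]]  # orthonormal eigenvectors of A
lams = [1.0, 2.0, 5.0]                                # corresponding eigenvalues

# A positive matrix of trace 1 (an equal mixture of two pure states)
D = [[0.5, 0.0, 0.0],
     [0.0, 0.18, 0.24],
     [0.0, 0.24, 0.32]]


def spectral_projection(in_S):
    """Projection onto the span of the eigenvectors whose eigenvalue lies in S.

    The set S is passed as a predicate on eigenvalues.
    """
    return [[sum(phis[k][i] * phis[k][j] for k in range(3) if in_S(lams[k]))
             for j in range(3)] for i in range(3)]


def prob(in_S):
    """Probability trace(D P_S) that a measurement of A lies in S."""
    P = spectral_projection(in_S)
    return sum(D[i][j] * P[j][i] for i in range(3) for j in range(3))


p_low = prob(lambda lam: lam <= 3)    # S = (-inf, 3] picks out eigenvalues 1 and 2
p_high = prob(lambda lam: lam > 3)    # the complement picks out eigenvalue 5
p_all = prob(lambda lam: True)        # S = R gives the identity projection

print(p_low, p_high, p_all)
```

Here `p_low` and `p_high` come out as 0.68 and 0.32 up to rounding, and they add up to `p_all`, which is 1. This is the additivity over orthogonal projections that the next paragraph turns into a definition.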
Quite generally, consider a positive operator \( D \) with \( \operatorname{trace}(D ) = 1 \). The nonnegative real-valued function \( \mu (P ) = \operatorname{trace}(D P ) \) is a countably additive probability measure on the lattice of projections on \( \mathcal{H} \). This means that if \( \{P_n \} \) is a countable family of mutually orthogonal projections, \[ \mu\biggl(\sum_n P_n\biggr) = \sum_n \mu(P_n). \] Also \( \mu (I ) = 1 \). Mackey asked whether every such measure on the projections is of this form, i.e., corresponds to a state \( D \). We already mentioned Mackey’s interest in Born’s rule. A positive answer to Mackey’s question would show that the Born rule follows from his rather simple axioms for quantum mechanics [e2], [e1], and thus, given these weak postulates, Born’s rule is not ad hoc but inevitable.
Gleason’s theorem
Mackey didn’t try very hard to solve his problem for the excellent reason that he had no idea how to attack it. But he discussed it with a number of experts, including Irving Segal, who mentioned Mackey’s problem in a graduate class at Chicago around 1949 or 1950. Among the students was Dick Kadison, who realized that there are counterexamples when \( \mathcal{H} \) is two-dimensional. The higher-dimensional case remained open.
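It is easy to see why dimension 2 is special. There the only nontrivial orthogonal families are pairs of perpendicular lines, so a measure only has to satisfy \( f(\theta) + f(\theta + \pi/2) = 1 \), where \( f(\theta) \) is the measure of the line at angle \( \theta \). The sketch below checks one such function, \( f(\theta) = (1 + \cos 6\theta)/2 \), which is not of the form \( \operatorname{trace}(DP) \). This particular function is an illustrative choice of mine and is not claimed to be the example Kadison had in mind.

```python
import math


def f(theta):
    """Candidate measure of the line spanned by (cos theta, sin theta)."""
    return (1 + math.cos(6 * theta)) / 2


def quad(D, theta):
    """trace(D P) for the projection P onto the line at angle theta."""
    x = (math.cos(theta), math.sin(theta))
    return sum(x[i] * D[i][j] * x[j] for i in range(2) for j in range(2))


thetas = [k * math.pi / 50 for k in range(100)]

# f depends only on the line, takes values in [0, 1], and is additive on
# perpendicular pairs of lines
well_defined = all(abs(f(t) - f(t + math.pi)) < 1e-12 for t in thetas)
in_range = all(-1e-12 <= f(t) <= 1 + 1e-12 for t in thetas)
additive = all(abs(f(t) + f(t + math.pi / 2) - 1) < 1e-12 for t in thetas)

# A symmetric 2x2 matrix D is determined by the values of trace(D P) at the
# angles 0, pi/2 and pi/4; fit D to f there and compare at pi/3.
d11 = f(0.0)
d22 = f(math.pi / 2)
d12 = f(math.pi / 4) - (d11 + d22) / 2
D = [[d11, d12], [d12, d22]]
gap = abs(f(math.pi / 3) - quad(D, math.pi / 3))

print(well_defined, in_range, additive, gap)
```

The fitted matrix predicts the value 1/4 at angle \( \pi/3 \), whereas \( f(\pi/3) = 1 \), so `gap` is 0.75 up to rounding and no operator \( D \) reproduces \( f \).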
There matters stood for some years. Then Gleason entered the story. In 1956 he sat in on Mackey’s graduate course on quantum mechanics at Harvard. To Mackey’s surprise, Andy was seized by the problem “with intense ferocity”. Moreover, Kadison was visiting MIT at the time, and his interest in Mackey’s problem was rekindled. He quickly perceived that there were many “forced inter-relations” entailed by the intertwining of the great circles on the sphere and in principle a lot could be deduced from an analysis of these relations, though the problem still looked quite tough. He mentioned his observation to Andy, who found it a useful hint. (But Kadison informed me that his observation did not involve anything like Andy’s key “frame function” idea.)
The proof has three parts. First, using countable additivity and induction, it is easy to reduce the case of any separable real Hilbert space of dimension greater than 2 to the 3-dimensional case. (The complex case follows from the real case.)
Next, consider a vector \( x \) on the unit sphere. Let \( P_x \) be the one-dimensional subspace containing \( x \), and define \( f (x) = \mu (P_x ) \). This function is called a frame function. The additivity of the measure \( \mu \) implies that for any three mutually orthogonal unit vectors \( x \), \( y \), \( z \), \[ f (x) + f (y ) + f (z ) = 1. \]
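The easy direction, that a quadratic function \( f(x) = (Dx, x) \) with \( D \) positive of trace 1 really is a frame function, can be tested numerically. The sketch below draws random orthonormal frames in \( \mathbb{R}^3 \) by Gram–Schmidt and sums \( f \) over each; the matrix `D` and the seed are arbitrary illustrative choices. It says nothing about the hard direction, which is the content of Gleason’s theorem.

```python
import math
import random

# A positive matrix of trace 1 (illustrative values)
D = [[0.5, 0.0, 0.0],
     [0.0, 0.18, 0.24],
     [0.0, 0.24, 0.32]]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    r = math.sqrt(dot(v, v))
    return [a / r for a in v]


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def random_frame(rng):
    """Three mutually orthogonal unit vectors, via Gram-Schmidt."""
    x = normalize([rng.gauss(0, 1) for _ in range(3)])
    v = [rng.gauss(0, 1) for _ in range(3)]
    y = normalize([vi - dot(v, x) * xi for vi, xi in zip(v, x)])
    z = cross(x, y)
    return x, y, z


def f(x):
    """Quadratic frame function f(x) = (D x, x)."""
    return sum(x[i] * D[i][j] * x[j] for i in range(3) for j in range(3))


rng = random.Random(0)
sums = [sum(f(v) for v in random_frame(rng)) for _ in range(1000)]
max_error = max(abs(total - 1) for total in sums)

print(max_error)
```

Every frame gives a sum equal to \( \operatorname{trace}(D) = 1 \) up to rounding, as it must, since the sum of \( (Dv, v) \) over any orthonormal basis is the trace.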
The proof comes down to showing that the frame function \( f \) is quadratic and therefore is of the form \( f (x) = \operatorname{trace}(D P_x ) \), where \( D \) is as in the statement of the theorem. Gleason begins his analysis by showing that a continuous frame function is quadratic via a nice piece of harmonic analysis on the sphere. The centerpiece of the paper is the proof that \( f \) is continuous. Andy told me that this took him most of the summer. It demonstrates his powerful geometric insight. However, despite Andy’s talent for exposition, much effort is needed to really understand his argument.
Quite a few people have worked on simplifying the proof. The paper by Cooke, Keane, and Moran [e13] is interesting, well written, and leads the reader up a gentle slope to Gleason’s theorem. The authors use an important idea of Piron [e9]. (The CKM argument is “elementary” because it does not use harmonic analysis.)
Generalizations of Gleason’s theorem
In his paper Andy asked if there were analogues of his theorem for countably additive probability measures on the projections of von Neumann algebras other than the algebra of bounded operators on separable Hilbert spaces.
A von Neumann algebra, or \( W^{\ast} \) algebra, is an algebra \( \mathcal{A} \) of bounded operators on a Hilbert space \( \mathcal{H} \), closed with respect to the adjoint operation. Most importantly, \( \mathcal{A} \) is closed in the weak operator topology. The latter is defined as follows: a net of bounded operators \( \{a_i \} \) converges weakly to \( b \) provided that, for all vectors \( x, y \in \mathcal{H} \), \[ \lim_{i} (a_i x, y) = (b x, y). \]
A state of a von Neumann algebra \( \mathcal{A} \) is a positive linear functional \( \phi : \mathcal{A} \to \mathbb{C} \) with \( \phi(I ) = 1 \). This means that \( \phi(x) \geq 0 \) if \( x \geq 0 \) and also \( \|\phi\| = 1 \). The state \( \phi \) is normal provided that if \( a_i \) is an increasing net of operators that converges weakly to \( a \), then \( \phi(a_i ) \) converges to \( \phi(a) \). The normal states on \( B (\mathcal{H}) \) are precisely those of the form \( \operatorname{trace}(D x) \), where \( D \) is a positive operator with trace 1.
Let \( P (\mathcal{A}) \) be the lattice of orthogonal projections in \( \mathcal{A} \). Then the formula \[ \mu (P ) = \phi(P ) \] defines a finitely additive probability measure on \( P (\mathcal{A}) \). If \( \phi \) is normal, the measure \( \mu \) is countably additive.
The converse for countably additive measures is due to A. Paszkiewicz [e14]. See E. Christensen [e10] and F. J. Yeadon [e11], [e12] for finitely additive measures. Maeda has a careful, thorough presentation of the latter in [e16].
It is not surprising that the arguments use the finite-dimensional case of Gleason’s theorem. A truly easy consequence of Gleason’s theorem is that \( \mu \) is a uniformly continuous function on the lattice of projections \( P \), equipped with the operator norm.
A great deal of work has been done on Gleason measures which are unbounded or complex-valued. A good reference is [e19]. Bunce and Wright [e18] have studied Gleason measures defined on the lattice of projections of a von Neumann algebra with values in a Banach space. They prove the analogues of the results above. A simple example is Paszkiewicz’s theorem for complex-valued measures, which had previously been established only for positive real-valued measures.
Nonseparable Hilbert spaces
Gleason’s theorem is true only for separable Hilbert spaces. Robert Solovay has completely analyzed the nonseparable case. (Unpublished. However, [e22] is an extended abstract.) I consider Solovay’s work to be the most original extension of Gleason’s theorem.
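Call a set \( I \) gigantic if there is a continuous probability measure \( \rho \) defined on all the subsets of \( I \); here “continuous” means that every point has measure 0. A countable set is not gigantic, since countable additivity would force the whole set to have measure 0. Indeed, gigantic sets are very, very large. Also, in standard set-theoretic terminology, a gigantic cardinal is called a measurable cardinal.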
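Gleason’s theorem states that every Gleason measure (that is, every countably additive probability measure on the lattice of projections) on a separable Hilbert space of dimension greater than 2 is standard, meaning that it has the form \( \operatorname{trace}(DP) \); a Gleason measure that is not standard is called exotic. But suppose \( \mathcal{H} \) is a nonseparable Hilbert space with a gigantic orthonormal basis \( \{e_i : i \in I \} \). Let \( \rho \) be the associated measure on \( I \). Then the formula \[ \mu(P) = \int_I (Pe_i , e_i)\,d\rho(i) \] defines an exotic Gleason measure, because \( \mu (Q) = 0 \) for every projection \( Q \) with finite-dimensional range.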
On the other hand, it can be shown that if \( \mathcal{H} \) is any Hilbert space of nongigantic dimension greater than 2, then every Gleason measure on \( \mathcal{H} \) is standard. Solovay presents a proof. (A consequence is that an exotic Gleason measure exists if and only if a measurable cardinal exists.)
If \( I \) is any set, gigantic or not, and \( \rho \) is any probability measure, continuous or not, defined on all the subsets of \( I \), then the formula above defines a Gleason measure. Solovay’s main theorem says that every Gleason measure is of this form.
Observe that Gleason’s theorem is analogous; \( \rho \) is a discrete probability measure on the integers; the numbers \( \rho (n ) \) are the eigenvalues, repeated according to multiplicity, of the operator \( D \).
Solovay also proves a beautiful formula giving a canonical representation of a Gleason measure \( \mu \) as an integral over the set \( \mathcal{T} \) of positive trace-class operators \( A \) of trace 1: there is a measure \( \nu \) defined on all subsets of \( \mathcal{T} \) such that, for all \( P \), \[ \mu(P) =\int_{\mathcal{T}}\operatorname{trace}(AP) \,d\nu(A). \] Moreover, there is a unique “pure, separated” measure \( \nu \) such that the formula above holds. These two technical terms mean that \( \nu \) is similar to the sort of measure that occurs in spectral multiplicity theory for self-adjoint operators. The reader may enjoy proving this formula when \( \mathcal{H} \) is finite-dimensional; this simple case sheds some light on the general case.
Hidden variables and the work of John Bell
The major scientific impact of Gleason’s theorem is not in mathematics but in physics, where it has played an important role in the analysis of the basis of quantum mechanics. A major question is whether probabilistic quantum mechanics can be understood as a phenomenological theory obtained by averaging over variables from a deeper nonprobabilistic theory. The theory of heat exemplifies what is wanted. Heat is now understood as due to the collisions of atoms and molecules. In this way one can understand thermodynamics as a phenomenological theory derived by averages over “hidden variables” associated with the deeper particle theory; hence the term “statistical mechanics”. Einstein sought an analogous relation between quantum mechanics and — what? He is supposed to have said that he had given one hundred times more thought to quantum theory than to relativity.
The fourth chapter of John von Neumann’s great book [e7] is devoted to his famous analysis of the hidden variable question. His conclusion was that no such theory could exist. He writes, “The present system of quantum mechanics would have to be objectively false, in order that another description of the elementary processes than the statistical one may be possible.” That seemed to settle the question. Most physicists weren’t much interested in the first place when exciting new discoveries were almost showering down.
But in 1952 there was a surprise. Contrary to von Neumann, David Bohm exhibited a hidden variable theory by constructing a system of equations with both waves and particles which exactly reproduced quantum mechanics. But Einstein rejected this theory as “too easy”, because it lacked the insight Einstein was seeking. Worse yet, it had the feature Einstein most disliked. Einstein had no problem understanding that there can easily be correlations between the behavior of two distant systems, \( A \) and \( B \). If there is a correlation due to interaction when the systems are close, it can certainly be maintained when they fly apart. His objection to standard quantum mechanics was that in some cases a measurement of system \( A \) instantly determines the result of a related measurement of system \( B \). Einstein dubbed this “weird action at a distance.” Bohm’s model has this objectionable property.
In fact, soon after its publication, von Neumann’s argument was demolished by Grete Hermann [e8], a young student of Emmy Noether. Her point was that in quantum mechanics the expectation of the sum of two observables \( A \) and \( B \) is the sum of the expectations: \( E (A + B ) = E (A) + E (B ) \), even if \( A \) and \( B \) don’t commute. This is a “miracle” because the eigenvalues of \( A + B \) have no relation to those of \( A \) and \( B \) unless \( A \) and \( B \) commute. It is true only because of the special formula for expectations in quantum mechanics. It is not a “law of thought”. Yet von Neumann postulated that additivity of expected values must hold for all underlying hidden variable theories. That is the fatal mistake in von Neumann’s argument. However, although Heisenberg immediately understood Hermann’s argument when she spoke with him, her work was published in an obscure journal and was forgotten for decades.
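Hermann’s point can be seen in the smallest possible example. In the sketch below, \( A \) and \( B \) are two noncommuting real symmetric \( 2\times 2 \) matrices (they happen to be two of the Pauli spin matrices); these matrices and the test state are illustrative choices of mine, not taken from her paper. Expectations add in every state, yet no eigenvalue of \( A + B \) is a sum of an eigenvalue of \( A \) and an eigenvalue of \( B \), so additivity cannot hold value by value in a dispersion-free state.

```python
import math


def eigenvalues(M):
    """Eigenvalues of a real symmetric 2x2 matrix [[a, b], [b, d]]."""
    a, b, d = M[0][0], M[0][1], M[1][1]
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return (mean - radius, mean + radius)


def expectation(M, psi):
    """(M psi, psi) for a real unit vector psi."""
    return sum(psi[i] * M[i][j] * psi[j] for i in range(2) for j in range(2))


A = [[0.0, 1.0], [1.0, 0.0]]    # eigenvalues -1 and +1
B = [[1.0, 0.0], [0.0, -1.0]]   # eigenvalues -1 and +1
C = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]  # A + B

theta = 0.7  # arbitrary state (cos theta, sin theta)
psi = [math.cos(theta), math.sin(theta)]

lhs = expectation(C, psi)                       # E(A + B)
rhs = expectation(A, psi) + expectation(B, psi) # E(A) + E(B)

# All sums of one possible value of A and one possible value of B
sums_of_values = {a + b for a in eigenvalues(A) for b in eigenvalues(B)}

print(lhs, rhs, eigenvalues(C), sorted(sums_of_values))
```

The possible sums are \( -2, 0, 2 \), while the eigenvalues of \( A + B \) are \( \pm\sqrt{2} \).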
The outstanding Irish physicist John Bell was extremely interested in the hidden variable problem. Early on he discovered a simple example of a hidden variable theory for a two-dimensional quantum system; it’s in chapter 1 of [e15], which is a reprint of [e4]. This is another counterexample for von Neumann’s “impossibility” theorem. (Bell did a great deal of important “respectable” physics. He said that he studied the philosophy of physics only on Saturdays. An interesting essay on Bell is in Bernstein’s book [e17].)
When Bell learned of Gleason’s theorem he perceived that in Hilbert spaces of dimension greater than 2, it “apparently” establishes von Neumann’s “no hidden variables” result without the objectionable assumptions about noncommuting operators. Bell is reported to have said that he must either find an “intelligible” proof of Gleason’s theorem or else quit the field. Fortunately Bell did devise a straightforward proof of a very special case: nonexistence of frame functions taking only the values 0 and 1. Such frame functions correspond to projections. This case sufficed for Bell’s purposes [e4]. See the first chapter of [e15].
The gist of von Neumann’s proof is an argument that dispersion-free states do not exist. Here a state \( D \) is dispersion-free provided \( E (A^2 ) = E (A)^2 \) for any observable \( A \). In other words, every observation of \( A \) has the value \( E (A) \), its mean value. Quantum mechanics is supposedly obtained by averaging over such states. The frame functions considered by Bell correspond precisely to dispersion-free states. But these frame functions are not continuous. Gleason’s theorem implies that no such frame functions exist. Therefore there are no dispersion-free states. But Gleason’s theorem uses Mackey’s postulate of additivity of expectations for commuting projections. Bell’s argument based on Gleason’s theorem avoids the unjustified assumption of additivity of expectation values for noncommuting operators.
Bell writes: “That so much follows from such apparently innocent assumptions leads one to question their innocence.” He points out that if \( P \), \( Q \), and \( R \) are projections with \( P \) and \( Q \) orthogonal to \( R \) but not to each other, we might be able to measure \( R \) and \( P \), or \( R \) and \( Q \), but not necessarily both, because \( P \) and \( Q \) do not commute. Concretely, the two sets of measurements may well require different experimental arrangements. (This point was often made by Niels Bohr.) Bell expresses this fundamental fact emphatically: “The danger in fact was not in the explicit but in the implicit assumptions. It was tacitly assumed that measurement of an observable must yield the same value independently of what other measurements are made simultaneously.” In other words, the measurement may depend on its context. This amounts to saying that Gleason’s frame functions may not be well defined from the point of view of actual experiments. Accordingly, one should examine Mackey’s apparently plausible derivation that projection-valued measures truly provide part of a valid axiomatization of quantum mechanics.
Finally, a few words about the famous “Bell’s Inequality”.
The second chapter of Bell’s book is a reprint of [e3] (actually written after [e4]). In this very important paper, Bell derives a specific inequality satisfied by certain “local” hidden variable theories for nonrelativistic quantum mechanics. (“Locality” excludes “weird” correlations of measurements of widely separated systems.) There are many similar but more general inequalities. Moreover, the study of the “entanglement” of separated quantum systems has opened a new field of mathematical research.
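To give the flavor of such inequalities, here is a sketch of one of the later variants, the CHSH inequality of Clauser, Horne, Shimony, and Holt; it is not the inequality of [e3] itself. Each of two observers chooses between two settings, and every outcome is \( \pm 1 \). The code enumerates all deterministic local assignments to find the local bound, then evaluates the same combination using the standard quantum prediction \( -\cos(a - b) \) for the spin correlation of a singlet pair measured along directions at angles \( a \) and \( b \). The angles are the usual textbook choice; all function names are hypothetical.

```python
import math
from itertools import product


def chsh(E, a, a2, b, b2):
    """CHSH combination E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return E(a, b) - E(a, b2) + E(a2, b) + E(a2, b2)


# Local deterministic model: each setting has a predetermined outcome +1 or -1.
# Averaging over hidden variables cannot exceed the largest value found here.
local_max = max(abs(A * B - A * B2 + A2 * B + A2 * B2)
                for A, A2, B, B2 in product((-1, 1), repeat=4))


def singlet(a, b):
    """Quantum correlation of spin measurements on a singlet pair."""
    return -math.cos(a - b)


S = chsh(singlet, 0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4)

print(local_max, abs(S))
```

The local bound is 2, while the quantum value has absolute value \( 2\sqrt{2} \), roughly 2.83. It is this gap that the experiments described next set out to detect.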
Starting in 1969, difficult experimental work began, using variants of Bell’s inequality, to test if very delicate predictions of quantum mechanics are correct. Of course, quantum mechanics has given superb explanations of all sorts of phenomena, but these experiments waterboard quantum mechanics. Many experiments have been done; so far there is no convincing evidence that quantum mechanics is incorrect. In addition, experiments have been done which suggest that influence from one system to the other propagates enormously faster than light. These experiments point toward instantaneous transfer of information.
Bell’s papers on quantum philosophy have been collected in his book Speakable and Unspeakable in Quantum Mechanics [e15]. The first paper [e4] discusses Gleason’s theorem and the second “Bell’s inequality”. The entire book is a pleasure to read.
Anagrams
Among his many talents, Andy was a master of anagrams. His fragmentary 1947 diary records a family visit during Harvard’s spring break:
March 30. …We played anagrams after supper and I won largely through the charity of the opposition.
April 1. …Played a game of anagrams with Mother and won.
April 2. …Mother beat me tonight at anagrams.
So we know a little about where he honed that talent.
Many years ago Andy and I had a little anagram “contest” by mail. (Dick Kadison said then, “You’re having an anagram competition with Andy Gleason? That’s like arm wrestling with Gargantua.”) Anyhow, I figured out ROAST MULES, and I was proud to come up with I AM A WONDER AT TANGLES, which is an anagram of ANDREW MATTAI GLEASON. Unfortunately, it should be MATTEI. But I didn’t have the chutzpah to ask Andy to change the spelling of his middle name.
I am grateful for very interesting correspondence and conversations with the late Andy Gleason and George Mackey, together with Dick Kadison, Si Kochen, and Bob Solovay.