by Daniel W. Stroock
Suppose that \( \{\mu _n:\,n\in\mathbb{N}\} \) is a sequence of Borel probability measures on a Polish space \( \Omega \), and assume that, as \( n\to\infty \), \( \mu _n \) degenerates to the point mass \( \delta _{\omega_0} \) at \( \omega_0\in \Omega \). Then it is reasonable to say that, at least for large \( n \), neighborhoods of \( \omega _0 \) represent “typical” behavior and their complements represent “deviant” behavior, and it is often important to know how fast the \( \mu _n \)-measures of these complements are tending to 0. Finding a detailed solution to such a problem usually entails rather intricate analysis. However, if one’s interest is in behavior that is “highly deviant”, in the sense that it is dying out at an exponential rate, and if one is satisfied with finding the exponential rate at which it is disappearing, then one is studying large deviations and life is much easier. Indeed, instead of trying to calculate the asymptotic behavior of quantities like \( \mu _n\bigl(B(\omega_0,r)^{\complement} \bigr) \) (where \( B(\omega ,r) \) denotes the ball of radius \( r \) centered at \( \omega \)), one is trying to calculate \[ \lim_{n\to\infty }\frac1n \log \mu _n\bigl(B(\omega_0,r)^{\complement} \bigr) , \]
which is an inherently simpler task.
To develop the intuition for this type of analysis, remember that we are dealing here with probability measures. Thus, the only way that the \( \mu _n \) can degenerate to \( \delta _{\omega_0} \) is that more and more of their mass is concentrated in a neighborhood of \( \omega_0 \). In the nicest situation, this concentration is taking place because \[ \mu_n(d\omega )=\frac1{Z_n}e^{-nI(\omega )}\lambda (d\omega ) ,\]
where \( I(\omega ) > I(\omega_0)\ge0 \) for \( \omega \neq \omega_0 \), and \( \lambda \) is some reference measure. Indeed, if one assumes that \[ \lim_{n\to\infty}\frac1n\log Z_n=0 ,\]
then \begin{align*} \lim_{n\to\infty }\frac1n\log \mu _n(\Gamma ) & =\lim_{n\to\infty } \log\|\mathbf{1}_\Gamma e^{-I}\|_{L^n(\lambda )} \\ & =\log \|\mathbf{1}_\Gamma e^{-I}\|_{L^\infty (\lambda )} \\ & =-\mathop{\mathrm{essinf}}_{\hskip-7pt \omega \in\Gamma }\,I(\omega ). \end{align*}
That is, \begin{equation} \label{1} \lim_{n\to\infty }\frac1n\log\mu _n(\Gamma )=-\mathop{\mathrm{essinf}}_{\hskip-7pt \omega \in\Gamma }\,I(\omega ). \end{equation}
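The model-case computation can be checked numerically. The Python sketch below assumes, purely for illustration, that \( \Omega=[-1,1] \), that \( \lambda \) is Lebesgue measure, that \( I(\omega)=\omega^2 \) (so that \( \omega_0=0 \) and \( \frac1n\log Z_n\to0 \)), and that \( \Gamma=[a,1] \); the helper names are hypothetical. It evaluates \( \frac1n\log\mu_n(\Gamma) \) with a midpoint rule carried out in log space, and the printed values should approach \( -\inf_{\Gamma}I=-a^2 \), although only at the slow rate \( O(\log n/n) \) coming from the prefactors that \eqref{1} ignores.

```python
import math

def log_integral(I, lo, hi, n, num_points=100_000):
    """Return log of the integral of exp(-n * I(x)) over [lo, hi] (midpoint rule).

    The sum is formed with the log-sum-exp trick, so the tiny values
    exp(-n * I(x)) that occur for large n do not underflow to zero.
    """
    h = (hi - lo) / num_points
    exponents = [-n * I(lo + (k + 0.5) * h) for k in range(num_points)]
    m = max(exponents)
    return m + math.log(sum(math.exp(e - m) for e in exponents)) + math.log(h)

def scaled_log_prob(n, a, I=lambda x: x * x):
    """(1/n) log mu_n([a, 1]) for mu_n(dx) = exp(-n I(x)) dx / Z_n on [-1, 1]."""
    log_Z = log_integral(I, -1.0, 1.0, n)  # log Z_n
    return (log_integral(I, a, 1.0, n) - log_Z) / n

a = 0.5
for n in (10, 100, 1000, 2000):
    # The second column should drift toward the third, -inf_Gamma I = -a**2.
    print(n, scaled_log_prob(n, a), -a * a)
```

Working with logarithms throughout is what keeps the computation meaningful for large \( n \), since \( e^{-nI} \) itself underflows in floating point.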
Of course, for many applications (for example, to number theory, geometry, or statistical mechanics) the non-appearance of \( Z_n \) in the answer would mean that one has thrown out the baby with the bathwater. On the other hand, because it is so crude, the type of thinking used in the previous remark predicts correct results even when it has no right to. To wit, suppose that \( \Omega =\{\omega \in C([0,1];\mathbb{R}):\,\omega (0)=0\} \) and \( \mu _n \) is the distribution of \( \omega \in \Omega \longmapsto n^{-1/2}\,\omega \,\in\, \Omega \) under standard Wiener measure. Clearly, the \( \mu _n \) are degenerating to the point mass at the path \( \mathbf{0} \) that is identically 0. Moreover, Feynman’s (formal) representation of \( \mu _n \) is \[ \mu _n(d\omega )=\frac1{Z_n}e^{-n\,I(\omega )}\,\lambda (d\omega ), \]
where \[ I(\omega )=\frac12\int_0^1|\dot\omega (t)|^2 dt \]
and \( \lambda \) is the Lebesgue measure on \( \Omega \). Ignoring the fact that none of this is mathematically kosher and proceeding formally, one is led to guess that \eqref{1} may nonetheless be true, at least after one has taken into account some of its obvious flaws. In particular, there are two sources of concern. The first is the almost-sure non-differentiability of Wiener paths. However, this objection is easily overcome by simply defining \( I(\omega )=\infty \) unless \( \omega \) has a square-integrable derivative. The second objection is that \( \lambda \) does not exist, and therefore the “ess” before the “inf” has no meaning. This objection is more serious, and its solution requires greater subtlety. In fact, it was Varadhan’s solution to this problem that was one of his seminal contributions to the whole field of large deviations. Namely, our derivation of \eqref{1} was purely measure-theoretic: we took no account of topology. On the other hand, not even the sense in which the \( \mu _n \) are degenerating can be rigorously described in purely measure-theoretic terms. The best that one can say is that they are tending weakly to \( \delta _\mathbf{0} \). Thus, one should suspect that \eqref{1} must be amended to reflect the topology of \( \Omega \), and that topology should enter in exactly the same way that it does in weak convergence, where \( \nu_n\to\nu \) weakly if and only if \( \varlimsup_{n\to\infty}\nu_n(F)\le\nu(F) \) for every closed \( F \) and \( \varliminf_{n\to\infty}\nu_n(G)\ge\nu(G) \) for every open \( G \). With this in mind, one can understand Varadhan’s answer that \eqref{1} should be replaced by \begin{align} \label{2} -\inf_{\omega \in\Gamma ^\circ}I(\omega ) & \le\varliminf_{n\to\infty}\frac1n\log\mu _n(\Gamma ) \\ & \le\varlimsup_{n\to\infty}\frac1n\log\mu _n(\Gamma ) \nonumber \\ & \le -\inf_{\omega \in\overline\Gamma }I(\omega ). \nonumber \end{align}
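One case in which both sides of \eqref{2} can be computed explicitly is \( \Gamma=\{\omega\in\Omega:\,\max_{0\le t\le1}\omega(t)\ge r\} \), a closed set whose interior contains the open set \( \{\omega:\,\max_{t}\omega(t)>r\} \). Any \( \omega\in\Gamma \) with \( I(\omega)<\infty \) reaches level \( r \) at some time \( s\le1 \), so by Cauchy–Schwarz \( r^2\le s\int_0^s|\dot\omega(t)|^2\,dt\le2I(\omega) \), with equality for \( \omega(t)=rt \); since the paths \( (r+\epsilon)t \) lie in the open set, both infima in \eqref{2} equal \( r^2/2 \). On the other hand, the reflection principle gives \( \mu_n(\Gamma)=\mathbb{P}\bigl(\max_{t\le1}W(t)\ge r\sqrt n\bigr)=\operatorname{erfc}\bigl(r\sqrt{n/2}\bigr) \) exactly. The Python sketch below, whose function names are hypothetical and which uses only the standard library's `math.erfc`, compares the two.

```python
import math

def schilder_scaled_log_prob(n, r):
    """(1/n) log mu_n(Gamma), Gamma = {omega : max_{0<=t<=1} omega(t) >= r}.

    By the reflection principle, mu_n(Gamma) = P(max W >= r sqrt(n)) = erfc(r sqrt(n/2)).
    math.erfc underflows to 0.0 for arguments beyond roughly 27, so keep
    r * sqrt(n / 2) comfortably below that.
    """
    return math.log(math.erfc(r * math.sqrt(n / 2.0))) / n

def schilder_rate(r):
    """inf of I over Gamma, attained by the straight path omega(t) = r t."""
    return 0.5 * r * r

r = 1.0
for n in (1, 10, 100, 1000):
    print(n, schilder_scaled_log_prob(n, r), -schilder_rate(r))
```

The printed values approach \( -r^2/2 \) from below, the gap being the logarithm of the Gaussian tail prefactor divided by \( n \).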
Monroe Donsker provided the original inspiration for this type of analysis of rescaled Wiener measure, and his student Schilder was the first to obtain rigorous results. However, it was Varadhan [1] who first realized that Schilder’s work could be viewed in the context of large deviations, and it was he who formulated \eqref{2} and proved its validity. Indeed, a strong case can be made for saying that the modern theory of large deviations was born in [1]. In particular, \eqref{2} quickly became the archetype for future results; a family \( \{\mu _n:\,n\in\mathbb{N}\} \) for which \eqref{2} holds for every Borel set \( \Gamma \) is now said to satisfy the large deviation principle with rate function \( I \). In addition, it was in the same article that Varadhan showed how to pass from \eqref{2} to the sort of results that Schilder had proved. Namely, he proved that if \eqref{2} holds with a rate function \( I \) that has compact level sets (that is, \( \{\omega :\,I(\omega )\le R\} \) is compact for each \( R\in[0,\infty ) \)) and \( F:\Omega \longrightarrow \mathbb{R} \) is a bounded, continuous function, then \begin{equation} \label{3} \lim_{n\to\infty }\frac1n\log\mathbb{E}^{\mu _n}\bigl[e^{nF}\bigr]=\sup_{\omega \in\Omega } \bigl(F(\omega )-I(\omega )\bigr). \end{equation}
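Varadhan's lemma asserts that \( \frac1n\log\mathbb{E}^{\mu_n}[e^{nF}]\to\sup_{\omega}\bigl(F(\omega)-I(\omega)\bigr) \). For a quick numerical illustration, under assumptions chosen only for convenience, take \( \Omega=\mathbb{R} \), let \( \mu_n \) be the Gaussian law with mean 0 and variance \( 1/n \) (which satisfies the large deviation principle with \( I(x)=x^2/2 \)), and take the bounded continuous function \( F(x)=\sin x \). Since \( \sin x-x^2/2 \) is concave, its supremum, approximately \( 0.4005 \), is attained at the unique solution of \( \cos x=x \). The Python sketch below, with hypothetical helper names, computes the left side by quadrature in log space and the right side by a grid search.

```python
import math

def logsumexp(values):
    """Stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def scaled_log_moment(n, F, half_width=10.0, num_points=100_000):
    """(1/n) log E[exp(n F(X))] for X ~ N(0, 1/n), by the midpoint rule in log space.

    The density is sqrt(n / (2 pi)) exp(-n x^2 / 2); the region |x| > half_width
    is dropped, which is harmless here because F is bounded.
    """
    h = 2.0 * half_width / num_points
    xs = [-half_width + (k + 0.5) * h for k in range(num_points)]
    log_terms = [n * (F(x) - 0.5 * x * x) for x in xs]
    log_density_const = 0.5 * math.log(n / (2.0 * math.pi))
    return (logsumexp(log_terms) + math.log(h) + log_density_const) / n

def sup_F_minus_I(F, half_width=10.0, num_points=200_000):
    """Grid approximation of sup_x (F(x) - x**2 / 2)."""
    h = 2.0 * half_width / num_points
    xs = (-half_width + k * h for k in range(num_points + 1))
    return max(F(x) - 0.5 * x * x for x in xs)

target = sup_F_minus_I(math.sin)
for n in (1, 10, 100, 1000):
    print(n, scaled_log_moment(n, math.sin), target)
```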
This result, which is commonly called Varadhan’s lemma, is exactly what one would expect from the model case when \( \mu _n(d\omega )=(1/{Z_n})\,e^{-nI}\,\lambda (d\omega ) \); its proof in general is quite easy, but one would be hard put to overstate its importance. Not only is it a practical computational tool, but it also provides a link between the theory of large deviations and convex analysis. Specifically, when \( \Omega \) is a closed, convex subset of a topological vector space \( E \), Varadhan’s lemma combined with the inversion formula for Legendre transforms can often be used, under suitable integrability assumptions, to identify the rate function \( I \) as the Legendre transform \[ \Lambda ^*(\omega )=\sup\bigl\{\langle\omega ,\lambda \rangle-\Lambda (\lambda ):\,\lambda \in E^*\bigr\}, \]
where \[ \lambda \in E^*\longmapsto \Lambda (\lambda )\equiv \lim_{n\to\infty }\frac1n\log\mathbb{E}^{\mu_n}\bigl[ e^{n\langle\omega ,\lambda \rangle}\bigr]. \]
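The simplest instance of this recipe is Cramér's theorem for fair coin tossing, chosen here only as an illustration: \( E=\mathbb{R} \), \( \Omega=[0,1] \), and \( \mu_n \) is the distribution of the proportion of heads in \( n \) independent tosses. Then \( \mathbb{E}^{\mu_n}[e^{n\lambda\omega}]=\bigl(\frac{1+e^\lambda}2\bigr)^n \), so \( \Lambda(\lambda)=\log\frac{1+e^\lambda}2 \) exactly, and its Legendre transform is \( \Lambda^*(x)=x\log x+(1-x)\log(1-x)+\log2 \) for \( 0<x<1 \). The Python sketch below, whose function names and grid are assumptions, computes \( \Lambda^* \) by brute-force maximization over \( \lambda \), compares it with the closed form, and compares \( -\Lambda^*(a) \) with \( \frac1n\log\mu_n([a,1]) \) computed exactly from the binomial distribution.

```python
import math

def Lambda(lam):
    """log((1 + e^lam) / 2): the limiting log-moment generating function (exact for every n)."""
    return math.log((1.0 + math.exp(lam)) / 2.0)

def legendre_numeric(x, lam_max=30.0, num_points=60_001):
    """Grid approximation of Lambda*(x) = sup over lam of (lam * x - Lambda(lam))."""
    h = 2.0 * lam_max / (num_points - 1)
    return max((-lam_max + k * h) * x - Lambda(-lam_max + k * h) for k in range(num_points))

def legendre_exact(x):
    """Closed form x log x + (1 - x) log(1 - x) + log 2, valid for 0 < x < 1."""
    return x * math.log(x) + (1.0 - x) * math.log(1.0 - x) + math.log(2.0)

def scaled_log_tail(n, a):
    """(1/n) log P(S_n / n >= a) for S_n ~ Binomial(n, 1/2), summed in log space."""
    logs = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1) - n * math.log(2.0)
            for k in range(math.ceil(n * a), n + 1)]
    m = max(logs)
    return (m + math.log(sum(math.exp(v - m) for v in logs))) / n

a = 0.75
print(legendre_numeric(a), legendre_exact(a))
for n in (10, 100, 1000):
    print(n, scaled_log_tail(n, a), -legendre_exact(a))
```

The value \( a=0.75 \) is chosen because \( n\cdot0.75 \) is computed exactly in floating point, so `math.ceil` picks the intended first index.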
Had Varadhan only laid the foundations of the field, his impact on the study of large deviations would already have been profound. However, he did much more. Perhaps his deepest contributions come from his recognition that large deviations underlie and explain phenomena in which nobody else had even suspected their presence. The depth of his understanding is exemplified by his explanation of Marc Kac’s famous formula for the principal eigenvalue of a Schrödinger operator. Donsker had been seeking such an explanation for years, but it was not until he joined forces with Varadhan that real progress was made on the problem. Prior to their article [7], all applications of large deviations to diffusion theory (Schilder’s theorem, including its extensions and improvements by Varadhan, as well as the many beautiful articles by Freidlin and Wentcel) had been based on the observation that, during a short time interval, “typical” behavior of a diffusion is given by the solution to an ordinary differential equation. Thus, the large deviations in these applications come from the perturbation of an ordinary differential equation by a Gaussian-like noise term. The large deviations in [7] have an entirely different origin. Instead of the short-time behavior of the diffusion paths themselves, the quantity under consideration is the long-time behavior of their empirical distribution. In this case, “typical” behavior is predicted by ergodic theory, and the large deviations are those of the empirical distribution from ergodic behavior. The situation in [7] is made more challenging by the fact that Brownian motion on the whole of space has no proper ergodic behavior, since, insofar as possible, the empirical distribution of a Brownian path is trying to become the normalized Lebesgue measure. What saves the day is the potential term in the Schrödinger operator, whose presence penalizes paths that attempt to spread out too much.
The upshot of Donsker and Varadhan’s analysis [7] is a new variational formula for the principal eigenvalue. Although their formula reduces to the classical one in the case of self-adjoint operators, it has the advantage that it relies entirely on probabilistic reasoning (that is, the minimum principle) and, as they showed in [3], is therefore equally valid for operators that are not self-adjoint. More important, it launched a program that produced a spectacular sequence of articles. The general theory was developed in [3] and [10], each one raising the level of abstraction and, at the same time, revealing more fundamental principles. However, they did not content themselves with general theory. On the contrary, they applied their theory to solve a remarkably varied set of problems, ranging from questions about the range of a random walk in [17] to questions coming from mathematical physics about function-space integrals in [10] and [20], with each abstraction designed to tackle a specific problem.
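The reduction to the classical formula in the self-adjoint case can be seen in a finite-state toy model, which is an assumption of this sketch and not the setting of [7]: replace \( \frac12\Delta \) by the generator \( Q \) of a symmetric continuous-time random walk on a cycle of five sites, add an arbitrarily chosen potential \( V \), and call the largest eigenvalue of the symmetric matrix \( Q+V \) its principal eigenvalue. For symmetric \( Q \), the Donsker–Varadhan rate function of the empirical distribution takes the Dirichlet-form expression \( I(\mu)=-\langle\sqrt\mu,Q\sqrt\mu\rangle=\frac12\sum_{i\ne j}Q_{ij}\bigl(\sqrt{\mu_i}-\sqrt{\mu_j}\bigr)^2 \), so the variational formula \( \sup_\mu\{\sum_iV_i\mu_i-I(\mu)\} \) becomes the Rayleigh–Ritz formula in the variable \( f=\sqrt\mu \). The Python sketch below, with hypothetical names, finds the principal eigenpair by power iteration, checks that \( \mu=\varphi^2 \) (with \( \varphi \) the normalized principal eigenvector) attains the eigenvalue, and checks that randomly drawn \( \mu \) do no better.

```python
import math
import random

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def principal_eigenpair(A, iters=5000):
    """Top eigenvalue and unit eigenvector of a symmetric, irreducible matrix with
    nonnegative off-diagonal entries, by power iteration on A + c*I.

    The shift c makes every entry nonnegative and the diagonal strictly positive,
    so by Perron-Frobenius the iterates converge to a positive eigenvector.
    """
    n = len(A)
    c = 1.0 + max(-A[i][i] for i in range(n))
    B = [[A[i][j] + (c if i == j else 0.0) for j in range(n)] for i in range(n)]
    x = [1.0] * n
    for _ in range(iters):
        y = matvec(B, x)
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    lam = sum(a * b for a, b in zip(x, matvec(A, x)))  # Rayleigh quotient at the unit vector x
    return lam, x

def dirichlet_rate(Q, mu):
    """I(mu) = (1/2) sum_{i != j} Q_ij (sqrt(mu_i) - sqrt(mu_j))^2, for symmetric Q."""
    f = [math.sqrt(m) for m in mu]
    n = len(mu)
    return 0.5 * sum(Q[i][j] * (f[i] - f[j]) ** 2
                     for i in range(n) for j in range(n) if i != j)

def dv_objective(Q, V, mu):
    """sum_i V_i mu_i - I(mu), the quantity maximized in the variational formula."""
    return sum(v * m for v, m in zip(V, mu)) - dirichlet_rate(Q, mu)

# Symmetric random walk on a 5-cycle (rate 1 to each neighbor) plus a potential V.
N = 5
Q = [[0.0] * N for _ in range(N)]
for i in range(N):
    Q[i][(i + 1) % N] = Q[i][(i - 1) % N] = 1.0
    Q[i][i] = -2.0
V = [0.0, 0.5, 1.0, 0.3, -0.2]
A = [[Q[i][j] + (V[i] if i == j else 0.0) for j in range(N)] for i in range(N)]

lam, phi = principal_eigenpair(A)
best = dv_objective(Q, V, [p * p for p in phi])  # mu = phi^2 should attain lam

rng = random.Random(0)
samples = []
for _ in range(1000):
    w = [rng.expovariate(1.0) for _ in range(N)]  # normalized: uniform on the simplex
    total = sum(w)
    samples.append(dv_objective(Q, V, [v / total for v in w]))

print(lam, best, max(samples))
```

No random \( \mu \) can exceed the eigenvalue, because each objective value equals \( \langle f,(Q+V)f\rangle \) for the unit vector \( f=\sqrt\mu \).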
As is nearly always the case when breaking new ground, the applications required ingenious modifications of the general theory. To give an indication of just how ingenious these modifications had to be, consider the “Wiener sausage” calculation in [5]. The problem there, which grew out of a question posed by the physicist J. Luttinger, was to find the asymptotic volume of the tubular neighborhood of a Brownian path as the time goes to infinity and the diameter of the neighborhood goes to 0. On reflection, one realizes that this volume can be computed by looking at a neighborhood of the empirical distribution in the space of measures. However, the neighborhood that one needs is the one determined by the variation norm, whereas their general theory deals with the weak topology. Thus, except in one dimension, where local time comes to the rescue, they had to combine their general theory with an intricate approximation procedure in order to arrive at their goal. Their calculation in [5] is a true tour de force, exceeded only by their solution to the polaron problem in [20].
In conclusion, it should be emphasized that Varadhan’s contributions to the theory of large deviations were to both its foundations and its applications. Because of his work, the subject is now seen as a basic tool of analysis, not simply an arcane branch of probability and statistics. With 20/20 hindsight, it has become clear that large deviations did not always provide the most efficient or best approach to some of the problems which he solved, but there can be no doubt that his insights have transformed the field forever.