#### by Daniel W. Stroock

Suppose that __\( \{\mu _n:\,n\in\mathbb{N}\} \)__ is a sequence of
Borel probability measures on a Polish space __\( \Omega \)__, and assume that, as
__\( n\to\infty \)__, __\( \mu _n \)__ degenerates to the point mass __\( \delta _{\omega_0} \)__ at
__\( \omega_0\in \Omega \)__. Then, it is reasonable to say that, at least for large __\( n \)__,
neighborhoods of __\( \omega _0 \)__ represent “typical” behavior and that their
complements represent “deviant” behavior; it is often important to
know how fast the probability of deviant behavior is disappearing. Finding a detailed
solution to such a problem usually entails rather intricate analysis. However,
if one’s interest is in behavior which is
“highly deviant”, in the sense that it is dying out at an exponential
rate, and if one is satisfied with finding the exponential rate at which it
is disappearing, then one is studying
*large deviations*
and life is much easier. Indeed, instead of trying to calculate the
asymptotic limit of quantities like __\( \mu
_n\bigl(B(\omega_0,r)^{\complement} \bigr) \)__ (where __\( B(\omega ,r) \)__
denotes the ball of radius __\( r \)__ centered at __\( \omega \)__), one is trying to calculate
__\begin{equation}
\lim_{n\to\infty }\frac1n \log \mu _n\bigl(B(\omega_0,r)^{\complement} \bigr) ,
\end{equation}__
which is an inherently simpler task.
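
As a concrete toy instance of this limit (my own example, not one from the text), take __\( \mu_n \)__ to be the law of the mean of __\( n \)__ fair __\( \pm1 \)__ coin flips, which degenerates to __\( \delta_0 \)__. The tail probability can be summed exactly from binomial coefficients, and __\( \tfrac1n\log \)__ of it converges to __\( -I(r) \)__ with __\( I(r)=\tfrac{1+r}2\log(1+r)+\tfrac{1-r}2\log(1-r) \)__ (Cramér's rate function for this walk); by symmetry the two-sided tail __\( \mu_n\bigl(B(0,r)^{\complement}\bigr) \)__ has the same exponential rate. A short numerical sketch:

```python
import math

# Toy check of the large deviation limit (1/n) log mu_n(tail) -> -I(r)
# for mu_n = law of the mean of n fair +/-1 coin flips (my own example).

def I(r):
    """Cramer rate function for the mean of fair +/-1 coin flips."""
    return 0.5 * ((1 + r) * math.log(1 + r) + (1 - r) * math.log(1 - r))

def log_tail(n, r):
    """(1/n) log P(S_n / n >= r) for S_n a sum of n fair +/-1 flips."""
    k0 = math.ceil(n * (1 + r) / 2)   # S_n/n >= r  <=>  number of heads >= k0
    # log of each binomial term C(n, k) * 2^{-n}, via lgamma for stability
    logs = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            - n * math.log(2) for k in range(k0, n + 1)]
    m = max(logs)                     # log-sum-exp to avoid underflow
    return (m + math.log(sum(math.exp(L - m) for L in logs))) / n

approx = log_tail(400, 0.5)           # should be near -I(0.5) ~ -0.1308
```

The discrepancy at finite __\( n \)__ is of order __\( (\log n)/n \)__, and shrinks as __\( n \)__ grows, which is exactly the sense in which only the exponential rate survives.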

To develop the intuition for this type of analysis, remember that we are
dealing here with probability measures. Thus, the only way that the __\( \mu
_n \)__ can degenerate to __\( \delta _{\omega_0} \)__ is that more and more of their
mass is concentrated in a neighborhood of __\( \omega_0 \)__. In the nicest
situation, this concentration is taking place because
__\[ \mu_n(d\omega )=\frac1{Z_n}e^{-nI(\omega )}\lambda (d\omega ) ,\]__
where
__\( I(\omega ) > I(\omega_0)\ge0 \)__ for __\( \omega \neq \omega_0 \)__, and __\( \lambda \)__ is
some reference measure. Indeed, assuming that
__\[ \lim_{n\to\infty}\frac1n\log Z_n=0 ,\]__
then
__\begin{align*}
\lim_{n\to\infty }\frac1n\log \mu _n(\Gamma )
&
=\lim_{n\to\infty }
\log\|\mathbf{1}_\Gamma e^{-I}\|_{L^n(\lambda )}
\\
&
=\log \|\mathbf{1}_\Gamma e^{-I}\|_{L^\infty
(\lambda )}
\\
&
=-\mathop{\mathrm{essinf}}_{\hskip-7pt \omega \in\Gamma }\,I(\omega
).
\end{align*}__
That is,
__\begin{equation}
\label{1}
\lim_{n\to\infty }\frac1n\log\mu _n(\Gamma
)=-\mathop{\mathrm{essinf}}_{\hskip-7pt \omega \in\Gamma }\,I(\omega ).
\end{equation}__
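
The Laplace-type computation behind __\eqref{1}__ is easy to check numerically in a one-dimensional model case of my own choosing: take __\( \lambda \)__ to be Lebesgue measure, __\( I(x)=x^2 \)__ (minimized at __\( 0 \)__), and __\( \Gamma =[1,2] \)__, so that __\( \operatorname{ess\,inf}_\Gamma I=1 \)__. A sketch:

```python
import math

# Numerical check of the Laplace principle: (1/n) log ∫_Γ exp(-n I) dλ
# should tend to -inf_Γ I.  Here (my toy choices) I(x) = x^2 and Γ = [1, 2],
# so the limit should be -1.

def log_mass(n, a=1.0, b=2.0, steps=20000):
    """(1/n) log of a midpoint Riemann sum for ∫_a^b exp(-n x^2) dx."""
    dx = (b - a) / steps
    # exp(-n) is tiny for n ~ hundreds but still representable as a double
    total = sum(math.exp(-n * (a + (k + 0.5) * dx) ** 2)
                for k in range(steps)) * dx
    return math.log(total) / n

approx = log_mass(200)   # should be near -1
```

The error is of order __\( (\log n)/n \)__, coming from the polynomial prefactor that the __\( \tfrac1n\log \)__ scaling discards.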
Of course, for many applications (for example, to number theory, geometry, or
statistical mechanics) the non-appearance of __\( Z_n \)__ in the answer would
mean that one has thrown out the baby with the wash. On the other hand,
because it is so crude, the type of thinking used in the previous remark predicts
correct results even when it has no right to. To
wit, suppose that __\( \Omega =\{\omega \in C([0,1];\mathbb{R}):\,\omega (0)=0\} \)__ and __\( \mu _n \)__ is the distribution of __\( \omega
\in \Omega \longmapsto n^{-1/2}\,\omega \,\in\, \Omega \)__ under standard
Wiener measure. Clearly, the __\( \mu _n \)__
are degenerating to the point mass at the path __\( \mathbf{0} \)__, which is identically 0.
Moreover, Feynman’s representation of __\( \mu _n \)__ is
__\[
\mu _n(d\omega )=\frac1{Z_n}e^{-n\,I(\omega )}\,\lambda (d\omega ),
\]__
where
__\[
I(\omega )=\frac12\int_0^1|\dot\omega (t)|^2 dt
\]__
and __\( \lambda \)__ is the Lebesgue measure on __\( \Omega \)__. Ignoring the fact that none of
this is mathematically kosher and proceeding formally, one is led to the
guess that __\eqref{1}__ may nonetheless be true, at least after one has taken
into account some of its obvious flaws. In particular, there are two
sources of concern. The first of these is the almost-sure
non-differentiability of Wiener paths. However this objection is easily
overcome by simply defining __\( I(\omega )=\infty \)__ unless __\( \omega \)__ has a
square-integrable derivative. The second objection is that __\( \lambda \)__ does
not exist and therefore the “ess” before the “inf” has no meaning.
This objection is more serious, and its solution requires greater subtlety.
In fact, it was Varadhan’s solution to this problem which was one of his
seminal contributions to the whole field of large deviations. Namely, our
derivation of __\eqref{1}__ was purely measure-theoretic: we took no account
of topology. On the other hand, not even the sense in which the __\( \mu _n \)__
are degenerating can be rigorously described in purely measure-theoretic
terms. The best that one can say is that they are tending weakly to
__\( \delta _\mathbf{0} \)__. Thus, one should suspect that __\eqref{1}__ must be amended to
reflect the topology of __\( \Omega \)__, and that topology should appear in exactly
the same way that it does in weak convergence. With this in mind, one can
understand Varadhan’s answer that __\eqref{1}__ should be replaced by
__\begin{align}
\label{2}
-\inf_{\omega \in\Gamma ^\circ}I(\omega )
&
\le\varliminf_{n\to\infty}\frac1n\log\mu _n(\Gamma )
\\
&
\le\varlimsup_{n\to\infty}\frac1n\log\mu _n(\Gamma )
\nonumber
\\
&
\le -\inf_{\omega \in\overline\Gamma }I(\omega ).
\nonumber
\end{align}__
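
For the rescaled Wiener measures there is one event on which both sides of __\eqref{2}__ can be evaluated in closed form, and which therefore makes a good sanity check (the choice of event is mine, not from the text): __\( \Gamma =\{\omega :\,\max_{[0,1]}\omega \ge r\} \)__. The cheapest path entering __\( \Gamma \)__ is the straight line __\( t\mapsto rt \)__, with energy __\( \tfrac12\int_0^1 r^2\,dt=r^2/2 \)__, and this infimum is the same on __\( \Gamma^\circ \)__ and __\( \overline\Gamma \)__, so the two bounds pinch to a genuine limit. On the other hand, the reflection principle gives __\( \mu_n(\Gamma )=P(\max W\ge r\sqrt n)=2P(W_1\ge r\sqrt n)=\operatorname{erfc}(r\sqrt{n/2}) \)__ exactly:

```python
import math

# Check that (1/n) log mu_n(Gamma) matches -r^2/2, the energy of the
# straight-line path t -> r t, for Gamma = {omega : max omega >= r}
# (my choice of event; mu_n = law of n^{-1/2} W under Wiener measure).

def rate_from_path(r, steps=1000):
    """Energy I of the straight-line path t -> r t, by a Riemann sum."""
    dt = 1.0 / steps
    return 0.5 * sum(r * r * dt for _ in range(steps))   # = r^2 / 2

def schilder_log_prob(n, r):
    """Exact (1/n) log mu_n(Gamma), via the reflection principle."""
    return math.log(math.erfc(r * math.sqrt(n / 2.0))) / n
```

With __\( n=200 \)__ and __\( r=1 \)__ the exact value is already within a few percent of __\( -1/2 \)__, and the quadratic dependence on __\( r \)__ is visible as well.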

Monroe Donsker provided the original inspiration for this type of analysis
of rescaled Wiener measure, and his student Schilder was the
first to obtain rigorous results. However, it was Varadhan
[1]
who first realized that Schilder’s work could be viewed in the context of
large deviations, and it was he who gave and proved the validity of the
formulation in __\eqref{2}__. Indeed, a strong case can be made for saying
that the modern theory of large deviations was born in
[1].
In
particular, __\eqref{2}__ quickly became the archetype for future results;
families __\( \{\mu _n:\,n\in\mathbb{N}\} \)__ for which __\eqref{2}__ holds are now said to
satisfy *the large deviation principle with rate function \( I \)*. In
addition, it was in the same article that Varadhan showed how to pass from
__\eqref{2}__ to the sort of results which Schilder had proved. Namely, he
proved that if __\eqref{2}__ holds with a rate function __\( I \)__ which has
compact level sets (that is, __\( \{\omega :\,I(\omega )\le R\} \)__ is compact
for each __\( R\in[0,\infty ) \)__) and __\( F:\Omega \longrightarrow \mathbb{R} \)__
is a bounded, continuous function, then
__\begin{equation}
\label{3}
\lim_{n\to\infty }\frac1n\log\mathbb{E}^{\mu _n}\bigl[e^{nF}\bigr]=\sup_{\omega \in\Omega }\bigl(F(\omega )-I(\omega )\bigr).
\end{equation}__
This result, which is commonly called *Varadhan’s lemma*, is exactly what one
would expect from the model case when
__\( \mu _n(d\omega )=(1/{Z_n})\,e^{-nI}\,\lambda (d\omega ) \)__; its proof in
general is quite easy, but one would be hard put to overstate its importance.
Not only is it a practical computational tool, but it provides a link between
the theory of large deviations and convex analysis. Specifically, when
__\( \Omega \)__ is a closed, convex subset of a topological vector space
__\( E \)__, then, under suitable integrability assumptions, Varadhan’s lemma
combined with the inversion formula for Legendre transforms can often be used
to identify the rate function __\( I \)__ as the Legendre transform
__\[ \Lambda ^*(\omega )=\sup_{\lambda \in E^*}\bigl(\langle\omega ,\lambda \rangle-\Lambda (\lambda )\bigr), \]__
where
__\[ \lambda \in E^*\longmapsto \Lambda (\lambda )\equiv \lim_{n\to\infty }\frac1n\log\mathbb{E}^{\mu_n}\bigl[e^{n\lambda (\omega )}\bigr]. \]__
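
In the simplest Gaussian toy case (my choice, not from the article) the Legendre-transform identification can be carried out completely: for __\( E=\mathbb{R} \)__ and __\( \mu_n=N(0,1/n) \)__, the Gaussian moment generating function gives __\( \Lambda (\lambda )=\lambda ^2/2 \)__ exactly, and its Legendre transform should recover the rate function __\( I(\omega )=\omega ^2/2 \)__. A grid-based sketch:

```python
import math

# Legendre-transform identification in a Gaussian toy case (my choices):
# mu_n = N(0, 1/n), so E^{mu_n}[e^{n lam x}] = e^{n lam^2 / 2} and
# Lambda(lam) = lam^2 / 2 exactly.  Its Legendre transform should be
# Lambda*(x) = x^2 / 2, the Gaussian rate function.

def Lambda(lam):
    """Limiting log moment generating function for mu_n = N(0, 1/n)."""
    return lam * lam / 2.0

def Lambda_star(x, lo=-10.0, hi=10.0, steps=20001):
    """Legendre transform sup_lam (x*lam - Lambda(lam)), maximized on a grid."""
    h = (hi - lo) / (steps - 1)
    return max(x * (lo + k * h) - Lambda(lo + k * h) for k in range(steps))
```

Evaluating at a few points confirms __\( \Lambda ^*(x)=x^2/2 \)__ up to grid resolution, the supremum being attained at __\( \lambda =x \)__.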

Had he only laid the foundation for the field, Varadhan’s impact on the study of large deviations would already have been profound. However, he did much more. Perhaps his deepest contributions come from his recognition that large deviations underlie and explain phenomena in which nobody else even suspected their presence. The depth of his understanding is exemplified by his explanation of Marc Kac’s famous formula for the principal eigenvalue of a Schrödinger operator. Donsker had been seeking such an explanation for years, but it was not until he joined forces with Varadhan that real progress was made on the problem. Prior to their article [7], all applications of large deviations to diffusion theory (Schilder’s theorem, including its extensions and improvements by Varadhan, as well as the many beautiful articles by Freidlin and Wentcel) had been based on the observation that, during a short time interval, “typical” behavior of a diffusion is given by the solution to an ordinary differential equation. Thus, the large deviations in these applications come from the perturbation of an ordinary differential equation by a Gaussian-like noise term.

The large deviations in [7] have an entirely different origin. Instead of the short-time behavior of the diffusion paths themselves, the quantity under consideration is the long-time behavior of their empirical distribution. In this case, “typical” behavior is predicted by ergodic theory, and the large deviations are those of the empirical distribution from ergodic behavior. The situation in [7] is made more challenging by the fact that there really is no proper ergodic behavior of Brownian motion on the whole of space, since, in so far as possible, the empirical distribution of a Brownian path is trying to become the normalized Lebesgue measure. What saves the day is the potential term in the Schrödinger operator, whose presence penalizes paths that attempt to spread out too much.

The upshot of Donsker and Varadhan’s analysis [7] is a new variational formula for the principal eigenvalue. Although their formula reduces to the classical one in the case of self-adjoint operators, it has the advantage that it relies entirely on probabilistic reasoning (that is, the minimum principle) and, as they showed in [3], is therefore equally valid for operators which are not self-adjoint. More important, it launched a program which produced a spectacular sequence of articles. The general theory was developed in [3] and [10], each one raising the level of abstraction and, at the same time, revealing more fundamental principles. However, they did not content themselves with general theory. On the contrary, they applied their theory to solve a remarkably varied set of problems, ranging from questions about the range of a random walk in [17] to questions coming from mathematical physics about function-space integrals in [10] and [20], with each abstraction designed to tackle a specific problem.

As is nearly always the case when breaking new ground, the applications
required ingenious modifications of the general theory. To give an
indication of just how ingenious these modifications had to be, consider
the “Wiener sausage” calculation in
[5].
The problem there, which
grew out of a question posed by the physicist J. Luttinger, was to find the
asymptotic volume of the tubular neighborhood of a Brownian path
as the time goes to infinity and the diameter of the neighborhood goes to
0. If one thinks about it, one realizes that this volume can be computed
by looking at a neighborhood in the space of measures of the empirical
distribution. However, the neighborhood that one needs is the one
determined by the variation norm, whereas their general theory deals with
the weak topology. Thus, except in one dimension where local time comes to
the rescue, they had to combine their general theory with an intricate
approximation procedure in order to arrive at their goal. Their
calculation in
[10]
is a true *tour de force*, only exceeded by
their solution to the polaron problem in
[20].

In conclusion, it should be emphasized that Varadhan’s contributions to the theory of large deviations were to both its foundations and its applications. Because of his work, the subject is now seen as a basic tool of analysis, not simply an arcane branch of probability and statistics. With 20/20 hindsight, it has become clear that large deviations did not always provide the most efficient or best approach to some of the problems which he solved, but there can be no doubt that his insights have transformed the field forever.