Suppose that \( \{\mu _n:\,n\in\mathbb{N}\} \) is a sequence of
Borel probability measures on a Polish space \( \Omega \), and assume that, as
\( n\to\infty \), \( \mu _n \) degenerates to the point mass \( \delta _{\omega_0} \) at
\( \omega_0\in \Omega \). Then, it is reasonable to say that, at least for large \( n \),
neighborhoods of \( \omega _0 \) represent “typical” behavior and that their
complements represent “deviant” behavior; it is often important to
know how fast their complements are becoming deviant. Finding a detailed
solution to such a problem usually entails rather intricate analysis. However,
if one’s interest is in behavior which is
“highly deviant”, in the sense that it is dying out at an exponential
rate, and if one is satisfied with finding the exponential rate at which it
is disappearing, then one is studying
large deviations
and life is much easier. Indeed, instead of trying to calculate the
asymptotic limit of quantities like \( \mu
_n\bigl(B(\omega_0,r)^{\complement} \bigr) \) (where \( B(\omega ,r) \)
denotes the ball of radius \( r \) centered at \( \omega \)), one is trying to calculate
\begin{equation}
\lim_{n\to\infty }\frac1n \log \mu _n\bigl(B(\omega_0,r)^{\complement} \bigr) ,
\end{equation}
which is an inherently simpler task.
To develop the intuition for this type of analysis, remember that we are
dealing here with probability measures. Thus, the only way that the \( \mu
_n \) can degenerate to \( \delta _{\omega_0} \) is that more and more of their
mass is concentrated in a neighborhood of \( \omega_0 \). In the nicest
situation, this concentration is taking place because
\[ \mu_n(d\omega )=\frac1{Z_n}e^{-nI(\omega )}\lambda (d\omega ) ,\]
where
\( I(\omega ) > I(\omega_0)\ge0 \) for \( \omega \neq \omega_0 \), and \( \lambda \) is
some reference measure. Indeed, assuming that
\[ \lim_{n\to\infty}\frac1n\log Z_n=0 ,\]
then
\begin{align*}
\lim_{n\to\infty }\frac1n\log \mu _n(\Gamma )
&
=\lim_{n\to\infty }
\log\|\mathbf{1}_\Gamma e^{-I}\|_{L^n(\lambda )}
\\
&
=\log \|\mathbf{1}_\Gamma e^{-I}\|_{L^\infty
(\lambda )}
\\
&
=-\mathop{\mathrm{essinf}}_{\hskip-7pt \omega \in\Gamma }\,I(\omega
).
\end{align*}
That is,
\begin{equation}
\label{1}
\lim_{n\to\infty }\frac1n\log\mu _n(\Gamma
)=-\mathop{\mathrm{essinf}}_{\hskip-7pt \omega \in\Gamma }\,I(\omega ).
\end{equation}
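As a sanity check on \eqref{1}, here is a minimal worked example that is not taken from the discussion above. It assumes \( \Omega =\mathbb{R} \), takes \( \lambda \) to be Lebesgue measure, and takes \( \mu _n \) to be the centered Gaussian measure with variance \( 1/n \); these choices are made purely for illustration. Then
\[ \mu _n(dx)=\frac1{Z_n}e^{-nI(x)}\,dx \quad\text{with}\quad I(x)=\frac{x^2}2 \quad\text{and}\quad Z_n=\sqrt{\frac{2\pi }n}, \]
so that \( \frac1n\log Z_n=\frac1{2n}\log\frac{2\pi }n\longrightarrow 0 \). For \( \Gamma =(-r,r)^{\complement} \) with \( r>0 \), the standard Gaussian tail estimates
\[ \frac{t}{1+t^2}\,e^{-t^2/2}\le\int_t^\infty e^{-s^2/2}\,ds\le\frac1t\,e^{-t^2/2},\qquad t>0, \]
applied with \( t=r\sqrt n \), give
\[ \lim_{n\to\infty }\frac1n\log\mu _n(\Gamma )=-\frac{r^2}2=-\inf_{|x|\ge r}I(x), \]
in agreement with \eqref{1}.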
Of course, for many applications (for example, to number theory, geometry, or
statistical mechanics) the non-appearance of \( Z_n \) in the answer would
mean that one has thrown out the baby with the wash. On the other hand,
because it is so crude, the type of thinking used in the previous remark predicts
correct results even when it has no right to. To
wit, suppose that \( \Omega =\{\omega \in C([0,1];\mathbb{R}):\,\omega (0)=0\} \) and \( \mu _n \) is the distribution of
\( \omega \in \Omega \longmapsto n^{-1/2}\,\omega \in \Omega \) under standard
Wiener measure. Clearly, the \( \mu _n \)
are degenerating to the point-mass at the path \( \mathbf{0} \) which is identically 0.
Moreover, Feynman’s representation of \( \mu _n \) is
\[
\mu _n(d\omega )=\frac1{Z_n}e^{-n\,I(\omega )}\,\lambda (d\omega ),
\]
where
\[
I(\omega )=\frac12\int_0^1|\dot\omega (t)|^2 dt
\]
and \( \lambda \) is the Lebesgue measure on \( \Omega \). Ignoring the fact that none of
this is mathematically kosher and proceeding formally, one is led to the
guess that \eqref{1} may nonetheless be true, at least after one has taken
into account some of its obvious flaws. In particular, there are two
sources of concern. The first of these is the almost-sure
non-differentiability of Wiener paths. However, this objection is easily
overcome by simply defining \( I(\omega )=\infty \) unless \( \omega \) has a
square-integrable derivative. The second objection is that \( \lambda \) does
not exist and therefore the “ess” before the “inf” has no meaning.
This objection is more serious, and its solution requires greater subtlety.
In fact, it was Varadhan’s solution to this problem which was one of his
seminal contributions to the whole field of large deviations. Namely, our
derivation of \eqref{1} was purely measure-theoretic: we took no account
of topology. On the other hand, not even the sense in which the \( \mu _n \)
are degenerating can be rigorously described in purely measure-theoretic
terms. The best that one can say is that they are tending weakly to
\( \delta _\mathbf{0} \). Thus, one should suspect that \eqref{1} must be amended to
reflect the topology of \( \Omega \), and that topology should appear in exactly
the same way that it does in weak convergence. With this in mind, one can
understand Varadhan’s answer that \eqref{1} should be replaced by
\begin{align}
\label{2}
-\inf_{\omega \in\Gamma ^\circ}I(\omega )
&
\le\varliminf_{n\to\infty}\frac1n\log\mu _n(\Gamma )
\\
&
\le\varlimsup_{n\to\infty}\frac1n\log\mu _n(\Gamma )
\nonumber
\\
&
\le -\inf_{\omega \in\overline\Gamma }I(\omega ).
\nonumber
\end{align}
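To see why the interior and the closure in \eqref{2} cannot be dispensed with, consider the following illustrative case, which is an assumption made here and not part of the discussion above: \( \Omega =\mathbb{R} \), \( \mu _n \) is the centered Gaussian measure with variance \( 1/n \), and \( I(x)=x^2/2 \). For the one-point set \( \Gamma =\{r\} \) with \( r>0 \), one has \( \mu _n(\Gamma )=0 \) for every \( n \), and so
\[ \frac1n\log\mu _n(\Gamma )=-\infty \quad\text{whereas}\quad -\inf_{x\in\Gamma }I(x)=-\frac{r^2}2. \]
Thus, a naive reading of \eqref{1} with “inf” in place of “essinf” fails for this \( \Gamma \). On the other hand, \eqref{2} is consistent with it: since \( \Gamma ^\circ=\emptyset \), the lower bound is \( -\inf\emptyset=-\infty \), while the upper bound is \( -\inf_{x\in\overline\Gamma }I(x)=-r^2/2 \).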
Monroe Donsker provided the original inspiration for this type of analysis
of rescaled Wiener measure, and his student Schilder was the
first to obtain rigorous results. However, it was Varadhan
[1]
who first realized that Schilder’s work could be viewed in the context of
large deviations, and it was he who gave and proved the validity of the
formulation in \eqref{2}. Indeed, a strong case can be made for saying
that the modern theory of large deviations was born in
[1].
In
particular, \eqref{2} quickly became the archetype for future results;
families \( \{\mu _n:\,n\in\mathbb{N}\} \) for which \eqref{2} holds are now said to
satisfy the large deviation principle with rate function \( I \). In
addition, it was in the same article that Varadhan proved how to pass from
\eqref{2}
to the sort of results which Schilder had proved. Namely, he proved that
if \eqref{2} holds with a rate function \( I \) which has compact level
sets (that is, \( \{\omega :\,I(\omega )\le R\} \) is compact for each
\( R\in[0,\infty ) \)) and \( F:\Omega \longrightarrow \mathbb{R} \) is a bounded,
continuous function, then
\begin{equation}
\label{3}
\lim_{n\to\infty }\frac1n\log\mathbb{E}^{\mu _n}\bigl[e^{nF}\bigr]=\sup_{\omega \in\Omega }
\bigl(F(\omega )-I(\omega )\bigr).
\end{equation}
This result, which is commonly called Varadhan’s lemma, is exactly
what one would expect from the model case when \( \mu _n(d\omega
)=(1/{Z_n})\,e^{-nI}\,\lambda (d\omega ) \); its proof in general is
quite easy, but one would be hard put to overstate its importance. Not
only is it a practical computational tool, but it provides a link between the
theory of large deviations and convex analysis. Specifically, when \( \Omega \) is a closed, convex subset of a topological vector space \( E \), then, under
suitable integrability assumptions, Varadhan’s lemma combined with the
inversion formula for Legendre transforms often can be used to identify the rate
function \( I \) as the Legendre transform
\[ \Lambda ^*(\omega )=\sup\{\langle\omega ,\lambda
\rangle-\Lambda (\lambda ):\,\lambda \in E^*\},
\]
where
\[
\lambda \in E^*\longmapsto \Lambda (\lambda )\equiv
\lim_{n\to\infty }\frac1n\log\mathbb{E}^{\mu_n}\bigl[
e^{n\langle\omega ,\lambda \rangle}\bigr]. \]
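For a concrete instance of this identification, here is a sketch under assumptions introduced only for illustration: \( \Omega =E=\mathbb{R} \), so that \( E^* \) can be identified with \( \mathbb{R} \), and \( \mu _n \) is the centered Gaussian measure with variance \( 1/n \). Since a linear function is not bounded, Varadhan’s lemma as stated above does not apply directly, and so \( \Lambda \) is computed by hand:
\[ \mathbb{E}^{\mu _n}\bigl[e^{n\theta x}\bigr]=\sqrt{\frac n{2\pi }}\int_{\mathbb{R}}e^{n\theta x-nx^2/2}\,dx=e^{n\theta ^2/2}, \quad\text{and so}\quad \Lambda (\theta )=\frac{\theta ^2}2. \]
Hence
\[ \Lambda ^*(x)=\sup_{\theta \in\mathbb{R}}\Bigl(\theta x-\frac{\theta ^2}2\Bigr)=\frac{x^2}2, \]
the supremum being attained at \( \theta =x \). This is the rate function \( I(x)=x^2/2 \) that one reads off from the density of \( \mu _n \).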
Had he only laid the foundation for the field, Varadhan’s impact on the
study of large deviations would have been already profound. However, he
did much more. Perhaps his deepest contributions come from his recognition
that large deviations underlie and explain phenomena in which nobody
else even suspected their presence. The depth of his understanding is
exemplified by his explanation of Marc Kac’s famous formula for the
principal eigenvalue of a Schrödinger operator. Donsker had
been seeking such an explanation for years, but it was not until he joined
forces with Varadhan that real progress was made on the problem. Prior to
their article
[7],
all applications (Schilder’s theorem, including its
extensions and improvements by Varadhan, as well as the many beautiful
articles by Freidlin and Wentcel) of large deviations to diffusion
theory had been based on the observation that, during a short time interval,
“typical” behavior of a diffusion is given by the solution to an ordinary
differential equation. Thus, the large deviations in these applications
come from the perturbation of an ordinary differential equation by a
Gaussian-like noise term. The large deviations in
[7]
have an entirely
different origin. Instead of short-time behavior of the diffusion paths
themselves, the quantity under consideration is the long-time behavior of
their empirical distribution. In this case, “typical” behavior is
predicted by ergodic theory, and the large deviations are those of the
empirical distribution from ergodic behavior. The situation in
[7]
is made more challenging by the fact that there really is no proper ergodic
behavior of Brownian motion on the whole of space, since, in so far as
possible, the empirical distribution of a Brownian path is trying to become
the normalized Lebesgue measure. What saves the day is the potential term in
the Schrödinger operator, whose presence penalizes paths that attempt to
spread out too much.
The upshot of Donsker and Varadhan’s analysis
[7]
is a new
variational formula for the principal eigenvalue. Although their formula
reduces to the classical one in the case of self-adjoint operators,
it has the advantage that it relies entirely on probabilistic reasoning
(that is, the minimum principle) and, as they showed in
[3],
is therefore
equally valid for operators which are not self-adjoint. More important, it
launched a program which produced a spectacular sequence of articles.
The general theory was developed in
[3]
and
[10],
each one raising the level
of abstraction and, at the same time, revealing
more fundamental principles. However, they did not content themselves with
general theory. On the contrary, they applied their theory to solve
a remarkably varied set of problems, ranging from questions about the range
of a random walk in
[17]
to questions coming from mathematical physics about
function-space integrals in
[10]
and
[20],
with each abstraction
designed to tackle a specific problem.
As is nearly always the case when breaking new ground, the applications
required ingenious modifications of the general theory. To give an
indication of just how ingenious these modifications had to be, consider
the “Wiener sausage” calculation in
[5].
The problem there, which
grew out of a question posed by the physicist J. Luttinger, was to find the
asymptotic volume of the tubular neighborhood of a Brownian path
as the time goes to infinity and the diameter of the neighborhood goes to
0. If one thinks about it, one realizes that this volume can be computed
by looking at a neighborhood in the space of measures of the empirical
distribution. However, the neighborhood that one needs is the one
determined by the variation norm, whereas their general theory deals with
the weak topology. Thus, except in one dimension where local time comes to
the rescue, they had to combine their general theory with an intricate
approximation procedure in order to arrive at their goal. Their
calculation in
[10]
is a true tour de force, only exceeded by
their solution to the polaron problem in
[20].
In conclusion, it should be emphasized that Varadhan’s contributions to the
theory of large deviations were to both its foundations and its
applications. Because of his work, the subject is now seen as a basic tool
of analysis, not simply an arcane branch of probability and statistics.
With 20/20 hindsight, it has become clear that large deviations did
not always provide the most efficient or best approach to some of the problems
which he solved, but there can be no doubt that his insights have
transformed the field forever.