Suppose that \( \{\mu _n:\,n\in\mathbb{N}\} \) is a sequence of Borel probability measures on a Polish space \( \Omega \), and assume that, as \( n\to\infty \), \( \mu _n \) degenerates to the point mass \( \delta _{\omega_0} \) at \( \omega_0\in \Omega \). Then, it is reasonable to say that, at least for large \( n \), neighborhoods of \( \omega _0 \) represent “typical” behavior and that their complements represent “deviant” behavior; it is often important to know how fast their complements are becoming deviant. Finding a detailed solution to such a problem usually entails rather intricate analysis. However, if one’s interest is in behavior which is “highly deviant”, in the sense that it is dying out at an exponential rate, and if one is satisfied with finding the exponential rate at which it is disappearing, then one is studying large deviations and life is much easier. Indeed, instead of trying to calculate the asymptotic limit of quantities like \( \mu _n\bigl(B(\omega_0,r)^{\complement} \bigr) \) (where \( B(\omega ,r) \) denotes the ball of radius \( r \) centered at \( \omega \)), one is trying to calculate \begin{equation} \lim_{n\to\infty }\frac1n \log \mu _n\bigl(B(\omega_0,r)^{\complement} \bigr) , \end{equation} which is an inherently simpler task.

To develop the intuition for this type of analysis, remember that we are dealing here with probability measures. Thus, the only way that the \( \mu _n \) can degenerate to \( \delta _{\omega_0} \) is that more and more of their mass is concentrated in a neighborhood of \( \omega_0 \). In the nicest situation, this concentration is taking place because \[ \mu_n(d\omega )=\frac1{Z_n}e^{-nI(\omega )}\lambda (d\omega ) ,\] where \( I(\omega ) > I(\omega_0)\ge0 \) for \( \omega \neq \omega_0 \), and \( \lambda \) is some reference measure. Indeed, assuming that \[ \lim_{n\to\infty}\frac1n\log Z_n=0 ,\] then \begin{align*} \lim_{n\to\infty }\frac1n\log \mu _n(\Gamma ) & =\lim_{n\to\infty } \log\|\mathbf{1}_\Gamma e^{-I}\|_{L^n(\lambda )} \\ & =\log \|\mathbf{1}_\Gamma e^{-I}\|_{L^\infty (\lambda )} \\ & =-\mathop{\mathrm{essinf}}_{\hskip-7pt \omega \in\Gamma }\,I(\omega ). \end{align*} That is, \begin{equation} \label{1} \lim_{n\to\infty }\frac1n\log\mu _n(\Gamma )=-\mathop{\mathrm{essinf}}_{\hskip-7pt \omega \in\Gamma }\,I(\omega ). \end{equation} Of course, for many applications (for example, to number theory, geometry, or statistical mechanics) the non-appearance of \( Z_n \) in the answer would mean that one has thrown out the baby with the wash. On the other hand, because it is so crude, the type of thinking used in the previous remark predicts correct results even when it has no right to. To wit, suppose that \( \Omega =\{\omega \in C([0,1];\mathbb{R}):\,\omega (0)=0\} \) and \( \mu _n \) is the distribution of \( \omega \in \Omega \longmapsto n^{-1/2}\,\omega \,\in\, \Omega \) under standard Wiener measure. Clearly, the \( \mu _n \) are degenerating to the point-mass at the path \( \mathbf{0} \) which is identically 0. 
Moreover, Feynman’s representation of \( \mu _n \) is \[ \mu _n(d\omega )=\frac1{Z_n}e^{-n\,I(\omega )}\,\lambda (d\omega ), \] where \[ I(\omega )=\frac12\int_0^1|\dot\omega (t)|^2 dt \] and \( \lambda \) is the Lebesgue measure on \( \Omega \). Ignoring the fact that none of this is mathematically kosher and proceeding formally, one is led to the guess that \eqref{1} may nonetheless be true, at least after one has taken into account some of its obvious flaws. In particular, there are two sources of concern. The first of these is the almost-sure non-differentiability of Wiener paths. However, this objection is easily overcome by simply defining \( I(\omega )=\infty \) unless \( \omega \) has a square-integrable derivative. The second objection is that \( \lambda \) does not exist and therefore the “ess” before the “inf” has no meaning. This objection is more serious, and its solution requires greater subtlety. In fact, it was Varadhan’s solution to this problem which was one of his seminal contributions to the whole field of large deviations. Namely, our derivation of \eqref{1} was purely measure-theoretic: we took no account of topology. On the other hand, not even the sense in which the \( \mu _n \) are degenerating can be rigorously described in purely measure-theoretic terms. The best that one can say is that they are tending weakly to \( \delta _\mathbf{0} \). Thus, one should suspect that \eqref{1} must be amended to reflect the topology of \( \Omega \), and that topology should appear in exactly the same way that it does in weak convergence. 
With this in mind, one can understand Varadhan’s answer that \eqref{1} should be replaced by \begin{align} \label{2} -\inf_{\omega \in\Gamma ^\circ}I(\omega ) & \le\varliminf_{n\to\infty}\frac1n\log\mu _n(\Gamma ) \\ & \le\varlimsup_{n\to\infty}\frac1n\log\mu _n(\Gamma ) \nonumber \\ & \le -\inf_{\omega \in\overline\Gamma }I(\omega ). \nonumber \end{align}

Monroe Donsker provided the original inspiration for this type of analysis of rescaled Wiener measure, and his student Schilder was the first to obtain rigorous results. However, it was Varadhan [1] who first realized that Schilder’s work could be viewed in the context of large deviations, and it was he who gave and proved the validity of the formulation in \eqref{2}. Indeed, a strong case can be made for saying that the modern theory of large deviations was born in [1]. In particular, \eqref{2} quickly became the archetype for future results; families \( \{\mu _n:\,n\in\mathbb{N}\} \) for which \eqref{2} holds are now said to satisfy the large deviation principle with rate function \( I \). In addition, it was in the same article that Varadhan showed how to pass from \eqref{2} to the sort of results which Schilder had proved. Namely, he proved that if \eqref{2} holds with a rate function \( I \) which has compact level sets (that is, \( \{\omega :\,I(\omega )\le R\} \) is compact for each \( R\in[0,\infty ) \)) and \( F:\Omega \longrightarrow \mathbb{R} \) is a bounded, continuous function, then \begin{equation} \label{3} \lim_{n\to\infty }\frac1n\log\mathbb{E}^{\mu _n}\bigl[e^{nF}\bigr]=\sup_{\omega \in\Omega } \bigl(F(\omega )-I(\omega )\bigr). \end{equation} This result, which is commonly called Varadhan’s lemma, is exactly what one would expect from the model case when \( \mu _n(d\omega )=(1/{Z_n})\,e^{-nI}\,\lambda (d\omega ) \); its proof in general is quite easy, but one would be hard put to overstate its importance. Not only is it a practical computational tool, but it provides a link between the theory of large deviations and convex analysis. 
Specifically, when \( \Omega \) is a closed, convex subset of a topological vector space \( E \), then, under suitable integrability assumptions, Varadhan’s lemma combined with the inversion formula for Legendre transforms often can be used to identify the rate function \( I \) as the Legendre transform \[ \Lambda ^*(\omega )=\sup\{\langle\omega ,\lambda \rangle-\Lambda (\lambda ):\,\lambda \in E^*\}, \] where \[ \lambda \in E^*\longmapsto \Lambda (\lambda )\equiv \lim_{n\to\infty }\frac1n\log\mathbb{E}^{\mu_n}\bigl[ e^{n\langle\omega ,\lambda \rangle}\bigr]. \]
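
Varadhan’s lemma can be checked numerically in a toy case. The following is a minimal sketch, not a general implementation, under assumptions that do not come from the text: \( \Omega =\mathbb{R} \), \( \mu_n \) is the centered Gaussian measure with variance \( 1/n \) (so that \( I(x)=x^2/2 \)), the integral is truncated to the interval \( [-10,10] \), and the helper names `log_sum_exp`, `varadhan_lhs` and `varadhan_rhs` are hypothetical. The left-hand side of \eqref{3} is approximated by a Riemann sum evaluated in log-space, and the right-hand side by a maximum over the same grid.

```python
import math


def log_sum_exp(values):
    """Stable evaluation of log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def varadhan_lhs(F, n, lo=-10.0, hi=10.0, steps=20000):
    """Approximate (1/n) log E[exp(n F(X))] for X ~ N(0, 1/n) (assumed model).

    Uses a Riemann sum on [lo, hi]; the truncation is an assumption that is
    harmless here because the Gaussian tails are negligible and F is bounded.
    """
    dx = (hi - lo) / steps
    log_norm = 0.5 * math.log(n / (2.0 * math.pi))  # log of the density's constant
    logs = []
    for k in range(steps + 1):
        x = lo + k * dx
        # log of exp(n F(x)) times the N(0, 1/n) density at x
        logs.append(n * (F(x) - 0.5 * x * x) + log_norm)
    return (log_sum_exp(logs) + math.log(dx)) / n


def varadhan_rhs(F, lo=-10.0, hi=10.0, steps=20000):
    """Approximate sup_x (F(x) - I(x)) with I(x) = x^2 / 2 on the same grid."""
    dx = (hi - lo) / steps
    return max(F(lo + k * dx) - 0.5 * (lo + k * dx) ** 2 for k in range(steps + 1))


if __name__ == "__main__":
    F = math.tanh  # a bounded, continuous test function
    for n in (10, 100, 1000):
        print(n, varadhan_lhs(F, n), varadhan_rhs(F))
```

Working in log-space avoids the overflow that a direct evaluation of \( e^{nF} \) would cause for large \( n \). Replacing `F` by a linear function \( x\mapsto\lambda x \) (which is not bounded, so the lemma as stated does not literally apply) gives \( \Lambda (\lambda )=\lambda ^2/2 \) in this Gaussian model, whose Legendre transform is again \( x^2/2 \).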

Had he only laid the foundation for the field, Varadhan’s impact on the study of large deviations would already have been profound. However, he did much more. Perhaps his deepest contributions come from his recognition that large deviations underlie and explain phenomena in which nobody else even suspected their presence. The depth of his understanding is exemplified by his explanation of Marc Kac’s famous formula for the principal eigenvalue of a Schrödinger operator. Donsker had been seeking such an explanation for years, but it was not until he joined forces with Varadhan that real progress was made on the problem. Prior to their article [7], all applications (Schilder’s theorem, including its extensions and improvements by Varadhan, as well as the many beautiful articles by Freidlin and Wentcel) of large deviations to diffusion theory had been based on the observation that, during a short time interval, “typical” behavior of a diffusion is given by the solution to an ordinary differential equation. Thus, the large deviations in these applications come from the perturbation of an ordinary differential equation by a Gaussian-like noise term. The large deviations in [7] have an entirely different origin. Instead of short-time behavior of the diffusion paths themselves, the quantity under consideration is the long-time behavior of their empirical distribution. In this case, “typical” behavior is predicted by ergodic theory, and the large deviations are those of the empirical distribution from ergodic behavior. The situation in [7] is made more challenging by the fact that there really is no proper ergodic behavior of Brownian motion on the whole of space, since, in so far as possible, the empirical distribution of a Brownian path is trying to become the normalized Lebesgue measure. 
What saves the day is the potential term in the Schrödinger operator, whose presence penalizes paths that attempt to spread out too much.

The upshot of Donsker and Varadhan’s analysis [7] is a new variational formula for the principal eigenvalue. Although their formula reduces to the classical one in the case of self-adjoint operators, it has the advantage that it relies entirely on probabilistic reasoning (that is, the minimum principle) and, as they showed in [3], is therefore equally valid for operators which are not self-adjoint. More important, it launched a program which produced a spectacular sequence of articles. The general theory was developed in [3] and [10], each one raising the level of abstraction and, at the same time, revealing more fundamental principles. However, they did not content themselves with general theory. On the contrary, they applied their theory to solve a remarkably varied set of problems, ranging from questions about the range of a random walk in [17] to questions coming from mathematical physics about function-space integrals in [10] and [20], with each abstraction designed to tackle a specific problem.

As is nearly always the case when breaking new ground, the applications required ingenious modifications of the general theory. To give an indication of just how ingenious these modifications had to be, consider the “Wiener sausage” calculation in [5]. The problem there, which grew out of a question posed by the physicist J. Luttinger, was to find the asymptotic volume of the tubular neighborhood of a Brownian path as the time goes to infinity and the diameter of the neighborhood goes to 0. If one thinks about it, one realizes that this volume can be computed by looking at a neighborhood in the space of measures of the empirical distribution. However, the neighborhood that one needs is the one determined by the variation norm, whereas their general theory deals with the weak topology. Thus, except in one dimension where local time comes to the rescue, they had to combine their general theory with an intricate approximation procedure in order to arrive at their goal. Their calculation in [10] is a true tour de force, only exceeded by their solution to the polaron problem in [20].

In conclusion, it should be emphasized that Varadhan’s contributions to the theory of large deviations were to both its foundations and its applications. Because of his work, the subject is now seen as a basic tool of analysis, not simply an arcane branch of probability and statistics. With 20/20 hindsight, it has become clear that large deviations did not always provide the most efficient or best approach to some of the problems which he solved, but there can be no doubt that his insights have transformed the field forever.


