by Daniel W. Stroock
Given a second-order (possibly degenerate) elliptic differential operator \begin{equation}\tag{1} \label{1} L=\frac12\sum_{i,j=1}^Na_{ij}(x)\,\partial _{x_i}\partial _{x_j}+\sum_{i=1}^N b_i(x)\,\partial _{x_i} \end{equation} acting on \( C^2(\mathbb{R}^N;\mathbb{C}) \) and a Borel probability measure \( \mu \) on \( \mathbb{R}^N \), what does it mean for a Borel probability measure \( \mathbb{P}_\mu \) on \( C\bigl([0,\infty );\mathbb{R}^N\bigr) \) to be a diffusion process determined by \( L \) with initial distribution \( \mu \)? When Varadhan and I asked ourselves this question in the mid-1960s, we found no answer that satisfied us.
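The normalization in \eqref{1} (the factor \( \frac12 \), and the fact that both \( a_{ij} \) and \( a_{ji} \) contribute when \( i\ne j \)) can be made concrete numerically. The following is a minimal Python sketch, using only the standard library, that evaluates \( L\varphi(x) \) by central finite differences. The helper name `apply_L`, the step `h`, and the sample coefficients are hypothetical illustrations, not anything taken from the work described here.

```python
from typing import Callable, Dict, List, Sequence

Point = Sequence[float]


def apply_L(a: Callable[[Point], List[List[float]]],
            b: Callable[[Point], List[float]],
            phi: Callable[[List[float]], float],
            x: Point,
            h: float = 1e-3) -> float:
    """Approximate L phi(x) = 1/2 sum_ij a_ij d_i d_j phi + sum_i b_i d_i phi
    using central finite differences with step h (hypothetical helper)."""
    n = len(x)

    def phi_at(shift: Dict[int, float]) -> float:
        # Evaluate phi at x + sum_k shift[k] * e_k.
        y = list(x)
        for k, s in shift.items():
            y[k] += s
        return phi(y)

    a_x, b_x = a(x), b(x)
    phi_x = phi(list(x))
    total = 0.0
    for i in range(n):
        # First-order (drift) term: b_i(x) * d_i phi(x).
        d_i = (phi_at({i: h}) - phi_at({i: -h})) / (2.0 * h)
        total += b_x[i] * d_i
        for j in range(n):
            if i == j:
                d_ij = (phi_at({i: h}) - 2.0 * phi_x + phi_at({i: -h})) / h ** 2
            else:
                d_ij = (phi_at({i: h, j: h}) - phi_at({i: h, j: -h})
                        - phi_at({i: -h, j: h}) + phi_at({i: -h, j: -h})) / (4.0 * h ** 2)
            # Second-order term: every ordered pair (i, j) contributes with weight 1/2.
            total += 0.5 * a_x[i][j] * d_ij
    return total


# Usage with N = 2, constant coefficients, and phi(x) = x_1^2 x_2.
# By hand: L phi(1, 2) = 1/2 (2*4 + 1*2 + 1*2 + 2*0) + (1*4 - 1*1) = 9.
value = apply_L(lambda x: [[2.0, 1.0], [1.0, 2.0]],
                lambda x: [1.0, -1.0],
                lambda y: y[0] ** 2 * y[1],
                [1.0, 2.0])
```

Finite differences are used only to keep the sketch dependency-free; for this particular \( \varphi \) the central-difference formulas are exact up to rounding, so `value` agrees with the hand computation to many digits.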
It was not that there were no answers. Indeed, already in the 1930s Kolmogorov [e1] had mapped out a path which started from \( L \) and ended at a measure \( \mathbb{P}_\mu \). Namely, he began by solving what is now called Kolmogorov’s forward equation \begin{equation}\tag{2} \label{2} \partial _tp(t,x,\,\cdot\,)=L^*p(t,x,\,\cdot\,)\quad\text{with } p(0,x,\,\cdot\,) =\delta _x, \end{equation} where \( L^* \) is the formal adjoint of \( L \) given by \[ L^*\psi (y)=\frac12\sum_{i,j=1}^N\partial _{y_i}\partial _{y_j}\bigl( a_{ij}(y)\,\psi (y)\bigr)-\sum_{i=1}^N\partial _{y_i}\bigl(b_i(y)\,\psi (y)\bigr). \] He then showed that, under suitable regularity and non-degeneracy conditions on the coefficients \( a \) and \( b \), there is one and only one solution to \eqref{2} which is the density of a probability measure on \( \mathbb{R}^N \). Further, he showed that \( p(t,x,y) \) is a continuous function of \( (t,x,y)\in (0,\infty )\times \mathbb{R}^N\times \mathbb{R}^N \) and that it satisfies the Chapman–Kolmogorov equation \begin{equation}\tag{3} \label{3} p(s+t,x,y)=\int p(t,\xi ,y)\,p(s,x,\xi )\,d\xi . \end{equation} Knowing \eqref{3}, he could check that, for any Borel probability measure \( \mu \) on \( \mathbb{R}^N \), the family of probability measures given by \begin{multline*} P_\mu \bigl((t_1,\dots,t_n);\Gamma \bigr) =\idotsint\limits_\Gamma \prod_{m=1}^np\bigl(t_m-t_{m-1},y_{m-1},y_m\bigr)\,\mu (dy_0)\,dy_1\cdots dy_n \\ \text{ for }n\ge0,\; 0=t_0 < t_1 < \cdots < t_n,\text{ and }\Gamma \in\mathcal{B}_{(\mathbb{R}^N)^{n+1}} \end{multline*} is consistent, in the sense that if \( \{t^{\prime}_1,\dots,t^{\prime}_{n^{\prime}}\}\subset \{t_1,\dots,t_n\} \), then \( P_\mu(( t^{\prime}_1,\dots,t^{\prime}_{n^{\prime}});\,\cdot\,) \) is the marginal distribution of \( P_\mu ((t_1,\dots,t_n);\,\cdot\,) \) on the coordinates corresponding to \( \{0,t^{\prime}_1,\dots,t^{\prime}_{n^{\prime}}\} \).
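Equation \eqref{3} can be checked numerically in a case where \eqref{2} has a closed-form solution. For the one-dimensional operator \( L=\frac12\partial_x^2-x\,\partial_x \) (the Ornstein–Uhlenbeck operator, chosen here purely as an illustration; its drift is unbounded, so it is not covered by the bounded-coefficient results discussed later), \( p(t,x,\cdot) \) is the Gaussian density with mean \( xe^{-t} \) and variance \( \frac12(1-e^{-2t}) \). The Python sketch below approximates the right-hand side of \eqref{3} by the trapezoid rule on a truncated interval; the function names and grid parameters are assumptions.

```python
import math


def ou_density(t: float, x: float, y: float) -> float:
    """p(t, x, y) for L = 1/2 d^2/dx^2 - x d/dx: the N(x e^{-t}, (1 - e^{-2t})/2) density in y."""
    mean = x * math.exp(-t)
    var = 0.5 * (1.0 - math.exp(-2.0 * t))
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def chapman_kolmogorov_rhs(s: float, t: float, x: float, y: float,
                           lo: float = -10.0, hi: float = 10.0, n: int = 4000) -> float:
    """Trapezoid-rule value of int p(t, xi, y) p(s, x, xi) d xi over [lo, hi]."""
    dxi = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        xi = lo + k * dxi
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * ou_density(t, xi, y) * ou_density(s, x, xi)
    return total * dxi


s, t, x, y = 0.3, 0.7, 0.5, -0.2
lhs = ou_density(s + t, x, y)             # left side of (3)
rhs = chapman_kolmogorov_rhs(s, t, x, y)  # right side of (3), by quadrature
print(lhs, rhs)
```

Because the integrand is a smooth, rapidly decaying product of Gaussians, the trapezoid rule on \( [-10,10] \) is far more accurate here than its generic error bound suggests.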
Having established this consistency, Kolmogorov applied his Consistency Theorem to produce a measure \( \widetilde{\mathbb{P}}_\mu \) on \( (\mathbb{R}^N)^{[0,\infty )} \) whose finite-dimensional marginals are given by \[ \widetilde{\mathbb{P}}_\mu\bigl( \{\omega :\,(\omega (t_0),\dots,\omega (t_n))\in\Gamma \}\bigr)=P_\mu \bigl((t_1,\dots,t_n);\Gamma \bigr). \] Finally, using his continuity criterion, he showed that \( C\bigl([0,\infty );\mathbb{R}^N\bigr) \) has outer measure 1 under \( \widetilde{\mathbb{P}}_\mu \) and therefore that the restriction of \( \widetilde{\mathbb{P}}_\mu \) to \( C\bigl([0,\infty );\mathbb{R}^N\bigr) \) determines a unique Borel probability measure \( \mathbb{P}_\mu \) there.
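The consistency requirement can also be seen by simulation. Continuing the Ornstein–Uhlenbeck illustration (an assumption, not an example from the text), the Python sketch below draws \( (\omega(t_0),\dots,\omega(t_n)) \) from \( P_\mu((t_1,\dots,t_n);\cdot) \) by chaining exact Gaussian transitions, with the hypothetical initial distribution \( \mu=N(1,\tfrac14) \). Dropping the intermediate time \( t_1=0.5 \) should not change the law of the coordinate at \( t_2=1.2 \); in both cases its mean and variance should be close to \( e^{-t_2} \) and \( \frac14e^{-2t_2}+\frac12(1-e^{-2t_2}) \).

```python
import math
import random
import statistics
from typing import Callable, List


def ou_step(rng: random.Random, x: float, dt: float) -> float:
    """Exact draw from p(dt, x, .) for L = 1/2 d^2/dx^2 - x d/dx (a Gaussian)."""
    mean = x * math.exp(-dt)
    sd = math.sqrt(0.5 * (1.0 - math.exp(-2.0 * dt)))
    return rng.gauss(mean, sd)


def sample_fdd(rng: random.Random, times: List[float],
               sample_mu: Callable[[random.Random], float]) -> List[float]:
    """One draw of (omega(t_0), ..., omega(t_n)) with t_0 = 0 < t_1 < ... < t_n."""
    path = [sample_mu(rng)]  # omega(0) ~ mu
    previous = 0.0
    for t in times:
        path.append(ou_step(rng, path[-1], t - previous))
        previous = t
    return path


def mu(rng: random.Random) -> float:
    return rng.gauss(1.0, 0.5)  # hypothetical mu = N(1, 1/4)


rng = random.Random(0)
n_samples = 40000
with_t1 = [sample_fdd(rng, [0.5, 1.2], mu)[-1] for _ in range(n_samples)]
without_t1 = [sample_fdd(rng, [1.2], mu)[-1] for _ in range(n_samples)]

t2 = 1.2
mean_theory = math.exp(-t2)
var_theory = 0.25 * math.exp(-2 * t2) + 0.5 * (1 - math.exp(-2 * t2))
for sample in (with_t1, without_t1):
    print(statistics.fmean(sample), statistics.pvariance(sample), mean_theory, var_theory)
```

Here the agreement is, of course, just the Chapman–Kolmogorov equation \eqref{3} in disguise: integrating out \( y_1 \) is exactly the integral on its right-hand side.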
The trail which Kolmogorov blazed was widened and smoothed by many people, including W. Feller [e2], J. Doob [e3], G. Hunt [e4], [e5], R. Blumenthal and R. Getoor [e10], E. B. Dynkin [e6], and a host of others. What emerged from this line of research, especially after the work of Hunt, was an isomorphism between Markov processes and potential theory. However, none of this work really addressed the question that was bothering Varadhan and me. Namely, from our point of view, the route mapped out by Kolmogorov was too circuitous: one started with \( L \), constructed from it either a transition probability function or a resolvent operator, and only then characterized the resulting Markov process in terms of these ancillary objects. What we were looking for was a characterization of \( \mathbb{P}_\mu \) directly in terms of \( L \), one from which it would be clear that, if (for each \( n\ge1 \)) \( \mathbb{P}_n \) is the measure determined by \( L_n \) with initial distribution \( \mu _n \), and if \( \mu _n\to \mu \) weakly and \( L_n\varphi \to L\varphi \) for every smooth \( \varphi \) with compact support, then \( \mathbb{P}_n \) tends weakly to \( \mathbb{P}_\mu \). Obviously, the best way to achieve this goal is to phrase the characterization in terms of \( \mathbb{P}_\mu \)-integrals involving \( L\varphi \).
Formulating such a characterization was relatively easy. Namely, no matter how one goes about its construction, \( \mathbb{P}_\mu \) should have this property: \begin{gather} \tag{4} \label{4} \mathbb{E}^{\mathbb{P}_\mu } \bigl[\varphi \bigl(\omega (t_2)\bigr)\,\big|\,\mathcal{B}_{t_1}\bigr] -\varphi \bigl(\omega (t_1)\bigr)= \mathbb{E}^{\mathbb{P}_\mu }\Bigl[\int_{t_1}^{t_2}L\varphi \bigl(\omega (\tau )\bigr)\,d\tau \,\Big|\,\mathcal{B}_{t_1}\Bigr] \\ \quad \text{for all }0\le t_1 < t_2\text{ and }\varphi \in C_{\mathrm{c}}^\infty (\mathbb{R}^N;{\Bbb C}), \nonumber \end{gather} where \( \mathcal{B}_t \) is the \( \sigma \)-algebra generated by \( \{\omega (\tau ):\,\tau \in[0,t]\} \). More succinctly, \begin{gather} \tag{5} \label{5} \Bigl(\varphi \bigl(\omega (t)\bigr)- \int_{0}^{t}L\varphi \bigl(\omega (\tau )\bigr)\,d\tau , \mathcal{B}_t, \mathbb{P}_\mu \Bigr) \\ \text{is a martingale for all }\varphi \in C_{\mathrm{c}}^\infty (\mathbb{R}^N;{\Bbb C}). \nonumber \end{gather} Thus, we said that \( \mathbb{P}_\mu \) solves the martingale problem for \( L \) with initial distribution \( \mu \) if \( \mu \) is the \( \mathbb{P}_\mu \)-distribution of \( \omega \rightsquigarrow \omega (0) \) and \eqref{5} holds.
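Taking \( t_1=0 \) and expectations in \eqref{4} gives \( \mathbb{E}^{\mathbb{P}_\mu}\bigl[\varphi(\omega(t))-\varphi(\omega(0))-\int_0^tL\varphi(\omega(\tau))\,d\tau\bigr]=0 \), a necessary (though far from sufficient) consequence of \eqref{5} that is easy to test by Monte Carlo. The Python sketch below does this for the illustrative Ornstein–Uhlenbeck operator \( L=\frac12\partial_x^2-x\,\partial_x \) started at \( \omega(0)=1.5 \), with the hypothetical test function \( \varphi(x)=e^{-x^2} \) (smooth and rapidly decreasing, though not compactly supported), for which \( L\varphi(x)=(4x^2-1)e^{-x^2} \). Paths are sampled exactly on a time grid, and the time integral is approximated by the trapezoid rule; all names and parameters are assumptions.

```python
import math
import random
import statistics
from typing import List, Tuple


def phi(x: float) -> float:
    return math.exp(-x * x)  # hypothetical test function


def L_phi(x: float) -> float:
    # (1/2 phi'' - x phi')(x) for phi(x) = exp(-x^2), worked out by hand.
    return (4.0 * x * x - 1.0) * math.exp(-x * x)


def martingale_check(x0: float = 1.5, t: float = 1.0, steps: int = 50,
                     n_paths: int = 20000, seed: int = 0) -> Tuple[List[float], List[float]]:
    """For exact OU paths from x0, return samples of
    phi(X_t) - phi(x0) - int_0^t L phi(X_tau) d tau   and of   phi(X_t) - phi(x0)."""
    rng = random.Random(seed)
    dt = t / steps
    decay = math.exp(-dt)                              # OU transition mean is x * e^{-dt}
    sd = math.sqrt(0.5 * (1.0 - math.exp(-2.0 * dt)))  # OU transition standard deviation
    defects, changes = [], []
    for _ in range(n_paths):
        x, lx, integral = x0, L_phi(x0), 0.0
        for _ in range(steps):
            x = rng.gauss(x * decay, sd)
            lx_next = L_phi(x)
            integral += 0.5 * (lx + lx_next) * dt  # trapezoid rule in time
            lx = lx_next
        changes.append(phi(x) - phi(x0))
        defects.append(phi(x) - phi(x0) - integral)
    return defects, changes


defects, changes = martingale_check()
# The first mean should be near 0; the second should not be,
# so the integral term is genuinely doing the compensating.
print(statistics.fmean(defects), statistics.fmean(changes))
```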
Having formulated the martingale problem, our first task was to prove that, in great generality, solutions exist. Of course, since all roads lead to solutions to the martingale problem (in the situations to which they apply), we could have borrowed results from earlier work (for example, work of either the Kolmogorov or the Itô school). However, not only would that have added nothing to the known existence results, it would also have been very inefficient. At least when \( a \) and \( b \) are bounded and continuous, existence of solutions to the martingale problem is an easy application of compactness, completely analogous to the proof (Peano's theorem) of existence of solutions to ordinary differential equations with bounded continuous coefficients. Our second task was to show that, under appropriate conditions, solutions are unique, and we knew that uniqueness posed a challenge of an entirely different order from the one posed by existence. Indeed, if all roads lead to solutions to the martingale problem, then uniqueness would prove that all roads have the same destination.
Our approach to uniqueness depended on the assumptions being made about the coefficients \( a \) and \( b \). One approach turned on the observation that any solution to the martingale problem is the distribution of a solution to an Itô stochastic integral equation. When the coefficients are reasonably smooth (for instance, when \( b \) and a square root \( \sigma \) of \( a=\sigma\sigma^\top \) are Lipschitz continuous), this observation allowed us to prove uniqueness for the martingale problem as a consequence of uniqueness for the associated stochastic integral equation. However, when the coefficients are not smooth, Itô’s theory cannot cope. Specifically, because his theory ignores ellipticity properties, it is difficult to see how one would go about bolstering it with powerful results from the vast and beautiful PDE literature.
The key to overcoming the problem just raised is to realize that uniqueness for the martingale problem is a purely distributional question, whereas the uniqueness found in Itô’s theory is a path-by-path statement about a certain transformation (the “Itô map”) of Brownian paths. Thus, to take advantage of the PDE existence theory, we abandoned Itô and adopted an easy duality argument which shows that uniqueness for the martingale problem requires only a sufficiently good existence statement about the initial-value problem for the backward equation \( \partial _tu=Lu \). Namely, if \( u \) is a classical solution to the backward equation with initial data \( f \), then one can show that, for any solution \( \mathbb{P} \) to the martingale problem for \( L \), \[ \Bigl(u\bigl((t_2-t)^+,\omega (t)\bigr), \mathcal{B}_t, \mathbb{P}\Bigr) \] is a martingale and therefore that, for all \( 0\le t_1 < t_2 \), \[ \mathbb{E}^\mathbb{P}\bigl[f\bigl(\omega (t_2)\bigr)\,\big|\,\mathcal{B}_{t_1}\bigr]=u\bigl(t_2-t_1,\omega (t_1)\bigr). \] Hence, if such classical solutions exist for a sufficiently rich class of \( f \) (for instance, all \( f\in C_{\mathrm{c}}^\infty (\mathbb{R}^N;\mathbb{C}) \)), then the conditional distribution of \( \omega \rightsquigarrow\omega (t_2) \) under \( \mathbb{P} \) given \( \mathcal{B}_{t_1} \) is uniquely determined for all \( 0\le t_1 < t_2 \). From this, together with the condition that the \( \mathbb{P} \)-distribution of \( \omega \rightsquigarrow\omega (0) \) is \( \mu \), it follows by induction on the number of times that every finite-dimensional distribution of \( \mathbb{P} \) is determined, and therefore that there is only one solution to the martingale problem. This argument works without a hitch when, for example, \( a \) and \( b \) are bounded and uniformly Hölder continuous and \( a \) is uniformly elliptic (that is, \( a\ge \varepsilon I \) for some \( \varepsilon > 0 \)), since in that case PDE theorists had long ago provided the necessary existence statement about the backward equation.
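The identity produced by the duality argument can be checked in the simplest case \( N=1 \), \( a=1 \), \( b=0 \) (so that \( \mathbb{P} \) is Wiener measure), with the hypothetical initial datum \( f(x)=\cos(Kx) \). Then \( u(t,x)=e^{-K^2t/2}\cos(Kx) \) is a classical solution of \( \partial_tu=\frac12\partial_x^2u \), and \( \mathbb{E}^\mathbb{P}[f(\omega(t_2))\,|\,\mathcal{B}_{t_1}] \) should equal \( u(t_2-t_1,\omega(t_1)) \). The Python sketch below verifies the PDE by finite differences and the conditional expectation by Monte Carlo, using the fact that, given \( \omega(t_1) \), \( \omega(t_2)-\omega(t_1) \) is \( N(0,t_2-t_1) \). It illustrates the conclusion of the argument, not the martingale argument itself; `K`, the function names, and the sample sizes are assumptions.

```python
import math
import random
import statistics

K = 2.0  # frequency of the hypothetical initial datum f(x) = cos(K x)


def f(x: float) -> float:
    return math.cos(K * x)


def u(t: float, x: float) -> float:
    # Classical solution of d_t u = 1/2 d_x^2 u with u(0, .) = f.
    return math.exp(-0.5 * K * K * t) * math.cos(K * x)


def backward_residual(t: float, x: float, h: float = 1e-3) -> float:
    """Finite-difference value of d_t u - 1/2 d_x^2 u at (t, x); should be near 0."""
    dt_u = (u(t + h, x) - u(t - h, x)) / (2 * h)
    dxx_u = (u(t, x + h) - 2 * u(t, x) + u(t, x - h)) / h ** 2
    return dt_u - 0.5 * dxx_u


def conditional_mean(x_t1: float, t1: float, t2: float,
                     n: int = 50000, seed: int = 1) -> float:
    """Monte Carlo estimate of E[f(omega(t2)) | omega(t1) = x_t1] for Brownian motion."""
    rng = random.Random(seed)
    sd = math.sqrt(t2 - t1)
    return statistics.fmean(f(x_t1 + sd * rng.gauss(0.0, 1.0)) for _ in range(n))


x_t1, t1, t2 = 0.3, 0.2, 0.7
mc = conditional_mean(x_t1, t1, t2)
exact = u(t2 - t1, x_t1)
print(backward_residual(0.5, 0.3), mc, exact)
```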
When uniform Hölder continuity is replaced by simple uniform continuity, the existence theory for the backward equation is less satisfactory. To be precise, although solutions continue to exist, they are no longer classical. Instead, their second derivatives exist only in \( L^p(\mathbb{R}^N;{\Bbb C}) \) for \( p\in(1,\infty ) \). As a consequence, the preceding duality argument breaks down unless one can show a priori that every solution to the martingale problem is sufficiently regular to tolerate limit procedures involving \( L^p(\mathbb{R}^N) \)-convergence of the integrands. Without question, the proof of such an a priori regularity result was the single most intricate part of our program. In broad outline, our proof went as follows. First, we developed machinery which allowed us to reduce everything to the case when \( b=0 \) and \( a \) is an arbitrarily small perturbation of the identity. Second, we again used the connection with Itô stochastic integral equations to show that any solution \( \mathbb{P} \) can be approximated by \( \mathbb{P}_n \)’s which, conditioned on \( \mathcal{B}_{m/n} \), are Gaussian during the time interval \( [m/n,\,(m+1)/n] \), and therefore have the required regularity properties. Finally, as an application of the Calderón–Zygmund theory of singular integral operators, for each \( p\in(1,\infty ) \) we were able to show that, when the perturbation is sufficiently small, the \( L^p(\mathbb{R}^N) \)-regularity properties of the approximating \( \mathbb{P}_n \)’s can be controlled independently of \( n \) and are therefore enjoyed by \( \mathbb{P} \) itself.
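The structure of the approximating measures \( \mathbb{P}_n \) in the second step can be pictured through a sampler. The Python sketch below works in dimension one with \( b=0 \) and the hypothetical coefficient \( a(x)=1+\varepsilon\sin x \), \( \varepsilon=0.1 \) (a small perturbation of the identity); in higher dimensions one would need a matrix square root of \( a \) in place of the scalar one. On each interval \( [m/n,(m+1)/n] \) the coefficient is frozen at its value at \( \omega(m/n) \), so that, conditionally on \( \mathcal{B}_{m/n} \), the path there is a scaled Brownian motion and hence Gaussian. Recording each interval on `substeps` grid points is an extra assumption made only so the output is a finite list; the sketch says nothing about the \( L^p \) estimates themselves.

```python
import math
import random
import statistics
from typing import List

EPS = 0.1  # hypothetical size of the perturbation of the identity


def a(x: float) -> float:
    """Hypothetical diffusion coefficient a(x) = 1 + EPS * sin(x), with b = 0."""
    return 1.0 + EPS * math.sin(x)


def sample_P_n(rng: random.Random, x0: float, n: int, intervals: int,
               substeps: int = 10) -> List[float]:
    """Path of P_n on [0, intervals/n], recorded on a grid of mesh 1/(n*substeps).
    On [m/n, (m+1)/n] the coefficient is frozen at a(omega(m/n)), so given the past
    the path there is sqrt(a(omega(m/n))) times a Brownian motion: a Gaussian process."""
    dt = 1.0 / (n * substeps)
    x = x0
    path = [x]
    for _ in range(intervals):
        sigma = math.sqrt(a(x))  # frozen at the left endpoint m/n
        for _ in range(substeps):
            x += sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            path.append(x)
    return path


# Given omega(0) = x0, omega(1/n) - x0 should be N(0, a(x0) / n).
rng = random.Random(2)
x0, n = 0.5, 4
increments = [sample_P_n(rng, x0, n, intervals=1)[-1] - x0 for _ in range(20000)]
print(statistics.pvariance(increments), a(x0) / n)
```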
The steps outlined above led us to a proof of the following theorem, which should be considered the centerpiece of [3], [2].
Theorem. Let \( L \) be given by \eqref{1}, where \( a \) and \( b \) are bounded, \( a \) is continuous, \( b \) is Borel measurable, and \( a(x) \) is symmetric and strictly positive definite for each \( x\in\mathbb{R}^N \). Then, for each Borel probability measure \( \mu \) on \( \mathbb{R}^N \), there is precisely one solution to the martingale problem for \( L \) with initial distribution \( \mu \). Moreover, if \( \delta _x \) is the unit mass at \( x \) and \( \mathbb{P}_x=\mathbb{P}_{\delta _x} \), then \( x\rightsquigarrow \mathbb{P}_x \) is weakly continuous and \( \{\mathbb{P}_x:\,x\in\mathbb{R}^N\} \) is a strong Markov family.
Concluding remarks
There are several additional comments which may be helpful.
1. Although, with 20/20 hindsight, the characterization given by \eqref{4} is the most obvious one, it is not the one which we chose initially. Instead, because we were influenced at the time both by Henry McKean’s treatment [e12] of Itô’s theory of stochastic integral equations and by a possible analogy with characteristic functions, we chose to characterize \( \mathbb{P}_\mu \) by saying that \begin{multline*}\tag{6} \label{6} \Bigl(\exp\Bigl[\Bigl(\xi ,\omega (t)-\int_0^tb\bigl(\omega (\tau )\bigr)\,d\tau \Bigr)_{\mathbb{R}^N}-\tfrac12\int_0^t\bigl(\xi ,a\bigl(\omega (\tau )\bigr)\xi \bigr)_{\mathbb{R}^N}\,d\tau\Bigr], \mathcal{B}_t, \mathbb{P}_\mu \Bigr)\\ \text{is a martingale for all }\xi \in{\Bbb C}^N. \end{multline*} The equivalence of \eqref{4} and \eqref{6} is an easy application of elementary Fourier analysis and the observation that, aside from integrability issues, \[ \Bigl(M(t)\,V(t)-\int_0^tM(\tau )\,dV(\tau ), \mathcal{B}_t, \mathbb{P}\Bigr) \] is a martingale when \( \bigl(M(t),\,\mathcal{B}_t,\,\mathbb{P}\bigr) \) is a continuous martingale and \( V(t) \) is a continuous, \( \{\mathcal{B}_t:\,t\ge0\} \)-adapted process of locally bounded variation.
2. The preceding theorem is not the one on which we had originally hoped to settle. When we started, we had hoped for a statement in which \( a \) needed only to be Borel measurable and locally uniformly elliptic. Using special, heavily dimension-dependent arguments, we did prove such a result when \( N\in\{1,2\} \), and we somewhat naively assumed that it was only the limits on our own technical powers which prevented us from doing so in higher dimensions. However, a few years ago Nikolai Nadirashvili [e11] gave a beautiful example that showed that the limits on our powers were not responsible for our inability to go to higher dimensions. Namely, he produced a uniformly elliptic, bounded, Borel measurable \( a \) on \( \mathbb{R}^3 \) for which the associated martingale problem admits more than one solution.
3. In retrospect, there are many aspects of this work which look rather crude and unnecessarily cumbersome. For instance, around the same time, first P. A. Meyer [e7] and then H. Kunita and S. Watanabe [e9] gave a much better and cleaner treatment of the stochastic integral calculus that we had developed. More significantly, had we been aware of the spectacular work being done by N. Krylov, in particular the estimate in [e8], we could have vastly simplified the regularity proof described above.
4. In a later work, we extended the martingale problem approach to cover diffusions with boundary conditions. The resulting paper [4] has, for good reasons, been much less influential: it is nearly unreadable. On the other hand, it contains information which, so far as I know, is available nowhere else. In particular, I do not think the theorems there dealing with approximations by discrete Markov chains have ever been improved.
5. Varadhan's and my joint work on diffusion theory culminated in our papers [8] and [9] on degenerate diffusions. The first of these contains our initial version of “the support theorem,” for which several other proofs have since been given. Among other things, the second paper contains an extension of “the support theorem” to cover situations in which the diffusion matrix \( a \) is smooth but does not admit a smooth square root. It seems that most probabilists are not particularly impressed by this extension, but we were very proud of it at the time.
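Returning to the first remark, characterization \eqref{6} can also be probed numerically. Under the Euler scheme \( X_{k+1}=X_k+b(X_k)\Delta+\sqrt{a(X_k)\Delta}\,Z_k \), with independent standard normal \( Z_k \) and with the time integrals in \eqref{6} replaced by left-endpoint sums, the discrete analogue of the exponential in \eqref{6} is an exact martingale: each factor \( \exp\bigl[\xi\sqrt{a(X_k)\Delta}\,Z_k-\frac12\xi^2a(X_k)\Delta\bigr] \) has conditional expectation 1 by the Gaussian moment generating function, which remains valid for complex \( \xi \). Hence, starting from \( \omega(0)=x_0 \), the sample mean below should be close to \( e^{\xi x_0} \). The one-dimensional coefficients \( a(x)=1+\frac12\sin x \) and \( b(x)=\cos x \), and all function names, are hypothetical; the Euler scheme is a swapped-in discretization, not the construction used in the papers.

```python
import cmath
import math
import random
from typing import List


def a(x: float) -> float:
    return 1.0 + 0.5 * math.sin(x)  # hypothetical bounded, uniformly positive a


def b(x: float) -> float:
    return math.cos(x)  # hypothetical bounded drift


def exponential_samples(xi: complex, x0: float, t: float = 1.0, steps: int = 40,
                        n_paths: int = 10000, seed: int = 3) -> List[complex]:
    """Samples of exp[xi (X(t) - int b) - 1/2 xi^2 int a] along Euler-scheme paths."""
    rng = random.Random(seed)
    dt = t / steps
    samples = []
    for _ in range(n_paths):
        x, drift_int, diff_int = x0, 0.0, 0.0
        for _ in range(steps):
            bx, ax = b(x), a(x)
            drift_int += bx * dt  # left-endpoint sum for int_0^t b(X) dtau
            diff_int += ax * dt   # left-endpoint sum for int_0^t a(X) dtau
            x += bx * dt + math.sqrt(ax * dt) * rng.gauss(0.0, 1.0)
        samples.append(cmath.exp(xi * (x - drift_int) - 0.5 * xi * xi * diff_int))
    return samples


x0 = 0.2
results = {}
for xi in (0.5, 0.5j):  # one real and one purely imaginary xi
    s = exponential_samples(xi, x0)
    results[xi] = (sum(s) / len(s), cmath.exp(xi * x0))
    print(xi, results[xi])
```

The purely imaginary choice \( \xi=0.5i \) corresponds to the characteristic-function analogy mentioned in the first remark.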