#### by Daniel W. Stroock

Given a second-order (possibly degenerate) elliptic differential operator
__\begin{equation}\tag{1}
\label{1}
L=\frac12\sum_{i,j=1}^Na_{ij}(x)\,\partial _{x_i}\partial _{x_j}+\sum_{i=1}^N
b_i(x)\,\partial _{x_i}
\end{equation}__
acting on __\( C^2(\mathbb{R}^N;{\Bbb C}) \)__ and a Borel probability measure __\( \mu \)__ on
__\( \mathbb{R}^N \)__, what does it mean for a Borel probability measure __\( \mathbb{P}_\mu \)__ on __\( C\bigl([0,\infty
);\mathbb{R}^N\bigr) \)__ to be a diffusion process determined by __\( L \)__ with initial
value __\( \mu \)__?
When Varadhan and I asked ourselves this question in the mid-1960s, we
found no answer which satisfied us.

It was not that there were no answers. Indeed, already in the 1930s
Kolmogorov
[e1]
had mapped out a path which started from
__\( L \)__ and ended at a measure __\( \mathbb{P}_\mu \)__. Namely, he began by solving what is
now called Kolmogorov’s forward equation
__\begin{equation}\tag{2}
\label{2}
\partial _tp(t,x,\,\cdot\,)=L^*p(t,x,\,\cdot\,)\quad\text{with } p(0,x,\,\cdot\,)
=\delta _x,
\end{equation}__
where __\( L^* \)__ is the formal adjoint of __\( L \)__ given by
__\[
L^*\psi (y)=\frac12\sum_{i,j=1}^N\partial _{y_i}\partial _{y_j}\bigl(
a_{ij}(y)\,\psi (y)\bigr)-\sum_{i=1}^N\partial _{y_i}\bigl(b_i(y)\,\psi
(y)\bigr).
\]__
He then showed that, under suitable regularity and non-degeneracy
conditions on the coefficients __\( a \)__ and __\( b \)__, there is one and only one solution
to __\eqref{2}__ which is the density of a probability measure on
__\( \mathbb{R}^N \)__. Further he showed that __\( p(t,x,y) \)__ is a continuous function of
__\( (t,x,y)\in (0,\infty )\times \mathbb{R}^N\times \mathbb{R}^N \)__ and that it satisfies the
Chapman–Kolmogorov equation
__\begin{equation}\tag{3}
\label{3}
p(s+t,x,y)=\int p(t,\xi ,y)\,p(s,x,\xi )\,d\xi .
\end{equation}__
Knowing __\eqref{3}__, he could check that, for any __\( \mu \in\mathbf{M}(\mathbb{R}^N) \)__, the
family of probability measures given by
__\begin{multline*}
P_\mu \bigl((t_1,\dots,t_n);\Gamma \bigr)
=\idotsint\limits_\Gamma
\prod_{m=1}^np\bigl(t_m-t_{m-1},y_{m-1},y_m\bigr)\,\mu (dy_0)\,dy_1\cdots
dy_n
\\
\text{ for }n\ge0,\; 0=t_0 < t_1 < \cdots < t_n,\text{ and }\Gamma
\in\mathcal{B}_{(\mathbb{R}^N)^{n+1}}
\end{multline*}__
is consistent, in the sense that if __\( \{t^{\prime}_1,\dots,t^{\prime}_{n^{\prime}}\}\subset
\{t_1,\dots,t_n\} \)__ then __\( P_\mu((
t^{\prime}_1,\dots,t^{\prime}_{n^{\prime}});\,\cdot\,) \)__ is the marginal distribution of
__\( P_\mu ((t_1,\dots,t_n);\,\cdot\,) \)__ on the coordinates corresponding to
__\( \{t^{\prime}_1,\dots,t^{\prime}_{n^{\prime}}\} \)__. Having established this consistency, Kolmogorov
applied his Consistency Theorem to produce a measure __\( \widetilde{\mathbb{P}}_\mu \)__ on
__\( (\mathbb{R}^N)^{[0,\infty )} \)__ whose finite-dimensional marginals are given by
__\[
\widetilde{\mathbb{P}}_\mu\bigl( \{\omega :\,(\omega (t_0),\dots,\omega
(t_n))\in\Gamma \}\bigr)=P_\mu \bigl((t_1,\dots,t_n);\Gamma
\bigr).
\]__
Finally, using his continuity criterion, he showed that __\( C\bigl([0,\infty
);\mathbb{R}^N\bigr) \)__ has outer measure
1 under __\( \widetilde{\mathbb{P}}_\mu \)__ and therefore that the restriction of __\( \widetilde{
\mathbb{P}}_\mu \)__ to __\( C\bigl([0,\infty );\mathbb{R}^N\bigr) \)__ determines a unique Borel probability
measure __\( \mathbb{P}_\mu \)__ there.
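
As a sanity check on the Chapman–Kolmogorov equation __\eqref{3}__ in the simplest case, the following minimal sketch takes __\( N=1 \)__, __\( a=1 \)__, and __\( b=0 \)__, so that __\( p \)__ is the Gaussian heat kernel, and compares the two sides numerically. The function names, the truncation interval, the grid size, and the sample values of __\( s,t,x,y \)__ are choices made here purely for illustration.

```python
import math


def heat_kernel(t, x, y):
    """Transition density of 1-D Brownian motion, i.e. L = (1/2) d^2/dx^2."""
    return math.exp(-(y - x) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)


def chapman_kolmogorov_rhs(s, t, x, y, lo=-15.0, hi=15.0, n=6000):
    """Trapezoidal approximation of the integral over xi of p(t, xi, y) p(s, x, xi).

    The interval [lo, hi] and the grid size n are illustrative assumptions;
    the Gaussian tails outside the interval are negligible for the values below.
    """
    h = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        xi = lo + k * h
        weight = 0.5 if k in (0, n) else 1.0  # trapezoid end-point weights
        total += weight * heat_kernel(t, xi, y) * heat_kernel(s, x, xi)
    return total * h


# Arbitrary sample point chosen for the check.
s, t, x, y = 0.3, 0.7, -0.4, 1.1
lhs = heat_kernel(s + t, x, y)
rhs = chapman_kolmogorov_rhs(s, t, x, y)
print("difference between the two sides:", abs(lhs - rhs))
```

The check only confirms the identity for one explicit kernel; it says nothing about the regularity and non-degeneracy conditions under which Kolmogorov established it in general.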

The trail which Kolmogorov blazed was widened and smoothed by many people,
including W. Feller
[e2],
J. Doob
[e3],
G. Hunt
[e4],
[e5],
R. Blumenthal and R. Getoor
[e10],
E. B. Dynkin
[e6],
and a host of
others. What emerged from this line of
research, especially after the work of Hunt, was an isomorphism between Markov processes and potential theory.
However, none of these really addressed the question that was bothering
Varadhan and me. Namely, from our point of view, the route mapped out by
Kolmogorov was too circuitous: one started with __\( L \)__, constructed from it
either a transition probability function or resolvent operator, and only
then characterized the resulting Markov processes in terms of these
ancillary objects. What we were looking for was a characterization of
__\( \mathbb{P}_\mu \)__ directly in terms of __\( L \)__, one from which it would be clear that, if (for each __\( n\ge1 \)__) __\( \mathbb{P}_n \)__ is related to __\( L_n \)__ with initial distribution __\( \mu
_n \)__, and if __\( \mu _n\to \mu \)__ weakly and __\( L_n\varphi
\to L\varphi \)__ for smooth __\( \varphi \)__ with compact support,
then __\( \mathbb{P}_n \)__ tends weakly to __\( \mathbb{P}_\mu \)__. Obviously, the best way to achieve
this goal is to phrase the characterization in terms of __\( \mathbb{P}_\mu \)__-integrals
involving __\( L\varphi \)__.

Formulating such a characterization was relatively easy. Namely, no matter
how one goes about its construction, __\( \mathbb{P}_\mu \)__ should have this
property:¹
__\begin{gather}
\tag{4}
\label{4}
\mathbb{E}^{\mathbb{P}_\mu } \bigl[\varphi \bigl(\omega (t_2)\bigr)\,\big|\,\mathcal{B}_{t_1}\bigr]
-\varphi \bigl(\omega (t_1)\bigr)=
\mathbb{E}^{\mathbb{P}_\mu }\Bigl[\int_{t_1}^{t_2}L\varphi \bigl(\omega (\tau )\bigr)\,d\tau
\,\Big|\,\mathcal{B}_{t_1}\Bigr]
\\
\quad
\text{for all }\varphi \in C_{\mathrm{c}}^\infty (\mathbb{R}^N;{\Bbb C}),
\nonumber
\end{gather}__
where __\( \mathcal{B}_t \)__ is the __\( \sigma \)__-algebra generated by __\( \{\omega (\tau
):\,\tau \in[0,t]\} \)__. More succinctly,
__\begin{gather}
\tag{5}
\label{5}
\Bigl(\varphi \bigl(\omega (t)\bigr)-
\int_{0}^{t}L\varphi \bigl(\omega (\tau )\bigr)\,d\tau , \mathcal{B}_t, \mathbb{P}_\mu \Bigr)
\\
\text{is a martingale for all }\varphi \in C_{\mathrm{c}}^\infty (\mathbb{R}^N;{\Bbb C}).
\nonumber
\end{gather}__
Thus, we said that __\( \mathbb{P}_\mu \)__
*solves the martingale problem for \( L \) with initial distribution \( \mu \)*
if __\( \mu \)__ is the __\( \mathbb{P}_\mu \)__-distribution of __\( \omega \rightsquigarrow \omega (0) \)__ and __\eqref{5}__ holds.
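
To make the martingale property concrete, here is a rough Monte Carlo sketch, not a proof, of its simplest consequence: the expectation of the compensated quantity __\( \varphi (\omega (t))-\varphi (\omega (0))-\int_0^tL\varphi (\omega (\tau ))\,d\tau \)__ should vanish. Everything specific in it is an assumption made for illustration: the choice __\( N=1 \)__, __\( a=1 \)__, __\( b(x)=-x \)__; the test function __\( \varphi (x)=e^{-x^2} \)__, which decays rapidly but is not compactly supported; the Euler discretization used to generate approximate paths; and the step size, path count, and seed.

```python
import math
import random


def b(x):
    # Hypothetical drift chosen for illustration (with a = 1).
    return -x


def phi(x):
    # Rapidly decaying stand-in for a compactly supported test function.
    return math.exp(-x * x)


def L_phi(x):
    # L phi = (1/2) phi'' + b phi' with phi' = -2x e^{-x^2} and
    # phi'' = (4x^2 - 2) e^{-x^2}, which simplifies to (4x^2 - 1) e^{-x^2}.
    return (4.0 * x * x - 1.0) * math.exp(-x * x)


def simulate(x0=1.5, T=1.0, steps=100, paths=5000, seed=0):
    """Return Monte Carlo means of phi(X_T) - phi(x0), without and with the
    compensator int_0^T L phi(X_s) ds, along Euler-discretized paths."""
    rng = random.Random(seed)
    dt = T / steps
    sqrt_dt = math.sqrt(dt)
    raw_sum = 0.0
    compensated_sum = 0.0
    for _ in range(paths):
        x = x0
        integral = 0.0
        for _ in range(steps):
            integral += L_phi(x) * dt  # left-endpoint Riemann sum
            x += b(x) * dt + sqrt_dt * rng.gauss(0.0, 1.0)  # Euler step
        increment = phi(x) - phi(x0)
        raw_sum += increment
        compensated_sum += increment - integral
    return raw_sum / paths, compensated_sum / paths


raw_mean, compensated_mean = simulate()
print("mean of phi(X_T) - phi(x0):           ", raw_mean)
print("mean after subtracting the compensator:", compensated_mean)
```

The first mean is visibly non-zero, while the second should be zero up to discretization and sampling error. This tests only the unconditional expectation at a single time, which is much weaker than the conditional statement __\eqref{4}__.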

Having formulated the martingale problem, our first task was
to prove that, in great generality, solutions exist. Of course, seeing as
all roads lead to solutions to the martingale problem (in situations to which they
applied), we could have borrowed results from earlier work (for example, work of either the
Kolmogorov or Itô schools). However, not only would that have made no
contribution to known existence results, it would have been very
inefficient. At least when __\( a \)__ and __\( b \)__ are bounded and continuous,
existence of solutions to the martingale problem is an easy application of
compactness, completely analogous to the proof of existence of solutions to
ordinary differential equations with bounded continuous coefficients. Our
second task was to show that, under appropriate conditions, solutions are
unique, and we knew that uniqueness posed a challenge of an entirely
different order from the one posed by existence. Indeed, if all roads lead
to solutions to the martingale problem, then uniqueness would prove that all
roads must have the same destination.

Our approach to uniqueness depended on the assumptions being made about the
coefficients __\( a \)__ and __\( b \)__. One approach turned on the observation that any solution
to the martingale problem is the distribution of a solution to an Itô stochastic
integral equation. When the coefficients are reasonably
smooth, this observation allowed us to prove uniqueness for the martingale
problem as a consequence of uniqueness for the associated stochastic integral
equation. However, when the coefficients are not smooth, Itô’s theory
cannot cope. Specifically, because his theory ignores ellipticity
properties, it is difficult to see how one would go about bolstering it with
powerful results from the vast and beautiful PDE literature.

The key to overcoming the problem just raised is to realize that
uniqueness for the martingale problem is a purely
distributional question, whereas the uniqueness found in Itô’s theory is
a path-by-path statement about a certain transformation (the “Itô map”)
of Brownian paths. Thus, to take advantage of the PDE existence theory,
we abandoned Itô and adopted an easy duality
argument which shows that uniqueness for the martingale problem requires only
a sufficiently good existence statement about the initial-value problem for the
backward equation __\( \partial _tu=Lu \)__. Namely, if __\( u \)__ is a classical
solution²
to the
backward equation with initial data __\( f \)__, then one can show that, for any
solution __\( \mathbb{P} \)__ to the martingale problem for __\( L \)__,
__\[ \Bigl(u\bigl((t_2-t)^+,\omega (t)\bigr), \mathcal{B}_t, \mathbb{P}\Bigr) \]__
is a martingale and therefore that, for all __\( 0\le t_1 < t_2 \)__,
__\[ \mathbb{E}^{\mathbb{P}}\bigl[f\bigl(\omega
(t_2)\bigr)\,\big|\,\mathcal{B}_{t_1}\bigr]=u\bigl(t_2-t_1,\omega (t_1)\bigr). \]__
Hence, if such classical solutions exist for enough of the __\( f \)__, then the
conditional distribution of __\( \omega \rightsquigarrow\omega (t_2) \)__ under __\( \mathbb{P} \)__ given
__\( \mathcal{B}_{t_1} \)__ is uniquely determined for all __\( 0\le t_1 < t_2 \)__.
From this,
together with the condition that the __\( \mathbb{P} \)__-distribution of __\( \omega
\rightsquigarrow\omega (0) \)__ is __\( \mu \)__, it follows that there is only one solution to
the martingale problem. This argument works without a hitch when, for example,
__\( a \)__ and __\( b \)__ are bounded and uniformly Hölder continuous and __\( a \)__ is
uniformly elliptic (that is, __\( a\ge \varepsilon I \)__ for some __\( \varepsilon > 0 \)__), since
in that case PDE theorists had long ago provided the necessary
existence statement about the backward equation.
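
The step behind the martingale claim in this duality argument can be sketched as follows, under the additional assumption (made here only for illustration) that __\( u \)__, __\( \partial _tu \)__, and the first and second spatial derivatives of __\( u \)__ are bounded and continuous, so that __\eqref{5}__ extends to time-dependent test functions. For __\( t\in[0,t_2] \)__, set __\( v(t,x)=u(t_2-t,x) \)__. Then
__\[
(\partial _t+L)v(t,x)=-(\partial _tu)(t_2-t,x)+(Lu)(t_2-t,x)=0,
\]__
so the compensating integral in the time-dependent form of __\eqref{5}__ vanishes and __\( v\bigl(t,\omega (t)\bigr) \)__ is itself a martingale on __\( [0,t_2] \)__. Since __\( v(t_2,x)=u(0,x)=f(x) \)__, it follows that
__\[
\mathbb{E}^{\mathbb{P}}\bigl[f\bigl(\omega (t_2)\bigr)\,\big|\,\mathcal{B}_{t_1}\bigr]
=\mathbb{E}^{\mathbb{P}}\bigl[v\bigl(t_2,\omega (t_2)\bigr)\,\big|\,\mathcal{B}_{t_1}\bigr]
=v\bigl(t_1,\omega (t_1)\bigr)=u\bigl(t_2-t_1,\omega (t_1)\bigr).
\]__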

When uniform Hölder continuity is replaced by simple uniform
continuity, the existence theory for the backward equation is less
satisfactory. To be precise, although solutions continue to exist, they are no
longer classical. Instead, their second derivatives exist only in
__\( L^p(\mathbb{R}^N;{\Bbb C}) \)__ for __\( p\in(1,\infty ) \)__. As a consequence, the preceding
duality argument breaks down unless one can show a priori that every
solution to the martingale problem is sufficiently regular to tolerate
limit procedures involving __\( L^p(\mathbb{R}^N) \)__-convergence of the integrands.
Without question, the proof of such an a priori regularity result was the
single most intricate part of our program. In broad outline, our proof
went as follows: First, we developed machinery which allowed us to reduce
everything to the case when __\( b=0 \)__ and __\( a \)__ is an arbitrarily small
perturbation of the identity. Second, we again used the connection with the
Itô stochastic integral equations to show that any solution __\( \mathbb{P} \)__ can be
approximated by __\( \mathbb{P}_n \)__’s which, conditioned on __\( \mathcal{B}_{m/n} \)__, are
Gaussian during the time interval __\( [m/n,\,(m+1)/n] \)__, and
therefore have the required regularity properties. Finally, as an
application of the Calderón–Zygmund theory of singular integral operators,
for each __\( p\in(1,\infty ) \)__ we were able to show that, when the perturbation
is sufficiently small, the __\( L^p(\mathbb{R}^N) \)__-regularity properties of the approximating
__\( \mathbb{P}_n \)__’s can be controlled independent of __\( n \)__ and are therefore enjoyed by
__\( \mathbb{P} \)__ itself.

The steps outlined above led us to a proof of the following theorem, which should be considered the centerpiece of [3], [2].

**Theorem.**
*Let \( L \) be given by \eqref{1}, where \( a \) and \( b \) are
bounded, \( a \) is continuous, \( b \) is Borel measurable, and \( a(x) \) is
symmetric and strictly positive definite for each \( x\in\mathbb{R}^N \). Then, for
each Borel probability measure \( \mu \) on \( \mathbb{R}^N \), there is precisely one solution
to the martingale problem for \( L \) with initial distribution \( \mu \). Moreover, if
\( \delta _x \) is the unit mass at \( x \) and \( \mathbb{P}_x=\mathbb{P}_{\delta _x} \), then
\( x\rightsquigarrow \mathbb{P}_x \) is weakly continuous and \( \{\mathbb{P}_x:\,x\in\mathbb{R}^N\} \) is a strong
Markov family.*

#### Concluding remarks

There are several additional comments which may be helpful.

**1.**
Although, with 20/20 hindsight, the characterization
given by __\eqref{4}__ is the most obvious one, it is not the one which we
chose initially. Instead,
because we were influenced at the time both by Henry McKean’s treatment
[e12]
of Itô’s theory of stochastic integral equations and by
a possible analogy with characteristic functions, we chose to characterize
__\( \mathbb{P}_\mu \)__ by saying that
__\begin{multline*}\tag{6}
\label{6}
\Bigl(\exp\Bigl[\Bigl(\xi ,\omega (t)-\int_0^tb\bigl(\omega (\tau )\bigr)\,d\tau
\Bigr)_{\mathbb{R}^N}-\tfrac12\int_0^t\bigl(\xi
,a\bigl(\omega (\tau )\bigr)\xi \bigr)_{\mathbb{R}^N}\,d\tau\Bigr], \mathcal{B}_t, \mathbb{P}_\mu \Bigr)\\
\text{is a martingale for all }\xi \in{\Bbb C}^N.
\end{multline*}__
The equivalence of __\eqref{4}__ and
__\eqref{6}__
is a quite easy application of elementary Fourier analysis and
the observation that, aside from integrability issues,
__\[
\Bigl(M(t)\,V(t)-\int_0^tM(\tau )\,dV(\tau ), \mathcal{B}_t, \mathbb{P}\Bigr)
\]__
is a martingale when __\( \bigl(M(t),\,\mathcal{B}_t,\,\mathbb{P}\bigr) \)__ is a continuous martingale
and __\( V(t) \)__ is a continuous, __\( \{\mathcal{B}_t:\,t\ge0\} \)__-adapted process of locally
bounded variation.
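
The computation behind this equivalence can be sketched formally as follows. The sketch ignores the fact that exponentials are not compactly supported, which is where the Fourier analysis and the integrability issues enter, and the symbols __\( \varphi _\xi \)__, __\( c_\xi \)__, __\( V_\xi \)__, and __\( A \)__ are introduced here only for illustration. For __\( \xi \in{\Bbb C}^N \)__, set __\( \varphi _\xi (x)=e^{(\xi ,x)_{\mathbb{R}^N}} \)__. Then
__\[
L\varphi _\xi (x)=c_\xi (x)\,\varphi _\xi (x),\quad\text{where }
c_\xi (x)=\bigl(\xi ,b(x)\bigr)_{\mathbb{R}^N}+\tfrac12\bigl(\xi ,a(x)\xi \bigr)_{\mathbb{R}^N},
\]__
and, with __\( V_\xi (t)=\exp\bigl[-\int_0^tc_\xi \bigl(\omega (\tau )\bigr)\,d\tau \bigr] \)__, the process in __\eqref{6}__ is __\( \varphi _\xi \bigl(\omega (t)\bigr)\,V_\xi (t) \)__. Write __\( \varphi _\xi \bigl(\omega (t)\bigr)=M(t)+A(t) \)__, where __\( M \)__ is the martingale in __\eqref{5}__ and __\( A(t)=\int_0^tc_\xi \varphi _\xi \bigl(\omega (\tau )\bigr)\,d\tau \)__. Since __\( dV_\xi =-c_\xi V_\xi \,d\tau \)__ and __\( dA=c_\xi \varphi _\xi \,d\tau \)__,
__\[
\int_0^tM\,dV_\xi +A(t)\,V_\xi (t)=\int_0^t(M+A)\,dV_\xi +\int_0^tV_\xi \,dA
=\int_0^t\varphi _\xi \bigl(\omega (\tau )\bigr)\,V_\xi (\tau )\,\bigl(-c_\xi +c_\xi \bigr)\bigl(\omega (\tau )\bigr)\,d\tau =0,
\]__
and therefore __\( \varphi _\xi \bigl(\omega (t)\bigr)\,V_\xi (t)=M(t)\,V_\xi (t)-\int_0^tM\,dV_\xi \)__, which is a martingale by the observation above.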

**2.**
The preceding theorem is not the one on which we
had originally hoped to settle. When we started, we had
hoped for a statement in which __\( a \)__ needed only be Borel measurable and
locally uniformly elliptic. Using special, heavily dimension-dependent
arguments, we did prove such a result when __\( N\in\{1,2\} \)__, and we
somewhat naively assumed that it was only the limits on our own
technical powers which prevented us from doing so in higher dimensions.
However, a few years ago Nikolai Nadirashvili
[e11]
gave a
beautiful example that showed that the limits on our powers were not
responsible for our inability to go to higher dimensions. Namely, he
produced a uniformly elliptic, bounded, Borel measurable __\( a \)__ on __\( \mathbb{R}^3 \)__
for which the associated martingale problem admits more than one solution.

**3.**
In retrospect, there are many aspects of this
work which look rather crude and unnecessarily cumbersome.
For instance, around the same time, first P. A. Meyer
[e7]
and
then H. Kunita and S. Watanabe
[e9]
gave a much better and cleaner
treatment of the stochastic integral calculus which we developed. More
significant, had we been aware of the spectacular work being done by
N. Krylov, in particular the estimate in
[e8],
we could have
vastly simplified the regularity proof just described.

**4.**
In a later work, we
extended the martingale problem approach to cover diffusions with boundary
conditions. The resulting paper
[4]
has, for good reasons, been
much less influential: it is nearly unreadable. On the other hand, it
contains information which, so far as I know, is available nowhere else.
In particular, I do not think the theorems there dealing with approximations
by discrete Markov chains have ever been improved.

**5.**
My joint work with Varadhan on diffusion theory
culminated in our papers
[8]
and
[9]
on degenerate
diffusions. The first of these contains our initial version of “the
support theorem,” which has been given several other proofs over the years. Among
other things, the second paper contains an extension of “the support theorem”
to cover situations in which the diffusion matrix __\( a \)__ is smooth but does
not admit a smooth square root. It seems that most probabilists are not
particularly impressed by this extension, but we were very proud of it at
the time.