#### by Elias M. Stein

I’ve decided to write this essay about “square functions” for two reasons. First, their development has been so intertwined with the scientific work of A. Zygmund that it seems highly appropriate to do so now on the occasion of his 80th birthday. Also these functions are of fundamental importance in analysis, standing as they do at the crossing of three important roads many of us have travelled by: complex function theory, the Fourier transform (or orthogonality in its various guises), and real-variable methods. In fact, the more recent applications of these ideas, described at the end of this essay, can be seen as confirmation of the significance Zygmund always attached to square functions.

This is going to be a partly historical survey, and so I hope you will allow me to take the usual liberties associated with this kind of enterprise: I will break up the exposition into certain “historical periods”, five to be precise; and by doing this I will be able to suggest my own views as to what might have been the key influences and ideas that brought about these developments.

One word of explanation about “square functions” is called for. A deep concept in mathematics is usually not an idea in its pure form, but rather takes various shapes depending on the uses it is put to. The same is true of square functions. These appear in a variety of forms, and while in spirit they are all the same, in actual practice they can be quite different. Thus the metamorphosis of square functions is all important.

#### First period (1922–1926): The primordial square functions

It appears that square functions arose first in an explicit form in a beautiful
theorem of
Kaczmarz
and Zygmund dealing with the almost everywhere summability
of orthogonal expansions. The theorem was proved in 1926
as the culmination of several papers each had written at about that time. The
theorem itself was an outgrowth of what certainly was one of the main
preoccupations of analysts at that time, namely the question of convergence
of Fourier series. The problem was the following. Suppose __\( f=f(\theta) \)__
is a continuous function on the circle, __\( 0\leq\theta\leq 2\pi \)__, or more
generally assume that __\( f \)__ is in __\( L^{2}(0,2\pi) \)__ or even that __\( f \)__
is merely integrable; then does its Fourier series
__\begin{equation}
\sum a_{n}e^{in \theta} \quad\text{with}\quad
a_{n}=\frac{1}{2\pi}\int_{0}^{2\pi}f(\theta)e^{-in \theta}\,d\theta,
\label{eqnon}
\end{equation}__
converge almost everywhere?

A related parallel issue was the corresponding question for a general
orthonormal expansion, but now limited to __\( f\in L^{2} \)__. Thus if
__\( \{\phi_{n}\} \)__ is an orthonormal system, and if
__\[ f\sim\sum a_{n}\phi_{n}
\quad\text{with}\quad
a_{n}=\int f\overline{\phi}_{n} ,\]__
where __\( \sum |a_{n}|^{2} < \infty \)__,
then what could be said about the convergence almost everywhere of
__\begin{equation}
\sum_{n=1}^{\infty}a_{n}\phi_{n}(x)?
\label{eqntw}
\end{equation}__

The period we are dealing with (1922–1926) was marked by several striking
achievements in this area, whose essential interest is not diminished
even when viewed from the distant perspective of more than a half
century. The first result to mention was the construction by
Kolmogorov
in
1923
[e3]
of an __\( L^{1} \)__ function whose Fourier series
__\eqref{eqnon}__ diverged almost everywhere.1
This construction made even more pressing the question of whether the
Fourier series __\eqref{eqnon}__ converges almost everywhere when (say) __\( f \)__
belongs to __\( L^{2} \)__, a problem that was not solved till more than forty
years later. We shall turn to that in a moment, but now we point out that
Kolmogorov’s example put into sharper relief the __\( L^{2} \)__ results for general
orthonormal developments that had been obtained
(in 1922 and 1923)
by
Rademacher
and
Menshov.
They showed that if
__\begin{equation}
\sum|a_{n}|^{2}(\log n)^{2} < \infty
\label{eqnth}
\end{equation}__
then the series __\eqref{eqntw}__ converges a.e.

Moreover the condition __\eqref{eqnth}__ is best possible in the sense that if
__\( \{\lambda_{n}\} \)__ is monotonic and
__\[ \smash{\lambda_{n}}/\log n\rightarrow 0 ,\]__
then
there exist an orthonormal system __\( \{\phi_{n}\} \)__ and an expansion __\eqref{eqntw}__
which diverges a.e., while
__\[ \sum \smash{|a_{n}|^{2}\lambda_{n}^{2}} < \infty .\]__

For ordinary Fourier series it was proved2
that the condition __\eqref{eqnth}__ could be relaxed and be replaced by
__\begin{equation}
\sum_{-\infty}^{\infty}|a_{n}|^{2}\log(|n|+2) < \infty.
\label{eqnfo}
\end{equation}__

This last result stood unsurpassed for forty years until
Carleson
in
1966 showed that indeed the Fourier series of an __\( L^{2} \)__
function converged almost everywhere. It may be interesting to note here
that the basic tools required for Carleson’s theorem — the properties
of the Hilbert transform and their relation with partial sums of Fourier
series — were first brought to light in this early period: Kolmogorov’s
proof of the weak-type (1, 1) property in 1925;
M. Riesz’s
paper of 1927
[e11]
containing the __\( L^{p} \)__
inequalities for conjugate functions and partial sums; and
Besicovitch’s
work
(in 1923
[e2]
and 1926
[e6])
which
began the development of “real-variable” methods for Hilbert transforms.

Against this background we can now state the idea of Kaczmarz and Zygmund. It
asserts as a general principle that for an __\( L^{2} \)__ orthonormal expansion
(i.e., one where __\( \sum |a_{n}|^{2} < \infty \)__), at almost all points the
summability of the series __\( \sum a_{n}\phi_{n}(x) \)__ by one method has
as a consequence its summability by
any other method which is essentially stronger than convergence. A special
(but typical) case is as follows:

**Theorem 1.** Suppose __\( \sum|a_{n}|^{2} < \infty \)__. Then __\( \sum a_{n}\phi_{n}(x) \)__ is
Cesàro summable at almost each point __\( x \)__ where it is Abel summable.

Recall that the series is Abel summable at __\( x \)__ if
__\[ \lim_{r\rightarrow 1^{-}}\sum a_{n}r^{n}\phi_{n}(x)
\quad\text{exists}. \]__
In
addition, setting
__\begin{align*}
s_{n} &=\sum_{k=0}^{n}a_{k}\phi_{k} \quad\text{and}\\
\sigma_{n} & =(s_{0}+s_{1}+\cdots+s_{n-1})/n ,
\end{align*}__
the
Cesàro summability at __\( x \)__ means the existence of the limit
__\[ \lim_{n\rightarrow\infty}\sigma_{n}(x) .\]__

If a series is Cesàro summable it is automatically Abel summable (an
exercise!), but the converse is in general not true. To gain a better idea
of the scope of Theorem 1 let us point out that
__\[
\sigma_{n}(x) =\sum_{k=0}^{n}\Bigl(1-\frac kn\Bigr)a_{k}\phi_{k}(x)
\]__
and a result similar to Theorem 1 holds when
__\( \sigma_{n}(x) \)__ is replaced by
__\[ \sigma_{n}^{\epsilon}(x)
= \sum_{k=0}^{n}\Bigl(1-\frac kn\Bigr)^{\epsilon}a_{k}\phi_{k}(x)
\quad\text{with }\epsilon > 0 \]__
(which corresponds
essentially to __\( (C, \epsilon) \)__ summability), but not for __\( \epsilon=0 \)__
which of course would give the usual convergence.
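
The gap between the two summability methods can be seen on two classical numerical series. The following is only an illustrative sketch, not part of the Kaczmarz–Zygmund argument: the helper names `cesaro_means` and `abel_mean` are hypothetical, the series are truncated at an arbitrary length `N`, and the Abel mean is evaluated at the single radius `r = 0.99`.

```python
def cesaro_means(terms):
    """Return [sigma_1, ..., sigma_N], where sigma_n = (s_0 + ... + s_{n-1}) / n."""
    means = []
    partial = 0.0   # current partial sum s_{n-1}
    running = 0.0   # s_0 + s_1 + ... + s_{n-1}
    for n, a in enumerate(terms, start=1):
        partial += a
        running += partial
        means.append(running / n)
    return means


def abel_mean(terms, r):
    """Truncated Abel mean sum_n a_n r^n for 0 <= r < 1."""
    return sum(a * r ** n for n, a in enumerate(terms))


N = 2000  # truncation length (an arbitrary choice for this sketch)
grandi = [(-1.0) ** n for n in range(N)]               # 1 - 1 + 1 - ...
weighted = [(-1.0) ** n * (n + 1) for n in range(N)]   # 1 - 2 + 3 - ...

grandi_cesaro = cesaro_means(grandi)
weighted_cesaro = cesaro_means(weighted)
weighted_abel = abel_mean(weighted, 0.99)

print(grandi_cesaro[-1])                         # Cesaro means of 1 - 1 + 1 - ...
print(weighted_cesaro[-2], weighted_cesaro[-1])  # consecutive means of 1 - 2 + 3 - ...
print(weighted_abel, 1 / (1 + 0.99) ** 2)        # Abel mean against 1/(1+r)^2
```

For the series 1 − 2 + 3 − ⋯ the Abel means equal 1/(1+r)² and so tend to 1/4, while consecutive Cesàro means stay near 1/2 and 0 respectively; this is one concrete instance of a series that is Abel summable but not Cesàro summable.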

For the proof of Theorem 1 Kaczmarz and Zygmund used a square
function which they introduced for this purpose, namely
__\begin{equation}
K(f)=\biggl(\sum_{n=2}^{\infty}n|\sigma_{n}-\sigma_{n-1}|^{2}\biggr)^{\mkern-2mu{1/2}}
\label{eqnfi}
\end{equation}__
with __\( f\sim\sum a_{n}\phi_{n} \)__. The basic fact was the following __\( L^{2} \)__ inequality.

**Lemma.**
__\[
\|K(f)\|_{L^{2}} \leq C \|f\|_{L^{2}}.
\]__

Clearly
__\[
\sigma_{n}-\sigma_{n-1}=\frac{1}{n(n-1)}\sum_{k=0}^{n-1}ka_{k}\phi_{k},\]__
so
__\[
\|\sigma_{n}-\sigma_{n-1}\|_{2}^{2}\leq\frac{c}{n^{4}}\sum_{k < n}k^{2}|a_{k}|^{2},
\quad n\geq 2,
\]__
and thus
__\[ \sum_{2}^{\infty}n\|\sigma_{n}-\sigma_{n-1}\|_{2}^{2}\leq
c^{\prime}\sum|a_{k}|^{2}=c^{\prime}\|f\|_{2}^{2} ,\]__
which proves the lemma.
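
The summation in this proof can be checked numerically. The sketch below assumes an orthonormal system, so that by Parseval the norm of each difference of Cesàro means is determined by the coefficients alone; the function name `k_function_norm_sq`, the random coefficients and the truncation at 5000 terms are all illustrative choices, not taken from the original argument.

```python
import math
import random


def k_function_norm_sq(coeffs):
    """sum_{n>=2} n * ||sigma_n - sigma_{n-1}||_2^2 for coefficients a_0..a_{N-1}.

    Uses ||sigma_n - sigma_{n-1}||_2^2 = (n(n-1))^{-2} * sum_{k<n} k^2 |a_k|^2,
    which holds for an orthonormal system by Parseval.
    """
    total = 0.0
    weighted = 0.0  # running value of sum_{k<n} k^2 |a_k|^2
    for n in range(2, len(coeffs) + 1):
        k = n - 1
        weighted += k * k * abs(coeffs[k]) ** 2
        total += n * weighted / (n * (n - 1)) ** 2
    return total


random.seed(0)
coeffs = [random.gauss(0.0, 1.0) / (k + 1) for k in range(5000)]
f_norm_sq = sum(abs(a) ** 2 for a in coeffs)
ratio = k_function_norm_sq(coeffs) / f_norm_sq

# A single nonzero coefficient a_1 = 1 isolates the weight attached to k = 1,
# namely sum_{n>=2} 1 / (n (n-1)^2) = pi^2/6 - 1.
single = [0.0, 1.0] + [0.0] * 4998
single_ratio = k_function_norm_sq(single)

print(ratio, single_ratio, math.pi ** 2 / 6 - 1)
```

Interchanging the order of summation shows that the coefficient of __\( |a_{k}|^{2} \)__ is __\( k^{2}\sum_{n > k}1/(n(n-1)^{2}) \)__, which is at most 1 because __\( \sum_{m\geq k}1/(m(m+1))=1/k \)__; so in this normalization the constant __\( c^{\prime}=1 \)__ suffices.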

To prove
the theorem one invokes a variant of the classical Tauberian argument,
namely, if __\( \smash{\sum A_{n}} \)__ is Abel summable and __\( \sum nA_{n}^{2} < \infty \)__,
then __\( \sum A_{n} \)__ converges. Now set __\( A_{n}=\sigma_{n}-\sigma_{n-1} \)__;
then the Abel summability of __\( \sum A_{n} \)__ follows from the corresponding
Abel summability of __\( \sum a_{n}\phi_{n} \)__. The Tauberian condition holds
at almost all points because of the lemma, and hence one obtains a.e. the
convergence of __\( \sum(\sigma_{n}-\sigma_{n-1}) \)__, proving the theorem.

We have seen the first example of a square function, namely
__\eqref{eqnfi}__. While here it plays a minor role, its basic character is
already revealed: Because of the agility of its quadratic nature it can
exploit easily any situation in which orthogonality might be important.

#### Second period (1931–1938): Littlewood and Paley

Our scene shifts now from the Continent to England, and to the work of Littlewood and Paley. Our attention will be focused on two important series of connected papers: three jointly by Littlewood and Paley 1931–1938 [e14], [e17], [e19], and two by Paley 1932 [e15], [e16]. The investigations described in these papers were initiated simultaneously (the first paper in each series was submitted in April 1931), but because of Paley’s death in 1933 the final versions of several of the papers were probably Littlewood’s work alone. It is also interesting to note that no reference is made in these papers to the results described above, and so it is a reasonable guess that they were not aware of the possible relevance of the ideas of Kaczmarz and Zygmund.

The main theme of the Littlewood–Paley work was to consider the “dyadic
decomposition” of Fourier series, namely
__\[
f(\theta)=\sum_{k=0}^{\infty}\Delta_{k}(\theta),
\]__
with
__\begin{align*}
& \Delta_{k}(\theta) =\sum_{2^{k-1}\leq|n| < 2^{k}}a_{n}e^{in \theta}, \quad
k\geq1;\\
& \Delta_{0} =a_{0}.
\end{align*}__

Their basic result was that the __\( L^{p} \)__ norm of a function was equivalent
with the __\( L^{p} \)__ norm of the square function associated with its dyadic
decomposition.

**Theorem 2.** For __\( 1 < p < \infty \)__,
__\[
\biggl\Vert\Bigl(\sum_{k=0}^{\infty}|\Delta_{k}(\theta)|^{2}\Bigr)^{1/2}\biggr\Vert_{p}\simeq\|f\|_{p}.
\]__
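
For a trigonometric polynomial the dyadic blocks and their square function can be computed directly. The following sketch is only an illustration under stated assumptions: the coefficients __\( a_{n}=1/(1+n^{2}) \)__ for __\( |n|\leq 16 \)__ and the 64-point grid are arbitrary choices, and the names `dyadic_index`, `dyadic_blocks` and `square_function` are hypothetical. It checks the easy case __\( p=2 \)__, where the equivalence is an identity by orthogonality; nothing here bears on the hard range __\( p\neq 2 \)__.

```python
import cmath
import math


def dyadic_index(n):
    """Return k with 2^(k-1) <= |n| < 2^k for n != 0, and k = 0 for n = 0."""
    return abs(n).bit_length()


def dyadic_blocks(coeffs, theta):
    """Return {k: Delta_k(theta)} for Fourier coefficients given as {n: a_n}."""
    blocks = {}
    for n, a in coeffs.items():
        k = dyadic_index(n)
        blocks[k] = blocks.get(k, 0j) + a * cmath.exp(1j * n * theta)
    return blocks


def square_function(coeffs, theta):
    """(sum_k |Delta_k(theta)|^2)^(1/2)."""
    return math.sqrt(sum(abs(d) ** 2 for d in dyadic_blocks(coeffs, theta).values()))


coeffs = {n: 1.0 / (1 + n * n) for n in range(-16, 17)}   # illustrative coefficients
grid = [2 * math.pi * j / 64 for j in range(64)]          # equispaced sample points

# The blocks add up to f at any point.
theta0 = 0.7
f_theta0 = sum(a * cmath.exp(1j * n * theta0) for n, a in coeffs.items())
blocks_sum = sum(dyadic_blocks(coeffs, theta0).values())

# For p = 2: the mean of the squared square function equals sum |a_n|^2.
mean_square = sum(square_function(coeffs, t) ** 2 for t in grid) / len(grid)
parseval = sum(abs(a) ** 2 for a in coeffs.values())

print(abs(f_theta0 - blocks_sum), mean_square, parseval)
```

The grid has more points than twice the largest frequency, so the discrete averages reproduce the integrals over the circle exactly, up to rounding.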

To prove this theorem they needed and thus formulated an “abelian” analogue,
where partial sums are replaced by Abel means, i.e., by the Poisson integral
__\( u(r, \theta) \)__ of __\( f \)__. Thus given __\( f \)__, let __\( \Phi \)__ be the holomorphic function
in the unit disc with __\( \operatorname{Re}(\Phi)=u \)__ and __\( \operatorname{Im}(\Phi(0))=0 \)__. They
defined another square function, the “__\( g \)__-function” of __\( f \)__, by
__\[
g(f)(\theta)=\biggl(\int_{0}^{1}(1-r)\bigl|\Phi^{\prime}(re^{i\theta})\bigr|^{2}\,dr\biggr)^{1/2}
\]__
and proved the following

**Theorem 3.** With __\( 1 < p < \infty \)__
__\begin{equation}
\|g(f)\|_{p}\simeq\|f\|_{p} \quad\textit{if } a_{0}=0.
\label{eqnsi}
\end{equation}__

Paley sought a better understanding of the nature of these problems by
considering variants of Theorem 2 where the Fourier series
expansion is replaced by the Walsh–Paley expansion. The Walsh–Paley
functions (called Walsh–Kaczmarz functions at that time) are now usually
described as follows. We identify the interval __\( [0,1] \)__ with the compact
group consisting of an infinite product of copies of the two-element group
(via the usual binary expansion). The characters of that group are the
Walsh–Paley functions. Writing each integer as a sum of powers of 2 gives
a natural enumeration of the characters
__\( \{\phi_{n}\}_{n=0}^{\infty} \)__. If we set
__\begin{align*}
& f\sim\sum a_{n}\phi_{n} \quad\text{and}\\
& \Delta_{k}=s_{2^{k}}-s_{2^{k-1}}=\sum_{2^{k-1} < n\leq 2^{k}}a_{n}\phi_{n}
\quad\text{with}\\
& \Delta_{0}=a_{0},
\end{align*}__
then Paley’s theorem reads as

**Theorem 4.** For the Walsh–Paley series, with __\( 1 < p < \infty \)__
__\[
\biggl\Vert\Bigl(\sum|\Delta_{k}|^{2}\Bigr)^{1/2}\biggr\Vert_{p}\simeq\|f\|_{p}.
\]__

What makes the proof of Theorem 4 easier than that of
Theorem 2 are the various simplifications inherent in the fact that
__\( \{s_{2^{k}}(f)\} \)__ is a martingale sequence. The name “martingale” had
not yet been coined. Moreover, a systematic extension of Theorem 4
from the point of view of martingales, and its further exploration in
the magical world of Brownian motion — all these came much later,
as we shall see. However in Paley’s time some of the arguments typical
of martingale theory were already understood. Thus it had been observed
that __\( s_{2^{k}}(f) \)__ was constant on each of the __\( 2^{k} \)__ intervals (of length
__\( 2^{-k} \)__) of the form
__\[ \bigl((l-1)/2^{k}, \ l/2^{k}\bigr) ,
\qquad l=1,\ldots,2^{k},
\]__
and that the value of __\( s_{2^{k}}(f) \)__ on each of these intervals was the
mean-value of __\( f \)__ there. From this it is obvious that when __\( f\in L^{p} \)__, __\( 1\leq
p\leq\infty \)__, then __\( \{s_{2^{k}}(f)\} \)__ are bounded in __\( L^{p} \)__ norm; the
analogue for Fourier series is definitely nonobvious when __\( 1 < p < \infty \)__,
and in fact false when __\( p=1 \)__ or __\( p=\infty \)__.
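
The averaging property just described is easy to test on a step function. The sketch below rests on assumptions that should be stated plainly: the function is sampled on __\( 2^{N} \)__ equal cells of __\( [0,1] \)__, the partial sum of order __\( 2^{k} \)__ is taken here to mean the sum over the indices __\( 0\leq n < 2^{k} \)__ (the indexing of partial sums in the text may differ by one term), and the names `walsh_paley`, `walsh_partial_sum` and `dyadic_averages` are hypothetical.

```python
import math


def walsh_paley(n, i, N):
    """Value (+1 or -1) of the n-th Walsh-Paley function on cell i of 2^N cells.

    Writing n = sum_j n_j 2^j, the function is the product over the bits n_j = 1
    of (-1)^(binary digit number j+1 of x), and that digit is bit N-1-j of i.
    Requires 0 <= n < 2^N.
    """
    sign = 1
    j = 0
    while n:
        if (n & 1) and ((i >> (N - 1 - j)) & 1):
            sign = -sign
        n >>= 1
        j += 1
    return sign


def walsh_partial_sum(values, k):
    """Sum of a_n * phi_n over 0 <= n < 2^k, evaluated on every cell."""
    cells = len(values)
    N = cells.bit_length() - 1
    result = [0.0] * cells
    for n in range(2 ** k):
        a_n = sum(values[i] * walsh_paley(n, i, N) for i in range(cells)) / cells
        for i in range(cells):
            result[i] += a_n * walsh_paley(n, i, N)
    return result


def dyadic_averages(values, k):
    """Mean value of the step function over each dyadic interval of length 2^-k."""
    cells = len(values)
    N = cells.bit_length() - 1
    block = 2 ** (N - k)
    out = []
    for i in range(cells):
        start = (i // block) * block
        out.append(sum(values[start:start + block]) / block)
    return out


N = 5
values = [math.sin(0.37 * i) + 0.01 * i * i for i in range(2 ** N)]  # arbitrary data
max_gap = max(
    abs(s - m)
    for k in range(N + 1)
    for s, m in zip(walsh_partial_sum(values, k), dyadic_averages(values, k))
)
print(max_gap)
```

Since each partial sum is an average over dyadic intervals, its __\( L^{p} \)__ norm cannot exceed that of the function, which is the boundedness observed above.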

We shall now describe the main device Paley used in his proof of
Theorem 4. Paley was, from what one can learn about his life,
a man of courage and almost reckless daring. A hint of that spirit can be
found in his approach to difficult mathematical problems. When faced by the
proof of an inequality like
__\begin{equation}
\int\Bigl(\sum|\Delta_{k}|^{2}\Bigr)^{p/2}\,dx\leq A_{p}^{p}\int|f|^{p}\,dx
\label{eqnse}
\end{equation}__
where __\( p \)__ is e.g. an even integer __\( 2r \)__, he instinctively sought to face
the problem head-on by multiplying out the __\( r \)__ infinite sums, and then
coming to grips directly with the resulting multitude of terms. This kind of
audacious attack is not so common in our time when it is easier to rely on
a variety of sophisticated gadgets which are household items for the working
analyst. But given Paley’s resourcefulness this approach worked marvelously
well. His key observation was that
__\begin{equation}
\sum_{i_{r}}\int\Delta_{i_{1}}^{2}\Delta_{i_{2}}^{2}\cdots\Delta_{i_{r}}^{2}\,dx
\leq
\int\Delta_{i_{1}}^{2}\cdots\Delta_{i_{r-1}}^{2}f^{2}\,dx
\label{eqnei}
\end{equation}__
where the summation is taken over those __\( i_{r} \)__ for which
__\( i_{r} > \max(i_{1},\ldots,i_{r-1}) \)__, which in turn follows from the martingale
property that
__\begin{equation}
\int g(x)\Delta_{k}(x)\,dx=0
\label{eqnni}
\end{equation}__
whenever __\( g \)__ is “measurable with respect to the past”. From __\eqref{eqnei}__
Paley was able to achieve the proof of __\eqref{eqnse}__ in a few strokes.

The same idea inspired Littlewood and Paley’s proof of Theorem 3,
although the execution is more complicated; a more recondite form
of __\eqref{eqnei}__ must be proved, and here nothing as simple as
__\eqref{eqnni}__ holds. The appropriate substitute must be fashioned
with care out of Green’s theorem in conjunction with the identity
__\[ \Delta(|\Phi|^{2})=4|\Phi^{\prime}|^{2} .\]__
With Theorem 3 proved,
Littlewood and Paley
were able to deduce Theorem 2, but here also the steps required
were not easy. It was only after their theory was reexamined by Zygmund
and his student Marcinkiewicz, that a clearer and broader view of the whole
subject began to emerge. To this we shall now turn.
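
The identity __\( \Delta(|\Phi|^{2})=4|\Phi^{\prime}|^{2} \)__ used in the Littlewood–Paley argument above can be checked numerically for a particular holomorphic function. The sketch below is only a sanity check under assumptions of my own choosing: the test function __\( \Phi(z)=z^{3}+2z \)__, the sample points and the step `h` of the five-point finite-difference Laplacian are all illustrative.

```python
def phi(z):
    """Illustrative holomorphic test function (an arbitrary choice)."""
    return z ** 3 + 2 * z


def dphi(z):
    """Its complex derivative."""
    return 3 * z ** 2 + 2


def modulus_sq(x, y):
    """|Phi(x + iy)|^2 as a function of two real variables."""
    return abs(phi(complex(x, y))) ** 2


def laplacian(F, x, y, h=1e-3):
    """Five-point finite-difference approximation of F_xx + F_yy."""
    return (F(x + h, y) + F(x - h, y) + F(x, y + h) + F(x, y - h) - 4 * F(x, y)) / h ** 2


points = [(0.0, 0.0), (0.3, 0.4), (-0.5, 0.2)]
gaps = [
    abs(laplacian(modulus_sq, x, y) - 4 * abs(dphi(complex(x, y))) ** 2)
    for x, y in points
]
print(gaps)
```

The discrepancies are of the size of the finite-difference error, as expected for an exact identity.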

#### Third period (1938–1945): Marcinkiewicz and Zygmund

There are two significant events that marked the period we are now concerned with. The first, which even predated the Littlewood–Paley collaboration, was the introduction by Lusin in 1930 [e13] of his “area integral”. The idea of Lusin seems to have sparked no further interest until Marcinkiewicz and Zygmund took up the subject again about 8 years later. There began a brief but very creative period of work by them — a flowering of the theory where connections with a variety of other ideas were brought to light. The second event, a tragic one, followed soon thereafter with the death of Marcinkiewicz in 1940, and it was left to Zygmund alone to resolve some of the issues that their work had led them to.

It may help to clarify the description of the principal ideas that Marcinkiewicz and Zygmund contributed to the study of square functions if we organize our presentation in terms of the four main lines along which their work proceeded.

The first subject we shall treat (and the only one that was, strictly speaking,
joint work) deals with the area integral of Lusin. The definition of this is
as follows. Suppose __\( \Phi(z) \)__ is holomorphic in the unit disc and define
__\( A(\Phi)(\theta) \)__ by
__\begin{equation}
(A(\Phi)(\theta))^{2}=\int_{\Gamma(\theta)}|\Phi^{\prime}(z)|^{2}\,dx\,dy
\label{eqnonze}
\end{equation}__
with __\( \Gamma(\theta) \)__ a standard “triangle” (nontangential approach
region) in the unit disc with vertex at __\( e^{i\theta} \)__. Observe that the
expression represents the area of the image of __\( \Gamma(\theta) \)__ under the
mapping __\( z\rightarrow\Phi(z) \)__, with points counted according to their
multiplicity. Lusin’s discovery was that if __\( \Phi \)__ is bounded, then
__\( A(\Phi)(\theta) \)__ is finite for almost any __\( \theta \)__; more generally that
__\begin{equation}
\|A(\Phi)(\theta)\|_{2}\simeq\|\Phi\|_{2} \quad\text{if } \Phi(0)=0.
\label{eqnonon}
\end{equation}__

Marcinkiewicz and Zygmund realized that on the one hand there was a close
analogy between the Littlewood–Paley __\( g \)__-function and __\( A(\Phi) \)__
(in fact __\( A \)__ is a pointwise majorant of __\( g \)__, and the same kind of
__\( L^{p} \)__ inequalities held for __\( A \)__ as for __\( g \)__); but on the other
hand they surmised that the parallel between these two square functions
should not be pushed too far. The main result they obtained for __\( A \)__ was
a localized version of Lusin’s result. This can be stated as follows. Let
__\[ \Phi^{*}(\theta)=\sup_{z\in\Gamma(\theta)}|\Phi(z)| .\]__

**Theorem 5a.** If __\( \Phi \)__ is holomorphic in the unit disc, then for almost every __\( \theta \)__,
__\( \Phi^{*}(\theta) < \infty \)__ implies __\( A(\Phi)(\theta) < \infty \)__.

The converse was proved five years later by Spencer,3 namely

**Theorem 5b.** If __\( \Phi \)__ is holomorphic in the unit disc, then for almost every __\( \theta \)__,
__\( A(\Phi)(\theta) < \infty \)__ implies __\( \Phi^{*}(\theta) < \infty \)__.

A corresponding converse for __\( g \)__-functions is false, and so the area
integral __\( A \)__ has some special affinities with the boundary behavior of
__\( \Phi \)__, going beyond what it shares with __\( g \)__.

The second line of investigation was Zygmund’s reexamination of the
Littlewood–Paley theorem for the dyadic decomposition of Fourier series. His
analysis led him to recast and simplify the ideas of the proof. These
simplifications had important consequences for later work, as we shall see;
but their immediate interest was that it allowed him to connect the square
function __\( \bigl(\sum|\Delta_{k}|^{2}\bigr)^{1/2} \)__ with the one he and Kaczmarz had
considered a dozen years earlier in their study of summability of orthogonal
series (see __\eqref{eqnfi}__). Suppose we take a Fourier expansion of the form
__\[ f(\theta)\sim\sum_{n\geq 0}a_{n}e^{in \theta} ,\]__
with __\( f\in L^{p} \)__, so
that __\( f\in H^{p} \)__. If we write as before
__\[
K(f)(\theta)=\biggl(\sum_{n\geq 1}n
\bigl|\sigma_{n}(\theta)-\sigma_{n-1}(\theta)\bigr|^{2}\biggr)^{1/2}
\]__
where
__\[ \sigma_{n}(\theta)=\sum_{0\leq k < n}\Bigl(1-\frac kn\Bigr)a_{k}e^{ik\theta} ,\]__
then we can state the following theorem:

**Theorem 6.** __\( \|K(f)\|_{p}\leq A_{p}\|f\|_{p},\ 1 < p < \infty \)__.4

The proof of this theorem required two steps. First, like that of
Theorem 2, one needed the __\( L^{p} \)__ inequalities for the
__\( g \)__-function (see __\eqref{eqnsi}__). Here the major simplification was made
by Zygmund some years later5
and it came in the proof of the fact that
__\[ \|g(f)\|_{p}\leq A_{p}\|f\|_{p} ,\]__
when __\( p > 2 \)__. (The case __\( p=2 \)__ was easy, and the range __\( p < 2 \)__ was reducible
to __\( p=2 \)__ by the artifice standard in those days of using Blaschke product
decompositions for __\( H^{p} \)__ functions.) For the difficult case __\( p > 2 \)__
a “square duality” was used. An ingenious argument shows that whenever
__\( \phi \geq 0 \)__,
__\begin{equation}
\int g(f)^{2}\phi\,d\theta
\leq
c \biggl\{\int g(f)g(\phi)M(f)\,d\theta + \int|f|^{2}\phi\,d\theta\biggr\}
\label{eqnontw}
\end{equation}__
where __\( M \)__ is the Hardy–Littlewood maximal function. For __\( p\geq 4 \)__,
__\eqref{eqnontw}__ then gives the desired result as a consequence of the case
__\( p\leq 2 \)__ applied to __\( g(\phi) \)__. Incidentally, the notion of square
duality which seems to have originated in this context continues to find
other applications of interest.

The second simplification Zygmund made was in the manner in which one could
reduce the __\( L^{p} \)__ control of __\( \bigl(\sum|\Delta_{k}|^{2}\bigr)^{1/2} \)__ to that
of the __\( g \)__-function; and in fact a whole list of other square functions
(in particular, __\( \bigl(\sum n|\sigma_{n}-\sigma_{n-1}|^{2}\bigr)^{1/2} \)__)
could be
handled in the same way.6
This streamlining of the proof he found can be said to have led directly to
the “Marcinkiewicz multiplier theorem”.

In its one-dimensional form the celebrated theorem that bears Marcinkiewicz’s
name can be stated as follows. Suppose we consider a transformation __\( T \)__ given
by a multiplier sequence __\( \{\lambda_{n}\}_{-\infty}^{\infty} \)__, defined by
__\[
Tf\sim\sum\lambda_{n}a_{n}e^{in \theta}
\quad\text{ whenever }
f\sim\sum a_{n}e^{in \theta}.
\]__
Then __\( T \)__ is bounded on __\( L^{p},\ 1 < p < \infty \)__, if (i) the sequence
__\( \{\lambda_{n}\} \)__ is bounded, and (ii) it varies boundedly over
each dyadic block; more precisely,
__\[ \sum_{2^{k}\leq|j| < 2^{k+1}} |\lambda_{j}-\lambda_{j-1}|\leq M .\]__
(Note that the special case when the
sequence is constant on each dyadic block is an immediate consequence of
Theorem 2.) In one dimension the theorem’s greatest merit is,
I believe, in its formulation rather than its proof; the latter is much the
same as that of Theorem 6.
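
The two hypotheses of the one-dimensional theorem are simple to evaluate for a concrete multiplier. The sketch below computes them over finitely many dyadic blocks only, so it can suggest but never establish that the suprema are finite; the function names and the two example multipliers are hypothetical choices made for illustration.

```python
def dyadic_variation(lam, k):
    """Sum of |lam(j) - lam(j-1)| over the dyadic block 2^k <= |j| < 2^(k+1)."""
    total = 0.0
    for j in range(2 ** k, 2 ** (k + 1)):
        total += abs(lam(j) - lam(j - 1))      # positive indices in the block
        total += abs(lam(-j) - lam(-j - 1))    # negative indices in the block
    return total


def marcinkiewicz_constants(lam, max_k):
    """Return (sup |lam(j)|, sup_k of the dyadic variation), truncated at max_k."""
    limit = 2 ** (max_k + 1)
    sup_abs = max(abs(lam(j)) for j in range(-limit, limit))
    sup_var = max(dyadic_variation(lam, k) for k in range(max_k + 1))
    return sup_abs, sup_var


def conjugate_multiplier(j):
    """lam_j = -i sign(j), the multiplier of the conjugate function."""
    return -1j * ((j > 0) - (j < 0))


def alternating_multiplier(j):
    """lam_j = (-1)^j, i.e. translation of f by pi."""
    return (-1.0) ** j


print(marcinkiewicz_constants(conjugate_multiplier, 5))
print(marcinkiewicz_constants(alternating_multiplier, 5))
```

The conjugate-function multiplier has variation at most 1 on every block, so the theorem applies to it. The alternating multiplier has variation __\( 4\cdot 2^{k} \)__ on the __\( k \)__-th block and fails hypothesis (ii), although the corresponding operator is a translation and hence bounded on every __\( L^{p} \)__; the condition is sufficient, not necessary.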

It is in the passage to higher dimensions, however, that one finds the great
significance of Marcinkiewicz’s work on multipliers. Its importance was not
only the fact that one could use hitherto one-dimensional methods to prove
__\( n \)__-dimensional results; even more profound were the applications to other
questions, such as estimates for partial differential equations, already
envisaged at that time. We can now see in retrospect that Marcinkiewicz
thus anticipated some of the basic inequalities later proved by the theory
of singular integrals.7
For simplicity of notation we shall state the Marcinkiewicz multiplier
theorem in the case of two dimensions. Consider the multiplier operator
__\( T \)__ given by
__\[ Tf\sim\sum\lambda_{nm}a_{nm}e^{i(n\theta+m\phi)}
\quad\text{for }
f\sim\sum a_{nm}e^{i(n\theta+m\phi)} .\]__
Let __\( I_{k} \)__
denote the dyadic interval
__\[ \{n\mid 2^{k-1}\leq|n| < 2^{k}\}
\quad\text{and}\quad
J_{l}=\{m\mid 2^{l-1}\leq|m| < 2^{l}\} .\]__
Write
__\begin{align*}
& \Delta_{1}\lambda_{n,m}=\lambda_{n+1,m}-\lambda_{n,m},\\
& \Delta_{2}\lambda_{n,m}=\lambda_{n,m+1}-\lambda_{n,m}, \text{ and}\\
& \Delta_{1,2}=\Delta_{1}\cdot\Delta_{2}.
\end{align*}__
Now assume the finiteness of
the following four quantities:

(i) __\( \sup_{n,m}|\lambda_{n,m}| \)__; (ii) __\( \sup_{k,m}\sum_{n\in I_{k}}|\Delta_{1}\lambda_{n,m}| \)__; (iii) __\( \sup_{n,l}\sum_{m\in J_{l}}|\Delta_{2}\lambda_{n,m}| \)__; and (iv) __\( \sup_{k,l}\sum_{n\in I_{k}}\sum_{m\in J_{l}}|\Delta_{1}\Delta_{2}\lambda_{n,m}| \)__.

**Theorem 7.** Under the assumptions made above, __\( T \)__ is bounded on __\( L^{p},\ 1 < p < \infty \)__.

The last of the four major lines of investigation concerning square functions
that Marcinkiewicz and Zygmund undertook dealt with the attempt to find
a completely “real-variable” analogue of the functions of Lusin and
Littlewood–Paley. Starting with a function __\( f \)__ on the circle, the area
integral and __\( g \)__-functions are defined in terms of holomorphic (or harmonic)
functions whose boundary values are related to __\( f \)__. Also the dyadic square
function of Theorem 2 requires the Fourier expansion of __\( f \)__. What
was desired was a variant that could be defined more directly in terms of
the basic real-variable operations such as integration, differentiation, etc.

After some experimentation Marcinkiewicz hit upon the idea of considering
__\begin{equation}
\mu(F)(x)=\biggl(\int_{0}^{\pi}\bigl|F(x+t)+F(x-t)-2F(x)\bigr|^{2} \frac{dt}{t^{3}}\biggr)^{1/2}
\label{eqnonth}
\end{equation}__
with
__\[
F(x)=\int^{x}f(t)\,dt.
\]__

It was not difficult to see that
__\[
\|\mu(F)\|_{L^{2}}\simeq\|f\|_{L^{2}} \quad\text{ if }\,\int_{0}^{2\pi}f(x)\,dx =0.
\]__
With this, and using the real-variable tools he had already developed, he
was able to prove the analogue of the theorem he and Zygmund had found for
the area integral (Theorem 5a). The result was as follows.

**Theorem 8a.** Suppose __\( F\in L^{2} \)__. If __\( F^{\prime}(x) \)__ exists in a set __\( E \)__, then
__\( \mu(F)(x) < \infty \)__ for almost every __\( x \in E \)__.
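
The __\( L^{2} \)__ equivalence stated above for the Marcinkiewicz function can be observed on the simplest mean-zero examples, __\( f(x)=\cos nx \)__ with __\( F(x)=\sin(nx)/n \)__. The sketch below is a numerical illustration only: the midpoint rule, the step counts and the chosen frequencies are assumptions of this example, and the names `mu_squared` and `l2_ratio` are hypothetical.

```python
import math


def mu_squared(F, x, steps=2000):
    """Midpoint-rule value of int_0^pi |F(x+t) + F(x-t) - 2F(x)|^2 dt / t^3."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        d = F(x + t) + F(x - t) - 2.0 * F(x)
        total += d * d / t ** 3
    return total * h


def l2_ratio(n, x_points=32):
    """||mu(F)||_2^2 / ||f||_2^2 over [0, 2 pi] for f = cos(nx), F = sin(nx)/n."""
    def F(x):
        return math.sin(n * x) / n

    dx = 2 * math.pi / x_points
    mu_norm_sq = sum(mu_squared(F, j * dx) for j in range(x_points)) * dx
    f_norm_sq = math.pi  # int_0^{2 pi} cos^2(nx) dx
    return mu_norm_sq / f_norm_sq


ratios = [l2_ratio(n) for n in (1, 3, 9)]
print(ratios)
```

For these functions a change of variable shows the ratio equals __\( 4\int_{0}^{n\pi}(1-\cos s)^{2}s^{-3}\,ds \)__, which increases with __\( n \)__ but stays bounded; the printed values remain within a fixed range, consistent with a two-sided inequality whose constants do not depend on the frequency.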

The questions that arose were first, whether some of the other
properties of the area integral or __\( g \)__-function held as well for
__\( \mu \)__; and, more interestingly, what was the real significance of the
Marcinkiewicz function. Zygmund found an answer to the first question in
1944
[5]
when he proved

**Theorem 8b.** For __\( 1 < p < \infty \)__,
__\[
\|\mu(F)\|_{L^{p}}\simeq\|f\|_{L^{p}} \quad\text{ if }\, \int_{0}^{2\pi}f(x)\,dx=0.
\]__

The argument he developed to show this was not an easy one. He was required
to invoke the most arcane of the square functions, the function __\( g^{*} \)__,
which Littlewood and Paley had also studied. He established the __\( L^{p} \)__
inequalities for it and showed that it actually was a majorant of the
Marcinkiewicz function. Incidentally __\( g^{*} \)__ is defined by
__\[
(g^{*}(\Phi)(\theta))^{2}=\int_{0}^{1}\int_{0}^{2\pi}\bigl|\Phi^{\prime}(r e^{i(\theta+\phi)})\bigr|^{2}\bigg|\frac{1-r}{1-r e^{i\phi}}\bigg|^{2}\,d\phi\,
dr,
\]__
and so it also majorizes the area integral __\eqref{eqnonze}__, but it takes into
account “the tangential” approach to the boundary.8
The problem that remained was to discover whether there was a converse to
the local result given by Theorem 8a, or to put the question more
broadly, to find the meaning of the Marcinkiewicz function. It was to be
almost twenty more years before
an answer to that question would be found.

#### Fourth period (1950–1964): Zygmund and his students

Starting about 1950 a new direction of considerable importance began to
emerge in force. Hinted at in earlier work (of Besicovitch and Marcinkiewicz,
among others), its thrust was the development of “real-variable” methods to
replace complex function theory — that favored ally of one-dimensional
Fourier analysis. What made this new emphasis particularly timely, in fact
indispensable, was that only with techniques coming from real-variable theory
could one hope to come to grips with many interesting __\( n \)__-dimensional analogues
of the one-dimensional theory.

The mathematician animating this development was Antoni Zygmund. In many ways he set the broad outlines of the effort, he mastered by his work some of the crucial difficulties, and was throughout the source of inspiration for his students and collaborators.

##### a: The area integral

A pioneering result in this new direction was
Calderón’s extension to
__\( \mathbf{R}^{n} \)__ of the theorem of Marcinkiewicz and Zygmund concerning the
area integral, a subject he had taken up at the suggestion of Zygmund. The
setting for this is as follows. We let
__\[
\mathbf{R}_{+}^{n+1}=\{(x, y),
x=(x_{1},\ldots,x_{n})\in \mathbf{R}^{n}, y\in \mathbf{R}^{+}\}
\]__
be the upper
half-space, and suppose that __\( u(x, y) \)__ is harmonic (with respect to the
__\( n+1 \)__ variables __\( x_{1},\ldots,x_{n}, y \)__). Sometimes we shall assume
that __\( u \)__ is in fact the Poisson integral of an appropriate function __\( f \)__
defined on __\( \mathbf{R}^{n} \)__, and then we shall write __\( u=\operatorname{PI}(f) \)__. We let
__\( \Gamma=\{(x, y),\ |x| < y\} \)__ be a standard cone with vertex at the origin,
__\( \Gamma^{\prime} \)__ its truncated version, __\( \Gamma^{\prime}=\Gamma\cap\{y < 1\} \)__. For
any __\( \bar{x}\in \mathbf{R}^{n} \)__, __\( \Gamma(\bar{x}) \)__ and
__\( \Gamma^{\prime}(\bar{x}) \)__ will be the corresponding cones with vertices at
__\( \bar{x} \)__. The area integral of __\( u \)__ is defined by
__\begin{equation}
(A(u)(\bar{x}))^{2}=\int_{\Gamma(\bar{x})}|\nabla u|^{2}y^{1-n}\,dx\,
dy
\label{eqnonfo}
\end{equation}__
where __\( |\nabla u|^{2}=|\partial u/\partial y|^{2}+\sum_{j=1}^{n}|\partial
u/\partial x_{j}|^{2} \)__.

Similarly for the local theory one needs the analogue of __\eqref{eqnonfo}__
where __\( \Gamma(\bar{x}) \)__ is replaced by __\( \Gamma^{\prime}(\bar{x}) \)__;
this defines __\( A_{\mathrm{loc}}(u)(\bar{x}) \)__. The maximal function __\( u^{*} \)__
is defined by
__\[ u^{*}(\bar{x})=\sup_{(x,y)\in\Gamma(\bar{x})}|u(x, y)| ,\]__
and its local analogue __\( u_{\mathrm{loc}}^{*} \)__ is given by replacing
__\( \Gamma(\bar{x}) \)__ by __\( \Gamma^{\prime}(\bar{x}) \)__ in the definition.

**Theorem 9a.** Suppose __\( u \)__ is harmonic in __\( \mathbf{R}_{+}^{n+1} \)__. Then
__\( A_{\mathrm{loc}}u(\bar{x}) < \infty \)__ at almost every point __\( \bar{x}\in
\mathbf{R}^{n} \)__ where __\( u_{\mathrm{loc}}^{*}(\bar{x}) < \infty \)__.

Calderón’s proof of this theorem was published at the same
time (1950)
as another important result
he found, namely the extension of Privalov’s theorem: __\( u \)__ has a
nontangential limit at almost every __\( \bar{x}\in \mathbf{R}^{n} \)__, where
__\( u_{\mathrm{loc}}^{*}(\bar{x}) < \infty \)__. We shall discuss the ideas behind
the proof of Theorem 9a later when we take up its converse. Now
we turn to the “global” version, i.e., the higher-dimensional analogue of
the Littlewood–Paley theorem (Theorem 3).

**Theorem 9b.** Suppose __\( u= \operatorname{PI} (f) \)__, then
__\[
\|A(u)\|_{L^{p}}\simeq\|f\|_{L^{p}},\quad 1 < p < \infty.
\]__

It would be difficult after 25 years to recall the precise thoughts
that motivated the proof of Theorem 9b, nor would it be easy
now for one to appreciate the difficulties that seemed then to stand in
the way. But I do remember that those of us who were graduate students
of Zygmund in the middle 1950’s were shaped by the event, akin to the
Creation, which appeared to some of us to be the beginning of everything
important: the 1952 *Acta* paper which developed, via the
Calderón–Zygmund lemma, the real-variable methods giving the extension
of the Hilbert transform to __\( n \)__ dimensions. What was more natural, therefore,
than to attempt to prove the __\( L^{p} \)__ boundedness of __\( f\rightarrow A(u) \)__
by adapting these methods? This idea indeed worked, although the initial
complicated proofs were later much simplified. The analysis succeeded as
well for the Marcinkiewicz function __\eqref{eqnonth}__, and proved also that the
mappings __\( f\rightarrow A(u) \)__ and __\( f\rightarrow\mu(F) \)__ were of weak-type
(1, 1).

We turn now to the proof of Theorem 9a. Its one-dimensional version
(Theorem 5a) had been done by using complex function theory,
in particular conformal mappings. So a completely different approach was
needed. The idea behind it can be understood by examining the case __\( p=2 \)__
of Theorem 9b,
which has an easy proof. A direct calculation shows that
__\begin{equation}
\int_{\mathbf{R}^{n}}A^{2}(u)\,dx=c\int_{\mathbf{R}_{+}^{n+1}}y|\nabla u|^{2}\,dx\,dy,
\label{eqnonfi}
\end{equation}__
where __\( c \)__ is the volume of the unit ball. Next we can use the fact that
__\[ |\nabla u|^{2}=\tfrac{1}{2}\Delta(|u|^{2}) ,\]__
and so by Green’s theorem
__\begin{align*}
\int_{\mathbf{R}^{n}}A^{2}(u)\,dx
& =\frac{c}{2}\iint_{\mathbf{R}_{+}^{n+1}}y\Delta(|u|^{2})\,dx\,dy\\
& =\frac{c}{2}\int|u(x,0)|^{2}\,dx,
\end{align*}__
which proves Theorem 9b for __\( p=2 \)__, since __\( u(x, 0)=f(x) \)__. Thus
in order to control __\( A_{\mathrm{loc}}(u)(x) \)__ on a set __\( E \)__, it is natural to
consider
__\[ \int_{E}A_{\mathrm{loc}}^{2}(u)(x)\,dx \]__
which in turn is dominated
by
__\[ c \int_{R(E)} y|\nabla u|^{2}\,dx\,dy ,\]__
where __\( R(E) \)__ is a standard
“sawtooth” region in __\( \mathbf{R}_{+}^{n+1} \)__ based on __\( E \)__. At this stage
(which is the turning point of the proof) Calderón invoked Green’s
theorem for another region containing __\( R(E) \)__, whose Green’s function
he could essentially bound from below by __\( c^{\prime}y \)__.
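
The two steps used above for __\( p=2 \)__ can be checked as follows. This is only a sketch, assuming that __\( u \)__ is real-valued (the complex case follows by treating real and imaginary parts separately) and that __\( u \)__ decays fast enough at infinity for the boundary terms at infinity to vanish. Since __\( \Delta u=0 \)__, the product rule gives
__\[
\Delta(u^{2})=2|\nabla u|^{2}+2u\,\Delta u=2|\nabla u|^{2}.
\]__
Green's theorem in __\( \mathbf{R}_{+}^{n+1} \)__, applied to the pair __\( y \)__ and __\( u^{2} \)__, uses that __\( \Delta y=0 \)__, that __\( y \)__ vanishes on the boundary, and that the outward normal derivative of __\( y \)__ there equals __\( -1 \)__. Hence
__\[
\iint_{\mathbf{R}_{+}^{n+1}}y\,\Delta(u^{2})\,dx\,dy
=\int_{\mathbf{R}^{n}}u(x,0)^{2}\,dx .
\]__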

To prove the converse
of Theorem 9a along these lines appeared to require, among other
things, appropriate bounds from above for Green’s function for such regions,
and that seemed much beyond what could be done then.9
What turned out to be the right course of action was to finesse the problem
of Green’s function and to proceed directly with estimates that followed from
the finiteness of
__\[ \int_{R(E)}y|\nabla u|^{2}\,dx\,dy .\]__
These arguments
also proved to be useful in other situations, as we shall see later. The
result obtained was

Suppose __\( u \)__ is harmonic in __\( \mathbf{R}_{+}^{n+1} \)__. Then
__\( u_{\mathrm{loc}}^{*}(\bar{x}) < \infty \)__ for almost all points
__\( \bar{x}\in \mathbf{R}^{n} \)__ where __\( A_{\mathrm{loc}}(u)(\bar{x}) < \infty \)__.

I remember quite vividly the excitement surrounding the events at the time of this work. It was March 1959, and I had returned to the University of Chicago the fall before. Frequently I met with my friends Guido Weiss and Mary Weiss, and together we often found ourselves in Zygmund’s office (Eckhart 309, two doors from mine). With our teacher our conversations ranged over a wide variety of topics (not all mathematical) and more than once the subject of square functions arose. When this happened the mood would change, if only slightly, as if in deference to their special status, and the enigma that surrounded them. I had an idea which seemed promising. But before we could see where it might lead came the spring break. Further work would have to be held in abeyance since we were each going our own ways: Zygmund travelled to Boston to visit Calderón; Guido and Mary Weiss, having borrowed my Chevrolet, drove to Virginia for a vacation trip; and I went to New York to be married.

##### b: The Marcinkiewicz function

Influenced by the renewed interest in area integrals, and encouraged by some
recent work he had done with Mary Weiss,10
Zygmund returned to the study of the Marcinkiewicz integral __\eqref{eqnonth}__
and the problem of finding a converse to Theorem 8a. He was
convinced that now
(more than 20 years after Marcinkiewicz’s original work) the time was ripe to
see matters to a conclusion. He suggested to me that we work on the problem
together, and of course I was very happy to accept his offer. For me this was
a unique and rewarding collaboration — not just because of the special
satisfaction one derives when accepted as an equal by one’s teacher — but also because as it turned out he did most of the work that really counted!

We realized first that Theorem 8a itself could be somewhat
strengthened; what was required was the notion of the derivative __\( F^{\prime}(x) \)__
existing (at __\( x \)__) “in the __\( L^{2} \)__ sense”. Thus __\( F^{\prime}(x) \)__ existed in
this generalized sense if11
__\begin{equation}
\frac{1}{h}\int_{0}^{h}\bigg|\frac{F(x+t)-F(x)}{t}-F^{\prime}(x)\bigg|^{2}\,dt\rightarrow 0,
\quad\text{ as }\, h\rightarrow 0. \label{eqnonsi}
\end{equation}__

The finer version of Theorem 8a was then: If __\( F\in L^{2} \)__
had a derivative in the sense of __\eqref{eqnonsi}__ at each __\( x \in E \)__, then
__\( \mu(F)(x) < \infty \)__ for almost every __\( x\in E \)__. It was in this form that
one might seek a converse. The basic plan was to try to make matters turn
on the analogous situation which held for the area integral, where one can
pass from the finiteness of a quadratic expression to the existence of a
limit. After a series of reductions we were able to show that at each point
__\( x \)__ where __\( \mu(F)(x) < \infty \)__ one had
__\begin{equation}
\int_{|t|\leq y}\bigg|\frac{\partial^{2}u}{\partial y^{2}}(x+t,
y)+\frac{\partial^{2}u}{\partial y^{2}}(x-t, y)\bigg|^{2}\,dt\,dy < \infty
\label{eqnonse}
\end{equation}__
with __\( u=\operatorname{PI}(F) \)__. On the other hand we could show (using
Theorem 5b) that at almost every __\( x \)__ where
__\begin{equation}
\int_{|t|\leq y}\bigg|\frac{\partial^{2}u}{\partial y^{2}}(x+t, y)\bigg|^{2}\,dt\,dy < \infty
\label{eqnonei}
\end{equation}__
the conclusion __\eqref{eqnonsi}__ actually held.

The basic difficulty, the passage from __\eqref{eqnonse}__ to __\eqref{eqnonei}__,
was overcome by Zygmund using a clever “desymmetrization” argument;
several weeks later he presented me with an essentially final draft of the
paper which he had typed himself!

There were several variants of the final result — involving extensions
to __\( n \)__-dimensions, or higher derivatives, or even fractional derivatives. The
simplest version, however, was the following:

Let __\( F\in L^{2}(0,2\pi) \)__. Then the set of points __\( x \)__ where
__\[
\int_{0}^{\pi}\bigl|F(x+t)+F(x-t)-2F(x)\bigr|^{2}\,dt/t^{3} < \infty,
\]__
and the set of points where __\( F^{\prime}(x) \)__ exists in the __\( L^{2} \)__ sense
(i.e., __\eqref{eqnonsi}__) differ by a set of measure zero.

#### Fifth period (1966–present): Further applications of square functions

We have traced the development of square functions from their beginnings to
a stage where their nature was much better understood, in terms of a series of
deep theorems that had been obtained. Yet it is only more recently that
their central role in several fields of analysis has become more apparent. I
shall try to describe this very briefly in terms of three specific areas:
__\( H^{p} \)__ spaces, symmetric diffusion semigroups, and differentiation theory
in __\( \mathbf{R}^{n} \)__.

##### a: __\( H^{p} \)__ theory

Beginning in about 1966 two separate directions of research involving square
functions were undertaken, and when brought together these ultimately
led to a rich harvest in the theory of __\( H^{p} \)__ spaces. The first
started with Burkholder’s
[e31]
extension of Paley’s theorem
(Theorem 4 for Walsh–Paley series) to general martingales. He
observed that Paley’s argument extended to this general setting,
but also found his own approach which was very different. He showed
that if
__\[ E_{k}=E(\,\cdot\,\mid\mathcal{F}_{k}) \]__
are the conditional
expectations for an increasing sequence of __\( \sigma \)__-fields
__\( \{\mathcal{F}_{k}\}_{k=0}^{\infty} \)__, then with __\( E_{-1}(f)\equiv 0 \)__,
__\begin{equation}
\biggl\Vert\Bigl(\sum_{k=0}^{\infty}\bigl|(E_{k}-E_{k-1})(f)\bigr|^{2}\Bigr)^{1/2}\biggr\Vert_p\simeq\lim_{k\rightarrow\infty}\|E_{k}(f)\|_{p},
\quad
1 < p < \infty.
\label{eqnonni}
\end{equation}__
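
For __\( p=2 \)__ the equivalence __\eqref{eqnonni}__ is an exact identity, because the martingale differences __\( (E_{k}-E_{k-1})(f) \)__ are mutually orthogonal. The sketch below checks this numerically in the simplest setting, assuming the dyadic filtration on a finite set of __\( 2^{N} \)__ points with uniform probability measure. The function names are hypothetical and chosen only for this illustration; the code is not taken from any of the works cited.

```python
import numpy as np

def dyadic_conditional_expectations(f):
    """Return [E_0 f, ..., E_N f] for a vector f of length 2**N.

    E_k f replaces f by its average over each of the 2**k dyadic
    blocks, so E_0 f is the global mean and E_N f is f itself.
    """
    f = np.asarray(f, dtype=float)
    n = f.size
    levels = n.bit_length() - 1
    if n < 1 or 2 ** levels != n:
        raise ValueError("length of f must be a power of two")
    out = []
    for k in range(levels + 1):
        blocks = f.reshape(2 ** k, n // 2 ** k)
        means = blocks.mean(axis=1, keepdims=True)
        out.append(np.broadcast_to(means, blocks.shape).reshape(n).copy())
    return out

def square_function(f):
    """Pointwise value of (sum_k |(E_k - E_{k-1}) f|^2)^(1/2), with E_{-1} f = 0."""
    expectations = dyadic_conditional_expectations(f)
    previous = np.zeros_like(expectations[0])
    total = np.zeros_like(expectations[0])
    for current in expectations:
        total += (current - previous) ** 2
        previous = current
    return np.sqrt(total)

def lp_norm(g, p):
    """L^p norm with respect to the uniform probability measure."""
    return float(np.mean(np.abs(g) ** p) ** (1.0 / p))

rng = np.random.default_rng(0)
f = rng.standard_normal(2 ** 10)
S = square_function(f)
print(lp_norm(S, 2), lp_norm(f, 2))  # the two values agree up to rounding
```

For other values of __\( p \)__ the two norms are only comparable, with constants depending on __\( p \)__; the same functions can be used to compare `lp_norm(S, p)` and `lp_norm(f, p)` for experimentation.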

Next, in work with
Gundy,
and later also with
Silverstein,
the following
advances were made:12
It was shown that __\eqref{eqnonni}__ extended to __\( p\leq 1 \)__ if
__\( \lim_{k\rightarrow\infty}\|E_{k}(f)\|_{p} \)__ was replaced with
__\( \|\sup_{k}E_{k}(f)\|_{p} \)__, for a large class of martingales. This
class incidentally includes those occurring for the Walsh–Paley series,
but more importantly these results went over to the (continuous parameter)
martingales arising from Brownian motion applied to harmonic functions. To
be more precise, let __\( z_{t}(\omega) \)__ denote the standard Brownian motion in
the complex __\( z \)__-plane, starting at the origin and stopped when reaching the unit
circle. Here __\( 0\leq t < \infty \)__ is the time parameter, and __\( \omega \)__ labels
the Brownian path, with __\( \omega \in\Omega \)__, __\( \Omega \)__ being the probability
space. If __\( u \)__ is harmonic in the unit disc, __\( t\rightarrow u(z_{t}(\omega)) \)__
is a continuous-time martingale. Let
__\[ M_{B}(u)(\omega)=\sup_{0\leq t < \infty} |u(z_{t}(\omega))| \]__
be the Brownian maximal function, and __\( S(u)(\omega) \)__
the martingale square function,
__\[ S(u)(\omega)=\biggl(\int_{0}^{\infty}|\nabla u(z_{t}(\omega))|^{2}\,dt\biggr)^{1/2} .\]__
Their result then was that
__\begin{equation}
\|Su\|_{L^{p}(\Omega)}\simeq\|M_{B}(u)\|_{L^{p}(\Omega)},\quad 0 < p < \infty,
\label{eqntwze}
\end{equation}__
whenever __\( u(0)=0 \)__.

The most striking application of this circle of ideas was a conclusion drawn
from __\eqref{eqntwze}__, to wit, whenever __\( F=u+iv \)__ is holomorphic in the unit
disc, then __\( F\in H^{p} \)__ if and only if __\( u^{*}\in L^{p},\ 0 < p < \infty \)__.

The second line of research began when a more direct connection between
standard multiplier operators and square functions was discovered. The
result was easy to state. Whenever __\( T \)__ is a multiplier operator of the
Marcinkiewicz type on __\( \mathbf{R}^{n} \)__ (more precisely one that satisfies the
kind of conditions put in Hörmander’s version of that multiplier theorem),
then the area integral corresponding to __\( T(f) \)__ is *pointwise* dominated
by a __\( g^{*} \)__ function of __\( f \)__, i.e.,
__\begin{equation}
A(T\mkern-4mu f)(x)\leq cg_{\lambda}^{*}(f)(x),
\label{eqntwon}
\end{equation}__
where
__\[
g_{\lambda}^{*}(f)(x)=\biggl(\int\bigl|\nabla u(x-t, y)\bigr|^{2}\Bigl(\frac{y}{y+|t|}\Bigr)^{n\lambda}y^{1-n}\,dy\,dt\biggr)^{1/2},
\]__
and __\( \lambda \)__ is a parameter which depends on the nature of the
multiplier. An __\( H^{p} \)__ theory in __\( \mathbf{R}^{n} \)__ had already been initiated
several years before (by the efforts of G. Weiss and others), and using
it and __\eqref{eqntwon}__ it followed that these multipliers also extended to
bounded operators on __\( H^{p} \)__.

From these considerations it might be guessed that a basic tool for
__\( H^{p} \)__ theory is the relation between square functions and maximal
properties of (harmonic) functions. Here important contributions were made
by
C. Fefferman.
One of the results obtained in this direction was the
following theorem:

Suppose that __\( u \)__ is harmonic in __\( \mathbf{R}_{+}^{n+1} \)__, and __\( u(x, y)\rightarrow
0 \)__ as __\( y\rightarrow\infty \)__. Then
[e40]
__\[
\|A(u)\|_{p}\simeq\|u^{*}\|_{p},
\quad 0 < p < \infty.
\]__

Incidentally it should be remarked that the proof used the same approach as its “local” analogue, Theorem 9c, but additional arguments of a quantitative nature were of course needed. More recently some of these results for square functions have been extended to product domains, and in this context generalizations of Theorems 9 and 11 have been found.13

##### b: Symmetric diffusion semigroups

The semigroups which are the subject of the title are a family of operators
__\( \{T^{t}\}_{t\geq 0} \)__, each bounded and selfadjoint on __\( L^{2} \)__, with
__\( T^{t} \)__ having norm __\( \leq 1 \)__ on every __\( L^{p} \)__, __\( 1\leq p\leq\infty \)__,
and
__\[ T^{t_{1}+t_{2}}=T^{t_{1}}T^{t_{2}} ,\]__
with
__\[ \lim_{t\rightarrow 0}T^{t}f=f \]__
for __\( f\in L^{2} \)__. Sometimes the additional hypotheses
are made that __\( T^{t}(1)=1 \)__, and __\( T^{t} \)__ is positivity-preserving.

The significance of this notion derives from the many important examples of such semigroups in analysis, and the many rich properties that they share. In fact some of the basic results discussed above have versions valid in this context. Here we mention two, a maximal theorem, and a multiplier theorem in the spirit of Marcinkiewicz’s theorem (Theorem 7).

__\( \bigl\|\sup_{t > 0}|T^{t}f|\bigr\|_{p}\leq A_{p}\|f\|_{p},\ 1 < p\leq\infty \)__.

To formulate the multiplier theorem we write __\( T^{t} \)__ in terms
of its spectral decomposition,
__\[ T^{t}\mkern-2mu=\mkern-2mu\int_0^{\infty} e^{-\lambda t}\,dE(\lambda) ,\]__
where __\( E(\lambda) \)__ is a spectral resolution on
__\( L^{2} \)__. For each bounded Borel measurable function __\( m \)__ on __\( (0,
\infty) \)__, consider the “multiplier” operator __\( T_{m} \)__ given by
__\[ T_{m}=\int_0^\infty m(\lambda)\,dE(\lambda) .\]__
Here we assume that __\( m \)__ is
of the form
__\[ m(\lambda)=\lambda\int_{0}^{\infty}M(s)e^{-\lambda s}\,ds ,\]__
with __\( M \)__ a bounded function.

__\( \|T_{m}(f)\|_{p}\leq A_{p}\|f\|_{p},\ 1 < p < \infty \)__.
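
As an illustration of the hypothesis on __\( m \)__ (an example added here, not part of the argument above), note first that __\( \lambda\int_{0}^{\infty}e^{-\lambda s}\,ds=1 \)__, so that __\( |m(\lambda)|\leq\sup_{s}|M(s)| \)__ and __\( m \)__ is automatically bounded. Taking __\( M(s)=s^{-i\gamma}/\Gamma(1-i\gamma) \)__ with __\( \gamma \)__ real, which is bounded because __\( |s^{-i\gamma}|=1 \)__, the substitution __\( s=r/\lambda \)__ gives
__\[
m(\lambda)=\frac{\lambda}{\Gamma(1-i\gamma)}\int_{0}^{\infty}s^{-i\gamma}e^{-\lambda s}\,ds
=\frac{\lambda\cdot\lambda^{i\gamma-1}}{\Gamma(1-i\gamma)}\int_{0}^{\infty}r^{-i\gamma}e^{-r}\,dr
=\lambda^{i\gamma},
\]__
so that multipliers of the form __\( \lambda^{i\gamma} \)__ fall within the scope of the theorem.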

A key tool used in the proof of both these theorems is the family of Littlewood–Paley
type functions
__\[
g_{k}(f)(x)=\biggl(\int_{0}^{\infty}t^{2k-1}\Bigl|\frac{\partial^{k}}{\partial
t^{k}}T^{t}(f)\Bigr|^{2}\,dt\biggr)^{1/2}
\quad\text{with }k=1,2,\ldots.
\]__
Also for __\( T_{m} \)__ a relation of the same kind as __\eqref{eqntwon}__ holds.14
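
For __\( p=2 \)__ the size of __\( g_{k}(f) \)__ can be read off from the spectral decomposition. What follows is a sketch, assuming the formal rule __\( \frac{\partial^{k}}{\partial t^{k}}T^{t}=(-1)^{k}\int_{0}^{\infty}\lambda^{k}e^{-\lambda t}\,dE(\lambda) \)__ and writing __\( P_{0} \)__ (a notation introduced only for this illustration) for the projection onto the functions left fixed by the semigroup, that is, the spectral mass at __\( \lambda=0 \)__. Since all integrands are nonnegative, the order of integration may be interchanged, and
__\begin{align*}
\|g_{k}(f)\|_{2}^{2}
& =\int_{0}^{\infty}t^{2k-1}\Bigl\|\frac{\partial^{k}}{\partial t^{k}}T^{t}(f)\Bigr\|_{2}^{2}\,dt\\
& =\int_{\lambda > 0}\biggl(\int_{0}^{\infty}t^{2k-1}\lambda^{2k}e^{-2\lambda t}\,dt\biggr)\,d(E(\lambda)f,f)\\
& =\frac{\Gamma(2k)}{4^{k}}\,\|f-P_{0}f\|_{2}^{2},
\end{align*}__
because __\( \int_{0}^{\infty}t^{2k-1}e^{-2\lambda t}\,dt=\Gamma(2k)/(2\lambda)^{2k} \)__ for __\( \lambda > 0 \)__. For __\( k=1 \)__ the constant is __\( 1/4 \)__.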

##### c: Differentiation theorems in __\( \mathbf{R}^{n} \)__

Probably the most dramatic applications of square functions occur in
differentiation theory. The general problem here is to prove that
__\begin{equation}
\lim_{\operatorname{diam} R\rightarrow 0}\frac{1}{\mu(R)}\int_{R}f(x-y)\,d\mu(y)=f(x)\quad
\text{ a.e.}
\label{eqntwtw}
\end{equation}__
where __\( R \)__ ranges over a suitable collection __\( \mathcal{R} \)__ of sets “centered” at the
origin. The classical examples of these are (i) where __\( \mathcal{R} \)__ is the collection
of all balls (or cubes) containing the origin, and (ii) where __\( \mathcal{R} \)__ is the
collection of all rectangles containing the origin, with sides parallel to the
axes. For each of these results a Vitali-type covering theorem has played a
decisive role. Thus it may seem surprising that the alien notion of square
functions would turn out to be the appropriate idea in related situations,
where covering arguments were unavailing. In formulating the results obtained
this way we shall, as is usual, deal with the corresponding maximal function
__\[
M_{\mathcal{R}}(f)(x)=\sup_{R\in\mathcal{R}}\frac{1}{\mu(R)}\bigg|\int_{R}f(x-y)\,d\mu(y)\bigg|,
\]__
and the possibility of asserting inequalities of the type
__\begin{equation}
\|M_{\mathcal{R}}(f)\|_{p}\leq A_{p}\|f\|_{p}.
\label{eqntwth}
\end{equation}__

The inequality __\eqref{eqntwth}__ holds in the following cases:

(a) __\( \mathcal{R} \)__ is the collection of spheres centered at the origin; __\( d\mu \)__ is the uniform surface measure; and __\( n\geq 3 \)__, with __\( p > n/(n-1) \)__.

(b) __\( \mathcal{R} \)__ is the collection of initial segments __\( \{\gamma(t), 0\leq t\leq h\} \)__ of a smooth curve __\( t\rightarrow\gamma(t) \)__, with __\( \gamma(0)=0 \)__, and __\( \gamma \)__ having nonzero “curvature” at the origin; here __\( d\mu \)__ is arc-length, __\( n\geq 1 \)__ and __\( p > 1 \)__.

(c) __\( \mathcal{R} \)__ is the collection of rectangles (in __\( \mathbf{R}^{2} \)__) containing the origin, which make an angle __\( \theta_{k} \)__ with a fixed direction, where __\( \{\theta_{k}\} \)__ is a sequence of numbers tending rapidly to zero; here __\( p > 1 \)__.

The proof of each part of this theorem requires its own square function. We shall not describe these rather complicated quadratic functions here, but refer the reader to the literature for further details.15

#### Epilogue

Since the original draft of this essay was written two new results were found which use square functions in a decisive way.

The first is the solution of the problem of Cauchy’s integral for Lipschitz
curves by
Coifman,
McIntosh,
and
Meyer
[e50].
It is to be noted that in Calderón’s initial work on this problem
(1965), square functions were already used in a crucial
way. In particular the inequality
__\[ c\|F\|_{H^{p}}\leq\|A(F)\|_{p}, \quad p\leq 1 ,\]__
was proved there for this purpose.

The second result deals with the standard maximal function in __\( \mathbf{R}^{n} \)__
__\[
M_{n}(f)(x)=\sup_{r > 0}\frac{1}{c_{n}r^{n}}\bigg|\int_{|y|\leq r}f(x-y)\,dy\bigg|,
\]__
where __\( c_{n} \)__ is the volume of the unit ball in __\( \mathbf{R}^{n} \)__.

The question that arises is, how does the __\( L^{p} \)__ norm of __\( M_{n} \)__
behave for large __\( n \)__? The best that can be proved by the usual Vitali
covering arguments gives
__\[ \|M_{n}(f)\|_{p}\leq A(p, n)\|f\|_{p}, \quad 1 < p ,\]__
with __\( A(p, n)\leq A(p)\,2^{n/p} \)__, which is a large growth as
__\( n\rightarrow\infty \)__. However much more can be said.

__\( \|M_{n}(f)\|_{p}\leq A_{p}\|f\|_{p},\ 1 < p\leq\infty \)__, with __\( A_{p} \)__
independent of __\( n \)__.

The idea of the proof is to consider in __\( \mathbf{R}^{m} \)__ the maximal functions
__\( M_{m,k} \)__ defined by
__\[
M_{m,k}(f)(x)=\sup_{r > 0}\frac{\big|\int_{|y|\leq
r}f(x-y)|y|^{k}\,dy\big|}{\int_{|y|\leq r}|y|^{k}\,dy},\quad
k\geq 0.
\]__
Then if __\( m \)__ is so large that __\( p > m/(m-1) \)__,

__\begin{equation}
\|M_{m,k}(f)\|_{p}\leq A_{p,m}\|f\|_{p}
\label{eqntwfo}
\end{equation}__

with __\( A_{p,m} \)__ independent of __\( k \)__, __\( k\geq 0 \)__. This follows from
Theorem 12, Part (a). From this Theorem 13 is obtained
by lifting the __\( m \)__-dimensional result __\eqref{eqntwfo}__ into __\( \mathbf{R}^{n} \)__,
where __\( n\geq m \)__ (and __\( k=n-m \)__), by integrating over the Grassmannian of
__\( m \)__-planes in __\( \mathbf{R}^{n} \)__ through the origin.