by Elias M. Stein
I’ve decided to write this essay about “square functions” for two reasons. First, their development has been so intertwined with the scientific work of A. Zygmund that it seems highly appropriate to do so now on the occasion of his 80th birthday. Also these functions are of fundamental importance in analysis, standing as they do at the crossing of three important roads many of us have travelled by: complex function theory, the Fourier transform (or orthogonality in its various guises), and real-variable methods. In fact, the more recent applications of these ideas, described at the end of this essay, can be seen as confirmation of the significance Zygmund always attached to square functions.
This is going to be a partly historical survey, and so I hope you will allow me to take the usual liberties associated with this kind of enterprise: I will break up the exposition into certain “historical periods”, five to be precise; and by doing this I will be able to suggest my own views as to what might have been the key influences and ideas that brought about these developments.
One word of explanation about “square functions” is called for. A deep concept in mathematics is usually not an idea in its pure form, but rather takes various shapes depending on the uses it is put to. The same is true of square functions. These appear in a variety of forms, and while in spirit they are all the same, in actual practice they can be quite different. Thus the metamorphosis of square functions is all important.
First period (1922–1926): The primordial square functions
It appears that square functions arose first in an explicit form in a beautiful theorem of Kaczmarz and Zygmund dealing with the almost everywhere summability of orthogonal expansions. The theorem was proved in 1926 as the culmination of several papers each had written at about that time. The theorem itself was an outgrowth of what certainly was one of the main preoccupations of analysts at that time, namely the question of convergence of Fourier series. The problem was the following. Suppose \( f=f(\theta) \) is a continuous function on the circle, \( 0\leq\theta\leq 2\pi \), or more generally assume that \( f \) is in \( L^{2}(0,2\pi) \) or even that \( f \) is merely integrable; then does its Fourier series \begin{equation} \sum a_{n}e^{in \theta} \quad\text{with}\quad a_{n}=\frac{1}{2\pi}\int_{0}^{2\pi}f(\theta)e^{-in \theta}\,d\theta, \label{eqnon} \end{equation} converge almost everywhere?
A related parallel issue was the corresponding question for a general orthonormal expansion, but now limited to \( f\in L^{2} \). Thus if \( \{\phi_{n}\} \) is an orthonormal system, and if \[ f\sim\sum a_{n}\phi_{n} \quad\text{with}\quad a_{n}=\int f\overline{\phi}_{n} ,\] where \( \sum |a_{n}|^{2} < \infty \), then what could be said about the convergence almost everywhere of \begin{equation} \sum_{n=1}^{\infty}a_{n}\phi_{n}(x)? \label{eqntw} \end{equation}
The period we are dealing with (1922–1926) was marked by several striking achievements in this area, whose essential interest is not diminished even when viewed from the distant perspective of more than a half century. The first result to mention was the construction by Kolmogorov in 1923 [e3] of an \( L^{1} \) function whose Fourier series \eqref{eqnon} diverged almost everywhere.1 This construction made even more pressing the question of whether the Fourier series \eqref{eqnon} converges almost everywhere when (say) \( f \) belongs to \( L^{2} \), a problem that was not solved till more than forty years later. We shall turn to that in a moment, but now we point out that Kolmogorov’s example put into sharper relief the \( L^{2} \) results for general orthonormal developments that had been obtained (in 1922 and 1923) by Rademacher and Menshov. They showed that if \begin{equation} \sum|a_{n}|^{2}(\log n)^{2} < \infty \label{eqnth} \end{equation} then the series \eqref{eqntw} converges a.e.
Moreover the condition \eqref{eqnth} is best possible in the sense that if \( \{\lambda_{n}\} \) is monotonic and \[ \smash{\lambda_{n}}/\log n\rightarrow 0 ,\] then there exist an orthonormal system \( \{\phi_{n}\} \) and an expansion \eqref{eqntw} which diverges a.e., while \[ \sum \smash{|a_{n}|^{2}\lambda_{n}^{2}} < \infty .\]
For ordinary Fourier series it was proved2 that the condition \eqref{eqnth} could be relaxed and be replaced by \begin{equation} \sum_{-\infty}^{\infty}|a_{n}|^{2}\log(|n|+2) < \infty. \label{eqnfo} \end{equation}
This last result stood unsurpassed for forty years until Carleson in 1966 showed that indeed the Fourier series of an \( L^{2} \) function converged almost everywhere. It may be interesting to note here that the basic tools required for Carleson’s theorem — the properties of the Hilbert transform and their relation with partial sums of Fourier series — were first brought to light in this early period: Kolmogorov’s proof of the weak-type (1, 1) property in 1925; M. Riesz’s paper of 1927 [e12] containing the \( L^{p} \) inequalities for conjugate functions and partial sums; and Besicovitch’s work (in 1923 [e2] and 1926 [e6]) which began the development of “real-variable” methods for Hilbert transforms.
Against this background we can now state the idea of Kaczmarz and Zygmund. It asserts as a general principle that for an \( L^{2} \) orthonormal expansion (i.e., one where \( \sum |a_{n}|^{2} < \infty \)), at almost all points the summability of the series \( \sum a_{n}\phi_{n}(x) \) by one method has as a consequence the summability by any other method which is essentially stronger than convergence. A special (but typical) case is as follows:
Theorem 1. Suppose \( \sum|a_{n}|^{2} < \infty \). Then \( \sum a_{n}\phi_{n}(x) \) is Cesàro summable at almost each point \( x \) where it is Abel summable.
Recall that the series is Abel summable at \( x \) if \[ \lim_{r\rightarrow 1^{-}}\sum a_{n}r^{n}\phi_{n}(x) \quad\text{exists}. \] In addition, setting \begin{align*} s_{n} &=\sum_{k=0}^{n}a_{k}\phi_{k} \quad\text{and}\\ \sigma_{n} & =(s_{0}+s_{1}+\cdots+s_{n-1})/n , \end{align*} the Cesàro summability at \( x \) means the existence of the limit \[ \lim_{n\rightarrow\infty}\sigma_{n}(x) .\]
If a series is Cesàro summable it is automatically Abel summable (an exercise!), but the converse is in general not true. To gain a better idea of the scope of Theorem 1 let us point out that \[ \sigma_{n}(x) =\sum_{k=0}^{n}\Bigl(1-\frac kn\Bigr)a_{k}\phi_{k}(x) \] and a result similar to Theorem 1 holds when \( \sigma_{n}(x) \) is replaced by \[ \sigma_{n}^{\epsilon}(x) = \sum_{k=0}^{n}\Bigl(1-\frac kn\Bigr)^{\epsilon}a_{k}\phi_{k}(x) \quad\text{with }\epsilon > 0 \] (which corresponds essentially to \( (C, \epsilon) \) summability), but not for \( \epsilon=0 \) which of course would give the usual convergence.
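The formula for \( \sigma_{n} \) is a matter of interchanging the order of summation in the definition. Written out, using only the partial sums \( s_{j} \) defined above, the bookkeeping is \begin{align*} \sigma_{n} &=\frac{1}{n}\sum_{j=0}^{n-1}s_{j} =\frac{1}{n}\sum_{j=0}^{n-1}\sum_{k=0}^{j}a_{k}\phi_{k}\\ &=\frac{1}{n}\sum_{k=0}^{n-1}(n-k)\,a_{k}\phi_{k} =\sum_{k=0}^{n-1}\Bigl(1-\frac kn\Bigr)a_{k}\phi_{k}, \end{align*} since each index \( k \) is counted once for every \( j \) with \( k\leq j\leq n-1 \). The term \( k=n \) carries the factor \( 1-n/n=0 \), so it makes no difference whether the sum is stopped at \( n-1 \) or at \( n \).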
For the proof of Theorem 1 Kaczmarz and Zygmund used a square function which they introduced for this purpose, namely \begin{equation} K(f)=\biggl(\sum_{n=2}^{\infty}n|\sigma_{n}-\sigma_{n-1}|^{2}\biggr)^{\mkern-2mu{1/2}} \label{eqnfi} \end{equation} with \( f\sim\sum a_{n}\phi_{n} \). The basic fact was the \( L^{2} \) inequality.
Lemma. \[ \|K(f)\|_{L^{2}} \leq C \|f\|_{L^{2}}. \]
Clearly \[ \sigma_{n}-\sigma_{n-1}=\frac{1}{n(n-1)}\sum_{k=0}^{n-1}ka_{k}\phi_{k},\] so \[ \|\sigma_{n}-\sigma_{n-1}\|_{2}^{2}\leq\frac{c}{n^{4}}\sum_{k < n}k^{2}|a_{k}|^{2}, \quad n\geq 2, \] and thus \[ \sum_{2}^{\infty}n\|\sigma_{n}-\sigma_{n-1}\|_{2}^{2}\leq c^{\prime}\sum|a_{k}|^{2}=c^{\prime}\|f\|_{2}^{2} ,\] which proves the lemma.
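The last step rests on an interchange of summation that is perhaps worth writing out. As a sketch (the constants are not optimized), note that for each fixed \( k\geq 1 \) one has \( \sum_{n > k}n^{-3}\leq\int_{k}^{\infty}t^{-3}\,dt=1/(2k^{2}) \), and hence \[ \sum_{n=2}^{\infty}n\cdot\frac{c}{n^{4}}\sum_{k < n}k^{2}|a_{k}|^{2} =c\sum_{k\geq 1}k^{2}|a_{k}|^{2}\sum_{n > k}\frac{1}{n^{3}} \leq\frac{c}{2}\sum_{k\geq 1}|a_{k}|^{2}. \] The weight \( n \) in the definition of \( K(f) \) is thus exactly what the decay \( n^{-4} \) can absorb: the inner sum over \( n \) contributes a factor \( k^{-2} \) that cancels the growth \( k^{2} \).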
To prove the theorem one invokes a variant of the classical Tauberian argument, namely, if \( \smash{\sum A_{n}} \) is Abel summable and \( \sum nA_{n}^{2} < \infty \), then \( \sum A_{n} \) converges. Now set \( A_{n}=\sigma_{n}-\sigma_{n-1} \); then the Abel summability of \( \sum A_{n} \) follows from the corresponding Abel summability of \( \sum a_{n}\phi_{n} \). The Tauberian condition holds at almost all points because of the lemma, and hence one obtains a.e. the convergence of \( \sum(\sigma_{n}-\sigma_{n-1}) \), proving the theorem.
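The two ingredients of this argument can be made explicit. By the lemma \( K(f)\in L^{2} \), so \( K(f)(x) < \infty \) for almost every \( x \); at such a point, with \( A_{n}=\sigma_{n}(x)-\sigma_{n-1}(x) \), \[ \sum_{n=2}^{\infty}n|A_{n}|^{2}=K(f)(x)^{2} < \infty \qquad\text{and}\qquad \sum_{n=2}^{N}A_{n}=\sigma_{N}(x)-\sigma_{1}(x). \] The first relation is the Tauberian condition, and the second (a telescoping sum) shows that convergence of \( \sum A_{n} \) is the same thing as the existence of \( \lim_{N\rightarrow\infty}\sigma_{N}(x) \), that is, Cesàro summability at \( x \).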
We have seen the first example of a square function, namely \eqref{eqnfi}. While here it plays a minor role, its basic character is already revealed: Because of the agility of its quadratic nature it can exploit easily any situation in which orthogonality might be important.
Second period (1931–1938): Littlewood and Paley
Our scene shifts now from the Continent to England, and to the work of Littlewood and Paley. Our attention will be focused on two important series of connected papers: three jointly by Littlewood and Paley 1931–1938 [e14], [e17], [e19], and two by Paley 1932 [e15], [e16]. The investigations described in these papers were initiated simultaneously (the first paper in each series was submitted in April 1931), but because of Paley’s death in 1933 the final versions of several of the papers were probably Littlewood’s work alone. It is also interesting to note that no reference is made in these papers to the results described above, and so it is a reasonable guess that they were not aware of the possible relevance of the ideas of Kaczmarz and Zygmund.
The main theme of the Littlewood–Paley work was to consider the “dyadic decomposition” of Fourier series, namely \[ f(\theta)=\sum_{k=0}^{\infty}\Delta_{k}(\theta), \] with \begin{align*} & \Delta_{k}(\theta) =\sum_{2^{k-1}\leq|n| < 2^{k}}a_{n}e^{in \theta}, \quad k\geq1;\\ & \Delta_{0} =a_{0}. \end{align*}
Their basic result was that the \( L^{p} \) norm of a function is equivalent to the \( L^{p} \) norm of the square function associated with its dyadic decomposition.
Theorem 2. For \( 1 < p < \infty \), \[ \biggl\Vert\Bigl(\sum_{k=0}^{\infty}|\Delta_{k}(\theta)|^{2}\Bigr)^{1/2}\biggr\Vert_{p}\simeq\|f\|_{p}. \]
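For \( p=2 \) the statement is nothing more than Parseval's identity, because the blocks \( \Delta_{k} \) involve disjoint sets of frequencies and are therefore mutually orthogonal. With the normalization of the coefficients \( a_{n} \) used above, and with norms taken in \( L^{2}(0,2\pi) \) with respect to \( d\theta \), \[ \biggl\Vert\Bigl(\sum_{k=0}^{\infty}|\Delta_{k}|^{2}\Bigr)^{1/2}\biggr\Vert_{2}^{2} =\sum_{k=0}^{\infty}\|\Delta_{k}\|_{2}^{2} =2\pi\sum_{n=-\infty}^{\infty}|a_{n}|^{2} =\|f\|_{2}^{2}. \] The substance of the result lies in the range \( p\neq 2 \), where this orthogonality argument is not available by itself.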
To prove this theorem they needed and thus formulated an “abelian” analogue, where partial sums are replaced by Abel means, i.e., by the Poisson integral \( u(r, \theta) \) of \( f \). Thus given \( f \), let \( \Phi \) be the holomorphic function in the unit disc with \( \operatorname{Re}(\Phi)=u \), and \( \operatorname{Im}(\Phi(0))=0 \). They defined another square function, the “\( g \)-function” of \( f \), by \[ g(f)(\theta)=\biggl(\int_{0}^{1}(1-r)\bigl|\Phi^{\prime}(re^{i\theta})\bigr|^{2}\,dr\biggr)^{1/2} \] and proved the following
Theorem 3. With \( 1 < p < \infty \) \begin{equation} \|g(f)\|_{p}\simeq\|f\|_{p} \quad\textit{if } a_{0}=0. \label{eqnsi} \end{equation}
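Here too the case \( p=2 \) can be checked by hand. As a sketch, assume \( f \) is real-valued with \( a_{0}=0 \) and write \( \Phi(z)=\sum_{n\geq 1}c_{n}z^{n} \) (the coefficients \( c_{n} \) are notation introduced only for this computation). Parseval's identity on each circle \( |z|=r \), together with \( \int_{0}^{1}(1-r)r^{2n-2}\,dr=\frac{1}{2n(2n-1)} \), gives \begin{align*} \int_{0}^{2\pi}g(f)(\theta)^{2}\,d\theta &=2\pi\sum_{n\geq 1}n^{2}|c_{n}|^{2}\int_{0}^{1}(1-r)r^{2n-2}\,dr\\ &=2\pi\sum_{n\geq 1}\frac{n}{2(2n-1)}\,|c_{n}|^{2}. \end{align*} Since \( \frac14 < \frac{n}{2(2n-1)}\leq\frac12 \) for \( n\geq 1 \), the right-hand side is comparable to \( 2\pi\sum_{n\geq 1}|c_{n}|^{2}=\int_{0}^{2\pi}|\Phi(e^{i\theta})|^{2}\,d\theta \), which under the stated assumptions equals \( 2\|f\|_{2}^{2} \).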
Paley sought a better understanding of the nature of these problems by considering variants of Theorem 2 where the Fourier series expansion is replaced by the Walsh–Paley expansion. The Walsh–Paley functions (called Walsh–Kaczmarz functions at that time) are now usually described as follows. We identify the interval \( [0,1] \) with the compact group consisting of an infinite product of copies of the two-element group (via the usual binary expansion). The characters of that group are the Walsh–Paley functions. Writing each integer as a sum of powers of 2 gives a natural enumeration of the characters \( \{\phi_{n}\}_{n=0}^{\infty} \). If we set \begin{align*} & f\sim\sum a_{n}\phi_{n} \quad\text{and}\\ & \Delta_{k}=s_{2^{k}}-s_{2^{k-1}}=\sum_{2^{k-1} < n\leq 2^{k}}a_{n}\phi_{n} \quad\text{with}\\ & \Delta_{0}=a_{0}, \end{align*} then Paley’s theorem reads as
Theorem 4. For the Walsh–Paley series, with \( 1 < p < \infty \) \[ \biggl\Vert\Bigl(\sum|\Delta_{k}|^{2}\Bigr)^{1/2}\biggr\Vert_{p}\simeq\|f\|_{p}. \]
What makes the proof of Theorem 4 easier than that of Theorem 2 are the various simplifications inherent in the fact that \( \{s_{2^{k}}(f)\} \) is a martingale sequence. The name “martingale” had not yet been coined. Moreover, a systematic extension of Theorem 4 from the point of view of martingales, and its further exploration in the magical world of Brownian motion, came much later, as we shall see. However in Paley’s time some of the arguments typical of martingale theory were already understood. Thus it had been observed that \( s_{2^{k}}(f) \) was constant on each of the \( 2^{k} \) intervals (of length \( 2^{-k} \)) of the form \[ \bigl((l-1)/2^{k}, \ l/2^{k}\bigr) , \qquad l=1,\ldots,2^{k}, \] and that the value of \( s_{2^{k}}(f) \) on each of these intervals was the mean-value of \( f \) there. From this it is obvious that when \( f\in L^{p} \), \( 1\leq p\leq\infty \), the sequence \( \{s_{2^{k}}(f)\} \) is bounded in \( L^{p} \) norm; the analogue for Fourier series is definitely nonobvious when \( 1 < p < \infty \), and in fact false when \( p=1 \) or \( p=\infty \).
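The boundedness just mentioned follows from the averaging description by Jensen's (or Hölder's) inequality. As a short sketch, let \( I \) denote one of the intervals above, so that \( |I|=2^{-k} \) and \( s_{2^{k}}(f)=|I|^{-1}\int_{I}f \) on \( I \). Then for \( 1\leq p < \infty \), \[ \int_{I}|s_{2^{k}}(f)|^{p}\,dx =|I|\,\biggl|\frac{1}{|I|}\int_{I}f\,dx\biggr|^{p} \leq\int_{I}|f|^{p}\,dx, \] and summing over the \( 2^{k} \) intervals gives \( \|s_{2^{k}}(f)\|_{p}\leq\|f\|_{p} \); the case \( p=\infty \) is immediate. No comparable one-line argument is available for the partial sums of a Fourier series, which are not averages of \( f \) against a positive kernel.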
We shall now describe the main device Paley used in his proof of Theorem 4. Paley was, from what one can learn about his life, a man of courage and almost reckless daring. A hint of that spirit can be found in his approach to difficult mathematical problems. When faced by the proof of an inequality like \begin{equation} \int\Bigl(\sum|\Delta_{k}|^{2}\Bigr)^{p/2}\,dx\leq A_{p}^{p}\int|f|^{p}\,dx \label{eqnse} \end{equation} where \( p \) is e.g. an even integer \( 2r \), he instinctively sought to face the problem head-on by multiplying out the \( r \) infinite sums, and then coming to grips directly with the resulting multitude of terms. This kind of audacious attack is not so common in our time when it is easier to rely on a variety of sophisticated gadgets which are household items for the working analyst. But given Paley’s resourcefulness this approach worked marvelously well. His key observation was that \begin{equation} \sum_{i_{r}}\int\Delta_{i_{1}}^{2}\Delta_{i_{2}}^{2}\cdots\Delta_{i_{r}}^{2}\,dx \leq \int\Delta_{i_{1}}^{2}\cdots\Delta_{i_{r-1}}^{2}f^{2}\,dx \label{eqnei} \end{equation} where the summation is taken over those \( i_{r} \) for which \( i_{r} > \max(i_{1},\ldots,i_{r-1}) \), which in turn follows from the martingale property that \begin{equation} \int g(x)\Delta_{k}(x)\,dx=0 \label{eqnni} \end{equation} whenever \( g \) is “measurable with respect to the past”. From \eqref{eqnei} Paley was able to achieve the proof of \eqref{eqnse} in a few strokes.
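One way to see how \eqref{eqnei} follows from \eqref{eqnni} is sketched below; this is a reconstruction under simplifying assumptions (\( f \) real-valued and bounded), not necessarily Paley's own presentation. Put \( m=\max(i_{1},\ldots,i_{r-1}) \) and \( G=\Delta_{i_{1}}^{2}\cdots\Delta_{i_{r-1}}^{2} \) (both symbols are introduced only for this sketch), so that \( G\geq 0 \) is constant on the dyadic intervals of length \( 2^{-m} \). For \( m < i < j \) the function \( G\Delta_{i} \) is measurable with respect to the past of \( \Delta_{j} \), so the cross terms \( \int G\Delta_{i}\Delta_{j}\,dx \) vanish by \eqref{eqnni}, and likewise \( \int G\,s_{2^{m}}(f-s_{2^{m}})\,dx=0 \). Since \( f-s_{2^{m}}=\sum_{i > m}\Delta_{i} \), \begin{align*} \sum_{i > m}\int G\,\Delta_{i}^{2}\,dx &=\int G\,(f-s_{2^{m}})^{2}\,dx\\ &=\int G f^{2}\,dx-\int G\,s_{2^{m}}^{2}\,dx \leq\int G f^{2}\,dx, \end{align*} which is \eqref{eqnei}.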
The same idea inspired Littlewood and Paley’s proof of Theorem 3, although the execution is more complicated; a more recondite form of \eqref{eqnei} must be proved, and here nothing as simple as \eqref{eqnni} holds. The appropriate substitute must be fashioned with care out of Green’s theorem in conjunction with the identity \[ \Delta(|\Phi|^{2})=4|\Phi^{\prime}|^{2} .\] With Theorem 3 proved, Littlewood and Paley were able to deduce Theorem 2, but here also the steps required were not easy. It was only after their theory was reexamined by Zygmund and his student Marcinkiewicz that a clearer and broader view of the whole subject began to emerge. To this we shall now turn.
Third period (1938–1945): Marcinkiewicz and Zygmund
There are two significant events that marked the period we are now concerned with. The first, which even predated the Littlewood–Paley collaboration, was the introduction by Lusin in 1930 [e13] of his “area integral”. The idea of Lusin seems to have sparked no further interest until Marcinkiewicz and Zygmund took up the subject again about 8 years later. There began a brief but very creative period of work by them — a flowering of the theory where connections with a variety of other ideas were brought to light. The second event, a tragic one, followed soon thereafter with the death of Marcinkiewicz in 1940, and it was left to Zygmund alone to resolve some of the issues that their work had led them to.
It may help to clarify the description of the principal ideas that Marcinkiewicz and Zygmund contributed to the study of square functions if we organize our presentation in terms of the four main lines along which their work proceeded.
The first subject we shall treat (and the only one that was, strictly speaking, joint work) deals with the area integral of Lusin. The definition of this is as follows. Suppose \( \Phi(z) \) is holomorphic in the unit disc and define \( A(\Phi)(\theta) \) by \begin{equation} (A(\Phi)(\theta))^{2}=\int_{\Gamma(\theta)}|\Phi^{\prime}(z)|^{2}\,dx\,dy \label{eqnonze} \end{equation} with \( \Gamma(\theta) \) a standard “triangle” (nontangential approach region) in the unit disc with vertex at \( e^{i\theta} \). Observe that the expression represents the area of the image of \( \Gamma(\theta) \) under the mapping \( z\rightarrow\Phi(z) \), with points counted according to their multiplicity. Lusin’s discovery was that if \( \Phi \) is bounded, then \( A(\Phi)(\theta) \) is finite for almost any \( \theta \); more generally that \begin{equation} \|A(\Phi)(\theta)\|_{2}\simeq\|\Phi\|_{2} \quad\text{if } \Phi(0)=0. \label{eqnonon} \end{equation}
Marcinkiewicz and Zygmund realized that on the one hand there was a close analogy between the Littlewood–Paley \( g \)-function and \( A(\Phi) \) (in fact \( A \) is a pointwise majorant of \( g \), and the same kind of \( L^{p} \) inequalities held for \( A \) as for \( g \)); but on the other hand they surmised that the parallel between these two square functions should not be pushed too far. The main result they obtained for \( A \) was a localized version of Lusin’s result. This can be stated as follows. Let \[ \Phi^{*}(\theta)=\sup_{z\in\Gamma(\theta)}|\Phi(z)| .\]
Theorem 5a. If \( \Phi \) is holomorphic in the unit disc, then for almost every \( \theta \), \( \Phi^{*}(\theta) < \infty \) implies \( A(\Phi)(\theta) < \infty \).
The converse was proved five years later by Spencer,3 namely
Theorem 5b. If \( \Phi \) is holomorphic in the unit disc, then for almost every \( \theta \), \( A(\Phi)(\theta) < \infty \) implies \( \Phi^{*}(\theta) < \infty \).
A corresponding converse for \( g \)-functions is false, and so the area integral \( A \) has some special affinities with the boundary behavior of \( \Phi \), going beyond what it shares with \( g \).
The second line of investigation was Zygmund’s reexamination of the Littlewood–Paley theorem for the dyadic decomposition of Fourier series. His analysis led him to recast and simplify the ideas of the proof. These simplifications had important consequences for later work, as we shall see; but their immediate interest was that they allowed him to connect the square function \( \bigl(\sum|\Delta_{k}|^{2}\bigr)^{1/2} \) with the one he and Kaczmarz had considered a dozen years earlier in their study of summability of orthogonal series (see \eqref{eqnfi}). Suppose that \( f\in L^{p} \) has a Fourier expansion of the form \[ f(\theta)\sim\sum_{n\geq 0}a_{n}e^{in \theta} ,\] so that \( f\in H^{p} \). If we write as before \[ K(f)(\theta)=\biggl(\sum_{n\geq 1}n \bigl|\sigma_{n}(\theta)-\sigma_{n-1}(\theta)\bigr|^{2}\biggr)^{1/2} \] where \[ \sigma_{n}(\theta)=\sum_{0\leq k < n}\Bigl(1-\frac kn\Bigr)a_{k}e^{ik\theta} ,\] then we can state the following theorem:
Theorem 6. \( \|K(f)\|_{p}\leq A_{p}\|f\|_{p},\ 1 < p < \infty \).4
The proof of this theorem required two steps. First, like that of Theorem 2, one needed the \( L^{p} \) inequalities for the \( g \)-function (see \eqref{eqnsi}). Here the major simplification was made by Zygmund some years later5 and it came in the proof of the fact that \[ \|g(f)\|_{p}\leq A_{p}\|f\|_{p} ,\] when \( p > 2 \). (The case \( p=2 \) was easy, and the range \( p < 2 \) was reducible to \( p=2 \) by the artifice standard in those days of using Blaschke product decompositions for \( H^{p} \) functions.) For the difficult case \( p > 2 \) a “square duality” was used. An ingenious argument shows that whenever \( \phi \geq 0 \), \begin{equation} \int g(f)^{2}\phi\,d\theta \leq c \biggl\{\int g(f)g(\phi)M(f)\,d\theta + \int|f|^{2}\phi\,d\theta\biggr\} \label{eqnontw} \end{equation} where \( M \) is the Hardy–Littlewood maximal function. For \( p\geq 4 \), \eqref{eqnontw} then gives the desired result as a consequence of the case \( p\leq 2 \) applied to \( g(\phi) \). Incidentally, the notion of square duality which seems to have originated in this context continues to find other applications of interest.
The second simplification Zygmund made was in the manner in which one could reduce the \( L^{p} \) control of \( \bigl(\sum|\Delta_{k}|^{2}\bigr)^{1/2} \) to that of the \( g \)-function; and in fact a whole list of other square functions (in particular, \( \bigl(\sum n|\sigma_{n}-\sigma_{n-1}|^{2}\bigr)^{1/2} \)) could be handled in the same way.6 The streamlined proof he found can be said to have led directly to the “Marcinkiewicz multiplier theorem”.
In its one-dimensional form the celebrated theorem that bears Marcinkiewicz’s name can be stated as follows. Suppose we consider a transformation \( T \) given by a multiplier sequence \( \{\lambda_{n}\}_{-\infty}^{\infty} \), defined by \[ Tf\sim\sum\lambda_{n}a_{n}e^{in \theta} \quad\text{ whenever } f\sim\sum a_{n}e^{in \theta}. \] Then \( T \) is bounded on \( L^{p},\ 1 < p < \infty \), if (i) the sequence \( \{\lambda_{n}\} \) is bounded, and (ii) if it varies boundedly over each dyadic block; more precisely, \[ \sum_{2^{k}\leq|j| < 2^{k+1}} |\lambda_{j}-\lambda_{j-1}|\leq M .\] (Note that the special case when the sequence is constant on each dyadic block is an immediate consequence of Theorem 2.) In one dimension the theorem’s greatest merit is, I believe, in its formulation rather than its proof; the latter is much the same as that of Theorem 6.
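The parenthetical remark about multipliers constant on dyadic blocks can be made explicit. Suppose, as an illustrative assumption, that \( \lambda_{n}=\mu_{k} \) for every \( n \) in the \( k \)-th dyadic block of Theorem 2 (the symbols \( \mu_{k} \) are introduced here only for this sketch). Then \( \Delta_{k}(Tf)=\mu_{k}\Delta_{k}(f) \), and applying Theorem 2 once to \( Tf \) and once to \( f \) gives \begin{align*} \|Tf\|_{p} &\leq C_{p}\biggl\Vert\Bigl(\sum_{k}|\mu_{k}\Delta_{k}(f)|^{2}\Bigr)^{1/2}\biggr\Vert_{p}\\ &\leq C_{p}\sup_{k}|\mu_{k}|\,\biggl\Vert\Bigl(\sum_{k}|\Delta_{k}(f)|^{2}\Bigr)^{1/2}\biggr\Vert_{p} \leq C_{p}^{\prime}\sup_{k}|\mu_{k}|\,\|f\|_{p}. \end{align*} The general theorem allows \( \lambda_{n} \) to vary inside each block, at the price of the bounded-variation hypothesis (ii).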
It is in the passage to higher dimensions, however, that one finds the great significance of Marcinkiewicz’s work on multipliers. Its importance was not only the fact that one could use hitherto one-dimensional methods to prove \( n \)-dimensional results; even more profound were the applications to other questions, such as estimates for partial differential equations, already envisaged at that time. We can now see in retrospect that Marcinkiewicz thus anticipated some of the basic inequalities later proved by the theory of singular integrals.7 For simplicity of notation we shall state the Marcinkiewicz multiplier theorem in the case of two dimensions. Consider the multiplier operator \( T \) given by \[ Tf\sim\sum\lambda_{nm}a_{nm}e^{i(n\theta+m\phi)} \quad\text{for } f\sim\sum a_{nm}e^{i(n\theta+m\phi)} .\] Let \( I_{k} \) denote the dyadic interval \[ \{n\mid 2^{k-1}\leq|n| < 2^{k}\} \quad\text{and}\quad J_{l}=\{m\mid 2^{l-1}\leq|m| < 2^{l}\} .\] Write \begin{align*} & \Delta_{1}\lambda_{n,m}=\lambda_{n+1,m}-\lambda_{n,m},\\ & \Delta_{2}\lambda_{n,m}=\lambda_{n,m+1}-\lambda_{n,m}, \text{ and}\\ & \Delta_{1,2}=\Delta_{1}\cdot\Delta_{2}. \end{align*} Now assume the finiteness of the following four quantities:
\( \sup_{n,m}|\lambda_{n,m}| \);
\( \sup_{k,m}\sum_{n\in I_{k}}|\Delta_{1}\lambda_{n,m}| \), and \( \sup_{n,l}\sum_{m\in J_{l}}|\Delta_{2}\lambda_{n,m}| \); and
\( \sup_{k,l}\sum_{n\in I_{k}}\sum_{m\in J_{l}}|\Delta_{1}\Delta_{2}\lambda_{n,m}| \).
Theorem 7. Under the assumptions made above, \( T \) is bounded on \( L^{p},\ 1 < p < \infty \).
The last of the four major lines of investigation concerning square functions that Marcinkiewicz and Zygmund undertook dealt with the attempt to find a completely “real-variable” analogue of the functions of Lusin and Littlewood–Paley. Starting with a function \( f \) on the circle, the area integral and \( g \)-functions are defined in terms of holomorphic (or harmonic) functions whose boundary values are related to \( f \). Also the dyadic square function of Theorem 2 requires the Fourier expansion of \( f \). What was desired was a variant that could be defined more directly in terms of the basic real-variable operations such as integration, differentiation, etc.
After some experimentation Marcinkiewicz hit upon the idea of considering \begin{equation} \mu(F)(x)=\biggl(\int_{0}^{\pi}\bigl|F(x+t)+F(x-t)-2F(x)\bigr|^{2} \frac{dt}{t^{3}}\biggr)^{1/2} \label{eqnonth} \end{equation} with \[ F(x)=\int^{x}f(t)\,dt. \]
It was not difficult to see that \[ \|\mu(F)\|_{L^{2}}\simeq\|f\|_{L^{2}} \quad\text{ if }\,\int_{0}^{2\pi}f(x)\,dx =0. \] With this, and using the real-variable tools he had already developed, he was able to prove the analogue of the theorem he and Zygmund had found for the area integral (Theorem 5a). The result was as follows.
Theorem 8a. Suppose \( F\in L^{2} \). If \( F^{\prime}(x) \) exists in a set \( E \), then \( \mu(F)(x) < \infty \) for almost every \( x \in E \).
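The \( L^{2} \) equivalence for \( \mu \) noted before this result can be verified on the Fourier side. As a sketch, assume \( f\sim\sum_{n\neq 0}a_{n}e^{inx} \), so that the mean of \( f \) vanishes and \( F \) is periodic with coefficients \( a_{n}/(in) \) for \( n\neq 0 \). Since \( e^{int}+e^{-int}-2=-4\sin^{2}(nt/2) \), Parseval's identity followed by the substitution \( s=|n|t \) gives \begin{align*} \int_{0}^{2\pi}\mu(F)(x)^{2}\,dx &=2\pi\sum_{n\neq 0}\frac{|a_{n}|^{2}}{n^{2}}\int_{0}^{\pi}16\sin^{4}(nt/2)\,\frac{dt}{t^{3}}\\ &=2\pi\sum_{n\neq 0}|a_{n}|^{2}\int_{0}^{|n|\pi}16\sin^{4}(s/2)\,\frac{ds}{s^{3}}. \end{align*} The inner integral lies between its value for \( |n|=1 \) and the finite integral over \( (0,\infty) \) (the integrand behaves like \( s \) near \( 0 \) and is \( O(s^{-3}) \) at infinity), so the right-hand side is comparable to \( 2\pi\sum_{n\neq 0}|a_{n}|^{2}=\|f\|_{L^{2}}^{2} \). The weight \( dt/t^{3} \) is thus precisely the one that makes the expression scale-invariant in \( n \).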
The questions that arose were first, whether some of the other properties of the area integral or \( g \)-function held as well for \( \mu \); and, more interestingly, what was the real significance of the Marcinkiewicz function. Zygmund found an answer to the first question in 1944 [5] when he proved
Theorem 8b. For \( 1 < p < \infty \), \[ \|\mu(F)\|_{L^{p}}\simeq\|f\|_{L^{p}} \quad\text{ if }\, \int_{0}^{2\pi}f(x)\,dx=0. \]
The argument he developed to show this was not an easy one. He was required to invoke the most arcane of the square functions, the function \( g^{*} \), which Littlewood and Paley had also studied. He established the \( L^{p} \) inequalities for it and showed that it actually was a majorant of the Marcinkiewicz function. Incidentally \( g^{*} \) is defined by \[ (g^{*}(\Phi)(\theta))^{2}=\int_{0}^{1}\int_{0}^{2\pi}\bigl|\Phi^{\prime}(r e^{i(\theta+\phi)})\bigr|^{2}\bigg|\frac{1-r}{1-r e^{i\phi}}\bigg|^{2}\,d\phi\, dr, \] and so it also majorizes the area integral \eqref{eqnonze}, but it takes into account “the tangential” approach to the boundary.8 The problem that remained was to discover whether there was a converse to the local result given by Theorem 8a, or to put the question more broadly, to find the meaning of the Marcinkiewicz function. It was to be almost twenty more years before an answer to that question would be found.
Fourth period (1950–1964): Zygmund and his students
Starting about 1950 a new direction of considerable importance began to emerge in force. Hinted at in earlier work (of Besicovitch and Marcinkiewicz, among others), its thrust was the development of “real-variable” methods to replace complex function theory — that favored ally of one-dimensional Fourier analysis. What made this new emphasis particularly timely, in fact indispensable, was that only with techniques coming from real-variable theory could one hope to come to grips with many interesting \( n \)-dimensional analogues of the one-dimensional theory.
The mathematician animating this development was Antoni Zygmund. In many ways he set the broad outlines of the effort, he mastered by his work some of the crucial difficulties, and was throughout the source of inspiration for his students and collaborators.
a: The area integral
A pioneering result in this new direction was Calderón’s extension to \( \mathbf{R}^{n} \) of the theorem of Marcinkiewicz and Zygmund concerning the area integral, a subject he had taken up at the suggestion of Zygmund. The setting for this is as follows. We let \[ \mathbf{R}_{+}^{n+1}=\{(x, y), x=(x_{1},\ldots,x_{n})\in \mathbf{R}^{n}, y\in \mathbf{R}^{+}\} \] be the upper half-space, and suppose that \( u(x, y) \) is harmonic (with respect to the \( n+1 \) variables \( x_{1},\ldots,x_{n}, y \)). Sometimes we shall assume that \( u \) is in fact the Poisson integral of an appropriate function \( f \) defined on \( \mathbf{R}^{n} \), and then we shall write \( u=\operatorname{PI}(f) \). We let \( \Gamma=\{(x, y), |x| < y\} \) be a standard cone with vertex at the origin, \( \Gamma^{\prime} \) its truncated version, \( \Gamma^{\prime}=\Gamma\cap\{y < 1\} \). For any \( \bar{x}\in \mathbf{R}^{n} \), \( \Gamma(\bar{x}) \) and \( \Gamma^{\prime}(\bar{x}) \) will be the corresponding cones with vertices at \( \bar{x} \). The area integral of \( u \) is defined by \begin{equation} (A(u)(\bar{x}))^{2}=\int_{\Gamma(\bar{x})}|\nabla u|^{2}y^{1-n}\,dx\, dy \label{eqnonfo} \end{equation} where \( |\nabla u|^{2}=|\partial u/\partial y|^{2}+\sum_{j=1}^{n}|\partial u/\partial x_{j}|^{2} \).
Similarly for the local theory one needs the analogue of \eqref{eqnonfo} where \( \Gamma(\bar{x}) \) is replaced by \( \Gamma^{\prime}(\bar{x}) \); this defines \( A_{\mathrm{loc}}(u)(\bar{x}) \). The maximal function \( u^{*} \) is defined by \[ u^{*}(\bar{x})=\sup_{(x,y)\in\Gamma(\bar{x})}|u(x, y)| ,\] and its local analogue \( u_{\mathrm{loc}}^{*} \) is given by replacing \( \Gamma(\bar{x}) \) by \( \Gamma^{\prime}(\bar{x}) \) in the definition.
Theorem 9a. Suppose \( u \) is harmonic in \( \mathbf{R}_{+}^{n+1} \). Then \( A_{\mathrm{loc}}u(\bar{x}) < \infty \) at almost every point \( \bar{x}\in \mathbf{R}^{n} \) where \( u_{\mathrm{loc}}^{*}(\bar{x}) < \infty \).
Calderón’s proof of this theorem was published at the same time (1950) as another important result he found, namely the extension of Privalov’s theorem: \( u \) has a nontangential limit at almost every \( \bar{x}\in \mathbf{R}^{n} \) where \( u_{\mathrm{loc}}^{*}(\bar{x}) < \infty \). We shall discuss the ideas behind the proof of Theorem 9a later when we take up its converse. Now we turn to the “global” version, i.e., the higher-dimensional analogue of the Littlewood–Paley theorem (Theorem 3).
Theorem 9b. Suppose \( u= \operatorname{PI} (f) \), then \[ \|A(u)\|_{L^{p}}\simeq\|f\|_{L^{p}},\quad 1 < p < \infty. \]
It would be difficult after 25 years to recall the precise thoughts that motivated the proof of Theorem 9b, nor would it be easy now for one to appreciate the difficulties that seemed then to stand in the way. But I do remember that those of us who were graduate students of Zygmund in the middle 1950’s were shaped by the event, akin to the Creation, which appeared to some of us to be the beginning of everything important: the 1952 Acta paper, which developed, via the Calderón–Zygmund lemma, the real-variable methods giving the extension of the Hilbert transform to \( n \) dimensions. What was more natural, therefore, than to attempt to prove the \( L^{p} \) boundedness of \( f\rightarrow A(u) \) by adapting these methods? This idea indeed worked, although the initial complicated proofs were later much simplified. The analysis succeeded as well for the Marcinkiewicz function \eqref{eqnonth}, and proved also that the mappings \( f\rightarrow A(u) \) and \( f\rightarrow\mu(F) \) were of weak-type (1, 1).
We turn now to the proof of Theorem 9a. Its one-dimensional version (Theorem 5a) had been done by using complex function theory, in particular conformal mappings. So a completely different approach was needed. The idea behind it can be understood by examining the case \( p=2 \) of Theorem 9b, which has an easy proof. A direct calculation shows that \begin{equation} \int_{\mathbf{R}^{n}}A^{2}(u)\,dx=c\int_{\mathbf{R}_{+}^{n+1}}y|\nabla u|^{2}\,dx\,dy, \label{eqnonfi} \end{equation} where \( c \) is the volume of the unit ball. Next we can use the fact that \[ |\nabla u|^{2}=\tfrac{1}{2}\Delta(|u|^{2}) ,\] and so by Green’s theorem \begin{align*} \int_{\mathbf{R}^{n}}A^{2}(u)\,dx & =\frac{c}{2}\iint_{\mathbf{R}_{+}^{n+1}}y\Delta(|u|^{2})\,dx\,dy\\ & =\frac{c}{2}\int|u(x,0)|^{2}\,dx, \end{align*} which proves Theorem 9b for \( p=2 \), since \( u(x, 0)=f(x) \). Thus in order to control \( A_{\mathrm{loc}}(u)(x) \) on a set \( E \), it is natural to consider \[ \int_{E}A_{\mathrm{loc}}^{2}(u)(x)\,dx \] which in turn is dominated by \[ c \int_{R(E)} y|\nabla u|^{2}\,dx\,dy ,\] where \( R(E) \) is a standard “sawtooth” region in \( \mathbf{R}_{+}^{n+1} \) based on \( E \). At this stage (which is the turning point of the proof) Calderón invoked Green’s theorem for another region containing \( R(E) \), whose Green’s function he could essentially bound from below by \( c^{\prime}y \).
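The two steps of the \( p=2 \) computation above can be spelled out; what follows is a sketch that takes \( u \) real-valued and assumes enough decay at infinity to justify the integrations by parts. First, integrating \eqref{eqnonfo} in \( \bar{x} \) and using Fubini's theorem, \[ \int_{\mathbf{R}^{n}}A^{2}(u)(\bar{x})\,d\bar{x} =\int_{\mathbf{R}_{+}^{n+1}}|\nabla u(x,y)|^{2}\,y^{1-n}\,\bigl|\{\bar{x}:|x-\bar{x}| < y\}\bigr|\,dx\,dy =c\int_{\mathbf{R}_{+}^{n+1}}y|\nabla u|^{2}\,dx\,dy, \] because the ball \( \{\bar{x}:|x-\bar{x}| < y\} \) has measure \( cy^{n} \). Second, harmonicity gives \( \Delta(u^{2})=2u\,\Delta u+2|\nabla u|^{2}=2|\nabla u|^{2} \), and for \( G=u^{2} \) (a symbol used only here) two integrations by parts in \( y \) yield \[ \int_{0}^{\infty}y\,\frac{\partial^{2}G}{\partial y^{2}}\,dy =\Bigl[y\,\frac{\partial G}{\partial y}\Bigr]_{0}^{\infty}-\int_{0}^{\infty}\frac{\partial G}{\partial y}\,dy =G(x,0), \] while the \( x \)-derivatives in \( \Delta G \) integrate to zero over \( \mathbf{R}^{n} \). Together these give \( \iint y\,\Delta(u^{2})\,dx\,dy=\int u(x,0)^{2}\,dx \), which is the form of Green's theorem used in the text.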
To prove the converse of Theorem 9a along these lines appeared to require, among other things, appropriate bounds from above for Green’s function for such regions, and that seemed much beyond what could be done then.9 What turned out to be the right course of action was to finesse the problem of Green’s function and to proceed directly with estimates that followed from the finiteness of \[ \int_{R(E)}y|\nabla u|^{2}\,dx\,dy .\] These arguments also proved to be useful in other situations, as we shall see later. The result obtained was
Suppose \( u \) is harmonic in \( \mathbf{R}_{+}^{n+1} \). Then \( u_{\mathrm{loc}}^{*}(\bar{x}) < \infty \) for almost all points \( \bar{x}\in \mathbf{R}^{n} \) where \( A_{\mathrm{loc}}(u)(\bar{x}) < \infty \).
I remember quite vividly the excitement surrounding the events at the time of this work. It was March 1959, and I had returned to the University of Chicago the fall before. Frequently I met with my friends Guido Weiss and Mary Weiss, and together we often found ourselves in Zygmund’s office (Eckhart 309, two doors from mine). With our teacher our conversations ranged over a wide variety of topics (not all mathematical) and more than once the subject of square functions arose. When this happened the mood would change, if only slightly, as if in deference to their special status, and the enigma that surrounded them. I had an idea which seemed promising. But before we could see where it might lead came the spring break. Further work would have to be held in abeyance since we were each going our own ways: Zygmund travelled to Boston to visit Calderón; Guido and Mary Weiss, having borrowed my Chevrolet, drove to Virginia for a vacation trip; and I went to New York to be married.
b: The Marcinkiewicz function
Influenced by the renewed interest in area integrals, and encouraged by some recent work he had done with Mary Weiss,10 Zygmund returned to the study of the Marcinkiewicz integral \eqref{eqnonth} and the problem of finding a converse to Theorem 8a. He was convinced that now (more than 20 years after Marcinkiewicz’s original work) the time was ripe to see matters to a conclusion. He suggested to me that we work on the problem together, and of course I was very happy to accept his offer. For me this was a unique and rewarding collaboration — not just because of the special satisfaction one derives when accepted as an equal by one’s teacher — but also because as it turned out he did most of the work that really counted!
We realized first that Theorem 8a itself could be somewhat strengthened; what was required was the notion of the derivative \( F^{\prime}(x) \) existing (at \( x \)) “in the \( L^{2} \) sense”. Thus \( F^{\prime}(x) \) existed in this generalized sense if11 \begin{equation} \frac{1}{h}\int_{0}^{h}\bigg|\frac{F(x+t)-F(x)}{t}-F^{\prime}(x)\bigg|^{2}\,dt\rightarrow 0, \quad\text{ as }\, h\rightarrow 0. \label{eqnonsi} \end{equation}
The finer version of Theorem 8a was then: If \( F\in L^{2} \) had a derivative in the sense of \eqref{eqnonsi} at each \( x \in E \), then \( \mu(F)(x) < \infty \) for almost every \( x\in E \). It was in this form that one might seek a converse. The basic plan was to try to make matters turn on the analogous situation which held for the area integral, where one can pass from the finiteness of a quadratic expression to the existence of a limit. After a series of reductions we were able to show that at each point \( x \) where \( \mu(F)(x) < \infty \) one had \begin{equation} \int_{|t|\leq y}\bigg|\frac{\partial^{2}u}{\partial y^{2}}(x+t, y)+\frac{\partial^{2}u}{\partial y^{2}}(x-t, y)\bigg|^{2}\,dt\,dy < \infty \label{eqnonse} \end{equation} with \( u=\operatorname{PI}(F) \). On the other hand we could show (using Theorem 5b) that at almost every \( x \) where \begin{equation} \int_{|t|\leq y}\bigg|\frac{\partial^{2}u}{\partial y^{2}}(x+t, y)\bigg|^{2}\,dt\,dy < \infty \label{eqnonei} \end{equation} the conclusion \eqref{eqnonsi} actually held.
The basic difficulty, the passage from \eqref{eqnonse} to \eqref{eqnonei}, was overcome by Zygmund using a clever “desymmetrization” argument; several weeks later he presented me with an essentially final draft of the paper which he had typed himself!
There were several variants of the final result — involving extensions to \( n \)-dimensions, or higher derivatives, or even fractional derivatives. The simplest version, however, was the following:
Let \( F\in L^{2}(0,2\pi) \). Then the set of points \( x \) where \[ \int_{0}^{\pi}\bigl|F(x+t)+F(x-t)-2F(x)\bigr|^{2}\,dt/t^{3} < \infty, \] and the set of points where \( F^{\prime}(x) \) exists in the \( L^{2} \) sense (i.e., \eqref{eqnonsi}) differ by a set of measure zero.
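A hedged illustration of why the weight \( t^{-3} \) corresponds to exactly one derivative: consider the variant on the whole line, with the integral in \( t \) taken over \( (0,\infty) \) and with \( F, F^{\prime}\in L^{2}(\mathbf{R}) \) (these are assumptions made for the sketch, not the periodic setting of the theorem). Writing \( \widehat{F}(\xi)=\int F(x)e^{-ix\xi}\,dx \), the second difference \( F(x+t)+F(x-t)-2F(x) \) has Fourier transform \( -4\sin^{2}(t\xi/2)\widehat{F}(\xi) \), so by Plancherel’s theorem and the substitution \( s=t|\xi| \), \begin{align*} \int_{\mathbf{R}}\int_{0}^{\infty}\bigl|F(x+t)+F(x-t)-2F(x)\bigr|^{2}\,\frac{dt}{t^{3}}\,dx & =\frac{1}{2\pi}\int_{\mathbf{R}}|\widehat{F}(\xi)|^{2}\int_{0}^{\infty}16\sin^{4}(t\xi/2)\,\frac{dt}{t^{3}}\,d\xi\\ & =\frac{C}{2\pi}\int_{\mathbf{R}}\xi^{2}|\widehat{F}(\xi)|^{2}\,d\xi=C\|F^{\prime}\|_{2}^{2}, \end{align*} where \( C=16\int_{0}^{\infty}\sin^{4}(s/2)\,s^{-3}\,ds \) is finite, since the integrand is \( O(s) \) near \( 0 \) and \( O(s^{-3}) \) at infinity.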
Fifth period (1966–present): Further applications of square functions
We have traced the development of square functions from their beginnings to a stage where their nature was much better understood, in terms of a series of deep theorems that had been obtained. Yet it is only more recently that their central role in several fields of analysis has become more apparent. I shall try to describe this very briefly in terms of three specific areas: \( H^{p} \) spaces, symmetric diffusion semigroups, and differentiation theory in \( \mathbf{R}^{n} \).
a: \( H^{p} \) theory
Beginning in about 1966 two separate directions of research involving square functions were undertaken, and when brought together these ultimately led to a rich harvest in the theory of \( H^{p} \) spaces. The first started with Burkholder’s [e31] extension of Paley’s theorem (Theorem 4 for Walsh–Paley series) to general martingales. He observed that Paley’s argument extended to this general setting, but also found his own approach which was very different. He showed that if \[ E_{k}=E(\,\cdot\,\mid\mathcal{F}_{k}) \] are the conditional expectations for an increasing sequence of \( \sigma \)-fields \( \{\mathcal{F}_{k}\}_{k=0}^{\infty} \), then with \( E_{-1}(f)\equiv 0 \), \begin{equation} \biggl\Vert\Bigl(\sum_{k=0}^{\infty}\bigl|(E_{k}-E_{k-1})(f)\bigr|^{2}\Bigr)^{1/2}\biggr\Vert_p\simeq\lim_{k\rightarrow\infty}\|E_{k}(f)\|_{p}, \quad 1 < p < \infty. \label{eqnonni} \end{equation}
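For \( p=2 \) the equivalence \eqref{eqnonni} is an identity, and a short sketch may help fix ideas; the abbreviation \( d_{k}=(E_{k}-E_{k-1})(f) \) is introduced here only for convenience, and \( f\in L^{2} \) is assumed. If \( j < k \), then \( d_{j} \) is \( \mathcal{F}_{k-1} \)-measurable and \( E_{k-1}(d_{k})=E_{k-1}E_{k}(f)-E_{k-1}(f)=0 \), so \[ \int d_{j}\,\overline{d_{k}}=\int d_{j}\,\overline{E_{k-1}(d_{k})}=0. \] Since \( \sum_{k=0}^{N}d_{k}=E_{N}(f) \) by telescoping (recall \( E_{-1}(f)\equiv 0 \)), orthogonality gives \[ \biggl\Vert\Bigl(\sum_{k=0}^{\infty}|d_{k}|^{2}\Bigr)^{1/2}\biggr\Vert_{2}^{2}=\sum_{k=0}^{\infty}\|d_{k}\|_{2}^{2}=\lim_{N\rightarrow\infty}\|E_{N}(f)\|_{2}^{2}. \]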
Next, in work with Gundy, and later also with Silverstein, the following advances were made:12 It was shown that \eqref{eqnonni} extended to \( p\leq 1 \) if \( \lim_{k\rightarrow\infty}\|E_{k}(f)\|_{p} \) was replaced with \( \|\sup_{k}|E_{k}(f)|\|_{p} \), for a large class of martingales. This class incidentally includes those occurring for the Walsh–Paley series, but more importantly these results went over to the (continuous parameter) martingales arising from Brownian motion applied to harmonic functions. To be more precise, let \( z_{t}(\omega) \) denote the standard Brownian motion in the complex \( z \)-plane, starting at the origin and stopped when reaching the unit circle. Here \( 0\leq t < \infty \) is the time parameter, and \( \omega \) labels the Brownian path, with \( \omega \in\Omega \), \( \Omega \) being the probability space. If \( u \) is harmonic in the unit disc, \( t\rightarrow u(z_{t}(\omega)) \) is a continuous-time martingale. Let \[ M_{B}(u)(\omega)=\sup_{0\leq t < \infty} |u(z_{t}(\omega))| \] be the Brownian maximal function, and \( S(u)(\omega) \) the martingale square function, \[ S(u)(\omega)=\biggl(\int_{0}^{\infty}|\nabla u(z_{t}(\omega))|^{2}\,dt\biggr)^{1/2} .\] Their result then was that \begin{equation} \|Su\|_{L^{p}(\Omega)}\simeq\|M_{B}(u)\|_{L^{p}(\Omega)},\quad 0 < p < \infty, \label{eqntwze} \end{equation} whenever \( u(0)=0 \).
The most striking application of this circle of ideas was a conclusion drawn from \eqref{eqntwze}, to wit, whenever \( F=u+iv \) is holomorphic in the unit disc, then \( F\in H^{p} \) if and only if \( u^{*}\in L^{p},\ 0 < p < \infty \).
The second line of research began when a more direct connection between standard multiplier operators and square functions was discovered. The result was easy to state. Whenever \( T \) is a multiplier operator of the Marcinkiewicz type on \( \mathbf{R}^{n} \) (more precisely one that satisfies the kind of conditions put in Hörmander’s version of that multiplier theorem), then the area integral corresponding to \( T(f) \) is pointwise dominated by a \( g^{*} \) function of \( f \), i.e., \begin{equation} A(T\mkern-4mu f)(x)\leq cg_{\lambda}^{*}(f)(x), \label{eqntwon} \end{equation} where \[ g_{\lambda}^{*}(f)(x)=\biggl(\int\bigl|\nabla u(x-t, y)\bigr|^{2}\Bigl(\frac{y}{y+|t|}\Bigr)^{n\lambda}y^{1-n}\,dy\,dt\biggr)^{1/2}, \] and \( \lambda \) is a parameter which depends on the nature of the multiplier. An \( H^{p} \) theory in \( \mathbf{R}^{n} \) had already been initiated several years before (by the efforts of G. Weiss and others), and using it and \eqref{eqntwon} it followed that these multipliers also extended to bounded operators on \( H^{p} \).
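That a \( g_{\lambda}^{*} \) function always controls the area integral of the same function can be seen from a one-line comparison. The sketch assumes that the area integral is taken over the cone \( |t| < y \) with the normalization \( A^{2}(u)(x)=\iint_{|t| < y}|\nabla u(x-t, y)|^{2}y^{1-n}\,dt\,dy \), and that \( u \) denotes the Poisson integral of \( f \). On that cone \( y/(y+|t|) > 1/2 \), so restricting the integral defining \( g_{\lambda}^{*} \) to the cone gives \[ g_{\lambda}^{*}(f)(x)^{2}\geq 2^{-n\lambda}\iint_{|t| < y}|\nabla u(x-t, y)|^{2}y^{1-n}\,dt\,dy=2^{-n\lambda}A^{2}(u)(x), \] that is, \( A(u)(x)\leq 2^{n\lambda/2}g_{\lambda}^{*}(f)(x) \). The content of \eqref{eqntwon} is that a domination of this kind persists when the area integral on the left is that of \( T\mkern-4mu f \) rather than of \( f \).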
From these considerations it might be guessed that a basic tool for \( H^{p} \) theory is the relation between square functions and maximal properties of (harmonic) functions. Here important contributions were made by C. Fefferman. One of the results obtained in this direction was the following theorem:
Suppose that \( u \) is harmonic in \( \mathbf{R}_{+}^{n+1} \), and \( u(x, y)\rightarrow 0 \) as \( y\rightarrow\infty \). Then [e40] \[ \|A(u)\|_{p}\simeq\|u^{*}\|_{p}, \quad 0 < p < \infty. \]
Incidentally it should be remarked that the proof used the same approach as its “local” analogue, Theorem 9c, but additional arguments of a quantitative nature were of course needed. More recently some of these results for square functions have been extended to product domains, and in this context generalizations of Theorems 9 and 11 have been found.13
b: Symmetric diffusion semigroups
The semigroups which are the subject of the title are a family of operators \( \{T^{t}\}_{t\geq 0} \), each bounded and selfadjoint on \( L^{2} \), with \( T^{t} \) having norm \( \leq 1 \) on every \( L^{p} \), \( 1\leq p\leq\infty \), and \[ T^{t_{1}+t_{2}}=T^{t_{1}}T^{t_{2}} ,\] with \[ \lim_{t\rightarrow 0}T^{t}f=f \] for \( f\in L^{2} \). Sometimes the additional hypotheses are made that \( T^{t}(1)=1 \), and \( T^{t} \) is positivity-preserving.
The significance of this notion derives from the many important examples of such semigroups in analysis, and the many rich properties that they share. In fact some of the basic results discussed above have versions valid in this context. Here we mention two, a maximal theorem, and a multiplier theorem in the spirit of Marcinkiewicz’s theorem (Theorem 7).
\( \bigl\|\sup_{t > 0}|T^{t}f|\bigr\|_{p}\leq A_{p}\|f\|_{p},\ 1 < p\leq\infty \).
To formulate the multiplier theorem we write \( T^{t} \) in terms of its spectral decomposition, \[ T^{t}\mkern-2mu=\mkern-2mu\int_0^{\infty} e^{-\lambda t}\,dE(\lambda) ,\] where \( E(\lambda) \) is a spectral resolution on \( L^{2} \). For each bounded Borel measurable function \( m \) on \( (0, \infty) \), consider the “multiplier” operator \( T_{m} \) given by \[ T_{m}=\int_0^\infty m(\lambda)\,dE(\lambda) .\] Here we assume that \( m \) is of the form \[ m(\lambda)=\lambda\int_{0}^{\infty}M(s)e^{-\lambda s}\,ds ,\] with \( M \) a bounded function.
\( \|T_{m}(f)\|_{p}\leq A_{p}\|f\|_{p},\ 1 < p < \infty \).
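To illustrate the hypothesis on \( m \), note first that any such \( m \) is automatically bounded, since \( |m(\lambda)|\leq\lambda\|M\|_{\infty}\int_{0}^{\infty}e^{-\lambda s}\,ds=\|M\|_{\infty} \). A standard example, given here as a sketch with \( \gamma \) a real parameter introduced only for illustration, is \( M(s)=s^{-i\gamma}/\Gamma(1-i\gamma) \), which has constant modulus. The Gamma-function integral \( \int_{0}^{\infty}s^{a-1}e^{-\lambda s}\,ds=\Gamma(a)\lambda^{-a} \), used with \( a=1-i\gamma \), gives \[ m(\lambda)=\frac{\lambda}{\Gamma(1-i\gamma)}\int_{0}^{\infty}s^{-i\gamma}e^{-\lambda s}\,ds=\lambda^{i\gamma},\quad\lambda > 0, \] so that the multipliers \( \lambda^{i\gamma} \) fall under the theorem.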
Key tools in the proofs of both these theorems are the Littlewood–Paley type functions \[ g_{k}(f)(x)=\biggl(\int_{0}^{\infty}t^{2k-1}\Bigl|\frac{\partial^{k}}{\partial t^{k}}T^{t}(f)\Bigr|^{2}\,dt\biggr)^{1/2} \quad\text{with }k=1,2,\ldots. \] Also for \( T_{m} \) a relation of the same kind as \eqref{eqntwon} holds.14
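On \( L^{2} \) the functions \( g_{k} \) can be computed exactly from the spectral decomposition, which gives some feeling for the choice of the weight \( t^{2k-1} \). The following is a sketch for \( f\in L^{2} \); the symbol \( E_{0} \), denoting the projection onto the part of the spectrum at \( \lambda=0 \), is notation introduced only here. Since \( \frac{\partial^{k}}{\partial t^{k}}T^{t}f=\int_{0}^{\infty}(-\lambda)^{k}e^{-\lambda t}\,dE(\lambda)f \), interchanging the order of integration gives \[ \|g_{k}(f)\|_{2}^{2}=\int_{0}^{\infty}t^{2k-1}\Bigl\|\frac{\partial^{k}}{\partial t^{k}}T^{t}f\Bigr\|_{2}^{2}\,dt=\int_{0}^{\infty}\biggl(\int_{0}^{\infty}t^{2k-1}\lambda^{2k}e^{-2\lambda t}\,dt\biggr)\,d(E(\lambda)f, f). \] The inner integral equals \( \Gamma(2k)/4^{k} \) for every \( \lambda > 0 \), and vanishes for \( \lambda=0 \). Hence \[ \|g_{k}(f)\|_{2}^{2}=\frac{(2k-1)!}{4^{k}}\,\|f-E_{0}f\|_{2}^{2}, \] and in particular \( \|g_{1}(f)\|_{2}=\tfrac{1}{2}\|f-E_{0}f\|_{2} \).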
c: Differentiation theorems in \( \mathbf{R}^{n} \)
Probably the most dramatic applications of square functions occur in differentiation theory. The general problem here is to prove that \begin{equation} \lim_{\operatorname{diam} R\rightarrow 0}\frac{1}{\mu(R)}\int_{R}f(x-y)\,d\mu(y)=f(x)\quad \text{ a.e.} \label{eqntwtw} \end{equation} where \( R \) ranges over a suitable collection \( \mathcal{R} \) of sets “centered” at the origin. The classical examples of these are (i) where \( \mathcal{R} \) is the collection of all balls (or cubes) containing the origin, and (ii) where \( \mathcal{R} \) is the collection of all rectangles containing the origin, with sides parallel to the axes. For each of these results a Vitali-type covering theorem has played a decisive role. Thus it may seem surprising that the alien notion of square functions would turn out to be the appropriate idea in related situations, where covering arguments were unavailing. In formulating the results obtained this way we shall, as is usual, deal with the corresponding maximal function \[ M_{\mathcal{R}}(f)(x)=\sup_{R\in\mathcal{R}}\frac{1}{\mu(R)}\bigg|\int_{R}f(x-y)\,d\mu(y)\bigg|, \] and the possibility of asserting inequalities of the type \begin{equation} \|M_{\mathcal{R}}(f)\|_{p}\leq A_{p}\|f\|_{p}. \label{eqntwth} \end{equation}
The inequality \eqref{eqntwth} holds in the following cases:
(a) \( \mathcal{R} \) is the collection of spheres centered at the origin; \( d\mu \) is the uniform surface measure; and \( n\geq 3 \), with \( p > n/(n-1) \).
(b) \( \mathcal{R} \) is the collection of initial segments \( \{\gamma(t), 0\leq t\leq h\} \) of a smooth curve \( t\rightarrow\gamma(t) \), with \( \gamma(0)=0 \), and \( \gamma \) having nonzero “curvature” at the origin; here \( d\mu \) is arc-length, \( n\geq 1 \) and \( p > 1 \).
(c) \( \mathcal{R} \) is the collection of rectangles (in \( \mathbf{R}^{2} \)) containing the origin, which make an angle \( \theta_{k} \) with a fixed direction, where \( \{\theta_{k}\} \) is a sequence of numbers tending rapidly to zero; here \( p > 1 \).
The proof of each part of this theorem requires its own square function. We shall not describe these rather complicated quadratic functions here, but refer the reader to the literature for further details.15
Epilogue
Since the original draft of this essay was written two new results were found which use square functions in a decisive way.
The first is the solution of the problem of Cauchy’s integral for Lipschitz curves by Coifman, McIntosh, and Meyer [e50]. It is to be noted that in Calderón’s initial work on this problem (1965), square functions were already used in a crucial way. In particular the inequality \[ c\|F\|_{H^{p}}\leq\|A(F)\|_{p}, \quad p\leq 1 ,\] was proved there for this purpose.
The second result deals with the standard maximal function in \( \mathbf{R}^{n} \) \[ M_{n}(f)(x)=\sup_{r > 0}\frac{1}{c_{n}r^{n}}\bigg|\int_{|y|\leq r}f(x-y)\,dy\bigg|, \] where \( c_{n} \) is the volume of the unit ball in \( \mathbf{R}^{n} \).
The question that arises is, how does the \( L^{p} \) norm of \( M_{n} \) behave for large \( n \)? The best that can be proved by the usual Vitali covering arguments gives \[ \|M_{n}(f)\|_{p}\leq A(p, n)\|f\|_{p}, \quad 1 < p ,\] with \( A(p, n)\leq A(p)\,2^{n/p} \), a bound that grows exponentially as \( n\rightarrow\infty \). However, much more can be said.
\( \|M_{n}(f)\|_{p}\leq A_{p}\|f\|_{p},\ 1 < p\leq\infty \), with \( A_{p} \) independent of \( n \).
The idea of the proof is to consider in \( \mathbf{R}^{m} \) the maximal functions \( M_{m,k} \) defined by \[ M_{m,k}(f)(x)=\sup_{r > 0}\frac{\big|\int_{|y|\leq r}f(x-y)|y|^{k}\,dy\big|}{\int_{|y|\leq r}|y|^{k}\,dy},\quad k\geq 0. \] Then if \( m \) is so large that \( p > m/(m-1) \),
\begin{equation} \|M_{m,k}(f)\|_{p}\leq A_{p,m}\|f\|_{p} \label{eqntwfo} \end{equation}
with \( A_{p,m} \) independent of \( k \), \( k\geq 0 \). This follows from Theorem 12, Part (a). From this Theorem 13 is obtained by lifting the \( m \)-dimensional result \eqref{eqntwfo} into \( \mathbf{R}^{n} \), where \( n\geq m \) (and \( k=n-m \)), by integrating over the Grassmannian of \( m \)-planes in \( \mathbf{R}^{n} \) through the origin.
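The lifting step can be sketched as follows; the notation \( G_{n,m} \), \( d\sigma \), \( C_{n,m} \) and \( M_{m,k}^{\pi} \) is introduced here only for illustration, and measurability questions are ignored. Let \( G_{n,m} \) be the Grassmannian of \( m \)-planes \( \pi \) through the origin in \( \mathbf{R}^{n} \), with rotation-invariant probability measure \( d\sigma \), and let \( k=n-m \). Passing to polar coordinates in each plane and using rotation invariance, one has for integrable \( g \) \[ \int_{\mathbf{R}^{n}}g(y)\,dy=C_{n,m}\int_{G_{n,m}}\int_{\pi}g(y)|y|^{k}\,dy\,d\sigma(\pi), \] where the inner integral is taken with respect to \( m \)-dimensional Lebesgue measure on \( \pi \). Applying this to \( f(x-y) \) restricted to \( |y|\leq r \), and to the characteristic function of the ball \( |y|\leq r \), and noting that \( \int_{\pi\cap\{|y|\leq r\}}|y|^{k}\,dy \) does not depend on \( \pi \), gives \[ \frac{1}{c_{n}r^{n}}\int_{|y|\leq r}f(x-y)\,dy=\int_{G_{n,m}}\frac{\int_{\pi\cap\{|y|\leq r\}}f(x-y)|y|^{k}\,dy}{\int_{\pi\cap\{|y|\leq r\}}|y|^{k}\,dy}\,d\sigma(\pi), \] and hence \( M_{n}(f)(x)\leq\int_{G_{n,m}}M_{m,k}^{\pi}(f)(x)\,d\sigma(\pi) \), where \( M_{m,k}^{\pi} \) denotes \( M_{m,k} \) acting in the variables of \( \pi \). By Minkowski’s inequality, and by \eqref{eqntwfo} applied on each translate of \( \pi \) followed by integration in the complementary variables, \[ \|M_{n}(f)\|_{p}\leq\int_{G_{n,m}}\|M_{m,k}^{\pi}(f)\|_{p}\,d\sigma(\pi)\leq A_{p,m}\|f\|_{p}, \] a bound that depends on \( p \) and \( m \) but not on \( n \).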