Celebratio Mathematica

David H. Blackwell

A Conversation with David Blackwell

by Morris H. DeGroot

David Blackwell was born on April 24, 1919, in Centralia, Illinois. He entered the University of Illinois in 1935, and received his A.B. in 1938, his A.M. in 1939, and his Ph.D. in 1941, all in mathematics. He was a member of the faculty at Howard University from 1944 to 1954, and has been a Professor of Statistics at the University of California, Berkeley, since that time. He was President of the Institute of Mathematical Statistics in 1955. He has also been Vice President of the American Statistical Association, the International Statistical Institute, and the American Mathematical Society, and President of the Bernoulli Society. He is an Honorary Fellow of the Royal Statistical Society and was awarded the von Neumann Theory Prize by the Operations Research Society of America and the Institute of Management Sciences in 1979. He has received honorary degrees from the University of Illinois, Michigan State University, Southern Illinois University, and Carnegie-Mellon University. The following conversation took place in his office at Berkeley one morning in October 1984.

“I expected to be an elementary-school teacher”

De­G­root: How did you ori­gin­ally get in­ter­ested in stat­ist­ics and prob­ab­il­ity?

Blackwell: I think I have been interested in the concept of probability ever since I was an undergraduate at Illinois, although there wasn’t very much probability or statistics around. Doob was there but he didn’t teach probability. All the probability and statistics were taught by a very nice old gentleman named Crathorne; you probably never heard of him. But he was a very good friend of Henry Rietz and, in fact, they collaborated on a college algebra book. I think I took all the courses that Crathorne taught: two undergraduate courses and one first-year graduate course. Anyway, I have been interested in the subject for a long time, but after I got my Ph.D. I didn’t expect to get professionally interested in statistics.

De­G­root: But did you al­ways in­tend to go on to gradu­ate school?

Black­well: No. When I star­ted out in col­lege I ex­pec­ted to be an ele­ment­ary-school teach­er. But some­how I kept post­pon­ing tak­ing those edu­ca­tion courses. [Laughs.] So I ended up get­ting a mas­ter’s de­gree, and then I got a fel­low­ship to con­tin­ue my work there at Illinois.

De­G­root: So your gradu­ate work wasn’t par­tic­u­larly in the area of stat­ist­ics or prob­ab­il­ity?

Black­well: No, ex­cept of course that I wrote my thes­is un­der Doob in prob­ab­il­ity.

De­G­root: What was the sub­ject of your thes­is?

Black­well: Markov chains. There wasn’t very much ori­gin­al in it. There was one beau­ti­ful idea, which was Doob’s idea and which he gave to me. The thes­is was nev­er pub­lished as such.

De­G­root: But your first couple of pa­pers per­tained to Markov chains.

Black­well: The first couple of pa­pers came out of my thes­is, that’s right.

De­G­root: So after you got your de­gree…

Black­well: After I got my de­gree, I sort of ex­pec­ted to work in prob­ab­il­ity, real vari­ables, meas­ure the­ory, and such things.

De­G­root: And you have done a good deal of that.

Black­well: Yes, a fair amount. But it was Abe Gir­shick who got me in­ter­ested in stat­ist­ics.

De­G­root: In Wash­ing­ton?

Black­well: Yes. I was teach­ing at Howard, and the math­em­at­ics en­vir­on­ment was not really very stim­u­lat­ing, so I had to look around bey­ond the uni­versity just for whatever was go­ing on in Wash­ing­ton that was in­ter­est­ing math­em­at­ic­ally.

De­G­root: Not just stat­ist­ic­ally, but math­em­at­ic­ally?

Black­well: I was just look­ing for any­thing in­ter­est­ing in math­em­at­ics that was go­ing on in Wash­ing­ton.

De­G­root: About what year would this be?

Black­well: I went to Howard in 1944. So this would have been dur­ing the year 1944–1945.

De­G­root: Gir­shick was at the De­part­ment of Ag­ri­cul­ture?

Black­well: That’s right. And I heard him give a lec­ture sponsored by the Wash­ing­ton Chapter of the Amer­ic­an Stat­ist­ic­al As­so­ci­ation. That’s a pretty lively chapter. I first met George Dantzig when he gave a lec­ture there around that same time. His lec­ture had noth­ing to do with lin­ear pro­gram­ming, by the way. In fact, I first be­came ac­quain­ted with the idea of a ran­dom­ized test by hear­ing Dantzig talk about it. I think that he was the guy who in­ven­ted a test func­tion, in­stead of hav­ing just a re­jec­tion re­gion that is a sub­set of the sample space. At one of those meet­ings Abe Gir­shick spoke on se­quen­tial ana­lys­is. Among oth­er things, he men­tioned Wald’s equa­tion.
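To make the distinction between a rejection region and a test function concrete, here is a minimal Python sketch of a randomized test for a binomial count. It assumes a one-sided test of \( p = p_0 \) against larger \( p \), with \( X \sim \mathrm{Binomial}(n, p_0) \) under the null. The names `randomized_upper_test` and `phi` are hypothetical. A subset of the sample space can only achieve the sizes given by tail sums. A test function \( \phi(x) \in [0,1] \) can instead reject with probability \( \gamma \) at a boundary point \( c \), which lets the size equal \( \alpha \) exactly.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def randomized_upper_test(n: int, p0: float, alpha: float) -> tuple[int, float]:
    """Find (c, gamma) so that phi(X) has size exactly alpha under p0."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    tail = 0.0  # P(X > c) under p0, accumulated from the top down
    for c in range(n, -1, -1):
        pc = binom_pmf(c, n, p0)
        if tail + pc > alpha:
            # Rejecting every x >= c would overshoot alpha, so randomize at c.
            return c, (alpha - tail) / pc
        tail += pc
    raise ValueError("alpha too close to 1 for this n")  # guard against rounding


def phi(x: int, c: int, gamma: float) -> float:
    """Test function: the probability of rejecting the null after observing x."""
    if x > c:
        return 1.0
    return gamma if x == c else 0.0


n, p0, alpha = 10, 0.5, 0.05
c, gamma = randomized_upper_test(n, p0, alpha)
size = sum(phi(x, c, gamma) * binom_pmf(x, n, p0) for x in range(n + 1))
print(c, gamma, size)
```

With \( n = 10 \), \( p_0 = 1/2 \), and \( \alpha = 0.05 \), the non-randomized regions \( \{X \ge 9\} \) and \( \{X \ge 8\} \) have sizes \( 11/1024 \approx 0.011 \) and \( 56/1024 \approx 0.055 \). The test function falls between them by rejecting at \( X = 8 \) only part of the time.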

DeGroot: That’s the equation that the expectation of a sum of a random number \( N \) of random variables is \( E(N) \) times the expectation of an individual variable?

Black­well: Yes. That was just such a re­mark­able equa­tion that I didn’t be­lieve it. So I went home and thought I had con­struc­ted a counter­example. I mailed it to Abe, and I’m sure that he dis­covered the er­ror. But he didn’t write back and tell me it was an er­ror; he just called me up and said, let’s talk about it. So we met for lunch, and that was the start of a long and beau­ti­ful as­so­ci­ation that I had with him.

De­G­root: Would you re­gard the Black­well and Gir­shick book [e3] as the cul­min­a­tion of that as­so­ci­ation?

Black­well: Oh, that was a nat­ur­al out­growth of the as­so­ci­ation. I learned a great deal from him.

De­G­root: Were you to­geth­er at any time at Stan­ford?

Black­well: Yes, I spent a year at Stan­ford. I think it was 1950–1951. But he and I were also to­geth­er at oth­er times. We spent sev­er­al months to­geth­er at Rand. So we worked to­geth­er in Wash­ing­ton, and then at Rand, and then at Stan­ford.

“I wrote 105 letters of application”

David Blackwell, about 1945.

De­G­root: Tell me a little about the years between your Ph.D. from Illinois in 1941 and your ar­rival at Howard in 1944. You were at a few oth­er schools in between.

Black­well: Yes. I spent my first postdoc­tor­al year at the In­sti­tute for Ad­vanced Study. Again, I con­tin­ued to show my in­terest in stat­ist­ics. I sat in on Sam Wilks’ course in Prin­ceton dur­ing that year. Henry Scheffé was also sit­ting in on that class. He had just com­pleted his Ph.D. at Wis­con­sin. Jim­mie Sav­age was at the In­sti­tute for that year. He was at some of Wilks’ lec­tures, too. There were a lot of stat­ist­i­cians about our age around Prin­ceton at that time. Alex Mood was there. George Brown was there. Ted An­der­son was there. He was in Wilks’ class that year.

De­G­root: He was a gradu­ate stu­dent?

Black­well: He was a gradu­ate stu­dent, just com­plet­ing his Ph.D. So that was my first postdoc­tor­al year. Also, I had a chance to meet von Neu­mann that year. He was a most im­press­ive man. Of course, every­body knows that. Let me tell you a little story about him.

When I first went to the In­sti­tute, he greeted me, and we were talk­ing, and he in­vited me to come around and tell him about my thes­is. Well, of course, I thought that was just his way of mak­ing a new young vis­it­or feel at home, and I had no in­ten­tion of telling him about my thes­is. He was a big, busy, im­port­ant man. But then a couple of months later, I saw him at tea and he said, “When are you com­ing around to tell me about your thes­is? Go in and make an ap­point­ment with my sec­ret­ary.” So I did, and later I went in and star­ted telling him about my thes­is. He listened for about ten minutes and asked me a couple of ques­tions, and then he star­ted telling me about my thes­is. What you have really done is this, and prob­ably this is true, and you could have done it in a some­what sim­pler way, and so on. He was a really re­mark­able man. He listened to me talk about this rather ob­scure sub­ject and in ten minutes he knew more about it than I did. He was ex­tremely quick. I think he may have wasted a cer­tain amount of time, by the way, be­cause he was so will­ing to listen to second- or third-rate people and think about their prob­lems. I saw him do that on many oc­ca­sions.

De­G­root: So, from the In­sti­tute you went where?

Black­well: I went to South­ern Uni­versity in Bat­on Rouge, Louisi­ana. That’s a state school and at that time it was the state uni­versity in Louisi­ana for blacks. I stayed there just one year. Then the next year, I went to Clark Col­lege in At­lanta, also a black school. I stayed there for one year. Then I went to Howard Uni­versity in Wash­ing­ton and stayed there for ten years.

De­G­root: Was Howard at a dif­fer­ent level in­tel­lec­tu­ally from these oth­er schools?

Black­well: Oh yes. It was the am­bi­tion of every black schol­ar in those days to get a job at Howard Uni­versity. That was the best job you could hope for.

De­G­root: How large was the math de­part­ment there in terms of fac­ulty?

Black­well: Let’s see. There were just four reg­u­lar people in the math de­part­ment. Two pro­fess­ors. I went there as an as­sist­ant pro­fess­or. And there was one in­struct­or. That was it.

De­G­root: Have you main­tained any con­tact with Howard through the years?

Black­well: Oh, yes. I guess the last time I gave a lec­ture there was about three years ago, but I vis­ited many times dur­ing the years.

De­G­root: Do you see much change in the place through the years?

Black­well: Yes, the math de­part­ment now is a live­li­er place than it was when I was there. It’s much big­ger and the cur­rent chair­man, Jim Don­ald­son, is very good and very act­ive. There are some in­ter­est­ing things go­ing on there.

De­G­root: Did you feel or find that dis­crim­in­a­tion against blacks af­fected your edu­ca­tion or your ca­reer after your Ph.D.?

Black­well: It nev­er bothered me; I’ll put it that way. It surely shaped my ex­pect­a­tions from the very be­gin­ning. It nev­er oc­curred to me to think about teach­ing in a ma­jor uni­versity since it wasn’t in my ho­ri­zon at all.

De­G­root: Even in your gradu­ate-stu­dent days at Illinois?

Black­well: That’s right. I just as­sumed that I would get a job teach­ing in one of the black col­leges. There were 105 black col­leges at that time, and I wrote 105 let­ters of ap­plic­a­tion.

De­G­root: And got 105 of­fers, I sup­pose.

Black­well: No, I even­tu­ally got three of­fers, but I ac­cep­ted the first one that I got. From South­ern Uni­versity.

David Blackwell (lower left), 1930, probably sixth grade.

De­G­root: Let’s move a little fur­ther back in time. You grew up in Illinois?

Black­well: In Centralia, Illinois. Did you ever get down to Centralia or that part of Illinois when you were in Chica­go?

De­G­root: No, I didn’t.

Black­well: Well, it’s a rather dif­fer­ent part of the world from north­ern Illinois. It’s quite south­ern. Centralia in fact was right on the bor­der line of se­greg­a­tion. If you went south of Centralia to the south­ern tip of Illinois, the schools were com­pletely se­greg­ated in those days. Centralia had one com­pletely black school, one com­pletely white school, and five “mixed” schools.

De­G­root: Well that sounds like the bound­ary all right. Which one did you go to?

Black­well: I went to one of the mixed schools, be­cause of the part of town I lived in. It’s a small town. The pop­u­la­tion was about 12,000 then, and it’s still about 12,000. The high school had about 1,000 stu­dents. I had very good high-school teach­ers in math­em­at­ics. One of my high-school teach­ers or­gan­ized a math­em­at­ics club, and used to give us prob­lems to work. Whenev­er we would come up with something that had the idea for a solu­tion, he would write up the solu­tion for us, and send it in our name to a journ­al called School Sci­ence and Math­em­at­ics. It was a great thrill to see your name in the magazine. I think my name got in there three times. And once my solu­tion got prin­ted. As I say, it was really Mr. Huck’s write-up based on my idea. [Laughs.]

De­G­root: Was your fam­ily en­cour­aging about your edu­ca­tion?

Black­well: It was just sort of as­sumed that I would go to col­lege. There was no “Now be sure to study hard” or any­thing like that. It was just taken for gran­ted that I was go­ing to go to col­lege. They were very, very sup­port­ive.

Some favorite papers

Kenneth Arrow, David Blackwell, and M. A. Girshick; Santa Monica, September 1948.

De­G­root: You were quite young when you re­ceived your Ph.D. You were 21 or so?

Black­well: 22. There wasn’t any big jump. I just sort of did everything a little faster than nor­mal.

De­G­root: And you’ve been do­ing it that way ever since. You’ve pub­lished about 80 pa­pers since that time. Do you have any fa­vor­ites in that list that you par­tic­u­larly like, or that you feel were par­tic­u­larly im­port­ant or in­flu­en­tial?

Black­well: Oh, I’m sure that I do, but I’d have to look at the list and think about that. May I look?

De­G­root: Sure. This is an open-book ex­am.

Black­well: Good. Let’s see… Well, my first stat­ist­ic­al pa­per, called “On an equa­tion of Wald” [e1], grew out of that ori­gin­al con­ver­sa­tion with Abe Gir­shick. That’s a pa­per that I am still really very proud of. It just gives me pleas­ant feel­ings every time I think about it.

De­G­root: Re­mind me what the main idea was.

Black­well: For one thing it was a proof of Wald’s the­or­em un­der, I think, weak­er con­di­tions than it had been proved be­fore, un­der sort of nat­ur­al con­di­tions. And the proof is neat. Let me show it to you. [Goes to black­board.]

Suppose that \( X_1,X_2,\dots \) are i.i.d. and you have a stopping rule \( N \), which is a random variable. You want to prove that \( E(X_1 + \dots + X_N) = E(X_1)E(N) \). Well, here’s my idea. Do it over and over again. So you have stopping times \( N_1,N_2,\dots \), and you get \begin{align*} S_1 &=X_1+\dots+ X_{N_1}, \\ S_2&=X_{N_1+1}+\dots+ X_{N_1+N_2}, \\ &\dots \end{align*} Consider \( S_1+ \dots+ S_k=X_1+\dots + X_{N_1+\dots+N_k} \). We can write this equation as \[ \frac{S_1+\dots + S_k}{k}= \biggl(\frac{X_1+\dots+X_{N_1+\dots + N_k}}{N_1+\dots + N_k}\biggr) \biggl(\frac{N_1+\dots + N_k}{k}\biggr). \] Now let \( k \to\infty \). The first term on the right is a subsequence of the \( X \) averages. By the strong law of large numbers, this converges to \( E(X_1) \). The second term on the right is the average of \( N_1,\dots,N_k \). We are assuming that they have a finite expectation, so this converges to that expectation \( E(N) \). Therefore, the sequence \[ \frac{S_1+\dots + S_k}{k} \] converges a.e. to \( E(X_1)E(N) \). The \( S_i \) are i.i.d., so the converse of the strong law of large numbers says that the expected value of each \( S_i \) must be finite, and that \[ \frac{S_1+\dots + S_k}{k} \] must converge to that expectation \( E(S_1) \). So \( E(S_1)=E(X_1)E(N) \), which is Wald’s equation. Isn’t that neat?
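The construction in this proof can be followed numerically. The Python sketch below uses illustrative choices that do not come from the interview: fair die rolls for the \( X_j \), and the stopping rule "stop at the first 6." Under these choices \( E(X_1)=3.5 \), \( E(N)=6 \), and Wald’s equation predicts \( E(S_1)=21 \). The code cuts one long i.i.d. stream into blocks, just as in the proof, and computes both sides of the factorization of \( (S_1+\dots+S_k)/k \). A finite simulation only illustrates the limits. It does not replace the strong law or its converse.

```python
import random


def repeated_blocks(k: int, seed: int = 0) -> tuple[list[int], list[int]]:
    """Apply one stopping rule over and over to a single i.i.d. stream.

    Assumed setup: X_j are fair die rolls (E[X] = 3.5), and each block stops
    at its first 6, so N_i is geometric with E[N] = 6.
    Returns the block sums S_i and the block lengths N_i.
    """
    rng = random.Random(seed)
    sums, lengths = [], []
    for _ in range(k):
        s = n = 0
        while True:
            x = rng.randint(1, 6)
            s += x
            n += 1
            if x == 6:  # the decision to stop uses only the rolls seen so far
                break
        sums.append(s)
        lengths.append(n)
    return sums, lengths


k = 100_000
sums, lengths = repeated_blocks(k)
total_x = sum(sums)      # X_1 + ... + X_{N_1 + ... + N_k}
total_n = sum(lengths)   # N_1 + ... + N_k

lhs = total_x / k               # (S_1 + ... + S_k) / k
x_average = total_x / total_n   # a subsequence of the X averages -> E(X_1)
n_average = total_n / k         # average of N_1, ..., N_k -> E(N)

# The factorization holds exactly for every k; only the limits need the SLLN.
print(lhs, x_average * n_average, 3.5 * 6)
```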

De­G­root: Beau­ti­ful, beau­ti­ful.

Black­well: So that’s the proof of Wald’s equa­tion, just by in­vok­ing the strong law of large num­bers and its con­verse. I think I like that be­cause that was the first time that I de­cided that I could do something ori­gin­al. The pa­pers based on my thes­is were nice, but those were really Doob’s ideas that I was just car­ry­ing out. But here I had a really ori­gin­al idea, so I was very pleased with that pa­per. Then I guess I like my pa­per with Ken Ar­row and Abe Gir­shick, “Bayes and min­im­ax solu­tions of se­quen­tial de­cision prob­lems” [e2].

De­G­root: That was cer­tainly a very in­flu­en­tial pa­per.

Black­well: That was a ser­i­ous pa­per, yes.

DeGroot: There was some controversy about that paper, wasn’t there? Wald and Wolfowitz were doing similar things at more or less the same time.

Black­well: Yes, they had pri­or­ity. There was no ques­tion about that, and I think we did give in­ad­equate ac­know­ledg­ment to them in our work. So they were very much dis­turbed about it, es­pe­cially Wolfow­itz. In fact, Wolfow­itz was cool to me for more than 20 years.

De­G­root: But cer­tainly your pa­per was dif­fer­ent from theirs.

Black­well: We had things that they didn’t have, there was no doubt about that. For in­stance, in­duc­tion back­ward—cal­cu­la­tion back­ward—that was in our pa­per and I don’t think there is any hint of it in their work. We did go bey­ond what they had done. Our pa­per didn’t seem to both­er Wald too much, but Wolfow­itz was an­noyed.

De­G­root: Did you know Wald very well, or have much con­tact with him?

Black­well: Not very well. I had just three or four con­ver­sa­tions with him.

Important influences

De­G­root: I gath­er from what you said that Gir­shick was a primary in­flu­ence on you in the field of stat­ist­ics.

Black­well: Oh yes.

De­G­root: Were there oth­er people that you felt had a strong in­flu­ence on you? Ney­man, for ex­ample?

Black­well: Not in my stat­ist­ic­al think­ing. Gir­shick was cer­tainly the most im­port­ant in­flu­ence on me. The oth­er per­son who had just one in­flu­ence, but it was a very big one, was Jim­mie Sav­age.

De­G­root: What was that one in­flu­ence?

Black­well: Well, he ex­plained to me that the Bayes ap­proach was the right way to do stat­ist­ic­al in­fer­ence. Let me tell you how that happened. I was at Rand, and an eco­nom­ist came in one day to talk to me. He said that he had a prob­lem. They were pre­par­ing a re­com­mend­a­tion to the Air Force on how to di­vide their re­search budget over the next five years and, in par­tic­u­lar, they had to de­cide what frac­tion of it should be de­voted to long-range re­search, and what frac­tion of it should be de­voted to more im­me­di­ate de­vel­op­ment­al re­search.

“Now,” he said, “one of the things that this de­pends on is the prob­ab­il­ity of a ma­jor war in the next five years. If it’s large, then of course that would shift the em­phas­is to­ward de­vel­op­ing what we already know how to do, and if it’s small then there would be more em­phas­is on long-range re­search. I’m not go­ing to ask you to tell me a num­ber, but if you could give me any guide as to how I could go about find­ing such a num­ber I would be grate­ful.” Oh, I said to him, that ques­tion just doesn’t make sense. Prob­ab­il­ity ap­plies to a long se­quence of re­peat­able events, and this is clearly a unique situ­ation. The prob­ab­il­ity is either 0 or 1, but we won’t know for five years, I pon­ti­fic­ated. [Laughs.] So the eco­nom­ist looked at me and nod­ded and said, “I was afraid you were go­ing to say that. I have spoken to sev­er­al oth­er stat­ist­i­cians and they have all told me the same thing. Thank you very much.” And he left.

Well, that conversation bothered me. The fellow had asked me a reasonable, serious question and I had given him a frivolous, sort of flip, answer, and I wasn’t happy. A couple of weeks later Jimmie Savage came to visit Rand, and I went in and said hello to him. I happened to mention this conversation that I had had, and then he started telling me about de Finetti and personal probability. Anyway, I walked out of his office half an hour later with a completely different view on things. I now understood what was the right way to do statistical inference.

De­G­root: What year was that?

Black­well: About 1950, maybe 1951, some­where around there. Look­ing back on it, I can see that I was emo­tion­ally and in­tel­lec­tu­ally pre­pared for Jim­mie’s mes­sage, be­cause I had been think­ing in a Bayesian way about se­quen­tial ana­lys­is, hy­po­thes­is test­ing, and oth­er stat­ist­ic­al prob­lems for some years.

De­G­root: What do you mean by think­ing in a Bayesian way? In terms of pri­or dis­tri­bu­tions?

Black­well: Yes.

De­G­root: Wald used them as a math­em­at­ic­al device.

Black­well: That’s right. It just turned out to be clearly a very nat­ur­al way to think about prob­lems, and it was math­em­at­ic­ally beau­ti­ful. I simply re­gret­ted that it didn’t cor­res­pond with real­ity. [Laughs.] But then what Jim­mie was telling me was that the way that I had been think­ing all the time was really the right way to think, and not to worry so much about em­pir­ic­al fre­quen­cies. Any­way, as I say, that was just one very big in­flu­ence on me.

De­G­root: Would you say that your stat­ist­ic­al work has mainly used the Bayesian ap­proach since that time?

Black­well: Yes; I simply have not worked on prob­lems where that ap­proach could not be used. For in­stance, all my work in dy­nam­ic pro­gram­ming just has that Bayes ap­proach in it. That is the stand­ard way of do­ing dy­nam­ic pro­gram­ming.

De­G­root: You wrote a beau­ti­ful book called Ba­sic Stat­ist­ics [◊] that was really based on the Bayesian ap­proach, but as I re­call you nev­er once men­tioned the word “Bayes” in that book. Was that in­ten­tion­al?

Black­well: No, it was not in­ten­tion­al.

De­G­root: Was it that the ter­min­o­logy was ir­rel­ev­ant to the con­cepts that you were try­ing to get across?

Black­well: I doubt if the word “the­or­em” was ever men­tioned in that book. That was not ori­gin­ally in­ten­ded as a book, by the way. It was simply in­ten­ded as a set of notes to give my stu­dents in con­nec­tion with lec­tures in this ele­ment­ary stat­ist­ics course. But the stu­dents sug­ges­ted that it should be pub­lished and a Mc­Graw-Hill man said that he would be in­ter­ested. It’s just a set of notes. It’s short; I think it’s less than 150 pages.

De­G­root: It’s beau­ti­ful. There are a lot of won­der­ful gems in those 150 pages.

Black­well: Well, I en­joyed teach­ing the course.

De­G­root: Do you en­joy teach­ing from your own books?

Black­well: No, not after a while. I think about five years after the book was pub­lished I stopped us­ing it. Just be­cause I got bored with it. When you reach the point where you’re not learn­ing any­thing, then it’s prob­ably time to change something.

De­G­root: Are you work­ing on oth­er books at the present time?

Black­well: No, ex­cept that I am think­ing about writ­ing a more ele­ment­ary ver­sion of parts of your book on op­tim­al stat­ist­ic­al de­cisions, be­cause I have been us­ing it in a course and the un­der­gradu­ate stu­dents say that it’s too hard.

De­G­root: Uh, oh. I’ve been think­ing of do­ing the same thing. [Laughs.] Well, I am just think­ing gen­er­ally, in terms of an in­tro­duc­tion to Bayesian stat­ist­ics for un­der­gradu­ates.

Black­well: Very good. I really hope you do it, Mor­rie. It’s needed.

De­G­root: Well, I really hope you do it, too. It would be in­ter­est­ing. Are there courses that you par­tic­u­larly en­joy teach­ing?

Black­well: I like the course in Bayesian stat­ist­ics us­ing your book. I like to teach game the­ory. I haven’t taught it in some years, but I like to teach that course. I also like to teach, and I’m teach­ing right now, a course in in­form­a­tion the­ory.

De­G­root: Are you us­ing a text?

Blackwell: I’m not using any one book. Pat Billingsley’s book Ergodic Theory and Information comes closest to what I’m doing. I like to teach measure theory. I regard measure theory as a kind of hobby, because to do probability and statistics you don’t really need very much measure theory. But there are these fine, nit-picking points that most people ignore, and rightly so, but that I sort of like to worry about. [Laughs.] I know that it is not important, but it is interesting to me to worry about regular conditional probabilities and such things. I think I’m one of only three people in our department who really takes measure theory seriously. Lester [Dubins] takes it fairly seriously, and so does Jim Pitman. But the rest of the people just sort of ignore it. [Laughs.]

“I would like to see more emphasis on Bayesian statistics”

David Blackwell, 1984.

DeGroot: Let’s talk a little bit about the current state of statistics. What areas do you think are particularly important these days? Where do you see the field going?

Black­well: I can tell you what I’d like to see hap­pen. First, of course, I would like to see more em­phas­is on Bayesian stat­ist­ics. With­in that area it seems to me that one prom­ising dir­ec­tion which hasn’t been ex­plored at all is Bayesian ex­per­i­ment­al design. In a way, Bayesian stat­ist­ics is much sim­pler than clas­sic­al stat­ist­ics in that, once you’re giv­en a sample, all you have to do are cal­cu­la­tions based on that sample. Now, of course, I say “all you have to do”—some­times those cal­cu­la­tions can be hor­rible. But if you are try­ing to design an ex­per­i­ment, that’s not all you have to do. In that case, you have to look at all the dif­fer­ent samples you might get, and eval­u­ate every one of them in or­der to cal­cu­late an over­all risk, to de­cide wheth­er the ex­per­i­ment is worth do­ing and to choose among the ex­per­i­ments. Ex­cept in very spe­cial situ­ations, such as when to stop sampling, I don’t think a lot of work has been done in that area.
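Here is a rough Python sketch of what "looking at all the different samples you might get" means in the simplest case. The setup is a hypothetical one: a single proportion \( p \) with a Beta\( (a,b) \) prior, squared-error loss for estimating \( p \) (so the Bayes risk after sampling is the posterior variance), and a fixed `cost` per observation. For each candidate sample size \( m \), the code enumerates every possible count \( x \), weights it by its prior predictive (beta-binomial) probability, and adds the sampling cost. Here \( m = 0 \) stands for not doing the experiment at all.

```python
from math import comb, exp, lgamma


def log_beta(a: float, b: float) -> float:
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def predictive(x: int, m: int, a: float, b: float) -> float:
    """Prior predictive (beta-binomial) probability of x successes in m draws."""
    return comb(m, x) * exp(log_beta(a + x, b + m - x) - log_beta(a, b))


def beta_var(a: float, b: float) -> float:
    """Variance of a Beta(a, b) distribution."""
    s = a + b
    return a * b / (s * s * (s + 1))


def overall_risk(m: int, a: float, b: float, cost: float) -> float:
    """Average posterior variance over every sample of size m, plus sampling cost."""
    expected_post_var = sum(
        predictive(x, m, a, b) * beta_var(a + x, b + m - x) for x in range(m + 1)
    )
    return expected_post_var + cost * m


a, b, cost = 2.0, 3.0, 0.001  # hypothetical prior and cost per observation
risks = {m: overall_risk(m, a, b, cost) for m in range(51)}
best_m = min(risks, key=risks.get)
worth_doing = risks[best_m] < risks[0]  # compare with not experimenting at all
print(best_m, risks[best_m], risks[0], worth_doing)
```

In this conjugate case the enumeration agrees with the closed form \( ab/\{s(s+1)(s+m)\} + \text{cost}\cdot m \), where \( s=a+b \). In less tidy models, the average over samples usually has no closed form and has to be computed.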

De­G­root: I think the reas­on there hasn’t been very much done is be­cause the prob­lems are so hard. It’s really hard to do ex­pli­citly the cal­cu­la­tions that are re­quired to find the op­tim­al ex­per­i­ment. Do you think that per­haps the com­put­ing power that is now avail­able would be help­ful in this kind of prob­lem?

Black­well: That’s cer­tainly go­ing to make a dif­fer­ence. Let me give you a simple ex­ample that I have nev­er seen worked out, but I am sure could be worked out. Sup­pose that you have two in­de­pend­ent Bernoulli vari­ables, say, a pro­por­tion among males and a pro­por­tion among fe­males. They are in­de­pend­ent, and you are in­ter­ested in es­tim­at­ing the sum of those pro­por­tions or some lin­ear com­bin­a­tion of those pro­por­tions. You are go­ing to take a sample in two stages. First of all you can ask, how large should the first sample be? And then, based on the first sample, how should you al­loc­ate pro­por­tions in the second sample?

De­G­root: Are you go­ing to draw the first sample from the total pop­u­la­tion?

Black­well: No. You have males and you have fe­males, and you have a total sample ef­fort of size \( N \). Now you can pick some num­ber \( n \leq N \) to be your sample size. And you can al­loc­ate those \( n \) ob­ser­va­tions among males and fe­males. Then, based on how that sample comes out, you can al­loc­ate your second sample. What is the best ini­tial al­loc­a­tion, and how much bet­ter is it than just do­ing it all in one stage? Well, I haven’t done that cal­cu­la­tion but I’m sure that it can be done. It would be an in­ter­est­ing kind of thing and it could be ex­ten­ded to more than two cat­egor­ies. That’s an ex­ample of the sort of thing on which I would like to see a lot of work done—Bayesian ex­per­i­ment­al design.
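For small \( N \), the two-stage example can be worked out by brute force. The Python sketch below is one possible formalization, and its assumptions are ours rather than Blackwell’s:

- The two proportions have independent Beta priors.
- The goal is to estimate \( p_1+p_2 \) under squared-error loss, so the Bayes risk is the sum of the two expected posterior variances.
- All remaining \( N-n \) observations are used in the second stage.

The code relies on a standard beta-binomial fact: a Beta\( (a,b) \) distribution has expected posterior variance \( ab/\{s(s+1)(s+r)\} \), with \( s=a+b \), after \( r \) more draws. Taking the whole sample at once is among the designs searched, so the best two-stage risk can never be larger than the one-stage risk.

```python
from math import comb, exp, lgamma


def log_beta(a: float, b: float) -> float:
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def predictive(x: int, n: int, a: float, b: float) -> float:
    """Beta-binomial probability of x successes in n draws from one group."""
    return comb(n, x) * exp(log_beta(a + x, b + n - x) - log_beta(a, b))


def exp_post_var(a: float, b: float, r: int) -> float:
    """Expected posterior variance of p ~ Beta(a, b) after r more draws."""
    s = a + b
    return a * b / (s * (s + 1) * (s + r))


def best_split(a1, b1, a2, b2, total: int) -> float:
    """Risk of the best one-shot split of `total` draws, estimating p1 + p2."""
    return min(
        exp_post_var(a1, b1, r) + exp_post_var(a2, b2, total - r)
        for r in range(total + 1)
    )


def one_stage(prior, N: int) -> float:
    return best_split(*prior, N)


def two_stage(prior, N: int):
    """Best first-stage allocation (n1, n2), n1 + n2 <= N, with the remaining
    N - n1 - n2 draws split optimally after seeing the first-stage counts."""
    a1, b1, a2, b2 = prior
    best_risk, best_alloc = float("inf"), (0, 0)
    for n1 in range(N + 1):
        for n2 in range(N - n1 + 1):
            rest = N - n1 - n2
            risk = 0.0
            for x1 in range(n1 + 1):
                for x2 in range(n2 + 1):
                    weight = predictive(x1, n1, a1, b1) * predictive(x2, n2, a2, b2)
                    risk += weight * best_split(
                        a1 + x1, b1 + n1 - x1, a2 + x2, b2 + n2 - x2, rest
                    )
            if risk < best_risk:
                best_risk, best_alloc = risk, (n1, n2)
    return best_risk, best_alloc


prior = (1.0, 1.0, 1.0, 1.0)  # hypothetical: uniform priors in both groups
N = 10
single = one_stage(prior, N)
double, first_stage = two_stage(prior, N)
print(single, double, first_stage)
```

The gap between `single` and `double` is one answer to "how much better is it than just doing it all in one stage," under these assumptions. The same loops extend to more than two categories, at a cost that grows quickly with \( N \).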

One of the things that I worry about a little is that I don’t see the­or­et­ic­al stat­ist­i­cians hav­ing as much con­tact with people in oth­er areas as I would like to see. I no­tice here at Berke­ley, for ex­ample, that the people in Op­er­a­tions Re­search seem to have much closer con­tact with in­dustry than the people in our de­part­ment do. I think we might find more in­ter­est­ing prob­lems if we did have closer con­tact.

De­G­root: Do you think that the dis­tinc­tions between ap­plied and the­or­et­ic­al stat­ist­ics are still as ri­gid as they were years ago, or do you think that the field is blend­ing more in­to a uni­fied field of stat­ist­ics in which such dis­tinc­tions are not par­tic­u­larly mean­ing­ful? I see the em­phas­is on data ana­lys­is which is com­ing about, and the de­vel­op­ment of the­ory for data ana­lys­is and so on, blur­ring these dis­tinc­tions between the­or­et­ic­al and ap­plied stat­ist­ics in a healthy way.

Black­well: I guess I’m not fa­mil­i­ar enough with data ana­lys­is and what com­puters have done to have any in­ter­est­ing com­ments on that. I see what some of our people and people at Stan­ford are do­ing in look­ing at large-di­men­sion­al data sets and ro­tat­ing them so that you can see lots of three-di­men­sion­al pro­jec­tions and such things, but I don’t know wheth­er that sug­gests in­ter­est­ing the­or­et­ic­al ques­tions or not. Maybe that’s not im­port­ant, wheth­er it sug­gests in­ter­est­ing the­or­et­ic­al ques­tions. Maybe the im­port­ant thing is that it helps con­trib­ute to the solu­tion of prac­tic­al prob­lems.

Infinite games

De­G­root: What kind of things are you work­ing on these days?

Blackwell: Right now I am working on some things in information theory, and still trying to understand some things about infinite games of perfect information.

De­G­root: What do you mean by an in­fin­ite game?

Black­well: A game with an in­fin­ite num­ber of moves. Here’s an ex­ample. I write down a 0 or a 1, and you write down a 0 or a 1, and we keep go­ing in­def­in­itely. If the se­quence we pro­duce has a lim­it­ing fre­quency, I win. If not, you win. That’s a trivi­al game be­cause I can force it to have a lim­it­ing fre­quency just by do­ing the op­pos­ite of whatever you do. But that’s a simple ex­ample of an in­fin­ite game.

De­G­root: For­tu­nately, it’s one in which I’ll nev­er have to pay off to you.

Black­well: Well, we can play it in such a way that you would have to pay off.

De­G­root: How do we do that?

Blackwell: You must specify a strategy. Let me give you an example. You know how to play chess in just one move: You prepare a complete set of instructions so that for every situation on the chess board you specify a possible response. Your one move is to prepare that complete set of instructions. If you have a complete set and I have a complete set, then we can just play the game out according to those instructions. It’s just one move. So in the same way, you can specify a strategy in this infinite game. For every finite sequence that you might see up to a given time as past history, you specify your next move. So you can define this function once and for all, and I can define a function, and then we can mathematically assess those functions. I can prove that there is a specific function of mine such that, no matter what function you specify, the sequence will have a limiting frequency.
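The idea that a strategy is a function from finite histories to next moves translates directly into code. The Python sketch below represents strategies this way for the 0/1 game. The names (`mirror`, `play`) and the particular opponents are hypothetical. The first player’s `mirror` strategy writes the opposite of the opponent’s last bit. After the opening move, the bits therefore pair off as (0,1) or (1,0), which keeps the running frequency of 1s within \( 1/n \) of \( 1/2 \) after \( n \) terms, whatever the opponent does. Simulating finitely many moves cannot establish a limit, but the bound checked here holds at every \( n \) by that pairing argument.

```python
import random
from typing import Callable, Sequence

# A strategy: a function from the finite history seen so far to the next bit.
Strategy = Callable[[Sequence[int]], int]


def mirror(history: Sequence[int]) -> int:
    """First player: write the opposite of the opponent's last bit (0 to open)."""
    return 1 - history[-1] if history else 0


def play(first: Strategy, second: Strategy, length: int) -> list[int]:
    """Alternate moves (first player at even positions) for `length` terms."""
    seq: list[int] = []
    for t in range(length):
        mover = first if t % 2 == 0 else second
        seq.append(mover(seq))
    return seq


def running_frequencies(seq: Sequence[int]) -> list[float]:
    """Frequency of 1s among the first n terms, for n = 1, 2, ..."""
    ones, freqs = 0, []
    for n, bit in enumerate(seq, start=1):
        ones += bit
        freqs.append(ones / n)
    return freqs


rng = random.Random(1)
opponents: dict[str, Strategy] = {
    "always 1": lambda h: 1,
    "random": lambda h: rng.randint(0, 1),
    "copy the last bit": lambda h: h[-1],
    # Runs of 0s and 1s of doubling length: if this rule chose every term
    # itself, the frequency of 1s would oscillate forever.
    "doubling runs": lambda h: len(h).bit_length() % 2,
}

results = {}
for name, opponent in opponents.items():
    freqs = running_frequencies(play(mirror, opponent, 10_001))
    results[name] = freqs
    print(name, freqs[-1])
```

The strategies receive the whole history, as in the description of a strategy above, even though `mirror` only looks at the last bit.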

De­G­root: So you could ex­tract money from me in a fi­nite amount of time. [Laughs.]

Black­well: Right. Any­way it’s been proved that all such in­fin­ite games with Borel pay­offs are de­term­ined, and I’ve been try­ing to un­der­stand the proof for sev­er­al years now. I’m still work­ing on it, hop­ing to un­der­stand it and sim­pli­fy it.

De­G­root: Have you pub­lished pa­pers on that top­ic?

Black­well: Just one pa­per, many years ago. Let me re­mind my­self of the title [check­ing his files], “In­fin­ite games and ana­lyt­ic sets” [e5]. This is the only pa­per I’ve pub­lished on in­fin­ite games; and that’s one of my pa­pers that I like very much, by the way. It’s an ap­plic­a­tion of games to prove a the­or­em in to­po­logy. I sort of like the idea of con­nect­ing those two ap­par­ently not closely re­lated fields.

De­G­root: Have you been in­volved in ap­plied pro­jects or ap­plied prob­lems through the years, at Rand or else­where, that you have found in­ter­est­ing and that have stim­u­lated re­search of your own?

Black­well: I guess so. My im­pres­sion though is this: When I have looked at real prob­lems, in­ter­est­ing the­or­ems have some­times come out of it. But nev­er any­thing that was help­ful to the per­son who had the prob­lem. [Laughs.]

De­G­root: But pos­sibly to some­body else at an­oth­er time.

Black­well: Well, my work on com­par­is­on of ex­per­i­ments was stim­u­lated by some work by Bo­hnen­blust, Sher­man, and Shap­ley. We were all at Rand. They called their ori­gin­al pa­per “Com­par­is­on of re­con­nais­sances,” and it was clas­si­fied be­cause it arose out of some ques­tion that some­body had asked them. I re­cog­nized a re­la­tion between what they were do­ing and suf­fi­cient stat­ist­ics, and proved that they were the same in a spe­cial case. Any­way, that led to this de­vel­op­ment which I think is in­ter­est­ing the­or­et­ic­ally, and to which you have con­trib­uted.

De­G­root: Well, I have cer­tainly used your work in that area. And it has spread in­to di­verse oth­er areas. It is used in eco­nom­ics in com­par­ing dis­tri­bu­tions of in­come, and I used it in some work on com­par­ing prob­ab­il­ity fore­casters.

Black­well: And ap­par­ently people in ac­count­ing have made some use of these ideas. But any­way, as I say, noth­ing that I have done has ever helped the per­son who raised the ques­tion. But there is no doubt in my mind that you do get in­ter­est­ing prob­lems by look­ing at the real world.

“I don’t have any difficulties with randomization”

De­G­root: One of the in­ter­est­ing top­ics that comes out of a Bayesian view of stat­ist­ics is the no­tion of ran­dom­iz­a­tion, and the role that it should play in stat­ist­ics. Just this little ex­ample you were talk­ing about be­fore with two pro­por­tions made me think about that. We just as­sume that we are draw­ing the ob­ser­va­tions at ran­dom from with­in each sub­pop­u­la­tion in that ex­ample, but per­haps ba­sic­ally be­cause we don’t have much choice. Do you have any thoughts about wheth­er one should be draw­ing ob­ser­va­tions at ran­dom?

Black­well: I don’t have any dif­fi­culties with ran­dom­iz­a­tion. I think it’s prob­ably a good idea. The strict the­or­et­ic­al ideal­ized Bayesian would of course nev­er need to ran­dom­ize. But ran­dom­iz­a­tion prob­ably pro­tects us against our own bi­ases. There are just lots of ways in which people dif­fer from the ideal Bayesian. I guess the ideal Bayesian, for ex­ample, could not think about a the­or­em as be­ing prob­ably true. For him, pre­sum­ably, all true the­or­ems have prob­ab­il­ity 1 and all false ones have prob­ab­il­ity 0. But you and I know that’s not the way we think. I think of ran­dom­iz­a­tion as be­ing a pro­tec­tion against your own im­per­fect think­ing.

De­G­root: It is also to some ex­tent a pro­tec­tion against oth­ers. Pro­tec­tion for you as a stat­ist­i­cian in present­ing your work to the sci­entif­ic com­munity, in the sense that they can have more be­lief in your con­clu­sions if you use some ran­dom­iz­a­tion pro­ced­ure rather than your own se­lec­tion of a sample. So I see it as in­volved with the so­ci­ology of sci­ence in some way.

Black­well: Yes, that’s an im­port­ant vir­tue of ran­dom­iz­a­tion. That re­minds me of something else, though. We tend to think of evid­ence as be­ing val­id only when it comes from ran­dom samples or samples se­lec­ted in a prob­ab­il­ist­ic­ally spe­cified way. That’s wrong, in my view. Most of what we have learned, we have learned just by ob­serving what hap­pens to come along, rather than from care­fully con­trolled ex­per­i­ments. Some­times stat­ist­i­cians have made a mis­take in throw­ing away ex­per­i­ments be­cause they were not prop­erly con­trolled. That is not to say that ran­dom­iz­a­tion isn’t a good idea, but it is to say that you should not re­ject data just be­cause they have been ob­tained un­der un­con­trolled con­di­tions.

De­G­root: You were the Rouse Ball Lec­turer at Cam­bridge in 1974. How did that come about and what did it in­volve?

Black­well: Well, I was in Eng­land for two years, 1973–1975, as the dir­ect­or of the edu­ca­tion-abroad pro­gram in Great Bri­tain and Ire­land for the Uni­versity of Cali­for­nia. I think that award was just either Peter Whittle’s or Dav­id Kend­all’s idea of how to get me to come up to Cam­bridge to give a lec­ture. One of the things which de­lighted me was that it was named the Rouse Ball Lec­ture be­cause it gave me an op­por­tun­ity to say something at Cam­bridge that I liked—namely, that I had heard of Rouse Ball long be­fore I had heard of Cam­bridge. [Laughs.]

De­G­root: Well, tell me about Rouse Ball.

Blackwell: He wrote a book called Mathematical Recreations and Essays. You may have seen the book. I first came across it when I was a high-school student. It was one of the few mathematics books in our library. I was fascinated by that book. I can still picture it. Rouse Ball was a 19th-century mathematician, I think. [Walter William Rouse Ball, 1850–1925.] Anyway, this is a lectureship that they have named after him.

De­G­root: I guess there aren’t too many Bayesians on the stat­ist­ics fac­ulty here at Berke­ley.

Blackwell: No. I’d say Lester and I are the only ones in our department. Of course, over in Operations Research, Dick Barlow and Bill Jewell are certainly sympathetic to the Bayesian approach.

De­G­root: Is it a top­ic that gets dis­cussed much?

Blackwell: Not really; it used to be discussed here, but you very soon discover that it’s sort of like religion; that it has an appeal for some people and not for other people, and you’re not going to change anybody’s mind by discussing it. So people just go their own ways. What has happened to Bayesian statistics surprised me. I expected it either to catch on and just sweep the field, or to die. And I was rather confident that it would die. Even though to me it was the right way to think, I just didn’t think that it would have a chance to survive. But I thought that, if it did, then it would sweep things. Of course, neither one of those things has happened. Sort of a steady 5–10% of all the work in statistical inference is done from a Bayesian point of view. Is that what you would have expected 20 years ago?

De­G­root: No, it cer­tainly doesn’t seem as though that would be a stable equi­lib­ri­um. And maybe the sys­tem is still not in equi­lib­ri­um. I see the Bayesian ap­proach grow­ing, but it cer­tainly is not sweep­ing the field by any means.

Black­well: I’m glad to hear that you see it grow­ing.

De­G­root: Well, there seem to be more and more meet­ings of the Bayesians, any­way. The ac­tu­ar­ial group that met here at Berke­ley over the last couple of days to dis­cuss cred­ib­il­ity the­ory seems to be a group that just nat­ur­ally ac­cepts the Bayesian ap­proach in their work in the real world. So there seem to be some pock­ets of users out there in the world, and I think maybe that’s what has kept the Bayesian ap­proach alive.

Black­well: There’s no ques­tion in my mind that, if the Bayesian ap­proach does grow in the stat­ist­ic­al world, it will not be be­cause of the in­flu­ence of oth­er stat­ist­i­cians but be­cause of the in­flu­ence of ac­tu­ar­ies, en­gin­eers, busi­ness people, and oth­ers who ac­tu­ally like the Bayesian ap­proach and use it.

De­G­root: Do you get a chance to talk much to re­search­ers out­side of stat­ist­ics on cam­pus, re­search­ers in sub­stant­ive areas?

Black­well: No, I talk mainly to people in Op­er­a­tions Re­search and Math­em­at­ics, and oc­ca­sion­ally Elec­tric­al En­gin­eer­ing. But the things in Elec­tric­al En­gin­eer­ing are the­or­et­ic­al and ab­stract.

“The word ‘science’ in the title bothers me a little”

De­G­root: What do you think about the idea of this new journ­al, Stat­ist­ic­al Sci­ence, in which this con­ver­sa­tion will ap­pear? I have the im­pres­sion that you think the I.M.S. is a good or­gan­iz­a­tion do­ing use­ful things, and there is really no need to mess with it.

Black­well: That is the way I feel. On the oth­er hand, I must say that I felt ex­actly the same way about split­ting the An­nals of Math­em­at­ic­al Stat­ist­ics in­to two journ­als, and that split seems to be work­ing. So I’m hop­ing that the new journ­al will add something. I guess the word “sci­ence” in the title both­ers me a little. It’s not clear what the word is in­ten­ded to con­vey there, and you sort of have the feel­ing that it’s there more to con­trib­ute a tone than any­thing else.

De­G­root: My im­pres­sion is that it is in­ten­ded to con­trib­ute a tone. To give a fla­vor of something broad­er than just what we would think of as the­or­et­ic­al stat­ist­ics. That is, to reach out and talk about the im­pact of stat­ist­ics on the sci­ences and the in­ter­re­la­tion­ship of stat­ist­ics with the sci­ences, all kinds of sci­ences.

Black­well: Now, I’m all in fa­vor of that. For ex­ample, the re­la­tion of stat­ist­ics to the law is to me a quite ap­pro­pri­ate top­ic for art­icles in this journ­al. But some­how call­ing it “sci­ence” doesn’t em­phas­ize that dir­ec­tion. In fact, it rather sug­gests that that’s not the dir­ec­tion. It sounds as though it’s tied in with things that are sup­por­ted by the Na­tion­al Sci­ence Found­a­tion, and to me that re­stricts it.

De­G­root: The in­ten­tion of that title was to con­vey a broad im­pres­sion rather than a re­stric­ted one. To give a broad­er im­pres­sion than just stat­ist­ics and prob­ab­il­ity, to con­vey an ap­plied fla­vor and to sug­gest links to all areas.

Black­well: Yes. It’s ana­log­ous to com­puter sci­ence, I guess. I think that term was rather de­lib­er­ately chosen. My feel­ing is that the I.M.S. is just a beau­ti­ful or­gan­iz­a­tion. It’s about the right size. It’s been suc­cess­ful for a good many years. I don’t like to see us be­come am­bi­tious. I like the idea of just sort of stay­ing the way we are, an or­gan­iz­a­tion run es­sen­tially by am­a­teurs.

De­G­root: Do you have the feel­ing that the field of stat­ist­ics is mov­ing away from the I.M.S. in any way? That was one of the mo­tiv­a­tions for start­ing this journ­al.

Black­well: Well, of course, stat­ist­ics has al­ways been sub­stan­tially big­ger than the I.M.S. But you’re sug­gest­ing that the I.M.S. rep­res­ents a smal­ler and smal­ler frac­tion of stat­ist­ic­al activ­ity.

De­G­root: Yes, I think that might be right.

Black­well: You know, Mor­rie, I see what you’re talk­ing about hap­pen­ing in math­em­at­ics. It’s less and less true that all math­em­at­ics is done in math­em­at­ics de­part­ments. On the Berke­ley cam­pus, I see lots of in­ter­est­ing math­em­at­ics be­ing done in our de­part­ment, in Op­er­a­tions Re­search, in Elec­tric­al En­gin­eer­ing, in Mech­an­ic­al En­gin­eer­ing, some in Busi­ness Ad­min­is­tra­tion, a lot in the Eco­nom­ics De­part­ment by Ger­ard Debreu and his col­leagues; a lot of really in­ter­est­ing, high-class math­em­at­ics is be­ing done out­side math­em­at­ics de­part­ments. What you’re sug­gest­ing is that stat­ist­ics de­part­ments and the journ­als in which they pub­lish are not ne­ces­sar­ily the cen­ters of stat­ist­ics the way they used to be, that a lot of work is be­ing done out­side. I’m sure that’s right.

De­G­root: And per­haps should be done out­side stat­ist­ics de­part­ments. That used to be an un­healthy sign in the field, and we worked hard in stat­ist­ics de­part­ments to col­lect up the stat­ist­ics that was be­ing done around the cam­pus. But I think, now that the field has grown and ma­tured, that it is prob­ably a healthy thing to have some in­ter­est­ing stat­ist­ics be­ing done out­side.

Black­well: Yes. Con­sider the old prob­lem of pat­tern re­cog­ni­tion. That’s a stat­ist­ic­al prob­lem. But to the ex­tent that it gets solved, it’s not go­ing to be solved by people in stat­ist­ics de­part­ments. It’s go­ing to be solved by people work­ing for banks and people work­ing for oth­er or­gan­iz­a­tions who really need to have a device that can look at a per­son and re­cog­nize him in lots of dif­fer­ent con­fig­ur­a­tions. That’s just one ex­ample of the cases where we’re some­how too nar­row to work on a lot of ser­i­ous stat­ist­ic­al prob­lems.

De­G­root: I think that’s right, and yet we have something im­port­ant to con­trib­ute to those prob­lems.

Black­well: I would say that we are con­trib­ut­ing, but in­dir­ectly. That is, people who are work­ing on the prob­lems have stud­ied stat­ist­ics. It seems to me that a lot of the en­gin­eers I talk to are very fa­mil­i­ar with the ba­sic con­cepts of de­cision the­ory. They know about loss func­tions and min­im­iz­ing ex­pec­ted risks and such things. So, we have con­trib­uted, but just in­dir­ectly.

De­G­root: You are in the Na­tion­al Academy of Sci­ences…

Black­well: Yes, but I’m very in­act­ive.

DeGroot: You haven’t been involved in any of their committees or panels?

Black­well: No, and I’m not sure that I would want to be. I guess I don’t like the idea of an of­fi­cial com­mit­tee mak­ing sci­entif­ic pro­nounce­ments. I like people to form opin­ions about sci­entif­ic mat­ters just on the basis of listen­ing to in­di­vidu­al sci­ent­ists. To have one group with such over­whelm­ing prestige both­ers me a little.

DeGroot: And it’s precisely the prestige of the Academy that they rely on when reports get issued by these committees.

Black­well: Yes. So I think it’s just great as a purely hon­or­if­ic or­gan­iz­a­tion, so to speak. To meet just once a year, and elect people more or less at ran­dom. I think every­body that’s in it has done something reas­on­able and even pretty good, in fact. But on the oth­er hand, there are at least as many people not in it who have done good things as there are in it. It’s kind of a ran­dom se­lec­tion pro­cess.

De­G­root: So you think it’s a good or­gan­iz­a­tion as long as it doesn’t do any­thing.

Blackwell: Right. I’m proud to be in it, but I haven’t been active. It’s sort of like getting elected to Phi Beta Kappa; it’s nice if it happens to you…

“I play with this computer”

De­G­root: Do you feel any re­la­tion­ship between your pro­fes­sion­al work and the rest of your life, your in­terests out­side of stat­ist­ics? Is there any in­flu­ence of the out­side on what you do pro­fes­sion­ally, or are they just sort of sep­ar­ate parts of your life?

Black­well: Sep­ar­ate, ex­cept my friends are also my col­leagues. It’s only through the people with whom I as­so­ci­ate out­side that there’s any con­nec­tion. It’s hard to think of any oth­er real con­nec­tion.

De­G­root: It’s not ob­vi­ous what these con­nec­tions might be for any­one. One’s polit­ic­al views or so­cial views seem to be pretty much in­de­pend­ent of the tech­nic­al prob­lems we work on.

Blackwell: Yes. Although it’s hard to see how it could not have an influence, isn’t it? I guess my life seems all of a piece to me, but yet it’s hard to see where the connections are. [Laughs.]

De­G­root: What do you see for your fu­ture?

Black­well: Well, just gradu­ally to wind down, grace­fully I hope. I ex­pect to get more in­ter­ested in com­put­ing. I have a little com­puter at home, and it’s a lot of fun just to play with it. In fact, I’d say that I play with this com­puter here in my of­fice at least as much as I do ser­i­ous work with it.

De­G­root: What do you mean by play?

Black­well: Let me give you an ex­ample. You know the al­gorithm for cal­cu­lat­ing square roots. You start with a guess and then you di­vide the num­ber by your guess and take the av­er­age of the two. That’s your next guess. That’s ac­tu­ally New­ton’s meth­od for find­ing square roots, and it works very well. Some­times do­ing stat­ist­ic­al work, you want to take the square root of a pos­it­ive-def­in­ite mat­rix. It oc­curred to me to ask wheth­er that al­gorithm works for find­ing the square root of a pos­it­ive-def­in­ite mat­rix. Be­fore I got in­ter­ested in com­put­ing, I would have tried to solve it the­or­et­ic­ally. But what did I do? I just wrote up a pro­gram and put it on the com­puter to see if it worked. [Goes to black­board.]

Suppose that you are given the matrix \( M \) and want to find \( M^{1/2} \). Let \( G \) be your guess of \( M^{1/2} \). Then your new guess is \( (G+ MG^{-1})/2 \). You just iterate this and see if it converges to \( M^{1/2} \). Now, Morrie, I want to show you what happens. [Goes to terminal.]

Let’s do it for a \( 3{\times}3 \) mat­rix. We’re go­ing to find the square root of a pos­it­ive-def­in­ite \( 3{\times}3 \) mat­rix. Now, if you hap­pen to have in mind a par­tic­u­lar \( 3{\times}3 \) pos­it­ive-def­in­ite mat­rix whose square root you want, you could enter it dir­ectly. I don’t hap­pen to have one in mind, but I do know a the­or­em: If you take any nonsin­gu­lar \( 3{\times}3 \) mat­rix \( A \), then \( AA^{\prime} \) is go­ing to be pos­it­ive def­in­ite. So I’m just go­ing to enter any \( 3{\times}3 \) nonsin­gu­lar mat­rix [put­ting some num­bers in­to the ter­min­al] and let \( M = AA^{\prime} \). Now, to see how far off your guess \( G \) is at any stage, you cal­cu­late the Eu­c­lidean norm of the \( 3{\times}3 \) mat­rix \( M - G^2 \). That’s what I call the er­ror. Let’s start out with the iden­tity mat­rix \( I \) as our ini­tial guess. We get a big er­ror, 29 mil­lion. Now let’s it­er­ate. Now the er­ror has dropped down to 7 mil­lion. It’s go­ing to keep be­ing di­vided by 4 for a long time. [Con­tinu­ing the it­er­a­tions for a while.] Now no­tice, we’re not bad. There’s our guess, there’s its square, there’s what we’re try­ing to get. It’s pretty close. In fact the er­ror is less than one. [Con­tinu­ing.] Now the er­ror is really small. Look at that, isn’t that beau­ti­ful? So there’s just no ques­tion about it. If you enter a mat­rix at ran­dom and it works, then that sort of settles it.

But now wait a minute, the story isn’t quite fin­ished yet. Let me just con­tin­ue these it­er­a­tions… Look at that! The er­ror got big­ger, and it keeps get­ting big­ger. [Con­tinu­ing.] Isn’t that lovely stuff?

De­G­root: What happened?

Black­well: Isn’t that an in­ter­est­ing ques­tion, what happened? Well, let me tell you what happened. Now you can study it the­or­et­ic­ally and ask, should it con­verge? And it turns out that it will con­verge if, and es­sen­tially only if, your first guess com­mutes with the mat­rix \( M \). That’s what the the­ory gives you. Well, my first guess was \( I \). It com­mutes with everything. So the pro­ced­ure the­or­et­ic­ally con­verges. However, when you cal­cu­late, you get round-off er­rors. By the way, if your first guess com­mutes, then all sub­sequent guesses will com­mute. However, be­cause of round-off er­rors, the matrices that you ac­tu­ally get don’t quite com­mute. There are two ways to do this. We could take \( MG^{-1} \) or we could have taken \( G^{-1}M \). Of course, if \( M \) com­mutes with \( G \), then it com­mutes with \( G^{-1} \) and it doesn’t mat­ter which way you do it. But if you don’t cal­cu­late \( G \) ex­actly at some stage, then it will not quite com­mute. And in fact, what I have here on the com­puter is a cal­cu­la­tion at each stage of the non­com­mut­ativ­ity norm. That shows you how dif­fer­ent \( MG^{-1} \) is from \( G^{-1}M \). I didn’t point those val­ues out to you, but they star­ted out as es­sen­tially 0, and then there was a 1 in the 15th place, and then a 1 in the 14th place, and so on. By this stage, the non­com­mut­ativ­ity norm has built up to the point where it’s hav­ing a siz­able in­flu­ence on the thing.
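Blackwell’s terminal session can be reproduced in outline. The sketch below is plain Python with no numerical library, so every step is visible. It runs the iteration \( G \leftarrow (G + MG^{-1})/2 \) from \( G = I \) and records, at each step, the error \( \|M - G^2\| \) (taken here to be the Frobenius norm, our reading of “Euclidean norm”) and the noncommutativity norm \( \|MG^{-1} - G^{-1}M\| \) he describes. The helper names and example matrices are our own choices. For a well-conditioned \( M \) the error falls to round-off level; whether and when it turns around for a badly conditioned \( M \), as in his session, depends on the matrix, so no particular printed output is claimed.

```python
import math


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def frobenius(a):
    # Euclidean norm of the matrix viewed as a vector of its entries.
    return math.sqrt(sum(x * x for row in a for x in row))


def inverse(a):
    """Inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(x) for x in row] + e for row, e in zip(a, identity(n))]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def newton_sqrt(m, steps):
    """Iterate G <- (G + M G^{-1}) / 2 starting from G = I.

    Returns the final G and a list of (error, noncommutativity) pairs:
    error = ||M - G^2|| after the step, noncommutativity =
    ||M G^{-1} - G^{-1} M|| for the G that the step started from.
    """
    g = identity(len(m))
    history = []
    for _ in range(steps):
        g_inv = inverse(g)
        m_ginv = matmul(m, g_inv)
        noncomm = frobenius(sub(m_ginv, matmul(g_inv, m)))
        g = [[(x + y) / 2.0 for x, y in zip(rg, rmg)]
             for rg, rmg in zip(g, m_ginv)]
        history.append((frobenius(sub(m, matmul(g, g))), noncomm))
    return g, history


if __name__ == "__main__":
    a = [[2.0, 1.0, 0.0], [0.0, 1.0, 3.0], [1.0, 0.0, 1.0]]  # any nonsingular A
    m = matmul(a, [list(col) for col in zip(*a)])          # M = A A' is positive definite
    _, hist = newton_sqrt(m, 40)
    for step, (err, nc) in enumerate(hist, 1):
        print(f"{step:2d}  error={err:.3e}  noncommutativity={nc:.3e}")
```

Starting from \( I \), the first noncommutativity value is exactly 0, matching his remark that it “started out as essentially 0”; any later growth comes only from round-off, since in exact arithmetic every iterate would commute with \( M \).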

De­G­root: Is it go­ing to di­verge, or will it come back down after some time?

Black­well: It won’t come back down. It will reach a cer­tain size, and some­times it will stay there and some­times it will os­cil­late. That is, one \( G \) will go in­to a quite dif­fer­ent \( G \), but then that \( G \) will come back to the first one. You get peri­ods, neither one of them near the truth. So that’s what I mean by just play­ing, in­stead of sit­ting down like a ser­i­ous math­em­atician and try­ing to prove a the­or­em. Just try it out on the com­puter and see if it works. [Laughs.]

De­G­root: You can save a lot of time and trouble that way.

Black­well: Yes. I ex­pect to do more and more of that kind of play­ing. Maybe I get lazi­er as I get older. It’s fun, and it’s an in­ter­est­ing toy.

De­G­root: Do you find your­self grow­ing less rig­or­ous in your math­em­at­ic­al work?

Blackwell: Oh, yes. I’m much more interested in the ideas, and in truth under not-completely-specified hypotheses. I think that has happened to me over the last 20 years. I can certainly notice it now. Jim MacQueen was telling me about something that he had discovered. If you take a vector and calculate the squared correlation between that vector and some permutation of itself, then the average of that squared correlation over all possible permutations is some simple number. Also, there was some extension of this result to \( k \) vectors. He has an interesting algebraic identity. He told me about it, but instead of my trying to prove it, I just selected some numbers at random and checked it on the computer. Also, I had a conjecture that some stronger result was true. I checked it for some numbers selected at random, and it turned out to be true for him and not true for what I had said. Well, that just settles it. Because suppose you have an algebraic function \( f(x_1,\dots,x_n) \), and you want to find out if it is identically 0. Well, I think it’s true that any algebraic function of \( n \) variables is either identically 0 or the set of \( x \)’s for which it is 0 is a set that has measure 0. So you can just select \( x \)’s at random and evaluate \( f \). If you get 0, it’s identically 0. [Laughs.]
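Blackwell’s shortcut is what is now usually formalized as randomized polynomial identity testing. For polynomials, the finite-sample analogue of his measure-zero argument is the Schwartz–Zippel lemma: if the sample values come from a finite set \( S \), a nonzero polynomial of total degree \( d \) vanishes at a random point with probability at most \( d/|S| \). The sketch below uses exact integer and `Fraction` arithmetic to avoid round-off, and all function names are hypothetical. As a test case it uses a permutation identity of the kind he describes: for a non-constant vector of length \( n \), the average over all permutations of the squared correlation between the vector and its permuted copy is \( 1/(n-1) \). That identity is a standard fact chosen for illustration; we do not claim it is the result MacQueen found.

```python
import itertools
import math
import random
from fractions import Fraction


def probably_identically_zero(f, nvars, trials=2, bound=10**6, rng=None):
    """Randomized identity test: evaluate f at random integer points.

    A nonzero value is a certificate that f is not identically 0. If f is a
    nonzero polynomial of total degree d, each trial returns 0 with
    probability at most d / (2 * bound + 1).
    """
    if rng is None:
        rng = random.Random()
    for _ in range(trials):
        point = [rng.randint(-bound, bound) for _ in range(nvars)]
        if f(*point) != 0:
            return False
    return True


def perm_corr_gap(*x):
    """(n - 1) * sum_pi D_pi**2 - n! * S**2, a polynomial in x.

    Here c is x centered, S = sum c_i**2 and D_pi = sum c_i * c_pi(i), so that
    corr(x, pi x) = D_pi / S. The gap is 0 exactly when the average squared
    correlation over all permutations equals 1 / (n - 1).
    """
    n = len(x)
    mean = Fraction(sum(x), n)
    c = [xi - mean for xi in x]
    s = sum(ci * ci for ci in c)
    total = sum(sum(c[i] * c[p[i]] for i in range(n)) ** 2
                for p in itertools.permutations(range(n)))
    return (n - 1) * total - math.factorial(n) * s ** 2
```

Passing a seeded `random.Random` makes a run reproducible; a second trial, as DeGroot suggests, multiplies the already tiny chance of a false “identically 0” by itself.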

De­G­root: You wouldn’t try even a second set of \( x \)’s?

Black­well: I did. [Laughs.]

De­G­root: Get­ting more con­ser­vat­ive in your old age.

Blackwell: Yes. [Laughs.] I’ve been wondering whether in teaching statistics the typical setup will be a lot of terminals connected to a big central computer or a lot of small personal computers. Let me turn the interview around. Do you have any thoughts about which way that is going or which way it ought to go?

De­G­root: No, I don’t know. At Carne­gie-Mel­lon we are try­ing to have both worlds by hav­ing per­son­al com­puters but hav­ing them net­worked with each oth­er. There’s a plan at Carne­gie-Mel­lon that each stu­dent will have to have a per­son­al com­puter.

Black­well: Now when you say each stu­dent will have to have a per­son­al com­puter, where will it be phys­ic­ally loc­ated?

De­G­root: Wherever he lives.

Black­well: So that they would not ac­tu­ally use com­puters in class on the cam­pus?

De­G­root: Well, this will cer­tainly lessen the bur­den on the com­puters that are on cam­pus, but in a class you would have to have either ter­min­als or per­son­al com­puters for them.

Black­well: Yes. I’m pretty sure that in our de­part­ment in five years we’ll have sev­er­al classrooms in which each seat will be a work sta­tion for a stu­dent, and in front of him will be either a per­son­al com­puter or a ter­min­al. I’m not sure which, but that’s the way we’re go­ing to be in five years.

“I wouldn’t dream of talking about a theorem like that now”

De­G­root: A lot of people have seen you lec­ture on film. I know of at least one film you made for the Amer­ic­an Math­em­at­ic­al So­ci­ety that I’ve seen a few times. That’s a beau­ti­ful film, “Guess­ing at Ran­dom.”

Black­well: Yes. I now, of course, don’t think much of those ideas. [Laughs.]

De­G­root: There were some min­im­ax ideas in there…

Black­well: Yes, that’s right. That was some work that I did be­fore I be­came such a com­mit­ted Bayesian. I wouldn’t dream of talk­ing about a the­or­em like that now. But it’s a nice res­ult…

De­G­root: It’s a nice res­ult and it’s a beau­ti­ful film. De­livered so well.

Black­well: Let’s see… How does it go? If I were do­ing it now I would do a weak­er and easi­er Bayesian form of the the­or­em. You were giv­en an ar­bit­rary se­quence of 0s and 1s, and you were go­ing to ob­serve suc­cess­ive val­ues, and you had to pre­dict the next one. I proved cer­tain the­or­ems about how well you could do against every pos­sible se­quence. Well, now I would say that you have a prob­ab­il­ity dis­tri­bu­tion on the set of all se­quences. It’s a gen­er­al fact that if you’re a Bayesian, you don’t have to be clev­er. You just cal­cu­late. Sup­pose that some­body gen­er­ates an ar­bit­rary se­quence of 0s and 1s and it’s your job after see­ing each fi­nite seg­ment to pre­dict the next co­ordin­ate, 0 or 1, and we keep track of how well you do. Then I have to be clev­er and in­voke the min­im­ax the­or­em to de­vise a pro­ced­ure that asymp­tot­ic­ally does very well in a cer­tain sense. But now if you just put a pri­or dis­tri­bu­tion on the set of se­quences, any Bayesian knows what to do. You just cal­cu­late the prob­ab­il­ity of the next term be­ing a 1 giv­en the past his­tory. If it’s more than \( 1/2 \) you pre­dict a 1, if it’s less than \( 1/2 \) you pre­dict a 0. And that simple pro­ced­ure has the cor­res­pond­ing Bayesian ver­sion of all the things that I talked about in that film. You just know what is the right thing to do.
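The Bayesian predictor Blackwell describes needs only a prior and a conditional probability. As a minimal sketch, assume one particular prior chosen purely for illustration: the sequence is i.i.d. Bernoulli with a uniformly distributed success probability, for which \( P(\text{next} = 1 \mid \text{past}) \) is Laplace’s rule of succession, \( (\text{ones} + 1)/(n + 2) \). The tie-breaking convention is also our assumption, since the interview only covers probabilities above or below \( 1/2 \).

```python
from fractions import Fraction


def prob_next_one(history):
    """P(next = 1 | history) under a uniform prior on the Bernoulli
    parameter (Laplace's rule of succession)."""
    return Fraction(sum(history) + 1, len(history) + 2)


def bayes_predict(history, tie=1):
    """Predict 1 if P(next = 1 | past) > 1/2, 0 if it is < 1/2."""
    p = prob_next_one(history)
    if p > Fraction(1, 2):
        return 1
    if p < Fraction(1, 2):
        return 0
    return tie  # arbitrary convention for p == 1/2


def run(sequence):
    """Predict each coordinate from the ones before it.

    Returns the list of predictions and the number of mistakes.
    """
    preds = [bayes_predict(sequence[:t]) for t in range(len(sequence))]
    mistakes = sum(p != s for p, s in zip(preds, sequence))
    return preds, mistakes
```

With this exchangeable prior the rule reduces to predicting the majority symbol seen so far; a different prior, for instance one that expects runs, would give a different but equally mechanical rule. In either case, as Blackwell says, you just calculate.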

De­G­root: But how do you know that you’ll be do­ing well in re­la­tion to the real­ity of the se­quence?

Black­well: Well, the the­or­em of course says that you’ll do well for all se­quences ex­cept a set of meas­ure zero ac­cord­ing to your own pri­or dis­tri­bu­tion, and that’s all a Bayesian can hope for. That is, you have to give up something, but it just makes life so much neat­er. You just know that this is the right thing to do.

I en­countered the same phe­nomen­on in in­form­a­tion the­ory. There is a very good the­ory about how to trans­mit over a chan­nel, or how to trans­mit over a se­quence of chan­nels. The chan­nel may change from day to day, but if you know what it is every day, then you can trans­mit over it. Now sup­pose that the chan­nel var­ies in an ar­bit­rary way. That is, you have one of a fi­nite set of chan­nels, and every day you’re go­ing to be faced with one of these chan­nels. You have to put in the in­put, and a guy at the oth­er end gets an out­put. The ques­tion is, how well can you do against all pos­sible chan­nel se­quences?

You don’t really know what the weath­er is out there, so you don’t know what the in­ter­fer­ence is go­ing to be. But you want to have a code that trans­mits well for all pos­sible weath­er se­quences. If you just ana­lyze the prob­lem crudely, it turns out that you can’t do any­thing against all pos­sible se­quences. However, if you se­lect the code in a cer­tain ran­dom way, your over­all er­ror prob­ab­il­ity will be small for each weath­er se­quence. So, you see, it’s a nice the­or­et­ic­al res­ult but it’s un­ap­peal­ing. However, you can get ex­actly the same res­ult if you just put a prob­ab­il­ity dis­tri­bu­tion on the se­quences. Well, the weath­er could be any se­quence, but you ex­pect it to be sort of this way or that. Once you put a prob­ab­il­ity dis­tri­bu­tion on the set of se­quences, you no longer need ran­dom codes. And there is a de­term­in­ist­ic code that gives you that same res­ult that you got be­fore. So either you must be­have in a ran­dom way, or you must put a prob­ab­il­ity dis­tri­bu­tion on nature.

[Look­ing over a copy of his pa­per, Black­well, Breiman and Thomasi­an: “The ca­pa­cit­ies of cer­tain chan­nel classes un­der ran­dom cod­ing” [e4].] I don’t think we did the nice easy part. We be­haved the way Wald be­haved. You see, the min­im­ax the­or­em says that if for every pri­or dis­tri­bu­tion you can achieve a cer­tain gain, then there is a ran­dom way of be­hav­ing that achieves that gain for every para­met­er value. You don’t need the pri­or dis­tri­bu­tion; you can throw it away. Well, I’m afraid that in this pa­per, we in­voked the min­im­ax the­or­em. We said, take any pri­or dis­tri­bu­tion on the set of chan­nel se­quences. Then you can achieve a cer­tain rate of trans­mis­sion for that pri­or dis­tri­bu­tion. Now you in­voke the min­im­ax the­or­em and say, there­fore, there is a ran­dom­ized way of be­hav­ing which en­ables you to achieve that rate against every pos­sible se­quence. I now wish that we had stopped at the earli­er point. [Laughs.] For us, the Bayesian ana­lys­is was just a pre­lim­in­ary which, with the aid of the min­im­ax the­or­em, en­abled us to reach the con­clu­sions we were seek­ing. That was Wald’s view and that’s the view that we took in that pa­per. I’m sure I was already con­vinced that the Bayes ap­proach was the right ap­proach, but per­haps I de­ferred to my col­leagues.
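The logic Blackwell attributes to Wald (a Bayes gain achievable against every prior implies a randomized rule achieving that gain against every parameter value) can be checked exactly in the smallest nontrivial case. The sketch below is a hypothetical two-action, two-state decision problem with made-up gains, not the channel problem of [e4]; it assumes the game has no saddle point in pure strategies and uses the standard equalizing formulas for 2×2 games.

```python
from fractions import Fraction


def solve_2x2(u):
    """Equalizing solution of a 2x2 game with no pure saddle point.

    u[a][s] is the decision maker's gain for action a in state s.
    Returns (q, p, v): q = probability of action 0 in the randomized rule,
    p = probability of state 0 under the least favorable prior, v = value.
    """
    (a, b), (c, d) = u
    den = (a - b) + (d - c)
    if den == 0:
        raise ValueError("degenerate game")
    q = Fraction(d - c, den)
    p = Fraction(d - b, den)
    if not (0 < q < 1 and 0 < p < 1):
        raise ValueError("game has a saddle point in pure strategies")
    return q, p, Fraction(a * d - b * c, den)


def bayes_gain(u, prior0):
    """Best expected gain of a pure action against the prior (prior0, 1 - prior0)."""
    return max(prior0 * row[0] + (1 - prior0) * row[1] for row in u)


def guaranteed_gain(u, q):
    """Worst-case expected gain of the randomized rule over the two states."""
    return min(q * u[0][s] + (1 - q) * u[1][s] for s in range(2))
```

For the gains `u = [[3, 0], [1, 2]]`, every prior admits a pure action with Bayes gain at least 3/2, the least favorable prior (1/2, 1/2) gives exactly 3/2, and the rule that takes action 0 with probability 1/4 earns 3/2 in either state without reference to any prior. That last step is the one Blackwell says he now wishes they had not taken.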

De­G­root: That’s a very mild com­prom­ise. Go­ing bey­ond what was ne­ces­sary for a Bayesian res­ol­u­tion of the prob­lem.

Black­well: That’s right. Also, I sus­pect that I had Wolfow­itz in mind. He was a real ex­pert in in­form­a­tion the­ory, but he wouldn’t have been in­ter­ested in any­thing Bayesian.

De­G­root: What about the prob­lem of put­ting pri­or dis­tri­bu­tions on spaces of in­fin­ite se­quences, or func­tion spaces? Is that a prac­tic­al prob­lem and is there a prac­tic­al solu­tion to the prob­lem?

Blackwell: I wouldn’t say for infinite sequences, but I think it’s a very important practical problem for large finite sequences, and I have no idea how to solve it. For example, you could think that the pattern-recognition problem that I was talking about before is like that. You see an image on a TV screen. That’s just a long finite sequence of 0s and 1s. And now you can ask how likely it is that that sequence of 0s and 1s is intended to be the figure 7, say. Well, with some you’re certain that it is, with some you’re certain that it isn’t, and with others there’s a certain probability that it is and a probability that it isn’t. The problem of describing that probability distribution is a very important problem. And we’re just not close to knowing how to describe probability distributions over long finite sequences that correspond to our opinions.

De­G­root: Is there hope for get­ting such de­scrip­tions?

Black­well: I don’t know. But again it’s a stat­ist­ic­al prob­lem that is not go­ing to be solved by pro­fess­ors of stat­ist­ics in uni­versit­ies. It might be solved by people in ar­ti­fi­cial in­tel­li­gence, or by re­search­ers out­side uni­versit­ies.

“Just tell me one or two interesting things”

De­G­root: There’s an ar­gu­ment that says that, un­der the Bayesian ap­proach, you have to seek the op­tim­al de­cision and that’s of­ten just too hard to find. Why not settle for some oth­er ap­proach that re­quires much less struc­ture, and get a reas­on­ably good an­swer out of it, rather than an op­tim­al an­swer? Es­pe­cially in these kinds of prob­lems where we don’t know how to find the op­tim­al an­swer.

Black­well: Oh, I think every­body would be sat­is­fied with a reas­on­able an­swer. I don’t see that there’s more of an em­phas­is in the Bayesian ap­proach on op­tim­al de­cisions than in oth­er ap­proaches. I sep­ar­ate Bayesian in­fer­ence from Bayesian de­cision. The in­fer­ence prob­lem is just cal­cu­lat­ing a pos­teri­or dis­tri­bu­tion, and that has noth­ing to do with the par­tic­u­lar de­cision that you’re go­ing to make. The same pos­teri­or dis­tri­bu­tion could be used by many dif­fer­ent people mak­ing dif­fer­ent de­cisions. Even in cal­cu­lat­ing the pos­teri­or dis­tri­bu­tion, there is a lot of ap­prox­im­a­tion. It just can’t be done pre­cisely in in­ter­est­ing and im­port­ant cases. And I don’t think any­body who is in­ter­ested in ap­ply­ing Bayes’ meth­od would in­sist on something that’s pre­cise to the fifth decim­al place. That’s just the con­cep­tu­al frame­work in which you want to work, and which you want to ap­prox­im­ate.
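Blackwell’s separation of inference from decision can be illustrated with a finite parameter grid: the posterior is computed once, and different users then read different decisions off the same posterior. The grid, the uniform prior, and the loss functions below are illustrative assumptions; exact `Fraction` arithmetic is used only so the numbers are easy to check by hand.

```python
from fractions import Fraction


def posterior(prior, successes, trials):
    """Bayes' rule over a finite grid of success probabilities theta."""
    weights = {t: w * t ** successes * (1 - t) ** (trials - successes)
               for t, w in prior.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


def posterior_mean(post):
    # Optimal estimate under squared-error loss.
    return sum(t * w for t, w in post.items())


def posterior_median(post):
    # Optimal estimate under absolute-error loss: smallest theta with CDF >= 1/2.
    cumulative = Fraction(0)
    for t in sorted(post):
        cumulative += post[t]
        if cumulative >= Fraction(1, 2):
            return t


def prob_above(post, threshold):
    # What someone betting on the event "theta > threshold" needs.
    return sum(w for t, w in post.items() if t > threshold)
```

With the grid {1/4, 1/2, 3/4}, a uniform prior, and 2 successes in 2 trials, the same posterior (1/14, 2/7, 9/14) gives a posterior mean of 9/14 to a squared-error user, a median of 3/4 to an absolute-error user, and \( P(\theta > 1/2) = 9/14 \) to someone betting on that event.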

De­G­root: That same spir­it can be car­ried over in­to the de­cision prob­lem, too. If you can’t find the op­tim­um de­cision, you settle for an ap­prox­im­a­tion to it.

Black­well: Right.

De­G­root: In your opin­ion, what have been the ma­jor break­throughs in the field of stat­ist­ics or prob­ab­il­ity through the years?

Black­well: It’s hard to say… I think that the­or­et­ic­al stat­ist­ic­al think­ing was just com­pletely dom­in­ated by Wald’s ideas for a long time. Charles Stein’s dis­cov­ery that \( \bar{X} \) is in­ad­miss­ible was cer­tainly im­port­ant. Herb Rob­bins’ work on em­pir­ic­al Bayes was also a big step, but pos­sibly in the wrong dir­ec­tion.

You know, I don’t view my­self as a states­man or a guy with a broad view of the field or any­thing like that. I just picked dir­ec­tions that in­ter­ested me and worked in them. And I have had fun.

De­G­root: Well, des­pite the fact that you didn’t choose the prob­lems for their im­pact or be­cause of their im­port­ance, a lot of people have gained a lot from your work.

Black­well: I guess that’s the way schol­ars should work. Don’t worry about the over­all im­port­ance of the prob­lem; work on it if it looks in­ter­est­ing. I think there’s prob­ably a suf­fi­cient cor­rel­a­tion between in­terest and im­port­ance.

De­G­root: One com­pon­ent of the in­terest is prob­ably that oth­ers are in­ter­ested in it, any­way.

Black­well: That’s a big com­pon­ent. You want to tell some­body about it after you’ve done it.

De­G­root: It has not al­ways been clear that the pub­lished pa­pers in our more ab­stract journ­als did suc­ceed in telling any­body about it.

Black­well: That’s true. But if you get the fel­low to give a lec­ture on it, he’ll prob­ably be able to tell you something about it. Es­pe­cially if you try to re­strict him: Look, don’t tell me everything. Just tell me one or two in­ter­est­ing things.

De­G­root: You have a repu­ta­tion as one of the finest lec­tur­ers in the field. Is that your style of lec­tur­ing?

Black­well: I guess it is. I try to em­phas­ize that with stu­dents. I no­tice that when stu­dents are talk­ing about their theses or about their work, they want to tell you everything they know. So I say to them: You know much more about this top­ic than any­body else. We’ll nev­er un­der­stand it if you tell it all to us. Pick just one in­ter­est­ing thing. Maybe two.

De­G­root: Thank you, Dav­id.