[1] D. Blackwell: “On the functional equation of dynamic programming,” J. Math. Anal. Appl. 2:2 (April 1961), pp. 273–276. MR 0126090. Zbl 0096.35503.
@article {key0126090m,
AUTHOR = {Blackwell, David},
TITLE = {On the functional equation of dynamic
programming},
JOURNAL = {J. Math. Anal. Appl.},
FJOURNAL = {Journal of Mathematical Analysis and
Applications},
VOLUME = {2},
NUMBER = {2},
MONTH = {April},
YEAR = {1961},
PAGES = {273--276},
DOI = {10.1016/0022-247X(61)90035-X},
NOTE = {MR:0126090. Zbl:0096.35503.},
ISSN = {0022-247X},
}
[2] D. Blackwell: “Discrete dynamic programming,” Ann. Math. Stat. 33:2 (1962), pp. 719–726. MR 0149965. Zbl 0133.12906.
We consider a system with a finite number \( S \) of states \( s \), labeled by the integers \( 1, 2, \dots, S \). Periodically, say once a day, we observe the current state of the system, and then choose an action \( a \) from a finite set \( A \) of possible actions. As a joint result of the current state \( s \) and the chosen action \( a \), two things happen: (a) we receive an immediate income \( i(s,a) \), and (b) the system moves to a new state \( s^{\prime} \), with the probability of a particular new state \( s^{\prime} \) given by a function \( q = q(s^{\prime}\mid s,a) \). Finally, there is specified a discount factor \( \beta \), \( 0 \leq \beta < 1 \), so that the value of unit income \( n \) days in the future is \( \beta^n \). Our problem is to choose a policy which maximizes our total expected income. This problem, which is an interesting special case of the general dynamic programming problem, has been solved by Howard in his excellent book [Howard 1960]. The case \( \beta = 1 \), also studied by Howard, is substantially more difficult. We shall obtain in this case results slightly beyond those of Howard, though still not complete. Our method, which treats \( \beta = 1 \) as a limiting case of \( \beta < 1 \), seems rather simpler than Howard’s.
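For concreteness, here is a minimal value-iteration sketch for the discounted case \( \beta < 1 \) of the model the abstract describes. The function and array names (value_iterate, income, q) and the NumPy layout are illustrative assumptions, not from the paper; Howard’s book in fact proceeds by policy iteration rather than this successive-approximation scheme.

import numpy as np

def value_iterate(income, q, beta, tol=1e-10):
    """Successive approximation of the optimal total discounted income.

    income[s, a] : immediate income i(s, a)
    q[s, a, t]   : transition probability q(t | s, a)
    beta         : discount factor, 0 <= beta < 1
    """
    n_states, n_actions = income.shape
    v = np.zeros(n_states)
    while True:
        # Bellman update: v'(s) = max_a [ i(s,a) + beta * sum_t q(t|s,a) v(t) ]
        gain = income + beta * (q @ v)        # shape (n_states, n_actions)
        v_new = gain.max(axis=1)
        if np.abs(v_new - v).max() < tol:
            # the argmax gives a stationary policy attaining the approximate optimum
            return v_new, gain.argmax(axis=1)
        v = v_new

# Toy two-state, two-action instance (numbers invented for illustration):
income = np.array([[1.0, 0.0],
                   [0.0, 2.0]])
q = np.array([[[0.9, 0.1], [0.5, 0.5]],
              [[0.2, 0.8], [0.1, 0.9]]])
v, policy = value_iterate(income, q, beta=0.95)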
@article {key0149965m,
AUTHOR = {Blackwell, David},
TITLE = {Discrete dynamic programming},
JOURNAL = {Ann. Math. Stat.},
FJOURNAL = {Annals of Mathematical Statistics},
VOLUME = {33},
NUMBER = {2},
YEAR = {1962},
PAGES = {719--726},
DOI = {10.1214/aoms/1177704593},
NOTE = {MR:0149965. Zbl:0133.12906.},
ISSN = {0003-4851},
}
[3] D. Blackwell: “Memoryless strategies in finite-stage dynamic programming,” Ann. Math. Stat. 35:3 (1964), pp. 863–865. MR 0162642. Zbl 0127.36406.
Given three sets \( X \), \( Y \), \( A \) and a bounded function \( u \) on \( Y \times A \), suppose that we are to observe a point \( (x,y)\in X\times Y \) and then select any point \( a \) we please from \( A \), after which we receive an income \( u(y,a) \). In trying to maximize our income, is there any point to letting our choice of \( a \) depend on \( x \) as well as on \( y \)? We shall give a formalization to this question in which sometimes there is a point. If \( (x,y) \) is selected according to a known distribution \( Q \), however, we show that dependence on \( x \) is pointless, and apply the result to obtain memoryless strategies in finite-stage dynamic programming problems.
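One way to see the heart of the known-\( Q \) case, setting aside the measurability issues the paper actually has to resolve: given \( \varepsilon > 0 \), pick a memoryless \( g \) with \( u(y, g(y)) \geq \sup_{a\in A} u(y,a) - \varepsilon \) for all \( y \) (possible since \( u \) is bounded). Then for any selector \( f(x,y) \),
\[
E_Q\, u\bigl(y, f(x,y)\bigr) \;\leq\; E_Q \sup_{a\in A} u(y,a) \;\leq\; E_Q\, u\bigl(y, g(y)\bigr) + \varepsilon,
\]
so a choice that ignores \( x \) comes within \( \varepsilon \) of any choice that uses it.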
@article {key0162642m,
AUTHOR = {Blackwell, David},
TITLE = {Memoryless strategies in finite-stage
dynamic programming},
JOURNAL = {Ann. Math. Stat.},
FJOURNAL = {Annals of Mathematical Statistics},
VOLUME = {35},
NUMBER = {3},
YEAR = {1964},
PAGES = {863--865},
DOI = {10.1214/aoms/1177703586},
NOTE = {MR:0162642. Zbl:0127.36406.},
ISSN = {0003-4851},
}
[4] D. Blackwell: “Probability bounds via dynamic programming,” pp. 277–280 in Stochastic processes in mathematical physics and engineering (New York, 30 April–2 May 1963). Edited by R. E. Bellman. Proceedings of Symposia in Applied Mathematics XVI. American Mathematical Society (Providence, RI), 1964. MR 0163347. Zbl 0139.13804.
@incollection {key0163347m,
AUTHOR = {Blackwell, David},
TITLE = {Probability bounds via dynamic programming},
BOOKTITLE = {Stochastic processes in mathematical
physics and engineering},
EDITOR = {Bellman, Richard Ernest},
SERIES = {Proceedings of Symposia in Applied Mathematics},
NUMBER = {XVI},
PUBLISHER = {American Mathematical Society},
ADDRESS = {Providence, RI},
YEAR = {1964},
PAGES = {277--280},
NOTE = {(New York, 30 April--2 May 1963). MR:0163347.
Zbl:0139.13804.},
ISBN = {9780821813164},
}
[5] D. Blackwell: “Discounted dynamic programming,” Ann. Math. Stat. 36:1 (1965), pp. 226–235. MR 0173536. Zbl 0133.42805.
@article {key0173536m,
AUTHOR = {Blackwell, David},
TITLE = {Discounted dynamic programming},
JOURNAL = {Ann. Math. Stat.},
FJOURNAL = {Annals of Mathematical Statistics},
VOLUME = {36},
NUMBER = {1},
YEAR = {1965},
PAGES = {226--235},
DOI = {10.1214/aoms/1177700285},
NOTE = {MR:0173536. Zbl:0133.42805.},
ISSN = {0003-4851},
}
[6] D. Blackwell: “Positive dynamic programming,” pp. 415–418 in Proceedings of the fifth Berkeley symposium on mathematical statistics and probability (Berkeley, CA, 21 June–18 July 1965), vol. I: Theory of statistics. Edited by L. M. Le Cam and J. Neyman. University of California Press (Berkeley and Los Angeles), 1967. MR 0218104. Zbl 0189.19804.
@incollection {key0218104m,
AUTHOR = {Blackwell, David},
TITLE = {Positive dynamic programming},
BOOKTITLE = {Proceedings of the fifth {B}erkeley
symposium on mathematical statistics
and probability},
EDITOR = {Le Cam, Lucien Marie and Neyman, Jerzy},
VOLUME = {I: Theory of statistics},
PUBLISHER = {University of California Press},
ADDRESS = {Berkeley and Los Angeles},
YEAR = {1967},
PAGES = {415--418},
NOTE = {(Berkeley, CA, 21 June--18 July 1965).
MR:0218104. Zbl:0189.19804.},
}
[7] D. Blackwell: “On stationary policies,” J. R. Stat. Soc., Ser. A 133:1 (1970), pp. 33–37. With discussion. A Russian translation was published in Mathematika 14:2 (1970). MR 0449711.
In dynamic programming, stationary policies are those whose choice at different times depends only on the current state occupied. The author proves that if there is an optimal policy, there is an optimal policy that is stationary. Several general comments are made, based on the facts established.
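In the usual formalization (our notation, not the paper’s), a policy \( \pi = (\pi_1, \pi_2, \dots) \) may let the \( n \)-th choice depend on the whole history, and stationarity is the requirement
\[
\pi_n(s_1, a_1, \dots, s_n) \;=\; f(s_n) \qquad\text{for all } n,
\]
for a single map \( f \) from states to actions; the theorem says that when any optimal policy exists, some policy of this form is already optimal.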
@article {key0449711m,
AUTHOR = {Blackwell, David},
TITLE = {On stationary policies},
JOURNAL = {J. R. Stat. Soc., Ser. A},
FJOURNAL = {Journal of the Royal Statistical Society.
Series A. General},
VOLUME = {133},
NUMBER = {1},
YEAR = {1970},
PAGES = {33--37},
URL = {http://www.jstor.org/pss/2343810},
NOTE = {With discussion. A Russian translation
was published in \textit{Mathematika}
\textbf{14}:2 (1970). MR:0449711.},
ISSN = {0035-9238},
}
[8] D. Blackwell, D. Freedman, and M. Orkin: “The optimal reward operator in dynamic programming,” Ann. Probab. 2:5 (1974), pp. 926–941. MR 0359818. Zbl 0318.49021.
Consider a dynamic programming problem with analytic state space \( S \), analytic constraint set \( A \), and semi-analytic reward function \( r(x,P,y) \) for \( (x,P)\in A \) and \( y\in S \): namely, \( \{r > a\} \) is an analytic set for all \( a \). Let \( Tf \) be the optimal reward in one move, with the modified reward function \( r(x,P,y) + f(y) \). The optimal reward in \( n \) moves is shown to be \( T^n 0 \), a semi-analytic function on \( S \). It is also shown that for any \( n \) and positive \( \varepsilon \), there is an \( \varepsilon \)-optimal strategy for the \( n \)-move game, measurable on the \( \sigma \)-field generated by the analytic sets.
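Written out, the operator the abstract describes in words is (the integral notation is ours):
\[
(Tf)(x) \;=\; \sup_{P \,:\, (x,P)\in A} \int \bigl[\, r(x,P,y) + f(y) \,\bigr] \, P(dy),
\]
and the optimal \( n \)-move reward is obtained by iterating \( n \) times starting from the zero function, i.e. \( T^n 0 \).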
@article {key0359818m,
AUTHOR = {Blackwell, D. and Freedman, D. and Orkin,
M.},
TITLE = {The optimal reward operator in dynamic
programming},
JOURNAL = {Ann. Probab.},
FJOURNAL = {The Annals of Probability},
VOLUME = {2},
NUMBER = {5},
YEAR = {1974},
PAGES = {926--941},
DOI = {10.1214/aop/1176996558},
NOTE = {MR:0359818. Zbl:0318.49021.},
ISSN = {0091-1798},
}
[9] D. Blackwell: “The stochastic processes of Borel gambling and dynamic programming,” Ann. Statist. 4:2 (1976), pp. 370–374. MR 0405557. Zbl 0331.93055.
Associated with any Borel gambling model \( G \) or dynamic programming model \( D \) is a corresponding class of stochastic processes \( M(G) \) or \( M(D) \). Say that \( G \) (resp. \( D \)) is regular if there is a \( D \) (resp. \( G \)) with \( M(D) = M(G) \). Necessary and sufficient conditions for regularity are given, and it is shown how to modify any model slightly to achieve regularity.
@article {key0405557m,
AUTHOR = {Blackwell, David},
TITLE = {The stochastic processes of {B}orel
gambling and dynamic programming},
JOURNAL = {Ann. Statist.},
FJOURNAL = {The Annals of Statistics},
VOLUME = {4},
NUMBER = {2},
YEAR = {1976},
PAGES = {370--374},
DOI = {10.1214/aos/1176343412},
NOTE = {MR:0405557. Zbl:0331.93055.},
ISSN = {0090-5364},
}