[1] D. Blackwell: “On the functional equation of dynamic programming,” J. Math. Anal. Appl. 2:2 (April 1961), pp. 273–276. MR 0126090. Zbl 0096.35503.
@article {key0126090m,
AUTHOR = {Blackwell, David},
TITLE = {On the functional equation of dynamic
programming},
JOURNAL = {J. Math. Anal. Appl.},
FJOURNAL = {Journal of Mathematical Analysis and
Applications},
VOLUME = {2},
NUMBER = {2},
MONTH = {April},
YEAR = {1961},
PAGES = {273--276},
DOI = {10.1016/0022-247X(61)90035-X},
NOTE = {MR:0126090. Zbl:0096.35503.},
ISSN = {0022-247X},
}
[2] D. Blackwell: “Discrete dynamic programming,” Ann. Math. Stat. 33:2 (1962), pp. 719–726. MR 0149965. Zbl 0133.12906.
We consider a system with a finite number \( S \) of states \( s \), labeled by the integers \( 1, 2, \dots, S \). Periodically, say once a day, we observe the current state of the system, and then choose an action \( a \) from a finite set \( A \) of possible actions. As a joint result of the current state \( s \) and the chosen action \( a \), two things happen: (a) we receive an immediate income \( i(s,a) \), and (b) the system moves to a new state \( s^{\prime} \), with the probability of a particular new state \( s^{\prime} \) given by a function \( q = q(s^{\prime}\mid s,a) \). Finally, there is specified a discount factor \( \beta \), \( 0 \leq \beta < 1 \), so that the value of unit income \( n \) days in the future is \( \beta^n \). Our problem is to choose a policy which maximizes our total expected income. This problem, which is an interesting special case of the general dynamic programming problem, has been solved by Howard in his excellent book [Howard 1960]. The case \( \beta = 1 \), also studied by Howard, is substantially more difficult. We shall obtain in this case results slightly beyond those of Howard, though still not complete. Our method, which treats \( \beta = 1 \) as a limiting case of \( \beta < 1 \), seems rather simpler than Howard’s.
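For concreteness, here is a minimal value-iteration sketch for the discounted case \( \beta < 1 \) of the model the abstract describes. The function and array names (value_iterate, income, q) and the NumPy layout are illustrative assumptions, not from the paper; Howard’s book in fact proceeds by policy iteration rather than this successive-approximation scheme.

import numpy as np

def value_iterate(income, q, beta, tol=1e-10):
    """Successive approximation of the optimal total discounted income.

    income[s, a] : immediate income i(s, a)
    q[s, a, t]   : transition probability q(t | s, a)
    beta         : discount factor, 0 <= beta < 1
    """
    n_states, n_actions = income.shape
    v = np.zeros(n_states)
    while True:
        # Bellman update: v'(s) = max_a [ i(s,a) + beta * sum_t q(t|s,a) v(t) ]
        gain = income + beta * (q @ v)        # shape (n_states, n_actions)
        v_new = gain.max(axis=1)
        if np.abs(v_new - v).max() < tol:
            # the argmax gives a stationary policy attaining the approximate optimum
            return v_new, gain.argmax(axis=1)
        v = v_new

# Toy two-state, two-action instance (numbers invented for illustration):
income = np.array([[1.0, 0.0],
                   [0.0, 2.0]])
q = np.array([[[0.9, 0.1], [0.5, 0.5]],
              [[0.2, 0.8], [0.1, 0.9]]])
v, policy = value_iterate(income, q, beta=0.95)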
@article {key0149965m,
AUTHOR = {Blackwell, David},
TITLE = {Discrete dynamic programming},
JOURNAL = {Ann. Math. Stat.},
FJOURNAL = {Annals of Mathematical Statistics},
VOLUME = {33},
NUMBER = {2},
YEAR = {1962},
PAGES = {719--726},
DOI = {10.1214/aoms/1177704593},
NOTE = {MR:0149965. Zbl:0133.12906.},
ISSN = {0003-4851},
}
[3] D. Blackwell: “Memoryless strategies in finite-stage dynamic programming,” Ann. Math. Stat. 35:3 (1964), pp. 863–865. MR 0162642. Zbl 0127.36406.
Given three sets \( X \), \( Y \), \( A \) and a bounded function \( u \) on \( Y \times A \), suppose that we are to observe a point \( (x,y)\in X\times Y \) and then select any point \( a \) we please from \( A \), after which we receive an income \( u(y,a) \). In trying to maximize our income, is there any point to letting our choice of \( a \) depend on \( x \) as well as on \( y \)? We shall give a formalization to this question in which sometimes there is a point. If \( (x,y) \) is selected according to a known distribution \( Q \), however, we show that dependence on \( x \) is pointless, and apply the result to obtain memoryless strategies in finite-stage dynamic programming problems.
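One way to see the heart of the known-\( Q \) case, setting aside the measurability issues the paper actually has to resolve: given \( \varepsilon > 0 \), pick a memoryless \( g \) with \( u(y, g(y)) \geq \sup_{a\in A} u(y,a) - \varepsilon \) for all \( y \) (possible since \( u \) is bounded). Then for any selector \( f(x,y) \),
\[
E_Q\, u\bigl(y, f(x,y)\bigr) \;\leq\; E_Q \sup_{a\in A} u(y,a) \;\leq\; E_Q\, u\bigl(y, g(y)\bigr) + \varepsilon,
\]
so a choice that ignores \( x \) comes within \( \varepsilon \) of any choice that uses it.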
@article {key0162642m,
AUTHOR = {Blackwell, David},
TITLE = {Memoryless strategies in finite-stage
dynamic programming},
JOURNAL = {Ann. Math. Stat.},
FJOURNAL = {Annals of Mathematical Statistics},
VOLUME = {35},
NUMBER = {3},
YEAR = {1964},
PAGES = {863--865},
DOI = {10.1214/aoms/1177703586},
NOTE = {MR:0162642. Zbl:0127.36406.},
ISSN = {0003-4851},
}
[4] D. Blackwell: “Probability bounds via dynamic programming,” pp. 277–280 in Stochastic processes in mathematical physics and engineering (New York, 30 April–2 May 1963). Edited by R. E. Bellman. Proceedings of Symposia in Applied Mathematics XVI. American Mathematical Society (Providence, RI), 1964. MR 0163347. Zbl 0139.13804.
@incollection {key0163347m,
AUTHOR = {Blackwell, David},
TITLE = {Probability bounds via dynamic programming},
BOOKTITLE = {Stochastic processes in mathematical
physics and engineering},
EDITOR = {Bellman, Richard Ernest},
SERIES = {Proceedings of Symposia in Applied Mathematics},
NUMBER = {XVI},
PUBLISHER = {American Mathematical Society},
ADDRESS = {Providence, RI},
YEAR = {1964},
PAGES = {277--280},
NOTE = {(New York, 30 April--2 May 1963). MR:0163347.
Zbl:0139.13804.},
ISBN = {9780821813164},
}
[5] D. Blackwell: “Discounted dynamic programming,” Ann. Math. Stat. 36:1 (1965), pp. 226–235. MR 0173536. Zbl 0133.42805.
@article {key0173536m,
AUTHOR = {Blackwell, David},
TITLE = {Discounted dynamic programming},
JOURNAL = {Ann. Math. Stat.},
FJOURNAL = {Annals of Mathematical Statistics},
VOLUME = {36},
NUMBER = {1},
YEAR = {1965},
PAGES = {226--235},
DOI = {10.1214/aoms/1177700285},
NOTE = {MR:0173536. Zbl:0133.42805.},
ISSN = {0003-4851},
}
[6] D. Blackwell: “Positive dynamic programming,” pp. 415–418 in Proceedings of the fifth Berkeley symposium on mathematical statistics and probability (Berkeley, CA, 21 June–18 July 1965), vol. I: Theory of statistics. Edited by L. M. Le Cam and J. Neyman. University of California Press (Berkeley and Los Angeles), 1967. MR 0218104. Zbl 0189.19804.
@incollection {key0218104m,
AUTHOR = {Blackwell, David},
TITLE = {Positive dynamic programming},
BOOKTITLE = {Proceedings of the fifth {B}erkeley
symposium on mathematical statistics
and probability},
EDITOR = {Le Cam, Lucien Marie and Neyman, Jerzy},
VOLUME = {I: Theory of statistics},
PUBLISHER = {University of California Press},
ADDRESS = {Berkeley and Los Angeles},
YEAR = {1967},
PAGES = {415--418},
NOTE = {(Berkeley, CA, 21 June--18 July 1965).
MR:0218104. Zbl:0189.19804.},
}
[7] D. Blackwell: “On stationary policies,” J. R. Stat. Soc., Ser. A 133:1 (1970), pp. 33–37. With discussion. A Russian translation was published in Mathematika 14:2 (1970). MR 0449711.
In dynamic programming, stationary policies are those whose choice at different times depends only on the current state occupied. The author proves that if there is an optimal policy, there is an optimal policy that is stationary. Several general comments are made, based on the facts established.
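In the usual formalization (our notation, not the paper’s), a policy \( \pi = (\pi_1, \pi_2, \dots) \) may let the \( n \)-th choice depend on the whole history, and stationarity is the requirement
\[
\pi_n(s_1, a_1, \dots, s_n) \;=\; f(s_n) \qquad\text{for all } n,
\]
for a single map \( f \) from states to actions; the theorem says that when any optimal policy exists, some policy of this form is already optimal.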
@article {key0449711m,
AUTHOR = {Blackwell, David},
TITLE = {On stationary policies},
JOURNAL = {J. R. Stat. Soc., Ser. A},
FJOURNAL = {Journal of the Royal Statistical Society.
Series A. General},
VOLUME = {133},
NUMBER = {1},
YEAR = {1970},
PAGES = {33--37},
URL = {http://www.jstor.org/pss/2343810},
NOTE = {With discussion. A Russian translation
was published in \textit{Mathematika}
\textbf{14}:2 (1970). MR:0449711.},
ISSN = {0035-9238},
}
[8] D. Blackwell, D. Freedman, and M. Orkin: “The optimal reward operator in dynamic programming,” Ann. Probab. 2:5 (1974), pp. 926–941. MR 0359818. Zbl 0318.49021.
Consider a dynamic programming problem with analytic state space \( S \), analytic constraint set \( A \), and semi-analytic reward function \( r(x,P,y) \) for \( (x,P)\in A \) and \( y\in S \): namely, \( \{r > a\} \) is an analytic set for all \( a \). Let \( Tf \) be the optimal reward in one move, with the modified reward function \( r(x,P,y) + f(y) \). The optimal reward in \( n \) moves is shown to be \( T^n 0 \), a semi-analytic function on \( S \). It is also shown that for any \( n \) and positive \( \varepsilon \), there is an \( \varepsilon \)-optimal strategy for the \( n \)-move game, measurable on the \( \sigma \)-field generated by the analytic sets.
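Written out, the operator the abstract describes in words is (the integral notation is ours):
\[
(Tf)(x) \;=\; \sup_{P \,:\, (x,P)\in A} \int \bigl[\, r(x,P,y) + f(y) \,\bigr] \, P(dy),
\]
and the optimal \( n \)-move reward is obtained by iterating \( n \) times starting from the zero function, i.e. \( T^n 0 \).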
@article {key0359818m,
AUTHOR = {Blackwell, D. and Freedman, D. and Orkin,
M.},
TITLE = {The optimal reward operator in dynamic
programming},
JOURNAL = {Ann. Probab.},
FJOURNAL = {The Annals of Probability},
VOLUME = {2},
NUMBER = {5},
YEAR = {1974},
PAGES = {926--941},
DOI = {10.1214/aop/1176996558},
NOTE = {MR:0359818. Zbl:0318.49021.},
ISSN = {0091-1798},
}
[9] D. Blackwell: “The stochastic processes of Borel gambling and dynamic programming,” Ann. Statist. 4:2 (1976), pp. 370–374. MR 0405557. Zbl 0331.93055.
Associated with any Borel gambling model \( G \) or dynamic programming model \( D \) is a corresponding class of stochastic processes \( M(G) \) or \( M(D) \). Say that \( G \) (resp. \( D \)) is regular if there is a \( D \) (resp. \( G \)) with \( M(D) = M(G) \). Necessary and sufficient conditions for regularity are given, and it is shown how to modify any model slightly to achieve regularity.
@article {key0405557m,
AUTHOR = {Blackwell, David},
TITLE = {The stochastic processes of {B}orel
gambling and dynamic programming},
JOURNAL = {Ann. Statist.},
FJOURNAL = {The Annals of Statistics},
VOLUME = {4},
NUMBER = {2},
YEAR = {1976},
PAGES = {370--374},
DOI = {10.1214/aos/1176343412},
NOTE = {MR:0405557. Zbl:0331.93055.},
ISSN = {0090-5364},
}