If You Roll a Die and Get What's on the Face, You Can Choose to Roll Again

Question

This problem is solved using the theory of optimal stopping for Markov chains. I will explain some of the theory, then turn to your specific question. You can learn more about this fascinating topic in Chapter 4 of Introduction to Stochastic Processes by Gregory F. Lawler.

Think of a Markov chain with state space $\cal S$ as a game.

A payoff function $f:{\cal S}\to[0,\infty)$ assigns a monetary "payoff" to each state of the Markov chain. This is the amount you would collect if you stop playing with the chain in that state.

In contrast, the value function $v:{\cal S}\to[0,\infty)$ is defined as the greatest expected payoff possible from each starting point: $$v(x)=\sup_T \mathbb{E}_x(f(X_T)).$$ There is a single optimal strategy $T_{\rm opt}$ so that $v(x)=\mathbb{E}_x(f(X_{T_{\rm opt}})).$ It can be described as $T_{\rm opt}:=\inf(n\geq 0: X_n\in{\cal E})$, where ${\cal E}=\{x\in {\cal S}\mid f(x)=v(x)\}$. That is, you should stop playing as soon as you hit the set $\cal E$.

Example:

You roll an ordinary die with outcomes $1,2,3,4,5,6$. You can keep the value or roll again. If you roll, you can keep the new value or roll a third time. After the third roll you must stop. You win the amount showing on the die. What is the value of this game?

Solution: The state space for the Markov chain is $${\cal S}=\{\mbox{Start}\}\cup\left\{(n,d): n=2,1,0; d=1,2,3,4,5,6\right\}.$$ The variable $n$ tells you how many rolls you have left, and this decreases by one every time you roll. Note that the states with $n=0$ are absorbing.

You can think of the state space as a tree; the chain moves forward along the tree until it reaches the end.

[Figure: the state space drawn as a tree, with the values of $v$ in green and the values of $f$ in red.]

The function $v$ is given above in green, while $f$ is in red.

The payoff function $f$ is zero at the start, and otherwise equals the number of spots on $d$.

To find the value function $v$, let's start at the right hand side of the tree. At $n=0$, we have $v(0,d)=d$, and we calculate $v$ elsewhere by working backwards, averaging over the next roll and taking the maximum of that and the current payoff. Mathematically, we use the property $v(x)=\max(f(x),(Pv)(x))$ where $Pv(x)=\sum_{y\in {\cal S}} p(x,y)v(y).$
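The backward calculation can be sketched in a few lines of Python. This is only an illustration under the setup above: the names `solve`, `FACES`, and `stop_set` are made up for this sketch, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

FACES = range(1, 7)  # outcomes of a fair six-sided die


def solve(max_rolls_left=2):
    """Backward induction for v(x) = max(f(x), (Pv)(x)).

    States are (n, d): n rolls left, face d showing.
    """
    # Absorbing states: with no rolls left, the value is the face shown.
    v = {(0, d): Fraction(d) for d in FACES}
    for n in range(1, max_rolls_left + 1):
        # (Pv)(n, d): average of v over the next roll (same for every d).
        continuation = sum(v[(n - 1, y)] for y in FACES) / 6
        for d in FACES:
            v[(n, d)] = max(Fraction(d), continuation)
    # At Start the payoff f is 0, so v(Start) = max(0, (Pv)(Start)).
    start = max(Fraction(0), sum(v[(max_rolls_left, y)] for y in FACES) / 6)
    return v, start


v, start_value = solve()
# The stopping set E: states where the payoff f equals the value v.
stop_set = sorted(state for state in v if v[state] == state[1])
print(start_value)  # 14/3
```

Here `stop_set` contains every state with $n=0$, the states $(1,d)$ with $d\geq 4$, and the states $(2,d)$ with $d\geq 5$.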

The value of the game at the start is $14/3$, about \$4.67. The optimal strategy is to keep playing until you reach a state where the red number and green number are the same.
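For readers who want the arithmetic, here is the backward calculation written out, using $v(0,d)=d$ and the relation $v(x)=\max(f(x),(Pv)(x))$:

$$\begin{aligned}
(Pv)(1,d) &= \tfrac{1}{6}(1+2+3+4+5+6) = \tfrac{7}{2},\\
(Pv)(2,d) &= \tfrac{1}{6}\left(3\cdot\tfrac{7}{2}+4+5+6\right) = \tfrac{17}{4},\\
v(\mbox{Start}) &= \tfrac{1}{6}\left(4\cdot\tfrac{17}{4}+5+6\right) = \tfrac{14}{3}\approx 4.67.
\end{aligned}$$

In words: with two rolls left you keep a 5 or 6, and with one roll left you keep a 4, 5, or 6.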
