
Solution of a Satisficing Model for Random Payoff Games
Author(s): R. G. Cassidy, C. A. Field, M. J. L. Kirby
Source: Management Science, Vol. 19, No. 3, Theory Series (Nov., 1972), pp. 266-271
Published by: INFORMS
Stable URL: http://www.jstor.org/stable/2629509

MANAGEMENT SCIENCE
Vol. 19, No. 3, November 1972
Printed in U.S.A.

SOLUTION OF A SATISFICING MODEL FOR RANDOM PAYOFF GAMES*†


R. G. CASSIDY,‡ C. A. FIELD§ AND M. J. L. KIRBY§

In this paper, we consider a "satisficing" criterion for solving two-person zero-sum games with random payoffs. In particular, a player wants to maximize the payoff level he can achieve with a specified confidence. The problem reduces to solving a nonconvex mathematical programming problem. The main result shows that solving this problem is equivalent to finding the root of an equation whose values are determined by solving a linear problem. This linear problem results from maximizing the confidence for a fixed payoff level.

I. Introduction

In this paper we consider a two-person, zero-sum game with m × n payoff matrix A = {a_ij}, where A is a random matrix with known distribution function F. The random variable a_ij represents the payoff from player II to player I when player I plays row i and player II plays column j. The actual payoff will be a_ij(w), where w is selected from the domain of a_ij according to the known marginal probability distribution of the random variable a_ij.

Given such a payoff matrix A, the question arises as to what is meant by playing the game in an optimal way. Because the actual payoff on any play of the game depends not only on the row i selected by player I and the column j selected by player II, but also on the sample point a_ij(w), the players cannot guarantee themselves a certain payoff level. They are in effect forced to gamble. The question of how one gambles in an optimal way is open to considerable discussion and interpretation; hence the definitions of optimality which we use below are, at least partly, subjective.

One of the most obvious methods of handling this type of game is to replace a_ij by its expected value and then solve the resulting deterministic game. This technique can be justified in the context of utility theory as developed by von Neumann and Morgenstern: the payoff is expressed in terms of the utility attached to the possible outcomes, and it can be shown [8] that a player's optimal strategy is the one which maximizes the expected value of this utility. Our approach, in this paper, is to consider alternative optimality criteria for situations in which the payoffs are not necessarily given in terms of a utility function.

A model was developed in [2] based on a satisficing criterion of optimality in which a player maximizes the probability of winning a specified amount no matter what strategy his opponent uses. Mathematically, this can be expressed as solving
(1.1)   max_X min_Y P(Z(X, Y) ≥ β),

where Z(X, Y) is the observed payoff to player I when he uses mixed strategy X = (x_1, ..., x_m) and player II uses mixed strategy Y = (y_1, ..., y_n). By noting that Z(X, Y) is the random variable a_ij with probability x_i y_j, it follows that (1.1) can be
* Received January 1971; revised October 1971.

† The research underlying this report was partly supported by ONR Contract N00014-67-A-0126-0009, NRC grant A 4024 and DRB grant 9701-18.
‡ Carnegie-Mellon University.
§ Dalhousie University.


rewritten as the linear programming problem:

(1.2)   max_X α
        s.t.  Σ_{i=1}^m x_i P(a_ij ≥ β) ≥ α   ∀j,
              Σ_{i=1}^m x_i = 1,  x_i ≥ 0   ∀i,

where X = (x_1, x_2, ..., x_m) is a mixed strategy for player I and P(a_ij ≥ β) is the probability that the random variable a_ij is greater than or equal to a prescribed payoff level β. It is worth noting that (1.2) can be derived directly by means of utility theory if we assume a utility function of the form

        u(K) = 0   if K < β,
             = 1   if K ≥ β.

In §II of this paper, we develop an alternative to the satisficing criterion used in (1.2) by considering the case in which a player wants to maximize his payoff level subject to the constraint that he receive that payoff with at least a specified level of confidence. Mathematically the problem can be expressed as

(1.3)   max β
        s.t.  min_Y P(Z(X, Y) ≥ β) ≥ α,

where α is specified by the player. (1.3) is a nonlinear programming problem, and an algorithm for its solution is given in the following section. The result, given in Theorem 2.1, that the model (1.3) is the "inverse" of the model (1.2) forms the basis of the algorithm. In §III, a numerical example is studied.

Before proceeding, we mention briefly some related work. A satisficing objective has been used by Charnes, Kirby and Raike in [3], where they study the problem

        max β
        s.t.  P(Σ_{i=1}^m x_i a_ij ≥ β) ≥ α   ∀j.
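For fixed β, (1.2) is an ordinary linear program in (x, α) and can be handed to any LP code. The sketch below is ours, not the authors': it assumes Python with SciPy, and it borrows the normal payoff distributions N(0, 1), N(1, 1), N(2, 1), N(0, 1) from the example of §III; the names `lp_value` and `means` are our own.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import norm

def lp_value(beta, means):
    """Solve (1.2) for a fixed payoff level beta: maximize alpha subject to
    sum_i x_i * P(a_ij >= beta) >= alpha for every column j, with x a mixed
    strategy. Payoffs are assumed normal: a_ij ~ N(means[i][j], 1)."""
    M = norm.sf(beta - np.asarray(means))      # M[i, j] = P(a_ij >= beta)
    m, n = M.shape
    # Decision variables are (x_1, ..., x_m, alpha); linprog minimizes, so
    # we minimize -alpha.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # Constraint rows: alpha - sum_i x_i M[i, j] <= 0 for each column j.
    A_ub = np.hstack([-M.T, np.ones((n, 1))])
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])   # sum_i x_i = 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return -res.fun, res.x[:m]

means = [[0.0, 1.0], [2.0, 0.0]]   # a11~N(0,1), a12~N(1,1), a21~N(2,1), a22~N(0,1)
alpha0, x = lp_value(0.0, means)
print(alpha0, x)
```

At β = 0 this reproduces (up to LP rounding) the value v(β₁) ≈ 0.698 reported in §III.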

The results we obtain for solving (1.3) do not require independence of the random variables a_ij, as was required in [3]. Harsanyi [5] deals with problems in which the players have incomplete information about the game. Thomas and David [10] consider the distribution of the value of a random payoff game in terms of the distribution of the payoff matrix A. In [6], Hou considers a random payoff game from the point of view of the distribution of the observations, as we have done; however, he studies the case where the payoffs are vector valued and the objective is to maximize the long-run expected payoff. Finally, we mention stochastic games as introduced by Shapley. In these games, the randomness appears in the transition from one stage to the next and the solutions maximize the expected payoff. For recent work and references, the reader is referred to Pollatschek and Avi-Itzhak [9].

II. A Model for Maximizing a Player's Payoff Level

Suppose that player I wants to maximize his payoff level β, subject to the constraint that he achieve the payoff level β with at least a prescribed probability α, no matter what strategy his opponent uses. We suppose in this section that α > 0, since otherwise the problem is unbounded. This problem can be expressed mathematically as

(2.1)   max β
        s.t.  min_Y P(Z(X, Y) ≥ β) ≥ α,

where Z(X, Y) is the observed payoff to player I when he uses strategy X and player II uses strategy Y. Since Z(X, Y) is the random variable a_ij with probability x_i y_j, (2.1) is equivalent to

(2.2)   max β
        s.t.  Σ_{i=1}^m x_i P(a_ij ≥ β) ≥ α   ∀j,
              Σ_{i=1}^m x_i = 1,  x_i ≥ 0   ∀i,

where α is a given constant, 0 < α < 1.

In order to examine the relationship that model (2.2) has to (1.2), we first define the following:¹ Let v(β) be the optimal value of α in problem (1.2) with fixed payoff level β. Similarly, let v₁(α) be the optimal value of β in problem (2.2) with fixed confidence level α, for 0 < α ≤ 1.
THEOREM 2.1. Let v(β) and v₁(α) be defined as above. Then

(1)   v₁(v(β)) ≥ β,
(2)   v(v₁(α)) = α.

PROOF. Let

        X(α, β) = {x | Σ_{i=1}^m x_i f_ij(β) ≥ α ∀j, x_i ≥ 0 ∀i, and Σ_{i=1}^m x_i = 1},

where f_ij(β) = P(a_ij ≥ β). Then

        v(β) = sup {α | X(α, β) ≠ ∅}   and   v₁(α) = sup {β | X(α, β) ≠ ∅},

where ∅ denotes the empty set. Since f_ij(·) is nonincreasing, X(α, ·) is nonincreasing for each α; i.e., β₁ > β₂ implies X(α, β₁) ⊆ X(α, β₂). Also, X(·, β) is nonincreasing for each β: α₁ > α₂ implies X(α₁, β) ⊆ X(α₂, β).

For any specific value of β, our definitions give X(v(β), β′) ≠ ∅ if β′ < β. Hence v₁(v(β)) ≥ β, which is (1).

For (2), the definition of v₁(·) and the nonincreasing character of X(·, v₁(α)) imply X(α′, v₁(α)) ≠ ∅ if α′ < α. Hence

(2a)   v(v₁(α)) ≥ α.

On the other hand, for any particular value of α, there is j = ĵ, say, with

        Σ_{i=1}^m x_i f_iĵ(v₁(α)) = α,   x ∈ X(α, v₁(α)).

Therefore, α′ > α implies Σ_{i=1}^m x_i f_iĵ(v₁(α)) < α′, so X(α′, v₁(α)) = ∅ if α′ > α, and

(2b)   v(v₁(α)) ≤ α.

Then (2a) and (2b) yield (2).

Suppose that f_ij is strictly monotone for every i and j. Then a proof similar to that of (2b) shows
COROLLARY 2.1. If the distribution functions of the a_ij's are strictly monotone at β₀, then v₁(v(β₀)) = β₀.

This result implies that when the conditions of Corollary 2.1 are satisfied, if we solve

¹ We would like to thank the referee for pointing out a simpler proof of Theorem 2.1.


(1.2) with a fixed payoff level β₀ and obtain a corresponding optimal solution α₀, and then solve (2.2) with this confidence level α₀, we obtain β₀ as the optimal value of β. This fact will be used in the solution algorithm given below.

Because of the form of the constraints of (2.2), the problem is not only nonlinear in the variables x_i (i = 1, 2, ..., m) and β, but the region of feasible solutions may not even form a convex set. The following example exhibits this pathology.

Example 2.1. In (2.2) let α = 0.6 and m = n = 2. We assume that a₁₁ is N(0, 1), i.e., normally distributed with mean 0 and variance 1, a₁₂ is N(1, 1), a₂₁ is N(2, 1) and a₂₂ is N(0, 1). The feasible region consists of the points (x₁, x₂, β) which satisfy

        x₁ P(a_1j ≥ β) + x₂ P(a_2j ≥ β) ≥ 0.6   for j = 1, 2,
        x₁ + x₂ = 1.

It is straightforward to verify that the points a = (0, 1, -0.255) and b = (0.4, 0.6, 0.315) are feasible. Consider the point c = ½a + ½b = (0.2, 0.8, 0.03), which is a convex combination of the feasible points a and b. But 0.2 P(a₁₂ ≥ 0.03) + 0.8 P(a₂₂ ≥ 0.03) = 0.5855 < 0.6, and hence c is not feasible, so that we have a nonconvex feasible region.

We note that if (x, β₀) is feasible for (2.2), then (x, β) is also feasible for all β ≤ β₀. From this we conclude that every local optimum of (2.2) is also a global optimum.

In order to develop algorithms for solving (2.2), we first show that the optimal β must lie in an interval determined by the (1 - α)th fractile points of the a_ij. The interval in which an optimal β must lie is given by the following lemma:
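The failure of convexity can be checked numerically. The sketch below is ours, assuming SciPy. One caveat: the coordinates of the point b as printed in the scan appear garbled, so we instead take b = (0.6, 0.4, 0.315), which is consistent with the optimal strategy X* = (0.6, 0.4) reported in §III; the midpoint of the segment joining a and b then still violates the j = 2 constraint.

```python
import numpy as np
from scipy.stats import norm

MEANS = np.array([[0.0, 1.0], [2.0, 0.0]])   # a_ij ~ N(MEANS[i, j], 1)

def feasible(x1, x2, beta, alpha=0.6):
    """Check the constraints of (2.2): sum_i x_i P(a_ij >= beta) >= alpha, all j."""
    M = norm.sf(beta - MEANS)                 # M[i, j] = P(a_ij >= beta)
    cols = x1 * M[0, :] + x2 * M[1, :]        # one value per column j
    return bool(np.all(cols >= alpha - 1e-9))

a = (0.0, 1.0, -0.255)
b = (0.6, 0.4, 0.315)                         # assumed coordinates; see lead-in
c = tuple((ai + bi) / 2 for ai, bi in zip(a, b))
print(feasible(*a), feasible(*b), feasible(*c))   # → True True False
```

Both endpoints satisfy every column constraint, yet their midpoint fails on column j = 2, which is exactly the nonconvexity the example exhibits.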
LEMMA 2.1. The optimal solution β* to (2.2) with given confidence level α lies in the interval

        [max_i min_j C(i, j), min_j max_i C(i, j)],

where C(i, j) is the (1 - α)th fractile point of a_ij. That is,

        C(i, j) = sup {y : P(a_ij ≥ y) ≥ α}.

In our notation we have suppressed the dependence of C(i, j) on α.

PROOF. This result follows by noting that max_i min_j C(i, j) represents the payoff level player I can guarantee if he is restricted to pure strategies, and min_j max_i C(i, j) is the level which player II can prevent player I from exceeding by using pure strategies.

The following theorem establishes the key result for the solution of (2.2).
THEOREM 2.2. If the random variables a_ij have continuous, strictly monotone distribution functions in the interval [max_i min_j C(i, j), min_j max_i C(i, j)], then β* is a solution of (2.2) with confidence level α iff β* is a root of the equation v(β) - α = 0. In addition, if x* is an optimal strategy for (1.2) with β = β*, then it is an optimal strategy for (2.2) with confidence level α.

PROOF. We note that, by the definition of v₁, β* is optimal for (2.2) with confidence level α if and only if β* = v₁(α).

"⇒" We have that β* = v₁(α). Using (2) of Theorem 2.1, it follows that v(β*) = v(v₁(α)) = α.

"⇐" α = v(β*) implies that v₁(v(β*)) = v₁(α). But by Corollary 2.1, v₁(v(β*)) = β*. Hence v₁(α) = β*, as required.

The last part of the theorem follows by noting that x appears in both (1.2) and (2.2) only in the constraint set, and the constraints are identical for both problems.
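Theorem 2.2 reduces (2.2) to one-dimensional root finding: evaluate v(β) by a linear programme and drive v(β) - α to zero. The sketch below is our own, assuming SciPy; it uses bisection rather than the paper's regula falsi (either works, since v is continuous and decreasing here), brackets the root with the fractile interval of Lemma 2.1, and again uses the normal payoffs of Example 2.1.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import norm

MEANS = np.array([[0.0, 1.0], [2.0, 0.0]])   # a_ij ~ N(MEANS[i, j], 1)

def v(beta):
    """Optimal alpha of the linear program (1.2) at payoff level beta."""
    M = norm.sf(beta - MEANS)                 # M[i, j] = P(a_ij >= beta)
    m, n = M.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                              # variables (x, alpha); minimize -alpha
    A_ub = np.hstack([-M.T, np.ones((n, 1))])  # alpha <= sum_i x_i M[i, j], all j
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return -res.fun

def solve_22(alpha, tol=1e-4):
    """Find beta* with v(beta*) = alpha by bisection on the Lemma 2.1 bracket."""
    C = norm.ppf(1.0 - alpha) + MEANS         # C[i, j]: P(a_ij >= C[i, j]) = alpha
    lo, hi = C.min(axis=1).max(), C.max(axis=0).min()
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if v(mid) > alpha:                    # v is decreasing: root lies to the right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

beta_star = solve_22(0.6)
print(beta_star, v(beta_star))
```

With α = 0.6 and the tight tolerance used here, the returned β* satisfies v(β*) ≈ 0.6; the paper's looser stopping rule (|v(β) - α| < 0.01) accepts β = 0.315 in §III.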


The implications of this theorem are that, in order to solve (2.2), it suffices to find a root of the function v(β) - α, where the values of the function are determined by solving a linear programme. We note that, under the conditions of Theorem 2.2, v(β) - α is a strictly monotone decreasing, continuous function of β on [max_i min_j C(i, j), min_j max_i C(i, j)]. This follows by observing that v(β) is the value of the game with payoff matrix {P(a_ij ≥ β)}, which is monotone decreasing and continuous as a function of β. (See [1, p. 56].) Therefore, in order to specify an algorithm, it suffices to give a technique for finding the root of a continuous, monotone decreasing function. Since there are a number of techniques available in the numerical analysis literature, we do not specify any details here. For the calculations in the example of §III we use the method of "regula falsi" (see Collatz [4, §18.1-2]) to determine the root of v(β) - α numerically.

We now consider briefly the case in which the distribution functions of the a_ij are simply continuous. The following result is now true:
THEOREM 2.3. If the random variables a_ij have continuous distribution functions over the interval [max_i min_j C(i, j), min_j max_i C(i, j)], then β* is a solution of (2.2) with confidence level α iff β* = max {β | v(β) = α}.

PROOF. "⇒" β* optimal implies that v(β*) = α, as in Theorem 2.2. It is then clear that β* must be the maximal root of v(β) - α, since (2.2) is a maximization problem in β.

"⇐" The definition of β* implies that v(β* + ε) < v(β*) for all ε > 0. From the proof of Corollary 2.1, we see that this is the result we need to guarantee that v₁(v(β*)) = β*.

The result follows as in Theorem 2.2.

An algorithm for this case would require not only that we find a root of v(β) - α, but that we find the maximal root. Since the values of v(β) are determined by solving a linear programme, the techniques of parametric programming apply to determine the maximum β such that v(β) = α. A first approach would be to determine whether any of the distribution functions occurring in the basis are flat. If so, then it may be possible to increase the value of the root parametrically until the value of the linear programme is less than α. Using known results from linear programming, the variation can be characterized in terms of discrete jumps corresponding to basis changes.

III. Examples

We consider a situation in which two competing firms are marketing the same product. We assume that the demand for the product is fixed and that a consumer in this market will buy the product from one of the two firms. The goal of each firm is to choose from among its possible courses of action the one which will attract as many customers as possible. Each course of action is identified as a type of marketing expenditure, for example, visual advertising, free samples, etc. Thus the mixed strategy X can be used by firm I to determine how to divide its budget among its various marketing alternatives.

Since the consumers gained by one firm can be thought of as lost to its competitor, we are in the situation of a constant-sum game. However, the game is not deterministic, since the response of the consumers to a given course of action by each of the firms is not predictable with certainty. Thus the number of consumers gained by firm I when it uses its ith alternative and its competitor, firm II, uses its jth alternative is a random variable a_ij.

For simplicity we assume that each firm has only two marketing alternatives, so that for firm I, i = 1, 2 and for firm II, j = 1, 2. From past statistical evidence the two firms determine that the a_ij are distributed as follows: a₁₁ is N(0, 1), a₁₂ is N(1, 1), a₂₁ is N(2, 1) and a₂₂ is N(0, 1), where N(μ, σ²) refers to the normal distribution with mean μ and variance σ². We suppose the a_ij are expressed in units of hundred thousands.

We assume that firm I wants to attract as many customers as possible with a confidence level of 60% (α = 0.6). In order to solve the problem numerically, we use the method of "regula falsi" to determine the root of v(β) - 0.6 (see [4, §18.1-2]). In addition, we specify that β is optimal when it determines a value v(β) which is within ε = 0.01 of the prespecified α₀ = 0.6.

To proceed, we note that by Lemma 2.1 the optimal β lies between max_i min_j C(i, j) = -0.25 and min_j max_i C(i, j) = 0.75, where C(i, j) is the point exceeded by a_ij with probability 0.6. Starting with β₁ = 0 and solving (1.2), the result is v(β₁) = 0.698. Since 0.698 > 0.6, this indicates that the optimal β is greater than β₁. An iteration of the method of "regula falsi" yields a value β₂ = 0.315. Solving (1.2) with β₂ = 0.315, we obtain X = (0.6, 0.4) and v(β₂) = 0.6045, which is within the specified tolerance region. Hence the optimal strategy for firm I is X* = (0.6, 0.4), yielding a payoff level of $315,000 with 60% confidence.
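The bracket quoted above, max_i min_j C(i, j) = -0.25 and min_j max_i C(i, j) = 0.75, can be checked directly from the normal quantiles; a small sketch of ours, assuming SciPy:

```python
import numpy as np
from scipy.stats import norm

means = np.array([[0.0, 1.0], [2.0, 0.0]])   # a_ij ~ N(means[i, j], 1)
alpha = 0.6
C = norm.ppf(1.0 - alpha) + means            # C[i, j]: P(a_ij >= C[i, j]) = alpha
lo = C.min(axis=1).max()                     # max_i min_j C(i, j)
hi = C.max(axis=0).min()                     # min_j max_i C(i, j)
print(round(lo, 2), round(hi, 2))            # → -0.25 0.75
```

The exact values are ±0.2533 shifted by the means; the paper rounds them to -0.25 and 0.75.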
References

1. BOHNENBLUST, H. F. AND KARLIN, S., "Solutions of Discrete Two Person Games," Ann. of Math. Studies 24, Princeton Univ. Press, Princeton, N.J., 1950, pp. 51-72.
2. CASSIDY, R. G., "On Random Payoff Games," Ph.D. dissertation, Dalhousie University, Halifax, Nova Scotia, Canada.
3. CHARNES, A., KIRBY, M. J. L. AND RAIKE, W. M., "Zero Zero Chance Constrained Games," Theory of Prob. Appl., Vol. 13 (1968), pp. 663-681.
4. COLLATZ, L., Functional Analysis and Numerical Mathematics, Academic Press, New York, 1966.
5. HARSANYI, J., "Games with Incomplete Information Played by 'Bayesian' Players, I-III, Part I. The Basic Model," Management Science, Vol. 14 (November 1967), pp. 159-182.
6. HOU, T. F., "Weak Approachability in a Two Person Game," Ann. of Math. Stat., Vol. 40 (1969), pp. 789-813.
7. KUHN, H. W. AND TUCKER, A. W., "Contributions to the Theory of Games," Ann. of Math. Studies 24, Princeton University Press, Princeton, New Jersey, 1950.
8. OWEN, G., Game Theory, W. B. Saunders Co., Philadelphia, 1969.
9. POLLATSCHEK, M. A. AND AVI-ITZHAK, B., "Algorithms for Stochastic Games with Geometrical Interpretation," Management Science, Vol. 15, No. 7 (March 1969), pp. 399-415.
10. THOMAS, D. R. AND DAVID, H. T., "Game Value Distribution," Ann. of Math. Stat., Vol. 38 (1967), pp. 242-250.
