

4.4 Regret Minimization and Game Theory

In this section we outline the connection between regret minimization and central concepts in game theory. We start by showing that in a two-player constant sum game, a player with external regret sublinear in T will have an average payoff that is at least the value of the game, minus a vanishing error term.

For a general game, we will see that if all the players use procedures with sublinear swap-regret, then they will converge to an approximate correlated equilibrium. We also show that for a player who minimizes swap-regret, the frequency of playing dominated actions is vanishing.

4.4.1 Game Theoretic Model

We start with the standard definitions of a game (see also 1). A game $G = \langle M, (X_i), (s_i) \rangle$ has a finite set $M$ of $m$ players. Player $i$ has a set $X_i$ of $N$ actions and a loss function $s_i : X_i \times (\times_{j \neq i} X_j) \to [0, 1]$ that maps the action of player $i$ and the actions of the other players to a real number. (We have scaled losses to $[0, 1]$.) The joint action space is $X = \times_i X_i$.

We consider a player $i$ that plays a game $G$ for $T$ time steps using an online procedure $ON$. At time step $t$, player $i$ plays a distribution (mixed action) $P_i^t$, while the other players play the joint distribution $P_{-i}^t$. We denote by $\ell_{ON}^t$ the loss of player $i$ at time $t$, i.e., $E_{x^t \sim P^t}[s_i(x^t)]$, and its cumulative loss is $L_{ON}^T = \sum_{t=1}^{T} \ell_{ON}^t$.² It is natural to define, for player $i$ at time $t$, the loss vector as $\ell^t = (\ell_1^t, \ldots, \ell_N^t)$, where $\ell_j^t = E_{x_{-i}^t \sim P_{-i}^t}[s_i(x_j, x_{-i}^t)]$. Namely, $\ell_j^t$ is the loss player $i$ would have observed if at time $t$ it had played action $x_j$. The cumulative loss of action $x_j \in X_i$ of player $i$ is $L_j^T = \sum_{t=1}^{T} \ell_j^t$, and $L_{\min}^T = \min_j L_j^T$.
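The loss vector and the cumulative losses defined above can be computed directly for a small two-player game. The following is a minimal sketch, not part of the original text: the loss matrix `S1`, the three-step sequence `plays`, and the helper names `loss_vector` and `expected_loss` are all hypothetical and chosen only for illustration.

```python
def loss_vector(loss_matrix, opponent_dist):
    """Per-action expected losses: entry j is E_{x_-i ~ P_-i}[s_i(x_j, x_-i)]."""
    return [sum(q * s for q, s in zip(opponent_dist, row)) for row in loss_matrix]


def expected_loss(player_dist, loss_vec):
    """Loss of the online procedure at one step: the mixed action dotted with the loss vector."""
    return sum(p * l for p, l in zip(player_dist, loss_vec))


# Hypothetical loss matrix for player i: rows are its N = 2 actions,
# columns are the opponent's 3 actions; all losses lie in [0, 1].
S1 = [[0.0, 1.0, 0.5],
      [1.0, 0.0, 0.5]]

# Hypothetical play sequence: (P_i^t, P_-i^t) for t = 1, 2, 3.
plays = [
    ([0.5, 0.5], [1.0, 0.0, 0.0]),
    ([0.25, 0.75], [0.0, 1.0, 0.0]),
    ([0.5, 0.5], [0.0, 0.0, 1.0]),
]

L_on = 0.0               # cumulative loss of the online procedure
L_actions = [0.0, 0.0]   # cumulative loss of each fixed action x_j

for p, q in plays:
    ell = loss_vector(S1, q)
    L_on += expected_loss(p, ell)
    L_actions = [a + b for a, b in zip(L_actions, ell)]

L_min = min(L_actions)   # cumulative loss of the best fixed action in hindsight
print(L_on, L_actions, L_min)
```

The external regret of this particular sequence is the difference `L_on - L_min`.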

4.4.2 Constant Sum Games and External Regret Minimization

A two-player constant sum game $G = \langle \{1, 2\}, (X_i), (s_i) \rangle$ has the property that for some constant $c$, for every $x_1 \in X_1$ and $x_2 \in X_2$ we have $s_1(x_1, x_2) + s_2(x_1, x_2) = c$. It is well known that any constant sum game has a well-defined value $(v_1, v_2)$ for the game, and player $i \in \{1, 2\}$ has a mixed strategy which guarantees that its expected loss is at most $v_i$, regardless of the other player's strategy. (See Owen, 1982, for more details.) In such games, external regret-minimization procedures provide the following guarantee.

Theorem 4.9 Let $G$ be a constant sum game with game value $(v_1, v_2)$. If player $i \in \{1, 2\}$ plays for $T$ steps using a procedure $ON$ with external regret $R$, then its average loss $\frac{1}{T} L_{ON}^T$ is at most $v_i + R/T$.

Proof. Let $q$ be the mixed strategy corresponding to the observed frequencies of the actions player 2 has played; that is, $q_j = \sum_{t=1}^{T} P_{2,j}^t / T$, where $P_{2,j}^t$ is the weight player 2 gives to action $j$ at time $t$. By the theory of constant sum games, for any mixed strategy $q$ of player 2, player 1 has some action $x_k \in X_1$ such that $E_{x_2 \sim q}[s_1(x_k, x_2)] \le v_1$ (see Owen, 1982). This implies, in our setting, that if player 1 has always played action $x_k$, then its loss would be at most $v_1 T$. Therefore $L_{\min}^T \le L_k^T \le v_1 T$. Now, using the fact that player 1 is playing a procedure $ON$ with external regret $R$, we have that $L_{ON}^T \le L_{\min}^T + R \le v_1 T + R$.

Thus, using a procedure with regret $R = O(\sqrt{T \log N})$ as in Theorem 4.6 will guarantee average loss at most $v_i + O(\sqrt{(\log N)/T})$.
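The guarantee of Theorem 4.9 can be checked numerically on a small example. The sketch below rests on several assumptions that are not in the text: the game is matching pennies with losses scaled to [0, 1] (so $c = 1$ and $v_1 = 1/2$), player 1 runs the exponential weights (Hedge) update as a stand-in for the procedure of Theorem 4.6, and player 2 best-responds to player 1's current mixed action at every step. The learning rate $\eta = \sqrt{8 \ln N / T}$ is the standard tuning under which Hedge has external regret at most $\sqrt{(T \ln N)/2}$. The function name `hedge_vs_best_response` is hypothetical.

```python
import math


def hedge_vs_best_response(S1, T):
    """Player 1 runs exponential weights on loss matrix S1; player 2 plays the
    column that maximizes player 1's expected loss against its current mix.
    Returns (cumulative online loss, cumulative loss of the best fixed row)."""
    N, K = len(S1), len(S1[0])
    eta = math.sqrt(8.0 * math.log(N) / T)   # standard Hedge learning rate
    cum = [0.0] * N                           # cumulative loss of each row
    L_on = 0.0
    for _ in range(T):
        base = min(cum)                       # shift for numerical stability
        w = [math.exp(-eta * (c - base)) for c in cum]
        total = sum(w)
        p = [x / total for x in w]            # player 1's mixed action P_1^t
        col_loss = [sum(p[j] * S1[j][k] for j in range(N)) for k in range(K)]
        k = max(range(K), key=lambda c: col_loss[c])   # adversarial column
        L_on += col_loss[k]
        for j in range(N):
            cum[j] += S1[j][k]
    return L_on, min(cum)


# Assumed example game: player 1 loses 1 when the actions match, 0 otherwise.
S1 = [[1.0, 0.0],
      [0.0, 1.0]]
v1 = 0.5
T = 1000

L_on, L_min = hedge_vs_best_response(S1, T)
regret = L_on - L_min
bound = math.sqrt(T * math.log(len(S1)) / 2.0)
avg = L_on / T
print(avg, regret, bound)
```

In this run the two inequalities from the proof can be observed separately: `L_min <= v1 * T`, and `L_on <= L_min + regret`, which together give `avg <= v1 + regret / T`.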

In fact, we can use the existence of external regret minimization algorithms to prove the minimax theorem of two-player zero-sum games. For player 1, let $v_{\min}^1 = \max_{z \in \Delta(X_2)} \min_{x_1 \in X_1} E_{x_2 \sim z}[s_1(x_1, x_2)]$ and $v_{\max}^1 = \min_{z \in \Delta(X_1)} \max_{x_2 \in X_2} E_{x_1 \sim z}[s_1(x_1, x_2)]$. That is, $v_{\min}^1$ is the best loss that player 1 can guarantee for itself if it is told the mixed action of player 2 in advance. Similarly, $v_{\max}^1$ is the best loss that player 1 can guarantee to itself if it has to go first in selecting a mixed action, and player 2's action may then depend on it. The minimax theorem states that $v_{\min}^1 = v_{\max}^1$. Since $s_1(x_1, x_2) = -s_2(x_1, x_2)$ we can similarly define $v_{\min}^2 = -v_{\max}^1$ and $v_{\max}^2 = -v_{\min}^1$.
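To make the two quantities concrete, the sketch below approximates them for a game in which each player has two actions, by brute-force search over a grid of mixed actions. It is only an illustration under stated assumptions: the function name `security_values`, the grid resolution, and the matching pennies loss matrix are hypothetical, and a grid search is a crude substitute for solving the corresponding linear program.

```python
def security_values(S1, grid=1000):
    """Approximate (v_min, v_max) for player 1 in a 2x2 game with loss matrix S1.

    v_min: player 2 commits to a mixed action z first, player 1 best-responds.
    v_max: player 1 commits to a mixed action z first, player 2 best-responds.
    """
    mixes = [(i / grid, 1.0 - i / grid) for i in range(grid + 1)]
    # Player 2 moves first: maximize over z the loss of player 1's best row.
    v_min = max(
        min(z[0] * S1[a][0] + z[1] * S1[a][1] for a in range(2))
        for z in mixes
    )
    # Player 1 moves first: minimize over z the loss under the worst column.
    v_max = min(
        max(z[0] * S1[0][b] + z[1] * S1[1][b] for b in range(2))
        for z in mixes
    )
    return v_min, v_max


# Assumed example: matching pennies with losses in [0, 1] for player 1.
S1 = [[1.0, 0.0],
      [0.0, 1.0]]
v_min, v_max = security_values(S1)
print(v_min, v_max)
```

On any grid the approximation satisfies `v_min <= v_max`, matching the easy direction of the minimax theorem; for this game both values coincide at the uniform mixed action.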

In the following we give a proof of the minimax theorem based on the existence of external regret algorithms. Assume for contradiction that $v_{\max}^1 = v_{\min}^1 + \gamma$ for some $\gamma > 0$ (it is easy to see that $v_{\max}^1 \ge v_{\min}^1$). Consider both players playing a regret.

² Alternatively, we could consider $x_i^t$ as a random variable distributed according to $P_i^t$, and similarly discuss the expected loss. We prefer the above presentation for consistency with the rest of the chapter.
