# Artificial Intelligence (IV)

### Generalizing search problems

• So far: our search problems have assumed agent has complete control of environment
• state does not change unless the agent (robot) changes it
• Assumption not always reasonable
• other agents whose interests conflict with yours
• In these cases, we need to generalize our view of search to handle state changes that are not in the control of the agent

### What are key features of a game

• Players have their own interests
• Each player tries to alter the world so as to best benefit itself
• They are hard because: How you should play depends on how you think the other person will play; but how they play depends on how they think you will play

### Game properties

• Two-player
• Discrete: Game states or decisions can be mapped to discrete values.
• Finite: There are only a finite number of states and possible decisions that can be made
• Zero-sum: Fully competitive
• if one player wins, the other loses an equal amount
• note that some games don’t have this property
• Deterministic: no chance involved
• no dice, or random deals of cards, or coin flips, etc.
• Perfect information: all aspects of the state are fully observable
• e.g., no hidden cards.

### Extensive Form Two-Player Zero-Sum Games

• But Rock-Paper-Scissors is a simple "one-shot" game
• single move each
• in game theory: a strategic or normal form game
• Many games extend over multiple moves
• turn-taking: players act alternately
• e.g., chess, checkers, etc.
• in game theory: extensive form games
• We’ll focus on the extensive form
• that’s where the computational questions emerge

### Two-Player Zero-Sum Game – Definition

• Two players A (Max) and B (Min)
• Set of states S (a finite set of states of the game)
• An initial state I ∈ S (where the game begins)
• Terminal positions T ⊆ S (terminal states of the game: states where the game is over)
• Successors (or Succs): a function that takes a state as input and returns the set of possible next states for whoever is due to move
• Utility or payoff function V: T → R (a mapping from terminal states to real numbers that shows how good each terminal state is for player A and how bad for player B)
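The components above can be made concrete with a toy game. This is a minimal sketch, assuming a made-up "remove 1 or 2 tokens, taking the last token wins" game; the names (`INITIAL`, `is_terminal`, `successors`, `utility`) are illustrative, not part of the formal definition.

```python
# Toy instance of the formal definition. A state is (tokens, player):
# how many tokens remain and who is due to move.

INITIAL = (4, "Max")          # I ∈ S: the game begins with 4 tokens, Max to move

def is_terminal(state):       # T ⊆ S: no tokens left means the game is over
    tokens, _ = state
    return tokens == 0

def successors(state):        # Succs: possible next states for whoever moves
    tokens, player = state
    other = "Min" if player == "Max" else "Max"
    return [(tokens - k, other) for k in (1, 2) if tokens - k >= 0]

def utility(state):           # V: T → R, from Max's (player A's) point of view
    _, player = state         # the player due to move at a terminal state lost:
    return -1 if player == "Max" else 1   # the other player took the last token
```

Note that the utility is attached only to terminal states, as in the definition; non-terminal states get values only by backing them up the tree.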

### Two-Player Zero-Sum Game – Intuition

• Players alternate moves (starting with A, or Max)
• Game ends when some terminal t ∈ T is reached
• A game state: a state-player pair
• Tells us what state we’re in and whose move it is
• Utility function and terminals replace goals
• A, or Max, wants to maximize the terminal payoff
• B, or Min, wants to minimize the terminal payoff
• Think of it as:
• A, or Max, gets V(t) and B, or Min, gets −V(t) for terminal node t
• This is why it’s called zero (or constant) sum

## The MiniMax Strategy

• Assume that the other player will always play their best move
• you always play a move that will minimize the payoff that could be gained by the other player.
• By minimizing the other player’s payoff, you maximize your own.
• Note that if you know that Min will play poorly in some circumstances, there might be a better strategy than MiniMax (i.e., a strategy that gives you a better payoff)
• Build full game tree (all leaves are terminals)
• Root is start state, edges are possible moves, etc.
• Label terminal nodes with utilities
• Back values up the tree
• U(t) is defined for all terminals (part of the input)
• U(n) = min {U(c) : c is a child of n} if n is a Min node
• U(n) = max {U(c) : c is a child of n} if n is a Max node
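The backup rules above can be sketched directly over an explicit game tree. In this minimal sketch, a leaf is its terminal utility U(t) and an internal node is a list of children; Max and Min levels alternate, starting with Max at the root.

```python
# Back up values through an explicit game tree.
def minimax(node, is_max):
    if not isinstance(node, list):        # terminal: U(t) is part of the input
        return node
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)   # Max/Min backup rule

# Max picks the better of two Min nodes: min(3, 12) = 3 and min(2, 8) = 2,
# so the root value is max(3, 2) = 3.
print(minimax([[3, 12], [2, 8]], True))   # → 3
```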

### DFS Minimax

• Building the entire game tree and backing up values gives each player their strategy.
• However, the game tree is exponential in size.
• Furthermore, as we will see later it is not necessary to know all of the tree.
• To solve these problems we use a depth-first implementation of minimax.
• We run the depth-first search after each move to compute the next move for the MAX player. (We could do the same for the MIN player.)
• This avoids explicitly representing the exponentially sized game tree: we just compute each move as it is needed.
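The depth-first move computation can be sketched on a toy game. This is a minimal sketch under the assumption of a made-up "remove 1 or 2 tokens, taking the last token wins" game; `minimax_value` and `best_move` are illustrative names, not from the slides.

```python
# Depth-first minimax: recurse over successors on demand instead of
# building the exponential game tree explicitly.
def minimax_value(tokens, is_max):
    if tokens == 0:
        return -1 if is_max else 1       # the player due to move lost
    values = [minimax_value(tokens - k, not is_max)
              for k in (1, 2) if tokens - k >= 0]
    return max(values) if is_max else min(values)

def best_move(tokens):
    # recomputed after each of Min's replies: no tree is ever stored
    return max((k for k in (1, 2) if tokens - k >= 0),
               key=lambda k: minimax_value(tokens - k, False))

print(best_move(4))   # → 1 (leave 3 tokens, a losing position for Min)
```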

### Pruning

• It is not necessary to examine entire tree to make correct MiniMax decision
• Assume depth-first generation of tree
• After generating value for only some of n’s children we can prove that we’ll never reach n in a MiniMax strategy.
• So we needn’t generate or evaluate any further children of n!
• Two types of pruning (cuts):
• pruning of max nodes (α-cuts)
• pruning of min nodes (β-cuts)

### Cutting Max Nodes (Alpha Cuts)

• At a Max node n:
• Let β be the lowest value of n’s siblings examined so far (siblings to the left of n that have already been searched)
• Let α be the highest value of n’s children examined so far (changes as children are examined)
• While at a Max node n, if α becomes ≥ β we can stop expanding the children of n
• Min will never choose to move from n’s parent to n, since it would choose one of n’s lower-valued siblings first
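Both cut types can be sketched in one routine over an explicit tree (leaves are utilities, internal nodes are lists of children). This is a minimal sketch of standard alpha-beta pruning: α tracks the best value found so far for Max, β the best for Min, and once α ≥ β the remaining children of the current node can never be reached under optimal play.

```python
# Minimax with alpha- and beta-cuts over an explicit game tree.
def alphabeta(node, is_max, alpha=float("-inf"), beta=float("inf")):
    if not isinstance(node, list):        # terminal: return its utility
        return node
    value = float("-inf") if is_max else float("inf")
    for child in node:
        v = alphabeta(child, not is_max, alpha, beta)
        if is_max:
            value = max(value, v)
            alpha = max(alpha, value)     # best option for Max so far
        else:
            value = min(value, v)
            beta = min(beta, value)       # best option for Min so far
        if alpha >= beta:                 # cut: skip the remaining children
            break
    return value

# Same root value as plain minimax, but the leaf 8 is never examined:
# after seeing 2 in the second Min node, β = 2 ≤ α = 3 triggers a cut.
print(alphabeta([[3, 12], [2, 8]], True))   # → 3
```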

## Practical Matters

“Real” games are too large to enumerate tree

• e.g., chess branching factor is roughly 35
• Depth 10 tree: 2,700,000,000,000,000 nodes
• Even alpha-beta pruning won’t help here!

We must limit depth of search tree

• Can’t expand all the way to terminal nodes
• We must make heuristic estimates about the values of the (nonterminal) states at the leaves of the tree
• These heuristics are often called evaluation functions
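Depth-limited search can be sketched by adding a cutoff to the tree-based minimax: below a fixed depth we stop expanding and fall back on a heuristic. In this minimal sketch the tree format (leaves are true utilities, lists are internal nodes) and the crude `evaluate` stand-in (average of the leaves below a node) are illustrative assumptions.

```python
# Minimax with a depth cutoff and a heuristic evaluation function.
def depth_limited_minimax(node, is_max, depth, evaluate):
    if not isinstance(node, list):        # true terminal: exact utility
        return node
    if depth == 0:                        # cutoff: heuristic estimate instead
        return evaluate(node)
    values = [depth_limited_minimax(c, not is_max, depth - 1, evaluate)
              for c in node]
    return max(values) if is_max else min(values)

def evaluate(node):
    # crude stand-in heuristic: the average of all leaves below the node
    leaves, stack = [], [node]
    while stack:
        n = stack.pop()
        if isinstance(n, list):
            stack.extend(n)
        else:
            leaves.append(n)
    return sum(leaves) / len(leaves)

print(depth_limited_minimax([[3, 12], [2, 8]], True, 1, evaluate))   # → 7.5
```

With a large enough depth the cutoff never fires and the exact minimax value is recovered; with depth 1 the Min nodes are scored by the heuristic instead.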

## Evaluation functions: basic requirements

• Should order the terminal states in the same way as the true utility function.
• The computation must not take too long!
• For nonterminal states, the evaluation function should be strongly correlated with the actual chances of winning.

## How to design evaluation functions

• Features of the states, e.g., in chess, the number of white pawns, black pawns, white queens, etc.
• The features, taken together, define various tags or equivalence classes of states: the states in each category have the same values for all the features.
• Any given category will contain some states that lead to wins, some that lead to draws, and some that lead to losses.
• e.g., suppose our experience suggests that 72% of the states in a category lead to a win, 20% to a loss, and 8% to a draw.
• Then a reasonable evaluation for states in the category is the expected utility value: 0.72·1 + 0.20·(−1) + 0.08·0 = 0.52.
• However, there are too many tags
• Most evaluation functions compute separate numerical contributions from each feature and then combine them
• e.g., each pawn is worth 1, a knight or bishop is worth 3, a rook 5, and the queen 9
• Mathematically, a weighted linear function: Eval(s) = w1·f1(s) + … + wn·fn(s) = ∑(i=1..n) wi·fi(s)
• Deep Blue used over 8000 features
• This involves a strong assumption: the contribution of each feature is independent of the values of the other features.
• The assumption may not hold, hence nonlinear combinations are also used
• The features and weights are not part of the rules of chess!
• They come from centuries of human chess-playing experience.
• In case this kind of experience is not available, the weights of the evaluation function can be estimated by machine learning techniques.
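The weighted linear function can be sketched with the material weights from the slide (pawn 1, knight/bishop 3, rook 5, queen 9). The feature encoding is an assumption for illustration: fi(s) is the count of White pieces of type i minus the count of Black pieces of that type, so positive scores favour White (Max).

```python
# Weighted linear evaluation: Eval(s) = sum_i w_i * f_i(s).
WEIGHTS = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def material_eval(features):
    # features[name] = (White count) - (Black count), an assumed encoding
    return sum(WEIGHTS[name] * f for name, f in features.items())

# White is up one pawn; a knight traded for a bishop nets to zero:
print(material_eval({"pawn": 1, "knight": -1, "bishop": 1,
                     "rook": 0, "queen": 0}))   # → 1
```

Each feature contributes independently here, which is exactly the strong assumption noted above; a nonlinear combination would instead let features interact (e.g., a bishop pair bonus).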