CC BY 4.0 （除特别声明或转载文章外）

如果这篇博客帮助到你，可以请我喝一杯咖啡~

## Game Tree Search

### Generalizing search problems

- So far: our search problems have assumed agent has complete control of environment
- state does not change unless the agent (robot) changes it

- Assumption not always reasonable
- other agents whose interests conﬂict with yours

- In these cases, we need to generalize our view of search to handle state changes that are not in the control of the agent

### What are key features of a game

- Players have their own interests
- Each player tries to alter the world so as to best beneﬁt itself
- They are hard because: How you should play depends on how you think the other person will play; but how they play depends on how they think you will play

### Game properties

- Two-player
- Discrete: Game states or decisions can be mapped on discrete values.
- Finite: There are only a ﬁnite number of states and possible decisions that can be made
- Zero-sum (“): Fully competitive
- if one player wins, the other loses an equal amount
- note that some games don’t have this property

- Deterministic: no chance involved
- no dice, or random deals of cards, or coin ﬂips, etc.

- Perfect information: all aspects of the state are fully observable
- e.g., no hidden cards.

### Extensive Form Two-Player Zero-Sum Games

- But R,P,S is a simple /one shot0(一次性) game
- single move each
- in game theory: a strategic or normal form game (策略或范式博弈)

- Many games extend over multiple moves
- turn-taking: players act alternatively
- e.g., chess, checkers, etc.
- in game theory: extensive form games (扩展形式博弈)

- We’ll focus on the extensive form
- that’s where the computational questions emerges

### Two-Player Zero-Sum Game – Definition

- Two players A (Max) and B (Min)
- Set of states S (a finite set of states of the game)
- An initial stateI∈S(where game begins)
- Terminal positionsT⊆S(Terminal states of the game:states where the game is over)
- Successors (or Succs - a function that takes a state as inputand returns a set of possible next states to whomever is dueto move)
- Utility (效益) or payoff (收益) functionV:T→R. (amapping from terminal states to real numbers that show howgood is each terminal state for player A and bad for player B.)Y. LiuIntro to AI8 / 48

### Two-Player Zero-Sum Game – Intuition

- Players alternate moves (starting with A, or Max)
- Game ends when some terminal t ∊T is reached

- A game state: a state-player pair
- Tells us what state we’re in and whose move it is

- Utility function and terminals replace goals
- A, or Max, wants to maximize the terminal payoff
- B, or Min, wants to minimize the terminal payoff

- Think of it as:
- A, or Max, gets V(t)and B, or Min, gets –V(t) for terminal node t
- This is why it’s called zero (or constant) sum

## The MiniMax Strategy

- Assume that the other player will always play their best move
- you always play a move that will minimize the payoff thatcould be gained by the other player.
- By minimizing the other player’s payoff, you maximize yourown.

- Note that if you know that Min will play poorly in somecircumstances, there might be a better strategy than MiniMax(i.e., a strategy that gives you a better payoff)
- Build full game tree (all leaves are terminals)
- Root is start state, edges are possible moves, etc.
- Label terminal nodes with utilities

- Back values upthe tree
- U(t)is defined for all terminals (part of input)
- U(n)= min {U(c) : c isa child of n} if nis a Min node
- U(n)= max {U(c) : cis a child of n} if nis a Max node

### DFS Minmax

- Building the entire game tree and backing up values gives each player their strategy.
- However, the game tree is exponential in size.
- Furthermore, as we will see later it is not necessary to know all of the tree.
- To solve these problems we find a depth-firstimplementation of minimax.
- We run the depth-first search after each move to compute what is the next move for the MAXplayer. (We could do the same for the MINplayer).
- This avoids explicitly representing the exponentially sized game tree: we just compute each move as it is needed.

### Pruning

- It is not necessary to examine entire tree to make correct MiniMax decision
- Assume depth-first generation of tree
- After generating value for only someof n’s children we can prove that we’ll never reach n in a MiniMax strategy.
- So we needn’t generate or evaluate any further children of n!

- Two types of pruning (cuts):
- pruning of max nodes (α-cuts)
- pruning of min nodes (β-cuts)

### Cutting Max Nodes （Alpha Cuts）

- At a Max node n:
- Let βbe the lowest value of n’s siblings examined so far (siblings to the left of n that have already been searched)
- Letαbe the highest value of n’s children examined so far (changes as children examined)

- While at a Max node n, if αbecomes ≥ βwe can stop expanding the children of n
- Min will never choose to move from n’s parent to nsince it would choose one of n’s lower valued siblings first

## Practical Matters

“Real” games are too large to enumerate tree

- e.g., chess branching factor is roughly 35
- Depth 10 tree: 2,700,000,000,000,000 nodes
- Even alpha-beta pruning won’t help here!

We must limit depth of search tree

- Can’t expand all the way to terminal nodes
- We must make heuristic estimates about the values of the(nonterminal) states at the leaves of the tree
- These heuristics are often called evaluation function

## Evaluation functions: basic requirements

- Should order the terminal states in the same way as the trueutility function.
- The computation must not take too long!
- For nonterminal states, the evaluation function should bestrongly correlated with the actual chances of winning.

## How to design evaluation functions

- Features of the states,e.g., in chess, the number of whitepawns(卒), black pawns, white queens, etc.
- The features, taken together, define various tags orequivalence classes of states: the states in each category havethe same values for all the features.
- Any given category will contain some states that lead to wins,some that lead to draws, and some that lead to losses.
- e.g., suppose our experience suggests that 72% of the statesin a category lead to a win, 20% to a loss, and 8% to a draw.
- Then a reasonable evaluation for states in the category is theexpected utility value:0.72·1 + 0.20·(−1) + 0.08·0 = 0.52.
- However, there are too many tags
- Most evaluation functions compute separate numerical contributions from each feature and then combine them
- e.g., each pawn is worth 1, a knight(马) or bishop(象) isworth 3, a rook(车)) 5, and the queen 9
- Mathematically, a weighted linear functionEval(s) =w1·f1(s) +…+wn·fn(s) =∑ni=1wi·fi(s)
- Deep Blue used over 8000 features
- This involves a strong assumption: the contribution of eachfeature is independent of the values of the other features.
- The assumption may not hold, hence nonlinear combinationsare also used
- The features and weights are not part of the rules of chess!
- They come from centuries of human chess-playing experience.
- In case this kind of experience is not available, the weights ofthe evaluation function can be estimated by machine learning techniques.