# Artificial Intelligence (IV)

### Generalizing search problems

• So far: our search problems have assumed agent has complete control of environment
• state does not change unless the agent (robot) changes it
• Assumption not always reasonable
• other agents whose interests conflict with yours
• In these cases, we need to generalize our view of search to handle state changes that are not in the control of the agent

### What are key features of a game

• Players have their own interests
• Each player tries to alter the world so as to best benefit itself
• They are hard because: How you should play depends on how you think the other person will play; but how they play depends on how they think you will play

### Game properties

• Two-player
• Discrete: Game states or decisions can be mapped to discrete values.
• Finite: There are only a finite number of states and possible decisions that can be made
• Zero-sum: Fully competitive
• if one player wins, the other loses an equal amount
• note that some games don’t have this property
• Deterministic: no chance involved
• no dice, or random deals of cards, or coin flips, etc.
• Perfect information: all aspects of the state are fully observable
• e.g., no hidden cards.

### Extensive Form Two-Player Zero-Sum Games

• But Rock-Paper-Scissors is a simple "one-shot" game
• single move each
• in game theory: a strategic or normal form game
• Many games extend over multiple moves
• turn-taking: players act alternately
• e.g., chess, checkers, etc.
• in game theory: extensive form games
• We’ll focus on the extensive form
• that’s where the computational questions emerge

### Two-Player Zero-Sum Game – Definition

• Two players A (Max) and B (Min)
• Set of states S (a finite set of states of the game)
• An initial state I ∈ S (where the game begins)
• Terminal positions T ⊆ S (terminal states of the game: states where the game is over)
• Successors (or Succs): a function that takes a state as input and returns the set of possible next states for whoever is due to move
• Utility or payoff function V: T → R (a mapping from terminal states to real numbers that shows how good each terminal state is for player A and how bad for player B)
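The components above can be made concrete with a toy game. This is a minimal sketch, assuming a made-up "remove 1 or 2 tokens, taking the last token wins" game; the names (`INITIAL`, `is_terminal`, `successors`, `utility`) are illustrative, not part of the formal definition.

```python
# Toy instance of the formal definition. A state is (tokens, player):
# how many tokens remain and who is due to move.

INITIAL = (4, "Max")          # I ∈ S: the game begins with 4 tokens, Max to move

def is_terminal(state):       # T ⊆ S: no tokens left means the game is over
    tokens, _ = state
    return tokens == 0

def successors(state):        # Succs: possible next states for whoever moves
    tokens, player = state
    other = "Min" if player == "Max" else "Max"
    return [(tokens - k, other) for k in (1, 2) if tokens - k >= 0]

def utility(state):           # V: T → R, from Max's (player A's) point of view
    _, player = state         # the player due to move at a terminal state lost:
    return -1 if player == "Max" else 1   # the other player took the last token
```

Note that the utility is attached only to terminal states, as in the definition; non-terminal states get values only by backing them up the tree.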

### Two-Player Zero-Sum Game – Intuition

• Players alternate moves (starting with A, or Max)
• Game ends when some terminal t ∈ T is reached
• A game state: a state-player pair
• Tells us what state we’re in and whose move it is
• Utility function and terminals replace goals
• A, or Max, wants to maximize the terminal payoff
• B, or Min, wants to minimize the terminal payoff
• Think of it as:
• A, or Max, gets V(t) and B, or Min, gets −V(t) for terminal node t
• This is why it’s called zero (or constant) sum

## The MiniMax Strategy

• Assume that the other player will always play their best move
• you always play a move that will minimize the payoff that could be gained by the other player.
• By minimizing the other player’s payoff, you maximize your own.
• Note that if you know that Min will play poorly in some circumstances, there might be a better strategy than MiniMax (i.e., a strategy that gives you a better payoff)
• Build full game tree (all leaves are terminals)
• Root is start state, edges are possible moves, etc.
• Label terminal nodes with utilities
• Back values up the tree
• U(t) is defined for all terminals (part of the input)
• U(n) = min {U(c) : c is a child of n} if n is a Min node
• U(n) = max {U(c) : c is a child of n} if n is a Max node
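The backup rules above can be sketched directly over an explicit game tree. In this minimal sketch, a leaf is its terminal utility U(t) and an internal node is a list of children; Max and Min levels alternate, starting with Max at the root.

```python
# Back up values through an explicit game tree.
def minimax(node, is_max):
    if not isinstance(node, list):        # terminal: U(t) is part of the input
        return node
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)   # Max/Min backup rule

# Max picks the better of two Min nodes: min(3, 12) = 3 and min(2, 8) = 2,
# so the root value is max(3, 2) = 3.
print(minimax([[3, 12], [2, 8]], True))   # → 3
```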

### DFS Minimax

• Building the entire game tree and backing up values gives each player their strategy.
• However, the game tree is exponential in size.
• Furthermore, as we will see later it is not necessary to know all of the tree.
• To solve these problems we use a depth-first implementation of minimax.
• We run the depth-first search after each move to compute the next move for the MAX player. (We could do the same for the MIN player.)
• This avoids explicitly representing the exponentially sized game tree: we just compute each move as it is needed.
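The depth-first move computation can be sketched on a toy game. This is a minimal sketch under the assumption of a made-up "remove 1 or 2 tokens, taking the last token wins" game; `minimax_value` and `best_move` are illustrative names, not from the slides.

```python
# Depth-first minimax: recurse over successors on demand instead of
# building the exponential game tree explicitly.
def minimax_value(tokens, is_max):
    if tokens == 0:
        return -1 if is_max else 1       # the player due to move lost
    values = [minimax_value(tokens - k, not is_max)
              for k in (1, 2) if tokens - k >= 0]
    return max(values) if is_max else min(values)

def best_move(tokens):
    # recomputed after each of Min's replies: no tree is ever stored
    return max((k for k in (1, 2) if tokens - k >= 0),
               key=lambda k: minimax_value(tokens - k, False))

print(best_move(4))   # → 1 (leave 3 tokens, a losing position for Min)
```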

### Pruning

• It is not necessary to examine entire tree to make correct MiniMax decision
• Assume depth-first generation of tree
• After generating value for only some of n’s children we can prove that we’ll never reach n in a MiniMax strategy.
• So we needn’t generate or evaluate any further children of n!
• Two types of pruning (cuts):
• pruning of max nodes (α-cuts)
• pruning of min nodes (β-cuts)

### Cutting Max Nodes (Alpha Cuts)

• At a Max node n:
• Let β be the lowest value of n’s siblings examined so far (siblings to the left of n that have already been searched)
• Let α be the highest value of n’s children examined so far (changes as children are examined)
• While at a Max node n, if α becomes ≥ β we can stop expanding the children of n
• Min will never choose to move from n’s parent to n, since it would choose one of n’s lower-valued siblings first
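Both cut types can be sketched in one routine over an explicit tree (leaves are utilities, internal nodes are lists of children). This is a minimal sketch of standard alpha-beta pruning: α tracks the best value found so far for Max, β the best for Min, and once α ≥ β the remaining children of the current node can never be reached under optimal play.

```python
# Minimax with alpha- and beta-cuts over an explicit game tree.
def alphabeta(node, is_max, alpha=float("-inf"), beta=float("inf")):
    if not isinstance(node, list):        # terminal: return its utility
        return node
    value = float("-inf") if is_max else float("inf")
    for child in node:
        v = alphabeta(child, not is_max, alpha, beta)
        if is_max:
            value = max(value, v)
            alpha = max(alpha, value)     # best option for Max so far
        else:
            value = min(value, v)
            beta = min(beta, value)       # best option for Min so far
        if alpha >= beta:                 # cut: skip the remaining children
            break
    return value

# Same root value as plain minimax, but the leaf 8 is never examined:
# after seeing 2 in the second Min node, β = 2 ≤ α = 3 triggers a cut.
print(alphabeta([[3, 12], [2, 8]], True))   # → 3
```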

## Practical Matters

“Real” games are too large to enumerate tree

• e.g., chess branching factor is roughly 35
• Depth 10 tree: 2,700,000,000,000,000 nodes
• Even alpha-beta pruning won’t help here!

We must limit depth of search tree

• Can’t expand all the way to terminal nodes
• We must make heuristic estimates about the values of the (nonterminal) states at the leaves of the tree
• These heuristics are often called evaluation functions
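Depth-limited search can be sketched by adding a cutoff to the tree-based minimax: below a fixed depth we stop expanding and fall back on a heuristic. In this minimal sketch the tree format (leaves are true utilities, lists are internal nodes) and the crude `evaluate` stand-in (average of the leaves below a node) are illustrative assumptions.

```python
# Minimax with a depth cutoff and a heuristic evaluation function.
def depth_limited_minimax(node, is_max, depth, evaluate):
    if not isinstance(node, list):        # true terminal: exact utility
        return node
    if depth == 0:                        # cutoff: heuristic estimate instead
        return evaluate(node)
    values = [depth_limited_minimax(c, not is_max, depth - 1, evaluate)
              for c in node]
    return max(values) if is_max else min(values)

def evaluate(node):
    # crude stand-in heuristic: the average of all leaves below the node
    leaves, stack = [], [node]
    while stack:
        n = stack.pop()
        if isinstance(n, list):
            stack.extend(n)
        else:
            leaves.append(n)
    return sum(leaves) / len(leaves)

print(depth_limited_minimax([[3, 12], [2, 8]], True, 1, evaluate))   # → 7.5
```

With a large enough depth the cutoff never fires and the exact minimax value is recovered; with depth 1 the Min nodes are scored by the heuristic instead.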

## Evaluation functions: basic requirements

• Should order the terminal states in the same way as the true utility function.
• The computation must not take too long!
• For nonterminal states, the evaluation function should be strongly correlated with the actual chances of winning.

## How to design evaluation functions

• Features of the states, e.g., in chess, the number of white pawns, black pawns, white queens, etc.
• The features, taken together, define various tags or equivalence classes of states: the states in each category have the same values for all the features.
• Any given category will contain some states that lead to wins, some that lead to draws, and some that lead to losses.
• e.g., suppose our experience suggests that 72% of the states in a category lead to a win, 20% to a loss, and 8% to a draw.
• Then a reasonable evaluation for states in the category is the expected utility value: 0.72·1 + 0.20·(−1) + 0.08·0 = 0.52.
• However, there are too many tags
• Most evaluation functions compute separate numerical contributions from each feature and then combine them
• e.g., each pawn is worth 1, a knight or bishop is worth 3, a rook 5, and the queen 9
• Mathematically, a weighted linear function: Eval(s) = w1·f1(s) + … + wn·fn(s) = ∑(i=1..n) wi·fi(s)
• Deep Blue used over 8000 features
• This involves a strong assumption: the contribution of each feature is independent of the values of the other features.
• The assumption may not hold, hence nonlinear combinations are also used
• The features and weights are not part of the rules of chess!
• They come from centuries of human chess-playing experience.
• In case this kind of experience is not available, the weights of the evaluation function can be estimated by machine learning techniques.
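The weighted linear function can be sketched with the material weights from the slide (pawn 1, knight/bishop 3, rook 5, queen 9). The feature encoding is an assumption for illustration: fi(s) is the count of White pieces of type i minus the count of Black pieces of that type, so positive scores favour White (Max).

```python
# Weighted linear evaluation: Eval(s) = sum_i w_i * f_i(s).
WEIGHTS = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def material_eval(features):
    # features[name] = (White count) - (Black count), an assumed encoding
    return sum(WEIGHTS[name] * f for name, f in features.items())

# White is up one pawn; a knight traded for a bishop nets to zero:
print(material_eval({"pawn": 1, "knight": -1, "bishop": 1,
                     "rook": 0, "queen": 0}))   # → 1
```

Each feature contributes independently here, which is exactly the strong assumption noted above; a nonlinear combination would instead let features interact (e.g., a bishop pair bonus).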