The document discusses various techniques for game playing using search trees and minimax algorithms, including:
1) Alpha-beta pruning allows more efficient minimax search by pruning branches that cannot alter the outcome. It uses depth-first search with bounds on node values.
2) Quiescence search adapts the depth limit to avoid unstable "non-quiescent" values near the limit.
3) The horizon effect occurs when depth limits fail to account for unavoidable future outcomes, leading to suboptimal moves. Better evaluation functions can help address this.
1. G51IAI Introduction to AI Andrew Parkes Game Playing 2: Alpha-Beta Search and General Issues
2. Game Playing Summary So Far. The game tree describes the possible sequences of play; it is a graph if we merge together identical states. Minimax: utility values are assigned to the leaves and backed up the tree; a MAX node takes the max value of its children, a MIN node takes the min value of its children; we can then read off the best lines of play. Depth-bounded minimax: the utility of terminal states is estimated using an evaluation function.
3. [Figure: recap of depth-bounded minimax on a small game tree; an evaluation function gives utilities at the leaves, which are backed up through alternating MAX and MIN levels to the root.]
4. Game Playing Beyond Minimax. Efficiency of the search: game trees are very big and evaluation of positions is time-consuming, so how can we reduce the number of nodes to be evaluated? Alpha-beta search. Bounding the depth of minimax has deficiencies. Why? How can we mitigate these deficiencies?
5. Game Playing Improving Efficiency. Suppose that we are doing depth-bounded minimax. We have a game tree to create, and then we insert the minimax values in order to find the values for our possible moves from the current position.
6. Game Playing Minimax using DFS. The presentation of minimax was done by backing up from the leaves: a bottom-up, breadth-first search. This has the disadvantage of taking a lot of space; compare this with the space-usage issues for DFS vs. BFS in earlier lectures. If we can do minimax using DFS then it is likely to take a lot less space, and minimax can indeed be implemented using DFS (a minimal sketch follows). But reduced space is not the only advantage:
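Below is a minimal sketch of depth-bounded minimax implemented as a depth-first search, in Python. The `game` object and its `successors`, `evaluate`, and `is_terminal` methods are a hypothetical interface, not something defined in the slides; any concrete game would supply its own versions.

```python
def minimax(state, depth, maximizing, game):
    """Depth-bounded minimax via DFS: return the backed-up value of `state`."""
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)            # static evaluation at the depth bound
    child_values = [minimax(s, depth - 1, not maximizing, game)
                    for s in game.successors(state)]
    # MAX nodes take the max of their children, MIN nodes take the min.
    return max(child_values) if maximizing else min(child_values)
```

Because the recursion only ever holds the current path (plus the children of nodes on it), the space used is far less than building the whole tree before backing values up.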
7. [Figure: animated example tree with MAX root A; MIN node B has children D and E (MAX nodes), whose leaf children include H, I, J, K.] On discovering util(D) = 6 we know that util(B) <= 6. On discovering util(J) = 8 we know that util(E) >= 8. STOP! What else can you deduce now!? We can stop the expansion of E, as best play will not go via E: the value of K is irrelevant, so prune it!
8. Game Playing Pruning nodes. If we are scanning the tree using DFS then there was no point in evaluating node K: whatever the value of K, there cannot be any rational sequence of play that would go through it. Node K can be pruned from the search, i.e. just not selected for further expansion. At node B, MIN will never select E, because J is better than D for MAX and so MIN must not allow MAX that opportunity. Q. So what! It's just one node? A. Suppose that the depth limit were such that K was far from the depth bound. Then evaluating K corresponds to evaluating a large sub-tree. Such prunings can save an exponential amount of work.
9. Game Playing Self-Study Suggestion MAKE SURE YOU THOROUGHLY UNDERSTAND THE REASONING FOR WHY NODE K CAN BE PRUNED. GO THROUGH THE ANIMATION UNTIL IT IS TOTALLY CLEAR! (EXAM QUESTIONS OFTEN ASK YOU TO APPLY SUCH REASONING TO A GAME TREE.)
10. Game Playing Improving Efficiency. Suppose that we were doing breadth-first search: would you still be able to prune nodes in this fashion? NO! The pruning relied on the fact that we had already evaluated node D by evaluating the tree underneath D. This form of pruning is an example of alpha-beta pruning and relies on doing a DEPTH-FIRST search of the depth-bounded tree.
11. Game Playing Node-ordering. Suppose that nodes K and J were evaluated in the opposite order: can we expect that we would be able to do a similar pruning? The answer depends on the value of K. Suppose that K had a value of 2 and is expanded first:
12. [Figure: the same example tree with K expanded before J.] On discovering util(D) = 6 we know that util(B) <= 6. On discovering util(K) = 2 we know that util(E) >= 2. We can NOT stop the expansion of E, as best play might still go via E: the value of J (here 8) is relevant, so there is no pruning.
13. Game Playing Node-ordering. When K had a value of 2 and was expanded first, we did not get to prune a child of E. To maximise pruning we want to first expand those children that are best for the parent; we cannot know which ones are really best, so we use heuristics for the best-first ordering (a small sketch follows). If this is done well then alpha-beta search can effectively double the depth of search tree that is searchable in a given time; it effectively reduces the branching factor in chess from about 30 to about 8. This is an enormous improvement!
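One simple way to realise this heuristic best-first ordering, sketched here with the same hypothetical `game` interface as before, is to sort children by their static evaluation before recursing. It would be used in place of a plain successor loop inside the alpha-beta search sketched later.

```python
def ordered_successors(state, maximizing, game):
    """Heuristic best-first ordering of children to maximise alpha-beta pruning."""
    # Expand the children that look best for the player to move first:
    # highest static evaluation first for MAX, lowest first for MIN.
    return sorted(game.successors(state), key=game.evaluate, reverse=maximizing)
```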
14. Game Playing Improving Efficiency. The games are symmetric, so it is natural that we can also do a similar pruning with the MIN and MAX roles reversed. The reasoning is identical other than for the reversal of roles. We can deduce that some other nodes cannot be involved in the line of best play.
15. [Figure: the example tree extended with further nodes under the root, illustrating the symmetric pruning with the MIN and MAX roles reversed.]
16. Game Playing Alpha-Beta Implementation. The pruning was based on using the results of the DFS so far to deduce upper and lower bounds on the values of nodes. Conventionally these bounds are stored in terms of two parameters: alpha (α) and beta (β).
17. Game Playing Alpha-Beta Implementation. α values are stored with each MAX node: each MAX node is given a value of alpha that is the current best lower bound on its final value. Initially it is -∞, to represent that nothing is known. As we do the search, α at a node can increase, but it can never decrease: it always gets better for MAX.
18. Game Playing Alpha-Beta Implementation. β values are stored with each MIN node: each MIN node is given a value of beta that is the current best upper bound on its final value. Initially it is +∞, to represent that nothing is known. As we do the search, β at a node can decrease, but it can never increase: it always gets better for MIN.
19. Alpha-beta Pruning. [Figure: the example tree annotated with α and β values.] Beta pruning occurs as α(E) > β(B); alpha pruning occurs as β(C) < α(A).
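The following is a minimal sketch of depth-bounded alpha-beta search in Python, following the α/β bookkeeping just described. It reuses the hypothetical `game` interface from the earlier minimax sketch and is an illustration of the idea, not the implementation used in the lectures.

```python
import math

def alphabeta(state, depth, alpha, beta, maximizing, game):
    """Depth-bounded minimax with alpha-beta pruning, done as a DFS."""
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)
    if maximizing:                              # MAX node: alpha is its lower bound
        value = -math.inf
        for s in game.successors(state):
            value = max(value, alphabeta(s, depth - 1, alpha, beta, False, game))
            alpha = max(alpha, value)           # alpha can only increase
            if alpha >= beta:                   # cut-off: a MIN ancestor will avoid this node
                break                           # remaining children are pruned
        return value
    else:                                       # MIN node: beta is its upper bound
        value = math.inf
        for s in game.successors(state):
            value = min(value, alphabeta(s, depth - 1, alpha, beta, True, game))
            beta = min(beta, value)             # beta can only decrease
            if beta <= alpha:                   # cut-off: a MAX ancestor will avoid this node
                break
        return value

# From the root (MAX to move), the initial call would look like:
# best_value = alphabeta(root, 4, -math.inf, math.inf, True, game)
```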
20. Game Playing Deficiencies of Minimax. The bound on the depth of search is artificial and can lead to many anomalies. We only consider two: non-quiescence (quiescent = inactive, quiet, calm) and the horizon effect. (These deficiencies also apply to alpha-beta, as it is just a more efficient way to do the same calculation as minimax.)
21. Game Playing Non-Quiescence. Suppose that we change the depth bound from k to k+1, i.e. expand one more move. The values given to a node might change wildly.
22. [Figure: example of non-quiescence. Utility values of terminal positions are obtained by an evaluation function; the node evaluates to 4 directly, but to 1 by minimax after one more expansion.] Direct evaluation does not agree with one more expansion followed by minimax.
23. Game Playing Quiescence Search. Suppose that we change the depth bound from k to k+1, i.e. expand one more move: the values given to a node might change wildly. Keep on increasing the depth bound in that region of the game tree until the values become quiescent (quiet, i.e. they stop being noisy).
24. Game Playing Quiescence Search. In quiescence search the depth bound is not applied uniformly but adapted to the local situation, in this case so that the values are not wildly changing (a sketch follows). Many other improvements to minimax also work by adapting the depth bound to get better results and/or do less work.
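A minimal sketch of the idea in Python, assuming the same hypothetical `game` interface plus a hypothetical `game.is_quiescent(state)` test: when the depth bound is reached, the search keeps expanding (up to some cap) until the position is quiet enough for the static evaluation to be trusted. It would be called in place of `game.evaluate(state)` at the depth bound in the minimax or alpha-beta sketches above.

```python
def quiescence_value(state, maximizing, game, max_extra=4):
    """Extend the search past the depth bound until values stop changing wildly."""
    if max_extra == 0 or game.is_terminal(state) or game.is_quiescent(state):
        return game.evaluate(state)            # position is quiet: trust the evaluation
    child_values = [quiescence_value(s, not maximizing, game, max_extra - 1)
                    for s in game.successors(state)]
    return max(child_values) if maximizing else min(child_values)
```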
25. Game Playing Horizon Effect. Sometimes there is a bad thing, X, such that: X cannot be avoided; X can be delayed by some pointless moves; X is not detected by the evaluation function. In this case depth-limited minimax can be fooled. It will use the pointless moves to push X beyond the depth limit (the horizon), in which case it will not see X, and will ignore it. This can lead the search to take bad moves because it ignores the inevitability of X.
26. Game Playing Beyond alpha-beta. We looked briefly at two problems, non-quiescence and the horizon effect, and one solution, quiescence search. To seriously implement a game player (Deep Blue, Chinook, etc.) it is necessary to solve many such problems! Good programs implement many techniques and get them to work together effectively.
27. Game Playing Game Classification. So far we have only considered games such as chess, checkers, and nim. These games are: fully observable (both players have full and perfect information about the current state of the game); deterministic (there is no element of chance: the outcome of making a sequence of moves is entirely determined by the sequence itself).
28. Game Playing Game Classification. Fully vs. partially observable: some games are only partially observable, i.e. players do not have access to the full state of the game. E.g. in card games you typically cannot see all of your opponent's cards.
29. Game Playing Game Classification. Deterministic vs. stochastic: in many games there is some element of chance. E.g. in Backgammon you throw dice in order to move. (You are expected to be aware of these simple classifications.)
30. Game Playing Summary. Game trees; minimax: utility values propagate back to the root; bounded-depth minimax; alpha-beta search: uses DFS with a depth bound, and the ordering of nodes is important in order to maximise pruning. Deficiencies of bounded-depth search: non-quiescence, combat using quiescence search; the horizon problem, combat with ?? (look it up!).
31. G51IAI Introduction to AI Andrew Parkes End of Game Playing. Garry Kasparov and Deep Blue. © 1997, GM Gabriel Schwartzman's Chess Camera, courtesy IBM.