Index   Main   Namespaces   Classes   Hierarchy   Annotated   Files   Compound   Global   Pages

# SgUctSearch.h

Go to the documentation of this file.
00001 //----------------------------------------------------------------------------
00002 /** @file SgUctSearch.h
00003     Class SgUctSearch and helper classes.
00004 */
00005 //----------------------------------------------------------------------------
00006
00007 #ifndef SG_UCTSEARCH_H
00008 #define SG_UCTSEARCH_H
00009
00010 #include <fstream>
00011 #include <vector>
00012 #include <boost/scoped_array.hpp>
00013 #include <boost/shared_ptr.hpp>
00019 #include "SgBlackWhite.h"
00020 #include "SgBWArray.h"
00021 #include "SgTimer.h"
00022 #include "SgUctTree.h"
00023 #include "SgMpiSynchronizer.h"
00024
00025 #define SG_UCTFASTLOG 1
00026 #if SG_UCTFASTLOG
00027 #include "SgFastLog.h"
00028 #endif
00029
00030 //----------------------------------------------------------------------------
00031
00032 /** @defgroup sguctgroup Monte Carlo tree search
00033     Game-independent Monte Carlo tree search using UCT.
00034
00035     The main class SgUctSearch keeps a tree with statistics for each node
00036     visited more than a certain number of times, and then continues with
00037     random playout (not necessarily uniform random).
00038     Within the tree, the move with the highest upper confidence bound is
00039     chosen according to the basic UCT formula:
00040     @f[ \bar{X}_j + c \sqrt{\frac{\log{n}}{T_j(n)}} @f]
00041     with:
00042     - @f$j @f$ the move index
00043     - @f$X_{j,\gamma} @f$ reward for move @f$j @f$ at sample @f$\gamma @f$
00044     - @f$n @f$ number of times the father node was visited
00045     - @f$T_j(n) @f$ number of times the move has been played
00046     - @f$c @f$ an appropriate constant
00047
00048     References:
00049     - Kocsis, Szepesvari:
00050       <a href="http://zaphod.aml.sztaki.hu/papers/ecml06.pdf">
00051       Bandit based Monte-Carlo Planning</a>
00052     - Auer, Cesa-Bianchi, Fischer:
00053       <a href="http://homes.dsi.unimi.it/~cesabian/Pubblicazioni/ml-02.pdf">
00054       Finite-time Analysis of the Multiarmed Bandit Problem</a>
00055     - Gelly, Wang, Munos, Teytaud:
00056       <a href="http://hal.inria.fr/docs/00/12/15/16/PDF/RR-6062.pdf">
00057       Modification of UCT with patterns in Monte-Carlo Go</a>
00058     - Silver, Gelly:
00059       <a href=
00060       "http://imls.engr.oregonstate.edu/www/htdocs/proceedings/icml2007/papers/387.pdf">
00061       Combining Online and Offline Knowledge in UCT</a>
00062
00063     @see
00064     - @ref sguctsearchlockfree
00065     - @ref sguctsearchweights
00066     - @ref sguctsearchproven
00067 */
00068
00069 /** @page sguctsearchlockfree Lock-free mode in SgUctSearch
00070
00071 The basic idea of the lock-free mode in SgUctSearch is to share a tree between
00072 multiple threads without using any locks. Because of specific requirements on
00073 the memory model of the hardware platform, lock-free mode is an optional
00074 feature of the SgUctSearch and needs to be enabled explicitly.
00075
00076 @section sguctsearchlockfreetree Modifying the Tree Structure
00077
00078 The first change to make the lock-free search work is in the handling of
00079 concurrent changes to the structure of the tree. SgUctSearch never deletes
00080 nodes during a search; new nodes are created in a pre-allocated memory array.
00081 In the lock-free algorithm, each thread has its own memory array for creating
00082 new nodes. Only after the nodes are fully created and initialized, are they
00083 linked to the parent node. This can cause some memory overhead, because if
00084 several threads expand the same node only the children created by the last
00085 thread will be used in future simulations. It can also happen that some of the
00087 lost.
00088
00089 The child information of a node consists of two variables: a pointer to the
00090 first child in the array, and the number of children. To avoid that another
00091 thread sees an inconsistent state of these variables, all threads assume that
00092 the number of children is valid if the pointer to the first child is not null.
00093 Linking a parent to a new set of children requires first writing the number of
00094 children, then the pointer to the first child. The compiler is prevented from
00095 reordering the writes by declaring these variables as volatile.
00096
00097 @section sguctsearchlockfreevalues Updating Values
00098
00099 The move and RAVE values are stored in the nodes as counts and mean values.
00100 The mean values are updated using an incremental algorithm. Updating them
00101 without protection by a mutex can cause updates of the mean to be lost with or
00102 without increment of the count, as well as updates of the mean occurring
00103 without increment of the count. It could also happen that one thread reads the
00104 count and mean while they are written by another thread, and the first thread
00105 sees an erroneous state that exists only temporarily. In practice, these
00106 faulty updates occur with a low probability and will have only a small effect
00107 on the counts and mean values. They are intentionally ignored.
00108
00109 The only problematic case is if a count is zero, because the mean value is
00110 undefined if the count is zero, and this case has a special meaning at several
00111 places in the search. For example, the computation of the values for the
00112 selection of children in the in-tree phase distinguishes three cases: if the
00113 move count and RAVE count is non-zero, the value will be computed as a
00114 weighted linear combination of both mean values, if the move count is zero,
00115 only the RAVE mean is used, and if both counts are zero, a configurable
00116 constant value, the first play urgency, is used. To avoid this problem, all
00117 threads assume that a mean value is only valid if the corresponding count is
00118 non-zero. Updating a value requires first writing the new mean value, then the
00119 new count. The compiler is prevented from reordering the writes by declaring
00120 the counts and mean values as volatile.
00121
00122 @section sguctsearchlockfreeplatform Platform Requirements
00123
00124 There are some requirements on the memory model of the platform to make the
00125 lock-free search algorithm work. Writes of certain basic types (size_t, int,
00126 float, pointers) must be atomic. Writes by one thread must be seen by other
00127 threads in the same order. The IA-32 and Intel-64 CPU architectures, which are
00128 used in most modern standard computers, guarantee these assumptions. They also
00129 synchronize CPU caches after writes. (See
00131 Intel 64 and IA-32 Architectures Software Developer's Manual</a>, chapter
00132 7.1 Locked Atomic Operations and 7.2 Memory Ordering).
00133 */
00134
00135 /** @page sguctsearchweights Estimator weights in SgUctSearch
00136     The weights of the estimators (move value, RAVE value) are chosen by
00137     assuming that the estimators are uncorrelated and modeling the mean
00138     squared error of estimator @f$i @f$ by a function that depends on the
00139     number of samples and parameter constants, which represent the variance
00140     and bias of the estimator and need to be determined experimentally:
00141
00142     @f[
00143     w_i = \frac{1}{Z} \frac{1}{{\rm MSE}_i}
00144     \qquad Z = \sum_i \frac{1}{{\rm MSE}_i}
00145     \qquad {\rm MSE}_i = \frac{c_{\rm variance}}{N_i} + c_{\rm bias}^2
00146     @f]
00147
00148     Note that this formula is nearly equivalent to the formula suggested
00149     by David Silver on the Computer Go mailing list for the case of two
00150     estimators (move value and RAVE value) and used in newer versions of MoGo.
00151     However, MoGo uses the measured variance of the current RAVE value (for
00152     both the move weight and RAVE weight) instead variance parameter
00153     constants.
00154
00155     The formula is then reformulated to use different constants that describe
00156     the initial steepness and final asymptotic value of the unnormalized
00157     weight:
00158
00159     @f[
00160     Z w_i =
00161     \frac{c_{\rm initial}}
00162          {\frac{1}{N} + \frac{c_{\rm initial}}{c_{\rm final}}}
00163     @f]
00164
00165     with:
00166     - @f$N @f$ sample count of the estimator
00167     - @f$c_{\rm initial} @f$ Initial weight parameter; this is they weight if
00168       @f$N = 1 @f$ and @f$c_{\rm final} \gg c_{\rm initial} @f$
00169     - @f$c_{\rm final} @f$ Final weight parameter; this is they weight if
00170       @f$N \rightarrow \infty @f$
00171
00172     For the move value, @f$c_{\rm bias} = 0 @f$, and the variance can become
00173     part of the normalization constant, so the weight is simply
00174     @f$N_{\rm move} @f$.
00175     If no estimator has a sample count yet, the first-play-urgency parameter
00176     is used for the value estimate.
00177 */
00178
00179 /** @page sguctsearchproven Proven Nodes in SgUctSearch
00180
00181     When expanding a node (or computing knowledge at a node) it is
00182     possible the derived class can prove that the state corresponding
00183     to the node is a proven win or loss. Note this differs from a
00184     terminal node since a terminal node by definition has no children,
00185     whereas a proven node can have any number of children.
00186
00187     Any in-tree phase that encounters a proven node is immediately
00188     terminated and the result passed back up the tree just as if a
00189     playout (with the value corresponding to the proven value) had
00190     been performed.
00191
00192     The existence of proven nodes in a SgUctTree does not change the
00193     move selection in any way. For example: a node with both winning
00194     and losing children will continue to select children without
00195     regard to their proven value. This means losing children will not
00196     be avoided and winning children will not be chosen except as a
00197     result of the playouts recorded as those children are visisted
00198     (the values of which would now be perfectly accurate).
00199
00200     Proven values are not currently passed up the tree like in a
00201     min-max search. It would be possible to pass back winning moves,
00202     that is, mark the parent of winning moves as winning, but it is
00203     not possible mark a node as losing if all its moves are
00204     losing. This is because it is possible progressive widening (or
00205     other heuristics) are being used by the derived state and in such
00206     a case it may not be true that all the current children being
00207     losses implies the parent state is a loss.
00208 */
00209
00210 //----------------------------------------------------------------------------
00211
00212 typedef SgStatistics<float,std::size_t> SgUctStatistics;
00213
00214 typedef SgStatisticsExt<float,std::size_t> SgUctStatisticsExt;
00215
00216 //----------------------------------------------------------------------------
00217
00218 /** Game result, sequence and nodes of one Monte-Carlo game in SgUctSearch.
00219     @ingroup sguctgroup
00220 */
00221 struct SgUctGameInfo
00222 {
00223     /** The game result of the playout(s).
00224         The result is from the view of the player at the root.
00225     */
00226     std::vector<float> m_eval;
00227
00228     /** The sequence of the in-tree phase. */
00229     std::vector<SgMove> m_inTreeSequence;
00230
00231     /** The sequence of the playout(s).
00232         For convenient usage, they also include the moves from
00233         m_inTreeSequence, even if they are the same for each playout.
00234     */
00235     std::vector<std::vector<SgMove> > m_sequence;
00236
00237     /** Was the playout aborted due to maxGameLength (stored for each
00238         playout).
00239     */
00240     std::vector<bool> m_aborted;
00241
00242     /** Nodes visited in the in-tree phase. */
00243     std::vector<const SgUctNode*> m_nodes;
00244
00245     /** Flag to skip RAVE update for moves of the playout(s).
00246         For convenient usage, the index corresponds to the move number from
00247         the root position on, even if the flag is currently only used for
00248         moves in the playout phase, so the flag is false for all moves in the
00249         in-tree phase.
00250     */
00251     std::vector<std::vector<bool> > m_skipRaveUpdate;
00252
00253     void Clear(std::size_t numberPlayouts);
00254 };
00255
00256 //----------------------------------------------------------------------------
00257
00258 /** Move selection strategy after search is finished.
00259     @ingroup sguctgroup
00260 */
00261 enum SgUctMoveSelect
00262 {
00263     /** Select move with highest mean value. */
00264     SG_UCTMOVESELECT_VALUE,
00265
00266     /** Select move with highest count. */
00267     SG_UCTMOVESELECT_COUNT,
00268
00269     /** Use UCT bound (or combined bound if RAVE is enabled). */
00270     SG_UCTMOVESELECT_BOUND,
00271
00272     /** Use the weighted sum of UCT and RAVE value (but without
00273         bias terms).
00274     */
00275     SG_UCTMOVESELECT_ESTIMATE
00276 };
00277
00278 //----------------------------------------------------------------------------
00279
00280 /** Base class for the thread state.
00281     Subclasses must be thread-safe, it must be possible to use different
00282     instances of this class in different threads (after construction, the
00283     constructor does not need to be thread safe). Beware not to use classes
00284     that are not thread-safe, because they use global variables
00285     (e.g. SgRandom::Global())
00286     @note Technically it is possible to use a non-thread safe implementation
00287     of subclasses, as long as the search is run with only one thread.
00288     @ingroup sguctgroup
00289 */
00291 {
00292 public:
00293     /** Number of the thread between 0 and SgUctSearch::NumberThreads() - 1 */
00295
00296     bool m_isSearchInitialized;
00297
00298     /** Flag indicating the a node could not be expanded, because the
00299         maximum tree size was reached.
00300     */
00301     bool m_isTreeOutOfMem;
00302
00303     SgUctGameInfo m_gameInfo;
00304
00305     /** Local variable for SgUctSearch::UpdateRaveValues().
00306         Reused for efficiency. Stores the first time a move was played
00307         by the color to play at the root position (move is used as an index,
00308         do m_moveRange must be > 0); numeric_limits<size_t>::max(), if the
00309         move was not played.
00310     */
00311     boost::scoped_array<std::size_t> m_firstPlay;
00312
00313     /** Local variable for SgUctSearch::UpdateRaveValues().
00314         Like m_firstPlayToPlay, but for opponent color.
00315     */
00316     boost::scoped_array<std::size_t> m_firstPlayOpp;
00317
00318     /** Local variable for SgUctSearch::PlayInTree().
00319         Reused for efficiency.
00320     */
00321     std::vector<SgMoveInfo> m_moves;
00322
00323     /** Local variable for SgUctSearch::CheckCountAbort().
00324         Reused for efficiency.
00325     */
00326     std::vector<SgMove> m_excludeMoves;
00327
00328     /** Thread's counter for Randomized Rave in SgUctSearch::SelectChild().
00329      */
00330     int m_randomizeCounter;
00331
00333
00335
00336     /** @name Pure virtual functions */
00337     // @{
00338
00339     /** Evaluate end-of-game position.
00340         Will only be called if GenerateAllMoves() or GeneratePlayoutMove()
00341         returns no moves. Should return larger values if position is better
00342         for the player to move.
00343     */
00344     virtual float Evaluate() = 0;
00345
00346     /** Execute a move.
00347         @param move The move
00348      */
00349     virtual void Execute(SgMove move) = 0;
00350
00351     /** Execute a move in the playout phase.
00352         For optimization if the subclass uses uses a different game state
00353         representation in the playout phase. Otherwise the function can be
00354         implemented in the subclass by simply calling Execute().
00355         @param move The move
00356      */
00357     virtual void ExecutePlayout(SgMove move) = 0;
00358
00359     /** Generate moves.
00360         Moves will be explored in the order of the returned list.
00361         If return is true, trees under children will be deleted.
00362         @param count Number of times node has been visited. For knowledge-
00363         based computations.
00364         @param[out] moves The generated moves or empty list at end of game
00365     */
00366     virtual bool GenerateAllMoves(std::size_t count,
00367                                   std::vector<SgMoveInfo>& moves,
00368                                   SgProvenNodeType& provenType) = 0;
00369
00370     /** Generate random move.
00371         Generate a random move in the play-out phase (outside the UCT tree).
00372         @param[out] skipRaveUpdate This value should be set to true, if the
00373         move should be excluded from RAVE updates. Otherwise it can be
00374         ignored.
00375         @return The move or SG_NULLMOVE at the end of the game.
00376     */
00377     virtual SgMove GeneratePlayoutMove(bool& skipRaveUpdate) = 0;
00378
00379     /** Start search.
00380         This function should do any necessary preparations for playing games
00381         in the thread, like initializing the thread's copy of the game state
00382         from the global game state. The function does not have to be
00384     */
00385     virtual void StartSearch() = 0;
00386
00387     /** Take back moves played in the in-tree phase. */
00388     virtual void TakeBackInTree(std::size_t nuMoves) = 0;
00389
00390     /** Take back moves played in the playout phase.
00391         The search engine does not assume that the moves are really taken back
00392         after this function is called. If the subclass implements the playout
00393         in s separate state, which is initialized in StartPlayout() and does
00394         not support undo, the implementation of this function can be left
00395         empty in the subclass.
00396     */
00397     virtual void TakeBackPlayout(std::size_t nuMoves) = 0;
00398
00399     // @} // name
00400
00401
00402     /** @name Virtual functions */
00403     // @{
00404
00405     /** Function that will be called by PlayGame() before the game.
00406         Default implementation does nothing.
00407     */
00408     virtual void GameStart();
00409
00410     /** Function that will be called at the beginning of the playout phase.
00411         Will be called only once (not once per playout!). Can be used for
00412         example to save some state of the current position for more efficient
00413         implementation of TakeBackPlayout().
00414         Default implementation does nothing.
00415     */
00416     virtual void StartPlayouts();
00417
00418     /** Function that will be called at the beginning of each playout.
00419         Default implementation does nothing.
00420     */
00421     virtual void StartPlayout();
00422
00423     /** Function that will be called after each playout.
00424         Default implementation does nothing.
00425     */
00426     virtual void EndPlayout();
00427
00428     // @} // name
00429 };
00430
00431 //----------------------------------------------------------------------------
00432
00433 class SgUctSearch;
00434
00435 /** Create game specific thread state.
00437     @ingroup sguctgroup
00438 */
00440 {
00441 public:
00443
00445                                      const SgUctSearch& search) = 0;
00446 };
00447
00448 //----------------------------------------------------------------------------
00449
00450 /** Statistics of the last search performed by SgUctSearch. */
00451 struct SgUctSearchStat
00452 {
00453     double m_time;
00454
00455     /** Number of nodes for which the knowledge threshold was exceeded. */
00456     std::size_t m_knowledge;
00457
00458     /** Games per second.
00459         Useful values only if search time is higher than resolution of
00460         SgTime::Get().
00461     */
00462     double m_gamesPerSecond;
00463
00464     SgUctStatisticsExt m_gameLength;
00465
00466     SgUctStatisticsExt m_movesInTree;
00467
00468     SgUctStatistics m_aborted;
00469
00470     void Clear();
00471
00472     void Write(std::ostream& out) const;
00473 };
00474
00475 //----------------------------------------------------------------------------
00476
00477 /** Optional parameters to SgUctSearch::Search() to allow early aborts.
00478     If early abort is used, the search will be aborted after a fraction of the
00479     resources (max time, max nodes) are spent, if the value is a clear win
00480     (above a threshold).
00481 */
00482 struct SgUctEarlyAbortParam
00483 {
00484     /** The threshold to define what a clear win is. */
00485     float m_threshold;
00486
00487     /** The minimum number of games to allow an early abort.
00488         For a very low number of simulations, the value can be very
00489         unreliable.
00490     */
00491     std::size_t m_minGames;
00492
00493     /** The inverse fraction of the total resources (max time, max nodes),
00494         after which the early abort check is performed.
00495     */
00496     int m_reductionFactor;
00497 };
00498
00499 //----------------------------------------------------------------------------
00500
00501 /** Monte Carlo tree search using UCT.
00502     The evaluation function is assumed to be in <code>[0..1]</code> and
00503     inverted with <code>1 - eval</code>.
00504     @ingroup sguctgroup
00505 */
00506 class SgUctSearch
00507 {
00508 public:
00509     static float InverseEval(float eval);
00510
00511     /** Constructor.
00513         Can be null and set later (before using the search) with
00514         SetThreadStateFactory to allow multi-step construction.
00515         @param moveRange Upper integer limit (exclusive) used for move
00516         representation. Certain enhancements of SgUctSearch (e.g. Rave())
00517         need to store data in arrays using the move as an index for
00518         efficient implementation. If the game does not use a small integer
00519         range for its move representation, this parameter should be 0.
00520         Then, enhancements that require a small move range cannot be enabled.
00521     */
00523                 int moveRange = 0);
00524
00525     virtual ~SgUctSearch();
00526
00528
00529     /** @name Pure virtual functions */
00530     // @{
00531
00532     /** Convert move to string (game dependent).
00533         This function needs to be thread-safe.
00534     */
00535     virtual std::string MoveString(SgMove move) const = 0;
00536
00537     /** Evaluation value to use if evaluation is unknown.
00538         This value will be used, if games are aborted, because they exceed
00539         the maximum game length.
00540         This function needs to be thread-safe.
00541     */
00542     virtual float UnknownEval() const = 0;
00543
00544     // @} // name
00545
00546
00547     /** @name Virtual functions */
00548     // @{
00549
00550     /** Hook function that will be called by Search() after each game.
00551         Default implementation does nothing.
00552         This function does not need to be thread-safe.
00553         @param gameNumber The number of this iteration.
00555         @param info The game info of the thread which played this iteration.
00556         @warning If LockFree() is enabled, this function will be called from
00557         multiple threads without locking. The subclass should handle this
00558         case appropriately by using its own lock or disabling functionality
00559         that will not work without locking.
00560     */
00562                                    const SgUctGameInfo& info);
00563
00564     /** Hook function that will be called by StartSearch().
00565         Default implementation calls m_mpiSynchronizer.StartSearch().
00566         This function does not need to be thread-safe.
00567     */
00568     virtual void OnStartSearch();
00569
00570     /** Hook function that will be called when search
00571     completes.  Default implementation calls
00572     m_mpiSynchronizer.EndSearch().  This function
00573     does not need to be thread-safe.
00574     */
00575     virtual void OnEndSearch();
00576
00578
00580
00581     virtual std::size_t GamesPlayed() const;
00582
00583     // @} // name
00584
00585
00586     /** @name Search functions */
00587     // @{
00588
00589     /** Get a list of all generated moves.
00590         Sets up thread state 0 for a seach and calls GenerateAllMoves
00592     */
00593     void GenerateAllMoves(std::vector<SgMoveInfo>& moves);
00594
00595     /** Play a single game.
00596         Plays a single game using the thread state of the first thread.
00597         Call StartSearch() before calling this function.
00598     */
00599     void PlayGame();
00600
00601     /** Start search.
00602         Initializes search for current position and clears statistics.
00603         @param rootFilter Moves to filter at the root node
00604         @param initTree The tree to initialize the search with. 0 for no
00605         initialization. The trees are actually swapped, not copied.
00606     */
00607     void StartSearch(const std::vector<SgMove>& rootFilter
00608                      = std::vector<SgMove>(),
00609                      SgUctTree* initTree = 0);
00610
00611     void EndSearch();
00612
00613     /** Calls StartSearch() and then maxGames times PlayGame().
00614         @param maxGames The maximum number of games (greater or equal one).
00615         @param maxTime The maximum time in seconds.
00616         @param[out] sequence The move sequence with the best value.
00617         @param rootFilter Moves to filter at the root node
00618         @param initTree The tree to initialize the search with. 0 for no
00619         initialization. The trees are actually swapped, not copied.
00620         @param earlyAbort See SgUctEarlyAbortParam. Null means not to do an
00621         early abort.
00622         @return The value of the root position.
00623     */
00624     float Search(std::size_t maxGames, double maxTime,
00625                  std::vector<SgMove>& sequence,
00626                  const std::vector<SgMove>& rootFilter
00627                  = std::vector<SgMove>(),
00628                  SgUctTree* initTree = 0,
00629                  SgUctEarlyAbortParam* earlyAbort = 0);
00630
00631     /** Do a one-ply Monte-Carlo search instead of the UCT search.
00632         @param maxGames
00633         @param maxTime
00634         @param[out] value The value of the best move
00635     */
00636     SgPoint SearchOnePly(size_t maxGames, double maxTime, float& value);
00637
00638     /** Find child node with best move.
00639         @param node The father node.
00640         @param excludeMoves Optional list of moves to ignore in the children
00641         nodes.
00642         @return The best child or 0 if no child nodes exists.
00643     */
00644     const SgUctNode*
00645     FindBestChild(const SgUctNode& node,
00646                   const std::vector<SgMove>* excludeMoves = 0) const;
00647
00648     /** Extract sequence of best moves from root node.
00649         @param[out] sequence The resulting sequence.
00650     */
00651     void FindBestSequence(std::vector<SgMove>& sequence) const;
00652
00653     /** Return the bound of a move.
00654         This is the bound that was used for move selection. It can be the
00655         pure UCT bound or the combined bound if RAVE is used.
00656         @param useRave Whether rave should be used or not.
00657         @param node The node corresponding to the position
00658         @param child The node corresponding to the move
00659     */
00660     float GetBound(bool useRave, const SgUctNode& node,
00661                    const SgUctNode& child) const;
00662
00663     // @} // name
00664
00665
00666     /** @name Search data */
00667     // @{
00668
00669     /** Info for last game.
00670         Returns the last game info of the first thread.
00671         This function is not thread-safe.
00672     */
00673     const SgUctGameInfo& LastGameInfo() const;
00674
00675     /** One-line summary text for last game.
00676         Contains: move, count and mean for all nodes; result
00677         Returns the last game info of the first thread.
00678         This function is not thread-safe.
00679     */
00680     std::string LastGameSummaryLine() const;
00681
00682     /** See parameter earlyAbort in Search() */
00683     bool WasEarlyAbort() const;
00684
00685     const SgUctTree& Tree() const;
00686
00687     /** Get temporary tree.
00688         Returns a tree that is compatible in size and number of allocators
00689         to the tree of the search. This tree is used by the search itself as
00690         a temporary tree for certain operations during the search, but can be
00691         used by other code while the search is not running.
00692     */
00693     SgUctTree& GetTempTree();
00694
00695     // @} // name
00696
00697
00698     /** @name Parameters */
00699     // @{
00700
00701     /** Constant c in the bias term.
00702         This constant corresponds to 2 C_p in the original UCT paper.
00703         The default value is 0.7, which works well in 9x9 Go.
00704     */
00705     float BiasTermConstant() const;
00706
00707     /** See BiasTermConstant() */
00708     void SetBiasTermConstant(float biasTermConstant);
00709
00710     /** Points at which to recompute children.
00711         Specifies the number of visits at which GenerateAllMoves() is
00712         called again at the node. This is to allow multiple knowledge
00713         computations to occur as a node becomes more
00714         important. Returned move info will be merged with info in the
00715         tree. This is can be used prune, add children, give a bonus to
00716         a move, etc.
00717     */
00718     std::vector<std::size_t> KnowledgeThreshold() const;
00719
00720     /** See KnowledgeThreshold() */
00721     void SetKnowledgeThreshold(const std::vector<std::size_t>& counts);
00722
00723     /** Maximum number of nodes in the tree.
00724         @note The search owns two trees, one of which is used as a temporary
00725         tree for some operations (see GetTempTree()). This functions sets
00726         the maximum number of nodes per tree.
00727     */
00728     std::size_t MaxNodes() const;
00729
00730     /** See MaxNodes()
00731         @param maxNodes Maximum number of nodes (>= 1)
00732     */
00733     void SetMaxNodes(std::size_t maxNodes);
00734
00735     /** The number of threads to use during the search. */
00737
00740
00741     /** Lock-free usage of multi-threaded search.
00742         @ref sguctsearchlockfree
00743     */
00744     bool LockFree() const;
00745
00746     /** See LockFree() */
00747     void SetLockFree(bool enable);
00748
00749     /** See SetRandomizeRaveFrequency() */
00750     int RandomizeRaveFrequency() const;
00751
00752     /** Randomly turns off rave with given frequency -
00753         once in every frequency child selections.
00754         Helps in rare cases where rave ignores the best move.
00755         Set frequency to 0 to switch it off.
00756     */
00757     void SetRandomizeRaveFrequency(int frequency);
00758
00759     /** Don't update RAVE value if opponent played the same move first.
00760         Default is false (since it depends on the game and move
00761         representation, if it should be used).
00762     */
00763     bool RaveCheckSame() const;
00764
00765     /** See RaveCheckSame() */
00766     void SetRaveCheckSame(bool enable);
00767
00768     /** First play urgency.
00769         The value for unexplored moves. According to UCT they should
00770         always be preferred to moves, that have been played at least once.
00771         According to the MoGo tech report, it can be useful to use smaller
00772         values (as low as 1) to encorouge early exploitation.
00773         Default value is 10000.
00774         @see @ref sguctsearchweights
00775     */
00776     float FirstPlayUrgency() const;
00777
00778     /** See FirstPlayUrgency() */
00779     void SetFirstPlayUrgency(float firstPlayUrgency);
00780
00781     /** Log one-line summary for each game during Search() to a file.
00782         @todo File name still is hard-coded to "uctsearch.log"
00783     */
00784     bool LogGames() const;
00785
00786     /** See LogGames() */
00787     void SetLogGames(bool enable);
00788
00789     /** Maximum game length.
00790         If the number of moves in a game exceed this length, it will be
00791         counted as a loss.
00792         The default is @c numeric_limits<size_t>::max()
00793     */
00794     std::size_t MaxGameLength() const;
00795
00796     /** See MaxGameLength() */
00797     void SetMaxGameLength(std::size_t maxGameLength);
00798
00799     /** Required number of simulations to expand a node in the tree.
00800         The default is 2, which means a node will be expanded on the second
00801         visit.
00802     */
00803     std::size_t ExpandThreshold() const;
00804
00805     /** See ExpandThreshold() */
00806     void SetExpandThreshold(std::size_t expandThreshold);
00807
00808     /** The number of playouts per simulated game.
00810         Default is 1.
00811     */
00812     std::size_t NumberPlayouts() const;
00813
00814     void SetNumberPlayouts(std::size_t n);
00815
00816     /** Use the RAVE algorithm (Rapid Action Value Estimation).
00817         See Gelly, Silver 2007 in the references in the class description.
00818         In difference to the original description of the RAVE algorithm,
00819         no "RAVE bias term" is used. The estimated value of a move is the
00820         weighted mean of the move value and the RAVE value and then a
00821         single UCT-like bias term is added.
00822         @see RaveWeightFunc
00823     */
00824     bool Rave() const;
00825
00826     /** See Rave() */
00827     void SetRave(bool enable);
00828
00829     /** See SgUctMoveSelect */
00830     SgUctMoveSelect MoveSelect() const;
00831
00832     /** See SgUctMoveSelect */
00833     void SetMoveSelect(SgUctMoveSelect moveSelect);
00834
00835     /** See @ref sguctsearchweights. */
00836     float RaveWeightInitial() const;
00837
00838     /** See @ref sguctsearchweights. */
00839     void SetRaveWeightInitial(float value);
00840
00841     /** See @ref sguctsearchweights. */
00842     float RaveWeightFinal() const;
00843
00844     /** See @ref sguctsearchweights. */
00845     void SetRaveWeightFinal(float value);
00846
00848         Gives more weight to moves that are closer to the position for which
00849         the RAVE statistics are stored. The weighting function is linearly
00850         decreasing from 2 to 0 with the move number from the position for
00851         which the RAVE statistics are stored to the end of the simulated game.
00852     */
00854
00857
00858     /** Whether search uses virtual loss. */
00859     bool VirtualLoss() const;
00860
00861     /** See VirtualLoss() */
00862     void SetVirtualLoss(bool enable);
00863
00864     /** Prune nodes with low counts if tree is full.
00865         This will prune nodes below a minimum count, if the tree gets full
00866         during a search. The minimum count is PruneMinCount() at the beginning
00867         of the search and is doubled every time a pruning operation does not
00868         reduce the tree by at least a factor of 2. The creation of the pruned
00869         tree uses the temporary tree (see GetTempTree()).
00870     */
00871     bool PruneFullTree() const;
00872
00873     /** See PruneFullTree() */
00874     void SetPruneFullTree(bool enable);
00875
00876     /** See PruneFullTree() */
00877     std::size_t PruneMinCount() const;
00878
00879     /** See PruneFullTree() */
00880     void SetPruneMinCount(std::size_t n);
00881
00882
00883     void SetMpiSynchronizer(const SgMpiSynchronizerHandle &synchronizerHandle);
00884
00885     SgMpiSynchronizerHandle MpiSynchronizer();
00886
00887     const SgMpiSynchronizerHandle MpiSynchronizer() const;
00888
00889     // @} // name
00890
00891
00892     /** @name Statistics */
00893     // @{
00894
00895     const SgUctSearchStat& Statistics() const;
00896
00897     void WriteStatistics(std::ostream& out) const;
00898
00899     // @} // name
00900
00901     /** Get state of one of the threads.
00903     */
00905
00907         The threads are created at the beginning of the first search
00908         (to allow multi-step construction with setting the policy after
00909         the constructor call).
00910     */
00912
00914         The threads are created at the beginning of the first search
00915         (to allow multi-step construction with setting the policy after
00916         the constructor call). This function needs to be called explicitely
00917         only, if a thread state is going to be used before the first search.
00918     */
00920
00921 private:
00922     typedef boost::recursive_mutex::scoped_lock GlobalLock;
00923
00925
00927     {
00928     public:
00930
00932
00934
00935         void StartPlay();
00936
00937         void WaitPlayFinished();
00938
00939     private:
00940         /** Copyable function object that invokes Thread::operator().
00941             Needed because the the constructor of boost::thread copies the
00942             function object argument.
00943         */
00944         class Function
00945         {
00946         public:
00948
00949             void operator()();
00950
00951         private:
00953         };
00954
00956
00957         SgUctSearch& m_search;
00958
00959         bool m_quit;
00960
00962
00963         boost::mutex m_startPlayMutex;
00964
00965         boost::mutex m_playFinishedMutex;
00966
00967         boost::condition m_startPlay;
00968
00969         boost::condition m_playFinished;
00970
00971         boost::mutex::scoped_lock m_playFinishedLock;
00972
00973         GlobalLock m_globalLock;
00974
00976             Order dependency: must be constructed as the last member, because
00977             the constructor starts the thread.
00978         */
00980
00981         void operator()();
00982
00983         void PlayGames();
00984     };
00985
00987
00988     /** See LogGames() */
00989     bool m_logGames;
00990
00991     /** See Rave() */
00992     bool m_rave;
00993
00994     /** See KnowledgeThreshold() */
00995     std::vector<std::size_t> m_knowledgeThreshold;
00996
00997     /** Flag indicating that the search was terminated because the maximum
00998         time or number of games was reached.
00999     */
01000     volatile bool m_aborted;
01001
01002     volatile bool m_isTreeOutOfMemory;
01003
01004     std::auto_ptr<boost::barrier> m_searchLoopFinished;
01005
01006     /** See SgUctEarlyAbortParam. */
01007     bool m_wasEarlyAbort;
01008
01009     /** See SgUctEarlyAbortParam.
01010         The auto pointer is empty, if no early abort is used.
01011     */
01012     std::auto_ptr<SgUctEarlyAbortParam> m_earlyAbort;
01013
01014     /** See SgUctMoveSelect */
01015     SgUctMoveSelect m_moveSelect;
01016
01017     /** See RaveCheckSame() */
01018     bool m_raveCheckSame;
01019
01020     /** See SetRandomizeRaveFrequency() */
01021     int m_randomizeRaveFrequency;
01022
01023     /** See LockFree() */
01024     bool m_lockFree;
01025
01028
01029     /** See PruneFullTree() */
01030     bool m_pruneFullTree;
01031
01034
01035     /** See NumberPlayouts() */
01036     std::size_t m_numberPlayouts;
01037
01038     /** See MaxNodes() */
01039     std::size_t m_maxNodes;
01040
01041     /** See PruneMinCount() */
01042     std::size_t m_pruneMinCount;
01043
01044     /** See parameter moveRange in constructor */
01045     const int m_moveRange;
01046
01047     /** See MaxGameLength() */
01048     std::size_t m_maxGameLength;
01049
01050     /** See ExpandThreshold() */
01051     std::size_t m_expandThreshold;
01052
01053     /** Number of games limit for the current search. */
01054     std::size_t m_maxGames;
01055
01056     /** Number of games played in the current search. */
01057     std::size_t m_numberGames;
01058
01059     std::size_t m_startRootMoveCount;
01060
01061     /** Interval in number of games in which to check time abort.
01062         Avoids that the potentially expensive SgTime::Get() is called after
01063         every game. The interval is updated dynamically according to the
01064         current games/sec, such that it is called ten times per second
01065         (if the total search time is at least one second, otherwise ten times
01066         per total maximum search time)
01067     */
01068     std::size_t m_checkTimeInterval;
01069
01070     double m_lastScoreDisplayTime;
01071
01072     /** See BiasTermConstant() */
01073     float m_biasTermConstant;
01074
01075     /** See FirstPlayUrgency() */
01076     float m_firstPlayUrgency;
01077
01078     /** See @ref sguctsearchweights. */
01079     float m_raveWeightInitial;
01080
01081     /** See @ref sguctsearchweights. */
01082     float m_raveWeightFinal;
01083
01084     /** 1 / m_raveWeightInitial precomputed for efficiency */
01085     double m_raveWeightParam1;
01086
01087     /** m_raveWeightInitial / m_raveWeightFinal precomputed for efficiency */
01088     double m_raveWeightParam2;
01089
01090     /** Time limit for current search. */
01091     double m_maxTime;
01092
01093     /** See VirtualLoss() */
01094     bool m_virtualLoss;
01095
01096     std::string m_logFileName;
01097
01098     SgTimer m_timer;
01099
01100     SgUctTree m_tree;
01101
01102     /** See GetTempTree() */
01103     SgUctTree m_tempTree;
01104
01105     /** See parameter rootFilter in function Search() */
01106     std::vector<SgMove> m_rootFilter;
01107
01108     std::ofstream m_log;
01109
01110     /** Mutex for protecting global variables during multi-threading.
01111         Currently, only the play-out phase of games is thread safe, therefore
01112         this lock is always locked elsewhere (in-tree phase, updating of
01113         values and statistics, etc.)
01114     */
01115     boost::recursive_mutex m_globalMutex;
01116
01117     SgUctSearchStat m_statistics;
01118
01120         The elements are owned by the vector (shared_ptr is only used because
01121         auto_ptr should not be used with standard containers)
01122     */
01124
01125 #if SG_UCTFASTLOG
01126     SgFastLog m_fastLog;
01127 #endif
01128
01129     boost::shared_ptr<SgMpiSynchronizer> m_mpiSynchronizer;
01130
01131
01132     void ApplyRootFilter(std::vector<SgMoveInfo>& moves);
01133
01135
01136     bool CheckEarlyAbort() const;
01137
01139                          std::size_t remainingGames) const;
01140
01141     void Debug(const SgUctThreadState& state, const std::string& textLine);
01142
01144
01145     void ExpandNode(SgUctThreadState& state, const SgUctNode& node);
01146
01147     void CreateChildren(SgUctThreadState& state, const SgUctNode& node,
01148                         bool deleteChildTrees);
01149
01150     float GetBound(bool useRave, float logPosCount,
01151                    const SgUctNode& child) const;
01152
01153     float GetValueEstimate(bool useRave, const SgUctNode& child) const;
01154
01155     float GetValueEstimateRave(const SgUctNode& child) const;
01156
01157     float Log(float x) const;
01158
01159     bool NeedToComputeKnowledge(const SgUctNode* current);
01160
01161     void PlayGame(SgUctThreadState& state, GlobalLock* lock);
01162
01163     bool PlayInTree(SgUctThreadState& state, bool& isTerminal);
01164
01165     bool PlayoutGame(SgUctThreadState& state, std::size_t playout);
01166
01167     void PrintSearchProgress(double currTime) const;
01168
01169     void SearchLoop(SgUctThreadState& state, GlobalLock* lock);
01170
01171     const SgUctNode& SelectChild(int& randomizeCounter, const SgUctNode& node);
01172
01173     std::string SummaryLine(const SgUctGameInfo& info) const;
01174
01175     void UpdateCheckTimeInterval(double time);
01176
01177     void UpdateDynRaveBias();
01178
01180
01181     void UpdateRaveValues(SgUctThreadState& state, std::size_t playout);
01182
01183     void UpdateRaveValues(SgUctThreadState& state, std::size_t playout,
01184                           float eval, std::size_t i,
01185                           const std::size_t firstPlay[],
01186                           const std::size_t firstPlayOpp[]);
01187
01189
01190     void UpdateTree(const SgUctGameInfo& info);
01191 };
01192
01193 inline float SgUctSearch::BiasTermConstant() const
01194 {
01195     return m_biasTermConstant;
01196 }
01197
01198 inline std::size_t SgUctSearch::ExpandThreshold() const
01199 {
01200     return m_expandThreshold;
01201 }
01202
01203 inline float SgUctSearch::FirstPlayUrgency() const
01204 {
01205     return m_firstPlayUrgency;
01206 }
01207
01208 inline float SgUctSearch::InverseEval(float eval)
01209 {
01210     return (1 - eval);
01211 }
01212
01213 inline bool SgUctSearch::LockFree() const
01214 {
01215     return m_lockFree;
01216 }
01217
01218 inline const SgUctGameInfo& SgUctSearch::LastGameInfo() const
01219 {
01221 }
01222
01223 inline bool SgUctSearch::LogGames() const
01224 {
01225     return m_logGames;
01226 }
01227
01228 inline std::size_t SgUctSearch::MaxGameLength() const
01229 {
01230     return m_maxGameLength;
01231 }
01232
01233 inline std::size_t SgUctSearch::MaxNodes() const
01234 {
01235     return m_maxNodes;
01236 }
01237
01238 inline SgUctMoveSelect SgUctSearch::MoveSelect() const
01239 {
01240     return m_moveSelect;
01241 }
01242
01244 {
01246 }
01247
01248 inline std::size_t SgUctSearch::NumberPlayouts() const
01249 {
01250     return m_numberPlayouts;
01251 }
01252
01253 inline void SgUctSearch::PlayGame()
01254 {
01256 }
01257
01258 inline bool SgUctSearch::PruneFullTree() const
01259 {
01260     return m_pruneFullTree;
01261 }
01262
01263 inline std::size_t SgUctSearch::PruneMinCount() const
01264 {
01265     return m_pruneMinCount;
01266 }
01267
01268 inline bool SgUctSearch::Rave() const
01269 {
01270     return m_rave;
01271 }
01272
01273 inline bool SgUctSearch::RaveCheckSame() const
01274 {
01275     return m_raveCheckSame;
01276 }
01277
01278 inline float SgUctSearch::RaveWeightInitial() const
01279 {
01280     return m_raveWeightInitial;
01281 }
01282
01283 inline float SgUctSearch::RaveWeightFinal() const
01284 {
01285     return m_raveWeightFinal;
01286 }
01287
01288 inline void SgUctSearch::SetBiasTermConstant(float biasTermConstant)
01289 {
01290     m_biasTermConstant = biasTermConstant;
01291 }
01292
01293 inline void SgUctSearch::SetExpandThreshold(std::size_t expandThreshold)
01294 {
01295     SG_ASSERT(expandThreshold >= 1);
01296     m_expandThreshold = expandThreshold;
01297 }
01298
01299 inline void SgUctSearch::SetFirstPlayUrgency(float firstPlayUrgency)
01300 {
01301     m_firstPlayUrgency = firstPlayUrgency;
01302 }
01303
01304 inline void SgUctSearch::SetLockFree(bool enable)
01305 {
01306     m_lockFree = enable;
01307 }
01308
01309 inline void SgUctSearch::SetLogGames(bool enable)
01310 {
01311     m_logGames = enable;
01312 }
01313
01314 inline void SgUctSearch::SetMaxGameLength(std::size_t maxGameLength)
01315 {
01316     m_maxGameLength = maxGameLength;
01317 }
01318
01319 inline void SgUctSearch::SetMaxNodes(std::size_t maxNodes)
01320 {
01321     m_maxNodes = maxNodes;
01323         m_tree.SetMaxNodes(m_maxNodes);
01324 }
01325
01326 inline void SgUctSearch::SetMoveSelect(SgUctMoveSelect moveSelect)
01327 {
01328     m_moveSelect = moveSelect;
01329 }
01330
01331 inline std::vector<std::size_t> SgUctSearch::KnowledgeThreshold() const
01332 {
01333     return m_knowledgeThreshold;
01334 }
01335
01336 inline void
01337 SgUctSearch::SetKnowledgeThreshold(const std::vector<std::size_t>& t)
01338 {
01339     m_knowledgeThreshold = t;
01340 }
01341
01342 inline void SgUctSearch::SetNumberPlayouts(std::size_t n)
01343 {
01344     SG_ASSERT(n >= 1);
01345     m_numberPlayouts = n;
01346 }
01347
01348 inline void SgUctSearch::SetPruneFullTree(bool enable)
01349 {
01350     m_pruneFullTree = enable;
01351 }
01352
01353 inline void SgUctSearch::SetPruneMinCount(std::size_t n)
01354 {
01355     m_pruneMinCount = n;
01356 }
01357
01358 inline void SgUctSearch::SetMpiSynchronizer(const SgMpiSynchronizerHandle
01359                                             &synchronizerHandle)
01360 {
01361     m_mpiSynchronizer = SgMpiSynchronizerHandle(synchronizerHandle);
01362 }
01363
01364 inline SgMpiSynchronizerHandle SgUctSearch::MpiSynchronizer()
01365 {
01366     return SgMpiSynchronizerHandle(m_mpiSynchronizer);
01367 }
01368
01369 inline const SgMpiSynchronizerHandle SgUctSearch::MpiSynchronizer() const
01370 {
01371     return SgMpiSynchronizerHandle(m_mpiSynchronizer);
01372 }
01373
01374 inline int SgUctSearch::RandomizeRaveFrequency() const
01375 {
01376     return m_randomizeRaveFrequency;
01377 }
01378
01379 inline void SgUctSearch::SetRandomizeRaveFrequency(int frequency)
01380 {
01381     m_randomizeRaveFrequency = frequency;
01382 }
01383
01384 inline void SgUctSearch::SetRaveCheckSame(bool enable)
01385 {
01386     m_raveCheckSame = enable;
01387 }
01388
01389 inline void SgUctSearch::SetRaveWeightFinal(float value)
01390 {
01391     m_raveWeightFinal = value;
01392 }
01393
01394 inline void SgUctSearch::SetRaveWeightInitial(float value)
01395 {
01396     m_raveWeightInitial = value;
01397 }
01398
01400 {
01402 }
01403
01404 inline bool SgUctSearch::VirtualLoss() const
01405 {
01406     return m_virtualLoss;
01407 }
01408
01409 inline void SgUctSearch::SetVirtualLoss(bool enable)
01410 {
01411     m_virtualLoss = enable;
01412 }
01413
01414 inline const SgUctSearchStat& SgUctSearch::Statistics() const
01415 {
01416     return m_statistics;
01417 }
01418
01420 {
01422 }
01423
01425 {
01428 }
01429
01430 inline const SgUctTree& SgUctSearch::Tree() const
01431 {
01432     return m_tree;
01433 }
01434
01435 inline bool SgUctSearch::WasEarlyAbort() const
01436 {
01437     return m_wasEarlyAbort;
01438 }
01439