Note that this formula is nearly equivalent to the formula suggested by David Silver on the Computer Go mailing list for the case of two estimators (move value and RAVE value), which is used in newer versions of MoGo. However, MoGo uses the measured variance of the current RAVE value (for both the move weight and the RAVE weight) instead of variance parameter constants.
The formula is then reformulated to use different constants that describe the initial steepness and final asymptotic value of the unnormalized weight:
with:
For the move value, the bias is zero, and the variance can become part of the normalization constant, so the weight is simply the sample count of the move value. If no estimator has a sample count yet, the first-play-urgency parameter is used for the value estimate.
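The combination described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the library's implementation: the function and parameter names are hypothetical, the move weight is taken to be the move's sample count, and the RAVE weight uses one possible saturating function whose slope at zero samples is `initial` and whose limit for many samples is `final`. The exact weight formula of the original text is not reproduced here.

```python
def rave_weight(count: float, initial: float, final: float) -> float:
    """Hypothetical unnormalized RAVE weight (an assumption, not the source formula).

    It grows with slope `initial` for small sample counts and approaches
    `final` as the sample count goes to infinity.
    """
    if count <= 0:
        return 0.0
    return count / (1.0 / initial + count / final)


def value_estimate(
    move_mean: float,
    move_count: float,
    rave_mean: float,
    rave_count: float,
    initial: float,
    final: float,
    first_play_urgency: float,
) -> float:
    """Weighted mean of the move value and the RAVE value.

    Falls back to the first-play-urgency parameter when neither
    estimator has any samples yet.
    """
    # Assumed here: the unnormalized move weight is just the sample count.
    move_w = float(move_count) if move_count > 0 else 0.0
    rave_w = rave_weight(rave_count, initial, final)
    total = move_w + rave_w
    if total == 0.0:
        return first_play_urgency
    # Dividing by the total weight performs the normalization.
    return (move_w * move_mean + rave_w * rave_mean) / total
```

With few move samples the RAVE value dominates the estimate. As the move count grows, the RAVE weight saturates near `final` while the move weight keeps increasing, so the estimate converges to the move value.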