There are several ways to evaluate Loki's performance. Since self-play is the most controlled, its results are the easiest to measure, but we must be careful when interpreting them (for example, accounting for the interdependencies between players). Self-play alone, however, is not sufficient to gauge Loki's absolute strength. To help determine this, we use evidence gathered from play on IRC, which suggests that Loki is stronger than the average human amateur (at least in multi-player games). In particular, all of the evidence indicates that the addition of opponent modeling results in a significant increase in playing strength.
The performance gain from generic modeling is conclusive; the further superiority of specific modeling is less clear. This could be due to the granularity of the data gathering, to how the action frequencies are used in the re-weighting, or to the fact that this information is not used in the betting strategy. Indeed, we strongly believe that the present ad hoc betting strategy, originally designed to let us quickly test the other components, may be a bottleneck preventing further significant progress. This is suggested in BotWarz by the relatively good performance of less sophisticated programs that put more effort into designing a (rule-based) betting strategy. It is also evidenced by the ease with which strong human players can exploit Loki in short-handed play (too many head-to-head situations against better players).
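To make the re-weighting idea concrete, the following is a minimal sketch of how observed action frequencies might adjust a weighted distribution over an opponent's possible hands. It is not Loki's actual implementation; the function name, hand classes, and frequency values are all hypothetical, chosen only to illustrate the technique.

```python
# Illustrative sketch (not Loki's code): re-weight a distribution over an
# opponent's candidate hands after observing one of their actions, using
# frequencies recorded for that opponent. All names and numbers are invented.

def reweight(weights, action_prob):
    """Scale each candidate hand's weight by the estimated probability that
    this opponent takes the observed action with that hand, then renormalize."""
    updated = {hand: w * action_prob(hand) for hand, w in weights.items()}
    total = sum(updated.values())
    return {hand: w / total for hand, w in updated.items()}

# Prior weights over three abstract hand-strength classes (hypothetical).
weights = {"strong": 0.2, "medium": 0.5, "weak": 0.3}

# Hypothetical observed frequencies: this opponent raises 80% of the time
# with strong hands, 30% with medium, and 5% with weak.
raise_freq = {"strong": 0.8, "medium": 0.3, "weak": 0.05}

# After observing a raise, strong hands gain relative weight.
posterior = reweight(weights, raise_freq.get)
```

The granularity question raised above corresponds to how finely the hand classes and action contexts are partitioned: coarser partitions give more data per bucket but blur distinctions between opponents.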