Publications
Koop, A. (2007). Investigating Experience: Temporal Coherence and Empirical Knowledge Representation. University of Alberta Master's thesis.
ABSTRACT:
This thesis investigates the idea of artificial intelligence as an agent
making sense of its experience, illustrating some of the benefits of
representing knowledge as predictions of future experience. Experience
is here defined as the temporal sequence of sensations and actions that are
the inputs and outputs of the agent. One characteristic of this sequence is
that it can have temporal coherence: what is experienced in a short
period of time is likely to be consistent. The first part of this thesis
examines how an agent with dynamic memory can take advantage of the temporal
coherence of its experience. Results in a simple prediction task and the more
complex problem of Computer Go show how such an agent can dramatically improve
on the performance of the best stationary solutions. The prediction task is
then used to illustrate how temporal coherence can provide a natural testbed
for meta-learning.
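The sketch below shows why temporal coherence can favor tracking. Its assumptions do not come from the thesis. The sensations are a hypothetical stream of bits produced by a symmetric two-state Markov chain that repeats its last bit with probability 0.99, so the stream is stationary with a long-run mean of 0.5 but is locally consistent. A predictor with no features has 0.5 as its best stationary solution. The 1/t step size below converges toward that value. A constant step size (0.5, an illustrative choice) instead keeps following the current run of bits.

```python
import random


def coherent_stream(n_steps: int, persistence: float, seed: int = 0) -> list[float]:
    """Hypothetical binary sensation stream: each bit repeats the previous one
    with probability `persistence`, so the stream is temporally coherent while
    its long-run mean stays at 0.5 (a stationary process)."""
    rng = random.Random(seed)
    bit = float(rng.random() < 0.5)
    stream = []
    for _ in range(n_steps):
        stream.append(bit)
        if rng.random() > persistence:   # switch with probability 1 - persistence
            bit = 1.0 - bit
    return stream


def mean_squared_error(stream: list[float], step_size) -> float:
    """Predict each bit before seeing it, then apply the LMS update
    w += step_size(t) * (x - w). Returns the average squared prediction error."""
    w = 0.5
    total = 0.0
    for t, x in enumerate(stream, start=1):
        error = x - w
        total += error * error
        w += step_size(t) * error
    return total / len(stream)


stream = coherent_stream(n_steps=10_000, persistence=0.99)
converging = mean_squared_error(stream, lambda t: 1.0 / t)  # sample mean, converges toward 0.5
tracking = mean_squared_error(stream, lambda t: 0.5)        # constant step size keeps tracking
print(f"converging MSE: {converging:.3f}, tracking MSE: {tracking:.3f}")
```

On a stream like this, the tracking predictor should have a much lower error than the converging one. It pays a large error only for the few steps after each switch, while the converging predictor settles near 0.5 and is wrong by about half on every step.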
In the second part of the thesis, the frameworks of predictive representations and options are adapted for use in knowledge representation. The traditional approach to knowledge representation for artificial intelligence uses the framework of formal logic, in which knowledge is dissociated from experience. The knowledge representation presented here is defined in terms of experience, predictions and time. This kind of representation is defined in this thesis as an empirical knowledge representation. Using objects as a case study, the final chapter shows how an empirical knowledge representation makes it possible to represent even abstract concepts in terms of experience.
Sutton, R. S., Koop, A., Silver, D. (2007). On the Role of Tracking in Stationary Environments. In Proceedings of the 2007 International Conference on Machine Learning.
ABSTRACT: It is often thought that learning algorithms that track the
best solution, as opposed to converging to it, are important only on
nonstationary problems. We present three results suggesting that this
is not so. First, we illustrate in a simple concrete example, the Black
and White problem, that tracking can perform better than any converging
algorithm on a stationary problem. Second, we show the same point on a
larger, more realistic problem, an application of temporal-difference
learning to computer Go. Our third result suggests that tracking in
stationary problems could be important for meta-learning research
(e.g., learning to learn, feature selection, transfer). We apply a
meta-learning algorithm for step-size adaptation, IDBD, to the Black
and White problem, showing that meta-learning has a dramatic long-term
effect on performance whereas, on an analogous converging problem,
meta-learning has only a small second-order effect. This small result
suggests a way of eventually overcoming a major obstacle to
meta-learning research: the lack of an independent methodology for task
selection.
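IDBD (Incremental Delta-Bar-Delta; Sutton, 1992) gives each input of a linear learner its own step size, α_i = exp(β_i), and adapts β_i by a meta-gradient step. The sketch below applies it to a hypothetical drifting regression task, not to the Black and White problem. The task has five relevant ±1 inputs, each of whose target weights occasionally flips sign, plus five irrelevant inputs. The task, the meta step size θ = 0.005, and the initial step size 0.05 are all illustrative assumptions.

```python
import math
import random


def idbd_run(n_steps: int = 20_000, n_relevant: int = 5, n_irrelevant: int = 5,
             theta: float = 0.005, init_alpha: float = 0.05, seed: int = 0) -> list[float]:
    """Run IDBD on a hypothetical drifting linear-regression task.

    The target is sum_i s_i * x_i over the relevant inputs. Each s_i is +1 or -1,
    and one randomly chosen s_i flips sign every 20 steps. The irrelevant inputs
    never affect the target. Returns the final step sizes alpha_i = exp(beta_i).
    """
    rng = random.Random(seed)
    n = n_relevant + n_irrelevant
    signs = [1.0] * n_relevant
    w = [0.0] * n                         # linear weights
    beta = [math.log(init_alpha)] * n     # log step sizes
    h = [0.0] * n                         # decaying traces of recent weight changes

    for t in range(n_steps):
        if t % 20 == 0:
            k = rng.randrange(n_relevant)
            signs[k] = -signs[k]          # the target drifts, so the learner must track
        x = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        target = sum(s * xi for s, xi in zip(signs, x))
        delta = target - sum(wi * xi for wi, xi in zip(w, x))
        for i in range(n):
            beta[i] += theta * delta * x[i] * h[i]   # meta-step on the log step size
            alpha = math.exp(beta[i])
            w[i] += alpha * delta * x[i]             # LMS step with a per-input rate
            h[i] = h[i] * max(0.0, 1.0 - alpha * x[i] * x[i]) + alpha * delta * x[i]
    return [math.exp(b) for b in beta]


alphas = idbd_run()
print("relevant:  ", [round(a, 3) for a in alphas[:5]])
print("irrelevant:", [round(a, 3) for a in alphas[5:]])
```

If meta-learning is working, the relevant inputs should end up with larger step sizes than the irrelevant ones. Here that separation is driven by the target's drift, which loosely mirrors the paper's point that meta-learning has a much larger effect when the learner is tracking.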
Tanner, B., Bulitko, V., Koop, A., Paduraru, C. (2007). Grounding Abstractions in Predictive State Representations. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 1077-1082.
ABSTRACT: This paper proposes a systematic approach to representing
abstract features in terms of low-level, subjective state
representations. We demonstrate that a mapping between the agent’s
predictive state representation and abstract features can be derived
automatically from high-level training data supplied by the designer.
Our empirical evaluation demonstrates that an experience-oriented state
representation built around a single-bit sensor can represent useful
abstract features such as “back against a wall”, “in a corner”, or “in
a room”. As a result, the agent gains virtual sensors that could be
used by its control policy.
Precup, D., Sutton, R. S., Paduraru, C., Koop, A., Singh, S. (2006). Off-policy Learning with Recognizers (online proceedings version, Nov 11 2005). Advances in Neural Information Processing Systems 18 (NIPS*05).
ABSTRACT: We introduce a new algorithm for
off-policy temporal-difference learning with function approximation
that has much lower variance and requires less knowledge of the
behavior policy than prior methods. We develop the notion of a
recognizer, a filter on actions that distorts the behavior policy to
produce a related target policy with low-variance importance-sampling
corrections. We also consider target policies that are deviations from
the state distribution of the behavior policy, such as potential
temporally abstract options, which further reduces variance. This paper
introduces recognizers and their potential advantages, then develops a
full algorithm for MDPs and proves that its updates are in the same
direction as on-policy TD updates, which implies asymptotic
convergence. Our algorithm achieves this without knowledge of the
behavior policy or even requiring that there exists a behavior policy.
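The following is a minimal single-state sketch of the recognizer idea. It reflects one reading of the formulation and uses hypothetical numbers. A recognizer c(a) in [0, 1] filters the behavior policy b into the target policy π(a) = b(a)c(a)/μ, where μ = Σ_a b(a)c(a) is the probability that a behavior action is recognized. The importance-sampling correction is then ρ = π(a)/b(a) = c(a)/μ. This ratio never exceeds 1/μ and does not need b(a) itself. Because μ is the expected value of c under b, it can be estimated by averaging c over observed actions. Function approximation and the TD update itself are omitted.

```python
import random


def recognition_probability(behavior: list[float], recognizer: list[float]) -> float:
    """mu = sum_a b(a) * c(a): probability that an action drawn from b is recognized."""
    return sum(b * c for b, c in zip(behavior, recognizer))


def target_policy(behavior: list[float], recognizer: list[float]) -> list[float]:
    """pi(a) = b(a) * c(a) / mu: the behavior policy filtered by the recognizer."""
    mu = recognition_probability(behavior, recognizer)
    return [b * c / mu for b, c in zip(behavior, recognizer)]


def importance_ratio(action: int, recognizer: list[float], mu: float) -> float:
    """rho = pi(a) / b(a) = c(a) / mu; b(a) cancels, so rho <= 1 / mu."""
    return recognizer[action] / mu


# Hypothetical single-state example with four actions.
behavior = [0.1, 0.2, 0.3, 0.4]     # used only to generate data and to check results
recognizer = [1.0, 0.0, 1.0, 0.0]   # recognizes actions 0 and 2

mu = recognition_probability(behavior, recognizer)
pi = target_policy(behavior, recognizer)
print("mu:", mu, "pi:", pi, "rho(action 0):", importance_ratio(0, recognizer, mu))

# Estimate mu from observed actions alone, using E_b[c(A)] = mu.
rng = random.Random(0)
actions = rng.choices(range(4), weights=behavior, k=10_000)
mu_hat = sum(recognizer[a] for a in actions) / len(actions)
print("estimated mu:", mu_hat)
```

Compare this with ordinary importance sampling toward a deterministic policy that always picks action 0. That correction would be 1/b(0) = 10, while the recognizer's ratio here is at most 1/μ = 2.5.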
Sutton, R. S., Rafols, E. J., Koop, A. (2006). Temporal abstraction in temporal-difference networks (online proceedings version, Nov 11 2005). Advances in Neural Information Processing Systems 18 (NIPS*05).
ABSTRACT: Temporal-difference (TD) networks have
been proposed as a way of representing and learning a wide variety of
predictions about the interaction between an agent and its environment
(Sutton & Tanner, 2005). These predictions are compositional in that their targets
are defined in terms of other predictions, and subjunctive
in that they are about what would happen if an action or sequence of
actions were taken. In conventional TD networks, the
inter-related predictions are at successive time steps and contingent
on a single action; here we generalize them to accommodate extended
time intervals and contingency on whole ways of behaving. Our
generalization is based on the options framework for temporal
abstraction (Sutton, Precup & Singh, 1999). The primary
contribution of this paper is to introduce a new algorithm for
intra-option learning in TD networks with function approximation and
eligibility traces. We present empirical examples of our
algorithm's effectiveness and of the greater representational
expressiveness of temporally-abstract TD networks.
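The compositional part of TD networks can be shown with a deliberately stripped-down sketch. It uses a table of predictions instead of function approximation, has no actions (so the predictions are not subjunctive), and has no options, so it is not the paper's intra-option algorithm. The four-state cycle and its observations are hypothetical. Node 0 predicts the next observation. Node 1's target is node 0's prediction at the next time step, so node 1 learns to predict the observation two steps ahead without ever being given that target directly.

```python
def learn_question_network(observations: list[float], n_sweeps: int = 1000,
                           step_size: float = 0.1) -> list[list[float]]:
    """Tabular two-node question network on a deterministic cycle of states.

    Node 0 predicts the next observation. Node 1 predicts node 0's prediction
    on the next step, so its target is another prediction (compositional).
    """
    n_states = len(observations)
    y = [[0.0] * n_states for _ in range(2)]   # y[node][state]
    state = 0
    for _ in range(n_sweeps * n_states):
        next_state = (state + 1) % n_states
        # Node 0: the target is the observation that actually arrives next.
        y[0][state] += step_size * (observations[next_state] - y[0][state])
        # Node 1: the target is node 0's prediction from the next state (a TD target).
        y[1][state] += step_size * (y[0][next_state] - y[1][state])
        state = next_state
    return y


obs = [1.0, 0.0, 0.0, 0.0]            # hypothetical observation emitted in each state
y = learn_question_network(obs)
print([round(v, 3) for v in y[0]])    # one-step-ahead predictions
print([round(v, 3) for v in y[1]])    # two-step-ahead predictions
```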
Presentations and Misc.
Robotic MiniGolf: The final class project for CMPUT 608 (a.k.a. trying to get the robot to beat the professor on five holes). The robot had a map of the room and the coordinates of the "dead zone", but had to find the course, ball, and hole within those coordinates and then, of course, shoot the ball into the hole. I worked on the ball and hole detection.