I am a PhD candidate in the Department of Computing Science at the University of Alberta. I am interested in web search and reinforcement learning.
Yao, H., Szepesvari, Cs., Sutton, R., and Bhatnagar, S. 2013. Universal Option Models. Submitted.
Yao, H., Rafiei, D., and Sutton, R. 2013. A Study of Temporal Citation Count Prediction using Reinforcement Learning. Submitted.
Yao, H. and Schuurmans, D. 2013. Reinforcement Ranking. arXiv:1303.5988.
Yao, H. 2012. MaxRank: Discovering and Leveraging the Most Valuable Links for Ranking. arXiv:1210.1626.
Yao, H. and Szepesvari, Cs. 2012. Approximate Policy Iteration with Linear Action Models. Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI), Toronto, Canada.
Yao, H. 2011. Off-policy learning with linear action models: an efficient "One-Collection-For-All-Solution". Workshop on "Planning and Acting with Uncertain Models" at the 28th ICML, Bellevue, Washington, USA.
Yao, H. 2011. Linear least-squares Dyna-style planning. Technical Report TR11-04, Department of Computing Science, University of Alberta.
Yao, H., Bhatnagar, S., and Diao, D. 2009. Multi-step linear Dyna-style planning. Advances in Neural Information Processing Systems (NIPS) 22, Vancouver, BC, Canada.
Yao, H., Bhatnagar, S., and Szepesvari, Cs. 2009. LMS-2: towards an algorithm that is as cheap as LMS and almost as efficient as RLS. 48th IEEE Conference on Decision and Control (CDC), Shanghai, China, December 2009.
Yao, H., Sutton, R. S., Bhatnagar, S., Diao, D., and Szepesvari, Cs. 2009. Dyna(k): A multi-step Dyna planning. Workshop on Abstraction in Reinforcement Learning, Montreal, Canada, June 2009.
Yao, H., Bhatnagar, S., and Szepesvari, Cs. 2009. Temporal difference learning by direct preconditioning. Multidisciplinary Symposium on Reinforcement Learning (MSRL), Montreal, Canada, June 2009.
Yao, H. and Liu, Z-Q. 2008. Preconditioned temporal difference learning. 25th International Conference on Machine Learning (ICML), Helsinki, Finland, June 2008.
Yao, H. and Liu, Z-Q. 2008. Minimal residual approaches for policy evaluation in large sparse Markov chains. Tenth International Symposium on Artificial Intelligence and Mathematics (ISAIM), Fort Lauderdale, USA, January 2008.
Fast algorithms for learning linear action models (LAMs) from data
Fast algorithms for approximate policy iteration with LAMs
See the linear Dyna paper and the LAM-API paper above for details on LAMs
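To make the LAM idea concrete, here is a minimal, illustrative sketch. It assumes the usual linear-action-model form from the linear Dyna line of work: for each action a, a matrix F_a and a vector f_a such that phi(s') is approximately F_a phi(s) and the reward is approximately f_a . phi(s). Here both are fitted by plain ridge least squares, and a greedy (policy-improvement) action is chosen with Q(s, a) = f_a . phi(s) + gamma * theta . (F_a phi(s)). This is not the code in LAMAPI.zip and not the "fast" algorithms listed above; the function names (learn_lam, q_value, greedy_action), the ridge parameter, the weight vector theta, and the toy two-state chain are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
# A transition is (phi(s), action, reward, phi(s')); features are plain lists.
Transition = Tuple[Sequence[float], int, float, Sequence[float]]


def solve(A: Matrix, B: Matrix) -> Matrix:
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting.

    A is d x d and must be non-singular (the ridge term guarantees this);
    B is d x m. Returns X as a d x m list of rows.
    """
    d = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(d)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(d):
            if r != col and M[r][col] != 0.0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[d:] for row in M]


def learn_lam(data: Sequence[Transition], d: int,
              ridge: float = 1e-6) -> Dict[int, Tuple[Matrix, Vector]]:
    """Fit one model (F_a, f_a) per action by ridge least squares:
    phi(s') ~ F_a phi(s) and r ~ f_a . phi(s)."""
    stats: Dict[int, Tuple[Matrix, Matrix]] = {}
    for phi, a, r, phi_next in data:
        if a not in stats:
            A0 = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
            B0 = [[0.0] * (d + 1) for _ in range(d)]
            stats[a] = (A0, B0)
        A, B = stats[a]
        for i in range(d):
            for j in range(d):
                A[i][j] += phi[i] * phi[j]          # sum of phi phi^T
                B[i][j] += phi[i] * phi_next[j]     # sum of phi phi'^T
            B[i][d] += phi[i] * r                   # sum of phi r
    models: Dict[int, Tuple[Matrix, Vector]] = {}
    for a, (A, B) in stats.items():
        X = solve(A, B)  # X = A^{-1} B = [F_a^T | f_a]
        F = [[X[i][j] for i in range(d)] for j in range(d)]  # transpose back
        f = [X[i][d] for i in range(d)]
        models[a] = (F, f)
    return models


def q_value(model: Tuple[Matrix, Vector], theta: Sequence[float],
            phi: Sequence[float], gamma: float) -> float:
    """One-step lookahead through the model: f . phi + gamma * theta . (F phi)."""
    F, f = model
    next_phi = [sum(F[i][j] * phi[j] for j in range(len(phi))) for i in range(len(F))]
    reward = sum(fi * p for fi, p in zip(f, phi))
    return reward + gamma * sum(t * x for t, x in zip(theta, next_phi))


def greedy_action(models: Dict[int, Tuple[Matrix, Vector]],
                  theta: Sequence[float], phi: Sequence[float], gamma: float) -> int:
    """Policy-improvement step: the action with the largest model-based Q value."""
    return max(models, key=lambda a: q_value(models[a], theta, phi, gamma))


# Hypothetical two-state chain with one-hot features: action 0 stays put,
# action 1 switches state and pays reward 1 when leaving state 0.
s0, s1 = [1.0, 0.0], [0.0, 1.0]
data = [(s0, 0, 0.0, s0), (s1, 0, 0.0, s1), (s0, 1, 1.0, s1), (s1, 1, 0.0, s0)]
models = learn_lam(data, d=2)
theta = [0.0, 10.0]  # hypothetical value-function weights favouring state 1
print(greedy_action(models, theta, s0, gamma=0.9))  # → 1
print(greedy_action(models, theta, s1, gamma=0.9))  # → 0
```

Because each action's model needs only the sufficient statistics (the sums of phi phi^T, phi phi'^T, and phi r), one batch of transitions can be reused to evaluate and improve many policies; the sketch simply solves the d x d system directly for clarity.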
The package, with two example domains, is available as LAMAPI.zip. I have also recently implemented a generalized version of the kernel-embedding algorithm for approximate value iteration by Steffen Grunewalder et al., as well as a new policy evaluation algorithm similar to LSPE. I will post the update soon.