Hengshuai Yao

I am a PhD candidate in the Department of Computing Science at the University of Alberta. I am interested in web search and reinforcement learning.


Yao, H. and Szepesvari, Cs. 2014. Pseudo-MDPs and a New Method for Nonlinear Feature Construction. in preparation.

Lee, C., Yao, H., He, X., Su, C., and Chang, J-Y. 2014. A System to Predict Future Popularity: Learning to Classify. WWW, Seoul, Korea.

Yao, H., Szepesvari, Cs., Sutton, R., and Bhatnagar, S. 2013. Universal Option Models. submitted.

Yao, H., Rafiei, D., and Sutton, R. 2013. A Study of Temporal Citation Count Prediction using Reinforcement Learning. Accepted, IEEE Transactions on Systems, Man, and Cybernetics, Part B.

Yao, H. and Schuurmans, D. 2013. Reinforcement Ranking. arXiv:1303.5988.

Yao, H. 2012. MaxRank: Discovering and Leveraging the Most Valuable Links for Ranking. arXiv:1210.1626.

Yao, H. and Szepesvari, Cs. Approximate Policy Iteration with Linear Action Models. Twenty-Sixth Conference on Artificial Intelligence. AAAI. Toronto, Canada. 2012. pdf

Yao, H. Off-policy learning with linear action models: an efficient "One-Collection-For-All-Solution". In workshop on "Planning and Acting with Uncertain Models" at the 28th ICML, Bellevue, Washington, USA. 2011. pdf

Yao, H. Linear least-squares Dyna-style planning. Technical Report TR11-04, Department of Computing Science, University of Alberta. 2011.

Yao, H., Bhatnagar, S., and Diao, D. Multi-step linear Dyna-style planning. Advances in Neural Information Processing Systems (NIPS) 22, Vancouver, BC, Canada. 2009. retyped pdf

Yao, H., Bhatnagar, S., and Szepesvari, Cs. LMS-2: towards an algorithm that is as cheap as LMS and almost as efficient as RLS. The Forty-eighth IEEE Control and Decision Conference (CDC), Shanghai, China. December 2009. pdf

Yao, H., Sutton, R. S., Bhatnagar, S., Diao, D., and Szepesvari, Cs. Dyna(k): A multi-step Dyna planning. Abstraction in Reinforcement Learning. Montreal, Canada. June 2009. pdf

Yao, H., Bhatnagar, S., and Szepesvari, Cs. Temporal difference learning by direct preconditioning. Multidisciplinary Symposium on Reinforcement Learning (MSRL), Montreal, Canada. June 2009. pdf

Yao, H., and Liu, Z-Q. Preconditioned temporal difference learning. The 25th International Conference on Machine Learning (ICML), Helsinki, Finland. June 2008. pdf

Yao, H., and Liu, Z-Q. Minimal residual approaches for policy evaluation in large sparse Markov chains. The Tenth International Symposium on Artificial Intelligence and Mathematics (ISAIM), Fort Lauderdale, USA. January 2008. pdf


Linear Action Model (LAM) for Reinforcement Learning

The package plus two domains: LAMAPI.zip. Recently I also implemented a generalized version of the kernel embedding algorithm for approximate value iteration by Steffen Grunewalder et al., and added a new policy evaluation algorithm similar to LSPE. I will post the update soon.


parsing URLs

Hadoop MapReduce


Data Sets

Mac OS



C++ books

Java books

An intelligent Reinforcement Learning Tetris Player



UA calendar