In this third part of Applying Temporal Difference Methods to Machine Learning, I experiment with the intra-sequence update variant of TD learning, a method in which the parameters are updated after each time step rather than once at the end of the sequence.
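As a rough sketch of the idea (my own illustration, not the project's actual code), here is what linear TD(λ) with intra-sequence updates might look like. The point is that the weights change inside the loop, so later predictions within the same sequence already reflect earlier updates; the function and parameter names below are assumptions for illustration:

```python
import numpy as np

def td_lambda_intra_sequence(sequence, z, w, alpha=0.01, lam=0.9):
    """One pass of linear TD(lambda) with intra-sequence updates.

    sequence : list of feature vectors x_1..x_m (one per time step)
    z        : scalar outcome observed at the end of the sequence
    w        : float weight vector, updated in place after *every* step
               rather than once at the end of the sequence
    """
    e = np.zeros_like(w)              # eligibility trace: sum of lam^(t-k) * grad P_k
    for t in range(len(sequence)):
        x_t = sequence[t]
        e = lam * e + x_t             # for linear P_t = w . x_t, the gradient is x_t
        p_t = w @ x_t
        # the next prediction is the following step's estimate,
        # or the observed outcome z at the end of the sequence
        p_next = z if t == len(sequence) - 1 else w @ sequence[t + 1]
        # intra-sequence variant: apply the TD update immediately,
        # so the remaining predictions use the new weights
        w += alpha * (p_next - p_t) * e
    return w
```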
In this post I detail my project for the Reinforcement Learning course (COMP767) taken at McGill, applying Temporal Difference (TD) methods in a machine learning setting.
This concept was first discussed by Sutton in the 1988 paper that introduced this family of learning algorithms, “Learning to Predict by the Methods of Temporal Differences”. I aim to go over what the paper discussed and see how the method performs on a traditional machine learning problem.
In this part, I will be covering the concepts underlying this application.
A Jupyter notebook submitted as an assignment for the Reinforcement Learning class at McGill, analyzing the theoretical and empirical performance of Expected Sarsa.
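For context, Expected Sarsa replaces the sampled next action in the ordinary Sarsa target with an expectation over the policy's action probabilities, which removes the variance due to sampling A'. A minimal tabular sketch, assuming an ε-greedy target policy (illustrative only, not the notebook's code):

```python
import numpy as np

def expected_sarsa_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, epsilon=0.1):
    """One Expected Sarsa update on a tabular Q of shape (n_states, n_actions),
    assuming the agent follows an epsilon-greedy policy with respect to Q."""
    n_actions = Q.shape[1]
    # action probabilities of the epsilon-greedy policy in the next state
    probs = np.full(n_actions, epsilon / n_actions)
    probs[np.argmax(Q[s_next])] += 1.0 - epsilon
    # expectation over next actions replaces the sampled A' of ordinary Sarsa
    expected_q = probs @ Q[s_next]
    Q[s, a] += alpha * (r + gamma * expected_q - Q[s, a])
    return Q
```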
Summary and notes on the on-policy first-visit Monte Carlo method, based on the course offered at McGill, in addition to chapter 5 of Richard S. Sutton and Andrew G. Barto, “Reinforcement Learning: An Introduction”, Second Edition, MIT Press.
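As a refresher on the method itself, here is a minimal tabular sketch of one episode's update in on-policy first-visit Monte Carlo control (my own illustration, not the course notes):

```python
import numpy as np

def on_policy_first_visit_mc_update(episode, Q, returns, epsilon=0.1, gamma=1.0):
    """Update action values after one episode of on-policy first-visit
    Monte Carlo control (tabular).

    episode : list of (state, action, reward) triples, where reward
              follows taking the action in the state
    Q       : 2-D float array, Q[s, a] action-value estimates
    returns : dict mapping (s, a) -> list of observed returns
    Returns the epsilon-greedy policy implied by the updated Q.
    """
    # index of the first occurrence of each (state, action) pair
    first_visit = {}
    for t, (s, a, _) in enumerate(episode):
        first_visit.setdefault((s, a), t)
    # walk backwards, accumulating the discounted return G_t
    G = 0.0
    for t in reversed(range(len(episode))):
        s, a, r = episode[t]
        G = gamma * G + r
        if first_visit[(s, a)] == t:          # update only on the first visit
            returns[(s, a)].append(G)
            Q[s, a] = np.mean(returns[(s, a)])
    # on-policy improvement: make the policy epsilon-greedy w.r.t. the new Q
    n_states, n_actions = Q.shape
    policy = np.full((n_states, n_actions), epsilon / n_actions)
    policy[np.arange(n_states), Q.argmax(axis=1)] += 1.0 - epsilon
    return policy
```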
Summary and notes from the first class of the Reinforcement Learning (RL) course offered at McGill, held on January 6th, in addition to chapter 2 of Richard S. Sutton and Andrew G. Barto, “Reinforcement Learning: An Introduction”, Second Edition, MIT Press.