We introduce two new temporal difference (TD) algorithms based on the theory of linear least-squares function approximation. We define an algorithm we call Least-Squares TD (LS TD) for which we prove probability-one convergence when it is used with a function approximator linear in the adjustable parameters. We then define a recursive version of this algorithm, Recursive Least-Squares TD (RLS TD). Although these new TD algorithms require more computation per time-step than do Sutton's TD(λ) algorithms, they are more efficient in a statistical sense because they extract more information from training experiences. We describe a simulation experiment showing the substantial improvement in learning rate achieved by RLS TD in an example Markov prediction problem. To quantify this improvement, we introduce the TD error variance of a Markov chain, σ_TD, and experimentally conclude that the convergence rate of a TD algorithm depends linearly on σ_TD. In addition to converging more rapidly, LS TD and RLS TD do not have control parameters, such as a learning rate parameter, thus eliminating the possibility of achieving poor performance by an unlucky choice of parameters.
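To make the idea concrete, here is a minimal sketch of the two estimators in NumPy. It is an illustrative reconstruction, not the paper's pseudocode: the function names, the small ridge term `eps` (added only for numerical invertibility), and the initialization constant `delta` are assumptions of this sketch. Batch LS TD solves a linear system built from observed transitions; RLS TD maintains the same solution incrementally via the Sherman-Morrison rank-one inverse update, avoiding a matrix solve at each step.

```python
import numpy as np

def lstd(transitions, n_features, gamma=1.0, eps=1e-6):
    """Batch LS TD: solve A @ theta = b, where A and b are accumulated
    from observed transitions (phi, reward, phi_next).  phi_next is the
    zero vector for absorbing states.  `eps` is a small ridge term,
    an assumption of this sketch, to keep A invertible."""
    A = eps * np.eye(n_features)
    b = np.zeros(n_features)
    for phi, r, phi_next in transitions:
        A += np.outer(phi, phi - gamma * phi_next)
        b += r * phi
    return np.linalg.solve(A, b)

def rls_td(transitions, n_features, gamma=1.0, delta=100.0):
    """Recursive LS TD: maintains P ~ A^{-1} with a Sherman-Morrison
    rank-one update, so each step costs O(n^2) rather than O(n^3).
    `delta` scales the initial P (an assumption of this sketch)."""
    theta = np.zeros(n_features)
    P = delta * np.eye(n_features)          # running estimate of A^{-1}
    for phi, r, phi_next in transitions:
        d = phi - gamma * phi_next
        K = P @ phi / (1.0 + d @ P @ phi)   # gain vector
        theta = theta + K * (r - d @ theta) # TD-error-driven correction
        P = P - np.outer(K, d @ P)          # rank-one inverse update
    return theta
```

On a toy two-state chain with one-hot features (state 0 yields reward 1 and moves to state 1, which terminates with reward 0), both estimators recover the true values V(0)=1, V(1)=0; note that neither routine has a learning-rate parameter, matching the point made in the abstract.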
Available at: http://works.bepress.com/andrew_barto/2/