
## Introduction

There are two kinds of methods in reinforcement learning: tabular methods and approximate methods. The goal of RL is to find an optimal policy, which tells you which action A to choose when you are in state S. If the state and action spaces are small enough, value functions can be represented as arrays, or tables. The problem with large state spaces is not just the memory needed for large tables, but the time and data needed to fill them accurately. If the state and action spaces are too large, then due to the limitations of time and data, value functions need to be approximated with limited computational resources. In this case, our goal is instead to find a good enough approximate solution compared to the optimal one.
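The tabular case can be made concrete with a small sketch: one table entry per (state, action) pair, updated from experience. The sizes, transition data, and learning rates below are made up for illustration; the update rule is the standard one-step tabular Q-learning rule.

```python
import numpy as np

# A tabular action-value function: one entry per (state, action) pair.
# The sizes are hypothetical; a real task defines its own spaces.
n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))

# One-step tabular Q-learning update, with a made-up sample transition
# (s, a, r, s_next) just to show the table being filled in.
alpha, gamma = 0.1, 0.99          # learning rate, discount factor
s, a, r, s_next = 3, 1, 1.0, 4
Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

# The greedy policy simply reads the table: argmax over actions.
policy = Q.argmax(axis=1)
```

When the table no longer fits in memory, or cannot be visited often enough to fill accurately, the same value function is instead approximated with a parameterized function.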

MountainCar is a task of driving an underpowered car up a steep mountain road. The difficulty is that gravity is stronger than the car’s engine, and even at full throttle the car cannot accelerate up the steep slope. The only solution is to first move away from the goal and up the opposite slope on the left. Then, by applying full throttle the car can build up enough inertia to carry it up the steep slope even though it is slowing down the whole way.

## State

Its state space has only two entries. The first is the position of the car; the second is the speed of the car. They are bounded as follows:

$$position \in [-1.2, 0.6]$$ $$speed \in [-0.07, 0.07]$$

A negative speed means that the car is moving in the opposite direction, i.e., to the left in this environment.
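These bounds are enforced at every step of the dynamics. A minimal sketch of one step, following the dynamics used in Gym's classic MountainCar-v0 implementation (constants as I recall them from the Gym source; treat this as a sketch, not the definitive implementation):

```python
import math

# One step of the classic MountainCar dynamics (constants as in Gym's
# MountainCar-v0 source, to the best of my knowledge).
MIN_POS, MAX_POS = -1.2, 0.6
MAX_SPEED = 0.07

def step(position, velocity, action):
    """action: 0 = push left, 1 = no push, 2 = push right."""
    velocity += (action - 1) * 0.001 - 0.0025 * math.cos(3 * position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))   # speed bound
    position += velocity
    position = max(MIN_POS, min(MAX_POS, position))        # position bound
    if position == MIN_POS and velocity < 0:   # the left wall stops the car
        velocity = 0.0
    return position, velocity

# Full throttle to the right from near the valley bottom: the engine term
# (0.001) is weaker than gravity (0.0025), which is why the car must first
# back up the opposite slope.
p, v = -0.5, 0.0
for _ in range(100):
    p, v = step(p, v, 2)
```

The relative size of the engine term and the gravity term is exactly what makes the task hard: full throttle alone cannot beat the slope.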

This post records all the MySQL commands I used in my projects.

This tutorial provides a brief overview of the core concepts and functionality of TensorFlow, covering the following:

1. What is TensorFlow
2. How to input data
3. How to perform computations
4. How to create variables
5. How to train a neural network for a simple regression problem
6. Tips and tricks
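Before reaching for TensorFlow, the simple regression problem in item 5 can be sketched framework-free: fit y = w·x + b by gradient descent on the mean squared error. TensorFlow's value is that it automates the gradient computation done by hand here. The data and learning rate below are made up for illustration.

```python
import numpy as np

# Fit y = w*x + b by hand-written gradient descent on the MSE.
# Synthetic data with known true parameters w = 3.0, b = 0.5.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 0.5 + rng.normal(0, 0.01, size=100)

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    err = w * x + b - y              # prediction error
    w -= lr * 2 * np.mean(err * x)   # d(MSE)/dw
    b -= lr * 2 * np.mean(err)       # d(MSE)/db
```

A TensorFlow version replaces the two hand-derived gradient lines with automatic differentiation over the same loss.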

Gym is a toolkit for developing and comparing reinforcement learning algorithms. The Gym library is a collection of test problems - environments - that you can use to work out your reinforcement learning algorithms. These environments have a shared interface, allowing you to write general algorithms.

## CartPole-v1

A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The system is controlled by applying a force of +1 or -1 to the cart. The pendulum starts upright, and the goal is to prevent it from falling over. A reward of +1 is provided for every timestep that the pole remains upright. The episode ends when the pole is more than 15 degrees from vertical, or the cart moves more than 2.4 units from the center.

*OpenAI Gym CartPole-v1*
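The shared interface mentioned above boils down to `reset()` and `step(action)`. A hand-rolled toy environment makes that concrete without installing anything; note this stand-in only tracks cart position with made-up constants, it is NOT the real CartPole physics:

```python
import random

class ToyCartEnv:
    """A stripped-down stand-in for CartPole that tracks only cart position.
    Not the real physics; it just illustrates the Gym-style
    reset()/step() interface."""

    def reset(self):
        self.x, self.vx = 0.0, 0.0
        return (self.x, self.vx)

    def step(self, action):          # action: 0 = push left, 1 = push right
        force = 1.0 if action == 1 else -1.0
        self.vx += 0.02 * force      # hypothetical force scale
        self.x += self.vx
        reward = 1.0                 # +1 for every surviving timestep
        done = abs(self.x) > 2.4     # episode ends off the track
        return (self.x, self.vx), reward, done, {}

# The standard agent-environment loop works for any env with this
# interface, which is what lets general RL algorithms be written once.
env = ToyCartEnv()
random.seed(0)
obs, total, done = env.reset(), 0.0, False
while not done:
    obs, r, done, info = env.step(random.randint(0, 1))
    total += r
```

Swapping `ToyCartEnv()` for `gym.make("CartPole-v1")` leaves the loop unchanged; that is the point of the shared interface.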

Unlike dynamic programming methods, Monte Carlo (MC) methods do not assume complete knowledge of the environment. MC only requires experience: sample sequences of states, actions, and rewards from actual or simulated interaction with an environment.

## Monte Carlo Prediction

The idea underlying all Monte Carlo methods is that as more returns are observed, the average should converge to the expected value. So we begin by considering Monte Carlo methods for learning the state-value function for a given policy. A simple way to estimate the value of a state from experience is to average the returns observed after visits to that state.
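The averaging idea can be sketched as first-visit MC prediction: for each episode, average the return following the first visit to each state. The toy episodes below are made up; each is a list of (state, reward) pairs with discount gamma = 1.

```python
from collections import defaultdict

def mc_prediction(episodes, gamma=1.0):
    """First-visit Monte Carlo estimate of V(s) from sample episodes.
    Each episode is a list of (state, reward) pairs."""
    returns = defaultdict(list)
    for episode in episodes:
        # Compute the return G_t backwards from the end of the episode.
        G, G_at = 0.0, []
        for s, r in reversed(episode):
            G = gamma * G + r
            G_at.append((s, G))
        G_at.reverse()
        seen = set()
        for s, G in G_at:
            if s not in seen:          # first visit to s in this episode
                seen.add(s)
                returns[s].append(G)
    # V(s) is just the average of the observed returns.
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}

episodes = [
    [("A", 0.0), ("B", 1.0)],   # return from A: 0 + 1 = 1
    [("A", 0.0), ("B", 0.0)],   # return from A: 0
]
V = mc_prediction(episodes)     # V["A"] = V["B"] = 0.5
```

As more episodes are added, each average converges to the expected return under the policy that generated the episodes.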

## Background

The homework for Linear System Theory is challenging. Here is Problem 1:

Consider a linear system with a state transition matrix $\phi(t,t_0)$:

$$\phi(t,t_0)=\displaystyle \left[\begin{matrix} e^{t} \cos{\left(2 t \right)} & e^{- 2 t} \sin{\left(2 t \right)}\\ -e^{t} \sin{\left(2 t \right)} & e^{- 2 t} \cos{\left(2 t \right)} \end{matrix}\right]$$

Compute $A(t)$.

Since the given system is a linear time-variant system, we can compute $A(t)$ easily by using some properties of $\phi$. However, the expression is so complicated that I could not simplify it by hand, so I called on some tools for help: Python and MATLAB.

This post compares the `simplify` function in Python (SymPy) with the one in MATLAB.

## Problem 4.1

Determine whether the following continuous-time linear time-invariant system is fully controllable.
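The system matrices from Problem 4.1 are not reproduced here, so the following sketch uses placeholder matrices (a double integrator, purely for illustration) with the standard Kalman rank test: an LTI system $\dot{x} = Ax + Bu$ is fully controllable iff $\operatorname{rank}[B\ AB\ \cdots\ A^{n-1}B] = n$.

```python
import numpy as np

# Kalman rank test for controllability. A and B below are placeholders
# (a double integrator), NOT the matrices from Problem 4.1.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])

n = A.shape[0]
blocks = [B]
for _ in range(n - 1):
    blocks.append(A @ blocks[-1])   # B, AB, ..., A^(n-1) B
C = np.hstack(blocks)               # controllability matrix
controllable = np.linalg.matrix_rank(C) == n
```

Substituting the actual matrices from the problem statement into `A` and `B` gives the answer directly.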

## Step 1: Choose the paper

- Use tools such as Google Scholar.