MountainCar is a task of driving an underpowered car up a steep mountain road. The difficulty is that gravity is stronger than the car’s engine, and even at full throttle the car cannot accelerate up the steep slope. The only solution is to first move away from the goal and up the opposite slope on the left. Then, by applying full throttle the car can build up enough inertia to carry it up the steep slope even though it is slowing down the whole way.
Its state has only two components: the position of the car and its velocity. Both are bounded:
$$\text{position} \in [-1.2, 0.6]$$
$$\text{velocity} \in [-0.07, 0.07]$$
The action space has three discrete actions:
0: full throttle backward
1: zero throttle
2: full throttle forward
The reward is -1 on every time step until the car moves past the goal position at the top of the mountain, which ends the episode.
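To make the state, action, and reward description concrete, here is a minimal sketch of one environment step. It does not call the Gym library. The constants (`FORCE`, `GRAVITY`, `GOAL_POSITION`) and the update rule are assumptions modelled on the classic Gym `MountainCar-v0` implementation, and the helper names `step` and `full_throttle_reaches_goal` are introduced here for illustration only.

```python
import math

# Assumed constants, modelled on the classic Gym MountainCar-v0 environment.
MIN_POSITION, MAX_POSITION = -1.2, 0.6
MAX_SPEED = 0.07
GOAL_POSITION = 0.5
FORCE = 0.001      # push added by full throttle
GRAVITY = 0.0025   # scale of the slope term


def step(position, velocity, action):
    """Advance one time step; returns (position, velocity, reward, done)."""
    # action 0/1/2 maps to a throttle of -1/0/+1; the slope term opposes the climb.
    velocity += (action - 1) * FORCE - GRAVITY * math.cos(3 * position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position += velocity
    position = max(MIN_POSITION, min(MAX_POSITION, position))
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0  # the car stops dead at the left boundary
    done = position >= GOAL_POSITION
    return position, velocity, -1.0, done  # reward is -1 on every step


def full_throttle_reaches_goal(position=-0.5, max_steps=200):
    """Hold action 2 from rest and report whether the goal is ever reached."""
    velocity = 0.0
    for _ in range(max_steps):
        position, velocity, _reward, done = step(position, velocity, 2)
        if done:
            return True
    return False
```

Under these assumed dynamics, calling `full_throttle_reaches_goal()` from rest near the valley floor should return `False`, which matches the point that the engine alone cannot climb the slope.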
In this section I do not use a reinforcement learning algorithm; instead I follow a very simple principle.
When the car is to the right of the zero position and its speed is close to zero, I choose the full-throttle-backward action, moving away from the goal to build up more inertia.
When the car is to the left of the zero position and its speed is close to zero, it cannot gain any more inertia on this swing, so I choose the full-throttle-forward action to move towards the goal.
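The two rules above can be sketched as a small pure function. Several details are assumptions that the text does not state: the threshold `eps` for "close to zero", the `pivot` parameter standing in for the zero position, and the choice to keep the previous action whenever neither rule fires. The name `choose_action` is hypothetical.

```python
BACKWARD, NEUTRAL, FORWARD = 0, 1, 2  # action codes from the list above


def choose_action(position, velocity, previous_action, pivot=0.0, eps=1e-3):
    """Switch throttle only when the car has (almost) stopped.

    pivot: position separating "left" from "right" (assumed 0.0, per the text).
    eps:   assumed threshold below which the speed counts as close to zero.
    """
    if abs(velocity) < eps:
        if position > pivot:
            return BACKWARD  # right of the pivot: back away from the goal
        if position < pivot:
            return FORWARD   # left of the pivot: drive towards the goal
    # Assumption: otherwise keep doing whatever we were doing.
    return previous_action
```

In a rollout loop, the returned action would be passed back in as `previous_action` on the next step, so the throttle only changes at the turning points of each swing.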
During each swing, gravitational potential energy is first converted into kinetic energy on the way down, and that kinetic energy is then converted back into gravitational potential energy on the way up.
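This energy exchange can be written down under an assumed model. Suppose, as in the classic Gym implementation, that the velocity update is $v \leftarrow v + (a - 1)F - G\cos(3x)$ with $a \in \{0, 1, 2\}$, $F = 0.001$, and $G = 0.0025$ (these symbols and values are assumptions, not taken from the text above). Treating the update as a continuous-time approximation with unit mass, the slope term is the gradient of a potential, so the mechanical energy is

$$E = \tfrac{1}{2}v^2 + \tfrac{G}{3}\sin(3x)$$

With zero throttle, $E$ stays roughly constant and only moves between its two terms. With throttle, each step changes it by approximately

$$\Delta E \approx (a - 1)\,F\,\Delta x$$

which is positive whenever the throttle points in the direction the car is already moving. Switching the throttle at the turning points of each swing is one way to keep this contribution positive for most of the swing.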