0%

Play with MountainCar

MountainCar is a task of driving an underpowered car up a steep mountain road. The difficulty is that gravity is stronger than the car’s engine, and even at full throttle the car cannot accelerate up the steep slope. The only solution is to first move away from the goal and up the opposite slope on the left. Then, by applying full throttle the car can build up enough inertia to carry it up the steep slope even though it is slowing down the whole way.

State

Its state space has only two entries. The first is the position of the car. The second is the speed of the car. They are limited in:

$$position \in [-1.2, 0.6]$$ $$speed \in [-0.07, 0.07]$$

The negative speed means that the direction of the movement of the car is opposite to the right, which is left in this environement.

Action

0: full throttle backword

1: zero throttle

2: full throttle forward

Reward

The reward of this environment is always -1 on all time steps until it moves past its goal position at the top of the mountain, which ends the spisode.

Physical Model

Play with MountainCar by yourself

This section I would not use a reinforcement learning algorithm, instead following a very simple principle.

When the car is at right of the zero positon and the speed comes close to zero, I choose full throttle backword action to move away from the goal direction to build more inertia.

When the car is at left of the zero positon and the speed comes close to zero, which means the car cannot get more inertia in this period, so i choose full throttle forward action to move towards the goal.

In this period, Gravitational potential energy transforms to kinetic energy first and then kinetic energy transforms to gravitational potential energy.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
import gym
import gym.spaces
gym.logger.set_level(40)
gym.__version__
import time
import itertools

env = gym.make('MountainCar-v0')
env = gym.wrappers.Monitor(env, "recording")
print(env.observation_space.high)
state = env.reset()
env.render()
state, reward, done, _ = env.step(2)
print(state)

while(True):
print(state)
if state[0] > 0.599:
break
if state[1]>-0.005 and state[1]<0.005 and state[0]>-0.5:
while(True):
state, reward, done, _ = env.step(0)
print("Full throttle backword")
print(state)
time.sleep(0.1)
env.render()
if state[1]<0.005 and state[1]>-0.005 and state[0]<-0.5:
break
if state[0] > 0.599:
break

if state[1]<0.005 and state[1]>-0.005 and state[0]<-0.5:
while(True):
state, reward, done, _ = env.step(2)
print("Ful throttle forward")
print(state)
time.sleep(0.1)
env.render()
if state[1]>-0.005 and state[1]<0.005 and state[0]>-0.5:
break
if state[0] > 0.599:
break

env.close()

MountainCar

If you like my blog, please donate for me.